Sunday, August 12, 2012

Connecting Farmers' Market Observations to High-tech "Markets"

We usually visit the farmers' market in Sunnyvale on Saturday, but decided to visit the Mountain View farmers' market this morning. A few observations:

Selling tactics:

This farmers' market officially opens at 9am. When we showed up early at ~8 (as we usually do for our morning walk), I decided to get a few items. Here are several responses to my inquiries:

"No we are not quite ready. Why don't you come back?" - Well, maybe...

"Of course I can help you. Excuse us as we're still setting up." - Perfect, I'd be happy to be an early customer.

It's worth pointing out that the opening/closing times serve as guidelines only; the first customers are always there before the official opening time.

Pricing strategies:

"Mix and match, all the variety you'd like, same price" - Great: the easier the pricing model, the more consumable the products become.

Cross-sell and up-sell:

"That will be $2.50, actually $3 with a complimentary bunch of radishes. Try something new!" - Thanks for giving me a deal while introducing a new product, and making more money in the process.

Marketing strategies:

One of the booths had fresh vegetables piled up high, and I didn't recall seeing them at the Sunnyvale farmers' market, so I asked: "Do you go to the Saturday market in Sunnyvale?" "No, actually we just come here." - I see: this booth, with multiple helpers and a streamlined process, is selective about where it focuses its energies.

While it is interesting to observe who and what are present, it is also interesting to observe who and what are missing. One of the vendors we generally buy from in Sunnyvale was not here. I haven't asked them, but it makes sense: that booth stood out in the smaller farmers' market setting, but here it would have been lost in this larger group of vendors.

With these observations and lessons from today's farmers' market, I found myself thinking about their applicability to high-tech markets: make it as easy as possible for customers to buy, make a habit of cross-selling and up-selling (naturally, opening new spaces of possibility for customers), and focus energies where there's a competitive advantage.

What are you noticing and observing from your daily/regular activities that may be applied to another domain?

Friday, June 22, 2012

Real Time Big Data Analytics at Hadoop Summit

I took the opportunity on June 12 to attend the BigDataCamp. The real-time analytics session of the unconference portion was led by Michael Hummel, CEO of ParStream. It ran less than 30 minutes as the conference center was shutting down for the day.

My take from the short discussion:
* Hadoop is good for adding value to data, not consuming data... it is not designed for "real time"

What is real time? It is predictability: delivering a result within a "decision time" that falls inside the "allowed" time, which begs the question of who actually defines that allowance.
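That definition can be sketched in a few lines of Python: a result counts as "real time" only if it arrives within the allowed decision time. The function name `is_real_time` and its parameters are my own illustrative choices, not from any framework discussed at the session.

```python
import time

def is_real_time(process, event, allowed_seconds):
    """Run `process` on `event` and report whether it met the deadline.

    Returns the result and a flag indicating whether the elapsed
    "decision time" fell within the "allowed" time.
    """
    start = time.monotonic()
    result = process(event)
    elapsed = time.monotonic() - start
    return result, elapsed <= allowed_seconds

# Usage: a trivial "analysis" with a generous one-second budget.
result, on_time = is_real_time(lambda e: e * 2, 21, allowed_seconds=1.0)
```

The point of the sketch is that "real time" is relative to whoever sets `allowed_seconds`: the same pipeline may be real time for a one-minute fraud-review window and hopelessly late for a sub-second ad auction.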

So is real-time big data analytics possible? It's not a fantasy, according to GigaOM in 2011 (article). Some vendors (e.g. IBM) choose not to use the words "real time" in products/solutions, since there are different definitions. Depending on how "real time" is defined, real-time analytics is entirely possible. Take a look at Jike Chong's notes on different fraud detection use cases from the Feb SF Hadoop User Group here and form your own assessment.

Thursday, May 03, 2012

GPUs, Book Reviews and Game Shows!

Why read this book?
Who is the audience?
What is the value?

If you are thinking I'm trying to sell you a book, hang on for just a minute. What I would like to do here is share a recent experience of an exceptional review of a technical book: specifically, Chris Jang's review of Rob Farber's CUDA Application Design and Development.

At the monthly HPC & GPU Supercomputing Group of Silicon Valley Meetup, book reviews are a regular occurrence, and are always placed at the end of the meeting. This tradition started a year ago when NVIDIA gave away two GPU Computing Gems books to the first two volunteers who committed to review at least one chapter of the book and share their learning with the group.

With participants often excitedly talking over pizza and asking question after question to the invited speakers, the meetings always seem to run out of time for book review talks at the end. Thanks to NVIDIA for regular contributions of GPU Computing Gems books, and a continuous flow of volunteer speakers, even with the end-of-meeting rush, reviews of about a dozen chapters have been shared at the meetups thus far.

To clarify, book reviews may sound academic, and they are often the more academic part of the meetings, but the material is mostly based on use cases and practical examples. For that reason, most everyone sticks around and maximizes their learning at each meeting.

At the Monday, April 23rd meetup, the book review was different: granted, it was a different book, but the review style, content, and delivery all stood out. Chris started with the Why-Who-What questions, then shared his observations of three cultures in CUDA Application Design and Development: the CUDA technology platform, stream processing on the GPU, and machine learning, before diving into Acts and Themes, his way of speaking about different parts of the book.

With this delivery of a comprehensive and fun review, those at the meetup learned that while the book is the whole world of CUDA in one place, it may not serve as a how-to guide, and while the book is helpful to those making technology choices, the various technology cultures may be confusing. The slides are here for those interested.

As this distinct book review talk closed the April meetup on a high note, Jike Chong, the meetup organizer, reminded everyone of the GTC Special meetup on May 15, co-located with the GPU Technology Conference. Instead of technical talks, this meetup will feature an experience-sharing session, Start Your HPC and GPU Supercomputing Meetup Group, followed by What You Always Wanted to Know About GPUs, a Game Show with Raffles!

Since GPU enthusiasts from all over the United States (and even internationally) will be present for the GTC, this meetup will be a great opportunity for people to meet and learn! We also received a tip from a reliable source that there will be a well-known special surprise guest attending! You'll just have to sign up, show up and find out. Seats are filling up fast, so hurry!

Wednesday, February 29, 2012

Big Data vs. Big Compute, Two Sides of the Same Coin?

With Big Data designated as the next frontier for innovation, competition, and productivity by McKinsey, and Gartner selecting it as one of the top 10 strategic technologies for 2012, Big Data is everywhere in the media, and on top of people’s minds. But are Big Data technologies sufficient in addressing “big data” problems? What about the compute-intensive applications that involve processing significant amounts of data? The Silicon Valley HPC/GPU Supercomputing Meetup Group decided to bring the two topics together for a discussion.

“It wasn't what I expected, which turned out to be a good thing,” commented an attendee of the recent panel discussion and debate on the similarities and differences of Big Data vs. Big Compute. When multiple reviews of the event run over 150 words each, it is clear that the event struck a chord and, perhaps, invoked some controversy.

Big Data and Big Compute, specifically using GPUs in this case, are not new concepts. Big Data usually refers to datasets too large for database management tools, and GPU computing leverages the parallel nature of the graphics processing unit to perform computation in applications traditionally handled by the CPU.

The Meetup organizer and panel moderator, Dr. Jike Chong, offered several interpretations of Big Data and Big Compute from speaking with the four panelists with diverse backgrounds: constraint-based, relative, scalability and historic. The discussion quickly heated up:

Is Big Data vs. Big Compute a wallet problem in an economic context? It depends. Do we know which data is valuable before computation takes place?

What are the differences in priority? For big data, the priority tends to be rapid exploration, whereas for big compute, there is more focus on optimization and fine-tuning of the hardware.

While the focus of the two approaches appears to be different, both are concerned with throughput. Some may say these are two different ways of solving the same problem, and others (myself included) advocate that the two work together to solve compute-intensive big data problems.

Steve Scott, NVIDIA Tesla CTO and one of the panelists, put it well with a summary on GPUs for Big Data:

If it is all data movement, there’s no need for GPU or CPU.

If there’s some serious computing that needs to be done on that data and the problem can be distributed, GPU can help allow more complex analysis.

If the problem has no locality such as in big graph analytics, GPUs may work well in the future.
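As a rough sketch, Steve's three-point summary can be encoded as a rule of thumb. The function and parameter names below are my own illustrative choices, not anything from the panel or an actual library.

```python
# Hedged sketch of the panel's three-point heuristic on when GPUs
# help with Big Data workloads; names are hypothetical.
def gpu_suitability(data_movement_only, compute_heavy_and_distributable, has_locality):
    if data_movement_only:
        # Pure data movement: neither GPU nor CPU is the bottleneck.
        return "neither GPU nor CPU helps much"
    if compute_heavy_and_distributable:
        # Serious, distributable computation: GPUs enable deeper analysis.
        return "GPU can enable more complex analysis"
    if not has_locality:
        # No locality (e.g. big graph analytics): promising, but not yet.
        return "GPUs may work well in the future"
    return "evaluate case by case"
```

A workload like reverse time migration would land in the second branch, while a pure ETL pipeline lands in the first.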

So where is the convergence, and what are the implications? Lots of big compute problems are also big data: reverse time migration in oil and gas, and visual search, for example. It also turns out that one very critical thing for both is power: how do we use less power per unit of productivity? Both Big Data and Big Compute have to optimize, and will be driving that effort over the next five years.

Big Data and Big Compute are not opposing concepts, and the discussion revealed there is more than a difference in perspective, application, or priority: there is an underlying cultural difference between the two camps using these approaches. Going forward, Big Data, as the “marriage of ‘database’ with compute,” and Big Compute need to take each other into consideration, as technologies for each can shine where their priorities and interests align.

More information can be accessed at the links to slides.

The four panelists come from diverse backgrounds, and both Aaron Kimball and Tim Child had previously presented to the group:

Aaron Kimball, co-founder of WibiData
Steve Scott, Tesla CTO, NVIDIA
Tim Kaldewey, IBM research
Tim Child, Chief Entrepreneur Officer at 3DMashup

Description of Aaron Kimball's talk, Large Scale Machine Learning on Hadoop and HBase, is here, and Tim Child's talk on Postgres is here and was also mentioned here.

Join us next time on March 26 for another exciting discussion in HPC and GPU Supercomputing!

Sunday, January 01, 2012

Will data analytics be bigger and hotter in 2012? You bet!

While Datameer has been focused on continuing to deliver its end-to-end business intelligence platform on Hadoop, there has been a lot of increased interest in (big) data analytics in the rest of the world. This both validates the market potential and creates challenges for other business aspects. Let's take a look at a few examples:

1). Analytics is dubbed by Inc magazine as one of the five most competitive areas for talent, which means data analyst and data scientist roles will continue to be some of the hardest jobs to fill in 2012.

Analytics is the science of analysis, and it is the application of computer technology, operational research, and statistics to solve problems in business and industry.

As data becomes more accessible, more decisions are made with insights from the data. "Analytics is becoming a central hub across companies where everything (web, marketing, sales, operations) is being measured and each decision is supported by data."

Analytics professionals are in higher demand than ever: Monica Rogati, a founding member of the data science team at LinkedIn, shared at the Strata Conference a graph of analytics and data science job growth that is exponential in nature, even when properly normalized.

2). Enterprises are putting more emphasis on data analytics, as they seek to better understand their customers' behaviors.

The benefits are easy to see: “Data analysis will drive Intel's future", "EBay acquires data analysis firm Hunch to boost recommendations", "Visa Europe invests in Beyond Analysis for data analytics". Intel, EBay, Visa, and other companies sit on massive data sets, which can be a very rich source for them to better understand patterns and behaviors, and serve as a basis for predictive analysis.

Companies are working on creating a more intuitive and pleasant user and customer experience, which means they must effectively correlate purchasing and consumption patterns (often structured data) and sentiments around the purchases such as social media comments and response to certain interfaces (often unstructured data).

"Analyzing the sheer volume of transactional data is no small task. It could be very easy to get lost and drown in it all," says an EBay executive. Analyzing unstructured data to understand and predict behavior is even more challenging.

3). The significance of analytics is showing up more in entertainment and politics, signifying increased popularity as well as the introduction of the field to the general public.

*CNN’s article “How Obama's data-crunching prowess may get him re-elected” discusses how the Obama campaign is taking data analytics seriously, as predictive modeling/data mining and data-crunching may not just give an edge, but may make the winning difference in a tight race. With 23 million Facebook likes and 10 million Twitter followers, the Obama data crew may know and learn a lot about these “fans”, cater to them, and win their votes.

*In the movie Moneyball, Oakland Athletics GM Billy Beane turned the baseball industry upside down by “using objectivity and data to help pick a baseball team” rather than trusting the gut feelings of “experts”. Beane was able to lead the Oakland A's to the 5th-best regular-season record with player salaries ranked 24th of 30, using sabermetric principles to run the team cost-effectively.

*In the CBS series “Person of Interest”, an ex-CIA hitman partners with a scientist to prevent crimes before they occur by collecting and analyzing data. Unlike the previous two examples, which are from or based on real life, this series introduces the general audience to the importance of data analytics in solving crimes.

What these have in common is that data and analytics are becoming popular beyond the traditional high-tech, finance, and retail industries, as people find easier ways to correlate different types of data, whether structured or unstructured. In doing so, they can answer important and interesting questions they could not answer before, to solve real problems: what algorithms can I use to find the best talent? Has there been a rogue trader in house? Which store should be closed, and which should be expanded, to achieve the best revenue?

These three examples feed off of each other: as more emphasis on analytics drives a higher demand for talent in that field, more activity takes place in the media, entertainment, and even politics.

Being able to perform data analytics is one thing; successfully implementing analytics-based strategies is another. I learned from Moneyball that: 1). Billy Beane and Peter Brand's carefully laid-out strategy didn't work when it was not executed by Art Howe, the Oakland field manager, so Billy had to eliminate some possibilities for Art by trading players away to "force" Art to follow the strategy on the field. 2). Data analytics and statistics didn't help in finding the help needed and aligning incentives: analytics can detect trends in historic information, but is limited when looking at team dynamics in terms of the team's "state" when winning and when losing.

Luckily, as data is collected during the implementation stage, there is an opportunity to use analytics to identify the issues and form a solution for the new problem. I consider the examples above indicators that data analytics will get "even bigger, hotter in 2012", and companies had better be armed with the right tools to face this reality… and yes, Datameer can help.