Reflections from the Hadoop Summit

Recently I had the privilege of attending the Hadoop Summit in San Jose, which provided an opportunity to take a good long look at the growing range of offerings and value propositions that make up the big data space. One vendor that particularly caught my attention was an Israeli company called JethroData. As described on their website:

JethroData is an innovative index-based SQL engine that enables interactive BI on Big Data. It fully indexes select datasets on Hadoop HDFS or Amazon S3 – every single column is indexed. Queries use the indexes to access only the data they need instead of performing a full scan, leading to much faster response times and lower system resource utilization. Queries can leverage multiple indexes for better performance – the more you drill down, the faster your queries run.

So here is a hot new technology offering for BI on big data, and their secret ingredient is...building indexes? That notion strikes me as especially interesting because of my own experience, going back a decade and a half or so, promoting columnar databases for data warehousing and BI. One of the primary messages in support of that shift was that we were making indexes obsolete. A columnar store, I have many times asserted, is pre-indexed for any query you want to run on it. Indexes are a solution from the past. We don’t need them any more.
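To make that contrast concrete, here is a minimal sketch in Python (my own toy illustration, not JethroData’s implementation or any production engine) of the two ways to answer a selective filter query: scan everything, or consult a pre-built inverted index and touch only the matching rows.

```python
# Toy example: answering "total revenue where region = 'east'" two ways.
# A sketch of the general index-vs-scan tradeoff, not JethroData's engine.

from collections import defaultdict

# A tiny "table" of (region, revenue) rows.
rows = [
    ("east", 100), ("west", 250), ("east", 75),
    ("north", 300), ("west", 50), ("east", 125),
]

def scan_total(region):
    """Full scan: every row is touched, however selective the filter."""
    return sum(rev for reg, rev in rows if reg == region)

# Inverted index, built once ahead of queries: value -> list of row ids.
region_index = defaultdict(list)
for row_id, (reg, _) in enumerate(rows):
    region_index[reg].append(row_id)

def indexed_total(region):
    """Index lookup: only the rows that match are ever read."""
    return sum(rows[row_id][1] for row_id in region_index[region])

assert scan_total("east") == indexed_total("east") == 300
```

A columnar store attacks the same problem by making the scan itself cheap, since it reads only the region and revenue columns rather than whole rows; an index-based engine tries to skip the scan entirely. And because a query with several predicates can intersect several such posting lists, each added filter shrinks the set of rows touched, which is presumably what JethroData means by “the more you drill down, the faster your queries run.”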

And now here’s JethroData touting indexes as the next big thing. What gives?

When hot new technologies appear to be retreads of existing technologies, there are two possible explanations. The first is that in the big data era we are needlessly replaying earlier stages of technological development. In a piece he wrote for TDWI entitled “Big Data -- Why the 3Vs Just Don't Make Sense,” Stephen Swoyer quotes Mark Madsen on this phenomenon:

[Hadoop is] re-evolving exactly like the DB industry: Schema? Check. Indexes? Check. Catalog? Check. Random lookup? Check. SQL-like interface? Check. And so it goes.

We’ll call this the “don’t reinvent the wheel” scenario. But there is another possibility.

Sometimes ideas that were inadequately rendered in old technologies can be more perfectly realized in a new substrate. For example, SAP’s Irfan Khan tells us that today’s cloud and virtualization movements are not really giving us anything new; they are just finally delivering on the promises that mainframe systems made years ago. Or to provide a more extreme example:

A group of engineers have proposed a novel approach to computing: computers made of billionth-of-a-meter-sized mechanical elements. Their idea combines the modern field of nanoscience with the mechanical engineering principles used to design the earliest computers.

In a recent paper in the New Journal of Physics, the researchers, from the University of Wisconsin-Madison (UWM), describe how such a nanomechanical computer could be designed, built, and put to use.

Their work is a contemporary take on one of the very first computer designs: the “difference engine,” a 15-ton, eight-foot-high mechanical calculator designed by English mathematician and engineer Charles Babbage beginning in 1822. Corresponding UWM scientist Robert Blick said that he was also inspired by the design of a small hand-cranked mechanical calculator invented and sold in the 1950s, the Curta.

We’ll call this the “what goes around, comes around” scenario. The question we have to ask about every old and busted idea that comes back claiming to be the new hotness is whether this is an idea whose time has truly come or whether we have accidentally tuned into a technological re-run. The jury is still out on JethroData, although I have a hard time believing that indexes can really be the next big thing for BI in big data environments. On the other hand, if a computing model introduced in the first quarter of the 19th century can not only have relevance but potentially serve as the foundation for a new generation of computers, I want to be very careful about what I write off.

We shall see.

Phil Bowermaster

With more than 25 years’ experience analyzing and writing about emerging technologies, Phil Bowermaster focuses on the convergence of information and society as reflected in current developments around Big Data...
