Hadoop’s Challenge Is Different Than Linux’s

By Jean-Paul Bergeaux
Chief Technology Officer, SwishData

Last time I wrote about Hadoop, I talked about its challenge to traditional SQL-based databases. I left off by mentioning that some SQL proponents have compared Apache Hadoop to Linux 10 years ago, assuming that Hadoop will find a niche and have little effect on the traditional database core business. That might not be the case, however; several differences make the comparison potentially inaccurate.

To start with, Linux sits much lower in the IT stack, at the operating system layer rather than the application layer. Linux could indeed have replaced all of its competition if applications and admins had chosen that platform. Hadoop, by contrast, is a type of application with specific use cases, which allows its advocates to be laser-focused on where it fits well. This could be a significant threat to typical SQL databases, because the most costly databases fall right into Hadoop’s sweet spot. More importantly, because Hadoop is a type of application rather than an OS, it doesn’t need the most mainstream commercial off-the-shelf (COTS) applications to port their products to a new platform before it can gain momentum. That’s where Linux struggled to compete, and why Linux was eventually relegated to a small, specialized segment of the data center.

Probably the best reason Hadoop is more likely to succeed is that the cost/performance gap between Hadoop and SQL is larger than the gap between Linux and Windows ever was. Hadoop offers 10 times the performance of a monolithic database for 10 percent of the cost, which works out to a hundredfold difference in price/performance. Red Hat Enterprise Linux (RHEL) saved money, but the impact was hardly of this magnitude and was harder to quantify; RHEL needed large deployments before the savings added up to big numbers. By contrast, the design, licensing and administration costs of a single SQL database can be large enough that moving it to Hadoop shows an impact of millions in savings immediately. This is why Oracle makes so much money, and why Microsoft (SQL Server) and IBM (DB2) fight so hard in this space.

Going back to the point about the COTS application conversion problem Linux had: Hadoop’s target is the applications that enterprise organizations have already built around SQL databases, and those applications don’t have to be rebuilt to convert to Hadoop. As Facebook’s first implementation of Hadoop proved, Hive can translate SQL commands into MapReduce and HBase operations, and the data store out the back end can still be SQL Server, Oracle or both, as the sketch below illustrates.
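To make that concrete, here is a minimal sketch of what that reuse can look like, assuming a Java application that already speaks SQL through JDBC. The driver class and the connection-URL format are the standard ones for Hive’s HiveServer2 interface; the host name, credentials, table and column names are hypothetical placeholders, not anything from a real deployment. The point is that the query text stays plain SQL while Hive turns it into MapReduce work on the cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuerySketch {
    public static void main(String[] args) throws Exception {
        // Hive's standard JDBC driver class (ships with the Hive client libraries).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Hypothetical HiveServer2 endpoint; replace host, port, database
        // and credentials with your cluster's values.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://hadoop-gw.example.com:10000/default", "analyst", "");
             Statement stmt = conn.createStatement();
             // An ordinary SQL-style aggregation over a hypothetical table.
             // Hive compiles this into MapReduce jobs that run across the cluster.
             ResultSet rs = stmt.executeQuery(
                 "SELECT region, COUNT(*) AS events FROM web_logs GROUP BY region")) {

            while (rs.next()) {
                System.out.println(rs.getString("region") + "\t" + rs.getLong("events"));
            }
        }
    }
}
```

The design point is the same one made above: the SQL “contract” between the application and the database survives, while the execution engine underneath it is swapped for commodity scale-out hardware.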

The one hurdle that Hadoop still faces is that in its purest “commodity shared-nothing” form, Hadoop still isn’t enterprise-ready. There are ways to make Hadoop enterprise-ready, and they work, but they are not mainstream and are sometimes outright rejected by the Hadoop community; MapR’s Hadoop distribution is an example of this phenomenon. Hadoop in its basic form does not offer risk-free data storage and computation: the standard HDFS NameNode, for instance, has been a well-known single point of failure. These failure points need to be addressed, and they can be addressed with enterprise infrastructure. Until that becomes more mainstream, however, Hadoop will be relegated to special use cases.

Next time we talk about Hadoop, I’ll discuss some of the use cases where it does and does not fit for agencies.

Want to hear more from SwishData? Visit my Data Performance Blog, and follow me on Facebook and Twitter.