Cloudera Targets Data Warehousing

Cloudera is a data warehousing vendor. Say what?
The king of Hadoop and data lakes said last week at its annual influencers’ conference that it plans to become a data warehousing powerhouse. “We plan to disrupt the data warehousing market,” said Mike Olson, chief strategy officer at Cloudera.
The comment came amid the rollout of Cloudera’s new data platform for machine learning and analytics, which is optimized for the cloud. Cloudera is revamping both its product portfolio and its sales force to sell advanced analytics solutions to business people rather than data infrastructure to IT professionals. Wall Street, which doesn’t like retooling initiatives, recently pummeled Cloudera’s stock, which fell almost 50% in a single day.
“We are on a journey to build the platform that will underpin digital transformation at every large enterprise,” said Tom Reilly, CEO of Cloudera. “Building such a platform takes time.”
To Data Warehousing With Love
Reilly said the analytics pillar supported by Cloudera’s new data platform represents an estimated $12 billion market. Truth be told, the majority of Cloudera’s $370 million in annual revenue already comes from data warehousing workloads, namely ETL offloads, data mart consolidations, and discovery-based analytics. In many ways, the new strategy is simply a recognition by the company that Hadoop and data lakes won’t replace data warehouses.
There are caveats, however. Cloudera partners with some of the largest data warehousing vendors, namely Teradata, Oracle, and Microsoft, whose massively parallel processing (MPP) databases handle complex joins and queries that Cloudera’s Hadoop-based platform currently struggles to support. Thus, enterprise data warehouses (EDWs) are out of bounds, according to Cloudera officials. So, too, are “active” data warehouses, which update records instead of refreshing them. (Hadoop only appends data.)
But squarely in Cloudera’s sights are MPP specialists, such as Netezza, Greenplum, Sybase IQ, and Vertica. Cloudera lumps these vendors into the “data mart” category, which it says represents a $3.8 billion opportunity. Cloudera has already helped many companies consolidate dozens, if not hundreds, of “data marts” onto a single Cloudera data platform.
Other prime data warehousing markets in which Cloudera will compete are:
- Data warehouse offloads – Offload ETL and detailed data from EDWs to avoid costly platform upgrades.
- Discovery data marts – Allow analysts to explore both structured and unstructured data sets using open source SQL (Impala) and search (Solr) technologies built into Cloudera’s platform.
- Operations data marts – Capture sensor, log, and other high-velocity internet-of-things (IoT) data and analyze it on the fly using Cloudera’s real-time data environment (Kudu).
Cloud Data Warehousing
Cloudera is also eyeing the fast-growing market for cloud data warehousing, now dominated by Amazon Web Services with its Redshift database. Amazon is being chased by fast-growing competitors, including upstart Snowflake, Microsoft Azure, and Google Cloud Platform.
Platform as a Service. Despite its name, Cloudera has been slow to embrace the cloud and is now racing to catch up. It recently rolled out a platform-as-a-service (PaaS) called Altus that will provide not only cloud data warehousing (in beta) but also data engineering and data science (in alpha) in a hybrid cloud environment.
Altus is a grand vision that could potentially co-opt competitors because it provides a common set of services (security, governance, lifecycle management, data catalog) across multiple workloads (data engineering, analytics, and data science) on a single modern, scale-out data platform with multiple storage tiers, spanning the cloud and on-premises systems.
In other words, Altus could provide a scalable, portable data environment that protects customer investments no matter where they move or store their data or what workloads they choose to run against it. It’s also a bulwark against looming privacy regulations, such as the General Data Protection Regulation (GDPR).
Conclusion
The data analytics market is relentlessly innovative and fast-moving. Cloudera, which helped trigger the tidal wave of data innovation, will now have to march in double-time to keep up with the leaders.