Snowflake Aims to Remake the Data Warehousing Market

Snowflake Computing is betting that the future of data warehousing is in the cloud. The startup has built a SQL database from scratch that is optimized for the cloud and big data. Its unique architecture and data warehouse-as-a-service approach make it a strong competitor to Amazon Redshift and other up-and-coming cloud data warehousing services.

Snowflake Computing is a California-based company with an innovative cloud data management system called Snowflake Elastic Data Warehouse. The company’s mission is to reinvent the data warehouse, making it more flexible, scalable, agile, and cost-effective than current generations of data warehouses. To do this, Snowflake built a SQL database from scratch that runs as a scalable cloud service, handles any type of data, and does not require customers to manage or tune it.

Founded in 2012, Snowflake has assembled a top-notch team with more than 100 years of collective experience building databases and 120 patents to its credit. The company currently has more than 150 employees and more than 350 customers and has garnered $71 million in venture funding. Although Snowflake runs on Amazon Web Services, its primary competitor is Amazon Redshift, a massively parallel database that Amazon has retrofitted for the cloud.

Unique Architecture

Snowflake’s unique multi-cluster shared data architecture delivers a host of benefits. Rather than integrate storage and data processing components as traditional databases do, Snowflake separates them so each operates independently and can be individually optimized. Snowflake uses Amazon S3 to store large volumes of data in a proprietary columnar, compressed format. It then spins up virtual compute clusters, or “virtual warehouses,” on Amazon EC2 machine instances. Customers can use these virtual warehouses to support myriad distinct workloads, such as data marts, development and testing, and data loading.

Elasticity. This architecture is highly elastic. Customers can spin up new virtual warehouses quickly and then grow or shrink those warehouses on demand as workloads change, without having to redistribute data across nodes as they would in a traditional shared-nothing environment, such as Amazon Redshift. Each virtual warehouse transparently caches data on solid-state drives (SSDs) to optimize query performance and minimize data movement.
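To make the redistribution point concrete, here is a minimal sketch in Python. It is a toy model, not Snowflake's or Redshift's actual placement logic: it assumes rows are assigned to nodes by a simple modulo of the row key, and the function names are hypothetical. It counts how many rows would change owner when a shared-nothing cluster is resized, and contrasts that with a shared-storage design where the data never leaves the central store.

```python
def rows_moved_shared_nothing(num_rows: int, old_nodes: int, new_nodes: int) -> int:
    """Count rows whose owning node changes after a resize.

    Assumption: each row lives on node (key % node_count). Real
    shared-nothing systems use their own distribution schemes.
    """
    return sum(
        1 for key in range(num_rows)
        if key % old_nodes != key % new_nodes
    )


def rows_moved_shared_storage(num_rows: int, old_nodes: int, new_nodes: int) -> int:
    """With storage separated from compute, resizing compute moves no stored rows."""
    return 0


if __name__ == "__main__":
    total = 1000
    moved = rows_moved_shared_nothing(total, old_nodes=4, new_nodes=8)
    print(f"shared-nothing: {moved} of {total} rows change nodes")
    print(f"shared storage: {rows_moved_shared_storage(total, 4, 8)} of {total} rows change nodes")
```

Under this modulo assumption, doubling a cluster from 4 to 8 nodes relocates half of the rows, while the shared-storage case only needs each new compute node to warm its local cache.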

Open. Business users connect to Snowflake via ODBC, JDBC, a Web user interface, or native connectors. Snowflake currently partners with several leading business intelligence vendors, including MicroStrategy, Tableau, and Looker. Despite its SQL heritage, Snowflake supports semi-structured data, whose schema it “infers” on load on a block-by-block basis. Users who know the schema can query the data directly; those who do not can interrogate Snowflake’s metadata via SQL to identify which data elements exist in the loaded data.
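The idea of inferring a schema from a block of semi-structured records can be sketched in a few lines of Python. This is only an illustration of the general technique, not Snowflake's implementation: the function name `infer_schema`, the dotted-path convention, and the sample records are all assumptions made for the example.

```python
import json
from collections import defaultdict


def infer_schema(block):
    """Map each dotted field path in a block of JSON records to the value types seen."""
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            # Descend into nested objects, extending the dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            # Leaf value (including lists): record its Python type name.
            schema[path].add(type(value).__name__)

    for line in block:
        walk(json.loads(line), "")
    return dict(schema)


# Hypothetical block of newline-delimited JSON records.
block = [
    '{"user": {"id": 1, "name": "ann"}, "event": "login"}',
    '{"user": {"id": 2}, "event": "click", "x": 10.5}',
]

schema = infer_schema(block)
for path in sorted(schema):
    print(path, sorted(schema[path]))
```

Listing the discovered paths plays the same role as querying metadata to find out which data elements exist before writing a query against them.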

Two In One. Because Snowflake stores data on Amazon S3 but doesn’t process it there, customers can use Snowflake as both a data lake to house all their data and a data warehouse to process it. Customers can store virtually unlimited amounts of data in S3 and process that data in virtual warehouses without worrying about the user concurrency or data distribution issues that can kill performance and availability. Also, Snowflake doesn’t upcharge for S3 storage, making it economical to store large volumes of data there.

Cost Effective. Moreover, this separation of storage and compute makes Snowflake more economical and well suited to high levels of user concurrency. Customers can scale compute clusters to support any number of concurrent users. In addition, the system can scale automatically within cost parameters defined up front. Each virtual warehouse gets an appropriate amount of processing power and scales dynamically within predefined limits.
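Scaling "within predefined limits" can be pictured as a clamped sizing rule. The sketch below is a hypothetical policy written for illustration, not Snowflake's actual scaling algorithm: the function `clusters_needed` and its parameters (`queries_per_cluster`, `min_clusters`, `max_clusters`) are assumptions.

```python
import math


def clusters_needed(concurrent_queries: int, queries_per_cluster: int,
                    min_clusters: int, max_clusters: int) -> int:
    """Pick a cluster count for the current load, clamped to configured limits."""
    if queries_per_cluster <= 0:
        raise ValueError("queries_per_cluster must be positive")
    if min_clusters > max_clusters:
        raise ValueError("min_clusters must not exceed max_clusters")
    # Clusters required to serve the load if each handles a fixed number of queries.
    wanted = math.ceil(concurrent_queries / queries_per_cluster)
    # Never go below the floor or above the cost ceiling set up front.
    return max(min_clusters, min(max_clusters, wanted))


if __name__ == "__main__":
    for load in (0, 20, 100):
        count = clusters_needed(load, queries_per_cluster=8, min_clusters=1, max_clusters=4)
        print(f"{load} concurrent queries -> {count} cluster(s)")
```

The upper bound is what keeps spending predictable: however high concurrency climbs, the cluster count never exceeds the ceiling the customer chose.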

DWaaS. Finally, Snowflake operates as a cloud service, eliminating the need for customers to hire database administrators to manage hardware and software or tune databases to optimize performance. Snowflake manages the hardware and database infrastructure, minimizing the need for IT resources and allowing customers to focus on analyzing their data.

Market Impact

Many companies struggle with data warehouses that deliver insufficient business value compared to their cost of operation. Some are built on older technology and others are poorly designed. Most have sprawled out of control, making it hard for business users to find relevant data and administrators to update and maintain them.

Companies facing these modern data warehouse challenges may benefit from Snowflake. It brings scale-out architecture and as-a-service benefits to data warehousing without requiring big investments in technology infrastructure, radical redesign, retooling, or reskilling. Whether the challenge is building a new data warehouse or migrating an existing one, Snowflake is a practical solution that offers cost savings, easy deployment, and simplified operations.

Snowflake does much more than simple cloud hosting of a data warehouse. It offers a complete, scalable, high-performance data warehousing solution that covers processing as well as data. It can handle data of all types, connect to a multitude of data sources, automate most management tasks, and let customers take full advantage of existing SQL skills while accessing semi-structured data.



Henry H. Eckerson

Henry Eckerson covers business intelligence and analytics at Eckerson Group and has a keen interest in artificial intelligence, deep learning, predictive analytics, and cloud data warehousing. When not researching and...
