Ultimate Guide to the Cloud Data Lake Engine: Six Criteria to Evaluate Products
By applying interactive SQL querying methods and a consolidated semantic layer to cloud storage, the Cloud Data Lake Engine (CDLE) aspires to deliver performance and efficiency breakthroughs that make the data lake a viable new home for many mainstream business intelligence (BI) workloads.
However, the risks can be high. Different processor technologies yield different performance results and varying levels of compute efficiency. Some CDLE offerings require more data assembly and management effort than others. In addition, data teams must determine at the outset how to govern data and maintain open interoperability across heterogeneous environments. This whitepaper provides guidelines for evaluating CDLE offerings on their ability to deliver on that promise: better performance, data accessibility, and operational efficiency than earlier generations of the data lake.
Data teams should carefully evaluate how well commercial and open source software meets their requirements against six primary criteria: performance, compute efficiency, data assembly, ease of use, governance, and open interoperability. A CDLE has the potential to slow the decline in data management efficiency and then reverse it, provided you select and implement the right technologies along the way.