Leave Your Data Warehouses and ETL Tools Behind With Dremio 2.0

Dremio this week released version 2.0, which includes the Dremio Learning Engine, compatibility with external data reflections, finer-grained access control, and more.

Launched in July 2017, Dremio is a self-service environment that gives business users and data scientists virtual access to any data source, eliminating the need for data marts, cubes, BI extracts, and many ETL tasks. Users can transform and join data to create virtual datasets, which they can query with SQL from any SQL-based BI tool, such as Tableau, or from languages such as Python and R.

Dremio queries source systems in real time by rewriting SQL queries into the native query language of each source. To accelerate performance, Dremio can also persist source data as Apache Parquet files and then read that data into memory using Apache Arrow. Dremio calls these persisted datasets Reflections.
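To make the idea of rewriting SQL for a source more concrete, the Python sketch below translates a simple SQL filter into a MongoDB query document. It is a hypothetical illustration, not Dremio's code: it assumes the WHERE clause has already been parsed into a list of ANDed (column, operator, value) predicates, and the function name `to_mongo_filter` is invented for this example. Only the MongoDB comparison operators (`$eq`, `$gt`, and so on) are real.

```python
# Hypothetical push-down sketch; not Dremio's implementation.
# Maps SQL comparison operators to MongoDB query operators.
SQL_TO_MONGO = {
    "=": "$eq",
    "!=": "$ne",
    ">": "$gt",
    ">=": "$gte",
    "<": "$lt",
    "<=": "$lte",
}


def to_mongo_filter(predicates):
    """Translate ANDed (column, op, value) predicates into a MongoDB filter document."""
    mongo_filter = {}
    for column, op, value in predicates:
        if op not in SQL_TO_MONGO:
            raise ValueError(f"operator {op!r} cannot be pushed down")
        # Several predicates on the same column merge into one sub-document.
        mongo_filter.setdefault(column, {})[SQL_TO_MONGO[op]] = value
    return mongo_filter


# WHERE region = 'EMEA' AND amount >= 100 AND amount < 500
print(to_mongo_filter([("region", "=", "EMEA"), ("amount", ">=", 100), ("amount", "<", 500)]))
# prints {'region': {'$eq': 'EMEA'}, 'amount': {'$gte': 100, '$lt': 500}}
```

A production engine would typically evaluate any predicate the source cannot handle after the data comes back, rather than raising an error as this sketch does.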

“Dremio cuts out all the heavy lifting necessary with data warehouses and ETL tools,” says Dave Wells, data management practice head at Eckerson Group, “and creates the fastest path from data to insight.”

Data virtualization is not new, but it has seen limited adoption. Traditional virtualization tools suffer from poor query performance, place too much load on source systems, and generate complex models that are difficult to query. Most have also been restricted to relational databases.

Dremio opens up analytics to any data source. It relies on pushing queries down to source systems, but when performance is an issue, it creates data reflections that relieve source systems of query traffic and, according to Dremio officials, speed up queries by as much as 1,000 times. Data administrators can build a business model on top of the source data using Dremio’s virtual datasets to simplify data access for business users.

Dremio 2.0 includes the following:

Starflake Data Reflections

Dremio can now automatically detect star and snowflake schemas in data sources, including data lakes on Amazon S3, Azure Data Lake Store (ADLS), and Hadoop. Customers can use Starflake Data Reflections to accelerate their BI and data science workloads without the complexity and cost of loading the data into a proprietary data warehouse.
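The article does not describe how Dremio detects these schemas. As a hedged illustration only, the sketch below applies one common heuristic to a list of foreign-key relationships: a table that references two or more tables and is referenced by none is treated as a fact table, the rest as dimensions, and the schema is labeled a snowflake if any dimension references a further table. The function `classify_schema` and the table names are hypothetical.

```python
from collections import defaultdict


def classify_schema(foreign_keys):
    """Classify tables from (referencing_table, referenced_table) foreign-key pairs."""
    outgoing = defaultdict(set)  # table -> tables it references
    incoming = defaultdict(set)  # table -> tables that reference it
    for child, parent in foreign_keys:
        outgoing[child].add(parent)
        incoming[parent].add(child)
    tables = set(outgoing) | set(incoming)
    # Heuristic: a fact table points at two or more tables and nothing points at it.
    facts = {t for t in tables if len(outgoing[t]) >= 2 and not incoming[t]}
    dimensions = tables - facts
    # Snowflake: at least one dimension is normalized into further tables.
    shape = "snowflake" if any(outgoing[d] for d in dimensions) else "star"
    return shape, facts, dimensions


fks = [
    ("sales", "date_dim"),
    ("sales", "product_dim"),
    ("sales", "store_dim"),
    ("product_dim", "category_dim"),
]
shape, facts, dims = classify_schema(fks)
print(shape, sorted(facts), sorted(dims))
# prints snowflake ['sales'] ['category_dim', 'date_dim', 'product_dim', 'store_dim']
```

Dropping the `product_dim` to `category_dim` relationship turns the same model into a plain star schema.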

Dremio Learning Engine

The Dremio Learning Engine (DLE) is a family of capabilities that applies artificial intelligence to make the product easier to use. With version 2.0, the DLE analyzes query patterns across all databases and BI tools and tracks user behavior to learn what data goes well together. This enables Dremio to perform predictive joins and data transforms to bring users additional datasets based on the data each user views.
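The article does not say which techniques the DLE uses. One simple way to learn "what data goes well together" from query logs is to count how often pairs of datasets appear in the same query and suggest the most frequent partners. The sketch below does that with Python's standard library; the function names and the sample log are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(query_log):
    """Count how often each pair of datasets is referenced by the same query."""
    counts = Counter()
    for datasets in query_log:
        # Sort so that (a, b) and (b, a) are counted as the same pair.
        for a, b in combinations(sorted(datasets), 2):
            counts[(a, b)] += 1
    return counts


def suggest_partners(counts, dataset, top_n=3):
    """Return the datasets most often queried alongside `dataset`, most frequent first."""
    partners = Counter()
    for (a, b), n in counts.items():
        if a == dataset:
            partners[b] += n
        elif b == dataset:
            partners[a] += n
    return [name for name, _ in partners.most_common(top_n)]


# Each entry is the set of datasets one (hypothetical) query touched.
log = [
    {"orders", "customers"},
    {"orders", "customers", "products"},
    {"orders", "products"},
    {"orders", "customers"},
]
counts = build_cooccurrence(log)
print(suggest_partners(counts, "orders"))
# prints ['customers', 'products']
```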

The DLE also automatically samples data to infer source system schemas and detect changes to them, and it updates those schemas and Dremio’s metadata catalog every time users query a source. This matters for sources such as NoSQL databases, HBase, and HDFS, from which customers typically cannot derive a schema without HCatalog.
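Sampling-based schema inference can be sketched in a few lines. This hypothetical example is not Dremio's algorithm: it assumes records arrive as Python dicts (for instance, documents from a NoSQL store), infers the set of value types seen for each field in a random sample, and compares two inferred schemas to flag changes. The function names `infer_schema` and `schema_changes` are invented for illustration.

```python
import random


def infer_schema(records, sample_size=100, seed=0):
    """Map each field to the set of Python type names seen in a random sample of records."""
    rng = random.Random(seed)
    if len(records) > sample_size:
        records = rng.sample(records, sample_size)
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


def schema_changes(old, new):
    """List fields that were added, removed, or changed type between two inferred schemas."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(f for f in old.keys() & new.keys() if old[f] != new[f]),
    }


before = infer_schema([{"id": 1, "name": "Acme"}])
after = infer_schema([{"id": "A-1", "name": "Acme", "email": "ops@acme.example"}])
print(schema_changes(before, after))
# prints {'added': ['email'], 'removed': [], 'retyped': ['id']}
```

Because the schema comes from a sample, a field reported as removed may simply be rare, so a real system would need to treat such results cautiously.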

Rule-Based Row- and Column-Level Masking

Dremio already supports the Lightweight Directory Access Protocol (LDAP), but managing fine-grained access control across every source system is still difficult. The new capability sits between any source and any tool, enabling users who build a virtual dataset to control access down to the cell level without having to move data.
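Row- and column-level rules combine to control access to individual cells: a row rule decides which rows a user sees, and a column rule decides which of those values appear in clear text. The sketch below is a hypothetical illustration in Python, not Dremio's API; the policy structure, group names, and the `****` mask are assumptions made for the example.

```python
MASK = "****"  # hypothetical placeholder shown instead of a restricted value


def apply_policy(rows, user_groups, row_rules, column_rules):
    """
    rows: list of dicts, one per row of a virtual dataset.
    user_groups: set of group names the current user belongs to.
    row_rules: group -> predicate(row) deciding which rows that group may see.
    column_rules: column -> set of groups allowed to see that column unmasked.
    """
    visible = []
    for row in rows:
        # Row rule: keep the row only if one of the user's groups permits it (default deny).
        if not any(row_rules[g](row) for g in user_groups if g in row_rules):
            continue
        masked = {}
        for col, value in row.items():
            allowed = column_rules.get(col)
            # Column rule: unlisted columns are visible; listed ones need a matching group.
            masked[col] = value if allowed is None or allowed & user_groups else MASK
        visible.append(masked)
    return visible


rows = [
    {"region": "EMEA", "customer": "Acme", "ssn": "123-45-6789"},
    {"region": "US", "customer": "Globex", "ssn": "987-65-4321"},
]
row_rules = {
    "emea_sales": lambda r: r["region"] == "EMEA",
    "auditors": lambda r: True,
}
column_rules = {"ssn": {"auditors"}}

print(apply_policy(rows, {"emea_sales"}, row_rules, column_rules))
# prints [{'region': 'EMEA', 'customer': 'Acme', 'ssn': '****'}]
```

Rows are denied unless some group grants them, while columns are visible unless a rule restricts them; a real deployment might choose different defaults.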

Henry H. Eckerson

Henry Eckerson covers business intelligence and analytics at Eckerson Group and has a keen interest in artificial intelligence, deep learning, predictive analytics, and cloud data warehousing. When not researching and...