Modernizing Data Management Architecture
A technology revolution in recent years has dramatically changed the way organizations design, deploy, and manage data and analytics systems. Hadoop, NoSQL, and the cloud have ushered in a new era of scale-out, elastic, and real-time computing, while new data preparation, cataloging, and analysis tools aided by advanced machine learning and search technologies have radically changed the information supply chain.
The once-stable world of data management has become dynamic and volatile. The past decade brought many advances in data management and data-related technologies: big data technologies, cloud computing, data lakes, analytics, self-service, data cataloging, machine learning, and much more. Just as an earthquake shakes the foundations of things built on the earth’s surface, this “data quake” has shaken the foundations of data management. The shake-up has created fissures in data management architectures and practices: gaps that must be bridged for cohesive data management.
In the post-data-quake world, architects must build bridges:
Between data warehouse and data lake approaches.
Between data modeling best practices and NoSQL technologies.
Between rigorous data governance and self-service autonomy.
Between rigid schema and dynamic use cases.
Between optimizing for ease of use and optimizing for maximum analytics opportunities.
… and many more gaps in the practices of data management.
For many organizations, however, the technology revolution outpaced architectural evolution. Today’s data management architectures look a lot like the old data warehousing and BI architectures, with new data management concepts and methods patched around the periphery. The need to modernize data management architecture is widespread, as evidenced by the volume of architecture consulting inquiries we experience at Eckerson Group. Rethinking architecture from the ground up, with the ability to integrate legacy data management, embrace state-of-the-art methods, and adapt to the future, is a complex and imposing job.
Wayne Eckerson describes Ten Characteristics of a Modern Data Architecture that are among the key things modern data architects must strive to achieve: smart, adaptable, collaborative, governed, and many more. In addition to Eckerson’s characteristics, modern architecture must be tech-savvy, taking advantage of Hadoop, NoSQL, cloud, and self-service technologies in purposeful ways. The first step is to design, define, and describe a new data management topology such as the example shown in Figure 1, where older data management components are shown in blue and more recent components are shown in orange.
Figure 1 – A Modern Data Management Topology
This architectural model shows an approach to organizing the core set of data that is used for BI and analytics applications. It includes one or more data warehouses as part of an enterprise data hub, along with MDM, ODS, a data lake, and analytic sandboxes. Note that the data hub is a subset of the data core. In this topology, the hub is defined as data that is curated, profiled, and trusted. The core includes the hub as well as the analytic sandboxes and the parts of the data lake that don’t meet those criteria.
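The hub-within-core relationship can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the store names and their curated, profiled, and trusted flags are hypothetical and are not taken from Figure 1. It shows one way to record which components of the data core qualify for the hub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    """One component of the data core (all names here are illustrative)."""
    name: str
    curated: bool
    profiled: bool
    trusted: bool

    def in_hub(self) -> bool:
        # Hub membership requires all three criteria named in the topology.
        return self.curated and self.profiled and self.trusted


def split_core(stores):
    """Partition the core into (hub, rest-of-core) lists of store names."""
    hub = [s.name for s in stores if s.in_hub()]
    rest = [s.name for s in stores if not s.in_hub()]
    return hub, rest


# Hypothetical inventory; the flag values are assumptions for illustration.
core = [
    DataStore("sales_warehouse", True, True, True),
    DataStore("mdm", True, True, True),
    DataStore("ods", True, True, True),
    DataStore("lake_curated_zone", True, True, True),
    DataStore("lake_raw_zone", False, False, False),
    DataStore("analytic_sandbox", False, True, False),
]

hub, rest = split_core(core)
print("hub:", hub)
print("core only:", rest)
```

Every store belongs to the core, and the hub is simply the subset that passes all three tests. An inventory like this is one possible starting point for the definitional questions that follow.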
This model, or any similar model, raises some key questions for data architects:
How will you organize the data core?
What terminology will you use—core and hub or other terms?
How will you define each component and distinguish between them?
What data stores are part of your architecture—warehouse, data lake, ODS, MDM, etc.?
How many data warehouses do you need to include as part of the new data architecture?
What is the purpose of each data warehouse? And how does each relate to a data lake and other data stores?
The last two questions on the list are big ones. Many organizations have more than one data warehouse, and they have business users who depend on the data in those warehouses on a daily basis. Thus data warehousing raises another hard question:
Will you continue to support data warehousing or do you consider the data warehouse to be dead? (Look here and follow the links for some diverse opinions about the future of data warehousing.)
These are but a few of the hard questions for modern data architects. Ultimately, as architects we must stay focused on the goal of data management architecture—to provide accessible and reliable data to those who need it without undue risk of data loss, compromise, or corruption. Architecture must enable data management where:
Data is well suited for the variety of uses and use cases for which it is needed.
Data and data management processes fit gracefully into the business and technical environments in which they are deployed.
Data and data management practices comply with policy, legal, and regulatory constraints.
Data is sustainable and reliable throughout the lifespan for which it has value.
Data is a centerpiece in the age of self-service analytics and digital transformation of business. That raises the stakes for data architecture and for architects everywhere.