The Next Generation of Data Governance

Traditional data governance needs to adapt to the realities of today’s data management. We need to start with the ABCs of modern governance: Agile, Big Data, and Cloud. Each of these has been in the mainstream for several years, yet most data governance organizations cling to practices of the past. More recently, self-service analytics and self-service data preparation have challenged the old governance methods.

First, let’s touch on some basic truths about data governance, and then discuss each of the challenges to traditional governance techniques. One fundamental premise provides the foundation for effective data governance: We can’t govern data; we can only govern what people do with data. With that in mind, recognize that data governance has many goals – security, privacy, quality, regulatory compliance, data integration and standardization, metadata reliability, managed data retention and disposal, and actively managed value and risk of data. The relative importance of these goals varies among organizations and for different collections of data. There is no universal prescription for effective data governance.

Agile Data Governance

Can agile and governance coexist? From the perspective of agile projects, governance is often seen as a barrier to speed. From the perspective of governance, agile is frequently viewed as an excuse to bypass formal processes. Yet agile and governance must coexist. Both are needed and neither should be sacrificed to benefit the other. To work together, both agile and governance practices must change.

Governing with agility requires a change in mindset from traditional governance structures and practices to those that enable agile projects. Traditional governance operates as an external entity that exercises controls over projects. Agile data governance is an integral part of agile projects – internal, not external. Governors participate and collaborate in ways that help the project to succeed while simultaneously accomplishing the goals and purpose of data governance. Some of the practices of agile governance include:

  • Focus on value produced, not methodology and processes. This includes value to the project and enterprise value produced by meeting governance goals.
  • Govern proactively. Introduce constraints as requirements at the beginning of a project instead of seeking remedial action at the end.
  • Strive for policy adoption over policy enforcement. Make it easy to comply with policies, and communicate the reasons for the policies and the value they create.
  • Write brief, concise, clear, and understandable policies. Use simple language that is not ambiguous or subject to interpretation.
  • Include data stewards on project teams. They bring valuable knowledge and are generally great collaborators.
  • Think “governance as a service” instead of “authority and control.”

Big Data Governance

Big Data certainly brings new governance challenges: governing data from external and open sources, questions of data quality and traceability, and more. Big data is difficult because it is diverse, complex, sometimes messy and … well, big. Simply writing policies to govern all of the big data is an enormous task. Adoption and enforcement are even larger efforts.

We need to apply some of the same principles to big data governance that are used to process big data: break large, difficult tasks into smaller, more manageable tasks. Don’t attempt to govern all of the data. Instead, identify the data that is privacy sensitive, security sensitive, or compliance sensitive, or that contains personally identifying information. Take advantage of data discovery tools, because identification as a manual task is impractical. With big data classified for sensitivity to governance objectives, you can take a govern-at-access approach that plays well with schema-on-read: controls are applied when data is read rather than when it is stored. Know the control and logging features of your data cataloging, data access, data preparation, and data analysis tools, and make the most of those features. Technology automation is the key to big data governance.
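The govern-at-access idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference to any particular tool: the sensitivity labels, the `CLASSIFICATION` and `ROLE_CLEARANCE` mappings, the role names, and the `read_record` function are all hypothetical. In practice the classification would come from a data discovery or cataloging tool rather than a hand-written dictionary.

```python
import logging
from typing import Any

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("governance.audit")

# Hypothetical result of a data discovery scan: field name -> sensitivity label.
# Fields that are not listed are treated as ungoverned.
CLASSIFICATION = {
    "email": "pii",
    "ssn": "pii",
    "diagnosis": "compliance",
}

# Hypothetical mapping of roles to the sensitivity labels they may read.
ROLE_CLEARANCE = {
    "analyst": set(),
    "privacy_officer": {"pii", "compliance"},
}

MASK = "***"


def read_record(record: dict[str, Any], role: str) -> dict[str, Any]:
    """Apply governance at read time: mask fields the role is not cleared for."""
    cleared = ROLE_CLEARANCE.get(role, set())  # unknown roles get no clearance
    result = {}
    masked = []
    for field, value in record.items():
        label = CLASSIFICATION.get(field)
        if label is not None and label not in cleared:
            result[field] = MASK
            masked.append(field)
        else:
            result[field] = value
    # Log every access so that usage can be reviewed later.
    audit_log.info("role=%s masked=%s", role, sorted(masked))
    return result


raw = {"customer_id": 17, "email": "pat@example.com", "ssn": "000-00-0000", "clicks": 42}
analyst_view = read_record(raw, "analyst")
officer_view = read_record(raw, "privacy_officer")
```

The stored record is never altered. Only the sensitive fields are governed, and the decision is made per reader at access time, which is why this approach pairs naturally with schema-on-read.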

Cloud Data Governance

The implications of off-site data, public cloud environments, and dependency on a service provider bring new governance challenges. Security and privacy are the most frequently discussed cloud data governance issues, but data in the cloud may have real implications for every governance goal. The first step in adapting data governance for the cloud is to understand the fundamental and systemic changes that come with cloud services. Data governance must change perspectives in three areas:

  • From procedural to procedural and commercial: Governance processes that were previously driven by internal policy and procedures must now expand to encompass the impacts of commercial service providers.
  • From enterprise to enterprise and external: Policies, processes, and accountabilities based on data that resides internally and on premises must now expand to include data that resides externally on cloud servers and in commercially hosted databases.
  • From consumer to consumer and service provider: The policies and processes governing how data is accessed and used must expand to address questions of how data is managed by service providers.

Cloud-hosted data adds location to the list of governance considerations, and with it new compliance issues. The European Union’s data protection rules, for example, restrict where personal data can be physically held and impose additional conditions on data that resides outside the EU. The US has greater complexity with no single directive. US compliance criteria are based on a combination of legislation and regulation, with inconsistencies among states and among industries. For cloud data governance it is important to know which laws and regulations apply in what circumstances, and where the data is physically stored by cloud service providers.
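Knowing where data is physically stored lends itself to an automated inventory check. The following Python sketch assumes a hypothetical dataset inventory and a hypothetical policy table (`APPROVED_REGIONS`); the region names are invented and the rule is deliberately simplified. Which regions are actually acceptable for which data is a legal determination that this code does not attempt to make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    contains_personal_data: bool
    storage_region: str  # where the cloud provider physically stores the data


# Hypothetical policy table: regions approved for personal data, by jurisdiction.
APPROVED_REGIONS = {
    "EU": {"eu-west", "eu-central"},
}


def residency_violations(datasets, jurisdiction):
    """Return names of datasets holding personal data outside approved regions."""
    approved = APPROVED_REGIONS[jurisdiction]
    return [
        d.name
        for d in datasets
        if d.contains_personal_data and d.storage_region not in approved
    ]


inventory = [
    Dataset("web_clicks", contains_personal_data=False, storage_region="us-east"),
    Dataset("crm_contacts", contains_personal_data=True, storage_region="us-east"),
    Dataset("hr_records", contains_personal_data=True, storage_region="eu-west"),
]

violations = residency_violations(inventory, "EU")
```

A check like this is only as good as the inventory behind it, which is one reason to ask service providers where each dataset is physically held.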

Migrating data from on-premises to the cloud raises data governance questions. What roles, responsibilities, and participation should data governance have in choosing cloud service providers and negotiating service agreements? This is a subjective question for which there is no definitive answer. The right answer depends on many variables including data governance maturity, organizational structure and culture, and working relationships among owners, stewards, and custodians of data. 

Self-Service and Data Governance

The adoption of self-service data preparation and data analysis tools inherently brings organizational changes, with data governance organizations directly affected. Long-standing practices of enforcement and control struggle in a self-service world. Traditional governance roles – ownership, stewardship, and custodianship – do not disappear or diminish. But the self-service world calls for new roles – curation, coaching, and community. Data curators manage the inventory of datasets including catalog, description, and utility. Curators help people who report and analyze to find the data that they need. Data coaches focus on data utilization skills – acquiring, preparing, reporting, analyzing, and visualizing data. Coaches collaborate with business people who use data to help them achieve goals, improve data skills, and become self-reliant. Collectively the owners, stewards, custodians, curators, and coaches form a data governance community. Community is sure to have greater impact than council or committee in a self-service culture.

Rethinking governance controls is also important in a self-service world. Traditional control is implemented as gates that require authentication and authorization. Gating won’t disappear, but we can minimize the frequency and frustration of “closed” gates by adding guides and guardrails. Guides direct people along the right course for appropriate use of data. Guardrails are safety measures to help keep them on track. Guides and guardrails focus on preventing governance policy violations. The purpose of gates is to enforce policy. Prevention is better suited to the self-service world.
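The difference between guides, guardrails, and gates can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the sensitivity levels ("public", "internal", "restricted"), the `evaluate` function, and the messages are hypothetical, and a real implementation would live inside the data access or data preparation tool.

```python
from dataclasses import dataclass
from enum import Enum


class Control(Enum):
    GUIDE = "guide"          # advice is shown; the request proceeds unchanged
    GUARDRAIL = "guardrail"  # the request proceeds, but in a safer form
    GATE = "gate"            # the request is blocked


@dataclass(frozen=True)
class Decision:
    allowed: bool
    controls: tuple
    messages: tuple


def evaluate(sensitivity: str, authorized: bool, wants_export: bool) -> Decision:
    """Prefer prevention (guides, guardrails); close the gate only as a last resort."""
    if sensitivity == "restricted" and not authorized:
        return Decision(
            False,
            (Control.GATE,),
            ("Access requires authorization from the data owner.",),
        )
    controls = []
    messages = []
    if sensitivity != "public":
        controls.append(Control.GUIDE)
        messages.append("Review the usage notes for this dataset before publishing results.")
        if wants_export:
            controls.append(Control.GUARDRAIL)
            messages.append("Export is limited to masked or aggregated data.")
    return Decision(True, tuple(controls), tuple(messages))


open_data = evaluate("public", authorized=False, wants_export=True)
internal_export = evaluate("internal", authorized=False, wants_export=True)
blocked = evaluate("restricted", authorized=False, wants_export=False)
```

In this sketch most requests succeed with a guide, a guardrail, or both, and the gate closes only for unauthorized access to restricted data.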

What is Next Generation Data Governance?

Next generation data governance is a set of concepts and practices that help existing governance organizations evolve and adapt to the complex world of agile projects, big data, cloud implementations, and self-service data and analytics. The evolution includes:

  • A shift from governance hierarchy to governance community
  • A continuum of prevention, intervention, and enforcement with enforcement as an infrequent last resort when prevention and intervention have failed
  • A philosophy of minimalist policy making where a small number of important policies have greater value than a large, complex collection of all-encompassing policies
  • Finding the right balance between old-style and new-world governance practices

I invite your thoughts, comments, and experiences. We need to have an extended conversation about the new world of data governance. Here I’ve discussed agile, big data, cloud, and self-service. Data ethics, not discussed here, is perhaps the biggest governance challenge of all – and one where we can no longer put our heads in the sand. Watch for my thoughts on data ethics in a future post.

Dave Wells

Dave Wells is an advisory consultant, educator, and industry analyst dedicated to building meaningful connections throughout the path from data to business value. He works at the intersection of information...
