How to Succeed with a Data Catalog


Even the finest violin sounds like a dying chicken if played by someone who doesn’t know the instrument. The same is true for technology. It doesn’t matter how nice software is on its own if your organization lacks the processes to use it effectively. When it comes to data catalogs, the potential is enormous, but it’s easy to make mistakes in deployment that will limit the value of your investment. 

Data catalogs have grown in popularity over the last few years. Essentially, they store detailed information about data assets in order to serve a variety of data governance and self-service analytics use cases. Folks such as data stewards and data curators use catalogs to standardize company metrics, manage data access, track lineages, and stay in compliance with privacy laws. Analysts and data scientists rely on them to find the data sets, queries, metrics, models, visualizations, or reports they need. 

Although different catalogs have different bells and whistles, at the end of the day, the metadata they store comes in two flavors: auto-captured and user-added. The first group consists of attributes such as file and field names, cardinality, row counts, and anything else the catalog can determine automatically by crawling data sources. The real value lies in the second group. Users augment data catalogs by providing comments, ratings, and custom tags. This information represents previously undocumented tribal knowledge about an organization’s data and makes using a catalog worthwhile for the non-technical business user.
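To make the two flavors concrete, here is a minimal Python sketch of what a single catalog record might hold. It is an illustration only: `CatalogEntry`, `crawl`, and the sample `orders` rows are hypothetical names and data invented for this example, not the schema or API of any real catalog product.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Set


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; not the schema of any real product."""
    name: str
    # Auto-captured metadata: derived by crawling the data source.
    field_names: List[str] = field(default_factory=list)
    row_count: int = 0
    cardinality: Dict[str, int] = field(default_factory=dict)
    # User-added metadata: tribal knowledge contributed by people.
    comments: List[str] = field(default_factory=list)
    ratings: List[int] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)

    def average_rating(self) -> Optional[float]:
        """Mean of the user ratings, or None if nobody has rated the asset."""
        return mean(self.ratings) if self.ratings else None


def crawl(name: str, rows: List[dict]) -> CatalogEntry:
    """Fill in only the auto-captured fields from rows (a list of dicts)."""
    field_names = sorted({key for row in rows for key in row})
    # Cardinality here means the number of distinct values per field.
    cardinality = {
        f: len({row[f] for row in rows if f in row}) for f in field_names
    }
    return CatalogEntry(name, field_names, len(rows), cardinality)


# Made-up sample data standing in for a crawled table.
rows = [
    {"order_id": 1, "region": "east"},
    {"order_id": 2, "region": "west"},
    {"order_id": 3, "region": "east"},
]
entry = crawl("orders", rows)

# Everything below can only come from people who know the data.
entry.comments.append("Preferred source for quarterly revenue reports.")
entry.ratings.extend([5, 4])
entry.tags.add("finance")
```

The crawler can fill in the first three fields without help, but the comments, ratings, and tags stay empty until someone who knows the data adds them.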




Standard statistics only tell you so much about a data set’s suitability for a given project. In the pre-catalog world, business users relied on informal social networks and their own experience with data sets to find sources for their analyses. Why? Because people can give context for data. Out of five similar tables, they know which one worked best in the past.  A data catalog, ideally, centralizes and standardizes this information, reducing data silos and ensuring consistent quality of metrics throughout the organization. But unless the information in the catalog exceeds what the analyst already knows, they’ll be unlikely to use it. 

Therein lies the catch-22 of data catalogs: For catalogs to be valuable, analysts have to use them and populate them with tribal knowledge, but to persuade analysts to use them, the catalogs already have to be valuable. Three strategies can help your organization overcome this hurdle and speed company-wide adoption of a data catalog.

1. Start small

Build momentum for your catalog deployment by racking up early victories. Pick one or two manageable projects to pilot the catalog, and let those teams begin to populate it with information about the data assets they use. Expand from these initial use cases to adjacent ones that build off the metadata added during the first few. 

Starting piecemeal helps the catalog develop “expertise” in certain areas. Analysts who see it as a valuable resource for one or two parts of the business will be more likely to incorporate the catalog into their workflows. This habit will, in turn, lead them to gradually expand the catalog to other use cases. The strategy also has the benefit of providing a proof of concept. You can more easily achieve widespread adoption later if you can show how well the catalog worked for a specific project. Finally, this tactic helps you to realize a faster return on your investment. Rather than spending months building out the catalog before using it, you can take advantage of it while it continues to grow.

2. Incentivize use

Try making a competition of adding content to the catalog. Reward the departments or individuals who contribute the most reviews or comments. The prize doesn’t need to be huge. It could be something like a catered lunch of the winners’ choice or a prime parking spot in the office lot. Since a catalog in its early days won’t tempt line-of-business users to abandon their old ways on its own, the goal is to create external incentives to use the catalog. 
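If your catalog can export a log of who added what, tallying the contest takes only a few lines. The Python sketch below assumes a hypothetical export of (contributor, department, contribution type) records. The names, the data, and the `leaderboard` helper are all invented for illustration; a real catalog's export format will differ.

```python
from collections import Counter

# Hypothetical contribution log: (contributor, department, contribution type).
contributions = [
    ("ana", "finance", "comment"),
    ("ben", "marketing", "rating"),
    ("ana", "finance", "tag"),
    ("cho", "marketing", "comment"),
    ("ana", "finance", "rating"),
    ("ben", "marketing", "comment"),
]


def leaderboard(log, by="person", top=3):
    """Rank contributors by number of contributions.

    by="person" counts per individual; any other value counts per department.
    """
    index = 0 if by == "person" else 1
    counts = Counter(record[index] for record in log)
    return counts.most_common(top)


people = leaderboard(contributions, by="person")
departments = leaderboard(contributions, by="department")

for name, count in people:
    print(f"{name}: {count} contributions")
```

Counting every contribution equally keeps the contest simple. You could weight comments above tags if you want to reward richer context.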

You won’t need to keep these gimmicks up forever. Once the amount of tribal knowledge in the catalog crosses a certain threshold, the catalog will provide sufficient value to keep analysts coming back. Although there might be some upfront cost, depending on your prize, it will pale in comparison to the wasted investment if your catalog never catches on.

3. Don’t let perfection be the enemy of good

Too often data teams treat catalogs like museums, carefully curating each profile before rolling it out to the business. The real value of the catalog comes through use, so it’s better to deploy it first and clean it up after. The more time spent polishing the catalog before handing it over to analysts, the more time it takes to see an impact. 

That said, data teams must strike a balance between governance and autonomy. They should aim to empower analysts while keeping guardrails in place to prevent redundancies and chaos in the catalog. The goal is to democratize the process without bogging down the system.

Conclusion

Data catalogs can provide a lot of value, but only if people use them. These three tips won’t solve every challenge in implementing a data catalog, but they will help you avoid the most common pitfalls. I always love to hear from my readers, so if you know of other strategies that worked at your organization, let me know in the comment section below.

Joe Hilleary

Joe Hilleary is a writer, researcher, and data enthusiast. He believes that we are living through a pivotal moment in the evolution of data technology and is dedicated to...
