17 Things a CDO Should Know: A Report from MIT’s CDO-IQ Symposium

I recently attended MIT’s Chief Data Officer and Information Quality symposium. It was a well-attended event with some twenty top-tier vendors present as well as, I’d guess, more than two hundred folks interested in information quality and advice and best practices for CDOs (and whether their organization should have a CDO or not).

Here are some words of wisdom inspired by a variety of talks and folks that I heard from at the symposium:

  1. Data can help you make better strategic decisions. Siddhartha Dalal, the chief data scientist for AIG, talked about using AI to prevent disasters and lower risk. He showed that while many engineers were worried about, and opposed to, launching the Challenger space shuttle in cold weather in 1986, they were overruled by more optimistic (though less informed) managers who were higher up in the organization. Later research showed that the higher a person was in the organization, the more optimistic their opinion became. This may be a social / organizational issue that occurs everywhere, whereby being optimistic (not necessarily a bad thing) greatly increases your likelihood of success in management positions. Unfortunately, the engineers didn’t have a representative at the highest levels of the organization who could present a data-driven opinion (i.e., there was no CDO at NASA at the time).
  2. Don’t ignore data. But even if NASA had had a CDO, there was a fundamental error in the data analysis on which NASA based its decision to launch. For reasonable (but incorrect) reasons, the data that decision makers used to model the risk of O-ring failure in the Challenger included the failures but not the successes. Without the successful launch data, temperature and launch success seemed uncorrelated. When the success data was included it was clear: the risk of failure was 32 times greater at 31 degrees Fahrenheit than it was at 60 degrees. It was a simple oversight which resulted in catastrophe. It could have been avoided by better data science protocols and best practices.
  3. Artificial Intelligence feeds off of data and it is getting more powerful. Dalal also showed how Artificial Intelligence (AI) can now be used to detect risks in workplace environments by having an AI system analyze photographs and automatically flag dangerous practices (like not correctly securing a ladder). Much of the power of AI and Machine Learning is predicated on big data and unstructured data. The better your data resources, the more powerful and useful your AI solutions will be.
  4. Use AI on your data to read research papers. AI was also used to ‘read’ through over twenty million medical and science articles on the PubMed website to look for research that might indicate a particular new drug or chemical might be hazardous. For instance, if the system had been in place before the dangers of asbestos had been recognized, asbestos could have been flagged as dangerous much earlier than it was. More recently these AI reading systems were able to discover that n-propyl bromide, which is used to clean aircraft wings, caused kidney and liver cancer and that diacetyl (the chemical used to give microwave popcorn that buttery smell and taste) caused lung disease (affectionately called ‘popcorn lung’). These life-threatening effects were detected by the AI years before regulatory action would normally have been taken.
  5. Don’t forget about IA for your AI. Dalal argued that there was a huge need for “IA” (“Intelligent Application”) when using AI systems. By this he meant that it was important for a human being to be integrated into the process to lead and interpret the results coming out of the AI system. For instance, the predictive analytics (AI) being used on Wall Street before the 2008 crash was correctly implemented but lacked the human oversight needed to recognize the risk of ‘black swan’ crashes, and to recognize that most models hinged on the assumption that housing prices would rise continuously (and that the incestuous insuring of risk among all Wall Street players was unrealistic and unable to provide real diminution of risk). The “Intelligent Application” of common sense to these brute-force, data-driven AI solutions, or parental oversight of them, might have prevented this and other similar errors. As CDO you must never forget that you are fundamentally responsible for the decisions coming out of your AI.
  6. Watch out. GDPR is a game changer for the CDO. The General Data Protection Regulation (GDPR) reared up again to scare the symposium participants. The looming May 2018 enforcement date and the “4% of worldwide revenue” maximum penalty held participants’ attention, but even with experts in the room there were many remaining questions about what GDPR really meant and how it would be implemented. Many CDOs are using GDPR as an external “forcing function” to upgrade many systems within their companies. Things that were politically difficult to do before now become easier with the heavy fine and looming deadline.
  7. GDPR is confusing even for the experts. Some deeper unanswered questions about GDPR that the CDO must wrestle with include: “If you are required to remove all data about a particular consumer from your systems, what then do you do if another government agency would like to audit your system?” “How will you remove embedded consumer data that has been stored in a lambda architecture or blockchain ledger, where it is critical to these architectures that data is appended and inserted but not updated?” “Could you have your customers consent to an agreement where they waived their rights under GDPR in exchange for use of your site (i.e., a warning that the site was not GDPR compliant and couldn’t be used by those under the GDPR umbrella)?”
  8. GDPR may make your data much less useful. GDPR may have a chilling effect on data scientists, as even anonymous access to data may become quite limited. Federated and distributed solutions may be coerced to become more command-and-control / centralized solutions that can be more carefully monitored (e.g., no more pulling down that cool data set from the data lake without first filling out lots of paperwork).
  9. The biggest data problems for a CDO. Barbara Latulippe, Chief Data Governance Officer from Dell/EMC, was hired to run EIM (enterprise information management). Her biggest problems were: 1) Trying to ‘sell’ the value of data governance to the business units. 2) Just finding the data that was needed (70% of her team’s time was spent trying to find the data). 3) Breaking down the data silos between business units (as teams don’t want to share their data since “data is power”).
  10. Focus on the value of your data. The solution at Dell/EMC was to have a ‘value-based’ approach to prioritizing the data that needed to be processed. The first thing the governance team did was to ask “what are the most critical data elements?” Then they decided to pick no more than 50 columns/fields in each of the areas they focused on initially. The level of governance of the data was then based, in part, on its level of consumption. A key to success is to measure the impact of your efforts based on whether the data is actually being used and consumed by internal customers. If it isn’t, then de-prioritize it.
  11. Don’t be afraid to tear down the data warehouses. Several talks concerned moving to a single data lake model and “tearing down the data warehouses” using a federated model.
  12. Corporate mergers create a data mess (and ensure lifetime employment for CDOs). Not unlike what I’ve seen in many other companies, one of the biggest challenges faced by CDOs at MIT’s symposium was the data swamp that resulted from corporate mergers and the difficulty it created in comparing similar data across the acquired companies’ databases. Sorry, I don’t have any words of wisdom here; just know that you are not alone.
  13. Enable the data journey. Your goal should not necessarily be to control the data’s journey but to enable that journey. In the case of Dell/EMC, they took some 2,000 processes and worked together to see which processes would change if a data field were modified. This might require up to 30 conversations with all participants in the data’s journey before everyone agreed on exactly what a single field actually meant.
  14. Don’t boil the ocean. Start with one council, focus on the most important KPIs first, focus on the most important processes, focus on the most used and business-valuable data. When successful, move on to the second most important.
  15. Prioritize the business value. Make the distinction between intrinsic value and business value of a data field. For instance a customer part number has tremendous intrinsic value and it must be error free in the database, but a part number is unlikely to be a key generator of either revenue or profitability.
  16. Don’t overvalue the customer. Don’t just create a customer or data council. Instead think about the data stream and include a representative from each group responsible for the data’s journey from collection to the end consumer.
  17. Go with the flow. Embrace the process. View your role as creating process flows or data flows rather than just static ‘data’. Information value is all about dynamic movement. You will know that you are succeeding when people talk less about the data and more about “business processes involving the data”.
  18. Motivate business units via competition. Consider rallying support with simple competitions between business units. “Which team is providing the most error-free data? The most used KPIs?”
  19. Don’t get discouraged.  Remember Amara’s law: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.” In the long run you will succeed. In the long run the CDO plays a critical role in the health and growth of the business.
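To make item 2 concrete, here is a minimal Python sketch of how dropping the successful launches can hide a temperature effect. The launch records below are invented purely for illustration (they are not NASA's data and will not reproduce the 32x figure), and the 65-degree cutoff and the function name `incident_rate_by_band` are likewise assumptions made for this example.

```python
# Hypothetical (temperature in Fahrenheit, had_incident) records.
# These values are invented for illustration only; they are NOT real launch data.
launches = [
    (52, True), (56, True), (59, True), (62, True),
    (68, True), (71, True), (76, True),
    (66, False), (67, False), (69, False), (70, False),
    (72, False), (73, False), (75, False), (77, False),
    (78, False), (79, False), (80, False), (82, False),
]


def share_with_incident(flags):
    """Fraction of True values in a list of booleans, or None if the list is empty."""
    return sum(flags) / len(flags) if flags else None


def incident_rate_by_band(records, threshold_f=65):
    """Return (cold_rate, warm_rate): incident share below and at/above threshold_f."""
    cold = [incident for temp, incident in records if temp < threshold_f]
    warm = [incident for temp, incident in records if temp >= threshold_f]
    return share_with_incident(cold), share_with_incident(warm)


# Flawed view: keep only the launches that had an incident.
failures_only = [record for record in launches if record[1]]

# With successes excluded, every remaining launch is a failure, so both
# temperature bands look identical and temperature appears irrelevant.
print(incident_rate_by_band(failures_only))  # (1.0, 1.0)

# With successes included, the cold band stands out clearly.
print(incident_rate_by_band(launches))  # (1.0, 0.2)
```

The point of the sketch is only the contrast between the two printed results: the same records tell opposite stories depending on whether the successes are kept in the denominator.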

Four Years from Now

Four years from now the CDO role is only going to strengthen as challenges from the government (e.g. GDPR) and threats from hackers make protecting and leveraging data a key competitive advantage. A strong CDO and corresponding organization is also a critical and necessary foundation for any organization looking to make theirs a digital business. (This, by the way, is new thinking for me. I wouldn’t have agreed twelve months ago but I think I’ve now come around to embrace Gartner’s prediction that 90% of large companies will have a CDO by 2019.)

If you are a CDO there is some escalating urgency to patch up any leaks in your data processes that might result in an unwelcome visit to the front page of the Wall Street Journal for a data breach, or a phone call requesting an audience with whoever ends up enforcing GDPR. These coercive ‘sticks’ will drive change in your business whether you like it or not.

But there is also a carrot. Those companies that have done their homework and have a solid data infrastructure will be ready to pass the final exam and then move on to the next level of utilizing machine learning and AI to optimize their businesses. I’d predict that even within four short years we will begin to see companies that have materially hurt their business, and also those that have leaped ahead, based on how competent and visionary their CDO is.

And finally, ok, yes, I am aware that I promised 17 things all CDOs should know and I delivered 19. So let me use that as an example to introduce the twentieth thing a CDO should know: “As always, under-promise and over-deliver.”

Good luck.

Stephen J. Smith

Stephen Smith is a well-respected expert in the fields of data science, predictive analytics and their application in the education, pharmaceutical, healthcare, telecom and finance...
