Adulting and Data Science at Strata NYC 2018

With some trepidation, I recently spent a cool, drizzly day, Tuesday, September 11th, in New York City. I was there to attend Strata NYC 2018. I heard some great new ideas as I spoke to the vendors and wandered the show floor at the Jacob Javits Center. Overall, it felt like the industry was moving from adolescence to adulthood.

What is Adulting?

My children have all graduated college now, and as they have moved on to their first jobs and their first apartments, they regularly inform me when they are ‘adulting’. For them, adulting means doing the things they need to do as they grow up that are not quite as much fun as what they got to do in college. Mind you, they worked hard in college, but adulting has to do with dealing with the real world. Being an adult.

Strata Reflects the New Focus on Business - Very Adult of Them...

My overall takeaway from Strata is that our big data and Hadoop friends have graduated from college and have begun adulting. For Strata, this means a shift in focus from technology towards business and other real-world areas like revenue, growth, and profitability. This is a good thing. Big data adulting will result in amazing technology being put to use in real-world, mission-critical situations where it will make a big difference.

Here are some cool adulting-related ideas I came across at Strata:

GDPR Mess - Especially in Europe

We all knew GDPR was coming. We all knew it would be expensive. We didn’t know how poor compliance would be.

Talend just released some brilliant research on GDPR compliance. They checked GDPR compliance at 103 companies around the world. They found that 98% of the companies had updated their privacy policies to comply with GDPR, but only 30% responded to a personal data request within the required 30 days. Amazingly, Europe did worse than the rest of the world (only 35% compliant in Europe vs. 50% compliant elsewhere). Didn’t this law come from the European Union? It gets worse. 7% of companies confused a GDPR request for an individual’s data (“just show me my data”) with a request for data deletion (the right to be forgotten … oops!). One surveyor had his hotel reservations and profile deleted for an upcoming trip. Maybe this should fall under the label of “not adulting very well just yet…”.

Data Scientists Create the “Noisy Neighbor Problem”

Here’s a real-world problem that is surfacing as data science reaches adulthood: the ‘noisy neighbor problem’. I heard the phrase used in reference to data science by some folks at Cloudera. It refers to what happens when predictive model building processes, running on shared big data infrastructure, overwhelm resources and pull them away from other mission-critical processes. Cloudera provides its Workload XM product to keep these “noisy neighbors” at a resource level that matches their value and to keep the CPU- and terabyte-hungry data scientists from drowning out their neighbors.

Confirming the Identities of 1.2 Billion People

Opening a bank account or getting a personal loan is a clear sign of adulting for many U.S. millennials. Sadly, this has not been so easy in India. Just ten years ago, nearly 30% of the people living in India didn’t have credible proof of their identity and couldn’t open a bank account or even receive government assistance. MapR is now running a biometric database of iris scans, photos, and fingerprints for the Indian government through the Aadhaar project to confirm that a person is who they say they are. They can do it all in under 200 milliseconds. Soon, all Indian millennials will be able to get that checking account as easily as U.S. millennials do.

Providing DevOps for Cloud Data Warehouse Modernization

Modernizing your data warehouse and moving it to the cloud can be very complicated, even if you are using a complete ecosystem technology solution like Cloudera. Cazena is helping to solve this problem by providing services that take the need for DevOps resources off the customer’s hands in these large modernization projects. Not exactly flashy stuff like deep learning or AI, but perhaps even more critical in the march towards big data adulthood.

Model Liability and Reverse Inventory

I always enjoy my conversations with the folks at Domino Data Lab. On this visit, they offered me two interesting ideas I hadn’t encountered before. The first idea was ‘reverse inventory’ management. We data scientists are always talking about using models to predict the inventory needed to correctly stock a store in the future, but the reverse problem is also important: How do you use predictive models to figure out when (and how) to get unsold inventory out of a retail store as profitably as possible? (For example, what does Macy’s actually do with all those unsold bathing suits come September?)

The other interesting idea was ‘model liability’. Who is responsible when a model goes bad? If, for instance, a model incorrectly denies credit by discriminating against minorities, is it the model’s fault or the data scientists’? Model liability is just one part of the much larger topic of ethics in data science, machine learning, and AI. In the future, managing model liability will need to include tracking who worked on the model, model lineage, and provenance. Maybe even a whole new class of insurance policies will be needed to protect data scientists and the companies that use their models. How much more adult can you get than that?

Stephen J. Smith

Stephen Smith is a well-respected expert in the fields of data science, predictive analytics and their application in the education, pharmaceutical, healthcare, telecom and finance...
