Can Dirty Data Lead to Smart Decisions?

Jul 11

Computers are precise, definite, and rule-following. Similarly, data science depends on error-free algorithms, detailed mathematics, and clean data. But does using data to make decisions have to be an exact science?

Not necessarily.

And this is good news since data experts never get handed clean data. Like… ever.

As much as we wish it were that easy, we never get data with every row populated, nothing misspelled, and values categorized and ready to use (i.e., query, report on, graph, feed a statistical model, or train AI).

When your goal is to make vital strategic decisions with the data you have, how do you get "close enough for rock 'n' roll"?

Key Takeaway - Focus less on making your data 100% clean and more on making it trustable.

Here are 3 areas you can focus on to make your stakeholders trust results that come from dirty data.

Transparency: A simple note like “Area code is missing for 15% of our customers” builds credibility and helps others understand the limitations. Also explain how you handled the issue, whether you built a cleaning script, ran a manual process, included a data dictionary, or made a list of assumptions.
Prioritize: You don’t need to clean everything. Focus on small improvements in critical areas (e.g., customer IDs, revenue, dates).
Document as you go: Your solution is custom-built for this situation, so if the process isn’t documented, nobody will know later how the results were derived.
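
To make the transparency point concrete, here is a minimal Python sketch of how a note like the one above could be generated from the data itself. The sample records, the field names, and the helper functions missing_rates and transparency_notes are all made up for illustration; treat it as one possible starting point, not a prescribed method.

```python
# Hypothetical customer records; None or "" marks a missing value.
customers = [
    {"customer_id": "C001", "area_code": "212", "revenue": 1200.0},
    {"customer_id": "C002", "area_code": None, "revenue": 450.0},
    {"customer_id": "C003", "area_code": "", "revenue": None},
    {"customer_id": "C004", "area_code": "415", "revenue": 980.0},
]

# Fields assumed to be critical for this illustration.
CRITICAL_FIELDS = ["customer_id", "area_code", "revenue"]


def missing_rates(records, fields):
    """Return the percentage of records that are missing each field."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rates[field] = 100.0 * missing / total if total else 0.0
    return rates


def transparency_notes(rates):
    """Turn non-zero missing rates into plain-language notes."""
    return [
        f"{field} is missing for {rate:.0f}% of our customers"
        for field, rate in rates.items()
        if rate > 0
    ]


rates = missing_rates(customers, CRITICAL_FIELDS)
for note in transparency_notes(rates):
    print(note)
```

Limiting the check to a short list of critical fields follows the "Prioritize" advice, and keeping the script alongside the results doubles as documentation of how the numbers were produced.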

The goal isn’t perfection; it’s to make your strategy more informed. Clean data is often out of reach, but trustable data is achievable.

Kevin D'Aguiar
