From the course: Introduction to Career Skills in Data Analytics

Understanding the importance of data quality

- As a little girl, I got sick. I mean really sick. My mother immediately took me to the doctor, and they did an x-ray because I had a headache so bad that I'd been sick for two days. The x-ray showed nothing, but the physical signs of the illness and the blood work were enough for the doctor to send me to the ER. A day or two later, they did another x-ray, and when they did, they discovered why I was so sick. You see, I had a bacterial infection that was unfortunately on its way to my brain. I was hospitalized for 11 days and given the right types of treatments to keep it from getting worse and to help it get better.

What does this have to do with data quality? Well, the first x-ray showed nothing, and what they actually discovered is that their machine was broken. Would I have gotten better faster if that first x-ray had shown them what the second x-ray did? We'll never know. We can't go back in time.

Quality data is data that can be trusted to produce accurate insights so decisions can be made. In my situation, had they waited even longer to do the second x-ray or even sent me home, I would not be here today. Not all data decisions are life or death, but they can have terrible consequences for businesses if data quality is not an everyday part of the culture. It is important for all of us to remember as data professionals that people are using data to make decisions, and bad data can mean bad decisions with profound consequences.

There are data quality dimensions that you can be aware of as a data analyst. This isn't a complete list of everything you will find for data quality, but here are the four major hallmarks of quality data: complete, consistent, valid, and accurate.

Completeness of data. Do we have all the data that's needed? Is any of it missing? Is it all usable?

Consistency. Is this data in other systems, and is the information consistent across all of them? In other words, does the same record in the production system match what we sent to the invoicing system?

Validity. Does the data meet the requirements of what we are attempting to do with it? And is it in the format we need?

Accuracy. Is it accurate? This is a big one. Is this information accurate? And in my case, it was not.

I think it's important that we know quality can be measured, and we can determine if it's complete, consistent, valid, and accurate. And if it's not 100%, well, we need to know that. Again, some data means life or death. So data quality at the highest level is important.
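To make the idea that quality can be measured a little more concrete, here is a minimal Python sketch that scores a handful of records on the four dimensions named in the transcript. Everything in it is an assumption made for illustration: the sample records, the field names, the `invoicing` and `reference` lookups, the rule that dates must be written as YYYY-MM-DD, and the helper names `share`, `is_iso_date`, and `quality_report`. It is one possible way to turn each question into a percentage, not a standard definition of these dimensions.

```python
from datetime import datetime

# Hypothetical sample records; every field name and value is illustrative.
production = [
    {"id": 1, "customer": "Acme", "amount": 120.0, "date": "2024-01-15"},
    {"id": 2, "customer": None, "amount": 75.5, "date": "2024-01-16"},
    {"id": 3, "customer": "Globex", "amount": 40.0, "date": "16/01/2024"},
    {"id": 4, "customer": "Initech", "amount": 310.0, "date": "2024-01-18"},
]
# Amounts for the same record ids as stored in a hypothetical invoicing system.
invoicing = {1: 120.0, 2: 75.5, 3: 45.0, 4: 310.0}
# Trusted values, for example from a manual audit (assumed to be available).
reference = {1: 120.0, 2: 70.0, 3: 40.0, 4: 300.0}

REQUIRED = ("id", "customer", "amount", "date")


def share(flags):
    """Return the fraction of True values, or 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def is_iso_date(text):
    """Validity rule assumed here: dates must parse as YYYY-MM-DD."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except (TypeError, ValueError):
        return False


def quality_report(records, other_system, trusted):
    """Score each dimension as the share of records that pass its check."""
    return {
        # Completeness: every required field is present and not missing.
        "completeness": share(
            all(r.get(f) is not None for f in REQUIRED) for r in records
        ),
        # Consistency: the amount matches what the other system holds.
        "consistency": share(
            other_system.get(r["id"]) == r["amount"] for r in records
        ),
        # Validity: the date is in the format we need.
        "validity": share(is_iso_date(r["date"]) for r in records),
        # Accuracy: the amount matches the trusted reference value.
        "accuracy": share(trusted.get(r["id"]) == r["amount"] for r in records),
    }


report = quality_report(production, invoicing, reference)
for dimension, score in report.items():
    print(f"{dimension}: {score:.0%}")
```

Any score below 100% points to specific records worth investigating before the data is used for a decision. In practice the checks would be tailored to the data set, and accuracy in particular is only measurable when a trusted source exists to compare against.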
