From the course: Introduction to Career Skills in Data Analytics

Discovering and interpreting existing data

- Have you really thought about how much data is around a person? There's more than you may think. There's data like date of birth, names, race, and ethnicity. There's work data like employee ID, job title, hire date, or department. These data points are the items we think about when we work with data related to people, right? Some of this data is a single fixed value, like a birthday. Once it's recorded, it doesn't change. Then there are other items like job title, which might change when you get a promotion at work. There's also real-time data that is always occurring, like heart rate, blood sugar, blood pressure, and even temperature. There's also geographical data like location. Imagine social data as well, like what brands we follow, what brands we purchase, and how often we have food delivered versus going out to eat. Data is always happening.

The challenge we face as data analysts is that there's a lot of potential data, and not all of it is actually available to us. We also find that a lot of the same data is redundant, and in some cases it can even be incomplete or inaccurate. All of us are seeking the single source of truth in the data that we work with. We want it to be accurate when we report on the data.

Let me give you some examples. Companies have several different software packages that are used to handle different types of information, and they're often disconnected. There's people management software for HR-type information, which is employee data. We have our marketing and sales management data. That may live in a couple of different systems, and it handles not only staff information related to sales, but also customer information. There is also software that kicks in when a customer goes from being in conversations with our sales team to purchasing from the company. That data flows from purchasing to the warehouse. There's also data that flows to the accounting team to handle transactions that support reporting like profit and loss.
What this means is that data flows through the organization at different times. Systems are often disconnected, so finding which systems have the most accurate information is one of the first challenges. The only way to really know is to begin the investigation and ask questions along the way.

We sometimes hit roadblocks due to permissions and the sensitivity of data. For example, the data you might need to confirm your values is stored in the accounting software, and only the accounting team has access to that data. Just because you can't directly access it doesn't mean you're done. You can provide your values to those teams, and they will work with you to validate them.

In reality, whether systems are connected or not, they should hold the same record of information. If your sales team reports that there's a hundred thousand dollars set to invoice this month, then the accounting software should reflect a hundred thousand dollars' worth of invoices. When they don't balance out, you have to figure out where the breakdown has occurred.

As a data analyst, you need to be thoughtful about the type of data you might find. Then you have to find the data you do have access to and develop strategies to validate your reports. Just remember, data shows up in everything, but it's our job to bring it together accurately.
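As a rough illustration of the sales-versus-accounting check described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `reconcile` function, the `invoice_id` and `amount` field names, and the sample records are hypothetical and do not come from any particular sales or accounting system. It compares the two totals and, when they don't balance, points to where the breakdown might be.

```python
from decimal import Decimal


def reconcile(sales_records, accounting_records):
    """Compare invoice amounts from two systems (hypothetical record layout)."""
    # Index each system's records by invoice ID; Decimal avoids float rounding on money.
    sales = {r["invoice_id"]: Decimal(r["amount"]) for r in sales_records}
    accounting = {r["invoice_id"]: Decimal(r["amount"]) for r in accounting_records}

    report = {
        "sales_total": sum(sales.values(), Decimal("0")),
        "accounting_total": sum(accounting.values(), Decimal("0")),
        # Invoices that one system knows about and the other does not.
        "missing_in_accounting": sorted(sales.keys() - accounting.keys()),
        "missing_in_sales": sorted(accounting.keys() - sales.keys()),
        # Invoices present in both systems but with different amounts.
        "amount_mismatches": sorted(
            key for key in sales.keys() & accounting.keys()
            if sales[key] != accounting[key]
        ),
    }
    report["balanced"] = report["sales_total"] == report["accounting_total"]
    return report


# Made-up sample data: sales expects $100,000 to be invoiced this month.
sales_records = [
    {"invoice_id": "INV-001", "amount": "60000.00"},
    {"invoice_id": "INV-002", "amount": "25000.00"},
    {"invoice_id": "INV-003", "amount": "15000.00"},
]
accounting_records = [
    {"invoice_id": "INV-001", "amount": "60000.00"},
    {"invoice_id": "INV-002", "amount": "20000.00"},
]

report = reconcile(sales_records, accounting_records)
for name, value in report.items():
    print(f"{name}: {value}")
```

In this made-up data the totals do not balance, and the report narrows the investigation to one invoice that never reached accounting and one whose amount differs. If you can't access the accounting records yourself, a list like this is the kind of thing you could hand to the accounting team to help validate.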
