From the course: Statistics Foundations 3: Using Data Sets

Sample considerations

- Imagine that you're a candidate trying to get elected as mayor of a city with one million people. Your campaign wants to estimate your chances of winning. The election is only one month away, and the campaign has a limited budget, so the team has neither the time nor the money to ask every voter what they think.

Believe it or not, this situation occurs more often than you might think. You want to know everything, but measuring everything is impossible. You can't poll every voter, just as a manufacturer can't test the quality of every single cell phone, a farmer can't measure the size of every tomato, and scientists can't track the health of every single person in the country.

When you can't measure an entire population, you can use a sample instead. A sample is a subset of the entire population, and under the right circumstances, a sample can act as a good representative of that population. But gathering a good representative sample is tough.

Let's look back at our election example. Remember, the city has a population of one million eligible voters, and you discover that a polling organization took a sample. They found that 60% of the people in the sample support your opponent, and only 40% support you. Before you panic, let's ask some simple questions about that sample.

What was the size of the sample? How many of those one million eligible voters were polled? 100? 1,000? 100,000? Small samples often have large margins of error, but even a large sample can be flawed.

The selection process is important too. Let's say they polled 5,000 people. How were those 5,000 people selected? Were they called on the phone or approached outside a grocery store? And how many people declined to be surveyed?

Also, is it possible the polling organization was biased? Some unethical polling groups try to collect biased data, data that isn't truly representative of the population. Other groups are just sloppy. They want to collect good data, but they don't understand basic sampling methodology.

Also, how did they measure? Which questions were asked in the poll? It's possible that the questions were too complex, confusing, or even misleading.

Strangely enough, despite the endless list of sample considerations, the best samples are the ones that are chosen at random.

- The simple random sample is the gold standard when it comes to collecting data, but in the world of statistics, nothing comes easy. So let's look at the complex nature of the simple random sample.
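
To make the point about sample size and margin of error concrete, here is a minimal Python sketch. It is an illustration and not part of the course material. It assumes a 95% confidence level (z = 1.96), the usual normal approximation for a sample proportion, and a hypothetical city in which exactly 40% of one million voters support you. The function names margin_of_error and simulate_poll are made up for this example.

```python
import math
import random


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation z * sqrt(p * (1 - p) / n). This assumes
    a simple random sample and ignores the finite-population correction.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def simulate_poll(population_size, true_support, n, seed=0):
    """Draw a simple random sample of n voters and return the sample share.

    The population is hypothetical: voters are numbered from 0 up to
    population_size - 1, and the first true_support fraction support you.
    """
    rng = random.Random(seed)
    supporters = int(population_size * true_support)
    # random.sample draws without replacement, so no voter is polled twice.
    sampled_ids = rng.sample(range(population_size), n)
    return sum(1 for voter in sampled_ids if voter < supporters) / n


if __name__ == "__main__":
    for n in (100, 1_000, 5_000, 100_000):
        moe = margin_of_error(0.40, n)
        share = simulate_poll(1_000_000, 0.40, n)
        print(f"n={n:>7}: sample share={share:.3f}, margin of error=+/-{moe:.3f}")
```

Under these assumptions, a poll of 100 people carries a margin of error of roughly 10 percentage points, while a poll of 5,000 people brings it down to about 1.4 points. Keep in mind that this formula only describes random sampling error. It says nothing about biased selection, people who decline to answer, or misleading questions, which is why even a large sample can be flawed.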
