Showing posts with label north america. Show all posts

December 13, 2011

Mapping Wikipedia Article Quality in North America

The maps of Wikipedia previously posted on the blog offer useful insights into the geographies of one of the world's largest platforms for user-generated content. They, along with similar visualizations, reiterated some of the massive inequalities in the layers of information that augment our planet.

But not all articles are created equally, and those maps didn't give us much of a sense of the quality of articles. "Quality" is obviously a slippery word and there are infinite ways of measuring it, but for the purposes of this post, we'll crudely use the term to refer to article length (future maps will employ a variety of other metrics).

The maps below visualize this measure of quality within Wikipedia entries -- yellow dots represent the location of relatively short articles in the English version of Wikipedia (e.g. the article on "Bandana, Kentucky"), while red dots indicate the location of relatively long articles (e.g. the article on the "Republic of Molossia").


The map below displays the same data, but with smaller dots, making it easier to see some of the patterns if you expand the image.


Interestingly, the states with the highest average word counts are New Jersey (966) and Michigan (914). The states with the lowest averages are Delaware (534) and West Virginia (492). The reasons for these rather large differences are unclear.

Are Wikipedians from New Jersey that much more loquacious than their West Virginian counterparts? Or does it just take more words to describe the many dazzling wonders of New Jersey? Or is it something else entirely?

Apart from the obvious and increasingly evident urban bias in these information geographies, we'd certainly welcome your thoughts in explaining some of these patterns.

March 15, 2010

Drunken Maps or Why the Netherlands is the World's Designated Driver

Given the results of our map of Alcohol, Caffeine and Tobacco (particularly Rachel Maddow's pithy "we're a nation of drunks!" comment), we thought it prudent to take a more, ahem, sober look at the issue. So today's map is a comparison between the number of user generated placemarks referencing the terms "drunk" and "sober".

Of course, the term for drunk varies with language, and matters are complicated by the fact that many terms are slang, e.g. "Ich bin blau" (literally "I am blue") in German. We don't really understand why blue=drunk either. Then again, why does blue=sad in English? In any case a closer look is clearly warranted.

The global maps seem at first glance to indicate that people are much more interested in documenting drunkenness than sobriety. Must be a lot of college students out there. But there are a number of intriguing patterns. Most interesting is that the European continent contains many more references to sober than drunk when compared to the U.S. which seems awash in a green sea of drunkenness. This is particularly interesting given that alcohol consumption is much more strictly regulated in the U.S. via high drinking ages and various blue laws (again with the blue references) restricting its sale.

World Map of Drunk and Sober

Zooming into the European level, one can see a fair amount of regional variation. The United Kingdom in particular contrasts fairly strongly with the rest of Western Europe. Western Europe itself has a fairly variegated pattern with certain areas such as the Netherlands and Belgium being particularly sober places. On the other hand, the U.K. is blanketed in references to drunk with nary a mention of sober. Given the make-up of the Anglo-American research team of Floatingsheep, this does not come as much of a surprise. But perhaps some further ethnographic participant observation in a range of pubs is warranted. Equally interesting is the steady increase in references to "drunk" (versus sober) as one moves eastward across Europe.

European Map of Drunk and Sober
But it is at the U.S. level that things are particularly compelling. As noted earlier, there are many more references to drunk than sober, with a few intriguing exceptions. Most notable is a band of sobriety in Central Iowa (which incidentally seems to correspond to a lower number of bars). While Iowan farmers have always struck us as a particularly sober bunch, some of the other clusters, such as Southern California, Virginia Beach and Tampa-St Petersburg, are a bit more surprising.

North American Map of Drunk and Sober
Given this variation we thought it worthwhile to compare the number of "sober" user generated placemarks to an independent measure of drunk related behavior. A quick search provided us with National Highway Traffic Safety Administration data on traffic fatalities related to drunk driving. Aggregating our point data up to the state level and normalizing each variable by population shows a statistically significant and negative relationship. See the graph below.

Sober References vs. Drunk Driving Fatalities, State Level

In short, the number of user generated placemarks referencing the word "sober" is negatively related to the number of traffic fatalities resulting from drunk drivers. Although there isn't a direct causal relationship between the two (after all, how would the creation of a placemark affect individual decisions about driving?), the existence of a correlation at all is a compelling example of how online and offline human activity can mirror each other.
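For readers who want to try this kind of comparison themselves, here is a minimal sketch of the two steps described above: normalizing each state-level count by population and then computing a Pearson correlation. The state names and all numbers are made up for illustration (they are not our data or the NHTSA figures), the function names are hypothetical, and the sketch does not compute statistical significance.

```python
from math import sqrt

def per_capita(counts, population):
    """Divide each state's count by that state's population."""
    return {state: counts[state] / population[state] for state in counts}

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented values for three hypothetical states (NOT real data).
population = {"A": 1_000_000, "B": 2_000_000, "C": 4_000_000}
sober_placemarks = {"A": 300, "B": 400, "C": 400}
fatalities = {"A": 50, "B": 200, "C": 600}

states = sorted(population)
sober_rate = per_capita(sober_placemarks, population)
fatality_rate = per_capita(fatalities, population)

# A negative r means states with more "sober" placemarks per person
# tend to have fewer drunk driving fatalities per person.
r = pearson_r([sober_rate[s] for s in states],
              [fatality_rate[s] for s in states])
print(round(r, 3))
```

Normalizing first matters: in the invented data the raw placemark counts rise or stay flat with population, and only the per-capita rates show the negative relationship.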

For those who are interested, there is no correlation between our measure of bars per capita and drunk driving related fatalities. Nor does the number of bars seem to correlate to references to drunk or sober.

And because we know a lot of folk from Wisconsin (ground zero for bars in the U.S.) are likely to read this, Wisconsin ranks right in the middle of states in terms of references to sober or drunk within user generated placemarks as well as drunk driving related fatalities.

So, Rachel, we're not able to reject your characterization of the U.S. as a nation of drunks (at least with this data) but an international comparison to the U.K. does suggest we're a bit more sober than some others. It also suggests that the Netherlands might be the best candidate for the world's designated driver...now if we can just get the keys out of the hands of the usual suspects.


February 24, 2010

The many guns of urban America

God and guns keep us strong
That's what this country was founded on
Well we might as well give up and run
If we let them take our God and guns
-Lynyrd Skynyrd, "God and Guns"
As we have shown in earlier maps (here and here) guns have become a central fixture of the American landscape.

And often proponents of the Second Amendment are associated with a predominantly rural, religious and conservative population as exemplified by the above song lyric. Whether or not this is because rural Americans are 'bitter', the stereotype remains pervasive. However, when we map the number of user-generated Google Maps placemarks mentioning the word "gun", a much different pattern emerges.


Absolute Number of Guns in User-Generated Placemarks



Although the smaller dots peppered throughout the rural United States certainly show that guns maintain a presence in the rural landscape, the highest concentrations of guns in user-generated placemarks are undoubtedly found in the nation's urban centers.

Relative Specialization in Guns in User-Generated Placemarks


By focusing instead on those places with a higher-than-average number of placemarks with the word "gun", the concentration in urban areas becomes more obvious - rural areas are all but wiped off the map of indexed values. A plausible explanation would simply say that the prevalence of guns is more a function of population (more references to guns because there are more people) than of a stylized cultural trait.

Or could the differences in user-generated content be explained, at least in part, by a digital divide between urban and rural Americans? For example, rural Americans could simply be too busy actually using their guns to worry about adding user-generated placemarks to Google Maps. We should also note that the meaning of a reference to the word "gun" in a placemark is not straightforward. In other words, it could be a protest against guns or, alternatively, an affirmation of them.

Unfortunately, we end with an entirely new set of questions and are left clinging to conjecture, just as much of America remains clinging to their guns.

February 10, 2010

Where Users Like to Vacation

Over the past few months, we've published a number of maps showing the automatically- and user-generated online representations of place, from the seedy to the holy to the hoppy. Perhaps you've found yourself thinking, "I'd sure like to go there!", wherever there may be. So where exactly is it that people want to go?

The following maps show the incongruities between these automatically- and user-generated representations of place when searching for "tourism" and "vacation" in Google Maps. The values in each of the four maps were normalized using the national average for each search term, with any points not 20% greater than the average (indexed value >1.2) being excluded. These maps thus specifically show the places in which there is a higher-than-average concentration of placemarks (either user-generated or directory) mentioning the words "tourism" or "vacation".
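The indexing step described above can be sketched in a few lines. This is only an illustration under our own assumptions: the place names and counts are invented, the function names are hypothetical, and the "national average" is taken here as the simple mean over all points.

```python
def indexed_values(counts):
    """Divide each point's count by the average count across all points."""
    national_average = sum(counts.values()) / len(counts)
    return {place: n / national_average for place, n in counts.items()}

def above_threshold(indexed, threshold=1.2):
    """Keep only points whose indexed value exceeds the threshold."""
    return {place: v for place, v in indexed.items() if v > threshold}

# Invented placemark counts for four hypothetical points (average = 30).
counts = {"p1": 10, "p2": 20, "p3": 30, "p4": 60}

indexed = indexed_values(counts)
kept = above_threshold(indexed)
print(kept)  # only p4 (indexed value 2.0) survives the 1.2 cutoff
```

A point sitting exactly at the average gets an indexed value of 1 and is dropped, which is why these maps show only places with a higher-than-average concentration.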

Tourism: Directory

Tourism: User-Generated

Perhaps the starkest contrast between these maps of tourism is the much smaller number of user-generated placemarks as compared to the automatically-generated directory placemarks, usually drawn from pre-existing sources like the Yellow Pages. In moving from directory to user-generated representations, almost all rural locations disappear from the map, although the vast areas west of the Mississippi River with no information at all show that even some urban areas don't possess larger-than-average amounts of tourism-related information.

Vacation: Directory

Vacation: User-Generated

Shifting our attention to searches for "vacation", it is interesting that in this case, user-generated representations still have considerable coverage across the United States. Moreover user generated references to vacation differ from the "official" map of vacation based on Google Maps directory listings.[1] That is, "vacation" shows up most often in New York City in the Google Maps directory but user-generated representations show that Orlando, Florida, the home of Disney World, is the place to go on your coveted break each year.

God help us all.

Take note as well, that coastal areas all across the United States are prominent in the peer produced constructions of vacation, from the coastal Carolinas and Georgia to the Gulf Coast, and even throughout California, Oregon and Washington. So perhaps there is hope of eluding our mouse overlords after all.

Most importantly, these maps call our attention to the significant variances in how place is perceived online, depending on what measures are being used to represent these constructions. Even if it's possible to dig a hole through the planet on Google Earth, the difference between, and within, places remains as important as ever.

[1] This is also one of the few cases in which the maximum value in a map deviates from one of the nation's largest urban areas.

January 24, 2010

Where do people 'Make it Rain'?

I make it rain. I make it rain on them.
-Fat Joe featuring Lil' Wayne, "Make it Rain"

No surprises here (except for FloatingSheep's mastery of slang). The folks in Las Vegas make it rain. No, not precipitation. The kind defined by the Urban Dictionary as "When you're in da club with a stack, and you throw the money up in the air at the strippers. The effect is that it seems to be raining money." Indeed.

It shouldn't startle anyone that the largest city in the only US state where prostitution is legal also has the most user-generated references to strip clubs. Contrasting its usual ranking in the urban hierarchy of user-generated geographic information (i.e., somewhere in the middle), Las Vegas is undoubtedly considered by the collective intelligence of the Internet as the place to go to see the clothes come off.

But it is also clear that this phenomenon is national, with clusters of strip club references throughout the U.S. and Canada: Florida, Chicago, Detroit, Toronto, Montreal, New York-New Jersey (Bada Bing!) and Portland stand out in particular. Does Las Vegas retain its penchant for seedy entertainment when the raw number of hits is normalized by both the average number of mentions of 'strip clubs' in user-generated placemarks and the relative specialization at each point (values divided by the number of mentions of "1")?

Even when the raw values of user-generated placemarks are normalized by these two measures (with values showing less-than-average specialization excluded), Las Vegas remains the national hotbed for strip clubs by a considerable margin. But what explains the relative prevalence of strip clubs in the area around Aiken, SC? Or most of Connecticut, for that matter?

Clearly further research is needed but that's NOT what we mean. We're more than content to let it remain one of life's little mysteries for now.

January 18, 2010

Rust Belt Bowling

What is one to make of Robert Putnam's now-infamous assertion that despite bowling reaching an all-time high in popularity, its newfound nature as a solitary activity is indicative of a decline in civic engagement and increasing social isolation and alienation amongst Americans? Although it cannot support any definitive conclusions, the relative concentration of listings of bowling alleys in the Google Maps directory[1] tells an interesting story about where this process of social isolation might be taking hold.


The above map shows places in which the number of listings for bowling alleys in a single place exceeds the national average number of listings by 20% (i.e., only indexed values >1.2 are shown). Although some of these places continue to show the dominance of urban areas (the larger a place is the more bowling alleys it might have), this explanation is far from sufficient. The maximum indexed value is located in Southfield, Michigan, a suburb of Detroit, a highly unlikely location, given that listings in Google Maps directory are concentrated in major cities such as New York City, Los Angeles, Chicago and San Francisco. Further inspection shows that much of the activity mirrors the extent of the American Rust Belt, a region formerly known for its dominance in the manufacturing industry, now known more for its collective decline in the face of a severe economic downturn.

So what does this spatial correlation mean? A loose application (and we mean loose) of the theories of Max Weber (the 'iron cage') and Karl Marx (alienation of labor) might show that due to their full integration into the world of capitalist manufacturing, individuals living throughout the Rust Belt have turned to bowling as a refuge from their work lives, or lack thereof. It could be possible however, contra Putnam, that Rust Belt citizens have actually turned to bowling as a way of reconnecting with their community, rather than disengaging from it.

Or (stepping back from the brink of Germanic socio-economic theory) this map could simply highlight the cultural geography of a leisure activity with strong associations to the geography of early to mid 20th century manufacturing centers. Unfortunately for us, however, Google Maps cannot tell us why people bowl or whether they are bowling alone [2]. So for now, we remain wondering whether bowling is indicative of a resurgence of community or growing individualism. Let alone the more troubling question of how Wii bowling fits into this.

[1] Google Maps directories are drawn from a range of sources such as yellow page listings. This category is distinct from and excludes user generated placemarks that we use in other maps.

[2] Or at least not until the release of Google BowlCam which is now in beta testing.

December 11, 2009

Finding a Restaurant

Finding a restaurant can be one of the most vexing tasks in modern life and an extremely useful application of Google Maps is getting help locating nearby establishments. The map below shows the number of user-generated placemarks containing the word "restaurant". The density of restaurant references corresponds closely with the distribution of population in the United States and Canada. In particular, the densely populated Northeast is blanketed with New York City containing the largest concentration.
When user generated placemarks are compared to regular Google Maps directory listings one sees essentially the same pattern of clusters, albeit at a higher density. For example, the largest number of directory listings of restaurants (again in New York City) is about 25 percent higher than user generated ones. Moreover, more rural areas (see the eastern U.S.) clearly have a high number of directory listings relative to user generated ones.
This suggests that user generated placemarks are biased towards urban areas, where early technology adopters are most likely to live and to use these tools.

December 10, 2009

Swine flu: a user-generated pandemic?

In a recent post at 538.com, Nate Silver delves into mapping the spatio-temporal diffusion of swine flu in the US, via Google Flu Trends. Drawing from queries referencing swine flu, the map below shows the approximate date at which state-wide searches for "swine flu" crossed a particular threshold, potentially signifying the onset of what has become a swine flu pandemic. According to Silver, the date at which the relative number of searches reaches the indexed value of 5000 serves as a proxy for measuring the diffusion of the year's most talked about genetic mix-up.
So we know when and where people were looking for information about swine flu, but what about geo-references to the virus? How does the geography of swine flu differ between Google Flu Trends and user-generated Google Maps placemarks? How do Google's multiple representations compare to the actual number of cases of swine flu in the United States?

Although the CDC has stopped collecting data on the outbreak of swine flu on a state-by-state basis, the regional-level data in the map above shows the concentration of swine flu cases. The upper Midwest, for example, which has the highest number of swine flu infections in the country, only recently surpassed the 5000 point mark on Google Flu Trends. Clearly the act of searching for information on swine flu need not closely correspond to the number of cases. And while this region shows significant clustering in user-generated Google Maps placemarks, the values fail to approach the maximums for the nation as a whole. The peer produced geography of swine flu also seems to support CDC statistics for the southeastern US (showing a relatively high infection rate), while the Flu Trends data fails to match accordingly both there and along the US-Mexico border.
The greatest number of mentions of swine flu in user-generated placemarks is located in Baltimore, Maryland - part of District 3, which is home to the second-most cases of swine flu in the US. However, as one moves up the DC-Philadelphia-NYC-Boston metropolitan corridor there is an increasing disconnection between the online representations and material reality of swine flu. Although the absolute and population-adjusted number of actual swine flu cases in Regions 1 and 2 (home to Boston and New York respectively) are relatively low compared to other regions, they are highly visible in terms of user generated placemarks references to H1N1 or swine flu.
The population-adjusted map does, however, give a much clearer picture of the swine flu landscape in the US. Both the west coast and upper midwest, despite having the highest incidence of swine flu in the country, were previously overshadowed by the population centers of the east coast. Normalized by population, the placemark density comes to mirror much more closely the actual diffusion of swine flu across the country.

December 08, 2009

Toronto and Cape Cod are the "funnest" places in North America

These maps illustrate the distribution of "fun" in North America as defined by user generated placemarks containing the term. Luckily for society, fun seems to be well dispersed and corresponds with the distribution of population. In other words, where there are people there is also fun. But one can also see concentrations and specializations in fun.


For example, Toronto has a massive (dare we say strategic?) reserve of fun clustered around it. Who knew? I have fond memories of my trips to Toronto but had no idea. The film festival is great, the neighborhoods are fantastic and the underground walkways keep you warm in the winter but how does it all come together to make this mother lode of fun? Jane Jacobs clearly had it right. Perhaps this will become the next invisible export for the region's economy.

Also the Northwest is suspiciously fun. How does that work with all the rain?

Clearly, some means of standardizing "fun" needs to be done to separate the large concentrations from the places that truly specialize in fun. When we use population, i.e., fun per capita, it turns out that Cape Cod, a place outside of Ogden, Utah and Cancun, Mexico have the most fun per person in North America. But before you start planning a vacation to the Great Salt Lake, remember that the high showing outside of Ogden was largely due to a very small population figure.

November 30, 2009

Baptists, bibliophiles, and bibles, Oh My!

Two powerful and often opposing forces within society are faith and reason. Regardless of the extent to which a cultural war exists, the balance between the two (e.g., teaching evolution in the schools, etc.) is a prominent feature of popular socio-political discourse in the United States. Thus, the topic makes a perfect subject for a map and leads us to ask: which parts of the country prefer bookstores to bibles? What's the ratio of Baptists to bibliophiles?
Using the number of Google Maps directory listings[1] for "bookstores" and "churches" as proxy values, this visualization maps the spectrum of the faith and reason conflict. As there is an overwhelmingly larger number of churches than bookstores nationwide, it is important to index each of these variables before comparison. The technique used in this map was to divide the number of churches (or bookstores) at a location by the national average of churches or bookstores. If a location had twice the number of churches as the national average it would receive an indexed value of 2. Similarly, having only 50 percent of the national average of bookstores would produce an indexed value of 0.5. The church index was then divided by the bookstore index to see each location's relative balance of churches to bookstores. If each of the indexed values were the same, the faith-reason index would be equal to 1. But as in the case of the example above (church index = 2, bookstore index = 0.5) the faith-reason index would be 4. This indicates that this particular location has a much higher relative number of churches to bookstores. In order to exclude places that had an approximately equal number of churches and bookstores, this map only includes locations where the faith-reason index was skewed more than 20 percent in either direction (i.e., values greater than 1.2).
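The arithmetic behind the faith-reason index can be sketched as follows, reproducing the worked example above (church index = 2, bookstore index = 0.5). The national averages and local counts are invented so that the example comes out to those values, and the function names are hypothetical.

```python
def index_against_average(count, national_average):
    """Indexed value: a location's count relative to the national average."""
    return count / national_average

def faith_reason_index(churches, bookstores, avg_churches, avg_bookstores):
    """Church index divided by bookstore index for one location."""
    church_index = index_against_average(churches, avg_churches)
    bookstore_index = index_against_average(bookstores, avg_bookstores)
    # Note: a location with zero bookstores would raise ZeroDivisionError;
    # how such places are treated is left open in this sketch.
    return church_index / bookstore_index

# Invented figures: national averages of 40 churches and 10 bookstores,
# and a location with 80 churches (index 2) and 5 bookstores (index 0.5).
example = faith_reason_index(churches=80, bookstores=5,
                             avg_churches=40, avg_bookstores=10)
print(example)  # 4.0

# A location matching both national averages sits exactly at 1.
balanced = faith_reason_index(40, 10, 40, 10)
print(balanced)  # 1.0
```

Indexing before dividing is what keeps the sheer number of churches nationwide from swamping the comparison: a raw ratio of 80 churches to 5 bookstores would be 16, while the indexed ratio is 4.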

For the most part, the relative prevalence of bookstores occurs in and around the big cities - Los Angeles, California is the site of the highest indexed value, and is joined by the megalopolis of the eastern seaboard as having the highest concentrations in favor of bookstores. Even cities such as Atlanta, nestled in the Bible Belt of the American southeast, tend towards a relatively large number of bookstores. On the converse, other large cities like Dallas, San Antonio and Houston continue to favor churches, with New Orleans (the largest city in Louisiana) having the highest relative concentration of churches in the nation. Suburban areas surrounding large population centers also show a near-universal favoritism for churches.

So while there appears to be no single variable determining the local trends toward faith or reason, it is evident that even some of the most common assumptions regarding the geographies of faith and reason have proven to be more complicated; not all large cities are necessarily bookish, but neither is the bible belt a homogeneous geographic unit.

[1] Google Maps directories are drawn from a range of sources such as yellow page listings. This category is distinct from and excludes user generated placemarks that we use in other maps.

November 16, 2009

Visualizing the abortion debate

Abortion is a hotly contested political issue in the United States, as it has been since even before the Supreme Court's decision in Roe v. Wade in 1973. Regardless of one's position on the matter, the ongoing debate often lends itself to hyperbole, obscuring the observable facts.

In this visualization, the difference between the number of abortion alternatives and abortion providers listed in the Google Maps directory is mapped across the US in quarter degree intervals. The greatest difference in favor of abortion providers is found in New York City, with Los Angeles and Seattle representing a similarly disproportionate number of abortion providers. Similar to some previous maps we've published, this concentration of abortion providers has a strong urban bias. However, there are many cities such as Atlanta, Dallas and Cincinnati which have more abortion alternatives than providers, while some rural areas such as upstate New York and Maine have more providers.
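One way to picture the quarter degree grid is the sketch below, which snaps each listing to a 0.25 degree cell and takes the difference in counts per cell. The coordinates are invented, the function names are hypothetical, and the sign convention (providers minus alternatives) is an assumption made for this illustration.

```python
from collections import Counter
from math import floor

CELL = 0.25  # grid size in degrees

def cell_of(lat, lon, size=CELL):
    """Snap a coordinate to the south-west corner of its grid cell."""
    return (floor(lat / size) * size, floor(lon / size) * size)

def difference_by_cell(providers, alternatives):
    """Providers minus alternatives per cell (sign convention assumed)."""
    diff = Counter()
    for lat, lon in providers:
        diff[cell_of(lat, lon)] += 1
    for lat, lon in alternatives:
        diff[cell_of(lat, lon)] -= 1
    return dict(diff)

# Invented listing coordinates (latitude, longitude), NOT real listings.
providers = [(40.71, -74.01), (40.74, -73.99)]
alternatives = [(40.72, -74.00), (33.75, -84.39)]

diff = difference_by_cell(providers, alternatives)
for cell, value in sorted(diff.items()):
    print(cell, value)
```

Positive cells lean toward providers, negative cells toward alternatives, and a cell with equal counts nets out to zero.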

Overall, the blue coverage across the United States shows that, in a vast majority of the country, abortion alternatives are much easier to find than abortion providers. So while the "pro-life" camp ended up on the wrong side of the 1973 Supreme Court ruling legalizing abortion, they have built a significant organizational infrastructure which can be leveraged to promote their cause, while "pro-choice" advocates remain concentrated primarily in the nation's more politically progressive urban centers.

November 07, 2009

Where in the world is Barack Obama? (and John McCain, too!)

To follow up on our previous map showing the difference in the number of mentions between Barack Obama and John McCain in user-generated Google Maps content prior to the 2008 US Presidential Election, we figured an alternative visualization might be beneficial. The following maps represent the absolute number of mentions of Obama and McCain, respectively, in user-generated placemarks, a disaggregation of the map in our previous post.
This map, much like the previous iteration, shows the vast concentration of user-generated placemarks mentioning Obama in the nation's urban centers. The nation's largest cities - New York City, Los Angeles and Chicago - all appear prominently in this map. Although many of the notable points in both the Obama and McCain maps can be attributed to large populations (and thus, presumably, a greater level of connectedness), a number of other explanations remain necessary. Besides being the 3rd largest city in the United States, Chicago is also the home of Barack Obama, and it houses the highest concentration of placemarks that mention his name. Significant events also seem to assert their presence spatially, as Denver, Colorado, the site of the 2008 Democratic National Convention, is another relatively well-represented area, along with Portland, Oregon, where 70000+ rallied for Obama in May 2008.
Mirroring the already established pattern of urban primacy, much of McCain's presence is concentrated in the nation's urban centers, again including both New York City and the Washington, DC metro area (where McCain has the highest concentration). Unlike Obama, the places McCain is best represented in Google Maps were not necessarily the places he fared the best during either the primary or general election. For example, both Iowa and Michigan, in which McCain receives a nearly uniform number of mentions across the state, voted against him in both the primary and general elections.

Despite some of these patterns of user-generated content merely confirming the primacy of urban areas in virtual representations of the material world, others depart significantly from the predicted spatial clustering. Some areas that voted for McCain feature more prominently in the user-generated representations for Barack Obama, and vice versa, with the number of mentions for Barack Obama being more than double the number of mentions for John McCain. Although not all of the patterns displayed can be easily attributed to a particular causal factor, they only further complicate the relational geographies of the virtual and material world.

October 17, 2009

Google Mapping the 2008 US Presidential Election

Despite being highly contentious, the 2008 US Presidential Election resulted in an overwhelming electoral college victory by President Barack Obama. This map shows the difference in the number of mentions of Barack Obama and Republican candidate John McCain in user-generated placemarks indexed by Google. This peer-produced representation is remarkably similar to more official cartographic representations of the final election results, with a couple of notable exceptions.

Because placemark concentration is correlated with large urban populations, even the states that overwhelmingly voted for Senator McCain seem to favor Obama. This concentration of placemarks in urban areas shows a significant advantage for Obama, mirroring his successes during the election. Another anomaly is the red clustering in New Hampshire, a state in which Obama defeated McCain 54%-45%. However, this cluster can be explained by McCain's momentum-building primary win in the Granite State, which eventually propelled him on to the GOP nomination.

Following J.B. Harley (1988), we should also take interest in the silences of this map. Here the primarily rural areas contain either no user-generated placemark information or an equal number of mentions for both Obama and McCain, but nonetheless appear uniformly devoid of content.

July 10, 2009

The Virtual ‘Bible Belt’

The size of the dots in this map represents the relative number of mentions of the word “church” in placemarks uploaded to Google. Results for the word “church” have been divided by the "0" and "1" baseline measure (see the last two blog posts), thus highlighting the parts of North America in which mentions of the word “church” are over- and under- represented. Interestingly, while the “bible belt” in the physical world is often talked about as being synonymous with the American South, the virtual “bible-belt” additionally incorporates large parts of the Midwest. Less surprising is the fact that the Northeast and the West have relatively low scores. The GeoWeb is in many ways a mirror (albeit a distorted one) of the physical places that it represents.