Showing posts with label wikipedia. Show all posts
June 01, 2013
Mapping Controversy in Wikipedia
Wikipedia, the collection of 37 million articles that anyone can edit, is defined by conflict. The ability for anyone to shape this global repository of knowledge inevitably means that we are presented with fascinating, shocking, and often hilarious discussions on the talk pages of articles. Just check out the talk pages of articles about Barack Obama, the Persian Gulf, and Freddie Mercury (or, if you really want to waste an afternoon, dive into Wikipedia's collection of 'lamest edit wars').
So, a natural question for us was whether we can model and map the controversiality of Wikipedia articles. Does controversy have distinct geographies? It turns out that it does.
To quantify the controversiality of an article based on its editorial history, we focused on “reverts”, i.e. when an editor undoes another editor’s edit completely. We counted all of the reverts in the history of every article and gave a higher weight to editors that revert each other repeatedly. To validate everything, we measured the classifier against human judgement. If you want to read more about the method check articles by friends of the sheep here or here.
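To make the idea of a revert-based score a little more concrete, here is a minimal sketch in Python. It is not the method from the papers linked above. It assumes that a revert can be detected when a revision's text is identical to an earlier revision, that the reverted editor is the author of the immediately preceding revision, and that a pair of editors who revert each other gets an extra weight of `1 + min(n_ab, n_ba)`. The function names, the weighting, and the sample history are all hypothetical.

```python
import hashlib
from collections import Counter


def find_reverts(revisions):
    """Yield (reverting_editor, reverted_editor) pairs.

    `revisions` is a chronological list of (editor, text) tuples.
    Assumption: a revision is a full revert when its text is identical to
    some earlier revision (but not to the one right before it), and the
    editor being reverted is the author of the preceding revision.
    """
    seen = set()
    previous_editor = None
    previous_digest = None
    for editor, text in revisions:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        is_restore = digest in seen and digest != previous_digest
        if is_restore and previous_editor not in (None, editor):
            yield editor, previous_editor
        seen.add(digest)
        previous_editor = editor
        previous_digest = digest


def controversy_score(revisions):
    """Count reverts, weighting pairs of editors who revert each other.

    The weight 1 + min(n_ab, n_ba) is an illustrative choice, not the
    published formula.
    """
    directed = Counter(find_reverts(revisions))
    pairs = {tuple(sorted(pair)) for pair in directed}
    score = 0
    for a, b in pairs:
        n_ab = directed[(a, b)]  # times a reverted b
        n_ba = directed[(b, a)]  # times b reverted a
        score += (n_ab + n_ba) * (1 + min(n_ab, n_ba))
    return score


# Hypothetical edit history of one article.
sample_history = [
    ("alice", "v1"),
    ("bob", "v2"),
    ("alice", "v1"),  # alice restores v1, reverting bob
    ("bob", "v2"),    # bob restores v2, reverting alice
    ("carol", "v3"),
    ("alice", "v2"),  # alice restores v2, reverting carol
]
print(controversy_score(sample_history))
```

In this toy history, the alice/bob pair contributes more than the alice/carol pair because the reverting goes in both directions, which is the intuition behind weighting repeated mutual reverts more heavily.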
This all allowed us to get a sense of what the most controversial articles in each Wikipedia language edition are. In English, the most controversial article is George W. Bush, followed by Anarchism and Muhammad. In French, by contrast, the top three most controversial articles are Ségolène Royal, UFOs, and Jehovah's Witnesses (we're certain there are some good jokes hiding in the orders of these lists). For the full list of top-10 controversial articles in ten languages, check out our in-press chapter on the topic (or look at the complete lists here and an interactive visualisation of Wikipedia conflicts at this link). But the short version is that at the top of the lists in multiple languages we see articles related to religion, politics, and football; i.e. pretty much exactly what you would expect people to be arguing about.
But what about the geography of these controversial articles in different languages? Where do we see the most controversial articles in different languages? Below is the full list of maps that we created:
What do these maps tell us? First, we see an interesting amount of difference between the various language editions of Wikipedia. Some of the smaller Wikipedias have a high degree of self-focus in the articles that are characterized by the greatest degree of conflict (check out some of Brent Hecht's work for more on this). For instance, we see articles with the highest amount of conflict in the Czech and Hebrew Wikipedias being about the Czech Republic and Israel respectively.
Even when looking at large languages that are primarily spoken in more than one country, we are able to see that a significant amount of self-focus occurs (look at the Arabic and Spanish maps of conflict for examples of this).
The interesting exception to this rule is the Middle East. All languages in our sample apart from Hungarian, Romanian, Japanese, and Chinese include articles located in Israel among those characterised by a large amount of conflict.
Also worth pointing out is the fact that we see significant differences in the geographic topics that generate the most conflict. The articles in Japanese that generate the most conflict are not only all located in Japan, but are also all about educational institutions. The Portuguese articles that generate the most conflict are similarly all located in Brazil (the world’s largest Portuguese-speaking nation), with four of the top five conflict scores belonging to articles about football teams.
Within our sample, we actually only see the English, German, and French Wikipedias with a significant amount of diversity in the topics and patterns of conflict in geographic articles. This probably indicates the less significant role that specific editors and arguments play in these larger encyclopaedias.
Ultimately, by visualizing the geography of conflict in Wikipedia, we're able to see both topics that appear to have cross-linguistic resonance (e.g. the Arab-Israeli conflict) and those of more narrow interest, such as the Islas Malvinas/Falkland Islands article in the Spanish Wikipedia.
These maps therefore offer a window into not just the topics that different language communities are interested in, but also the topics that seem worth fighting about.
To read more about conflict and Wikipedia:
Yasseri, Taha, Spoerri, Anselm, Graham, Mark and Kertesz, Janos, (2014) The Most Controversial Topics in Wikipedia: A Multilingual and Geographical Analysis. In: Fichman P., Hara N., editors, Global Wikipedia: International and cross-cultural issues in online collaboration. Scarecrow Press. Available at SSRN.
Graham, M., M. Zook., and A. Boulton. 2012. Augmented Reality in the Urban Environment: contested content and the duplicity of code. Transactions of the Institute of British Geographers. DOI: 10.1111/j.1475-5661.2012.00539.x
Graham, Mark, The Virtual Dimension (2013). Global City Challenges: Debating a Concept, Improving the Practice, M. Acuto and W. Steele. Available at SSRN: http://ssrn.com/abstract=2212824
Yasseri, T., Sumi, R., Rung, A., Kornai, A., and Kertész, J. (2012) Dynamics of conflicts in Wikipedia. PLoS ONE 7(6): e38869.
Török, J., Iñiguez, G., Yasseri, T., San Miguel, M., Kaski, K., and Kertész, J. (2013) Opinions, Conflicts and Consensus: Modeling Social Dynamics in a Collaborative Environment. Physical Review Letters 110 (8).
March 25, 2013
What percentage of edits to English-language Wikipedia articles are from local people?
As part of our on-going efforts to explore the geographies of
participation in Wikipedia, we have calculated the percentage of local
edits to articles about places. In other words, this map illustrates the
percentage of edits about any country that come from people with strong
associations to that country.
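As a rough illustration of the calculation behind such a map, the sketch below computes the percentage of local edits per country in Python. It assumes that each edit has already been reduced to a pair of country codes (the country the article is about, and the country the editor is associated with). How that association is established is not described here, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def local_edit_share(edits):
    """Return {country: percentage of edits made by local editors}.

    `edits` is an iterable of (article_country, editor_country) pairs.
    """
    totals = defaultdict(int)
    local = defaultdict(int)
    for article_country, editor_country in edits:
        totals[article_country] += 1
        if editor_country == article_country:
            local[article_country] += 1
    return {c: 100.0 * local[c] / totals[c] for c in totals}


# Made-up edits, purely for illustration.
sample_edits = [
    ("US", "US"), ("US", "US"), ("US", "GB"),
    ("KE", "US"), ("KE", "KE"), ("KE", "GB"), ("KE", "US"),
]
shares = local_edit_share(sample_edits)
for country, share in sorted(shares.items()):
    print(f"{country}: {share:.0f}% local edits")
```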
So what do these results tell us?
Unsurprisingly, they show that in predominantly English-speaking countries most edits tend to be local. That is, we see that most edits to Wikipedia articles about the US (85%) tend to come from America, and most edits to articles about the UK likewise come from the UK (78%). The Philippines (68%) and India (65%) score well in this regard, likely because of the role that English plays as an official language in both countries. But why then do we see relatively low numbers in other countries that also have English as an official language, such as Nigeria (16%) or Kenya (9%)?
We also, interestingly, see relatively high local edit percentages from a handful of countries that don't count English as an official language: Finland (50%), Norway (56%), Romania (54%), and Bulgaria (53%).
Then we also observe large parts of the world in which very few English-language descriptions of local places are created by local people. Almost all of Sub-Saharan Africa falls into this category. The key question is whether these data actually tell us anything meaningful. For instance, just because most edits about the United States likely come from the United States does not necessarily mean that those articles are representative, include a diversity of viewpoints, or avoid excluding people, places, and processes.
But the data nonetheless, in a very broad way, do tell a story about voice and representation. Some parts of the world are represented on one of the world's most-used websites predominantly by local people, while others are almost exclusively created by foreigners, something to bear in mind next time you read a Wikipedia article.
December 11, 2012
We're also hiring a researcher in spatial statistics!
In addition to our new position in Internet Geography, we are now also hiring a full-time five-month researcher to study the geographies of user-generated content and participation on Wikipedia. We specifically seek to employ a researcher with experience in quantitative geography or quantitative sociology in order to statistically explain national and sub-national patterns and geographies of Wikipedia articles and editing behaviour.
Across the globe, daily economic, social and political activities increasingly revolve around the use of social content on the Internet. This user-generated content influences our understandings of, and interactions with, our social environment. Despite the rapid increase in Internet access, there are indications that many people remain largely absent from websites and services, and many voices are absent from important platforms of information.
We explore this phenomenon through one of the world's most visible and most accessed sources of content: Wikipedia. This project will employ a range of (primarily quantitative) methods to assess, explain, and model the variable levels of access, participation and representation on Wikipedia.
Candidates should have a keen interest in platforms of peer-production and the geographies of online participation. We welcome applications from candidates with a background in statistical methods, a strong record of scholarly research, and a desire to co-author academic publications.
Based at the Oxford Internet Institute, this position is available immediately for five months in the first instance, with the possibility of renewal thereafter, funding permitting.
Applications for this vacancy are to be made online. To apply for this role and for further details, including a job description and selection criteria, please click on the following link: https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=105871
Only applications received before 12:00 midday on 14th January 2013 can be considered. Interviews for those short-listed are currently planned to take place in the week commencing 21st January 2013.
Please also feel free to get in touch with any questions about the position.
May 14, 2012
Mapping Wikipedia edits from Europe
Time for a few more maps from our database of Wikipedia edits (which tells us how many contributions to the encyclopedia originate in each country). In the maps of Europe below, the height of each country represents the number of edits originating in that place. The shading indicates the number of edits per Internet user (darker reds meaning higher per capita participation).
Interestingly though, Germany and the UK have fairly low participation rates when normalised by Internet population. Internet users in Italy, Scandinavia, the Baltic States, and even Ukraine are more likely to make an edit to Wikipedia than their British or German counterparts.
Also notable are the relatively low (total and relative) participation rates from Portugal and Poland.
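The normalisation used for the shading in these maps (edits per Internet user) is simple to reproduce. The Python sketch below shows the idea with two invented countries. The function name, the per-1,000 scaling, and all of the numbers are assumptions for illustration and are not taken from our database.

```python
def edits_per_thousand_users(edits, internet_users):
    """Return {country: edits per 1,000 Internet users}.

    Countries with a missing or zero Internet population are skipped.
    """
    return {
        country: 1000 * count / internet_users[country]
        for country, count in edits.items()
        if internet_users.get(country)
    }


# Invented figures: country A has more edits in total,
# but country B has more edits per Internet user.
sample_edit_counts = {"A": 1000, "B": 300}
sample_internet_users = {"A": 100000, "B": 10000}

rates = edits_per_thousand_users(sample_edit_counts, sample_internet_users)
ranking = sorted(rates, key=rates.get, reverse=True)
print(rates, ranking)
```

This is the effect described above: a country can have a tall column (many edits in total) and still be lightly shaded once its Internet population is taken into account.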
May 08, 2012
Hiring a part-time research assistant to do statistical, spatial, and social analysis
Mark is hiring a part-time Research Assistant to carry out research into the geography and social structure of Wikipedia in the Middle East and North Africa through large-scale data analysis. The position will involve the analysis of the corpus of Wikipedia text, user-pages and history files and the use of statistical techniques to explain spatial and social patterns. Our research question focuses on patterns of representation on Wikipedia as well as an articulation of patterns of conflict and barriers to participation.
The successful candidate will manage and perform queries on a large database, statistically and geographically analyse and visualise results, explore alternate methods to answer the project's core research questions, and assist in writing academic papers and technical reports.
Essential attributes:
• A graduate degree or postgraduate training in quantitative social science. Preference will be given to candidates in geography or sociology;
• Experience with statistical modeling, particularly regression analysis;
• Experience working with databases and large datasets (i.e. N > 1 million);
• Familiarity with GIS software;
• Familiarity with social network analysis software;
• Ability to work autonomously and be creative in the ways that you answer research questions.
Desirable attributes:
• Experience visualising statistical, social network, and geographic data;
• Experience with text mining;
• Experience writing for an academic audience (i.e. journal articles and book chapters);
• Interest in and enthusiasm for the work of the OII;
• Experience working with publicly available secondary datasets.
The deadline for applications is June 1. Please get in touch with Mark if you have any questions.
(link to apply is here)
Links to our relevant Wikipedia projects:
http://www.oii.ox.ac.uk/research/projects/?id=66
http://www.oii.ox.ac.uk/research/projects/?id=70
Random map from the project:
April 20, 2012
O Mundo Pela Wikipédia
Some of our work just got picked up by the Brazilian magazine Exame.
The spread offers an alternate visualisation to the data that we're collecting about the geographies of Wikipedia. It also includes penguins. None of us speak Portuguese, so we're not sure what the penguins have to do with Wikipedia. But, being the purveyors of sheep that we are, who are we to talk?
A PDF of the piece is here.
April 03, 2012
A new tool to explore the geography of Wikipedia
We know by now that all online platforms have distinct, and highly uneven, geographies. Wikipedia is no exception. The team at TraceMedia, as well as the Oxford Internet Institute's Mark Graham and Bernie Hogan, therefore decided to make a tool that would allow people to explore what, and where, is represented in the world's most popular encyclopedia.
The tool is built as part of Mark and Bernie's project to study participation and representation on Wikipedia in the Middle East and North Africa. It currently allows you to explore the geography of all geotagged Wikipedia articles in Arabic, Egyptian Arabic, English, Farsi, French, Hebrew and Swahili. It also allows mapping of a range of metrics including the word count of an article, date created, number of authors, and number of images.
A few screenshots of the tool are below. You can also read more about how it was built, or simply start playing. The tool is still a work in progress, and there is a lot to add and fix, but we hope it is useful in the meantime!
February 01, 2012
Open invitation to a workshop in Amman: Middle Eastern Participation and Presence in Wikipedia
Your voice matters. Come and share your experience and opinions about Wikipedia with other Wikipedians, wiki producers, researchers, and representatives from the Wikimedia Foundation during a two-day workshop.
The goal of the workshop is to talk about and understand the most significant barriers to participation in Wikipedia in the Middle East and North Africa. As such, we would love to hear from you if you meet any of the following criteria:
- A Wikipedian who edits Arabic Wikipedia
- A Wikipedian who edits Wikipedia (in any language) on articles about the Middle East
- Someone who translates articles between any of the following language versions in Wikipedia: Arabic, Egyptian Arabic, English, French, Hebrew, Persian.
- Someone who is eager to get more involved with the project, and would like to meet people with similar ambitions.
- Someone who would like to give a short talk or presentation to other Wikipedians from the region (e.g. about conflict or marginalization, barriers to participation, and circumvention strategies and tools).
The workshop will have limited space available, so we ask everyone to submit a one page letter detailing why your participation will benefit Wikipedia, the goals of the workshop, and your personal development as a contributor to Wikipedia.
Sessions and conversations will be held simultaneously in Arabic and English, and you will only need to be fluent in one of these languages to participate.
In order to facilitate participation, we have a small number of scholarships available that will support travel to (and in some cases accommodation in) Amman.
Please email Dr. Ilhem Allagui at ilhemallagui@hotmail.com to express your interest in joining this workshop. Please describe your experience and how involved you are with Arabic Wikipedia, as you may be eligible for a travel grant to attend the workshop.
Workshop location: Jordan Media Institute, Amman, Jordan
Workshop dates: April 11-12, 2012
More information about this project at: http://www.oii.ox.ac.uk/research/projects/?id=70
Workshop organisers:
Mark Graham (University of Oxford)
Bernie Hogan (University of Oxford)
Ilhem Allagui (American University of Sharjah)
A flyer that you can disseminate to interested parties is available here. And a version in Arabic is here.
Invitation to participate in a workshop on Wikipedia
We are pleased to invite you to a two-day workshop on Wikipedia that brings together a group of researchers and representatives of the Wikimedia Foundation. During the workshop we will exchange ideas and experiences about Wikipedia with experts, producers, and others interested in Wikipedia.
The aim of this workshop is to exchange views and to understand the most important obstacles and barriers to participating in the development of the Arabic Wikipedia.
To participate, you must be fluent in Arabic or English, and you must meet one of the following criteria.
You should be:
- An editor of the Arabic Wikipedia
- An editor of Wikipedia (in any language) who writes articles about the Middle East
- A translator of articles in any of the following languages: Standard Arabic, Egyptian Arabic, English, French, Hebrew, and Persian
- Willing to contribute actively to the development of the Arabic Wikipedia
- Intending to share your ideas with attendees by giving a talk on topics that include conflict and marginalization on Wikipedia, barriers to participating in the development of the Arabic Wikipedia, and circumvention strategies and tools for participating in the Arabic Wikipedia
Workshop organisers:
Dr. Mark Graham, Oxford Internet Institute, United Kingdom
Dr. Bernie Hogan, Oxford Internet Institute, United Kingdom
Dr. Ilhem Allagui, American University of Sharjah, United Arab Emirates
Workshop location: Jordan Media Institute, Amman
Workshop dates: 11 and 12 April 2012
Please note that participation grants are available to support travel and accommodation expenses at the workshop venue in Amman, Jordan (depending on the case). Please also note that places are limited, and those wishing to participate should send an application (one page) as soon as possible, and before 10 March 2012, explaining how you would contribute to enriching the workshop as described above.
For more information and registration, please contact Dr. Ilhem Allagui by email at ilhemallagui@hotmail.com
December 13, 2011
Mapping Wikipedia Article Quality in North America
The maps of Wikipedia previously posted on the blog offer useful insights into the geographies of one of the world's largest platforms for user-generated content. They, along with similar visualizations, reiterated some of the massive inequalities in the layers of information that augment our planet.
But not all articles are created equal, and those maps didn't give us much of a sense of the quality of articles. "Quality" is obviously a slippery word and there are infinite ways of measuring it, but for the purposes of this post, we'll crudely use the term to refer to article length (future maps will employ a variety of other metrics).
The maps below visualize this measure of quality within Wikipedia entries -- yellow dots represent the location of relatively short articles in the English version of Wikipedia (e.g. the article on "Bandana, Kentucky"), while red dots indicate the location of relatively long articles (e.g. the article on the "Republic of Molossia").
The map below displays the same data, but with smaller dots: making it easier to see some of the patterns if you expand the image.
Interestingly, the states with the highest average word counts are New Jersey (966) and Michigan (914). The states with the lowest averages are Delaware (534) and West Virginia (492). The reasons for these rather large differences are unclear.
Are Wikipedians from New Jersey that much more loquacious than their West Virginian counterparts? Or does it just take more words to describe the many dazzling wonders of New Jersey? Or is it something else entirely?
Apart from the obvious and increasingly evident urban bias in these information geographies, we'd certainly welcome your thoughts in explaining some of these patterns.
December 05, 2011
Malamanteau and the Floatingsheep tribute to Wikipedia
Despite the fact that FloatingSheep does not exist on Wikipedia, we love the project. And as a testament to our love for the encyclopedia, we wanted to put together a list of our favourite articles. What emerged is that we don't just like geography-related info, such as the article on places with fewer than 10 residents or Bir Tawil (one of the few places on Earth not claimed by any country), but also more, er, esoteric subjects.
Here's our list. Bonus points to anyone that can combine them all into a poem.
Malamanteau
Zorbing
Global Orgasm
Infinite Monkey Theorem
Accessory breast
and, last but not least, the Enumclaw horse sex case.
We promise many, many maps of Wikipedia soon (none of which, unfortunately, will involve any of these terms).
November 14, 2011
Mapping Wikipedia Globally
Wikipedia is an incredibly impressive coming-together of human labour on a scale that the world rarely sees. Over the last few years, we've also seen a few maps of the encyclopedia (including some work on this blog) which have shown that the project is far from complete (whatever that might mean).
That doesn't mean we should stop mapping the project though, and as part of a multi-year project to study Wikipedia in the Middle East, North Africa, and East Africa, we present this global-scale map of every article in the November 2011 version of the English Wikipedia.
The English encyclopedia is by far the largest, and currently hosts almost 700,000 geotagged articles (click on the image for a larger and more detailed version):
Each one of these yellow dots represents human effort that has gone into describing some aspect of a place. The density of this layer of information over some parts of the world is astounding. Some of our future posts will look more closely at measures of inequality in Wikipedia, but it is still hard not to be awed by this cloud of information about hundreds of thousands of events and places around the globe.
What we can also do is compare the English Wikipedia to the Arabic, French, Hebrew, and Swahili versions (these languages are chosen because they are the subject of the research project mentioned above).
This map should be interpreted with caution for a few reasons. First, it only displays content from six Wikipedias (there are currently 282 of them). Second, many articles in multiple languages appear in the same place. The reason for this is that they are articles about the same feature, event, or place, albeit in different languages. This means that when mapping those features, the dots in each language will show up on the map in exactly the same place. As such, we get a lot of overlapping dots, and dots that are higher up in the legend will necessarily show up on top of others.
The map still remains useful to show some of the different geographical foci of different linguistic groups. In Iran, for instance, there are more articles in Persian than in any other language in our sample. We see more articles about Quebec and parts of North Africa in French, and then a complicated mix of Arabic, Hebrew, English and French in the Levant.
Nonetheless, it remains the case that there are far more English-language articles than articles in any other language. As such, if your primary free source of information about the world is the Persian or Arabic or Hebrew Wikipedia, then the world inevitably looks very different to you than if you were accessing knowledge through the English Wikipedia. There are far more absences, and many parts of the world simply don't exist in the representations that are available to you.
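The overplotting caveat mentioned in this post (articles about the same place in several languages land on exactly the same coordinates, so only the top layer is visible) can be illustrated with a short Python sketch. The coordinates, language codes, and drawing order below are hypothetical and do not reproduce the legend of the actual map.

```python
from collections import Counter


def visible_language(articles, draw_order):
    """Return {(lat, lon): language whose dot ends up on top}.

    `articles` is an iterable of (language, lat, lon) tuples.
    `draw_order` lists languages from the bottom layer to the top layer.
    """
    rank = {lang: i for i, lang in enumerate(draw_order)}
    top = {}
    for lang, lat, lon in articles:
        key = (lat, lon)
        if key not in top or rank[lang] > rank[top[key]]:
            top[key] = lang
    return top


# Hypothetical geotagged articles; some share identical coordinates.
sample_articles = [
    ("en", 31.95, 35.93),
    ("ar", 31.95, 35.93),
    ("fa", 35.69, 51.39),
    ("en", 35.69, 51.39),
    ("fr", 46.81, -71.21),
]
sample_order = ["en", "fr", "ar", "fa"]  # assumed: English drawn first

top_layer = visible_language(sample_articles, sample_order)
plotted = Counter(lang for lang, _, _ in sample_articles)
visible = Counter(top_layer.values())
hidden = {lang: plotted[lang] - visible[lang] for lang in plotted}
print(top_layer)
print(hidden)
```

In this toy example both English dots are hidden under dots in other languages, which is why a map like this can understate how much content exists in the languages drawn first.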
April 04, 2011
What's up with Montana? Comparing Google and Wikipedia in the US
As mentioned in an earlier post we're starting to have some fun with cartogram representations of geoweb data. For those who have forgotten, cartograms distort geographical areas based on the proportional value of some characteristic.
In the two cartograms below the characteristics used to determine size are (1) Google Maps placemarks and (2) the total number of geotagged Wikipedia articles. The distortion was done at the county level and includes the 48 contiguous U.S. states. The coloration represents the relative number of geotags/placemarks by population. This gives a better understanding of the distribution of geotags/placemarks both by population and by area.
While many of the results are expected -- California is bursting with geoweb goodness no matter what the measure -- there are some intriguing differences between the distribution of Wikipedia articles and Google Maps placemarks.
For example, Texas, Florida and North Carolina are bulging with placemarks but slim tremendously when you consider Wikipedia entries. In contrast, New York and Vermont seem to have proportionally more Wikipedia entries than Google Maps placemarks.
But the biggest contrast between these measures is Montana, whose size balloons tremendously when you move from placemarks to Wikipedia entries. We're really not sure what's going on with Montana and so invite folks to take a closer look. We suspect it has to do with someone (or perhaps some automated bots) who was/is extremely dedicated to documenting EVERYTHING in Montana. Interestingly, this dedication does not extend to the neighboring states of North and South Dakota or to creating placemark entries for use in Google Maps.
In any case, these cartograms and the case of Montana highlight how diverse each digital layer within any place's cyberscape can be.
UPDATE: Thanks to commenter Mongo for pointing us to the page for the WikiProject Montana, where questions emanating from this blog post have uncovered that a couple of diligent Wikipedians (one of them being Mongo) have been geotagging all kinds of stuff out in the Big Sky country. So thanks for passing the info along and proving our hypothesis about the bots to be wrong!
March 22, 2011
Heatmap of Wikipedia articles: the concentrated geographies of history
Gareth Lloyd has put together a brilliant visualisation of all geotagged Wikipedia articles.
Even more fascinating is this video, showing the data mapped out over time and space:
There are, unsurprisingly enough, quite similar patterns to those found in the maps that we made of Wikipedia biographies mapped out by century. Early concentrations in the Mediterranean, and then an explosion of interest in the rest of the world in the last few centuries. This data gives us a fascinating insight into just how spatially concentrated our knowledge of history is.
February 03, 2011
Wikipedia Demographics
We've written a fair amount about the geographic and linguistic clusters of Wikipedia authors but were reminded today (via the New York Times "Room for Debate" forum) that there are plenty of other clusters along social and economic dimensions. Last year a survey of Wikipedia users was conducted which highlights some interesting fissures within the user group.
One of the most provocative findings (and the one highlighted by the New York Times forum) is that less than 15 percent of the regular contributors to Wikipedia are women. This really grabs one's attention but a closer look at the data report (see also here and here) makes us wonder if this figure accurately reflects the Wikipedia community. Some of the questions are:
- What was the sampling method used? Nothing is listed in the reports.
- What is the bias in the sample? For example, Russia and Russian speakers are the largest language and country groups represented in the survey even though the Russian section of Wikipedia is only the 8th largest linguistic group. (English, German, French, Italian, Polish, Japanese and Spanish are all larger).
- Did women have a lower participation rate than men in the survey? There were three times as many male respondents as female respondents. Does this accurately reflect the makeup of the Wikipedia audience? Given the unexpected results for language and country, it is not clear whether there might be gender bias as well.
January 10, 2011
Hiring Part-Time Research Assistant to work at the Oxford Internet Institute
Mark Graham and Bernie Hogan are hiring a part-time Research Assistant to work on our Wikipedia mapping project. Details are below. Please forward widely and get in touch if you have any questions.
Grade 6: Salary £25,751 - £30,747 p.a. (pro rata)
We are a leading international research and policy Institute looking for a part-time (50% FTE) Research Assistant to work on a range of programming and database administration tasks on a Wikipedia-related research project with Drs Mark Graham and Bernie Hogan. The current offer is for a half-time position with a likelihood of expansion to full time, funding permitting.
The research will involve a substantial array of computer science skills applied to questions of social science interest. The applicant does not necessarily need to have social science training, but should be interested in how contemporary technologies can address new and novel research questions.
This part-time post (50% FTE) is available immediately for 12 months in the first instance, with the possibility of renewal thereafter, funding permitting. Some flexibility over the number of hours worked per week may be possible.
Download: Application Pack for Part-Time Research Assistant
The closing date for applications is 12:00 GMT on Thursday 27 January 2011. Interviews are currently planned for Monday 7 February 2011.
November 30, 2010
Geographies of Wikipedia in the UK
After a lot of data cleaning and number crunching, we are able to present the following three maps of the geographies of Wikipedia in the UK using brand new November 2010 data. Looking at the first map (total number of articles in each district), we see some interesting patterns. With a few exceptions, it is rural districts in Scotland, Wales and the North of England that have the highest numbers of articles.
What we're likely picking up on is the fact that large districts simply have more potential stuff to write about. If we normalise the map by area we see an entirely different pattern. The map below displays the number of articles per square KM.
We see that most of the large urban conurbations in the UK are covered by a dense layer of articles. Most sparsely populated areas in contrast have a much thinner layer of virtual representation in Wikipedia. There are, however, some notable exceptions. Parts of Cornwall, Somerset and the Isle of Wight all have a denser layer of content than might be expected for such relatively rural parts of the country. On the other hand, one might expect a higher density in the districts surrounding Belfast (in fact almost all of Northern Ireland is characterised by very low levels of content per square KM).
Finally, we can look at the number of articles per person in each district:
Here some more surprising results are visible. All major urban areas have relatively low counts of articles per person (with the exception of central London). In contrast, many rural areas (particularly areas containing national parks) have high counts per person.
There is obviously a range of ways to measure the geographies of Wikipedia in the UK. We see that some areas are blanketed by a highly dense layer of virtual content (e.g. central London and many of the UK's other major conurbations). These maps also highlight the fact that some parts of the UK are characterised by a paucity of content irrespective of the ways in which the data are normalised. Northern Ireland in particular stands out in this respect.
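To make the effect of normalisation concrete, here is a minimal Python sketch of the three measures discussed above: total articles, articles per square km, and articles per person. The district names and all figures are invented purely for illustration and are not taken from our data; the `District` class and the helper functions are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class District:
    name: str
    articles: int      # geotagged Wikipedia articles located in the district
    area_km2: float    # area in square kilometres
    population: int    # resident population


def articles_per_km2(d: District) -> float:
    """Articles normalised by area."""
    return d.articles / d.area_km2


def articles_per_person(d: District) -> float:
    """Articles normalised by population."""
    return d.articles / d.population


def rank(districts, key):
    """Return district names ordered from highest to lowest score."""
    return [d.name for d in sorted(districts, key=key, reverse=True)]


# Invented figures: one large, sparsely populated district and one small, dense one.
districts = [
    District("Large rural district", articles=400, area_km2=8000.0, population=20_000),
    District("Small urban district", articles=300, area_km2=50.0, population=300_000),
]

by_total = rank(districts, lambda d: d.articles)
by_area = rank(districts, articles_per_km2)
by_person = rank(districts, articles_per_person)

print("By total articles:   ", by_total)
print("By articles per km2: ", by_area)
print("By articles/person:  ", by_person)
```

With these invented numbers the rural district comes first by total count and by articles per person, while the urban district comes first by articles per square km, which is the same kind of reversal the three maps show.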
We'll attempt to upload similar analyses of other countries in the next few months. In the meantime, however, we would welcome any thoughts on the uneven amount of virtual representation that blankets the UK.
p.s. many thanks to Adham Tamer for his help with the data extraction.
October 25, 2010
The Full Wiki
A fascinating website called The Full Wiki has recently been brought to our attention.
The site contains an excellent mapping tool that allows users to visualise the locations of all places and events mentioned in any Wikipedia article.
The entry for World War II, for example, brings up a detailed map. Clicking on any point will bring up the snippet of text in the Wikipedia article that mentions that specific place. Needless to say, this is a useful tool for uncovering not only the places mentioned in any article, but also the silences and omissions.
We should point out a curious anomaly for the map of the article on monkeys. The only location shown is in downtown Washington DC. We won't ask why.
Labels:
wikipedia
October 15, 2010
More Flickr Mapping
Building on our visualisation of 34 million geotagged Flickr images, we have decided to map the data normalised by population and area. In doing so, some quite interesting patterns are evident.
Predictably, we see some of the same core-periphery patterns that are observable in other types of user-generated content (e.g. Wikipedia). More surprising is the fact that, unlike the geography of Wikipedia content, there are a significant number of low-income countries with relatively large amounts of content (i.e. images) per 100,000 people and per 100km. Cambodia, Oman, Namibia, South Africa, Nepal and a host of other countries all score highly using these normalised measures.
I would hypothesise that two factors are at play here. First, there are lower barriers to entry on Flickr than on Wikipedia. In other words, despite the openness of Wikipedia, it is still easier to upload geotagged photos to Flickr than to create a new article and defend its existence against nominations for deletion and overzealous editors. Moreover, the binary developed vs. developing country division has always masked the range of differences between and within countries; Oman and Yemen make for an interesting comparison.
Second, it is also probable that much of the content in low-income countries is created by visitors and tourists. For instance, a significant number of photos geotagged to Cambodia are likely tourist shots of the Angkor Wat temple complex rather than locally created scenes of more everyday events.
Whatever the reasons are, more research is clearly needed on the topic to uncover what the specific biases in authorship are. Furthermore, irrespective of the specific reasons, it remains that these maps continue to show significant unevenness in user-generated content around the world.
For further reading see:
Graham, M. 2010. Neogeography and the Palimpsests of Place. Tijdschrift voor Economische en Sociale Geografie 101(4): 422-436.
Zook, M. and M. Graham. 2007. The Creative Reconstruction of the Internet: Google and the privatization of cyberspace and DigiPlace. GeoForum 38(6): 1322-1343
Flickr Images per 100,000 people
July 05, 2010
Wikipedia and Internet Use
The following map displays the total number of Wikipedia articles normalised by the number of internet users at the country level. The countries with the highest number of articles per 100,000 internet users are Nauru (4667), the Central African Republic (1253) and Myanmar (824). In fact most of the places that score highly by this measure, like the countries listed above, have extremely low levels of internet use per capita.
In contrast, countries with higher levels of per-capita internet usage tend to have far lower rates of Wikipedia articles per 100,000 internet users (e.g. the United Kingdom (70) and France (67)). While it is entirely possible that the high rates of articles per internet user in some countries are an indication of dedicated Wikipedia editors, it seems more likely that Myanmar, the Central African Republic and most other nations with low levels of internet penetration are being represented by editors from outside of their boundaries.
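For readers who want to reproduce this kind of measure, the calculation is just the article count divided by the number of internet users, scaled to 100,000. The sketch below assumes nothing beyond that; the function name and the two example countries are hypothetical, and the figures are invented rather than drawn from the map.

```python
def articles_per_100k_users(articles: int, internet_users: int) -> float:
    """Wikipedia articles about a country per 100,000 of its internet users."""
    if internet_users <= 0:
        raise ValueError("internet_users must be positive")
    return articles * 100_000 / internet_users


# Invented figures for two hypothetical countries.
low_use = articles_per_100k_users(articles=600, internet_users=50_000)
high_use = articles_per_100k_users(articles=40_000, internet_users=50_000_000)

print(f"Low internet use country:  {low_use:.0f} articles per 100,000 users")
print(f"High internet use country: {high_use:.0f} articles per 100,000 users")
```

Note how a very small denominator inflates the score: the hypothetical low-use country has far fewer articles in total yet scores fifteen times higher, which is why high values on this map should not be read as evidence of local editing activity.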
May 13, 2010
Wikipedia Vision
We wanted to use this post to draw attention to an interesting project that maps anonymous Wikipedia edits in almost real-time. The project extracts data from Wikipedia's recent changes page, geolocates the IP addresses and plugs everything into the Google Maps API. So, for example, at the moment the screenshot below was taken, somebody in Newcastle, England was editing the rather extensive article on the list of Desperate Housewives characters.
Be warned: you can spend a long time being mesmerised by random Wikipedia edits flashing around the world.
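The general pipeline described above (take recent changes, keep the anonymous ones, geolocate the IP, hand coordinates to a map) can be sketched in a few lines of Python. This is not the project's actual code. It assumes that anonymous edits are recorded under the editor's IP address, uses hard-coded sample records instead of a live feed, and replaces the geolocation step with a tiny made-up lookup table (`FAKE_GEO_DB`) standing in for a real IP geolocation database. The IP addresses are reserved documentation addresses, and the final mapping step is left out.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple


def is_anonymous(user: str) -> bool:
    """Treat an edit as anonymous if the 'user' field parses as an IP address."""
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False


# Placeholder for a real IP geolocation database (coordinates are approximate).
FAKE_GEO_DB: Dict[str, Tuple[float, float]] = {
    "192.0.2.10": (54.978, -1.618),   # roughly Newcastle upon Tyne
    "2001:db8::1": (51.752, -1.258),  # roughly Oxford
}


def geolocate(ip: str) -> Optional[Tuple[float, float]]:
    """Hypothetical lookup: return (lat, lng) for an IP, or None if unknown."""
    return FAKE_GEO_DB.get(ip)


def to_markers(edits: List[dict]) -> List[dict]:
    """Keep anonymous, locatable edits and turn them into map-marker records."""
    markers = []
    for edit in edits:
        if not is_anonymous(edit["user"]):
            continue  # registered account: no IP to geolocate
        location = geolocate(edit["user"])
        if location is None:
            continue  # anonymous, but the lookup has no entry for this IP
        lat, lng = location
        markers.append({"title": edit["title"], "lat": lat, "lng": lng})
    return markers


# Sample records standing in for entries from a recent-changes feed.
edits = [
    {"title": "List of Desperate Housewives characters", "user": "192.0.2.10"},
    {"title": "Montana", "user": "ExampleEditor"},
    {"title": "Oxford", "user": "2001:db8::1"},
    {"title": "Angkor Wat", "user": "203.0.113.7"},
]

markers = to_markers(edits)
for marker in markers:
    print(marker)
```

Each resulting record holds a title and a latitude/longitude pair, which is the minimum a mapping library needs to drop a marker. Edits by registered accounts are skipped because they expose no IP address, so a map built this way only ever shows part of the editing activity.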
Labels:
wikipedia