Northwest History: digital history


Tuesday, February 12, 2013

Mapping Spokane's Dead: A Pedagogical Experiment in Flash-mob Data Visualization

I taught Digital History last quarter. The course was a lot of fun as a dozen traditionally-trained MA students and I explored some of the new digital historical landscapes. The class is divided between reading discussions and a weekly "make" where we bust open some new digital tool and see what we can do with it. For our weekly make a few months ago we created, populated, and visualized a historical database in about an hour. This post is about what we did and how we did it.

I had several pedagogical goals. First, I wanted my students to understand the importance and power of constructing a database, rather than merely building a website. Second, I wanted them to explore one tool for doing so, Google Fusion Tables. Third, I wanted to use some of the rich historical resources of my employer, the Washington State Archives, Digital Archives. Finally, I wanted them to experience the difficulties, decisions, and compromises involved in building a database and extracting metadata from handwritten historical documents.

Working with my grad student, the excellent Lee Nilsson, we chose the Spokane County Death Returns, 1888-1907, as our data set. These records were a good fit: the images of the death certificates and some metadata were already online, they represented a broad cross-section of the population of early Spokane, and they presented certain complications as well, in terms of handwriting and uneven data. Here is a sample death return from the collection, that of the unfortunate Owen J. Jones:

When the State Archives originally scanned and indexed these records, they recorded only the first and last names, age at death, and place of death as metadata. This was a good start, but it missed some data that appeared on the death certificates and that historians would find important. So we added race, occupation, place of birth, and cause of death to the metadata fields that we wanted to capture.

Then Nilsson created a Google Fusion Table with our metadata fields and entered the information from a few death returns. Right away he realized that one problem the students would run into was transcribing the causes of death--things like phthisis (tuberculosis of the lungs) and marasmus (malnutrition) written in sometimes terrible 19th-century handwriting. A quick Google search turned up some lists of 19th-century causes of death. Nilsson added about 150 names from the 1880 death index to the Google Fusion Table and used color highlighting to organize the list into groups of ten names. I took the email addresses of the students in the class and gave them permission to edit the table.
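For readers who would rather script this setup than do it by hand, here is a minimal Python sketch of the table layout and the groups-of-ten split. The field names, the `blank_row` and `assign_batches` helpers, and the placeholder names are hypothetical; in class we did all of this directly in Google Fusion Tables with color highlighting.

```python
# Hypothetical column layout: the Archives' original index fields
# plus the four fields the class added.
FIELDS = [
    "first_name", "last_name", "age_at_death", "place_of_death",  # original index
    "race", "occupation", "place_of_birth", "cause_of_death",     # added by the class
]

def blank_row(first_name, last_name):
    """A row with the names filled in and every other field left for a student."""
    row = dict.fromkeys(FIELDS, "")
    row["first_name"], row["last_name"] = first_name, last_name
    return row

def assign_batches(names, students, batch_size=10):
    """Split the name list into consecutive batches and pair each batch with a student.

    zip() stops at the shorter sequence, so any extra batches are left unassigned.
    """
    batches = [names[i:i + batch_size] for i in range(0, len(names), batch_size)]
    return dict(zip(students, batches))

# Usage with placeholder data: 25 names shared among three students.
names = [f"Decedent {n}" for n in range(1, 26)]
assignments = assign_batches(names, ["Student A", "Student B", "Student C"])
for student, batch in assignments.items():
    print(student, len(batch))
```

With 25 names the last student gets a short batch of five, which mirrors what happens when the name list does not divide evenly by ten.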

The actual lesson took about an hour. We gave each student ten names and asked them to read the death certificates and to add the metadata to the table. Nilsson and I circulated in the classroom to help people out. The names and dates went pretty smoothly. The students stumbled with causes of death at first, but the guides to 19th century causes of death cleared up most questions. A bigger problem was missing data. 1880 was the first year of death records in Spokane and the record keeping was erratic. Places of birth, causes of death, and other items were not always filled out.
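Missing data is easy to underestimate until you count it. The sketch below, hypothetical Python rather than anything Fusion Tables provided, tallies how many transcribed rows left each field blank. The placeholder marks it treats as blank ("?", "unknown", "illegible") are assumptions about how students might flag a gap, and the sample rows are invented.

```python
FIELDS = ["first_name", "last_name", "age_at_death", "place_of_death",
          "race", "occupation", "place_of_birth", "cause_of_death"]

# Assumed marks a transcriber might type for an empty or unreadable entry.
BLANK_MARKS = {"", "?", "unknown", "illegible"}

def is_blank(value):
    """Treat None and the placeholder marks (case-insensitive) as missing."""
    return value is None or str(value).strip().lower() in BLANK_MARKS

def missing_counts(rows):
    """For each field, count the rows where it is missing or blank."""
    return {field: sum(1 for row in rows if is_blank(row.get(field))) for field in FIELDS}

rows = [
    {"first_name": "Decedent", "last_name": "One", "cause_of_death": "phthisis"},
    {"first_name": "Decedent", "last_name": "Two", "place_of_birth": "?"},
]
print(missing_counts(rows)["place_of_birth"])  # → 2
```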

Then came the experiment part--visualizing the data. My original inspiration for the project was the idea that we would produce a map of where in Spokane people had died. This did not work so well:

In retrospect the reasons are clear. A few of the death certificates gave street addresses, and Google was able to map these accurately. In other cases no place was given, or it was given only as "Spokane." In some cases there was a location, but the student transcribing it was unable to read it or just did not bother. With better instruction and monitoring this map might have come out better.

Much more satisfying was this map of where the people were born. It nicely illustrates patterns of migration into 19th century Spokane. Don't miss the guy from China!

Google Fusion Tables allow for other types of visualization as well. Here is a pie chart of causes of death:

Those are some mighty thin slices of pie! The chart does not tell us very much, except that typhoid and pneumonia were common. We would have been better off creating categories--contagious disease, accidents, infant death, etc.
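Here is roughly what that categorizing step could look like, as a minimal Python sketch. The category names come from the list above, but the lookup table, the rule that any death under age one counts as an infant death, and the sample records are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical lookup table; a real one would grow out of the class's own transcriptions.
CATEGORY_OF = {
    "typhoid fever": "contagious disease",
    "pneumonia": "contagious disease",
    "phthisis": "contagious disease",
    "diphtheria": "contagious disease",
    "scarlet fever": "contagious disease",
    "drowning": "accident",
    "burns": "accident",
    "fall": "accident",
}

def categorize(records):
    """Tally (cause, age_in_years) records into broad categories.

    Deaths under age one count as infant deaths regardless of cause;
    anything blank or unmapped falls into "other/unknown".
    """
    counts = Counter()
    for cause, age in records:
        if age is not None and age < 1:
            counts["infant death"] += 1
        else:
            key = (cause or "").strip().lower()
            counts[CATEGORY_OF.get(key, "other/unknown")] += 1
    return counts

sample = [("Typhoid Fever", 30), ("pneumonia", 61), ("marasmus", 0.5),
          ("drowning", 12), ("", 40)]
print(categorize(sample))
```

Five or six categories make a pie chart with slices wide enough to read, at the cost of judgment calls (is pneumonia "contagious"?) that should be documented alongside the data.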

You can also make bar graphs--here is one for the occupations of the deceased. The laboring classes seem to have had it bad in early Spokane, though you would need some demographic analysis to draw any conclusions here:

Finally, here is a frequency chart showing the distribution of deaths throughout the year. The chart is interactive; you can click and drag to explore it. Notice anything odd? The Grim Reaper seems to have forgotten Spokane entirely in some months:

The students were quite puzzled by this and came up with all sorts of reasons that several months could have passed without a death. Of course the most likely explanation is simply that the records for those months have gotten lost in the century since they were initially recorded.
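A chart makes the gaps visible, but a plain list of empty months is more useful when you go back to the archives to look for missing records. Here is a minimal Python sketch with invented dates. A zero only means nothing was recorded, which, as noted above, more likely points to lost records than to a reprieve.

```python
from collections import Counter
from datetime import date

def months_without_deaths(death_dates, start, end):
    """List (year, month) pairs from start to end, inclusive, with no recorded death."""
    recorded = Counter((d.year, d.month) for d in death_dates)
    gaps = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        if recorded[(year, month)] == 0:  # Counter returns 0 for unseen months
            gaps.append((year, month))
        month += 1  # step forward one calendar month
        if month == 13:
            year, month = year + 1, 1
    return gaps

# Placeholder dates, not the class's data.
deaths = [date(1888, 1, 5), date(1888, 1, 20), date(1888, 3, 2), date(1888, 6, 30)]
print(months_without_deaths(deaths, date(1888, 1, 1), date(1888, 6, 30)))
# → [(1888, 2), (1888, 4), (1888, 5)]
```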

All in all this pedagogical experiment was a great success. My students learned how digital sausage is made--the decisions that go into choosing what metadata to record and visualize, the challenges of working with handwritten 19th-century documents, and the painstaking work that goes into a data visualization.

Sunday, December 30, 2012

So What is Happening with Spokane Historical?

Come and explore Spokane Historical!

Spokane Historical is a smartphone app and website for local history, and is a project of the Public History program at Eastern Washington University. It has been a while since I blogged about this project, and there is a lot to report.

There are now over 150 high quality points of interpretation, or "tour stops" on Spokane Historical. All were created by EWU students in public history courses. I have given students a pretty free hand in choosing their topics, and I am pleased with the diversity of topics. In Spokane we have everything from a tour of the historic parks and Fort George Wright to stops along the Art Walk and many historic buildings and events.

Spokane Historical also includes substantial content in Cheney, including downtown landmarks such as the Odd Fellows Building and Bill's Tavern. We have interpreted over a dozen sites on the EWU campus, including Showalter Hall, the JFK Library, and Jore School--the one-room schoolhouse on the Cheney campus.

There is so much terrific content on Spokane Historical that for the next few weeks I will feature a series of posts about the project. But you don't have to wait for me--go ahead and take a look around.

Thursday, October 13, 2011

Digital History in the Public History News

Visualizing US Expansion through Post Offices, from Derek Watkins on Vimeo.

Public History News is the newsletter of the National Council on Public History. Back issues are here. A regular feature of the newsletter is "Worth Another Look," which offers capsule introductions to various articles and public history projects. In the latest issue I was struck by how many of the public history projects are digital projects. Some interesting examples:

The World Memory Project is a partnership between the Holocaust Museum and Ancestry.com that uses volunteers to index some of the museum's 170 million documents. So far 3000 volunteers have indexed over 600,000 records.
Speaking of crowdsourcing, Scripto is "a lightweight, open source tool that allows users to contribute transcriptions to online documentary projects." It is the latest digital tool from the folks who brought us Zotero and Omeka.
Visualizing US Expansion through Post Offices (seen above) is a straightforward project that scraped some public databases and added some computer magic to create a neat video showing what we might dub the Post Office Frontier. The link in this paragraph will take you to an interactive map where you can sort the results by date range and zoom in on a region. Did you know that the first PO in eastern Washington was established at Colville in 1862?
What Big Media Can Learn From the New York Public Library is an Atlantic Magazine article that highlights how the library is doing "some of the most innovative online projects in the country." As the article puts it, "Biblion, a storytelling app whose iPad icon features the lion head, is the flashiest of these efforts...Then there is the library's slick crowdsourcing projects, which allow users to digitize beautiful old menus from New York's restaurants and plot historical maps of the city onto the GPS-enabled digital maps of today. Both projects are both useful and feature user interfaces that best most commercial crowdsourcing applications. The library is even improving its basic infrastructure to keep pace with the big social networks, announcing this week that they are launching a new log-in system through Bibliocommons that will bring simplified and more powerful catalog and account services to the library's users."
4Humanities is a Canadian digital humanities site offering digital tools, news, and a valuable Humanities Showcase.

A few years ago I was at a THATCamp where one of the participants proudly announced that "Public History is Digital History!" This is a silly overstatement. But it might not be too much to say that Digital History is Public History.

Tuesday, September 13, 2011

Hacking the Academy, a Publishing Experiment

A brief essay of mine has appeared in a new volume, Hacking the Academy. I am excited because this is not just a book but an experiment in digital publishing, backed by some of the most respected names and institutions in the field of digital humanities.

Hacking the Academy is edited by Dan Cohen and Tom Scheinfeldt of the Roy Rosenzweig Center for History and New Media at George Mason University. In May of 2010 they used social media networks to put out a call for contributions to the volume, giving would-be contributors only a week to submit their essays, "the better to focus their attention and energy." They received "329 submissions from 177 authors, with nearly a hundred submissions written during the week-long event and the other two-thirds submitted by authors from their prior writing on the subject matter." One-sixth of the submissions were accepted for the volume, including my "How to Read a Book in One Hour."

Hacking the Academy is interesting for both its content and its approach to publication. The content focuses on "how the academy might be beneficially reformed using digital media and technology," particularly "writing that moved beyond mere complaints about the state of the academy into shrewd diagnoses and potential solutions." The essays are organized into three broad categories: "Hacking Scholarship," "Hacking Teaching," and "Hacking Institutions." The essays alternate between provocative big-picture, "this is how we ought to start doing things" pieces (such as David Parry's "Burn the Boats/Books" and Jo Guldi's terrific "Reinventing the Academic Journal") and more immediately practical pieces such as "Unconferences," a how-to guide by Ethan Watrall, James Calder, and Jeremy Boggs.

Hacking the Academy will be published in two ways--a free, digital publication available right now and a forthcoming print edition. The publisher is the University of Michigan Press, via their digitalculturebooks imprint. UM Press attracted a lot of attention when they announced a shift to predominantly digital publishing in 2009. The digitalculturebooks series now features over two dozen titles, available online and in print as either cloth or paper editions.

Hacking the Academy is something of a test case for a new model of producing a scholarly anthology. Coming out under the imprimatur of some of the most respected names and institutions in digital humanities, and with a timely topic and high-quality content, this book should have an impact. It will be interesting to see if the work is adopted in classrooms, cited in the literature, blogged and tweeted and run through the social networks. It will also be interesting to see if people will buy print editions of what they could read online for free.

If this is successful it could be a first step into a new publishing world.

Friday, April 22, 2011

Behold Again the Awesomeness of My Public History Students

Last quarter I taught the graduate Introduction to Public History class. For the final course project I gave the students the option to do anything whatsoever that a public historian might do--from a walking tour to a museum collection plan to a historic register nomination to a digital project. Quite a few took me up on the last option, and I thought I would share their work here.

Pippin Rubin had been having fun reading some of the diaries of the women missionaries who came west along the Oregon Trail in 1838--part of the research for her thesis. With her typical attention to detail she plotted every single campsite along the trail in a Google Map and entered a few lines from each diary into the place marks. She added other information as well. The result is this dramatic visualization of the long struggle of these women to bring their message to the Pacific Northwest:


Two students wanted to create mobile walking tours of historic sites in Spokane. They went about this in very different ways. Tracy Rebstock created an audio tour of Manito Park. She did a wonderful job of distilling her research into short 1-2 minute guides to some of the most significant areas of this 100+ year-old urban park. She even got the tour onto iTunes--search for Manito or for Librarygirl70.

Clayton Hanson took a different tack to creating his walking tour of Spokane Falls. Clayton wanted to create a multi-media tour with text, historic photographs and video. The method he hit upon was to create the stops on his tour as place markers on a custom Google Map. The idea is that Google Maps is an existing platform that is available on all smartphone operating systems. Each place mark would hold HTML to take the user to a webpage with more information. But how to create a mobile-optimized webpage for each tour stop? Clayton's solution was to use Omeka to hold the content for his tour and to create an Omeka exhibit for each stop. It is an ingenious solution, but unfortunately Omeka does not display very well on mobile phones so the pages are crunched and the text tiny. They display just fine on a computer, however:
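Clayton built his placemarks by hand in Google Maps. The same structure can be written out as KML, the placemark format that Google Earth reads and Google's map tools can import: each placemark carries a name, an HTML description with a link, and a point. This is a minimal Python sketch using only the standard library; the stop name, coordinates, and example.org URL are placeholders, not Clayton's tour data.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def tour_kml(stops):
    """Build a KML document with one Placemark per tour stop.

    Each stop is (name, latitude, longitude, url); the description holds
    an HTML link to the stop's web page.
    """
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon, url in stops:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        # ElementTree escapes the markup as &lt;a ...&gt;; KML readers unescape and render it
        ET.SubElement(pm, f"{{{KML_NS}}}description").text = f'<a href="{url}">More about {name}</a>'
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")

# Placeholder stop; replace with real coordinates and page addresses.
stops = [("Stop 1", 47.66, -117.42, "https://example.org/tour/stop-1")]
print(tour_kml(stops))
```

Keeping the stops in a list like this means the tour can be regenerated whenever a page address changes, rather than editing each placemark by hand.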


Tiffany Fulkerson did a project in Google Earth, "Climate Change at the Late Pleistocene-Early Holocene A Comparison of Proxy Data Sets in Washington State," that also built on her thesis research and used technology to bring her work to a broader public.

Finally, Nikolai Cherny used the final project to document a soon-to-be-dismantled museum exhibit at the Northwest Museum of Arts & Culture. The Spokane Timeline uses the Blogger platform to present a tour of the exhibit, a slideshow of images from the exhibit, the exhibit script, and even an interview with exhibit designer and MAC curator Marsha Rooney:

Not everyone did a digital project; I also received a wonderful history of a local elementary school, a processing plan for an unsorted archival collection, a history of an aviation museum, and a career paper involving historic preservation. It was the finest set of final projects that I have received in my Intro to Public History class.

Tuesday, December 7, 2010

A Digital Toolbox for Graduate Students in History

Readers, help me out here. What does a 21st century graduate student need to know in the way of digital tools and resources? I am trying to develop a presentation for incoming students in our graduate program in history. Here is my list so far; what should I add? I am trying to identify both tools and the minimum skill set that students should try to master with each.

Students need to master the Google search engine. They should know how to search for phrases, exclude certain terms, filter by date range, search within a domain, use the cache to view expired pages, and how to frame a good query in the first place. I am surprised how many students who grew up with Google don't know these things.
Google Books is the historian's boon companion, offering access to millions of books, searchable and sometimes downloadable. Students should master the advanced search features, be able to set up their own libraries, and be able to share, save, and organize what they find. Students should also know the other big book/content projects, Archive.org, the Hathi Trust, and Open Library.
Zotero is a citation manager, and so much more, that helps tame the information overflow of the web. Students should be able to set up a Zotero account, sync their files, create Zotero entries for sources in multiple formats, and create a library and share it with other Zotero users.
Students should use an RSS reader to simplify keeping track of blogs and other changing information. (I love this Common Craft video, RSS Readers in Plain English.) I have been using Google Reader for this, but I suspect there are better solutions. Should I recommend Feedly? Help me out here.
Students need to be able to capture, edit, save and organize images. They should be able to use a digital camera to take notes in the archives, back up and share their photos online, and capture images from websites. My preferred tools are Picasa and Picnik.
Dropbox is the preferred online backup for your files. Did I ever tell you about the friend whose laptop with two years' worth of dissertation research was stolen? Fortunately she had backed up her files--on disks that she kept in her laptop case. Don't let this happen to you.
Twitter is an important source for finding and sharing information, and Tweetdeck seems to be the best management tool.
Finally, I want to have a section about managing your online presence. Students should have a professional email address that is a recognizable version of their first and last names (and really, it should be Gmail), should have accounts at LinkedIn and Academia.edu, and should consider blogging and Tweeting--or at least claiming their real name on Twitter if it is not too late. More importantly, students should learn how not to leave incriminating evidence online. Future employers are not going to be impressed with how wasted you got in Cancun or by those photos of your new tattoo.
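As a small illustration of the search skills in the first item, here is a hypothetical Python helper that assembles a query from three standard Google operators: quotation marks for an exact phrase, a minus sign to exclude a term, and `site:` to stay within a domain. The helper names and sample terms are invented for illustration; the point is the shape of a well-framed query, not the code.

```python
from urllib.parse import urlencode

def build_query(phrase=None, terms=(), exclude=(), site=None):
    """Compose a search string from phrase ("..."), exclusion (-term), and site: operators."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')
    parts.extend(terms)
    parts.extend(f"-{term}" for term in exclude)
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)

def search_url(query):
    """Turn a query string into a Google search URL, percent-encoding it as the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = build_query(phrase="Whitman Mission", terms=["Cayuse"], exclude=["college"], site="wa.gov")
print(query)  # → "Whitman Mission" Cayuse -college site:wa.gov
print(search_url(query))
```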

Wow, the above list is already longer and more intimidating than I wanted it to be. And yet I don't want to leave anything out. Please post your comments and suggestions below.

Thursday, May 13, 2010

THATCamp Pacific Northwest

THATCamp Pacific Northwest will be at the University of Washington this year, October 23rd & 24th. THATCamp PNW is a "digital humanities unconference" where participants "show, tell, collaborate, share, and walk away inspired." Last year I attended the main THATCamp at the Center for History and New Media at George Mason University as well as THATCamp PNW in Pullman, and came away with so many great new ideas and contacts. THATCamp PNW is by application, and the deadline is June 7. See you there!

Tuesday, April 13, 2010

An Interesting Graduate Student Project

One of my graduate students, Shaun Reeser, is working to recover a long-lost government website for his MA public history project. Specifically, he will be working with some of the staff at the Washington State Archives, Digital Archives to recover the website of Ralph Munro, who served five terms as Washington Secretary of State from 1981 to 2001. Munro launched the first website for the Secretary of State's office in 1996, and the site was regularly updated until he left office in 2001. How to bring it back?

The Digital Archives has done something like this before, when we preserved the website of Washington Governor Gary Locke. In that 2005 effort the DA staff raced the clock to migrate Locke's website (an important public record) to the DA before it was taken down to make way for the website of Governor Gregoire. It was a pioneering project, but what Reeser will try to do is different. Munro's website was preserved in several versions at the Internet Archive Wayback Machine. But simply pulling the information off the Wayback Machine is not sufficient for establishing archival authenticity. And the versions of the website at the Wayback Machine are often incomplete, lacking some of the original images, for instance, and full of broken links.
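Part of the job is simply inventorying what the Wayback Machine holds. The Internet Archive's CDX server can list the captures of a URL within a date range. Below is a minimal Python sketch that builds such a request and turns a response into snapshot addresses; it does not fetch anything. The example.gov address and the sample response are placeholders, and a list of captures does nothing by itself to settle the authenticity question.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query_url(site, start_year, end_year):
    """Build a CDX request listing successful captures under `site` in a year range."""
    params = {
        "url": site,
        "matchType": "prefix",       # every captured URL beneath this path
        "from": str(start_year),     # CDX accepts a timestamp prefix such as a year
        "to": str(end_year),
        "filter": "statuscode:200",  # skip redirects and errors
        "output": "json",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"

def parse_captures(body):
    """Convert the JSON response (a header row followed by data rows) into dicts."""
    rows = json.loads(body)
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]

def snapshot_url(capture):
    """Wayback Machine address for one archived copy."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"

# example.gov stands in for the site's real address, which this post does not give.
print(cdx_query_url("example.gov/sos/", 1996, 2001))
sample = '[["urlkey","timestamp","original"],["gov,example)/sos","19990125083512","http://example.gov/sos/"]]'
for capture in parse_captures(sample):
    print(snapshot_url(capture))
```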

Right now Reeser is trying to hunt down the original digital files and to interview Munro and members of his staff. He is also studying other efforts to spider and preserve government websites as historic documents. Does anyone know of similar efforts that we should study?

[Image: Oh come on, you know.]

Thursday, February 18, 2010

QR Codes, Part 1: What are they? How do they work?

by Greg Shine, guest blogger

Over the past few months there's been a lot of interest in our use of QR Codes at Fort Vancouver NHS, so the following four posts comprise a quick primer on 1) what they are & how they work, 2) what's new about them, 3) how to make them, and 4) how they can be of use to historic site interpretation (and how we're using them). They are also posted over at my blog. Please share your thoughts; I know that we're only hitting the tip of the iceberg!

What are QR Codes? How do they work?

Generally speaking, QR (Quick Response) codes are a type of bar code, similar to those you find on products at your neighborhood grocery store. As our archaeologist Dr. Bob Cromwell (a railroad enthusiast) is quick to point out, one of the first uses of bar code technology was to help track the nation’s myriad railroad cars in the mid-twentieth century. Since then, the technology has been widely adopted (and adapted) for other retail and inventory uses. Most recently, it is becoming more consumer driven…and directed.

There are many places that you can learn about the specific bar code symbology, and I won’t attempt to go into detail here, but bar codes embed data in a way that can be easily and quickly read by another device. The most common place that most folks encounter bar codes is at the grocery store, where scanners can “read” a product’s UPC (Universal Product Code). At the checkout, this technology allows the clerk (and us) to quickly identify the product and its price, but behind the scenes it also tracks the item from production to purchase, links to the product’s inventory, and provides other important metrics such as what it was purchased with, when it was purchased, and often where in the store it was purchased. This provides the grocery with valuable information about consumer choice patterns. The data embedded can vary greatly, too, and is not limited to what it is and when/where it was produced.
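Symbology aside, one small detail shows how a scanner can trust a quick read: the twelfth digit of a UPC-A code is a check digit calculated from the first eleven, so a single misread digit produces a code that fails the check. Here is a minimal Python sketch of that standard calculation; the sample code 036000291452 is a widely used illustrative example, not anything from the park.

```python
def upc_check_digit(first_eleven):
    """Compute the UPC-A check digit for an 11-digit string."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(c) for c in first_eleven]
    odd_sum = sum(digits[0::2])   # positions 1, 3, ..., 11 are weighted by 3
    even_sum = sum(digits[1::2])  # positions 2, 4, ..., 10 are weighted by 1
    return (10 - (3 * odd_sum + even_sum) % 10) % 10

def upc_is_valid(code):
    """True if a 12-digit UPC-A string ends in the correct check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_check_digit(code[:11]) == int(code[11])

print(upc_check_digit("03600029145"))  # → 2
print(upc_is_valid("036000291452"))    # → True
print(upc_is_valid("036000291453"))    # → False
```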

In national parks today, bar code technology is used in many ways. In the NPS’ Pacific West Region (56 national park units in California, Hawaii, Idaho, Nevada, Oregon, Washington and the islands of the outer Pacific) all sensitive equipment is given a bar code sticker for help in scheduling repair and replacement. Many park libraries also use the technology for controlling the checkout of books, similar to the way that many public libraries do. In some parks, equipment for seasonal firefighters is tracked through bar coding. Many park publications sport a bar code on their derrieres, and most of our park partners and cooperating associations use the technology in ways very similar to our local grocery stores.

At Fort Vancouver, we're also using bar code technology (in the form of QR Codes like the image above) as one small tool to enhance and complement our historic site interpretation. We have a fantastic crew of staff and volunteers, but even their herculean efforts don't allow us to have round-the-clock personal interpretation in every building and site in the park. Plus, we know that not every visitor wants that. With QR Codes, we can connect visitors directly to content via the internet by building a specific URL directly into a QR Code. By using one of a variety of free (and paid) applications on a smartphone or other web-enabled portable device, visitors can simply point their device's camera at the strange assemblage of black and white squares and instantly access web content we've specifically chosen for that location. Pretty cool, eh?

I don't want to get too far ahead of myself here (there are three more posts on the topic yet to come) but I am curious about the experience of readers of Northwest History. Have you found bar code technology in unexpected places? In places relating to Northwest history? If so, where? What impressions do you have?

Thursday, December 10, 2009

Public History Has to Get the History Right

The explosion of digital technologies, falling prices for equipment, and the development of an online audience interested in history has allowed everyone to get into the business of public history. Anyone with a netbook and perhaps a Flip video camera can produce original history content, put it online via Blogger or Facebook or dozens of other free platforms, and publicize their wares with Twitter or listservs or whatever. But what good does this do us if they get their history wrong?

Exhibit A: This podcast on the Whitman Tragedy by what appears to be a one-man outfit, US History Travelcast. I very much admire what Jeff Linder, the author of this podcast series, is trying to do. His fledgling site has podcasts on topics as diverse as Plymouth Rock, Seth Bullock (of Deadwood fame), and a 1959 prison riot in Montana. Many of the topics are drawn from Linder's travels and his website includes photographs of many of the featured locations. Unfortunately the content, at least for this episode, is pretty weak.

The problem with Linder's podcast is that he tells the story of the Whitman Mission exactly the way historians of a hundred years ago told the story--of saintly missionaries who headed west and were murdered by superstitious (and faceless) Indians who did not know any better. Linder begins his story with the missionaries at their homes in the northeast, rather than with the Cayuse and Walla Walla peoples along the Columbia. He carefully recites the full name of each missionary, while mentioning no native person by name until he drops the names of two of the individuals who killed the Whitmans. Indeed, the first mention of Indians at all is when he says that the Whitmans joined a fur brigade for safety against "raiding bands of Indians." He repeats the idea that part of the importance of the missionaries is that their party included the "first white women over the Rockies," which is one of those racialized "firsts" that makes modern historians cringe.

The important back story to the mission--the fur trade, the native journey to Saint Louis, the religious changes before the Whitmans' arrival--is absent. Linder has Lewis and Clark "discovering" the Columbia River, which would have been news to the American and English sea captains who had sailed up its mouth a generation earlier, let alone the native peoples who had lived there for a hundred generations. He repeats the old myth that a wagon train of immigrants gave measles to the Cayuse, an idea disproven by anthropologist Robert Boyd a decade ago. And Linder has the mixed bloods Joe Lewis and Nicolas Finley as instigators of the killings--a popular 19th century theory but one not widely accepted today.

In his introductory episode Linder explains that he was never interested in history until a recent visit to Washington D.C. where history really came alive to him. "I am not a historian," Linder explains, "I don't have a degree in history." And yet he does have a podcast about history, and his lack of historical training undermines his efforts.

I have spent too many words picking on this poor podcaster who wanted nothing more than to share his love of history. Let me turn to the real culprit here--the historical profession, which has been slow to adopt new technologies and has left the digital path open to well-meaning but untrained amateurs. Most of us could easily create a podcast and some blog posts on some of our favorite topics. It isn't that hard. But we fail to embrace the new opportunities to reach a public that is hungry for history, and others fill the vacuum.

[Illustration: This diorama of the killing of the Whitmans used to be on display at Whitman Mission National Monument. It was eventually removed because of its factual inaccuracy and because it was offensive to the tribes. Photo courtesy of the Washington State Digital Archives.]

Thursday, September 24, 2009

The Promise of Mobile History

Look closely at the image of the iPhone--see the app with the letter H icon? That is a mock up of an iPhone app that would use the GPS system in the iPhone to help users to find historic sites when they travel. It is the brainchild of Twitterer DriveByHistory, known to you squares as Cynthia Sengel. Click here to play with the mockup and see how it would work.

I blogged a while back about the Duke Digital Collections iPhone app. But what is really exciting about mobile devices is the promise of making history, well, mobile. Imagine being on a road trip where you were alerted not just to historical markers but to museums that are currently open, historic trails along the way, old cemeteries, buildings on the National Register, etc., in each case with some pictures and a quick text blurb to tell you more. My immediate thoughts are 1) that would be amazing, and 2) I'd never get anywhere.
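The heart of an app like DriveByHistory's mockup would be a lookup: given where the phone is, which sites are close by? Here is a minimal Python sketch using the haversine great-circle formula. How the mockup actually works is not described here, so the function names, the radius, and the sample site list are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))  # clamp guards against rounding above 1

def sites_nearby(lat, lon, sites, radius_km):
    """Return (distance_km, name) for each site within radius_km, nearest first."""
    hits = []
    for name, site_lat, site_lon in sites:
        distance = haversine_km(lat, lon, site_lat, site_lon)
        if distance <= radius_km:
            hits.append((distance, name))
    return sorted(hits)

# Hypothetical sites laid out along the equator to keep the arithmetic easy to check.
sites = [("Old cemetery", 0.0, 0.5), ("Historical marker", 0.0, 0.1), ("Museum", 0.0, 5.0)]
for distance, name in sites_nearby(0.0, 0.0, sites, radius_km=100):
    print(f"{name}: {distance:.1f} km")
```

A real app would also need a data source for the sites and some way to filter by type (open museums, cemeteries, National Register buildings), but the distance check stays the same.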

The next step would be to develop location-specific content for such mobile devices. How about a geotagged podcast that would take you on a walking tour of a historic site without having to have a set route? Or a virtual museum guide who knew what room you were in and which painting you were looking at? Or being able to see your location on a historic Sanborn or other map, or compare historic photos to the present-day house or building in front of you.

There was recently an interesting post over at Wired about a "Bionic Eye" iPhone app that produced "augmented reality." It looked to me like a good way to get hit by a car. But these augmented reality apps that overlay data from the internet on the scene in front of you have obvious uses for creating historical tours. In a few years you will see people standing at the edge of the Gettysburg Battlefield and holding their smart phones in front of their faces to see Pickett's charge reenacted on a 3" screen.

Also, it would be nice to see a way for historians to develop mobile content in a platform neutral way. I cannot see having my public history students develop content for a proprietary device that they cannot themselves afford.

Monday, September 14, 2009

Dan Cohen on The Future of the Digital University

Here is an interesting talk from Dan Cohen, Director of the Center for History and New Media at George Mason University. (He also has a blog and tweets and is one of the hosts of the Digital Campus podcast.) Despite the video presentation this is largely a talk with few illustrations and you can play it in the background while you play World of Warcraft or whatever.*

*I have never played World of Warcraft.

Wednesday, September 9, 2009

Building the Digital Lincoln

Journal of American History - Building the Digital Lincoln: "This special resources site offers a snapshot of how historians and digital humanists have helped to build a new understanding of Abraham Lincoln with a series of innovative and powerful Web-based tools. Their contributions during the decade preceding the Lincoln bicentennial have significantly altered the landscape of Lincoln scholarship by widening and deepening access to a vast array of primary sources. The result has been a more finely detailed portrait of President Lincoln, his relationships, and his career’s most pivotal moments."

This interesting site from the Journal of American History uses the life and legacy of Abraham Lincoln as the thread that brings together a variety of approaches to digital history. In fact, it is a fine introduction to some aspects of digital history--from word clouds to GIS layers to 3D modeling (using Google SketchUp!) to an interactive online essay to a brief documentary created using Photostory. What I like about the Building the Digital Lincoln site is that it is not all bleeding-edge technology, but uses well-known and widely available software packages. A visitor to the site thinks, "Hey, that is really cool--and I can do it!"

By the way, I will buy a drink for the first person who can identify the origin of the animated GIF on this page. (Hint: It is not from the JAH!)

Wednesday, September 2, 2009

Google's Book Search: A Disaster for Scholars?

Your humble Northwest History blogger is sometimes accused of being a Google fanboy. A fair cop. But you know who is not a Google fanboy? Geoffrey Nunberg, that is who. Over at the Chronicle of Higher Education, Nunberg has a witty jeremiad, "Google's Book Search: A Disaster for Scholars."

Nunberg's beef is with Google's sloppy and commercially driven metadata schemes. He demonstrates that even with such a basic item as date of publication, Google Books very frequently gets it wrong. This in turn often corrupts search results: "A search on 'Internet' in books published before 1950 produces 527 results; 'Medicare' for the same period gets almost 1,600." By comparing Google's data to that found in the catalogues of the contributing libraries, Nunberg shows that these errors do in fact belong to Google, not to its partners.
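To see how one bad date leaks into a search, here is a minimal Python sketch of a date-filtered keyword search. The catalogue, the titles, and the `search` helper are all hypothetical, invented for illustration and not taken from Google's actual data or systems. The point is only that a date filter trusts whatever year the metadata supplies, so a misdated modern book shows up as a pre-1950 hit for "Internet."

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Record:
    title: str
    year: int   # publication year as stored in the metadata, right or wrong
    text: str   # stand-in for the book's full text

# Hypothetical toy catalogue. The second title is a modern book whose
# metadata wrongly says 1899, the kind of error Nunberg describes.
CATALOGUE = [
    Record("Travels on the Columbia River", 1845, "We descended the Columbia River by canoe."),
    Record("Getting Started on the Internet", 1899, "How to get on the Internet from home."),
]

def search(records, term, before):
    """Titles whose text contains `term` and whose *recorded* year is < `before`."""
    term = term.lower()
    return [r.title for r in records if r.year < before and term in r.text.lower()]

# The date filter trusts the metadata, so the misdated book slips through.
print(search(CATALOGUE, "Internet", before=1950))  # ['Getting Started on the Internet']

# Correcting a single field (say, after a reader reports it) removes the false hit.
fixed = [replace(r, year=1995) if r.title == "Getting Started on the Internet" else r
         for r in CATALOGUE]
print(search(fixed, "Internet", before=1950))  # []
```

Nothing about the search logic changes between the two calls; only the one stored date does, which is why bad metadata produces anachronistic results.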

Nunberg also whacks Google for the classification errors where books are placed in the wrong categories: "H.L. Mencken's The American Language is classified as Family & Relationships. A French edition of Hamlet and a Japanese edition of Madame Bovary are both classified as Antiques and Collectibles . . . An edition of Moby Dick is labeled Computers; The Cat Lover's Book of Fascinating Facts falls under Technology & Engineering."

Worst of all to Nunberg is Google's adoption of the Book Industry Standards and Communications categories for Google Books, which he describes as a modern commercial invention used to sell books, rather than a scholarly system of classification like the Library of Congress subject headings: "For example the BISAC Juvenile Nonfiction subject heading has almost 300 subheadings, like New Baby, Skateboarding, and Deer, Moose, and Caribou. By contrast the Poetry subject heading has just 20 subheadings. That means that Bambi and Bullwinkle get a full shelf to themselves, while Leopardi, Schiller, and Verlaine have to scrunch together in the single subheading reserved for Poetry/Continental European. In short, Google has taken a group of the world's great research collections and returned them in the form of a suburban-mall bookstore."

I think that Nunberg has a number of good points--points he gathers together to form a molehill, from which he conjures up a mountain. Google's metadata may be everything he says (and I think he is probably right), but how great a problem is that really? This scholar at least uses Google Books either 1) to locate a digital copy of a book I already know about, or 2) to search with a string of terms. In the first case, it is not relevant to me that Google has classified Adventures of Huckleberry Finn under "wild plants" or whatever. I know perfectly well what it is, and just wanted to find a quote I remember.

In the second case, I might search for mentions of the Columbia River in books published before 1860. And suppose a faulty date in Google's database brings me to something written after 1860. So what? Surely when I click on the link and find myself reading Sherman Alexie instead of Lewis and Clark, I will notice the fact. (Actually I just did the search and on the first 10 pages of results I don't see any errors at all. Take that, Nunberg.)

So for which scholars exactly is Google Book Search a "disaster"? Nunberg cites "linguists and assorted wordinistas" who are "adrenalized" at the thought of data mining to "track the way happiness replaced felicity in the 17th century, quantify the rise and fall of propaganda or industrial democracy over the course of the 20th century, or pluck out all the Victorian novels that contain the phrase 'gentle reader.'" But who does this? OK, I know that people do it, but most data mining of this type has always struck me as more of a parlor trick than actual scholarship.

The other thing Nunberg ignores is that metadata is not that hard to fix. Google already provides a "feedback" button on every virtual page so readers can report unreadable or missing pages. If we howl loud enough we could easily see similar feedback mechanisms on the "More book information" page so we could correct names and dates and categories.

Nunberg is absolutely correct to recognize the monumental importance to scholars of the Google Book Search project. It is vital that scholars take a critical stance that will push Google to improve the project and make it even more useful. His article is a valuable push in that direction.

UPDATE 9/3/09: Reader Ed points out that Geoff Nunberg also posted a nicely illustrated version of his article on the blog Language Log, and got a brief response in the comments from John Orwant, who manages the metadata at Google Books.

Tuesday, September 1, 2009

Date Announced for THATCamp Pacific Northwest

THATCamp Pacific Northwest: "THATCamp Pacific Northwest will be held on the campus of Washington State University in Pullman, WA on Saturday, October 17, 2009. For more information, visit the schedule and location pages, e-mail the organizers at THATCampPNW@gmail.com, or just apply now."

If you are in the PNW and are interested at all in the digital humanities I strongly recommend that you apply. The glory of THATCamp is that it invites people of all different levels of knowledge, experience and training. I came away from THATCamp at George Mason full of ideas and enthusiasm. My post about the event is here.

Monday, July 20, 2009

New Features on Google Books

Good news for users of Google Books--a passel of new features has been released. The improvements range from better overview pages to browsable thumbnails to improved search within books (my favorite feature). There is even a nifty "page turn" animation.

For historians of 18th- and 19th-century America, Google Book Search is the most radical expansion of available research resources since--I don't know, since JSTOR. (Because of copyright, it is much less useful to 20th-century historians.) Suddenly the number of period books at my disposal has gone from the few thousand at my library to a few hundred thousand--all of them downloadable and subject to keyword searches.

(If you use Google Books as much as I do, it pays to keep Inside Google Book Search, the official blog, in your RSS feed.)

Sunday, June 28, 2009

What Happens at THATCamp...

...gets Tweeted all over the world. (So watch yourself.)

I have just finished up at THATCamp, "a user-generated 'unconference' on digital humanities organized and hosted by the Center for History and New Media at George Mason University." It was a heady two days of presentations, debates, discussions and twittering about digital history with some of the most interesting people in the field. There was so much content and so many ideas it is hard to know what to blog. So rather than explore any one thing from the conference, I thought I would explain the "unconference" format of THATCamp and how it worked.

[And by the way, I hope THATCampers who read this will use the comments to correct me or make additions. I want this post to be a resource for next year's THATCampers.]

Attendance at THATCamp is by application--you write a few paragraphs about what you will bring to share at THATCamp in terms of skills, experience, projects, or whatever, and also what you hope to learn. If you are accepted you receive an email with details about conference lodging and so on and also a user account to the THATCamp blog.

The blog is where the "user generated" part begins. People are encouraged to post their ideas on the blog and to use it to organize sessions. Many of us were not clear on this (and by "many of us" I mean myself) and participation on the blog was perhaps not what it should have been, but we did kick around some initial ideas. Here is my post.

On Saturday we came together at George Mason for breakfast. Along with coffee and baked goods there were three tables covered by large sheets of paper and handfuls of sharpies. The paper was divided into three large columns: Session Topic, Leaders, Attendees. The organizers had grouped the blog ideas into sessions and put down the names of the most voluble posters as the session leaders. We were encouraged to add ourselves as attendees or leaders or even to add new sessions.

We did this for half an hour and went to a sort of welcoming discussion. Twenty minutes later the staff had worked up a schedule for the weekend. We all bookmarked it on our iPhones and netbooks and filtered out to our sessions. (I thought about suggesting a printed copy but something told me that this just isn't done.)

The sessions were great. You know how at a regular conference you sit through the overly-long papers, checking your email and hoping that people stick to the time limit so you can get to the discussion? We skipped the papers. The sessions were extended discussions on digital history topics with people who are on the front lines of the digital revolution (and me).

Technology suffused the conference. The rooms were wired and had wireless access. Usually someone would plug a laptop into the digital projector, and people would jump up to display a website or digital tool as it came up in the conversation. And there were power strips to plug in! More strips than there are snakes in Raiders of the Lost Ark.

Another thing that made the conference different was the use of Twitter, which was encouraged by the organizers. At THATCamp Twitter serves as a social organizer, a platform for exchanging information, and most of all a back channel of communication during the sessions. Nearly everyone had a computer or cell phone at hand, and as we talked about digital history a second conversation was happening online. By using the hashtag #thatcamp in our tweets we could customize a Twitter feed (if you have a Twitter account you can see the conversation here, or visit an archive of all the 2500+ THATCamp tweets here). Many followed the conversation online using the free application Tweetdeck. It might sound odd to anyone unaccustomed to the technology, but Twitter really was an effective and natural tool for enhancing the conversations. Occasionally someone would say, "Now Susan just made a good point in her tweet [paraphrases Susan]; what do we think of that?" Another interesting effect of Twitter is that quite a few digital humanists not at the conference took part in the Twitter conversations.

One additional innovation was the series of three-minute presentations during lunch, which were lovingly titled "Dork Shorts." People signed up to give three-minute presentations of their digital projects. It was enough time to give a taste of the project but short enough that everyone who wanted to could show off their work.

The one other thing worth mentioning about the conference format was the variety of attendees. We had people from museums and libraries as well as academics, and undergraduates and graduate students mixed with university faculty and staff. It was very democratic and welcoming.

The "unconference" format of THATCamp gave me a lot of food for thought. The format was not perfect. Some of the conversations wandered too much, a few of the session organizers spoke a bit longer than necessary, and it took half a day for everyone to get in the groove of the unconference. But it was so much better than any other conference I have attended lately. I walked away from THATCamp not only with a lot of new knowledge and ideas but with a sense of having made meaningful connections with a bunch of digital history people. One of my goals is to bring the Pacific Northwest History conference to Spokane one year (oh God, did I just say that?) and I like the idea of adapting some of the unconference techniques to a regional history conference.

THATCampers--what did you think of the format? What did I miss?

[The picture of the Dork Shorts board is from CHNM director Dan Cohen's blog, Found History, which has his end-of-conference impressions of the event.]

Thursday, June 25, 2009

"Lick This": LOC, Flickr, and the Limits of Crowd Sourcing

[Update: This post has provoked quite the discussion over at the Flickr Commons board.]

In January of 2008 the Library of Congress and the photo-sharing web service Flickr announced a unique partnership. The Library of Congress Flickr Pilot Project put 3000 historic LOC photographs on the website Flickr and invited the public to view, annotate, tag, and generally mess with them. This was perhaps the LOC's first foray into the world of Web 2.0 and generated a tremendous buzz. "In the first 24 hours after launch, Flickr reported 1.1 million total views on our account, with 3.6 million views a week later," according to this LOC report on the project. The project--"a match made in photo heaven" according to the LOC blog--has been praised everywhere from the New York Times to the popular community weblog Metafilter.

The goals of the project are to "increase awareness of the Library and its collections; spark creative interaction with collections; provide LC staff with experience with social tagging and Web 2.0 community input; and provides leadership opportunities to cultural heritage and government communities." Especially talked about was the second goal--sparking interaction with the collections. The idea was that visitors to Flickr could add useful metadata to LOC images, such as the names of people in the photographs, locations, models of cars or other machinery, etc.

The project may well be a success overall, but as a way to add useful metadata to historical documents, the Library of Congress Flickr Pilot Project is a disappointment. Let me explain...

Above is a screen shot of this photograph, from the very popular 1930s-40s in Color photograph set. This iconic photograph is also used as the cover image on the LOC's Final Report Summary for the project. This one photograph, and the user-generated metadata attached to it, demonstrate the problems with inviting the general public to contribute to a historical collection.

One of the most innovative features of Flickr is the ability of visitors to add notes to the pictures. You can create a rectangular box over some portion of an image and add a text note. This is especially useful for identifying individuals in group photos or pointing out specific details.

So what sort of metadata have users added to supplement the sparse LOC identification ("Bransby, David,, photographer. Woman aircraft worker, Vega Aircraft Corporation, Burbank, Calif. Shown checking electrical assemblies, 1942 June ") of the photo?

There are 20-30 notes on the photograph, and not one contains useful historical information to give context or help us understand the photograph. Most are throw-away jokes or comments, such as "I love this fabric!" by Flickr user Mrelia and "Lick this" by user HeatherrFalk (referring to the woman's forehead!). Most of the rest of the notes refer to the woman's appearance or the composition of the picture. Almost useful is a little nested debate about the authenticity of the photograph--how staged was it?--but the discussion is hard to follow, since you have to hover the mouse over each box to see the comment.

Flickr users may also add comments and tags to images, and organize them together into sets. But here again the crowdsourced noise overwhelms the signal of useful historical information. There are over 100 comments attached to this one photograph, all but a few devoted to the picture's composition (well it is a photography website after all) or how pretty the woman is or posting just to post something. Within the chaff there are a few grains of wheat--as when user BeadMobile adds some pencil drawings made by his grandmother when she worked in a factory during World War Two. But you really have to dig.

What about tagging? User tagging is often presented as a simple and powerful way to crowdsource metadata in online archives. There are 71 user-generated tags for this image. Some are obvious and useful--"1942" and "rosie the riveter." Many others however are odd ("everyone did their part") or cryptic ("sfv" "LF").

And the sets? How have Flickr users organized this image with others? Well the woman in the picture should be proud that she is in the "Nation Of Domination. (We Rule The Universe)" photo pool and the "cable porn" pool.

The above might seem like a lot of text to bash on one image and its metadata, but the problems extend to all of the other images in the project. The notes are mostly smart-ass remarks, the comments are empty, the tags are idiosyncratic. The frustrating thing is that there really is some crowdsourced gold within the flood of junk, such as the transcriptions of hand-lettered signs in the windows of the Brockton Enterprise newspaper office in this photo.

The most useful comment I found in this project? User Catskills Grrl's comment: "Gee, I wish the stupid, smart-ass notes would be deleted off these photos."

I will pick up the topic of crowd sourcing again in a future post, pointing towards some archives that I believe are doing it correctly.

Thursday, May 28, 2009

New Perspectives Issue Focused on Digital History

The May issue of Perspectives, the monthly magazine of the American Historical Association, is titled Intersections: History and New Media. It is a nice, accessible round-up of brief articles on topics such as blogs, teaching with digital objects, narrative challenges of online exhibits, etc. Many of the articles are of the "hey look at what I am doing" genre, but still very useful.

Tuesday, May 26, 2009

The Death of Scholarly Publishing?

The University of Michigan Press has announced that it will be "redefining scholarly publications in the digital age"--by which it means it will no longer print books. Rather, it will shift its resources to "digital monographs." You have to give them props for the positive spin in the press release.

"Freeing the press, in large part, from the constraints imposed by the print-based business model will permit us to more fully explore and exploit ever-expanding digital resources and opportunities," Phil Pochoda, director of U-M Press, quotes himself as saying. Pochoda also refers to his team and himself as "visionaries."

There is much fuss in the academic community about this move, but it is not as if we did not see it coming. Scholarly publishing of monographs has been on its death bed for years, with press runs of many books dropping below 1000, then below 500, then into the low hundreds, even as prices have soared and subventions have become almost expected.

But as the guys over at Digital Campus pointed out in a recent podcast, the vital element of scholarly publishing is the peer review, not the physical form of the end product. Though no one seems to be noticing, academic articles have already made the leap. I am willing to bet the average article in the Journal of American History gets far more digital readers via the commercial databases such as the History Cooperative and JSTOR than through actual subscribers who crack open a physical copy.

And the digital versions of the articles are far superior to the printed ones. First of all, you can actually find relevant articles via search engines. Then you can do keyword searches to take you to a relevant passage. You can store the articles you are working on in your laptop and mark them up with various tools. Within five years most, if not all, of our history journals will cease publication in the dead-tree format.

But even I have to admit that the book poses special challenges.

First of all, we have no good delivery format for digital books. The Kindle solves many of the readability problems of digital publications, but it also locks away your content into a closed proprietary system. You don't actually own your books on a Kindle, you just pay Amazon for permission to read them. The Sony Reader does not seem to be catching on, and there is no open source reader that I know of. (Update: Not so fast...)

Second, will anyone buy digital scholarly monographs? Grad students are too broke and their professors too deep in their print fetish to buy digital books. And books have a somewhat different revenue model than scholarly journals, depending more on individual and less on institutional purchases. Journal subscription costs are largely borne by institutions, but books still generate some of their revenue via sales to individuals.

Third, authors who have a choice will go to publishers who print physical books until the last one closes shop. After all, what kind of gift to grandma is a digital book? The answer here is print-on-demand (POD) services to turn digital books into hard copies. One can imagine a bookstore that has exactly one hard copy of each title on its shelves. When you make a selection you bring the book to the clerk who punches a few buttons and a machine in the back spits out a lovely bound copy. In fact you have to imagine such a store, because none currently exist, despite developments such as the Espresso Book Machine. There are quite a few online POD vendors, and I was pleased with my experiment with one of them, but I don't think they represent any significant fraction of the book market.

So the transition to digital is apt to be trickier for books than it has been for journals. As university presses pull back and are closed down in the current economic crisis (is LSU Press next?), the search for a new model of scholarly publishing grows more urgent.