
Showing posts with label CiteBank. Show all posts

Mendeley as CiteBank: some ideas

Here are some quick notes on how BHL could use Mendeley as a "CiteBank".

As a repository of bibliographic data

If the goal is to assemble a "bibliography of life" then there are various ways this could be done.

Taxon-specific bibliographies

Create groups that are taxon-specific (or find existing groups in Mendeley). For example, I've created groups for amphibians and reptiles based on the Amphibian Species of the World and the TIGR/JCVI Reptile Database, respectively. Taxon-specific groups are probably going to be attractive to users, but the quality of bibliographic metadata can be variable. However, a bibliography for a specific taxonomic group that is populated with links to BHL content would be very useful.

Journal-specific bibliographies

This is where I've spent most of my efforts. I've created around 300 groups for various journals (see list below, or go directly to http://dl.dropbox.com/u/639486/groups.html). In some cases I've managed to populate these with the complete set of articles published in that journal, typically harvested from the journal's own web site. Metadata from journal sites is usually high quality, although one has to be wary of Orwellian metadata.



I use these groups in two ways. The first is as a source of metadata for extracting articles from BHL using BioStor. If you have article-level metadata, finding articles in BHL becomes easier, and can be automated so that thousands can be added in a few minutes.

The second is for the taxon-literature mapping project, where one strategy is to use approximate string matching to find equivalent citations in Mendeley and the ION database. Ultimately I'd like to link to the Mendeley citations as they tend to be higher quality than those in the original ION database.
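To make the idea of approximate matching concrete, here is a minimal sketch using Python's standard `difflib`. It is not the method used in the mapping project; the function names, the normalisation step, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalise(citation: str) -> str:
    """Lower-case a citation and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in citation.lower())
    return " ".join(cleaned.split())


def best_match(query: str, candidates: list[str], threshold: float = 0.8):
    """Return (candidate, score) for the closest candidate, or None.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, normalise(query), normalise(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return (best, best_score) if best_score >= threshold else None
```

Calling `best_match(ion_citation, mendeley_citations)` would return the closest Mendeley citation string when two records differ only in punctuation or spacing, and `None` when nothing is similar enough. A real pipeline would probably compare fields (journal, volume, pages) rather than whole strings.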

BHL could create Mendeley groups for journals it has scanned, and populate those.

As an article-level index to BHL

Perhaps the most direct way BHL could use Mendeley is as follows:

  1. Create a BHL account.
  2. For each BHL title create a Mendeley group (the name would be the BHL TitleID).
  3. For each item in that title create a folder in the corresponding group (the folder name would be the ItemID).
  4. Within each folder list the articles, book chapters or other component parts. If these aren't available yet, encourage people to add them. Some of these could be pre-populated with content from BioStor.
  5. Harvest the contents of these groups to provide an article-level index to BHL (the lack of which is, for me, the single biggest impediment to using BHL). Previously I've suggested a way to easily add article data to BHL; Mendeley title/item groups and folders might be a way to facilitate this process.
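The harvesting step in the list above can be sketched as a simple flattening of the group/folder structure. This is only an illustration under stated assumptions: the `Article` class, the `build_index` function, and the nested-dictionary layout are hypothetical, and fetching the groups from Mendeley is not shown.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """One component part (article, chapter) listed in an item folder."""
    title: str
    start_page: int
    end_page: int


def build_index(groups: dict[str, dict[str, list[Article]]]) -> list[dict]:
    """Flatten {TitleID: {ItemID: [Article, ...]}} into one row per article."""
    rows = []
    for title_id, folders in groups.items():
        for item_id, articles in folders.items():
            # Order the articles within each item by their first page.
            for article in sorted(articles, key=lambda a: a.start_page):
                rows.append({
                    "title_id": title_id,
                    "item_id": item_id,
                    "article": article.title,
                    "pages": f"{article.start_page}-{article.end_page}",
                })
    return rows
```

Each row ties an article back to the BHL TitleID and ItemID used as the group and folder names, which is all an article-level index needs.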

PDF storage

Although Mendeley offers PDF storage, this is one feature I'd be less inclined to use. Mendeley's rules for sharing PDFs and making them publicly available are too restrictive (they often don't know whether a PDF can, in fact, be shared). Plus you want tools to visualise, index, and archive PDFs: in effect, a big file store with added features. I have some ideas on how this can be implemented (and have a rough working version to support http://iphylo.org/~rpage/itaxon). Alternatively, one could use Internet Archive services.

Summary

As I've often argued, given the success of tools like Mendeley it seems pointless for anyone to try and build yet another online bibliographic database. The trick is to figure out how to leverage what Mendeley provides to support what the taxonomic (and broader biodiversity) community needs.

First thoughts on CiteBank and BHL-Europe

This week saw the release of two tools from the Biodiversity Heritage Library, CiteBank and the BHL-Europe portal. Both have actually been quietly around for a while, but were only publicly announced last week.

In developing a new tool there are several questions to ask. Does something already exist that meets my needs? If it doesn't exist, can I build it using an existing framework, or do I need to start from scratch? As a developer it's awfully tempting sometimes to build something from scratch (I'm certainly guilty of this). Sometimes a more sensible approach is to build on something that already exists, particularly if what you are building upon is well supported. This is one of the attractions of Drupal, which underlies CiteBank and Scratchpads. In my own work I've used Semantic Mediawiki to support editable, versioned databases, rather than roll my own. Perhaps the more difficult question for a developer is whether they need to build anything at all. What if there are tools already out there that, if not exactly what you want, are close enough (or most likely will be by the time you finish your own tool)?

CiteBank
CiteBank is an open access platform to aggregate citations for biodiversity publications and deliver access to biodiversity related articles. CiteBank aggregates links to content from digital libraries, publishers, and other bibliographic systems in order to provide a single point of access to the world’s biodiversity literature, including content created by its community of users. CiteBank is a project of the Biodiversity Heritage Library (BHL).

I have two reactions to CiteBank. Firstly, Drupal's bibliographic tools really suck, and secondly, why do we need this? As I've argued earlier (see Mendeley, BHL, and the "Bibliography of Life"), I can't see the rationale for having CiteBank separate from an existing bibliographic database such as Mendeley or Zotero. These tools are more mature, better supported, and address user needs beyond simply building lists of papers (e.g., citing papers when writing manuscripts).

For me, one of BHL's goals should be integrating the literature they have scanned into mainstream scientific literature, which means finding articles, assigning DOIs, and becoming in effect a digital publishing platform (like BioOne or JSTOR). Getting to this point will require managing and cleaning metadata for many thousands of articles and books. It seems to me that you want to gather this metadata from as many sources as possible, and expose it to as many eyes (and algorithms) as possible to help tidy it up. I think this is a clear case of it being better to use an existing tool (such as Mendeley), rather than build a new one. If a good fraction of the world's taxonomists shared their personal bibliographies on Mendeley we'd pretty much have the world's taxonomic literature in one place, without really trying.

BHL-Europe
It's early days for BHL-Europe, and they've taken the "let's use an existing framework" approach, basing the BHL-Europe portal on DISMARC, the latter being an EU-funded project to "encourage and support the interoperability of music related data".

BHL-Europe is the kind of web site only its developers could love. It's spectacularly ugly, and a classic example of what digital libraries came up with while Google was quietly eating their lunch. Here's the web site showing search results for "Zonosaurus":

[Screenshot: BHL-Europe search results for "Zonosaurus"]


Yuck! Why do these things have to be so ugly? DISMARC was designed to store metadata about digital objects, specifically music. Look at commercial music interfaces such as iTunes, Spotify, and Last.fm. Or even academic projects such as mSpace.

To be useful BHL-Europe really needs to provide an interface that reflects what its users care about, for example taxonomic names, classification, and geography. It can't treat scientific literature as a bunch of lifeless metadata objects (but then again, DISMARC managed to do this for music).

Where next?
CiteBank and BHL-Europe seem further additions to the worthy but ultimately deeply unsatisfying attempts to improve access to biodiversity literature. To date our field has failed to get to grips with aggregating metadata (outside of the library setting), creating social networks around that aggregation, and providing intuitive interfaces that enable users to search and browse productively. These are big challenges. I'd like to see the resources that we have put to better use, rather than being used to build tools where suitable alternatives already exist (CiteBank), or used to shoehorn data into generic tools that are unspeakably ugly (BHL-Europe portal) and not fit for purpose. Let's not reinvent the wheel, and let's not try and convince ourselves that squares make perfectly good wheels.

Mendeley, BHL, and the "Bibliography of Life"

One of my hobby horses is the disservice taxonomic databases do their users by not linking to original scientific literature. Typically, taxonomic databases either don't cite primary literature, or regurgitate citations as cryptic text strings, leaving the user to try and find the item being referred to. With the growing number of publishers that are digitising legacy literature and issuing DOIs, together with the Biodiversity Heritage Library's (BHL) enormous archive, there's really no excuse for this.

Taxonomic databases often cite references in abbreviated forms, or refer to individual pages, rather than citable units such as articles (see my Nomenclators + digitised literature = fail post for details). One way to translate these into links to articles would be to have a tool that could find a page within an article, or could match an abbreviated citation to a full one. This task would be relatively straightforward if we had the "bibliography of life," a freely accessible bibliography of every taxonomic paper ever published. Sadly, we don't...yet.
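As a rough illustration of the "find a page within an article" step, the sketch below assumes we already have article-level records with a volume and a page range. The function name `find_article` and the dictionary keys (`volume`, `spage`, `epage`) are hypothetical, not taken from any existing tool.

```python
def find_article(articles: list[dict], volume: int, page: int):
    """Return the first article in `volume` whose page range contains `page`.

    Each article is a dict with integer 'volume', 'spage' and 'epage' keys.
    Returns None when no article covers the page.
    """
    for article in articles:
        if article["volume"] == volume and article["spage"] <= page <= article["epage"]:
            return article
    return None
```

Given a citation that points at a single page, a lookup like this would turn it into a reference to the article containing that page, which is the citable unit. With a large bibliography one would index by journal and volume first rather than scanning a flat list.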

Bibliography of life

Mendeley is rapidly building a very large bibliography (although exactly how large is a matter of some dispute, see Duncan Hull's How many unique papers are there in Mendeley?), and I'm starting to explore using it as a way to store bibliographic details on a large scale. For example, an increasing number of smaller museum or society journals are putting lists of all their published articles on the web. Typically these are HTML pages rather than bibliographic data, but with a bit of scraping we can convert them to something useful, such as RIS format, and import them into Mendeley. I've started to do this, creating Mendeley groups for individual journals, e.g.:

These lists aren't necessarily complete nor error-free, but they contain the metadata for several thousand articles. If individual societies and museums made their list of publications freely available we would make significant progress towards building a bibliography of life. And with the social networking features of Mendeley, we could have groups of collaborators clean up any errors in the metadata.
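The conversion from scraped records to RIS mentioned above might look something like the following sketch. It assumes the scraping has already produced one dictionary per article; the function name `to_ris` and the dictionary keys are hypothetical, while the two-letter tags (TY, AU, TI, JO, VL, SP, EP, PY, ER) are standard RIS tags.

```python
def to_ris(record: dict) -> str:
    """Serialise one scraped article record as a RIS journal-article entry."""
    lines = ["TY  - JOUR"]
    # RIS repeats the AU tag once per author.
    for author in record.get("authors", []):
        lines.append(f"AU  - {author}")
    # Map the (assumed) record keys onto RIS tags, skipping missing fields.
    for tag, key in (("TI", "title"), ("JO", "journal"), ("VL", "volume"),
                     ("SP", "spage"), ("EP", "epage"), ("PY", "year")):
        if key in record:
            lines.append(f"{tag}  - {record[key]}")
    lines.append("ER  - ")
    return "\n".join(lines) + "\n"
```

Writing the output of `to_ris` for every scraped article into a single `.ris` file gives something a reference manager can import in one go.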

Of course, this isn't the only way to do this. I suspect I'm rather atypical in building Mendeley groups containing articles from only one journal, as opposed to groups based on specific topics, and of course we could also tackle the problem by creating groups with a taxonomic focus (such as all taxonomic papers on amphibians). Furthermore, if and when more taxonomists join Mendeley and share their personal bibliographies, we will get a lot more relevant articles "for free." This is Mendeley's real strength in my opinion: it provides rich tools for users to do what they most want to do (manage their PDFs and cite them when they write papers), but off the back of that Mendeley can support larger tasks (in the same way that Flickr's ability to store geotagged photos has led to some very interesting visualisations of aggregated data).

BioStor
For some of the journals I've added to Mendeley I just have bibliographic data; the actual content isn't freely available online, and in some cases isn't even digitised. But for some journals the content exists in BHL, it's "just" a matter of finding it. This is where my BioStor project comes in. For example, BHL has scanned most of the journal Spixiana. While BHL recognises individual volumes (see http://www.biodiversitylibrary.org/bibliography/40214) it has no notion of articles. To find these I scraped the tables of contents on the Spixiana web site and ran them through BioStor's OpenURL resolver. If you visit the BioStor page for the journal (http://biostor.org/issn/0341-8391) you will see that most of the articles have been identified in BHL, although there are a few holes that will need to be filled.

These articles are listed in a Mendeley group for Spixiana, with the articles linked to BioStor wherever possible.

CiteBank and on not reinventing the wheel
If we were to use Mendeley as the primary platform for aggregating taxonomic publications, then I see this as the best way to implement "CiteBank". BHL have created CiteBank as "an open access repository for biodiversity publications" using Drupal. Whatever one thinks of Drupal, bibliographic management is not an area where it shines. I think the taxonomic community should take a good look at Mendeley and ask themselves whether this is the platform around which they could build the bibliography of life.