
Showing posts with label iPad.

Reading the Biodiversity Heritage Library using Readmill

tl;dr: Readmill might be a great platform for shared annotation and correction of Biodiversity Heritage Library content.

Thinking about accessing the taxonomic literature, I started revisiting previous ideas. One is DeepDyve (see DeepDyve - renting scientific articles). Imagine not having to pay large sums for an article, but being able to rent it. Yes, open access would be great, but ultimately it's all a question of money (who pays and when); the challenge is to find the mix of models that encourages people to digitise the relevant literature. Instead of publishers insisting we pay $US30 for an article, how about renting it for the short time we actually need to read it?

Another model is unglue.it, a Kickstarter-like company that seeks to raise funds to digitise and make freely available e-Books. unglue.it has campaigns where people pledge donations, and if sufficient pledges are made the book's rights-holder has the book digitised and released DRM-free.

Looking at unglue.it I stumbled across Readmill, "a curious community of readers, highlighting and sharing the books they love." Readmill has an iPad app where you can highlight passages of text and add your own annotations. These annotations can be shared, and multiple people can read and comment on the same book. Imagine doing this on BHL content. You could highlight parts of the text where the OCR has failed, and provide a correction. You could highlight taxonomic names that automatic parsers have missed, geographic localities, cited literature, etc. All within a nice, social app.

Even better, Readmill has an API. You can retrieve highlights and comments on those highlights. So, if someone flags a sentence as mangled OCR and provides a correction, that correction could be harvested and fed back to, say, BHL. These corrections could be used to improve searches, as well as the text delivered when generating searchable PDFs, etc.
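
To make this concrete, here's a minimal PHP sketch of what harvesting corrections might look like. The endpoint path, access token, and JSON field names are assumptions for illustration only — the real Readmill API will differ in the details.

<?php
// Hypothetical sketch: harvest highlights and comments for one book from the
// Readmill API and treat comments as candidate OCR corrections for BHL.
// The URL, parameters, and field names below are assumptions, not the documented API.
$book_id = 12345; // a Readmill book built from a BHL item
$url = "https://api.readmill.com/books/$book_id/highlights?access_token=YOUR_TOKEN";

$highlights = json_decode(file_get_contents($url), true);

foreach ($highlights as $highlight) {
    $passage = $highlight['content']; // the text the reader marked (e.g., mangled OCR)
    foreach ($highlight['comments'] as $comment) {
        // A comment on a highlight could be a corrected reading of that passage,
        // which we could queue up and feed back to BHL.
        echo $passage, " => ", $comment['content'], "\n";
    }
}
?>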

You can even add highlights via the API, so we could upload an ePub book then add all the taxonomic names found by uBio or NetiNeti, enabling users to see which bits of text are probably names, correcting any mistakes along the way. Instead of giving readers a blank canvas they could already have annotations to start with.

Building an app from scratch to read and annotate BHL content would be a major undertaking. From my cursory initial look I wonder if Readmill might just provide the platform we need to clean up and annotate key parts of the BHL corpus?

Touching the tree of life

Prompted by a conversation with Vince Smith at the recent Online Taxonomy meeting at the Linnean Society in London I've been revisiting touch-based displays of large trees. There are a couple of really impressive examples of what can be done.

Perceptive Pixel


I've blogged about this before, but came across another video that better captures the excitement of touch-based navigation of a taxonomy. Jeff Han of Perceptive Pixel (recently acquired by Microsoft) demos browsing an animal classification. The underlying visualisation is fairly straightforward, but the speed and ease with which you can interact with it clearly makes it fun to use.

DeepTree



DeepTree comes from the Life on Earth lab, and there's a paper coming out by @blockflorian and colleagues (I was reminded of this project by @treevisproject):



For technical details on the layout algorithm see https://lifeonearth.seas.harvard.edu/downloads/DeepTree.pdf. Below is a video of it in use:



Both of these are really nice, but what I really want is to have this on my iPad…

Decoding Nature's ENCODE iPad app - OMG it's full of ePUB

The release of the ENCODE (ENCyclopedia Of DNA Elements) project has generated much discussion (see Fighting about ENCODE and junk). Perhaps perversely, I'm more interested in the way Nature has packaged the information than in the debate about how much of our DNA is "junk."

Nature has a website (http://www.nature.com/encode/) that demonstrates the use of "threads" to navigate through a set of papers. Instead of having to read every paper you can pick a topic and Nature has collected a set of extracts on that topic (such as a figure and its caption) from the relevant papers and linked them together as a thread. Here is a video outlining the rationale behind threads.


Threads can be viewed on Nature's web site, and also in the iPad app. The iPad app is elegant, and contains full text for articles from Nature, Genome Research, Genome Biology, and BMC Genetics. Despite being from different journals, the text and figures from these articles are displayed in the same format in the app. Curious as to how this was done, I "disassembled" the iPad app (see Extract and Explore an iOS App in Mac OS X for how to do this). If you've downloaded the app on your iPad and synced the iPad with your Mac, the apps are in the "iTunes/iTunes Media/Mobile Applications" folder inside your "Music" folder. The app contains a file called encode.zip, and inside that zip file are the articles and threads, all as ePub files. ePub is the format used by a number of book-reading apps, such as Apple's iBooks. Nature has a lot of experience with ePub, using it in their iPhone and iPad journal apps (see my earlier article on these apps, and my web-based clone for more details).
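
If you want to poke around yourself, a few lines of PHP using the ZipArchive class will list the ePub files inside encode.zip (this is just a sketch, and assumes you've already pulled encode.zip out of the app bundle):

<?php
// Minimal sketch: list the ePub files (articles and threads) bundled in encode.zip.
$zip = new ZipArchive();
if ($zip->open('encode.zip') === true) {
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if (preg_match('/\.epub$/i', $name)) {
            echo $name, "\n";
        }
    }
    $zip->close();
}
?>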

ePub has several advantages in this context over, say, PDFs. Because ePub is essentially HTML, the text and images can be reflowed, and it is possible to style the content consistently (imagine how much clunkier things would have looked if the app had used PDFs of the articles, each in a different journal's house style). Having the text in ePub also makes creating threads easy: you simply extract the relevant chunks and combine them into a new ePub file.

Threads are an interesting approach, particularly as they cut across the traditional boundaries of individual articles to create a kind of "mash up." Of course, in the ENCODE app these are preselected for you; you can't create your own thread. But you could imagine having an app that would enable you to not just collect the papers relevant to a topic (as we do with bibliographic software), but also to extract the relevant chunks and create a personalised mash up across papers from multiple journals, each linked back to the original article (much like Ted Nelson envisioned for the Xanadu project). It will be interesting to see whether thread-like approaches get more widely adopted. Whatever happens, Nature are consistently coming up with innovative approaches to displaying and navigating the scientific literature.

EOL iPad web app using jQuery Mobile

As part of a course on "phyloinformatics" that I'm about to teach I've been making some visualisations of classifications. Here's one I've put together using jQuery Mobile and the Encyclopedia of Life API. It's pretty limited, but is a simple way to explore EOL using three different classifications. You can view this live at http://iphylo.org/~rpage/phyloinformatics/eoliphone/ (looks best on an iPad or iPhone). Once I've tidied it up I'll put the code online. Meantime here's a quick demo:

Towards an interactive taxonomic article: displaying an article from ZooKeys

One of the things I keep revisiting is the way we display scientific articles. Apart from Nature's excellent iPhone and iPad apps, most efforts to re-imagine how we display articles are little more than glorified PDF viewers (e.g., the PLoS iPad app).

Part of the challenge is that if we make the article more interactive we immediately confront the problem of how to link to other content. For example, we may have a lovingly crafted ePub view (e.g., Nature's apps), but what happens when the user clicks on a citation to another paper? If the paper is published by the same journal, then potentially it could be viewed using the same viewer, but if not then we are at the mercy of the other publisher. They will have their own ideas of how to display articles, so the simplest fallback is to display the cited article in a web browser view. The problem with this is that it breaks the user experience - the other publisher is unlikely to follow the same conventions for displaying an article and its links. If we are lucky the cited article might be published in an Open Access journal that provides, say, XML based on the NLM DTD standard. Knowing whether an article is Open Access or not is not straightforward, and different journals have their own unique interpretation of the NLM standard.

Then there is the issue of other kinds of content, such as taxonomic names, specimens, DNA sequences, geographic localities, etc. We lack decent services for many of these objects, and as a result efforts like PLoS Biodiversity Hub end up being underwhelming collections of reformatted journal articles, rather than innovative integrations of biodiversity knowledge.

With these issues in mind I've started playing with ZooKeys XML, initially looking at ways to display the article beyond the conventional format. Ultimately I'd like to embed the article in a broader web of citations and data. ZooKeys articles are available in PDF, HTML, and XML. The HTML has links to taxon pages, maps, etc., which is nice, but I personally find this a little jarring because it interrupts the reading experience. The ZooKeys web site also surrounds the article with all the paraphernalia of a publisher's web site:

As a first experiment, I've taken the XML for the article "At the lower size limit for tetrapods, two new species of the miniaturized frog genus Paedophryne (Anura, Microhylidae)" http://dx.doi.org/10.3897/zookeys.154.1963 and used an XSLT style sheet to reformat the article. I've borrowed some ideas from Nature's apps, such as the font for the title, displaying the abstract in bold, and showing all the figures in the article as thumbnails near the top. I've also added some basic interactivity, which you can see in the video below. Instead of figures being in one place in the article, wherever a figure is mentioned (e.g., "Fig. 1") you can click on the reference to the figure and it appears. If the article displays a point locality using latitude and longitude, instead of launching a separate browser window with a Google map, click on the locality and the map appears. The idea is that the flow of reading isn't interrupted: figures, maps, and citations all appear in the text.


This demo (which you can see live at http://iphylo.org/~rpage/zookeys) is limited, but most of its functionality comes from simply reformatting XML using XSLT. There's a little bit of jQuery for animation, and I ended up having to write a PHP script to convert verbatim latitude and longitude coordinates to the decimal coordinates expected by Google Maps, but it's all very lightweight. It wouldn't take much to add some JSON queries to make the taxon names clickable (e.g., showing a summary of a taxon from EOL). Because ZooKeys uses the NLM DTD for its XML, some of this code could also be applied to other journals, such as PLoS, so we could start to grow a library of linked, interactive taxonomic articles.
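
The coordinate conversion itself is just degrees-minutes-seconds arithmetic. Here's a minimal sketch of the idea (the regular expression is illustrative and won't handle every verbatim format that turns up in ZooKeys articles):

<?php
// Convert a verbatim coordinate such as 8°32'15"S to the decimal degrees
// (-8.5375) expected by Google Maps.
function verbatim_to_decimal($verbatim) {
    if (!preg_match('/(\d+)[°\s]+(\d+)[\'\s]+([\d.]+)"?\s*([NSEW])/u', $verbatim, $m)) {
        return null;
    }
    $decimal = $m[1] + $m[2] / 60.0 + $m[3] / 3600.0;
    if ($m[4] == 'S' || $m[4] == 'W') {
        $decimal = -$decimal; // southern and western hemispheres are negative
    }
    return $decimal;
}

echo verbatim_to_decimal('8°32\'15"S'); // -8.5375
?>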

Suggested apps for BHL's Life and Literature Code Challenge


Since I won't be able to be at the Biodiversity Heritage Library's Life and Literature meeting I thought I'd share some ideas for their Life and Literature Code Challenge. The deadline is pretty close (October 17) so having ideas now isn't terribly helpful, I admit. That aside, here are some thoughts inspired by the challenge. In part this post has been inspired by the Results of the PLoS and Mendeley "Call for Apps", where PLoS and Mendeley asked people (not necessarily developers) to suggest the kind of apps they'd like to see. As an aside, one thing conspicuous by its absence is a prize for winning the challenge. PLoS and Mendeley have an "API Binary Battle" with a prize of $US 10,001, which seems more likely to inspire people to take part.

Visual search engine
I suspect that many BHL users are looking for illustrations (exemplified by the images being gathered in BHL's Flickr group). One way to search for images would be to search within the OCR text for figure and plate captions, such as "Fig. 1". Indexing these captions by taxonomic name would provide a simple image search tool. For modern publications most figures are on the same page as the caption, but for older publications with illustrations as plates, the caption and corresponding image may be separated (e.g., on facing pages), so the search results might need to show pages around the page containing the caption. As an aside, it's a pity the Flickr images only link to the BHL item and not the BHL page. If they did the latter, and the images were tagged with what they depict, you could create a visual search engine using the Flickr API (of course, this might be just the way to implement the visual search engine — harvest images, tag them with the PageID and taxon names, and upload them to Flickr).
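
As a rough sketch of the caption-spotting step, a regular expression over the OCR text for a page would find candidate captions (the file name and pattern here are just placeholders):

<?php
// Minimal sketch: find candidate figure and plate captions in the OCR text for
// one BHL page, so they can be indexed by the taxonomic names found on that page.
$ocr = file_get_contents('bhl_page.txt'); // OCR text for a single page (placeholder name)

if (preg_match_all('/^\s*(Fig(?:ure)?s?\.?|Plate)\s+[IVXLC\d]+.*$/mi', $ocr, $matches)) {
    foreach ($matches[0] as $caption) {
        echo trim($caption), "\n"; // store these, keyed by the BHL PageID
    }
}
?>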

Mobile interface
The BHL web site doesn't look great on an iPhone. It makes no concessions to the mobile device, and there are some weird things such as the way the list of pages is rendered. A number of mainstream science publishers are exploring mobile versions of their web sites; for example, Taylor and Francis have a jQuery Mobile powered interface for mobile users. I've explored iPad interfaces to scientific articles in previous posts. BHL content poses some challenges, but is fundamentally the same as viewing PDFs — you have fixed pages that you may want to zoom.

OCR correction
There is a lot of scope for cleaning up the OCR text in BHL. Part of the trick would be to have a simple user interface for people to contribute to this task. In an earlier post I discussed a Firefox hOCR add-on that provides a nice way to do this. Take this as a starting point, add a way to save the cleaned-up text, and you'd be well on the way to making a useful tool.

Taxon name timeline
Despite the shiny new interface, the Encyclopedia of Life still displays BHL literature in the same clunky way I described in an earlier blog post. It would be great to have a timeline of the usage of a name, especially if you could compare the usage of different names (such as synonyms). In many ways this is the BHL equivalent of the Google Books Ngram viewer.

These are just a few hastily put together thoughts. If you have any other ideas or suggestions, feel free to add them as comments below.

- Posted using BlogPress from my iPad

Viewing scientific articles on the iPad: cloning the Nature.com iPhone app using jQuery Mobile

Over the last few months I've been exploring different ways to view scientific articles on the iPad, summarised here. I've also made a few prototypes, either from scratch (such as my response to the PLoS iPad app) or using Sencha Touch (see Touching citations on the iPad).

Today, it's time for something a little different. The Sencha Touch framework I used earlier is huge and wasn't easy to get my head around. I was resigning myself to trying to get to grips with it when jQuery Mobile came along. Still in alpha, jQuery Mobile is very simple and elegant, and writing an app is basically a case of writing HTML (with a little Javascript here and there if needed). It has a few rough edges, but it's possible to create something usable very quickly. And, it's actually fun.

So, to learn a bit more about how to use it, I decided to see if I could write a "clone" of Nature.com's iPhone app (which I reviewed earlier). Nature's app is in many ways the most interesting iOS app for articles because it doesn't treat the article as a monolithic PDF, but rather uses the ePub format. As a result, you can view figures, tables, and references separately.

The clone
You can see the clone here.



I've tried to mimic the basic functionality of the Nature.com app in terms of transitions between pages, display of figures, references, etc. In making this clone I've focussed on just the article display.

A web app is going to lack the speed and functionality of a native app, but is probably a lot faster to develop. It also works on a wider range of platforms. jQuery Mobile is committed to supporting a wide range of platforms, so this clone should work on platforms other than the iPad.

The Nature.com app has a lot of additional functionality apart from just displaying articles, such as listing the latest articles from Nature.com journals, managing a user's bookmarks, and enabling the user to buy subscriptions. Some of this functionality would be pretty easy to add to this clone, for example by consuming RSS feeds to get article lists. With a little effort one could have a simple, web-based app to browse Nature content across a range of mobile devices.

Technical stuff

Nature's app uses the ePub format, but Nature's web site doesn't provide an option to download articles in ePub format. However, if you use a HTTP debugging proxy (such as Charles Proxy) when using Nature's app you can see the URLs needed to fetch the ePub file.

I grabbed a couple of ePub files for articles in Nature Communications and unzipped them (.epub files are zip files). The iPad app is a single HTML file that uses some Ajax calls to populate the different views. One Ajax call takes the index.html that has the article text and replaces the internal and external links with calls to Javascript functions. An article's references, figure captions, and tables are stored in separate XML files, so I have some simple PHP scripts that read the XML and extract the relevant bits. Internal links (such as to figures and references) are handled by jQuery Mobile. External links are displayed within an iFrame.
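
For example, one of those PHP scripts might look something like this. The file name and element names below are assumptions — the XML inside Nature's ePub files has its own structure — but the pattern (read the XML, return JSON for the Ajax call) is the same:

<?php
// Minimal sketch of an Ajax endpoint: read a references XML file from the
// unzipped ePub and return the citations as JSON for the app to display.
// File name and element names are assumptions for illustration.
header('Content-type: application/json');

$xml = simplexml_load_file('epub/OEBPS/references.xml');

$references = array();
foreach ($xml->xpath('//ref') as $ref) {
    $references[] = array(
        'title'   => (string) $ref->title,
        'journal' => (string) $ref->journal,
        'doi'     => (string) $ref->doi
    );
}

echo json_encode($references);
?>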

There are some intellectual property issues to address. Nature isn't an Open Access journal, but some articles in Nature Communications are (under the Creative Commons Attribution-NonCommercial-Share Alike 3.0 Unported License), so I've used two of these as examples. When it displays an article, Nature's app uses Droid fonts for the article heading. These fonts are supplied as an SVG file contained within the ePub file. Droid fonts are available under an Apache License as TrueType fonts as part of the Android SDK. I couldn't find SVG versions of the fonts in the Android SDK, so I use the TrueType fonts (see Jeffrey Zeldman's Web type news: iPhone and iPad now support TrueType font embedding. This is huge.). Oh, and I "borrowed" some of the CSS from the style.css file that comes with each ePub file.

BHL and the iPad

@elyw I'd leave bookmarking to 3rd party, e.g. Mendeley. #bhlib specific issues incl. displaying DjVu files, and highlighting taxon names



Quick mock-up of a possible BHL iPad app (made using OmniGraffle), showing a paper from BioStor (http://biostor.org/reference/50335). The idea is to display a scanned page at a time, with the taxonomic names on the page being clickable (for example, the user might get a list of other BHL content for this name). To enable quick navigation, all the pages in the document being viewed are displayed in a scrollable gallery below the main page.


Key to making this happen is being able to display DjVu files in a sensible way, maybe building on DjVu XML to HTML. Because BHL content is scanned, it makes sense to treat content as pages. We could extract OCR text and display that as a continuous block of text, but the OCR is sometimes pretty poor, and we'd also have to parse the text and interpret its structure (e.g., this is the title, these are section headings, etc.), and that's going to be hard work.
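
As a starting point, the DjVu XML gives you each OCR word together with its position on the page, which is enough to lay clickable text over the page image. A minimal sketch (element names follow the DjVu XML I've worked with; treat the coordinate order as an assumption to check):

<?php
// Minimal sketch: turn the WORD elements in DjVu XML into absolutely positioned
// HTML divs overlaid on the scanned page image, so taxon names can be made clickable.
$xml = simplexml_load_file('page.xml'); // DjVu XML for one scanned page (placeholder name)

foreach ($xml->xpath('//WORD') as $word) {
    // coords is assumed to be "left,bottom,right,top" in page pixels
    list($left, $bottom, $right, $top) = explode(',', (string) $word['coords']);
    printf("<div class=\"word\" style=\"position:absolute;left:%dpx;top:%dpx;\">%s</div>\n",
        $left, $top, htmlspecialchars((string) $word));
}
?>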

Touching citations on the iPad

Quick demo of the mockup I alluded to in the previous post. Here's a screen shot of the article "PhyloExplorer: a web server to validate, explore and query phylogenetic trees" (doi:10.1186/1471-2148-9-108) as displayed as a web-app on the iPad. You can view this at http://iphylo.org/~rpage/ipad/touch/ (you don't need an iPad, although it does work rather better on one).

I've taken the XML for the article, and redisplayed it as HTML, with (most) of the citations highlighted in blue. If you touch one (or click on it if you're using a desktop browser) then you'll see a popover with some basic bibliographic details. For some papers which are Open Access I've extracted thumbnails of the figures, such as for "PhyloFinder: an intelligent search engine for phylogenetic tree databases" (doi:10.1186/1471-2148-8-90), shown above (and in more detail below):

The idea is to give the reader a sense of what the paper is about, beyond what can be gleaned from just the title and authors. The idea was inspired by the Biotext search engine from Marti Hearst's group, as well as Elsevier's "graphical abstract" noted by Alex Wild (@Myrmecos).

Here's a quick screencast showing it "live":



The next step is to enable the reader to then go and read this paper within the iPad web-app (doh!), which is fairly trivial to do, but it's Friday and I'm already late...

CouchDB, Mendeley, and what I really want in an iPad article viewer

Playing with @couchdb, starting to think of the Mendeley API as a read/write JSON store, and having a reader app built on that...



It's slowly dawning on me that many of the ingredients for an alternative way to browse scientific articles may already be in place. After my first crude efforts at what an iPad reader might look like I've started afresh with a new attempt, based on the Sencha Touch framework. The goal here isn't to make a polished app, but rather to get a sense of what could be done.

The first goal is to be able to browse the literature as if it was a connected series of documents (which is what, of course, it is). This requires taking the full text of an article, extracting the citations, and making them links to further documents (also with their citations extracted, and so on). Leaving aside the obvious problem that this approach is limited to open access articles, an app that does this is going to have to store a lot of bibliographic data as the reader browses the literature (otherwise we're going to have to do all the processing on the fly, and that's not going to be fast enough). So, we need some storage.

MySQL
One option is to build a MySQL database to hold articles, books, etc. Doable (I've done more of these than I care to remember), but things get messy pretty quickly, especially as you add functionality (tags, fulltext, figures, etc.).

RDF
Another option is to use RDF and a triple store. I've played with linked data quite a bit lately (see previous "Friday follies" here and here), and I thought that a triple store would be a great way to support an article browser (especially as we add additional kinds of data, such as sequences, specimens, phylogenies, etc.). But linked data is a mess. For the things I care about there are either no canonical identifiers, or too many, and rarely does the primary data provider serve linked data compliant URLs (e.g., NCBI), hence we end up with a plethora of wrappers around these sources. Then there's the issue of what vocabularies to use (once again, there are either none, or too many). As a query language SPARQL isn't great, and don't even get me started on the issue of editing data. OK, so I get the whole idea of linked data, it's just that the overhead of getting anything done seems too high. You've got to get a lot of ducks to line up.

So, I started playing with CouchDB, in a fairly idle way. I'd had a look before, but didn't really get my head around the very different way of querying a database that CouchDB requires. Despite this learning curve, CouchDB has some great features. It stores documents in JSON, which makes it trivial to add data as objects (instead of mucking around with breaking them up into tables for SQL, or atomising them into triples for RDF), it supports versioning right out of the box (vital because metadata is often wrong and needs to be tidied up), and you talk to it using HTTP, which means no middleware to get in the way. You just point your browser (or curl, or whatever HTTP tool you have) at it and send GET, POST, PUT, or DELETE commands. And now it's in the cloud.
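
To give a flavour of how simple this is, here's a minimal sketch of storing an article's metadata in CouchDB from PHP, using nothing more than curl and JSON (the database name and fields are just examples):

<?php
// Minimal sketch: PUT an article's metadata into a CouchDB database called
// "articles", using the DOI as the document id. No middleware, just HTTP + JSON.
$doc = array(
    'title' => 'PhyloExplorer: a web server to validate, explore and query phylogenetic trees',
    'doi'   => '10.1186/1471-2148-9-108',
    'year'  => 2009
);

$ch = curl_init('http://127.0.0.1:5984/articles/' . urlencode($doc['doi']));
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'PUT');
curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($doc));
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: application/json'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

echo curl_exec($ch); // e.g. {"ok":true,"id":"...","rev":"1-..."}
curl_close($ch);
?>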

In some ways ending up with CouchDB (or something similar) seems inevitable. The one "semantic web" tool that I've made most use of is Semantic MediaWiki, which powers the NCBI to Wikipedia mapping I created in June. Semantic MediaWiki has its uses, but occasionally it has driven me to distraction. But, when you get down to it, Semantic MediaWiki is really just a versioned document store (where the documents are typically key-value pairs), over which have been laid a pretty limited query language and some RDF export features. Put like this, most of the huge MediaWiki engine underlying Semantic MediaWiki isn't needed, so why not cut to the chase and use a purpose-built versioned document store? Enter CouchDB.

Browsing and Mendeley
So, what I have in mind is a browser that crawls a document, extracting citations, and enabling the reader to explore those. Eventually it will also extract all the other chocolatey goodness in an article (sequences, specimens, taxonomic names, etc.), but for now I'm focussing on articles and citations. A browser would need to store article metadata (say, each time it encounters an article for the first time), as well as update existing metadata (by adding missing DOIs, PubMed ids, citations, etc.), so what easier way than as JSON in a document store such as CouchDB? This is what I'm exploring at the moment, but let's take a step back for a second.

The Mendeley API, as poorly developed as it is, could be treated as essentially a wrapper around a JSON document store (the API stores and returns JSON), and speaks HTTP. So, we could imagine a browser that crawls the Mendeley database, adding papers that aren't in Mendeley as it goes. The act of browsing and reading would actively contribute to the database. Of course, we could spin this around, and argue that a crawler + CouchDB could pretty effectively create a clone of Mendeley's database (albeit without the social networking features that come with having a large user community).

This is another reason why the current crop of iPad article viewers, Mendeley's included, are so disappointing. There's the potential to completely change the way we interact with the scientific literature (instead of passively consuming PDFs), and Mendeley is ideally positioned to support this. Yes, I realise that for the vast majority of people being able to manage their PDFs and format bibliographies in MS Word are the killer features, but, seriously, is that all we aspire to?

Viewing scientific articles on the iPad: browsing articles

In previous articles I've looked at how various apps display scientific articles: the Nature.com app, the PLoS Reader, Papers, Mendeley, and iBooks.

So, where next? As Ian Mulvany noted in a comment on an earlier post, I haven't attempted to summarise the best user interface metaphors for navigation. Rather than try and do that in the abstract, I'd like to create some prototypes to play with various ideas. The Sencha Touch framework looks a good place to start. It's web-based, so things can be prototyped rapidly (I'm not going to learn Objective C anytime soon). There's a moderately steep learning curve, unless you've written a lot of Javascript (I've done some, but not a lot), but it seems to offer a lot of functionality. Another advantage of developing a web app is that it keeps the focus on making the content accessible across devices, and using the web as the means to display and interact with content.

Then there is also the issue (in addition to displaying an individual article) of how to browse and find articles to view. Here are some possibilities.

Publisher's stream
Apps such as the Nature app and the PLoS Reader provide you with a stream of articles from a single publisher. This is obviously a bit limiting for the reader, but might have some advantages if the publisher has specifically enhanced their content for devices such as the iPad.

Personal library
Apps such as Mendeley and Papers provide articles from your personal library. These are papers you care about, and ones you may make active use of.

Social
Social readers such as Flipboard show the power of bringing together in one place content derived from social streams, such as Twitter and Facebook, as well as curated sources and publisher streams. Mendeley and other social bookmarking services (e.g., CiteULike, Connotea) could be used to provide similar social streams of papers for an article viewer. Here the goal is probably to find out what papers people you know find interesting.

Spatial
In an earlier post I used a map to explore papers in my BioStor archive. This would be an obvious thing to add to an iPad app, especially as the iPad knows where you are. Hence, you could imagine browsing papers about areas that are near you, or perhaps by authors near you. This would be useful if, say, you wanted to know about ecological or health studies of the area you live in. If the geographic search was for people rather than papers, you could easily discover what kind of research is published by universities or other research bodies near your current location.

Of course, Earth is not the only thing we can explore spatially. Google Maps can display other bodies in the solar system (e.g., Mars), as well as the night sky. Imagine being interested in astronomy and being able to browse papers about specific planetary or stellar objects. Likewise, genomes can be browsed using Google Maps-inspired browsers (e.g., jBrowse), so we could have an app where you could easily retrieve articles about a particular gene or other region of a genome.

Categories
Another way to browse content is by topic. Classifying knowledge into categories is somewhat fraught, but there are some obvious ways this could be useful. A biologist might want to navigate content by taxonomic group, particularly if they want to browse through the thousands of articles published in a journal such as Zootaxa (hence my experiments on browsing EOL). Of course, a tree is not the only way to navigate hierarchical content. Treemaps are another example, and I've played with various versions in the past (see here and here).


I have a love-hate relationship with treemaps, but some of the most interesting work I've seen on treemaps has been motivated by displaying information on small screens, e.g. "Using treemaps to visualize threaded discussion forums on PDAs" (doi:10.1145/1056808.1056915).

Summary
These notes list some of the more obvious ways to browse a collection of articles. It would be fun to explore these (and other approaches) in parallel with thinking about how to display the actual articles. These two issues are related, in the sense that the more metadata we can extract from the articles (such as keywords, taxonomic names and other named entities, geographic localities, etc.) the richer the possibilities for finding our way through those articles.

Viewing scientific articles on the iPad: iBooks

Apple's iBooks app is an ePub and PDF reader, and one could write a lengthy article about its interface. However, in the context of these posts on visualising the scientific article there's one feature that has particularly struck me. When reading a book that cites other literature, the citations are hyper-links: click on one and iBooks forwards you (via the page turning effect) to the reference in the book's bibliography. This can be a little jarring (one minute you're reading the page, next you're in the bibliography), but to help maintain context the reference is preceded by the snippet of text in which it is cited:


To make this concrete, here's an example from Clay Shirky's "Cognitive Surplus."


In the body of the text (left) the text "notes in his book The Success of Open Source" (which I've highlighted in blue) is a hyper-link. Click on it, and we see the source of the citation (right), together with the text that formed the hyper-link. This context helps remind you why you wanted to follow up the citation, and also provides the way back to the text: click on the context snippet and you're taken back to the original page.

Providing context for a citation is a nice feature, and there are various ways to do this. For example, the Elsevier Life Sciences Challenge entry by Wan et al. ("Supporting browsing-specific information needs: Introducing the Citation-Sensitive In-Browser Summariser", doi:10.1016/j.websem.2010.03.002, see also an earlier version on CiteSeer) takes a different approach. Rather than provide local context for a citation in an article (a la iBooks), Wan et al. provide context-sensitive summaries of the reference cited to help the reader judge whether it's worth her time to fetch the reference and read it.

Both of these approaches suggest that we could be a lot more creative about how we display and interact with citations when viewing an article.

Viewing scientific articles on the iPad: Mendeley

Previously I've looked at the Nature, PLoS, and Papers apps, now it's the turn of the Mendeley iPad app. As before, this isn't a review of the app as such, I'm more interested in documenting how the app interface works, with a view to discovering if there are consistent metaphors we can use for navigating bibliographic databases.

Perhaps the key difference between Mendeley and the other apps is that Mendeley is cloud-based, in that the bibliography exists on Mendeley's servers, as well as locally on your desktop, iPad, or iPhone. Hence, whereas the Nature and PLoS apps consume a web stream of documents, and Papers enables you to sync collections between desktop and iOS devices, Mendeley syncs to a central web server. At present this appears to be done over HTTPS. Mendeley recently released an API, which I've discussed at length. Mendeley's app doesn't use this API, which is a pity because if it did I suspect the API would be getting the love it needs from Mendeley's developers.

Like Papers, the Mendeley app uses a split view, where the left-hand panel is used for navigation.


You can drill down to lists of references, and display basic details about an article.

PDF
The Mendeley app is a PDF viewer, but whereas the PLoS app has page turning, and the Papers app scrolls pages from left to right, the Mendeley app displays PDF pages vertically (which is probably the more natural way to scroll through content on the iPad):


Summary
It's clearly early days for the Mendeley app, but it's worth noting two of its most obvious limitations. Firstly, it depends entirely on the user's existing Mendeley bibliography - you can't add to this using the app, it's simply a viewer. Compare this to Papers, which can access a suite of search engines from which you can download new papers (albeit with some limitations, for example the Papers iPad app doesn't seem to support extracting metadata via XMP, unlike the desktop version). Secondly, despite one of Mendeley's stated goals being a
research network that allows you to keep track of your colleagues' publications, conference participations, awards etc., and helps you discover people with research interests similar to yours

the Mendeley app lacks any social features, apart from sharing by email(!). I think designing social interactions in bibliographic apps will be a challenge. For an example of what social reading can look like, check out Flipboard.

Viewing scientific articles on the iPad: Papers

Continuing the series of posts about reading scientific articles on the iPad, here are some quick notes on perhaps the most polished app I've seen, Papers for iPad. As with earlier posts on the Nature and PLoS apps, I'm not writing an in-depth review - rather I'm interested in the basic interface design.

Papers is available for the Mac, as well as the iPhone and iPad. Unlike social bibliographic apps such as Zotero and Mendeley, Papers lacks a web client. Instead, all your PDFs are held on your Mac, which can be wirelessly synced with Papers on the iPad or iPhone.

Navigation popover

Papers makes extensive use of the split view, in which the screen is split into two panes, the left-hand split becoming a popover when you hold the iPad in portrait orientation. Almost all of the functionality of the iPhone version is crammed into the left-hand split. The popover displays the main interface categories (library, help, collections that you've put PDFs into), collections of documents, metadata for individual papers (which you can edit), as well as search results from a wide range of databases:


Some of these features you encounter as you drill down, say from library to list of papers, to details about a document, others you can access by clicking on the tab bar at the bottom.

PDF display
Like the PLoS app, Papers displays PDFs. It doesn't use a page-turning effect, rather you swipe through the pages from left to right, with the current page indicated below in a page control (what Sencha Touch describe as a carousel control).


Given that the document being displayed is a PDF there is no interaction with the images or citations, but you can add highlights and annotations.

Summary
Papers is the first of the iPad apps I've discussed that isn't limited to a single publisher. If an article is online, or in your copy of Papers for the Mac, then you can view it in Papers for iPad. It is the app that I use on a day-to-day basis, although the PDF viewer can feel a little clunky. I think anyone designing an article reader should play with Papers for a while, if only to see the level of functionality that can be embedded in the basic iPad split view.

Navigating the Encyclopedia of Life tree on the desktop and the iPhone

This week seems to be API week. The Encyclopedia of Life API Beta Test has been out since August 12th. By comparison with the Mendeley API that I've spent rather too much time trying to get to grips with, the EOL API release seems rather understated.

However, I've spent the last couple of days playing with it in order to build a simple tree navigating widget, which you can view at http://iphylo.org/~rpage/eoltree/.

The widget resembles Aaron Thompson's Taxonomy (formerly called KPCOFGS) iPhone app in that it uses the iPhone table view to list all the taxa at a given level in a taxonomic tree. Clicking on a row in this table takes you to the descendants of the corresponding taxon, and clicking "Back" takes you back up the tree. If you've reached a leaf node (typically a species) the widget displays a snippet of information about that taxon. It also resembles Javier de la Torre's taxonomic browser written in Flex.

Here's a screen shot of the widget running in a desktop web browser:


Here's the same widget in the iPhone web browser:

Using the API
The EOL API is pretty straightforward. I call the http://www.eol.org/api/docs/hierarchy_entries API to get the tree rooted at a given node, then populate each child of that node using http://www.eol.org/api/docs/pages. The result is a simple JSON file that I cache locally to speed up performance and avoid hitting the EOL servers for the same information. Because I'm caching the API calls locally I need a couple of PHP scripts to do this, but everything else is HTML and Javascript.
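
The cache is about as simple as it gets: key each API call by a hash of its URL and only call EOL when the response isn't already on disk. A minimal sketch (the example URL template is an assumption — use whatever the EOL API documentation specifies):

<?php
// Minimal sketch of the local cache for EOL API calls.
function cached_get($url, $cache_dir = 'cache') {
    $filename = $cache_dir . '/' . md5($url) . '.json';
    if (file_exists($filename)) {
        return file_get_contents($filename); // cache hit, don't touch EOL's servers
    }
    $json = file_get_contents($url);
    file_put_contents($filename, $json);
    return $json;
}

// Hypothetical example call; check the EOL API docs for the actual URL format.
$json = cached_get('http://www.eol.org/api/hierarchy_entries/1.0/XXXXX.json');
?>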

iPhone and iPad
I've not really developed this for the iPhone. I've cobbled together some crude Javascript to simulate some iPhone-like effects, but if I was serious about the phone I'd look into one of the Javascript kits available for iPhone development. However, I did want something that was similar in size to the iPhone screen. The reason is I'm looking at adding taxonomic browsing to the geographic browser I described in the post Browsing a digital library using a map, so I wanted something easy to use but which didn't take up too much space. In the same way that the Pygmybrowse tree viewer I played with in 2006 was a solution to viewing a tree on a small screen, I think developing for the iPhone forces you to strip things down to the bare essentials.

I'm also keeping the iPad in mind. In portrait mode some apps display lists in a popover like this:


This popover takes up a similar amount of screen space to the entire iPhone screen, so if I was to have a web app (or native app) that had taxonomic navigation, I'd want it to be about the size of the iPhone.

Let me know what you think. Meantime I need to think about bolting this onto the map browser, and providing a combined taxonomic and geographic perspective on a set of documents.

Viewing scientific articles on the iPad: the PLoS Reader

Continuing on from my previous post Viewing scientific articles on the iPad: towards a universal article reader, here are some brief notes on the PLoS iPad app that I've previously been critical of.

There are two key things to note about this app. The first is that it uses the page turning metaphor. The article is displayed as a PDF, a page at a time, and the user swipes the page to turn it over. Hence, the app is simulating paper on the iPad screen.



But perhaps more interesting is that, unlike the Nature app discussed earlier, the PLoS app doesn't use a custom API to retrieve articles. Instead the app uses RSS feeds from the PLoS site. PLoS provides journal-specific RSS feeds, as well as subject-specific feeds within journals (see, for example, the PLoS ONE home page). The PLoS Reader app takes these feeds and uses them to create a list of articles the reader can choose from.

A nice feature of the PLoS ATOM feeds is the provision of links to alternative formats for the article (unlike many journal RSS feeds, which provide just a DOI or a URL). For example, the feed item for the article "Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" doi:10.1371/journal.pone.0012303 contains links to the PDF and XML versions of the article:


<link rel="related"
type="application/pdf"
href="http://www.plosone.org/article/fetchObjectAttachment.action?uri=info:doi/10.1371/journal.pone.0012303&representation=PDF"
title="(PDF) Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" />
<link rel="related"
type="text/xml"
href="http://www.plosone.org/article/fetchObjectAttachment.action?uri=info:doi/10.1371/journal.pone.0012303&representation=XML"
title="(XML) Transmission of Single HIV-1 Genomes and Dynamics of Early Immune Escape Revealed by Ultra-Deep Sequencing" />


This makes the task of an article reader much easier. Rather than attempt to screen scrape the article web page, or rely on a rule for constructing the link to the desired file, the feed provides an explicit URL to the different available formats.

I've not seen this feature in other journal RSS feeds, although article web pages sometimes provide this information. BMC journals, for example, provide <link rel="alternate"> tags in the web page for each article, from which we can extract links to the XML and PDF versions, and some journals (BMC included) provide the Google Scholar metadata tag <meta name="citation_pdf_url"> to link to the PDF. Hence, a generic article reader will need to be able to extract metadata tags from article web pages as it seeks formats suitable to display.
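
The scraping itself is straightforward with PHP's DOMDocument — grab the article page, then look for <link rel="alternate"> and <meta name="citation_pdf_url"> tags. A minimal sketch (the URL below is a placeholder):

<?php
// Minimal sketch: extract alternative format links and the Google Scholar
// citation_pdf_url tag from an article's web page.
$doc = new DOMDocument();
@$doc->loadHTMLFile('http://www.example.org/article'); // placeholder article URL

foreach ($doc->getElementsByTagName('link') as $link) {
    if ($link->getAttribute('rel') == 'alternate') {
        echo $link->getAttribute('type'), ' ', $link->getAttribute('href'), "\n";
    }
}

foreach ($doc->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('name') == 'citation_pdf_url') {
        echo 'PDF ', $meta->getAttribute('content'), "\n";
    }
}
?>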

Viewing scientific articles on the iPad: towards a universal article reader

There are a growing number of applications for viewing scientific articles coming out for the iPhone and iPad. I'm toying with extending the experiments described in an earlier post when I took the PLoS iPad app to task for being essentially a PDF page-turner, so I thought I should take a more detailed look at the currently available apps. In particular, I'm interested in how the apps solve some basic tasks, and whether there is a consistent "vocabulary" for interacting with an article. Put less pretentiously, do the apps display things such as lists of articles, citations, references, figures, and bibliographic data in similar ways, or does the user have to learn new rules for each app? I'm also interested in how the apps treat the article (e.g., as a monolithic PDF, as a document with pages, or as a web document where pagination has no meaning), and how they get their content (from a publisher, from the user's social network, from the user's personal library).

Nature
In this post I'm going to look at Nature.com's app. Future posts will explore other apps. I'm interested in what people have done so far, and how we could improve the reading experience. Long term I'm interested in whether there's scope for a "universal article reader" that can take diverse formats (including XML, PDF, and page images) and display them in a consistent and useful way. In the diagrams below I'm using touch gesture symbols from Graffletopia (see Touch Gesture Reference Guide).



Contents
Nature's app is limited to articles published by Nature, and displays the available articles as a list with thumbnails of a figure from the article. The app fetches this list using Nature's mobile API. Up until April 30th the fulltext of an article was free, but at present you are limited to getting abstracts. It's interesting that the list of articles isn't retrieved using an RSS feed, I presume because Nature wanted to use some simple authentication to avoid users downloading all their closed-access content for free.


Article display
Nature's app, unlike all the others I've seen so far, doesn't use PDFs. Instead it uses ePub. Unlike many ePub book readers (including Apple's own iBooks), the Nature app doesn't render the article as a series of pages, but as one continuous document that you scroll down by dragging (it's essentially a web page). You can't zoom the text, but the text size is fine for reading.


Citations
Citations in the body of the article are links. If you tap them the full citation slides in from the right, with a link to the publisher's website. If you tap the link the app opens the website within the app. This can be a little jarring as you move from a customised view of an article to a web page designed for a desktop. In the case of a Nature article, it would be more elegant if the app recognised that the cited reference was a Nature article and rendered it natively in the app. More generally the transition between app and website might be less jarring if journal publishers developed mobile versions of their websites.



Figures
The figures aren't displayed directly in the body of the article, but each mention of a figure in the body of the text is a link. Tapping the link causes the figure to slide up from the bottom of the screen. A button in the top right hand corner enables you to toggle between displaying the figure and its caption (shown as white text on a black background). You can use pinch and spread to zoom in and out of the figure, as well as save it to the photo library on your device.



Summary
I've started with the Nature app as I think it's the only one so far to seriously tackle the challenge of displaying an article on a mobile device. Instead of displaying PDFs it repackages the articles in ePub format and the result is much more interactive than a PDF.

I hope to explore other article viewing apps in later posts, but it's worth noting that we should also be looking at other apps for ideas. Personally I really like the Guardian's iPhone app, which I use as my main news reader. It has a nice gallery feature to display thumbnails of images (imagine a gallery of an article's figures), and uses tags effectively.


PLoS doesn't "get" the iPad (or the web)

PLoS recently announced a dedicated iPad app that covers all the PLoS journals and is available from the App Store. Given the statement that "PLoS is committed to continue pushing the boundaries of scientific communication" I was expecting something special. Instead, what we get (as shown in the video below) is a PDF viewer with a nice page turning effect (code here). Maybe it's Steve Jobs' fault for showing iBooks when he first demoed the iPad, but the desire to imitate 3D page turning effects leaves me cold (for a nice discussion of how this can lead to horribly mixed metaphors see iA's Designing for iPad: Reality Check).




But I think this app shows that PLoS really don't grok the iPad. Maybe it's early days, but I find it really disappointing that page-turning PDFs is the first thing they come up with. It's not about recreating the paper experience on a device! There's huge scope for interactivity, which the PLoS app simply ignores — you can't select text, and none of the references are links. It also ignores the web (without which, ironically, PLoS couldn't exist).

Instead of just moaning about this, I've spent a couple of days fussing with a simple demo of what could be done. I've taken a PLoS paper ("Discovery of the Largest Orbweaving Spider Species: The Evolution of Gigantism in Nephila", doi:10.1371/journal.pone.0007516), grabbed the XML, applied an XSLT style sheet to generate some HTML, and added a little Javascript functionality. References are displayed as clickable links inline. If you click on one, a window pops up displaying the citation, and it then tries to find it for you online (for the technically minded, it's using OpenURL and bioGUID). If it succeeds it displays a blue arrow — click that and you're off to the publisher's web site to view the article.
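
The transformation step is a handful of lines of PHP; here's a minimal sketch (the file names are placeholders for the downloaded article XML and my style sheet):

<?php
// Minimal sketch: apply an XSLT style sheet to the PLoS article XML to produce HTML.
$xml = new DOMDocument();
$xml->load('journal.pone.0007516.xml'); // placeholder: the downloaded article XML

$xsl = new DOMDocument();
$xsl->load('plos2html.xsl');            // placeholder: the XSLT style sheet

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);

echo $processor->transformToXML($xml);
?>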

Figures are also links: click on one and you get a Lightbox view of the image.
You can view this article live, in a regular browser or on an iPad. Here's a video of the demonstration page:


This is all very crude and rushed. There's a lot more that could be done. For references we could flag which articles are self citations, we could talk to bookmarking services via their APIs to see which citations the reader already has, etc. We could also make data, sequences, and taxonomic names clickable, providing the reader with more information and avenues for exploration. Then there's the whole issue of figures. For graphs we should have the underlying data so that we can easily make new visualisations, phylogenies should be interactive (at least make the taxon names clickable), and there's no need to amalgamate figures into aggregates like Fig. 2 below. Each element (A-E) should be separately addressable so that when the text refers to Fig. 2D we can show the user just that element.

[Fig. 2 from doi:10.1371/journal.pone.0007516, a composite figure with elements A-E]

The PLoS app and reactions to Elsevier's "Article 2.0" (e.g., Elsevier's 'Article of the Future' resembles websites of the past and The “Article of the Future” — Just Lipstick Again?) suggests publishers are floundering in their efforts to get to grips with the web, and new platforms for interacting with the web.

So, PLoS, I challenge you to show us that you actually "get" the iPad and what it could mean for science publishing. Because at the moment, I've seen nothing that suggests you grasp the opportunity it represents. Better yet, why not revisit Elsevier's Article 2.0 project and have a challenge specifically about re-imagining the scientific article? And please, no more page turning effects.

Why I want an iPad

OK, first of all, I want one, I want one real bad.

There's been a general sense of disappointment about the iPad, which I suspect is only natural given the enormous hype leading up to the announcement, as well as the fact that the applications shown were fairly conventional. Personally I don't think book reading is where the action is. For time-sensitive stuff like newspapers, and rich, complex documents such as scientific papers, sure, but physical books strike me as a piece of technology that we're not really going to improve on, rather like knives and forks.

But some grasp that this is magic. What I hope the iPad will do is finally move some visualisation tools into the mainstream (as much as phylogenetics can be thought of as mainstream). The challenge of visualising large phylogenies has yielded some cool tools which, sadly, remained under-developed, such as TreeJuxtaposer, which seems clunky and counter-intuitive when using a mouse, but with a touch screen would just be awesome.



Tools such as Paloverde would also be more intuitive to use, as would the magnifier feature in Dendroscope. Imagine "pinch and zoom" in TreeJuxtaposer or Dendroscope, or for viewing large sequence alignments.

Then there's the existing tabletop tools that I blogged about earlier:



And of course there's Perceptive Pixel's view of a taxonomic classification:



There would be some work involved in porting these tools to the iPad (e.g., porting code from Java to Objective C in the case of TreeJuxtaposer and Dendroscope), but the person who does this is going to have an impact on this field comparable to that of the Maddison brothers when they released MacClade in 1986.