Taxonomy Watch: 2009

Friday, December 11, 2009

Designing for Faceted Search

Special Report – Designing for Faceted Search by Stephanie Lemieux (with Seth Earley & Associates), AltSearchEngines (Dec 2009)

This article, originally published in KMWorld in March 2009, is reproduced with diagrams and illustrations in AltSearchEngines. It offers good, practical advice on designing for faceted search.

Wednesday, December 09, 2009

Portfolio of articles on Taxonomies and Tagging

The FUMSI Folio on Taxonomies and Tagging looks to be a useful reference for taxonomists. This is a collection of articles on taxonomies and tagging practices published in October 2009. Only $64 US.

Table of Contents:

* Editorial Introduction by Karen Loasby, Contributing Editor, Manage
* Taxonomies and Tagging Survey Results
* Creating User Centred Taxonomies: Part One, by James Kelway
* Creating User Centred Taxonomies: Part Two, by James Kelway
* Folksonomies: Business Use, by Fran Alexander
* Automatic Classification: A Panel Discussion, by Karen Loasby
* Image Findability: Improving through Tags, by Ian Davis
* Becoming a Taxonomist: Real Life Stories, by Karen Loasby
* Recommended Resources

Friday, November 27, 2009

Digital Asset Management and Metadata for Images and Video

This piece was previously posted on - 'Ian Davis - managing information'

Missing out on the recent Photo Metadata Conference - http://bit.ly/6PlLJj - has reminded me how much I love working in the DAM world, in particular in the area of creating metadata and controlled vocabularies to support digital image and video search and browse.

Reading about the Photo Metadata Conference programme, it seems there were some great presentations. I downloaded them all (they're available from the conference website) and had great fun going through all the excellent experiences, comments and ideas.

I wish I'd been there for Madi Solomon's keynote on the collapse of boundaries in the digital world. I agree that it's less and less about what format an asset is in and more about what that asset is, and how it needs to be organised to support its use.

Assets need to work for their places in the world. Finding them and using them needs to be simpler, and metadata and controlled vocabularies need to support and enable this.

Understanding the assets an organisation has, analysing the needs of that organisation, and ensuring it has what it needs and that each asset is organised to support its use, is where the really exciting and satisfying work is for me.

Having worked for Corbis from 1991 to 1999, in the early research and development days of digital image organisation and sale, I was excited to see Max Wieberneit's presentation on still and video metadata.

Video and still images have much in common. I've blogged about this in the past and it's still a big area for me. Both asset types have technical metadata, depicted content metadata and aboutness metadata, to name but a few. Add to this the sound tracks for video - which can be indexed for retrieval, and the ability to segment video into scenes and key frames, and you have an exciting mix of metadata across both formats.

I agree with Max that using established metadata systems makes a huge amount of sense, as does working to get as much metadata as possible from the creators or custodians of images and video - it's much easier to capture metadata early on in the creation process than down the line, and some metadata will be lost if you leave its capture too late.

As Max says, one key concern for image and video asset metadata is the users of the assets. Different people have different needs and need different metadata. For many people a good level of access to video can be built using initial metadata associated with the videos, key scene and frame analysis and the indexing of the audio tracks of the videos. Whereas for others, access to the mood of the video may only come through music analysis, lack of noise at key moments, and manually applied subject tags.

On the image side, as Max says, editorial users have somewhat different needs from commercial users of stock photos. Max showed a great slide listing a long set of conceptual keywords: 'comfortable, dreaming, luxury, spoiled' etc. I remember the fun we had creating these concepts, arranging them in hierarchies, providing synonyms for them, and creating definitions and application rules to control how they're assigned. It sounds easy, but trying to accurately use a concept like "spoiled" or "luxury" often brings many challenges.

I've already touched on the needs of video users, and some of the basic ways video can be organised. It was great to read Lionel Faucher's piece on how a video agency uses metadata. Video is easier than still images to work with: automated solutions are more applicable to video and much more successful. But challenges still abound, as Lionel clearly shows in his presentation.

One of the interesting topics I've been following for a while is the metadata being generated from digital cameras, and the work being done to make more use of it. Related to this is the exciting area of geographic coordinate metadata, which is created by some digital cameras when a photo is taken, and the uses to which that can be put.

Two presentations in the area of geography and image metadata were given by Bernd Beuermann and Ross Purves. A great research area was mentioned by Bernd - the taking of GPS co-ordinates and linking them to points of interest that are within a certain range of a GPS location. This can make the tagging of images with key depicted buildings or topography a little easier and will produce many advantages for image tagging and retrieval.
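As a rough illustration of linking a GPS position to nearby points of interest, here is a minimal sketch. It is not the method from the presentation: the function names, the 0.5 km default range and the approximate landmark co-ordinates are all assumptions made for the example, and distance is computed with the standard haversine formula.

```python
import math

# Illustrative points of interest: (name, latitude, longitude).
# Co-ordinates are approximate and used only for demonstration.
POINTS_OF_INTEREST = [
    ("Big Ben", 51.5007, -0.1246),
    ("London Eye", 51.5033, -0.1195),
    ("Tower Bridge", 51.5055, -0.0754),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearby_tags(photo_lat, photo_lon, points_of_interest, max_km=0.5):
    """Return names of points of interest within max_km, nearest first."""
    hits = []
    for name, lat, lon in points_of_interest:
        distance = haversine_km(photo_lat, photo_lon, lat, lon)
        if distance <= max_km:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]

# Candidate tags for a photo taken near Westminster Bridge (hypothetical position).
suggested = nearby_tags(51.5010, -0.1240, POINTS_OF_INTEREST)
```

The suggestions would still need a human check, since a building within range is not necessarily depicted in the photo.
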

A couple of things that I'm interested in were missing from the conference. I'd have liked to have seen more on: working with video soundtracks, automatic scene and frame analysis, and the place of manually applied tags in video indexing. I'd also like to have seen more about the creation of hybrid image retrieval systems that bring together content based image retrieval with controlled vocabulary and folksonomy tags. Maybe that's all for next year!

There also seemed to have been a big emphasis on technology, file formats, and metadata standards - in many ways the building blocks or key tools for organising and providing access to video and image content. What I'd have liked to see more of is the uses to which these building blocks have been put, the real world sharing of user needs and the challenges of actually making the technology and the supporting structures work to achieve business aims.

I should end by thanking the organisers of the event, and the presenters, for putting so many presentations online - it's very helpful and refreshing to have such a good level of access to this form of content.

One way in which I keep involved in the image and video world is through my involvement in the DAM Foundation on LinkedIn. There is a coffee meet-up organised for this afternoon, which I hope will kick start a lot of exciting developments. I'll post more about the outcome of the meeting next week.

Ian

Monday, November 23, 2009

Taxonomies and new technology

The Death of Taxonomies, revisited by Theresa Regli, CMS Watch (Nov 13)

The technology of text mining, entity extraction and semantic analysis is doing the grunt work of taxonomists. Theresa Regli foresees change in the life and work (and even name) of the taxonomist.

"Taking taxonomies beyond what technology can achieve on its own is the metadata architect’s challenge for the next decade, because technology is at the point where it achieves what taxonomists were doing a decade ago."

Sunday, November 22, 2009

Nstein Semantic Search

Nstein Technologies Launches Semantic Site Search, press release, Nstein (Nov 17)

"Nstein Technologies Inc. www.nstein.com (TSX-V: EIN), a leader in digital content management solutions for information-rich enterprises, today announced the release of a new product, Semantic Site Search (3S). “3S is a front-end, multi-index search engine designed to provide users an unparalleled search experience,” said Nstein CTO Jean-Michel Texier. 3S leverages Nstein’s patented text-mining technology to power a faceted site search which returns highly accurate results that are organized categorically."

Thursday, November 12, 2009

Text Analytics to Help in Classification

Rise of the Machines: The Role of Text Analytics in Record Classification and Disposition by James Santangelo, Information Management, ARMA (Nov/Dec 2009)

Classification is essential but may be overwhelming to staff. Because of the volume, automated classification is needed - and text analytics software can help.

"The latest advancements in text analytics use sophisticated techniques to determine the conceptual meanings within each file to compensate for shortcomings and extend the functionality of the applications that use policy rule engines. Use of text analytics greatly increases the accuracy of the classification by interpreting the meaning of terms in their context instead of being limited by the character strings inherent in policy rule engines."

Text Analytics

Text Analytics Gains a Broader Audience in the Enterprise by Paula J. Hane, IT Newslinks (Nov 2)

Text analytics is becoming more important to search. As this article explains:

"Text analytics extracts key information from unstructured text and helps to retrieve otherwise hidden information. It is a key component of many customer relationship management (CRM) applications, as well as for media and publishing, competitive intelligence, reputation monitoring, e-discovery, compliance, and financial analysis. Because of this, we've seen a number of acquisitions of text analytics firms by larger search companies (Business Objects acquired Inxight, Reuters acquired ClearForest, SAS acquired Teragram, and IBM acquired SPSS) and an increased pace of product and service rollouts."

It's really automatic tagging. Susan Feldman said of one vendor, TEMIS:

"It's clear that text analytics has taken off as a hot market, and TEMIS' expansion of its US business underlines this fact." "As the volume and flow of information increases, publishers and corporations are turning to automation to tag their content to make it findable, to understand what their customers are saying, to monitor trends and opinions about their products and their companies. That's impossible, given the exponential growth of information that needs to be processed, unless the process is automated."

Wednesday, November 11, 2009

OpenCalais is amazing

Learn About and Try OpenCalais (a Free Service from Thomson Reuters), ResourceShelf (Nov 6)

OpenCalais is making a dent in the use of semantic technology to extract entities and topics from text.

"In a nutshell, OpenCalais uses semantic technology and natural language processing to analyze text and add metadata by drawing out entities from documents, blog posts, news stories, etc. In some cases, this type of data can identify or help identify relationships between people, businesses, etc."

This post gives an example of what it can do, and points to the OpenCalais viewerbox where we can try it for ourselves - take a substantial story from an online news site and see the types of data that Calais can extract and organize.

Explore to see the power of the tool. Will we need taxonomies if we have tools like OpenCalais?

Further, we can have this at our fingertips for content we follow with Feedly, a Firefox plugin.

Feed(ly)ing The Enterprise, by Jennifer Zaino, Semantic Web (Nov 9)

"For one thing, it’s the semantic technology embedded within Feedly, which uses the OpenCalais web service to get a clean representation of metadata behind content. That gives power to enterprise users such as marketing professionals, who might be subscribed to various blogs and feeds and services and different content that’s relevant to their brand."

Friday, October 30, 2009

Endeca - Aiming for integrated search/BI

Endeca Stresses Simplicity With New Partnerships by Theresa Cramer, Newsbreaks (Oct 29)

Endeca is partnering with Informatica and SAP in a bid to integrate search with business intelligence.

"Under the terms of the OEM agreement, Endeca will integrate Xcelsius software, SAP BusinessObjects Intelligent Search software, PowerCenter, and PowerExchange into the Endeca Information Access Platform. These partnerships will address two very different issues for Endeca and its customers, says Sonderegger."

Inxight, now part of SAP, turns up in this too.

"There is also a little something in this new partnership for Endeca's intelligence customers, namely, the licensing of SAP's query federation tool, Inxight SmartDiscovery Awareness Server. Intelligence customers will be able to send off a query about "uranium enrichment," for example, to multiple information sources such as Endeca applications, proprietary databases, The New York Times, and The Washington Post and expect to see results in one screen. "This comes straight from the traditional search world," says Sonderegger. "We've added our own twist. ... When results come back, the analyst will be able to do faceted browsing across search results.""

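To make the idea of faceted browsing across federated results concrete, here is a minimal sketch. It does not reflect Endeca's or SAP's actual software: the record fields, source names and helper functions are hypothetical, and each source is assumed to have already returned its results as a list of dictionaries.

```python
from collections import Counter

# Hypothetical result records from two sources; fields are illustrative only.
SOURCE_A = [
    {"title": "Enrichment plant inspected", "source": "Internal database", "year": 2009},
    {"title": "Centrifuge exports tracked", "source": "Internal database", "year": 2008},
]
SOURCE_B = [
    {"title": "Talks on uranium enrichment resume", "source": "Newspaper", "year": 2009},
]

def federate(*result_lists):
    """Merge the result lists returned by several sources into one list."""
    merged = []
    for results in result_lists:
        merged.extend(results)
    return merged

def facet_counts(results, facet):
    """Count how many results carry each value of the given facet."""
    return Counter(r[facet] for r in results if facet in r)

def refine(results, facet, value):
    """Keep only the results whose facet has the chosen value."""
    return [r for r in results if r.get(facet) == value]

results = federate(SOURCE_A, SOURCE_B)
by_year = facet_counts(results, "year")               # counts shown beside each facet value
newspaper_only = refine(results, "source", "Newspaper")  # the analyst clicks one value
```

The counts drive the facet display, and each click narrows the merged list without sending a new query to the sources.
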
The Taxonomist's Career

Becoming a Taxonomist: Real Life Stories by Karen Loasby, FUMSI (Oct 2009)

Four information taxonomists tell their career stories - Heather Hedden, Helen Lippell, Dorothy Tuma and Stephen D'Arcy. We see a mix of indexing, information architecture and information management. There are many backgrounds, such as Dorothy's: "In addition to abetting my creative avocations, getting my MFA taught me to think about concepts, meaning, semantics, aboutness, and implicit versus explicit meaning."

Information Access Technology from Gartner

Gartner has issued its 2009 Magic Quadrant for Information Access Technology (Sept 2009)

The overall description is a fair representation of enterprise search technology today. Selecting the vendor is only one step. A much larger one is to make the technology meet an organization's information needs - an endeavour that takes much planning and much adaptation of both the technology and the organization.

"This Magic Quadrant assesses vendors with capabilities that go beyond enterprise search to encompass a range of technologies. Their capabilities include: search; federated search, content analytics, such as content classification, categorization and clustering, fact and entity extraction, taxonomy creation and management, information presentation (for example, visualization) to support analysis and understanding; and desktop search to address user-controlled repositories to locate and "invoke" documents, data and e-mail."

Gartner included vendors that have search as the foundation piece. It notes that "Many include other capabilities such as autocategorization, taxonomy functions and clustering, but we excluded those that offer only these capabilities, with no search."

Companies to note: Autonomy, Endeca, Exalead, IBM, Microsoft, Oracle, Recommind, Vivisimo. (There are others.)

Unfortunately the report does not describe the specific capabilities of the vendors.

Not Otherwise Categorized

Add this blog - Not Otherwise Categorized - to the reading list. This comes from Seth Earley and Associates and has several contributors. Categories include content management, taxonomy, semantic web, SharePoint (MOSS) and much else. Not a high volume blog, but has some very good reading.

Wednesday, October 28, 2009

Taxonomy Jargon Explained

What’s the difference between Taxonomies and Ontologies? - Ask Dr. Search at New Idea Engineering (June 2009)

Excellent question - excellent answer. These words are often used interchangeably - but it depends on the person, as Dr Search explains. The casual user might use either term, but the "deep researcher" might prefer ontology. Dr Search suggests that an ontology is the big sweeping picture of knowledge, and a taxonomy a more specific subject domain. The terms are also understood differently in a computer science context.

"Beyond academic precision, ontologies try to represent knowledge in a form so carefully that even computers can derive meaning by traversing the various relationships. If a computer were actually relying on this data you can understand that the “is-a” relationship in “Obama is-a president” and “my boss is-a huge pain” have slightly different meanings, the former conferring a job function, the latter a behavioral attribute. Unless you are a researcher or vendor of this technology, most people don’t need to worry about this.

Taxonomies can also be read and used in computer software, for example Verity’s Topic Sets were a form of taxonomy, and could be loaded into a profiler to classify incoming documents; many other companies have had this idea as well. But the linkages between parent and child branches were much simpler in nature, and were designed to simply combine fulltext search terms in various ways. There was no hint of “understanding” in the relationship between a parent and child, beyond simple fulltext matching. This was still very advanced for its time (the late 1980s), but it didn’t attempt to encode meaning."
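The kind of term-matching classification described in the quote can be sketched as follows. This is not Verity's Topic Set format or API: the topic names, terms and the `classify` function are invented for illustration, and a document is assigned to a topic simply when it contains one of that topic's terms, with parent topics inherited from their children.

```python
import re

# Hypothetical taxonomy: each topic has a parent and a set of fulltext terms.
TAXONOMY = {
    "Energy": {"parent": None, "terms": {"energy"}},
    "Nuclear": {"parent": "Energy", "terms": {"uranium", "reactor"}},
    "Solar": {"parent": "Energy", "terms": {"photovoltaic", "solar"}},
}

def classify(text, taxonomy=TAXONOMY):
    """Return the topics whose terms occur in the text, plus their ancestors."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    matched = set()
    for topic, node in taxonomy.items():
        if words & node["terms"]:
            # A match on a child topic also counts as a match on its parents.
            current = topic
            while current is not None:
                matched.add(current)
                current = taxonomy[current]["parent"]
    return sorted(matched)

topics = classify("The reactor was refuelled with uranium.")
```

Nothing here encodes what the parent-child link means; the hierarchy only combines search terms, which is the limitation the quote points out.
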

There are many other terms related to the use of taxonomies that may come into the discussion, such as topic trees, knowledge base, folksonomy (not the same at all), tagging, and sometimes natural language processing (which might be used to help create a taxonomy) and metadata.

Taxonomy is also discussed in Do You Need A Taxonomy? where it is explained that there are three types: Subject based (a subject domain), Content based (derived from the content), and Behaviour (not as clear - might be usage).

Friday, October 23, 2009

Transparent semantic search at LexisNexis

Semantic technology is increasingly being adopted for search processing. LexisNexis, known for legal research tools, is enhancing those with a greater understanding of the meaning of content and, according to this press release, is also revealing to the searcher what it has done. Most semantic search engines seem to work like magic - they just "do" and the searcher must accept on faith.

LexisNexis Introduces Transparent Semantic Search Technology for Patent Research, Business Wire (Oct 12)

"Through a development alliance with Dallas-based Pure Discovery, LexisNexis has become the first provider of legal information services to integrate the power of semantic search technology with familiar Boolean search technology, giving the user greater control over the patent research process via a simple, streamlined user interface that matches their typical daily workflow. "

Thursday, October 22, 2009

Intute Thesaurus Removed

Intute has removed its thesaurus for social science. This announcement - The Thesaurus Engine service has been withdrawn - suggests that it did not "align" with UK higher education courses. Odd - would be helpful for students to learn about and use thesauri.

Monday, October 05, 2009

Search Solutions 2009

I recently attended the Search Solutions 2009 one-day conference. For an excellent summary of a very interesting day, take a look at Karen's recent blog post.

For me, 'a star of the show' was Dave Mountain's enthralling discussion, "Location-Based Services: Positioning, Geocontent and Location-Aware Applications". Dave looked at location based services, their current uses and future possibilities. One aspect, which sparked heated debate over coffee, was the very real security implications of having your position pinpointed to a couple of metres. Location Based Services will, I think, continue to grow and meld together with social applications such as Twitter, Facebook, Flickr, e-commerce and mobile devices. We will increasingly know where the nearest coffee shop is to our location in terms of direct route, time taken to get there etc. Add to this the possibility that everyone else will know where you are in real time and what you're doing, and you have a world of many information and privacy challenges. I wonder whether we'll end up with people paying a surcharge to cloak themselves from all this information gathering?

If you want to know more about the world of Geocontent and Location Aware Applications Dave Mountain is a great person to talk to.

Ian

Friday, October 02, 2009

FUMSI Taxonomy Survey

Participate in this taxonomy survey by FUMSI and you'll receive a copy of the results. This group is investigating "the ways organisations of all types implement and leverage taxonomies, folksonomies and other tagging and classification systems".

Tutorial on Constructing a Thesaurus

Tim Craven, Professor Emeritus at The University of Western Ontario, makes available resources he had developed for courses in the Library and Information field. One of these is a Web-based module that teaches the basics of constructing an information retrieval thesaurus and includes interactive quizzes.

See this and some other resources at http://publish.uwo.ca/~craven/

Saturday, September 19, 2009

Taxonomy Boot Camp 2009

Taxonomy Boot Camp is coming up - November 19 - 20 - in San Jose, CA.

Keynote speakers are Thomas Vander Wal - credited with creating the term folksonomies; and Leslie Owens from Forrester - author of many insightful articles.

From the KMWorld announcement --

At Taxonomy Boot Camp 2009 attendees will learn about:
* Creating and implementing successful taxonomies
* Enhancing your information infrastructure with the right taxonomy
* Taxonomy design concepts and strategies
* New technologies & tools and where the market is headed
* Selecting the right metadata, taxonomy approach, and tools for your environment
* Evaluating auto-categorization schemes and tools
* Managing the build, buy, or automate decision
* Working collaboratively with your content and stakeholders
* Case studies, lessons learned, and best practices
* Measuring and demonstrating the business impact and ROI
* Managing and growing a taxonomy

Friday, September 04, 2009

Semantic Web and the work of librarians

Can Librarians Be Put Directly Onto the Semantic Web? by Eric Hellman, Go To Hellman (Aug 4, 2009)

What does semantic web mean to librarians? It may mean moving away from an orientation to the human user when creating metadata, to constructing for the machine.

"In many respects, the most important question for the library world in examining semantic web technologies is whether librarians can successfully transform their expertise in working with metadata into expertise in working with ontologies or models of knowledge. Whereas traditional library metadata has always been focused on helping humans find and make use of information, semantic web ontologies are focused on helping machines find and make use of information. Traditional library metadata is meant to be seen and acted on by humans, and as such has always been an uncomfortable match with relational database technology. Semantic web ontologies, in contrast, are meant to make metadata meaningful and actionable for machines. An ontology is thus a sort of computer program, and the effort of making an RDF schema is the first step of telling a computer how to process a type of information."

Thursday, September 03, 2009

SLA Taxonomy Division

Taxonomy building gets additional recognition from SLA as it sets up a division for its members who work in this field.

SLA Now Has a Taxonomy Division!

The Division Chair is Marjorie M.K. Hlava. She wrote, “Taxonomies are widely used and increasingly proven to cut search time by more than 50 percent, increase worker productivity up to seven fold, and allow for location and application of mission-critical information throughout an organization. We are so pleased that SLA members and knowledge workers practicing in this area will be able to join this network of taxonomy professionals within a premier professional organization with global reach.”

Wednesday, September 02, 2009

User Testing with Card Sorting

Card Sorting: Pushing Users Beyond Terminology Matches, Jakob Nielsen's Alertbox (Aug 24)

Card sorting can be used for user testing of navigation.

As we learn from this article, "Card sorting is often a good way to get initial insights into users' mental model of an information space, and in our project it did indeed generate good starting point for the IA. After the card-sorting study, we conducted several rounds of user testing of wireframes, further refining the structure and how the site presented it. All of this effort would have been wasted if we'd gotten data on users' keyword-matching skills rather than on how they approach the site's target healthcare issues. "

Has worked examples.

Tuesday, August 18, 2009

Evolution of the Web to Web 3.0

Web 3.0 Concepts Explained in Plain English (Presentations) from Digital Inspiration (May 2009)

This is brilliant - Web 1.0, 2.0, 3.0 explained in a single slide of points, some text, and 5 excellent slide presentations. As the note on the page explains, the two about the red stamp describe the semantic aspects and workings very clearly.

Saturday, August 15, 2009

Good Look at Semantic Web

Jean Graef of the Montague Institute answers the question, Where will semantic content come from? in terms we can begin to understand.

"Semantic web" is usually a very theoretical idea of linking up data in magical ways. In this posting, we get a real-life example drawn from a mortgage refinance application of how a semantic infrastructure would work.

Also raised here is the matter of trust. How will we know to trust the information that is brought together through a semantic infrastructure?

Jean Graef concludes: "The technology piece of the infrastructure is almost ready; it's the semantic content and editorial oversight that's missing."

Card Sorting with Users

Holger Maassen has the inside track on card sorting. In this posting on Card Sorting Maassen describes using card sorting as a categorization technique with user groups in order to learn more about how they view and work with information.

"Card sorting is a categorization technique where users sort cards describing and giving their picture, their understanding and their mental picture of concepts, workflows and information and knowledge."

But, he notes, card sorting will not deliver a finished taxonomy.

He presented this as the first of a series on design and analysis techniques for improving user experience.

Tuesday, August 04, 2009

ASIST on Information Architecture

BULLETIN of the American Society for Information Science and Technology for August / September 2009 [pdf] is a special issue about Information Architecture.

Contents:

+ A Tonic for the Busy Troops by Stacy Merrill Surla, Guest Editor
+ The Information Architecture of Social Experience Design: Five Principles, Five Anti-Patterns and 96 Patterns (in Three Buckets) by Christian Crumlish
+ The Debut of Usable, Influential Content by Colleen Jones
+ An Internet Watered Down by John Pettengill
+ Gaming the Design: Using Game Design Techniques in the Realm of Investing by Kellie Rae Carter and Dominic La Cava
+ IA for the Rest of the World by Miles Rochford
+ Lessons from Slime Mold: How to Survive and Thrive in Ever-Changing Organizational Environments by Kate Rutter

+ A Reflection on the Structure and Process of the Web of Data by Marko A. Rodriguez

From the editor:

"Our annual special section on information architecture (IA) – A Tonic for the Busy Troops – is the centerpiece of this issue and a very fine one. Stacy Surla, our associate editor for IA, has put together a collection selected from presentations at the 2009 IA Summit. As they were coming in, I was already emailing friends saying that a number of the articles were going to be must-reads. Immediate utility and impact were Stacy’s selection criteria, and she certainly implemented them well: lots of very timely information for practitioners and some fine thought-pieces. Stacy’s introduction says it all in terms of what’s there and why. Don’t miss it.

Our feature article is another one not to miss. Marko Rodriguez writes about the Web of Data – the RDF (Resource Description Framework) side of the web – and the problems of exploiting it. Whether you are an RDF novice or someone for whom RDF is daily fare, Marko’s ideas and insights will intrigue you. It’s an outstanding contribution."

Saturday, August 01, 2009

Hakia's Commercial Ontology

A New Commercial Ontology from hakia by Dr. Riza C Berkan, Hakia (

Hakia's CEO, Dr Riza Berkan, proposes a Commercial Ontology that will serve the types of queries that users put to the web. Berkan says these have a "commercial pattern", but I suppose we could think of it as consumer interest just as easily. The key bit is -- "One particular distinction of the commercial pattern is that they come in short packages including a name (onomasticon), or always referring to something sold, bought, watched, heard, etc." The Commercial Ontology will also work with sequences of words - which may be more efficient than individual terms.

Also see hakia Unveils Commercial Ontology, Search Engine Watch (July 30)

Friday, July 31, 2009

Introduction to Metadata - Online

Introduction to Metadata (online book)

Stephen Downes has an entry about a free online book about metadata - and provides a short review that finds the first three chapters comprehensive, but the treatment of "rights metadata" poor.

Nonetheless, worth looking at.

Introduction to Metadata. Version 3.0, edited by Murtha Baca. "An online publication devoted to metadata, its types and uses, and how it can improve access to digital resources." Published by Getty.edu.

Interesting note: "Reader's Note: The editor and authors of this publication are aware that the noun "metadata" (like the noun "data") is plural and, therefore, should take a plural verb form. However, in order to avoid awkward locutions, it has been treated here throughout as singular."

Contents:

Home
Introduction
Setting the Stage
Metadata and the Web
Crosswalks, Metadata Harvesting, Federated Searching, Metasearching
Rights Metadata Made Simple
Practical Principles for Metadata Creation and Maintenance
Glossary
Selected Bibliography
Contributors

Web 3.0 a matter of restructuring

Web 3.0: The Next Step for the Internet by Michael Baumann, Information Today (May 2009) via Allbusiness.com

Web 3.0, says Nova Spivack, CEO of Radar Networks, is going to be about restructuring the web.

"Web 3.0 will usher in a revolution in the construction of the internet itself. In the third decade of the Web, the focus is going back to the back end and we're focusing on upgrading the infrastructure of the Web again."

"When an application sees a page on the Web today it doesn't really know what to do with it," Spivack says. "But as we add more of this open standard metadata to the Web, it makes the Web machine understandable. Also, as applications get smarter because they can understand language and know what words mean, that also adds meaning and structure to the Web."

Wednesday, July 29, 2009

Resources for Thesaurus Construction

A posting in Buslib-L titled 'Summary of responses re thesaurus for records management purposes' (July 22) gives us a good starter list of resources for constructing thesauri. Thanks to Sarah Knight.

Thesaurus Construction and Publishing Solutions
http://www.multites.com/index.htm
Provides software. Also has a list of Resources for Thesaurus Construction.

Taxonomy Warehouse (Factiva)
List of all vocabularies: http://www.taxonomywarehouse.com/querybyvoc_search_include.asp

"Taxonomy Warehouse was created in 2001 as a valuable community resource, available free to users and vocabulary publishers to help organizations maximize their information assets and break through today’s information overload."

Index New Zealand Thesaurus
http://innz.natlib.govt.nz/content/thesaurus/index.htm

"The Index New Zealand Thesaurus has been designed to describe journal and newspaper articles about New Zealand and the South Pacific in the areas of social sciences and humanities. This version was released in November 2005 and contains over 1,200 preferred terms."

Australian Governments' Interactive Functions Thesaurus (AGIFT)
http://www.naa.gov.au/records-management/create-capture-describe/describe/agift/agift-zip.aspx

"The Australian Governments’ Interactive Functions Thesaurus (AGIFT) is a three-level hierarchical thesaurus that describes the business functions carried out across Commonwealth, state and local governments in Australia. It contains 25 high-level functions, each with second and third level terms, as well as non-preferred terms and related terms. A scope note describes the range of activities covered by a preferred term and provides cross-references."

Victoria Online Thesaurus (July 2008)
Department of Innovation, Industry, and Regional Development
http://www.egov.vic.gov.au/index.php?env=-innews/detail:m2110-1-1-8-s-0:n-9-1-0--

"The Victoria Online Thesaurus is a subject thesaurus of descriptive terms that reflect the themes and resources within Victoria Online (VO), the Victorian Government’s online portal. Victoria Online is a metadata-driven gateway to Victorian State, Federal and Local government information. The VO Thesaurus has been developed to populate the Keyword (DC.Subject) field within the VO Metadata Application Profile (VOMAP)."

Health Thesaurus - Health and Ageing Thesaurus
Australia - Department of Health and Ageing
http://www.health.gov.au/internet/main/Publishing.nsf/Content/health-thesaurus.htm

"The Health and Ageing Thesaurus is a living working tool which assists consistency and subject retrieval of health and ageing concepts. By standardising concepts to one single subject heading, the Thesaurus forms the basis for a common terminology within the Department.

MeSH (medical Subject Headings) produced by the US National Library of Medicine has been used as the basis of the medical terms and the corresponding hierarchical schedules in the Health and Ageing Thesaurus. We are very grateful for their permission to use MeSH in this way. For this edition 2004 MeSH has been used."

National Taxonomy of Exempt Entities
http://nccs.urban.org/classification/NTEE.cfm

The NTEE-CC classification system is used by the IRS and NCCS to classify US nonprofit organizations. It divides the universe of nonprofit organizations into 26 major groups under 10 broad categories.

Google Book Search: Search for the term "thesaurus" combined with others related to organizations, fund raising, charity, etc. to find published thesauri.

TIPS: Taxonomies in the Public Sector
Taxonomies and thesauri: a list of references and resources for public sector applications (Great Britain)
http://www.govtalk.gov.uk/documents/Bibliography2005-05-11.pdf

"This bibliography has been prepared for the Taxonomies in the Public Sector (TIPS), a discussion group which supports the Metadata Working Group by encouraging information professionals in the public sector to meet and develop guidance on the implementation of taxonomies and metadata. While IPSV (Integrated Public Sector Vocabulary) and its predecessors GCL and LGCL are the main focus, public sector applications commonly use a complementary specialised vocabulary in tandem with IPSV. the bibliography gives background references across the gamut from development to exploitation and sharing the outputs"

United Nations Bibliographical Information System Thesaurus
http://unhq-appspub-01.un.org/LIB/DHLUNBISThesaurus.nsf

Entries are in six languages.

UNESCO Thesaurus
http://www2.ulcc.ac.uk/unesco/

"The UNESCO Thesaurus is a controlled vocabulary developed by the United Nations Educational, Scientific and Cultural Organisation which includes subject terms for the following areas of knowledge: education, science, culture, social and human sciences, information and communication, and politics, law and economics. It also includes the names of countries and groupings of countries: political, economic, geographic, ethnic and religious, and linguistic groupings."

International Labour Organisation (ILO) Thesaurus
http://www.ilo.org//thesaurus/defaulten.asp

Can use the rotated index to get a sense of the categories and content.

Tuesday, July 28, 2009

Structure First

The Perfect Search By Penny Crosman, Intelligent Enterprise (March 1, 2006)

Overview article on the importance of structuring content to improve findability of information in an enterprise.

"Google-style search is all right for some, but greater accuracy in the enterprise demands a mix of techniques including content tagging and taxonomy development and technologies such as entity, concept and sentiment extraction tools."

Sunday, July 12, 2009

Text Mining for Meaning

Many are pointing to the emergence of a Web 3.0 that brings together Web 2.0 collaborative qualities with a semantic web of connections and semantic tools for understanding.

Nstein, the digital publishing technologies company, held a webinar titled From Metadata to Meaning: Intelligence in the Semantic Era on June 25, 2009, moderated by Diane Burley.

Seth Grimes of Alta Plana presented the case of an exploding digital universe where individuals are creating 70% of the content. Search based on keyword matching won't be sufficient. The next generation of tools must make sense of the data to deliver meaningful answers. Text mining combined with semantic analysis is one direction. Google has some new capability at picking out data and its context - eg the employment data for a US state. Newssift from the Financial Times shows more capability at identifying organizations, places, people and topical themes in its aggregation of news. Grimes noted that semantically enriched content, linked data, context sensitivity, and location awareness are now recurring themes. Text mining / analytics is enabling Web 3.0 and the Semantic Web.

For designers and users this means:

• Automated content categorization and classification.
• Text augmentation: metadata generation, content tagging.
• Information extraction to databases.
• Exploratory analysis and visualization.

Jean-Michel Texier, Chief Technology Officer at Nstein Technologies, described Nstein modules for mining and analyzing data - essentially to make sense of vast stores of text by extracting concepts, identifying and normalizing entities, categorizing content, extracting relevant sentences, analyzing sentiment, and finding similar content.

The presentation is available in PDF in the download section of the Nstein website: http://www.nstein.com/en/downloads.php?sourceId=236 - registration required.

Friday, July 10, 2009

Semantic Metadata

Semantic Metadata & Sagacious Serendipity by Diane Burley, Silicon Valet (June 2009)

Digital media consultant Diane Burley says that metadata is the key to creating websites that enable discovery and provide user satisfaction. She doesn't mean run-of-the-mill administrative metadata (date, author, format, etc.) but semantic metadata.

"The academics at Kent State call it descriptive metadata, while the folks at Nstein prefer to call it semantic metadata (semdata??). It is metadata that is generated using a multi-faceted approach of computational and linguistic analysis. It not only extracts meaning from documents – but also embeds the synonyms, summary, categories, even the tone, in order to create a linguistic fingerprint. This linguistic fingerprint can then be matched against any other linguistic fingerprint – to find like pieces of content."

Thursday, June 25, 2009

Taxonomy Boot Camp 2009

Taxonomy Boot Camp coming up November 19-20, 2009, in San Jose, California.

http://www.taxonomybootcamp.com/2009/

Making content discoverable is the job of a well-constructed, robust taxonomy — and a mission-critical objective for today’s organizations. Designed, developed, implemented, and managed effectively, a taxonomy or categorization scheme ensures people are finding and using precise information in myriad internal data collections and websites.

Thursday, June 04, 2009

Getting Best of Both Worlds

Folksonomies: Business Use, by Fran Alexander. FUMSI (May 2009)

Finds strengths and weaknesses in using folksonomies in business depending upon how much precision and recall is needed. Sums it up as "Business contexts where precision and recall are not important tend to be in less process-critical areas or where individual content items are not business critical, such as wikis, blogs, or corporate social networking sites."

Provides a comparison of the characteristics of folksonomies and taxonomies.

Proposes that we should aim at getting "best of both worlds" and presents "suggested contexts" for using each.

Folksonomies Essential

Survival of the fittest tag: Folksonomies, findability, and the evolution of information organization by Alexis Wichowski, First Monday 14.5 (May 2009)

Abstract: "Folksonomies have emerged as a means to create order in a rapidly expanding information environment whose existing means to organize content have been strained. This paper examines folksonomies from an evolutionary perspective, viewing the changing conditions of the information environment as having given rise to organization adaptations in order to ensure information “survival” — remaining findable. This essay traces historical information organization mechanisms, the conditions that gave rise to folksonomies, and the scholarly response, review, and recommendations for the future of folksonomies."

Especially note conclusion:

"Folksonomies may be flawed, but they are, at present, the best means known to track what is happening with the non–mainstream of the information environment. If the greatest evolutionary changes in the biological environment — the birth of new species — occur not at the center but in the long tail, what great new transformations may be occurring in the long tail of the information environment? Tagging provides this outlying information, published far from the mainstream, a chance to be found, to be considered useful, and ultimately, to survive."

Card Sorting for Categories

All About Card Sorting: An Interview with Donna Spencer by Steve Baty, UX Matters (May 25)

From the introduction to the interview transcript:

"Donna Spencer is one of Australia’s best-known information architects, organizer of the UX Australia conference, and a frequent presenter at UX conferences in Australia, the US, and Europe. I caught up with Donna between her appearances at the IA Summit and RedUX DC to talk about card sorting and her new book, Card Sorting: Designing Usable Categories, which Rosenfeld Media recently published."

MeSH Training Course

National Library of Medicine has a free online course on Using Medical Subject Headings (MeSH®) in Cataloging

http://www.nlm.nih.gov/tsd/cataloging/trainingcourses/mesh/index.html

Some of the thinking that goes into subject analysis for cataloging applies in building taxonomies. And of course, MeSH is a taxonomy itself.

(All examples are current as of the 2009 release of MeSH)

Thursday, May 21, 2009

Tags - Not so Useful

Do Tags Work?, by Cathy Marshall

Tags are deeply entrenched in the Web now, and likely in enterprises too. Here's the question we all ask: "But are the tags that people create really an effective way of describing information so that it can be found and managed, folded and put in the right drawer?" Mr Everything is Miscellaneous, David Weinberger, extols social tagging.

Cathy Marshall undertook her own study, in which she gathered and closely analyzed the metadata on 322 Flickr images of the floor of the Galleria in Milan. She looked at tags, descriptions, and titles - and found more useful words in the latter two. Tags, she said, "can be a rich source of noise"... "The message here is almost painful: a great proportion of user tags add little or no further information; as such, they don't appear as often in narratives or titles."

Could they be made useful? Taggers would have to be more specific - and that's not likely to happen.

Tuesday, April 21, 2009

MetaVis Graphical Metadata Tools

Managing SharePoint taxonomies, KMWorld (Apr 20)

MetaVis Technologies builds visual metadata management tools. ARCHITECT is for creating metadata and hierarchies.

"MetaVis Technologies has released MetaVis ARCHITECT for SharePoint, which gives information architects, developers, consultants and administrators the capability to design, document and deploy SharePoint objects using a graphical tool."

Trial version available.

Friday, April 17, 2009

Social Tagging Studies

Studying Social Tagging and Folksonomy: A Review and Framework

Trant, Jennifer (2009) Studying Social Tagging and Folksonomy: A Review and Framework. Journal of Digital Information 10(1).

Abstract

"This paper reviews research into social tagging and folksonomy (as reflected in about 180 sources published through December 2007). Methods of researching the
contribution of social tagging and folksonomy are described, and outstanding research questions are presented. This is a new area of research, where theoretical perspectives and relevant research methods are only now being defined. This paper provides a framework for the study of folksonomy, tagging and social tagging systems. Three broad approaches are identified, focusing first, on the folksonomy itself (and the role of tags in indexing and retrieval); secondly, on tagging
(and the behaviour of users); and thirdly, on the nature"

Wednesday, April 15, 2009

Vivisimo's Enterprise Search

Vivisimo Releases New Multimedia Outreach, Stan, Enterprise Search Center

Vivisimo, a provider of enterprise search solutions, tells us more about them through Stan, a series of videos, blogs and other social media.

"The Meet Stan campaign includes a website (meetstan.com), videos, and social networking sites including Facebook and Twitter. Stan’s videos, available on meetstan.com and YouTube, describe different aspects of enterprise search. After an introductory video, the initial series will focus on the three critical dimensions of search—discovery, personalization and collaboration. Stan also has a blog on the website where he discusses his search and collaboration dilemmas and how enterprise search can help."

Why Stan? Why not Stephanie or Sue? Anyway - to meet Stan go to http://meetstan.com/

Saturday, April 11, 2009

Developing Taxonomies

Ontologies, Taxonomies, Thesauri: Learning from Texts, by Christopher Brewster and Yorick Wilks; Department of Computer Science, University of Sheffield, Sheffield, UK (2004)

On automating the creation of taxonomies:

"This paper takes the approach that, given the `info-smog' we live in (AKT 2001), hand-crafting is impractical and undesirable. While it is still a major research challenge to construct ontologies entirely automatically, the current tools available from the Natural Language Processing community make it possible to automate the task to a large extent and reduce manual input to where it makes the most qualitative di®erence. In Section 2, we describe discuss in greater detail the problem of manually constructing ontologies and argue for the use of text corpora as the main source of knowledge. In Section 3, we present a number of criteria as a guide for the method that need to be used for the automation of ontology construction. In Section 4, we present a number of methods for constructing ontologies from texts based on the criteria presented. Section 5 considers how to bridge the gap between the implicit knowledge assumed by a given text and the actual explicit knowledge present in the texts."

Friday, April 03, 2009

Endeca's New Platform

Endeca Aims to Scale New Enterprise Heights by Paula J. Hane, Newsbreaks (Apr 2)

Endeca Technologies, Inc. has announced a new release of the Endeca Information Access Platform called the McKinley Platform, with "improvements in speed, scalability, and simplicity".

Endeca is noted for the Guided Navigation experience - a faceted view that organizes structured and unstructured enterprise data.

Financial Times recently debuted the Newssift service based on this.

"John Greenleaf, chief marketing officer, Financial Times Search, Newssift.com, says, "We partnered with Endeca to launch Newssift.com, a revolutionary site that monitors business news and the ever-changing landscape of ideas and opinions, facts, and supposition. Endeca's new platform is the foundation for our robust news site that provides customers with the ability to quickly sift through large amounts of content to make informed business decisions."

Thursday, March 19, 2009

Enterprise Search Market - Short Report

2009 Overview of the Enterprise Search Market by Miles Kehoe, New Idea Engineering (Updated March 2009)

This annual report on enterprise search vendors covers tiers 1 and 2 of "commercial vendors, open source solutions, and search-related tools and utilities".

Topics:

The State of Enterprise Search
The DNA of Search Technology Companies
Leading Vendor Overview
Vendors by Market
Search Resources
Looking forward in 2009... and beyond

Designing for faceted search

Designing for faceted search By Stephanie Lemieux, KMWorld, (Mar 1, 2009)

Faceted search, or guided navigation, has become remarkably popular at e-commerce sites and is often seen as a feature of auto-classification at some metasearch engines (e.g. ISeek.com).

This article describes faceted search and related trends, and shares five do's and don'ts on designing for faceted search.

"The power of faceted search lies in the ability of users to create their own custom navigation by combining various perspectives rather than forcing them through a specific path. Think of a cookbook: Authors have to organize the recipes in one way only—by course or by main ingredient, etc.—and users have to work with whatever choice of organizing principle that has been made, regardless of how it fits their particular style of searching. An online recipe site using faceted search can allow users to decide how they’d like to navigate to a specific recipe, offering multiple entry points and successive refinements. Figure 1 on page 15 (KMWorld, Vol 18 #3) shows how facets can help pinpoint content very precisely through the combination of perspectives. Just three facets with five terms each can represent 243 possible combinations."

Note: diagrams not available online.
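
The successive refinement described in the quote can be sketched in a few lines of Python. This is only an illustration, not how any particular faceted search product works: the recipe records, the facet names (course, ingredient, cuisine) and the helper functions refine and facet_counts are all invented for the example.

```python
# Hypothetical recipe records; facet names and terms are invented for illustration.
RECIPES = [
    {"title": "Lentil soup", "course": "starter", "ingredient": "legumes", "cuisine": "indian"},
    {"title": "Chicken tikka", "course": "main", "ingredient": "poultry", "cuisine": "indian"},
    {"title": "Roast chicken", "course": "main", "ingredient": "poultry", "cuisine": "french"},
    {"title": "Tarte tatin", "course": "dessert", "ingredient": "fruit", "cuisine": "french"},
]


def refine(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


def facet_counts(records, facet):
    """Count how many records fall under each term of one facet."""
    counts = {}
    for r in records:
        counts[r[facet]] = counts.get(r[facet], 0) + 1
    return counts


# The user can start from any facet and narrow down in any order.
mains = refine(RECIPES, course="main")
french_mains = refine(mains, cuisine="french")
print([r["title"] for r in french_mains])

# Counts per term are what a faceted interface typically shows next to each option.
print(facet_counts(mains, "cuisine"))
```

Starting from cuisine and then picking a course would reach the same recipe, which is the point of the cookbook comparison: no single path is imposed on the user.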

Concept Searching

Getting to the point, KMWorld (Mar 18, 2009)

SharePoint, taxonomies, and automatic classification seem to come together in ConceptSearching.

"Content Types can be used to enforce metadata governance, adhere to policies and drive workflows in line with business processes. Included in the new release is the ability to assign taxonomies to specific Content Types. Documents that correspond to the selected Content Types will be classified and documents that do not correspond to a content type or do not include some metadata elements that a specific content type has specified will not be classified."

Learn more about ConceptSearching's conceptClassifier and its capabilities for metadata generation, taxonomy classification, and taxonomy creation at http://www.conceptsearching.com/web/

Page also links to case studies such as Microsoft's Two Page Partner Case Study (2008).

Tuesday, March 10, 2009

Enterprise Search Blogs

Enterprise Search Resources at the Enterprise Search blog (March 2, 2009) provides a list of resources for learning more about enterprise search and staying abreast of developments: LinkedIn groups, newsletters, blogs, and trade shows. Good starting point for setting up RSS feeds.

Wednesday, March 04, 2009

Panelists talk about auto-classification

Automatic Classification: A Panel Discussion FUMSI (Jan 2009) -- "Karen Loasby discusses automatic classification with freelance information architect Helen Lippell and BBC information architect Silver Oliver."

Panelists covered a lot of ground in this discussion: the two types of auto-classification systems, the problems the English language presents, taxonomies and folksonomies, and the situations in which auto-classification is and is not suitable.

"Taxonomies can be the glue of an automatic classification implementation. They are the vocabulary that rules, whether Boolean or statistical, are built upon, allowing concepts to be applied consistently to content. Taxonomies also provide the framework of relationships, such as synonyms and related terms between concepts - they help the automatic system to understand the domain in the way that users do."

Tuesday, March 03, 2009

Generate Metadata for Multimedia

EveryZing Launches Automatic Metadata Generator For Multimedia by Chris Sherman, Search Engine Land (Mar 2)

This looks revolutionary:

"Video search and SEO services provider EveryZing has introduced MediaCloud, an online service that automatically generates search-friendly metadata for video, audio and other rich media content. One of the big problems with rich media is that without accompanying metadata that describes the contents, video and audio files are essentially “invisible” to search engines."

There are several more articles at EveryZing. The MediaCloud seems intended to improve the findability of web sites / content, but with the increase in rich media everywhere, this would surely be an advantage inside the organization.

Friday, February 06, 2009

Taxonomies are not a quick fix

Taxonomy: Silver Bullet or Shallow Puddle by Stephen Arnold, Beyond Search (Sept 2008)

You can count on Stephen Arnold to not pull punches. Everybody is talking about taxonomy as the ultimate solution to information retrieval. But do they appreciate how difficult it is to do well?

Arnold answers his question - "Why are taxonomies perceived as the silver bullet that will kill the vampire search or CMS system" - with five points that mainly show that people think taxonomies are a) a quick fix, and b) easy to create especially with the aid of software. Neither is true.

Wednesday, February 04, 2009

Taxonomy Bootcamp 2009

Dates are set for the 2009 Taxonomy Bootcamp conference in San Jose, California: November 19-20. Theme is Organizing and Optimizing Information.

http://www.taxonomybootcamp.com/2009/

From the announcement:

"Sessions and speakers will explore the state of taxonomies and the technologies involved, and challenge you to consider how taxonomies and information organization approaches are evolving and where you'll fit in that future. The program highlights case studies, practical and thought-engaging sessions on ontologies, folksonomies, taxonomies in Sharepoint, lessons learned, metrics, demonstrating value, governance, and taxonomy management."

Tuesday, February 03, 2009

Creating Metadata on Legacy Materials

Metadata Tools Report Released by DLF Aquifer - Econtent (Feb 3)

The Digital Library Federation (DLF) has released a report on the Future Directions in Metadata Remediation for Metadata Aggregators that identifies and evaluates tools that can be used to normalize and enhance metadata. The intended application is for "cultural heritage materials" in digital libraries, but there are aspects that could be applied more generally for any situation involving digitisation.

Download the report from http://www.diglib.org/ - DLF Aquifer News.

From the report:

"DLF Aquifer, a Digital Library Federation initiative, focuses on making digital content—especially cultural heritage materials pertinent to American culture and life—easier for scholars to find and use. One avenue to providing better access to digital collections is by including the collections in aggregations that are promoted and exposed through commonly used channels such as commercial search services.

Successful aggregation depends on robust, consistent metadata. While data providers may strive to include all applicable fields for their chosen metadata format in newly created records, records that have been mapped from legacy data in other formats will seldom be optimized in their new home, and the creators of these records may not have the resources to augment these records in any more than the simplest ways. Remediation tools to improve the quality of metadata for improved services are therefore highly desirable."

Postscript: (Feb 24)

Full report is at http://www.diglib.org/pubs/dlf110.pdf

Sunday, January 25, 2009

Taxonomy Building Process

Taxonomy and Glossaries for Enterprise Search Terminology by Lynda Moulton, Enterprise Search Practice Blog, Gilbane Group (Jan 21)

Lynda Moulton points to a problem that bedevils anyone who works to organize information through categorization - terminology changes.

"Taxonomies are never static, and require periodic review, even when the amount of content is small. Taxonomists need to keep pace with current use of terminology and target audience interests. New jargon creeps in although I prefer to use generic and terms broadly understood in the technology and business world."

She describes her method for gathering and defining terms related to enterprise search (one which involves validation through web searches), and for keeping this current with new discoveries and developments in the industry.

We see what can be done in the glossary for search and text retrieval that is attached to her posting.

Metadata, as one example, received this clear definition: "Explicitly defined labels for structuring content that describes any document or file regardless of the native format."

This posting is a very helpful worked example of an approach for developing and growing a controlled vocabulary or taxonomy.

Friday, January 23, 2009

Autonomy To Acquire Interwoven

Autonomy to acquire Interwoven, KMWorld (Jan 22)

Autonomy further strengthens its capabilities with unstructured information management software by acquiring Interwoven.

From the press release:

"The acquisition will strengthen Autonomy's access to the worldwide legal and compliance industry through Interwoven's significant sales force with industry expertise."

Friday, January 09, 2009

The Problems of Relevance

Google Tech Talk: Reconsidering Relevance by Daniel Tunkelang, The Noisy Channel (Jan 8, 2009)

Daniel Tunkelang, Chief Scientist at Endeca, has posted on SlideShare the slides from Reconsidering Relevance, a talk he gave on the weaknesses of relevance-centric search and what might be used as alternatives.

"We’ve become complacent about relevance". This would be developers and users. As users, we are easily satisfied by the results we get from web search engines and don't appreciate that much can be done to assist and promote exploratory search. He reminds us that "information needs evolve as we learn". Faceted search and tagging, in particular, can assist searchers refine and clarify.

Look for the notes on the slides.

Thursday, January 08, 2009

Autonomy and Unstructured Data

More information about Autonomy in InFocus: Autonomy Corp., PLC by Teresa Cramer, Enterprise Search (Dec 1, 2008)

Nicole Eagan, CMO for Autonomy, Corp., spoke about Autonomy's place in the enterprise search marketplace.

To modify an old slogan, it's mainly because of the unstructured data.

"By "applying very advanced algorithms" to information, Eagan says Autonomy makes "understanding the meaning of human-friendly information" a reality for computers. Autonomy’s approach centers on what it calls "meaning-based computing"—the ability to form an understanding of all information, whether it be structured, semistructured, or unstructured, and recognize the relationships that exist within it."

...

"It is a commonly referenced statistic that about 80% of a company’s information is unstructured. Autonomy tackled this challenge from the start, with its early focus on turning information that computers could not understand into searchable and more readily usable data. "

Wednesday, January 07, 2009

Is Keyword Search Dead?

Search is Dead says Stephen Arnold at the Enterprise Search Center (Jan 7, 2009)

Among other reasons, there are these two startling ones:

Keyword search is often a turnoff. Users say that hitting on the right combination of keywords to get the information is too difficult.

As many as two-thirds of enterprise search system users report that other means must be used to find needed information.

Silver bullets promised by enterprise search vendors have been taxonomies, natural language processing, "intelligent systems that discover the meaning of documents using latent semantic indexing", and various hybrids.

Studies have shown that people expect the following:

Offer a web page that gives users specific suggestions and options with hotlinks to topics, categories, and key subjects.

Include a search box, but provide the user with point-and-click options and a way to get started on the quest for the needed information without requiring the user to frame a keyword query.

Allow the user to drill down or jump across topics. The technology should make it easy to explore the available information and—equally important—to backtrack or find a previously displayed piece of information.

Arnold lists four companies he has tested that "point to the future".

Autonomy Update

It Ain’t Over By Jason Stamper, Computer Business Review (Dec 2008)

Autonomy CEO Mike Lynch talked to Jason Stamper about his company and the challenges in enterprise search. There is a good review of the competition in enterprise search and mergers that have taken place in the last few years such as Microsoft's purchasing FAST. More directly, Mike Lynch outlines the direction Autonomy is taking, and quotes IDC's Sue Feldman about the company and industry.

"IDC’s Feldman though says that, “At this point, it is clear that Autonomy should no longer be considered purely a search vendor. It builds search-based applications to answer market demands for better information-centric software.”"

"What does that mean? Autonomy’s website explains: “Autonomy's software powers the full spectrum of mission-critical enterprise applications including pan-enterprise search, proactive information risk management, information governance, e-discovery, consolidated archiving, call centre solutions, rich media management, security applications, customer relationship management (CRM), knowledge management (KM) and BPM [business process management].”"

There is also a sidebar on the rise of e-discovery.

Importance of Information Architecture

Enterprise Information Architecture: A View From The Legal World by Kate Simpson, FUMSI (Nov 2008)

Information Architecture is the context for a taxonomy for the enterprise. Kate Simpson identifies a need for law firms to review and improve "a firm's Enterprise Information Architecture (or Firmwide IA) through an information housekeeping initiative". She looks at processes, systems and tools (where taxonomies fit), information assets, and governance as components of Firmwide IA, and recommends a coordinated approach.

Building an ECM Taxonomy

Forrester has published a guide to creating taxonomies -- How To Build A High-Octane Taxonomy For ECM And Enterprise Search Systems (Nov 17, 2008). Forrester makes the point that "... it's vital to enrich the content that flows through these systems with more meaningful and structured metadata. Enriched content is simply easier to isolate, promote, find, and control." Table of contents is provided in the document excerpt.

Cost is $379 US but there is a promise of full refund in the first 3 weeks if you aren't satisfied.