Sunday, December 11, 2005

"Millions of Games"- example of structured tagging

Search Engine Watch, a web site dedicated to the search engine industry, has had very little good to say about folksonomy-type tagging. In folksonomy tagging, users annotate pages, images and other web content with descriptive keywords without the aid of a controlled vocabulary. For their view on this controversy, see "Tagging Not Likely The Killer Solution For Search".

Recently, Search Engine Watch cited the example of Millions of Games, a search tool for finding games on the Internet. Unlike other user-tagged sites, this one provides controlled vocabulary categories to help users assign unambiguous category tags.

Search Engine Watch comments: "Rather than allowing users to tag anything with any words they fancy, [we think] adding structure, defining a mutually agreeable taxonomy and relying less on idiosyncratic language seems like a very promising approach. Millions of Games is an example of a good start to a more useful form of structured tagging." See "Where Tagging Works: Searching for a Good Game".

Value of the faceted taxonomy

Faceted Taxonomy - Creating a Better User Experience through Information Discovery - is taken up by Todd Peters in a paper that is part of KMWorld's Best Practices in Enterprise Knowledge Management (Nov/Dec 2005).

This is a compact article that lays the basis for understanding taxonomies as "a way of consistently organizing and classifying large amounts of data through a controlled vocabulary of terms". It describes how facets are built in, using Human Resources as an example, and the navigational components that result. The author recommends this approach when "considering a site re-design or the implementation of a Web content management system".

The article is available after registration from http://www.kmworld.com/Categories/CategoryIndex.aspx?CategoryID=59
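To make the idea of facets concrete, here is a minimal Python sketch of faceted classification. It is not taken from the Peters paper: the Human Resources documents, the three facet names (topic, audience, doc_type) and the function names are all invented for illustration. Each document carries one controlled term per facet, and navigation is just filtering on any combination of facets.

```python
from collections import Counter

# Hypothetical HR documents; every facet value comes from a small controlled list.
DOCUMENTS = [
    {"title": "Dental plan overview", "topic": "Benefits",
     "audience": "Employees", "doc_type": "Guide"},
    {"title": "Vacation request form", "topic": "Leave",
     "audience": "Employees", "doc_type": "Form"},
    {"title": "Approving leave requests", "topic": "Leave",
     "audience": "Managers", "doc_type": "Guide"},
    {"title": "Benefits enrollment form", "topic": "Benefits",
     "audience": "Employees", "doc_type": "Form"},
]


def filter_documents(documents, **selected):
    """Return the documents that match every selected facet value."""
    return [
        doc for doc in documents
        if all(doc.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(documents, facet):
    """Count how many documents carry each term of one facet."""
    return Counter(doc[facet] for doc in documents)


if __name__ == "__main__":
    forms = filter_documents(DOCUMENTS, doc_type="Form")
    print([doc["title"] for doc in forms])
    # Counts like these are what a site would show next to each navigation link.
    print(dict(facet_counts(forms, "topic")))
```

The point of the design is that no single hierarchy has to be chosen in advance: a user can start from topic, audience or document type and narrow from there.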

Tuesday, December 06, 2005

Tips about Taxonomies

I Column Like I CM: What’s so funny about Taxonomy CoPs, FAQs, and Tips? EContent (Dec 6, 2005) - Bob Doyle has created a web site named TaxoTips that he hopes will provide answers to questions people have about taxonomies and controlled vocabularies. He says that, "At TaxoTips, I'm going to tackle these and other more sophisticated options like multiple hierarchies, thesauri, and faceted classifications, as well as the ontologies, semantic networks, topic maps, and conceptual graphs that underpin the new thinking of the Semantic Web that aims to add machine-readable meaning to content."

Tuesday, November 29, 2005

CMS selection is a jungle of possibilities

EContentMag.com: Bob Doyle of EContent Magazine performed a study of the number of Content Management Systems (CMS) available for purchase. He came up with the astounding figure of 1,879 choices. CMS selection is a jungle, Doyle says, and you may well need a seasoned guide who will bring you back alive. See Bob's article as he provides links to directories of CMS products with lists of features.


Note to those shopping for a CMS: These CMS systems need to have some form of taxonomy and auto-categorization capability. Know what a taxonomy is and what kinds of features you require.

Autonomy Purchases Verity

AUTONOMY CORPORATION PLC ANNOUNCES AGREEMENT TO ACQUIRE VERITY, INC.: CAMBRIDGE, ENGLAND - 4 November 2005 - Autonomy Corporation plc, a global leader in infrastructure software, and Verity, Inc., a leading provider of business search and process management software, today announce that they have entered into a definitive agreement under which Autonomy will acquire Verity.


Autonomy and Verity are both major vendors of content management systems with auto-categorization and taxonomy functionality built into their product offerings.

Friday, November 18, 2005

Siderean extracts metadata

Siderean Announces New Solution for Metadata Generation - EContent Magazine (Nov 18, 2005)

"Siderean Software, provider of Seamark Navigator, an enterprise content navigation solution, has announced a product and services solution for extracting and aligning metadata from large amounts of unstructured and structured data based on the IBM Unstructured Information Management Architecture (UIMA)."

Views on Tagging

Tagging is very hot right now on the public Web. Del.icio.us and Flickr.com are most famous for it. People at Furl.net do it. Technorati has adopted it. And those using the new Yahoo MyWeb2.0 can do it. Tagging is the act of adding keyword terms for an article or bookmark, or, in the case of Flickr, a photo. Collectively these tags are called folksonomies.

There have been a few articles on this phenomenon (or craze, depending on your point of view).

Daniel Terdiman at CNET feels that 'Tagging' gives Web a human meaning (Nov 16, 2005). Cory Doctorow of the popular blog BoingBoing is quoted as saying, "We've had this decades-long program of top-down metadata. People (were asked) to go out and become familiar with one ontology and to make sure data is categorized like this. But people are not very good at this ...". With tagging, people use words that mean something to them. The article does not mention that one person's category for technical support may not be the same as another's.

These tags may be more useful to communities that share interests - perhaps smaller work groups. Brad Hill sees tagging as a "tool for collaborative social use". The article also mentions that some corporations are adopting tagging internally.
"In a corporate environment, the interests are narrower than all the human interests on the Web and the vocabulary becomes narrower," said Dave Weinberger, a fellow at the Harvard Berkman Center.
There are other examples that show the potential in tagging - especially the one where people re-classified George Orwell's 1984.

Will these tags help in general web search? The editors at SearchEngineWatch are sceptical.

Danny Sullivan wrote that Tagging Not Likely the Killer Solution for Search (Mar 22, 2005) - tags are wide open to abuse by spammers, and search engines have learned from past experience to ignore them. Sullivan himself doesn't find tags (or categories in directories, for that matter) much help in search. He prefers, as do so many, keyword search - the "warp drive to zip you to what you wanted". In Yahoo My Web Tagging and Why (So Far) It Sucks (June 30, 2005), he describes the many weaknesses of tagging, among which is the sheer effort involved in browsing tags and trying to assess which would be most relevant. People have been disinclined to use directories even though directories contain hand-picked and reviewed sites. Why would they spend time with other people's tags? Automated classification is probably more effective as a search aid.

Where will it work? Maybe, just maybe, where people can choose terms from an established list. Chris Sherman says in Where Tagging Works: Searching for a Good Game (Nov 2) that it can work if there is a controlled vocabulary in the background, and points to the success of Millions of Games.
"The site uses controlled vocabulary (called "Gameology") to describe categories (arcade, shooter, puzzle, etc). Although you can also add your own free-form tags, these category tags are well known to most users, so there's little ambiguity about what the tags mean."

Will people spend time tagging with or without a controlled vocabulary? For photos they will, since, as many have noted, that's the only way you can find them again. But will they for articles, documents, and other text items? Would people even tag their email? We'll see. This will play out on the Web first.
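The mix described for Millions of Games, category tags drawn from a fixed list plus optional free-form tags, can be sketched in a few lines of Python. This is only an illustration of the idea, not how the site works: the vocabulary below contains just the three categories named in the quote, and the function names are made up.

```python
# Hypothetical controlled vocabulary; only these three terms appear in the quote above.
CATEGORY_VOCABULARY = {"arcade", "shooter", "puzzle"}


def normalize(tag):
    """Lowercase a tag and collapse runs of whitespace."""
    return " ".join(tag.strip().lower().split())


def tag_game(categories, free_tags=()):
    """Validate category tags against the vocabulary; accept free tags as given.

    Raises ValueError when a category is not in the controlled vocabulary.
    """
    cleaned = {normalize(category) for category in categories}
    unknown = sorted(cleaned - CATEGORY_VOCABULARY)
    if unknown:
        raise ValueError(f"Not in controlled vocabulary: {unknown}")
    return {
        "categories": sorted(cleaned),
        "free_tags": sorted({normalize(t) for t in free_tags if t.strip()}),
    }


if __name__ == "__main__":
    print(tag_game([" Puzzle ", "arcade"], ["Brain  Teaser"]))
```

Rejecting an unknown category outright is one design choice; a friendlier form would offer the allowed list instead, which is exactly the "choose terms from an established list" scenario.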

Sunday, November 06, 2005

Categorizing the Web

The general purpose Web search engine (Yahoo, Google and others) does not serve people needing specialized, industry-specific information well. This article mentions two companies that use categories to tailor the collection of information and display of results to particular business needs.

Web Search Gets Down to Business, by Cathleen Moore, Infoworld (Oct 31, 2005)

Convera's private label search engine, Excalibur, seems especially advanced, with the capability to do "deep web" indexing employing millions of topical categories and displaying results as "facets". Convera is targeting Excalibur to "vertical industries such as media, financial services, research, and legal".

The Excalibur web page states that "Convera has assembled a vast storehouse of commercially recognized taxonomies, plus privately compiled classification systems and other knowledge sources, to give Excalibur one of the largest and most detailed Web indexes on the planet. This Excalibur "intelligence index" links users to topical, targeted Web results in real time for a complete and comprehensive perspective."

Exalead, from France, also uses categorization to "provide an integrated search experience that serves up results from multiple sources in a single view".

Saturday, November 05, 2005

PaperThin: New Taxonomy Software

EContentMag.com: PaperThin, Inc., a Web publishing and content management software provider, has announced advanced content classification and discovery capabilities in CommonSpot Content Server version 4.6. The product's taxonomy module now provides taxonomy management for Web content administrators and contributors. New features include a taxonomy term editor and a taxonomy API, along with a facet-based navigation element.

Saturday, October 15, 2005

Inxight Smart Discovery

Announcements about new capabilities of Inxight Software's Smart Discovery have been appearing.

KMWorld noted in Text analysis takes off (Sep 28) that "Inxight Software has introduced SmartDiscovery Analysis Server 4.2, which is said to offer faster, more flexible and accurate extraction, categorization and search indexing capabilities."

The entry at EContent (Sep 16) - Inxight Announces Availability of SmartDiscovery Analysis Server 4.2 - singled out ThingFinder for entity extraction (people, places, companies, etc.), which also allows users to define their own entities using a special syntax.

The Inxight web site has descriptions of company use of Smart Discovery under the Solutions tab.

Thursday, September 22, 2005

Collaborative Tagging Systems

Scott Golder and Bernardo A. Huberman of HP Labs have written an article that explores collaborative tagging systems (AKA "folksonomies") vis-à-vis taxonomies, using data from the Del.icio.us web site. To quote:

"Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content. Recently, collaborative tagging has grown in popularity on the web, on sites that allow users to tag bookmarks, photographs and other content. In this paper we analyze the structure of collaborative tagging systems as well as their dynamical aspects."

Monday, September 19, 2005

LookAhead Searching

It's not quite a taxonomy, but it does work from a controlled vocabulary. Surfwax, known for a variety of Web-search toolsets, is making its look-ahead technology available to webmasters for a fee. Searchers can select from a list of words or begin to type and be presented with suggestions. You can see the vocabulary listings and how it works at the Surfwax News Accumulator. Gary Price describes the service in Surfwax Offers Look-Ahead Technology for Web Sites at SearchDay (Sept 19).
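For readers curious how look-ahead suggestions can be driven by a controlled vocabulary, here is a rough Python sketch. It is not Surfwax's implementation, which is not described in the article; the class name, the sample vocabulary and the choice of a sorted list with binary search are all assumptions made for illustration.

```python
from bisect import bisect_left


class LookAhead:
    """Suggest vocabulary terms that start with whatever has been typed so far."""

    def __init__(self, vocabulary):
        # Sort lowercased keys once so each lookup can use binary search.
        self._entries = sorted((term.lower(), term) for term in vocabulary)
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # First position whose key is >= the prefix; matches, if any, start here.
        start = bisect_left(self._keys, prefix)
        results = []
        for key, term in self._entries[start:]:
            if not key.startswith(prefix) or len(results) >= limit:
                break
            results.append(term)
        return results


if __name__ == "__main__":
    # Hypothetical vocabulary, for demonstration only.
    lookup = LookAhead(["Taxonomy", "Tagging", "Thesaurus", "Topic maps",
                        "Ontology", "Taxonomy management"])
    print(lookup.suggest("tax"))
```

Because every suggestion comes from the vocabulary itself, the searcher is steered toward terms the system actually knows, which is the practical benefit over free-text entry.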

Tagging

Tagging seems to be well past the tipping point of acceptance. People in communities such as del.icio.us and flickr are tagging postings with keywords. This article in Business Week Online about Tagging: Keeping Tabs On The Net (Sept 26) describes how companies can use tags at popular Web community sites to monitor buzz. But tagging has potential for other business use in organizing information as seen in this bit -- "When marketing agency MarCom:Interactive holds seminars, it now creates resource pages with tags to blogs, Web sites, and research presented in the seminar. Afterwards, MarCom and its clients can add more links, keeping the discussion going."

Sunday, September 11, 2005

Semantic Technologies

Semantics (the study of meaning) and its application to information systems is another new tool for the information professional. This evolving field focuses on the study of the way we encapsulate and express meaning in our systems. Controlled vocabulary, taxonomy, and ontology comprise key areas within this discipline. The phrase "semantic technologies" has appeared to refer to the software standards and methodologies aimed at providing more explicit meaning for the information at our disposal.

Semantic technologies are a byproduct of the work done by the W3C on Tim Berners-Lee's Semantic Web concept. In February 2004 the W3C recommended RDF and OWL as the primary standards for expressing meaning on the Internet.

The first conference on semantic technologies was held in 2005, with a follow-up Semantic Technology Conference scheduled for San Jose, California in March 2006. On the conference site you can read a summary of what is involved in semantic technologies - "The CIO's Guide to Semantics" by Dave McComb, Semantic Arts, Inc.

Saturday, September 10, 2005

Taxonomy of Human Services

InformCanada, a national association of information and referral providers, is currently undertaking a pilot project to implement a bilingual, pan-Canadian classification system based on the AIRS/INFO LINE Taxonomy of Human Services developed and maintained over the last 20 years by 211 LA County, formerly known as INFO LINE of Los Angeles. 211 is an easy-to-remember three-digit dialing code that enables a caller to access over 28,000 health and human service programs throughout Los Angeles County 24 hours per day, 7 days per week. The Taxonomy of Human Services is endorsed by the Alliance of Information and Referral Services (AIRS) and the United Way of America. It is now a designated standard for indexing 211 databases in the US.

A 211 service has been in development in Canada since 2001. The AIRS/INFO LINE Taxonomy was recently chosen as the common classification system for the databases used by the network of 211 call centres in Ontario and other provinces. InformCanada and an editing committee are working closely with Georgia Sales of 211 LA County to incorporate concepts related to Canadian services into the Taxonomy and to add French language equivalents for the terminology.

Friday, September 09, 2005

Taxonomy Community of Practice

The information consultant Seth Earley has set up a taxonomy community of practice as a forum to communicate ideas, techniques and experiences in deriving, applying and maintaining taxonomies. Members include practitioners of various backgrounds and responsibilities: consultants, taxonomists, indexers, content managers, knowledge management professionals, librarians, and others.

This is a unique place on the web to carry on a discussion with your taxonomy peers on the practicalities of implementation. To join go to http://earley.com.

Seven Stages to Content Management

In I Column Like I CM: Seven Stages of the CM Lifecycle at eContent (Sept 8, 2005) Bob Doyle proposes that there are seven stages to the Content Management cycle rather than the three or four outlined by other writers. Most particularly he argues for an initial phase called Organization, "where categories are created, vocabularies are controlled, taxonomic hierarchies are designed, and faceted classification schemes are developed." Other stages named were creation, storage, workflow, versioning, publishing, and archives.

Doyle refers to other key texts about Content Management.

Monday, September 05, 2005

Verity Wins KMWorld Magazine Award

August 23, 2005 - Verity Inc. (NASDAQ: VRTY), a leading provider of enterprise search software that enables organizations to discover, analyze and process all the digital information within their enterprises, today announced that its Verity Collaborative Classifier (VCC) technology has been selected by KMWorld magazine as a "2005 Trendsetter." An add-on module to Verity's K2 Enterprise search software, VCC enables global organizations to better manage search taxonomies and maximize the return on their intellectual capital investment.

The VCC technology ... simplifies the process of creating taxonomies. With this tool, organizations can distribute taxonomy and classification management to assigned subject matter experts and enable collaboration through workflow.

For more information about Verity's Collaborative Classifier and K2 Enterprise search technologies, please visit http://www.verity.com/products/ics.html

Information Architecture & Taxonomy Workshop

The Delphi Group is holding a high-end, intensive two-day workshop on developing taxonomies within the broader context of information architecture. Here are a few of the topics covered:

- Positioning Search, Thesaurus, Ontology and Taxonomy
- Developing Customized Taxonomies
- Separating Taxonomy From Its Front-end
- Taxonomy Evaluation and Migration Strategies

Delphi is offering the workshop, called "Proving Ground for Information Architecture and Taxonomy", on two separate dates and at two locations: September 13-15, 2005 in Boston, MA and December 6-8 in Boca Raton, FL. For more information go to
http://www.delphigroup.com/events/taxonomy-pg/index.htm.

Taxonomy Boot Camp

A two-day course taught by world-class taxonomy experts and information scientists will be held at the Hilton in New York City September 27-28, 2005.

The organizers advertise: "At Taxonomy Boot Camp attendees will learn how to know when a taxonomy is needed, categorization option, and how to develop a classification system. The structured curriculum will cover topics including what you need to know to make a “build or buy” decision, how to build a taxonomy framework, how to apply metadata and taxonomy principles, and how to manage and maintain the information in your taxonomy. Participants will also study governance issues and learn how to integrate a taxonomy with search and content management systems for maximum effectiveness and ROI."

For more information go to http://www.taxonomybootcamp.com/

Friday, August 26, 2005

SchemaLogic enhances enterprise taxonomy management system

SchemaLogic Introduces Vignette Taxonomy Integrator Adapter - Press release (Aug 23)

"SchemaLogic, a leader in enterprise metadata and taxonomy management software, today released the Vignette Taxonomy Integrator Adapter for SchemaLogic’s SchemaServer application. The Adapter allows taxonomists, information architects and knowledge workers the ability to develop and manage Vignette categories and classifications hierarchies within SchemaLogic’s SchemaServer. Organizations can now manage and deploy Vignette taxonomies from a centralized enterprise taxonomy management system."

Friday, August 19, 2005

Taxonomies as underpinning for sophisticated tools

While attending the SLA 2005 conference, held in Toronto in June, I was impressed with how prominent taxonomies had become as a feature of many of the products being demonstrated in the Exhibit Hall. This became material for an article - The Future of Search: Observations from SLA 2005 Conference - published in the SLA Toronto Chapter Courier (Summer 2005).

From the introduction: "One thing was clear from the many content aggregators at the SLA Conference 2005, the interfaces are beautiful to look at, taxonomies and metadata are the underpinning for all sophisticated tools, and clustering is being adopted as a search aid. All of that is happening for the specialized information products. On the Web it’s a different matter, at least judging from Google. There could be a battle building between the sophistication of specialty tools and the simplicity, sometimes deceiving, of Google."

Thursday, July 14, 2005

Zenome: Human indexing for the web

Zenome: What is Zenome?:

"We'd like you to help us build the largest, most comprehensive directory in the world. Our goal at Zenome is to "sequence the Web." That means building a dynamic, digital library of web pages to provide the best, most relevant search results. Since we believe directories indexed by humans are far more effective than those managed by software, we're enlisting the help of the Web community. We need people like you to help us grow our digital library by informing us of new, important and interesting web pages."

Zenome.com is a new web directory with a number of pre-set categories and human editors who review the listings in each category and decide which links are the best fit for a given search topic. A more selective complement to Google, Zenome is similar to the Open Directory Project (ODP) but structured to overcome some of ODP's problems. You can sign on as an editor for a category.

Tuesday, July 12, 2005

Teragram: another taxonomy management software vendor

Teragram Unveils TK240 Version 5 Taxonomy Management Software - EContent Magazine

Teragram, a provider of multilingual natural language processing technologies, has announced the general availability of Teragram TK240 Version 5, the company's flagship taxonomy management software. Teragram TK240 Version 5 allows content management specialists to develop, test, and deploy taxonomies using Teragram's categorization and concept extraction technology that is designed to enable faster and more accurate classification of vast amounts of information in real-time.

SchemaLogic partners with Intellisophic

SchemaLogic and Intellisophic Join Forces to Offer a Taxonomy Solution. EContentMag.com

SchemaLogic, a provider of enterprise metadata and taxonomy management software, has announced that it has partnered with Intellisophic, a leading publisher of taxonomic content, to deliver a comprehensive taxonomy solution to global 1000 companies. The partnership will offer customers taxonomic content, coupled with a platform to view, manage and disseminate the taxonomies to its search and content management applications.

Taxonomy Balance

Taxonomy turmoil: Should taxonomies be condemned in an age of sophisticated search? Maybe not. By Patricia Beelby & Marcel Roy, Information Highways, March-April, 2005

"The answer to the question of whether organizations need taxonomies or search engines is that they need both. Both systems can be inherently complementary: the taxonomy supports the engine in its organization and the search engine compensates for any omissions by finding information outside the taxonomy."

A more balanced perspective on the need for taxonomies from two records management consultants.

Taxonomies: To Be Or Not To Be

Contrarian - I hate taxonomies! By Huw Morgan, CTO of Torstar Digital, Information Highways, May-June, 2005

"Why do I hate static taxonomies so much? Because I don't think like a librarian and I like to find stuff I'm looking for..... Taxonomies are too rigid and limit the usability of Web sites. Search engines are good, but functionality may be limited by small content collections. Filtering and sorting engines are a lot better than taxonomies. They let people find things the way they want to."

A strong opinion piece designed to pique your interest or get you riled up. It is an example of throwing the baby out with the bath water. Taxonomies aren't the only solution in all situations. Searching, filtering, and sorting are other tools. It depends on the content, the audience, and the user requirements. Beware of oversimplification.

Factiva Acquires Synapse

In a move to beef up an important part of its enterprise service and consultancy business, Factiva has acquired Synapse, a leading taxonomy software and consulting firm. With this deal, Factiva will bring in Synapse’s founding team of Trish Yancey and Dave Clarke to lead Factiva’s Taxonomy Services Group, and will add its Synaptica suite of software for building and maintaining taxonomies as well as the Taxonomy Warehouse, a directory of over 500 taxonomies and other classification schemes.

The lion’s share of Factiva’s business is in the enterprise space, which is the hot spot for content integration and role-based information applications. Factiva’s recent initiatives such as SalesWorks and its Insight suites are all built for enterprise applications that require consulting and customization from Factiva’s consulting and services organization. Taxonomies are a growing part of those content integration projects, so this acquisition is a natural jump-start for that part of Factiva, and another sign of Factiva’s continuing commitment to the enterprise market. Factiva is one of the few companies that are really doing what it takes to enable the role-based products and custom content integration that are critical in that market.

[Outsell Now]

Taxonomies vs. Folksonomies

Bloug: Folksonomies? How about Metadata Ecologies? (Jan 06, 2005)

"Lately, you can't surf information architecture blogs for five minutes without stumbling on a discussion of folksonomies (there; it happened again!). As sites like Flickr and del.icio.us successfully utilize informal tags developed by communities of users, it's easy to say that the social networkers have figured out what the librarians haven't: a way to make metadata work in widely distributed and heretofore disconnected content collections."

This is a good starting point for finding reasoned discussion by knowledgeable people on the pros and cons of taxonomies vs. folksonomies (AKA "mob indexing").

Monday, July 11, 2005

Taxonomy Experts

Bloug: Taxonomy Experts "Seems like I get asked for names of taxonomy specialists frequently, so I thought I'd just publicize my list here on Bloug."

Here is a list of taxonomy experts compiled by one of the early pioneers in information architecture - Louis Rosenfeld. Most are US consultants, but consultants from other countries are being added as they identify themselves.