Monday, December 29, 2008

Social Tagging

Many have examined the value of user tagging and compared it to the use of controlled vocabulary.

These are the topics in "The Value of Social Tagging in a Corporate Setting" (April 2006) by Stephanie Lemieux in the TaxoCop wiki. It includes a description of the use of social tagging at Raytheon where "people submit website suggestions (URLs) along with recommended tags/keyword which are subsequently verified and approved by librarians."
"Why does it work? Chiefly because the sites submitted are specific to a group or discipline, and no matter how hard we try, having a degree in library science does not give you a degree in engineering (insert discipline here). We do not speak their vernacular. We do well enough to add value with controlled terms, but these folk tags have a life of their own."

The 2008 Conference on Human Factors in Computing Systems had two papers on social tagging available through the ACM Portal.


Abstracts and references are available. Access requires a subscription to ACM.

Wednesday, December 17, 2008

Social Tagging at Museums

There are some interesting postings about tagging and folksonomies at the conference.archimuse.com site hosted by Archives and Museums Informatics.

The site is described as "a collaborative space for professionals creating culture, science and heritage on-line".

J Trant posts entries to the archimuse blog about tagging. One refers to her presentation on Access to Art Museums Online: A role for social tagging and folksonomy?. Findings come from STEVE: The Museum Social Tagging Project.

Saturday, December 06, 2008

Considerations for Enterprise Search

Why enterprise search is not internet search by Mary Branscombe, IT Pro (Dec 4, 2008)

Google has proven to be terrific on the Web, but will it do as well searching resources in an organization? Google uses link analysis in relevancy ranking which doesn't work as well inside an organization where there are no links between documents and files. An enterprise search tool needs to do more - as this article points out - working in large part with metadata. Several companies are mentioned - Recommind, Autonomy, Microsoft and InQuira - and Google's Search Appliance is also described.

The enterprise search engine must also "integrate with document repositories, corporate databases, ERP and CRM systems, email, call centre and customer support systems, directories like LDAP and Active Directory, your HR and accounting systems and everywhere else".

There are some figures in this article on the costs of looking for information - "businesses waste the equivalent of 10 per cent of salary costs (says the Butler Group) or information workers waste around three to four hours a week – a total of five weeks a year - because they don’t find the information they’re looking for a third to a half of the time (IDC and HP)."

Thursday, December 04, 2008

Explaining Taxonomies

Waffles and Taxonomies by Rich Payne, AIIM Infonomics Magazine

Rich Payne orients readers to the concepts of taxonomy through an analogy with food orders at a Waffle House. The example works - somewhat.

I'm not sure he made the point about eggs completely clear. Let's imagine the Waffle House needs to organize sales records and other information about the product line. It would see that different kinds of eggs are big business for them. To help people find the information about eggs, they might set up a "taxonomy" for eggs with sub-types of scrambled, fried, poached, boiled. This resembles a thesaurus, where scrambled, boiled, etc. are the Narrower Terms.
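To make the egg example concrete, here is a minimal sketch of how the terms could be recorded. The dictionary layout and the helper names `narrower_terms` and `broader_term` are my own assumptions for illustration; nothing here comes from Payne's article.

```python
# Hypothetical thesaurus for the Waffle House egg example.
# Each preferred term maps to its Narrower Terms (NT).
THESAURUS = {
    "eggs": ["scrambled", "fried", "poached", "boiled"],
}


def narrower_terms(term):
    """Return the Narrower Terms recorded for a term (empty list if none)."""
    return list(THESAURUS.get(term, []))


def broader_term(term):
    """Return the Broader Term (BT) of a term, or None if it has none."""
    for broader, narrower in THESAURUS.items():
        if term in narrower:
            return broader
    return None


if __name__ == "__main__":
    print(narrower_terms("eggs"))
    print(broader_term("poached"))
```

Someone searching for "poached" can be led up to "eggs" and from there across to the sibling terms, which is the kind of navigation a thesaurus is meant to support.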

Two basic rules are interesting:
  • Users must understand the "product of taxonomy"
  • Almost everything has a relationship or is part of a hierarchy

Sunday, November 30, 2008

Controlled Vocabulary in Folksonomies

Editorial - Folksonomies: Why do we need controlled vocabulary? by Alireza Noruzi, Webology (June 2007)

Answers the question "what can a thesaurus do for a folksonomy-based system?" Also makes clear how the folksonomy differs from a classification system.

"Folksonomy is a bottom-up approach where users themselves join the classification, compared to top-down taxonomy and library classifications. By this nature, folksonomy classification can reflect users' actual interest in real time (Niwa et al., 2006). In contrast to hierarchical library classifications (e.g., DDC or LCC) and thesauri, there is usually no limit for choice of tags in folksonomy; so many similar tags are generated. "

Friday, November 28, 2008

Taxonomies and ECM

Paul Quigley, Editorial Director at Enterprise Content Management 365, makes some interesting comments about enterprise taxonomy and folksonomies in the wake of Google's Index Now site search application.

"In the week that SEO behemoth Google took the wraps off its new indexing for content application 'Index Now' site search, the whole sphere of enterprise taxonomy and folksonomies is conjured as the role of knowledge management and context become ever more critical to effective content management strategies. ... The world of semantics, ontology and Kantian epistemology comes to mind in the vast 'contextualisation' process of content management, a veritable holy grail for all umbrella content in a massed digital asset world."

Off-the-shelf taxonomy will not work, but it is suggested that business process management can help in creating a classification / organizing scheme.

These are comments from the November 28th, 2008 newsletter. Subscribe to the newsletter to get updates and ECM industry news.

Wednesday, November 26, 2008

Clustify documents

Automatic categorization, EContent (Nov 6)

"Hot Neuron has released Version 2.0 of its Clustify document clustering software, which automatic document categorization and other tools to help corporations and law firms explore and organize large document sets."

Thursday, November 06, 2008

Clustify will do auto-categorization

Automatic Categorization Added in Hot Neuron’s Clustify 2.0, Newsbreaks (Nov 6)

"Hot Neuron, LLC announced the release of version 2.0 of its Clustify document clustering software, which features automatic document categorization and other tools to help corporations and law firms explore and organize large document sets. Clustify groups related documents into clusters, allowing the user to explore the document set and efficiently and consistently categorize documents by categorizing entire clusters of documents with a single mouse click."

For more information on Clustify, visit www.cluster-text.com.

Wednesday, November 05, 2008

Creating a Taxonomy

Creating User-Centred Taxonomies: Part Two by James Kelway, FUMSI (Sept 2008)

In Part 2, James Kelway shows how to create, test, and launch the taxonomy.

Wednesday, October 29, 2008

IBM Benefits from Enterprise Tagging Service

Enterprise Tagging Service social software saves IBM $4.6 million a year, by Gregory Culpin (Knowledge Officer @ Whatever) in Enterprise 2.0

Interesting posting at a blog for Enterprise Social Search - IBM set up an Enterprise Tagging Service (ETS) and has reported excellent results.

"What they found was amazing when you look at it in context: the average person saved 12 seconds, across the 286000+ searches performed through ETS each week. This sums up to 955 hours saved each week across the company. In terms of cost savings, it amounts to a rough estimate of $4.6 million a year, in terms of productivity gain."

Tuesday, October 28, 2008

Basics of Faceted Search

The Future of Search? Faceted Search by Daniel Tunkelang, Chief Scientist at Endeca, Altsearchengines (Oct )

Endeca is known for the faceted search it offers. In this article, its chief scientist, Daniel Tunkelang, provides a brief description, some images, and some history.

Mainly -- "As a practical matter, the facets for a collection are typically derived either from pre-existing fields in a database (e.g., the columns in a relational database or data warehouse) or by applying information extraction techniques to unstructured content, (e.g., detecting names of people and places in free text)."

The faceted approach is most often seen on retail web sites, since it is an excellent way to let shoppers browse and search products by brand, price, and various other attributes.
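As a rough illustration of facets derived from pre-existing fields, here is a minimal sketch. The product records, field names, and the `facet_counts` and `refine` helpers are all hypothetical; this is not how Endeca implements it.

```python
from collections import Counter

# Hypothetical product records; the field names are illustrative only.
PRODUCTS = [
    {"name": "Trail Runner", "brand": "Acme", "price_band": "under $50"},
    {"name": "City Walker", "brand": "Acme", "price_band": "$50-$100"},
    {"name": "Peak Boot", "brand": "Summit", "price_band": "$50-$100"},
]


def facet_counts(records, field):
    """Count how many records carry each value of a field (one facet)."""
    return Counter(r[field] for r in records if field in r)


def refine(records, field, value):
    """Narrow the result set to records whose field matches the chosen value."""
    return [r for r in records if r.get(field) == value]


if __name__ == "__main__":
    print(facet_counts(PRODUCTS, "brand"))
    acme_only = refine(PRODUCTS, "brand", "Acme")
    print(facet_counts(acme_only, "price_band"))
```

Each click on a facet value is a call to `refine`, after which the counts for the remaining facets are recomputed over the smaller result set.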

Saturday, October 25, 2008

Federated Search

There is an excellent primer on Federated Search by Jill Hurst-Wahl in FUMSI (Oct 22, 2008). This is an excerpt from her report, the Federated Search Report and Toolkit. The excerpt provides a clear description of what federated search is (a single interface that can search multiple data sources simultaneously) and the questions that should be asked when considering whether your company would benefit from adopting this kind of software. Among the features to look for: faceted or topically clustered search results.
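The "single interface, multiple sources" idea can be sketched in a few lines. Everything here is an assumption made for illustration: the two in-memory sources stand in for real repositories, and `search_source` and `federated_search` are hypothetical names, not part of any product discussed in the report.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory sources standing in for real repositories.
SOURCES = {
    "intranet": ["Taxonomy style guide", "Holiday schedule"],
    "library_catalogue": ["Taxonomy Boot Camp notes", "Thesaurus construction manual"],
}


def search_source(name, documents, query):
    """Search one source with a simple case-insensitive substring match."""
    q = query.lower()
    return [(name, title) for title in documents if q in title.lower()]


def federated_search(query, sources=SOURCES):
    """Send one query to every source at the same time and merge the hits."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_source, name, docs, query)
            for name, docs in sources.items()
        ]
        results = []
        for future in futures:  # collected in the order the sources were listed
            results.extend(future.result())
    return results


if __name__ == "__main__":
    for source, title in federated_search("taxonomy"):
        print(source, "-", title)
```

A real tool would also have to rank, de-duplicate, and cluster the merged results, which is where the faceted or topically clustered presentation comes in.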

Wednesday, October 15, 2008

Taxonomy Starter

Creating User Centred Taxonomies: Part One by James Kelway, FUMSI (Aug 2008)

"This two-part article is a step-by-step guide for those wishing to create new taxonomies for their business unit, or client. It will outline the many different elements that make up a quality taxonomy and the pitfalls you should be aware of when starting a new project."

Creating User-Centred Taxonomies: Part Two (Sept 2008)

"In part two of this article, we look at creating, testing and launching the taxonomy."

Both are excellent summaries and are well illustrated.

Tuesday, September 16, 2008

Taxonomy Extension for Microsoft SharePoint

SharePartXXL International GmbH provides "applications and solutions that add additional features to WSS and MOSS". Among these is the SharePartXXL Taxonomy Extension for Microsoft SharePoint.

From the e-mailing:

"The SharePartXXL Taxonomy Extension for Microsoft SharePoint enables documents and list items in a SharePoint-based portal to be organized cross-site as part of a connected content network and knowledge model. SharePoint items are associated by category using one or more custom taxonomies. So users can organize documents and other portal contents in the way that they think and work. With that extension
the SharePoint portal really can become a place to share knowledge as well as content:"

Details about the product are described on the Taxonomy Extension product page.

Friday, September 12, 2008

Forrester Research on Enterprise Tagging

Forrester Research has made available a presentation from a teleconference on folksonomies in the enterprise.

Folksonomy in the Enterprise: How to effectively leverage social tagging
(Feb 15, 2008)

Download from http://www.k-sync.com/document.cfm?iDocumentID=4029358

Agenda covers:

  1. Content is increasingly unregulated and unstructured. This is the problem that tagging addresses.
  2. Content enrichment options - metadata, classification, automated classification, and social tagging.
  3. How tagging aligns with taxonomy - shows taxonomy vs tagging - mentions some software systems, good diagram for "full enrichment cycle".
  4. User case study - IBM Information Discovery Initiative - enterprise social tagging.
  5. Recommendations - try tagging.

Monday, September 08, 2008

Taxonomy Directed Folksonomies

TAXONOMY DIRECTED FOLKSONOMIES -- Integrating user tagging and controlled vocabularies for Australian education networks -- by Sarah Hayman and Nick Lothian at education.au, 2007

This paper was presented at the World Library and Information Congress at the 2007 IFLA Conference.

It opens with the key question -- "What is the role of controlled vocabulary in a Web 2.0 world? Can we have the best of both worlds: balancing folksonomies and controlled vocabularies to help communities of users find and share information and resources most relevant to them?"

Paper includes a review of the growth and characteristics of user tagging and folksonomies on the Web. Throughout there is a discussion of the value that user tagging provides - along with some of the difficulties.

The aim is to examine if the folksonomy and formal taxonomy can be used together. The answer is Yes - and the authors have a proof of concept model that they are using at education.au.

"Is it possible to combine the two approaches and gain benefits from both? Some attempts have been made already and a few are mentioned here in a consideration of some future developments for social tagging. We then discuss our own model: the taxonomy-directed folksonomy for the myedna proof of concept."

Saturday, August 30, 2008

Meta-tagging in the office

The EnterpriseSearch blog by Miles Kehoe has entries on taxonomies as well as on many enterprise search vendors such as Endeca, Microsoft, and IBM.

He seems to favour user tagging, but only when combined with some forethought about the types of metatags. In this article from 2004, 5 Steps to Better Tagging, he recommends a full study to determine which tags would be best to use.

Friday, August 08, 2008

Enterprise Search Sourcebook

Enterprise Search Sourcebook 2008

The Enterprise Search Sourcebook for 2008 is available in e-book format through Nxtbook.com.

This is a free resource with a great range of articles about search tools and practices for the enterprise. A few sections of note:

+ Why enterprise search will never be Google-y
+ Enterprise search: trends for 2008
+ E-Discovery essentials: the rules you need to know
+ Sharepoint Search: an enterprise contender?
+ A natural search solution

There are 102 pages in total. You can save the entire book for offline viewing or print sections.

Monday, August 04, 2008

Taxonomy Boot Camp

Full conference details for Taxonomy Boot Camp 2008 are available at the website.

Organizing Information for Search & Discovery
September 25-26, 2008
San Jose McEnery Convention Center - San Jose, CA

Keynote speakers are:
  • Peter Morville, President, Semantic Studios, and author, Ambient Findability
  • Leslie Owens, Analyst, Forrester

Saturday, July 26, 2008

Taxonomy Folksonomy Cookbook

Free Taxonomy & Folksonomy Book Posted by Oliver Marks in ZDnet (July 23)

Points us to a great new book by Daniela Barbosa of Synaptica at Dow Jones Client Solutions, ‘The Taxonomy Folksonomy Cookbook: Finding the right recipe for organizing enterprise metadata‘. She samples "the different flavors of how enterprises can incorporate social tagging into their taxonomies".

The objective is to get the best of the discipline of the prepared taxonomy and the involvement of social taggers in building knowledge and enhancing findability.

"Fortunately, the taxonomy versus folksonomy issue is not an “either/or” debate, but an opportunity for mutual progress. By combining the virtues of each approach into a working hybrid model, the enterprise can achieve its goal: a user-friendly system that encourages collaboration and makes information easier to find."

Has many "recipes".

The e-book is online and free with registration.

Monday, July 07, 2008

State of Enterprise Search

AIIM Study Finds Enterprise Search Still Lacking By Ron Miller, EContent (Jun 27, 2008)

"A study conducted by AIIM in May has found that enterprise search still lags behind consumer-oriented search when it comes to helping people find the information they need inside the firewall, but enterprise employees may have unreasonable expectations based on their experiences on the consumer web."

Saturday, June 28, 2008

Entity Extraction

13 Powerful Entity Extraction Techniques from the Enterprise Search blog (June 23)

Lists 13 techniques that entity extraction systems use.

The blog is managed by Miles Kehoe of New Idea Engineering - a resource to watch.

Tuesday, June 17, 2008

KMWorld's Enterprise Search 2008

KMWorld Enterprise Search, Vol IV [May 2008] features articles on best practices in enterprise search.

Contents:

+ The Enterprise Search "Essay Test" by Andy Moore
+ What Makes Search Great? by Susan Feldman
+ Text Analytics for Enterprise Search - The Essential Components for High Performance Systems by Dr. Johannes Scholtes
+ Access With Security Safely - Improve Knowledge Worker Efficiency with Secure Search by John McCormick
+ Making Connections - Search and Content: Keys to a Better Customer Experience by Jason Hekl
+ The Path to Universal Search by Vijay Koduri
+ New Equation for Findability by Michael Schmitt
+ Jump-Starting Collaboration with Social Search by Jerome Pesenti
+ The New Frontier—Capturing User Context by Harald Jellum , Trond Lein , Mikael Svenson
+ Search’s Critical Role in Litigation Preparedness by Craig Carpenter
+ Search is an Iterative Process by Ian Davies
+ Your Customers Can Search, But Do They Find? by Nitin Badjatia
+ Whither Enterprise Search? - Enterprise search is undergoing rapid transformation. What was once seen... by Martin Muldoon
+ Enterprise 2.0 and Search - Serious Results from Light Tooling by Hadley Reynolds, Zia Zaman
+ Mastering a Key Challenge of Enterprise Search - Put Structured Data at Employees’ Fingertips by Matthias Weber
+ The Forgotten Search Solution - When you buy a new home, one of the first things you set out to do is decorate it... by Dr. Shaun Ryan
+ Five Ways to Waste a Million Dollars - After years of false starts and uncertain strategies for knowledge management... by Rob Guilbert
+ The Enterprise Search "Essay Test"—Extended Remix

Sunday, June 15, 2008

Forrester paper on Information Classification

Information Classification Must Reach Beyond Knowledge Management "There Are Many Faces Of Information Classification" by Paul Stamp, Barry Murphy, Stephanie Balaouras with Matthew Brown, Diana Levitt. Forrester (October 2007)

A free report from Forrester for Information and knowledge management (I&KM) professionals. There are several critical reasons for classifying information in addition to information retrieval — "such as ensuring security, implementing a retention policy, and optimizing the use of storage". "Now is the time for I&KM professionals to sync up with security and IT operations professionals to identify and then augment existing classification policies. Create a classification template that meets the 80/20 rule, enabling all team members to quickly classify about 80% of information in the organization."

Tuesday, June 10, 2008

ASI Taxonomies SIG

The American Society for Indexing (ASI) added a special interest group on taxonomies. ASI is expanding "its interests beyond traditional back-of-the-book indexing".


"The Taxonomies & Controlled Vocabularies SIG is for those in the indexing profession who are involved in creating or editing taxonomies, thesauri, or controlled vocabularies used for indexing. Members may be either as freelancers/consultants or employees of companies that develop taxonomies or controlled vocabularies for externally offered information sources. The SIG's founder and current manager is Heather Hedden, currently a taxonomist at Viziant Corporation and formerly a controlled vocabulary editor at Gale (Cengage Learning). The SIG's Yahoo discussion group is "taxonomies." It also has a web site: www.taxonomies-sig.org."

Saturday, June 07, 2008

Synaptica upgrades

Dow Jones Introduces Enhanced Business Semantic Management Tool, Synaptica Press Release, Dow Jones (May 26)

Dow Jones & Company has added functionality to Synaptica, "its business semantic management tool" - Synaptica is presented as a tool "that simplifies the process of organizing an enterprise's information assets, putting news, research and other critical internal information within easy reach of employees". These technology investments in Synaptica further build on the Dow Jones Enterprise Media Group's commitment to "Powering the Intelligent Enterprise."

Among the upgrades are a few that relate to taxonomies or the use of controlled vocabulary for finding information.

o "Semantic Web standardization and Web services integration: Clients can integrate informal social tagging with structured enterprise vocabulary management."

o "Enhanced adding and editing of term relationships: Simple drag-and-drop editing with side-by-side windows makes managing and editing vocabulary hierarchies much easier, eliminating the need for multiple screens."

o "Improved global term editing: Clients save time finding, creating and editing multiple terms with a new one-step process".

Information about Synaptica and taxonomies: http://solutions.dowjones.com/djcs/index.asp

Tuesday, June 03, 2008

Blog: Matt's Musings

Matt's Musings comes up frequently in an alert I have set up for taxonomies through PSSdir. Matthew Hodgson is a management consultant with SMS Management & Technology in Canberra, Australia, and he seems to think a lot about information architecture and social computing.

From time to time he also muses on classification and the use of taxonomies. There is a particularly interesting pair of postings - Folk taxonomy and the taxonomy - with examples and some views on benefits and problems.

Also Part III on Folksonomies as shown through their use at delicious and Flickr.

Key question: "In the traditional information classification space (or even information architecture space) we create lots of artifacts like site maps, navigation systems, and taxonomies. In creating all these things we really should be asking ourselves whether our information models are built on the assumption that a single way to organise things can suit all users, one IA to rule them all, so to speak."

Thursday, May 22, 2008

Vivisimo Velocity Discovery Module

Vivísimo Launches New Velocity Discovery Module - Newsbreaks (May 22)

"Vivísimo (www.vivisimo.com), a provider of enterprise search software and expertise, launched the Velocity Discovery Module, an addition to the Vivísimo Velocity Search Platform that expands the ability of organizations to quickly and automatically classify all data by topics and themes. The new module also adds improved collaboration tools that speed the process of accessing, reviewing, and sharing search results."

Tuesday, May 13, 2008

Data Harmony for Taxonomies

Suite taxonomies, KMWorld (May 12)

"Access Innovations has released Version 3.4 of its Data Harmony software suite. The Data Harmony suite contains three major modules:

* M.A.I. for automatic and assisted indexing,
* ThesaurusMaster for taxonomy and thesaurus creation, and
* XML Intranet System for content creation and maintenance.

The release includes more than 30 new features and revised and updated documentation. Current users will find the same look and feel with friendlier and more functional features, says the company. Data Harmony 3.4 Professional Edition fully supports ISO 8859-1 character encoding covering most character sets, including diacriticals used in Western Europe and the Americas. Data Harmony 3.5.1 International Edition provides full Unicode support, UTF-8, multiple language support and multilingual display."

See http://www.dataharmony.com/

Sunday, April 20, 2008

Study: Beyond Search from Stephen Arnold

The New Study Beyond Search Now Available Stephen Arnold, ArnoldIT (Apr 2008)

Stephen Arnold announced the availability of his latest study into enterprise search -- Beyond Search: What to Do When Your Enterprise Search System Doesn't Work. Arnold found that people are quite unhappy with current methods of searching corporate / enterprise materials. Keyword search is not satisfactory.

"Key word retrieval is no longer enough for today's increasingly savvy and demanding users. Few people want to guess what magic sequence of key words unlocks the information in an enterprise search system. Users want to go "beyond search" with the system providing suggestions, delivering answers, and providing actionable information. Laundry lists just aren't what users want today."

From the selected quotes:

"Classification, entity extraction, and point-and-click access to related content are quickly becoming “must have” features. However, many organizations find themselves unable to afford the seven figure price tags of some of the higher profile systems. Page 89"

Tuesday, April 15, 2008

Alex Wright talks about Glut

The History of Information Architecture - "Listen as Gartner analyst Whit Andrews speaks with Alex Wright - information architect for the New York Times - about his recent book, Glut."

Friday, April 11, 2008

About the Thesaurus

There is much to learn about creating a specialized thesaurus from this interview with Daphne Worsham - The Thesaurus Challenge - done by Cybele Elaine Werts. This was published in the March 2008 issue of the SLA magazine, Information Outlook.

Daphne Worsham has lots of experience from her work creating a special education thesaurus for the Western Regional Resources Center in Eugene, Oregon.

The interview covered the main questions in people's minds - value of the thesaurus, what is controlled vocabulary, the process for building a thesaurus and amount of time needed, and the problems with user generated tagging.

One eye-opening remark on the importance of a thesaurus came from a discussion of regional differences in language. Worsham gave this example: "I learned when I worked in a restaurant, that if you order a regular coffee in certain parts of the country, that automatically means to put cream in it. If you order a regular on the West Coast, it means black. So if we use "regular" to tag something as an identified, it wouldn't mean the same thing to everyone." This one example shows the importance of "clarity in language".

Wednesday, April 02, 2008

Need a Vision for Search

Avoiding the Big Mistakes -- How to Step Aside and Recover from the Worst Problems by Susan E Aldrich, Enterprise Search Center

Written for project and program leaders, this article reflects on the influence of culture or attitudes and the understanding of publishing requirements that affect the success of a project to improve "search".


"First, most companies do not understand how to be effective content producers. In fact, most companies don’t believe they are publishers."

"...it is impossible to guess who will use the search service you are building and what questions they will expect it to answer. Nailing down requirements becomes extraordinarily difficult: pilgrims in pursuit of the unknowable."

Identifies four phases to the lifecycle of the project, each with failure points.

1 Establish and Communicate Vision
2 Analyze Requirements and Select Technologies
3 Establish Findability Policies and Procedures
4 Post implementation

Makes a plea to consider "interfaces to manage synonyms, concepts, metadata, reporting, analysis, ranking, promotion and merchandising". There aren't easy answers.

Full document available with registration.

Monday, March 31, 2008

Autonomy's Enterprise Search with Clustering

Search tool analyzes context, user profiles By: Kathleen Lau, Computer World Canada (March 31)

Autonomy added new features to its enterprise search platform, IDOL, that provide for "deep video indexing and quantum clustering".

"The goal of IDOL, or Intelligent Data Operating Layer, is to provide a single unified and vendor-neutral platform for searching all file formats and media-types for legal and business purposes, said the San Francisco-based company’s CEO, Mike Lynch."

Features:

+ "intent-based ranking bases search results on a user’s profile and other contextual factors, instead of solely depending on popularity or keyword matches".

+ "the quantum clustering feature applies quantum mathematics to calculate concepts within data so that users can better identify conceptual information"

+ "enhancements to existing media support for deep video indexing and analytics."

It's another advance in meaning extraction and meaning-based search. Could advances with this technology obviate the need for taxonomies or will the controlled vocabulary of the taxonomy help in expressing meaning?

Sunday, March 23, 2008

Reports on Sharepoint

The March 2008 issue of the Montague Institute Review has two articles on Sharepoint.

http://www.montague.com/le/alerts/le0308.html

From the newsletter:

Sharepoint conference report: Report on what's new at the first public Sharepoint conference in Seattle on March 3 - 6. Includes information on the FAST acquisition and a new federated search function.


Sharepoint search: An enterprise contender? (free but requires registration)
Montague Institute founder Jean Graef thinks that MOSS 2007 will suffice in some situations -- but by no means all.


Jean Graef commented on MOSS capabilities for using existing metadata and thesaurus relationships.

Friday, March 21, 2008

Color and Information Design

Color Chart: Reinventing Color from 1950 to Today
[Macromedia Flash Player]
http://www.moma.org/exhibitions/2008/colorchart/flashsite/

This online exhibit on color from the Museum of Modern Art is fascinating for the way it has organized information and used multiple points of view. Visitors can browse by artist name, medium (photographs, paintings, sculptures, drawings and prints, and other), and timeline by year starting in 1918 to the present. The commercial color chart is the organizing premise or starting point.




The site "handles" very well in navigation and has a good mix of image, audio, and text. When viewing the art work, watch for links to Full Caption | Extended Text.

There are also four videos of artworks being installed at MOMA.

Postscript (Mar 10, 2009) New Link - http://www.moma.org/visit/calendar/exhibitions/30 - page about the exhibition with link to the exhibition site and a link to an audio tour.

Wednesday, March 19, 2008

Checklist for enterprise search

Delivering on the Promise of Enterprise Search -- 10 Issues to Consider, by Brian Dirking for the Enterprise Search Center (Mar 19)

Checklists are always helpful. This one is about enterprise search.

"When an organization identifies the need for enterprise search, there are myriad questions it must answer. It is vital to identify the most important criteria for your organization, as they will guide your evaluation and eventual implementation of enterprise search. As a starting point, here are 10 issues every organization must consider to help ensure that its investment in a search solution delivers on its promise."

Taxonomy isn't mentioned directly, but it might come into play when considering scale and scope -- "As you consider what needs to be searched, you will also find new uses and new users for your search technology. With new users comes a question: Should you choose separate search tools for different uses, or can you use a single search solution to address these different applications?"

Thursday, March 13, 2008

Types of Taxonomies

Taxonomy Design Types by Barbara Blackburn, AIIM (May 31, 2006)

Good introductory article on types of taxonomies with examples.

"Taxonomies are usually hierarchical. Categories (nodes) in the hierarchy progress from general to specific. Each subsequent node is a subset of the higher level node. There are three basic types of hierarchical taxonomies: subject, business-unit, and functional."

Wednesday, March 05, 2008

Information is Overwhelming

National Workplace Survey Reveals American Professionals Overwhelmed, Headed for “Breaking Point” LexisNexis press release (Feb 26)

A LexisNexis survey of 650 US white-collar and knowledge workers points to information overload:

+ more than 7 in 10 U.S. white-collar workers feel inundated by information at work

+ more than two in five say they're headed for an information "breaking point"

+ "62 per cent said they spend a lot of time sifting through irrelevant information to find what they need"

+ "68 per cent wish they could spend less time organizing information and more time using it"

+ white-collar workers spend on average 7.89 hours each day "conducting research, attending meetings and searching for previously created documents".

+ 68% would like to spend less time "finding and organizing information and more time using it".

Also see "The High Cost of Not Finding Information", Stephen Abram, Stephen's Lighthouse (Mar 4)

Tuesday, March 04, 2008

Leveraging Knowledge

Wonder how knowledge management will change through the move to social networks and collaboration? Ross Dawson answers the question in The Future of Knowledge Management (2004)

"In the course of my travels and speaking and consulting engagements around the world, I have found that there are five key frames for leveraging knowledge in organisations that are emerging as the successors to knowledge management, and that executives find relevant, compelling, and actionable."

Monday, March 03, 2008

Document Clustering Software

New document clustering software from Hot Neuron.

Hot Neuron Introduces Document Clustering Software Newsbreaks, Information Today (Mar 3)



Clustify (www.cluster-text.com) is "aimed at helping corporations and law firms explore, organize, and tag large document sets. Clustify uses a proprietary, agglomerative algorithm designed to provide excellent cluster quality and scalability. Clustify can help corporations organize their internal documents. It can also enhance search engines by identifying related documents that may be of interest even if they don’t match the search query exactly."
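
Clustify's algorithm is proprietary, so the following is only a generic sketch of what "agglomerative" clustering means: start with every document in its own cluster and repeatedly merge the two most similar clusters. The Jaccard word-overlap measure, the single-linkage rule, the threshold value and the function names are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of words."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agglomerative_clusters(docs, threshold=0.3):
    """Merge the two most similar clusters until none reach `threshold`.

    Uses single linkage: the similarity of two clusters is the best
    pairwise similarity between their documents. Returns lists of
    document indexes.
    """
    token_sets = [set(d.lower().split()) for d in docs]
    clusters = [[i] for i in range(len(docs))]

    def similarity(c1, c2):
        return max(jaccard(token_sets[i], token_sets[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = similarity(clusters[x], clusters[y])
                if score > best_score:
                    best_score, best_pair = score, (x, y)
        if best_score < threshold:
            break  # nothing left that is similar enough to merge
        x, y = best_pair
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters
```

This naive version compares every pair of clusters on every pass, so it would not scale to the large document sets the product targets; it is meant only to show the bottom-up merging idea.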

Enterprise 2.0 with Open Text

Two press releases today (March 3) from Open Text show how important "web 2.0" concepts have become in companies and the ever-widening adoption of collaboration technologies for knowledge management.

Open Text Unveils Enterprise 2.0 Strategy -- Comprehensive Strategy Will Help Organizations Accelerate Social Computing, Embrace Web 2.0, Advance Collaboration and Ensure Compliance; Company Announces New Solutions As Part of Broader 2.0 Strategy.

Some bits:

+ Experience Optimization: Optimize the end user experience through personalization, language preferences, analytics, voting, tagging, blogs and moderation.

+ Meta-Data to Underpin Experience Optimization: Deeper levels of meta-data handling will enrich the user experience. This includes richer meta-data handling for content, but with particular emphasis on richer meta-data for people and processes as well. It also includes blending the best of both top-down taxonomies with bottom-up tags.

+ Classification: Improved intuition to auto-classify content, particularly drawing on past user behaviors and current context.


Open Text Introduces Solution to Accelerate Social Computing and Collaboration in Organizations -- New Livelink ECM - Extended Collaboration Connects People, Processes and Content in Real-Time

"The solution combines a robust knowledge repository with project workspaces, polls, news channels, tasks and milestones. A set of community applications brings specialized, enterprise-ready tools, such as forums, blogs, wikis, and real-time collaboration; along with newsletter views, FAQs, and event calendars to promote shared expertise and best practices."

Sunday, March 02, 2008

Book: Tagging: People-powered Metadata for the Social Web


Many enterprise search companies are incorporating elements of collaborative tagging into their systems. The book Tagging: People-Powered Metadata for the Social Web is intended to help people understand tagging and folksonomies and design for them in their systems.

Peter Morville wrote, "In Tagging, Gene Smith has written the definitive book on designing applications for the social web."

The author, Gene Smith, is an information architect, blogger, designer, consultant and "tagging aficionado". He is principal at nForm User Experience in Edmonton, Alberta and has advised clients like Comcast, Ancestry.com and the Canadian Patient Safety Institute. [From bio at Northern Voice.]

Some notes and materials are at the companion web site - Tagging. This has a couple of interviews and some footnotes for the chapters. Potential readers would most like to see the table of contents, but this is not at either the web site or the Amazon page.

Mr Smith also maintains the blog Atomiq which is largely about information architecture and tagging.

For an introduction to the principles of tagging see Tagging 101, his presentation to Northern Voice available through Slideshare.

Friday, February 29, 2008

Enterprise Content Management

KMWorld has released Best Practices for Enterprise Content Management (March 2008) [Register to access the document]

"How do you CONTROL and BENEFIT from content? Start with educating yourself about the many forms and functions of content management tools."

The online publication has articles that look at tools (Sharepoint, InQuira and others) and strategies (handling email overload, search, new formats).

Andy Moore introduces the set with an article, "The Chaos of Content":

"Harnessing the power of content is, it seems, easier said than done. Between the massive volumes that challenge the most well-equipped IT teams and the crazy new ways that knowledge is transferred that stymie the business managers, content has become the rose with the thorn. This KMWorld White Paper is a perfect example; the subjects here range from records management to knowledge transfer and back again faster than you can turn the pages. But please turn them, and read them, because if this group is correct (and I think they are) the future of your organizations will depend on it."

Nothing on taxonomies or classification in this set, but the context is important.

Tuesday, February 26, 2008

Northern Light's MI Analyst

Northern Light Launches MI Analyst 2.0, eContent (Feb 26)

"Northern Light has launched its second major release of MI Analyst, an automated "meaning extraction" application designed specifically for market intelligence, market research, and product research. MI Analyst 2.0 adds many new "facets" (categories of terms) by which the software can analyze search results, automatically extracting meaning from internal and research documents, licensed secondary research, news stories and web sources."

New release adds facets for the pharmaceutical industry - "Human Anatomy, Diseases, Drugs, Cells, Cell Receptors, Proteins, Genes, Enzymes, Pharmaceutical Markets, Life Sciences Scenarios and Research Strategies and Therapeutic Approaches".

Monday, February 25, 2008

Study: Social Tagging in Communities

Collaborative and Social Tagging Networks

Emma Tonkin, Edward M. Corrado, Heather Lea Moulaison, Margaret E. I. Kipp, Andrea Resmini, Heather D. Pfeiffer and Qiping Zhang gather a series of international perspectives on the practice of social tagging of documents within a community context.

Ariadne, Issue 54 January 2008

From the introduction:

"Social tagging, which is also known as collaborative tagging, social classification, and social indexing, allows ordinary users to assign keywords, or tags, to items. Typically these items are Web-based resources and the tags become immediately available for others to see and use. Unlike traditional classification, social tagging keywords are typically freely chosen instead of using a controlled vocabulary. Social tagging is of interest to researchers because it is possible that with a sufficiently large number of tags, useful folksonomies will emerge that can either augment or even replace traditional ontologies. As a result, social tagging has created a renewed level of interest in manual indexing [1]. In order for researchers to understand the benefits and limitations of using user-generated tags for indexing and retrieval purposes, it is important to investigate to what extent community influences tagging behaviour, characteristic effects on tag datasets, and whether this influence helps or hinders search and retrieval.

"This article reports on research presented on a panel at The American Society for Information Science & Technology (ASIS&T) 2007 annual conference which investigated the use of social tagging in communities and in context. The panel was co-sponsored by SIG-TAG, a special interest group of ASIS&T that is interested in the study of social tagging, the Special Interest Group on Knowledge Management (SIG-KM), and the Special Interest Group on Classification Research (SIG-CR)."

...

"Several models of tagging behaviour, aimed at describing the ways in which people tag, are invoked in these studies along with metrics such as the number of tags given, tag co-occurrence and measured frequency. This reflects an ongoing dialogue between researchers; some apply methods from social network analysis, some from the many subfields of linguistics, knowledge management and classification research. Tagging practice is generally from a known stance, such as metadata, keyword or thesaurus provision, a matter of situating the relevance of the concept to known disciplines. Here too, we begin with a familiar theme; writ large, this panel examines certain facets of contextuality in information retrieval."
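
The metrics named in the quote (tag counts, tag frequency and tag co-occurrence) are simple to compute once tags are collected per resource. The sketch below is not taken from the Ariadne article; the sample data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: resource id -> tags assigned by users.
TAGGED_ITEMS = {
    "doc1": ["search", "taxonomy", "metadata"],
    "doc2": ["search", "tagging"],
    "doc3": ["taxonomy", "metadata", "search"],
}


def tag_frequency(items):
    """Count how many resources carry each tag."""
    counts = Counter()
    for tags in items.values():
        counts.update(set(tags))  # set() ignores repeats on one resource
    return counts


def tag_cooccurrence(items):
    """Count how often each unordered pair of tags shares a resource."""
    pairs = Counter()
    for tags in items.values():
        # Sorting makes ("a", "b") and ("b", "a") the same key.
        for a, b in combinations(sorted(set(tags)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Pairs with high co-occurrence counts are candidates for related-term links, which is one way a folksonomy can be mined for structure.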

Saturday, February 23, 2008

Taxonomies for Local Business Search

Yellow Pages and business directories on the Web have for the most part operated with the taxonomies that were used in print to help users find what they needed. You'll see examples of this at Yahoo Local Canada and YellowPages Canada.



Brian Wool at ClickZ, where search marketing is much discussed, noted in his article Local Searchers Hunt for Ideas, Not Categories (Feb 14, 2008) that there's a view that "the generation born from roughly 1980 to 1995, no longer think in terms of categories like previous generations." It's the old boomers who grew up with print who want the categories; young people use keywords. He specifically points to Google as an example of a service that has dropped categories - "Gone are the days of matching local search queries to categories. In the last few years, Google has dramatically scaled back its utilization of a category schema in its local search results."

Not so fast. Mike Blumenthal, writing for Search Engine Land (Feb 22, 2008), asks - Google Maps Categories: Will The Pain End Soon? Google Maps is Google Local. He determined that Google uses the categories from SuperPages and allows businesses to add their own categories when using Google's Local Business Center. But the end result has been confusing (likely to users) and frustrating to small businesses that were not showing up in the same search results as their competitors.

Blumenthal learned through conversations with Google that "... Google's general idea about categorization was to not pick a single taxonomy, provider or structure and their goal was to increase confidence by using many data source signals. Their general approach was to create an overarching categorization system that is a natural reflection of the way people think about these types of searches." Google wants a flexible taxonomy that adapts with more user input.

Blumenthal -- "It is clear that a straight up flat, categorization system will not be sufficient to meet searcher needs in the age of internet expectations. I assume that the transition from this relatively flat structure to the more flexible taxonomy that Google is speaking about is one of the friction points currently causing problems."

The other question is whether it will ever work. Might the rejection of taxonomies by the 'millennials' have been caused by poor interface design? Putting like things together is an age-old practice for facilitating access. If the taxonomy isn't up front, it will at least need to be operating in the background.

Monday, February 18, 2008

Enterprise Search Summit 2008

Taxonomies will really be shown in context at the upcoming Enterprise Search Summit. This will take place May 20-21, 2008 in New York City.

Full program is at http://www.enterprisesearchsummit.com/

"The emphasis for ENTERPRISE SEARCH SUMMIT 2008 is on “extending search,” as search becomes not just an application but a major computing platform. The agenda focuses on how enterprise search software and related applications really work inside organizations and covers the complex issues and problems that challenge experienced search managers. Expert speakers tackle tough topics such as tuning search to deliver optimum results, making the most of search logs and analytics, applying Web services solutions to metadata challenges, developing topic maps, and much more. Breakout sessions pack more hours of programming into the concentrated conference schedule and give you the chance to select topics of special interest to customize your conference experience."

Look for sessions on:

+ Categorizations: A Case Study by Tom Reamy, KAPS Group
+ Taxonomy-Powered Discovery: Who says you need an expert -- Heather Hedden, Viziant Corp.
+ Search as the Gateway to Enterprise Information - Lou Paglia, Factiva
+ How Faceted Navigation Aids Discovery - Seth Earley and Associates
+ Social Work: Is Social Search Right for the Enterprise - moderated by Jean Graef, The Montague Institute
+ Keynote Panel: Take a 30,000-Foot View of Enterprise Search Implementation - moderated by Sue Feldman, Content Technologies Group, IDC

Saturday, February 16, 2008

Peter Morville Speaks About Findability

Peter Morville spoke about findability at the WebStock Conference in New Zealand. Richard MacManus of ReadWriteWeb reported on the session.

The Future of Search: Peter Morville Defines, Shows Examples by Richard MacManus, ReadWriteWeb (Feb 13, 2008)

Peter Morville, author of the book Ambient Findability, defines findability as "the quality of being locatable or navigable". Accomplishing this means using metadata, pattern finding, taxonomies and tagging.

"Morville says it's about bringing together things like tags and taxonomies, not keeping them apart. He points to sites like Etsy, where users can classify their items and tag. He says the future of search will be "a future where search and browsing work together". He refers to a Marcia Bates article about "berrypicking"."

A few examples of products and services are mentioned: Endeca for guided navigation, which Morville says "mirrors the way our psychology works when searching for information". Buzzillions, a product review site, uses facets to help people navigate through products.
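
To make the idea of facets concrete, here is a minimal sketch of faceted filtering over product records. It does not describe how Endeca or Buzzillions work internally; the records, facet names and functions are invented for illustration.

```python
from collections import Counter

# Hypothetical product records; the facet names and values are invented.
PRODUCTS = [
    {"name": "Trail Runner", "category": "Shoes", "brand": "Acme", "rating": 4},
    {"name": "City Walker", "category": "Shoes", "brand": "Bolt", "rating": 5},
    {"name": "Day Pack", "category": "Bags", "brand": "Acme", "rating": 4},
]


def filter_by_facets(records, **selected):
    """Keep records whose values match every selected facet."""
    return [
        r for r in records
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count the records available under each value of one facet."""
    return Counter(r[facet] for r in records)
```

A guided-navigation interface would typically show facet_counts for the current result set next to each facet value, so users can see how many items remain before they click to narrow the results.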

Wednesday, February 13, 2008

Videos on Taxonomy Development

Video Library: Dave Clarke on Taxonomy Management Tools by Daniela Barbosa (Feb 3, 2008)

Features three video segments and slides of a presentation by Dave Clarke, Director of Global Taxonomy at Dow Jones, on the process and software tools for developing a taxonomy. Synaptica, the tool that Dow Jones uses, is well described.

Patrick Lambe, from whose blog these videos first appeared, also has video podcasts of Joseph Busch of Taxonomy Strategies talking about two taxonomy development case studies in the public sector: Singapore metadata standards, and the US Environmental Protection Agency.

The sessions were from the iKMS conference held in Singapore in August 2007.

Primer on Taxonomies

Better Living Through Taxonomies by Heather Hedden, Digital Life Magazine (Feb 5, 2008)

Heather Hedden tackles the matter of improving navigation at a web site through the use of a well-planned taxonomy. This article is an excellent primer on the design and use of taxonomies - what they look like, how to create them, and how they benefit the searcher. It concludes with a list of resources.

"Large websites and intranets can benefit from improved methods of search and navigation. These include site maps, A-Z indexes, sophisticated search engines, and generally improved navigational design—and playing a potential role in all of these methods is well-planned taxonomy."


Thanks to Patrick Lambe of Organising Knowledge for this lead.

Negative for taxonomies

Enterprise Search: Leveraging and Learning From Web Search and Content Tools Lynda Moulton, Gilbane Enterprise Search Practice (Dec 14, 2007)

Lynda Moulton asked panelists at the Gilbane Boston 2007 Conference, "Will Web and Internet Search Technologies Drive the Enterprise (Internal) Search Tool Offerings or Will the Markets Diverge?" She noted their comments in this blog entry.

Of interest: "Among the other noteworthy comments in this session was a negative about taxonomies. The gist of it was that they require so much discipline that they might work for a while but can’t really be sustained. If this attitude becomes the norm, many of the semantic search engines which depend on some type of classification and categorization according to industry terminologies or locally maintained lists will be challenged to deliver enhanced search results. ..."

Monday, February 11, 2008

Microsoft Sharepoint and Enterprise Search

Jean Graef of the Montague Institute provides an analysis of Microsoft's Sharepoint for enterprise search in a paper available from the Enterprise Search Center. Main points are summarized in the e-newsletter of February 6, 2008:

Whether you deploy MOSS for enterprise search depends on your technology strategy and budget, how much you’ve invested in metadata and taxonomies, and how you plan to search multiple content repositories. If you use SharePoint for collaboration and content management but choose another product for enterprise search, you’ll need to consider two kinds of complementary products: taxonomy management programs that integrate with MOSS search, and search engines that can search SharePoint content.


The analysis looks at the metadata discovery capabilities of Microsoft Office SharePoint Server (MOSS) search and its use of thesaurus data and relationships to expand a search. These capabilities, however, are limited.


So while it’s possible to tweak MOSS search results using a variety of techniques along with some data from an existing thesaurus, it’s a labor-intensive endeavor. For this reason, some organizations with large, complex taxonomies opt to purchase third-party thesaurus management software that integrates with SharePoint, an approach that Microsoft endorses. Examples of MOSS-compatible taxonomy management tools include Factiva Synaptica, Data Harmony Machine-Aided Indexer, Schemalogic SchemaServer, and Interse I-box.



The full report, Sharepoint Search - an Enterprise Contender?, is available for free with registration. [PDF - 6 pages]

Saturday, February 09, 2008

Subject Navigation

Mapping the library future is a presentation given by John Mark Ockerbloom of the University of Pennsylvania Libraries to the ALA Catalog Form and Function Group (Jan 12, 2008). It examines "Subject navigation for today’s and tomorrow’s library catalogs". He points out the richness of Library of Congress subject headings and the value of using subject maps to enhance browsing. He also comments on facets and the use of tagging in library catalogs. It's about navigating library collections, but the principles apply to any body of information.

Friday, February 08, 2008

Design for Exploratory Search

From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web Kules, B., Wilson, M., Schraefel, M., Shneiderman, B., Human-Computer Interaction Lab, University of Maryland (February 2008) [pdf 57 pages]

This monograph, published by the University of Maryland, examines methods for exploratory web search, where the subject is complex or new to the searcher. It begins with the theory of search, information retrieval, and information seeking, surveys existing information systems, and concludes with some thoughts on evaluation.

From the abstract:

"This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while key word search presents users with results for specific information (e.g., what is the capitol of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney)."

The "search environments review" describes the value and use of a variety of classification methods: hierarchical, faceted, automatic clustering, and social tagging - with examples drawn from the public Web. These techniques help a user in refining or clarifying a search.

Next there is the matter of viewing the results, and how they can be presented in forms that go beyond a linear list: 2D formats (treemap, hyperbolic tree, scatter plots, cluster maps) or 3D (Data Mountain browser).

Both sections provide a good overview of the principles of the design and their relative merits.

Wednesday, February 06, 2008

Kent State - Information Architecture and Knowledge Management

Kent State University is advertising its Information Architecture program with a glossy pdf -- The Architects of the Information Age with the message, "In today’s high-tech economy, managing data efficiently gives companies a competitive edge".

On a strategic level, information architects need to understand and address both a company’s business model and the needs of its customers, says Reiss. On a tactical level, that means creating the right metadata—information about the information—to help search engines return more accurate results. It also means creating a site thesaurus, so when users type in one word, all the synonyms they could have meant are also considered. Ultimately, it means developing new and novel categorization systems—like collaborative filtration, where customers buying a product can see related products other customers bought.
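
The site thesaurus idea in this passage, where typing one word also brings in its synonyms, can be sketched as a simple query expansion step. This is an illustrative assumption about how such a lookup might work, not a description from the Kent State brochure; the synonym groups and function name are made up.

```python
# Hypothetical site thesaurus: each set groups terms treated as synonyms.
SYNONYM_RINGS = [
    {"car", "automobile", "auto"},
    {"laptop", "notebook"},
]


def expand_query(query, rings=SYNONYM_RINGS):
    """Return the query terms plus all of their synonyms, sorted."""
    expanded = set()
    for term in query.lower().split():
        expanded.add(term)
        for ring in rings:
            if term in ring:
                expanded |= ring  # add every synonym in the matching group
    return sorted(expanded)
```

A search engine would then match documents containing any of the expanded terms, so a query for "car" also finds pages that only say "automobile".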


The master's program is in Information Architecture and Knowledge Management and involves Kent State’s business, library and information science, and communication studies schools.

Tuesday, February 05, 2008

Twine - tagging with an "intelligent personal Web assistant"

Here is a new web-based application that everyman can employ to automatically extract names of people, places, companies and other entities from email and documents. It's called Twine, and the New York Times gave it a glowing review.

An Online Organizer That Helps Connect the Dots by Anne Eisenberg, New York Times (Feb 3)

"Customers have individual accounts on Twine’s Web site, where they save URLs or other information. They can make their collections, or “twines,” private, share them in groups with other members having common interests like politics or fashion, or even make the twines public."

Twine automatically tags content according to the entities it recognizes and can assign a subject based on what it infers the document is about. Users can also add their own descriptive tags. People participating in the current test have been pleased with the results.

Twine is accepting registrants for the Beta test to begin in the next couple of months.

From the Twine site: "... in a nutshell Twine uses the Semantic Web, natural language processing, and machine learning to make your information and relationships smarter. But if that’s all Greek to you, just think of Twine as your very own intelligent personal Web assistant, working for you behind the scenes so you can be more productive."

This has tremendous potential as a social tool for groups of all kinds - any set of people who would benefit from sharing documents - as well as a personal productivity tool for individuals.

Saturday, February 02, 2008

Coveo Solutions for Enterprise Search

Coveo Solutions' enterprise technology received a thumbs-up from IDC Research Vice President Susan Feldman, longtime analyst and search technology expert. Her assessment is in the white paper, Searching for Search: Can it Be Delightful?, available (with registration) from Coveo.com.

The press release, New White Paper by Search Analyst Susan Feldman Demonstrates the Value of Easily Deployable, Platform-class Search Technology (Jan 29, 2008), provides a preview.

The paper begins with an identification of the characteristics of what makes search great. Among these are "specialized vocabularies, ontologies and taxonomies to remove ambiguity ..."

IDC conducted interviews with Coveo's customers and was impressed by the very favourable reviews. The paper includes four case studies that showcase what IDC found as "representative comments about deploying and using Coveo Enterprise search". There are a variety of business situations: a geotechnical and environmental engineering firm, Michigan Criminal Defense Bar, technical support for CA, and a Fortune 20 company that needed to work with audio and video.

Categorization and entity extraction are two of the features that were seen to improve search. This screenshot shows metadata or filters in the right panel as aids to browsing content.

Reuters promotes semantic tagging

Reuters Embraces Tagging, Semantic Web by Jennifer Zaino, Intranet Journal (Jan 29, 2008)

Reuters has made a major move to encourage tagging of content for better understanding and discovery. In 2007, Reuters bought ClearForest Ltd, noted for the semantic capabilities of its technology for tagging structured and unstructured data and extracting meaning. Through the Calais web service the technology is to be extended from internal content tagging to using externally generated tags on blogs and other sources.

That vision is being realized as the company extends its internal web service for content-tagging structured and unstructured data (its vast store of corporate information as well as reporters' stories), based upon ClearForest technology, to the world at large.


From Reuters Releases Open API for New Calais Web Service, Centre Daily (Jan 29)

The Calais Web service enables publishers, bloggers and sites of all kinds to automatically metatag the people, places, facts and events in their content to increase its search relevance and accessibility on the Web. It also lets content consumers, such as search engines, news portals, bookmarking services and RSS readers, submit content for automatic semantic metatagging that is performed in well under a second.


The Calais Web service generates semantic metadata automatically, and adds metadata from publishers.

Wednesday, January 30, 2008

Longitude for enhanced SharePoint search

BA-Insight has two short demos on how its Longitude product enhances SharePoint search. BA-Insight claims that searchers will be able to find information in SharePoint 10X faster using Longitude. This is partially achieved through the tagging and use of meta-data.

- Automatic meta-data tagging: "Users automatically tag relevant content when they search for information and find relevant content".
- Taxonomy navigation: Makes use of the meta-data to guide searchers through search results. Administrators can define the "key enterprise entities" (topic, author, creation date, content type, etc.).

Longitude is described at http://www.ba-insight.net/search-solutions.html

The two video demos are available from the Products page or directly at http://www.ba-insight.net/enterprise-search-demo.html

Part 2 covers:
  • Automatically tuning the SharePoint ranking algorithm
  • Automatic Metadata tagging by users
  • Creating a “bottom up” or user based Taxonomy.
  • Dynamic page extracts
  • Actionable search results

Taxonomy Development Process Checklist

Librarians Kathryn Breininger and Mary Whittaker at the Boeing Library in Seattle identified seven stages to creating a taxonomy in their presentation at the Internet Librarian conference in October 2007. Their 6-page handout, the Taxonomy Development Process (pdf), is a useful checklist for the questions to ask, tasks, outputs, and possible deliverables for each of the seven stages.

The seven stages are:

  1. Determine taxonomy requirements
  2. Identify concepts - where is the content and what do the users think?
  3. Develop draft taxonomy
  4. Review with users and SMEs
  5. Refine taxonomy
  6. Apply taxonomy to content
  7. Manage and maintain taxonomy

Tuesday, January 22, 2008

Taxonomies at Shopping Sites

Shopping is one area on the Web where we see taxonomies deployed to assist users in product searches. This posting on Shopzilla Gets Organized compares Shopzilla to Pricegrabber, Yahoo Shopping, and Shopping.com and shows the variety of approaches. Brian Smith at ComparisonEngines.com clearly prefers the detailed categorization at Shopzilla.

Certainly he thinks that a strong taxonomy is good for the searcher and for the merchant.


"Starting with the big, bold pictures representing the categories and then moving down into cleaner/more robust filtering options and a more granular taxonomy, shoppers should be finding products easier (which makes for a better shopping experience) and merchants should be getting more relevant clicks (which improves ROI)."


Follow the screenshots on a search for football or the links for a search on 'home organization' to see the different treatments.