Taxonomy Watch: 2007

Friday, December 28, 2007

Using Web 2.0 for the Enterprise Portal

Beyond enterprise portals by Greg Pepus, KMWorld (Dec 28, 2007)

Enterprise portals may have become too complex and unwieldy; web browsers don't have the "programmatic sophistication" to support security, content management and much else. Pepus argues that companies "need better solutions and smarter technical and business approaches to take advantage of increasing desktop/laptop computing power and emerging service-oriented architectures in the business enterprise. Perhaps that is one of the things that Web 2.0 is all about—breaking the tyranny of the portal." He recommends adding Rich Internet Application (RIA) technology to expand functionality and improve inter-application data sharing.

Thursday, December 27, 2007

IBM Classification Module

IBM Boosts Content Classification Software, Enterprise Search Center (Dec 26)

Press release:

"IBM announced new capabilities in its content classification software used to automatically categorize large volumes of enterprise information, making it easier to find, access, and use in the context of enterprise content management systems. The IBM Classification Module provides seamless connection to the IBM FileNet P8 content management platform to tackle the categorization of vast amounts of unstructured content in the enterprise, especially content stored or arriving in FileNet repositories. It automates the process of determining whether content is important, and how it should be handled. It can also automatically classify vast amounts of previously unmanaged content or reclassify content already under management so it can be leveraged for business purposes such as records management.

IBM also announced that Cloudmark, a provider of carrier-grade messaging security, has selected IBM content classification software to support its customer base with improved online customer support. The IBM software is intended to help Cloudmark reduce the workload and cost of handling online customer queries."

Saturday, December 22, 2007

Webinar on Folksonomies and Taxonomies

Webinar: Folksonomies and Taxonomies in the Enterprise

Daniela Barbosa of Dow Jones Client Solutions will be leading a Webinar organized by the Dow Jones InfoPro Alliance about Folksonomies & Taxonomies in the Enterprise on January 10, 2008. Registration is free. (Link is in that posting)

Among the topics:

* Business value of a taxonomy/folksonomy
* Impact of social networking tools on the enterprise
* Governance tools
* Merging folksonomies and existing taxonomies
* Some best practices and common obstacles

This entry comes from her weblog - daniela barbosa chitchatting about information delivery.

Postscript Feb 15, 2008 - webcast for this session is available from Factiva InfoPro probably until mid 2008. Uses Event On24.

Friday, December 07, 2007

Demonstrating Value of a Taxonomy

Demonstrating taxonomy value to senior managers - Montague Institute will be holding a teleconference primer/roundtable discussion on February 20, 2008 from 11:00 am to 2:00 pm Eastern time.

"In this primer/roundtable, Montague Institute founder Jean Graef will show how concepts from IT, library science, and corporate publishing can be used to communicate taxonomy benefits to different stakeholder groups. She will also summarize the experiences of Society members in selling taxonomy to their management."

Full description, price, and registration at http://www.montague.com/roundtable43.html

Exalead Enterprise 4.6

Exalead Announces Updated Version of exalead one:enterprise, EContent (Dec 7, 2007)

Exalead, notable for its search product that can extract related terms, has released exalead one:enterprise 4.6 with "several new enhancements that are designed to help organizations easily configure and customize business applications, including hybrid vertical search applications."

Thursday, November 29, 2007

Teragram for managing ontologies

Ontology management, taxonomy development, Enterprise Search Center (Nov 28, 2007)

"Teragram has unveiled Semantic Term Manager (STM) 2.0, software that enables management of content and maintenance of ontologies in enterprise content repositories and databases. STM 2.0 is designed to help corporate librarians maintain ontologies and integrate this information directly with Teragram’s TK240 taxonomy management tool. The combination of these two programs allows knowledge workers to maintain metadata across repositories and databases and to automatically tag documents according to the defined taxonomies. These tools help to simplify the enterprise search and retrieval process, says the company."

Enterprise Search: Information Architecture that includes users

"Enterprise Search: Rethinking it in a Web 2.0 World" By Jayne Dutra, Freepint (Nov 29, 2007)

Jayne Dutra is the Lead Enterprise Information Architect at the Jet Propulsion Laboratory, California Institute of Technology. She understands the importance of information architecture and the need to engage users in tagging content as well.

Users are now in the habit of adding "metadata" to describe what something means to them and how it can be useful to others. But that alone won't be sufficient.

"Successful enterprise search today doesn't mean making keywords work well. It means creating a holistic information architecture designed for the enterprise that allows input and evolution by the users themselves. Ironically, this usually relies on the time honored and humble practice of generating metadata and controlled vocabularies that enable data connectedness and intuitive recall. For years, we've heard that users won't fill out metadata fields. Then how does one account for the phenomenal success of Flickr? If one enters a set of bookmarks in del.icio.us, doesn't that tell us something about the person's interests and background? New Web 2.0 technologies generate metadata in the wild that can be domesticated if we are wily enough to recognise the opportunity."

Dutra also argues for installing the foundation pieces - specifically the creation of a "metadata core specification" and an associated taxonomy.

"The ultimate goal is an information environment enhanced by metadata and served up through a number of rich user interactions facilitated by role based access. "

Monday, November 19, 2007

Recommind does Federated Search

Recommind Enhances MindServer With Deep Federated Search Framework, Newsbreaks (Nov 19, 2007)

"Recommind (www.recommind.com), a provider of enterprise search, automatic categorization, and eDiscovery systems for law firms and enterprises, announced the availability of the MindServer 5.1 platform, which combines robust navigation and grouping controls over external content with multilayered security to deliver a federated search framework. This latest version of Recommind’s flagship MindServer enterprise search platform is designed to bring the full potential of federated content to organizations."

Sunday, November 18, 2007

Weinberger on organizing digital information

David Weinberger answered questions about his book, Everything is Miscellaneous, in an interview with Hugh McKellar, KMWorld (Nov 1, 2007)

He explained that he doesn't mean miscellaneous as a jumble of things that are unrelated to each other but as "the aggregation of everything, with the important difference that with the digital miscellaneous, we find all sorts of ways that the things are alike, all sorts of connections and relationships". He believes in the power of user tagging - of using the relationships that people identify as the means for finding information in an enterprise.

"Tagging systems let the users of information decide how they’re going to think about that information, or what that information means to them. Tagging within the corporation is potentially a very powerful tool for sharing knowledge and for enabling social networks to emerge around shared expertise."

On being asked if this replaces the traditional top-down taxonomies, Weinberger comes very close to saying yes, although in the end he seems to see them as being complementary.

"The real importance of a folksonomy is that it retains much more information than the traditional top-down taxonomy does. The top-down taxonomy only knows, typically, that x is a member of y and y is a member of z. With a folksonomy, you know that 17 percent of people think of x as a member of y, but 23 percent think of it as a member of q, and 42 percent of them think that it’s really the same thing as an x." ... "The folksonomy doesn’t have to replace the taxonomy with another static set of categories. It can instead allow the people who are in the minority a way of thinking about something to search the way that they want to. The folksonomy can surface those minority relationships."

Follow David Weinberger's musings about the organization of information at his blog, EverythingIsMiscellaneous.com. The main page also has links to interviews, videos, and podcasts with Weinberger.
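Weinberger's remark that a folksonomy "retains much more information" than a top-down taxonomy can be illustrated with a small tally. What follows is a minimal sketch, assuming hypothetical tagging data: the users, the item "x", and the tags "y" and "q" are invented for illustration and do not come from the interview. It keeps the share of people behind each tag, where a taxonomy would record a single parent.

```python
from collections import Counter

def tag_shares(assignments):
    """Return each tag's share of the distinct users who tagged the item.

    `assignments` is a list of (user, tag) pairs for one item.
    """
    users = {user for user, _ in assignments}
    # Count each (user, tag) pair once, even if it was recorded twice.
    counts = Counter(tag for _, tag in set(assignments))
    return {tag: count / len(users) for tag, count in counts.items()}

# Hypothetical data: four users tagging the same item "x".
assignments = [
    ("ann", "y"), ("bob", "y"), ("cat", "q"), ("dan", "y"), ("dan", "q"),
]
shares = tag_shares(assignments)

# A top-down taxonomy would keep only the majority view ...
majority = max(shares, key=shares.get)
# ... while the folksonomy keeps the minority view searchable too.
minority = {tag: share for tag, share in shares.items() if tag != majority}
```

Keeping the whole distribution, rather than only the winning category, is what lets "the people who are in the minority" search the way they want to.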

Professor Michael Wesch's video Information R/Evolution is especially recommended, as it brings home the point that organizing digital information is very different from what civilization worked out for paper.

Friday, October 26, 2007

Taxonomies & the Semantic Web - Call Session

Taxonomies & the Semantic Web from the Taxonomies Communities of Practice - a teleconference by Earley and Associates - October 31, 2007, 1 - 2 pm EST. $50 US (Slides are sent in advance.)

"Taxonomy is the art of adding value to information by placing it in a useful order that supports both direct searching and serendipitous browsing. Taxidermy is the art of stuffing and arranging the skins of dead animals to create lifelike effects. Taxonomies are a fundamental part of the Semantic Web: machine-readable hierarchies that enable intelligent agents to make logical inferences, thereby making information retrieval an entirely new, more sophisticated experience. However, recent books such as Dave Weinberger's Everything is Miscellaneous and Eric Abrahamson's A Perfect Mess suggest that taxonomy and taxidermy are closer than we care to acknowledge."

Montague Institute on Sharepoint

Montague Institute is taking a closer look at Microsoft Office SharePoint Server (MOSS 2007) with a round-table teleconference, a new discussion list, and a web course.

> Yahoo discussion list for Sharepoint search.
http://tech.groups.yahoo.com/group/sharepointsearch

"This topic includes all issues relating to creating, organizing, and finding documents on the Sharepoint platform. The group is intended as a forum for people who manage records, documents, intranet content, and external collaboration sites as well as for indexers, corporate taxonomists, and enterprise search managers."

> Web course Taxonomies, Search & Sharepoint -($$$)

Thursday, October 25, 2007

Semantic Web Blend

A Smarter Web - New technologies will make online search more intelligent--and may even lead to a "Web 3.0." By John Borland, MIT Technology Review (March 2007)

The Semantic Web - the structure that will enable us to see connections between databases and gather information with less effort - is getting closer. This article describes the objectives, the progress, and the players in making smarter tools for organizing and finding information. Ontologies are involved, as are user-generated tagsonomies.

"The Semantic Web community's grandest visions, of data-surfing computer servants that automatically reason their way through problems, have yet to be fulfilled. But the basic technologies that Miller shepherded through research labs and standards committees are joining the everyday Web. They can be found everywhere--on entertainment and travel sites, in business and scientific databases--and are forming the core of what some promoters call a nascent "Web 3.0.""

The writer traces the history of efforts to organize information from Melvil Dewey's days, through the early days of directories on the Web, to the increasing acceptance of using metadata to describe information objects.

Eric Miller, an MIT-affiliated computer scientist, has been one of the contributors to furthering "semantic web" enabling technologies.

Meanwhile, social tagging has been gaining acceptance, imposing a "grassroots order" on collections.

"No one knows what organizational technique will ultimately prevail. But what's increasingly clear is that different kinds of order, and a variety of ways to unearth data and reuse it in new applications, are coming to the Web. There will be no Dewey here, no one system that arranges all the world's digital data in a single framework."

Saturday, October 20, 2007

Folksonomies and Tagging

The current issue (Oct/Nov 2007) of the Bulletin of the American Society for Information Science and Technology is entirely about folksonomies and tagging. Diane Neal, as the guest editor, introduces the issue with an overview of folksonomies as a hot trend, how and where they are applied, their strengths for information retrieval and user involvement, and their weaknesses. Neal is assistant professor in the School of Library and Information Sciences, North Carolina Central University. It's interesting to see the growing acceptance of user tagging in the Library and Information Science communities.

Introduction: Folksonomies and Image Tagging: Seeing the Future? by Diane Neal

Check other articles in the issue:

Why Are They Tagging, and Why Do We Want Them To?
Trouble in Paradise: Conflict Management and Resolution in Social Classification Environments
Image Indexing: How Can I Find a Nice Pair of Italian Shoes?
Flickr Image Tagging: Patterns Made Visible

Thursday, October 18, 2007

Faceted Analysis of online media types

Anticipating new media: A faceted classification of material types. Green, Rebecca and Fallgren, Nancy (2007). In Tennis, Joseph T., Ed., Proceedings North American Symposium on Knowledge Organization 2007, vol. 1, pp. 87-99, Toronto, Ontario.

Interesting application of faceted analysis.

Abstract

"The emergence of new media types, many seemingly without counterparts in the non-digital world, challenges the readiness of existing knowledge organization schemes to accommodate them. A knowledge organization scheme based on a faceted analysis of existing classes of bibliographic materials is likely to accommodate new developments better than one based on a list of unanalyzed material types. The faceted analysis undertaken here, in which seven facets are recognized (content, generation of content, recording of content, publication/distribution, physical characteristics, perception/use, and relationships) shows the inadequacy of the traditional view of the bibliographic community of a fundamental distinction between content and carrier; interaction between content and carrier is common and enters into the characterization of material types. The facet analysis is validated by applying it to two new material types, wikis and blogs."

Wednesday, October 17, 2007

Enterprise Search Practice Blog

Lynda Moulton of The Gilbane Group writes an excellent weblog on enterprise search called, very simply, the Enterprise Search Practice Blog. Its description line reads: Analysis, opinion, and advice on enterprise search technologies and applications.

From time to time she comments on taxonomies and taxonomy development. For example, in The Marginal Influence of E-commerce Search and Taxonomies on Enterprise Search Technologies she wrote,

"The second distinction relates to taxonomies, and the increase in their development and use. I’ve seen a dramatic increase in job postings for “taxonomists” and have managed several projects for enterprises over the years to build these controlled lists of terms for categorizing content. What is noteworthy about recent job opportunities is that most seem to be for customer facing Web sites. Historically, organizations with substantial internal content (e.g. research reports, patents, laboratory findings, business documents) hired professionals to categorize materials for a narrowly defined audience of specialists. The terminology was often highly unique, could number in the hundreds or thousands of terms, even for a relatively small enterprise. This is no longer a common practice."

Saturday, October 13, 2007

Folksonomies meet Taxonomies

Thomas Vander Wal, the person who coined the term folksonomies, spoke with Paul Miller of Talking with Talis about folksonomies and their relationship to taxonomies. User tagging can co-exist with formal taxonomies and validate them. There are many interesting points about the ways people organize information for themselves in this podcast of 58 minutes.

Thomas Vander Wal Talks with Talis about Folksonomies (Aug 3, 2007)

Wednesday, October 10, 2007

Book: Making Search Work

"Making Search Work" by Martin White -- Reviewed by Jothi Nedungadi, Freepint, Oct 11, 2007

Reviewer Jothi Nedungadi suggests that better titles might be "Making Intranet Searches Work" or "Making Enterprise Searches Work", since these capture the breadth and the intent better. A company would need to be visionary to adopt the practices necessary to really improve search. The reviewer doubts that many would, but still recommends the book - "This is however an impressive compilation of information and a commendable effort by the author to address intranet search. His perspectives on making searches work are invaluable. I would recommend this book to those who are considering implementing an enterprise/intranet search engine."

Martin White gives a sample in an article written for Update Magazine.

The table of contents and sample chapter are at the publisher's site - Facet Publishing Online. There is some (small) mention of taxonomy management and social tagging.

Monday, October 01, 2007

Tagging Practices and Their Value

Tagging Practices on Research Oriented Social Bookmarking Sites by Margaret E.I. Kipp (mkipp@uwo.ca) Faculty of Information and Media Studies, University of Western Ontario, London, Ontario. Delivered at Canadian Association for Information Science 2007 (CAIS/ACSI)

[Also available through http://eprints.rclis.org/archive/00011413/ ]

"Abstract: This paper examines the tagging practices evident on CiteULike, a research oriented social bookmarking site for journal articles. Tagging practices were examined using standard informetric measures for analysis of bibliographic information and term use. Additionally, tags were compared to author keywords and descriptors assigned to the same article."

Shows that user tagging can enhance findability over full-text search by keyword and indexing with controlled vocabulary.

Conclusion: "The differing terminology use in tag lists suggests that tagging may be a working example of Vannevar Bush's associative trails. He argued that associative trails better represented how users actually work with their documents: by association rather than by categorisation. (Bush 1945) This suggests that user tagging could provide additional access points to traditional controlled vocabularies and provide users with the associative classifications necessary to tie documents and articles to time and task relationships as well as other associations which are new and novel."
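The comparison of tags with author keywords described in the abstract can be pictured with a simple set comparison. The sketch below is not the paper's informetric method; it swaps in a plain Jaccard overlap, and the article terms are hypothetical. It only shows how tag-only terms (often task- or time-related) surface as additional access points.

```python
def normalize(terms):
    """Lowercase and strip terms so 'Tagging ' and 'tagging' compare equal."""
    return {term.strip().lower() for term in terms if term.strip()}

def compare_terms(user_tags, author_keywords):
    """Compare two term lists for one article.

    Returns the shared terms, the terms only taggers used, and the
    Jaccard overlap (shared terms / all distinct terms).
    """
    tags, keywords = normalize(user_tags), normalize(author_keywords)
    shared = tags & keywords
    union = tags | keywords
    overlap = len(shared) / len(union) if union else 0.0
    return shared, tags - keywords, overlap

# Hypothetical article: the tags include task-related terms that a
# controlled vocabulary would not normally contain.
shared, tag_only, overlap = compare_terms(
    ["Folksonomy", "tagging", "to-read", "thesis-ch2"],
    ["folksonomy", "tagging", "social bookmarking", "indexing"],
)
```

Here the tag-only set holds "to-read" and "thesis-ch2", the kind of time and task associations the conclusion mentions.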

Classification Software - webcast

Accelerating Management and Value of Your Enterprise Content -- webcast by KMWorld on October 2, 2007. Will examine the use of classification software to deal with the fast growing volume of unstructured information in organizations.

"In simple terms, classification software catalogues your information so that it can be found easily and logically, regardless of the file format or where it is stored. It should classify new information as it is brought in as well as sort through content that is already under management. It should learn more about your content over time, so that it becomes more reliable and knowledgeable. And it should streamline searches by HR, Finance, Customer Care agents, Marketing, and all the other decision makers in your organization—making them more productive and accelerating the time-to-value of your investment in ECM."

Sponsored by IBM

More information and registration at http://www.kmworld.com/Webinars/Details.aspx?EventID=251

Friday, September 21, 2007

Ways to Improve Enterprise Search

Strategies for Improving Enterprise Search Beyond the Out-of-the-Box Experience by John Ferrara, Boxes and Arrows (Sept 11, 2007)

"Quality search results only come about through applied effort, requiring in particular the skills of an information architect. And IAs must be ready to go well beyond their traditional front-end role, digging into the functional backend and source data of the search engine. This article outlines how we can bolster findability and win back users’ confidence."

Excellent description of the design factors that contribute to better search, including the use of metadata, a standardized set of keywords, and an ontology.

Book Review: Glut: Mastering Information Through the Ages

A book for information architects and all information junkies -- Glut: Mastering Information Through the Ages by Alex Wright, reviewed in The Hidden History of Information Management, Bob Goodman, Boxes and Arrows (Sept 11)

" Information architects—and anyone curious about the roots of information management—will find much of interest in Glut’s thought-provoking tale."

Microsoft SharePoint 2007 - Taxonomy Piece

Microsoft entered the Enterprise Content Management market with SharePoint Server 2007 (MOSS). Organizations will want to examine its capabilities when planning enterprise systems for content management and search - taxonomy design and deployment may be a piece of that.

BA-Insight has published a white paper on the search capabilities of SharePoint -- SharePoint Search: Five Keys to a Successful Implementation

"BA-Insight is a Microsoft ISV partner whose product Longitude for SharePoint extends the search capabilities of SharePoint to deliver dramatic improvements in usability and relevance for the user."

One of the five keys listed is to "Plan and build an effective taxonomy in SharePoint". In the paper, BA-Insight recognizes that taxonomies provide a means for "clarifying" results, but argues that the "centralized, classified and managed store of content", attractive as it is, is impossible. They recommend leveraging SharePoint to create a "simple" taxonomy.

"The first step is to accurately map the hierarchy in your organization to your SharePoint Site structure. A site per department is typical and useful. Secondly, identify what types of content are useful to each department, and create a document library and/or List for each type. Heavily leverage the Content Type construct in SharePoint to tag specific types of content. Finally, and most importantly, optimize the SharePoint ranking algorithm. Taxonomy is only necessary, because the ranking algorithm isn’t doing its job."

Some might take issue with the statement that "taxonomy is only necessary because the ranking algorithm isn't doing its job". A taxonomy can also provide navigational access, big picture, vocabulary, and greater precision in search.

BA-Insight's Longitude module adds to this the capability for users to tag content - claiming that "over time a rich set of metadata is derived". There is some value to user tagging - different points of view, meaningfulness to the user - but "rich set of metadata" is a stretch unless the tagging is managed or guided.

Microsoft SharePoint Server has a large presence in corporations and with SharePoint 2007 Microsoft is taking on content management along with enterprise search. Taxonomy design and use will need to be figured into the plans. There may be other vendors, like BA-Insight, who will develop products that will enhance the MOSS search function.

SharePoint 2007 Review: Six Pillars of MOSS (Nov 2006) at CMS Wire provides a full description of the direction and capabilities of SharePoint.

Thursday, September 20, 2007

Endeca Discovery Suite

Search gets even smarter, KMWorld (Sept 19, 2007)

Endeca is extending the capabilities of the Endeca Information Access Platform (IAP) with the Endeca Discovery Suite. Some of the improvements relate to tag extraction and tag-based visualization:

* tag extraction capabilities to pull together and reveal common themes, concepts and entities from text-based reviews, blogs and posts for use in site navigation, search relevancy and search engine optimization;
* tag-based visualization and navigation to complement static and dynamic site navigation, giving users more ways to explore and find desirable content and products;
* meta-relational capabilities to link different content types by common concepts, allowing people to dynamically summarize the user-generated content associated with any set of products.

Tuesday, September 11, 2007

Freebase for Structured Content

Freebase.com might be the forerunner of the semantic web that has long been talked about. Ivor Tossell at the Globe and Mail described it in A web that can read itself may be in our future (Sept 10):

"Freebase, like Wikipedia, is an open encyclopedia that most anyone can edit. But alongside each free-form article in Freebase, there are database fields for relevant hard data points. If the article is about a movie, you'll find fields for its release date, director, producer, screenwriters and so on. If the article is about a city, it will have fields for the city's population and location. If the article is about an artist, it will have a field for every one of that artist's works."

Freebase is an open project for building structured data applications using types and defined properties. Films will have one set of properties, geographic places another. It will support complex queries.

From the FAQ - "Finally, while information in Freebase appears to be structured much like a conventional database, it’s actually built on a system that allows any user to contribute to the schemas—or frameworks—that hold the data. This wiki-like approach to structuring information lets many people organize the database without formal, centralized planning. And it lets subject experts who don’t have database expertise find one another, and then build and maintain the data in their domain of interest."
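The idea of types with defined properties, sitting alongside a free-form article and open to structured queries, can be sketched in a few lines. This is a hypothetical model only: it does not use Freebase's API or query language, and every class, property name, and entry below is an assumption made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class TypeSchema:
    """A named type with a fixed set of properties (hypothetical model)."""
    name: str
    properties: set

@dataclass
class Topic:
    """An entry: free-form article text plus structured fields."""
    title: str
    schema: TypeSchema
    article: str = ""
    fields: dict = field(default_factory=dict)

    def set_field(self, prop, value):
        # Reject properties that the type does not define.
        if prop not in self.schema.properties:
            raise KeyError(f"{self.schema.name} has no property {prop!r}")
        self.fields[prop] = value

def query(topics, type_name, **conditions):
    """Return titles of topics of a type whose fields match all conditions."""
    return [
        t.title for t in topics
        if t.schema.name == type_name
        and all(t.fields.get(k) == v for k, v in conditions.items())
    ]

film = TypeSchema("film", {"release_year", "director"})
city = TypeSchema("city", {"population", "location"})

# Illustrative entries only; the field values are placeholders.
film_a = Topic("Film A", film)
film_a.set_field("director", "Director 1")
film_a.set_field("release_year", 2001)
film_b = Topic("Film B", film)
film_b.set_field("director", "Director 2")
film_b.set_field("release_year", 2001)
city_c = Topic("City C", city)
city_c.set_field("population", 100000)

result = query([film_a, film_b, city_c], "film",
               release_year=2001, director="Director 2")
```

Because each type declares its own properties, a film cannot be given a population, and a query can combine several fields, which free text alone does not allow.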

Understanding OWL

Web Ontology Language (OWL) and Semantic Web by Goutam Kumar Saha, Ubiquity Volume 8 Issue 35 (Sept 10, 2007)

In this ACM IT magazine article, Saha describes the Web Ontology Language (OWL), a principal part of enabling a "semantic web". The article includes illustrations, examples, and explanations of code.

"Web Ontology Language (OWL) is a language for defining and instantiating web ontologies (a W3C Recommendation). OWL ontology includes description of classes, properties and their instances. OWL is used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. Such representation of terms and their interrelationships is called ontology. OWL has facilities for expressing meaning and semantics and the ability to represent machine interpretable content on the Web. OWL is designed for use by applications that need to process the content of information instead of just presenting information to humans. This is used for knowledge representation and also is useful to derive logical consequences from OWL formal semantics."
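To make "classes, properties and their instances" and "derive logical consequences" concrete, here is a minimal sketch in plain Python. It is not OWL syntax and not an OWL reasoner: it stores statements as simple triples and infers only class membership through a subclass hierarchy. The class and individual names are hypothetical.

```python
# Statements as (subject, predicate, object) triples, the shape used by RDF/OWL.
triples = {
    ("Thesaurus", "subClassOf", "ControlledVocabulary"),
    ("ControlledVocabulary", "subClassOf", "KnowledgeOrganizationSystem"),
    ("myThesaurus", "type", "Thesaurus"),
}

def superclasses(cls, triples):
    """Return every class reachable from `cls` via subClassOf links."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def infer_types(individual, triples):
    """Return asserted types plus those implied by the class hierarchy."""
    asserted = {o for s, p, o in triples if s == individual and p == "type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return inferred

types = infer_types("myThesaurus", triples)
```

Only one type is asserted for `myThesaurus`; the other two follow from the hierarchy. That is the kind of consequence an application can derive when meaning is represented explicitly rather than just displayed.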

Card Sorting Challenges

Card Sorting: Mistakes Made and Lessons Learned By Sam Ng, UX Matters (Sept 10, 2007)

The author speaks from experience in this article about card sorting. It's a simple concept, deceptively so, and people may expect more than it can deliver.

"I’ve accepted the fact that card sort analysis—much like usability test analysis—is often messy and subjective. It’s part science, but mostly art. As with many aspects of our work, there isn’t necessarily a single correct, quantitative answer, but rather a number of different qualitative answers—all of which could be correct. Our job is to use our experience and our understanding of people to make judgment calls."

[Mentioned in InfoDesign: Understanding by Design ]

Wednesday, August 22, 2007

Debate about tagging

Is Tagging A Disruptive Innovation? - Joel Lamantia asks that question at Tagsonomy.com (July 21, 2007). The spread of tagging could distract from creating or maintaining taxonomies, and possibly from the use of metadata. But there could also be a large element of hype in the attention tagging is getting. This is one piece of a longer discussion. Lamantia concludes: "Though it’s been a few years since tagging became visible, it seems too early to understand what kind of changes - if any - will occur in the metadata management ecosystem as a result of tagging’s emergence."

One wonders if people really want to spend the extra few seconds to tag an item and, if they do tag, whether they will use something more useful than "read later". I suspect that tagging will remain personal, and that general access will depend on automatic categorization based on business rules.

Tuesday, August 21, 2007

Social search and taxonomies

What will be the impact on the use of taxonomies in companies as more adopt enterprise 2.0 ways of connecting people? Social search, social networking, social enterprise - these are new possibilities being adapted for enterprise use from the consumer universe of Facebook / MySpace, del.icio.us and other social bookmarking services, Digg and Flickr and all the places where people tag what they find. Ajay Gandhi at BEA Systems in a KMWorld webinar posits that social search tools will greatly enrich knowledge management by assisting in sharing knowledge and forming communities. Harvesting Enterprise Wisdom through Social Search reviews knowledge management archetypes, notes the rising state of information overload, and describes the ways social tools will help people cope with that load. It does not mean the end of formal taxonomies to support intranet navigation and repository search, but it will see the emergence of folksonomies based on how people tag and social networks developed according to interest and expertise.

Webinar will be available for 90 days at www.kmworld.com/webinars/bea/21aug2007

Thursday, August 16, 2007

Facets and Taxonomies

The Taxonomy Community of Practice (Earley and Associates) is running a session on Facets and Taxonomies Search online on August 29, 1pm to 2pm EDT. Price $50 US.

From the announcement: "We'll start with an overview of facets and faceted search and then hear from Peter Bell, one of the founders of Endeca, a faceted search company, about new developments in the field that allow a combination of unstructured and structured tagging and classification."

Thursday, July 19, 2007

SchemaLogic's Content Tagging

SchemaLogic, an information management company, has released Business Semantics Management software specifically designed for media and publishing enterprises. Associated Press is among the first to adopt it. The software allows users to tag content while the software manages the semantic connections.

SchemaLogic provides a solution for customers to implement a collaborative process that enables writers, photographers, and editors to participate in the development and enrichment of the underlying “content tags” that describe information in a dynamic, ever-changing environment – and they do not have to change the way they use their own terminology. Content tagging is an advanced method of identifying and labeling information assets including audio, video, news stories, and other web content using text descriptions. SchemaLogic’s software manages the definition and relationships between content tags so that each individual in each department can continue to work in a way that makes sense for them, while the semantic differences are resolved by the technology.

SchemaLogic Delivers First Business Semantics Management Solution for Media and Publishing Enterprises, Press Release (July 16)

Friday, July 13, 2007

Information Professionals in the Text Mine By Kathryn A. Lavengood and Pam Kiser, Online (May / June 2007)

Authors argue that text mining - for drawing relationships between disparate data from many sources - needs a "semantic infrastructure that focuses on information quality and decision support".

Key point (bolding added): "The interpretation of text is just the first step in making the information usable. Another key part is then organizing the resulting “text pieces” into some form of usable network. This is addressed by building taxonomies and ontologies that can be navigated to explore specific topics of interest. Finally, the results must be output in a format that can be interpreted and lead to knowledge discovery."

It identifies three parts to a text-mining system: parsing the text into parts, tagging extracted information, and organizing the parts using taxonomies and ontologies.
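The three parts can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy taxonomy, the sample sentence, and the function names are hypothetical and do not come from the article, and real text-mining systems use far richer parsing and entity extraction than simple word matching.

```python
import re
from collections import defaultdict

# Hypothetical toy taxonomy: term -> broader category (illustrative only).
TAXONOMY = {
    "aspirin": "Drugs",
    "ibuprofen": "Drugs",
    "headache": "Symptoms",
    "fever": "Symptoms",
}


def parse(text):
    """Part 1: split raw text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag(tokens):
    """Part 2: keep only the tokens that match a known taxonomy term."""
    return [token for token in tokens if token in TAXONOMY]


def organize(terms):
    """Part 3: group the tagged terms under their taxonomy category."""
    grouped = defaultdict(set)
    for term in terms:
        grouped[TAXONOMY[term]].add(term)
    return {category: sorted(found) for category, found in grouped.items()}


def mine(text):
    """Run the three parts in order."""
    return organize(tag(parse(text)))


result = mine("Aspirin eased the headache, but the fever needed ibuprofen.")
print(result)
```

The grouped result is what a user would then navigate by category, which is the role the authors give to taxonomies and ontologies.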

Wednesday, July 11, 2007

Dow Jones Releases Synaptica 6.4 for Improved Business Semantic Management

In early June 2007 Dow Jones & Company introduced Synaptica 6.4 - its latest semantic Web-enabled knowledge organization system for the enterprise.

Synaptica 6.4 simplifies and standardizes vocabulary and metadata management in order to unlock valuable business intelligence.

“Computers can store, search and display enormous amounts of information, but until recently machines have not been able to understand the meaning of the content,” said Dave Clarke, global taxonomy director, Dow Jones. “Now, with the semantic Web being able to capture the meaning in a machine-readable way, users can discover latent information and make new connections between isolated content while benefiting from comprehensive and precise information recall.”

To read the full press release visit http://www.factiva.com/investigative/releases/20070605_synaptica.asp?node=menuElem1176

For more information about Synaptica 6.4, visit http://www.factiva.com/products/taxonomy/synaptica.asp?node=menuElem1511

To learn more about Dow Jones services, visit www.dowjones.com/clientsolutions

Sunday, July 08, 2007

Taxonomy Boot Camp 2007

The advance program for Taxonomy Boot Camp, to take place on November 8-9, 2007, is now available. The event is held in conjunction with KMWorld & Intranets and Enterprise Search Summit.

Monday, July 02, 2007

Cogito semantic intelligence

Expert System is an Italian firm specializing in semantic intelligent software for document management. It has just released Cogito SIMS, a semantic intelligent management system with capabilities for auto-classification. "Cogito is said to streamline the development of applications to understand, discover and classify information contained in unstructured text, as well as to review, normalize and automatically enhance metadata--through the extraction of named entities, relations and event-related data trapped in text." [Semantic intelligence management, Enterprise Search Center, May 16, 2007].

Monday, May 28, 2007

Book and Blog about Organizing Knowledge

People working with knowledge management, taxonomies, and folksonomies will be interested in Patrick Lambe's new book (February 2007) Organising Knowledge. Some excerpts and comments are provided on Lambe's page about the book.

He explained, "Hence, as far as I know, this is also the first taxonomy book that combines a practical guide to taxonomy development with a broader explanation of how taxonomy work contributes to knowledge management in a variety of ways."

His weblog, Green Chameleon, has several categories related to knowledge management and to taxonomy which provide various insights. This one on Folksonomies and Rich Serendipity argues for the value of people as "knowledge aggregators". This is a very thoughtful piece that was later included in Lambe's book.

Patrick Lambe is a principal with Straits Knowledge, a consulting firm for information and knowledge management based in Singapore.

Tuesday, May 15, 2007

Enterprise Search

Best Practices in Enterprise Search, Vol III [May 2007] from KMWorld - state of enterprise search today. Has articles on relational navigation, semantic search, metadata, search 2.0, evolution of enterprise search, e-discovery compliance, and much more. Features the technologies of the leading enterprise search vendors.

Siderean's Relational Navigation

Raising the bar on relational navigation, KMWorld (May 17, 2007)

Siderean's Seamark Navigator 4.5 enables identification of the relationships between sets of data across disparate sources, whether structured or unstructured. Among other benefits, it may make it easier for users not only to create and manage taxonomies but also to see connections between facets, that is, to "branch out".

"The offering takes full advantage of semantic technology to enable users to harness content from across the enterprise and on the Web, greatly facilitating information access and discovery. Further, says Siderean, it enables more user participation than before, providing new tagging, voting, ranking and reviewing capabilities. It also helps knowledge workers to efficiently collaborate via commenting features and the ability to save and share searches."

Paula Hane at Information Today's Newsbreaks provides a longer description in Siderean Upgrades Its Relational Navigation Platform (May 14). Sue Feldman of IDC was among those interviewed for this article. She said that "Siderean handles dynamism — elements don’t have to be predetermined. And, while other discovery tools allow drill down through facets and hierarchies, Siderean can handle ‘sideways relationships’ in a unique way."

Siderean Software website has whitepapers and articles on its relational navigation. Especially recommended is the short video (3 minutes) on the Evolution of Search in which the VP of software engineering, Jack Berkowitz, compares keyword search, faceted search, and relational search.

David Weinberger in his blog, Everything Is Miscellaneous, references the Siderean patent for relational navigation, saying "Faceted classification and taxonomies both work by showing the user narrower and narrower results. That's often what we want, but in this crazy world, we may also want to leap off the branch we've walked onto." Siderean's relational navigation might be the method.

Tuesday, May 01, 2007

Endeca Finds Relationships

Endeca Tackles Complex Interrelationships in New IAP Version by Paula J. Hane, Newsbreaks (Apr 30, 2007)

Endeca Technologies (www.endeca.com) is best known for Guided Navigation, a faceted view for discovering content at a website. This article describes improvements that make possible a "new data-driven approach with metarelational indexing that lets users navigate complex relationships between different types of information from different sources".

From the article - "Here’s a simplified description of how Endeca’s architecture works. Each document or record is a set of facets. Some facets are explicit, such as database fields or file metadata. In addition, the text itself can be transformed into explicit facets through entity and term extraction, classification, and other techniques. Finally, what the document is “about” from the user’s perspective is implicit, so Endeca keeps a full-text index of the document as another facet. Working with facets allows Endeca to adapt to disparate information types, to make connections and correlations, and to show the content interrelationships to users."
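To make the "record as a set of facets" idea concrete, here is a minimal Python sketch. It is not Endeca's implementation: the records, facet names, and the functions refine and facet_counts are hypothetical, and the sketch covers only explicit facets, leaving out the extracted facets and the full-text index that the article also describes.

```python
from collections import Counter

# Hypothetical records: each one is a set of facets (field -> value).
RECORDS = [
    {"type": "white paper", "vendor": "Endeca", "topic": "search"},
    {"type": "article", "vendor": "Endeca", "topic": "taxonomy"},
    {"type": "article", "vendor": "Siderean", "topic": "search"},
]


def refine(records, **selected):
    """Keep the records whose facets match every selected value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    return Counter(record[facet] for record in records if facet in record)


# Narrow to articles, then see which vendors remain as further choices.
articles = refine(RECORDS, type="article")
vendor_counts = facet_counts(articles, "vendor")
print(vendor_counts)
```

Recomputing the counts after each refinement is what lets a faceted interface show only the choices that still lead to results.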

Article includes screenshots and links to a demo.

Thursday, April 26, 2007

Evaluating Classification

Taxonomy is classification. This article may help in developing and validating a classification scheme in the early stages.

Measuring the Success Of a Classification System by Iain Barker, Boxes and Arrows (April 2007)

Barker adapted work by Donna Maurer for evaluating card-based classification and applied it to quantitatively showing the improvements to be obtained from a new classification system for the company intranet.

Tuesday, April 24, 2007

Tagging made easier

User tagging of company documents and Web materials may just have got easier with TagEasy from TagSearch Technologies. Social search tools are gradually being adopted by companies.

Tagsearch Technologies Launches Tagging and Web Collaboration Platform, EContent (Apr 24)

Friday, April 20, 2007

Recommended: Enterprise Search Sourcebook 2007

Information Today announced an e-book edition of this year's Enterprise Search Sourcebook, made possible by sponsorship from Endeca. It can be found at http://www.enterprisesearchcenter.com/.

+ Seth Earley has an article on Taxonomies, MetaData and search.
+ Peter Morville is interviewed in Enterprise Search and the Future of Findability
+ Susan Feldman writes on Search, The Quiet Revolution.
+ Francois Bourdoncle, CEO of Exalead, wrote Transform your Intranet into a Source of Knowledge.

There is also an index to advertisers and a showcase area for vendors.

I found the easiest way to navigate was to use the Contents button in the e-book control menu to get to articles, and then to set the view to single page at 100% for reading. Fortunately, it is possible to selectively print pages.

This e-book is a substantial resource on enterprise search for the variety of topics, the excellent writers, and the size - 124 pages.

Wednesday, April 11, 2007

Metadata helps keyword search

Montague Institute Review has an excellent article on the use of a variety of retrieval tools to enhance the findability of information at a web site.

Web site makeover: Legacy retrieval tools save time for users (March 2007)

The article examines the U.S. Supreme Court web site and presents a careful analysis of the audience for this site, alternative sources of information, and a makeover that would incorporate the best from each.

The main point: "it's time to rethink legacy retrieval tools in a Web context and consider a metadata repository as the implementation vehicle".

Monday, April 09, 2007

FAST buys Convera's RetrievalWare

FAST Acquires Convera’s RetrievalWare Business by Paula J. Hane, Newsbreaks (Apr 9, 2007)

Convera Corp has sold its RetrievalWare business to FAST. This gives FAST extra strength in the enterprise search market, along with a large number of client departments in the U.S. government.

"Convera was formed in December 2000 through the combination of the former Excalibur Technology Corp. and Intel’s Interactive Media Services Division. Excalibur had previously merged with Conquest in the mid-1990s."

The announcement states that FAST won't be developing RetrievalWare any further, but will "port" some capabilities from it to FAST's platform.

"However, according to Bauert [Peter Bauert, senior vice president of corporate development at FAST], RetrievalWare does offer some desirable features and functionality that FAST does not have now. He explained: “An example of a feature that we intend to ‘port’ from RetrievalWare to FAST ESP is the way that Convera is doing semantic mining using ontologies and taxonomies. While FAST already supports ontologies/taxonomies, Convera’s customers are used to a tool called the KnowledgeWorkbench and we intend to make that available to the customers as they migrate to FAST ESP.”"

Tuesday, April 03, 2007

Inxight Extracts Metadata

Inxight Launches SmartDiscovery Metadata Management System, EContent (Apr 3, 2007)

"Inxight Software, a provider of enterprise software solutions for information discovery, has launched the Inxight SmartDiscovery Metadata Management System (MMS), designed to allow users to review, cleanse, and augment automatically extracted metadata--the entities, relations, and event data trapped in electronic text."

New taxonomy at CIO.com

CIO.com has embraced many of the new Web 2.0 qualities in its revamped web site. CIO is published by IDG and appeals to the business and IT community; and now, with these improvements, it will be its own community.

Of note are two navigational devices:

1) A new taxonomy covering technology and leadership.

2) A Google Custom Search that searches the CIO.com domain for content, and offers some standard tags for viewing the content: blogs, white papers, webcasts, advice/opinion.

Among the Web 2.0 elements are polls, RSS feeds, blogs with readers' comments and discussion.

Vendors will have more opportunity to add content, such as white papers and articles. Search for taxonomies as a start.

Thursday, March 22, 2007

Taxonomy Warehouse Update

Factiva has three new partners for the Taxonomy Warehouse: Cycorp, Ibuki, and Intellisophic. All three are featured on the main page. The Taxonomy Warehouse now offers 650 taxonomies produced by nearly 300 publishers in 40 languages and in 73 subject areas. [Source: Information Today, p. 41, March 2007]

Wednesday, March 21, 2007

Using The Thesaurus of Aging Terminology

The Thesaurus of Aging Terminology 8th Edition (July 2005) is available online from AARP (formerly American Association of Retired Persons).

This is a good application for seeing how the thesaurus has been constructed and how it is used to assist in finding materials in the AgeLine Database of articles and studies.

Thesaurus - http://www.aarp.org/research/ageline/thesaurus.html

"The Thesaurus of Aging Terminology is a controlled vocabulary of subject terms (also called keywords or descriptors) used to index all publications cited in AgeLine. Because AgeLine focuses on aging-related topics from a variety of disciplines, the Thesaurus can be very useful in constructing a thorough search of the database, in defining how a term is used in AgeLine, and in identifying references having a major focus on that topic."

It is not the easiest to use. Basically, browse the PDF version of the thesaurus (272 pages), note terms you'd like to use, and do a copy and paste into the AgeLine search form.

"The Thesaurus of Aging Terminology is divided into three sections: Relational Terms, Rotated Terms, and Geographical Terms. The Relational Terms section indicates all levels of relationship among Thesaurus terms. The Rotated Terms section provides an alphabetized columnar listing of all words found within Thesaurus terms. The Geographical Terms section provides a ready reference list of state, province, country, regional, and continent names searchable as Descriptors."
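A rotated listing of this kind can be approximated in a few lines of Python. This is only a sketch of the general idea: the three terms below are hypothetical examples rather than entries copied from the thesaurus, and the function name rotated_index is an assumption.

```python
from collections import defaultdict

# Hypothetical multi-word terms, not taken from the actual thesaurus.
TERMS = ["Adult day care", "Home health care", "Retirement planning"]


def rotated_index(terms):
    """Map each word to the terms that contain it, alphabetized by word."""
    index = defaultdict(list)
    for term in terms:
        for word in term.lower().split():
            if term not in index[word]:
                index[word].append(term)
    return dict(sorted(index.items()))


index = rotated_index(TERMS)
for word, found in index.items():
    # One line per word, followed by every term in which it appears.
    print(f"{word:12} {'; '.join(found)}")
```

The benefit is that a searcher who remembers only one word of a descriptor, such as "care", can still find every full term that includes it.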

AgeLine Database - http://www.aarp.org/research/ageline/index.html

Search AgeLine - there are several options - basic keyword, subject, and multiple options. It covers a great range of aging topics related to health, living, and well-being.

Watch for the Descriptors on articles that are displayed, and navigate to other topics.

Thursday, February 15, 2007

Taxonomy Boot Camp 2006 Presentations

Two PowerPoint presentations from sessions at the Taxonomy Boot Camp 2006 conference are available from Conference Buzz. There is a short recap of the two days plus:

+ Taxonomy 101 by Marjorie Hlava, Data Harmony - introduces taxonomies and how to create them.

+ Semi-Automated Creation of Faceted Hierarchies by Marti Hearst, Berkeley

"Taxonomy Boot Camp 2006 offered nearly 200 enthusiastic attendees the rare chance to focus on taxonomies-all the time-and nothing but taxonomies."

The 2007 conference is scheduled for November 8-9, 2007 in San Jose, CA.

Thursday, February 08, 2007

Earley buys Wordmap

Wordmap Ltd, maker of the Wordmap enterprise taxonomy software, has been bought by Earley and Associates, consultants in content management and taxonomies.

See Tantalizing taxonomies, KMWorld (Feb 7, 2007)

Seth Earley, principal of the firm, explained, "An interesting aspect of the Wordmap suite that differentiates it from many products on the market is the integration with content tagging and search. After all is said and done, taxonomies are only useful if they are presented to the user in a meaningful way. Wordmap modules have the ability to do that without a lot of API level coding. The tagging module overcomes many limitations of content management tools in presenting and applying taxonomies for tagging. The navigation module is an easy way to add faceted search (also called guided navigation) without having to acquire additional faceted search tools. In these ways, Wordmap adds value to existing search and content management environments."

Wednesday, January 17, 2007

Study into Tagging Practices

Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices by Margaret E.I. Kipp and D. Grant Campbell, Faculty of Information and Media Studies, University of Western Ontario

Abstract

This paper analyzes the tagging patterns exhibited by users of del.icio.us, to assess how collaborative tagging supports and enhances traditional ways of classifying and indexing documents. Using frequency data and co-word analysis matrices analyzed by multi-dimensional scaling, the authors discovered that tagging practices to some extent work in ways that are continuous with conventional indexing. Small numbers of tags tend to emerge by unspoken consensus, and inconsistencies follow several predictable patterns that can easily be anticipated. However, the tags also indicated intriguing practices relating to time and task which suggest the presence of an extra dimension in classification and organization, a dimension which conventional systems are unable to facilitate.
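For readers unfamiliar with co-word analysis, the counting step behind it can be sketched in Python as follows. The bookmarks and tags are invented for illustration and are not data from the paper, the function name co_word_counts is an assumption, and the multi-dimensional scaling that the authors apply to the resulting matrix is not shown.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bookmarks, each with the set of tags one user applied.
BOOKMARKS = [
    {"python", "programming", "toread"},
    {"python", "programming"},
    {"design", "toread"},
]


def co_word_counts(bookmarks):
    """Count how often each pair of tags appears on the same bookmark."""
    counts = Counter()
    for tags in bookmarks:
        # Sorting gives every pair one canonical order, e.g. (a, b) not (b, a).
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


pairs = co_word_counts(BOOKMARKS)
for (first, second), count in sorted(pairs.items()):
    print(first, second, count)
```

Pairs with high counts suggest tags that users treat as related, while a tag such as "toread" pairing loosely with unrelated subjects hints at the time-and-task dimension the abstract mentions.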