Semedica Silverchair


Winter 2010




The Semantic Source is emailed quarterly


How to...Taxonomy as the Key to Content Organization

You’ve got great content. And it’s critical to the work of your users—students, teachers, practitioners, researchers. But are they finding the precise answers they need when they need them? Do you offer collections of content organized topically that allow users to find related content easily? Content organizers like taxonomies add semantic structure to your content and increase its retrieval and discoverability.

First we’ll explore various classification schemes to illustrate how increasing levels of sophistication aid content organization. We’ll then look at best practices for taxonomy development.

Key word list

Key words are simply an uncontrolled list of main concepts in a document. Authors have traditionally supplied such key words with their journal article manuscripts. But what one author calls "cardiac arrest," another calls "asystole"—making it difficult to connect these two obviously related articles. Without control of terms selected and normalization of language variations, key words are at the bottom of the heap in terms of usefulness.

Controlled vocabulary

A controlled vocabulary contains preferred terms agreed on by stakeholders. Its purpose is to reduce ambiguity in how users (or computers) categorize information objects. For instance, a controlled vocabulary for vehicles may indicate that all pieces of information about cars are to be tagged with the term automobiles. This is a step up from key words, but individual content collections may still contain too much information to make things easy to find.

Taxonomy

A taxonomy takes controlled vocabulary one step further by organizing concepts into relationships. In its most basic form, a taxonomy is hierarchical, defining broader or narrower (parent–child) relationships.

A taxonomy should be granular enough that users can find the individual nuggets of information that precisely meet their interests. Let’s say the cardiology branch of your taxonomy unfolds only to the level of arrhythmias. How does that help the clinician who is looking specifically for articles about supraventricular tachycardia (a type of arrhythmia)? She’ll still need to weed through many unrelated articles to find those that meet her specific interest. More likely, she’ll abandon your site for one that better serves her information needs.

Because taxonomies are hierarchical, it’s easy to step up or down the tree to create increasingly broad or narrow collections of content. You can even put this power in the hands of your users by making your taxonomy visible to support hierarchical browse—allowing them to create their own personalized topic collections on the fly.
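
As a minimal sketch of stepping down the tree, the Python below stores a tiny hierarchy as parent-to-children links and gathers every article tagged with a concept or any of its narrower concepts. The sibling concept "ventricular tachycardia," the article IDs, and the function names are hypothetical and chosen only for illustration; a production taxonomy would live in a managed system rather than in a dictionary.

```python
# Hypothetical mini-taxonomy: each concept maps to its narrower (child) concepts.
NARROWER = {
    "cardiology": ["arrhythmias"],
    "arrhythmias": ["supraventricular tachycardia", "ventricular tachycardia"],
    "supraventricular tachycardia": [],
    "ventricular tachycardia": [],
}

# Hypothetical tagging: article ID -> the most specific concept applied to it.
ARTICLE_TAGS = {
    "article-1": "supraventricular tachycardia",
    "article-2": "ventricular tachycardia",
    "article-3": "arrhythmias",
    "article-4": "cardiology",
}


def concept_and_narrower(concept):
    """Return the concept plus everything beneath it in the hierarchy."""
    found = [concept]
    for child in NARROWER.get(concept, []):
        found.extend(concept_and_narrower(child))
    return found


def topic_collection(concept):
    """Return sorted IDs of articles tagged at or below the given concept."""
    wanted = set(concept_and_narrower(concept))
    return sorted(a for a, tag in ARTICLE_TAGS.items() if tag in wanted)
```

Calling topic_collection("arrhythmias") gathers the broad collection, while topic_collection("supraventricular tachycardia") narrows it to the clinician's specific interest.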

Many publishers Silverchair has been talking with lately are working on projects to build automatic application of taxonomy terms into their content production as early as the manuscript submission process. Such early application of semantic metadata allows you to take advantage of author expertise in vetting automatically applied tags and makes tagging available to be used throughout all stages of production, from peer reviewer selection through publication to the web.

Thesaurus

If you pair your taxonomy with a robust thesaurus you’ll significantly improve findability. A thesaurus allows you to define "same as" equivalent relationships between terms that "normalize" the natural variations in the terms authors use to describe things and searchers use to find them (cardiac arrest vs. asystole; car vs. automobile). The thesaurus refers back to your preferred taxonomy term so you can return all relevant results. In the controlled vocabulary example above, all information about cars is tagged with the term automobiles. If a user searches for cars, a thesaurus will link that query to all information tagged automobiles and retrieve the relevant results.

At Silverchair, we go beyond synonyms in maintaining the thesaurus that can be paired with our Cortex taxonomy. We include all equivalents—abbreviations, acronyms, jargon, vernacular, and even common misspellings. Our rule: If it will help people to find a concept, put it in the thesaurus.

A thesaurus will also significantly aid automated application of taxonomy terms to your content. Why? Because precise strings are easier for the computer to match, and a thesaurus magnifies the number of strings available to match to a given concept.
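
Both uses of a thesaurus, normalizing a search query and matching strings for automated tagging, can be sketched in a few lines of Python. This is an illustration only, not a description of how Cortex works: the entries, the choice of "cardiac arrest" as the preferred term, and the function names are assumptions, and simple whole-word matching stands in for whatever matching a real tagging system uses.

```python
import re

# Hypothetical thesaurus: every equivalent string points to one preferred term.
THESAURUS = {
    "automobiles": "automobiles",
    "automobile": "automobiles",
    "car": "automobiles",
    "cars": "automobiles",
    "cardiac arrest": "cardiac arrest",
    "asystole": "cardiac arrest",
}


def preferred_term(query):
    """Map a search query to its preferred term, or None if it is unknown."""
    return THESAURUS.get(query.strip().lower())


def suggest_tags(text):
    """Return preferred terms whose equivalents appear as whole words in text."""
    lowered = text.lower()
    tags = set()
    for variant, preferred in THESAURUS.items():
        # Word boundaries keep "car" from matching inside "cardiac".
        if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
            tags.add(preferred)
    return sorted(tags)
```

Every equivalent added to the thesaurus gives suggest_tags one more string to match, which is why a richer thesaurus improves automated tagging.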

Ontology

In a taxonomy, relationships between terms are usually confined to "broader than" or "narrower than." An ontology is a classification system that allows multiple types of relationships. Think of an ontology as a way to structure the complex semantic relationships between things. In health care, for example, possible relationships include causes/is caused by and treats/is treated by.
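
One common way to picture typed relationships is as subject-relationship-object triples. The Python sketch below assumes that representation; "drug A" and "condition B" are placeholders rather than medical claims, and the inverse labels and the function name are likewise hypothetical.

```python
# Hypothetical ontology stored as (subject, relationship, object) triples.
TRIPLES = [
    ("supraventricular tachycardia", "is a type of", "arrhythmia"),
    ("drug A", "treats", "arrhythmia"),
    ("condition B", "causes", "arrhythmia"),
]

# Each relationship type paired with its inverse reading.
INVERSE = {
    "is a type of": "includes",
    "treats": "is treated by",
    "causes": "is caused by",
}


def related(concept):
    """List (relationship, other concept) pairs for a concept, in both directions."""
    results = []
    for subject, relation, obj in TRIPLES:
        if subject == concept:
            results.append((relation, obj))
        if obj == concept:
            results.append((INVERSE[relation], subject))
    return results
```

Here related("arrhythmia") returns the narrower concept alongside what treats and what causes it, which a purely hierarchical taxonomy cannot express.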

Best Practices in Taxonomy Development

So—how do you develop a taxonomy for your publishing program? The taxonomy experts at Silverchair offer seven best practices:

1. Make a plan!

Start with a strong sense of the scope of your content and how you plan to use your taxonomy to create products and features. The best taxonomies are developed with their end goals always in mind. Consider how your taxonomy will be used for tagging, content management (including recombination and reuse), search, and navigation. Helpful questions to ask include:

  • Who are your users? How do they find content? What problems are they trying to solve?
  • What are the most common use models for your site?
  • What are the most important topics and aspects of your content collection? What information is of highest value to users?
  • What business benefits do you expect from your taxonomy?

Knowing your goals and potential use cases will establish a framework for developing your taxonomy and for deciding how best to apply it in classifying your content.

2. Connect to industry standards.

If a standard, domain-specific taxonomy is available for your industry or discipline, start there. Not only will a standard taxonomy offer a shortcut in your taxonomy development, it will allow downstream interoperability and integration with third parties who also use that standard for content classification.

A flaw in many standards, however, is that their bureaucratic nature translates to slow adoption of new concepts—they often must wait for a critical mass of use, or consensus in the field. Make sure you stay connected to the standard, but be prepared to depart from it as needed. Keep your taxonomy up to date with emerging concepts and include "local" concepts that support your unique users. The ideal is a taxonomy that strikes the proper balance between standards (maximizing the opportunity for interoperability) and flexible timeliness (allowing it to keep up with the fast pace of information change).

Silverchair’s Cortex biomedical taxonomy is connected via unique ID to NLM’s UMLS Metathesaurus (which contains ICD, MeSH, SNOMED, and many other medical vocabularies). This connection ensures interoperability and simplifies integrations across health care, while Cortex also incorporates emerging concepts far more quickly than those standards do. (Learn more about Cortex.)

3. Analyze your content.

Analyze your content to extract the important themes and appropriate topic coverage. Such analysis can be done by experts or programmatically. We recommend a mix of automated analysis and human insight for maximum effectiveness.

4. Consult subject matter experts.

Validate your planned topical coverage and taxonomy hierarchy with subject matter experts. Such experts can help you understand audience needs and develop realistic use cases for your taxonomy. You probably have access to such subject matter experts in the form of authors, editors, and editorial boards.

5. Learn from your users.

Analyze search logs, user tagging, and user-generated content continuously against your taxonomy to look for new concepts and changes to terminology. Normalize your user inputs by creating equivalents (synonyms, abbreviations, jargon) in a thesaurus to your taxonomy (e.g., "c diff" = "c difficile" = "clostridium difficile").
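
As a rough sketch of this kind of log analysis, the Python below counts search queries that match nothing in the thesaurus so an editor can review the most frequent ones as candidate equivalents or new concepts. The thesaurus entries, the sample threshold, and the function name are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical thesaurus entries: equivalent string -> preferred taxonomy term.
THESAURUS = {
    "clostridium difficile": "clostridium difficile",
    "c difficile": "clostridium difficile",
}


def unmatched_queries(search_log, min_count=2):
    """Count normalized queries missing from the thesaurus, most frequent first."""
    counts = Counter()
    for query in search_log:
        # Lowercase and collapse whitespace so trivial variants count together.
        normalized = " ".join(query.lower().split())
        if normalized not in THESAURUS:
            counts[normalized] += 1
    return [(q, n) for q, n in counts.most_common() if n >= min_count]
```

If "c diff" surfaces in the results, an editor can add it to the thesaurus as an equivalent of "clostridium difficile," and it will stop appearing on the next run.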

6. Use your taxonomy.

Make your taxonomy actionable by using it in your applications and applying it to your content through tagging. Your taxonomy is platform-neutral and can be used to enhance key features when deployed in any information system, powering:

  • Faceted search and browse
  • Hierarchical navigation
  • Topic collections
  • Contextual linking
  • User personalization/profiling
  • Alerting/messaging
  • Advertising
  • Integrations/mash-ups
  • E-learning object identification

7. Measure and update continuously.

Your taxonomy is never finished—it must be allowed to evolve as your discipline, content, and users change (Silverchair’s Cortex is updated daily). Conduct frequent reviews to continually gauge your taxonomy’s fit and relevance. Establish a taxonomy governance policy with process and editorial rules that will allow your taxonomy to evolve in a controlled and predictable way. Consider using web-based taxonomy management software (like Silverchair’s Totem) that will help you enforce that policy and provide a simple interface so that nontechnical users can update the taxonomy easily.

To learn more about using taxonomy to organize your content, please contact Silverchair.





Semedica, 316 East Main Street, Charlottesville, Virginia 22902, USA  ·  T: 434.296.6333  ·  F: 434.296.3027
©2009 Semedica  |  All Rights Reserved  |  Privacy Policy  |  Semedica is a division of Silverchair