Combine Precise Search Results with Sophisticated Navigation

The key issue of the information society is getting the right information in the right context to the right person at the right time. This statement is true if you're an executive going over the latest balance sheet, an analyst gathering the current parameters of the stock market, an officer collecting the facts for a report, a student looking for the latest research results, or simply a traveler interested in some flight information.

The statement is also true if you're searching the Web, intranet, Document Management system, Enterprise Resource Planning (ERP) system, or Product Data Management (PDM) system.

We have to search for information not because it's absent, but because we're drowning in too much of it, and the amount keeps growing. The available information doubles every two to three years, and the doubling interval keeps shrinking. We're used to queries that return thousands of hits or not a single one - especially on the Web.

After we've found the necessary information, we want to see it in context. The facts alone aren't of much value; the relationships between the facts are the real knowledge. These relationships are expressed with linking technologies. With unstructured links we end up in a link maze without an exit - lost in hyperspace.

This article explains why XML, links, metadata, and fulltext searches solve only part of the problem. A new technology called Topic Maps provides the features that allow elegant and future-proof solutions to the knowledge management challenge.

Evolution of Data from Information to Knowledge
Various IT data management concepts have evolved in the last 10 years and continue to evolve. They attempt to solve the stated information-access problem (see Figure 1).

Asset and Document Management
Electronic assets such as images, graphics, multimedia files, scanned document pages, and electronic documents (e.g., text processing or DTP files) were and are the focus of this kind of data management. The assets and documents are in a binary monolithic format and portions of the data aren't addressable, and therefore not accessible - the data is "stupid." The management systems offer functions like check-in/out, locking, versioning, metadata, and user access rights. The search component supports metadata search and sometimes fulltext search - the right person gets the right data.

Workflow Management
Workflow management systems don't control the data. They control the process to capture, maintain, proofread, and deliver the data - give the right data to the right resource (person or program) at the right time - so the resource can perform its tasks within a defined timeframe. Workflow management requires an existing infrastructure (network and hardware) because all involved resources need electronic access to the data. Usually workflow management is integrated with asset, document, or content management.

Content Management
Content management is different from asset and document management because the managed data is no longer "stupid" but "intelligent." The system and the user have controlled access to portions of the data - the right person gets the right information. Such access requires structured formats such as SGML, XML, CGM, or SVG. Content management systems can provide link support (see also link management), improved QA (validation against schema), structure search in selected elements, single-source/multiple-use functions, document variants, and media-neutral content pools that are the basis for all publications. Structured formats like SGML and XML transform data into information.

Communication Management
Content and workflow management control the creation, maintenance, and release of information. Communication management is responsible for the proper delivery and exchange of information - the right person gets the right information at the right time. Buzzwords like EDI, B2B, B2C, and e-commerce all rely on working communication management, the middleware that implements the business logic. Again, XML plays an important role as the underlying neutral protocol language, smoothing the exchange of information in a global and heterogeneous IT world.

Link Management
Links connect information objects and bring them into a relationship. Links shouldn't be stored as part of the content because managing them there becomes nearly impossible - managing HTML links with the <a> element and its href attribute is a nightmare. Instead, they should be stored in a link database. XML and its companions, XLink and XPointer, define the technology for sophisticated linking concepts that allow separate link management - getting the right information with some context information to the right person at the right time.
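
To make the idea of a separate link database concrete, here is a minimal sketch in Python. It assumes links are plain source/target address pairs with a link type; the `Link` and `LinkBase` names and the address strings are hypothetical and do not follow XLink syntax. What it illustrates: because the links live outside the documents, retargeting every link that points at a moved resource touches only the link base, not the documents.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """An out-of-line link: both ends are addresses, not markup inside a document."""
    source: str     # hypothetical address format, e.g. "guide.xml#sailing"
    target: str
    link_type: str


class LinkBase:
    """Minimal link database kept separately from the documents it connects."""

    def __init__(self) -> None:
        self._outgoing: dict[str, list[Link]] = defaultdict(list)
        self._incoming: dict[str, list[Link]] = defaultdict(list)

    def add(self, link: Link) -> None:
        self._outgoing[link.source].append(link)
        self._incoming[link.target].append(link)

    def links_from(self, address: str) -> list[Link]:
        return list(self._outgoing.get(address, []))

    def links_to(self, address: str) -> list[Link]:
        return list(self._incoming.get(address, []))

    def retarget(self, old: str, new: str) -> int:
        """Point every link aimed at `old` to `new`; no document has to be edited."""
        moved = self._incoming.pop(old, [])
        for link in moved:
            self._outgoing[link.source].remove(link)
            self.add(Link(link.source, new, link.link_type))
        return len(moved)
```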

Knowledge Management
An integrated system that consists of subsystems for content, workflow, communication, and link management comes very close to real knowledge management - getting the right information in the right context to the right person at the right time. Two minor but important pieces are missing:

  1. Metadata, fulltext, and structured fulltext searches aren't sufficient because the number of hits is often too small (zero) or too large (hundreds to thousands). The user has to know the answer in advance to define the proper query. Fuzzy fulltext search and case-based reasoning (CBR) technologies rely on artificial intelligence (AI) concepts and offer valuable results.
  2. Links relate information resources, bringing certain information into context with other information. Unfortunately, an increasing number of resources results in an ever-increasing number of links. And even if everything is "only one click away," these links are likely to end up in an unusable link maze. Thus link management has to be extended by a concept that structures links the same way XML structures data. The new ISO standard Topic Maps defines such a concept. Topic Maps transform information into knowledge; they bridge information and knowledge management.

Topic Maps - Link Organization for the Next Millennium
Topic Maps were developed by the same ISO committee that developed SGML, DSSSL, and HyTime. This committee's name is ISO JTC1 SC34. ISO/IEC 13250:2000 is the official ISO standard, published at the beginning of 2000. However, its roots go back to 1991, when the Davenport group (initiator of the DocBook DTD) wanted to merge the back-of-the-book indexes of two UNIX manuals. Over the years Topic Maps became an ISO project and evolved into a powerful but implementable concept.

The ISO standard defines the concepts: the data model and the exchange syntax. The latter is based on SGML and HyTime. XML Topic Maps (XTM) are already under development, using XLink as the linking syntax. A first draft of the XML Topic Maps DTD is available at www.topicmaps.org. The XTM working group, hosted by IDEAlliance, consists of Topic Map vendors and users. Most of the ISO committee members are involved in XTM, ensuring compatibility between the SGML and XML variants.

The Topic Map model is very similar to semantic networks, an AI concept developed for knowledge representation. Some minor differences make Topic Maps a bit more powerful.

The concrete relation between the W3C recommendation RDF (Resource Description Framework) and Topic Maps is under investigation. It appears that RDF is more general than Topic Maps. This implies that Topic Maps can be expressed by means of RDF, while Topic Maps software provides more functionality out of the box because the Topic Map semantics are predefined.

A Topic Map defines a metalayer "above" the information resources. The metalayer models all the topics - persons, objects, concepts, thoughts, and more - that are described "in" the resources, as well as the relations between the topics. The topics and resources are connected by hyperlinks using HyTime or XLink syntax. A Topic Map can provide different views on the same set of resources (e.g., beginner and expert views on a technical manual) and, very importantly, the map has value on its own, even without connected resources (see Figure 2).

Topic Maps in a Nutshell
Topic Maps can be easily explained using a back-of-the-book index as an application example. Think about a Caribbean travel guide; its index could look like Table 1.

The index items list the "topics" that can be found in the book. The page numbers point to the "occurrences" of the topics in the book, showing where the reader can find the information resources. Different formatting of the topics and the page numbers (occurrences) signals that they're of different types (e.g., topic in roman font: island, capital city, site of interest; topic in italics: water sport; occurrence in roman: description of the topic; occurrence in bold-italics: city map). The "see also" defines an association between two topics.

Table 1 already contains the fundamental concepts of Topic Maps: topics, topic types, occurrences, occurrence types, associations, and association types.
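
A small Python sketch shows how an index like Table 1 maps onto these concepts: topics with types, occurrences with types, and a typed "see also" association, rendered back into index lines. The class names are hypothetical, any page numbers fed to it are illustrative (the article itself only mentions "yacht charter" on page 14), and a trailing '*' stands in for the bold-italics city-map formatting.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    page: int
    occurrence_type: str  # e.g. "description" (roman) or "city map" (bold-italics)


@dataclass
class Topic:
    name: str
    topic_type: str  # e.g. "island" (roman) or "water sport" (italics)
    occurrences: list[Occurrence] = field(default_factory=list)


@dataclass
class Association:
    association_type: str   # e.g. "see also"
    members: tuple[str, str]


def index_lines(topics: list[Topic], associations: list[Association]) -> list[str]:
    """Render topics as back-of-the-book index lines; city-map pages get a '*'."""
    see_also: dict[str, list[str]] = {}
    for assoc in associations:
        if assoc.association_type == "see also":
            source, target = assoc.members
            see_also.setdefault(source, []).append(target)
    lines = []
    for topic in sorted(topics, key=lambda t: t.name.lower()):
        pages = ", ".join(
            f"{occ.page}*" if occ.occurrence_type == "city map" else str(occ.page)
            for occ in topic.occurrences
        )
        line = f"{topic.name} [{topic.topic_type}] {pages}"
        if topic.name in see_also:
            line += "; see also " + ", ".join(see_also[topic.name])
        lines.append(line)
    return lines
```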

Topic
A topic, in its most generic sense, represents any "thing" - a person, an entity, a concept, really anything - regardless of whether it exists or has any other specific characteristic, about which anything whatsoever may be asserted by any means. The topics represent the things - the concepts - that are in the application domain. Every topic has an identifier, zero or more types, and several characteristics (see Figure 3). The mandatory identifier is the unique address of the topic. The optional types describe "which kind" the topic is - in other words, a "class-instance" relationship. The referenced type is itself a topic, which allows self-documenting Topic Maps. One characteristic is the optional topic name, which is divided into base, display, and sort names (only the base name is required). See Listing 1 for topic types and examples.
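
The topic structure described above can be sketched as follows. This is an illustrative Python model with hypothetical names, not the data model defined by ISO/IEC 13250. It simplifies each name to one record and, for brevity, requires a typing topic to be added before the topics that use it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicName:
    base: str                      # the base name is required
    display: Optional[str] = None  # optional display name
    sort: Optional[str] = None     # optional sort name


@dataclass
class Topic:
    topic_id: str                                         # mandatory, unique address of the topic
    types: list[str] = field(default_factory=list)        # ids of typing topics (class-instance)
    names: list[TopicName] = field(default_factory=list)  # optional name characteristics


class TopicMap:
    def __init__(self) -> None:
        self.topics: dict[str, Topic] = {}

    def add(self, topic: Topic) -> None:
        if topic.topic_id in self.topics:
            raise ValueError(f"duplicate topic id: {topic.topic_id}")
        for type_id in topic.types:
            # A type must itself be a topic of the map, which makes the map self-documenting.
            if type_id not in self.topics:
                raise ValueError(f"unknown type topic: {type_id}")
        self.topics[topic.topic_id] = topic

    def type_names(self, topic_id: str) -> list[str]:
        """Base names of a topic's types, falling back to the type's id if it is unnamed."""
        result = []
        for type_id in self.topics[topic_id].types:
            type_topic = self.topics[type_id]
            result.append(type_topic.names[0].base if type_topic.names else type_id)
        return result
```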

Occurrence
The second characteristic of a topic is the occurrence. It's a link to an information resource that's somehow relevant to the topic; it connects the topic domain with the resource domain. Every occurrence plays a role that's expressed by the occurrence role type, again a topic (see Figure 4). A topic can have as many occurrences as necessary. The link addressing is done with either HyTime or XLink/XPointer and therefore is as powerful as these standards. See Listing 2 for occurrence role types and topics with occurrences.
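
Occurrences with typed roles can be sketched in the same spirit. The names are hypothetical and the locators are plain strings rather than HyTime or XLink/XPointer addresses. The sketch only enforces that the occurrence role type is itself a topic of the map and lets callers filter occurrences by role.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    role_type: str  # id of the topic that names the occurrence role
    locator: str    # address of the resource (an illustrative URI plus fragment)


@dataclass
class Topic:
    topic_id: str
    occurrences: list[Occurrence] = field(default_factory=list)


class TopicMap:
    def __init__(self) -> None:
        self.topics: dict[str, Topic] = {}

    def add_topic(self, topic_id: str) -> Topic:
        return self.topics.setdefault(topic_id, Topic(topic_id))

    def add_occurrence(self, topic_id: str, role_type: str, locator: str) -> None:
        # The occurrence role type is again a topic, so it must exist in the map.
        if role_type not in self.topics:
            raise ValueError(f"role type {role_type!r} is not a topic in this map")
        self.topics[topic_id].occurrences.append(Occurrence(role_type, locator))

    def occurrences(self, topic_id: str, role_type: Optional[str] = None) -> list[str]:
        """Locators of a topic's occurrences, optionally only those playing one role."""
        return [
            occ.locator
            for occ in self.topics[topic_id].occurrences
            if role_type is None or occ.role_type == role_type
        ]
```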

Identity
The difficulty with the automatic merging of different Topic Maps into one map can be reduced to the question: Are the two topics about the same subject? This sometimes philosophical issue is solved pragmatically by Topic Maps. Two topics are the same if they have the same name in the same scope or refer to the same thing using their identity attribute. The topics and all their characteristics could be merged if this condition holds.
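
The merging rule can be sketched as follows, assuming each topic carries a set of (base name, scope) pairs, a set of identity references, and a set of occurrence locators. This simplified model is ours, not the standard's processing model. It merges transitively: a new topic that matches two previously separate topics joins all three into one.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    names: set[tuple[str, str]]                         # (base name, scope) pairs
    identities: set[str] = field(default_factory=set)   # references to the subject's identity
    occurrences: set[str] = field(default_factory=set)  # occurrence locators


def same_subject(a: Topic, b: Topic) -> bool:
    """Same name in the same scope, or a shared identity reference."""
    return bool(a.names & b.names) or bool(a.identities & b.identities)


def merge_maps(*maps: list[Topic]) -> list[Topic]:
    """Merge several maps; topics about the same subject are combined with all characteristics."""
    merged: list[Topic] = []
    for topic in (t for m in maps for t in m):
        current = Topic(set(topic.names), set(topic.identities), set(topic.occurrences))
        remaining = []
        for other in merged:
            if same_subject(current, other):
                # Absorb the matching topic's characteristics into the current topic.
                current.names |= other.names
                current.identities |= other.identities
                current.occurrences |= other.occurrences
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged
```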

Facet
Facets are property-value pairs that can be assigned to information resources as a kind of metadata. Assigning facets to topics is not the intended use, but the standard does not forbid it.
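
Facets behave like filterable metadata on the resource side. A minimal sketch, assuming facets are stored as a dictionary of property-value pairs per resource address (a hypothetical layout with invented sample values):

```python
# Hypothetical facet assignments: resource address -> {property: value}.
FACETS = {
    "guide.xml#page14": {"language": "en", "audience": "beginner"},
    "guide-de.xml#seite14": {"language": "de", "audience": "beginner"},
    "manual.xml#ch3": {"language": "en", "audience": "expert"},
}


def filter_by_facets(facets: dict[str, dict[str, str]], **wanted: str) -> list[str]:
    """Return the resources whose facets match every requested property-value pair."""
    return sorted(
        resource
        for resource, props in facets.items()
        if all(props.get(prop) == value for prop, value in wanted.items())
    )


print(filter_by_facets(FACETS, language="en"))                       # ['guide.xml#page14', 'manual.xml#ch3']
print(filter_by_facets(FACETS, language="en", audience="beginner"))  # ['guide.xml#page14']
```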

Advanced Concepts
A few additional concepts are necessary to give Topic Maps the full power that's needed for knowledge representation. These concepts are type hierarchies to cover complex classifications, transitivity property for associations, inference rules for the deduction of implicit knowledge, and consistency constraints to validate the semantics of large Topic Maps and guide the map's author. The concepts are modeled in so-called Topic Map templates, which can be compared with application profiles (see Figure 6). Topic Maps software that supports such a template provides all the functions that knowledge management applications require. See Listing 5 for a type hierarchy example.
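
Type hierarchies and transitive associations both come down to computing a transitive closure. The sketch below uses a hypothetical `SUPERTYPES` table mapping each type to its direct supertypes. It infers all types implied for an instance and checks one simple consistency constraint (no cycles in the hierarchy). It illustrates the idea only and is not the Topic Map template mechanism itself.

```python
def ancestors(type_id: str, supertypes: dict[str, list[str]]) -> set[str]:
    """All direct and indirect supertypes of `type_id` (transitive closure)."""
    found: set[str] = set()
    stack = list(supertypes.get(type_id, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(supertypes.get(current, []))
    return found


def inferred_types(declared: list[str], supertypes: dict[str, list[str]]) -> set[str]:
    """Declared types of an instance plus every type implied by the hierarchy."""
    result = set(declared)
    for type_id in declared:
        result |= ancestors(type_id, supertypes)
    return result


def hierarchy_is_consistent(supertypes: dict[str, list[str]]) -> bool:
    """A simple consistency constraint: no type may be its own (indirect) supertype."""
    return all(type_id not in ancestors(type_id, supertypes) for type_id in supertypes)


# Hypothetical hierarchy for the travel-guide example.
SUPERTYPES = {"capital city": ["city"], "city": ["place"], "island": ["place"]}
print(sorted(inferred_types(["capital city"], SUPERTYPES)))  # ['capital city', 'city', 'place']
```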

Topics as GPS Satellites
The description of the Topic Map concepts showed that resource and topic domains are two distinct layers. The former contains all information resources - documents, graphics, images, audio/video clips, database records, and more - and the latter contains all topics, their characteristics, and type information. Topics represent the concepts that are expressed by the resources. Associations model the knowledge by bringing the topics into meaningful relationships. The two domains (layers) are connected by occurrence links that point from topics to information resources or vice versa.

The user can use both layers for searching and navigating. Fulltext or XML-based structure searches are known search functions that can be applied to the resource domain. Their results aren't precise because you might get a large number of resources and have to find the requested information in each resource by reading through it. Searching in the concept level - the topic domain - leads to precise results. The result is a list of topics. The size of the list can be easily decreased if the query was underspecified or increased if the query was overspecified. Getting from the topics to the needed corresponding resources or resource fragments is only one click away - you have to follow an occurrence link.
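
A minimal sketch of searching the topic domain, assuming topics are stored in a plain dictionary with a name, a type, and occurrence locators (all hypothetical sample data). An underspecified query is narrowed by adding a topic type, and the resources are reached by following the occurrences of the matching topics.

```python
# Hypothetical topic store: id -> name, topic type, occurrence locators.
TOPICS = {
    "barbados": {"name": "Barbados", "type": "island", "occurrences": ["guide.xml#page7"]},
    "bathsheba": {"name": "Bathsheba", "type": "site of interest", "occurrences": ["guide.xml#page11"]},
    "bridgetown": {"name": "Bridgetown", "type": "capital city",
                   "occurrences": ["guide.xml#page8", "guide.xml#page9"]},
}


def search_topics(term, topic_type=None, topics=TOPICS):
    """Ids of topics whose name contains `term`, optionally restricted to one topic type."""
    term = term.lower()
    return sorted(
        tid for tid, t in topics.items()
        if term in t["name"].lower() and (topic_type is None or t["type"] == topic_type)
    )


def resources_for(topic_ids, topics=TOPICS):
    """Follow the occurrence links from the selected topics to the resources."""
    return [loc for tid in topic_ids for loc in topics[tid]["occurrences"]]


print(search_topics("ba"))                            # ['barbados', 'bathsheba']
print(search_topics("ba", topic_type="island"))       # ['barbados']
print(resources_for(search_topics("ba", "island")))   # ['guide.xml#page7']
```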

Searching and navigating in the topic domain are similar issues. Navigation is done by explicitly following existing links; for example, getting from the typing topic "sport" to the topic "sailing" and following the association of type "see also" to get to the topic "yacht charter" and its description (= occurrence) on "page 14." Thus Topic Map navigation can be seen as a path through the Topic Map that consists of concrete values. Searching can be modeled as a navigation path with concrete values and uninstantiated variables. The goal of searching is to find possible instantiations of the variables that result in a meaningful navigation path; for example, find all topics X that are associated with the topic "Barbados" and that have occurrences of role "city map." Listing 3 shows that the topic "Bridgetown" would be a possible instantiation of X.
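
The query with the variable X can be sketched as a small search over associations and occurrences. The association type "located-in", the second town, and the locators are hypothetical sample data; the sketch only shows how the variable gets bound. Both towns are associated with Barbados, but only Bridgetown has a city-map occurrence, so it is the only binding of X.

```python
# Hypothetical sample data: (association type, member, member) and
# (topic, occurrence role, locator).
ASSOCIATIONS = [
    ("located-in", "bridgetown", "barbados"),
    ("located-in", "speightstown", "barbados"),
    ("see-also", "sailing", "yacht-charter"),
]
OCCURRENCES = [
    ("bridgetown", "description", "guide.xml#page8"),
    ("bridgetown", "city-map", "guide.xml#page9"),
    ("speightstown", "description", "guide.xml#page10"),
]


def associated_with(topic, associations=ASSOCIATIONS):
    """Yield every topic that shares an association with `topic`, in either position."""
    for _type, first, second in associations:
        if first == topic:
            yield second
        elif second == topic:
            yield first


def bind_x(anchor, role, associations=ASSOCIATIONS, occurrences=OCCURRENCES):
    """All X such that X is associated with `anchor` and has an occurrence of `role`."""
    has_role = {topic for topic, occ_role, _loc in occurrences if occ_role == role}
    return sorted({x for x in associated_with(anchor, associations) if x in has_role})


print(bind_x("barbados", "city-map"))  # ['bridgetown']
```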

Navigating the resources is mainly about following the hyperlinks that connect the different resources. The already mentioned link maze problem arises; it doesn't matter to the user if the links are stored inline in the documents or separately in a link base. If the resources are linked as occurrences to topics, then there's an "uplink" to the much richer conceptual level. If the server delivering the resources to the clients is aware of all the uplinks, then the related topics or other parts of the Topic Map can be displayed with the resources. This means that the user always sees the concepts in which the current resource is embedded and can always switch back and forth between the resource and topic domains.
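
Making the server aware of the uplinks amounts to inverting the occurrence links. A sketch, assuming occurrences are available as a topic-to-locators mapping and associations as topic pairs (both hypothetical layouts): for a delivered resource it returns the topics the resource occurs in, plus the topics associated with those, for display next to the content.

```python
from collections import defaultdict


def build_uplinks(occurrences: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert topic -> resources into resource -> topics (the "uplinks")."""
    uplinks: dict[str, list[str]] = defaultdict(list)
    for topic, resources in occurrences.items():
        for resource in resources:
            uplinks[resource].append(topic)
    return {resource: sorted(topics) for resource, topics in uplinks.items()}


def context_for(resource: str, uplinks: dict[str, list[str]],
                associations: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Topics to show beside a resource: its own topics and their associated topics."""
    own = uplinks.get(resource, [])
    related = set()
    for first, second in associations:
        if first in own:
            related.add(second)
        if second in own:
            related.add(first)
    return {"topics": own, "related": sorted(related - set(own))}
```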

The topics that provide the user with a hint of where he or she is when looking at a resource behave like satellites of the Global Positioning System. These satellites send a signal down to earth that's used by GPS receivers to inform you where you are on the globe so you won't get lost. The topics "send" the occurrence "signal" down to the resource domain that's used by Topic Maps software to inform you where you are in the information universe so you won't get lost in hyperspace. In other words, Topic Maps are the GPS of the information universe.

Typical Application Domains
Defined as an enabling standard, Topic Maps can be applied to various application domains. The most obvious are Web portals, knowledge management, and commercial and corporate publishing.

Web Portals
A Web portal collects information assets about a certain domain and connects these assets by hyperlinks. The main goal of the portal is to provide the user/customer with a complete set of information, and easy and efficient search and navigation functions. This goal is exactly what Topic Maps fulfill. The biggest advantage is the improved search. User queries are always performed on the topic level, not on the fulltext level. The returned query results aren't Web pages but topics (= concepts) that are much more precise. If the user is interested in a topic, he or she will see the various occurrences (= "real" Web pages) or associated topics. If the topic search returns no hits, a fulltext search might be started automatically.
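
The portal behavior described above, topic search first and fulltext only as a fallback, can be sketched like this. The topic and page data are hypothetical, and the substring matching is a deliberately naive stand-in for a real query engine.

```python
def portal_search(query: str, topics: dict[str, list[str]],
                  pages: dict[str, str]) -> tuple[str, list[str]]:
    """Search topic names first; fall back to a fulltext scan only if no topic matches.

    topics: topic name -> occurrence URLs; pages: URL -> page text (hypothetical layouts).
    """
    q = query.lower()
    topic_hits = sorted(name for name in topics if q in name.lower())
    if topic_hits:
        return "topics", topic_hits
    return "fulltext", sorted(url for url, text in pages.items() if q in text.lower())
```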

Enterprise Knowledge Management
One interpretation of the buzzword knowledge management is the gathering, storing, and retrieving of corporate knowledge. The goal is to make the knowledge that's "in the heads" of the employees accessible to the whole company. Topic Maps are seen as a standardized base technology for knowledge representation. Therefore they could be used for the storage and retrieval of the gathered knowledge. It's the knowledge-gathering process that's mission-critical. The amount of information produced by a company or larger enterprise can be enormous, and it is practically impossible to investigate and model all the knowledge it contains. Thus we need support from automatic processes. Automatic Topic Map creation by thesaurus-based fulltext indexing is a promising technology that leads to useful results.
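
A rough sketch of thesaurus-based Topic Map creation: every preferred thesaurus term found in a document (directly or through a synonym) becomes a topic with that document as an occurrence, and broader-term relations become associations. The thesaurus layout is hypothetical, and real indexing would need stemming and much better term matching than this word-boundary regex.

```python
import re


def build_topic_map(thesaurus: dict[str, dict], documents: dict[str, str]):
    """thesaurus: preferred term -> {"synonyms": [...], "broader": term or None} (hypothetical).

    Returns (topic -> set of document ids, set of (narrower, broader) associations).
    """
    topics: dict[str, set[str]] = {}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for term, entry in thesaurus.items():
            variants = [term, *entry.get("synonyms", [])]
            # A term occurs in the document if any of its variants matches as whole words.
            if any(re.search(rf"\b{re.escape(v.lower())}\b", lowered) for v in variants):
                topics.setdefault(term, set()).add(doc_id)
    associations = {
        (term, thesaurus[term]["broader"])
        for term in topics
        if thesaurus[term].get("broader")
    }
    return topics, associations
```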

Commercial Publishing
Commercial publishers, especially professional ones, are faced with a new competitor: the Web. Almost every piece of information that can be found in their publications can be found somewhere on the Internet for free. So how is a publisher to compete? Paradoxically, the answer lies in the fact that most users today don't need more information - if anything, they need less information that's more accurate because they're already drowning in enormous quantities of it. At the very least, they need to be able to find their way to relevant, verified information as quickly as possible and to filter out the "noise" created by all the information they have no use for. Applying Topic Maps to commercial publishing is the next logical step after the publishers have applied XML or SGML to their content, because the Topic Maps technology provides the required mechanisms for intelligent retrieval and navigation. In addition, Topic Maps can support the editorial teams by providing complex metadata for the management of content and products.

Corporate Publishing
Documentation of a complex product consists of thousands of pages or megabytes of textual data. All corporate publishers have to manage and publish the documentation for different product versions and variants. Versions and variants require an organization of the text that can't be book-oriented. Modularization is the first step toward an appropriate solution. The existing chapter-section-subject structure of book-oriented technical documentation has to be split up into hundreds or thousands of separate text modules (information objects). The modules consist of "self-contained" text about a given subject (e.g., installation). This technique is directly related to Interactive Electronic Technical Manuals (IETMs). Each IETM is a collection of so-called data modules. Hyperlinks connect the modules. Hierarchical subject codes, assigned as metadata, allow quick access by querying the database that contains the modules. These two characteristics (self-contained text, hierarchical subject code) imply that IETMs or, in a more general sense, technical documentation is an ideal candidate for Topic Map application.
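
Hierarchical subject codes translate naturally into a topic hierarchy. A minimal sketch, assuming a hypothetical dash-separated code format such as "20-40-10" in which each prefix denotes the parent subject: every code becomes a topic, its data modules become occurrences, and each code is linked to its parent code.

```python
def subject_topics(modules: dict[str, str]) -> tuple[dict[str, list[str]], dict[str, str]]:
    """modules: data module id -> subject code in a hypothetical "20-40-10" format.

    Returns (code -> data module ids as occurrences, code -> parent code).
    """
    occurrences: dict[str, list[str]] = {}
    parent: dict[str, str] = {}
    for module_id, code in modules.items():
        occurrences.setdefault(code, []).append(module_id)
        parts = code.split("-")
        # Walk up the code: "20-40-10" -> "20-40" -> "20".
        for i in range(len(parts), 1, -1):
            parent["-".join(parts[:i])] = "-".join(parts[:i - 1])
    return {code: sorted(ids) for code, ids in occurrences.items()}, parent
```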

Topic Map Tools
It's obvious that the Topic Maps technology needs support from dedicated software tools. Since they relate to the whole publication production cycle - design, creation, storage, management, maintenance, publication, delivery, and consumption - each step requires specialized support.

Editorial Functions
The editorial system integrates all necessary tools to design, edit, store, manage, view, navigate, and query Topic Maps in a client/server environment. Topic Map-specific functions are complemented by general data management features such as user access rights, version control, and lock control. The server software stores all information in a database, ensuring multiuser access through the client software. The client consists of the editor, viewer, navigator, and query-user interface (see Figure 7). The server implements fast access to the Topic Maps and provides services such as validation, inferencing, querying, import/export, bulk operations, interfaces to content management systems, and an API.

Delivery Functions
The online delivery of Topic Maps is mainly about querying and navigating. Navigating in Topic Maps and viewing their information require special user interfaces. Navigation, especially, is much more intuitive if the topics and the associations are rendered as an interactive graph. Filtering of unnecessary or unwanted information is very important because the visualization parameters - icons, colors, edge shapes - are limited and real-life Topic Maps will become very large. Other text-based renditions of Topic Maps could make use of an XML representation that can be rendered using XSLT. The transformation can create Web-browser renditions (see Figure 8) but also other formats like WAP (Wireless Application Protocol). Another important issue of Topic Map navigation is the ability to get from the resource domain back into the Topic Map domain. When the user browses through the content (= resource domain), he or she should always be informed about where and how the content is linked to the Topic Map and to other content through the map. This feature is evidently necessary to make Topic Maps the GPS of the information universe.
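
Filtering and XML rendering can be combined: first cut the map down to the neighborhood of the current topic, then serialize only that part so an XSLT stylesheet can format it. The sketch below uses Python's standard `xml.etree.ElementTree`; the element and attribute names are hypothetical and are not XTM syntax.

```python
import xml.etree.ElementTree as ET


def neighborhood(center: str, associations: list[tuple[str, str]], depth: int = 1) -> set[str]:
    """Topics reachable from `center` over at most `depth` associations."""
    seen = {center}
    frontier = {center}
    for _ in range(depth):
        nxt = set()
        for first, second in associations:
            if first in frontier and second not in seen:
                nxt.add(second)
            if second in frontier and first not in seen:
                nxt.add(first)
        seen |= nxt
        frontier = nxt
    return seen


def to_xml(topic_ids: set[str], names: dict[str, str],
           associations: list[tuple[str, str]]) -> str:
    """Serialize only the selected topics and the associations between them."""
    root = ET.Element("topicmap-view")
    for tid in sorted(topic_ids):
        ET.SubElement(root, "topic", id=tid).text = names[tid]
    for first, second in associations:
        if first in topic_ids and second in topic_ids:
            ET.SubElement(root, "association", {"from": first, "to": second})
    return ET.tostring(root, encoding="unicode")
```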

Famous Last Words
The right information in the right context for the right person at the right time is the key to success for enterprises, organizations, administrative bodies, small companies, and people in the information society. Knowledge management provides the technology to fulfill this requirement. Traditional search concepts, such as fulltext, fulltext in XML structures, or metadata, aren't sufficient. Fuzzy search, case-based reasoning, and Topic Maps are the concepts that make real knowledge management work.

Dubbed the "GPS of the information universe," Topic Maps are the bridge between information and knowledge management. The integration with AI-based fuzzy searches combines precise search results with sophisticated navigation, qualifying Topic Maps as one of the top technologies for knowledge management.

Topic Map tools are already available - you've seen screenshots of STEP's editing and navigation interfaces - which is the precondition for the predicted success of Topic Maps.

Hans Holger Rath is the director of STEP's consulting department. One of the leading Topic Maps experts worldwide, his focus is on the development of Topic Map profiles for real-world knowledge management applications. Hans represents Germany on the ISO standards committee, which is responsible for SGML, DSSSL, HyTime, and Topic Maps.
