
Artificial Intelligence Journal






Re: Creating a single XML vocabulary that is appropriatelycust

For the sake of discussion, consider these two solutions to the 
underlying problem, namely that different communities communicate 
differently within themselves, and their syntaxes need to evolve in 
different contexts with ever-diverging requirements:

(1) HyTime Architectural Forms. These are supported by nsgmls. The 
architectural forms (AFs) are the element type definitions shared by all 
of the sub-vocabularies; a set of such element type definitions is 
called a "meta-DTD". AFs impose certain constraints, but the 
sub-vocabularies can rename things and add constraints. The notion of 
"adding constraints" is complex, which is perhaps why the whole idea of 
Architectural Forms was vehemently rejected when XML was first standardized, 
and the whole XML namespace fiasco was adopted instead. Nonetheless, 
nsgmls can parse, validate, process and report instances of 
architectural forms in XML in such a way that any instance of a 
sub-vocabulary can be viewed as a document conforming to the 
architectural forms (described in the meta-DTD). Actually, I'm not sure 
why I'm bothering to rip this scab off the wound again, except that the 
problem keeps coming up: how to distribute, and limit the distribution, 
of authority over a large-community-wide document type, among smaller 
sub-communities. The HyTime Architectural Form solution is still an 
international standard (ISO/IEC 10744:1997), and it's still supported by 
the gold-standard markup parser, in both SGML and XML. And it really 
works. Few use it because certain powerful people, who still have no 
good answer to the authority-distribution problem, considered 
Architectural Forms "ugly" and refused to engage in any further 
discussion of the actual pros and cons, as they felt their RDF vision 
demanded.
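The renaming idea behind architectural forms can be sketched in a few lines. This is a toy Python model. It is not nsgmls and it does not use the ISO/IEC 10744 declaration syntax. It assumes a hypothetical attribute named `arch`, through which each sub-vocabulary element names the architectural form it maps to. It also assumes a hypothetical meta-DTD reduced to "which forms may appear inside which". Elements that name no form are treated as non-architectural and skipped, while any architectural descendants they contain are kept. The sub-vocabulary's own added constraints would live in its own DTD, which is not modeled here.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute through which a sub-vocabulary element names the
# architectural form (meta-DTD element type) it conforms to. This is an
# illustrative assumption, not the ISO/IEC 10744 declaration syntax.
ARCH_ATTR = "arch"

# Hypothetical toy meta-DTD: each architectural form -> forms allowed as children.
META_DTD = {
    "report": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": set(),
}


def architectural_view(elem):
    """Return the architectural elements derived from elem, as a list.

    An element that names a form is renamed to that form. An element that
    names none is non-architectural: it is skipped, but any architectural
    descendants are kept and attached to the nearest architectural ancestor.
    Mixed content and tail text are ignored in this sketch.
    """
    children = []
    for child in elem:
        children.extend(architectural_view(child))
    form = elem.get(ARCH_ATTR)
    if form is None:
        return children
    node = ET.Element(form)
    node.text = elem.text
    node.extend(children)
    return [node]


def conforms(node):
    """True if every element's children are allowed by the toy meta-DTD."""
    allowed = META_DTD.get(node.tag)
    if allowed is None:
        return False
    return all(child.tag in allowed and conforms(child) for child in node)


# A document in one community's sub-vocabulary, using its own element names.
SAMPLE = """
<memo arch="report">
  <subject arch="title">Budget</subject>
  <routing><to>Finance</to></routing>
  <part arch="section">
    <heading arch="title">Costs</heading>
    <text arch="para">Up 3%.</text>
  </part>
</memo>
"""

view = architectural_view(ET.fromstring(SAMPLE))[0]
print(ET.tostring(view, encoding="unicode"))
print(conforms(view))
```

In the architectural view, `<routing>` disappears because the meta-DTD knows nothing about it. `<memo>`, `<subject>`, `<part>`, `<heading>` and `<text>` are seen as report, title, section, title and para respectively.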

Personally, I don't recommend HyTime Architectural Forms any more 
because I no longer believe that syntactic tricks constitute a realistic 
basis for the distribution of semantic authority, at least not in the 
general case. However, they may make a lot of sense in situations where 
there is no shortage of markup expertise.

(2) Topic mapping. A grove (e.g., a DOM tree or similar parse tree made 
from any kind of document, XML or otherwise) is nothing more or less 
than a semantic network whose nodes signify syntactic constructs. When 
two groves are produced from the same instance according to different 
parsing rules ("property sets" and property subsets called "grove 
plans") and are then merged so that any two nodes representing exactly 
the same syntactic construct become a single node, you can look at each 
such node from different perspectives. 
(Actually, the Architectural Form stuff was moving in this direction, 
but all development stopped when the W3C forbade further consideration 
of it and demanded that everyone adopt XML Namespaces instead. In 
retrospect, it seems possible that the W3C's focus on machine-to-machine 
communication and AI left little room for questions about human issues, 
like the issue of how to deal with the fact that top-down authority over 
document types simply can't work across diverse human communities. This 
story is very far from being over.)
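A grove merge of this kind can be modeled as a dictionary merge keyed on node identity. For illustration only, the sketch below assumes that a node's identity is the (start, end) character span of the construct it represents in the shared source instance. Real property sets define node classes and node identity far more richly. The property-set names `markup` and `text`, and the properties attached to each node, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Assumption for this sketch: a node is identified by the (start, end)
# character span of the syntactic construct it represents.
Span = Tuple[int, int]


def merge_groves(groves: Dict[str, Dict[Span, dict]]) -> Dict[Span, Dict[str, dict]]:
    """Merge several groves built from the same instance.

    `groves` maps a property-set name to that grove's nodes (span -> properties).
    Nodes from different groves that share a span collapse into one merged
    node. The merged node keeps each grove's properties under that grove's
    name, so the same node can be viewed from every perspective.
    """
    merged: Dict[Span, Dict[str, dict]] = defaultdict(dict)
    for plan_name, nodes in groves.items():
        for span, props in nodes.items():
            merged[span][plan_name] = props
    return dict(merged)


SOURCE = "<p>Hi there.</p>"

# Grove from a markup-oriented parse: one element node and its data content.
markup_grove = {(0, 16): {"class": "element", "gi": "p"}, (3, 12): {"class": "data"}}
# Grove from a text-oriented parse of the same instance: one sentence node.
text_grove = {(3, 12): {"class": "sentence", "words": 2}}

merged = merge_groves({"markup": markup_grove, "text": text_grove})
for span, views in sorted(merged.items()):
    print(repr(SOURCE[span[0]:span[1]]), views)
```

The span (3, 12) covers `Hi there.`, so the data node from one grove and the sentence node from the other become one node that carries both perspectives.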

A further generalization in the direction of multiple perspectives on 
the same information is to consider multiple document instances only in 
terms of what they are taken to mean (by one or more persons), and for 
such persons to reify each subject of conversation as a topic node. Yet 
more human effort (very significantly aided by computers) can then 
determine how to merge the resulting semantic networks. That's topic 
mapping, at least at the level of the Topic Maps *Reference* Model (not 
to be confused with the far more heavily promoted Topic Maps *Data* 
Model, which assumes a specific ontology). Such an approach can factor 
out ("transcend") any and all differences in the syntaxes used by 
different communities, but machines can't do it alone. It's an editorial 
task requiring deep knowledge of multiple cultural contexts. People can 
do it if they make a specialty of being members of multiple communities 
and producing topic maps that provide wormholes between different 
universes of discourse. With the help of appropriate editorial tools, 
such people can earn a living -- not a bad thing, really.
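The mechanical half of that editorial work can be sketched as follows. The Topic Maps Reference Model does not prescribe this code. Here the human judgement ("these two topics are about the same subject") is supplied as an explicit input list of pairs, and the computer only does the bookkeeping. It merges every chain of asserted equivalences with a union-find structure, which is my choice of technique, and it pools the names each community uses. The map names, topic ids and names are made-up examples.

```python
def merge_topic_maps(topic_maps, same_subject):
    """Merge topics from several topic maps using editor-asserted identities.

    topic_maps:   map name -> {topic id: set of names used in that community}
    same_subject: list of ((map, topic id), (map, topic id)) pairs that an
                  editor has judged to be about the same subject
    Returns one dict per merged topic, holding its 'sources' and 'names'.
    A pair that mentions an unknown topic raises KeyError.
    """
    parent = {}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for map_name, topics in topic_maps.items():
        for topic_id in topics:
            key = (map_name, topic_id)
            parent[key] = key

    for a, b in same_subject:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    merged = {}
    for map_name, topics in topic_maps.items():
        for topic_id, names in topics.items():
            root = find((map_name, topic_id))
            entry = merged.setdefault(root, {"sources": set(), "names": set()})
            entry["sources"].add((map_name, topic_id))
            entry["names"] |= names
    return list(merged.values())


maps = {
    "cardiology": {"mi": {"myocardial infarction"}, "ecg": {"electrocardiogram"}},
    "patients": {"ha": {"heart attack"}},
}
editorial_judgements = [(("cardiology", "mi"), ("patients", "ha"))]

for topic in merge_topic_maps(maps, editorial_judgements):
    print(sorted(topic["sources"]), sorted(topic["names"]))
```

Nothing in the code decides that "heart attack" and "myocardial infarction" denote the same subject. That decision stays with the editor, and the code only makes the resulting wormhole between the two vocabularies explicit.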

Michael Kay wrote:
>> How do you create a single XML vocabulary, and validate that XML 
>> vocabulary, for a community that has sub-groups that have overlapping 
>> but different data needs?
>>     
>
> With difficulty. I've seen the problem more often in a different guise: how
> do you design a set of 400 messages for application data interchange that
> reflect different information about different events affecting the same
> objects?
>
> One approach is to rediscover the concept of subschemas, as used in the
> Codasyl database model. (In the relational model, these became views, but
> that's a less useful concept in this context.)
>
> You can start with a schema that makes everything mandatory, and construct
> from it a subschema in which parts are optional and/or prohibited. Or you
> can start with a schema in which everything is optional, and your subschema
> can make some parts mandatory. Either way, I think you are using some kind
> of process that modifies a schema to create a different schema. Plenty of
> users are doing such things by applying XSLT transformations to XSD
> documents, but it's not easy. Others are doing it using xs:redefine, which
> is not much better. Others are simply giving up: I've seen users stuff
> unwanted data into a message because it's too hard to change the schema to
> make it optional, and I've seen users relax the schema to make an element
> optional for everybody even though there are some contexts where it's
> required.
>
> Assertions in XSD 1.1 could be used to make the process much easier. If your
> schema is permissive (everything optional), you can add assertions to make
> it more constrained.
>
> Michael Kay
> http://www.saxonica.com/
>
>
> _______________________________________________________________________
>
> XML-DEV is a publicly archived, unmoderated list hosted by OASIS
> to support XML implementation and development. To minimize
> spam in the archives, you must subscribe before posting.
>   
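The "permissive schema plus added constraints" approach can be illustrated without any schema language at all. This Python sketch is neither XSD nor an XSD 1.1 processor. A shared base check makes every child of a hypothetical <order> message optional, allowed at most once. Each sub-community's subschema is that base plus its own list of assertion-like predicates, loosely analogous to adding XSD 1.1 assertions. The element names and the `warehouse` and `accounts` communities are invented for the example.

```python
import xml.etree.ElementTree as ET

# Permissive base schema shared by every community (a stand-in for a schema
# in which everything is optional): the children an <order> may contain,
# each at most once, none of them mandatory.
BASE_ORDER_CHILDREN = {"id", "customer", "items", "shipTo", "invoiceTo"}


def base_errors(order):
    """Check an <order> element against the permissive base schema."""
    errors, seen = [], set()
    for child in order:
        if child.tag not in BASE_ORDER_CHILDREN:
            errors.append(f"unexpected element <{child.tag}>")
        elif child.tag in seen:
            errors.append(f"duplicate element <{child.tag}>")
        seen.add(child.tag)
    return errors


def requires(tag):
    """Assertion: a direct child <tag> must be present."""
    return lambda order: None if order.find(tag) is not None else f"<{tag}> is required"


def prohibits(tag):
    """Assertion: a direct child <tag> must be absent."""
    return lambda order: None if order.find(tag) is None else f"<{tag}> is not allowed"


# Each community's subschema = the shared base + its own added constraints.
SUBSCHEMAS = {
    "warehouse": [requires("id"), requires("items"), requires("shipTo")],
    "accounts": [requires("id"), requires("invoiceTo"), prohibits("shipTo")],
}


def validate(xml_text, community):
    """Return a list of error messages; an empty list means the message is valid."""
    order = ET.fromstring(xml_text)
    errors = base_errors(order)
    for check in SUBSCHEMAS[community]:
        message = check(order)
        if message is not None:
            errors.append(message)
    return errors


msg = "<order><id>42</id><items/><shipTo>Dock 3</shipTo></order>"
print(validate(msg, "warehouse"))
print(validate(msg, "accounts"))
```

This follows the second route Kay describes. Nothing in the base is mandatory, so each community tightens it for its own context instead of relaxing an element for everybody.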
