Webword

Vol. 2, No. 3: MAY-JUNE 1998


The XML Files

Charles Petrie

Stanford Center for Design Research


Some of my best friends are XML proponents.


But I don't get it.

In case you haven't heard, XML is The Next Big Thing. It will enable wonderful new applications, electronic commerce will take off, and the world will be one. But there's a high-tech oddity here.

The problem with HTML is that it's only a presentation standard [1]. The syntax only determines something about the appearance of the content. In other words, what is machine executable is the presentation. There is nothing that would help formalize the semantics of the content. It's all show and no tell.


Enter XML with its user-definable tags. You can add new tags, such as <price units="dollars">2.00</price>. A suitable parser can read the tag and pluck out the price for use in computation. Once you can do that, you can do anything. You can define any tags you like and then write software that attaches content semantics to the tags. Is this not wonderful?
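As a rough illustration of "plucking out the price," here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. Only the <price> element comes from the text above. The surrounding <order>, <item>, and <quantity> elements and the total_cost function are assumptions invented for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical document: only the <price> element appears in the column;
# <order>, <item>, and <quantity> are invented for illustration.
DOC = """<order>
  <item>widget</item>
  <price units="dollars">2.00</price>
  <quantity>3</quantity>
</order>"""


def total_cost(xml_text):
    """Read the price and quantity out of the markup and multiply them."""
    root = ET.fromstring(xml_text)
    price = root.find("price")
    amount = Decimal(price.text)               # element content: "2.00"
    quantity = int(root.findtext("quantity"))  # element content: "3"
    return amount * quantity, price.get("units")


total, units = total_cost(DOC)
print(total, units)  # prints: 6.00 dollars
```

Note that the parser only hands back the string "2.00" and the attribute value "dollars". Treating them as money is a decision made by the program, not by the markup.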

But there's a dirty little secret: you can do this now in HTML. My research group does it for some agent applications. An authoring tool allows the user to highlight text and click on icons that then embed invisible (to the user) new tags that can be used by a proxy agent processor to send messages to other agents.
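A sketch of how such embedded tags might be read back out of ordinary HTML, using Python's standard html.parser module. The tag name agentmsg, its to and act attributes, and the AgentTagExtractor class are hypothetical. They are not the tags our tools actually use, and the sketch stops short of sending any message.

```python
from html.parser import HTMLParser

# Hypothetical page: a browser that does not know <agentmsg> ignores the
# tag itself and simply displays the enclosed text.
PAGE = """<html><body>
<p>Please review the <agentmsg to="scheduler" act="request">bracket
redesign</agentmsg> by Friday.</p>
</body></html>"""


class AgentTagExtractor(HTMLParser):
    """Collect the attributes and text of every <agentmsg> element."""

    def __init__(self):
        super().__init__()
        self.messages = []
        self._current = None  # message being collected, if any

    def handle_starttag(self, tag, attrs):
        if tag == "agentmsg":
            self._current = {"attrs": dict(attrs), "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "agentmsg" and self._current is not None:
            # Collapse line breaks and runs of spaces in the tagged text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.messages.append(self._current)
            self._current = None


extractor = AgentTagExtractor()
extractor.feed(PAGE)
extractor.close()
print(extractor.messages)
```

A proxy processor could then turn each collected entry into a message for another agent. What the attributes mean is, again, entirely up to that processor.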

We got the idea from ABSML (a better standard markup language), which was used at Lockheed for the Multimedia Engineering Collaboration Environment, an engineering design documentation system. Some companies today are quietly doing server-side processing of application-specific HTML tags.


What does XML add? It adds formal ways to define the presentation of user-defined tags and to check their syntax, which defines the presentation semantics. But XML adds nothing to aid with the semantics of the content. For example, it does no better than HTML at ensuring different browsers reading the same document mean the same thing by "price."
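To make the gap concrete, here is a small Python sketch with two hypothetical vendor documents. Both pass a syntax check, yet nothing in the markup says how the two "price" values relate. The documents, the read_price and to_dollars functions, and the TO_DOLLARS table are all assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Two hypothetical documents that describe the same amount of money.
VENDOR_A = '<offer><price units="dollars">2.00</price></offer>'
VENDOR_B = '<offer><price units="cents">200</price></offer>'


def read_price(xml_text):
    """Syntax-level reading: return the number and the units string."""
    price = ET.fromstring(xml_text).find("price")
    return Decimal(price.text), price.get("units")


# The meaning of each units string lives here, in an agreement made
# outside the markup; the parser knows nothing about it.
TO_DOLLARS = {"dollars": Decimal("1"), "cents": Decimal("0.01")}


def to_dollars(amount, units):
    return amount * TO_DOLLARS[units]


a = read_price(VENDOR_A)
b = read_price(VENDOR_B)
print(a[0] == b[0])                      # prints: False
print(to_dollars(*a) == to_dollars(*b))  # prints: True
```

The parser accepts both documents without complaint. Only the conversion table, which both parties must agree on separately, makes the two prices comparable.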

What does XML take away? It takes away the simplicity of HTML. Moreover, if you want to convert your old HTML, you must make many small changes. Yes, you can run scripts to do this. (No doubt a cottage industry in shareware for running these scripts will emerge.) And what you get for your trouble is the ability to display the formerly invisible tags in a consistent manner in all browsers.

I'm pushing this point because everyone pushing XML touts its ability to add content processing. Moreover, whenever I hear folks talk up XML, I sense background concepts lurking nearby that smack of one-world ontologies [2].


There, I've said it - the "O" word. If anyone is thinking along ontology lines, I would like to break some old news to them. Without going into specifics, a lot of sophisticated technologies developed by a lot of smart people over a long period of time have not produced any wildly successful distributed general-purpose, all-encompassing ontologies. If you thought distributed databases were difficult, try distributed ontology merging.

And if you look closely at what XML might bring to the unified distributed ontology game, the answer is nil. XML per se doesn't say a darned thing.

So there's a mystery here. Using embedded tags for defining content semantics is a great idea, but lots of people have already done it in HTML. Defining the presentation of embedded tags in a standard way is a good idea, but it could probably be done with some form of HTML macro, if anyone wanted to go that way. And XML doesn't seem to add anything else, except a finer level of control over presentation.

Is there a conspiracy here? Are our Web government leaders selling us a cover-up story about the virtues of user-defined tags that hides the real agenda of SGML geeks?


The answer is: XML will allow programs to execute on tags at display-time. Think of XML as allowing a generalization of <IMG> tags. Only now you can specify any program instead of just those helper applications defined by file suffixes. Wow. This will make for some wild display possibilities.

But is that all there is?

Well, if these programs are scripts that can read, write, and make socket connections, then we could do general-purpose apps on the client side, instead of being restricted to server-side processing of user-defined tags, as we are now with HTML.

But the specs so far are noncommittal on application execution issues. As one example, the security concerns will be even more complicated than they are for applets, but the specs say no more than that XML has a provision for specifying an application.

OK, here's the point. Darned if XML documents won't make HTML look dull. I get that. There are some very good people at W3C driving XML (see IC's interview with Dan Connolly, head of the Architecture Domain at the W3C). But they seem dazzled by the wonderful display possibilities. So while business people are hoping for real document-driven interoperable applications, this aspect could be inadequately addressed by the presentation-driven folk.


None of this is bad, and if it gets us all to use embedded tags and drives industry to build interoperable applications using them, this will be a major Good Thing. And there are some very good XML-based tools being developed that could make domain-specific ontologies really useful. For example, Veo Systems is an aggressive Silicon Valley startup in XML-EC. But almost all of this work could be done with today's HTML. Where XML could add something new, it hasn't.

XML is still being defined, and some of the concern is with HTML compatibility and script interaction. So there's time for you to check in with W3C and have your say.

It's your Web after all.

And thanks to Dale Dougherty for the column title.


1. C. Petrie, "Agent-Based Engineering, the Web, and Intelligence," IEEE Expert, Vol. 11, No. 6, Dec. 1996, pp. 24-29; also available online at http://cdr.stanford.edu/NextLink/Expert.html.
2. J. Bosak, "XML, Java, and the Future of the Web," white paper, Sun Microsystems, http://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.htm.

URLs in this article

Dale Dougherty webreview.com/97/05/16/feature/

Interview with Dan Connolly: "Let a Thousand Flowers Bloom" computer.org/internet/v2n2/connolly.htm

Ontologies www-ksl-svc.stanford.edu:5915/doc/frame-editor/what-is-an-ontology.html

SGML www.sil.org/sgml/sgml.html

Veo Systems, Inc. www.veosystems.com/

W3C www.w3.org/

XML www.w3.org/XML/

XML FAQ www.ucc.ie/xml/


Copyright © 1997, 1998 Institute of Electrical and Electronics Engineers, Inc., All rights reserved.