Title
A Reference Model for Data Interchange Standards
Author
Michael Spring, Department of Information Science and Telecommunications, School of Information Sciences, University of Pittsburgh
Date
1/01/2005
(Original Publish Date: 4/6/1996)
Abstract
In 1991, the World Wide Web (WWW) went public. It was based on a simple assumption: documents would be represented in a standard structural markup language (HyperText Markup Language -- HTML), would employ a standard document identification scheme (Uniform Resource Locator -- URL), and would use a standard retrieval protocol (HyperText Transfer Protocol -- HTTP). A mere five years later, there is one document index site, Alta Vista, with a 30,000,000,000-byte index of more than 20,000,000 pages of text. Alta Vista indexes more than 2,500,000 new or revised pages every day and handles more than 5,000,000 queries every day. What makes the WWW work is the agreement to use a standard form for documents and document references. By some criteria, HTML, URL, and HTTP are too simple, especially given the more comprehensive standards on which they are based, i.e., SGML, DSSSL, HyTime, and CCL. Even with this simplicity, and maybe because of it, the WWW standards for document interchange have dramatically changed the way we do business in the electronic realm. These developments are encouraging both for what they portend and for what they actualize. They point to the need for more attention to document interchange standards specifically, and to data interchange standards more generally. Consider, for example, the general platform dependence of audio formats, or the problem of translating procedurally formatted word processor documents to HTML. Related to the WWW, there are two specific concerns:
1. The stability of standards. The rapid evolution from HTML through HTML 2.0 and HTML 3.0, including intermediate and extended variants, and on to imagemaps, CGI scripts, forms, and Java applets, needs to be examined and made more systematic.
2. The investment being made needs to be protected. One wonders how the investment in links will slow, or otherwise be affected, as a new domain name space evolves. How will the migration from ASCII to Unicode affect the investment? How will the evolution of full-featured SGML documents using DSSSL and HyTime affect the HTML store?
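As a minimal illustration of how the identification standard (URL) and the retrieval standard (HTTP) divide the work, the following Python sketch splits an http:// URL with the standard library's urllib.parse.urlsplit and builds a minimal HTTP/1.0 GET request for the document it names. The function name http_get_request and the example host are hypothetical, chosen only for illustration; the sketch sends nothing over the network.

```python
from urllib.parse import urlsplit


def http_get_request(url: str) -> str:
    """Build a minimal HTTP/1.0 GET request for the document named by url.

    Hypothetical helper for illustration only: it shows how the parts of a
    URL map onto an HTTP request; it does not open a network connection.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    # The network location names the server; the path (plus any query
    # string) names the document on that server.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Request line, then a Host header (optional in HTTP/1.0, required in
    # HTTP/1.1). Lines end in CRLF, and a blank line ends the header block.
    lines = [f"GET {path} HTTP/1.0", f"Host: {parts.netloc}", "", ""]
    return "\r\n".join(lines)
```

For example, http_get_request("http://www.example.com/docs/index.html") returns the request line "GET /docs/index.html HTTP/1.0", the header "Host: www.example.com", and a terminating blank line, separated by CRLF. Because the URL alone carries everything needed to form the request, any client that follows the two standards can retrieve any document that a server publishes under them.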