Wednesday, June 16, 2010

XML

Introduction

The introduction of the Extensible Markup Language, or XML as it is commonly known, created a buzz in the Web world. It "provides both a standards-based way to identify the information that is of importance in a particular application, and the ability to process information tagged according to highly user-specific requirements with general-purpose software, such as editing tools, composition engines, and electronic browsers" (Usdin & Graham, 1998). In simpler terms, XML allows users to customize a markup language and apply it to an information object that can then be interpreted to determine its contents, whether it is an order form, a newspaper or an advertisement. Given these descriptions, it becomes apparent that XML is a tool, an enabling technology that can be used in conjunction with other tools to provide powerful Web applications. How this tool can be customized and utilized by the Web community is the subject of this tutorial.

History

XML's roots lie in the Standard Generalized Markup Language, or SGML. SGML was developed 20 years ago as a formal method of annotating documents to describe their meaning and structure, but its complexity and cost hindered widespread acceptance. However, an application of SGML called the Hypertext Markup Language, or HTML, is a phenomenon that has enabled the rapid growth of the Web over the past decade. Used primarily for stylistic and formatting purposes, HTML has frustrated many users who wanted to use its tag set for more complex presentation control, data processing and programming (Treese, 1998). Because of these issues, the World Wide Web Consortium, or W3C, started a working group for a new subset of SGML, XML, in January 1997. The group "proposed a markup language that could work in concert with existing Web technologies, using some of the tools developed for use with HTML, while moving forward with more manageable techniques" (St. Laurent, 1999). A year later, in February 1998, the XML specification was ratified as a W3C standard.

While XML has its foundation in SGML, its philosophy differs and is based on four fundamental principles (Usdin & Graham, 1998).

1. Separation of Content from Format: What a piece of information is should be managed separately from how the information is presented. Information should be identifiable by its appearance, its use in a particular application, its role in the document in which it is contained, and its nature. For example, "knowing that a phrase is in italic is useful; knowing that it is the title of a subsection of a paper is more useful; and knowing that it is a genus and species name is potentially more useful still."
2. Hierarchical Data Structures: In XML, the data is assumed to be hierarchically organized, that is, a piece of information may contain other pieces of information and may be contained by yet another piece of information. Textual documents often exemplify this type of structure. For example, a book contains several chapters, each of which contains sections. Each section may have a heading, paragraphs and subsections, which also contain a heading and paragraphs.
3. Embedded Tags: Data marked up with XML contains tags, words or phrases enclosed in angle brackets, which identify where the data structures begin and end. Tags can also carry attributes, which provide information about the data enclosed by the tags. Example: <tag attribute="value">content</tag>
4. User-Definable Structures: As mentioned above, XML is a tool, and it defines a method of customized tag creation. "XML assumes that users will create new tags as they create and work with documents, and that software such as browsers will have to display or process the content of these novel tags." As such, XML provides flexibility and extensibility by not providing a standard tag set like HTML.
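
The second, third and fourth principles can be seen together in a short Python sketch. The tag names (book, chapter, section, heading, para) and the attribute values are invented for illustration; they are not part of any standard tag set, which is exactly the point of user-definable structures. The sketch parses the markup with Python's standard xml.etree.ElementTree module and prints the hierarchy it finds.

```python
import xml.etree.ElementTree as ET

# Hypothetical, user-defined tags: XML itself does not prescribe any of them.
BOOK_XML = """
<book title="An Example Book">
  <chapter number="1">
    <section>
      <heading>First Section</heading>
      <para>Opening paragraph.</para>
      <para>Second paragraph.</para>
    </section>
  </chapter>
  <chapter number="2">
    <section>
      <heading>Another Section</heading>
      <para>Only paragraph.</para>
    </section>
  </chapter>
</book>
""".strip()


def outline(element, depth=0):
    """Return one indented line per element, showing the hierarchy."""
    label = element.tag
    if element.attrib:
        attrs = ", ".join(f"{k}={v!r}" for k, v in element.attrib.items())
        label += f" ({attrs})"
    lines = ["  " * depth + label]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BOOK_XML)
book_outline = outline(root)
print("\n".join(book_outline))
```

Each level of indentation in the printed outline corresponds to one level of containment in the markup, and the attributes travel with the element they describe.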

Components of XML

The basic components of XML are similar to those of HTML: tags, elements and their attributes. A tag is a piece of markup such as an opening tag <P> or a closing tag </P>. An element consists of an opening tag, its content and the matching closing tag. For example:

<P ALIGN="CENTER">This text is part of a paragraph element.
It includes the <B>bold</B> element and
the <I>italics</I> element.</P>

The paragraph above has 6 tags comprising 3 elements, 2 of which are contained within the paragraph element. The paragraph element also contains an attribute specifying that the paragraph should be centered on the page. This style of markup is used in the creation of XML documents, which can be of two types: well-formed and valid.

A well-formed document is syntactically correct but need not refer to a Document Type Definition (DTD), which specifies tag requirements and allows the document to be validated. Syntactic correctness includes:

* using a single root element that contains all other elements
* providing closing tags for all opening tags
* placing quotes around all attribute values
* using the same case in each opening tag and its closing tag, since XML tag names are case-sensitive
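
As a rough illustration of these rules, the following Python sketch feeds a few small documents to the standard xml.etree.ElementTree parser, which rejects anything that is not well formed. The helper name is_well_formed and the sample tag names are made up for this example. Note that this only checks well-formedness; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


# One sample per rule in the list above; the tag names are hypothetical.
samples = {
    "well formed": '<note priority="high"><to>Reader</to></note>',
    "two root elements": '<note/><note/>',
    "missing closing tag": '<note><to>Reader</note>',
    "unquoted attribute": '<note priority=high><to>Reader</to></note>',
    "mismatched case": '<note><To>Reader</to></note>',
}

results = {name: is_well_formed(text)[0] for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'not well formed'}")
```

Only the first sample parses; each of the others breaks exactly one of the rules listed above.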

A valid XML document is well formed and complies with the guidelines of a DTD, which defines a tag set. The DTD can be part of the XML document, or it can be referred to by the XML document. To create a DTD, please refer to the Creating a DTD section of this tutorial.

The convergence of the XML document and the DTD provides content for the browser (in this case, Internet Explorer 5) to interpret and display. The XML declaration (<?xml ...?>) and the document type declaration (<!DOCTYPE ...>) make up the prolog of the XML document, or "the glue that binds DTDs to the code that applies to them" (St. Laurent, 1999). The first statement tells the browser the version of XML in use, and the second names the document type, states whether its DTD is a system or public DTD, and gives the DTD's location or file name. A system DTD is one that has been developed for a particular Web site or business, while a public DTD has been developed for use across a type of organization (e.g. advertising, newspapers). The DTD defines the available elements and attributes that make up the logical structure of a single XML document or a group of documents.
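
To make the two prolog statements concrete, here is a small Python sketch that reads them back out of a document with the standard xml.dom.minidom module. The root element name "ad" and the file name "ad.dtd" are hypothetical, chosen only for this example. The sketch inspects the prolog and nothing more: Python's standard library does not fetch the DTD or validate the document against it.

```python
from xml.dom import minidom

# Hypothetical document: "ad" and "ad.dtd" are invented for illustration.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE ad SYSTEM "ad.dtd">
<ad><headline>Example</headline></ad>
"""

doc = minidom.parseString(DOCUMENT)
doctype = doc.doctype  # the document type declaration from the prolog

prolog = {
    "xml version": doc.version,                    # from the XML declaration
    "document type name": doctype.name,            # must match the root element
    "public identifier": doctype.publicId,         # empty for a SYSTEM DTD
    "system identifier": doctype.systemId,         # location or file name of the DTD
    "root element": doc.documentElement.tagName,
}

for key, value in prolog.items():
    print(f"{key}: {value}")
```

A public DTD would use the PUBLIC keyword in place of SYSTEM and supply a public identifier ahead of the location.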

The contents of an XML document are formatted for display by using a stylesheet language such as XSL (Extensible Stylesheet Language). Using a stylesheet adds another layer of complexity to the XML document display process. In the XML document, a line that contains a reference to the XSL formatting file is added to the prolog, such as <?xml-stylesheet type="text/xsl" href="..."?>, where href gives the location of the XSL file.

A valid XML document reflects the collaboration of the DTD, XSL stylesheet and the XML document contents. The document contents can be created by marking up text in a text editor or generated through a database. To generate a valid, formatted XML document from an Access database, please refer to the Creating an XML Document section of this tutorial.
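
As a loose sketch of the database route, the Python example below builds an XML document from database rows and prepends a prolog that points at a DTD and an XSL file. Several things here are assumptions made for illustration: an in-memory SQLite table stands in for the Access database, and the table, column, tag and file names (ads, ads.dtd, ads.xsl) are all hypothetical. The result is well formed, but it would only be valid if a matching ads.dtd actually existed; nothing in this sketch checks that.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumption: an in-memory SQLite table stands in for the Access database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, headline TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO ads (headline, body) VALUES (?, ?)",
    [("Spring Sale", "Everything 20% off."), ("Now Hiring", "Apply in person.")],
)

# One <ad> element per row, with the columns as child elements.
root = ET.Element("ads")
for ad_id, headline, body in conn.execute(
    "SELECT id, headline, body FROM ads ORDER BY id"
):
    ad = ET.SubElement(root, "ad", id=str(ad_id))
    ET.SubElement(ad, "headline").text = headline
    ET.SubElement(ad, "body").text = body
conn.close()

# Hypothetical prolog: XML declaration, DTD reference, stylesheet reference.
prolog = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE ads SYSTEM "ads.dtd">\n'
    '<?xml-stylesheet type="text/xsl" href="ads.xsl"?>\n'
)
document = prolog + ET.tostring(root, encoding="unicode")
print(document)
```

Building the elements through ElementTree rather than by string concatenation means that special characters in the data, such as & or <, are escaped automatically.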
