What is XML? XML stands for Extensible Markup Language. It is a markup language similar to HTML.
HTML was designed to present data, while XML was designed to store and transmit data. HTML tags are predefined; XML tags are not.
XML is a structural and semantic language. It provides for the creation of common information formats allowing the sharing of both the format and the data. XML can be used to simplify data storage and sharing. XML provides for information interchange.
An XML document’s structure is defined by the W3C XML specification and, optionally, by a Document Type Definition (DTD). A DTD defines the structure used for data interchange. The fundamental components of an XML document are elements and attributes. XML is case-sensitive.
All tags occur in pairs, although an empty element may be written as a single self-closing tag. An XML document must contain exactly one root element, the “parent” element, whose start and end tags enclose all other elements. Any element may have child elements.
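The pairing, single-root, and case-sensitivity rules can be checked with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are hypothetical, invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents, invented for this sketch.
samples = {
    "single root, matched tags": "<library><book>Dune</book></library>",
    "case mismatch": "<library><Book>Dune</book></library>",
    "two root elements": "<book>Dune</book><book>Emma</book>",
    "unclosed tag": "<library><book>Dune</library>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted. The other three each break one of the rules above, so the parser raises ParseError.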
The definition of an XML document is controlled by a set of rules (a syntax).
The characters “<” and “&” are special characters and may not appear literally within an element’s data; they are written as entity references instead.
The 5 predefined entity references in XML:
&lt; < less than
&gt; > greater than
&amp; & ampersand
&apos; ' apostrophe
&quot; " quotation mark
The naming of XML elements is also controlled by rules. Whitespace is a fundamental building block of good design. A best practice in any kind of programming is to include comments in the code. The syntax for writing comments in XML is the same as in HTML.
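The predefined entity references and the comment syntax can be seen together in a small sketch. It assumes Python's standard xml.etree.ElementTree and xml.sax.saxutils modules; the sample document and the string "AT&T: 5 < 6" are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical document: one comment and all five predefined entity references.
DOCUMENT = """<note>
  <!-- Comments use the same syntax as in HTML. -->
  <text>&lt; &gt; &amp; &apos; &quot;</text>
</note>"""

root = ET.fromstring(DOCUMENT)

# Each entity reference decodes to the single character it stands for.
decoded = root.find("text").text

# ElementTree's default parser drops comments, so <text> is the only child.
child_tags = [child.tag for child in root]

# Going the other way: escape() rewrites &, < and > so that raw text
# can be placed safely inside an element.
raw = "AT&T: 5 < 6"
escaped = escape(raw)
round_trip = ET.fromstring(f"<note>{escaped}</note>").text

print(decoded)
print(escaped)
```

Escaping on the way in and decoding on the way out are symmetric, so round_trip ends up equal to the original raw string.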
XML is used throughout the technological landscape for managing data.
Content management systems use XML to store information. The XML documents’ data is transformed for display using HTML and CSS, and it is easily shared. Content changes, additions, and deletions are made in a central location, and the changes cascade out to all presentation formats. Formatting changes are likewise made in a central location.
Web services use XML-based protocols and description formats such as SOAP, WSDL, and UDDI.
RDF (Resource Description Framework) is a framework for writing XML-based languages to describe information. RSS (RDF Site Summary) is an implementation of this framework. RSS makes a site’s content available as a feed, allowing users to access some of that content without actually visiting the site.
An XML document is composed of three parts: a prolog, a document element that usually contains nested elements, and optional comments or processing instructions. The prolog is optional.
The prolog of an XML document may contain the following items: an XML declaration, processing instructions, comments, and a Document Type Declaration (DOCTYPE declaration).
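To make the parts of a document visible, the sketch below feeds a small document to Python's standard xml.parsers.expat module and records each item in the order the parser reports it. The document content (a note element and a notes.css stylesheet reference) is hypothetical.

```python
import xml.parsers.expat

# Hypothetical document: prolog, document element, trailing comment.
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<!-- A comment in the prolog -->
<?xml-stylesheet type="text/css" href="notes.css"?>
<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
]>
<note>Remember the milk</note>
<!-- A trailing comment -->
"""

events = []  # (kind, detail) pairs in document order

parser = xml.parsers.expat.ParserCreate()
parser.XmlDeclHandler = (
    lambda version, encoding, standalone: events.append(("xml-declaration", version))
)
parser.CommentHandler = lambda data: events.append(("comment", data.strip()))
parser.ProcessingInstructionHandler = (
    lambda target, data: events.append(("processing-instruction", target))
)
parser.StartDoctypeDeclHandler = (
    lambda name, system_id, public_id, has_internal_subset: events.append(("doctype", name))
)
parser.StartElementHandler = lambda name, attrs: events.append(("element", name))

parser.Parse(DOCUMENT, True)

for kind, detail in events:
    print(f"{kind}: {detail}")
```

Everything reported before the element event belongs to the prolog. The final comment belongs to the optional third part of the document.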
The purpose of a DTD (Document Type Definition) is to define the legal building blocks of an XML document. A DTD defines the document structure with a list of legal elements and attributes; in other words, it describes the format of the XML document.
An internal DTD is declared inside the XML file within a DOCTYPE definition.
DTD element operators allow fine-grained control of the XML document’s definition.
An XML element may be defined to be EMPTY.
An XML element may be defined to be of ANY type. An XML element may be defined as containing parsed character data with #PCDATA.
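An internal DTD that uses EMPTY, ANY, and #PCDATA might look like the one embedded below. This is a sketch under assumptions: the memo vocabulary is hypothetical. Python's standard expat parser is non-validating, so the code only reads the declarations back and does not enforce them.

```python
import xml.parsers.expat as expat

# Hypothetical document with an internal DTD inside the DOCTYPE declaration.
DOCUMENT = """<!DOCTYPE memo [
  <!ELEMENT memo (subject, separator, body)>
  <!ELEMENT subject (#PCDATA)>
  <!ELEMENT separator EMPTY>
  <!ELEMENT body ANY>
]>
<memo><subject>Status</subject><separator/><body>All <subject>good</subject></body></memo>
"""

TYPE_NAMES = {
    expat.model.XML_CTYPE_EMPTY: "EMPTY",
    expat.model.XML_CTYPE_ANY: "ANY",
    expat.model.XML_CTYPE_MIXED: "#PCDATA (mixed content)",
    expat.model.XML_CTYPE_SEQ: "sequence of child elements",
    expat.model.XML_CTYPE_CHOICE: "choice of child elements",
}

declared = {}  # element name -> kind of content model


def on_element_decl(name, content_model):
    # content_model is a nested tuple: (type, quantifier, name, children).
    declared[name] = TYPE_NAMES.get(content_model[0], "other")


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOCUMENT, True)

for name, kind in declared.items():
    print(f"{name}: {kind}")
```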
An XML element extends from its start tag to its end tag, inclusive.
An XML element’s cardinality may be defined using element operators.
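The cardinality operators in a DTD are ? (zero or one), * (zero or more), and + (one or more); a name with no operator must appear exactly once. The sketch below reads them back from a hypothetical playlist DTD with Python's standard expat parser. As before, the declarations are reported, not enforced.

```python
import xml.parsers.expat as expat

# Hypothetical DTD: one child of each cardinality.
DOCUMENT = """<!DOCTYPE playlist [
  <!ELEMENT playlist (title, description?, track+, note*)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT description (#PCDATA)>
  <!ELEMENT track (#PCDATA)>
  <!ELEMENT note (#PCDATA)>
]>
<playlist><title>Road trip</title><track>First song</track></playlist>
"""

QUANTIFIERS = {
    expat.model.XML_CQUANT_NONE: "exactly one",
    expat.model.XML_CQUANT_OPT: "zero or one (?)",
    expat.model.XML_CQUANT_REP: "zero or more (*)",
    expat.model.XML_CQUANT_PLUS: "one or more (+)",
}

cardinality = {}  # child element name -> how often it may occur in <playlist>


def on_element_decl(name, content_model):
    # content_model is a nested tuple: (type, quantifier, name, children).
    _type, _quant, _name, children = content_model
    if name == "playlist":
        for _ctype, child_quant, child_name, _grandchildren in children:
            cardinality[child_name] = QUANTIFIERS[child_quant]


parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.Parse(DOCUMENT, True)

for child, rule in cardinality.items():
    print(f"{child}: {rule}")
```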
XML attributes allow us to include additional information about an element. The information included in an attribute should describe or augment the data; do not use attributes to represent the data itself. It is best practice to avoid the use of attributes in XML where an element would serve. Attributes have limitations: they cannot contain multiple values, cannot contain tree structures, and are not easily extensible.
Attributes are name-value pairs. Use attributes for data about the data – Metadata.
To declare attributes in a DTD, use the ATTLIST declaration.
The attribute default-value field can be set to one of the following four forms:
a literal default value enclosed in quotes, #IMPLIED, #REQUIRED, or #FIXED followed by a quoted value.
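One ATTLIST declaration can show all four forms. The sketch below declares them for a hypothetical payment element and reads them back through the AttlistDeclHandler of Python's standard expat parser. The helper classify is an assumption made for this example: it maps the handler's default and required arguments back to the DTD keyword.

```python
import xml.parsers.expat as expat

# Hypothetical element with one attribute per default-value form.
DOCUMENT = """<!DOCTYPE payment [
  <!ELEMENT payment (#PCDATA)>
  <!ATTLIST payment
    method   CDATA "cash"
    note     CDATA #IMPLIED
    id       CDATA #REQUIRED
    currency CDATA #FIXED "NZD">
]>
<payment id="p1">12.50</payment>
"""


def classify(default, required):
    """Map expat's (default, required) pair back to the DTD form."""
    if default is None:
        return "#REQUIRED" if required else "#IMPLIED"
    return "#FIXED" if required else "default value"


forms = {}     # attribute name -> which of the four forms was declared
defaults = {}  # attribute name -> declared default value, or None


def on_attlist_decl(element_name, attr_name, attr_type, default, required):
    forms[attr_name] = classify(default, required)
    defaults[attr_name] = default


parser = expat.ParserCreate()
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOCUMENT, True)

for attr_name, form in forms.items():
    print(f"{attr_name}: {form} (default={defaults[attr_name]!r})")
```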
XML defines reserved attribute names, such as xml:lang and xml:space, which cannot be used except as defined by the XML specification.
XML ENTITIES may be used as a shorthand for including a particular character or a long string.
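As a sketch of the shorthand idea, the document below declares two general entities in an internal DTD and lets the parser expand them. The entity names and the company name are hypothetical. Python's standard xml.etree.ElementTree is assumed to expand entities declared in the internal subset, which is the behaviour of its underlying expat parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical entities: one for a long string, one for a single character.
DOCUMENT = """<!DOCTYPE letter [
  <!ENTITY company "Example Widgets Ltd">
  <!ENTITY copy "&#169;">
]>
<letter>&copy; &company;</letter>
"""

# The parser replaces each entity reference with its declared text.
expanded = ET.fromstring(DOCUMENT).text
print(expanded)
```

Because expansion happens at parse time, untrusted input with deeply nested entities is a known risk. Entities are best kept to documents from a trusted source.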
XML NOTATIONS are used to identify the format of unparsed entities, elements with a notation attribute, or specific processing instructions.
An XML namespace provides a mechanism for avoiding element and attribute name conflicts.
An XML namespace may be defined using the xmlns attribute in the start tag of an element.
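A typical name conflict is two unrelated vocabularies that both define a table element. The sketch below, with hypothetical example.com namespace URIs, shows how xmlns prefixes keep them apart when parsed with Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces: an HTML-like table and a furniture table.
DOCUMENT = """<root xmlns:h="http://example.com/html-table"
      xmlns:f="http://example.com/furniture">
  <h:table><h:td>Apples</h:td></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>
"""

root = ET.fromstring(DOCUMENT)

# ElementTree expands each prefix to {namespace-uri}local-name,
# so the two <table> elements get different tags.
tags = [child.tag for child in root]

# Lookups take a prefix-to-URI mapping; the prefix used here is local
# to this code and need not match the one in the document.
ns = {"furn": "http://example.com/furniture"}
name = root.find("furn:table/furn:name", ns).text

print(tags)
print(name)
```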
Document Type Definition documents use an inflexible syntax. Using XML Schema provides an alternative to the use of DTDs. An XML schema document describes the structure of an XML document.
The XML Schema defines the structure of an XML document and the rules for its data content. It describes the types and values that can be placed into each element or attribute.
The purpose of an XML Schema is to define the legal structure of an XML document. XML schema definitions allow for the specification of data types allowable for an XML element.
Attributes are declared as simple types. Remember, simple elements cannot contain attributes. Attributes are optional by default; an attribute can be made required by setting the “use” attribute to “required”.
XML Schema indicators allow us to control how an element is used in the XML document.
Restrictions on XML elements are called facets.
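The schema ideas above (data types, a required attribute, occurrence indicators, and facets) can be seen in one small schema. The order vocabulary below is hypothetical. Python's standard library does not validate against XML Schema, so this sketch only parses the schema document and reads the settings back; the helper quantity_allowed is a hand-written stand-in for what a real validator would do with the facets.

```python
import xml.etree.ElementTree as ET

NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# Hypothetical schema for an <order> element.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:string" minOccurs="1" maxOccurs="unbounded"/>
        <xs:element name="quantity">
          <xs:simpleType>
            <xs:restriction base="xs:integer">
              <xs:minInclusive value="1"/>
              <xs:maxInclusive value="100"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(SCHEMA)

# Occurrence indicators on <item>.
item = schema.find(".//xs:element[@name='item']", NS)
occurs = (item.get("minOccurs"), item.get("maxOccurs"))

# Facets restricting <quantity>.
quantity = schema.find(".//xs:element[@name='quantity']", NS)
restriction = quantity.find(".//xs:restriction", NS)
facets = {child.tag.split("}")[1]: child.get("value") for child in restriction}

# The "use" attribute makes the id attribute mandatory.
id_use = schema.find(".//xs:attribute[@name='id']", NS).get("use")


def quantity_allowed(value: int) -> bool:
    """Hand-written check of the two facets; not a real schema validator."""
    return int(facets["minInclusive"]) <= value <= int(facets["maxInclusive"])


print(occurs, facets, id_use)
print(quantity_allowed(42), quantity_allowed(0))
```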
*New Zealand Review Study Channel