XML

From DARC (Digital Archive Research Collective)

What is XML?

XML, or eXtensible Markup Language, is not a single markup language but a framework for creating more specific markup languages. Encoding documents in XML makes them both machine-readable and human-readable, and prepares the text for further analysis.

Here is a breakdown of the XML acronym:

“eXtensible”: you can extend, or customize, XML. In other words, XML allows you to create your own tags.

“Markup”: like HTML, XML allows you to “mark up” or “tag” text with additional data, such as <section>, <linebreak>, or <greeting> (tag names cannot contain spaces). XML is concerned with the meaning of data, not just its presentation. Contrast this with HTML, which is mainly concerned with how elements are laid out on the screen.
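
As a small illustration of creating your own tags, here is a sketch using Python's standard xml.etree.ElementTree module. The tag names (letter, greeting, body) and the text inside them are invented for this example; XML itself does not define them.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names chosen for illustration only.
letter = ET.Element("letter")

greeting = ET.SubElement(letter, "greeting")
greeting.text = "Dear reader,"

body = ET.SubElement(letter, "body")
body.text = "This text is tagged by what it means."

# Serialize the element tree to a plain string of XML.
xml_text = ET.tostring(letter, encoding="unicode")
print(xml_text)
```

The tags record what each piece of text is (a greeting, a body) and say nothing about how it should look on screen.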

One more thing to remember: because XML organizes data structures, XML documents must be well-formed according to a defined syntax in which tags are properly nested, one within the other, as in this example:

<sentence>This is <emphasis>good</emphasis> form.</sentence>

This next example is not well-formed, because the tags overlap instead of nesting, and an XML parser will reject it:

<sentence>This is <emphasis>bad</sentence>form.</emphasis>
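
One way to check the two snippets above is to hand them to a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for syntax problems such as overlapping or unclosed tags.
        return False


good = "<sentence>This is <emphasis>good</emphasis> form.</sentence>"
bad = "<sentence>This is <emphasis>bad</sentence>form.</emphasis>"

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Note that this only tests well-formedness (correct syntax and nesting). It does not check whether the tags follow the rules of a particular markup language such as TEI.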

Who would want to use it?

XML is “data wrangling” for textual data. It is useful for researchers who are interested in manipulating or processing electronic text through various kinds of applications, programs, or analysis.

XML allows you to organize and describe textual elements and to model them as an ordered hierarchy. Another way of understanding XML’s utility is to approach the document as inherently containing a tree structure: the root element is the chapter or page, which contains smaller elements such as sections, paragraphs, and sentences as the tree branches out. Organizing your document into this tree structure with XML allows the computer to parse and process text much more easily than it could otherwise.
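
To make the tree structure concrete, here is a short sketch using Python's standard xml.etree.ElementTree module. The sample document and the helper name outline are assumptions made for this example, not part of any real project.

```python
import xml.etree.ElementTree as ET

# A made-up document: chapter > section > paragraph > sentence.
doc = """<chapter>
  <section>
    <paragraph>
      <sentence>First sentence.</sentence>
      <sentence>Second sentence.</sentence>
    </paragraph>
  </section>
</chapter>"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(doc)

# Print the hierarchy, one tag per line, indented by depth.
tree_lines = outline(root)
print("\n".join(tree_lines))

# Because the structure is explicit, pulling out every sentence is one call.
sentences = [s.text for s in root.iter("sentence")]
print(sentences)
```

Once the text is encoded this way, a program can select every sentence, paragraph, or section by tag name instead of guessing at boundaries in raw text.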

One of XML’s child languages is TEI (the Text Encoding Initiative), a standard for marking up electronic texts in the humanities.

How do I get access to it?

You can write XML in any plain text editor. See the guidelines by the World Wide Web Consortium, which develops and maintains XML standards.

Where can I get more help with it?

Birnbaum, David J. “What is XML and Why Should Humanists Care?”

Birnbaum, David J. <dh> course materials

What are some projects built with it?

The Letters of Vincent van Gogh encodes the letters between van Gogh and his brother, Theo. All XML files are available for download [1].

Another example walks you through using XML on the text of Shakespeare’s plays to run analysis.