TEI

From DARC (Digital Archive Research Collective)

What is TEI?

TEI (Text Encoding Initiative) is a markup schema created specifically for text-encoding projects in linguistics, social sciences, library sciences, or humanities. TEI allows editors to mark up structural, renditional, and conceptual features of textual data. In more basic terms, TEI offers a set of guidelines for editors to “tag” text according to its various elements.

The TEI guidelines are based on XML, the eXtensible Markup Language. They were developed, and are currently maintained, by an academic collective known as the TEI Consortium.

As a markup schema, TEI provides a set of instructions that indicate how a text ought to be presented on a page or screen, together with information about the content of the text. Although there is an emphasis on appearance and structure, markup also applies to conceptual elements, such as cultural and personal references. For example, a TEI tag might encode the author’s identity, a section of deleted text, or geographic locations referenced within the text.

TEI is often used for manuscript encoding. Here is an example of some TEI that encodes a line of text and indicates a portion that has been struck from a manuscript.

<line>This is a <del>paragraph</del>sentence.</line>
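To make the role of the <del> tag concrete, here is a minimal sketch in Python that reads the line above and separates what was struck out from what remains. It uses only the standard library; the helper name reading_text is hypothetical and not part of TEI or of any TEI tool.

```python
import xml.etree.ElementTree as ET

# The example line from above, parsed as ordinary XML.
line = ET.fromstring("<line>This is a <del>paragraph</del>sentence.</line>")

# Everything the encoder marked as struck from the manuscript.
deleted = [d.text for d in line.iter("del")]


def reading_text(element):
    """Return the text of an element, skipping the content of any <del>."""
    parts = [element.text or ""]
    for child in element:
        if child.tag != "del":
            parts.append(reading_text(child))
        # Text that follows a child tag belongs to the parent, so keep it.
        parts.append(child.tail or "")
    return "".join(parts)


final = reading_text(line)
print(deleted)
print(final)
```

Because the deletion is tagged rather than simply removed, both versions of the line stay recoverable from the same file.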

TEI, at first glance, looks a lot like HTML. Both use a tagging structure that encloses textual elements to indicate something about those elements. However, while HTML mainly encodes how text should appear on a webpage, in the form of titles, headings, paragraphs, or links, TEI encodes the content of the text, in addition to some description of its appearance.
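The contrast can be sketched with one invented sentence encoded both ways. The sentence and the ship name are hypothetical; <title>, <name>, and the rend attribute are real TEI, but this is a simplified fragment rather than a complete TEI document.

```python
import xml.etree.ElementTree as ET

# HTML: both italic spans receive the same tag, so they look identical to a program.
html = "<p>She read <i>Frankenstein</i> on the <i>Dolphin</i>.</p>"

# TEI: <title> and <name type="ship"> say what each span is,
# while rend="italic" separately records how it appears.
tei = (
    '<p>She read <title rend="italic">Frankenstein</title> on the '
    '<name type="ship" rend="italic">Dolphin</name>.</p>'
)

html_root = ET.fromstring(html)
tei_root = ET.fromstring(tei)

# HTML can only answer "what is italic?"
html_italics = [el.text for el in html_root.iter("i")]

# TEI can answer "what is a title?" and still answer "what is italic?"
tei_titles = [el.text for el in tei_root.iter("title")]
tei_italics = [el.text for el in tei_root.iter() if el.get("rend") == "italic"]

print(html_italics)
print(tei_titles)
print(tei_italics)
```

In the HTML version a program cannot tell the book from the ship. In the TEI version it can, without losing the information that both were italicized.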

Who would want to use it?

People who are undertaking a text-encoding project in linguistics, social sciences, library sciences, or humanities will likely benefit from using TEI in their projects.

Once a text is marked up with TEI, the relevant elements can be searched, processed, and rendered to facilitate scholarly research. These elements range from the more objective, physical traces on the page/screen to the more subjective ideas and assumptions about meaning and references behind the visible elements.
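As a rough illustration of searching encoded text, the sketch below counts place names and lists people in a tiny invented document. The names and sentences are made up, and the document omits the header that a complete TEI file would carry. The namespace URI is the standard TEI one, and the parsing uses Python's built-in xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# TEI elements live in this XML namespace, so searches must name it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample text; a real TEI file would also include a <teiHeader>.
doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p><persName>Jane Doe</persName> left <placeName>Springfield</placeName>
         for <placeName>Shelbyville</placeName>.</p>
      <p>She wrote to <persName>John Roe</persName> from
         <placeName>Springfield</placeName>.</p>
    </body>
  </text>
</TEI>"""

root = ET.fromstring(doc)

# How often is each place mentioned?
places = Counter(el.text for el in root.iterfind(".//tei:placeName", TEI_NS))

# Which people appear, listed once each in alphabetical order?
people = sorted({el.text for el in root.iterfind(".//tei:persName", TEI_NS)})

print(places.most_common())
print(people)
```

The same pattern extends to any tagged feature: once an editor has marked it up, a short query can collect every instance across a text or a whole collection.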

Encoding with TEI also makes the text more sustainable and portable. Because TEI is built on XML, a plain-text markup language that is not tied to any single program, a TEI file can be preserved over time and opened across computer programs, and it is likely to outlast more stylized file formats such as a Microsoft Word document.

How do I get access to it?

The TEI Consortium is the main governing body that maintains the guidelines.

The actual guidelines are hosted on the TEI-C Online Documentation page.

Where can I get more help with it?

See the workshop by Filipa Calado (Digital Fellows).

What are some projects built with it?

The Cather Archive, a digital archive of Willa Cather's novels, stories, nonfiction, letters, and journalism.

The Shelley-Godwin Archive, containing the manuscripts of Percy Shelley, Mary Shelley, William Godwin, and Mary Wollstonecraft.