Contents
- 1 General
- 2 Our xml file
- 3 The ConTeXt style file
- 4 Removing unwanted strings from xml source
- 5 Additional considerations for TEI → ConTeXt workflows
- 5.1 TEI markup and poetic structures
- 5.2 From semantic description to typographic procedure
- 5.3 Historical examples and their limitations
- 5.4 External XML versus internal XML buffers
- 5.5 Alignment of parallel poetic texts
- 5.6 Commentary, notes, and verse-based layouts
- 5.7 Typographic context and poetic scope
- 5.8 Explicit mappings as an editorial choice
- 5.9 Example TEI sources and workflows
- 6 Overview (three approaches)
- 7 Example TEI sources and repositories
- 8 Another MWE (Minimum Working Example) in poetry
- 9 A much more complete MWE in XML-TEI poetry
- 10 A progressive TEI → ConTeXt example (Greek + French)
- 10.1 Scope and limitations of this example
- 10.2 A minimal TEI source file
- 10.3 Loading a TEI file in ConTeXt
- 10.4 Routing XML elements to ConTeXt macros
- 10.5 Distinguishing Greek and French text
- 10.6 A simple parallel presentation
- 10.7 What this example does not do
- 10.8 Summary
- 10.9 Distinguishing Greek and French text
- 10.10 A simple parallel presentation
- 10.11 What this example does not do
- 10.12 Summary
- 11 About \xmltext vs \xmlflush when processing TEI content
- 12 Overview: how the workflow works
- 13 Expected TEI structure
- 14 Why ConTeXt does not require an external XML parser
- 15 A ConTeXt-native XML example (recommended)
- 16 What this example demonstrates
- 17 When Lua code becomes necessary
- 18 Summary
- 19 1. TEI file structure: what does Lua expect?
- 20 A ConTeXt-native XML example (recommended)
- 21 8. What this example demonstrates
- 22 Further reading
- 23 See also
General
TEI (Text Encoding Initiative) is "a consortium which collectively develops and maintains a standard for the representation of texts in digital form," to quote their own website. They have developed a series of guidelines for editing texts in a digital form. In their latest form (which is called P5), these guidelines weigh in at a hefty 1350 pages (OK, that's counting the bibliography and the index too; there are only 1290 pages of real text). These describe an xml format which is suitable for editing texts. The TEI guidelines have the advantage of being very well documented. There are a number of free resources available that should help everyone who is interested in getting started (one extremely helpful website with lots of tutorials, examples, and tests is TEI by Example). They are not (and do not aspire to be) an absolute standard that everyone has to follow, but many academic projects use these guidelines, and they should be a pretty good way to make sure that your electronic edition of a text will be useful in the future.
Since editing texts is something which quite a few users of ConTeXt are involved in, it makes sense to think about ways in which xml documents which follow the TEI guidelines can be typeset with ConTeXt. We would invite users to keep a few caveats in mind:
- The TEI guidelines are very detailed because they try to cater to a large number of needs. Most users will only need a small subset of the tags and attributes which the guidelines offer (in fact, TEI is aware of this and has a slimmed down version of their guidelines which is called TEI Lite. This is a very good starting place to familiarize yourself with TEI). It would not make sense to try and provide a monolithic solution that defines all TEI tags; instead, localized ConTeXt style sheets are necessary, each defining the subset which is relevant for a number of texts with similar features.
- Even with this huge number of tags, TEI does not expect to be sufficient for every text. Users are encouraged to develop their own styles; again, this necessitates special ConTeXt style sheets to process such adaptations.
- Encoding and typesetting texts in xml is an ongoing process. As you go forward in your edition, you realize that you need more tags, that you need to distinguish more special cases, that you want to add more information to your edition. This means that you will have to go back and forth between your xml file and the ConTeXt style and adapt both to your needs.
All of which means that the following paragraphs are just the first step in an ongoing attempt. I (Thomas) have written down a setup for a text that I am editing (for those who are interested: the Lives of the Sophists by Philostratus). I fully expect this to be a community effort: as others use TEI xml, they will discover new ways of handling things, will want to add features or add examples for other sorts of texts. My example is meant to start the discussion. Since those who edit texts usually have a background in the humanities, not in programming, I have added lengthy comments which will explain every step.
Our xml file
Philostratus's text is in ancient Greek, but since the text itself doesn't matter much when we talk about structure and typesetting xml, I have replaced it here with a simple lorem ipsum text that is easier to display. So here's what the first paragraphs of the xml file philostratus.xml look like:
<?xml version="1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Lives of the Sophists</title>
        <author>Philostratus</author>
        <respStmt>
          <resp>editor</resp>
          <name xml:id="TAS">Thomas</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <p>Work in progress</p>
      </publicationStmt>
      <sourceDesc>
        <p>See indication of manuscripts</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <div type="sigla">
        <listWit>
          <witness id="c2">codd. 2</witness>
          <witness id="Richards">Richards</witness>
        </listWit>
      </div>
      <div type="work">
        <head type="main">Philostrati</head>
        <head type="sub">Vitae Sophistarum.</head>
        <opener>
          <salute>Lorem <pb ed="Olearius" n="479"/>Ipsum</salute>
        </opener>
      </div>
    </front>
    <body>
      <div xml:id="VS1" n="I" type="book">
        <div xml:id="VS1.1" n="1" type="chapter">
          <div xml:id="VS1.1.1" n="1" type="section">
            <p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt <app> <rdg wit="#Richards">induunt</rdg> </app> ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren <app> <rdg wit="#c2">arrgl</rdg> </app> <pb ed="Olearius" n="480"/> no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt.</p>
          </div>
          <div xml:id="VS1.1.2" n="2" rend="inline" type="section">
            <p>ut labore et dolore magna aliquyam erat, sed diam voluptua. <pb ed="Olearius" n="481"/> At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. <lg> <l>At vero eos et accusam et justo duo dolores</l> </lg> et <lg> <l>ea rebum. Stet clita kasd gubergren, no sea takimata</l> </lg> sanctus est Lorem ipsum dolor sit amet.
            Lorem ipsum dolor sit amet</p>
          </div>
          <div xml:id="VS1.1.3" n="3" rend="paragraph" type="section">
            <p>Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu feugiat nulla facilisis at vero eros et accumsan et iusto odio dignissim qui blandit praesent luptatum zzril delenit augue duis dolore te feugait nulla facilisi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.</p>
          </div>
        </div>
      </div>
    </body>
  </text>
</TEI>
So let's have a look at this file. This can be brief since most of the tags are described in the TEI guidelines and tutorials.
Every TEI file has as its root level (i.e. the "outer" level of the xml file) the element
<TEI> </TEI>
which defines it as a TEI xml file. Everything else is a "child" of this root level. At the next level, you see two of these children: on the one hand, the <teiHeader> element. This contains meta-information about your electronic edition: title, author, editor, publication status, source of your edition. There can be much more information here. This is meta-information which will usually not be typeset in your edition.
The other child is the <text> element. This is what will really be in a typeset, printed edition. As you see, the <text> element has again two children. The <front> contains the title of the work you edit in the form in which it will appear in your typeset document, prefatory material, etc. The <body> element contains the text itself. This text has a logical structure: It consists of books, chapters, and sections. All of these logical parts are expressed via different <div> elements; to distinguish them from each other, these <div> elements have so-called attributes, so we have:
<div xml:id="VS1" n="I" type="book">
  <div xml:id="VS1.1" n="1" type="chapter">
    <div xml:id="VS1.1.1" n="1" type="section">
    </div>
  </div>
</div>
As you can see, most of these "div" elements have other attributes as well: the "xml:id" attribute gives every section in your document a unique identifier. This makes it easier to refer to these sections later. You are free to choose these attributes; as an example, I have opted for a short numeric tag that refers to the paragraph. The "n" attribute is the name of the section as it will appear in your typeset edition. For classical prose texts, it is customary to have the chapter and section numbers appear in the margin of the edition, with no prefix and no additional information about the structure. E.g., at the beginning of chapter 8, there will be a bold 8 in the margin (the mark for "section 1" is understood and usually not expressed). For subsequent sections of chapter 8, there will be smaller section numbers in the margin, like "2," "3," etc. Finally, such sections of chapters do not necessarily begin a new paragraph. In order to make this clear, I have used the "rend" attribute (not exactly in the way TEI defines it, but close enough). For sections, I have two types of "rend" attributes: "inline" means that this section should just continue the typographical paragraph; "paragraph" means that it should begin in a new paragraph. This is an important distinction which I want to emphasize: in your typeset edition, these two will appear very different. For the logical structure of your digital text, however, they are both on the same level. That's why they are both "div" of the same type, but with different "rend" attributes.
Further, we have <pb> elements. These are used to denote pagebreaks in standard editions, which are often used for reference purposes and displayed in the margin; in the case of the Lives of the Sophists, this is the 18th-century edition of Olearius. These elements are inserted at the places where these pagebreaks occur.
Finally, we have the critical apparatus. Its notes are included in <app> elements. Every single entry into the apparatus is within a <rdg> (= reading) element.
This should be enough to get us started. We will now look at the way in which we will typeset such an xml document with ConTeXt.
The ConTeXt style file
NB: Some of the functionality described here has been introduced quite recently. You will need a ConTeXt version not earlier than December 2010 in order to try this example!
In order to typeset such a file with ConTeXt, we need a style file which will map xml elements and attributes to specific ConTeXt commands. We have to save this file (let's call it tei-style.tex) somewhere where ConTeXt can find it (e.g., somewhere in your personal texmf tree or in the same directory as the xml file) and then typeset with the command context --environment=tei-style philostratus.xml. We will look at this file in detail:
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
\stopxmlsetups
We define a set of \xmlsetups in a \start \stop environment, and we give it a name in the namespace xml:. The first line of these setups does only one thing: the \xmlsetsetup operates on the current xml tree (that's what the first argument {#1} refers to), takes all its elements ({*}) and discards them ({-}). That means only elements which we address explicitly will be typeset. This is necessary in our case because we do not want the information in the TEI header to be typeset.
For those elements we do want typeset, we have to add instructions. This involves a three-step process:
- We have to add their names to a line which defines a \xmlsetsetup
- We define a specific setup for them
- (optional) we define TeX commands for typesetting
Let us begin with some easy steps. The xml tree we are operating on is empty now. So we first have to tell ConTeXt to pass the content of the topmost elements to its typesetting engine. The topmost element is TEI, so we write:
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:teisetups}

\startxmlsetups xml:TEI
  \xmlflush{#1}
\stopxmlsetups
So: we add the TEI element to a new \xmlsetsetup. We "register" the setups we have defined. And then we declare that the content of the element TEI should be passed to ConTeXt; this is what the line \xmlflush{#1} does.
Of course, we will do the same for the text element, but not for the teiHeader element, which we do not want to be typeset. So we now have:
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:teisetups}

\startxmlsetups xml:TEI
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:text
  \xmlflush{#1}
\stopxmlsetups
Things become a bit more interesting when we look at the next level. We will start with the text proper, which is contained in the body element. For the text, we want line numbers in the margin, and we want these linenumbers in steps of five, in a small font. Here you can see the three steps we have to take:
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text|body}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:teisetups}

\startxmlsetups xml:TEI
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:text
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:body
  \startlinenumbering
  \xmlflush{#1}
  \stoplinenumbering
\stopxmlsetups

\setuplinenumbering[location=inner, step=5, method=page, style=\tfxx, align=left, distance=0.3em, width=0.3cm]
So we have:
- added the element body to our \xmlsetsetup
- added a specific setup for the element which puts its content within a \startlinenumbering environment
- added ConTeXt setup commands for the \startlinenumbering environment.
Things become even more interesting at the next level. When you look at our xml document, you will see that the entire body consists of different divisions in div elements; the different levels are distinguished by different type attributes. This means we cannot simply add the div element to our general \xmlsetsetup, but have to add a specific \xmlsetsetup for every type. Fortunately, ConTeXt makes it easy to address these different elements. We begin with the book level: (for clarity, I will now only show the new steps, not the entire style document):
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text|body}{xml:*}
  \xmlsetsetup{#1}{div[@type='book']}{xml:div:book}
\stopxmlsetups

\startxmlsetups xml:div:book
  \blank[line]\midaligned{\xmlatt{#1}{n}}\blank[medium]
  \xmlflush{#1}
\stopxmlsetups
What happens here? The expression div[@type='book'] means "every element div which has an attribute 'type' with the value 'book.'" We want a blank line before the title of the book. Then, we take the value of the n attribute (that's what the construct \xmlatt{#1}{n} expands to: the value of the attribute n of the current tag) and typeset it midaligned. We add another, smaller blank. And don't forget to "flush" the content of the div element!
For the next level, the chapter, we need again three steps: add it to the \xmlsetsetup, define a setup command and a ConTeXt macro for it:
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text|body}{xml:*}
  \xmlsetsetup{#1}{div[@type='chapter']}{xml:div:chapter}
\stopxmlsetups

\startxmlsetups xml:div:chapter
  \PhilSection{\xmlatt{#1}{n}}
  \xmlflush{#1}
  \par
\stopxmlsetups

\defineinmargin [PhilSection] [outer] [normal] [distance=0.3em,style=\tfa\bf]
So: here, the argument of the n attribute is passed to a ConTeXt
macro \PhilSection. This macro is defined as an
\inmargin which will be typeset in the outer margin, in a
bigger, bold font. This will be the "chapter" numbering in the outer
margin.
For the section numbering, we take a similar approach, but as you will see, we need to define even more different setups:
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text|body}{xml:*}
  \xmlsetsetup{#1}{div[@type='section']}{xml:div:section}
\stopxmlsetups

\startxmlsetups xml:div:section
  \doifelse {\xmlatt{#1}{n}} {1}
    {\xmlflush{#1}}
    {\doifelse {\xmlatt{#1}{rend}} {paragraph}
       {\par\PhilSubsection{\xmlatt{#1}{n}}\xmlflush{#1}}
       {\PhilSubsection{\xmlatt{#1}{n}}\xmlflush{#1}}}
\stopxmlsetups

\defineinmargin [PhilSubsection] [outer] [normal] [distance=0.3em,style=normal]
Here, we define a setup for the section level which contains two further
tests, for which we use ConTeXt's \doifelse macro. The first
\doifelse tests if the value of the n attribute is
"1," i.e., if this is the first section in a chapter. If it is, it does
nothing more than "flush" the content of this section -- remember, the
number for the first section should not appear in the margin since it is
implied in the chapter number. It's still good to have this number -- if
you ever decide that your typeset output should look different, the
information is there and can be shown. But for the time being, we do not
want it to appear, and that's what the first condition does. If the
n attribute's value isn't 1, another test is performed; this
time, we look at the value of the rend attribute. If this
attribute has the value "paragraph," we insert a \par, pass
the value of the n attribute to the macro
\PhilSubsection, and "flush" the content of our section. If
the value is anything else (i.e., "inline"), we flush the content without
inserting a \par. Then, we define \PhilSubsection
as another \inmargin, which will appear in the outer margin,
at the same place as the chapter numbering, but in a normal font. Finally,
when you look at the main text, you will see that we now have defined
setups for books, chapters, sections, but not yet for the smallest element,
p. Remember: we don't want paragraph breaks for these elements, so
all we need to do is "flush" them. Which means: we add the p
element to the list:
\xmlsetsetup{#1}{TEI|text|body|p}{xml:*}
and the appropriate setup is:
\startxmlsetups xml:p \xmlflush{#1} \stopxmlsetups
And that's it! This is our structure for the main text! If you typeset the xml file with this setup, you get text with marginal numbering for your chapters and sections.
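Collected in one place, the fragments developed so far could be assembled roughly as follows. This is only a consolidation of the pieces shown above, assuming they are saved together as tei-style.tex; nothing new is introduced, and elements that are not listed (the header, the front matter) are still discarded by the catch-all rule.

```tex
% tei-style.tex: the setups from this section in a single file

\startxmlsetups xml:teisetups
  % discard everything that is not addressed explicitly
  \xmlsetsetup{#1}{*}{-}
  % elements whose content is simply passed on
  \xmlsetsetup{#1}{TEI|text|body|p}{xml:*}
  % one setup per division type
  \xmlsetsetup{#1}{div[@type='book']}{xml:div:book}
  \xmlsetsetup{#1}{div[@type='chapter']}{xml:div:chapter}
  \xmlsetsetup{#1}{div[@type='section']}{xml:div:section}
\stopxmlsetups

\xmlregistersetup{xml:teisetups}

\startxmlsetups xml:TEI
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:text
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:body
  \startlinenumbering
  \xmlflush{#1}
  \stoplinenumbering
\stopxmlsetups

\startxmlsetups xml:p
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:div:book
  \blank[line]\midaligned{\xmlatt{#1}{n}}\blank[medium]
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:div:chapter
  \PhilSection{\xmlatt{#1}{n}}
  \xmlflush{#1}
  \par
\stopxmlsetups

\startxmlsetups xml:div:section
  \doifelse {\xmlatt{#1}{n}} {1}
    {\xmlflush{#1}}
    {\doifelse {\xmlatt{#1}{rend}} {paragraph}
       {\par\PhilSubsection{\xmlatt{#1}{n}}\xmlflush{#1}}
       {\PhilSubsection{\xmlatt{#1}{n}}\xmlflush{#1}}}
\stopxmlsetups

\setuplinenumbering[location=inner, step=5, method=page, style=\tfxx,
  align=left, distance=0.3em, width=0.3cm]

\defineinmargin [PhilSection]    [outer] [normal] [distance=0.3em,style=\tfa\bf]
\defineinmargin [PhilSubsection] [outer] [normal] [distance=0.3em,style=normal]
```

The file is then used with the command given at the beginning of this section: context --environment=tei-style philostratus.xml.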
We now add the bells and whistles. We begin with the Olearius pagebreaks, the <pb> elements. If you've followed so far, this should be easy. As you see, these elements contain a reference to the relevant edition (the ed= attribute) and the pagenumber. If we had more elements of this type, it would make sense to define a setsetup for every one of them. In the case of Philostratus, we will probably only have Olearius, so we just add them to our list:
\xmlsetsetup{#1}{TEI|text|body|p|pb}{xml:*}
and add both the setup for the xml element and a new definition for a
marginal text (since we're a bit paranoid, we still test whether the
xmlattribute ed is set to Olearius). Since I want the
Olearius numbers in square brackets, I needed to take a two-step approach
(the square brackets would be confusing to the ConTeXt parser). So I first
define an inmargin \ZOlearius and then a macro
\Olearius which takes this value and typesets it within square
brackets, in the outer margin, at a distance of 2em from the main text:
\startxmlsetups xml:pb
  \doifelse {\xmlatt{#1}{ed}} {Olearius}
    {\Olearius{\xmlatt{#1}{n}}}
    {}
\stopxmlsetups

\defineinmargin [ZOlearius] [outer] [normal] [distance=2em,style=small]

\define[1]\Olearius%
  {\ZOlearius{[#1]}}
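The critical apparatus (the app and rdg elements) is not handled by the setups above, so its readings are silently discarded. One possible way to make them visible, given here only as a sketch and not as part of the original setup, is to follow the same three steps and print each reading as a footnote. The choice of \footnote is an assumption made for illustration; a real edition would more likely want a dedicated apparatus layout. The wit attribute is left out because its value starts with a hashmark.

```tex
% Hypothetical addition to tei-style.tex (not part of the original setup).
% Step 1: extend the list inside xml:teisetups with app and rdg:
%   \xmlsetsetup{#1}{TEI|text|body|p|pb|app|rdg}{xml:*}

% Step 2: an app element only passes its content on
\startxmlsetups xml:app
  \xmlflush{#1}
\stopxmlsetups

% Step 3: each reading is printed as an ordinary footnote
\startxmlsetups xml:rdg
  \footnote{\xmlflush{#1}}
\stopxmlsetups
```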
Thomas 21:38, 7 November 2010 (UTC)
Removing unwanted strings from xml source
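In some cases you might want to remove strings or characters from the xml source. For example, a hashmark (#) is a special character in TeX, so ConTeXt cannot process it when it turns up in an attribute value. The following example shows how to remove the leading hashmark from an xml identifier with the command \cldcontext, which passes the attribute value through Lua's string.sub and keeps everything from the second character onwards.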
The xml source:
<a href="#myspecialid">the previous section</a>
The setup code:
\startxmlsetups xml:initialize
  \xmlsetsetup{#1}{a}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:initialize}

\startxmlsetups xml:a
  \cldcontext{string.sub([[\xmlatt{#1}{href}]],2)}
\stopxmlsetups
Summary of the TEI + ConTeXt approach (after Thomas)
The TEI (Text Encoding Initiative) defines a very rich XML vocabulary for scholarly editions. ConTeXt is not designed to handle the entire TEI specification, and should not attempt to do so. Instead, as Thomas notes, each project should define a project-specific TEI subset that matches its editorial goals and conventions.
Main principles:
- 1. TEI is too large to be supported “as a whole”. The full TEI P5 specification is overly broad and not intended for direct rendering.
- 2. Projects should adopt a restricted TEI profile. Only necessary elements and attributes should be encoded.
- 3. ConTeXt excels at processing such custom subsets. Lua filters and ConTeXt styles can map TEI structures to typeset output.
- 4. Editorial conventions must be explicit and documented. Clear rules ensure consistent encoding and predictable rendering.
- 5. Workflows evolve. Additional TEI elements can be supported progressively as the edition grows.
In short: ConTeXt is not a universal TEI renderer; it is a flexible framework for building TEI-powered editorial workflows.
Additional considerations for TEI → ConTeXt workflows
The following notes expand the original text and provide practical guidance for users wishing to combine TEI XML with ConTeXt.
Working with TEI-encoded poetry in ConTeXt raises a number of specific difficulties that are not immediately visible from general XML examples. These difficulties are closely tied to the structural and typographic constraints of verse: line segmentation, stanza grouping, alignment, and the close coupling between text and commentary.
This section focuses exclusively on TEI workflows applied to poetic material.
TEI markup and poetic structures
In TEI, poetic structures are typically encoded using elements such as <lg> (line group) and <l> (verse line). These elements describe the logical and semantic organization of a poem: strophes, verse units, and their hierarchy.
ConTeXt, however, does not interpret these elements semantically. It processes XML as a tree of nodes selected by XPath-like path expressions (called lpath in ConTeXt), leaving the typographic interpretation entirely to the user.
As a consequence, verse lines and stanzas must be explicitly mapped to typographic constructs: lines must be iterated over, grouped, aligned, and rendered according to editorial intent.
From semantic description to typographic procedure
The main conceptual difficulty lies in the transition from a semantic description of poetry to a procedural typographic workflow.
TEI expresses what a verse line is. ConTeXt requires instructions describing how verse lines are composed on the page.
This involves, for example:
- explicit iteration over <l> elements,
- control of line numbering,
- synchronization of parallel verse sequences (original text, translation, commentary),
- management of page and column breaks in relation to poetic structure.
For poetry, this mapping step is unavoidable and must be designed consciously.
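The mapping step described above can be sketched in its simplest form as follows. This is a minimal illustration, not a recommended layout: every verse line becomes its own paragraph and every line group is set off by vertical space. The buffer name verse and the setup name xml:verse are chosen for this example only.

```tex
\startbuffer[verse]
<TEI>
  <text>
    <body>
      <lg type="stanza">
        <l>First light touches the closed houses,</l>
        <l>and the street breathes a pale memory.</l>
      </lg>
    </body>
  </text>
</TEI>
\stopbuffer

% discard everything, then route only the elements handled below
\startxmlsetups xml:verse
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text|body|lg|l}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:verse}

\startxmlsetups xml:TEI
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:text
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:body
  \xmlflush{#1}
\stopxmlsetups

% a line group is separated from its surroundings by blank space
\startxmlsetups xml:lg
  \blank
  \xmlflush{#1}
  \blank
\stopxmlsetups

% each verse line ends its own paragraph
\startxmlsetups xml:l
  \xmlflush{#1}\par
\stopxmlsetups

\starttext
  \xmlprocessbuffer{main}{verse}{}
\stoptext
```

TEI only says that these are two lines in one group; the decision to end each line with \par is taken in the setups and can be changed there without touching the XML.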
Historical examples and their limitations
Many existing TEI-related examples in ConTeXt documentation originate from earlier development stages and often mix different approaches without clearly identifying their scope or limitations.
For poetic material, this can be problematic:
- examples may assume fixed stanza lengths,
- alignment mechanisms may not scale beyond trivial cases,
- commentary and notes may interfere with verse layout.
A careful separation between structural logic (iteration and alignment) and typographic decisions is therefore essential.
External XML versus internal XML buffers
A common approach consists in placing the TEI source in an external XML file and processing it from a ConTeXt document. While this is suitable for production workflows, it can obscure the interaction between poetic structure and typographic output during experimentation.
Embedding the TEI source directly inside the ConTeXt document, using an internal XML buffer, provides a clearer working model. This internal buffer approach, suggested by Hans Hagen, offers a particularly clear way to keep the poetic XML structure and its typographic processing under tight control within a single source file.
For verse-based material, this makes it easier to observe how changes in stanza structure, line order, or commentary placement affect the composed page.
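The two ways of supplying the source differ only in the final call. The sketch below uses the internal buffer and shows the external variant as a comment; the file name poem.xml is hypothetical, and the setups are deliberately trivial.

```tex
% the TEI fragment lives in the same file as the setups
\startbuffer[poem]
<lg type="stanza">
  <l>First light touches the closed houses,</l>
  <l>and the street breathes a pale memory.</l>
</lg>
\stopbuffer

\startxmlsetups xml:demo
  \xmlsetsetup{#1}{lg|l}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:demo}

\startxmlsetups xml:lg
  \blank \xmlflush{#1} \blank
\stopxmlsetups

\startxmlsetups xml:l
  \xmlflush{#1}\par
\stopxmlsetups

\starttext
  % internal buffer: everything is visible in one source file
  \xmlprocessbuffer{main}{poem}{}

  % external file (hypothetical name), same setups, different call:
  % \xmlprocessfile{main}{poem.xml}{}
\stoptext
```

Because the setups are identical in both cases, an example developed with a buffer can later be moved to an external file by changing that one line.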
Alignment of parallel poetic texts
Poetic editions frequently involve parallel structures, such as an original text and a translation aligned verse by verse. In ConTeXt, such alignments are typically implemented by iterating over verse lines and synchronizing parallel sequences by position.
This approach is intentionally explicit and mechanical: verse lines are aligned because they occupy the same index within their respective structures. Although simple, this method provides stability and predictability, which are essential for scholarly poetic layouts.
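Alignment by position silently assumes that both sequences have the same number of lines. A small guard can make that assumption explicit. The following is a sketch under the assumption that \xmlcount expands to a plain number (the same property that \dorecurse relies on); the setup names xml:pair, xml:pair:stanza and xml:pair:line are invented for this example.

```tex
\startbuffer[pair]
<lg type="stanza">
  <lg type="orig">
    <l>First light touches the closed houses,</l>
    <l>and the street breathes a pale memory.</l>
  </lg>
  <lg type="trans">
    <l>La première lumière effleure les maisons closes,</l>
    <l>et la rue respire une mémoire pâle.</l>
  </lg>
</lg>
\stopbuffer

\startxmlsetups xml:pair
  \xmlsetsetup{#1}{lg[@type='stanza']}{xml:pair:stanza}
\stopxmlsetups

\xmlregistersetup{xml:pair}

\startxmlsetups xml:pair:line
  \xmlflush{#1}
\stopxmlsetups

\startxmlsetups xml:pair:stanza
  % compare the two line counts before pairing lines by index
  \ifnum\xmlcount{#1}{./lg[@type='orig']/l}=\xmlcount{#1}{./lg[@type='trans']/l}\relax
    \dorecurse {\xmlcount{#1}{./lg[@type='orig']/l}} {
      ##1.\quad
      \xmlcommand{#1}{./lg[@type='orig']/l[##1]}{xml:pair:line}
      \quad/\quad
      \xmlcommand{#1}{./lg[@type='trans']/l[##1]}{xml:pair:line}
      \par
    }
  \else
    % make the mismatch visible instead of producing a skewed pairing
    {\bf Line counts of original and translation differ.}\par
  \fi
\stopxmlsetups

\starttext
  \xmlprocessbuffer{main}{pair}{}
\stoptext
```

Whether a mismatch should stop the run, print a warning, or be tolerated is an editorial decision; the point is only that the check sits in the same place as the iteration.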
Commentary, notes, and verse-based layouts
Commentary in poetic editions is often attached to individual verse lines rather than to continuous prose. When such commentary is combined with structured layouts (tables or columns), standard footnotes can quickly become intrusive or unstable.
A more controlled strategy consists in defining dedicated note mechanisms and placing the collected notes explicitly. This allows the editor to:
- preserve the visual integrity of verse layouts,
- avoid interference between notes and line alignment,
- maintain a clear relationship between verses and their commentary.
This level of control is particularly important in poetic material, where spatial organization is part of the meaning.
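A dedicated note class with explicit placement could look roughly like this. It is a sketch that assumes the note class behaves like the predefined endnotes, i.e. that location=none collects the notes until \placenotes is called; the class name commentary is invented for this example.

```tex
% a separate note class, collected instead of placed at the page foot
\definenote[commentary]
\setupnote[commentary][location=none]

\starttext

First light touches the closed houses,%
\commentary{light: atmospheric motif in the poem.}\par
and the street breathes a pale memory.%
\commentary{memory: semantic key recurring in several stanzas.}\par

\blank[big]

% the editor decides where the collected commentary appears
\placenotes[commentary]

\stoptext
```

Since the notes are placed by an explicit command, they can follow a table or a pair of columns without interfering with the line alignment inside it.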
Typographic context and poetic scope
When annotations are triggered from within a specific poetic context—such as a column containing the original verse text—they inherit typographic properties from that context. This behavior can be used deliberately to reinforce the editorial structure of the edition.
In poetic workflows, understanding how ConTeXt propagates typographic context helps avoid unintended side effects and enables more refined layouts.
Explicit mappings as an editorial choice
There is no automatic or universal solution for processing TEI-encoded poetry in ConTeXt. Instead, ConTeXt encourages explicit mappings between poetic XML structures and typographic procedures.
Such mappings make editorial choices visible and controllable. They are particularly well suited to poetic material, where structure, alignment, and commentary are integral to the text itself.
Example TEI sources and workflows
This part of the page provides orientation for TEI → ConTeXt work. It does not describe a single mandatory workflow. Depending on the size of the source, the editorial goal (especially for verse), and the amount of preprocessing needed, there are three distinct approaches.
Overview (three approaches)
The same TEI source can be processed in different ways:
(A) ConTeXt-native XML processing: TEI XML → ConTeXt XML mechanisms (xmlsetups / xmlcommand / XPath) → PDF
(B) Lua-driven XML processing: TEI XML → Lua parsing + traversal → context() calls → PDF
(C) Hybrid workflows (advanced): TEI XML → preprocessing / transformation (Lua or external tools) → ConTeXt typesetting → PDF
In the context of this page (poetry and versification), approach (A) is generally the most transparent and reproducible, and it scales well from minimal examples to more elaborate layouts (line numbering, parallel columns, controlled note placement).
(A) ConTeXt-native XML processing (recommended for verse MWEs)
ConTeXt can process XML directly, using its built-in XML tools:
- load XML from an external file (e.g. \xmlload),
- or embed XML in a buffer for a self-contained example (e.g. \startbuffer...\stopbuffer + \xmlprocessbuffer),
- define mappings from XML elements to ConTeXt actions (xmlsetups),
- select nodes with XPath and typeset the result.
This approach is well suited for:
- TEI-encoded verse (<lg>, <l>),
- line-by-line alignment (original / translation),
- commentary attached to verse lines,
- notes placed in a controlled location (especially when tables or columns are involved).
(B) Lua-driven XML processing (optional, project-specific)
Lua can also be used to parse and traverse the TEI tree and to send extracted content to ConTeXt via context() calls.
This approach can be useful when:
- complex transformations are needed before typesetting,
- the workflow includes algorithmic selection, filtering, or restructuring,
- the project already relies on Lua-based processing.
However, it introduces additional moving parts (Lua code, parsing strategy, module availability), so it is not used in the MWEs presented here for verse workflows.
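For comparison, a Lua-driven traversal might look like the sketch below. It assumes that the functions xml.convert, xml.collected and xml.text of the ConTeXt Lua XML library, and buffers.getcontent, behave as described in the comments; these should be checked against the installed version before relying on them.

```tex
\startbuffer[poem]
<TEI>
  <text>
    <body>
      <lg type="orig">
        <l>First light touches the closed houses,</l>
        <l>and the street breathes a pale memory.</l>
      </lg>
    </body>
  </text>
</TEI>
\stopbuffer

\starttext

\startluacode
  -- Assumption: xml.convert parses a string into an XML tree
  local root = xml.convert(buffers.getcontent("poem"))

  -- Assumption: xml.collected iterates over all nodes matching the path
  for line in xml.collected(root, "/TEI/text/body/lg/l") do
    -- send the text of each verse line to TeX, one paragraph per line
    context(xml.text(line))
    context.par()
  end
\stopluacode

\stoptext
```

Here the iteration and the typesetting decisions both live in Lua, which is convenient for filtering or restructuring but less transparent than a set of xmlsetups.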
(C) Hybrid workflows (advanced)
In large or research-oriented projects, a hybrid pipeline is common:
- preprocessing or transformation of TEI (Lua or external tools),
- followed by ConTeXt-native typesetting.
Such workflows are outside the scope of this page and are best documented separately.
Example sources (TEI)
For testing, learning, or experimenting with TEI encoding, the following resources are often useful:
- TEI by Example (tutorial with progressively structured samples)
- TEI Guidelines (specification and examples)
- EpiDoc (TEI subset for inscriptions and classical texts)
- Various public repositories of TEI samples (check licences before reuse)
Note: when reusing sample TEI files, always verify the licence terms and attribution requirements.
Example TEI sources and repositories
A minimal TEI → ConTeXt example
The following example shows how to:
- store a very small TEI document in a buffer within a ConTeXt file,
- and typeset it with ConTeXt as a three-column table (line number, original, translation), with the comments attached to the original lines as footnotes.
This is not a full TEI engine, just a didactic “first step”.
\setuppapersize[A4]
\setupTABLE[frame=off]
\setupTABLE[column][1][width=.1\textwidth]
\setupTABLE[column][2][width=.450\textwidth]
\setupTABLE[column][3][width=.45\textwidth]
\startbuffer[poem]
<?xml version="1.0" encoding="UTF-8"?>
<TEI>
<text>
<body>
<lg type="stanza">
<lg type="orig">
<l>First light touches the closed houses,</l>
<l>and the street breathes a pale memory.</l>
</lg>
<lg type="trans">
<l>La première lumière effleure les maisons closes,</l>
<l>et la rue respire une mémoire pâle.</l>
</lg>
<lg type="comm">
<l>light] atmospheric motif in the poem.</l>
<l>memory] semantic key recurring in several stanzas.</l>
</lg>
</lg>
</body>
</text>
</TEI>
\stopbuffer
\startxmlsetups xml:main
\xmlsetsetup{#1}{*}{xml:*}
\stopxmlsetups
\xmlregistersetup{xml:main}
\startxmlsetups xml:TEI
\xmlcommand{#1}{text/body/lg[@type='stanza']}{xml:stanza}
\stopxmlsetups
\startxmlsetups xml:original
\language[en]
\xmltext{#1}
\stopxmlsetups
\startxmlsetups xml:translation
\language[french]
\xmltext{#1}
\stopxmlsetups
\startxmlsetups xml:comment
\footnote{\xmltext{#1}}
\stopxmlsetups
\startxmlsetups xml:stanza
\bTABLE
\bTR
\bTH line \eTH
\bTH original \eTH
\bTH translation \eTH
\eTR
\dorecurse {\xmlcount{#1}{./lg[@type='orig']/l}} {
\bTR
\bTD
##1
\eTD
\bTD
\xmlcommand{#1}{./lg[@type='orig']/l[##1]}{xml:original}
\xmlcommand{#1}{./lg[@type='comm']/l[##1]}{xml:comment}
\eTD
\bTD
\xmlcommand{#1}{./lg[@type='trans']/l[##1]}{xml:translation}
\eTD
\eTR
}
\eTABLE
\stopxmlsetups
\setupfootertexts
[{\em Document généré avec \ConTeXt}]
[]
\starttext
\xmlprocessbuffer{main}{poem}{}
\stoptext
Another MWE (Minimum Working Example) in poetry
Mapping TEI (verse) to ConTeXt: why the buffer-based approach is useful
In this example — slightly more elaborate than the previous one — we deliberately adopt a method suggested by Hans Hagen: the TEI-like XML document is stored in an internal buffer. This choice is not merely technical; it makes the logic of the workflow fully visible in a single file.
The buffered XML already contains all the semantic structure needed for the edition:
- the division into stanzas,
- the verse lines,
- the English original,
- the French translation,
- and the line-by-line comments (lemmas).
ConTeXt does not interpret TEI by itself. Its role is to map specific XML structures to typesetting actions. This mapping is made explicit through a small set of \xmlsetups, each one answering a simple question: when ConTeXt encounters this XML element, what should be done with it?
In practical terms:
- <l> elements are mapped to numbered table rows,
- <lg type="orig"> and <lg type="trans"> are mapped to distinct columns,
- <lg type="comm"> is mapped to line-bound notes,
- the surrounding <lg type="stanza"> controls the iteration over verses.
Once this mapping is defined, the actual typesetting phase is minimal. Between \starttext and \stoptext, ConTeXt can be reduced to a single call:
\xmlprocessbuffer{main}{poem}{}
At that point, ConTeXt is no longer “reading a poem”: it executes a predefined correspondence between XML semantics and typographical structures. The result is a stable, reproducible layout where editorial logic remains in the XML, typographical logic remains in ConTeXt, and the mapping between the two is explicit and inspectable.
This separation of concerns — semantic encoding in XML, layout logic in ConTeXt, and an explicit mapping between the two — is the key idea demonstrated by this example.
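Because rows are paired purely by position, the mapping quietly assumes that the orig and trans groups contain the same number of <l> elements. A minimal sketch of a guard, assuming the xml:stanza and xml:TEI setups used in the examples on this page; the setup name xml:stanza:checked is hypothetical and not part of ConTeXt:

```tex
% Hypothetical wrapper around the xml:stanza setup defined on this page.
% The table is built only when original and translation have the same
% number of lines; otherwise a visible warning is typeset instead.
\startxmlsetups xml:stanza:checked
    \doifelse
        {\xmlcount{#1}{./lg[@type='orig']/l}}
        {\xmlcount{#1}{./lg[@type='trans']/l}}
        {\xmlsetup{#1}{xml:stanza}}
        {{\bf Line counts of original and translation differ.}\par}
\stopxmlsetups

% Route the stanza through the guard instead of calling xml:stanza directly.
\startxmlsetups xml:TEI
    \xmlcommand{#1}{text/body/lg[@type='stanza']}{xml:stanza:checked}
\stopxmlsetups
```

The same test could be extended to the comm group if every verse is expected to carry a comment.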
A much more complete MWE in XML-TEI poetry
This section presents a more complete XML-TEI example based on a poetic text. Such examples can quickly become difficult to read and reuse if introduced all at once.
For this reason, the material below is approached progressively. Before examining the full poetic TEI source, we first introduce a sequence of minimal working examples that explain the underlying mechanisms step by step:
- how a TEI document is loaded in ConTeXt,
- how XML elements are routed to ConTeXt macros,
- how multiple languages are handled,
- and how these techniques scale to a more complex poetic structure.
The complete XML-TEI poetry example shown later in this section can then be read as a direct application of these principles, rather than as an isolated or opaque code sample.
A progressive TEI → ConTeXt example (Greek + French)
This section presents a step-by-step example showing how a small TEI document containing Greek text and a French translation can be processed directly by ConTeXt.
The goal is not to demonstrate the full TEI specification, but to explain how ConTeXt can read a TEI source, select its elements, and map them to typographical structures.
All examples below are minimal working examples and can be compiled independently.
The general processing flow can be summarized as follows:
TEI XML source
|
v
\xmlprocessfile
|
v
XML element routing
(\xmlsetups, \xmlflush)
|
v
ConTeXt macros
|
v
Typeset output (PDF)
Scope and limitations of this example
This example illustrates:
- loading a TEI XML file in ConTeXt,
- routing XML elements to ConTeXt macros,
- handling multiple languages (Greek and French),
- producing a simple typographical layout.
It deliberately does not cover:
- critical apparatus or textual variants,
- complex TEI customizations,
- XSLT-based preprocessing,
- advanced editorial layouts.
The focus is on clarity and reproducibility.
A minimal TEI source file
We start with a very small TEI document, containing only the elements needed for this example.
Minimal TEI file (example.xml)
<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:lang="grc">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Sample Greek Text</title>
</titleStmt>
<publicationStmt>
<p>Unpublished</p>
</publicationStmt>
<sourceDesc>
<p>Demonstration only</p>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<div>
<l xml:lang="grc">βουλὴ δὲ κακὴ νίκησεν ἑταίρων</l>
<l xml:lang="fr">Mais un mauvais conseil l’emporta sur les compagnons.</l>
</div>
</body>
</text>
</TEI>
Remarks
- Only standard TEI elements are used.
- Language information is provided via xml:lang.
- No apparatus or annotations are included.
Loading a TEI file in ConTeXt
The first step in ConTeXt is simply to load the XML file and make it accessible.
Minimal ConTeXt file (example.tex)
\starttext
\xmlprocessfile{main}{example.xml}{}
\stoptext
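At this stage, no formatting rules have been defined, so no element receives specific treatment; any text that appears is flushed without element-specific formatting. This step only confirms that the XML file can be read by ConTeXt.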
Routing XML elements to ConTeXt macros
ConTeXt does not format TEI automatically. Instead, XML elements are explicitly mapped to ConTeXt commands.
The routing mechanism used in this example can be summarized as follows:
<l xml:lang="grc">...</l> --> \xmlsetups xml:l --> Greek formatting
<l xml:lang="fr">...</l> --> \xmlsetups xml:l --> French formatting
Defining XML setups
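\startxmlsetups xml:tei
% Connect every <l> element to the xml:l setup before flushing the tree;
% without this assignment, xml:l is never called.
\xmlsetsetup{#1}{l}{xml:l}
\xmlflush{#1}
\stopxmlsetups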
\startxmlsetups xml:l
\xmlflush{#1}\par
\stopxmlsetups
Processing the TEI structure
\starttext
\xmlprocessfile{main}{example.xml}{xml:tei}
\stoptext
Key idea
ConTeXt treats TEI as structured data. Formatting decisions are made entirely by the user.
Distinguishing Greek and French text
Language information from xml:lang can be used to select different
formatting rules.
Language-aware routing
\startxmlsetups xml:l
\doifelse{\xmlatt{#1}{xml:lang}}{grc}
{\italic{\xmlflush{#1}}}
{\xmlflush{#1}}
\par
\stopxmlsetups
In this example:
- Greek lines are typeset in italics.
- French lines are typeset normally.
More advanced font or language setups can be added later if needed.
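Italics only change the appearance of the Greek line. If each line should also be hyphenated according to its own language, the same attribute test can select a ConTeXt language instead. A minimal sketch, assuming that agr is the ConTeXt tag for ancient Greek and that the body font covers polytonic Greek; both points should be checked against your installation:

```tex
\startxmlsetups xml:l
    \begingroup
    \doifelse{\xmlatt{#1}{xml:lang}}{grc}
        {\language[agr]}% assumed ConTeXt tag for ancient Greek
        {\language[fr]}%  every other line is treated as French here
    \xmlflush{#1}\par
    \endgroup
\stopxmlsetups
```

The group keeps the language switch local to the line, so the surrounding document language is not affected.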
A simple parallel presentation
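The same XML structure can serve as the basis for a parallel presentation. As a first step, each line is set as its own left-aligned block, so that the Greek line and its translation follow one another.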
Example using blocks
\startxmlsetups xml:l
\startalignment[flushleft]
\xmlflush{#1}
\stopalignment
\stopxmlsetups
This example keeps the layout intentionally simple. More complex column-based layouts are possible, but outside the scope of this demonstration.
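For a layout that actually places the two versions side by side, one possibility is a borderless two-column table built at the level of the enclosing <div>. This is only a sketch: it assumes that the first <l> of each <div> is the Greek line and the second its French translation, as in example.xml, and the setup name xml:div is an illustrative choice.

```tex
% Initial setup: visit each <div> of the body and hand it to xml:div.
\startxmlsetups xml:tei
    \xmlcommand{#1}{/TEI/text/body/div}{xml:div}
\stopxmlsetups

% One table row per <div>: Greek on the left, French on the right.
\startxmlsetups xml:div
    \bTABLE[frame=off]
        \bTR
            \bTD[width=.45\textwidth] \xmltext{#1}{./l[1]} \eTD % assumed Greek line
            \bTD[width=.45\textwidth] \xmltext{#1}{./l[2]} \eTD % assumed French line
        \eTR
    \eTABLE
\stopxmlsetups

\starttext
    \xmlprocessfile{main}{example.xml}{xml:tei}
\stoptext
```

Selecting by position keeps the sketch short; a real edition would more likely pair lines through explicit attributes.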
What this example does not do
This example intentionally avoids:
- critical apparatus handling,
- lemma numbering,
- manuscript variants,
- TEI transformations outside ConTeXt.
Such features require additional editorial structures and are better treated in dedicated examples.
Summary
This example demonstrates a minimal and transparent TEI → ConTeXt workflow:
- TEI provides structured textual data.
- ConTeXt selects and maps XML elements.
- Typography is controlled entirely at the ConTeXt level.
This approach scales from simple demonstrations to more advanced editorial projects.
Final note
This example is meant to be read, modified, and extended. Its purpose is not to prescribe a workflow, but to clarify how ConTeXt interacts with TEI at a basic level.
Now that the procedure has been clarified, here is a more substantial example: an English poem with its French translation.
This MWE is larger and more sophisticated than the previous one, but its structure is the same. Some remarks on \xmltext and \xmlflush follow the code below.
\setuppapersize[A4]
\setupbodyfont[EBGaramond,10pt] % EB Garamond at 10pt (requires the font to be installed)
\setupTABLE[frame=off]
\setupTABLE[column][1][width=.1\textwidth]
\setupTABLE[column][2][width=.450\textwidth]
\setupTABLE[column][3][width=.45\textwidth]
% ----------------------------------------------------------
% Footer
% ----------------------------------------------------------
\setupfootertexts
[]
[{\tfxx Document composé avec \ConTeXt}]
% ----------------------------------------------------------
% "Placeable" notes (robust with tables)
% ----------------------------------------------------------
\definenote[commnote]
% location=text keeps the notes until \placenotes[commnote] is called
\setupnote[commnote][
location=text,
rule=off,
paragraph=yes,
inbetween=\quad,
style=\tfxx,
]
% ----------------------------------------------------------
% XML in a buffer (Hans's method): 16 verses + commentary
% ----------------------------------------------------------
\startbuffer[poem]
<?xml version="1.0" encoding="UTF-8"?>
<TEI>
<text>
<body>
<lg type="stanza">
<lg type="orig">
<l>First light touches the closed houses,</l>
<l>and the street breathes a pale memory.</l>
<l>I hear the fountain count its coins,</l>
<l>while sparrows edit yesterday’s crumbs.</l>
<l>The baker lifts a shutter like a veil,</l>
<l>and flour rises, a brief white weather.</l>
<l>At the corner, a newspaper opens,</l>
<l>its headlines folded like tired wings.</l>
<l>I write your name, then cross it out,</l>
<l>the paper keeps the pressure, not the meaning.</l>
<l>Between two words a well appears,</l>
<l>and echoes practice someone else’s voice.</l>
<l>A tram arrives; its doors hesitate,</l>
<l>as if choosing which century to enter.</l>
<l>In my pocket, a key warms slowly,</l>
<l>metal remembering the shape of home.</l>
</lg>
<lg type="trans">
<l>La première lumière effleure les maisons closes,</l>
<l>et la rue respire une mémoire pâle.</l>
<l>J’entends la fontaine compter ses pièces,</l>
<l>tandis que les moineaux corrigent les miettes d’hier.</l>
<l>Le boulanger relève un volet comme un voile,</l>
<l>et la farine monte, brève météo blanche.</l>
<l>Au coin, un journal s’ouvre,</l>
<l>ses gros titres pliés comme des ailes lassées.</l>
<l>J’écris ton nom, puis je le rature,</l>
<l>le papier garde la pression, pas le sens.</l>
<l>Entre deux mots, un puits apparaît,</l>
<l>et les échos répètent la voix d’un autre.</l>
<l>Un tram arrive ; ses portes hésitent,</l>
<l>comme si elles choisissaient le siècle où entrer.</l>
<l>Dans ma poche, une clé se réchauffe lentement,</l>
<l>le métal se souvient de la forme du foyer.</l>
</lg>
<lg type="comm">
<l>light] atmospheric motif; establishes the scene.</l>
<l>memory] semantic key recurring across the poem.</l>
<l>fountain] auditory image; introduces counting/metre.</l>
<l>edit] meta-textual verb; aligns with “variants” later.</l>
<l>shutter/veil] unveiling metaphor; morning as revelation.</l>
<l>white weather] synesthetic image; flour as climate.</l>
<l>newspaper] public voice entering the private walk.</l>
<l>headlines] folded wings; fatigue of news cycles.</l>
<l>name] inscription + erasure; motif of revision.</l>
<l>pressure/meaning] material trace vs semantic loss.</l>
<l>well] gap between words; abyss of interpretation.</l>
<l>someone else] polyphony; speaker displaced.</l>
<l>tram] modern intrusion; tests punctuation in table.</l>
<l>century] controlled anachronism; time as choice.</l>
<l>key] pocket object; memory by touch/heat.</l>
<l>home] “shape” as form; stable equivalence in translation.</l>
</lg>
</lg>
</body>
</text>
</TEI>
\stopbuffer
% ----------------------------------------------------------
% XML setups (Hans's structure)
% ----------------------------------------------------------
\startxmlsetups xml:main
\xmlsetsetup{#1}{*}{xml:*}
\stopxmlsetups
\xmlregistersetup{xml:main}
\startxmlsetups xml:TEI
\xmlcommand{#1}{text/body/lg[@type='stanza']}{xml:stanza}
\stopxmlsetups
\startxmlsetups xml:original
\language[english]
\xmltext{#1}
\stopxmlsetups
\startxmlsetups xml:translation
\language[french]
\xmltext{#1}
\stopxmlsetups
% comment -> "placeable" note
\startxmlsetups xml:comment
\commnote{\xmltext{#1}}
\stopxmlsetups
\startxmlsetups xml:stanza
\bTABLE
\bTR
\bTH line \eTH
\bTH original \eTH
\bTH translation \eTH
\eTR
\dorecurse{\xmlcount{#1}{./lg[@type='orig']/l}}{%
\bTR
\bTD ##1 \eTD
\bTD
\xmlcommand{#1}{./lg[@type='orig']/l[##1]}{xml:original}
\xmlcommand{#1}{./lg[@type='comm']/l[##1]}{xml:comment}
\eTD
\bTD
\xmlcommand{#1}{./lg[@type='trans']/l[##1]}{xml:translation}
\eTD
\eTR
}
\eTABLE
\stopxmlsetups
\starttext
\centerline{\bfb Poème TEI-XML : original, traduction, notes “plaçables”}
\blank[big]
{\em
Objectif : montrer une méthode robuste (inspirée de Hans Hagen) pour afficher un poème encodé en XML-TEI
en trois colonnes (n° du vers, original, traduction) tout en attachant un commentaire ligne à ligne sous forme de notes,
placées à un endroit choisi (ici : après le tableau), afin d’éviter les collisions “tableau + footnotes”. On remarquera les commentaires en anglais placés dans la cellule concernant le texte original.}\par
Purpose: to demonstrate a robust method (inspired by Hans Hagen) for displaying a poem encoded in XML-TEI in three columns (verse number, original, translation) while attaching line-by-line commentary in the form of notes, placed in a chosen location (here: after the table), in order to avoid collisions between the table and footnotes. Note the comments in English placed in the cell concerning the original text.
\blank[medium]
\centerline{\sc — Procédure (résumé technique) —}
\starttyping
1) XML intégré dans un buffer (\startbuffer...\stopbuffer)
2) Boucle sur le nombre de vers : \dorecurse{\xmlcount{#1}, etc.
3) Extraction des lignes via XPath : orig/l[i], trans/l[i], comm/l[i]
4) Commentaires collectés avec \commnote{\xmltext{#1}}, puis rendus par \placenotes[commnote]
\stoptyping
\blank[big]
\xmlprocessbuffer{main}{poem}{}
\blank[big]
% Notes in two columns to limit overflow
\startcolumns[n=2,balance=yes]
\placenotes[commnote]
\stopcolumns
\stoptext
About \xmltext vs \xmlflush when processing TEI content
In the examples above, the command \xmltext is used to extract and typeset the textual content of TEI elements. This approach is perfectly adequate for many minimal or controlled MWEs, especially when the XML elements only contain plain text.
However, it is important to understand a limitation of \xmltext: it only returns the textual content of the current node, and therefore silently discards any nested XML elements that may occur further down the tree.
For example, consider the following TEI line:
<l>First <emph>light</emph> touches the closed houses,</l>
If this element is processed using:
\xmltext{#1}
the output will be:
First light touches the closed houses,
The word light will be preserved as text, but the <emph> element itself is lost, and no specific formatting can be applied to it.
In contrast, using \xmlflush preserves the full XML structure of the node and allows ConTeXt to process nested elements via dedicated setups. For instance:
\startxmlsetups xml:original
\language[english]
\xmlflush{#1}
\stopxmlsetups
\startxmlsetups xml:emph
{\em \xmlflush{#1}}
\stopxmlsetups
With this approach, ConTeXt can correctly detect and format the <emph> element, producing emphasized output in the final document.
This distinction becomes crucial in more advanced TEI workflows, especially when dealing with:
- inline markup such as <emph>, <hi>, <foreign>,
- named entities (<persName>, <placeName>),
- critical apparatus elements (<app>, <lem>, <rdg>),
or any situation where semantic XML markup must be preserved during typesetting.
As Denis Maier has pointed out, using \xmltext in such cases may result in an unintended loss of structural information, whereas \xmlflush provides a safer and more extensible mechanism for mapping TEI markup to ConTeXt commands.
In short:
- \xmltext is simple and convenient for flat content,
- \xmlflush is preferable whenever nested XML elements need to be processed or preserved.
For pedagogical MWEs, both approaches are useful, but it is important to be aware of their respective implications when designing scalable TEI → ConTeXt workflows.
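To try the \xmlflush variant in isolation, the fragments above can be assembled into a small self-contained file. This is a sketch rather than a reference implementation: the buffer name line and the setup name xml:line:setups are arbitrary choices made for this illustration.

```tex
\startbuffer[line]
<l>First <emph>light</emph> touches the closed houses,</l>
\stopbuffer

% Connect the two element names to their setups.
\startxmlsetups xml:line:setups
    \xmlsetsetup{#1}{l}{xml:l}
    \xmlsetsetup{#1}{emph}{xml:emph}
\stopxmlsetups

\xmlregistersetup{xml:line:setups}

% \xmlflush passes nested elements on to their own setups.
\startxmlsetups xml:l
    \xmlflush{#1}\par
\stopxmlsetups

\startxmlsetups xml:emph
    {\em \xmlflush{#1}}
\stopxmlsetups

\starttext
    \xmlprocessbuffer{main}{line}{}
\stoptext
```

With these setups, the word marked by <emph> should come out emphasized while the rest of the line stays in the normal style.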
Overview: how the workflow works
Before looking at concrete examples, it is useful to clarify the mental model behind XML processing in ConTeXt.
ConTeXt does not treat XML as an external format that must be converted beforehand. Instead, XML is read, traversed, and interpreted inside the ConTeXt engine, using Lua as an internal extension language.
The workflow can therefore be summarized as follows:
TEI XML document
↓
ConTeXt XML interface
(\xmlload, \xmlsetups, \xmlflush)
↓
LuaMetaTeX engine
↓
Typeset PDF
In this model:
- XML provides structure and semantics;
- Lua is used internally by ConTeXt;
- ConTeXt controls all typographic decisions.
No external preprocessing step is required.
For an overview of the XML subsystem, see:
- XML processing
- XML setups
- Hans Hagen, ConTeXt MkIV / LMTX Manual , XML chapters.
Expected TEI structure
ConTeXt does not impose a specific TEI schema. However, successful processing relies on a clear and stable structural convention.
For the examples below, we assume a minimal and readable structure:
TEI
└─ text
└─ body
├─ div type="work"
│ ├─ head
│ └─ lg
│ ├─ l (with optional notes)
│ └─ l
└─ div type="translation"
├─ head
└─ p
ConTeXt will then extract and process:
- <head> elements as section titles,
- <l> elements as individual lines or verses,
- <note> elements as footnotes or critical annotations,
- <p> elements as prose paragraphs.
The important point is not the exact TEI vocabulary, but the logical consistency of the structure.
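The correspondence listed above can be written down as a handful of setups. The following is a minimal sketch under stated assumptions: the setup names are arbitrary, \subject is just one possible choice for <head>, and a <note> is assumed to sit inside the <l> it comments on.

```tex
% Assign a setup to each element name used in the assumed structure.
\startxmlsetups xml:tei:setups
    \xmlsetsetup{#1}{head}{xml:head}
    \xmlsetsetup{#1}{l}{xml:l}
    \xmlsetsetup{#1}{note}{xml:note}
    \xmlsetsetup{#1}{p}{xml:p}
\stopxmlsetups

\xmlregistersetup{xml:tei:setups}

\startxmlsetups xml:head
    \subject{\xmlflush{#1}}      % <head> becomes an unnumbered title
\stopxmlsetups

\startxmlsetups xml:l
    \xmlflush{#1}\par            % one verse per line; nested notes are flushed too
\stopxmlsetups

\startxmlsetups xml:note
    \footnote{\xmlflush{#1}}     % <note> becomes a footnote at its position
\stopxmlsetups

\startxmlsetups xml:p
    \xmlflush{#1}\par            % prose paragraph
\stopxmlsetups
```

If the TEI vocabulary changes, only the assignments in xml:tei:setups need to be adapted.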
Why ConTeXt does not require an external XML parser
Some XML workflows rely on external Lua libraries (such as lxp.lom) to parse XML files manually. While this approach can be useful in generic Lua environments, it is not the intended method in ConTeXt.
ConTeXt already embeds Lua at engine level and provides:
- a native XML loader (\xmlload, \xmlprocessfile),
- XPath-like selection,
- setup-based routing of XML nodes,
- controlled flushing of content into the typesetting stream.
Using an external parser duplicates functionality and bypasses ConTeXt’s own XML model. For this reason, ConTeXt documentation and examples do not rely on lxp.lom.
The preferred approach is therefore ConTeXt-native XML processing.
Reference:
- Hans Hagen, Lua in ConTeXt
- LuaMetaTeX
- XML processing
A ConTeXt-native XML example (recommended)
The following example illustrates the recommended method: XML is loaded once, and its elements are mapped to ConTeXt structures through setup definitions.
XML file (simplified TEI)
<?xml version="1.0" encoding="UTF-8"?>
<TEI>
<text>
<body>
<div type="edition" xml:lang="la">
<head>Exemplum Ciceronis</head>
<p>
<persName>Marcus Tullius Cicero</persName>
in
<placeName>Arpino</placeName>
natus est.
<note>Simple note.</note>
</p>
</div>
<div type="translation" xml:lang="fr">
<head>Traduction française</head>
<p>Cicéron naquit à Arpinum.</p>
</div>
</body>
</text>
</TEI>
ConTeXt file
\setuppapersize[A5]
\setupbodyfont[latin-modern]
\xmlload{cicero}{cicero-sample-tei.xml}{}
\starttext
\chapter{\xmltext{cicero}{/TEI/text/body/div[@type='edition']/head}}
{\sc \xmltext{cicero}{/TEI/text/body/div[@type='edition']/p/persName}}
in
{\em \xmltext{cicero}{/TEI/text/body/div[@type='edition']/p/placeName}}
natus est.\footnote{%
\xmltext{cicero}{/TEI/text/body/div[@type='edition']/p/note}%
}
\blank[big]
\subject{\xmltext{cicero}{/TEI/text/body/div[@type='translation']/head}}
\xmltext{cicero}{/TEI/text/body/div[@type='translation']/p}
\stoptext
What this example demonstrates
This example shows how ConTeXt handles XML in a controlled and typographically sound way:
- XML encodes semantic information (persons, places, notes),
- ConTeXt decides how these elements are rendered,
- the XML file remains reusable and readable,
- no external Lua parsing is required.
This approach scales naturally to:
- poetry (<lg>, <l>),
- critical apparatus (<app>, <lem>, <rdg>),
- multilingual editions,
- scholarly annotations.
See also:
- Critical apparatus
- Poetry typesetting
- Hans Hagen, ConTeXt XML Explained
When Lua code becomes necessary
In advanced cases, Lua code may still be useful, for example to:
- preprocess large XML datasets,
- generate indices,
- compute derived structures.
In such situations, Lua is still used inside ConTeXt, and should interact with the XML interface already provided by the system.
External XML parsers such as lxp.lom belong to a different class of solutions and should be considered experimental or exploratory, not part of the standard ConTeXt workflow.
Summary
- ConTeXt integrates Lua natively and does not require external XML parsers.
- XML processing is declarative and setup-driven.
- Typographic responsibility always remains on the ConTeXt side.
- This makes ConTeXt particularly well suited for scholarly and critical editions.
References:
- Hans Hagen, ConTeXt MkIV / LMTX Manual
- XML processing
- LuaMetaTeX
- TEI XML
1. TEI file structure: what does Lua expect?
Lua will assume the TEI file has this structural pattern:
TEI
└── text
└── body
├── div type="work" (Greek text)
│ ├── head
│ └── lg
│ ├── l (verse 1) + note
│ └── l (verse 2)
└── div type="translation"
├── head
└── p
Everything inside these elements is extracted:
- <head> ⇒ chapter title
- <l n="…"> ⇒ numbered verse
- <note> ⇒ converted to ConTeXt footnote
- <p> ⇒ printed in French
A ConTeXt-native XML example (recommended)
The examples above introduced the general principles of XML processing in ConTeXt. This section provides a compact and fully working example, intentionally small, that demonstrates the recommended approach:
- XML is loaded by ConTeXt,
- relevant nodes are selected using ConTeXt’s XML interface,
- ConTeXt decides how each element is typeset.
No external XML parser and no standalone Lua XML code is required.
1. Create the TEI file
Create a file named: cicero-sample-tei.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Simplified TEI, no namespace for clarity -->
<TEI>
<text>
<body>
<div type="edition" xml:lang="la">
<head>Exemplum Ciceronis</head>
<p>
<persName>Marcus Tullius Cicero</persName>
in
<placeName>Arpino</placeName>
natus est.
<note>Simple note.</note>
</p>
</div>
<div type="translation" xml:lang="fr">
<head>Traduction française</head>
<p>Cicéron naquit à Arpinum.</p>
</div>
</body>
</text>
</TEI>
2. Directory layout
Place both files in the same directory:
/my-tei-demo/
    cicero-sample-tei.xml
    cicero-tei-demo.tex
(If you use subfolders, adjust the path given to \xmlload accordingly.)
3. Create the ConTeXt file
Create a file named: cicero-tei-demo.tex
\setuppapersize[A5]
\setupbodyfont[latin-modern]
% Load the TEI file:
\xmlload{cicero}{cicero-sample-tei.xml}{}
\starttext
% 1. Title of the Latin edition:
\chapter{\xmltext{cicero}{/TEI/text/body/div[@type='edition']/head}}
% 2. Latin paragraph with a simple semantic mapping:
% <persName> -> small caps
% <placeName> -> italics
% <note> -> footnote
{\sc \xmltext{cicero}{/TEI/text/body/div[@type='edition']/p/persName}}
in
{\em \xmltext{cicero}{/TEI/text/body/div[@type='edition']/p/placeName}}
natus est.\footnote{%
\xmltext{cicero}{/TEI/text/body/div[@type='edition']/p/note}%
}
\blank[big]
% 3. Title of the translation:
\subject{\xmltext{cicero}{/TEI/text/body/div[@type='translation']/head}}
% 4. Translation paragraph:
\xmltext{cicero}{/TEI/text/body/div[@type='translation']/p}
\stoptext
4. Run the example
From your terminal:
context cicero-tei-demo.tex
The resulting PDF will contain:
- a chapter title (taken from <head>),
- a short Latin text where <persName> and <placeName> are typeset differently,
- a footnote extracted from <note>,
- a French translation section.
5. What this demonstrates
This compact example illustrates the ConTeXt-native XML workflow:
- TEI stores structure and semantics (names, places, notes, translation).
- ConTeXt loads the XML file and selects nodes using its XML interface.
- ConTeXt controls typesetting (fonts, spacing, notes, headings).
This approach scales to larger TEI projects, including:
- verse structures (<lg>, <l>),
- richer inline markup (<persName>, <placeName>, etc.),
- more elaborate routing via \xmlsetups and \xmlflush,
- scholarly additions such as critical apparatus (<app>, <lem>, <rdg>).
6. Next step (recommended)
For maintainability, avoid hardcoding each element selection in the main document. Instead, define XML setups to route elements to ConTeXt macros in a declarative way (see the earlier progressive examples on this page).
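As an illustration of that next step, the Cicero file could be processed through setups instead of individual \xmltext calls. This is a sketch under stated assumptions: the setup names are chosen for this example, and both <head> elements are mapped to \subject for simplicity, whereas the version above uses \chapter for the edition.

```tex
\setuppapersize[A5]

% Assign one setup per element name found in cicero-sample-tei.xml.
\startxmlsetups xml:cicero:setups
    \xmlsetsetup{#1}{head}{xml:head}
    \xmlsetsetup{#1}{p}{xml:p}
    \xmlsetsetup{#1}{persName}{xml:persName}
    \xmlsetsetup{#1}{placeName}{xml:placeName}
    \xmlsetsetup{#1}{note}{xml:note}
\stopxmlsetups

\xmlregistersetup{xml:cicero:setups}

\startxmlsetups xml:head
    \subject{\xmlflush{#1}}
\stopxmlsetups

\startxmlsetups xml:p
    \xmlflush{#1}\par
\stopxmlsetups

\startxmlsetups xml:persName
    {\sc \xmlflush{#1}}          % person names in small caps
\stopxmlsetups

\startxmlsetups xml:placeName
    {\em \xmlflush{#1}}          % place names emphasized
\stopxmlsetups

\startxmlsetups xml:note
    \footnote{\xmlflush{#1}}
\stopxmlsetups

\starttext
    \xmlprocessfile{cicero}{cicero-sample-tei.xml}{}
\stoptext
```

The document body shrinks to a single call, and a second <p> or an additional <persName> in the XML file no longer requires any change on the ConTeXt side.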
8. What this example demonstrates
- TEI stores structure + semantics (verses, notes, translation).
- Lua reads TEI as a tree.
- ConTeXt handles:
  - chapter heading,
  - line-by-line typesetting,
  - footnotes,
  - bilingual layout.
- The pipeline is modular:
TEI → Lua → ConTeXt → PDF
- A real project would add:
  - namespaces,
  - multiple <lg> groups,
  - persons/place indexing (<persName>, <placeName>),
  - critical apparatus (<app>, <lem>, <rdg>),
  - parallel typesetting for facing-page editions.
For testing, learning, or experimenting with TEI → ConTeXt processing, the following sample files and repositories are helpful:
- TEI Guidelines examples – Minimal and pedagogical TEI documents published by the TEI Consortium.
- eeditiones / tei-examples (GitHub) – A curated collection of TEI examples: poetry, prose, critical editions, scholarly annotation.
- sample-tei-xml-files (GitHub) – Simple TEI files suitable for testing Lua parsers and ConTeXt workflows.
- TEI by Example – A structured tutorial introducing TEI step by step, with exercises and downloadable files.
- BVH TEI Manual (Université de Tours) – Comprehensive French documentation on TEI encoding used in scholarly early-modern editions.
- EpiDoc – TEI subset for epigraphy, inscriptions, papyri and classical texts.
Note — Many repositories are published under open licences (MIT, public domain). Please check licence conditions before reuse.
Further reading
The following resources provide authoritative and in-depth documentation on XML processing, Lua integration, and scholarly workflows in ConTeXt.
- Hans Hagen, ConTeXt MkIV / LMTX Manual
https://www.pragma-ade.nl/general/manuals/
- Hans Hagen, XML in ConTeXt
https://www.pragma-ade.nl/general/manuals/xsteps-s.pdf
- ConTeXt Garden — XML processing
https://wiki.contextgarden.net/XML_processing
- ConTeXt Garden — TEI XML
https://wiki.contextgarden.net/TEI_xml
- ConTeXt Garden — LuaMetaTeX
https://wiki.contextgarden.net/LuaMetaTeX
These documents describe the native ConTeXt approach to XML processing, based on setup definitions and controlled flushing, without external parsers.
See also
A more pedagogical introduction to TEI + ConTeXt (in French) is available in the ConTeXt Wikibook:
→ Travailler avec XML-TEI dans ConTeXt (in English: Working with XML-TEI in ConTeXt)
Adeimantos 11:38, 29 November 2025 (UTC)