St John’s TEI Project

Over the past 465 years, St John’s College Oxford has acquired and preserved a large variety of special collections, aiding research and interest in the manuscripts, early printed books, and personal libraries the College holds. The latest development in St John’s management of these special collections is a project to digitise the manuscripts.

Manuscript digitisation is a twofold project: capturing images of each part of the manuscript on the one hand, and converting the metadata from the printed catalogue to a machine-readable format for online publication on the other. Because of the Covid-19 pandemic, the image digitisation has been temporarily put on hold, while the catalogue conversion is going full speed ahead. Thanks to funding from the Thompson Family Charitable Trust, our most recent catalogue of the Western Medieval Manuscripts at St John’s College is being converted to TEI, to be incorporated into the Medieval Manuscripts in Oxford Libraries online database.

St John’s is not the first Oxford college to convert its manuscript catalogues to TEI: several others have already converted their metadata, which is available on the online database. The bulk of the manuscript information on the database, however, comes from manuscripts in the Bodleian’s collections. You can narrow your search to individual manuscript collections on the homepage of the database, or via the collection section of the side bar once your search results have loaded.

History of Manuscript Cataloguing at St John’s College

The first attempt at cataloguing the western medieval manuscripts at St John’s was undertaken by Edward Bernard in the seventeenth century. At this time, there were just under 200 manuscripts held by the College, and Bernard’s documentation of these manuscripts corresponded to their organisation in the library at the time. Many of the shelfmarks documented by Bernard are still visible in the manuscripts themselves, despite their later reordering by College President William Derham (president 1748-57). Derham reorganised the manuscripts to account for the growth of the collection over the years, and his system of numbering the manuscripts in relation to their size remains in place. Each manuscript number is usually found in black ink on the fore-edge of the manuscript, a relic from the days of spine-inward shelving. This numbering has remained in use, and was followed in the catalogues by Henry O. Coxe in 1852 and, most recently, Ralph Hanna in 2002.

Image of printed catalogue

The latest development in this history is the TEI conversion of Ralph Hanna’s catalogue. Hanna’s catalogue documents all of the western medieval manuscripts at St John’s College, but for many years it has been available only in print. Each manuscript entry has since been converted to a Word document, and researchers have been able to request individual entries by email. The TEI conversion will open up Hanna’s catalogue entirely, making each entry accessible to anyone who searches for it on the online database.

What is TEI and how is Hanna’s printed catalogue being converted? 

TEI (Text Encoding Initiative) is an XML standard, a markup language for encoding text in a format that is both human- and machine-readable. When a printed catalogue is converted to TEI, its metadata is encoded, or marked up, so that each catalogue entry can be incorporated into a searchable online database (here, Medieval Manuscripts in Oxford Libraries).

For ease of use, the conversion works primarily from the Word document version of each entry, but the first step is always to check the Word document against the printed catalogue. Particular attention is paid to any numbers (for instance, the dimensions of the manuscript), to ensure all the converted metadata is accurate. Occasionally, the manuscript itself is consulted to check the printed catalogue. One instance of this was SJC’s MS 117. In the printed catalogue, MS 117 is described simply as Terence’s Comedies. The contents section is broken into items, with the detail suggesting that each item is an individual play, and yet the printed catalogue does not give a title for each item. This is all the more unusual when compared to MS 87, a similar manuscript of Terence’s Comedies, whose entry does give individual titles for each item. A consultation of MS 117 showed that each item did indeed have a title, which could then be added to the TEI record.

MS 117: clear rubrication presents the title as Eunuchus, despite it appearing title-less in the printed catalogue.

TEI has a hierarchical, nested structure which must be followed in the conversion. Each entry in the printed catalogue is structured differently, so the sections of the entry must first be mapped onto their TEI equivalents before the metadata itself is marked up. Within the overarching manuscript description element of TEI, one of the first sections is the contents element, which corresponds directly to the contents section of the printed catalogue. However, where the printed catalogue presents the information in paragraphs, leaving the reader to navigate the conventions set out in Hanna’s introduction, TEI breaks each item into locus, rubrication, incipit, explicit, author, title, and notes. Understanding the record thus becomes much easier for those unfamiliar with the conventions used in the printed catalogue. The next large element of TEI is the physical description, which draws on almost every other part of the printed catalogue entry. Beginning with the secundo folio (which is included as part of the header in the printed catalogue), the physical description in TEI also includes the support and layout, as well as details of collation, decoration, script, binding, and so on.
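
To make the nesting concrete, here is a minimal sketch in Python that reads a heavily abridged manuscript description and lists the fields of each contents item. The element names (msDesc, msContents, msItem, locus, rubric and so on) are the standard TEI manuscript description elements as we understand them; every value is a made-up placeholder rather than data from Hanna’s catalogue, and the function name describe_items is purely illustrative. It is not the project’s actual encoding or workflow.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily abridged record: all values are placeholders.
SAMPLE = """\
<msDesc xmlns="http://www.tei-c.org/ns/1.0">
  <msContents>
    <msItem n="1">
      <locus from="1r" to="20v">fols. 1r-20v</locus>
      <author>Example Author</author>
      <title>Example Title</title>
      <rubric>Example rubric</rubric>
      <incipit>Example opening words</incipit>
      <explicit>Example closing words</explicit>
      <note>Example note</note>
    </msItem>
  </msContents>
  <physDesc>
    <objectDesc>
      <supportDesc>
        <support>Example support</support>
      </supportDesc>
      <layoutDesc>
        <layout>Example layout</layout>
      </layoutDesc>
    </objectDesc>
  </physDesc>
</msDesc>
"""


def describe_items(xml_text):
    """Return one dict per contents item, mapping element name to its text."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.findall("tei:msContents/tei:msItem", NS):
        fields = {}
        for child in item:
            # ElementTree reports tags as "{namespace}name"; keep the name only.
            name = child.tag.split("}", 1)[-1]
            fields[name] = (child.text or "").strip()
        items.append(fields)
    return items


if __name__ == "__main__":
    for fields in describe_items(SAMPLE):
        for name, text in fields.items():
            print(f"{name}: {text}")
```

Because each piece of information sits in its own labelled element, a reader (or a search index) no longer needs to know the printed catalogue’s conventions to tell an incipit from a rubric.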

A screenshot of the Oxygen XML software used to encode the TEI [for MS 34].

Another important element in TEI is the history of the manuscript – referring to the date and place of origin (included in the header of the printed catalogue) and the provenance, which has its own section at the end of the printed catalogue entry.
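
As a rough illustration of the history element, the sketch below reads a date range, place of origin and provenance note from a small TEI fragment. The element and attribute names (history, origin, origDate with notBefore and notAfter, origPlace, provenance) are standard TEI as we understand it; the values are placeholders and the function name summarise_history is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical fragment: the dates, place and provenance are placeholders.
SAMPLE_HISTORY = """\
<history xmlns="http://www.tei-c.org/ns/1.0">
  <origin>
    <origDate notBefore="1400" notAfter="1500">s. xv</origDate>
    <origPlace>Example place</origPlace>
  </origin>
  <provenance>Example provenance note</provenance>
</history>
"""


def summarise_history(xml_text):
    """Collect origin date range, origin place and provenance notes."""
    root = ET.fromstring(xml_text)
    date = root.find("tei:origin/tei:origDate", NS)
    place = root.find("tei:origin/tei:origPlace", NS)
    return {
        # Machine-readable range, if the record supplies one.
        "date_range": (date.get("notBefore"), date.get("notAfter"))
        if date is not None else None,
        # Human-readable date as it would be displayed.
        "date_text": (date.text or "").strip() if date is not None else None,
        "place": (place.text or "").strip() if place is not None else None,
        "provenance": [
            (p.text or "").strip() for p in root.findall("tei:provenance", NS)
        ],
    }


if __name__ == "__main__":
    print(summarise_history(SAMPLE_HISTORY))
```

The printed header’s date and place and the closing provenance section end up side by side under one parent element, even though they sit at opposite ends of the printed entry.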

The goal of TEI conversion is thus to create an online record which translates the original printed catalogue as faithfully as possible, whilst also ensuring it meets the TEI standards and merges into the structure required. 

As well as opening up access to the manuscript catalogue to all, a significant advantage of the TEI conversion is the linked data. People, works, and places are all marked up with a key attribute: a number corresponding to their authority record, which links every occurrence of the same place, person, or work across all the manuscripts on the online database. This enhances research on Oxford’s medieval manuscripts, as we can now, for instance, search for the name of a manuscript donor and find all the Oxford manuscripts available on the online database that were donated by this person.
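
The linking idea can be sketched in a few lines of Python: collect every element that carries a key attribute and group the manuscripts by that key. The key attribute itself is standard TEI, but the key values, shelfmarks, names and the function name index_by_key below are all invented for illustration, and this is not how the online database is actually built.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical records: shelfmarks, names and key values are placeholders.
RECORDS = {
    "MS A": (
        '<msDesc xmlns="http://www.tei-c.org/ns/1.0"><history><acquisition>'
        'Given by <persName key="person_0001">Example Donor</persName>'
        "</acquisition></history></msDesc>"
    ),
    "MS B": (
        '<msDesc xmlns="http://www.tei-c.org/ns/1.0"><history><acquisition>'
        'Gift of <persName key="person_0001">E. Donor</persName>'
        "</acquisition></history></msDesc>"
    ),
    "MS C": (
        '<msDesc xmlns="http://www.tei-c.org/ns/1.0"><history><acquisition>'
        'Given by <persName key="person_0002">Another Donor</persName>'
        "</acquisition></history></msDesc>"
    ),
}


def index_by_key(records):
    """Map each authority key to the sorted shelfmarks that mention it."""
    index = defaultdict(set)
    for shelfmark, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for element in root.iter():
            key = element.get("key")
            if key:
                index[key].add(shelfmark)
    return {key: sorted(marks) for key, marks in index.items()}


if __name__ == "__main__":
    for key, shelfmarks in index_by_key(RECORDS).items():
        print(key, shelfmarks)
```

Note that the two records for person_0001 spell the donor’s name differently; the shared key, not the spelling, is what ties them together.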

Final Result

The end result of the St John’s TEI project will be Hanna’s full catalogue of the college’s western medieval manuscripts available online for all to view and search. The entries are currently being uploaded in monthly batches, and can be found at this link: https://tinyurl.com/yxatasqh.

Later on in the wider digitisation project, images will be taken of all the manuscripts and made available online, allowing researchers from across the globe to access both the images of our amazing manuscripts and their metadata.
