Beta maṣāḥǝft TEI-XML Data
DOI: 10.25592/DANO-01-001
Authors
Liuzzo, Pietro Maria; Reule, Dorothea; Solomon Gebreyes Beyene
Hiob Ludolf Centre for Ethiopian Studies
Alsterterrasse 1, 20354 Hamburg
Abstract
The project Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea collectively encodes TEI XML data in an open and collaborative research environment. The data described here is the TEI XML as edited by the project contributors. This paper describes how the data is managed, its current status, and the ways to access it in additional formats.
Keywords
Ethiopic, TEI, Manuscripts
Basic Information
Field of Research | Ethiopian Studies |
Publication date | May 26, 2019 |
Location of Study area | Ethiopia, Eritrea, Horn of Africa, Middle East |
Data Type | XML |
Link to Dataset | 10.25592/uhhfdm.132 |
Related Article | |
Licence/Copyright | Creative Commons Attribution-ShareAlike 4.0 |
Is Data Open or Closed | Open |
Brief Description
The project Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea (Schriftkultur des christlichen Äthiopiens und Eritreas: eine multimediale Forschungsumgebung) is a long-term project funded within the framework of the Academies' Programme (coordinated by the Union of the German Academies of Sciences and Humanities) under the supervision of the Akademie der Wissenschaften in Hamburg. Funding is provided for 25 years, from 2016 to 2040. The project is hosted by the Hiob Ludolf Centre for Ethiopian and Eritrean Studies at the Universität Hamburg. It aims to create a virtual research environment that manages complex data related to the predominantly Christian manuscript tradition of the Ethiopian and Eritrean Highlands.
Each version of the dataset discussed here is a snapshot, closed and stored at an arbitrary moment in time. It can be used freely, on the understanding that it is one version of a dataset from an ongoing project which may never be completed. Completed records are marked as such, but are still subject to change if new information becomes available. The dataset will be updated regularly, at least once a year, until the project ends. Anyone is free to use any stored version or the latest one. The full history of each file is preserved in the corresponding GitHub Organization https://github.com/BetaMasaheft.
This dataset, together with its alternative representations detailed below, is the first entirely open and collaboratively curated dataset about the Ethiopian and Eritrean manuscript tradition. It represents an ongoing effort to provide scholars with freely accessible, highly curated and vetted information about this manuscript tradition. International standards are used to make interchange and interoperability as easy as possible.
The data is curated by the project team and all its collaborators as XML (eXtensible Markup Language). All other formats available via the web application (https://betamasaheft.eu) are derived from this XML by scripts which are also freely available (https://github.com/BetaMasaheft/BetMas). The XML vocabulary used is that of the Text Encoding Initiative (TEI), and the use of the TEI modules is documented by the project Schema, which is generated from the ODD (One Document Does it all). The ODD and the generated Schema are available at https://github.com/BetaMasaheft/Schema and are linked from each individual file, so that users can validate each file individually.
All the data is managed in GitHub, in an openly accessible organization, https://github.com/BetaMasaheft. The application available via https://betamasaheft.eu serves this TEI XML data from an eXist-db instance, which also provides alternative formats. These are documented at http://betamasaheft.eu/apidoc.html (all the APIs) and http://betamasaheft.eu/lod.html (the RDF, which is maintained in parallel).[1]
The Schema regulates a TEI format which is far from canonical but allows a faster data entry process. Among the derivative formats available there is also a more canonical TEI, which can be downloaded from each entry or obtained using the XSLT script available with the application code.
The aim of this long-term project is to encode descriptions of manuscripts found in existing catalogues, to build critical editions of the texts they contain, to serve images of the manuscripts and, especially, to give scholars of Eritrean and Ethiopian Studies a research environment where they can contribute data and reuse it, together with the contributions of others, in the ways they prefer.
The organization of the data in the stored version serves navigation only and carries no meaning. The .zip file was created by directly compressing the GitHub repositories as cloned to a local machine. The preferred way to reuse this data is from its latest version, or from one of the commits to any branch of the respective repositories in the GitHub organization; the dump is provided as an additional static version of all the edited data at a given point in time.
There are seven collections, one for each core entity for the project:
- Manuscripts
- Works
- Narratives
- Institutions
- Places
- Persons
- Authority files
Each of these is detailed in the following sections. All of the files are TEI files, all of them validate against the same Schema, and all are encoded according to a set of guidelines which are available at https://betamasaheft.eu/Guidelines/ .[2] These Guidelines, which are also freely available and collaboratively edited (https://github.com/BetaMasaheft/guidelines), detail all the encoding practices of the project with examples; those practices are not described here.
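As a first look at a local copy of the dump, the number of XML files per collection can be counted. The following is a minimal sketch only: it assumes that the unpacked .zip has one top-level directory per cloned repository, and the function name count_xml_files is introduced here for illustration and is not part of the project code.

```python
from collections import Counter
from pathlib import Path


def count_xml_files(dump_root):
    """Count .xml files below each top-level directory of an unpacked dump.

    Assumption: each top-level directory corresponds to one cloned
    repository (for example a collection such as Manuscripts or Works).
    """
    root = Path(dump_root)
    counts = Counter()
    for path in root.rglob("*.xml"):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Files lying directly in the root belong to no collection: skip them.
        if len(relative.parts) > 1:
            counts[relative.parts[0]] += 1
    return counts
```

Because the subfolders carry no meaning, the sketch deliberately ignores everything between the top-level directory and the file itself.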
Manuscripts
Each directory inside Manuscripts roughly, and without any guarantee, represents an institution or place of conservation of the manuscripts catalogued. This organization carries no meaning and is left to the practice of the encoders: the choice of names and the placement of the manuscript records largely reflect their preferences, so that they can find the records they are editing more easily. Each folder in this first level of organization may contain subfolders for collections, which again are instrumental only. For example, the large directories called ES (Ethio-SPaRe),[3] EMIP (Ethiopic Manuscript Imaging Project) and EMML (Ethiopic Manuscripts Microfilm Library) contain data which has been automatically converted from legacy data and is thus organized differently.
Everything that is relevant for the actual organization of the data is inside the XML files. For example, the fact that a manuscript belongs to a repository, or is described in one or more catalogues, is encoded in the <repository> element and in a special <listBibl> element respectively.[4]
Each file represents one known manuscript, although there may be as yet unidentified duplicates, especially in the large collections named above. See https://betamasaheft.eu/Guidelines/?id=manuscripts .
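Reading this information from a record, rather than from its folder, can be sketched with the Python standard library. The sample record below is heavily trimmed and invented for illustration (it is not schema-valid); the ref value, the type="catalogue" attribute on <listBibl>, the use of <ptr target="…"> and the function name are assumptions, and the Guidelines remain the authority on the actual encoding.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; element names in TEI files live in this namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, trimmed record: identifiers and attribute values are invented.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <msDesc>
          <msIdentifier>
            <repository ref="REPO-EXAMPLE"/>
            <idno>Example 1</idno>
          </msIdentifier>
          <additional>
            <listBibl type="catalogue">
              <bibl><ptr target="bm:ExampleCatalogue"/></bibl>
            </listBibl>
          </additional>
        </msDesc>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def repository_and_catalogues(xml_text):
    """Return the repository reference and the catalogue pointers of a record."""
    root = ET.fromstring(xml_text)
    repository = root.find(".//tei:repository", TEI_NS)
    repository_ref = repository.get("ref") if repository is not None else None
    # Assumption: catalogues are listed in a listBibl marked type="catalogue".
    pointers = root.findall(".//tei:listBibl[@type='catalogue']//tei:ptr", TEI_NS)
    return repository_ref, [ptr.get("target") for ptr in pointers]


print(repository_and_catalogues(SAMPLE))
```

Grouping records by the value returned here, instead of by directory name, is the approach consistent with the folder structure being instrumental only.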
Works
The files in this folder are organized by the numeric part of their name, which is also a numeric identifier guaranteed by the project to identify one Textual Unit.[5] The subfolders are only used to facilitate processing. See https://betamasaheft.eu/Guidelines/?id=works.
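Since the subfolders only facilitate processing, the numeric part of the file name is what matters when ordering or looking up records. A minimal sketch follows, assuming file names that contain a single run of digits; the example names are invented and the function numeric_part is introduced here for illustration only.

```python
import re
from pathlib import Path


def numeric_part(file_name):
    """Return the first run of digits in a file name as an integer, or None."""
    # Path(...).stem drops both the directories and the file extension.
    match = re.search(r"\d+", Path(file_name).stem)
    return int(match.group()) if match else None


# Hypothetical paths: the folder and file names are invented for illustration.
paths = ["Works/b/LIT20Example.xml", "Works/a/LIT3Example.xml"]
print(sorted(paths, key=numeric_part))
```

Sorting on the integer value rather than on the string avoids placing a hypothetical LIT20 before LIT3.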
Narratives
Narrative Units differ from the Works above in that they refer not to a fixed sequence of words, but only to a structure. See https://betamasaheft.eu/Guidelines/?id=narrativeUnits.
Institutions
The Institution folder contains, organized by number, one record for each Repository in which Ethiopic manuscripts are currently or were previously found. See https://betamasaheft.eu/Guidelines/?id=institution.
Places and Persons
The folders with Places and Persons originally contained mainly records automatically produced from the data collected in the analytical Indexes of the Encyclopaedia Aethiopica. These Indexes and their tabular representations were curated by Eugenia Sokolinski. New records have been created as needed during the project work for as yet undescribed or ambiguous places and persons.[6] Only some of these records have been edited, namely those related to the Manuscripts or Works which have received attention in the project so far.[7] See https://betamasaheft.eu/Guidelines/?id=places and https://betamasaheft.eu/Guidelines/?id=persons.
Authority files
All the XML files, including the ones in this directory, can be assigned keywords. This folder contains a taxonomy.xml file, which details one possible hierarchy for the keywords, while each of the other files is a TEI file which represents the concept behind one keyword. See https://betamasaheft.eu/Guidelines/?id=authority-files.
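A hierarchy of this kind can be read as an indented outline. The sketch below assumes the common TEI pattern of nested <category> elements carrying xml:id attributes; the sample content is invented, and the real taxonomy.xml may be structured differently.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical taxonomy fragment: identifiers and descriptions are invented.
SAMPLE_TAXONOMY = """<taxonomy xmlns="http://www.tei-c.org/ns/1.0">
  <category xml:id="ExampleTop">
    <catDesc>Example top-level keyword</catDesc>
    <category xml:id="ExampleChild">
      <catDesc>Example narrower keyword</catDesc>
    </category>
  </category>
  <category xml:id="ExampleOther">
    <catDesc>Another example keyword</catDesc>
  </category>
</taxonomy>"""


def walk(element, depth=0):
    """Yield (depth, xml:id) for every nested category, in document order."""
    for category in element.findall(TEI + "category"):
        yield depth, category.get(XML_ID)
        yield from walk(category, depth + 1)


def keyword_outline(xml_text):
    """Return the keyword hierarchy as a list of (depth, identifier) pairs."""
    return list(walk(ET.fromstring(xml_text)))


for depth, identifier in keyword_outline(SAMPLE_TAXONOMY):
    print("  " * depth + identifier)
```

The identifiers recovered this way could then be matched against the keywords assigned in the individual files, under the same assumptions.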
Acquisition of the data
Data is edited by the project team and contributors directly in XML, using any editor capable of validation. The project suggests the Oxygen XML editor, or Atom with the linter-autocomplete-jing package for validation, but the choice of the encoding tool is left to the contributors. The project team and contributors edit the data on branches of the data repositories and open Pull Requests in GitHub, which are merged into the master branch after review.
Limitations of the data
The data is offered as is and without guarantee. It is edited daily in the GitHub repositories linked above by a large team of contributors, who change and enrich it constantly. The Guidelines are also adapted and enriched as needed. Thus the data is, and will always remain, a work in progress and is meant to be used as such by any interested party. For example, transliterations are unified according to the Guidelines only in those files which have been carefully edited, whereas all the included legacy data (see above) will have different and possibly inconsistent transliterations. Although there are thousands of records, only a limited number have been encoded in depth. Each file contains a specific licence statement, with further details if necessary. Where not otherwise stated, the licence is CC-BY-SA 4.0. The web application also provides fully formatted citations of the entries and exportable metadata to facilitate correct citation of individual files within the dataset.
References
Liuzzo, P. M. 2019. Digital Approaches to Ethiopian and Eritrean Studies, Supplement to Aethiopica, 8 (2019).
Liuzzo, P., D. Reule, E. Sokolinski, Solomon Gebreyes, D. Elagina, D. Nosnitsin, E. Dal Sasso, and J. Gnisci 2018. Beta maṣāḥǝft Guidelines (2018) <http://betamasaheft.eu/Guidelines/>, accessed 30 April 2018 (DOI: http://dx.doi.org/10.25592/BetaMasaheft.Guidelines).
Nosnitsin, D. 2010. Ethio-SPaRe. Cultural Heritage of Christian Ethiopia: Salvation, Preservation and Research. Second Mission. November-December 2010. Report (Hamburg: Universität Hamburg, 2010), 1–33.
Solomon Gebreyes Beyene and P. M. Liuzzo 2018. ‘Encoding and Annotation of Ancient Places in Ethiopia’, Comparative Oriental Manuscript Studies Bulletin, 4/1 = Linking Manuscripts from the Coptic, Ethiopian and Syriac Domain: Present and Future Synergy Strategies (2018), 121–142.
EAe. Uhlig, S. and A. Bausi, eds, 2003. Encyclopaedia Aethiopica, I–V (Wiesbaden: Harrassowitz, 2003).
[1] See Liuzzo 2019 where the entire data flow is described, as well as the parallel maintenance of the RDF data.
[2] The Guidelines application, including its data, is also stored for reuse at 10.25592/uhhfdm.126. See Liuzzo et al. 2018.
[3] Nosnitsin 2010.
[4] See the Guidelines cited above for more details about this.
[5] For a definition see https://betamasaheft.eu/Guidelines/?id=definitionWorks.
[6] Uhlig and Bausi 2003.
[7] On places see Solomon Gebreyes Beyene and Liuzzo 2018.