Computing antiquity

Welcome to the Computing Antiquity Database site, curated by Jacob P.B. Mortensen (New Testament Studies, Aarhus University). The Computing Antiquity Database is designed to be your go-to resource for accessing and analysing ancient Greek texts in computer-readable formats. We aim to make these texts easily accessible and suitable for both basic and advanced algorithmic analysis. Our mission is to bridge the gap between ancient literature and modern technology, providing a comprehensive and user-friendly database for scholars, students, and enthusiasts alike.

The Computing Antiquity Database hopes to lend a helping hand to academic efforts in the digital analysis of ancient texts. A main obstacle for many digital humanities (DH) and computational projects is the acquisition of suitable digital texts. Open-source corpora are available at various places online, mostly through the voluntary efforts of different parties. For example, the Perseus Project has made a massive effort to collect a wide variety of Classical texts, and the First1KGreek project fills the gaps left by Perseus, striving to make available a collection of texts from Homer until 250 AD. Other projects, such as the Diorisis Corpus and the GLAUx corpus, have also made digital texts available online. However, one of the main problems is that the different corpora use different formats. Here, at the Computing Antiquity Database, we have supplemented the base of these corpora with other sources pertinent to the specific interests of our field of research: an open-source critical text of the Greek New Testament by the Society of Biblical Literature; the texts from The Online Critical Pseudepigrapha, which cover Jewish and Christian apocryphal/pseudepigraphal texts; a "stable" digitized version of Henry Barclay Swete's edition of the Septuagint; and a single text from Attalus.

Our database houses a vast collection of texts spanning various genres. Each text is digitized and pre-processed to ensure accuracy and ease of use in digital contexts. Whether you are a scholar, student, or enthusiast, our platform provides a variety of tools that can help you explore and analyse these timeless works.

Our main effort has been to provide computation-ready ancient Greek texts from the Classical era, Hellenistic times and the centuries preceding and giving rise to the New Testament texts. We are aware that our database is not all-encompassing, and we openly acknowledge that our work builds upon the hard work of many other passionate people engaged in this flourishing field. We hope that future projects will help expand, develop and improve the texts and the annotations, including their quality. We do not provide regular, book-like versions of the ancient texts with nice chapter and paragraph subdivisions; these can easily be found and accessed digitally on sites like Perseus or First1KGreek, or as ordinary books in the Loeb series. Our focus is on digital texts ready for computational purposes, and on providing a glimpse of what NLP can yield for the curious eye. Therefore, we have also supplied each text with different annotations, produced with the openly available models by Jacobo Myerston.

Key features of our database include:

Comprehensive Collection:
- Access a wide range of texts from renowned authors such as Homer, Plato, Sophocles, Herodotus, Plutarch, Philo, Josephus and Lucian.
Search and Filter:
- Easily find specific texts or comparable genres using our search and filtering options.
Downloadable Formats:
- Download .txt files of the texts in various versions (e.g. joined and full-stop separated) for offline analysis and integration with other software.
- Ready access to annotations by an NLP model:
  - /annotations (annotated from the -joined.txt files)
    - .csv - comma-separated files with a header row and three columns: 1) ID, which identifies a given token; 2) TOKEN, the given word within a text; and 3) the relevant function/attribute from the NLP analysis.
      - ID-lemma.csv
        Purpose: Identifies the lemma (the base or dictionary form of a given word form), e.g. ἀπενίψατο → ἀπονίπτω
      - ID-upos.csv
        Purpose: Maps words to their grammatical categories based on the Universal Dependencies (UD) framework. Assigns tags like NOUN, VERB, ADJ, ADV, PRON.
      - ID-ner.csv
        Purpose: Identifies and categorizes named entities within the text. Recognizes entities such as PERSON among others.
      - ID-stop.csv
        Purpose: Flags stop words (common words with low contextual meaning) to improve text processing efficiency.
      - ID-dot.csv
        Purpose: Identifies punctuation, not only full stops but also the raised dot (·/·), which typically corresponds to a colon or semicolon in modern punctuation. Useful for adding context in sentence segmentation.
    - ID-conllu.conllu - a file following the Universal Dependencies CoNLL-U guidelines, in which the various NLP annotations are aggregated. It is suitable for manual correction, and the corrected file can then be used for training models.
User-Friendly Interface:
- Navigate our intuitive platform with ease, whether you are conducting detailed research or casual reading.
Community and Support:
- Join a growing community of like-minded individuals and engaged experts who are keen to share new research ideas and explore future collaborations.
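
As an illustration of how the annotation files listed under Downloadable Formats might be used offline, the following Python sketch merges two annotation layers by token ID and reads a CoNLL-U file into sentences. It is a minimal sketch under stated assumptions, not an official loader: the sample rows are made up for the example, the third-column header names (LEMMA, UPOS) are assumed, and the helper names read_annotation, merge_layers and read_conllu are hypothetical. The CoNLL-U part relies only on the standard ten tab-separated fields defined by Universal Dependencies.

```python
import csv
import io

# Made-up sample rows standing in for ID-lemma.csv and ID-upos.csv.
# The third-column header names are assumptions; only the column order is used.
LEMMA_CSV = "ID,TOKEN,LEMMA\n1,ἀπενίψατο,ἀπονίπτω\n2,τὰς,ὁ\n3,χεῖρας,χείρ\n"
UPOS_CSV = "ID,TOKEN,UPOS\n1,ἀπενίψατο,VERB\n2,τὰς,DET\n3,χεῖρας,NOUN\n"

# The ten standard CoNLL-U fields, followed by a made-up one-sentence sample.
CONLLU_FIELDS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]
CONLLU_TEXT = (
    "# sent_id = 1\n"
    "1\tἀπενίψατο\tἀπονίπτω\tVERB\t_\t_\t0\troot\t_\t_\n"
    "2\tτὰς\tὁ\tDET\t_\t_\t3\tdet\t_\t_\n"
    "3\tχεῖρας\tχείρ\tNOUN\t_\t_\t1\tobj\t_\t_\n"
    "\n"
)


def read_annotation(handle):
    """Map token ID -> (token, value) from a three-column annotation CSV."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row
    return {row[0]: (row[1], row[2]) for row in reader if len(row) >= 3}


def merge_layers(**layers):
    """Combine several annotation layers into one record per token ID."""
    merged = {}
    for name, table in layers.items():
        for token_id, (token, value) in table.items():
            merged.setdefault(token_id, {"token": token})[name] = value
    return merged


def read_conllu(handle):
    """Return a list of sentences, each a list of per-token field dicts."""
    sentences, current = [], []
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
        elif not line.startswith("#"):  # skip comment/metadata lines
            current.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    if current:  # file did not end with a blank line
        sentences.append(current)
    return sentences


merged = merge_layers(
    lemma=read_annotation(io.StringIO(LEMMA_CSV)),
    upos=read_annotation(io.StringIO(UPOS_CSV)),
)
sentences = read_conllu(io.StringIO(CONLLU_TEXT))
print(merged["1"])
print([token["LEMMA"] for token in sentences[0]])
```

With downloaded files, the io.StringIO objects would be replaced by file handles, for example open(path, encoding="utf-8", newline=""), where path points to a local copy of the relevant file. Token IDs are kept as strings here because their exact format is not assumed.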

The Computing Antiquity Database hopes to bridge the gap between ancient texts and modern technology, empowering users to uncover new insights and deepen their understanding of ancient Greek literature. Explore the Computing Antiquity Database today and embark on a journey through the literary treasures of the past.

How to cite

Mortensen, J. P. B., Beyer, M. A. and Fog, B. V. 2025. Computing Antiquity: an open-source resource for accessing textual versions of Ancient Greek texts along with examples of computational analysis. https://computing-antiquity.au.dk