Much of our understanding of the ancient world comes from writing: texts written on papyrus, stone, clay, and metal. However, many ancient artifacts that bear writing are too damaged to read. Fortunately, artificial intelligence (AI) can be programmed to model how humans process images and language and then decipher what is written on a document with about 70% accuracy, compared to 30% accuracy without AI.
One of the tools used is Ithaca, a deep neural network that restores damaged texts and attributes them to a date and place of origin. Scholars enter transcriptions of the text into a software interface. The programming team measures reaction times during the transcription process to understand which words, characters, and passages are harder or easier to understand. The result is a network that can provide an accurate reading of the text. Ithaca’s neural network architecture is the transformer, which uses an attention mechanism to weigh the influence of different parts of the input on the model’s decision-making process. The attention mechanism considers the sequence and position of the characters to estimate the missing text.
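To make the idea of an attention mechanism concrete, here is a minimal sketch of scaled dot-product attention, the general technique used in transformers. It is not Ithaca's code: the function names and the tiny two-number vectors are invented for illustration. In a real model each vector would be learned and would encode both a character and its position in the text.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

def attend(query, keys, values):
    """Blend the value vectors, weighted by how well each key matches the query."""
    weights = attention_weights(query, keys)
    return [sum(w * value[i] for w, value in zip(weights, values))
            for i in range(len(values[0]))]

# Hypothetical toy input: three positions in a text, each a 2-number vector.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = keys
query = [1.0, 0.0]  # the position whose content the model is estimating

weights = attention_weights(query, keys)
context = attend(query, keys, values)
print(weights)  # positions 1 and 3 match the query best and get the most weight
print(context)
```

The weights show how much each part of the input influences the estimate for the queried position, which is the sense in which attention "weighs" different parts of the text.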
For the task of restoration, instead of providing historians with a single restoration hypothesis, Ithaca offers a set of the top 20 decoded predictions ranked by probability. This ranked list makes it easier to pair Ithaca’s suggestions with historians’ contextual knowledge, thereby assisting human decision making. It is complemented by saliency maps, a method used to identify which input features contributed the most to the model’s predictions, for both the restoration and attribution tasks.
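The ranking step can be sketched in a few lines. This is an illustration only, assuming the model has already assigned a probability to each candidate restoration; the function name, the placeholder candidate labels, and the probabilities are all made up and do not come from Ithaca.

```python
import heapq

def top_k_restorations(candidates, k=20):
    """Return up to k (text, probability) pairs, most probable first."""
    return heapq.nlargest(k, candidates.items(), key=lambda item: item[1])

# Hypothetical candidates for one damaged passage, with invented probabilities.
candidates = {
    "hypothesis A": 0.10,
    "hypothesis B": 0.60,
    "hypothesis C": 0.30,
}

for text, probability in top_k_restorations(candidates, k=2):
    print(f"{probability:.2f}  {text}")
```

Presenting several ranked hypotheses rather than a single answer leaves the final choice to the historian, who can weigh each suggestion against what is known about the artifact.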
Why does this matter? For the same reason that a text document is easier to navigate than a photocopy: a photograph of a text is far less useful to scholars than a searchable text, and AI makes the text searchable. Through the use of AI, historians can more accurately reconstruct the past, aiding in our understanding of the ancient cultures that ultimately led to our own.