AI and the Roman Empire

NSF-powered researchers are using machine learning to uncover previously unattainable information from human history

By Heather Masson-Forsythe

The legacy of the Roman Empire can be seen throughout science, art, architecture, systems of government, networks of cultural exchange and language. Unfortunately, much of the information about the ancient Roman and Greek civilizations was lost during the cultural and political turbulence of the Dark Ages following the fall of the Roman Empire. Uncovering this nearly 2,000-year-old history can reveal the roots of Western civilization and lead researchers to a more expansive and complete understanding of historical human behavior and ingenuity.

Many texts referenced by classical Roman and Greek thinkers remain lost, but hope rises from the ashes of the eruption of Mount Vesuvius in A.D. 79, which famously buried the Roman city of Pompeii along with other nearby Roman cities such as Herculaneum. In the 1700s, more than 800 papyrus scrolls were uncovered at Herculaneum, but they more closely resembled logs after a bonfire than documents of historical significance. Attempts to unroll the scrolls physically often destroyed them. If they could be read, however, the collection of known texts from before the Middle Ages could be more than doubled.

Ancient scroll
Credit: The Vesuvius Challenge
A Herculaneum scroll carbonized by the heat of the volcanic debris from Mt. Vesuvius.

Thanks to the ingenuity of NSF-supported researchers, there may be new opportunities for the preservation and analysis of these scrolls and other historic artifacts using modern computing technologies. 

Using computers to 'read' ancient manuscripts

In 1999, Brent Seales, a computer science professor at the University of Kentucky, was part of an NSF-funded research project focused on developing new digital libraries from aging and damaged portions of the Cottonian Collection at the British Library. This collection of documents from the 1400s to 1800s contains manuscripts of great historical, philosophical, religious and artistic significance. The project included developing new illumination methods and processing techniques to digitally restore and enhance manuscripts damaged by fire, water and aging.

In 2006, Seales began a new approach to heritage science (the study of cultural and natural heritage) in which he developed methods to use computed tomography (CT) scans to read ancient scrolls too damaged to be unrolled. He also developed revolutionary software for locating and mapping 2D surfaces (the layers of the scroll) within a 3D object (the rolled-up scroll itself).

X-ray scans image the fragile documents nondestructively, in slices that move through the scrolls' layers of rolled papyrus. The layers located in these scans can then be virtually flattened into their original surface. By 2015, Seales' team had successfully applied these techniques to read sections of a document that contained metal ink and dated from between the first and fourth centuries. In 2018, machine learning was integrated into the process to detect ancient ink in these reconstructed images.
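To make the idea of virtual flattening more concrete, here is a minimal Python sketch, not the team's software. It assumes a segmented layer has already been turned into a grid of 3D points with unit normals (a hypothetical input format), samples the CT volume by nearest-neighbour lookup at a few offsets along each normal to build a layered "surface volume," and then collapses those layers into one flat 2D image with a per-pixel maximum. The function names `sample_nearest`, `sample_surface_volume` and `flatten` are illustrative only.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Volume = List[List[List[float]]]  # hypothetical CT intensities, indexed volume[z][y][x]


def sample_nearest(volume: Volume, p: Vec3) -> float:
    """Nearest-neighbour lookup; points outside the scan read as 0.0."""
    x, y, z = (int(round(c)) for c in p)
    if 0 <= z < len(volume) and 0 <= y < len(volume[0]) and 0 <= x < len(volume[0][0]):
        return volume[z][y][x]
    return 0.0


def sample_surface_volume(
    volume: Volume,
    points: Sequence[Sequence[Vec3]],
    normals: Sequence[Sequence[Vec3]],
    half_thickness: int,
) -> List[List[List[float]]]:
    """Sample the CT volume on and around one segmented sheet of papyrus.

    points[v][u] and normals[v][u] give the (x, y, z) position and unit normal
    of the sheet at flattened pixel (u, v). The result is indexed
    layers[d][v][u] for offsets -half_thickness..half_thickness along the normal.
    """
    layers = []
    for d in range(-half_thickness, half_thickness + 1):
        layer = []
        for row_p, row_n in zip(points, normals):
            # Step d voxels along each point's normal and read the nearest voxel.
            layer.append([
                sample_nearest(volume, (px + d * nx, py + d * ny, pz + d * nz))
                for (px, py, pz), (nx, ny, nz) in zip(row_p, row_n)
            ])
        layers.append(layer)
    return layers


def flatten(layers: List[List[List[float]]]) -> List[List[float]]:
    """Collapse the sampled layers into one flat 2D image (per-pixel maximum)."""
    height, width = len(layers[0]), len(layers[0][0])
    return [[max(layer[v][u] for layer in layers) for u in range(width)]
            for v in range(height)]
```

Real pipelines interpolate between voxels and handle meshes with millions of points; nearest-neighbour lookup on a small grid simply keeps the sketch short.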

“Segmentation of the papyrus”
Credit: The Vesuvius Challenge
Segmentation is the mapping of sheets of papyrus (“segments”) in a 3D X-ray volume. The community has built various tools to do this.

The 'Vesuvius Challenge' 

Unfortunately, the ink on the Herculaneum scrolls is carbon-based rather than metal-based, much like the carbonized papyrus it was written on. Because the ink and the papyrus absorb X-rays in nearly the same way, the ink produces almost no contrast in the scans, making it invisible with previous techniques, or at least invisible to humans. AI, however, is particularly good at detecting and amplifying patterns that humans cannot see.

Computer vision, the area of AI that enables computers to interpret visual input such as photos or videos, can help researchers identify image features they cannot see on their own. Early in 2023, the Seales team released a machine learning algorithm that could detect subtle traces of ink in 3D X-ray scans of scroll fragments.
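As a toy illustration of how a model can learn from fragments where the ink is already known, the sketch below fits a logistic regression, a deliberately simple stand-in for the deep networks used in practice, to per-pixel "depth profiles": the intensities one pixel takes through the sampled layers of a surface volume, labeled 1 for ink and 0 for bare papyrus. The feature layout, the function names `train_ink_classifier` and `predict_ink`, and the hyperparameters are assumptions for illustration, not the team's released model.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_ink_classifier(
    profiles: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit logistic regression with batch gradient descent.

    profiles[i] lists the intensities one pixel takes through the layers of a
    surface volume; labels[i] is 1 for ink and 0 for bare papyrus.
    """
    n, n_features = len(profiles), len(profiles[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(profiles, labels):
            # Log-loss gradient term: predicted ink probability minus true label.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n_features):
                grad_w[i] += err * x[i]
            grad_b += err
        # Step against the gradient averaged over all labeled pixels.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_ink(w: Sequence[float], b: float, profile: Sequence[float],
                threshold: float = 0.5) -> bool:
    """Return True if the predicted ink probability reaches the threshold."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, profile)) + b) >= threshold
```

In a workflow like the one the fragment figure describes, the profiles would come from a fragment's surface volume and the labels from its binary label image; the trained weights would then be applied to pixels sampled from a full scroll.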

Scroll fragments
Credit: The Vesuvius Challenge
Researchers train ML models on fragments with known ink data, then apply the trained models to full scrolls. From a fragment (a), researchers obtain a 3D volume (b), from which they segment a mesh (c), around which they sample a surface volume (d). They also take an infrared photo (e) of the fragment, which is aligned (f) with the surface volume and then manually turned into a binary label image (g).

These ancient Roman scrolls have already waited nearly 2,000 years to be read again. To speed up scaling and model optimization, the team made its data and models publicly available and, with two Silicon Valley investors, launched the "Vesuvius Challenge," a race to read at least four passages from a full Herculaneum scroll, with prize money for the winners. The crowdsourced challenge brings new perspectives and ideas to the research and offers participants a new educational opportunity to help finish a story hundreds of years in the making.

In fall 2023, it was announced that a computer science undergraduate student had used machine learning to decipher the first word from these scrolls: "purple," a color symbolizing wealth and power in ancient Rome. The result demonstrates the potential of these techniques for interpreting these scrolls and other damaged historical documents.

Seales and his collaborators continue to apply interdisciplinary approaches to overcome the diverse and unique challenges posed by historic artifacts. The team received an NSF Mid-Scale Research Infrastructure award in 2021 to advance these approaches, including materials characterization, advanced multimodal imaging and cyberinfrastructure for processing large-scale datasets. Of note is the team's collaboration with Mammoth Cave in Kentucky, the longest known cave system in the world, to explore the many ways humans have used the area over millennia. The work advances new cross-disciplinary discoveries essential to ongoing conservation and education, as well as fundamental scientific advancement in areas like data science and AI.

About the Author

Heather Masson-Forsythe
AAAS Science & Technology Policy Fellow

Heather Masson-Forsythe is a AAAS Science and Technology Policy Fellow at the National Science Foundation. She is on detail in the Office of Government Affairs, where she oversees the AI and Directorate for Computer and Information Science and Engineering (CISE) portfolios. Heather holds a Ph.D. in Biochemistry and Biophysics, and her science communication portfolio includes blogs and articles, vlogging, and podcasting. She is a winner of Science Magazine's "Dance Your Ph.D." contest, and her viral science communication videos have earned her recognition in Forbes, NPR, International Business Times, and more. While at NSF, she has managed the crafting of CISE impact stories and supported multiple funding programs and inter- and intra-agency working groups. She also currently serves as the Executive Editor for the AAAS Science & Technology Policy blog and podcast, Sci on the Fly.