
New Software to Digitize Persian and Arabic Materials

July 08, 2019 Roshan Institute for Persian Studies


Two-year, $800,000 Mellon Foundation grant will support development of Persian and Arabic digitization software.

College Park, Md.—The Andrew W. Mellon Foundation has awarded a two-year, $800,000 grant to the University of Maryland, College Park (UMD) to develop technology expanding digital access to a vast trove of literature from the pre-modern Persian and Arabic world.

"The Open Islamicate Texts Initiative (OpenITI) Arabic-script OCR Catalyst Project (AOCP)" will support the development of user-friendly, open-source software capable of creating digital texts from Persian and Arabic books. 

Matthew Thomas Miller, assistant professor in the Roshan Institute for Persian Studies in the School of Languages, Literatures, and Cultures in UMD's College of Arts and Humanities (ARHU), leads an interdisciplinary team of researchers that includes David Smith, associate professor in the College of Computer and Information Sciences at Northeastern University; Sarah Bowen Savant, professor of Islamic history at Aga Khan University (AKU) in London; Maxim Romanov, university assistant for digital humanities at the University of Vienna; and Raffaele Viglianti, research programmer in the Maryland Institute for Technology in the Humanities.

"We realized that there was work being done separately in different areas to create tools for digitizing Persian and Arabic documents," said Miller, "but there wasn't a lot of communication across fields and these new advances were not making their way into the hands of users." 

To date, digitization software has been developed primarily for Latin-script languages, and in many cases it requires specialized knowledge to run. Existing Persian and Arabic digitization tools fall short on accuracy and are often prohibitively expensive for academic and public users.

Through the creation of new digitization tools for Persian and Arabic, the project team hopes to challenge traditional narratives of Islamic cultural history. The staggering number of Persian and Arabic texts produced in the pre-modern period makes it humanly impossible to read them all, even over an entire scholarly lifetime.

"These thousands of unread texts are a potential treasure trove," said Miller. "Until we really get into it and begin digitizing and then examining them, we won't know what we might find or what new narratives and histories might unfold."

The grant will also fund two postdoctoral fellows and two graduate fellows in computer science and Middle Eastern studies. 

"Our goal is to grow capacity throughout these fields," Miller said, "which means both training scholars of Persian and Arabic in digital methods and computer scientists in the particularities of Persian and Arabic documents."

The Andrew W. Mellon Foundation has supported other UMD projects at the intersection of cultural studies and digital humanities, including the African-American Digital Humanities Initiative, Documenting The Now Phase 2 and Books.Files, all led by faculty and staff in ARHU. UMD is a widely acknowledged leader not only in digital humanities but also in Persian and Arabic studies. The Roshan Institute for Persian Studies, which supported an earlier version of this project, is a premier center for the study and teaching of Persian culture in the U.S.

"Centering the digital humanities through the lens of cultural studies is among the college’s top priorities," said Bonnie Thornton Dill, ARHU dean and professor. "As scholars and teachers, our goal is to offer researchers and students new modes of inquiry that expand and deepen their abilities to understand and interpret our increasingly multicultural, global society."

Image: With the new Mellon Foundation grant, OpenITI’s digitization platform, CorpusBuilder, developed in collaboration with the SHARIAsource project of Harvard Law School, will be transformed into a full digital text production pipeline.