Digital humanities

The book of numbers

How data analysis can enrich the liberal arts

IT ALL STARTED with a preposition. In 1941 Father Roberto Busa, a Roman Catholic priest, began noting down as many uses of the word “in” as he could find in the Latin works of Thomas Aquinas, a medieval theologian and saint. Eight years and 10,000 handwritten cards later he completed his linguistic analysis of Aquinas’s “interiority” (his introspective faith) at Rome’s Pontifical Gregorian University. By then he suspected that his work could be done far more efficiently. He went hunting for “some type of machinery” to speed up his new project: recording the context of all 10m words written by Aquinas.

Father Busa’s zeal took him to the office of Thomas Watson, IBM’s chairman. Soon he had switched from handwritten cards to IBM’s punch-card machines, before adopting magnetic tape in the 1950s. In the 1960s dozens of full-time typists were involved. By 1980, when his team finally printed the “Index Thomisticus” in 56 volumes, they had spooled through 1,500km (930 miles) of tape. A CD-ROM containing 1.4GB of data came out in 1992, with a website following in 2005. The priest died in 2011, aged 97, but not before he had initiated a new quest: to annotate the syntax of every sentence in the Index Thomisticus database.

Such is the creation story of the digital humanities, a broad academic field including all sorts of crossovers between computing and the arts. The advances since its punch-card…