How to find a poem in 200-year-old newspapers

Nov. 1, 2014, 8:34 a.m.

Computer science professor Leen-Kiat Soh goes over the program code with University of Nebraska-Lincoln students Spencer Kulwicki and Manas Varam Datla.


Liz Lorang is on the hunt for poetry. Not poetry from today, but from 200 years ago. And she’s looking for it where it was published the most in the 19th century: in American newspapers. She’s hoping a team of computer scientists can help her find all of it.


If you’ve ever looked at microfilm, you know it can be kind of a pain to use even for a few minutes. Imagine going through a century’s worth of newspapers…all on microfilm.

“I noticed it in my eyes and my back, and sort of thinking about my posture over time as well. I remember the chairs being so horribly uncomfortable,” Lorang said. When Liz Lorang was a graduate student at the University of Nebraska-Lincoln, she did exactly that for eight hours a day, for a year and a half. Lorang was doing research for her dissertation about 19th century poetry. She wanted to find as many examples of poetry in newspapers between 1835 and 1880 as she could. In a year and a half, she cataloged about 3,000. But that’s just a tiny fraction of what was published in the entire 19th century.

Poetry as part of daily life

“We tend today to think of it as sort of stodgy, old-fashioned or esoteric material that only academics engage in, or maybe the kind of thing you find in Hallmark cards at Target,” said Amanda Gailey, assistant professor of English at UNL and 19th century American literature expert. Except for American Life in Poetry, a weekly column Ted Kooser started when he was poet laureate, it’s hard to find poetry in wide circulation today.

  • Want to know where to find poetry in newspapers today? Read Ted Kooser's weekly American Life in Poetry column.
  • Want to see some of the newspapers where Lorang and Soh are looking for poetry? They're from Chronicling America, the Library of Congress's digital newspaper archive.

“But in the 19th century when they didn’t have movies or television or for the most part even recorded sound, poetry was an important way for them to communicate thoughts, feelings, political opinions, you name it,” Gailey said.

Poems in newspapers weren’t just written by big names like Walt Whitman and Henry Wadsworth Longfellow. Anyone who was literate and knew how to write poetry could submit a poem to a newspaper.

Lorang keeps a personal collection of old newspapers by her office. “So this is a weekly newspaper from 1841 and the entire first column of the first page is full of poetry. Six or eight poems just in this single issue of the newspaper.”

Teaching a computer to see poetry

Lorang knew there was no way she — or any person — could find all the poems in the archives by themselves.

“Throughout that time I would go through the microfilm and was thinking that the sorts of cues that I’m looking for as a human reader, the computer should be able to do that same sort of work and be able to do it much more quickly,” Lorang said.

Lorang is now the digital humanities projects librarian at UNL. She turned to Leen-Kiat Soh, a computer science professor also at UNL, with her question. They’re working on building a program now with two UNL students.

Soh and Lorang don't want their program to read the newspapers. Most archives are already doing that using optical character recognition, or OCR. OCR is what Google uses to let you search the text of books online. But text searches only work if you’re looking for a specific word or phrase, not when you're looking for a kind of text.

Soh and Lorang want their program to recognize the shape of a poem.

What the computer sees

“What inspires us is Liz and her students. When they first went through all these newspapers, there’s no way they could read every single page carefully. So they just visually, ah! This looks like a poem, and they zoom in, oh yeah, it’s a poem,” Soh said.

It’s kind of like the way we’re taught to recognize a poem in elementary school. It looks different from other things we read. Soh and Lorang’s program looks for clues like jagged edges instead of straight blocks of text, and more white space, representing line breaks and fewer words, to identify poetry. Right now, the project is still in its early stages.
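The visual cues described above can be illustrated with a small sketch. This is not Soh and Lorang's program, whose details the story does not give. It is a toy example that assumes a text block has already been reduced to a grid of ink and blank cells, with one grid row per printed line. The function names, the grid format, and both thresholds are hypothetical choices made for illustration.

```python
from statistics import pstdev


def shape_features(block):
    """Return (jaggedness, white_space) for a toy text block.

    `block` is a non-empty list of equal-length strings: '#' marks ink,
    '.' marks blank paper. Each row stands in for one printed line,
    an assumption made to keep the sketch small.
    """
    width = len(block[0])
    # Where the ink ends on each row that has any ink at all.
    right_edges = [row.rindex("#") + 1 for row in block if "#" in row]
    # Jagged edge: spread of the right margin relative to the block width.
    jaggedness = pstdev(right_edges) / width if right_edges else 0.0
    # White space: share of cells with no ink, blank rows included.
    ink = sum(row.count("#") for row in block)
    white_space = 1 - ink / (width * len(block))
    return jaggedness, white_space


def looks_like_poem(block, min_jagged=0.1, min_white=0.3):
    """Flag a block as poem-like using two illustrative thresholds."""
    jaggedness, white_space = shape_features(block)
    return jaggedness >= min_jagged and white_space >= min_white


# A justified prose column: nearly every line runs to the right margin.
PROSE = [
    "##########",
    "##########",
    "##########",
    "#########.",
]

# A poem: short lines of uneven length, with a blank line between stanzas.
POEM = [
    "######....",
    "########..",
    "####......",
    "..........",
    "#######...",
    "#####.....",
]

if __name__ == "__main__":
    print("prose:", looks_like_poem(PROSE))
    print("poem:", looks_like_poem(POEM))
```

A real newspaper scan is far noisier than these grids, with smudges, skewed columns and advertisements that also have ragged edges, so fixed cutoffs like these would only be a starting point.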

“Most recently we’re averaging sort of 75 percent. So far the code is really good at saying this thing is not a poem. It’s less good at saying this thing is a poem,” Lorang said.

Soh said that’s because “human vision is really powerful. We know how to block things out, we know how to filter out noises. But to teach computer vision is not easy.”

Eventually they want to run the program on the 8 million pages archived in the Library of Congress’s digital newspaper collection. Lorang hopes that this project not only makes newspaper archives more accessible to academics, but also changes how we understand poetry’s place in American history, all without microfilm.

“The reading that we do in literature courses, we’re exposed to maybe 100 poems that you might read,” Lorang said. “We have a very different sense of the history of American poetry than what we get if we actually think about the fact that millions of poems were circulating and people were encountering them in many aspects of their daily life.”