By Kimberly Patch and Philip Bennett June 3, 2018
Part 1: Journalism Cornerstone – and Roadblock
Interviews are an essential building block of journalism. Journalists rely on interviews to discover stories, collect facts, verify claims, assemble accounts of events, reveal voices, draw portraits of people and places, and check power.
But despite the fast pace of technological advances that seem like surefire improvements to the interview process, the methods many journalists use to conduct, record, process and publish interviews haven’t changed in decades.
These limitations affect accuracy, discourage in-depth interviewing and keep audiences from a key potential benefit: greater transparency. Enabling the public to explore and share the source material behind the news could deepen trust and increase engagement.
In a survey of dozens of journalists, we found pervasive frustration about the inefficiency and expense of managing the interview process, and a pattern of improvised hacks to address common problems. The lack of an efficient interview system costs journalists time and money, makes accurate quotation harder, and risks the loss of important material hidden in partly digested interviews.
We set out to find a better way.
This report details what we learned from interviewing journalists, how we came up with the technologies to build a more practical interviewing workflow and publishing system, and what that system looks like. We also provide a guide to technologies and products that hold potential for improving interviewing practices.
Our first step was to interview 45 journalists on the minutiae of their reporting processes: recording, transcribing, organizing notes, sharing with colleagues, writing, and publishing. The reporters ranged in age from their 20s to their 70s, and included 12 Pulitzer Prize winners. They worked for major US newspapers, magazines and online publications, and in television and radio.
There was a strong consensus among the journalists that managing interviews is more difficult than it should be. In general, we found a wide gulf between the tools and methods that reporters say they need to make the most of interviews, and the tools and methods they actually use.
The gap can be an obstacle to quality journalism. Interviews reveal “nuance and motivation and feelings and hidden agendas,” said Tim Golden, editor at large at ProPublica and a former senior writer on the investigative staff of The New York Times. But, he said, “it just hasn’t been practical in my professional lifetime to transcribe all the time.” As a consequence, “you just lose things,” he said. “Accuracy suffers, depth suffers.”
For some journalists, the obstacles can discourage recording. The recorder is “distracting,” said Rob Copeland, a reporter who covers hedge funds at the Wall Street Journal. And often, “I don’t have time to listen back to myself.”
For those working with continuous deadlines online, the obstacles can also discourage conducting interviews. Caroline Fairchild, now managing news editor at LinkedIn, remembered her first journalism job aggregating stories: “There were times when I would pick up the phone to fact check something or interview someone to get more color for a story… and my editor would come over to me and ask me what I was doing,” she said. Interviewing “was definitely viewed in certain roles as a waste of time.”
Overall, we identified four key pain points for journalists:
The attention gap. Taking extensive interview notes can interfere with attention to the interviewee and directing the interview. The difficulty increases the risk of missing information and/or opportunities to ask unplanned questions. At the same time, fiddling with recording technology can be distracting for everyone involved.
Time and expense of processing interviews. Automated speech recognition software can’t flawlessly transcribe an interview. Manually transcribing can be prohibitively expensive, tedious and time-consuming. So can correcting an automatically generated transcript. But the risks of misquoting and missing the best quotes go up without a searchable recording and verifiable transcript.
Memory load. Recollecting an interview precisely is difficult even in the short run. On long projects, memory is taxed as the volume of material grows, especially if interviews are only partially transcribed. It’s also difficult to keep disparate types of source material – written notes, audio, images and documents – organized and connected.
Comparing and sharing. Most journalists, whether working alone or in teams, have few ways to effectively search and mine interview content or collaborate with colleagues across multiple interviews.
To develop solutions to these pain points, we surveyed and tested many existing technologies: recording devices, including smartphone software; notetaking by pen and paper, digital device, tablet, and computer; automatic speech transcription, including Dragon and web-based methods; tools for manual transcription; tools for manual and digital organization, including browsing, searching and markup; tools for editing audio and video; and publishing tools.
Our goal was to design a practical workflow and publishing system that allows reporters to do more interviews, easily and cheaply, and creates better ways to share those interviews with audiences. We built or adapted tools that aim to help journalists – and also help the cause of journalism transparency, credibility and audience engagement.
We developed a practical workflow – here are the key pieces:
- A foolproof, easy-to-use, low-profile recording tool that allows the reporter to section and/or mark up an interview on-the-fly.
- An efficient manual transcription process that keeps the transcription connected to the audio, which enables visual navigation through temporal media like audio and video. The process also integrates two types of automatic transcription so that reporters can experiment with and use automatic transcription when it makes sense.
- Search and annotation processes that give the reporter quick access to any piece of information, an ability to search across notes, ways to mark up information at every step, and ways to extract information based on markup.
- An interface that allows different types of notes, including text, audio, pictures and PDFs, to be seen and searched together.
- Tools that make it practical to sync text with audio and video. The system makes it possible to work with synced media, add annotations, and export synced transcripts, media and annotations for publication.
The workflow includes software that we recommend as current best practice, but reporters can keep using preferred tools for any step – recording, transcribing, organizing, or writing.
We also developed a system for publishing and sharing text, audio and video among journalists and with audiences. We call the publishing system InSite (for the “interview site” we’ve developed at Duke). The publishing system is open-source and based on a WordPress template. It enables content creators to upload, adjust and download synced materials. It makes interactive, annotated interviews available on computers and mobile devices.
The InSite publishing system enables audiences to use interview transcripts to search, navigate and share text, audio and video down to the sentence level. Excerpts can be shared on social media, emailed, or linked to from another website. When an excerpt is shared it includes a direct link to the video and text that shows it in the full context of the interview. The system can be used to publish interactive transcripts privately or publicly.
We’ve tested our workflow and publishing system by building the Rutherfurd Living History site. The technologies we used to build the publishing system are detailed in the site colophon.
We distilled findings from our technology research into a list of technologies to watch, which notes the technologies we use in our current workflow, and details technologies that could become key pieces of an efficient workflow in the future.
We also worked with the PBS series FRONTLINE to publish interactive transcripts based on the InSite system. These applications are continuing to give us insight into the value of interactive interviews that can be annotated by content creators, have advanced search features that support discovery, and can be shared and linked to at the sentence level.
A spoiler alert: although our system includes options for automatic transcription, automatic speech recognition is not perfect and requires a proofing step. Whether automatic transcription is practical depends on the quality of the recording and on whether a reporter prefers correcting an automatic transcript’s mistakes to transcribing manually with efficient tools. Given this reality, the workflow includes options for switching back and forth between manual and automatic transcription.
In the following two parts, we describe our research and conclusions in greater detail. Part 2 is a detailed look at reporters’ needs, wishes, and struggles with managing interviews. Part 3 is a detailed look at how we applied our research to building InSite.
Part 2: Wrestling with the Interview Process
Here’s what we learned in our interviews with 45 journalists who span a wide range of ages and work for major US newspapers, magazines, online publications, television and radio stations.
Most of the journalists we spoke with record interviews regularly. Some investigative journalists tended to record only final interviews, and some daily reporters tended to record only important interviews.
Most of the journalists used a personal shorthand when taking handwritten notes, abbreviating common words and starring important passages. Many typed notes when doing interviews over the phone, but most took handwritten notes when interviewing in person. It was common to add timestamps to notes to mark important bits when recording.
Many of the journalists took handwritten notes even when recording. Some even took extensive handwritten notes when recording. These served as a sure way to access a quote that might be needed quickly, as an index, and as a backup. In some cases reporters said they preferred taking handwritten notes as part of the process of hearing and understanding the interview.
Many journalists said that they were reluctant to entrust the reporting to a recording device that could fail. Poor recording quality degrades interview quality and increases error rates. And the specter of outright equipment failure causes angst. “My greatest fear is that the technology that I’m not used to using will fail me,” said Allison Young, an investigative reporter at USA Today.
David Finkel, a Pulitzer-Prize-winning long-form narrative writer and editor at The Washington Post, said that he values the immediacy and accuracy of a recorded interview, but doesn’t trust that the technology is fail-proof.
“If I’m recording and it doesn’t work I just lost all the work,” said Finkel. The recording process can also be distracting. “I find myself paying attention to it – looking to see if the red light is on – wondering if the thing really is working,” he said.
Anything a reporter is doing at the same time as conducting an interview, whether fiddling with technology or taking extensive notes, can distract from listening and following up. Cut down on the distractions and reporters say they could listen more carefully, think of better follow-up questions, or probe certain areas more deeply.
At the same time, having an accessible audio recording can improve the reporting and writing process. Having a recording to check makes it easier to quote someone accurately and make sure quotations are not taken out of context. It preserves information that can lead to discovering new stories. “There’s a lot of stuff that I miss that I don’t realize I miss [until] I go back and listen to it,” said Alana Semuels, a staff writer at The Atlantic, reflecting a shared view.
Audio is also useful for gauging emotional context. “On occasion, I will go back and listen to a tape for things beyond just the words,” said Ken Armstrong, a Pulitzer Prize-winning investigative journalist and author and senior reporter at ProPublica. It’s useful to know how long a silence stretches out or where there’s a catch in someone’s voice, he said. For example, “we asked [a] police detective a question that was emotionally wrenching for him and he waited a long time before he answered it – I wanted to know how long that silence was.”
Transcribing is tedious, time-consuming and expensive. Several journalists we interviewed used the same blunt words to describe the experience: I hate transcribing.
The journalists who transcribed full interviews said they spend lots of time doing so. “I think my editors would be shocked and appalled at how much time I spend transcribing interviews,” said Sari Horwitz, a Pulitzer Prize-winning journalist from the Washington Post, who transcribes full interviews for big projects. “It’s a pain to do the transcription and yet when I have it done it’s so useful,” she said. “I can quickly get [to a quote] and see it in context.”
Many journalists said they restricted the length of interviews and transcribed only portions of interviews because the transcription process is long and tedious. “Because of the pressure in newsrooms today, people don’t have time to tape and transcribe very much anymore,” said Mike McGraw, who was a longtime investigative reporter for the Kansas City Star, KCPT public radio and NPR.
Producing a full transcription of an hour-long interview takes two to six hours of a journalist’s time, or $60-$120 to hire someone else to transcribe, plus more time verifying quotes that will be used in the story. “One hour equals four hours – four times, or maybe longer – and that’s conservative,” said McGraw.
A partial transcription of an hour-long interview takes anywhere from 30 minutes to two hours. But triaging the transcription process means losing access to some source material – as memory fades, the portions of interviews that didn’t get transcribed become less and less accessible even to the reporter who carried out the interview.
Some untranscribed material may turn out to be needed later, especially for investigative projects. “You hear different things depending on what you know,” said Sarah Cohen, a Pulitzer Prize-winning journalist and Knight Chair at the Walter Cronkite School of Journalism at Arizona State University. “So, something that seemed unimportant a month ago suddenly becomes very important.”
The techniques the journalists described during the interviews tended to be improvised and vary by individual. Most were less efficient than the transcription machines with foot pedals of a generation ago. Some journalists listened to a digital recorder and typed the transcript into a computer program. Others downloaded recordings to a computer, used Windows Media Player, QuickTime or iTunes to listen, and typed the transcript into a separate window. This requires using the mouse to change the cursor focus every time there’s a stop/start.
Only a few journalists used software designed for listening and typing, such as Express Scribe, Scrivener and Transcriva. A few used web-based versions such as oTranscribe. None used foot pedals that attached to a computer, although several mentioned that they missed the old transcription machines that were controlled via foot pedals.
Several journalists had tried automatic transcription, but none used it to transcribe interviews. One journalist used a recording app that taps automatic transcription just for search purposes. Several journalists had tried Dragon speech input software and one journalist regularly wrote stories using it, but didn’t use Dragon’s automatic transcription abilities.
A big challenge in making improvements is journalists’ reluctance to try new things, said Tyler Dukes, a public records reporter for WRAL News. It’s not that journalists don’t want improved tools, said Dukes. But any new way of working must “seamlessly integrate into their workflow, they have to immediately recognize what its benefits are, and it’s gotta be perfect – all the time,” he said, adding, “I’m the same way.”
Journalists we interviewed said that if it were easy, they’d transcribe entire interviews, make them searchable, and use the transcripts to go back and listen to a quote. These abilities would save time and improve stories.
The raw material of an interview is a resource, said David E. Hoffman, a Pulitzer Prize-winning journalist and author. “I want to preserve it, intact – sense of it, color, everything – for when the moment I’m going to need it,” he said. “Often times it’s not for another year,” he added.
“Transcripts are better than my notes because they’re verbatim, and they’re complete,” said Armstrong. “I could do word searches in the transcript that I would not be able to do with my notes.”
“A reasonable transcription would be a heavenly, tremendous timesaver,” said Jim Steele, a Pulitzer Prize-winning journalist at Vanity Fair. “You could also go back and listen.”
The ability to transcribe quickly and easily “would allow me to just not miss anything,” said Maurice Chammah, a staff writer at the Marshall Project. “The result would be more bits of nuance here and there. It might not change the entire story but it would enrich it a lot,” he said.
A transcript makes it easier to put longer quote blocks in stories, Armstrong added. “I like long quote blocks to see the person’s thinking process, like thinking out loud. A lot of times I think those quotes can be very effective in narrative pieces. You’re not going to get those kinds of quotes typically out of pen and paper, but you are with recordings.”
Longer interviews can also give reporters a better understanding of the bigger picture, said Danny Vinik, an assistant editor at Politico. “I let [interview subjects] talk for longer so I can learn a little bit more about what’s going through their minds when they’re talking about this issue and go off on the tangents they go off on so that I understand the wide framework and issues at play.”
While some of the journalists we interviewed welcomed the idea of an accurate automatic transcription, others talked about the transcription process as being part of the writing process. “I’ve found that it’s really important for me to go back and listen to the interview… and to actually transcribe it myself, because I pick up on things and hear things very differently than if I were to outsource that work or rely on some sort of tech to transcribe it for me,” said Fairchild.
Working With Notes and Transcripts
The journalists we talked to used a range of techniques to organize written and typed interview notes and transcriptions.
It was common to use stars, underlining and timestamps to mark important quotes in written notes, and to use timestamps to indicate where something should be filled in later from the recording. A few journalists had more elaborate ways of organizing written notes, including a notebook index and page numbers that made it quick to find the notes for any interview, and a number and letter scheme that added a subject category to each quote.
Many journalists typed up written interview notes, and several said they found that process much less tedious than transcribing.
In typed interview notes and in transcripts it was common to use bold to mark important quotes and timestamps. A few journalists used spreadsheets to categorize important quotes and make them easy to sort and find.
Many of the journalists we talked to said they were slowed by a less-than-ideal ability to search across typed interview notes and transcripts. This feature is lacking in Word, the most common word processor used by our sample of journalists.
For journalists who used PCs, avoiding this stumbling block meant putting many documents into a single large Word file or sidestepping the problem by using Google Docs, which is free, readily available and allows for searching across documents. But because Google Docs is a cloud application, investigative reporters are hesitant to use it when managing confidential or sensitive notes. Although Mac users can more easily search multiple Word documents using Spotlight search, some Mac users also put multiple interviews in single large Word files.
Some journalists who dealt with large volumes of interviews kept master outlines. Others had precise methods of structuring folders, and updated the structure as a project progressed.
There was general dissatisfaction with available organization and search tools and methods.
“It would be fabulous to have notes and transcripts that are easily searchable, that are able to be indexed, that perhaps could be kept in separate files but searched across files,” said Young.
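For notes and transcripts kept as plain-text files, the kind of cross-file search Young describes can be sketched in a few lines of Python. This is a minimal illustration, not part of the InSite workflow: the function name `search_notes`, the folder layout and the choice of `.txt` and `.md` files are all assumptions, and Word documents would first have to be exported to plain text. It runs entirely on the reporter's own computer, with no cloud service involved.

```python
from pathlib import Path


def search_notes(folder, phrase, extensions=(".txt", ".md")):
    """Find a phrase in every plain-text notes file under a folder.

    Hypothetical helper for illustration: returns a list of
    (file name, line number, line text) tuples, matching case-insensitively.
    """
    phrase = phrase.lower()
    hits = []
    # rglob("*") walks the folder and all of its subfolders.
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if phrase in line.lower():
                hits.append((path.name, number, line.strip()))
    return hits
```

Calling `search_notes("interviews", "budget")` on a hypothetical `interviews` folder would list every line that mentions the word, with the file and line number needed to find it in context.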
On a longer-term project, being able to quickly find something you are remembering “can save hours,” said Brad Heath, an investigative reporter for USA Today. Even saving five minutes on deadline is “quite significant,” he said.
Multimedia compounds the problem. “The big thing that’s missing – it’s half of source material – you can search and report on text, but not audio, video and images,” said Cohen.
There are relatively expensive audio/video tools that allow for phonetic searches in audio, but they return false negatives and false positives. You can’t be sure that you’re finding everything. These tools aren’t easy to learn, and take considerable computing power. The journalists we spoke to who use them are generally film editors.
Several journalists said they wanted to search across all sources at once: notes, transcripts, source audio/video, PDFs, slideshows, and photos. “I need to see them in front of me. I need to see everything,” said Dana Priest, a Pulitzer Prize-winning journalist at the Washington Post and the Knight Chair in Public Affairs Journalism at Philip Merrill College of Journalism at the University of Maryland.
Journalists of varying ages regularly print notes so they can highlight and use sticky flags.
There is value in going over notes, said Fairchild. What’s needed is a way to “know what in the moment of the interview I was excited about,” or “track thoughts I was having while the interview was going on,” she said. “Anything that could actually bring me back to the interview and what I experienced then would be valuable to me.”
We also heard that it’s important for some journalists, especially those who do investigative work, to avoid using cloud services or other technology that requires an Internet connection and whose privacy is impossible to guarantee.
Privacy often conflicts with convenience, which is why so many journalists use cloud solutions that allow them to share in-process material across their own devices and with other reporters even as they express misgivings about doing so. Others actively avoid cloud software for this purpose. “I’m not comfortable putting anything like notes or drafts in a place that’s not under my control,” said Heath.
Sharing and Publishing
Even when there’s a full transcription of an interview, the text and accompanying audio or video are generally accessible only to the journalist who did the interview. It doesn’t become source material that can be shared among journalists or viewed by the public.
More than 100 interviews can go into an investigative reporting project, book, or hour-long documentary. Only a small percentage of that material appears in the book or documentary. The act of writing a story curates this material – but just for the purposes determined by the author of that publication.
Out of scores of people interviewed for a story, “maybe 20% are people I’m going to end up quoting,” said Mike McGraw.
The audio recordings of the interviews that inform a story are also rarely used as source material published alongside the final product. Although the Internet makes it possible to hyperlink audio to a written quote, the process is time-consuming.
T. Christian Miller, a Pulitzer Prize winner and senior reporter at ProPublica, has experimented with linking a given quote to audio hosted on SoundCloud. “It’s a tremendous lot of work,” he said. It entails editing the quote snippet and requires knowledge of coding, HTML and CSS. And because it’s just the standalone quote, “it’s not contextualized,” he said.
It can serve readers to show how a quote sounded and to see it in the context of a full transcript. “There’s something powerful about hearing a quote in a person’s voice instead of just reading it,” said Andrew Joseph, a reporter at STAT.
Showing source material lends credibility, said Sanette Tanaka, senior product designer at Vox Media. “It’s so quick for people to yell ‘that’s taken out of context,’” she said. “If people don’t trust the article and that the journalist is not taking it out of context… and they can go to the source material if they want – I think it would lend another air of credibility to the finished piece.”
Publishing transcripts would allow reporters to “cover more ground, be more accurate and at the same time avoid misrepresenting or taking out of context something that somebody said,” said Golden. This is especially important for the “deeper, more substantial, more public-interest-oriented reporting that a relatively small number of journalists are still doing,” he said.
Publishing full transcripts connected to audio or video takes transparency one step further. The usefulness of publishing transcripts is nuanced for journalism, however. Interviews or portions of interviews might be off the record. A source might be confidential or speaking on condition of anonymity.
Some journalists we interviewed had mixed feelings about publishing transcripts.
Some people may not understand the nuances of reporting and the relationships that journalists have with their sources, said Bill Adair, Knight professor of journalism and public policy at the Sanford School of Public Policy at Duke University, and creator of PolitiFact. “I make really good sausage – I may not want people to see me making the sausage,” he said.
Other reporters worried that readers would get bogged down in long transcripts. “I’m torn because there’s a certain role of transparency, which I think is useful,” said Vinik. But publishing something like a 5,000-word transcript of a policy interview might just cause confusion, he said. “Short interviews on the Hill… where quotes more easily get taken out of context – I can see a larger role for that there.”
Part 3: Developing InSite
Our survey of journalists enabled us to compile a wish list of tools that would make interviewing more efficient and productive:
- An idiot-proof, single-button way to start recording
- An unobtrusive way to mark interview highlights and section the recording on the fly
- A way to transcribe that’s faster and less tedious than today’s processes
- A transcript that includes marks made during and after recording, is properly paragraphed and punctuated, and is connected to the recording
- Better ways to browse and search text, audio and video
- Apps that work on both major phone platforms: Android/iPhone, and software that works on both major computer platforms: PC/Mac
- The option to not use cloud services because privacy is critical for some reporters
- A workflow that doesn’t disturb existing processes that work well
If full transcriptions were easier and less tedious to do, reporters might do more of them. If a full, accurate transcript were connected to a recording or video, reporters could browse and search source media. Reporters might find facts or insights they missed while conducting an interview. These abilities could speed and deepen the process of building a story. And a transcript connected to audio – or portions of the transcript connected to audio – might also be shared with other journalists and/or audiences.
Based on the reporters’ wish list, we designed the InSite workflow and publishing system that enables interactive transcripts.
We developed and refined our system in several steps:
We researched and tested scores of products, from recording aids to automatic transcription. We tested the technologies that look promising on paper, and sent hundreds of emails to technologists explaining what’s needed, asking questions, and requesting features.
We compiled a long document that detailed many different technology choices aimed at journalists as well as similar technologies aimed at people such as court reporters who carry out related tasks.
We found many useful features in products already on the market, but not everything that we needed.
We also needed the features to coalesce into an efficient workflow and publishing system rather than a series of tools that would each have to be learned and used separately. And we needed the system to be flexible enough that reporters could continue using preferred tools for any step and then easily plug an existing recording or transcript into the workflow.
Recording and marking on the fly
Smartphones have made many technologies more mobile and easier to use, including recording. High-end smartphones have some advantages over dedicated recorders:
- Easy file transfer
- Recording apps that allow for sectioning and marking a recording on the fly
- Bluetooth connectivity, which allows the phones to be controlled with remotes
- Excellent microphone quality
The ability to control a recording remotely means journalists don’t have to touch a recording device, risking distraction, to section or mark an interview. Excellent microphone quality lets reporters get the most out of any automatic transcription tool. These turned out to be key pieces of our workflow.
One challenge in finding an app that could take advantage of these qualities was addressing the Achilles’ heel of smartphones – battery life. The key to overcoming this challenge was recognizing that the screen uses the most battery power. A recorder with a mode that keeps the screen mostly black would function much longer without depleting the battery.
We encouraged a smartphone app maker that already had several key features in its Android/iPhone recorder to add a couple more. The recorder had easy ways to transfer files and an efficient way to mark and section text on the fly.
Here’s what we asked for, and what the app maker delivered:
- A recording option that would make the screen mostly black to save battery life and just show a counter and section number
- A way to connect a remote button so that a journalist could section and mark the recording on the fly without having to reach for the phone
Transcription – better software
We tried many manual transcription tools, focusing on those that had three crucial features: recorder control and text entry in a single window, keyboard shortcuts for controlling the recorder, and the ability to speed up or slow down a recording without changing pitch.
We also looked for a tool that would make it easy to jump from one place to another in a recording, retain any sectioning and marking a reporter does while recording, and allow for additional markup on the reporter’s computer. We also wanted the option to use automatic speech recognition.
In addition, we wanted to link the text transcript to the audio in a format that enabled publishing a transcript linked to media at the sentence level. Linking each sentence rather than a random fragment would make it easier to follow along with a transcript, and possible to share excerpts that start at the beginnings of sentences.
There are many tools designed for captioning that link a transcript to audio. But these assign time codes based on a certain amount of time rather than by sentence. There are also services that sync existing transcripts to audio. Some of these don’t offer a way to export the linked transcript and audio so they can be published on a website. There are several web services that offer exports at the word level, but it takes extra work to turn those into sentence-level files.
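To give a sense of that extra work, here is a minimal Python sketch that groups word-level timings into sentence-level cues. The input format (one entry per word with start and end times in seconds) and the rule that a sentence ends at a word finishing with a period, question mark or exclamation point are assumptions made for illustration. They do not describe the export format of any particular service, or the method used in InSite.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


@dataclass
class Sentence:
    text: str
    start: float
    end: float


SENTENCE_END = (".", "?", "!")
CLOSERS = "\"')\u201d\u2019"  # closing quotes and parentheses that may trail the punctuation


def _join(words):
    """Merge a run of words into one cue spanning first start to last end."""
    return Sentence(" ".join(w.text for w in words), words[0].start, words[-1].end)


def words_to_sentences(words):
    """Group word-level timings into sentence-level cues (hypothetical helper)."""
    sentences = []
    current = []
    for word in words:
        current.append(word)
        # Ignore trailing quotes or parentheses when looking for end punctuation.
        if word.text.rstrip(CLOSERS).endswith(SENTENCE_END):
            sentences.append(_join(current))
            current = []
    if current:  # keep any trailing words that lack end punctuation
        sentences.append(_join(current))
    return sentences
```

A rule this simple splits wrongly on abbreviations such as "Dr." and depends on the transcript already being punctuated, which is part of why sentence-level files take extra effort to produce from word-level exports.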
We found this key piece of the InSite puzzle in an unusual place.
People who are dyslexic have trouble processing written text. To take notes more easily, they benefit from being able to mark and manipulate audio.
Audio Notetaker is a program designed for dyslexic students. It depicts audio as a series of rectangular bars that provide a nonlinear way to picture and process audio. The format also speeds working with audio.
The software enables students to put the rectangular bars in sections and mark them by color. Users can put pictures in a column beside the rectangular bars so pictures can be lined up by section. The program offers two columns for text. It also offers state-of-the-art transcription – the ability to slow down or speed up the recording without changing pitch, and filters to reduce buzz and background noise.
We realized that journalists working with audio or video could make good use of these capabilities, too.
The text columns work well for transcription and notes, and put the text right beside the corresponding rectangular chunks of audio. The pictures column is useful for inserting a snapshot of a scene, person, whiteboard, or handwritten interview notes. Section and text colors help with organization. And sections of a given color, for instance those marked as off-the-record, can be set to be omitted when a reporter makes a copy to share with someone else or publish.
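The idea of omitting sections by color can be sketched in a few lines. This is only an illustration of the concept, not Audio Notetaker's file format or behavior: the `Section` class, the color names and the `shareable_copy` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Section:
    color: str  # hypothetical color label chosen by the reporter
    text: str   # transcript or notes for this section


def shareable_copy(sections: List[Section], omit_colors: Set[str]) -> List[Section]:
    """Return a copy of the interview without sections in the omitted colors."""
    return [s for s in sections if s.color not in omit_colors]


interview = [
    Section("white", "Background on the project."),
    Section("red", "Off-the-record remarks."),
    Section("white", "On-the-record summary."),
]
# Assume the reporter has used red to mark off-the-record material.
to_share = shareable_copy(interview, omit_colors={"red"})
```

The original list is left untouched, so the reporter keeps the full interview while colleagues or readers see only the sections cleared for sharing.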
We identified features that could be added to Audio Notetaker to further improve the software for journalists. The Audio Notetaker developers responded by improving many details, including adding timestamps, increasing automatic transcription options, and adding an export option that saved us from having to do extra formatting before publishing. Note: the export option is in beta as of June 2018.
Audio Notetaker is available for both PC and Mac and is computer software, not cloud software, so it met our needs no matter which common platform a journalist wanted to use.
Audio Notetaker is not open-source, however. We’re continuing to encourage developers to put these features into open source software that a community can adjust and develop so reporters can have more good choices.
Transcription – practical automatic speech recognition
Turn a machine loose on the audio of your recorded interview and the transcription process is painless – but more or less filled with mistakes, depending on factors like recording quality and background noise. It’s not the Holy Grail, but today’s automatic speech transcription can have a role in improving transcription for journalists.
We tested many automatic speech recognition systems for the desktop and for the Web.
Automatic speech recognition engines for the desktop include Dragon for PC and Mac, and those built into the operating systems of Windows and Mac OS. Speech engines also power speech input for smart phones.
There are two ways speech recognition can work – in real time, where words appear as you speak, or by transcribing an existing recording from start to finish. All of the above speech engines work in real time. The Dragon engine also transcribes recordings.
There are three challenges to using speech engines to automatically transcribe interviews.
First, automatic speech recognition technology is sensitive to recording quality, so recordings must be done with a good microphone in relative quiet. Even with good recording quality, state-of-the-art transcription technology makes mistakes. To get an accurate transcript, there must be a proofing step comparing the automatic transcript with the audio. Some mistakes, such as mixing up homophones like “pear”, “pair” and “pare”, are easy to catch and usually don’t propagate misinformation. Others, such as mixing up similar-sounding opposites like “can” and “can’t”, are larger problems. Even when automatic speech transcription is used just to search text, misrecognitions can result in false negatives and false positives, and you can inadvertently absorb something that’s been transcribed incorrectly just by reading it.
Second, speech engines often don’t mark paragraphs. Automatic punctuation, if it’s present, is fairly terrible. The wrong punctuation can change meaning. Even accurate transcriptions require work to add and/or correct punctuation and paragraphing.
Third, while web-based automatic transcription services have proliferated, they have limitations. The costs of these services range from free (YouTube) to as expensive as human transcriptionists, who charge $1 to $2 per minute. Transcriptions also tend to be more expensive if they include time codes. Like desktop systems, the web offerings are sensitive to recording quality. A major drawback of web-based services is that they are not absolutely secure.
Some of the web-based services offer interfaces that allow the user to correct and/or search an automatically generated transcript. But this requires that the recording continue to be stored on that company’s server, either temporarily or permanently.
We developed two automatic transcription best practices. Because there are times when automatic speech recognition might be good enough as a basis for transcription, and times when it is not, these are optional parts of the InSite workflow. The first option is based on Dragon, which doesn’t require cloud services, and the second option is based on the Speechmatics cloud service.
Both services are integrated into Audio Notetaker, which makes it possible to use automatic speech recognition without extra import/export steps to and from different tools. In both cases the automatic transcription will flow into sections following the way the reporter has organized the audio during or after the interview. This mitigates the automatic transcription paragraphing problem. Proofing an automatic transcription is still a lot of work and requires listening to the recording to catch some errors.
The integration also enables seamless switching back and forth between manual and automatic transcription so reporters can experiment with automatic transcription as its accuracy improves, and easily abandon it midstride if it’s not working well.
The two automatic transcription best practices are not open-source, however. We’re continuing to encourage developers to put these features into open source software that a community can adjust and develop. One technology we are watching closely, though it is not likely to be ready anytime soon, is Mozilla’s open-source speech-to-text project.
Transcription – better hardware
Several of the journalists we interviewed remembered using dedicated transcription cassette players with foot pedals. The foot controls freed the hands for typing.
The modern-day equivalent is a foot pedal peripheral that connects to a computer. Several on the market are aimed at transcription and/or gaming, but most of the journalists we interviewed were unaware of this option. Foot pedals that connect to a computer will work with any program that uses keyboard shortcuts. The pedals can be programmed to any key combination or string of keys.
We added this to our workflow as an incremental improvement to the manual transcription process.
One way to get more use out of interviews is to connect each sentence of text in the transcript to the corresponding audio or video, so journalists and audiences can look at both at once.
An accurate transcript linked to audio or video makes browsing media quicker by making the text a navigation tool for temporal media. It lends organization by providing places to mark and a structure for building a visual map. It enables searching audio and video using the words of the transcript. And it makes it easier to share the source material.
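To make the search point concrete, here is a minimal sketch of looking up a word in a linked transcript and getting back the times where it is spoken. The data shape (a list of text, start, end entries in seconds) and the function name are assumptions for illustration, not part of any tool described in this report.

```python
from typing import List, Tuple

Cue = Tuple[str, float, float]  # (sentence text, start seconds, end seconds); hypothetical shape


def find_in_transcript(cues: List[Cue], query: str) -> List[Tuple[float, str]]:
    """Return (start seconds, text) for each sentence containing the query, ignoring case."""
    needle = query.casefold()
    return [(start, text) for text, start, end in cues if needle in text.casefold()]


cues = [
    ("We met twice in March.", 12.0, 14.5),
    ("I can't confirm the second meeting.", 14.9, 17.2),
    ("The March meeting was on the record.", 17.6, 20.0),
]
hits = find_in_transcript(cues, "march")
```

Each hit carries a start time, so a player can jump straight to the moment in the audio or video where the sentence begins. A search like this is only as reliable as the transcript behind it, which is one reason proofing matters.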
There are four potential audiences for source interviews:
- The journalist who did the interview
- The journalist’s colleagues within the organization, including fellow reporters, editors and fact checkers
- Journalists at other organizations
- The public, especially those deeply interested in learning more about a subject
We found that the key to publishing a transcript connected to audio or video is WebVTT, an emerging standard for formatting a transcript that includes timecodes that link text to audio. A website reads WebVTT as code, but the format is also easy for people to read and work with.
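For readers unfamiliar with the format, the sketch below writes a small WebVTT file from sentence-level entries. It covers only the basic cue syntax (a `WEBVTT` header, then a timing line and text for each cue); the input shape and the function names are assumptions for illustration, and this is not the export code used in our workflow.

```python
from typing import List, Tuple

Cue = Tuple[str, float, float]  # (sentence text, start seconds, end seconds); hypothetical shape


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp: hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: List[Cue]) -> str:
    """Build a WebVTT document with one cue per sentence."""
    lines = ["WEBVTT", ""]  # required header followed by a blank line
    for text, start, end in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


vtt = to_webvtt([("We met twice.", 0.0, 1.0), ("Why?", 1.4, 1.8)])
print(vtt)
```

The printed result is plain text in which each sentence sits directly under its start and end times, which is why the format is as easy for a person to scan and correct as it is for a web player to parse.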
To export linked files in this standard format as the last part of our workflow, we needed a video player that would support it. The open source player Able Player provided this support. With it, we had close to what we needed to build a workflow and website that would support interactive transcripts.
We developed an open source WordPress template for publishing interactive transcripts that connected to Able Player and videos hosted on YouTube. We also had to add key functionalities that were missing from the WebVTT standard: ways to specify paragraphs and subheadings as we linked the transcript to media at the sentence level.
Our publishing system back-end enables content creators to add an interactive transcript to the site almost instantly by pointing to a video address and uploading a transcript and supporting content that can include links, documents, maps, images and videos. Content can be adjusted on the site and downloaded as text or WebVTT, creating a closed loop with the formatting system. This makes site adjustments and corrections efficient.
Because it’s based on a WordPress template, a basic interactive transcript site can be set up quickly. This makes it easier to host a closed site for a group of journalists, or even use a tool like Flywheel to host interactive transcripts locally on a Windows or Mac computer.
Kim Patch is Lead Researcher for the Rutherfurd Living History program. She’s a user interface expert, writer, editor, software developer and musician. firstname.lastname@example.org
Philip Bennett is the Eugene C. Patterson Professor of the Practice of Journalism and Public Policy at Duke and director of the Rutherfurd Living History Program. He is Special Projects Editor of the PBS documentary series FRONTLINE. email@example.com
David Graham, Rutherfurd Living History Research fellow and staff writer at The Atlantic, contributed reporting.
We’ve separately published a guide and a list of technologies to watch.
For a step-by-step guide to using InSite, see InSite: A Guide for Recording, Transcribing and Publishing Interviews.
For a summary of the promising technologies we are monitoring, see Technology to Watch.