Times Insider explains who we are and what we do, and delivers behind-the-scenes insights into how our journalism comes together.
Data journalism is not new. It predates our biggest investigations of the last few decades. It predates computers. Indeed, reporters have used data to hold power to account for centuries, as a data-driven investigation that uncovered overspending by politicians, including then-congressman Abraham Lincoln, attests.
But the vast amount of data available now is new. The federal government’s data repository contains nearly 250,000 public datasets. New York City’s data portal contains more than 2,500. Millions more are collected by companies, tracked by think tanks and academics, and obtained by reporters through Freedom of Information Act requests (though not always without a battle). No matter where they come from, these datasets are generally better organized than ever before and easier for our reporters to analyze. At the same time, they are more available to our sources, and the sheer proliferation of accessible data can lead politicians, companies and government officials to misinterpret it, or to use it without proper context to back their own agendas.
So while The Times has the best data experts, investigative editors and graphics professionals in the business, our news reporters are increasingly choosing to level up their own data skills so they can find stories hidden in the numbers, organize their reporting and check government conclusions. The demand for this knowledge has been so great that our digital transition team now runs a training program to help reporters work on these skills. And more groundbreaking articles are coming.
[Read more about how the data training was developed and download the training materials.]
Below, five reporters from across our news desks describe how they have used data in their reporting. (Hint: It’s not always displayed front and center in a splashy graphic; data is now seamlessly woven into almost everything we do.)
Karen Zraick, breaking news reporter
In November, a Brooklyn councilman posted a message on Facebook that left me scratching my head. He wrote that small businesses were suddenly removing their signs, amid a panic about the city issuing fines to stores that lacked permits for their signs and awnings. The post immediately attracted hundreds of likes and comments, many from immigrant store owners who were up in arms.
The city’s Buildings Department said it was merely responding to a sudden spike in 311 complaints about store signs. But who complains about store signs? These weren’t safety complaints — someone was reporting that the merchants lacked the right permits, which you could discover only through a very complicated process on the city’s website. This seemed to be someone with an agenda. Some locals suspected a sign company was behind it.
But it was hard to get a sense of the scale of the problem just by collecting anecdotes. So I turned to NYC Open Data, a vast trove of information that includes records about 311 complaints. By sorting and calculating the data, we learned that many of the calls were targeting stores in just a few Brooklyn neighborhoods. On one busy avenue, 25 stores in a two-block stretch had received complaints, which the city was required to investigate. If a violation was discovered — even if it was only related to missing paperwork — the minimum fine was $6,000.
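For readers curious what "sorting and calculating" can look like in practice, here is a minimal sketch in Python. It is not the analysis The Times ran. The sample rows, the column names (`complaint_type`, `neighborhood`, `street`) and the complaint label are all hypothetical stand-ins, and a real NYC Open Data export would be laid out differently. The idea is simply to filter to one kind of complaint and count how many fall in each neighborhood.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. A real 311 export from NYC Open Data would use
# different column names and contain many more fields and records.
SAMPLE_CSV = """complaint_type,neighborhood,street
Illegal Sign,Neighborhood A,Avenue X
Illegal Sign,Neighborhood A,Avenue X
Illegal Sign,Neighborhood B,Avenue Y
Noise,Neighborhood C,Avenue Z
Illegal Sign,Neighborhood A,Avenue W
"""


def count_complaints(csv_text, complaint_type):
    """Count complaints of one type per neighborhood, most frequent first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(
        row["neighborhood"]
        for row in reader
        if row["complaint_type"] == complaint_type
    )
    # most_common() sorts by count, descending.
    return counts.most_common()


sign_counts = count_complaints(SAMPLE_CSV, "Illegal Sign")
for neighborhood, total in sign_counts:
    print(f"{neighborhood}: {total}")
```

A ranked tally like this is what makes a cluster visible: if a handful of neighborhoods account for most of the complaints, they rise to the top of the list.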
The data allowed us to zero in on the hardest-hit areas and made our report much more comprehensive. A month after the article was published, the City Council passed a two-year moratorium on new violations and created an interagency task force to address past fines. The law also required the city to provide more training to help small business owners navigate the permit process, and to try to figure out who is behind the 311 calls.
That was the one element missing from the data we got. Because of privacy rules, a 311 caller’s identity is hidden from the public database. We have submitted a Freedom of Information Act request for that information, and await the city’s response.
John Ismay, At War reporter
When I became a reporter, I thought I’d never use a spreadsheet again.
I had used them when I was in the Navy and when I worked for a defense contractor afterward, but I often screwed them up. So I avoided them whenever I could.
As a journalist, I did the same. At first.
Then I started taking on projects that, in time, became too unwieldy to handle with paper printouts, manila folders and web browser bookmarks. I had to find some way to manage tons of information.
Now I have multiple spreadsheets for almost every article I work on.
Earlier this year, I read through an unredacted investigation report I obtained about a friendly fire incident in Vietnam. Spreadsheets helped me organize all the characters involved and the timeline of what happened as the situation spun out of control 50 years ago. I also used them to save all the relevant location data, which I later used in Google Earth to analyze the terrain. That allowed me to ask more informed questions of the survivors. This year I’ve learned even more skills that help me quickly find story lines in sprawling databases and be confident in my analysis.
Eliza Shapiro, education reporter for Metro
After I found out in March that only seven black students won seats at Stuyvesant, New York City’s most elite public high school, I kept coming back to one big question: How did this happen? I had a vague sense that the city’s so-called specialized schools once looked more like the rest of the city school system, which is mostly black and Hispanic.
With my colleague K.K. Rebecca Lai from The Times’s graphics department, I started to dig into a huge spreadsheet that listed the racial breakdown of each of the specialized schools dating to the mid-1970s.
We quickly realized that the schools had lost nearly all their black and Hispanic students over the last decade in particular, and we were determined to figure out why.
We analyzed changes in the city’s immigration patterns to better understand why some immigrant groups were overrepresented at the schools and others were underrepresented. We mapped out where the city’s accelerated academic programs are, and found that mostly black and Hispanic neighborhoods have lost them. And we tracked the rise of the local test preparation industry, which has exploded in part to meet the demand of parents eager to prepare their children for the specialized schools’ entrance exam.
To put a human face on the data points we gathered, I collected yearbooks from black and Hispanic alumni and spent hours on the phone with them, listening to their recollections of the schools from the 1970s through the 1990s. The final result was a data-driven article that combined Rebecca’s remarkable graphics, yearbook photos and alumni reflections.
Reed Abelson, Health and Science reporter
In covering health care, I’ve discovered that many of the most compelling stories take powerful anecdotes about patients and pair them with eye-opening data. Over the last 15 years, data has come to play an increasingly important role in my articles because there is so much more information available about hospitals, health insurers and doctors to analyze — from the rampant increases in insurance premiums to the comparative burdens of rising health care costs for employers and individuals to the various ways in which mergers have transformed the industry.
In a recent article, I used data from researchers at the University of California, Berkeley, to show how hospital mergers had helped lead to higher prices in various communities. And I created my own spreadsheet to look closely at the experience in a single state.
Being comfortable with data and spreadsheets allows me to ask better questions about researchers’ studies.
Spreadsheets also provide a way of organizing sources, articles and research, as well as creating a timeline of events. By putting information in a spreadsheet, you can quickly access it, and share it with other reporters.
Maggie Astor, Politics reporter
As a political reporter dealing with more than 20 presidential candidates, I use spreadsheets to track polling, fund-raising, policy positions and so much more. Without them, there’s just no way I could stay on top of such a huge field.
One of my tasks lately has been keeping track of who has qualified for the first Democratic debates. I have a big spreadsheet that includes every relevant poll and the percentage of support it shows for each candidate. (Along with my colleagues Matt Stevens and Denise Lu, I used that sheet to put together a piece in April on who had qualified so far.) I set up conditional formatting to highlight numbers that are 1 percent or higher, which is the Democratic National Committee’s current threshold. That lets me see at a glance which candidates are falling short.
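The conditional formatting rule described above boils down to a single comparison applied to every cell. Here is a minimal sketch of the same idea in Python, not the spreadsheet itself. The candidate names and poll numbers are invented for illustration, and the sketch checks only the 1 percent figure mentioned here, not any other qualification rule.

```python
THRESHOLD = 1.0  # percent; the threshold figure mentioned in the article

# Hypothetical poll results: candidate -> support (percent) in each poll.
polls = {
    "Candidate A": [2.0, 0.0, 1.0],
    "Candidate B": [0.0, 0.5, 0.0],
}


def flag_results(results, threshold=THRESHOLD):
    """Pair each poll number with True if it is at or above the threshold."""
    return {
        candidate: [(value, value >= threshold) for value in values]
        for candidate, values in results.items()
    }


flags = flag_results(polls)
for candidate, marked in flags.items():
    # An asterisk stands in for the highlighted cell in a spreadsheet.
    cells = ", ".join(f"{value}%{'*' if hit else ''}" for value, hit in marked)
    print(f"{candidate}: {cells}")
```

A candidate whose row shows no asterisks is the equivalent of a row with no highlighted cells: one that is falling short.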
The climate reporter Lisa Friedman and I used another spreadsheet to track the candidates’ positions on several climate policies. I used yet another one for a fun piece last month in which the graphics editor Alicia Parlapiano and I looked at the previous political experience of every president.
I’m using an absolutely enormous one right now — dozens of tabs, tens of thousands of rows, “if” statements nested 15 deep — for a piece on gun politics that I’m hoping to publish later this month.