How do I get data if my country doesn’t publish any?


In many countries public data is limited: access is restricted, or the information provided by the authorities is not credible. So how do you obtain data for a story? Here are some techniques used by reporters around the world.

Look outside your country for data about your country

We live in an interconnected, digitised world, so if a government’s data is restricted or unreliable, there are plenty of other places to look — and the first place is outside of the country itself.

Many other countries and international bodies collect data about what’s going on in a particular nation. The World Bank, for example, publishes data covering multiple countries and regions, and the UK’s Department for International Development (since merged into the Foreign, Commonwealth & Development Office) is just one of many government departments around the world that publish data on their spending in other countries.
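As an illustration of pulling figures about your country from an outside source, here is a minimal Python sketch for the World Bank's public indicators API. The country code `NG` and the indicator code `SP.POP.TOTL` (total population) are only examples chosen for this sketch, the helper names are hypothetical, and the parsing assumes an annual indicator returned in the API's usual two-part JSON layout (metadata first, rows second).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.worldbank.org/v2"


def build_url(country, indicator, start, end):
    """Build the request URL for one country, one indicator and a year range."""
    query = urlencode(
        {"format": "json", "date": f"{start}:{end}", "per_page": 1000},
        safe=":",  # keep the colon in the year range readable
    )
    return f"{API_ROOT}/country/{country}/indicator/{indicator}?{query}"


def parse_series(payload):
    """Turn the decoded JSON response into {year: value}, skipping empty years.

    Assumes the payload is a two-item list: metadata, then a list of rows.
    Anything else (for example an error message) gives an empty dict.
    """
    if not isinstance(payload, list) or len(payload) < 2 or not payload[1]:
        return {}
    return {
        int(row["date"]): row["value"]
        for row in payload[1]
        if row["value"] is not None
    }


def fetch_series(country, indicator, start, end):
    """Download and parse one series (needs an internet connection)."""
    with urlopen(build_url(country, indicator, start, end), timeout=30) as response:
        return parse_series(json.load(response))
```

Calling `fetch_series("NG", "SP.POP.TOTL", 2015, 2020)` should then give a dictionary of yearly values that can be pasted into a spreadsheet or compared against official national figures.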

You can also use other countries’ freedom of information (FOI) acts to request data and documents about their interactions with your own country. The GIJN’s list of countries where foreigners can make FOI requests is a good starting point, and Emilia Díaz-Struck provides some useful tips on requesting records in another country.

Drafting an FOI request for another country is also a task that generative AI can be quite useful for, as long as you specify which country you’re asking for advice on and ‘teach’ it what you already know about the FOI regime.

Investigating US Government — and Trump Administration — Influence in Your Country

Data in plain sight: scraping data

A second technique is scraping: this is the process of automating the gathering of information published on the web and putting it into a structured data format.

For example, you could scrape the job ads in a particular sector to get an idea of wages or discriminatory practices, or scrape judgements, announcements or media reports to get a sense of what is happening and how often.
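To make the idea of scraping concrete, here is a small Python sketch that turns a page of job ads into rows of structured data using only the standard library. The HTML snippet, its class names (`job`, `title`, `salary`) and the helper names are all invented for illustration; a real site will have its own markup, and you would first download the page rather than use a hard-coded string.

```python
import csv
import io
from html.parser import HTMLParser

# Invented sample markup standing in for a downloaded listings page.
SAMPLE_PAGE = """
<ul>
  <li class="job"><span class="title">Nurse</span><span class="salary">1,200</span></li>
  <li class="job"><span class="title">Driver</span><span class="salary">850</span></li>
</ul>
"""


class JobAdParser(HTMLParser):
    """Collect the title and salary text from each <li class="job">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # which field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "job":
            self.rows.append({})  # start a new record
        elif tag == "span" and css_class in ("title", "salary") and self.rows:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            row = self.rows[-1]
            row[self._field] = row.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_jobs(html):
    """Return a list of {'title': str, 'salary': int or None} records."""
    parser = JobAdParser()
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        row["title"] = row.get("title", "").strip()
        digits = "".join(ch for ch in row.get("salary", "") if ch.isdigit())
        row["salary"] = int(digits) if digits else None
    return parser.rows


def to_csv(rows):
    """Write the records as CSV text, ready to open in a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "salary"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


jobs = scrape_jobs(SAMPLE_PAGE)
```

Running the same parser over many pages, and saving each run with the date, is what turns a one-off look at a website into a dataset you can analyse over time.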

Compiling data for stories

A third technique is compiling data manually: this might mean going through annual reports and entering the same piece of data from each into a spreadsheet, or building a dataset from media reports.

Tips for Investigating Hate Crimes and Violence When Government Data Sources Fail

The Bureau of Investigative Journalism project Dying Homeless used this approach, for example, because there was no official record of how many people were dying while homeless. And Rachel Chitra used this approach for her project on hate crime.

Collaboration can be key to realising a successful data compilation project. It might be partnering with a research project, as Cerosetenta did with Forensic Architecture (and Bellingcat) to map police violence, or partnering with local residents, as ProPublica did with its Black Snow investigation into Florida air quality.

It might be collaboration with teams in other parts of the country, as in this US investigation into “a generation of Black men lost to overdoses”, or with teams in other countries, as the ICIJ has with its Pandora Papers and medical devices projects. It might be collaboration with journalism schools, to access extra manpower.

Partnering with polling organisations or groups that can survey their members, such as professional bodies, industry groups and unions, is another approach that news organisations and reporters use to compile data.

The Collaborative Data Journalism Guide provides more tips on collaboration, and tools like ProPublica’s Collaborate and the ICIJ’s Datashare have been created to try to make collaboration easier.

Turn text documents into data

There may be ‘public data’ that doesn’t look like data. Text documents such as speeches, policies and press releases, for example, can be treated as data.

You might analyse the frequency with which certain terms are used, how much that has changed over time, and so on.
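A rough Python sketch of that kind of term counting is below. The three short "speeches" and their years are made up for illustration, and the function name is hypothetical; in practice you would load the text of real documents along with their dates.

```python
import re
from collections import Counter, defaultdict

# Invented sample documents: (year, text) pairs.
DOCUMENTS = [
    (2019, "Security is our priority. Security forces will be expanded."),
    (2019, "We will invest in schools."),
    (2023, "Climate adaptation is our priority, alongside security."),
]


def term_counts_by_year(documents, terms):
    """Count how often each chosen term appears in each year's documents."""
    wanted = {term.lower() for term in terms}
    counts = defaultdict(Counter)
    for year, text in documents:
        # Split the lower-cased text into words, ignoring punctuation.
        words = re.findall(r"\w+", text.lower())
        counts[year].update(word for word in words if word in wanted)
    return {year: dict(counter) for year, counter in sorted(counts.items())}


result = term_counts_by_year(DOCUMENTS, ["security", "climate"])
```

Here `result` shows "security" appearing twice in the 2019 documents and once in 2023, with "climate" appearing only in 2023. With real documents it is usually worth dividing by the total number of words per year, so that a year with more speeches does not look like a year with more emphasis.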

Old-fashioned AI techniques like machine learning can be used to classify each document, identify changing sentiment, or reveal clusters of themes in the text (topic modelling).

The AI tool Google Pinpoint, for example, will extract entities such as people, places and organisations from a document set, and tell you how many documents contain each one.

And generative AI tools are making text and document analysis easier too: Swedish journalist Inas Hamdan used ChatGPT to summarise coverage of Sweden on Al Jazeera as part of her investigation, for example.

Leaks and threat modelling

Leaks are another potential source. They come with clear risks, not only to the journalist but to the source as well, so it’s important to assess those risks carefully before considering this route.

Threat modelling can be a very useful skill in assessing and addressing risks more generally. For example, if you’re scraping, you may want to make sure that your scraper’s IP address is not connected to you or your organisation.

Why every journalist should have a threat model (with cats)

Report on the lack of data

Finally, remember that the lack of data, or poor data, can be the story itself. So if nothing else is practical, look for people affected by the lack of data, those campaigning for change, or those working around the problem. Any of these could be the basis for a feature story exploring the issue.

Examples include the Marshall Project reporting that the FBI’s data was “too unreliable to tell” what crime was like the previous year; concerns that wildlife was “disappearing in the dark” because data had not been collected on the issue for some time; and Mapping Makoko, a project in Lagos where “putting themselves on the map was a way for the community to state their right to exist”.

Open data advocacy is one of the techniques listed in the research paper Scrape, Request, Collect, Repeat: How Data Journalists Around the World Transcend Obstacles to Public Data, which provides further insights into tactics used to tackle a lack of data.

Published on March 25, 2025 02:14