How to Cite Data Sets in APA Format
Data sets form the empirical foundation of research across sciences, social sciences, and humanities. Whether you're citing publicly available research data, government statistics, survey data, or scientific measurements, understanding APA 7th edition data citation format ensures transparency, reproducibility, and proper credit for data creators while advancing open science principles.
Why Data Set Citations Matter
The movement toward open science and data sharing has made data sets increasingly important scholarly outputs. Researchers publish data sets in repositories, journals publish supplementary data, and government agencies release public data. These data resources enable replication studies, meta-analyses, and new research questions. Proper citation of data sets acknowledges the considerable effort required to collect, clean, and document data while allowing other researchers to access and build upon existing work.
APA 7th edition recognizes data as a citable scholarly product, with specific guidelines that emphasize findability through persistent identifiers like DOIs. Data citations should provide enough information for readers to locate and understand the data, including version numbers when datasets are updated. Understanding these principles helps you properly document data sources and contribute to the scientific community's data sharing infrastructure.
Basic Format for Data Set Citations
Data set with DOI:
Author, A. A. (Year). Title of data set (Version X) [Data set]. Publisher/Repository. https://doi.org/xx.xxxx
Data set without DOI:
Author, A. A. (Year). Title of data set [Data set]. Publisher/Repository. URL
Government statistical data:
Agency Name. (Year). Title of data/statistics [Data set]. URL
In-text Citation:
- Parenthetical: (Author, Year)
- Narrative: Author (Year)
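The in-text patterns above follow APA 7's author-count rules, which can be sketched as a small Python function. The function name and its inputs are hypothetical. It assumes you pass author surnames (or a single group-author name) in reference-list order, and it applies the standard rules: one or two authors are named every time, "&" joins two names inside parentheses while "and" joins them in running text, and three or more authors become "First author et al." It does not handle group-author abbreviations.

```python
from typing import Sequence, Tuple


def in_text_citations(surnames: Sequence[str], year: str) -> Tuple[str, str]:
    """Return (parenthetical, narrative) APA 7 in-text citations.

    surnames: author surnames, or one group-author name, in reference-list order.
    year: the year element exactly as in the reference, e.g. "2023" or "n.d.".
    """
    if not surnames:
        raise ValueError("at least one author is required")
    if len(surnames) == 1:
        # Single author or group author: name appears as-is
        paren_names = narr_names = surnames[0]
    elif len(surnames) == 2:
        # Two authors: "&" inside parentheses, "and" in running text
        paren_names = f"{surnames[0]} & {surnames[1]}"
        narr_names = f"{surnames[0]} and {surnames[1]}"
    else:
        # Three or more authors: first surname plus "et al." from the first citation on
        paren_names = narr_names = f"{surnames[0]} et al."
    return f"({paren_names}, {year})", f"{narr_names} ({year})"
```

For a three-author data set published in 2023 by Martinez, Chen, and Williams, in_text_citations(["Martinez", "Chen", "Williams"], "2023") gives "(Martinez et al., 2023)" and "Martinez et al. (2023)".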
Step-by-Step Instructions
Step 1: Identify the Author or Creator
Data set authors are the individuals or organizations who created, collected, or compiled the data. List all authors up to 20, following standard APA author formatting. For organizational or collaborative projects, use the organization name. For government data, use the specific agency that produced the data (e.g., U.S. Census Bureau, not just "U.S. Government").
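The author rule in this step can be sketched in Python. The function name is hypothetical, and it assumes each name is already inverted ("Surname, A. A.") or is a group name. Besides listing up to 20 authors with "&" before the last, it applies APA 7's rule for longer lists, which this step does not spell out: with 21 or more authors, give the first 19, then an ellipsis, then the final author, with no "&".

```python
from typing import Sequence


def format_reference_authors(authors: Sequence[str]) -> str:
    """Join inverted author names into the author element of an APA 7 reference."""
    names = list(authors)
    if not names:
        raise ValueError("at least one author is required")
    if len(names) == 1:
        # One person or one organization: use the name unchanged
        return names[0]
    if len(names) <= 20:
        # 2 to 20 authors: list all, with a comma and "&" before the final name
        return ", ".join(names[:-1]) + ", & " + names[-1]
    # 21 or more authors: first 19, an ellipsis, then the final author (no "&")
    return ", ".join(names[:19]) + ", . . . " + names[-1]
```

Note that APA keeps the comma before "&" even with only two authors, as in "Johnson, A. M., & Thompson, R. K."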
Step 2: Determine the Publication Year
Use the year the data set was published or made publicly available, not necessarily when the data was collected. For regularly updated data sets, use the year of the specific version you accessed. If the data is continuously updated and has no fixed publication date, use "n.d." in place of the year and include a retrieval date.
Step 3: Format the Data Set Title
Italicize the complete title of the data set and use sentence case. If the data set has a formal title, use it. If not, create a descriptive title that clearly identifies the data. Include version numbers in parentheses after the title if the data has versions.
Step 4: Specify [Data set]
After the title and version, include [Data set] in square brackets. This descriptor clarifies that you're citing data, not a publication about the data. For specific data types, you might use [Database], [Statistical data], or other descriptors.
Step 5: Include Repository or Publisher
Name the data repository, archive, or publisher where the data is housed. Common repositories include figshare, Dryad, Inter-university Consortium for Political and Social Research (ICPSR), Dataverse, or institutional repositories. For government data, the agency may serve as both author and publisher.
Step 6: Add DOI or URL
Include the DOI if available (format: https://doi.org/xx.xxxx). DOIs provide persistent access to data sets. If no DOI exists, include the direct URL to the data set's landing page. For regularly updated data, include a retrieval date before the URL.
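The six steps can be combined into one small Python function that assembles a plain-text reference entry. This is a sketch under stated assumptions, not an official formatter: the function and parameter names are hypothetical, the author element is passed in already formatted, and because plain text cannot carry italics the title comes out unstyled (in your paper it would be italicized). Dropping the publisher when it repeats the author reflects the government-data case from Step 5, where the agency serves as both.

```python
from datetime import date
from typing import Optional

# English month names, spelled out independently of the system locale
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def format_data_set_reference(
    author: str,
    year: Optional[int],
    title: str,
    locator: str,
    version: Optional[str] = None,
    publisher: Optional[str] = None,
    retrieved: Optional[date] = None,
    descriptor: str = "Data set",
) -> str:
    """Assemble a plain-text APA 7 reference entry for a data set.

    author: finished author element, e.g. "Martinez, S. R., Chen, L., & Williams, K."
    year: publication year, or None for undated, continuously updated data ("n.d.").
    locator: the DOI (as https://doi.org/...) or the landing-page URL.
    """
    # Step 1: the author element ends with a period (trailing initials already supply one)
    parts = [author if author.endswith(".") else author + "."]
    # Step 2: year of the version used, or "n.d."
    parts.append(f"({year})." if year is not None else "(n.d.).")
    # Steps 3-4: title (italic in print, plain here), optional version, bracketed descriptor
    title_element = title
    if version:
        title_element += f" (Version {version})"
    parts.append(f"{title_element} [{descriptor}].")
    # Step 5: repository or publisher, left out when it repeats the author
    if publisher and publisher != author.rstrip("."):
        parts.append(publisher + ".")
    # Step 6: retrieval date only for dynamic data, then the DOI or URL
    if retrieved is not None:
        stamp = f"{MONTHS[retrieved.month - 1]} {retrieved.day}, {retrieved.year}"
        parts.append(f"Retrieved {stamp}, from {locator}")
    else:
        parts.append(locator)
    return " ".join(parts)
```

Fed the fields of the figshare and World Bank examples in this guide, the function reproduces those entries character for character, apart from the italics.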
Detailed Examples
Example 1: Research Data Set with DOI
Reference list:
Martinez, S. R., Chen, L., & Williams, K. (2023). Global climate measurements 2000-2022: Temperature and precipitation data (Version 2.1) [Data set]. figshare. https://doi.org/10.6084/m9.figshare.1234567
In-text citation:
(Martinez et al., 2023)
Standard research data citation with multiple authors, version number, repository, and DOI.
Example 2: Government Statistical Data
Reference list:
U.S. Census Bureau. (2020). American Community Survey 5-year estimates [Data set]. https://data.census.gov/cedsci/table?tid=ACSDP5Y2020.DP05
In-text citation:
(U.S. Census Bureau, 2020)
Government statistical data cites the agency as author. Because the agency is also the publisher, the publisher element is omitted, leaving the data set name and access URL.
Example 3: Survey Data from Repository
Reference list:
Pew Research Center. (2024). American trends panel wave 128: January 2024 survey data [Data set]. Roper Center for Public Opinion Research. https://ropercenter.cornell.edu/data-set/pew-american-trends-panel-w128
In-text citation:
(Pew Research Center, 2024)
For survey data from a research organization, cite the organization as author and the repository where the data is archived as publisher.
Example 4: Scientific Measurement Data
Reference list:
National Oceanic and Atmospheric Administration. (2023). Sea surface temperature anomalies: Global ocean data [Data set]. National Centers for Environmental Information. https://www.ncei.noaa.gov/data/sea-surface-temperature-anomalies/
In-text citation:
(National Oceanic and Atmospheric Administration, 2023)
For scientific measurement data from a government agency, cite the producing agency as author and the data center that distributes the data as publisher.
Example 5: Genomic or Biological Data
Reference list:
Johnson, A. M., & Thompson, R. K. (2024). RNA-seq data for breast cancer cell lines (Version 1.0) [Data set]. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123456
In-text citation:
(Johnson & Thompson, 2024)
Biological data sets name the repository (GEO, GenBank, etc.) as publisher, and the accession number (here, GSE123456) appears in the URL.
Example 6: Continuously Updated Data
Reference list:
World Bank. (n.d.). World development indicators: GDP growth annual % [Data set]. Retrieved January 15, 2025, from https://databank.worldbank.org/source/world-development-indicators
In-text citation:
(World Bank, n.d.)
For continuously updated data, use "n.d." and include a retrieval date before the URL.
Common Mistakes to Avoid
1. Citing Analysis Papers Instead of the Data
If you used the actual data set, cite the data set itself, not just papers that describe or analyze the data. Both citations may be appropriate if you reference both the data and analytical methods.
2. Omitting Version Numbers
Data sets are often updated. Include the version number whenever the repository provides one, so readers can access the same data you used. Write it in parentheses after the title as (Version 3.1), even if the repository labels it "v3.1".
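Repositories label versions inconsistently ("v3.1", "Version 2.0", "ver. 4"), while the reference entry uses the "(Version X)" form from the basic format. A minimal Python sketch that normalizes such labels is below. The function name is hypothetical, and it assumes purely numeric, dot-separated version numbers; a label such as "2.1-beta" is rejected rather than guessed at, so you can copy it by hand.

```python
import re

# Optional "version"/"ver."/"v" prefix, then a dotted numeric version such as 3 or 1.0.2
_VERSION_LABEL = re.compile(r"\s*(?:version|ver\.?|v)?\s*(\d+(?:\.\d+)*)\s*", re.IGNORECASE)


def apa_version(label: str) -> str:
    """Turn a repository version label like "v3.1" into the APA element "(Version 3.1)"."""
    match = _VERSION_LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized version label: {label!r}")
    return f"(Version {match.group(1)})"
```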
3. Forgetting [Data set] Descriptor
Always include [Data set] or similar descriptor in brackets after the title. This clarifies that you're citing data, not a publication.
4. Not Including DOIs When Available
DOIs provide persistent access to data sets. Always use a DOI when the repository provides one, even if you also have a direct URL.
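Repositories and older papers show DOIs in several shapes: "doi:10.…", "http://dx.doi.org/10.…", or a bare "10.…" string. The Python sketch below rewrites them into the https://doi.org/ form used throughout this guide. The function name is hypothetical, and the validity check is only a loose test for the common "10.NNNN/suffix" shape, not full DOI validation; it raises an error for a plain landing-page URL instead of silently passing it through.

```python
import re

# Prefixes that wrap a DOI name: resolver URLs (old or current host) or a "doi:" label
_DOI_PREFIXES = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Loose shape check: "10." + 4-9 digit registrant code + "/" + non-empty suffix
_DOI_SHAPE = re.compile(r"10\.\d{4,9}/\S+$")


def normalize_doi(raw: str) -> str:
    """Return a DOI in the https://doi.org/ form used in APA 7 references."""
    doi = _DOI_PREFIXES.sub("", raw.strip())
    if not _DOI_SHAPE.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return "https://doi.org/" + doi
```

For example, normalize_doi("doi:10.6084/m9.figshare.1234567") returns "https://doi.org/10.6084/m9.figshare.1234567".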
5. Using Generic Database Names
Be specific about the repository or database. Use "figshare," not "online database." Use "ICPSR," not "data archive."
6. Incomplete Government Data Attribution
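For government data, use the most specific agency that produced the data (e.g., Bureau of Labor Statistics) as author, rather than "U.S. Government" or the parent department, such as the Department of Labor. If a parent agency is named on the source, list it as the publisher.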
7. Not Including Retrieval Dates for Dynamic Data
For data that changes over time (like stock prices, weather data, or regularly updated statistics), include the retrieval date to document when you accessed the specific values.
Quick Reference Guide
Essential Elements for Data Set Citations:
- Author(s) or organization that created/collected the data
- Publication year (or "n.d." for continuously updated data)
- Data set title in italics and sentence case
- Version number in parentheses (if applicable)
- [Data set] or similar descriptor in brackets
- Repository or publisher name
- Retrieval date (for dynamic data only)
- DOI or URL
Data Type Descriptors
- [Data set]: General research data
- [Database]: Structured database with multiple tables
- [Statistical data]: Statistical datasets
- [Survey data]: Survey responses and results
- [Genomic data]: DNA, RNA, protein sequences
- [Geospatial data]: Geographic or spatial data
- [Time series data]: Temporal measurements
- [Image data]: Collections of images for analysis
Common Data Repositories
- figshare: General research data repository
- Dryad: Scientific and medical research data
- Dataverse: Social science and interdisciplinary data
- ICPSR: Social and behavioral science data
- GenBank/GEO: Genomic and biological data
- Zenodo: General-purpose research repository operated by CERN
- OSF: Open Science Framework data sharing
- GitHub: Code and data version control
Data Citation Best Practices
Using Data You Collected
If you collected original data for your research and have not published it, you don't cite it in the reference list. Instead, describe your data collection methods in your methods section. Consider depositing your data in a repository for others to cite; once it is deposited, cite it like any other published data set.
Citing Subsets of Larger Datasets
When using a portion of a large dataset, cite the complete dataset and explain in your text which subset you used. Don't create a separate citation for the subset unless it has its own DOI.
Multiple Versions of the Same Data
If you used multiple versions of a dataset, cite each version separately with its specific version number and year, adding a retrieval date only if the data changes over time without archived versions.
Proprietary or Restricted Data
For data with access restrictions, cite it normally but note access restrictions in your text. Explain how readers can request access if applicable.
Finding Data Set Information
Locate citation details from these sources:
- Data repository landing page: Title, authors, version, DOI
- README files: Often include proper citation format
- Dataset documentation: Metadata about data collection and version
- "Cite this dataset" button: Many repositories provide formatted citations
- Data papers: Some datasets have companion publications
- Repository metadata: Structured information about the dataset
Supplementary Data vs. Main Data Sets
Supplementary Data from Journal Articles
If using supplementary data files from a journal article, cite the article and add the description [Supplemental material] in square brackets after the article title:
Smith, J. (2024). Climate patterns analysis [Supplemental material]. Journal of Climate, 37(2), 123–145. https://doi.org/10.xxxx
Standalone Data Sets
Independent data sets with their own DOIs and repository entries receive separate citations from any related publications.
Frequently Asked Questions
Do I cite the data set or the paper that describes it?
If you used the actual data, cite the data set. If you're discussing the research findings or methodology, cite the paper. Often you'll cite both: the paper for context and methods, and the data set because you reanalyzed it.
How do I cite data I downloaded from a government website?
Cite the government agency as the author, include the data set name, mark it as [Data set], and provide the URL. Include version or update information if available.
What if the data set doesn't have a formal title?
Create a descriptive title that clearly identifies the data, using italics and sentence case. For example: Daily temperature measurements from weather stations in California, 2010-2020.
Should I include the file format (.csv, .xlsx)?
No. The [Data set] descriptor is sufficient. File formats are technical details that don't belong in citations.
How do I cite data from multiple years of the same survey?
Cite each year separately if they're separate data releases. If you combined multiple years, explain your approach in your methods and cite the series or each year used.
What if I can't find all the citation information?
Include as much information as you can find. At minimum, provide the data creator or source, the date (or "n.d." if none is given), a descriptive title, and the access URL.