How to Cite Data Sets in APA Format

Data sets form the empirical foundation of research across the sciences, social sciences, and humanities. Whether you're citing publicly available research data, government statistics, survey data, or scientific measurements, following the APA 7th edition data citation format makes your sources transparent, supports reproducibility, and gives proper credit to data creators, in keeping with open science principles.

Why Data Set Citations Matter

The movement toward open science and data sharing has made data sets increasingly important scholarly outputs. Researchers publish data sets in repositories, journals publish supplementary data, and government agencies release public data. These data resources enable replication studies, meta-analyses, and new research questions. Proper citation of data sets acknowledges the considerable effort required to collect, clean, and document data while allowing other researchers to access and build upon existing work.

APA 7th edition recognizes data as a citable scholarly product, with specific guidelines that emphasize findability through persistent identifiers like DOIs. Data citations should provide enough information for readers to locate and understand the data, including version numbers when datasets are updated. Understanding these principles helps you properly document data sources and contribute to the scientific community's data sharing infrastructure.

Basic Format for Data Set Citations

Data set with DOI:

Author, A. A. (Year). Title of data set (Version X) [Data set]. Publisher/Repository. https://doi.org/xx.xxxx

Data set without DOI:

Author, A. A. (Year). Title of data set [Data set]. Publisher/Repository. URL

Government statistical data:

Agency Name. (Year). Title of data/statistics [Data set]. URL

In-text Citation:

  • Parenthetical: (Author, Year)
  • Narrative: Author (Year)
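The in-text patterns above can be sketched in Python. This is a minimal illustration, not an official tool: the function name `in_text_citation` and its parameters are hypothetical, and it applies the standard APA 7 rules for one author, two authors, and three or more authors ("et al.").

```python
def in_text_citation(surnames, year, narrative=False):
    """Build an APA 7 in-text citation.

    surnames: list of author surnames, or a single-item list holding an
              organization name such as "U.S. Census Bureau".
    year:     publication year, or the string "n.d." when there is no date.
    """
    if not surnames:
        raise ValueError("at least one author or organization is required")
    if len(surnames) == 1:
        names = surnames[0]
    elif len(surnames) == 2:
        # Parenthetical citations use "&"; narrative citations spell out "and".
        joiner = " and " if narrative else " & "
        names = joiner.join(surnames)
    else:
        # Three or more authors: first surname plus "et al."
        names = f"{surnames[0]} et al."
    return f"{names} ({year})" if narrative else f"({names}, {year})"


print(in_text_citation(["Martinez", "Chen", "Williams"], 2023))  # (Martinez et al., 2023)
print(in_text_citation(["Johnson", "Thompson"], 2024))           # (Johnson & Thompson, 2024)
print(in_text_citation(["World Bank"], "n.d.", narrative=True))  # World Bank (n.d.)
```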

Step-by-Step Instructions

Step 1: Identify the Author or Creator

Data set authors are the individuals or organizations who created, collected, or compiled the data. List all authors up to 20, following standard APA author formatting. For organizational or collaborative projects, use the organization name. For government data, use the specific agency that produced the data (e.g., U.S. Census Bureau, not just "U.S. Government").
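As a rough sketch of the author element, the hypothetical helper below joins names that are already in inverted APA form ("Martinez, S. R."). It assumes the APA 7 rule for long author lists: up to 20 authors are all listed, while for 21 or more the first 19 are followed by an ellipsis and the final author, with no ampersand.

```python
def format_author_list(authors):
    """Join inverted author names into an APA 7 reference-list author element."""
    count = len(authors)
    if count == 0:
        raise ValueError("at least one author or organization is required")
    if count == 1:
        return authors[0]
    if count <= 20:
        # All authors listed, with ", &" before the final name.
        return ", ".join(authors[:-1]) + ", & " + authors[-1]
    # 21 or more: first 19 names, an ellipsis, then the final name (no ampersand).
    return ", ".join(authors[:19]) + ", . . . " + authors[-1]


print(format_author_list(["Martinez, S. R.", "Chen, L.", "Williams, K."]))
# Martinez, S. R., Chen, L., & Williams, K.
```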

Step 2: Determine the Publication Year

Use the year the data set was published or made publicly available, not necessarily when the data was collected. For regularly updated data sets, use the year of the specific version you accessed. If the data is continuously updated, include a retrieval date.

Step 3: Format the Data Set Title

Italicize the title of the data set and use sentence case. If the data set has a formal title, use it. If it does not, APA 7 calls for a description of the data in square brackets, without italics, in place of the title. If the data has versions, put the version number in parentheses after the title, in plain (non-italic) type.

Step 4: Specify [Data set]

After the title and version, include [Data set] in square brackets. This descriptor clarifies that you're citing data, not a publication about the data. For specific data types, you might use [Database], [Statistical data], or other descriptors.

Step 5: Include Repository or Publisher

Name the data repository, archive, or publisher where the data is housed. Common repositories include figshare, Dryad, Inter-university Consortium for Political and Social Research (ICPSR), Dataverse, or institutional repositories. For government data, the agency may serve as both author and publisher.

Step 6: Add DOI or URL

Include the DOI if available (format: https://doi.org/xx.xxxx). DOIs provide persistent access to data sets. If no DOI exists, include the direct URL to the data set's landing page. For regularly updated data, include a retrieval date before the URL.
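Repositories display DOIs in several forms (a bare "10.xxxx/..." string, a "doi:" prefix, or an older dx.doi.org link). The sketch below normalizes them to the https://doi.org/ form used in the reference. The function name `doi_to_url` is hypothetical, and the pattern check is only a loose heuristic for modern DOIs, not a full validator.

```python
import re

# Prefixes commonly seen in front of a DOI on landing pages.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)
# Loose shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(raw):
    """Return the DOI as an https://doi.org/ URL, or raise ValueError."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    if not _DOI_SHAPE.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return f"https://doi.org/{doi}"


print(doi_to_url("doi:10.6084/m9.figshare.1234567"))
# https://doi.org/10.6084/m9.figshare.1234567
```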

Detailed Examples

Example 1: Research Data Set with DOI

Reference list:

Martinez, S. R., Chen, L., & Williams, K. (2023). Global climate measurements 2000–2022: Temperature and precipitation data (Version 2.1) [Data set]. figshare. https://doi.org/10.6084/m9.figshare.1234567

In-text citation:

(Martinez et al., 2023)

Standard research data citation with multiple authors, version number, repository, and DOI.

Example 2: Government Statistical Data

Reference list:

U.S. Census Bureau. (2020). American Community Survey 5-year estimates [Data set]. https://data.census.gov/cedsci/table?tid=ACSDP5Y2020.DP05

In-text citation:

(U.S. Census Bureau, 2020)

Government statistical data cites the agency as author with the data set name and access URL.

Example 3: Survey Data from Repository

Reference list:

Pew Research Center. (2024). American trends panel wave 128: January 2024 survey data [Data set]. Roper Center for Public Opinion Research. https://ropercenter.cornell.edu/data-set/pew-american-trends-panel-w128

In-text citation:

(Pew Research Center, 2024)

Survey data from research organizations cite the organization and repository where data is archived.

Example 4: Scientific Measurement Data

Reference list:

National Oceanic and Atmospheric Administration. (2023). Sea surface temperature anomalies: Global ocean data [Data set]. National Centers for Environmental Information. https://www.ncei.noaa.gov/data/sea-surface-temperature-anomalies/

In-text citation:

(National Oceanic and Atmospheric Administration, 2023)

Scientific measurement data from government agencies cite both the producing agency and data center.

Example 5: Genomic or Biological Data

Reference list:

Johnson, A. M., & Thompson, R. K. (2024). RNA-seq data for breast cancer cell lines (Version 1.0) [Data set]. Gene Expression Omnibus. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123456

In-text citation:

(Johnson & Thompson, 2024)

Biological data sets include repository name (GEO, GenBank, etc.) and accession numbers in the URL.

Example 6: Continuously Updated Data

Reference list:

World Bank. (n.d.). World development indicators: GDP growth annual % [Data set]. Retrieved January 15, 2025, from https://databank.worldbank.org/source/world-development-indicators

In-text citation:

(World Bank, n.d.)

For continuously updated data, use "n.d." and include a retrieval date before the URL.
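The retrieval statement follows a fixed "Retrieved Month Day, Year, from URL" pattern. A minimal sketch is shown here, with a hypothetical function name. The month names are hard-coded so the output stays in English regardless of the system locale.

```python
from datetime import date

_MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def retrieval_statement(accessed, url):
    """Format the 'Retrieved ..., from URL' element for continuously updated data."""
    month = _MONTHS[accessed.month - 1]
    return f"Retrieved {month} {accessed.day}, {accessed.year}, from {url}"


print(retrieval_statement(
    date(2025, 1, 15),
    "https://databank.worldbank.org/source/world-development-indicators",
))
# Retrieved January 15, 2025, from https://databank.worldbank.org/source/world-development-indicators
```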

Common Mistakes to Avoid

1. Citing Analysis Papers Instead of the Data

If you used the actual data set, cite the data set itself, not just papers that describe or analyze the data. Both citations may be appropriate if you reference both the data and analytical methods.

2. Omitting Version Numbers

Data sets are often updated. Include version numbers (Version 2.0, v3.1) when they're provided to ensure readers access the same data you used.

3. Forgetting [Data set] Descriptor

Always include [Data set] or similar descriptor in brackets after the title. This clarifies that you're citing data, not a publication.

4. Not Including DOIs When Available

DOIs provide permanent access to data sets. Always use DOIs when they're provided by the repository, even if you have a direct URL.

5. Using Generic Database Names

Be specific about the repository or database. Use "figshare," not "online database." Use "ICPSR," not "data archive."

6. Incomplete Government Data Attribution

For government data, use the specific agency that produced the data (e.g., Bureau of Labor Statistics) rather than just "U.S. Government" or "Department of Labor."

7. Not Including Retrieval Dates for Dynamic Data

For data that changes over time (like stock prices, weather data, or regularly updated statistics), include the retrieval date to document when you accessed the specific values.

Quick Reference Guide

Essential Elements for Data Set Citations:

  1. Author(s) or organization that created/collected the data
  2. Publication year (or "n.d." for continuously updated data)
  3. Data set title in italics and sentence case
  4. Version number in parentheses (if applicable)
  5. [Data set] or similar descriptor in brackets
  6. Repository or publisher name
  7. Retrieval date (for dynamic data only)
  8. DOI or URL
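The eight elements above can be assembled into a reference string with a small data class. This is a sketch under stated assumptions: the class `DataSetReference` and its field names are hypothetical, the author string is expected to be formatted already, and the output is plain text, so the title still needs to be italicized in your word processor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSetReference:
    author: str                       # formatted author element or organization name
    title: str                        # sentence-case title (italicize when pasting)
    locator: str                      # DOI URL or landing-page URL
    year: Optional[int] = None        # None renders as "n.d."
    version: Optional[str] = None     # e.g. "2.1"
    descriptor: str = "Data set"      # bracketed descriptor
    publisher: Optional[str] = None   # leave as None when same as the author
    retrieved: Optional[str] = None   # e.g. "January 15, 2025" (dynamic data only)

    def render(self):
        # Personal names already end in a period ("Williams, K."); group names do not.
        author = self.author if self.author.endswith(".") else self.author + "."
        date_part = f"({self.year})." if self.year else "(n.d.)."
        title_part = self.title
        if self.version:
            title_part += f" (Version {self.version})"
        title_part += f" [{self.descriptor}]."
        parts = [author, date_part, title_part]
        if self.publisher and self.publisher != self.author:
            parts.append(f"{self.publisher}.")
        if self.retrieved:
            parts.append(f"Retrieved {self.retrieved}, from {self.locator}")
        else:
            parts.append(self.locator)
        return " ".join(parts)


ref = DataSetReference(
    author="World Bank",
    title="World development indicators: GDP growth annual %",
    locator="https://databank.worldbank.org/source/world-development-indicators",
    retrieved="January 15, 2025",
)
print(ref.render())
```

Run on the values from Example 6, this reproduces that reference list entry. Supplying `year`, `version`, and `publisher` yields the form used in Example 5.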

Data Type Descriptors

  • [Data set]: General research data
  • [Database]: Structured database with multiple tables
  • [Statistical data]: Statistical datasets
  • [Survey data]: Survey responses and results
  • [Genomic data]: DNA, RNA, protein sequences
  • [Geospatial data]: Geographic or spatial data
  • [Time series data]: Temporal measurements
  • [Image data]: Collections of images for analysis

Common Data Repositories

  • figshare: General research data repository
  • Dryad: Scientific and medical research data
  • Dataverse: Social science and interdisciplinary data
  • ICPSR: Social and behavioral science data
  • GenBank/GEO: Genomic and biological data
  • Zenodo: General-purpose open research repository operated by CERN
  • OSF: Open Science Framework data sharing
  • GitHub: Code and data version control

Data Citation Best Practices

Using Data You Collected

If you collected original data for your research, you don't cite yourself in the reference list. Instead, describe your data collection methods in your methods section. Consider depositing your data in a repository for others to cite.

Citing Subsets of Larger Datasets

When using a portion of a large dataset, cite the complete dataset and explain in your text which subset you used. Don't create a separate citation for the subset unless it has its own DOI.

Multiple Versions of the Same Data

If you used multiple versions of a dataset across time, cite each version separately with its specific version number. Add a retrieval date only where the data can change without a new version number being issued.

Proprietary or Restricted Data

For data with access restrictions, cite it normally but note access restrictions in your text. Explain how readers can request access if applicable.

Finding Data Set Information

Locate citation details from these sources:

  • Data repository landing page: Title, authors, version, DOI
  • README files: Often include proper citation format
  • Dataset documentation: Metadata about data collection and version
  • "Cite this dataset" button: Many repositories provide formatted citations
  • Data papers: Some datasets have companion publications
  • Repository metadata: Structured information about the dataset

Supplementary Data vs. Main Data Sets

Supplementary Data from Journal Articles

If using supplementary data files from a journal article, cite the article and note the supplementary material:

Smith, J. (2024). Climate patterns analysis. Journal of Climate, 37(2), 123–145. https://doi.org/10.xxxx [Supplementary data files]

Standalone Data Sets

Independent data sets with their own DOIs and repository entries receive separate citations from any related publications.

Frequently Asked Questions

Do I cite the data set or the paper that describes it?

If you used the actual data, cite the data set. If you're discussing the research findings or methodology, cite the paper. Often, you'll cite both—the paper for context and methods, the data set because you reanalyzed it.

How do I cite data I downloaded from a government website?

Cite the government agency as the author, include the data set name, mark it as [Data set], and provide the URL. Include version or update information if available.

What if the data set doesn't have a formal title?

Write a description that clearly identifies the data. APA 7 handles untitled works by placing the description in square brackets where the title would go, without italics. For example: [Daily temperature measurements from weather stations in California, 2010–2020].

Should I include the file format (.csv, .xlsx)?

No. The [Data set] descriptor is sufficient. File formats are technical details that don't belong in citations.

How do I cite data from multiple years of the same survey?

Cite each year separately if they're separate data releases. If you combined multiple years, explain your approach in your methods and cite the series or each year used.

What if I can't find all the citation information?

Include as much information as you can find. At minimum, provide the data creator/source, approximate date, descriptive title, and access URL.
