Data sharing

Guidance for best practice and reproducibility of experimental data.

We want our authors and readers to trust the research that is published in our journals. To that end, we support the whole community in achieving best practice in both sharing and archiving research data.

For guidance on how to present experimental results included within your article, please see our experimental reporting requirements.

Our data sharing policy

The Royal Society of Chemistry believes that where possible, all data associated with the research in a manuscript should be Findable, Accessible, Interoperable and Reusable (FAIR), enabling other researchers to replicate and build on that research.

We strongly encourage authors to deposit the data underpinning their research in appropriate repositories.

For all submissions to Royal Society of Chemistry journals, any data required to understand and verify the research in an article must be made available on submission. To comply, we suggest authors deposit their data in an appropriate repository. Where this isn’t possible, we ask authors to include the data as part of the article Supplementary Information.

To maintain high standards of transparency and research reproducibility, and to promote the reuse of new findings, a data availability statement (DAS) must be submitted alongside all articles.

Some journals may have additional subject-specific requirements for sharing and/or publishing supporting data, so please check the specific journal guidelines.

Why is data sharing important?

Data sharing is central to improving many aspects of research culture.

  • It supports the validation of data to maintain high standards of research reproducibility
  • It increases transparency and encourages trust in the scientific process
  • It enables and encourages the reuse of new findings
  • The formal citation of data ensures all researchers involved in producing the data can gain credit for their research outputs
  • It may also be a formal requirement placed on researchers by funders or institutions

What is research data?

Research data generally refers to the results of observations or experiments that validate your research findings. It forms part of a wider group of useful materials associated with your research project, including but not limited to:

  • raw or processed data and metadata files, e.g. spectra, images, structure files
  • software and code, including software settings
  • models
  • algorithms

Research data typically refers to digital, machine-readable files and we encourage authors to make data available in standard formats that can be opened and re-used by others.

Recommended repositories

A data repository is an external storage space for researchers to deposit datasets associated with their research. Data should be submitted to a discipline-specific, community-recognised repository where possible, or alternatively to an institutional repository, or a generalist repository if no subject discipline repository is available for the given data type.

The choice of repository is the author’s decision, provided it is in line with institutional or funder guidelines. The exception to this is small molecule crystal data, which must be deposited with the Cambridge Crystallographic Data Centre (CCDC).
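The order of preference described above can be sketched as a small decision function. This is only an illustration of the stated priority (CCDC for small molecule crystal data, then discipline-specific, then institutional, then generalist); the function name, parameters and return strings are hypothetical, and institutional or funder guidelines still apply to the final choice.

```python
def choose_repository_tier(is_small_molecule_crystal_data: bool,
                           subject_repository_available: bool,
                           institutional_repository_available: bool) -> str:
    """Return the repository tier suggested by the order of preference above.

    All names here are illustrative assumptions, not part of any RSC tool.
    """
    # Small molecule crystal data is the one stated exception: CCDC is required.
    if is_small_molecule_crystal_data:
        return "CCDC"
    # Otherwise prefer a discipline-specific, community-recognised repository.
    if subject_repository_available:
        return "discipline-specific repository"
    # Fall back to an institutional repository if one exists.
    if institutional_repository_available:
        return "institutional repository"
    # Last resort: a generalist repository.
    return "generalist repository"


if __name__ == "__main__":
    print(choose_repository_tier(False, False, True))
```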

Choosing a repository

The Royal Society of Chemistry supports the TRUST principles for digital repositories – Transparency, Responsibility, User focus, Sustainability and Technology – for repository selection. We strongly encourage the use of repositories offering persistent identifiers, such as DOIs, for deposited datasets. These help to make robust connections between datasets and papers, e.g. via the inclusion of a Data Availability Statement.

Authors should also consult the following resources to make selections:

  • RSC guidance on repositories for specific data types
  • Institutional guidelines. Please consult your subject librarian or research data support service for any local guidance, e.g. on depositing data within institutional data repositories
  • Funder guidelines. Please consult your funder on specific compliance requirements, such as the creation of research data management plans (DMPs)
  • Repositories of repositories. The following websites may help you search for and select subject-specific repositories:
    • re3data
    • FAIRsharing

Subject specific repositories

The Royal Society of Chemistry encourages the use, where possible, of subject specific rather than general repositories, and recommendations by data type are given below.

Where deposition in a particular repository is required for submission, this is indicated in the table.

Data type | Repository | URL | File / standard
Crystal structure (organic / organometallic / metal organic); required for all RSC journals | Cambridge Structural Database (CSD) – managed by the Cambridge Crystallographic Data Centre (CCDC) | CCDC | Crystallographic information file (.cif)
Crystal structure (biological) | Protein Data Bank (PDB) | PDB | Macromolecular CIF (mmcif)
Crystal structure (inorganic) | Inorganic Crystal Structure Database (ICSD), deposition via CCDC | CCDC | Crystallographic information file (.cif)
Crystal structure (powder) | Either The International Centre for Diffraction Data (ICDD) or Cambridge Structural Database (CSD) | ICDD; CCDC | Powder Diffraction File (PDF)
CryoEM | Electron Microscopy Data Bank | Electron Microscopy Data Bank | MRC file
bio-NMR | Biological Magnetic Resonance Data Bank (BMRB) | bio-NMR | NMR Self-defining Text Archival and Retrieval (NMR-STAR), format conversion tools available

Data type | Repository | URL | File / standard
Software / code (please also refer to ‘Software and Code’ guidelines for Data Availability Statements and Data citations) | GitHub | GitHub | Please also consider archiving code in combination with a repository that can issue a DOI. View an example.
Software / code (please also refer to ‘Software and Code’ guidelines for Data Availability Statements and Data citations) | Code Ocean | Code Ocean | -
Models of biochemical reaction networks | BioModels database | BioModels database | Supported formats include SBML and PharmML.

Data type | Repository | URL | File / standard
Atomic coordinates | Nonspecific / consider general or institutional repository | - | We recommend the use of any standard structure file, such as xyz, cif, pdb; or a text file with structure in Cartesian, fractional, z-matrix or other common representation.
Input/configuration files and program output | Nonspecific / consider general or institutional repository | - | We recommend sharing the standard input and output formats generated by the simulation software.
Materials simulation data including electronic structure and molecular dynamics | NOMAD | NOMAD | See repository guidelines.
Computational materials science | Materials Cloud | Materials Cloud | See Materials Cloud for guidance.
Computational chemistry files | ioChem-BD - The Computational Chemistry Results Repository | ioChem-BD | See ioChem-BD documentation.

Data type | Repository | URL | File / standard
NMR | Nonspecific / consider general or institutional repository | - | There is no single, widely accepted data standard. We encourage the deposition of a zip file of the raw instrument data (the entire file directory for the experiment, including the FID and associated files). Processed spectra may also be included.
IR / Raman | Nonspecific / consider general or institutional repository | - | .csv, .xlsx, or other machine-readable format
UV-vis | Nonspecific / consider general or institutional repository | - | .csv, .xlsx, or other machine-readable format
EPR | Nonspecific / consider general or institutional repository | - | .dsc, .dta
bio-NMR | Biological Magnetic Resonance Data Bank (BMRB) | bio-NMR | NMR Self-defining Text Archival and Retrieval (NMR-STAR), format conversion tools available.
Mass spectral data for small chemical molecules, metabolomics, exposomics | MassBank | - | See MassBank contributor guidance

Data type | Repository | URL | File / standard
Electrophoretic gels and blots | Nonspecific / consider general or institutional repository | - | Please deposit raw, unedited files in a high-resolution image format (e.g. tiff)
Microscopy (e.g. SEM, TEM, STM) | Nonspecific / consider general or institutional repository | - | Please deposit raw, unedited files in a high-resolution image format (e.g. tiff)
Coherent X-ray images | Coherent X-ray Imaging Data Bank (CXIDB) | CXIDB | CXI file (see repository website)
Bioimages, multidimensional life sciences image data (cell and tissue) | Image Data Resource (IDR) | IDR | For supported formats see OME guidance

Data type | Repository | URL | File / standard
Materials (various) | Materials Data Facility | Materials Data Facility | Multiple – see MDF Connect
Materials simulation data including electronic structure and molecular dynamics | NOMAD | NOMAD | See repository guidelines
Computational materials science | Materials Cloud | Materials Cloud | See Materials Cloud for guidance

Data type | Repository | URL | File / standard
All proteomics data | Any ProteomeXchange member | ProteomeXchange | See relevant target repository
Proteomics mass spectrometry | Proteomics Identification Database (PRIDE) | PRIDE | Multiple – see EMBL-EBI website
Human geno- and phenotype data, epigenetics | Database of Genotypes and Phenotypes (dbGaP) | dbGaP | Multiple – get guidance from dbGaP
Human genetic variation data (<=50bp), e.g. single-base nucleotide substitutions, small-scale deletion or insertions | dbSNP | dbSNP | Multiple – get guidance from dbSNP
Human genomic structural variation data (>50bp), e.g. insertions, deletions, translocations | Database of Genomic Structural Variation (dbVAR) | dbVAR | Excel and VCF files – get requirements from dbVAR
Genetic variation data (all species) | European Variation Archive (EVA) | EVA | VCF files – get requirements from the EVA
Gene expression data, array- and sequence-based | Gene Expression Omnibus | Gene Expression Omnibus | See repository guidelines
High-throughput functional genomics data | ArrayExpress | ArrayExpress | See repository guidelines
Protein-protein, protein-DNA/RNA and molecular interactions | IntAct molecular interaction database (IntAct) | IntAct | Multiple – get guidance from IMEx Consortium
miRNA sequences and annotation | miRBase: the microRNA database | miRBase | Multiple – get guidance from miRBase
Metabolomics | MetaboLights | MetaboLights | See repository guidelines
Metabolomics | Metabolomics Workbench | Metabolomics Workbench | See repository guidelines

Data type | Repository | URL | File / standard
DNA & RNA sequence data | Any INSDC repository member | INSDC | -
Genome sequence data | Genome Sequence Archive (GSA) | GSA | Multiple – get GSA guidance on suitable types
Metagenomics sequence data | MGnify | MGnify | MGnify guidance on sequence data
Protein sequences | Universal Protein Resource (UniProt) | UniProt | UniProtKB / Swiss-Prot

Data type | Repository | URL | File / standard
Atmospheric and earth observation research, environmental data | CEDA Archive (Centre for Environmental Data Analysis) | CEDA | See repository guidelines
Environmental and ecological data | Environmental Data Initiative (EDI) | EDI | See EDI guidelines
Geochemical, geochronological, and petrological data | EarthChem | EarthChem | See EarthChem guidelines
Climate or Earth system research, climate model data | World Data Center for Climate (WDCC) | WDCC | See WDCC for guidance

Data type | Repository | URL | File / standard
Functional enzymology data (kinetic and experimental data) | Standards for Reporting Enzymology Data (STRENDA DB) | STRENDA | See STRENDA guidelines
Flow cytometry data | FlowRepository | FlowRepository | See FlowRepository guidelines
Protein circular dichroism and protein synchrotron radiation circular dichroism | Protein Circular Dichroism Data Bank (PCDDB) | PCDDB | See repository guidance

Data type | Repository | URL | File / standard
Intermolecular and supramolecular interactions of molecular systems, binding, assembly, and interaction phenomena | SupraBank | SupraBank | JSON (DataCite), CDX (for 2D/3D molecule structure), PNG, proprietary formats

General repositories

Where subject specific or institutional/funder repositories are not available, authors may wish to choose a general repository, such as:

Repository | Information on costs
Dryad Digital Repository | Fees apply
figshare | Fees apply
Harvard Dataverse | Contact repository for datasets over 1 TB
Open Science Framework | Free of charge
Science Data Bank | Free of charge
Zenodo | Donations towards sustainability encouraged
Chemotion | Free of charge

Data availability statements

To maintain high standards of transparency and research reproducibility, and to promote the reuse of new findings, a data availability statement (DAS) must be submitted alongside all articles.

Data availability statements provide information about where data, software, or code supporting the results reported in a published article can be found. These should include, where applicable, links to datasets analysed or generated during the study that have been shared in an external data repository. This section should list the database, accession number, DOI, URL or any other relevant details. The full URL of each dataset should be written out (not embedded behind link text). Authors are also encouraged to include data citations to associated datasets in the reference section of an article.

The data availability statement can provide information about the data presented in an article (e.g., in Figures or Tables) or provide a reason if data are not available to access (e.g. human health data). If supporting data or code have been included in the article’s Supplementary Information, this should also be stated here.

If data for the article cannot be made available, for example, due to legal or ethical confidentiality requirements, then the DAS should state this.

A data availability statement must be included at the end of the article under the heading “Data availability”, after the conflicts of interest statement and before any acknowledgements.

The following are some examples of DAS that you can use:

  • Data for this article, including [description of data types] are available at [name of repository] at [URL – format https://doi.org/DOI].
  • The data supporting this article have been included as part of the Supplementary Information.
  • Crystallographic data for [compound number] have been deposited at the [name of repository, such as CCDC / ICSD / PDB] under [accession number] and can be obtained from [URL of data record, format https://doi.org/DOI].
  • The code for [description of software] can be found at [URL to code location] with [DOI – see guidelines below for citing software and code]. The version of the code employed for this study is version [XXX].
  • This study was carried out using publicly available data from [name of repository] at [URL] with [accession number].
  • The data analysis scripts of this article are available in the interactive notebook [name of notebook, e.g. Google Colab] at [URL].
  • Data for this article are available at [name of repository] at [URL – format https://doi.org/DOI]. Data collected from human participants, described in [Fig. X], are not available for confidentiality reasons.
  • No primary research results, software or code have been included and no new data were generated or analysed as part of this review.

The following statement is generally not acceptable: “Data are available upon request from the authors”.
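As an illustration only, the first template above can be filled in programmatically. The sketch below assumes a bare DOI is supplied and converts it to the https://doi.org/DOI form the templates ask for; the function names, the simplified DOI pattern, and the example repository name and DOI are all hypothetical and not part of any RSC tool.

```python
import re

# Simplified pattern for a bare DOI (an assumption, not an official DOI grammar).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into the https://doi.org/DOI form used in the templates."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a bare DOI: {doi!r}")
    return f"https://doi.org/{doi}"


def repository_statement(data_types: str, repository: str, doi: str) -> str:
    """Fill in the first example template with a repository name and DOI URL."""
    return (f"Data for this article, including {data_types} are available at "
            f"{repository} at {doi_to_url(doi)}.")


def is_request_only(statement: str) -> bool:
    """Flag wording of the 'available upon request' kind, which is generally not acceptable."""
    return "upon request" in statement.lower()


if __name__ == "__main__":
    # Hypothetical repository name and DOI, for illustration only.
    print(repository_statement("spectra and structure files",
                               "Example Repository", "10.1234/example"))
```

The full URL is written out in the generated sentence rather than hidden behind link text, in line with the guidance above.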

Data and software citation

Citing datasets and code ensures effective and robust research dissemination. We strongly encourage Royal Society of Chemistry authors to formally cite associated datasets as bibliographic references.

Doing this will:

  • help readers to discover your data
  • allow funders to easily link to articles and data associated with science they support
  • provide formal credit to repositories and data creators.

Citing data

For author-generated datasets that are directly associated with the article:

We encourage authors to add data citations as bibliographic references within Data Availability Statements, alongside the information on datasets associated with the study and where to find them.

For other datasets associated with previous studies:

We encourage authors to add data citations as bibliographic references within the main text as they are mentioned. Data citation is encouraged as an alternative to informal references or mentions of local identifiers.

Suggested reference format for data citations:

[Name of data creators, format: A. Name, B. Name and C. Name], [Year], [Name of repository / type of dataset: deposition number], [DOI, or URL if not available, of the dataset].

Example

P. Cui, D. P. McMahon, P. R. Spackman, B. M. Alston, M. A. Little, G. M. Day and A. I. Cooper, 2019, CCDC Experimental Crystal Structure Determination: 1915306, DOI: 10.5517/ccdc.csd.cc22912j

Please also refer to the guidelines from the relevant repository on which information to provide in a citation.
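The suggested reference format can be assembled mechanically, which may help when many datasets need citing. The following is a minimal sketch, assuming creator names are already in the "A. Name" form; the function names are hypothetical, and printing a bare URL when no DOI exists is an assumption, since the format above does not specify how a URL should be labelled.

```python
from typing import Optional, Sequence


def join_names(names: Sequence[str]) -> str:
    """Join names as 'A. Name, B. Name and C. Name'."""
    if not names:
        raise ValueError("at least one creator is required")
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + " and " + names[-1]


def format_data_citation(creators: Sequence[str], year: int,
                         repository_and_type: str,
                         deposition_number: Optional[str] = None,
                         doi: Optional[str] = None,
                         url: Optional[str] = None) -> str:
    """Build a data citation in the suggested reference format."""
    source = repository_and_type
    if deposition_number:
        source += f": {deposition_number}"
    if doi:
        identifier = f"DOI: {doi}"
    elif url:
        identifier = url  # assumption: URL given bare when no DOI is available
    else:
        raise ValueError("a DOI or URL is required")
    return f"{join_names(creators)}, {year}, {source}, {identifier}"


if __name__ == "__main__":
    # Reproduces the example citation given above.
    print(format_data_citation(
        ["P. Cui", "D. P. McMahon", "P. R. Spackman", "B. M. Alston",
         "M. A. Little", "G. M. Day", "A. I. Cooper"],
        2019, "CCDC Experimental Crystal Structure Determination",
        deposition_number="1915306", doi="10.5517/ccdc.csd.cc22912j"))
```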

Citing software and code

We encourage authors to add formal bibliographic references for software and code associated with their articles in Data Availability Statements and/or to directly credit use of other software and code by adding citations to the main text of their article at the relevant point.

Authors are asked to provide the names of all code creators in the reference, the name of the repository, and a DOI, although a URL can be provided if a DOI is not available. We strongly recommend you use CRediT (the Contributor Roles Taxonomy from CASRAI) for standardised contribution descriptions.

Find out how to cite GitHub-deposited code.

Please cite the specific release where possible – find out more about releases on GitHub.

Suggested reference format for code citations:

[Name of code creators, format: A. Name, B. Name and C. Name], [Year], [Name of code repository / type of code], [DOI, or URL if not available – in the instance where code has been deposited in GitHub and Zenodo, as per the guidelines above, the Zenodo DOI is preferred for bibliographic references]
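The identifier rule in this format (a DOI where one exists, with the Zenodo DOI preferred when code sits in both GitHub and Zenodo, and a URL only as a fallback) can be sketched as follows. The function names and the example creators, year, description and DOI are hypothetical placeholders; the exact wording of the repository / type field is left to the author.

```python
from typing import Optional, Sequence


def pick_code_identifier(zenodo_doi: Optional[str] = None,
                         other_doi: Optional[str] = None,
                         url: Optional[str] = None) -> str:
    """Choose the identifier: Zenodo DOI first, then any other DOI, then a URL."""
    if zenodo_doi:
        return f"DOI: {zenodo_doi}"
    if other_doi:
        return f"DOI: {other_doi}"
    if url:
        return url  # assumption: URL given bare when no DOI is available
    raise ValueError("a DOI or URL is required")


def format_code_citation(creators: Sequence[str], year: int,
                         repository_and_type: str,
                         zenodo_doi: Optional[str] = None,
                         other_doi: Optional[str] = None,
                         url: Optional[str] = None) -> str:
    """Build a code citation in the suggested reference format."""
    if not creators:
        raise ValueError("at least one creator is required")
    names = list(creators)
    joined = names[0] if len(names) == 1 else ", ".join(names[:-1]) + " and " + names[-1]
    identifier = pick_code_identifier(zenodo_doi, other_doi, url)
    return f"{joined}, {year}, {repository_and_type}, {identifier}"


if __name__ == "__main__":
    # Hypothetical creators, year, description and DOI, for illustration only.
    print(format_code_citation(
        ["A. Name", "B. Name"], 2024, "Zenodo / analysis scripts",
        zenodo_doi="10.1234/example",
        url="https://github.com/example/example"))
```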