Practical steps to publish the research data

Publishing the underlying research data can increase citations to your journal articles and other publications, and citations to the dataset itself also count. Follow these instructions to get maximum impact with reasonable effort.

Choose and prepare data for publishing

Start preparing your dataset by choosing what data to publish and preserve. Some options include:

  1. At minimum, the data that is needed to validate your results
  2. Everything that is needed to replicate a study
  3. Everything that is potentially useful for the scientific community

A dataset could include:

  • Documentation, such as codebooks, lab journals, informed consent forms, etc. These are important for understanding the data and for combining it with other data sources.
  • Software, hardware, tools, syntax queries and machine configurations. These are domain-dependent and important for using the data. Alternatively, share information about the software and other tools used in the research.

Source: Sarah Jones and Marjan Grootveld: How to write a Data Management Plan


Make sure that your data can be made open

Opening the research data may be limited if you have:

  1. Personal data or sensitive personal data that cannot be anonymized (more information can be found in the Handling personal data and Research Ethics sections)
  2. Confidential data
  3. Research outcomes that can be commercially or industrially exploited (more information can be found in the Publishing and Commercialization section)

    Document the data for reusability

    To maximise the impact and possible reuse of your datasets, follow the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Describing the data so that another researcher in your discipline can understand it, and choosing a good repository, is enough to comply with most of them. For more information, look at the principles at:

    Data documentation, organization, and metadata

    Metadata describes the research data. Information about the creator, license, relevant dates, and summary statistics can all be metadata.
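    As an illustration, the sketch below collects the kinds of metadata mentioned above (creator, license, a relevant date and summary statistics) into one record. It is a minimal Python sketch under stated assumptions: the key names are hypothetical and not a schema required by any repository, the researcher and measurements are made up, and the ORCID iD is ORCID's own documented example. In practice, enter these details in the fields your chosen repository provides.

```python
import json
import statistics
from datetime import date


def summarize(values):
    """Return simple summary statistics for a list of numeric observations."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def build_metadata(creator, orcid, title, license_id, collected_on, values):
    """Collect creator, license, dates and summary statistics into one record.

    The key names are hypothetical; map them to your repository's own fields.
    """
    return {
        "title": title,
        "creator": {"name": creator, "orcid": orcid},
        "license": license_id,
        "dates": {"collected": collected_on.isoformat()},
        "summary_statistics": summarize(values),
    }


record = build_metadata(
    creator="Doe, Jane",            # hypothetical researcher
    orcid="0000-0002-1825-0097",    # ORCID's documented example iD
    title="Example temperature measurements",
    license_id="CC-BY-4.0",         # SPDX identifier for CC BY 4.0
    collected_on=date(2024, 3, 1),
    values=[20.5, 21.0, 19.5, 22.0],  # made-up measurements
)
print(json.dumps(record, indent=2))
```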


    Choose a repository

    We recommend using a reliable discipline-specific repository to maximize the impact of your dataset. If a suitable discipline-specific repository is not available, we recommend Zenodo as a general-purpose data repository.

    Remember to include your ORCID iD when you publish a dataset. More information can be found in the Researcher Identification and Research Profiles section.
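    A mistyped ORCID iD in the dataset metadata breaks the link between you and your data. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a simple local check catches most typos. The Python sketch below assumes the usual 16-character hyphenated form; it only verifies the checksum and does not confirm that the iD is actually registered to you.

```python
import re

# Four groups of four characters; only the final character may be "X".
ORCID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_FORMAT.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # ORCID's documented example iD
print(is_valid_orcid("0000-0002-1825-0098"))  # same iD with the last character altered
```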

    Check the link below to learn more about data repositories and other publishing options like data journals.

    Data publishing repositories

    Data publishing repositories used in Aalto University


    Choose the license for your dataset

    The Principal Investigator is responsible for choosing the appropriate license needed to achieve the goals of the research project. When choosing the license, comply with any restrictions on openness, such as project agreements and data confidentiality. If nothing prevents opening the data, choose the license that maximises the re-use of your data.

    Creative Commons licenses are standard licenses used to define the terms of use for datasets. We recommend the Creative Commons license CC BY 4.0 if there are no commercial or other special reasons to limit the re-use of your data. CC BY 4.0 allows anyone to share (copy and redistribute) and adapt the material for any purpose, even commercially. The license requires users to give appropriate credit to the authors, so dataset authors are credited whenever the data is reused.

    If you want to reserve the right to commercial re-use of your data, you can choose the CC BY-NC 4.0 license, which allows only non-commercial use. This lets you commercialise the data yourself. Check the separate page on commercialisation for further information:

    Publishing and commercialisation – Can I have both?

    To license a dataset created outside externally funded projects, the creators must make written agreements on the ownership and licensing of the data. In research projects receiving external funding, data ownership is transferred to Aalto University, so the data can be licensed without extra agreements.

    Comprehensive information about the CC licenses

    Tool to help you in selecting the proper CC license

    The CC license information and tool are available in several languages. Scroll to the end of the page to see the language options.

    Read more about Licensing research data at Aalto University

    Describe your dataset

    To ensure that your dataset can be found and is understandable and interesting enough for reuse, write a good description of it in the repository you have chosen.

    Describing datasets in the data repositories

    Make a brilliant description to maximize re-use, citation and findability of your dataset.


    Upload your data to the repository

    Once everything is prepared, it's time to upload your data to the repository. You'll get a persistent identifier for your dataset (DOI, URN or similar), making it findable and citable.

    How to upload a dataset to Zenodo (video, 3'55")

    Publishing a dataset together with a publication

    To open an underlying dataset together with a publication, follow this process:

    1. Upload the data to a repository and set an embargo on it, so that you get the DOI first
    2. Then use the DOI to cite the dataset in your publication
    3. Open the dataset after the publication is out
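    If you use Zenodo, the three steps above can be expressed as changes to the deposit metadata. The Python sketch below only builds that data and does not contact Zenodo. Its keys (upload_type, access_right, embargo_date, prereserve_doi) reflect our understanding of Zenodo's legacy deposition REST API and should be checked against Zenodo's current documentation; the license identifier value, the author and the embargo date are assumptions, and the DOI is a placeholder. As far as we know, Zenodo opens an embargoed record automatically on its embargo date, so the helper for step 3 only matters if you want to open the data earlier.

```python
from datetime import date


def embargoed_deposit_metadata(title, creators, description, embargo_until, license_id):
    """Step 1: metadata for an embargoed upload that reserves a DOI in advance."""
    return {
        "metadata": {
            "upload_type": "dataset",
            "title": title,
            "creators": creators,
            "description": description,
            "access_right": "embargoed",
            "embargo_date": embargo_until.isoformat(),
            "license": license_id,
            "prereserve_doi": True,
        }
    }


def format_data_citation(creators, year, title, publisher, doi):
    """Step 2: a simple data citation (creator, year, title, type, publisher, DOI)."""
    names = "; ".join(c["name"] for c in creators)
    return f"{names} ({year}). {title} [Data set]. {publisher}. https://doi.org/{doi}"


def open_after_publication(payload):
    """Step 3: return a copy of the metadata switched to open access."""
    metadata = dict(payload["metadata"], access_right="open")
    metadata.pop("embargo_date", None)
    return {"metadata": metadata}


creators = [{"name": "Doe, Jane", "orcid": "0000-0002-1825-0097"}]  # hypothetical author
payload = embargoed_deposit_metadata(
    title="Example temperature measurements",
    creators=creators,
    description="Temperature readings underlying the accompanying article.",
    embargo_until=date(2025, 6, 30),  # assumed publication date of the article
    license_id="cc-by-4.0",           # assumed repository license identifier
)
citation = format_data_citation(
    creators, 2025, "Example temperature measurements",
    "Zenodo", "10.5281/zenodo.XXXXXXX",  # placeholder for the reserved DOI
)
print(payload["metadata"]["access_right"], payload["metadata"]["embargo_date"])
print(citation)
```

    Adapt the citation string to the reference style your journal requires.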

    Register the dataset in ACRIS

    Register your dataset in Aalto University's metadata catalog ACRIS, even if there is a justified reason not to open the data (or part of it). However, do not register the data if it is classified as secret for security reasons. If you need help, contact [email protected].

    Consider the preservation and long-term availability of your data

    Long-term preservation of data is about keeping the data readable, understandable, accessible and reusable for decades or longer. To achieve this, data needs curation and possibly some special operations such as format conversions. To prepare your dataset for long-term preservation, consider the following:

    1. describe the data carefully, the sooner the better
    2. ensure that you have the intellectual property (IP) rights to license your data
    3. ensure long-term data ownership at the organization level, so that the organization can take care of the data after you
    4. prefer open file formats instead of proprietary ones (check the list of file formats for the Finnish digital preservation service for reference)
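    Before depositing, it can help to list the files whose formats may need conversion. The Python sketch below flags file extensions that are not in an open-format set. Both extension sets and the suggested conversions are hypothetical examples chosen for illustration; they are not taken from the Finnish digital preservation service's list, which remains the reference.

```python
from pathlib import Path

# Hypothetical examples only; consult the preservation service's format list.
OPEN_FORMATS = {".csv", ".txt", ".json", ".xml", ".pdf", ".tif", ".png", ".odt", ".ods"}
SUGGESTED_CONVERSIONS = {
    ".xls": "convert to .csv or .ods",
    ".doc": "convert to .odt or PDF/A",
    ".sav": "convert to .csv plus a codebook",
    ".psd": "convert to .tif",
}


def review_formats(filenames):
    """Return (filename, advice) pairs for files not in the open-format set."""
    findings = []
    for name in filenames:
        suffix = Path(name).suffix.lower()  # compare extensions case-insensitively
        if suffix in OPEN_FORMATS:
            continue
        advice = SUGGESTED_CONVERSIONS.get(suffix, "look up in the preservation format list")
        findings.append((name, advice))
    return findings


findings = review_formats(["results.xls", "readme.txt", "scan.TIF", "survey.sav", "model.bin"])
for name, advice in findings:
    print(f"{name}: {advice}")
```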

    Links to research data management instructions

    Follow these links to navigate through research data management instructions.

    Aalto University Library

    Publishing and reusing open data

    Overview and instructions to services for sharing and publishing research data


    Research Data Management (RDM) and Open Science

    Properly managed research data creates competitive edge and is an important part of a high-quality research process.

    This service is provided by:

    Research and Innovation Services
