Practical steps to publish research data
Choose and prepare data for publishing
Start preparing your dataset by choosing what data to publish and preserve. Some options include:
- At minimum, the data that is needed to validate your results
- Everything that is needed to replicate a study
- Everything that is potentially useful for the scientific community
In addition to the data itself, a dataset could include:
- Documentation, such as codebooks, lab journals and informed consent forms. These are important for understanding the data and for combining it with other data sources.
- Software, hardware, tools, syntax queries and machine configurations. These are domain-dependent and important for using the data. Alternatively, share information about the software and other tools used in the research.
Make sure that your data can be made open
Opening research data may be restricted if you have:
- Personal data or sensitive personal data that cannot be anonymized (see the Handling personal data and Research Ethics sections)
- Confidential data
- Research outcomes that can be exploited commercially or industrially (see the Publishing and Commercialization section)
Document the data for reusability
To maximise the impact and potential reuse of your datasets, follow the FAIR (Findable, Accessible, Interoperable, Reusable) principles. Describing your data so that another researcher in your discipline can understand it, and choosing a good repository, is enough to comply with most of them. For more information, see the principles at: https://www.go-fair.org/fair-principles
Choose a repository
We recommend using a reliable discipline-specific repository to maximise the impact of your dataset. If no suitable discipline-specific repository is available, we recommend Zenodo as a general-purpose data repository.
Besides repositories, there are other publishing options, such as data journals.
Choose the license for your dataset
The Principal Investigator is responsible for choosing the license that best serves the goals of the research project. When choosing the license, comply with any restrictions on openness, e.g. project agreements and confidentiality of data. If nothing prevents opening the data, choose the license that maximises its reuse.
If you want to reserve the right to commercial reuse of your data, you can choose the CC BY-NC 4.0 license, which allows others only non-commercial use. You can then still commercialise the data yourself. See the separate page on commercialisation for further information.
To license a dataset created outside externally funded projects, the creators must make written agreements on the ownership and licensing of the data. In research projects receiving external funding, data ownership is transferred to Aalto University, so licensing is possible without extra agreements.
The Creative Commons license information and license tool are available in several languages; the language options are at the end of the Creative Commons page.
Describe your dataset
To ensure that your dataset can be found and is understandable and interesting enough for reuse, write a good description of it in the repository you have chosen.
Upload your data to the repository
Once everything is prepared, upload your data to the repository. You will get a persistent identifier for your dataset (DOI, URN or similar), which makes it findable and citable.
Publishing a dataset together with a publication
To open the dataset underlying a publication at the same time as the publication, follow this process:
- Upload the data to a repository and set an embargo on it, so that you get the DOI first
- Use the DOI to cite the dataset in your publication
- Open the dataset once the publication is out
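If you use Zenodo, the first step of this process can be sketched as the metadata you send when creating a deposition. The sketch below assumes Zenodo's legacy deposit REST API, where `access_right`, `embargo_date` and `prereserve_doi` are deposition metadata fields; check the current Zenodo developer documentation before relying on these names. The function `build_embargoed_deposit` and all example values are hypothetical. The code only builds and prints the payload; actually sending it requires a personal access token and an HTTP request to Zenodo.

```python
import json
from datetime import date


def build_embargoed_deposit(title, creators, description, embargo_until):
    """Build metadata for an embargoed dataset deposition that reserves a DOI.

    Hypothetical helper. Field names assume Zenodo's legacy deposit REST API.
    """
    return {
        "metadata": {
            "upload_type": "dataset",
            "title": title,
            # Creator names are given as "Family, Given".
            "creators": [{"name": name} for name in creators],
            "description": description,
            # Files stay closed until the embargo date.
            "access_right": "embargoed",
            "embargo_date": embargo_until.isoformat(),
            # Reserve a DOI now so it can be cited in the manuscript.
            "prereserve_doi": True,
        }
    }


# Example values are made up for illustration.
payload = build_embargoed_deposit(
    title="Example measurement dataset",
    creators=["Doe, Jane"],
    description="Hypothetical data underlying an example article.",
    embargo_until=date(2030, 1, 1),
)
print(json.dumps(payload, indent=2))
```

Because the embargo date is set in advance, you may need to adjust it, or open the dataset manually, once the publication is actually out.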
Register the dataset in ACRIS
Register your dataset in ACRIS, Aalto University's metadata catalog, even if there is a justified reason not to open the data (or part of it). However, do not register data that is classified as secret for security reasons.
Consider the preservation and long-term availability of your data
Long-term preservation of data is about keeping the data readable, understandable, accessible and reusable for decades or longer. To achieve this, data needs curation and possibly special operations such as format conversions. To prepare your dataset for long-term preservation, consider the following:
- describe the data carefully, and the sooner the better
- ensure that you have the intellectual property (IP) rights to license your data
- ensure long-term data ownership at the organization level, so that the organization can take care of the data after you
- prefer open file formats instead of proprietary ones (check the list of file formats for the Finnish digital preservation service for reference)