Publishing research data
Publishing the data underlying your research can increase citations to your journal articles and other publications. Data sets can also be cited in their own right, and these data citations count as citations too.
It is essential to obtain a persistent identifier for your data so that it is findable and citable. The most common identifier is the Digital Object Identifier (DOI). Suitable data repositories assign persistent identifiers to the data sets they host.
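To illustrate what a DOI looks like in practice, the sketch below turns the common ways a DOI is written (bare, with a `doi:` prefix, or as a URL) into the resolvable `https://doi.org/` form that is typically used in citations. It is a minimal sketch assuming only the basic DOI shape (a prefix starting with `10.`, a slash, then a suffix); the helper name `normalize_doi` and the DOI `10.1234/example.dataset` are hypothetical, and the check is deliberately loose rather than a full validator.

```python
import re

# Loose DOI shape: "10." + registrant code (optionally with sub-parts),
# a slash, then a non-empty suffix. Not a complete validator.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")

# Forms people commonly put in front of a bare DOI (compared case-insensitively).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text: str) -> str:
    """Return the https://doi.org/ URL for a DOI given in any common form.

    Hypothetical helper for illustration only.
    """
    doi = text.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    # Hypothetical DOI written in three different ways.
    for raw in ("doi:10.1234/example.dataset",
                "https://dx.doi.org/10.1234/example.dataset",
                "10.1234/example.dataset"):
        print(normalize_doi(raw))
```

All three inputs map to the same URL, which is the point of a persistent identifier: however it is written, it resolves to one landing page maintained by the repository.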
1. Consider the preservation and long-term availability of your data
Long-term preservation of data is about keeping the data readable, understandable, accessible and reusable for decades or longer. To achieve this, the data needs curation and possibly special operations such as format conversions. Things to consider for long-term preservation include data description (metadata), licensing and IPR (intellectual property rights) issues, data ownership, the value of the data, and file formats.
To prepare your dataset for preservation, describe the data carefully, resolve ownership issues, and prefer open file formats to proprietary ones.
2. Make your data discoverable
- Acquire a persistent identifier that points to your data, such as a DOI, URN or similar. This is usually best done by choosing a good repository.
- Register your data set in a metadata catalog such as ACRIS or Etsin, even if there is a justified reason not to open the data (or part of it). However, omit the registration if the data is classified as secret for security reasons.
For more guidance, see the section "Sharing and publishing data" in the Data Management Guide of the Open Science and Research Initiative.
3. Choose a repository for your data set
The best way to meet the requirements of steps 1 and 2 is to choose a reliable, certified repository, for example:
- A repository specific to your discipline
- A catch-all repository, such as Zenodo
- Finnish Social Science Data Archive (FSD)
- ACRIS - Aalto Current Research Information System for research outputs, researchers, projects, datasets, etc.
- AVAA open data publishing portal
Read more about recommended data publishing repositories.
4. Choose the license for your data set
The openness of data is a spectrum, not a simple yes/no decision. The Principal Investigator is responsible for choosing the license that best serves the goals of the research project.
Publishers and funding organisations usually have requirements and recommendations for opening data sets, and they may also recommend specific data repositories. For example, the journal Scientific Data has published a list of recommended data repositories for researchers wishing to publish a manuscript. To make your research data available to other users, you have to define the terms and conditions of use; this is done by licensing. Using standard international licenses helps interoperability, meaning that datasets from different sources can be combined.
Licensing a dataset requires either 1) that all creators agree to release the data they have created under the same license, or 2) that ownership of the dataset is transferred to a single legal entity. In research projects receiving external funding, data ownership is transferred to Aalto University. Researchers creating research data without external funding must have written agreements on the ownership and licensing of the data.
5. Tell the world about your data
Contact Aalto Communications Services when you publish an interesting data set! For example, when researchers at the University of Turku opened data sets on daily pollen concentrations on Zenodo, the university's communications team published a news article about it (in Finnish): Turun yliopisto avasi siitepölydataa neljältä vuosikymmeneltä ("The University of Turku opened pollen data from four decades").
- If you create data, publish it in a repository that assigns a DOI or other persistent identifier to your dataset. This makes your data findable and thus helps to increase citations.
- We recommend using a discipline-specific repository if one is available, and Zenodo otherwise.
- Make sure that your metadata appears in ACRIS, Aalto's reporting system, so that Aalto can get credit even if your data is not fully open. It is quick and easy to do yourself.
- Cite data if you use it.
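Citing a data set works much like citing an article, with the persistent identifier taking the place of journal and page details. The sketch below assembles a citation string from a few metadata fields, ordered roughly the way DataCite recommends (creators, year, title, version, publisher, identifier). It is a minimal illustration only: the `DatasetRecord` class, the `format_data_citation` function, the author names and the DOI are all hypothetical, and you should follow whatever citation style your publisher requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Minimal, hypothetical metadata for a published dataset."""
    creators: List[str]            # e.g. "Surname, Given"
    year: int                      # publication year of the dataset
    title: str
    publisher: str                 # usually the repository, e.g. "Zenodo"
    doi: str                       # bare DOI, e.g. "10.1234/example.dataset"
    version: Optional[str] = None  # omitted from the citation if not set


def format_data_citation(rec: DatasetRecord) -> str:
    """Build: Creators (Year). Title. Version X. Publisher. https://doi.org/DOI"""
    # Strip a trailing period from the title so it is not doubled.
    parts = [f"{'; '.join(rec.creators)} ({rec.year}). {rec.title.rstrip('.')}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append(f"https://doi.org/{rec.doi}")
    return " ".join(parts)


if __name__ == "__main__":
    record = DatasetRecord(
        creators=["Virtanen, Aino", "Korhonen, Mikko"],  # hypothetical authors
        year=2021,
        title="Example daily measurements",
        publisher="Zenodo",
        doi="10.1234/example.dataset",                   # hypothetical DOI
        version="1.0",
    )
    print(format_data_citation(record))
    # Virtanen, Aino; Korhonen, Mikko (2021). Example daily measurements. Version 1.0. Zenodo. https://doi.org/10.1234/example.dataset
```

Keeping the DOI as a full `https://doi.org/` link in the citation lets readers reach the data set directly and lets citation-tracking services count it.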