Describing datasets in data repositories

This guide explains how to write the data description when you deposit your data in a data repository. As a rule of thumb, a good description lets a researcher from the same field begin to re-use the data without contacting the original author. An informative description helps others find your data and understand what it is about. It also enables re-use and proper citation of your data. To create a good description, start capturing information for it during the project.
Computer screen with several windows open, showing cylindrical shapes on the topmost window and a three-dimensional grid structure underneath it. Photo by Mikko Raskinen.

Common metadata elements

Repositories have various descriptive metadata elements and may name them differently. This guide covers the commonly included elements: the title/name of the dataset(s), data creator(s)/author(s), abstract, keywords, date of data production, publisher, and persistent identifier. We also address how to set the access and re-use conditions, as you typically have to define them when depositing data in a repository.

Title: The title of the dataset(s).
Data Creator: The individuals or organisations the data should be attributed to.
Abstract/ Summary: A brief summary of what your data is about.
Keywords: Words that describe your data.
Date of data production: When the dataset was created, or other appropriate dates.
Publisher: The publisher of the dataset (e.g. the repository used to open the data).
Persistent identifier: A specific identifier provided by the repository, e.g. DOI, Handle or URN.
Terms of access and use: The terms and conditions for accessing and re-using the data, e.g. licensing or re-use by request.
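
As a way to start capturing these elements during the project, the list above can be sketched as a simple record with a completeness check. This is only an illustration: the class name, field names, and the choice of which elements count as optional are assumptions made here, not the schema of any particular repository.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class DatasetDescription:
    """Hypothetical container for the common metadata elements listed above."""
    title: str = ""
    creators: List[str] = field(default_factory=list)
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)
    date_of_production: str = ""        # free-form here; repositories define their own date format
    publisher: str = ""
    persistent_identifier: str = ""     # normally assigned by the repository at deposit
    terms_of_access_and_use: str = ""


def missing_elements(description, optional=("keywords", "persistent_identifier")):
    """Return the names of non-optional elements that are still empty."""
    return [
        f.name
        for f in fields(description)
        if f.name not in optional and not getattr(description, f.name)
    ]


# Placeholder values, for illustration only.
draft = DatasetDescription(
    title="Example title (placeholder)",
    creators=["Example Creator"],
    abstract="Placeholder summary.",
)
print(missing_elements(draft))
# prints ['date_of_production', 'publisher', 'terms_of_access_and_use']
```

Running such a check before the deposit shows which elements still need to be written; the repository's own deposit form remains the authority on what is mandatory.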


Metadata for repositories and archiving

Each data repository requires you to provide some basic information for each dataset, just as journals require authors, affiliations, publication dates, and so on. Collectively, this information is referred to as metadata. For the most part, researchers should concentrate on finding an appropriate repository and following its instructions for depositing data, which includes filling out the relevant metadata.

You can find evaluations of repositories, including the quality of their metadata, in Aalto's guidance and in the Registry of Research Data Repositories. Strongly prefer repositories that provide persistent identifiers and apply standards appropriate to the data.


Title

The title is the most important element for finding your data and for determining whether the dataset meets the user's needs. Provide a unique title by focusing on the data you are sharing. Even if your data relates to an article, consider giving your data a distinctive title of its own. Keep the title compact and try to include What, Where, When, Who, and Scale. An informative title covers the topic, the time period of the data, and specific information about place and geography.

Abstract/ Summary

Be specific and quantify when possible to give enough information about your data. Look at the checklist below to help you answer all relevant questions.


Keywords

Keywords are typically not a mandatory field, but adding them is highly recommended. Choose keywords with your intended audiences in mind. Use field-specific terms, e.g. from specialised thesauri when available, as well as free words and phrases.

Persistent identifier 

A persistent identifier is a unique string of characters given to the data by the repository. For example, the DOI (Digital Object Identifier) and the URN (Uniform Resource Name) are commonly used identifiers. Persistent identifiers identify online resources, such as datasets, by providing a permanent "name" and link to them. Even if the data changes location on the Internet, the identifier remains the same and will still link to the data, regardless of the new location.

It is important to choose a repository that provides a persistent identifier with the data deposit. The identifier singles out the exact dataset, which enables proper data citation and makes it easier to re-use that specific data.
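
To illustrate how an identifier works as a permanent link, the sketch below turns a DOI into a URL on the public resolver at https://doi.org/. The function name and the example DOI are made up for illustration, and the validity check is deliberately loose (it only looks for the "10." prefix and a slash).

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI (bare, 'doi:'-prefixed, or already a doi.org URL) into a resolvable link."""
    doi = doi.strip()
    # Strip a known prefix so that all input forms end up as a bare DOI.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Loose sanity check only: DOIs start with "10." and contain a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL, keeping the slash.
    return DOI_RESOLVER + quote(doi, safe="/")


# "10.1234/example-dataset" is a placeholder, not a registered DOI.
print(doi_to_url("doi:10.1234/example-dataset"))
# prints https://doi.org/10.1234/example-dataset
```

Because the link goes through the resolver rather than pointing at the repository's current web address, it keeps working even if the data moves.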

Version control

One important thing to know, at least in the case of general-purpose repositories such as Zenodo, is version control. It is a fundamental feature in open science.

There is always a possibility that you will update your dataset in the future or notice a mistake in it. You cannot delete a published dataset, but you can upload a new version. The dataset as a whole keeps the same DOI across all versions, while each individual version receives a DOI of its own.
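
The relationship between the dataset's DOI and the DOIs of its versions can be sketched as a small data structure. The class, its method names, and all DOIs below are hypothetical and only mirror the behaviour described above; they are not a repository API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class VersionedDataset:
    """Hypothetical record: one DOI for the dataset as a whole, one per version."""
    dataset_doi: str
    version_dois: Dict[str, str] = field(default_factory=dict)  # version label -> DOI

    def add_version(self, label: str, doi: str) -> None:
        """Register a new version; existing versions are never overwritten."""
        if label in self.version_dois:
            raise ValueError(f"version {label!r} already exists")
        self.version_dois[label] = doi

    def doi_for(self, label: Optional[str] = None) -> str:
        """Return the DOI of one version, or the dataset-level DOI if no version is named."""
        if label is None:
            return self.dataset_doi
        return self.version_dois[label]


# All DOIs below are made-up placeholders.
dataset = VersionedDataset(dataset_doi="10.1234/example.100")
dataset.add_version("v1", "10.1234/example.101")
dataset.add_version("v2", "10.1234/example.102")  # corrected upload; v1 stays available

print(dataset.doi_for())      # prints 10.1234/example.100
print(dataset.doi_for("v1"))  # prints 10.1234/example.101
```

In practice, this suggests citing the DOI of a specific version when the exact files matter, and the dataset-level DOI when referring to the dataset in general.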

Example of a dataset with different versions

Terms of access to the data

When you deposit your data in a repository, you need to define the conditions under which your data can be accessed. Typically, you first define how users can access the data and then grant the right to re-use it by licensing it. Some data repositories offer more access options than others.

Licensing the data for re-use

The users' rights to use the deposited data are commonly defined by licenses. The license protects your rights as an author and ensures that users give you credit by citing your data. The license also reduces uncertainty by letting potential users know how your data can be re-used, combined, mined, or re-distributed.

Aalto recommends the Creative Commons CC BY 4.0 license for datasets in general, and the CC BY-NC 4.0 license if you want to reserve the rights for commercial re-use.

Links to research data management instructions

Follow these links to navigate through research data management instructions.

Data publishing repositories

Data publishing repositories used in Aalto University

Practical steps to publish the research data

Publishing the underlying research data increases citations to your journal articles and other publications.

Publishing and reusing open data

Overview and instructions to services for sharing and publishing research data

Aalto University Library

Research Data Management (RDM) and Open Science

Properly managed research data creates a competitive edge and is an important part of a high-quality research process. Here you will find links to support, services and instructions for research data management.

This service is provided by:

Research and Innovation Services
