Describing datasets in data repositories
Common metadata elements
Repositories use various descriptive metadata elements and may name them differently. These instructions cover the most commonly included elements: the title of the dataset(s), data creator(s)/author(s), abstract, keywords, date of data production, publisher, and persistent identifier. We also explain how to set the access and re-use conditions, as you typically have to define them when depositing data in a repository.
| Metadata element | Description |
|---|---|
| Title | Title of the dataset(s). |
| Data creator | The individuals or organisations the data should be attributed to. |
| Abstract | A brief summary of what your data is about. |
| Keywords | Words that describe your data. |
| Date of data production | When the dataset was created, or other relevant dates. |
| Publisher | The publisher of the dataset (e.g. the repository used to open the data). |
| Persistent identifier | A specific identifier provided by the repository, e.g. DOI, Handle or URN. |
| Terms of access and use | The terms and conditions for accessing and re-using the data, e.g. a license or re-use by request. |
The title is the most important element for finding your data and for determining whether the dataset meets a user's needs. Give your data a unique title that focuses on the data you are sharing. Even if your data relates to an article, consider giving the data its own distinctive title. Keep the title compact and try to include What, Where, When, Who, and Scale. An informative title covers the topic, the time period of the data, and specific information about the place and geography. Examples of informative titles:
- Time series of microbial carbon release from soil as carbon dioxide under different nitrogen and phosphorus treatments with a low glucose concentration added as a carbon source in the Conwy catchment, North Wales, UK (2016)
- Finnish National Election Study 2011: Telephone Interviews among Finnish-speaking Voters
In the abstract, be specific and quantify when possible to give enough information about your data. The checklist below helps you answer all relevant questions:
- What are the data about?
- How, when and by whom were the data collected or generated?
- How were the data processed?
- What methods were used?
- What equipment and software were used?
- Why were the data collected or generated?
- What are the geographical location and the temporal coverage of the data?
Keywords are typically not a mandatory field, but adding them is highly recommended. Choose your keywords with your intended audiences in mind. Use field-specific keywords, for example terms from subject thesauri when available, as well as free words and phrases.
A persistent identifier is a unique string of characters that the repository assigns to the data. For example, the DOI (Digital Object Identifier) and the URN (Uniform Resource Name) are commonly used identifiers. It is important to choose a repository that provides a persistent identifier when you deposit your data: the identifier unambiguously points to the right dataset, which enables proper data citation and makes it easier to re-use that specific data.
Terms of access to the data
When you deposit your data in a repository, you need to define the conditions under which your data can be accessed. Typically, you first define how users can access the data and then define their right to re-use it by choosing a license. Some repositories offer more access options than others.
| Access type | Description |
|---|---|
| Open access to data | The data can be freely used, re-used and redistributed by anyone. |
| Embargoed access to data | The data are not available during a defined period of time (the embargo period). |
| Restricted access / Access by request to data | The data are shared under specific conditions. Re-users have to request access, and the data creator grants or denies it. |
| Closed access to data | Re-users have no access to the data. |
Licensing the data for re-use
Users' rights to use the deposited data are commonly defined by a license. The license protects your rights as an author and ensures that users give you credit by citing your data. The license also reduces uncertainty by letting potential users know how your data can be re-used, combined, mined, or redistributed.
Aalto recommends the Creative Commons license CC BY 4.0 for datasets in general, and the CC BY-NC 4.0 license if you want to reserve the rights to commercial re-use.