Data documentation, organization, and metadata

Metadata describes the research data. Information about the creator, license, relevant dates, and summary statistics can all be metadata.

Metadata describes your actual data and is necessary for your data to be reused. From a researcher's perspective, there are two main considerations:

  1. A description of the research data: how it was created, what it means, the software needed to use it, etc.
  2. Basic bibliographic information that is needed to retrieve the research data and make citations, including information about the creator, license, relevant dates, title, year of publication, repository, and identifier.
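As a rough illustration of the second point, the bibliographic fields listed above could be kept in a small machine-readable record. The sketch below is only one possible layout: the field names, the placeholder values, and the helper functions `missing_fields` and `format_citation` are assumptions made for this example, not a format required by any particular repository.

```python
import json

# All values are hypothetical placeholders, not a real dataset.
record = {
    "title": "Example measurement dataset",
    "creators": ["Researcher, Example"],
    "publication_year": 2024,
    "repository": "Example Repository",
    "identifier": "doi:10.0000/example",
    "license": "CC-BY-4.0",
    "description": "How the data was created, what it means, and the software needed to use it.",
}

# Fields assumed (for this sketch) to be needed for retrieval and citation.
REQUIRED_FIELDS = (
    "title",
    "creators",
    "publication_year",
    "repository",
    "identifier",
    "license",
)


def missing_fields(rec):
    """Return the required bibliographic fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not rec.get(name)]


def format_citation(rec):
    """Build a simple citation string from the bibliographic fields."""
    creators = "; ".join(rec["creators"])
    return (
        f'{creators} ({rec["publication_year"]}). {rec["title"]}. '
        f'{rec["repository"]}. {rec["identifier"]}'
    )


if __name__ == "__main__":
    print("Missing fields:", missing_fields(record))
    print(format_citation(record))
    print(json.dumps(record, indent=2))
```

In practice the repository usually decides which fields are mandatory, so a check like `missing_fields` is mainly useful as a reminder before upload.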

Metadata standards

Different metadata standards exist, both for describing and documenting research data during a research project and for recording basic bibliographic information, for example in the repositories that collect research data.

More information:

In general, you do not select a metadata standard directly; the repository selects it. Sometimes a repository requires, or can only use, data that is structured in a certain way: the data itself carries certain metadata and is stored in certain formats. This relates to the structuring of data discussed under "Basic description of data". A repository of structured data allows large-scale, automated processing and data mining.

Check the link below to learn more about metadata for repositories.

Describing datasets in the data repositories

Make a brilliant description to maximize re-use, citation and findability of your dataset.


For describing the research data, use discipline-specific metadata where possible, because it gives a detailed description of the data. If no discipline-specific metadata format exists, write "README" style metadata, i.e. a basic description of the data.

Basic description of data

One of the most common reasons data becomes unusable is that its contents, collection parameters, fields, and so on are forgotten. You should therefore always record this information, in whatever way is practical. The simplest way is an unstructured README file, i.e. so-called "README" style metadata.
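A README of this kind can be written by hand, but a small helper can make sure the same items are recorded every time. The following is a minimal sketch, assuming a plain-text README with three sections (contents, collection parameters, fields); the section names and the function `build_readme` are made up for this example and are not a prescribed template.

```python
import tempfile
from pathlib import Path


def build_readme(title, contents, collection, fields):
    """Return plain-text README content describing a dataset.

    `fields` maps each field (column) name to a short explanation.
    """
    lines = [
        title,
        "=" * len(title),
        "",
        "Contents",
        contents,
        "",
        "Collection parameters",
        collection,
        "",
        "Fields",
    ]
    for name, meaning in fields.items():
        lines.append(f"- {name}: {meaning}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dataset description, for illustration only.
    text = build_readme(
        title="Example temperature measurements",
        contents="One CSV file with one row per sample.",
        collection="Measured indoors with a hypothetical sensor, one reading per sample.",
        fields={
            "sample_id": "Identifier of the measured sample",
            "temperature_c": "Temperature in degrees Celsius",
        },
    )
    # Write the README next to the data; a temporary folder stands in for it here.
    with tempfile.TemporaryDirectory() as folder:
        readme_path = Path(folder) / "README.txt"
        readme_path.write_text(text, encoding="utf-8")
        print(readme_path.read_text(encoding="utf-8"))
```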

An unstructured data description is sufficient for most data, but large or high-value datasets, and their metadata, should be more structured. Such datasets are more likely to be reused, including by automated tools, without a person first having to read and interpret the data manually. Data may become more structured over time as its value becomes apparent.

Structuring data is very field-dependent, and the best advice is to search for standards of your field and follow them. For general examples of structured data, see 5-star data. The most basic step is to use an open, machine-readable data format which is not likely to ever become obsolete.
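As one hedged example of an open, machine-readable format, the sketch below stores tabular data as CSV and keeps a description of each column in a separate dictionary that could be saved as JSON alongside it. The column names, the sample values, and the helpers `to_csv` and `undocumented_columns` are hypothetical; your field's standards may prescribe a different format entirely.

```python
import csv
import io
import json

# Hypothetical example rows and column documentation.
rows = [
    {"sample_id": "s1", "temperature_c": 21.5},
    {"sample_id": "s2", "temperature_c": 22.0},
]

column_docs = {
    "sample_id": "Identifier of the measured sample",
    "temperature_c": "Temperature in degrees Celsius",
}


def to_csv(records, columns):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def undocumented_columns(csv_text, docs):
    """Return the columns in the CSV header that have no description in `docs`."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in docs]


if __name__ == "__main__":
    csv_text = to_csv(rows, list(column_docs))
    print(csv_text)
    print(json.dumps(column_docs, indent=2))
    print("Undocumented columns:", undocumented_columns(csv_text, column_docs))
```

Keeping the column descriptions machine-readable means a script, not only a person, can notice when a field has been added without documentation.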

Organizing data

If you don't have a clear organization strategy, your data will become unmaintainable and unfindable, even by you. Every type of project has different requirements, so it is difficult to generalize. However, try to be strict with your data folders: sort things rigorously from the start, and give each project space a unique name. Have projects refer to each other rather than copying and pasting files between them or embedding one project inside another.

Within a project space, sort files by type or usage instead of allowing everything to become mixed.
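One way to enforce this from the start is to create every project space from the same skeleton. The sketch below is only an illustration: the subfolder names (`raw`, `processed`, `code`, `docs`) and the function `create_project_space` are assumptions for this example, and a different split by type or usage may suit your project better.

```python
import tempfile
from pathlib import Path

# Hypothetical subfolders, one per type or usage of file.
SUBFOLDERS = ("raw", "processed", "code", "docs")


def create_project_space(root, project_name):
    """Create a uniquely named project folder with one subfolder per file type.

    Raises FileExistsError if the project name is already in use, so that
    two projects never share (or overwrite) the same space.
    """
    project = Path(root) / project_name
    project.mkdir(parents=True, exist_ok=False)
    for name in SUBFOLDERS:
        (project / name).mkdir()
    return project


if __name__ == "__main__":
    # A temporary directory stands in for your real data storage location.
    with tempfile.TemporaryDirectory() as root:
        project = create_project_space(root, "example-project")
        for child in sorted(project.iterdir()):
            print(child.name)
```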

More information about good practices for data organization can be found on the page provided by Aalto Science-IT: Data organization

Links to research data management instructions

Follow these links to navigate through research data management instructions.

Aalto University library

Publishing and reusing open data

Overview and instructions to services for sharing and publishing research data

Research Data Management (RDM) and Open Science

Properly managed research data creates competitive edge and is an important part of a high-quality research process. Here you will find links to support, services and instructions for research data management.

This service is provided by:

Research and Innovation Services
