Data storage, collaboration and backup

Several solutions are available for storing your data. As the research proceeds, storage needs may change, and the data will most likely need to be stored in several systems or with different access rights during its lifecycle. To get the most out of your data, keep it managed, documented, secured and backed up.

Recommendations

Put the data in a repository that takes care of backing up the data and lets you choose the level of openness (not opened / opened to your colleagues partially or completely / opened to everyone partially or completely):

  • Zenodo
  • Finnish Social Science Data Archive (FSD)
  • ACRIS - Aalto Current Research Information System for research outputs, researchers, projects, datasets, etc.
  • Also consider opening the data directly if possible. See here for options.

For day-to-day collaboration, consider the cloud services described in the Aalto intranet (login required).

You might also want to check the IT Services for Research page (login required) to make sure you are aware of all resources at Aalto.

Should you have special requirements concerning storage (e.g. more performance, capacity, collaboration, mobility), please contact esupport [at] aalto [dot] fi (Aalto IT support) to find a suitable solution to your specific needs.

What research data should be preserved and shared?

At a minimum, researchers must ensure that the data needed to validate results in scientific publications are preserved; these data should be available at least to other researchers on request. Everything that is needed to replicate a study should be preserved, as should everything that is potentially useful for others. For more information, see “How to select and appraise research data”: www.dcc.ac.uk/resources/how-guides/appraise-select-research-data

The datasets must have associated metadata: the dataset’s creator, title, year of publication, repository, identifier etc. The Finnish Social Science Data Archive staff can help add the metadata to materials that are stored and opened in the repository, for example interviews. FSD's data descriptions are available online as DDI 2.0 XML files. See more: http://www.fsd.uta.fi/en/data/background/ddi-records.html
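
A quick completeness check can catch missing metadata before a dataset is deposited. The following is a minimal sketch only: the field names in `REQUIRED_FIELDS` are hypothetical labels for the items listed above, not the schema of any particular repository or of DDI.

```python
# Hypothetical field names for the metadata items named in the text;
# a real repository defines its own schema and required fields.
REQUIRED_FIELDS = ("creator", "title", "publication_year", "repository", "identifier")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Treat None and whitespace-only values as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

Calling `missing_fields` on a record that lacks an identifier returns `["identifier"]`, which can serve as a reminder to wait for the repository to assign one.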

The datasets should be FAIR (findable, accessible, interoperable and reusable). See the FAIR Guiding Principles for scientific data management and stewardship: http://www.nature.com/articles/sdata201618

The datasets must have a persistent identifier. The repository will assign a persistent ID to the dataset: this is important for discovering and citing the data.

Documentation, such as code books and lab journals, should be preserved. It is important for understanding the data and for combining them with other data sources.

Software, hardware, tools, syntax queries and machine configurations should be preserved where relevant. What is needed depends on the domain, and these are important for using the data. (Alternative: preserve information about the software etc.)

Source: Sarah Jones and Marjan Grootveld: How to write a Data Management Plan (https://eudat.eu/events/webinar/joint-eudat-openaire-webinar-%E2%80%9Chow-to-write-a-data-management-plan%E2%80%9D), licensed under a CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Data can be archived in a repository with the access right defined as closed access at the beginning of and during the project. This can later be changed to restricted access, embargoed access or open access according to the goals of the project.

Embargoed access can be used for datasets. With embargoed access, the researchers who collected the data first use it as the underlying data for their own publications. Only after publication do the researchers publish the citable datasets, using a license that requires attribution, for example CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). The license requires that authors and publications are cited according to the Attribution term of the license.

Documenting data

You will need to provide metadata that complies with an international metadata standard. Researchers should follow a metadata standard used in their field, or a generic standard, e.g. Dublin Core or DataCite. For more information, see the Research Data Alliance (RDA) metadata standards directory (http://rd-alliance.github.io/metadata-directory/standards/). To make providing the metadata easier, answer the following questions already during the research work. This information can additionally be listed in a README file.

  • Who are the creators and what are their affiliations?
  • Where is the data located, and is there a persistent identifier?
  • What license has been chosen to allow reuse?
  • How, when and by whom was the data collected or created?
  • How was the data prepared for analysis?
  • What kinds of data manipulation have taken place?
  • How was the data analysed, and with what methods?
  • What instruments and devices were used?
  • Which scientific publications are based on this data?
  • What software was used to process and analyse the data?
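
One possible way to keep the answers to these questions next to the data is to generate a README skeleton early in the project and fill it in as the work proceeds. The sketch below is illustrative only: the field labels, the `TODO` marker and the file name `README.txt` are assumptions, not a requirement of Aalto or of any repository.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical labels paraphrasing the questions listed above.
README_FIELDS = [
    "Creators and affiliations",
    "Location and persistent identifier",
    "License",
    "Collection (how, when, by whom)",
    "Preparation for analysis",
    "Data manipulations",
    "Analysis methods",
    "Instruments and devices",
    "Related publications",
    "Software used",
]


def render_readme(answers: dict[str, str]) -> str:
    """Return README text with one 'Label: answer' line per field.

    Fields without an answer are marked TODO so that gaps stay visible.
    """
    lines = [f"{field}: {answers.get(field, 'TODO')}" for field in README_FIELDS]
    return "\n".join(lines) + "\n"


def write_readme(folder: Path, answers: dict[str, str]) -> Path:
    """Write README.txt into the dataset folder and return its path."""
    path = folder / "README.txt"
    path.write_text(render_readme(answers), encoding="utf-8")
    return path
```

For example, `write_readme(Path("my_dataset"), {"License": "CC BY 4.0"})` would create a README in an existing `my_dataset` folder with the license filled in and every other field marked `TODO`.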

Confidential data and information security

Information security is about keeping your information safe and accessible. Information should be safe: neither changed nor destroyed accidentally. Information should be accessible: available to you and away from unauthorized users.
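
One common way to notice accidental changes or losses is to record a checksum for every file and compare against that record later. The following is a minimal sketch using SHA-256 from Python's standard library; the function names and the in-memory manifest format are assumptions made for illustration, and this is not a description of any Aalto service.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict:
    """Map each file's path (relative to folder) to its checksum."""
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder: Path, manifest: dict) -> list:
    """List files from the manifest that are now missing or have different content."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items() if current.get(name) != checksum
    )
```

A manifest built when the data is first stored can be saved alongside the backup; running `changed_files` against it later reports files that were altered or deleted. It does not report newly added files, and it does not protect against unauthorized access.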

If you obtain confidential data, make sure to follow the nondisclosure agreement (NDA).

Plan the security aspects and the handling of personal data at the beginning of your research. More information on personal data as part of research data is available in the section on research ethics. If you start a project that collects confidential data, please contact researchdata [at] aalto [dot] fi.


Last updated: 17.11.2016.