Publishing and reusing open data
Overview and instructions to services for sharing and publishing research data
Software is an essential part of much modern research, and it can either serve as a competitive advantage or hold you back. Software produced as a byproduct of research work can include code and scripts for modifying and processing data, analyzing data, and running simulations, even if it doesn't have an apparent direct use. Will open software bring you more citations, or will closed software make your work harder to build upon?
At least:
Before beginning a project, the single most important consideration is to choose a license. Due to copyright law, all software is by default non-reusable. Licenses allow anything from "use it for anything, as long as you allow anyone else to use your work" to "use it for anything, including commercial purposes". Without a license, even the original authors may lose the ability to fully use their software in the future.
The main choice is between a viral license and a permissive license. A viral license means that any derivative works must also be equally open. It is used when one wants to give an advantage to open work, though it can still be used by companies that keep their derivative work open. A permissive license allows anyone to use the software for any purpose, even in closed commercial software.
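Since software without a license is non-reusable by default, it can be useful to check that a project directory actually contains a license file before sharing it. The following is a minimal sketch under stated assumptions: the function name `find_license_files` and the list of file names are illustrative choices, not a convention defined by Aalto or by any particular hosting service.

```python
from pathlib import Path

# Illustrative, non-exhaustive list of common license file names (an assumption).
LICENSE_NAMES = {"license", "license.txt", "license.md", "copying", "copying.txt"}


def find_license_files(project_dir):
    """Return the sorted names of top-level files that look like a license file."""
    root = Path(project_dir)
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.name.lower() in LICENSE_NAMES
    )


if __name__ == "__main__":
    found = find_license_files(".")
    if found:
        print("License file(s) found:", ", ".join(found))
    else:
        print("No license file found; the code is not reusable by default.")
```

The check only looks at file names in the top-level directory. It does not verify which license the file contains or whether it suits the project.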
To decide on a license, the site https://choosealicense.com/ provides good advice. Aalto-specific information can be found at Open Source on the Aalto Scientific Computing Guide. The two most commonly recommended licenses are:
Should the software project be of major importance or involve other intellectual property rights, it can be worth discussing it with Aalto Innovation Services first to plan a strategy that combines academic openness with commercialization potential.
One key tool for any software development is a version control system. It records each change individually and keeps a history, so you can look back in time to see how the code changed, which is a vital part of scientific reproducibility. It is helpful for fixing bugs and managing your codebase for personal use, but for team collaboration it is essential, and collaboration is required for openness. It provides the basis for greater community involvement.
Git is the most common version control system these days, though it can be a bit difficult to use. Mercurial is also common and more user-friendly. However, Git has the widest hosting possibilities, including GitHub and the Aalto version control system.
In addition to using a version control system, we recommend using a central repository to provide backups and the potential for collaboration. For open software, GitHub is recommended for its visibility, and Aalto GitLab (version.aalto.fi) for its private repositories and non-commercial nature.
If your source code is subject to a confidentiality agreement (NDA), you should explicitly agree with the research partners which services and tools you're going to use for handling the source code. For the most sensitive projects, even a separate system might be needed.
GitHub and version.aalto.fi don't provide permanent project hosting, even if projects are not immediately deleted. To permanently store code (for example, a version referred to by a published paper), we recommend the GitHub Zenodo integration, which also makes your code citable with a DOI. You can also archive code directly on Zenodo. For more information, see the GitHub guide on Zenodo.
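When archiving through the Zenodo integration, the archived record's metadata can be supplied from a `.zenodo.json` file in the repository root. The sketch below writes such a file with Python's standard library. The helper name `write_zenodo_metadata` and all example values are hypothetical, and the field names (`title`, `upload_type`, `creators`, `license`) follow Zenodo's deposit metadata format as we understand it, so check them against Zenodo's current documentation before relying on them.

```python
import json
from pathlib import Path


def write_zenodo_metadata(repo_dir, title, creators, license_id, description=""):
    """Write a .zenodo.json file into repo_dir and return its path.

    creators is a list of (name, affiliation) pairs. The field names used
    here are an assumption based on Zenodo's deposit metadata format.
    """
    metadata = {
        "title": title,
        "description": description,
        "upload_type": "software",
        "creators": [
            {"name": name, "affiliation": affiliation}
            for name, affiliation in creators
        ],
        "license": license_id,
    }
    path = Path(repo_dir) / ".zenodo.json"
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    write_zenodo_metadata(
        ".",
        title="Example analysis scripts",
        creators=[("Researcher, Example", "Aalto University")],
        license_id="MIT",
        description="Scripts used to produce the results of an example paper.",
    )
```

Keeping this file under version control means the citation metadata travels with the code and is versioned along with it.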
At Aalto, https://scicomp.aalto.fi provides a wealth of information about computational research and software.
CodeRefinery is a Nordic organization that provides researcher-focused training about software development. There are workshops in the Helsinki region twice per year, including at least one at Aalto.