Software as a research outcome
Software is an essential part of much modern research, and can serve as a competitive advantage or something to hold you back. This includes software that is a byproduct of research work, such as code and scripts for modifying, processing, and analyzing data or running simulations, even if it has no apparent direct use. Will open software bring you more citations, or will closed software make your work harder to build upon?
- Choose and give your code a license from the start, depending on your strategic goals.
- Use a version control system – which one you choose is up to you, but git + Github or Gitlab is by far the most common these days.
- Publish your code to a public repository. If you desire citations, use Github + Zenodo, or publish an archive to any other location.
Choose a license
Before beginning a project, the single most important consideration is to choose a license. Due to copyright law, all software is by default non-reusable. Licenses allow anything from "use it for anything, as long as you allow anyone else to use your work" to "use it for anything, including commercial purposes". Without a license, even the original authors may lose the ability to fully use their software in the future.
The main choice is between a viral (copyleft) license and a permissive license. A viral license means that any derivative works must also be equally open. This is used when one wants to give an advantage to open work; companies can still use the software as long as they keep their derivative works open. A permissive license allows anyone to use the software for any purpose, even in closed commercial software.
To decide on a license, the site https://choosealicense.com/ provides good advice. Aalto-specific information can be found on the Open Source page of the Aalto Scientific Computing Guide. The two most commonly recommended licenses are:
- Aalto University recommends the permissive MIT License for published software. This open source license also permits later commercial use by anyone, which is best for the widest distribution and use of results when scientific credit is the primary goal.
- The viral GNU General Public License allows anyone to use the software for any purpose, but requires that modifications are also open. This allows the original authors to retain a commercial advantage in the future, while not inhibiting academic-type use.
Should the software project be of major importance or involve other intellectual property rights, it can be worth discussing it with Aalto Innovation Services first, to plan a strategy that combines academic openness and commercialization potential.
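Once a license is chosen, it helps to state it in the repository itself. The following is a minimal sketch, not an Aalto-recommended tool: it assumes a convention (not described above) of a license file in the project root plus an `SPDX-License-Identifier` comment near the top of each source file. The function name `missing_license_info` and the accepted file names are illustrative choices.

```python
from pathlib import Path

# Conventional names for the license file; this list is an assumption.
LICENSE_FILES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING")
SPDX_TAG = "SPDX-License-Identifier:"


def missing_license_info(project_dir, suffix=".py", header_lines=5):
    """Return a list of licensing problems found under project_dir."""
    root = Path(project_dir)
    problems = []

    # 1. The project root should contain a license file.
    if not any((root / name).is_file() for name in LICENSE_FILES):
        problems.append("no LICENSE or COPYING file in the project root")

    # 2. Each source file should carry an SPDX tag in its first few lines.
    for source in sorted(root.rglob(f"*{suffix}")):
        with source.open(encoding="utf-8", errors="replace") as handle:
            head = [handle.readline() for _ in range(header_lines)]
        if not any(SPDX_TAG in line for line in head):
            rel = source.relative_to(root)
            problems.append(f"{rel}: no SPDX-License-Identifier tag")

    return problems
```

Calling `missing_license_info(".")` from the project root returns an empty list when both checks pass. A source file passes if one of its first lines is a comment such as `# SPDX-License-Identifier: MIT`.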
Use version control and a repository
One key tool for any software development is a version control system. It records each change individually and so provides a full history of the code. It allows you to look back in time to see code changes – a vital part of scientific reproducibility. It's helpful in fixing bugs and handling your codebase for personal use, but for team collaboration it's essential, and collaboration is required for openness. It provides the basis for greater community involvement.
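One way to connect version control to reproducibility is to store the code version next to every result it produces. The sketch below assumes Git is the version control system and that the analysis runs inside a checked-out repository. The function names and the example value are hypothetical placeholders.

```python
import json
import subprocess
from datetime import datetime, timezone


def current_commit(repo_dir="."):
    """Return the Git commit hash of repo_dir, or "unknown" if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not inside a repository.
        return "unknown"
    return result.stdout.strip()


def provenance_record(results, repo_dir="."):
    """Bundle results with the code version and the time they were produced."""
    return {
        "commit": current_commit(repo_dir),
        "created": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }


if __name__ == "__main__":
    # Placeholder value for illustration only, not real data.
    record = provenance_record({"example_metric": 1.0})
    print(json.dumps(record, indent=2))
```

With the commit hash saved alongside the output, the exact code behind a figure or table can later be checked out again. The sketch does not detect uncommitted changes, so it is only reliable when the working tree is committed before each run.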
Git is the most common version control system these days, though it can be a bit difficult to use. Mercurial is also common and more user friendly. However, Git has the widest data hosting possibilities, including Github and the Aalto version control system.
In addition to using a version control system, we recommend using a central repository to provide backups and the potential for collaboration. For open software, Github is recommended for its visibility, and Aalto Gitlab (version.aalto.fi) for its private repositories and non-commercial nature.
If your source code is subject to a confidentiality agreement (NDA), you should explicitly agree with the research partners which services and tools you're going to use for handling the source code. For the most sensitive projects, even a separate system might be needed.
Github
- Online version control system for software; it can be used from the beginning of development work.
- Public projects are free and unlimited.
- Well known and popular: increases visibility, discoverability, citability and reusability of your code. A huge amount of scientific software is there these days.
- Supports permanent identifier (DOI) for your code with integration to Zenodo. See: Making your code citable.
- Github has a list of major scientific projects being hosted there. Look at how issues, the wiki, forks, and pull requests all combine to make it easy to contribute. These projects are very large, but the principles can be applied to small projects as well.
version.aalto.fi (Aalto Gitlab)
- The Aalto Version Control System (version.aalto.fi) runs a private Gitlab for the Aalto community, which provides an interface functionally equivalent to Github. Any research project in Aalto University may use it, including collaborative projects with research partners outside the university.
- Aalto Version Control System is like Github, but hosted by Aalto. Basic features are the same, but you have local support and can have unlimited private repositories. They also allow external collaborator access.
- The primary benefit of Aalto Gitlab is self-control and private repositories for individuals and organizations.
- For more information, see scicomp or Aalto instructions.
After your project
Github and version.aalto.fi don't provide permanent project hosting, even if projects are not immediately deleted. To permanently store code (for example, a version referred to by a published paper), we recommend the Github Zenodo integration, which also makes your code citable with a DOI. You can also archive code directly on Zenodo. For more information, see the Github guide on Zenodo.
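When a repository is archived through the Github Zenodo integration, the archive metadata can be supplied by a `.zenodo.json` file in the repository root. The sketch below writes such a file. The field names (`upload_type`, `title`, `description`, `creators`, `license`) follow Zenodo's deposit metadata as commonly documented, but both the fields and the accepted license identifiers should be checked against the current Zenodo documentation. The function name and all example values are hypothetical.

```python
import json
from pathlib import Path


def write_zenodo_metadata(repo_dir, title, creators, license_id, description):
    """Write a .zenodo.json file into repo_dir and return its path.

    creators is a list of (name, affiliation) pairs.
    """
    metadata = {
        "upload_type": "software",
        "title": title,
        "description": description,
        "creators": [
            {"name": name, "affiliation": affiliation}
            for name, affiliation in creators
        ],
        # Assumed to be an identifier that Zenodo accepts; verify before use.
        "license": license_id,
    }
    path = Path(repo_dir) / ".zenodo.json"
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path
```

A call might look like `write_zenodo_metadata(".", "Example analysis code", [("Doe, Jane", "Example University")], "MIT", "Scripts for an example study.")`, with every argument replaced by the project's real details. Committing the file before making a release lets the archived record carry the intended authors and license.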
At Aalto, https://scicomp.aalto.fi provides a wealth of information about computational research and software.
CodeRefinery is a Nordic organization that provides researcher-focused training about software development. There are workshops in the Helsinki region twice per year, including at least one at Aalto.