Software as a research outcome
Software that is the primary output of research should be published with an appropriate license. When doing so, consider how the software will be supported and updated in the future. Software that is a byproduct of research work can include code and scripts for modifying and processing data, analyzing data, and running simulations. These can be published and licensed too, mainly to support reproducibility and so that others can build on your analysis.
Recommendations for publishing software
- Use a version control system. Which one you choose is up to you, but Git, usually hosted on a service such as Github, is by far the most common these days.
- Make sure that all code you want to be open includes a license file. Choose open by default!
- Publish your code to a public repository. If you want your code to be citable, use Github together with Zenodo, or publish an archive of your code in another suitable repository.
Use a version control system: it's great
A version control system is a key tool for any software development. It records each change individually and builds up a history, so you can look back in time to see how the code changed, which is a vital part of scientific reproducibility. Version control helps with fixing bugs and managing your codebase even when you work alone, but for team collaboration it is essential, and collaboration is required for openness. It also lays the groundwork for broader community involvement.
Git is the most common version control system these days, though it can be somewhat difficult to learn. Mercurial is an alternative that some find more user-friendly. However, Git has the widest range of hosting options, including Github and the Aalto Version Control System.
When the work is done, the same version control service can be used to publish the software. For example, Github's integration with Zenodo lets you obtain permanent identifiers (DOIs) for your software releases.
Choose a license
- Under copyright law, software and other creative output is not open by default: without a license, others have no right to use, modify, or redistribute it. Include a license file with your code. You can read more about different licenses on the Open Source Initiative website.
- Aalto University recommends the MIT License for published software. This open source license also permits later commercial use, which may become useful for the original author or the university. Other licenses may be chosen to support the strategic goals of the research project.
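- The MIT License allows anyone to use, modify, and redistribute the software for any purpose, as long as the copyright and license notice is kept. The GNU General Public License (GPL) also allows use for any purpose, but requires that modified versions be released under the same license when they are distributed. Both allow commercial use, but under the GPL anyone who distributes a derivative work, commercial or not, must make its source code available under the GPL as well, which keeps the software from becoming proprietary.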
- Currently, releasing software as open source requires the approval of the principal investigator (PI).
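If you want to automate the "include a license file" step, the Python sketch below writes the standard MIT License text into a file named LICENSE, unless the project already contains a license file. The function name ensure_mit_license, the list of recognized file names, and the copyright holder string are illustrative assumptions, not an Aalto tool; check with your PI who the correct copyright holder is before publishing.

```python
"""Sketch: add an MIT License file to a project folder if it has none.

Assumptions (illustrative, not an Aalto tool): the project is a local folder,
"LICENSE" is the desired file name, and the copyright holder is a placeholder
that must be confirmed before publishing.
"""
from datetime import date
from pathlib import Path

# Standard MIT License text; {year} and {holder} are filled in below.
MIT_TEMPLATE = """MIT License

Copyright (c) {year} {holder}

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""

# Common names for an existing license file (illustrative, not exhaustive).
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING")


def ensure_mit_license(project_dir: str, holder: str) -> Path:
    """Return the project's license file, creating an MIT one if none exists."""
    root = Path(project_dir)
    for name in LICENSE_NAMES:
        existing = root / name
        if existing.is_file():
            # Never overwrite a license that has already been chosen.
            return existing
    path = root / "LICENSE"
    text = MIT_TEMPLATE.format(year=date.today().year, holder=holder)
    path.write_text(text, encoding="utf-8")
    return path
```

For example, ensure_mit_license("my-project", "Jane Doe") would create my-project/LICENSE (both names are hypothetical). Github can also add a license file from a template when you create a new repository, which may be simpler for a single project.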
Publish code in a repository
The recommended service for software that you're going to publish is Github. The benefits include:
- Online hosting for version-controlled software, usable from the very beginning of development work.
- Public projects are free and unlimited.
- Well known and popular: increases the visibility, discoverability, citability, and reusability of your code. A great deal of scientific software is hosted there these days.
- Supports permanent identifiers (DOIs) for your code through its Zenodo integration. See: Making your code citable.
The Aalto Version Control System (version.aalto.fi) runs on Gitlab CE and provides Git version control for software. Any research project at Aalto University may use it, including collaborative projects with research partners outside the university.
The Aalto Version Control System is like Github, but hosted by Aalto. The basic features are the same, but you get local support, can have unlimited private repositories, and can give external collaborators access.
Restricted access / confidentiality issues
- Sometimes a project chooses not to publish the software used in the research. Perhaps it would not be useful to others, part of it is subject to license restrictions, or there is some other need to restrict access to it.
- For private repositories, some Github features require a paid plan priced per user. If the budget allows, it's a reasonable choice. You can also use the Aalto Version Control System to host private repositories.
- If you are concerned about where your data is located, you can use a locally hosted version control system instead of cloud services hosted abroad, such as Github. Git does not require a remote repository at all: it can track versions directly on your computer. In that case, make sure your code is backed up, because a purely local repository has no copy anywhere else.
- If your source code is subject to a confidentiality agreement (NDA), agree explicitly with the research partners on which services and tools you will use to handle the source code. The most sensitive projects may even need a separate, dedicated system.
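For a purely local Git repository, one simple way to act on the backup advice is to archive the whole project folder, including its .git history, and store the archive somewhere that is itself backed up. The Python sketch below does this with the standard library; the function name backup_project and the folder names are hypothetical, and where you keep the archives is up to you and your project's data location requirements.

```python
"""Sketch: make a timestamped zip backup of a local project folder.

Assumptions: the project (for example a Git repository with no remote) is a
local folder, and the archive is copied to storage that is itself backed up.
This is an illustration, not an Aalto-provided tool.
"""
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project_dir: str, backup_dir: str) -> Path:
    """Zip project_dir, including any .git history folder, into backup_dir."""
    src = Path(project_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"not a directory: {src}")
    dest = Path(backup_dir).resolve()
    # Keep backups outside the project so earlier archives are not zipped
    # into later ones.
    if dest == src or src in dest.parents:
        raise ValueError("backup_dir must be outside project_dir")
    dest.mkdir(parents=True, exist_ok=True)
    # A timestamp keeps successive backups from overwriting each other.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = dest / f"{src.name}-{stamp}"
    # make_archive appends ".zip" and returns the path of the archive it wrote.
    archive = shutil.make_archive(str(base_name), "zip", root_dir=str(src))
    return Path(archive)
```

For example, backup_project("my-analysis", "/mnt/backed-up-storage") would write a file such as my-analysis-20250101-120000.zip there (both paths are hypothetical). Pushing the repository to a second remote, such as the Aalto Version Control System, is another way to keep a copy off your machine.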
Github maintains a list of major scientific projects hosted there. Look at how issues, the wiki, forks, and pull requests combine to make contributing easy. These projects are very large, but the same principles apply to small projects as well.