Department of Computer Science

Summer employee positions at the Department of Computer Science 2022

We are looking for BSc or MSc degree students at Aalto or another university to work with us in summer 2022.
[Photo: Computer Science building front entrance, Aalto University. Photo: Matti Ahlgren]

The Department of Computer Science is now looking for summer employees!

We at the Department of Computer Science want to offer motivated students a chance to work on interesting research topics with us. We are looking for BSc or MSc degree students at Aalto or another university to work with us in summer 2022. If you have enjoyed your studies and want to learn more about computer science, this might be the place for you. We do not expect you to have previous research experience; this could be the start of a bright research career! You will be supported by other summer employees, doctoral students, and postdocs at the department.

Ready to apply?
See the complete list of available topics below, choose up to five topics that interest you the most, and list them in order of preference on the application form.

Please submit your application through our recruitment system. The application form will close on 17th January 2022 at 23:59 Finnish time (UTC +2).  

Link to the application form:

To apply, please share the following application materials with us:
•    Motivation letter
•    CV
•    Up-to-date transcript of records (unofficial is ok)

Are you an international student or coming from abroad?
Please check the Aalto Science Institute AScI internship programme for international summer employees.

AScI arranges activities for international summer employees who have applied through its call and helps them find an apartment in Espoo.

More information
If you have questions about applying, please contact Sanni Kirmanen from the HR team: [email protected].

Summer employee topics 2022

1.    Bayesian workflow

Supervisor: Associate Professor Aki Vehtari
Email: [email protected]
Number of open positions: 1

Background: The task is to take part in the development of Bayesian workflow tools and diagnostics. The topic can be selected together with the student. Some example topics are importance sampling diagnostics, diagnosing funnel- and banana-shaped posteriors, Monte Carlo standard error for arbitrary functions, analysis of dynamic Hamiltonian Monte Carlo behavior, analysis of low-rank black-box variational inference, and visualization of results from projective predictive model selection.

Prerequisites: knowledge of Bayesian methods, and R or Python

2.    Transparent and Explainable Machine Learning

Supervisor: Assistant Professor Alex Jung
Email: [email protected]
Number of open positions: 1

Develop novel methods for computing personalised explanations for the predictions obtained from complicated machine learning methods (such as deep learning methods that use deep nets with billions of tunable weights).

Background reading:

A. Jung and P. H. J. Nardelli, "An Information-Theoretic Approach to Personalized Explainable Machine Learning," in IEEE Signal Processing Letters, vol. 27, pp. 825-829, 2020, doi: 10.1109/LSP.2020.2993176.

3.    Federated Learning over Networks for Pandemics

Supervisor: Assistant Professor Alex Jung
Email: [email protected]
Number of open positions: 1

You will study federated learning algorithms that learn personalized Covid-19 infection risk predictions. These algorithms are run by smartphone apps and exchange non-sensitive information between close-by smartphones (the "contact-network").

More info here:

Convex optimization, machine learning, Python, networks

4.    Federated Learning in Networked Data

Supervisor: Assistant Professor Alex Jung
Email: [email protected]
Number of open positions: 1

Design and analyze distributed convex optimization methods for learning personalized models in networks of data points. The data points could be IoT devices (sensors or valves within a car engine) or humans during a pandemic. Federated Learning aims at exploiting the information contained in individual data points as well as their network connection to learn accurate predictive models (for condition monitoring of engines or Covid-19 infection risk prediction).

5.    Research assistant for visual algorithm simulation

Supervisor: Senior University Lecturer Ari Korhonen / Doctoral Candidate Artturi Tilanterä
Email: [email protected]
Number of open positions: 1

We are conducting research to improve teaching of Data Structures and Algorithms. There are computerised, visual exercises on data structures and algorithms. We would like to record students’ answers in detail to improve those exercises.
See the five minute introductory video:
Ultimately, your work will help computing students in the future.

Here you can see a couple of examples of the recorder software:
Your task would be to further develop the software that records students’ solutions to these visual algorithm simulation exercises.

Requirements: At the beginning of your summer work, you should meet the following requirements (course codes refer to Aalto University courses):
(a) One of these:
- CS-A1110 Data Structures and Algorithms
- CS-A1141 Tietorakenteet ja algoritmit Y
- CS-A1143 Data Structures and Algorithms Y
- Another similar course
(b) One of these:
- CS-C3170 Web Software Development
- Experience with JavaScript programming
Both (a) and (b) are required.

It is a plus if you already know Git, Node.js, and JSON Schema. All general programming experience is also a merit; you can add a link to your GitHub repository in your application or attach a free-form PDF portfolio of your software development work.

The working language is either Finnish or English. However, you should be able to read and write software documentation in English.

We offer a relaxed working environment and an inspiring view into computing education research.

Feel free to ask further questions.

6.    Positions in Arno Solin’s Machine Learning Group

Supervisor: Assistant Professor Arno Solin
Email: [email protected]
Number of open positions: 3

We are looking for highly motivated candidates to work at the interface of probabilistic machine learning and various application areas. Successful candidates should have good math skills, be knowledgeable in either Python or Julia, and have experience with machine learning related topics, e.g. Bayesian inference, deep neural networks, or Gaussian processes. The work will be done in close collaboration with the group members.

Deep architectures like deep neural networks are an integral part of contemporary artificial intelligence. However, when deploying deep learning models to real-world applications we need to pay attention to issues related to data efficiency, robustness, and interpretability. The design choices made in deep models imply strong inductive biases. Realizing this makes it possible to build more data-efficient, robust and interpretable deep models.

Possible project topics include, but are not limited to: (i) examining recent techniques for uncertainty quantification in Bayesian deep learning, (ii) investigating the scalability of stationary deep learning models to large computer vision data sets, and (iii) advancing recent work on deep mixtures of Gaussian process experts and graph neural networks. You can familiarize yourself with recent research done in the group by browsing

7.    Investigating Homophily and the Glass Ceiling in Supervisor- and Collaboration-Networks

Supervisor: University Lecturer Barbara Keller
Email: [email protected]
Number of open positions: 1

Homophily and the glass ceiling are concepts from sociology: "The glass ceiling is a colloquial term for the social barrier preventing women and members of minority groups from being promoted to top jobs in management" and "Homophily is describing the tendency of individuals to associate and bond with similar others".
In "Homophily and the Glass Ceiling Effect in Social Networks" the authors described a graph evolution model which exhibits a glass ceiling effect under certain parameters. We want to extend this work by investigating additional real-world networks, such as (but not limited to) supervisor and collaborator networks.

The tasks involve:
- Finding relevant data sources
- Scraping and cleaning data
- Calculating relevant metrics
- Write-up of the findings

The applicant should be interested in social networks and their analysis and have sound programming skills, preferably in Python.

8.    Kirjasampo: data analyses of Finnish fiction on the Semantic Web

Supervisor: Professor Eero Hyvönen
Email: [email protected]
Number of open positions: 2

Kirjasampo is a semantic information portal of Finland's public libraries that focuses especially on fiction and currently has almost two million users a year. The concept of the system is based on the "Sampo model". Its publishing concept, which is based on Semantic Web technology, as well as its knowledge graph and data service, were originally developed as part of the Kulttuurisampo service of Aalto University and the University of Helsinki and the national FinnONTO project.
We are looking for a person interested in research work for a new development project that studies Finnish fiction using the data-analytic methods of the digital humanities. The work makes use of the SPARQL endpoint of the Kirjasampo data service, Python libraries, the Google Colab Jupyter Notebook environment, and the YASGUI editor. The project also uses the SeCo research group's Sampo-UI tool to develop a prototype based on faceted search, in which materials can be searched and browsed with semantic faceted search and the search results can be analysed without programming skills.

More information and videos about the Sampo model and the Sampo portals:

The work is suitable as a summer job but also, for example, as a master's thesis. The work is supervised by researchers of the Semantic Computing Research Group (SeCo). Salary is paid according to Aalto University norms.

Prof. Eero Hyvönen
Aalto University and University of Helsinki (HELDIG centre)

9.    Historiasampo - Finnish history on the Semantic Web

Supervisor: Professor Eero Hyvönen
Email: [email protected]
Number of open positions: 2

Historiasampo / TimeMachine Finland: Finnish history as linked data

The project develops a prototype of Historiasampo, a new "Sampo portal" on the Semantic Web. Its core material is an ontology describing historical events in Finland, the core of which is based on the Agricola timeline edited by Finnish historians. Online services related to historical materials combine heterogeneous information from different data sources concerning, among other things, people, objects, art, writings, times, and places.

In Historiasampo, an extension to the international CIDOC CRM data model is developed for describing historical events and is applied to describing the history of Finland. A demonstrator of the online service "Historiasampo", which brings together cultural-historical information, is implemented on top of the resulting ontology, knowledge graph, and linked data service interface.

More information about the Sampo model and portals:

10.    Designing for Developer Experience: Web + mobile research tool design and development

Supervisor: Senior University Lecturer Fabian Fagerholm
Email: [email protected]
Number of open positions: 1-4

We are recruiting summer interns to contribute to our research on developer experience. This project will involve building a web and mobile application for performing and presenting research on developer experience. We seek students with an interest in developer experience research, skills in web and/or mobile development, and knowledge of research methods in human-computer interaction and/or psychology.

Developer experience refers to the cognitive, motivational, and affective experience that software developers have while developing software. We have previously collected and analysed research related to cognition in software development and are now organising the material into web site form. We are also building tools for collecting experience data from software developers.

Your task could focus on web and mobile development, research instrument development, or a combination, depending on your skills and interests. You would work in a small team that plans, designs, and implements web and mobile components. The position can also be more research-oriented, in which case excellent academic writing skills and an understanding of HCI and/or psychology are required rather than the technical skills listed below. A research-oriented position can be combined with a master's thesis in the field.

Required skills:
- Familiarity with modern web development (e.g., HTML5, CSS, JavaScript, Python).
- Familiarity with mobile app development (e.g., Android, iOS).
- Familiarity with collaborative software development (e.g., Git, Continuous Integration).

Desired skills:

- Static and dynamic web development (Gatsby, React, Python).
- Cross-platform mobile development (React Native).
- Cloud deployment using GitHub Pages, Docker.
- Understanding of research instrument development (e.g., questionnaire design, qualitative and quantitative data analysis).
- IDE/editor plug-in development.
- Understanding of accessibility, security, and data protection in web and mobile development.

11.    Deep generative modeling for 1) precision medicine, 2) continuous-time dynamical systems, and 3) single-cell sequencing data

Supervisor: Associate Professor Harri Lähdesmäki
Email: [email protected]
Number of open positions: 3-6

Project-1: Deep generative modeling for precision medicine
We are looking for a summer intern to develop novel probabilistic machine learning methods for large-scale health datasets from biobanks and clinical trials. This project aims to develop novel deep generative modeling methods to (i) predict adverse drug effects using longitudinal/time-series data from large-scale biobanks and clinical trials, and (ii) harmonize large-scale health data sets for AI-assisted decision making to revolutionize future clinical trials. Methodologically, this project includes e.g. VAEs, GANs, Bayesian NNs, domain adaptation, Gaussian processes, and causal analysis. Experience or studies in (probabilistic) machine learning are expected. Tasks for the summer internship can be adapted to fit the student's skills. The work will be done in collaboration with research groups from the Finnish Center for Artificial Intelligence, and the novel methods will be tested using unique real-world data sets from our collaborators in university hospitals and a large pharmaceutical company. Work can be continued after the summer.

Our recent work:

Project-2: Deep generative modeling for continuous-time dynamical systems
Recent machine learning breakthroughs include black-box modeling methods for differential equations, such as Gaussian process ODEs [1] and neural ODEs. These methods are particularly useful in learning arbitrary continuous-time dynamics from data, either directly in the data space [1] or in a latent space in the case of very high-dimensional data [3]. We are looking for a summer intern to join our current efforts to (i) develop efficient yet calibrated Bayesian methods for learning such black-box ODE models, (ii) develop neural ODEs to learn arbitrary dynamics of high-dimensional systems (e.g. in robotics, biology, physics, or video applications) using a low-dimensional latent space representation, and (iii) further develop these methods for reinforcement learning and causal analysis. Experience or studies in (probabilistic) machine learning are expected. Tasks for the summer internship can be adapted to fit the student's skills. Work can be continued after the summer.

Our recent work:

Project-3: Deep generative modeling for single-cell sequencing data

Single-cell sequencing technologies provide functional genomics data at unprecedented resolution and can help reveal answers to various disease-related questions that could not be answered previously. We are looking for a summer intern to develop novel probabilistic machine learning methods for various tasks in single-cell biology, including e.g. (i) cell type identification, (ii) modeling spatial single-cell data, (iii) predicting immunotherapy treatment response, and (iv) analysing single-cell data from cross-sectional studies. Experience or studies in (probabilistic) machine learning as well as interest or studies in bioinformatics are expected. Tasks for the summer internship can be adapted to fit the student's skills. Work can be continued after the summer.

For more information:

12.    Massively Parallel Algorithms for Graph Problems

Supervisor: Assistant Professor Jara Uitto
Email: [email protected]
Number of open positions: 1

Parallel processing of data and distributed computing are gaining attention and becoming more and more vital as the data sets and networks we want to process are outgrowing the capacity of single processors. To understand the potential of modern parallel computing platforms, many mathematical models have emerged to study the theoretical foundations of parallel and distributed computing. In this project, we study algorithm design in these models with a particular focus on the Massively Parallel Computing (MPC) and Local Computation Algorithms (LCA) models.

The problems we study are often in (but not limited to) the domain of graphs, which serve as a very flexible representation of data. We are interested in, for example, the computational complexity of classic problems such as finding large independent sets, matchings, flows, and clusterings.

The applicant is assumed to have a solid knowledge of mathematics, knowledge of the basics of graph theory, and a good command of English. No prior knowledge of distributed computing is required, although it might be helpful.

13.    Summer employees and head assistants for the Ohjelmointi 1 (Programming 1) course

Supervisor: Senior University Lecturer Juha Sorva
Email: [email protected]
Number of open positions: 1-2

These positions require a reasonably fluent command of written and spoken Finnish.

The head assistant team of Ohjelmointi 1 (O1) will be partly renewed for autumn 2022; there is work on offer for at least one new team member. The minimum requirements are a good command of the course topics, a sense of responsibility, and the motivation to invest in high-quality teaching and to learn new things as needed.

In summer 2022, the head assistants take part in developing the course either full-time or part-time. The work continues into the autumn term, when you should reserve 30–50% of your week for it and the duties emphasise teaching and course arrangements. Initially we offer an employment contract until the end of 2022, but we are happy to extend the positions beyond that. On a daily and weekly level, the working hours are very flexible.

The development tasks may include programming, configuring automatic assessment and other online teaching tools, and/or producing learning material (e.g., tool videos). They can be at least partly adapted to each head assistant's own wishes and skills. The job offers a chance to learn new skills and technologies, and thesis topics can also be sought if that interests you.

The O1 course is introduced in the first chapter of the learning material:
Further information is available from the teachers in charge of the course.

14.    Foundations of distributed and parallel computing

Supervisor: Associate Professor Jukka Suomela
Email: [email protected]
Number of open positions: 1-3

The modern world relies on huge computer networks and large-scale computing clusters, and our research group studies the theoretical foundations of such systems. We seek to understand the fundamental limits of what can be solved efficiently in very large networks or with massively parallel computers.

We have got plenty of exciting summer internship opportunities for students with different kinds of backgrounds and interests. Some of our work is similar to what people typically do in mathematics: proving theorems with pen and paper. However, some of us are also making heavy use of computers in our work: we write computer programs that discover algorithms and prove theorems for us. So we have got something exciting to do both for those students who like to do computer programming and for those who work better without touching computers.

We are looking for students who enjoy thinking about mathematical puzzles and who have got good problem-solving skills. We expect you to have some familiarity with algorithm design and analysis, theoretical computer science, and discrete mathematics. Knowing something about graph theory and distributed or parallel computing will be helpful but not necessary.

15.    Creating exercises and their automatic graders for the course CS-A1111 Ohjelmoinnin peruskurssi Y1 (Basic Course in Programming Y1)

Supervisor: Senior University Lecturer Kerttu Pollari-Malmi
Email: [email protected]
Number of open positions: 2

These positions require a fluent command of written and spoken Finnish.

The task is to create, together with another summer employee, new exercises for the autumn course CS-A1111 Ohjelmoinnin peruskurssi Y1, as well as automatic graders for these exercises. In addition, the summer employee acts as an extra assistant in some of the exercise groups of the Y1 summer course. The job requires programming skills in Python and a good ability to come up with ideas. Because the language of the course is Finnish, a good command of Finnish is also essential.

16.    3D visualisation of and pattern recognition from large-scale data from multi-physics simulations

Supervisor: Associate Professor Maarit Käpylä
Email: [email protected]
Number of open positions: 2

Large-scale simulations of, for example, magnetised fluids in stellar interiors produce huge amounts of three-dimensional data, where each system state can comprise hundreds of gigabytes or even terabytes. Analysis, visualisation, and even storage of such data are challenging, and special tools are required.

From the visualisation perspective, we are looking for a summer intern who could further develop our existing Python framework, with which we create 3D visualisations from the simulation data.
The task of summer intern 1 is to enhance the existing toolbox by adding parallel processing capabilities, to better handle multiple snapshots of large datasets for animation. Prerequisites: Good knowledge of Python and of managing Jupyter notebooks. Some knowledge of supercomputing environments is a bonus.

From the analysis and storage perspective, we need to develop tools that are capable of recognising sub-regions of interest and of analysing and outputting data only from these regions, since storing the full system states will no longer be possible in the future. The long-term aim of the project is to develop an online or offline structure-detector assistant for the large-scale simulation toolbox. The tasks of summer intern 2 include: continuing to develop an existing code based on the FasterRCNN object detection model (the code also includes a data augmentation pipeline, which is necessary for increasing the training data size and diversity); generating training data for the neural network using idealised simulation setups; using the generated training data to train the deep learning network; and applying the trained network to detect the predefined structures and track their evolution in time in the real simulation data. Prerequisites: Basic knowledge of ML is required, and familiarity with toolboxes like PyTorch or TensorFlow is an extra benefit.

17.    Modern ubiquitous applications: from devices to the cloud

Supervisor: Associate Professor Mario Di Francesco
Email: [email protected]
Number of open positions: 1-2

Modern applications that are ubiquitous (namely, everywhere) rely on two key components. The first is embedded devices such as mobile phones, wearables, and smart objects in the Internet of Things, which interact with users and collect information from the physical environment. The second is a cloud or edge computing infrastructure that supports different types of applications requiring a substantial amount of processing and storage, such as those involving machine learning. The major challenges in realizing such applications include efficient resource utilization at both the devices and the supporting infrastructure, reliability, and user friendliness. The goal of this project is to investigate some of these aspects in the context of the research carried out in our research group. See also for additional details.

Required skills: experience with Android application development or embedded systems programming, solid understanding of data analysis and/or machine learning techniques.

Desired skills: some knowledge of human-computer interaction and/or computer vision, familiarity with cloud and web technologies.

18.    Reconstructing Crisis Narratives: Computational Social Media, Visualization and Platform Design

Supervisor: Professor of Practice Nitin Sawhney
More information about the position: Henna Paakki
Email: [email protected]
Number of open positions: 2-3

This research project, conducted jointly by Aalto University and THL, analyzes and reconstructs crisis narratives using mixed methods, combining qualitative research for narrative inquiry with computational data analytics of crisis discourses in news and social media.

We are seeking two research interns to work with our Computational Social Science team on social media analytics and the Design Research team on devising a web-based platform for visualizing this data in an interactive manner.

Applicants should have experience with either machine learning, NLP, and social media analytics OR data visualisation, JavaScript, and web-based programming for rapid prototyping and design.

19.    Civic Agency in AI? Democratizing Algorithmic Services in the City (CAAI)

Supervisor: Professor of Practice Nitin Sawhney
More information about the position: Karolina Drobotowicz
Email: [email protected]
Number of open positions: 2

Algorithmic tools are increasingly being incorporated into public sector services in cities today. The CAAI project aims to understand citizens’ algorithmic literacy, agency and participation in the design and development of AI services in the Finnish public sector in order to advance more democratic and citizen-centric digital infrastructures.

This new project has the following research objectives: 1) understanding the values, narratives and discourses embedded in public sector data-centric and algorithmic services, 2) understanding citizens’ level of literacy and perceived agency with regard to algorithmic public services, 3) empowering citizens to critically engage with algorithmic public services, and 4) transforming the design of public sector AI services to ensure civic participation.

Applicants must show a keen interest in this topic and bring a mix of technical and soft skills in at least one of these aspects: programming and rapid prototyping of web-based platforms, using NLP and textual data processing for analysing content and data visualization, and/or conducting interviews and qualitative research with potential participants as part of our team.

20.    Research assistants and a head assistant position in Programming Studio 1 (Ohjelmointistudio 1)

Supervisor: University Lecturer Otto Seppälä
Email: [email protected]
Number of open positions: 2

Research assistant

In this job you would work in co-operation with a PhD student researching debugging conducted by novice programmers. The work could involve designing exercises, study material, data collection, etc. The exact work items will be informed by the ongoing research in spring 2022 and will lead to a research setup in fall 2022. The candidate should preferably be skilled in Scala, as it is the programming language used in the introductory courses that will employ the material.

Head assistant / exercise developer

(The head assistant position requires a reasonably fluent command of written and spoken Finnish, as the instruction is given in Finnish.)

The content of the Ohjelmointistudio 1 course is to be updated to cover more program and algorithm design than before. At the same time, the Scala programming language used in the introductory programming courses is being updated to version 3. New types of exercises are also to be introduced to the course. For the summer, we are looking for an employee for this development work who could, if possible, act as the head assistant of the course the following autumn, with a workload to be agreed on later.

21.    Machine Learning for Health (ML4H)

Supervisor: Assistant Professor Pekka Marttinen
Email: [email protected]
Number of open positions: 1

Recent years have witnessed the accumulation of massive amounts of health data, enabling researchers to address a range of questions such as: how to allocate healthcare resources fairly and efficiently, how to provide personalized guidance and treatment to users based on real-time data from wearable devices, or how to use genomic data to understand disease or antibiotic resistance. Central challenges in ML4H include integrating noisy data from heterogeneous data sources, going beyond correlation to learn about causality, interpreting the models, and assessing the uncertainty of predictions, to name a few. We tackle these by developing models and algorithms which leverage modern machine learning principles: Bayesian neural networks, deep latent variable models, interactive machine learning, attention, reinforcement learning, and natural language processing. Our ongoing interdisciplinary projects include: analysis of nationwide healthcare register data, mobile health, genomics, antibiotic resistance, and epidemiology. Successful applicants are expected to have outstanding skills in machine learning, statistics, applied mathematics, or a related field. The focus of the position may be tailored based on the applicant's background and interests to either methodological or interdisciplinary research questions, and examples of both kinds of our recent research can be found in

22.    Developing the vHelix DNA nanostructure design platform

Supervisor: Professor Pekka Orponen
Email: [email protected]
Number of open positions: 1-2

The area of DNA nanotechnology [1] employs DNA as generic building material for assembling nanoscale objects with dimensions on the order of 10-100 nanometres. Our group has been developing, in collaboration with a biochemistry team from Karolinska Institutet, a general-purpose design platform "vHelix" for producing in particular 3D wireframe designs folded from a single long DNA strand [2].

A new, user-friendly and extendible version of the vHelix platform has been developed as summer internship projects in 2020 and 2021, and recently piloted in our DNA Nanotechnology course. After the 2015 publication of the DNA strand-routing algorithm [3] implemented in the current vHelix version, several alternative methods have emerged, and the goal of the present project is to implement some of these more recent algorithms as plugins to the new extendible vHelix version, and also to extend the vHelix design support from DNA to RNA nanostructures.

The project requires familiarity with basic algorithm design techniques, facility with combinatorial thinking, and good programming skills. Previous knowledge of biomolecules is not necessary, although it is an asset. For further information, please see the research group webpage at

[3] Benson et al., Nature 2015,

23.    Empowering full-body interaction in VR

Supervisor: Associate Professor Perttu Hämäläinen
Email: [email protected]
Number of open positions: 1-2

The hired student will develop novel VR interaction prototype(s), specifics to be decided based on the hired person's interests and skills. We are particularly interested in discovering interaction techniques that make users/players feel empowered and capable in movement, e.g., in social dancing or martial arts video games. The hired student is expected to be able to demonstrate previous experience in VR development with the Unity 3D game engine. For examples of previous work that one might build on, see:

24.    Deep Learning for Extreme Scale Classification

Supervisor: Assistant Professor Rohit Babbar
Email: [email protected]
Number of open positions: 1

Large output spaces with hundreds of thousands of labels are common in machine learning problems such as ranking, recommendation systems and next-word prediction. Apart from the computational problem of scalability, data scarcity for individual labels poses a statistical challenge, especially for data-hungry deep methods. The goal of the project is to investigate and design deep learning based architectures that simultaneously address the computational and statistical challenges of learning with large output spaces. As the target domain is textual data, the project also involves exploring recent advances in NLP, such as BERT and Transformer-XL, to find common ground for further research in this area.

[1] LightXML: Transformer with Dynamic Negative Sampling for High-Performance Extreme Multi-label Text Classification, AAAI 2021.
[2] Embedding Convolutions for Short Text Extreme Classification with Millions of Labels
[3] SiameseXML: Siamese networks meet extreme classifiers with 100M labels, ICML 2021

25.    Bayesian NN

Supervisor: Professor Samuel Kaski
Email: [email protected]
Number of open positions: altogether 8 in the group

Bayesian deep learning tackles the important yet challenging task of combining modern deep learning approaches with principled Bayesian probabilistic modelling techniques. In this summer project, you will join the Probabilistic Machine Learning group to explore this intersection. Possible questions of interest are how to scale Bayesian neural nets up to be competitive with current deterministic neural networks while ensuring high-quality posterior predictive distributions; how to design functional priors that allow for more principled incorporation of prior knowledge to guide the training process; or how to incorporate them as building blocks into larger pipelines in areas such as active learning or reinforcement learning.
You will learn, e.g., how to implement and train modern neural networks, work with probabilistic programming, and read and contribute to state-of-the-art research.
Students with a background in machine learning and computer science, as well as an interest in learning and tackling interesting problems, are encouraged to apply.

26.    Probabilistic data imputation

Supervisor: Professor Samuel Kaski
Email: [email protected]
Number of open positions: altogether 8 in the group

Real-world data is often messy, with many missing entries. This makes automatic data imputation highly relevant to real-world applications, whether in designing novel materials or finding drugs for new targets. Very simple shallow neural network models can give impressive practical results. In this project you would work within the Probabilistic Machine Learning group to investigate modern architectures such as VAEs, together with Bayesian principles, and improve their performance on real-world datasets.

Related overleaf with more details:

27.    Privacy-preserving synthetic data twins

Supervisor: Professor Samuel Kaski
Email: [email protected]
Number of open positions: altogether 8 in the group

In recent years, differential privacy has been established as the prevailing privacy notion. In practice, differential privacy is often achieved by introducing noise into a learning algorithm in order to mask out any individual contribution. The amount of noise added affects the privacy guarantees, and needs to be carefully calibrated to introduce as little bias as possible while maintaining the strongest possible theoretical guarantees.
Our recent focus has been on developing methods for differentially private release of synthetic twins of sensitive data sets. Our goal for the synthetic twins is to retain the key statistical properties of the data while providing a strong guarantee of anonymity. We are looking for a summer intern to further improve these techniques, and to develop tools for assessing the biases in the synthetic data introduced by the private learning algorithm. Ideally, these results will be integrated into our software package for creating synthetic twins at
A suitable candidate has a strong background in math, especially in probability, and in programming (Python preferred). Join our quest for developing machine learning with strong privacy guarantees!
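
To illustrate what calibrating noise to a privacy guarantee can look like, here is a minimal sketch of the textbook Laplace mechanism applied to a bounded mean. It is not the method used in the group's software; the function name `dp_mean` and its parameters are hypothetical, and the sketch assumes that the number of records is public and that neighbouring data sets differ by replacing one record.

```python
import random

def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release the mean of `values` with epsilon-differential privacy.

    Illustrative sketch only: assumes the record count is public and that
    neighbouring data sets differ in exactly one record.
    """
    rng = rng or random.Random()
    # Clip every record so that one individual's influence is bounded.
    clipped = [min(max(v, lower), upper) for v in values]
    n = len(clipped)
    # Replacing one record changes the clipped mean by at most this much.
    sensitivity = (upper - lower) / n
    # Laplace scale: smaller epsilon (stronger privacy) means more noise.
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) draws is a standard Laplace draw.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return sum(clipped) / n + noise
```

A call such as `dp_mean(ages, 0, 100, epsilon=1.0)` returns a noisy mean; lowering `epsilon` strengthens the guarantee at the price of a noisier answer, which is the trade-off described above.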

28.    Likelihood-free inference under model misspecification

Supervisor: Professor Samuel Kaski
Email: [email protected]
Number of open positions: altogether 8 in the group

Likelihood-free inference (LFI) methods are used to fit complex, simulator-based models with intractable likelihood functions to data. Such models appear in various fields of science and engineering, such as population genetics, radio propagation, and cosmology. However, LFI methods become unreliable when the true data-generating process differs from what can be simulated from the model, i.e., when the model is misspecified. In this project, you will analyse the issues faced by different LFI methods under varying levels of model misspecification, and come up with solutions to remedy them. You will join the Aalto Probabilistic Machine Learning group to develop new LFI methods that are robust to model misspecification.
Students with a strong background in mathematics and statistics are especially encouraged to apply.

29.    Geometric deep learning for fast molecular simulation

Supervisor: Professor Samuel Kaski
Email: [email protected]
Number of open positions: altogether 8 in the group

Accelerating molecular simulations has the potential to revolutionize numerous applications in computational biochemistry and life sciences (e.g., drug design). Recently, geometric deep learning models (e.g., graph neural networks) have shown promising results for predicting molecular interactions, outperforming methods based on fixed molecular kernels, and providing significant speedups over methods from quantum chemistry. This project aims to further advance geometric deep learning for fast prediction of interatomic forces and the energy of molecules. Importantly, models of interest must satisfy the relevant symmetries that are known a priori for molecular data. To join us in this project, we are looking for students with good programming skills (PyTorch/TensorFlow) and a background in machine learning. Knowledge of physics, chemistry, or bioinformatics is considered a plus.

30.    Humane active learning

Supervisor: Professor Samuel Kaski
Email: [email protected]
Number of open positions: altogether 8 in the group

Active learning aims to make the best use of human effort in labelling datasets by sequentially asking the user for new labels at the input points that are most informative about the function we want to learn. Whilst a query might be optimal from an information-theoretic point of view, asking for seemingly "random" points can be costly for us humans, whether due to context-switching between very dissimilar areas of the feature space or due to the physical cost of moving a sensor around. In this project we are exploring new ways of taking human needs into account more broadly, by giving the machine a model of the user and the user's requirements and preferences.
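
The trade-off described above can be sketched with a simple query rule that subtracts a context-switch penalty from an uncertainty score. This is only an illustration under assumed choices, not the group's method: the names `next_query`, `predict_proba` and `switch_cost` are hypothetical, informativeness is approximated by the entropy of a binary prediction, and the human cost by the Euclidean distance to the previous query.

```python
import math

def entropy(p):
    """Binary entropy (in nats) of a predicted positive-class probability."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def next_query(candidates, predict_proba, last_query=None, switch_cost=0.0):
    """Pick the candidate with the best informativeness-minus-cost score."""
    def score(x):
        # Uncertainty sampling: the closer the prediction is to 0.5, the higher the score.
        info = entropy(predict_proba(x))
        # Hypothetical human cost: distance from the previously queried point.
        penalty = 0.0 if last_query is None else switch_cost * math.dist(x, last_query)
        return info - penalty
    return max(candidates, key=score)
```

With `switch_cost=0.0` this reduces to plain uncertainty sampling; increasing it keeps consecutive queries close to each other at the expense of informativeness.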

31.    To help users AI must understand them

Supervisor: Professor Samuel Kaski
Email: [email protected]
Number of open positions: altogether 8 in the group

Most machine learning systems operate with us humans, to augment our skills and assist us in our tasks. In environments containing human users, or, more generally, intelligent agents with specific goals and plans, the system must develop a good understanding of the other agents if it wants to help them efficiently. This includes, in particular, assessing tacit and changing goals, eliciting the knowledge of the agent and understanding how the agents interpret the actions of the AI.

In the Probabilistic Machine Learning group, we develop the probabilistic interactive user models and inference techniques needed to understand other agents and to assist them more efficiently. This research sits at the crossroads of several cutting-edge domains: reinforcement learning, multi-agent learning, inverse reinforcement learning and knowledge elicitation. As an intern, you will contribute to this ambitious goal by developing advanced user models and applying them in various applications.

32.    Improving Interpretability with probabilistic Generalized Additive Models

Supervisor: Professor Samuel Kaski
Email: [email protected]
Number of open positions: altogether 8 in the group

Generalized Additive Models (GAMs) are a popular choice for building "glass-box" models that are interpretable, which is crucial in high-stakes decisions such as healthcare [1]. This is an active topic of research, with recent comparisons of accuracy vs trustworthiness of different approaches to building GAMs [2] as well as visualisation tools that allow domain experts to "fix" the model [3]. Yet much of this line of work does not actually consider the uncertainty in our predictions. Gaussian processes (GPs) are a natural choice for modelling uncertainty over functional relationships, and scale well in additive models [4]. "ANOVA" decomposition to identify the different contributions of both features and latent variables has been studied with GPs [5] and VAEs [6]. In this project, you would connect ideas from these two branches and investigate how we could build interpretable GAMs that capture uncertainty and allow for easy visualisation of the model.
[1] Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead (Rudin 2019)
[2] How Interpretable and Trustworthy are GAMs? (Chang et al. 2021)
[3] GAM Changer: Editing Generalized Additive Models with Interactive Visualization (Wang et al. 2021)
[4] Additive Gaussian Processes (Duvenaud et al. 2011)
[5] Decomposing feature-level variation with Covariate Gaussian Process Latent Variable Models (Märtens et al. 2019)
[6] Neural Decomposition: Functional ANOVA with Variational Autoencoders (Märtens et al. 2020)

33.    Hyperbolic planar graphs and graphs in the hyperbolic plane

Supervisor: Assistant Professor Sándor Kisfaludi-Bak
Email: kbsandor @
Number of open positions: 1

There is a rising demand for efficient processing of structured data in hyperbolic spaces, but many of the basic tools available in Euclidean space have not yet been honed in non-Euclidean settings. The goal of this project is to do some early investigations with two different approaches: the combinatorial approach (using planar graphs of constant Gromov hyperbolicity) and the geometric approach (using geometric graphs in the hyperbolic plane). The project would compare the algorithmic properties of such graphs to each other and to their Euclidean counterparts.

The applicant should have good mathematical maturity and a background in analysing algorithms and their complexity. Familiarity with computational geometry and hyperbolic geometry is not required.

34.    New generation of learners for tensor coded, nonlinear systems applied in discovering drug interactions.

Supervisor: Research Fellow Sandor Szedmak
Email: [email protected]
Number of open positions: 1

One of the most challenging problems in contemporary machine learning is capturing the structure of complex, nonlinear systems. In drug discovery such systems are very common, for example when interactions between different drugs and cell lines in cancer therapies need to be discovered. There are well-known methods that address this type of problem, e.g. deep neural networks and kernel methods, but they have their limits. Deep neural networks require very large data sets to train, and their results are hard to interpret. For kernel methods, processing large data sets is the real challenge. Both approaches can also require complicated, time-consuming parameter tuning.

Learning via polynomial functions could be a good candidate for overcoming those limitations. Polynomials are very general, precisely defined, and can be computed with high efficiency. In practical computation, continuous nonlinear functions are in essence represented by polynomials as well. Even functions realized by deep neural networks can be accurately approximated by polynomials.

In the last decades, strong and versatile theories have been built around them, with possible applications in several areas. For example, algebraic geometry and algebraic statistics can provide foundations and algorithms that help us better exploit the power of these nicely behaving mathematical objects.

The task is to explore learning methods built on polynomial functions. In the last couple of years several related methods have been introduced, e.g. factorization machines, polynomial networks, and latent tensor reconstruction, so this exploration could yield a useful synthesis of potential methods.
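
As a concrete instance of a polynomial learner, the sketch below evaluates a second-order factorization machine, one of the methods named above. It shows only the prediction step, with hypothetical parameter names (`w0`, `w`, `V`), and uses the standard identity that turns the sum over all feature pairs into a computation linear in the number of features.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    Computes w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i] * x[j],
    where V[i] is the latent factor vector of feature i.
    """
    n = len(x)
    k = len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        # Per latent dimension: 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise
```

The model is a degree-2 polynomial in `x` whose pairwise coefficients are constrained to a low-rank form, which keeps the number of parameters linear in the number of features.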

Prerequisites: knowledge of linear algebra and machine learning, e.g. deep learning and kernels. Some basic knowledge of the algebra of polynomials is advantageous.

35.    Basics of Programming Y1 and Y2 exercise developer

Supervisor: University Lecturer Sanna Suoranta
Email: [email protected]
Number of open positions: 2

We offer two positions with two separate tasks (explained below). The hired interns would, however, work closely together.
The first task is to develop new exam tasks for the course exam of CS-A1111 Basics of Programming Y1. The course exam uses both Exam Studio and A+; the exam tasks are in A+. The work also includes finding out how information can be moved between Exam Studio and A+. This position requires skills in Python, especially unit testing, and in the Finnish (and Swedish) and English languages.
The second task is to develop assignments and materials for the course CS-A1121 Basics of Programming Y2. The course is about object-oriented programming with Python and is part of the CS minor. The work includes both tuning the exercise assignments and their graders and creating new programming project topics for the course.
Necessary skills:
CS-A1121/3 Basics of Programming Y2 and good knowledge of unit testing

36.    UI programmer for a secure system

Supervisor: University Lecturer Sanna Suoranta
Email: [email protected]
Number of open positions: 1

The task of this summer intern is to program working user interface prototypes for a research project in which the usability of secure software is investigated with psychometric methods, e.g. using an eye-tracking system. The main emphasis is on programming: the intern has to know both how to implement a user interface from scratch with Python and how to build the frontend and backend of a web service.

Necessary skills:
CS-E5220 Interface Construction
CS-C3130 Information Security
CS-C3170 Web Software Development
Good to know:
CS-E4350 Security Engineering

37.    Developing novel symmetry-learning algorithms for out-of-distribution generalization

Supervisor: Assistant Professor Stéphane Deny
Email: [email protected]
Number of open positions: 2

Prerequisites: some experience with a deep learning library such as PyTorch or TensorFlow, and some interest in the topic.

38.    Software developer

Supervisor: Senior University Lecturer Vesa Hirvisalo
Email: [email protected]
Number of open positions: 1

We are looking for a summer worker who can participate in our software development during summer 2022. We develop software for both our research and our teaching. In our research, we apply machine learning methods to industrial IoT systems, and in our teaching, we apply automatic grading of programming exercises. Good programming skills are required, and an understanding of machine learning and of systems in general is also useful.

39.    Deep Representation Learning – Foundations and New Directions

Supervisor: Assistant Professor Vikas Garg
Email: [email protected]
Number of open positions: 1-2

Applications are invited for an internship in deep representation learning, broadly construed. Topics of particular interest include:

(1) Generative Models
(2) Graph Neural Networks
(3) Neural ODEs/PDEs/SDEs, Deep Equilibrium Models, Implicit Models
(4) Differential Geometry/Information Geometry/Algebraic Methods for Deep Learning
(5) Learning under limited data, distributional shift, and/or uncertainty
(6) Bayesian Methods, Probabilistic Graphical Models, & Approximate Inference
(7) Fair, diverse, and interpretable representations
(8) Off-policy reinforcement learning, inverse reinforcement learning, and causal reinforcement learning
(9) Multiagent systems and AI-assisted human-guided models
(10) Learning on the edge (i.e., learning under resource constraints)
(11) Applications in physics, computer vision, drug discovery, material design, synthetic biology, quantum chemistry,
(12) Quantum Machine Learning for structured spaces

Representative publications:
(1) John Ingraham, Vikas Garg, Regina Barzilay, and Tommi Jaakkola. Generative Models for Protein Design. NeurIPS
(2) Vikas Garg, Stefanie Jegelka, and Tommi Jaakkola. Generalization and Representational Limits of Graph Neural Networks. ICML (2020).
(3) Vikas Garg and Tommi Jaakkola. Solving graph compression via Optimal Transport. NeurIPS (2019).
(4) Vikas Garg, Lin Xiao, and Ofer Dekel. Learning small predictors. NeurIPS (2018).
(5) Vikas Garg, Cynthia Rudin, and Tommi Jaakkola. CRAFT: Cluster-specific assorted feature selection. AISTATS (2016).
(6) Vikas Garg, Adam Kalai, Katrina Ligett, and Steven Wu. Probably approximately correct domain generalization. AISTATS (2021).
(7) Vikas Garg and Tommi Jaakkola. Predicting Deliberative Outcomes. ICML (2020).

An ideal student would have a strong mathematical/theoretical/statistical/algorithmic background and be comfortable programming in a deep learning library (e.g., PyTorch).
