New semantic portal gives meaning to parliamentary data
If you wanted to read all 960,000 speeches given in the Finnish parliament since 1907, it would take seven years and four months as a full-time job. You’d also have to piece together the digitised yet unstructured documents from online sources in different formats.
What if all the data were available at your fingertips via a smart web service?
Meet ParliamentSampo, a semantic digital service which has pooled all the speeches and parliamentary networks under a digestible interface. The portal’s name stems from Finnish mythology, where a magical device – Sampo – was bound to bring good fortune to its guardian. In the same way, ParliamentSampo is intended to bear fruit for the wardens of democracy.
‘I think the key contribution of our portal is making these political speeches and networks accessible to the public and providing tools with which researchers and journalists can study the use of parliamentary power in Finland,’ says Eero Hyvönen, professor at Aalto University and director of the Helsinki Centre for Digital Humanities (HELDIG) at the University of Helsinki.
The portal integrates data from several providers, including ministries and parliament, organizing it and enriching it semantically. It provides tools for researchers, journalists and developers and is an important component in the expanding national semantic web.
‘The traditional web works with links between webpages,’ explains Hyvönen. ‘The Semantic Web works by linking data between webpages. This data has been embedded into a semantic knowledge graph so that a machine can understand the content better and is able to offer more accurate information.’
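To make the idea of a knowledge graph concrete, here is a minimal sketch in Python. It assumes a toy graph stored as subject-predicate-object triples. Every identifier in it (speech:1, spokenBy, party:X and so on) is hypothetical and chosen for illustration only; it does not reflect ParliamentSampo's actual data model or interface.

```python
# A toy knowledge graph as (subject, predicate, object) triples.
# All identifiers are hypothetical, invented for this illustration.
triples = [
    ("speech:1", "spokenBy", "person:A"),
    ("speech:2", "spokenBy", "person:A"),
    ("speech:3", "spokenBy", "person:B"),
    ("person:A", "memberOf", "party:X"),
    ("person:B", "memberOf", "party:Y"),
    ("speech:1", "mentions", "topic:environment"),
    ("speech:3", "mentions", "topic:environment"),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def parties_mentioning(topic):
    """Follow links speech -> speaker -> party for speeches on a topic."""
    parties = set()
    for speech in subjects("mentions", topic):
        for speaker in objects(speech, "spokenBy"):
            parties.update(objects(speaker, "memberOf"))
    return sorted(parties)


if __name__ == "__main__":
    print(parties_mentioning("topic:environment"))
```

Because each fact is an explicit link, a question such as "which parties spoke about the environment?" becomes a matter of following connections rather than searching text.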
Making the data talk
The portal will be made available to the general public in late 2022, but it’s already helping a handful of researchers make the data talk, diving into discourse, concepts and power.
‘This is unique in the Finnish context because for the first time we have a machine-readable database of all parliamentary speeches,’ says Kimmo Elo, senior researcher at the Centre for Parliamentary Studies at the University of Turku. ‘Parliamentary speeches are especially important because they’re contestations for political alternatives.’
‘If we want to drive progress in computational social sciences and humanities, we need data that is in sync with our methods,’ Elo continues. So far, digital documents have been converted into machine-readable formats on an ad hoc basis by individual researchers.
The new portal broadens the available data, and the semantic approach also offers the potential to challenge and complement existing research. ‘It can inform the way we design our research. Having an organized view of the entirety of parliamentary data allows us to look for historical trends and cycles of conceptual and rhetorical evolution, and to discover potential blind spots that manual analyses had missed,’ says Elo.
Several research papers that take advantage of ParliamentSampo have already been published. For example, Elo and Jenni Karimäki used mixed methods to analyse how environmental policy terminology has changed over the past 60 years in the Finnish Parliament. The findings suggest an increase in the volume and share of environmental policy content, as well as an evolution from a more confined, nature-centred rhetoric towards more holistic, climate-change-centred terminology.
The research process demonstrated how computer science can help political scientists. ‘The primary contribution of computational methods is not to generate new information but to augment our existing research techniques,’ explains Elo. ‘In this article, we analysed the data and found clear trends with the chosen terms and concepts, while Jenni – who has a deep understanding of the environmental issues – brought qualitative research techniques to make inferences from the data.’
In addition to magnifying history, the project will stay connected to current data by automatically incorporating ongoing discussions into its database. Journalists at Helsingin Sanomat were the first to put the portal to journalistic use, in their live journalism show Black Box.
‘I used the portal’s data in my talk to illuminate the meaning of the plenary session and introduce the most vocal members of parliament,’ explains data journalist Sonia Zaki from Helsingin Sanomat. ‘This list couldn’t have been compiled before due to the sheer amount of manual work it would have entailed. I think it’s a valuable tool – especially for political reporters – with which to follow what politicians say in Parliament, and how.’
Hyvönen’s team also hopes to inspire developers to engage with the open-source tool and find novel ways of using parliamentary data.
Not just semantics
Hyvönen and his colleagues in the Semantic Computing Research Group have been developing the Finnish semantic web for the past two decades under the umbrella of the Sampo model. A series of data services and portals based on the Sampo model link together, and give semantic meaning to, archaeological findings, Finnish war history, biographies, and documents from numerous other fields. Under the hood runs the semantic engine, or the Finnish Ontology and Data Infrastructure, which is a kind of source code for the contents, contexts, and connections of the Finnish Semantic Web.
‘Our work represents a paradigm shift in publishing data in humanities research. From printed texts to online systems, we’re at the third phase where linked data and tools for research are realized – as we have done here,’ says Hyvönen.
He compares the development of a semantic web infrastructure to the impact of physical infrastructure mega-projects. Yet, Hyvönen’s vision for the Sampo model stretches even further.
‘The fourth phase could be a web where the systems are not only tools, but intelligent agents that help humans in finding and solving research problems.’