Public defence in Computer Science, M.Sc. (Tech) Minna Tamper
Title of the doctoral thesis: From Text to Knowledge: Methods, Tools, and Applications for Digital Humanities Based on Linked Data
Opponent: Professor Veronika Laippala, University of Turku, Finland
Custos: Professor Eero Hyvönen, Aalto University School of Science, Department of Computer Science
The thesis is publicly displayed 10 days before the defence in the publication archive Aaltodoc of Aalto University.
Public defence announcement:
The software presented in this dissertation can be used to transform digitized text document collections into linked data that improves the collections’ utilization in research. The linked data created from a collection describes the collection, its documents, their properties (author, name), and their content (themes, actors). Shared and controlled vocabularies are used to describe the properties and content of the data. Because of this, the collection’s information can be linked to other collections that use the same vocabularies. This results in a network of linked data that can be utilized to search, study, and analyze information.
This dissertation studies the use of natural language processing methods and linked data to transform text document collections into data and enrich it. The research is conducted by designing, implementing, and evaluating proof-of-concept systems, tools, and data. The applied natural language processing methods include the extraction of actors and keywords from texts; these are then linked to external controlled vocabularies and to the linked dataset generated from the document collection.
Based on the findings of this dissertation, natural language processing methods coupled with linked data technologies provide a good infrastructure for researching and analyzing text document collections. The work has created new data models, tools, and methods for transforming text collections into linked data and enriching them. With the help of the data, it is possible to study the actors and themes of the texts from new points of view that can aid in grasping the texts and collections as a whole. The data model depicting the text document collection’s properties and content, such as keywords and person references, creates a foundation for intelligent services, e.g., network and linguistic analyses. At the same time, it enables analytics that can be used as a basis for critical consideration of the use cases for the collections. In addition, the data can be used to build search and other types of applications that ease browsing and searching of materials and improve the general user experience.