MLCS: Mikko Tolonen "Data Science and Computational History: Opportunities for Collaboration"
Data science and computational history: opportunities for collaboration
Professor of Digital Humanities
University of Helsinki
Helsinki Computational History Group (COMHIS) is a multidisciplinary team that studies intellectual history (http://helsinki.fi/computational-history). The group's work draws on methods from a range of fields, from modern data science and machine learning to history and linguistics. “Computational history” implies the use of mixed methods in which a big-data approach is combined with expert subject knowledge in intellectual history and book history. Lately we have been focusing on an integrated study of large historical metadata and full-text sources (especially British eighteenth-century literature and various historical newspaper resources). On the methods side, we have been developing contextual tools for bibliographic sources to study influence and networks; text-reuse detection to study intertextuality (using BLAST to cope with OCR errors); explorations of the materiality of printed items based on information derived from layout, font, etc.; stylometry to address particular questions of authorship; and word embeddings and other text-mining methods to study conceptual change.
A bottleneck in computational history is the preprocessing and harmonization of data, and we take care of that. What we are looking for from a data science audience are novel methodological ideas for modelling our historical data. For example, we have ideas about how to use computer vision for some aspects of our work, but we need a trained data scientist with a computer vision background to collaborate with us on this front. This talk will outline these aspects of our research in order to explore opportunities for collaboration.
Please spread the news and join us for our weekly habit of beginning the week with an interesting machine learning talk!