Color blots reveal phenomena and connections in data
11.11.2011
Visualization makes it possible to understand interesting phenomena and connections in data that cannot be specified in advance. For example, genetic data collected from individuals can show how people from different parts of Finland group together genetically, says Eli Parviainen, M.Sc. (Technology), who is defending his thesis in the School of Science.
In practice, visualization means changing data into color blots. However, multidimensional data cannot be visualized as such; it first has to be changed into a simpler low-dimensional format.
Parviainen’s dissertation in the field of computational science studies dimension reduction as a way of visualizing data. The work comprises a group of studies on current topics in dimension reduction and neural network research. Parviainen presents a new method that speeds up the computation of dimension reduction by omitting some of the comparisons between data points and taking the neighbor relationships of the data points into account. This method would facilitate the use of dimension reduction in different applications.
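To illustrate the general idea of restricting comparisons to neighboring points, the following minimal Python sketch counts how many pairwise comparisons remain when each point is paired only with its k nearest neighbors. It is not the method from the dissertation; the function names, the toy points, and the choice k = 2 are assumptions made for illustration.

```python
import math


def k_nearest_neighbors(points, k):
    """For each point index, return the indices of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute-force distances from point i to every other point (for clarity only).
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def neighbor_pairs(points, k):
    """Unordered pairs (i, j) where one point is among the k nearest neighbors of the other."""
    pairs = set()
    for i, nbrs in enumerate(k_nearest_neighbors(points, k)):
        for j in nbrs:
            pairs.add((min(i, j), max(i, j)))
    return pairs


# Made-up data: two well-separated clusters of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

all_pairs = len(points) * (len(points) - 1) // 2  # every point compared with every other
kept_pairs = neighbor_pairs(points, k=2)          # only neighbor comparisons kept
print(all_pairs, len(kept_pairs))
```

On this toy data only the within-cluster pairs survive. The brute-force neighbor search above still measures every distance once; any saving would come from reusing the shorter pair list in computations that are repeated many times afterwards.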
According to Parviainen, modern data can have tens of thousands of variables. Dimension reduction is used to find more detailed features in multidimensional data. Multidimensional data refers to data in which observations comprise many parts – for example, a picture is made up of many pixels, every one of which forms a single dimension.
- Data visualization requires a low-dimensional presentation because paper is two-dimensional and space is three-dimensional. Thus, high-dimensional space has to be changed into a format that can be drawn on paper or a computer screen to enable visualization, explains Parviainen.
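As a rough illustration of turning high-dimensional observations into points that can be drawn on a plane, the sketch below flattens tiny made-up 2x2 "images" into 4-dimensional vectors and projects them onto two axes with principal component analysis (PCA). PCA is a standard linear technique chosen here for brevity and is not necessarily among the methods studied in the dissertation; the data, the function names, and the iteration count are assumptions.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean so every dimension has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[c] for r in rows) / n for c in range(d)]
    return [[r[c] - means[c] for c in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix (d x d) of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(d)]
            for a in range(d)]


def top_eigenvector(matrix, iterations=500, seed=0):
    """Power iteration: approximate dominant eigenvector of a symmetric matrix."""
    rng = random.Random(seed)
    d = len(matrix)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance left in any direction
        v = [x / norm for x in w]
    return v


def project_to_2d(rows):
    """Project rows onto the two leading principal axes."""
    x = center(rows)
    cov = covariance(x)
    d = len(cov)
    axes = []
    for _ in range(2):
        v = top_eigenvector(cov)
        # Rayleigh quotient gives the eigenvalue that belongs to v.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        axes.append(v)
        # Deflation: remove the found direction before searching for the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return [[sum(r[c] * axis[c] for c in range(d)) for axis in axes] for r in x]


# Made-up 2x2 grayscale images: three dark ones, then three bright ones.
images = [
    [[0.0, 0.1], [0.1, 0.0]],
    [[0.1, 0.0], [0.0, 0.2]],
    [[0.0, 0.2], [0.1, 0.1]],
    [[0.9, 1.0], [1.0, 0.8]],
    [[1.0, 0.9], [0.8, 1.0]],
    [[0.8, 1.0], [0.9, 0.9]],
]
# Each pixel becomes one dimension of the observation vector.
vectors = [[pixel for row in image for pixel in row] for image in images]
coords = project_to_2d(vectors)
for label, (x1, x2) in zip(["dark"] * 3 + ["bright"] * 3, coords):
    print(f"{label:6s} {x1:+.3f} {x2:+.3f}")
```

In this toy case the dark and bright images should land on opposite sides of the first axis, which is the kind of grouping a two-dimensional plot is meant to make visible.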
An automated world requires data classification
Visualization can also make it possible to study interpretations of data made using a model – for example, to look for reasons why a data classifier was unable to separate certain categories from one another.
Parviainen uses handwritten digits as an example in his dissertation. Digits written in poor handwriting have to be classified correctly in order to, among other things, ensure that the mail reaches its destination.
- At the post office, a machine can read handwritten postal codes on letters and sort the letters into the right piles so that the mailman doesn't have to decipher the numbers. Data classification would make it easier to automate many activities.
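A toy version of such a digit classifier can be sketched as a nearest-template lookup: each small bitmap is flattened into a vector with one dimension per pixel, and a new sample receives the label of the template it differs from in the fewest pixels. The 3x5 bitmaps, the three digit templates, and the function names are hypothetical; real mail-sorting systems rely on far more capable models.

```python
# Hypothetical 3x5 bitmaps: '#' is ink, '.' is blank paper.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def to_vector(bitmap):
    """Flatten a bitmap into a 15-dimensional 0/1 vector, one entry per pixel."""
    return [1 if ch == "#" else 0 for row in bitmap for ch in row]


def hamming(a, b):
    """Number of pixels in which two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(bitmap):
    """Return the label of the template closest to the given bitmap."""
    v = to_vector(bitmap)
    return min(TEMPLATES, key=lambda label: hamming(v, to_vector(TEMPLATES[label])))


# A seven whose stroke bends one row later than in the template.
sloppy_seven = ["###", "..#", "..#", ".#.", ".#."]
print(classify(sloppy_seven))
```

Inspecting the pixel differences behind a wrong label is one simple way to look for reasons why a classifier confuses two categories.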
Eli Parviainen will defend his doctoral dissertation “Studies on dimension reduction and feature spaces” (“Ulotteisuudenpienennyksestä ja piirreavaruuksista” in Finnish) at noon on November 11 in Lecture Hall F239a at the Aalto University School of Science, Otakaari 3 J, Espoo.
The dissertation is available online at: http://lib.tkk.fi/Diss/2011/isbn9789526043128/
Further information: Eli Parviainen, tel. +358 50 512 4385, eli.parviainen [at] aalto [dot] fi
Text: Tea Kalska
