Molecular big data

Figure: Overview of the 62k dataset of organic molecules and their properties. Dataset statistics: number of molecules and number of different chemical species in each data subset. Sample molecules are shown on the right.

We generated a dataset of 61,489 molecules extracted from organic crystals in the Cambridge Structural Database. For each molecule, we provide the geometry and several electronic properties calculated with density-functional theory. For two subsets, we also supply data from higher-level methods, such as hybrid functionals and the GW Green's function method. The dataset is available at Nature Scientific Data. The data are open access and can be freely used for applications, data science, and machine learning.

