Friday, 27 June 2025 09:58

Photoemission spectroscopy of organic molecules using plane wave/pseudopotential density functional theory and machine learning: A comprehensive and predictive computational protocol for isolated molecules, molecular aggregates, and organic thin films

Ab initio simulations and machine learning models work together to enable full interpretation and prediction of photoemission experiments

In this work, published in The Journal of Chemical Physics, theoretical studies of isolated molecules enable accurate modeling of core and valence ionization spectra. A plane wave DFT protocol based on the ΔSCF approach has been developed and validated across a diverse set of molecules using several exchange–correlation functionals. The protocol also extends to molecular films and aggregates. The results aid the interpretation of measurements and support the training of ML models. A public repository provides pseudopotentials, input files, and ML datasets to ensure that the method is reproducible.

We have developed and validated a robust computational protocol based on plane wave density functional theory (PW-DFT) and a ΔSCF approach for predicting X-ray photoemission spectra (XPS) of isolated molecules. The method enables accurate calculation of core-level binding energies (BEs), specifically for the C1s, N1s, and O1s orbitals, by incorporating core-hole pseudopotentials and assessing several exchange–correlation functionals (PBE, B3LYP, HSE, BH&HLYP).

Extensive benchmarking, including comparison with equation-of-motion coupled-cluster (EOM-CCSD) calculations and experimental data, showed strong agreement across a broad set of molecular classes: aromatic, aliphatic, and heteroaromatic compounds, drugs, and biomolecules. The strengths and weaknesses of each functional were evaluated. For example, PBE offers low-cost calculations but tends to underestimate core BEs in polar environments, while B3LYP and HSE showed consistent accuracy across different chemical environments.

We extended the protocol to large molecular aggregates and thin films on inorganic surfaces, with promising results for modeling realistic systems such as N-doped graphite and surface-adsorbed molecular layers. For valence photoemission, Kohn–Sham eigenvalues were tested as proxies for ionization potentials across diverse molecules, with BH&HLYP showing the best accuracy and transferability.

Importantly, we have also developed an initial machine learning (ML) model, trained on PW-DFT data, to predict XPS spectra of organic molecules. The model effectively reproduced experimental spectral features, highlighting the growing role of ML in spectroscopic prediction. Future developments will incorporate conformational and intermolecular interaction effects to improve prediction fidelity. To support reproducibility and community adoption, we provide an open repository with core-hole pseudopotentials, input files, and training datasets for ML applications.
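The ΔSCF idea described above comes down to a difference of two total energies: one for the neutral ground state and one computed with a core-hole pseudopotential on the ionized atom. The Python sketch below illustrates that arithmetic, together with the use of a Kohn–Sham eigenvalue as a proxy for a valence ionization potential. It is only an illustration under stated assumptions: total energies are taken to be in Rydberg, the function names and the optional `correction_ev` offset are hypothetical, and the numbers in the usage lines are placeholders rather than results from the paper or its repository.

```python
# Illustrative sketch only: names, the optional offset, and all numbers
# below are assumptions, not taken from the published protocol.

RY_TO_EV = 13.605693122994  # Rydberg to electronvolt conversion factor


def delta_scf_binding_energy(e_ground_ry, e_core_hole_ry, correction_ev=0.0):
    """Core-level binding energy as a total-energy difference (ΔSCF).

    BE = E(core-ionized) - E(ground state), converted from Ry to eV.
    `correction_ev` is a hypothetical constant offset, for workflows that
    need one to put pseudopotential energies on an absolute scale.
    """
    return (e_core_hole_ry - e_ground_ry) * RY_TO_EV + correction_ev


def relative_shifts(binding_energies_ev):
    """Binding-energy shifts of each site relative to the lowest BE."""
    reference = min(binding_energies_ev.values())
    return {site: be - reference for site, be in binding_energies_ev.items()}


def ionization_potential_from_eigenvalue(eigenvalue_ev, vacuum_level_ev=0.0):
    """Valence IP estimated from a Kohn-Sham eigenvalue: IP ~ E_vac - eps."""
    return vacuum_level_ev - eigenvalue_ev


# Usage with placeholder numbers (not real data).
be = delta_scf_binding_energy(e_ground_ry=-100.0, e_core_hole_ry=-78.6)
shifts = relative_shifts({"C1": 290.5, "C2": 291.0})
ip = ionization_potential_from_eigenvalue(-9.2)
print(f"BE = {be:.2f} eV, shifts = {shifts}, IP = {ip:.2f} eV")
```

Reporting shifts relative to a reference site, as `relative_shifts` does, is a common way to compare chemically inequivalent atoms of the same element, since constant offsets in the absolute energy scale cancel in the difference.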

This work has been financially supported by ICSC-Centro Nazionale di Ricerca in High Performance Computing, Big Data, and Quantum Computing, funded by European Union-NextGenerationEU (Grant No. CN00000013), and by the Italian Ministry of University and Research (MUR) within the PRIN-2022 research program (project “NIR+,” Grant No. 2022BREBFN).