Description
We present a novel methodology for diagnosing latent instabilities in data-driven molecular dynamics (MD) simulations. Referred to as cumulative latent instability (CLI) analysis, this approach rests on the theoretical premise that all regression techniques, regardless of their capacity for extrapolation, are inherently prone to producing unreliable or artefactual predictions when applied outside their natural interpolation domain (NID) [1]. Drawing on quantum chemical topology [2] and leveraging the behavior of a Heaviside-like indicator function, we introduce the CLI index as a robust and well-behaved estimator that quantifies the accumulation of unreliable atomic energy predictions along an MD trajectory. CLI analyses indicate that the progressive buildup of latent instability is responsible for the eventual failure of data-driven MD simulations once a critical threshold is surpassed. Our experiments establish a strong correlation between the CLI index, the intrinsic quality of the model propagating the simulation, and the simulation temperature. We further observe that simulations in which the CLI index reaches saturation within 1 nanosecond may be considered effectively stable over indefinite timescales. Finally, we establish a link between the CLI index and the principle of energy conservation. This work paves the way for more in-depth studies aimed at rationalizing and addressing the precarious robustness of current machine learning force fields (MLFFs). More specifically, the proposed methodology lays the foundations for a new class of physics-informed active learning protocols in which atomic CLI indices guide the data augmentation process.
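The abstract does not give the definition of the CLI index, so the following is only an illustrative sketch of the general idea of accumulating a Heaviside-like indicator along a trajectory. It assumes, purely for illustration, that the NID can be approximated by per-feature lower and upper bounds of the training descriptors, and that the index is a plain running count of out-of-domain atomic predictions. The function names `outside_domain` and `cumulative_index` and the bounding-box criterion are hypothetical and are not taken from the work described above.

```python
import numpy as np


def outside_domain(descriptors, lower, upper):
    """Heaviside-like indicator: 1.0 where an atomic descriptor leaves the box.

    ASSUMPTION: the interpolation domain is approximated by the per-feature
    box [lower, upper]; the actual NID criterion is not stated in the abstract.
    """
    # Largest excursion beyond either bound, per atom; <= 0 means inside the box.
    excess = np.maximum(lower - descriptors, descriptors - upper).max(axis=-1)
    # H(0) is set to 0 so that points exactly on the boundary count as inside.
    return np.heaviside(excess, 0.0)


def cumulative_index(trajectory, lower, upper):
    """Running count of out-of-domain atomic predictions along a trajectory.

    trajectory: array of shape (n_steps, n_atoms, n_features).
    Returns an array of shape (n_steps,).
    ASSUMPTION: an unnormalized cumulative sum stands in for the CLI index.
    """
    flags = outside_domain(trajectory, lower, upper)  # (n_steps, n_atoms)
    return np.cumsum(flags.sum(axis=1))


if __name__ == "__main__":
    lower = np.array([0.0, 0.0])
    upper = np.array([1.0, 1.0])
    # Toy trajectory: 3 steps, 2 atoms, 2 descriptor features.
    trajectory = np.array([
        [[0.5, 0.5], [0.2, 0.9]],   # both atoms inside the box
        [[1.5, 0.5], [0.2, 0.9]],   # atom 0 outside
        [[1.5, 0.5], [-0.1, 0.3]],  # both atoms outside
    ])
    print(cumulative_index(trajectory, lower, upper))
```

Under these assumptions, a curve that stops growing corresponds to a trajectory that no longer visits out-of-domain configurations, which is one way to picture the saturation behavior mentioned above.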