Description
The practical implementation of deep learning (DL) methods for chemistry applications relies on encoding chemical structures into machine-readable formats that can be efficiently processed by computational tools. One Hot Encoding (OHE) and Morgan fingerprints (MF) are established representations that expand alphanumeric categorical data into large numerical matrices or vectors. We have developed embedded alternatives to OHE and MF that encode the discrete alphanumeric tokens of an N-sized alphabet into a few real numbers, yielding a more compact matrix representation of chemical structures. Training machine learning models on these embedded representations achieves accuracy and robustness comparable to the traditional representations while significantly reducing the use of computational resources. Our benchmarks across molecular string representations (SMILES, DeepSMILES, and SELFIES) and different molecular databases, for Variational Autoencoders (VAEs), Recurrent Neural Networks (RNNs), and other DL models, show reductions in vRAM usage of up to 50% and, in some cases, disk memory reduction efficiencies averaging 80%. These encoding methods open new avenues for embedded data representations that promote energy-efficient and scalable computing on resource-constrained devices or in other scenarios with limited computing resources. The application of these embeddings also extends to other disciplines that rely on OHE and MF.
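To illustrate the core idea, the sketch below contrasts OHE with a compact learned embedding for a SMILES string. This is a minimal illustration and not the authors' implementation: the character-level tokenization, the toy alphabet, and the embedding dimension d are assumptions chosen for clarity.

```python
# Minimal sketch (not the authors' code): one-hot encoding vs. a compact
# learned embedding for SMILES tokens. Alphabet, tokenization, and the
# embedding dimension d are illustrative assumptions.
import torch
import torch.nn as nn

smiles = "CC(=O)Oc1ccccc1C(=O)O"            # aspirin
alphabet = sorted(set(smiles))              # toy character-level alphabet
token_to_idx = {t: i for i, t in enumerate(alphabet)}
indices = torch.tensor([token_to_idx[t] for t in smiles])

N = len(alphabet)                           # alphabet size
L = len(indices)                            # sequence length

# One-hot encoding: each token expands to a sparse N-length vector (L x N).
one_hot = nn.functional.one_hot(indices, num_classes=N).float()

# Embedded alternative: each token maps to d real numbers (L x d), d << N.
d = 4
embedding = nn.Embedding(num_embeddings=N, embedding_dim=d)
embedded = embedding(indices)

print(f"one-hot:  {tuple(one_hot.shape)}")   # (L, N)
print(f"embedded: {tuple(embedded.shape)}")  # (L, d)
```

Each token row shrinks from N mostly-zero entries to d dense values, which is where the memory savings described in the abstract would come from.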