15-18 September 2025
Conference Center – University of Naples Federico II
Europe/Rome timezone

Embedded machine-readable molecular representation for resource-efficient deep learning applications

Not scheduled
Sala Azzurra (Conference Center – University of Naples Federico II)

Complesso Universitario di Monte Sant’Angelo, Via Cintia, 26, 80126 Napoli, Italy
Oral Presentation

Speaker

Francisco Martin-Martinez (King's College London)

Description

The practical implementation of deep learning (DL) methods for chemistry applications relies on encoding chemical structures into machine-readable formats that computational tools can process efficiently. One-hot encoding (OHE) and Morgan fingerprints (MF) are established representations that expand alphanumeric categorical data into numerical matrices or vectors. We have developed embedded alternatives to OHE and MF that encode the discrete alphanumeric tokens of an N-sized alphabet into a few real numbers, yielding a simpler matrix representation of chemical structures. Training machine learning models on these embedded representations achieves accuracy and robustness comparable to those of traditional representations while significantly reducing the use of computational resources. Our benchmarks across molecular string representations (SMILES, DeepSMILES, and SELFIES) and different molecular databases, for Variational Autoencoders (VAEs), Recurrent Neural Networks (RNNs), and other DL models, show a reduction in VRAM usage of up to 50% and, in some cases, an increase in disk Memory Reduction Efficiency to 80% on average. These encoding methods open new avenues for data representation in embedded formats that promote energy efficiency and scalable computing on resource-constrained devices or in scenarios with limited computing resources. The application of these embeddings extends to other disciplines that rely on OHE and MF.
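To make the size difference concrete, the following is a minimal sketch, not the authors' method: the abstract does not specify how tokens are mapped to real numbers. The sketch assumes a hypothetical scheme in which each token index is written as `width` digits in a small base and each digit is scaled to [0, 1]. It also assumes character-level tokenization, which is a simplification (real SMILES tokenizers treat `Cl`, `Br`, and bracket atoms as single tokens). The function names `build_alphabet`, `one_hot`, and `embedded` are illustrative only.

```python
import math


def build_alphabet(strings):
    """Sorted list of distinct characters (simplified: one character = one token)."""
    return sorted({ch for s in strings for ch in s})


def one_hot(s, alphabet):
    """Standard one-hot encoding: one row per token, one column per alphabet entry."""
    index = {tok: i for i, tok in enumerate(alphabet)}
    rows = []
    for ch in s:
        row = [0.0] * len(alphabet)
        row[index[ch]] = 1.0
        rows.append(row)
    return rows


def embedded(s, alphabet, width=2):
    """Hypothetical compact encoding (an assumption, not the published scheme).

    Each token index is written as `width` digits in the smallest base whose
    `width`-digit numbers cover the alphabet; digits are scaled to [0, 1].
    """
    n = len(alphabet)
    base = max(2, math.ceil(n ** (1.0 / width)))
    while base ** width < n:  # guard against floating-point rounding
        base += 1
    index = {tok: i for i, tok in enumerate(alphabet)}
    rows = []
    for ch in s:
        i = index[ch]
        digits = []
        for _ in range(width):
            i, d = divmod(i, base)
            digits.append(d / (base - 1))
        rows.append(digits)
    return rows


# Toy data: ethanol, benzene, acetic acid as SMILES strings.
smiles = ["CCO", "c1ccccc1", "CC(=O)O"]
alphabet = build_alphabet(smiles)

ohe = one_hot("CC(=O)O", alphabet)
emb = embedded("CC(=O)O", alphabet, width=2)

# Number of stored values per molecule: tokens x columns.
print("alphabet size:", len(alphabet))
print("one-hot values:", len(ohe) * len(ohe[0]))
print("embedded values:", len(emb) * len(emb[0]))
```

In this sketch the one-hot matrix grows with the alphabet size N, while the embedded matrix keeps a fixed number of columns per token. The memory figures reported in the abstract come from the authors' own benchmarks and are not reproduced by this toy example.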

Primary authors

Francisco Martin-Martinez (King's College London)
Mr. Emilio Nuñez-Andrade (Swansea University)
Dr. Isaac Vidal-Daza (University of Granada)
Dr. James W. Ryan (Swansea University)
Dr. Rafael Gómez-Bombarelli (Massachusetts Institute of Technology)

Presentation Materials

There are no materials yet.