Creyon Bio Publishes New Research Showing Machine Learning Can Be Used to Predict Electronic Structure of Large Molecules

RESEARCH HIGHLIGHT
Aug. 11, 2022

Creyon Bio, Inc., a drug development company engineering Oligonucleotide-Based Medicines (OBMs) with predictable safety and efficacy profiles, has published new research advancing its efforts to predict the chemical and biological properties of OBMs. The research provides a proof of concept for a novel approach to predicting the electronic structure of large molecules such as oligonucleotides: machine learning the electron-electron correlations of short polymers and then stitching these together to obtain a highly accurate electronic structure for large polymers. The paper, titled “Machine Learning 1- and 2-electron reduced density matrices of polymeric molecules,” is available on arXiv.

Identifying the chemical features of molecules is a key step in drug discovery. At the most fundamental level, these features arise from the quantum state of the molecule's electrons, that is, its electronic structure. Computational chemistry tools for calculating the electronic structure of small molecules have been available for a long time, but they are too slow for larger molecules such as oligonucleotides. This research presents proof-of-concept calculations demonstrating that machine learning can be used to predict the electronic structure of large molecules. We envision using this approach to predict the electronic properties of OBMs. Efficient prediction of electronic properties will, in turn, allow us to connect the chemical design of OBMs to their chemical and biological properties, including toxicity and activity.

Propelled by machine learning and AI advances such as predicting the electronic structure of OBMs, the Creyon™ Platform creates unprecedented efficiency and will change how precision medicines are developed for patients. Traditional trial-and-error approaches to screening gene-based medicines cannot scale to meet the rapidly increasing pace of genomic discovery. Creyon Bio develops and uses advanced machine learning and artificial intelligence, along with optimal purpose-built datasets, to connect the foundational biophysical properties of OBM chemistry and sequence to accurate predictive models of safety and efficacy. Creyon Bio’s purpose-built datasets are orders of magnitude more efficient than retrospective or ad hoc screening data for building predictive models. This allows Creyon Bio to develop models to engineer optimal OBMs across a broad range of molecular modalities: from single-stranded antisense oligonucleotides (ASOs) that reduce gene expression levels or change splicing events, to small interfering RNA (siRNA), to DNA and RNA editing systems, and even to targeting aptamers.

In this research on predicting the electronic structure of large molecules, the Creyon team relies on a principle that Walter Kohn (a theoretical physicist and chemist who won the 1998 Nobel Prize in Chemistry for developing Density Functional Theory) called the nearsightedness of electronic matter, often referred to as quantum nearsightedness. In the form used here, it states that electron-electron correlations in molecules are short-ranged: the electrons in one region of a molecule are only weakly affected by distant parts of it. The Creyon team leveraged quantum nearsightedness by first machine learning the electron-electron correlations of small polymeric molecules, using training data generated with high-level quantum chemistry calculations, and then “stitching” these units together to obtain a highly accurate electronic structure for bigger molecules.
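The nearsightedness argument, and why stitching can work at all, can be illustrated with a deliberately simple toy model. The Python sketch below is not Creyon's method and involves no machine learning: it assumes a one-dimensional chain of non-interacting electrons with hopping t and an alternating on-site energy ±delta (both hypothetical parameters), for which only the 1-electron reduced density matrix (1RDM) is needed. Because this chain has an energy gap, its exact 1RDM decays exponentially with the distance between sites. The hypothetical helper stitched_one_rdm exploits this in the spirit of classical divide-and-conquer methods: it solves short fragments exactly, keeps only near-diagonal entries, and stitches them into the 1RDM of a long chain, which is then compared against an exact calculation on the whole chain.

```python
import numpy as np


def chain_hamiltonian(sites, t=1.0, delta=2.0):
    """Toy tight-binding Hamiltonian (hypothetical model) for consecutive global site indices.

    Neighbouring sites are coupled by hopping -t; site k carries an on-site
    energy delta * (-1)**k, which opens an energy gap of at least 2*delta.
    """
    n = len(sites)
    h = np.zeros((n, n))
    for a, site in enumerate(sites):
        h[a, a] = delta * (-1) ** int(site)
        if a + 1 < n:
            h[a, a + 1] = h[a + 1, a] = -t
    return h


def one_rdm(h):
    """Spinless 1RDM of the ground state with the lowest half of the levels filled."""
    n_occ = h.shape[0] // 2
    _, vecs = np.linalg.eigh(h)  # eigenvalues are returned in ascending order
    occ = vecs[:, :n_occ]
    return occ @ occ.T  # projector onto the occupied orbitals


def stitched_one_rdm(n_sites, half_width=10, cutoff=6, **params):
    """Approximate 1RDM of a long chain assembled from short-fragment calculations.

    For each site i, solve a fragment of 2*half_width sites around i and copy
    row i of the fragment 1RDM, keeping only entries with |i - j| <= cutoff.
    Entries farther apart are set to zero (the nearsightedness assumption).
    """
    frag_len = 2 * half_width
    assert n_sites % 2 == 0 and n_sites >= frag_len and cutoff < half_width
    gamma = np.zeros((n_sites, n_sites))
    for i in range(n_sites):
        # Centre the fragment on site i, shifted inward near the chain ends.
        lo = min(max(i - half_width, 0), n_sites - frag_len)
        g_frag = one_rdm(chain_hamiltonian(np.arange(lo, lo + frag_len), **params))
        for j in range(max(0, i - cutoff), min(n_sites, i + cutoff + 1)):
            gamma[i, j] = g_frag[i - lo, j - lo]
    return 0.5 * (gamma + gamma.T)  # symmetrise the stitched rows


if __name__ == "__main__":
    n = 60
    exact = one_rdm(chain_hamiltonian(np.arange(n)))
    approx = stitched_one_rdm(n)
    print("max |exact - stitched| =", np.abs(exact - approx).max())
```

Enlarging half_width and cutoff reduces the error, while shrinking delta narrows the gap, makes the density matrix longer-ranged and degrades the stitched result. In the approach described in the paper, the role of the exact fragment calculation is played by machine learning models trained on high-level quantum chemistry data for short polymers.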

In addition, the Creyon team’s work addresses a fundamental problem in quantum chemistry. Since 1955, chemists have appreciated that the electronic structure of molecules can be encoded using 2-electron reduced density matrices (2RDMs). 2RDMs store the electron-electron correlations and require only a polynomial amount of storage in the number of orbitals, whereas conventional many-electron wave functions require an exponential amount. However, the adoption of 2RDMs for quantum chemistry calculations has been stymied by the N-representability problem: there is no known efficient (polynomial-time) algorithm for distinguishing valid 2RDMs, those that can arise from an actual N-electron wave function, from invalid ones. Creyon’s machine learning models provide a route around the N-representability problem by teaching the computer what valid 2RDMs look like.
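To put the storage argument in numbers, the short Python sketch below counts the coefficients of a full configuration-interaction (full-CI) wave function, which grows combinatorially as C(r, N) for r spin orbitals and N electrons, and compares it with the entries of a 2RDM stored as a pair-by-pair matrix, which grows as O(r^4). The function names and the half-filling choice are illustrative assumptions, not taken from the paper, and symmetry reductions are ignored in both counts.

```python
from math import comb


def full_ci_size(n_spin_orbitals: int, n_electrons: int) -> int:
    """Number of Slater-determinant coefficients in a full-CI wave function."""
    return comb(n_spin_orbitals, n_electrons)


def two_rdm_size(n_spin_orbitals: int) -> int:
    """Entries of the 2RDM stored as an antisymmetrised pair-by-pair matrix.

    Rows and columns are labelled by orbital pairs i < j, so the matrix has
    comb(r, 2) ** 2 entries; Hermiticity and spin symmetry (ignored here)
    would shrink this further, but it stays O(r**4).
    """
    n_pairs = comb(n_spin_orbitals, 2)
    return n_pairs * n_pairs


if __name__ == "__main__":
    for r in (20, 50, 100):
        n = r // 2  # half filling: an arbitrary illustrative choice
        print(f"{r:4d} spin orbitals, {n:3d} electrons: "
              f"wave function {full_ci_size(r, n):.3e}, 2RDM {two_rdm_size(r):.3e}")
```

For 100 spin orbitals and 50 electrons, the wave function has about 1.0 × 10^29 coefficients, while the 2RDM has about 2.5 × 10^7 entries. The catch is the N-representability problem: not every matrix of that size corresponds to a physical N-electron state.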

David Pekker, Ph.D., Director of Theory at Creyon Bio, is the lead author of the paper. Additional authors are Chungwen Liang, Ph.D., Principal Scientist, Computation Science; Sankha Pattanayak, Ph.D., Director of Chemistry; and Swagatam Mukhopadhyay, Ph.D., Co-founder and Chief Scientific Officer.

About Creyon Bio, Inc.

Creyon Bio is a pre-clinical stage company reimagining drug development as it should be, using a data-first approach for generating uniquely powerful datasets and developing machine learning models to uncover the engineering principles that make precision oligonucleotide-based medicines possible for patient populations of all sizes. To learn more, visit creyonbio.com.