Molecular Screening and Toxicity Estimation of 260,000 Perfluoroalkyl and Polyfluoroalkyl Substances (PFASs) through Machine Learning
September 26, 2022 - Thanh T. Lai, David Kuntz, and Angela K. Wilson
Abstract
Perfluoroalkyl and polyfluoroalkyl substances (PFASs) are a class of chemicals widely used in industrial applications because of their exceptional properties and stability. However, they do not readily degrade in the environment and are linked to contamination and adverse health effects in humans and wildlife. To find alternatives to the most commonly used PFAS molecules that retain their desirable chemical properties without harming living organisms, a novel machine learning approach is used. A machine learning model is trained on an existing set of PFAS molecules to generate over 260,000 novel PFAS molecules, a set we dub PFAS-AI-Gen. This set is screened first with molecular descriptors that have known relationships to toxicity and industrial suitability, and then with molecular docking and molecular dynamics simulations. In this manner, increasingly complex calculations are performed only for the candidate molecules most likely to show the desired properties: low binding affinity toward two selected protein receptors, the human pregnane X receptor (hPXR) and peroxisome proliferator-activated receptor γ (PPAR-γ), and high industrial suitability, defined by critical micelle concentration (CMC). Both selection criteria, low binding affinity and high industrial suitability, are assessed relative to the popular PFAS alternative GenX. hPXR and PPAR-γ are selected because both are PFAS targets and they facilitate a variety of functions, such as drug metabolism (hPXR) and glucose regulation (PPAR-γ). Through this approach, 22 promising new PFAS substitutes that may warrant experimental investigation are identified. This integrated approach of molecular screening and toxicity estimation may be applicable to other chemical classes.
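The tiered structure of the screening described above, in which each more expensive calculation is run only on the molecules that survive the cheaper one before it, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `tiered_screen`, the candidate identifiers, and the three stand-in filters are hypothetical and perform no chemistry. In an actual workflow, each filter would compare a computed property of the candidate (a descriptor, a docking score, or a molecular dynamics result) against the corresponding value for the reference compound GenX.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A stage is a label plus a keep/discard decision for one candidate.
# Stages are listed from cheapest to most expensive.
Stage = Tuple[str, Callable[[str], bool]]


def tiered_screen(
    candidates: Sequence[str], stages: Sequence[Stage]
) -> Tuple[List[str], Dict[str, int]]:
    """Apply stages in order; each stage sees only survivors of the previous one.

    Returns the final survivors and the number of candidates evaluated per stage.
    """
    survivors = list(candidates)
    evaluated: Dict[str, int] = {}
    for label, keep in stages:
        evaluated[label] = len(survivors)  # cost is paid only for survivors
        survivors = [c for c in survivors if keep(c)]
    return survivors, evaluated


def index_of(molecule_id: str) -> int:
    """Extract the integer suffix from a hypothetical ID such as 'mol-6'."""
    return int(molecule_id.split("-")[1])


# Hypothetical candidate pool; real input would be generated structures.
candidates = [f"mol-{i}" for i in range(12)]

# Placeholder filters standing in for real property calculations.
stages: List[Stage] = [
    ("descriptors", lambda m: index_of(m) % 2 == 0),
    ("docking", lambda m: index_of(m) % 3 == 0),
    ("molecular_dynamics", lambda m: index_of(m) > 0),
]

survivors, evaluated = tiered_screen(candidates, stages)
print(survivors)
print(evaluated)
```

Recording how many candidates reach each stage makes the point of the funnel explicit: the count shrinks from stage to stage, so the most expensive calculation is performed on the smallest set.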