Projects
Our group develops computational tools, datasets, and models for inorganic and organometallic chemistry. Many of these projects are in active development — check back for updates, code releases, and new datasets.
Featured
T-REX is a canonical, invertible string representation for transition-metal complexes that explicitly encodes metal identity, coordination topology, geometry, oxidation state, and spin. Its extension, MUL-T-REX, generalizes to multinuclear clusters by capturing metal-metal connectivity and bridging topology at each site. These representations make inorganic chemical space searchable, reproducible, and model-ready.
CAT-FM is a catalysis-focused foundation model that treats catalytic cycles as grammatical sequences of T-REX tokens, learning the causal logic of chemical transformations. Trained on two data streams, high-fidelity manually curated datasets and automated literature extraction, CAT-FM supports downstream tasks including yield and selectivity prediction, mechanistic reasoning, and catalyst candidate generation.
We use a multi-stage transfer learning pipeline for mechanophore discovery. First, models learn per-bond and per-atom deformation physics from inexpensive DFT calculations. These pretrained models are then transferred to predict mechanophore kinetics and reactivity under mechanical force. Finally, the trained models are deployed for high-throughput discovery of new mechanophore scaffolds with targeted properties.
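The three stages can be illustrated with a toy sketch. This is not our production pipeline: the data are random synthetic stand-ins rather than DFT or kinetics results, the models are plain ridge regressions, and every name here (`ridge_fit`, `W_pre`, `head`, the array sizes) is a hypothetical chosen for illustration. It only shows the general pattern of pretraining on abundant cheap labels, reusing the learned representation for a scarce target, and then ranking candidates.

```python
import numpy as np

def ridge_fit(X, Y, lam=1e-2):
    """Closed-form ridge regression: minimizes ||X W - Y||^2 + lam ||W||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
n_features, n_latent = 16, 4

# Synthetic stand-ins (assumption: NOT real chemistry data).
true_map = rng.normal(size=(n_features, n_latent))
true_head = rng.normal(size=n_latent)

# Stage 1: pretrain on many cheap labels (stand-in for deformation targets).
X_cheap = rng.normal(size=(2000, n_features))
Y_cheap = X_cheap @ true_map + 0.1 * rng.normal(size=(2000, n_latent))
W_pre = ridge_fit(X_cheap, Y_cheap)            # learned representation, shape (16, 4)

# Stage 2: transfer. Freeze W_pre and fit a small head on a scarce target
# (stand-in for a kinetics or reactivity label).
X_small = rng.normal(size=(30, n_features))
y_small = (X_small @ true_map) @ true_head + 0.1 * rng.normal(size=30)
head = ridge_fit(X_small @ W_pre, y_small)     # shape (4,)

# Stage 3: screen a candidate pool and keep the highest-scoring entries.
X_candidates = rng.normal(size=(500, n_features))
scores = (X_candidates @ W_pre) @ head
top_k = np.argsort(scores)[::-1][:5]           # indices of the 5 best candidates
```

The point of freezing `W_pre` in stage 2 is that only `n_latent` parameters need to be fit from the 30 scarce examples, rather than all `n_features`.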
More
Building on the T-REX representation, we train property prediction and virtual screening models for transition-metal complexes across diverse application domains. The tmQMg dataset and our curated subsets (tmCAT, tmPHOTO, tmBIO, and tmSCO), built using natural language processing of the primary literature, provide the training data for these models.
We are developing synthesizability models for inorganic complexes built on T-REX, predicting synthetic accessibility as a function of metal identity, oxidation state, and ligand architecture. These assembly-score-type models help prioritize computationally designed candidates that are likely to be experimentally realizable.