Three months ago already, I visited Dagstuhl for the second time. The weather was much better than in the January right before the start of the pandemic. The first time, I attended the Computational Metabolomics meeting with the focus From Cheminformatics to Machine Learning, and one of the things we concerned ourselves with was how to do computation with compound classes (see Section 3.6 and this online book). We know how to handle SMILES and we know how to do substructure searching with SMARTS, but what if you have compound classes or lipid classes? Biology is a greasy business.

From a WikiPathways perspective there is additional complexity, with modified proteins involved in lipid metabolism: the acyl-carrier proteins (ACPs). They look like this, and the R group is a protein:

We have quite a few of them in WikiPathways and they also show up in ChEBI (and likely Reactome), LIPID MAPS, and KEGG.

During this year's Dagstuhl we used one session to continue working on it (report pending). One of the results is that Wikidata (see doi:10.7554/eLife.52614 and doi:10.7554/eLife.70780) now has a property for CXSMILES. CDK 2.0 (doi:10.1186/s13321-017-0220-4) already supported CXSMILES, and the above image was actually created with CDK Depict (thx to John!).
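For readers who have not seen CXSMILES before: it is a plain SMILES string followed by a space and an extension block between pipes, and one of the extension fields, `$...$`, carries semicolon-separated labels, one per atom. Here is a minimal Python sketch of pulling those apart. The function names `split_cxsmiles` and `atom_labels` are my own, the butanoyl thioester string is a made-up toy example (not the value stored in Wikidata), and the `ACP` label is just for illustration. It only handles the simple atom-label case; for real work you would use a proper toolkit like the CDK.

```python
def split_cxsmiles(cxsmiles):
    """Split a CXSMILES string into (smiles, extension).

    The extension is the text between the two pipes, or None when the
    input is a plain SMILES without an extension block.
    """
    text = cxsmiles.strip()
    smiles, sep, rest = text.partition(" |")
    if not sep:
        return text, None
    if not rest.endswith("|"):
        raise ValueError("unterminated CXSMILES extension block")
    return smiles, rest[:-1]


def atom_labels(extension):
    """Return the per-atom labels from the first `$...$` field.

    Returns an empty list when there is no extension or no label field.
    Assumption: the first `$...$` field holds atom labels; other
    dollar-delimited fields are not handled by this sketch.
    """
    if extension is None:
        return []
    start = extension.find("$")
    if start == -1:
        return []
    end = extension.find("$", start + 1)
    if end == -1:
        raise ValueError("unterminated atom label field")
    return extension[start + 1:end].split(";")


# Toy example: a short acyl chain on a sulfur, with the wildcard atom
# standing in for the protein (hypothetical string, for illustration only).
smiles, ext = split_cxsmiles("CCCC(=O)S* |$;;;;;;ACP$|")
labels = atom_labels(ext)
print(smiles)
print(labels)
```

The last entry in `labels` lines up with the `*` atom, which is how the "R group is a protein" idea gets attached to an otherwise ordinary SMILES.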

So, that means I can now start adding all those ACPs to Wikidata :) Here’s hexadecanoyl-[acp] (or this Scholia page):