<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/kegg.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-06-15T12:00:19+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/kegg.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">Biology, ACPs, lipids, cheminformatics, and Dagstuhl</title><link href="https://chem-bla-ics.linkedchemistry.info/2022/08/01/biology-acps-lipids-cheminformatics-and.html" rel="alternate" type="text/html" title="Biology, ACPs, lipids, cheminformatics, and Dagstuhl" /><published>2022-08-01T00:00:00+00:00</published><updated>2022-08-01T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2022/08/01/biology-acps-lipids-cheminformatics-and</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2022/08/01/biology-acps-lipids-cheminformatics-and.html"><![CDATA[<p>Already 3 months ago I visited <a href="https://www.dagstuhl.de/">Dagstuhl</a> for the second time. The weather was much better than in the January right before
the start of the pandemic. The first time, I attended the Computational Metabolomics meeting, with the focus From Cheminformatics to Machine Learning. One
of the things we concerned ourselves with was how to do computation with compound classes (see
<a href="https://drops.dagstuhl.de/opus/volltexte/2020/12403/pdf/dagrep_v010_i001_p144_20051.pdf">Section 3.6</a> and
<a href="https://egonw.github.io/cdk-cxsmiles/">this online book</a>). We know how to handle
SMILES and we know how to do substructure searching with SMARTS, but what if you have compound classes or lipid classes? Biology is a greasy business.</p>

<p>From a <a href="https://wikipathways.org/">WikiPathways</a> perspective there is additional complexity, with modified proteins involved in lipid metabolism, the acyl-carrier
proteins. They look like this, and the R group is a protein:</p>

<p><img src="/assets/images/Screenshot_20220801_180944.png" alt="" /></p>

<p>We have quite a few of them in WikiPathways and they also show up in <a href="https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:5697">ChEBI</a> (and likely
Reactome), <a href="https://www.lipidmaps.org/databases/lmsd/LMFA07060040?LMID=LMFA07060040">LIPID MAPS</a>, and
<a href="https://www.kegg.jp/entry/C05764">KEGG</a>.</p>

<p>During this year’s Dagstuhl we used one session to continue working on it (report pending). One of the results is that
<a href="https://www.wikidata.org/">Wikidata</a> (see doi:<a href="https://doi.org/10.7554/eLife.52614">10.7554/eLife.52614</a> and
doi:<a href="https://doi.org/10.7554/eLife.70780">10.7554/eLife.70780</a>) now has <a href="https://www.wikidata.org/wiki/Property:P10718">a property for CXSMILES</a>.
CDK 2.0 (doi:<a href="https://doi.org/10.1186/s13321-017-0220-4">10.1186/s13321-017-0220-4</a>) already supported CXSMILES and the above image is actually created with
<a href="https://github.com/cdk/depict">CDK Depict</a> (thx to John!).</p>

<p>So, that means I can now start adding all those ACPs to Wikidata :) Here’s <a href="https://www.wikidata.org/wiki/Q113377202">hexadecanoyl-[acp]</a>
(or this <a href="https://scholia.toolforge.org/chemical-class/Q113377202">Scholia page</a>):</p>

<p><img src="/assets/images/Screenshot_20220801_182345.png" alt="" /></p>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="chebi" /><category term="dagstuhl" /><category term="epilipidnet" /><category term="kegg" /><category term="wikipathways" /><category term="lipidmaps" /><category term="metabolomics" /><category term="smiles" /><category term="wikidata" /><category term="doi:10.7554/ELIFE.52614" /><category term="doi:10.7554/ELIFE.70780" /><category term="doi:10.1186/S13321-017-0220-4" /><category term="cxsmiles" /><summary type="html"><![CDATA[Already 3 months ago I visited Dagstuhl for the second time. The weather was much better than in the January right before the start of the pandemic. The first time, I attended the Computational Metabolomics meeting, with the focus From Cheminformatics to Machine Learning. One of the things we concerned ourselves with was how to do computation with compound classes (see Section 3.6 and this online book). We know how to handle SMILES and we know how to do substructure searching with SMARTS, but what if you have compound classes or lipid classes? Biology is a greasy business.]]></summary></entry><entry><title type="html">CDK Workshop - Day #2</title><link href="https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2.html" rel="alternate" type="text/html" title="CDK Workshop - Day #2" /><published>2007-01-30T00:00:00+00:00</published><updated>2007-01-30T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2.html"><![CDATA[<p>Because of other obligations, I was unable to attend the first day of the <a href="http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=spring2007workshop">CDK Workshop</a>,
though Christoph had set up Skype so that at least I could hear the talks from <a href="http://www.inf.uni-konstanz.de/bioml/staff/berthold/">Prof. Berthold</a>
(Konstanz, Germany) about <a href="http://www.knime.org/">KNIME</a> and <a href="http://almost.cubic.uni-koeln.de/cosi/curriculumVitae_zielesny.htm">Prof. Zielesny</a>
about <a href="http://cdk-taverna.de/">CDK-Taverna</a>.</p>

<p>Today, Miguel Rojas and Stefan Kuhn discussed their research. Miguel showed the state of mass spectrum prediction using the <a href="http://cdk.sf.net/">CDK</a>
and the MEDEA plugin for <a href="http://www.bioclipse.net/">Bioclipse</a>. Stefan demonstrated the <a href="http://www.nmrshiftdb.org/">NMRShiftDB</a>
and a new lab system for NMR experiment scheduling and management based on it. <a href="http://www2.cmbi.ru.nl/who-and-where/staff/27/">Dr. Ott</a>
(Nijmegen, Netherlands) showed the <a href="http://biometa.cmbi.ru.nl/">BioMeta Database</a> which contains metabolite and reaction information derived from the
<a href="http://www.genome.jp/kegg/ligand.html">KEGG</a>, but which fixes a set of chemical problems in the latter (see also the article,
DOI:<a href="https://doi.org/10.1186/1471-2105-7-517">10.1186/1471-2105-7-517</a>).</p>

<p>The afternoons of CDK workshops traditionally have discussion sessions and hackathons. Two groups were formed: one consisted of the KNIME guys who,
together with Miguel and Federico, focused on QSAR descriptor calculations in KNIME, while Stefan, Martin and I looked at the fingerprinter
peculiarities that Martin found (see also this <a href="http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/archive/cdknews2.2.article22.pdf">CDK News article</a>),
and came up with a possible further performance improvement of the AllRingsFinder. Because one class of molecules that is causing trouble consists of two
ring systems connected by a long linker, like Choloyl-CoA (below), we anticipate that splitting the molecule up into ring systems prior to using the
SSSR algorithm should speed up the complete all-ring finding process.</p>

<p><img src="/assets/images/choloyl-coa.png" alt="" /></p>

<p>Currently, the spanning tree is calculated before deciding on using the SSSR finder. We think this spanning tree can be used to partition the molecule
into separate ring systems. The further steps of the ring search can then be applied to each of them.</p>
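
<p>To make the partitioning idea concrete, here is a minimal sketch in plain Python. It is not the CDK code, and it does not use the spanning tree: as a simplifying assumption it uses a naive test instead, where a bond counts as a ring bond if its two atoms stay connected after the bond is removed. The function names <code class="language-plaintext highlighter-rouge">connected</code> and <code class="language-plaintext highlighter-rouge">ring_systems</code> and the bond-list input format are made up for this illustration.</p>

```python
from collections import defaultdict


def connected(adjacency, start, goal, skipped_bond):
    """Return True if goal is reachable from start without using skipped_bond."""
    seen = {start}
    stack = [start]
    while stack:
        atom = stack.pop()
        if atom == goal:
            return True
        for neighbour in adjacency[atom]:
            if frozenset((atom, neighbour)) == skipped_bond:
                continue
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return False


def ring_systems(bonds):
    """Partition a molecular graph, given as (atom, atom) index pairs, into ring systems."""
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # A bond is a ring bond if its two atoms stay connected without it.
    # Linker and chain bonds fail this test and are dropped here.
    ring_adjacency = defaultdict(set)
    for a, b in bonds:
        if connected(adjacency, a, b, frozenset((a, b))):
            ring_adjacency[a].add(b)
            ring_adjacency[b].add(a)

    # Each connected component of the ring-bond graph is one ring system.
    systems = []
    assigned = set()
    for atom in sorted(ring_adjacency):
        if atom in assigned:
            continue
        component = {atom}
        stack = [atom]
        while stack:
            current = stack.pop()
            for neighbour in ring_adjacency[current]:
                if neighbour not in component:
                    component.add(neighbour)
                    stack.append(neighbour)
        assigned |= component
        systems.append(sorted(component))
    return systems


# Toy example (not Choloyl-CoA): two six-membered rings, atoms 0-5 and 6-11,
# joined by a two-atom linker (atoms 12 and 13).
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (5, 12), (12, 13), (13, 6),
         (6, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 6)]
print(ring_systems(bonds))
```

<p>For the toy molecule this gives two separate ring systems and drops the linker atoms, so a ring search would only have to look at two small graphs instead of one large one. The naive bond test is quadratic in the worst case; a spanning-tree or bridge-finding approach would be the faster choice in practice.</p>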

<p>After dinner (pasta/pizza), during the Spanish-German handball game, we continued the hacking and discussions, now focusing as a whole group
on QSAR descriptors in KNIME. We looked at each descriptor and decided if it should go into a QSAR calculator node, or even in a node of its own.</p>

<h2 id="bugs-found">Bugs found</h2>
<p>I won’t close this blog entry without giving a list of problems we found in the current CDK; some minor and small, some more troublesome.
Here goes: typos all over the place; the OrderQueryBond lacks a return statement in an else clause; the Mol2Reader does not mark atom and
bond aromaticity properly and reads a single bond as aromatic, and an aromatic bond as single; the Renderer2D does not always highlight
both atoms when hovering over a bond; SmilesGenerator.parseBond() should output bond orders correctly; the SSSR finder seems to have a
messed up if-else statement for the ringBondCount limit of 37; the BondCount descriptor should count all bonds by default, not just the
single bonds; <code class="language-plaintext highlighter-rouge">IDescriptor.getParameters()</code> should return null instead of <code class="language-plaintext highlighter-rouge">Object[0]</code>; several programs use the SYBYL atomtype S.o2, while
the specification and the CDK config define S.O2; the IP descriptor now returns a variable length descriptor.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="kegg" /><category term="knime" /><category term="smiles" /><category term="taverna" /><category term="justdoi:10.1186/1471-2105-7-517" /><category term="inchikey:ZKWNOTQHFKYUNU-JGCIYWTLSA-N" /><category term="nmrshiftdb" /><summary type="html"><![CDATA[Because of other obligations, I was unable to attend the first day of the CDK Workshop, though Christoph had set up Skype so that at least I could hear the talks from Prof. Berthold (Konstanz, Germany) about KNIME and Prof. Zielesny about CDK-Taverna.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/choloyl-coa.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/choloyl-coa.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Mining the KEGG pathway database with self-organizing maps</title><link href="https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html" rel="alternate" type="text/html" title="Mining the KEGG pathway database with self-organizing maps" /><published>2006-04-04T00:00:00+00:00</published><updated>2006-04-04T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html"><![CDATA[<p>The <a href="https://en.wikipedia.org/wiki/Self_organizing_map">Self-organizing map</a> (SOM) is a popular (again) and intuitive non-linear mapping
method: it transforms a multidimensional space into two dimensions (normally: they are so easy to visualize). Latino and
<a href="http://www.dq.fct.unl.pt/staff/jas/">Aires-de-Sousa</a> published a paper that uses this method to analyze the whole
<a href="http://www.genome.jp/kegg/pathway.html">KEGG pathway database</a>: <em>Genome-Scale Classification of Metabolic Reactions: A Chemoinformatics
Approach</em> (DOI: <a href="https://doi.org/10.1002/anie.200503833">10.1002/anie.200503833</a>).</p>

<p>The method is based on earlier work by Zhang and Aires-de-Sousa: <em>Structure-Based Classification of Chemical Reactions without Assignment
of Reaction Centers</em> (DOI: <a href="https://doi.org/10.1021/ci0502707">10.1021/ci0502707</a>). A non-trivial feature of the suggested method is the
use of two SOMs. The first maps the reaction onto a fixed-length vector (coined MOLMAP), which is used as input vector for the second map.
This latter map is used to cluster the KEGG reactions on a purely chemical basis. The resemblance with the
<a href="https://en.wikipedia.org/wiki/EC_number">EC numbering system</a> is striking.</p>]]></content><author><name>Egon Willighagen</name></author><category term="kegg" /><category term="chemometrics" /><category term="justdoi:10.1002/ANIE.200503833" /><category term="justdoi:10.1021/CI0502707" /><summary type="html"><![CDATA[The Self-organizing map (SOM) is a popular (again) and intuitive non-linear mapping method: it transforms a multidimensional space into two dimensions (normally: they are so easy to visualize). Latino and Aires-de-Sousa published a paper that uses this method to analyze the whole KEGG pathway database: Genome-Scale Classification of Metabolic Reactions: A Chemoinformatics Approach (DOI: 10.1002/anie.200503833).]]></summary></entry></feed>