<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/phd.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-06-15T12:00:19+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/phd.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">The CDK/Metabolomics/Chemometrics Unconference results</title><link href="https://chem-bla-ics.linkedchemistry.info/2008/04/07/cdkmetabolomicschemometrics.html" rel="alternate" type="text/html" title="The CDK/Metabolomics/Chemometrics Unconference results" /><published>2008-04-07T00:10:00+00:00</published><updated>2008-04-07T00:10:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2008/04/07/cdkmetabolomicschemometrics</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2008/04/07/cdkmetabolomicschemometrics.html"><![CDATA[<p>As <a href="https://chem-bla-ics.linkedchemistry.info/2008/04/03/t-plus-18-hours-dr-and-preparing-for.html">announced earlier <i class="fa-solid fa-recycle fa-xs"></i></a>, Miguel, Velitchka,
<a href="http://www.steinbeck-molecular.de/steinblog/">Christoph</a> and I held a small <a href="http://cdk.sf.net/">CDK</a>/Metabolomics/Chemometrics
unconference. We started late and did not have an evening program, so we did not get very many results. However, we did do
<em><a href="http://chem-bla-ics.blogspot.com/search?q=molecular+chemometrics">molecular chemometrics</a></em>. <!-- keep link --></p>

<p>We used the <a href="http://www.r-project.org/">R statistics software</a> together with Rajarshi’s <a href="http://cran.r-project.org/web/packages/rcdk/index.html">rcdk</a>
package (an R wrapper around the CDK library) and Ron’s (my PhD supervisor) <a href="http://cran.r-project.org/web/packages/pls/index.html">PLS</a>
package (see <a href="http://www.jstatsoft.org/v18/i02/">this paper</a>), to predict retention indices for a number of metabolites.</p>

<p>We ended up with this R script:</p>

<div class="language-R highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">library</span><span class="p">(</span><span class="s2">"rJava"</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="s2">"rcdk"</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="s2">"pls"</span><span class="p">)</span><span class="w">
</span><span class="n">mols</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">load.molecules</span><span class="p">(</span><span class="s2">"data_cdk.sdf"</span><span class="p">)</span><span class="w">
</span><span class="n">selection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">get.desc.names</span><span class="p">()</span><span class="w">
</span><span class="n">selection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">selection</span><span class="p">[</span><span class="o">-</span><span class="n">which</span><span class="p">(</span><span class="n">selection</span><span class="o">==</span><span class="s2">"org.openscience.cdk.qsar.descriptors.molecular.AminoAcidCountDescriptor"</span><span class="p">)]</span><span class="w">
</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eval.desc</span><span class="p">(</span><span class="n">mols</span><span class="p">,</span><span class="w"> </span><span class="n">selection</span><span class="p">,</span><span class="w"> </span><span class="n">verbose</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">x2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">x</span><span class="p">[,</span><span class="n">apply</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">a</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="nf">all</span><span class="p">(</span><span class="o">!</span><span class="nf">is.na</span><span class="p">(</span><span class="n">a</span><span class="p">))})]</span><span class="w">
</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">read.table</span><span class="p">(</span><span class="s2">"data_cdk_RI"</span><span class="p">)</span><span class="w">
</span><span class="n">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">x2</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">)</span><span class="w">
</span><span class="n">pls.model</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">plsr</span><span class="p">(</span><span class="n">V1</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="m">50</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="o">=</span><span class="n">input</span><span class="p">,</span><span class="w"> </span><span class="n">validation</span><span class="o">=</span><span class="s2">"CV"</span><span class="p">)</span><span class="w">
</span><span class="n">summary</span><span class="p">(</span><span class="n">pls.model</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">RMSEP</span><span class="p">(</span><span class="n">pls.model</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">pls.model</span><span class="p">,</span><span class="w"> </span><span class="n">ncomp</span><span class="o">=</span><span class="m">20</span><span class="p">)</span><span class="w">
</span><span class="n">abline</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="o">=</span><span class="s2">"red"</span><span class="p">)</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">pls.model</span><span class="p">,</span><span class="w"> </span><span class="s2">"loadings"</span><span class="p">,</span><span class="w"> </span><span class="n">comps</span><span class="o">=</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">savehistory</span><span class="p">(</span><span class="s2">"finalHistory.R"</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">AminoAcidCountDescriptor</code> threw a <code class="language-plaintext highlighter-rouge">NullPointerException</code>, so we excluded it, and we dropped the descriptor columns that had NAs in the resulting matrix. The CV results were
not as good as Velitchka’s best models, but still a good start:</p>

<p><img src="/assets/images/riPred.png" alt="" /></p>

<p>No variable selection; 200 objects, 190 variables.</p>

<p>Questions:</p>

<ul>
  <li>Can we do this in <a href="http://www.bioclipse.net/">Bioclipse2</a> too?</li>
  <li>Can we improve the default CDK descriptor parameters to maximize the column count?</li>
  <li>Rajarshi, what would it take to write some wrapper code for atomic descriptors in rcdk?</li>
</ul>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="defense" /><category term="phd" /><category term="metabolomics" /><category term="cheminf" /><category term="chemometrics" /><category term="justdoi:10.18637/jss.v018.i02" /><summary type="html"><![CDATA[As announced earlier , Miguel, Velitchka, Christoph and I held a small CDK/Metabolomics/Chemometrics unconference. We started late, and did not have an evening program, resulting in not overly much results. However, we did do molecular chemometrics.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/riPred.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/riPred.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">T plus 51 hours: a short photo impression</title><link href="https://chem-bla-ics.linkedchemistry.info/2008/04/04/t-plus-51-hours-short-photo-impression.html" rel="alternate" type="text/html" title="T plus 51 hours: a short photo impression" /><published>2008-04-04T00:00:00+00:00</published><updated>2008-04-04T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2008/04/04/t-plus-51-hours-short-photo-impression</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2008/04/04/t-plus-51-hours-short-photo-impression.html"><![CDATA[<p>I normally do not do these kinds of blog items, but, in reply to <a href="http://www.steinbeck-molecular.de/steinblog/index.php/2008/04/03/congratulations-egon/#comment-327">Christoph’s blog</a>,
here’s an overview of the ceremony (see also <a href="https://chem-bla-ics.linkedchemistry.info/2008/04/01/t-minus-26-hours-defending-open-source.html">T-26 <i class="fa-solid fa-recycle fa-xs"></i></a> and
<a href="https://chem-bla-ics.linkedchemistry.info/2008/04/03/t-plus-18-hours-dr-and-preparing-for.html">T+18 <i class="fa-solid fa-recycle fa-xs"></i></a>):</p>

<p><img src="/assets/images/vga_E112.JPG" alt="" /></p>

<p>This is the doctorate certificate Christoph mentioned, together with Karin and our kids:</p>

<p><img src="/assets/images/vga_E179.JPG" alt="" /></p>

<p>And, <a href="http://www.oortjeshekken.nl/">here</a> (<a href="http://maps.google.com/maps?f=q&amp;hl=en&amp;geocode=&amp;q=Erlecomsedam+4,+ooij,+netherlands&amp;sll=51.857623,5.93914&amp;sspn=0.046967,0.146942&amp;ie=UTF8&amp;ll=51.864169,5.933647&amp;spn=0.01174,0.036736&amp;t=h&amp;z=15">map</a>)
was the dinner in the evening:</p>

<p><img src="/assets/images/vga_E227.JPG" alt="" /></p>]]></content><author><name>Egon Willighagen</name></author><category term="defense" /><category term="phd" /><summary type="html"><![CDATA[I normally do not do these kinds of blog items, but, in reply to Christoph’s blog, here’s an overview of the ceremony (see also T-26 and T+18 ):]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/vga_E112.JPG" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/vga_E112.JPG" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">T plus 18 hours: dr and preparing for the afterparty, umm ^w^w^w, CDK/Metabolomics/Chemometrics unconference</title><link href="https://chem-bla-ics.linkedchemistry.info/2008/04/03/t-plus-18-hours-dr-and-preparing-for.html" rel="alternate" type="text/html" title="T plus 18 hours: dr and preparing for the afterparty, umm ^w^w^w, CDK/Metabolomics/Chemometrics unconference" /><published>2008-04-03T00:00:00+00:00</published><updated>2008-04-03T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2008/04/03/t-plus-18-hours-dr-and-preparing-for</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2008/04/03/t-plus-18-hours-dr-and-preparing-for.html"><![CDATA[<p>I am doctor now; I shall now be <a href="http://taaladvies.net/taal/advies/tekst/21#6">addressed as</a> <em>weledelzeergeleerde</em> Egon;
translating to something like <em>quite-noble-very-knowledgeable</em>, hahahaha. I’ll put up a few photos of the ceremony, which
is actually quite formal at the <a href="http://www.ru.nl/">Radboud University</a>, later.</p>

<p>With this blog item, I would like to thank everyone who left a message, sent email, etc. to wish me good luck. Very much
appreciated! I’d also like to thank my supervisors, promotores <a href="http://www.cac.science.ru.nl/people/lbuydens/index.html">Lutgarde Buydens</a> and
<a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/">Peter Murray-Rust</a> (he mentions the event <a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1019">here</a>),
and <a href="http://www.cac.science.ru.nl/people/rwehrens/index.html">Ron Wehrens</a> for their confidence in me and their guidance
on the path towards the post-doc life. I also thank all those who attended my defense; I had a brilliant day, and actually
enjoyed talking to those who sat on my promotion committee and who asked me the not-really-nasty questions about
my work.</p>

<h2 id="cdk-chemometrics-in-metabolomics-unconference">CDK-Chemometrics in Metabolomics Unconference</h2>

<p>For today, I organized a small, informal <a href="http://en.wikipedia.org/wiki/Unconference">unconference</a>, oriented around the
<a href="http://cdk.sf.net/">CDK</a>, chemometrics and metabolomics. I’m certain we will be online much of the day, as we typically
are. The meeting will start around 10:00 <a href="http://en.wikipedia.org/wiki/Central_European_Summer_Time">CEST</a>, but we’ll
attend a seminar by <a href="http://www.ki.si/index.php?id=844">Marjana Novič</a> at 11:00 CEST. If you happen to be in
<a href="http://en.wikipedia.org/wiki/Nijmegen">Nijmegen</a>, just drop in on the Analytical Chemistry department.
Otherwise, join the #cdk chat channel in the irc.freenode.net network.</p>

<p>What we’ll do?? Hey, it’s an unconference; we have no idea yet :)</p>]]></content><author><name>Egon Willighagen</name></author><category term="defense" /><category term="cheminf" /><category term="chemometrics" /><category term="phd" /><summary type="html"><![CDATA[I am doctor now; I shall now be addressed as weledelzeergeleerde Egon; translating to something like quite-noble-very-knowledgeable, hahahaha. I’ll put up a few photo’s of the ceremony, which is actually quite formal at the Radboud University, later.]]></summary></entry><entry><title type="html">T minus 26 hours: defending open source chemoinformatics (and more)</title><link href="https://chem-bla-ics.linkedchemistry.info/2008/04/01/t-minus-26-hours-defending-open-source.html" rel="alternate" type="text/html" title="T minus 26 hours: defending open source chemoinformatics (and more)" /><published>2008-04-01T00:00:00+00:00</published><updated>2008-04-01T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2008/04/01/t-minus-26-hours-defending-open-source</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2008/04/01/t-minus-26-hours-defending-open-source.html"><![CDATA[<p>In about 26 hours from now, I will be <a href="https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html">defending my PhD thesis <i class="fa-solid fa-recycle fa-xs"></i></a>.
Follow that link to read the summary; I was thinking of publishing my introduction and discussion (the rest has been published in peer-reviewed
journals) on <a href="http://precedings.nature.com/">Nature Precedings</a>; would that be a good idea? Otherwise, I’ll post it in my blog. If you just
happen to want to attend the public defense, it’s
<a href="http://maps.google.com/maps?f=q&amp;hl=en&amp;geocode=&amp;q=Comeniuslaan+2,+6525+Nijmegen,+Nijmegen+(Gelderland),+Netherlands&amp;sll=37.0625,-95.677068&amp;sspn=28.114729,75.234375&amp;ie=UTF8&amp;ll=51.820699,5.857548&amp;spn=0.002673,0.009184&amp;t=h&amp;z=17&amp;iwloc=addr">here</a>:</p>

<p><img src="/assets/images/aula.png" alt="" /></p>]]></content><author><name>Egon Willighagen</name></author><category term="cheminf" /><category term="chemometrics" /><category term="phd" /><summary type="html"><![CDATA[In about 26 hours from now, I will be defending my PhD thesis . Follow that link to read the summary; I was thinking if publishing my introduction and discussion (the rest has been published in peer-reviewed journals) on Nature Precedings; would that be a good idea? Otherwise, I’ll post it in my blog. If you just happen to want to attend the public defense, it’s here:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/aula.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/aula.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">TODO: April 2nd, defend my PhD work</title><link href="https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html" rel="alternate" type="text/html" title="TODO: April 2nd, defend my PhD work" /><published>2008-03-01T00:00:00+00:00</published><updated>2008-03-01T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html"><![CDATA[<p>In 4.5 weeks, on Wednesday April 2 (13:30 precisely, <a href="http://maps.google.com/maps?f=q&amp;hl=en&amp;geocode=&amp;q=comeniuslaan+2,+nijmegen,+nederland&amp;sll=37.0625,-95.677068&amp;sspn=25.010803,75.234375&amp;ie=UTF8&amp;ll=51.820852,5.857548&amp;spn=0.002374,0.009184&amp;t=h&amp;z=17&amp;iwloc=addr">Aula, Comeniuslaan 2, Nijmegen</a>)
I will publicly defend my PhD work performed in the <a href="http://www.cac.science.ru.nl/">Analytical Chemistry group</a> of
<a href="http://scholar.google.nl/scholar?as_q=&amp;num=10&amp;btnG=Search+Scholar&amp;as_epq=&amp;as_oq=&amp;as_eq=&amp;as_occt=any&amp;as_sauthors=LMC+Buydens&amp;as_publication=&amp;as_ylo=&amp;as_yhi=&amp;as_allsubj=all&amp;hl=en&amp;lr=">Prof. Lutgarde Buydens</a>
at the <a href="http://www.ru.nl/">Radboud University Nijmegen</a>:</p>

<p><img src="/assets/images/thesisCover.png" alt="" /></p>

<h2 id="table-of-contents">Table of Contents</h2>

<ol>
  <li>Introduction</li>
  <li>Molecular Chemometrics (doi:<a href="https://doi.org/10.1080/10408340600969601">10.1080/10408340600969601</a>)</li>
  <li>1D NMR in QSPR (doi:<a href="https://doi.org/10.1021/ci050282s">10.1021/ci050282s</a>)</li>
  <li>Comparing Crystals (doi:<a href="https://doi.org/10.1107/S0108768104028344">10.1107/S0108768104028344</a>)</li>
  <li>Supervised SOMs (doi:<a href="https://doi.org/10.1021/cg060872y">10.1021/cg060872y</a>)</li>
  <li>Chemical Metadata in RSS (doi:<a href="https://doi.org/10.1021/ci034244p">10.1021/ci034244p</a>)</li>
  <li>Interoperability (doi:<a href="https://doi.org/10.1021/ci050400b">10.1021/ci050400b</a>, the Blue Obelisk paper)</li>
  <li>Discussion and Outlook</li>
</ol>

<p>Chapters 2, 3, 4, and 5 are first-author papers, while for chapters 6 and 7 I am just a co-author.</p>

<h2 id="summary">Summary</h2>

<p>Chemometrics and chemoinformatics play important roles in the analysis and modeling of molecular data, in particular in understanding and
predicting the properties of molecules and molecular systems. Both chemometrics and chemoinformatics apply statistics, machine learning and
informatics methodologies to chemical questions, though originating from a different background. Where chemometrics had its origins in the
extraction of information from chemical experiments, chemoinformatics had roots in the representation of chemical data for storage in
databases. The technological advances in chemistry and biochemistry in the past decades have led, however, to a flood of data and new
questions, and the data analysis and modeling have become more complex. The standing challenge in data analysis and data exchange is how
to represent the molecular features relevant to the problem at hand. This representation of molecular information is the topic of this
thesis.</p>

<p>Chapter 1 introduces the field of data analysis and modeling of molecular data and describes the aforementioned importance of representation
of relevant features. It discusses different approaches to molecular representation, such as line notations, chemical graphs, and quantum
chemical models. Each of these has limitations when used in data analysis and modeling. Numerical representations are then introduced, which
allow the application of statistical and mathematical modeling approaches. These numerical representations are commonly derived from chemical
graph and quantum chemical representations. CoMFA and the classification of enzyme reactions are examples where the choice of molecular
representation as well as the analysis method are important.</p>

<p>The term <em>molecular chemometrics</em> is coined in Chapter 2 for the field that applies statistical modeling methods to molecular structure.
It reviews the advances made in this field in recent years. New numerical descriptors for molecules are discussed, as well as approaches to
represent molecules in more complex systems like crystal structures and reactions. Molecular descriptors are used in similarity and diversity
analysis. The applications of new methods for structure-activity and structure-property modeling, and dimension reduction are described. An
overview of recent approaches in model validation shows new insights and approaches to estimate the performance of classification and regression
models. The last section of this chapter lists new databases and introduces new methods that improve the extraction of chemical data from
databases and repositories. Semantic markup languages improve the exchange of data, and new methods have been introduced to extract molecular
properties from text documents.</p>

<p>Chapter 3 studies the use, proposed in the literature, of 1D <sup>13</sup>C and <sup>1</sup>H NMR spectra as molecular descriptors. These spectra
are known to describe features relevant to physical properties like solubility and boiling point. The NMR representation is studied for the
predictive powers of its PLS models for three structure-property data sets. The results indicate that proton NMR is not suitable for building
QSPR models in combination with PLS. Carbon NMR spectra, however, do give reasonable QSPR models, and the regression vectors for the
carbon NMR data correlate with spectral regions relevant to molecular fragments. Nevertheless, the predictive power of the carbon NMR-based
models is still lower than that of models based on common molecular descriptors. It is concluded that NMR spectra should not be considered first
choice when making predictive models in general, and that proton NMR should probably not be used at all.</p>

<p>A computational method to calculate similarities between crystal structures based on a new representation is introduced in Chapter 4. While
a reference method is perfectly able to identify structures with high similarity, it fails to distinguish the similarity between
two similar structures from that between two completely different structures. This makes it very difficult for a clustering algorithm to organize small
clusters of identical and highly similar structures into larger clusters. The new representation of crystal structures introduced in this
chapter shows a much smoother transition in similarity values when crystal structures go from identical, via similar, and finally to
dissimilar structures. Clustering a set of simulated polymorphic structures of estrone, and classification of a set of experimental
cephalosporin structures reproduce the expected clustering and classification.</p>

<p>Chapter 5 uses supervised self-organizing maps to cluster crystal structures represented by their powder diffraction pattern and one or
more properties. The topological structure of the resulting maps not only depends on the similarity of the diffraction data, but also on
the properties of interest, such as cell volume, space group, and lattice energy. This approach is used to analyze and visualize large
sets of crystal structures, and the results show that these supervised maps not only give a better mapping, they can also be used to predict
crystal properties based on the diffraction patterns, and for subset selection in polymorph prediction. The two applications in
crystallography show that suitable representations and similarity measures that allow data analysis and modeling of molecular crystal data
are now available. Both approaches are flexible enough to open up a new field of research; especially combinations with other classification
schemes for crystal structures, such as those based on hydrogen bonding patterns, come to mind.</p>

<p>Chapter 6 introduces and discusses a method that allows information-rich distribution of molecular data between machines, such as measuring
devices and computers. Existing approaches often rely on undocumented or poorly documented semantics, which may lead to information loss. CMLRSS is
proposed, which combines two existing web standards: Rich Site Summaries (RSS), also known as RDF Site Summaries, and the Chemical Markup
Language (CML). Here, RSS is used as transport layer, while CML is used to contain the chemical information. CML supports a wide range of
chemical data, including molecular (crystal) structures, reaction schemes, and experimental data such as NMR spectra. It is shown that
this semantic representation allows automated dissemination of chemical data, and is increasingly used to exchange data between web
resources.</p>

<p>Chapter 7 describes a communal effort to realize interoperability in chemical informatics, which is called the Blue Obelisk movement.
This movement currently consists of more than ten smaller and larger, open source and open data projects all related to chemoinformatics
and chemistry in general. To increase the reproducibility of molecular representations, this chapter introduces a collaborative dictionary
of chemoinformatics algorithms, and a public repository of chemical data of general interest, including data for chemical elements and
isotopes (boiling points, colors, electron affinities, masses, covalent radii, etc.), definitions of atom types, and more. With the
availability of a standard set of atomic properties, open source algorithms and open data (for example via CMLRSS feeds), it is much
easier to reproduce and validate published results in molecular chemometrics. Results from Chapter 3 show that such ability is no luxury.</p>

<p>The last chapter summarizes the efforts in this thesis and how they address the challenges in molecular chemometrics. This thesis shows
the strong interaction between representation and the methods used for data analysis: molecular representations need to capture relevant
information and be compatible with the statistical methods used to analyze the data. The chapters review molecular
representations and put focus on model validation using statistics, visualization methods, and standardization approaches.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cheminf" /><category term="chemometrics" /><category term="phd" /><category term="doi:10.1080/10408340600969601" /><category term="doi:10.1021/CI050282S" /><category term="doi:10.1107/S0108768104028344" /><category term="doi:10.1021/CG060872Y" /><category term="doi:10.1021/CI034244P" /><category term="doi:10.1021/CI050400B" /><summary type="html"><![CDATA[In 4.5 weeks, on Wednesday April 2 (13:30 precisely, Aula, Comeniuslaan 2, Nijmegen) I will publicly defend my PhD work performed in the Analytical Chemistry group of Prof. Lutgarde Buydens at the Radboud University Nijmegen:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/thesisCover.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/thesisCover.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">My PhD Thesis: in color and grayscale</title><link href="https://chem-bla-ics.linkedchemistry.info/2008/01/23/my-phd-thesis-in-color-and-grayscale.html" rel="alternate" type="text/html" title="My PhD Thesis: in color and grayscale" /><published>2008-01-23T00:00:00+00:00</published><updated>2008-01-23T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2008/01/23/my-phd-thesis-in-color-and-grayscale</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2008/01/23/my-phd-thesis-in-color-and-grayscale.html"><![CDATA[<p>Wednesday is my regular day off from my metabolomics work, and today I am finalizing the layout of my thesis, which I’ll
defend on April 2. The print version will feature mostly grayscale images, with a few of them in color. However, the PDF
version that will end up in our university repository should have color images. So, halfway through creating suitable
grayscale versions of the images, I realized I was not doing it properly. I was replacing the images; so, I lost the
color version. Not good.</p>

<p>But wait, LaTeX can do more; why not have a color and a grayscale option? Here comes <code class="language-plaintext highlighter-rouge">optional.sty</code>. By adding
<code class="language-plaintext highlighter-rouge">\usepackage{optional}</code> I can add to the source (from <code class="language-plaintext highlighter-rouge">book.tex</code>):</p>

<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">\begin{figure}</span>[bt]
<span class="nt">\begin{center}</span>
  <span class="k">\subfigure</span><span class="na">[]</span><span class="p">{</span>
    <span class="k">\label</span><span class="p">{</span>fig:benzene:a<span class="p">}</span>
    <span class="k">\opt</span><span class="p">{</span>color<span class="p">}{</span><span class="k">\includegraphics</span><span class="na">[width=0.4\textwidth]</span><span class="p">{</span>intro/benzoCompounds<span class="p">_</span>color<span class="p">}}</span>
    <span class="k">\opt</span><span class="p">{</span>grayscale<span class="p">}{</span><span class="k">\includegraphics</span><span class="na">[width=0.4\textwidth]</span><span class="p">{</span>intro/benzoCompounds<span class="p">}}</span>
  <span class="p">}</span>
  <span class="k">\hspace</span><span class="p">{</span>2cm<span class="p">}</span>
  <span class="k">\subfigure</span><span class="na">[]</span><span class="p">{</span>
    <span class="k">\label</span><span class="p">{</span>fig:benzene:b<span class="p">}</span>
    <span class="k">\includegraphics</span><span class="na">[width=0.18\textwidth]</span><span class="p">{</span>intro/Ferrocene-2D<span class="p">}</span>
  <span class="p">}</span>
<span class="nt">\end{center}</span>
<span class="k">\caption</span><span class="p">{</span>a) 2D diagrams of the two possible resonance structures of a compound
with a phenyl ring. Both diagrams refer to the same compounds, but the depicted
graph representations are not identical. b) 2D diagram of ferrocene, which,
like all organometallic compounds,
is difficult to represent with classical chemoinformatics approaches.<span class="p">}</span>
<span class="k">\label</span><span class="p">{</span>fig:benzene<span class="p">}</span>
<span class="nt">\end{figure}</span>
</code></pre></div></div>

<p>Ferrocene was already black-and-white, so no worries there. And the only color is the red hydroxyl group.
But it serves the point :)</p>

<p>This then allows me to run pdflatex to create a color version and a grayscale version:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pdflatex <span class="s2">"</span><span class="se">\d</span><span class="s2">ef</span><span class="se">\U</span><span class="s2">seOption{color}</span><span class="se">\i</span><span class="s2">nput{book}"</span>
pdflatex <span class="s2">"</span><span class="se">\d</span><span class="s2">ef</span><span class="se">\U</span><span class="s2">seOption{grayscale}</span><span class="se">\i</span><span class="s2">nput{book}"</span>
</code></pre></div></div>
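<p>One thing to watch, as far as I can tell: both commands above write their output to book.pdf, so the second run overwrites the first. Here is a minimal sketch of a wrapper that keeps both, assuming a POSIX shell and pdflatex’s <code>-jobname</code> option. The function name <code>build_variant</code>, the <code>TEX_ENGINE</code> override, and the <code>book-color</code>/<code>book-grayscale</code> output names are made up for illustration:</p>

```shell
# Hypothetical helper: build one PDF for a given option value.
# -jobname gives each variant its own output and auxiliary files.
# TEX_ENGINE is an assumed override and defaults to pdflatex.
build_variant() {
  variant="$1"
  # Inside double quotes the shell leaves \d, \U and \i untouched,
  # so TeX receives \def\UseOption{...}\input{book} as written.
  "${TEX_ENGINE:-pdflatex}" -jobname="book-$variant" \
    "\def\UseOption{$variant}\input{book}"
}
```

<p>Calling <code>build_variant color</code> and then <code>build_variant grayscale</code> should leave book-color.pdf and book-grayscale.pdf side by side. Each variant also gets its own .aux file, so cross-references need the usual second run per variant.</p>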

<p>/me is happy</p>]]></content><author><name>Egon Willighagen</name></author><category term="latex" /><category term="phd" /><summary type="html"><![CDATA[Wednesday is my regular day off from my metabolomics work, and today I am finalizing the layout of my thesis, which I’ll defend on April 2. The print version will feature grayscale images with some of them in color too. However, the PDF version that will end up in our university repository should have color prints. So, while halfway creating suitable grayscale versions of the image, I realized I was not doing it properly. I was replacing the images; so, I lost the color version. Not good.]]></summary></entry><entry><title type="html">Free at last!</title><link href="https://chem-bla-ics.linkedchemistry.info/2006/01/19/free-at-last.html" rel="alternate" type="text/html" title="Free at last!" /><published>2006-01-19T00:00:00+00:00</published><updated>2006-01-19T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2006/01/19/free-at-last</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2006/01/19/free-at-last.html"><![CDATA[<p>Free at last! Well, not quite yet, but close enough anyway: my PhD contract has ended; last Friday was my last working day, which my
colleagues and I celebrated with a visit to Nijmegen’s oldest bar, <a href="https://indeblaauwehand.nl/in-de-blaauwe-hand/">In de Blauwe Hand <i class="fa-solid fa-recycle fa-xs"></i></a>.
But I still have my manuscript to finish. This formally ends a period of almost 12.5 years at the <a href="http://ru.nl/">Radboud University Nijmegen</a>.</p>

<p>Since last Monday I’m at home, trying to get things finished as soon as possible. Mostly working on my laptop, remotely logged in to
our desktop machine downstairs. A good ADSL connection (170kB downstream) helps a lot too, and the proxy on my university machine allows me to
access the journals my university subscribes to.</p>

<p>I’m trying to do some open source chemoinformatics in between writing, and my current QSAR research actually allows me to do some
feature enhancement in CDK’s QSAR package too. Today, I hope to write and finish a <a href="http://sourceforge.net/mailarchive/forum.php?thread_id=9476956&amp;forum_id=2178">config file architecture <i class="fa-solid fa-link-slash fa-xs"></i></a>
that allows fine-tuning which QSAR descriptors should be calculated. I anticipate that a default config file will be distributed.</p>

<p>Additionally, I will try to finish running the CDK JUnit tests against <a href="http://gnu.wildebeest.org/diary/index.php?p=147">Classpath 0.20</a>,
which covers 98% of Java 1.4.2; the limited support for HTML rendering makes up most of the remaining 2%. The Classpath progress has
really amazed me over the last few weeks. I have not tested Jmol and JChemPaint against the latest open source Java tools, but will
try to do that before I go on holiday next week. Results with 0.19 were very promising, as I reported in earlier blog entries.</p>]]></content><author><name>Egon Willighagen</name></author><category term="phd" /><category term="cdk" /><summary type="html"><![CDATA[Free at last! Well, not quite yet, but close enough anyway: my PhD contract has ended; last Friday was my last working day, which my colleagues and I celebrated with a visit to Nijmegen’s oldest bar, In de Blauwe Hand. But I still have my manuscript to finish. This formally ends a period of almost 12.5 years at the Radboud University Nijmegen.]]></summary></entry><entry><title type="html">The annual Lunteren meeting</title><link href="https://chem-bla-ics.linkedchemistry.info/2005/11/01/annual-lunteren-meeting.html" rel="alternate" type="text/html" title="The annual Lunteren meeting" /><published>2005-11-01T00:00:00+00:00</published><updated>2005-11-01T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2005/11/01/annual-lunteren-meeting</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2005/11/01/annual-lunteren-meeting.html"><![CDATA[<p>Most Dutch chemists have their annual Lunteren meeting, and so do I. Lunteren is a small village on the Veluwe where nothing much can be done,
except for listening to the presentations. I attend the Lunteren meeting for analytical chemists, i.e. HPLC, MS, GC and all their
combinations up to and including HPLC/MS/MS, and, for a few years now, the Lab-on-a-Chip stuff. As such, the talks often go into a lot of detail on
how to use and develop these methods.</p>

<p>For a computational chemist, this is often too much practical detail and too little -ics. Fortunately, proteomics, genomics, etc. are a
strong up-and-coming funding subject, so data analysis is entering the picture too. That is good for someone with a chemometrics/chemoinformatics
background, as funding in that area is getting smaller every year.</p>

<p>My presentation went reasonably well, as far as I can tell. I was very nervous with both my professor and some 150 other people in the
audience, but managed not to wander off the main topic. I was told, however, that I was a bit too monotone, but that’s an unfortunate effect of
being so nervous.</p>]]></content><author><name>Egon Willighagen</name></author><category term="phd" /><summary type="html"><![CDATA[Most Dutch chemists have their annual Lunteren meeting, and so do I. Lunteren is a small village on the Veluwe where nothing much can be done, except for listening to the presentations. I attend the Lunteren meeting for analytical chemists, i.e. HPLC, MS, GC and all their combinations up to and including HPLC/MS/MS, and, for a few years now, the Lab-on-a-Chip stuff. As such, the talks often go into a lot of detail on how to use and develop these methods.]]></summary></entry><entry><title type="html">Wrapping up…</title><link href="https://chem-bla-ics.linkedchemistry.info/2005/10/23/wrapping-up.html" rel="alternate" type="text/html" title="Wrapping up…" /><published>2005-10-23T00:00:00+00:00</published><updated>2005-10-23T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2005/10/23/wrapping-up</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2005/10/23/wrapping-up.html"><![CDATA[<p>Less than three months before the end of my PhD contract. And not nearly done yet. Weekends are now spent on wrapping up
bits of experimental research into something like a coherent article. And there are even lots of calculations still to do to answer the open
questions. <a href="http://freemind.sourceforge.net/">FreeMind</a> is helping me organize my thoughts.</p>

<p>Open source chemoinformatics is a welcome diversion now and then. I worked on some easy-to-fix CDK bugs yesterday, like the
<a href="https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/isomorphism/matchers/QueryAtomContainer.html">QueryAtomContainer <i class="fa-solid fa-recycle fa-xs"></i></a>, which had not yet been
updated for the recent <a href="http://sourceforge.net/mailarchive/forum.php?thread_id=8016575&amp;forum_id=2178">cdk.interfaces changes <i class="fa-solid fa-link-slash fa-xs"></i></a>. Fixed now.
I also touched a lot of code when updating the FSF address in the LGPL license notice, and when I modified the construction of
<a href="https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/exception/CDKException.html">CDKException <i class="fa-solid fa-recycle fa-xs"></i></a>s to set the causing Throwable.
Also helped out <a href="http://www.livejournal.com/users/cniehaus/">Carsten</a> a bit with adding his data from
<a href="http://edu.kde.org/kalzium/">Kalzium</a> to the <a href="http://www.blueobelisk.org/">Blue Obelisk</a>
<a href="https://github.com/BlueObelisk/bodr">data repository <i class="fa-solid fa-recycle fa-xs"></i></a>.</p>

<p>Another nice diversion is <a href="http://wesnoth.org/">The Battle for Wesnoth</a>. Just got killed, though.</p>]]></content><author><name>Egon Willighagen</name></author><category term="phd" /><category term="cdk" /><category term="career" /><summary type="html"><![CDATA[Less than three months before the end of my PhD contract. And not nearly done yet. Weekends are now spent on wrapping up bits of experimental research into something like a coherent article. And there are even lots of calculations still to do to answer the open questions. FreeMind is helping me organize my thoughts.]]></summary></entry></feed>