<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/jmol.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-06-15T12:00:19+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/jmol.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">25 years of the Chemistry Development Kit</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/09/28/25-years-of-the-chemistry-development-kit.html" rel="alternate" type="text/html" title="25 years of the Chemistry Development Kit" /><published>2025-09-28T00:00:00+00:00</published><updated>2025-09-28T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/09/28/25-years-of-the-chemistry-development-kit</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/09/28/25-years-of-the-chemistry-development-kit.html"><![CDATA[<p>Twenty five years ago the <a href="https://cdk.github.io/">Chemistry Development Kit</a> (CDK) was founded. The Chemistry and Internet (<a href="https://www.google.com/search?q=ChemInt2000">ChemInt2000</a>)
had just ended (it ran from 23 to 26 September) and my friend and I had taken the Amtrak night train from Washington to South Bend. At that time there
were two leading Java applets for chemistry, <a href="https://jchempaint.github.io/">JChemPaint</a> and <a href="http://jmol.org/">Jmol</a>. I had hacked Chemical Markup
Language support into both of them, and <a href="https://chemistry.nd.edu/people/dan-gezelter/">Dan Gezelter</a> (Jmol and <a href="https://openscience.org/">openscience.org</a>),
<a href="http://www.steinbeck-molecular.de/steinblog/">Christoph Steinbeck</a> (JChemPaint), and I took the opportunity of being in North America
to discuss whether we could use a common code base. Chris’ <em>compchem</em> had done something similar. Peter Murray-Rust, who, like Chris and me, had also attended
ChemInt2000, did not join us.</p>

<p>I do not remember exactly, but I guess we must have met on the 28th and 29th? Maybe already on Wednesday. During this meeting we discussed a common
data model (yes, Jmol used the CDK data model at some point) and somewhere during the meeting we wrote down a name for the project. There was the
Java Development Kit, so this could be the Chemistry Development Kit. The name stuck.</p>

<p>A quick post like this cannot do justice to the history of the CDK, nor to everyone who was or still is involved. You can browse some of the history
of the CDK in <a href="https://chem-bla-ics.linkedchemistry.info/tag/cdk">my blog</a> and in <a href="http://www.steinbeck-molecular.de/steinblog/index.php/category/chemistry-development-kit/">Chris’ blog</a>.
It has been an amazing journey, and with a small grant just behind us (with Alyanne de Haan, René van der Ploeg, and Marc Teunis from Hogeschool Utrecht)
and all the awesome things ongoing (new JChemPaint, various extensions, upgraded downstream tools), the CDK is alive and kicking.</p>

<p>A huge congrats on this milestone, and thanks to everyone (and every company and organization) who contributed code to the CDK. There are a few people
I want to thank in particular (see the AUTHORS file for all names): Chris, who in the late nineties made a difference with open source in chemistry;
Dan, for Jmol and for hosting this memorable meeting at the University of Notre Dame; Rajarshi Guha, who operated <em>CDK Nightly</em> for many years, well before Travis
and GitHub Actions; Stefan, Miguel, Gilleain, and Christian, for many years of contributions to the CDK; and John Mayfield, the current
CDK release manager.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="jchempaint" /><category term="jmol" /><category term="openscience" /><category term="chemistry" /><summary type="html"><![CDATA[Twenty five years ago the Chemistry Development Kit (CDK) was founded. The Chemistry and Internet (ChemInt2000) had just ended (it ran from 23 to 26 September) and my friend and I had taken the Amtrak night train from Washington to South Bend. At that time there were two leading Java applets for chemistry, JChemPaint and Jmol. I had hacked Chemical Markup Language support into both of them, and Dan Gezelter (Jmol and openscience.org), Christoph Steinbeck (JChemPaint), and I took the opportunity of being in North America to discuss whether we could use a common code base. Chris’ compchem had done something similar. Peter Murray-Rust, who, like Chris and me, had also attended ChemInt2000, did not join us.]]></summary></entry><entry><title type="html">Citing the Chemistry Development Kit</title><link href="https://chem-bla-ics.linkedchemistry.info/2010/02/18/citing-chemistry-development-kit.html" rel="alternate" type="text/html" title="Citing the Chemistry Development Kit" /><published>2010-02-18T00:00:00+00:00</published><updated>2010-02-18T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2010/02/18/citing-chemistry-development-kit</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2010/02/18/citing-chemistry-development-kit.html"><![CDATA[<p>Two weeks ago, a paper by Peter Ertl was published about <a href="http://www.jcheminf.com/content/2/1/1">Molecular structure input on the web</a>
(doi:<a href="https://doi.org/10.1186/1758-2946-2-1">10.1186/1758-2946-2-1</a>). In this paper, he discusses the state of things and describes his
contribution to this field, the <a href="http://www.molinspiration.com/jme/">JME Molecule Editor</a>. The article also cites the CDK, but only
the website and not one of the two papers (doi:<a href="https://doi.org/10.1021/ci025584y">10.1021/ci025584y</a>, or
doi:<a href="https://doi.org/10.2174/138161206777585274">10.2174/138161206777585274</a>). This is not an isolated case, but a common pattern. In principle, the
proper work is cited, and nothing is wrong. In practice, however, it means that a citation to the <a href="http://cdk.sf.net/">CDK website</a>
does not show up in the citation network. This is <strong><em>not</em></strong> a problem caused by these papers, but merely by the way current citation
databases work: they only count citations between journal articles, and only sometimes extend to books or conference abstracts.</p>

<p>Now, addressing the limitations of the current citation databases is technically simple; it is blocked purely by social and commercial
factors. The Citation Typing Ontology by <a href="http://www.zoo.ox.ac.uk/staff/academics/shotton_dm.htm">David Shotton</a> provides a framework
for defining citation types, independent of any existing database. Semantic web technologies can take it from there and allow
aggregation, etc.</p>

<p>There are some things to think about regarding how to use such citation networks, though. If we calculate the impact of the CDK project,
we should combine the citation counts for the website(s), papers, etc., after removing duplicates. The
<a href="http://imageweb.zoo.ox.ac.uk/pub/2009/citobase/cito-20090311/cito-content/owldoc/objectproperties/cites.html">cito:cites</a> property
links resources, and the CDK paper resource is not the same as the CDK website resource. But we could define a Project class,
and state that both are foo:partOf it. Then, we could define that <em>the triple chain the:citingWork cito:cites the:CDKArticle foo:partOf the:CDKProject</em>
would imply <em>the triple the:citingWork cito:cites the:CDKProject</em>.</p>
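<p>The chaining rule above, plus the duplicate removal, can be tried out without any RDF tooling. Below is a minimal, hypothetical sketch in modern JavaScript (e.g. Node.js, not the Bioclipse scripting environment): triples are plain arrays, <code>the:CDKWebsite</code> and the <code>ex:</code> works are made-up names, and only a single <code>foo:partOf</code> step is followed (no transitivity). Each citing work is counted once per target.</p>

```javascript
// Minimal in-memory triple list. the:CDKArticle, the:CDKProject and foo:partOf
// follow the text above; the:CDKWebsite and ex:paperA are illustrative names.
const triples = [
  ["urn:doi:10.1186/1758-2946-2-1", "cito:cites", "the:CDKWebsite"],
  ["ex:paperA", "cito:cites", "the:CDKArticle"],
  ["ex:paperA", "cito:cites", "the:CDKWebsite"],
  ["the:CDKArticle", "foo:partOf", "the:CDKProject"],
  ["the:CDKWebsite", "foo:partOf", "the:CDKProject"],
];

// Chain rule: ?work cito:cites ?part . ?part foo:partOf ?project
//   implies   ?work cito:cites ?project
function inferProjectCitations(triples) {
  const inferred = [];
  for (const [work, predicate, part] of triples) {
    if (predicate !== "cito:cites") continue;
    for (const [s, p, project] of triples) {
      if (p === "foo:partOf" && s === part) {
        inferred.push([work, "cito:cites", project]);
      }
    }
  }
  return inferred;
}

// Count distinct citing works, so a paper citing several parts counts once.
function countCitingWorks(triples, target) {
  const citing = new Set();
  for (const [work, predicate, cited] of triples.concat(inferProjectCitations(triples))) {
    if (predicate === "cito:cites" && cited === target) citing.add(work);
  }
  return citing.size;
}

console.log(countCitingWorks(triples, "the:CDKProject")); // 2
console.log(countCitingWorks(triples, "the:CDKArticle")); // 1
```

<p>Peter Ertl’s article now counts towards the project even though it only cites the website, while <code>ex:paperA</code> is counted once despite citing both parts. In OWL 2 the same rule could be stated declaratively with a property chain axiom on <code>cito:cites</code>, leaving the inference to a reasoner.</p>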

<h2 id="typed-citations">Typed Citations</h2>

<p>Now, while writing up this blog post, I realize that my fork from this morning, <a href="http://github.com/egonw/bibo-cto">A BIBO Citation Typing Ontology</a>,
might actually be counter-productive in the long run, as I was only working out a solution to a simpler but different problem, which
CiTO also addresses: a citation is not typed. When a paper cites the CDK paper, we still do not know whether it <em>uses</em> the CDK,
merely mentions it as <em>related-but-unused</em>, or even <em>refutes</em> it.</p>
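<p>To make the difference concrete, here is a small, hypothetical JavaScript sketch. The citing works are made up, and the type labels (<code>uses</code>, <code>relatedButUnused</code>, <code>refutes</code>) are placeholders taken from the wording above, not CiTO property names. An untyped count reports four citations of the CDK paper; the typed tally shows that only two of them actually use it.</p>

```javascript
// Made-up typed citations: [citing work, citation type, cited work].
// The type labels are placeholders, not actual CiTO property names.
const cdkPaper = "urn:doi:10.1021/ci025584y";
const typedCitations = [
  ["ex:paperA", "uses", cdkPaper],
  ["ex:paperB", "relatedButUnused", cdkPaper],
  ["ex:paperC", "uses", cdkPaper],
  ["ex:paperD", "refutes", cdkPaper],
];

// Tally how often a work is cited, per citation type.
function citationTypeCounts(citations, cited) {
  const counts = {};
  for (const [, type, target] of citations) {
    if (target === cited) counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

console.log(typedCitations.length); // 4 untyped citations
console.log(citationTypeCounts(typedCitations, cdkPaper));
// { uses: 2, relatedButUnused: 1, refutes: 1 }
```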

<p>Now, as I am leaning towards the Bibliographic Ontology (BIBO) as the RDF-based system for my references, and have already been using it in the
<a href="http://rdf.farmbio.uu.se/chembl/snorql/">RDF store hosting the ChEMBL data</a>,
I forked CiTO to define <a href="http://bibotools.googlecode.com/svn/bibo-ontology/trunk/doc/classes/Document___-538479979.html">bibo:Document</a> as the rdfs:domain and rdfs:range.
CiTO 1.5 actually defines a large set of document types too, and I would rather see BIBO reused.</p>

<p>This does have the downside that bibocto:cites cannot be used for the above chaining, and this might seriously bite me later.
Well, nothing wrong with a failing experiment, right? For now, it will serve my purpose: setting up a citation database for the CDK
project papers.</p>
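<p>One way to see why the chaining bites is RDFS entailment. The sketch below (plain JavaScript, hypothetical helper names) applies only the two RDFS rules for <code>rdfs:domain</code> and <code>rdfs:range</code> to a chained citation. Because the fork gives <code>bibocto:cites</code> the range <code>bibo:Document</code>, a chained statement such as <em>citingWork bibocto:cites the:CDKProject</em> would entail that the project itself is a <code>bibo:Document</code>. This is one reading of the limitation, not something the fork documents.</p>

```javascript
// Schema from the fork: bibocto:cites has rdfs:domain and rdfs:range bibo:Document.
const schema = {
  domain: new Map([["bibocto:cites", "bibo:Document"]]),
  range: new Map([["bibocto:cites", "bibo:Document"]]),
};

// RDFS rules: (P rdfs:domain C) and (x P y) entail (x rdf:type C);
//             (P rdfs:range C)  and (x P y) entail (y rdf:type C).
function typeEntailments(triples, schema) {
  const entailed = [];
  for (const [subject, predicate, object] of triples) {
    if (schema.domain.has(predicate)) {
      entailed.push([subject, "rdf:type", schema.domain.get(predicate)]);
    }
    if (schema.range.has(predicate)) {
      entailed.push([object, "rdf:type", schema.range.get(predicate)]);
    }
  }
  return entailed;
}

// The citation the chain rule would infer if it were applied to bibocto:cites.
const chained = [["urn:doi:10.1186/1758-2946-2-1", "bibocto:cites", "the:CDKProject"]];
for (const [s, p, o] of typeEntailments(chained, schema)) {
  console.log(`${s} ${p} ${o}`);
}
```

<p>The second printed line, <code>the:CDKProject rdf:type bibo:Document</code>, is the unwanted consequence: a project is not a document.</p>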

<h2 id="the-cdk-citation-database">The CDK citation database</h2>

<p>So, here goes (it’s <a href="http://www.w3.org/TR/xhtml-rdfa-primer/">RDFa-enabled</a>; check the
<a href="http://www.w3.org/2007/08/pyRdfa/extract?uri=http://chem-bla-ics.blogspot.com/2010/02/citing-chemistry-development-kit.html">RDF extracted from it</a>): <!-- keep link --></p>

<pre xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:bibocto="http://github.com/egonw/bibo-cto/" about="urn:doi:10.1186/1758-2946-2-1" rel="bibocto:cites" typeof="bibo:Article">@prefix bibo: &lt;http://purl.org/ontology/bibo/&gt;.
@prefix bibocto: &lt;http://github.com/egonw/bibo-cto/&gt;.

&lt;urn:doi:10.1186/1758-2946-2-1&gt; a bibo:Article ;
  bibocto:cites <span about="urn:doi:10.1021/ci025584y">&lt;urn:doi:10.1021/ci025584y&gt;</span> .
</pre>

<p>I am not entirely happy about the error-prone XHTML+RDFa of the above example, and
<a href="http://www.semanticoverflow.com/questions/573/how-to-create-rdfa-powered-n3-in-the-html-output">filed a question asking for a better solution</a> on
<a href="http://www.semanticoverflow.com/">SemanticOverflow</a>.</p>

<p>While the above example merely states that Peter Ertl’s article cites the CDK (whether that is valid or not… perhaps he would
have cited the other paper?), citation typing allows me to state how the CDK paper is cited. Now, Peter writes:</p>

<blockquote>
  <p>It is also gratifying to see the advent of open source movement in cheminformatics on the Internet, as advocated for example
by the Blue Obelisk Group (<a href="http://blueobelisk.sourceforge.net/wiki/Main_Page">40</a>) and witnessed by collaborative projects
like Chemistry Development Kit CDK (<a href="http://sourceforge.net/apps/mediawiki/cdk/index.php?title=Main_Page">41</a>),
Jmol (<a href="http://jmol.sourceforge.net/">42</a>), Bioclipse (<a href="http://www.bioclipse.net/">43</a>) and several others.</p>
</blockquote>

<p>So, I think it is fair to state that:</p>

<pre xmlns:bibo="http://purl.org/ontology/bibo/" xmlns:bibocto="http://github.com/egonw/bibo-cto/" about="urn:doi:10.1186/1758-2946-2-1" rel="bibocto:credits">&lt;urn:doi:10.1186/1758-2946-2-1&gt; bibocto:credits <span about="urn:doi:10.1021/ci025584y">&lt;urn:doi:10.1021/ci025584y&gt;</span> .
</pre>

<p>which is very much appreciated!</p>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="cito" /><category term="bioclipse" /><category term="jchempaint" /><category term="jmol" /><category term="rdf" /><category term="owl" /><category term="justdoi:10.1186/1758-2946-2-1" /><category term="doi:10.1021/CI025584Y" /><summary type="html"><![CDATA[Two weeks ago, a paper by Peter Ertl was published about Molecular structure input on the web (doi:10.1186/1758-2946-2-1). In this paper, he discusses the state of things and describes his contribution to this field, the JME Molecule Editor. The article also cites the CDK, but only the website and not one of the two papers (doi:10.1021/ci025584y, or doi:10.2174/138161206777585274). This is not an isolated case, but a common pattern. In principle, the proper work is cited, and nothing is wrong. In practice, however, it means that a citation to the CDK website does not show up in the citation network. This is not a problem caused by these papers, but merely by the way current citation databases work: they only count citations between journal articles, and only sometimes extend to books or conference abstracts.]]></summary></entry><entry><title type="html">SWAT4LS: wrapping up #1</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/11/25/swat4ls-wrapping-up-1.html" rel="alternate" type="text/html" title="SWAT4LS: wrapping up #1" /><published>2009-11-25T00:00:00+00:00</published><updated>2009-11-25T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/11/25/swat4ls-wrapping-up-1</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/11/25/swat4ls-wrapping-up-1.html"><![CDATA[<p>It’s already been five days since the <a href="http://www.swat4ls.org/2009/index.php">SWAT4LS</a> meeting (<a href="http://swat4ls.blogspot.com/">matching blog</a>),
and I finally got around to writing up my personal summary. I very much enjoyed the <a href="http://blueobelisk.stackexchange.com/questions/16/who-will-go-to-swat4ls-and-wants-to-join-a-blue-obelisk-dinner">Blue Obelisk dinner</a>
on Thursday evening with <a href="http://semanticscience.wordpress.com/">Nico</a>, <a href="http://duncan.hull.name/">Duncan</a>, and
<a href="http://blog.chemdom.com/">Miguel</a> (the <a href="http://cdk.sf.net/">CDK</a> one).</p>

<p>The SWAT4LS was fun, interesting, perhaps too short, but very much appreciated! Thanx to all the organizers! During the day, various people tweeted about the
meeting, using the <a href="http://search.twitter.com/search?q=%23swat4ls2009">#swat4ls2009</a> hashtag (forwarded to <a href="http://friendfeed.com/swat4ls2009">a FriendFeed room</a>),
while Nico covered things in various blog posts which I’ll link to below where appropriate. Summaries I have seen so far are from
<a href="http://semanticscience.wordpress.com/2009/11/24/semantic-web-tools-and-applications-for-life-sciences-2009-a-personal-summary/">Nico</a>
and <a href="http://duncan.hull.name/2009/11/24/swat4ls/">Duncan</a> (again :), and <a href="http://swat4ls.blogspot.com/2009/11/swat4ls-aftermath.html">the organizers</a>.</p>

<p>The day kicked off with a presentation by Alan Ruttenberg (<a href="http://semanticscience.wordpress.com/2009/11/20/swat4ls2009-keynote-alan-ruttenberg-semantic-web-technology-to-support-studying-the-relation-of-hla-structure-variation-to-disease/">Nico’s coverage</a>).
It nicely demonstrated where the semantic web for life sciences is going. Particularly interesting was the integration of SPARQL with Jmol in
<a href="http://neurocommons.org/page/ImmPort/JmolViz">ImmPort/JmolViz</a>: it uses Jmol to visualize a PDB entry, with SPARQL retrieving atom
and residue annotations that are then displayed using Jmol script (we have to thank another Miguel (the <a href="http://www.jmol.org/">Jmol</a> one) for taking the scripting
and visualization capabilities <a href="http://sourceforge.net/mailarchive/forum.php?thread_name=64707.217.127.90.82.1035878883.squirrel@www.howards.org&amp;forum_name=jmol-developers">to the next level in 2002</a>).
It always makes me proud to see one of the projects I have worked on take a prominent place in keynote talks at conferences :)</p>

<p>Alan also clarified that <a href="http://creativecommons.org/choose/zero">CC0</a> is not a license, but a statement about the <em>public domain</em> nature
of data; there is nothing to accept, nothing to live up to. The important thing, and I am sure most of my readers are well aware of it, is
that it formalizes the public domain concept by wrapping it in a full CC0 statement. My recommendation to everyone who wants to make (chemical) data
available as <em>public domain</em>: use CC0, if only because CC0 works in any country, and it will make a lot of your users very happy.
<strong>If you cannot use CC0 because you are not really the owner (a case I have seen), then do not claim the data to be public domain
either (which is what was done)!</strong></p>

<p>There was also mention of the <a href="http://www.co-ode.org/ontologies/amino-acid/2009/02/16/">Amino Acid Ontology</a>, which comes closer to our group’s
proteochemometrics work, but I have yet to check whether it can be used for, or linked to, protein descriptors. Also interesting is the idea behind
<a href="https://github.com/alanruttenberg/rdfherd">RDFHerd <i class="fa-solid fa-recycle fa-xs"></i></a>, a project aiming to distribute RDF data sets as installable packages. If I understood
correctly, only <a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/">Virtuoso</a> is supported so far, but this thing can fly, particularly
if these packages can easily be converted into <a href="http://www.debian.org/doc/FAQ/ch-pkg_basics.en.html">Debian packages</a>.</p>

<p>More wrapping up will follow, but I have other business to attend to first.</p>]]></content><author><name>Egon Willighagen</name></author><category term="swat4ls" /><category term="blue-obelisk" /><category term="jmol" /><category term="sparql" /><summary type="html"><![CDATA[It’s already been five days since the SWAT4LS meeting (matching blog), and I finally got around to writing up my personal summary. I very much enjoyed the Blue Obelisk dinner on Thursday evening with Nico, Duncan, and Miguel (the CDK one).]]></summary></entry><entry><title type="html">New Bioclipse Features: Kabsch Alignment, RMSD Distance and Tanimoto Similarity Matrices</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/11/04/new-bioclipse-features-kabsch-alignment.html" rel="alternate" type="text/html" title="New Bioclipse Features: Kabsch Alignment, RMSD Distance and Tanimoto Similarity Matrices" /><published>2009-11-04T01:00:00+00:00</published><updated>2009-11-04T01:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/11/04/new-bioclipse-features-kabsch-alignment</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/11/04/new-bioclipse-features-kabsch-alignment.html"><![CDATA[<p>We recently submitted a second paper on <a href="http://www.bioclipse.net/">Bioclipse</a>, and have worked hard in the past two weeks on addressing the
reviewers’ questions (and we love these feature requests! See also these <a href="http://bioclipse.blogspot.com/2009/11/download-pdbs-with-bioclipse.html">two</a>
<a href="http://bioclipse.blogspot.com/2009/11/align-sequences-with-kalign-web-service.html">blogs</a>). One reviewer seemed very interested in seeing
docking available in Bioclipse. While we do not have a full docking feature set up for Bioclipse, we do have functionality to deal with 3D
structures, though so far our research has pushed us to focus on the 2D side of cheminformatics.</p>

<p>To strengthen our move towards the 3D cheminformatics world, we have implemented a few new features using <a href="http://cdk.sf.net/">CDK</a>
functionality. For example, we added Kabsch alignment and the related RMSD between molecular structures, implemented both as pop-up menus
and as manager methods. You can see the manager methods in action in <a href="http://www.myexperiment.org/workflows/937">MyExperiment workflow 937</a>,
which you can download directly into Bioclipse with one simple command (see
<a href="https://chem-bla-ics.linkedchemistry.info/2009/11/04/bioclipse-manager-for-myexperimentorg.html">Bioclipse Manager for MyExperiment.org <i class="fa-solid fa-recycle fa-xs"></i></a>):</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">var</span> <span class="nx">smileses</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Array</span><span class="p">(</span><span class="dl">"</span><span class="s2">CC(C)C</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">CCCN</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">CCC=O</span><span class="dl">"</span><span class="p">);</span>

<span class="c1">// parse each SMILES string and generate 3D coordinates for it</span>
<span class="kd">var</span> <span class="nx">unaligned</span> <span class="o">=</span> <span class="nx">cdk</span><span class="p">.</span><span class="nf">createMoleculeList</span><span class="p">();</span>
<span class="k">for </span><span class="p">(</span><span class="kd">var</span> <span class="nx">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span> <span class="nx">i</span><span class="o">&lt;</span><span class="nx">smileses</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
  <span class="kd">var</span> <span class="nx">mol</span> <span class="o">=</span> <span class="nx">cdk</span><span class="p">.</span><span class="nf">fromSMILES</span><span class="p">(</span><span class="nx">smileses</span><span class="p">[</span><span class="nx">i</span><span class="p">]);</span>
  <span class="nx">mol</span> <span class="o">=</span> <span class="nx">cdk</span><span class="p">.</span><span class="nf">generate3dCoordinates</span><span class="p">(</span><span class="nx">mol</span><span class="p">);</span>
  <span class="nx">unaligned</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="nx">mol</span><span class="p">);</span>
<span class="p">}</span>

<span class="c1">// superimpose the 3D structures with the Kabsch algorithm</span>
<span class="kd">var</span> <span class="nx">aligned</span> <span class="o">=</span> <span class="nx">cdk</span><span class="p">.</span><span class="nf">kabsch</span><span class="p">(</span><span class="nx">unaligned</span><span class="p">);</span>

<span class="c1">// load the first aligned structure into Jmol and append the others</span>
<span class="nx">jmol</span><span class="p">.</span><span class="nf">load</span><span class="p">(</span><span class="nx">aligned</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="mi">0</span><span class="p">));</span>
<span class="k">for </span><span class="p">(</span><span class="kd">var</span> <span class="nx">i</span><span class="o">=</span><span class="mi">1</span><span class="p">;</span> <span class="nx">i</span><span class="o">&lt;</span><span class="nx">aligned</span><span class="p">.</span><span class="nf">size</span><span class="p">();</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
  <span class="nx">jmol</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="nx">aligned</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="nx">i</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now, we do have to update the use of Jmol in Bioclipse, and a big overhaul is scheduled for the 2.4 release in February next year. But you
get the idea.</p>
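<p>For readers curious about what the alignment step optimizes: the Kabsch algorithm finds the rotation that minimizes the RMSD between two sets of paired atom coordinates. The sketch below is plain modern JavaScript, not the Bioclipse or CDK implementation, and only computes that minimal RMSD. Instead of the singular value decomposition used by Kabsch, it uses Horn’s equivalent quaternion formulation: after centering both structures, the smallest possible sum of squared deviations is G<sub>A</sub> + G<sub>B</sub> - 2λ<sub>max</sub>, where G<sub>A</sub> and G<sub>B</sub> are the summed squared coordinates and λ<sub>max</sub> is the largest eigenvalue of a 4×4 symmetric matrix built from the cross-covariance. It assumes the atoms are already paired one-to-one and given as <code>[x, y, z]</code> arrays; the function names are hypothetical.</p>

```javascript
// Eigenvalues of a small symmetric matrix using cyclic Jacobi rotations.
function symmetricEigenvalues(matrix, maxSweeps = 50) {
  const a = matrix.map((row) => row.slice()); // work on a copy
  const n = a.length;
  for (let sweep = 0; sweep < maxSweeps; sweep++) {
    let offDiagonal = 0;
    for (let p = 0; p < n; p++) {
      for (let q = p + 1; q < n; q++) offDiagonal += a[p][q] * a[p][q];
    }
    if (offDiagonal < 1e-24) break; // (numerically) diagonal: done
    for (let p = 0; p < n; p++) {
      for (let q = p + 1; q < n; q++) {
        if (Math.abs(a[p][q]) < 1e-15) continue;
        // rotation in the (p, q) plane chosen to zero a[p][q]
        const theta = (a[q][q] - a[p][p]) / (2 * a[p][q]);
        const t = Math.sign(theta || 1) / (Math.abs(theta) + Math.sqrt(theta * theta + 1));
        const c = 1 / Math.sqrt(t * t + 1);
        const s = t * c;
        for (let k = 0; k < n; k++) { // columns: A J
          const akp = a[k][p], akq = a[k][q];
          a[k][p] = c * akp - s * akq;
          a[k][q] = s * akp + c * akq;
        }
        for (let k = 0; k < n; k++) { // rows: J^T (A J)
          const apk = a[p][k], aqk = a[q][k];
          a[p][k] = c * apk - s * aqk;
          a[q][k] = s * apk + c * aqk;
        }
      }
    }
  }
  return a.map((row, i) => row[i]);
}

// Translate [x, y, z] coordinates so that their centroid is at the origin.
function centered(coords) {
  const centroid = [0, 1, 2].map((d) => coords.reduce((sum, atom) => sum + atom[d], 0) / coords.length);
  return coords.map((atom) => atom.map((v, d) => v - centroid[d]));
}

// Minimal RMSD between paired atoms after optimal translation and proper rotation.
function superpositionRmsd(coordsA, coordsB) {
  if (coordsA.length !== coordsB.length || coordsA.length === 0) {
    throw new Error("expected two non-empty coordinate lists of equal length");
  }
  const a = centered(coordsA);
  const b = centered(coordsB);
  const S = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]; // S[i][j] = sum over atoms of a_i * b_j
  let gA = 0;
  let gB = 0;
  a.forEach((atomA, k) => {
    const atomB = b[k];
    for (let i = 0; i < 3; i++) {
      gA += atomA[i] * atomA[i];
      gB += atomB[i] * atomB[i];
      for (let j = 0; j < 3; j++) S[i][j] += atomA[i] * atomB[j];
    }
  });
  const [[xx, xy, xz], [yx, yy, yz], [zx, zy, zz]] = S;
  // Horn's key matrix: its largest eigenvalue is the maximum over rotations R of sum(b_k · R a_k)
  const N = [
    [xx + yy + zz, yz - zy, zx - xz, xy - yx],
    [yz - zy, xx - yy - zz, xy + yx, zx + xz],
    [zx - xz, xy + yx, -xx + yy - zz, yz + zy],
    [xy - yx, zx + xz, yz + zy, -xx - yy + zz],
  ];
  const lambdaMax = Math.max(...symmetricEigenvalues(N));
  const sumSquaredDeviations = Math.max(0, gA + gB - 2 * lambdaMax); // clamp rounding noise
  return Math.sqrt(sumSquaredDeviations / a.length);
}
```

<p>A rigidly rotated copy of a structure gives an RMSD of (numerically) zero, while stretching a two-atom structure from 1 to 2 units gives 0.5. For different molecules, the atom pairing would have to come from something like the MCSS first; an RMSD distance matrix then follows by calling <code>superpositionRmsd</code> for every pair of structures.</p>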

<p>As mentioned, there are two sides to adding this new functionality. First, we want all GUI interactions the user performs to be recordable
(Scientist 1: <em>What did you do to get those nice results?</em> Scientist 2: <em>I pushed that button in that long menu</em>. Scientist 1:
<em>What button is that?</em> Scientist 2: <em>Wait, I’ll send you the BSL script with a Google Wave.</em>)</p>

<p>The managers that allow this recording are Bioclipse-specific, which is also why it would not be trivial to make a general Bioclipse
plugin for Eclipse… some Spring magic is used to inject the managers into the JavaScript environment. Anyway, the second part is adding
a GUI element, such as a pop-up menu. This is an area where Eclipse particularly excels. I did have to ask for the details, as I am
not using this daily (I’m doing science, not IT), but Ola was kind enough to give me the pointers.</p>

<p>The configuration snippet below links the pop-up action to Bioclipse Navigator content (you know, where your MDL SD, CML, script and other
files show up in Bioclipse). <strong><em>But</em></strong> only if I have selected 3 or more files (the <code>count</code> value <code>(2-</code> means “more than two”)! And only if those files actually contain molecular
content with 3D coordinates! Bioclipse inherits this functionality from the Eclipse platform.</p>

<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">&lt;menuContribution</span>
  <span class="na">locationURI=</span><span class="s">"popup:org.eclipse.ui.popup.any?after=additions"</span><span class="nt">&gt;</span>
  <span class="nt">&lt;command</span>
    <span class="na">commandId=</span><span class="s">"net.bioclipse.cdk.ui.handlers.kabschAlignment"</span>
    <span class="na">label=</span><span class="s">"Perform Kabsch Alignment"</span>
    <span class="na">icon=</span><span class="s">"icons/molecule2D.png"</span><span class="nt">&gt;</span>
    <span class="nt">&lt;visibleWhen&gt;</span>
      <span class="nt">&lt;with</span> <span class="na">variable=</span><span class="s">"selection"</span><span class="nt">&gt;</span>
        <span class="nt">&lt;count</span> <span class="na">value=</span><span class="s">"(2-"</span><span class="nt">/&gt;</span>
        <span class="nt">&lt;iterate</span> <span class="na">operator=</span><span class="s">"and"</span> <span class="na">ifEmpty=</span><span class="s">"false"</span><span class="nt">&gt;</span>
          <span class="nt">&lt;adapt</span> <span class="na">type=</span><span class="s">"org.eclipse.core.resources.IResource"</span><span class="nt">&gt;</span>
            <span class="nt">&lt;or&gt;</span>
              <span class="nt">&lt;test</span> <span class="na">property=</span><span class="s">"org.eclipse.core.resources.contentTypeId"</span>
                       <span class="na">value=</span><span class="s">"net.bioclipse.contenttypes.cml.singleMolecule3d"</span><span class="nt">/&gt;</span>
              <span class="nt">&lt;test</span> <span class="na">property=</span><span class="s">"org.eclipse.core.resources.contentTypeId"</span>
                       <span class="na">value=</span><span class="s">"net.bioclipse.contenttypes.cml.singleMolecule5d"</span><span class="nt">/&gt;</span>
              <span class="nt">&lt;test</span> <span class="na">property=</span><span class="s">"org.eclipse.core.resources.contentTypeId"</span>
                       <span class="na">value=</span><span class="s">"net.bioclipse.contenttypes.mdlMolFile3D"</span><span class="nt">/&gt;</span>
            <span class="nt">&lt;/or&gt;</span>
          <span class="nt">&lt;/adapt&gt;</span>
        <span class="nt">&lt;/iterate&gt;</span>
      <span class="nt">&lt;/with&gt;</span>
    <span class="nt">&lt;/visibleWhen&gt;</span>
  <span class="nt">&lt;/command&gt;</span>
<span class="nt">&lt;/menuContribution&gt;</span>
</code></pre></div></div>

<p>When Bioclipse is run, this looks like:</p>

<p><img src="/assets/images/kabsch.png" alt="" /></p>

<p>And the alignment results will nicely show up in a Jmol viewer (though it is not yet implemented as an Eclipse editor):</p>

<p><img src="/assets/images/bioclipseKabsch1.png" alt="" /></p>

<p>The first screenshot also shows the new pop-up menus for calculating two matrices for 3 or more molecules. One creates a distance matrix based on the
<a href="http://en.wikipedia.org/wiki/Root_mean_square_deviation">RMSD</a> of the 3D coordinates of the atoms in the
<a href="http://blog.rguha.net/?p=113">MCSS</a> (BTW, Asad’s SMSD work is making its way into the CDK library, and will be available in a
later Bioclipse version too). The second new pop-up menu uses the Tanimoto similarity
measure based on CDK fingerprints of the selected chemical graphs. If the Bioclipse Statistics feature is installed, the
created <a href="http://en.wikipedia.org/wiki/Comma-separated_values">CSV</a> files will open up in a matrix editor:</p>

<p><img src="/assets/images/rmsdMatrix.png" alt="" /></p>
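<p>The Tanimoto matrix itself is simple to compute once fingerprints are available. The following plain JavaScript sketch is not the CDK fingerprinter or the Bioclipse code: it assumes each fingerprint is given as a list of set bit positions (CDK fingerprints are bit sets), computes the Tanimoto coefficient |A ∩ B| / |A ∪ B| for every pair, and writes the matrix as CSV. The function names, molecule names and bit positions are made up for illustration.</p>

```javascript
// Tanimoto coefficient |A ∩ B| / |A ∪ B| for fingerprints given as lists of set bit positions.
function tanimoto(bitsA, bitsB) {
  const a = new Set(bitsA);
  const b = new Set(bitsB);
  let common = 0;
  for (const bit of a) {
    if (b.has(bit)) common++;
  }
  const union = a.size + b.size - common;
  // Assumption: two empty fingerprints count as identical (conventions differ).
  return union === 0 ? 1 : common / union;
}

// Pairwise similarity matrix as CSV, with a header row and a name column.
// Names are written unquoted, so they must not contain commas.
function similarityMatrixCsv(names, fingerprints) {
  const lines = [["", ...names].join(",")];
  fingerprints.forEach((fpA, i) => {
    const row = fingerprints.map((fpB) => tanimoto(fpA, fpB).toFixed(3));
    lines.push([names[i], ...row].join(","));
  });
  return lines.join("\n");
}

// Made-up bit positions, only to show the output format.
console.log(similarityMatrixCsv(["mol1", "mol2", "mol3"], [[1, 2, 3], [2, 3, 4], [5]]));
// ,mol1,mol2,mol3
// mol1,1.000,0.500,0.000
// mol2,0.500,1.000,0.000
// mol3,0.000,0.000,1.000
```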

<p>Kabsch alignment of protein backbones is planned for a later Bioclipse release; it is an important feature for
<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?term=proteochemometrics%20wikberg">our group’s proteochemometrics work</a>.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cheminf" /><category term="cdk" /><category term="bioclipse" /><category term="jmol" /><summary type="html"><![CDATA[We recently submitted a second paper on Bioclipse, and have worked hard in the past two weeks on addressing the reviewers’ questions (and we love these feature requests! See also these two blogs). One reviewer seemed very interested in seeing docking available in Bioclipse. While we do not have a full docking feature set up for Bioclipse, we do have functionality to deal with 3D structures, though so far our research has pushed us to focus on the 2D side of cheminformatics.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseKabsch1.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseKabsch1.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">The Dr Who’s of Life Sciences</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/06/21/dr-whos-of-life-sciences.html" rel="alternate" type="text/html" title="The Dr Who’s of Life Sciences" /><published>2009-06-21T00:00:00+00:00</published><updated>2009-06-21T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/06/21/dr-whos-of-life-sciences</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/06/21/dr-whos-of-life-sciences.html"><![CDATA[<p>Peter recently wrote up a model of how several <a href="http://en.wikipedia.org/wiki/Blue_Obelisk">Blue Obelisk</a> (please contribute to the page!)
projects have changed over time: <a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2059">The Doctor Who Model of Open Source</a>. This was later
picked up by <a href="http://opendotdotdot.blogspot.com/2009/06/doctor-who-model-of-open-source.html">Glyn</a> and then by
<a href="http://tech.slashdot.org/story/09/06/19/1326254/The-Doctor-Who-Model-of-Open-Source?from=rss">Slashdot</a> (the second time Peter got that kind of
fame; that’s one of the advantages of working at a well-known institute instead of somewhere like Uppsala University. Besides
<a href="http://www.bioclipse.net/">Bioclipse</a>, <a href="http://www.gromacs.org/">GROMACS</a> and the <a href="http://cdk.sf.net/">CDK</a>,
<a href="http://en.wikipedia.org/wiki/MySQL_AB">MySQL AB</a> actually has its headquarters here.) Thanx to
<a href="http://www.steinbeck-molecular.de/steinblog/index.php/2009/06/19/geek-knighthood-blue-obelisk-on-slashdot/comment-page-1/#comment-8809">Chris, who pointed me</a>
to the Slashdot coverage.</p>

<p>Now, the various blogs and the Slashdot item contain interesting discussions on whether the ‘Dr Who’ model is the best model of how
open source projects can evolve. The fact is, at least, that the model does not describe a new phenomenon; Peter merely describes how
the Blue Obelisk deals with the limited resources we have in cheminformatics, and how the succession of project leaders serves both
the scientists’ interests (they are generally not paid for development or, $Foo forbid, maintenance of scientific data analysis methods)
and the project itself. This makes life science open source different from most pure-IT projects: open source academic software
is always something on the side.</p>

<p>So, when Miguel Howard turned to <a href="http://www.jmol.org/">Jmol</a>, he had seemingly unlimited resources to work on Jmol, and he had great
ideas and made them work: Miguel is the father of the now so popular Jmol applet with scripting functionality. It did mean that the
integration with the CDK I worked on, as planned by the original Jmol author <a href="http://www.openscience.org/blog/">Dan Gezelter</a>,
<a href="http://www.steinbeck-molecular.de/steinblog/">Christoph</a> and me in 2000, was abandoned: the CDK data model was too slow (it is amazing how fast
Jmol is, without using accelerated graphics! See this Nature Precedings paper:
DOI:<a href="http://dx.doi.org/10.1038/npre.2007.50.1">10.1038/npre.2007.50.1</a>). My attention was better spent on the CDK.</p>

<p>Now, if the need arises, and the current Jmol head Bob loses interest or runs out of time, I’ll be available to take over again. That is
less likely to happen for an older Dr. Who actor. Several Slashdot commenters also pointed out that the model matches the
‘drummer-in-a-band’ model. I guess so, or the lead singer… This shifted the discussion to what exactly the model models. Peter writes:</p>

<blockquote>
  <p>“Instead the Blue Obelisk community seems to have evolved a “Doctor Who” model. You’ll recall that every few years something
fatal happens to the Doctor and you think he is going to die and there will never be another series. Then he regenerates.
The new Doctor has a different personality, a different philosophy (though always on the side of good). It is never clear
how long any Doctor will remain unregenerated or who will come after him. And this is a common theme in the Blue Obelisk.”</p>
</blockquote>

<p>This brings me back to the earlier observation I wrote down: science is different, and Peter is right when he says
<em>you think he is going to die and there will never be another series</em>. This thought is justified for many open source science
projects; Glyn’s blog remarks on the lack of data, but if someone were to count the number of dead open
source science projects, I think the outcome would show that the fear is highly justified.</p>

<p>This is likely also the power of the <a href="http://en.wikipedia.org/wiki/Blue_Obelisk">Blue Obelisk</a>: it creates a lively and
rewarding community of like-minded people, forming an ecosystem where the individual projects can flourish. Maybe
someone can come up with a good metaphor for the Blue Obelisk, matching the Dr Who model? The BBC comes to mind: is the BBC
an ecosystem where small TV series can survive?</p>

<p>Anyway, my father used to watch Dr Who, and being compared to Dr Who is much more rewarding than being compared to a
drummer in some band.</p>]]></content><author><name>Egon Willighagen</name></author><category term="openscience" /><category term="blue-obelisk" /><category term="cdk" /><category term="jmol" /><category term="doi:10.1038/NPRE.2007.50.1" /><summary type="html"><![CDATA[Peter recently wrote up a model of how several Blue Obelisk (please contribute to the page!) projects have changed over time: The Doctor Who Model of Open Source. This was later picked up by Glyn and then by Slashdot (the second time Peter got that fame; that’s one of the advantages of working at a well-known institute, instead of somewhere like Uppsala University. Besides Bioclipse, GROMACS and the CDK, MySQL AB actually has its headquarters here.) Thanks to Chris for pointing me to the Slashdot coverage.]]></summary></entry><entry><title type="html">Preferential positions of phosphate counter ions</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/03/20/preferential-positions-of-phophate.html" rel="alternate" type="text/html" title="Preferential positions of phosphate counter ions" /><published>2009-03-20T00:00:00+00:00</published><updated>2009-03-20T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/03/20/preferential-positions-of-phophate</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/03/20/preferential-positions-of-phophate.html"><![CDATA[<p>A long time ago (‘96 or so?), as a student with the no longer existing CAOS/CAMM (Google shows some traces, like
<a href="https://doi.org/10.3233/CMI-2014-000003">this chapter describing the centre</a>), I did a short internship
with Hilbert Bruijn-Slot (I hope I remember his name correctly), who asked me to look at data in the CSD, and in
particular at the preferred positions of phosphate counter ions. It was fun research, and it almost made it into a paper, had we
not been beaten by a few months by a group of Russians who had just published the same.</p>

<p>Today, <a href="http://chem-bla-ics.blogspot.com/2009/03/nature-chemistry-improves-publishing.html?showComment=1237542960000#c7670910429973706274">Neil asked me</a> <!-- keep link -->
to look at another Nature Chemistry paper (DOI:<a href="http://dx.doi.org/10.1038/nchem.100">10.1038/nchem.100</a>), and in particular
<a href="http://www.nature.com/nchem/journal/v1/n1/compound/nchem.100_ci.html">its Chemical Compounds table</a>. I could not immediately
spot the thing missing from the table that I had discussed, but I did notice the phosphate salts in it. As is not uncommon, the counter ions are not drawn near the phosphate in this diagram, and I wondered how they handled this in 3D.</p>

<p>Well, bringing back good memories of that internship, <a href="http://www.nature.com/nchem/journal/v1/n1/compound/nchem.100_comp5_3d.html">the 3D model</a>
shown by <a href="http://www.jmol.org/">Jmol</a> does show the salt, with the two sodium ions near the phosphate;
even better, they sit at very recognisable positions :)</p>

<p><img src="/assets/images/phosphateSalt.png" alt="" /></p>]]></content><author><name>Egon Willighagen</name></author><category term="chemistry" /><category term="jmol" /><category term="justdoi:10.1038/nchem.100" /><category term="justdoi:10.3233/CMI-2014-000003" /><summary type="html"><![CDATA[A long time ago (‘96 or so?), as a student with the no longer existing CAOS/CAMM (Google shows some traces, like this chapter describing the centre), I did a short internship with Hilbert Bruijn-Slot (I hope I remember his name correctly), who asked me to look at data in the CSD, and in particular at the preferred positions of phosphate counter ions. It was fun research, and it almost made it into a paper, had we not been beaten by a few months by a group of Russians who had just published the same.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/phosphateSalt.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/phosphateSalt.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Nature Chemistry improves publishing chemistry: a detailed analysis</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/03/19/nature-chemistry-improves-publishing.html" rel="alternate" type="text/html" title="Nature Chemistry improves publishing chemistry: a detailed analysis" /><published>2009-03-19T00:00:00+00:00</published><updated>2009-03-19T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/03/19/nature-chemistry-improves-publishing</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/03/19/nature-chemistry-improves-publishing.html"><![CDATA[<p><a href="http://www.nature.com/nchem/">Nature Chemistry</a> just released the first issue with a few free papers,
like <em>Asymmetric total syntheses of (+)- and (-)-versicolamide B and biosynthetic implications</em> by Miller et al.
(DOI:<a href="https://doi.org/10.1038/nchem.110">10.1038/nchem.110</a>).</p>

<p>Now, we’ve seen the Royal Society of Chemistry’s <a href="http://chem-bla-ics.blogspot.com/search?q=project+prospect">Project Prospect</a> <!-- keep link -->
(see <a href="https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html">RSC: the first publisher to go semantic! <i class="fa-solid fa-recycle fa-xs"></i></a>)
and ChemSpider’s recent <a href="http://www.chemmantis.com/">ChemMantis</a> system, both of which enrich
papers with machine-readable representations of the molecules discussed in those
papers. The new Nature journal has been in the works for a while, and its editors
<a href="http://blogs.nature.com/thescepticalchymist/2008/05/jj_day_98_service_with_a_simpl.html">asked</a>
the community earlier what a Nature Chemistry paper should look like; I replied in
<a href="https://chem-bla-ics.linkedchemistry.info/2008/05/08/re-what-should-nature-chemistry-paper.html">Re: What should a Nature Chemistry paper look like? <i class="fa-solid fa-recycle fa-xs"></i></a>.</p>

<h2 id="the-verdict">The verdict</h2>

<p>So, have they been listening? Is the HTML they produce semantic? Is it data rich? Or is it
just another hamburger? Well, I am very happy to see some of the suggestions I made picked
up (though I do not fool myself into believing I am the only one who suggested those
features). Here is a tour of the good things, and of points for improvement.</p>

<p>The first impression is not shocking; it looks like any other interface, with molecules drawn as images in the paper:</p>

<p><img src="/assets/images/nchem3.png" alt="" /></p>

<p>All structures that are numbered and linked (as in <em>C6-epi-stephacidin A (Compound <strong>13</strong>)</em>)
have a hover-over function that pops up a drawing of the structure:</p>

<p><img src="/assets/images/nchem4.png" alt="" /></p>

<p>The popup image is a nice gimmick, but not really semantically useful. The link, however,
is! It points to a separate supplementary page with further information, which includes
an image of the 2D structure and, following a link, the 3D structure in <a href="http://www.jmol.org/">Jmol</a>.
Moreover, it comes with machine-readable representations:</p>

<p><img src="/assets/images/nchem5.png" alt="" /></p>

<p>This is indeed interesting, and a big step forward, though please do note my comments below.
For convenience, all molecules with such supplementary information are available from the
special Chemical Compounds section of the paper:</p>

<p><img src="/assets/images/nchem2.png" alt="" /></p>

<p>Excellent! This really is a step forward towards a data-rich paper! Indeed, I will shortly
write up a <a href="http://www.bioclipse.net/">Bioclipse</a> plugin for Nature Chemistry, which
will download all molecular structures based on the DOI! Anyway, more on that later…
For this article, that table looks like:</p>

<p><img src="/assets/images/nchem1.png" alt="" /></p>

<p>By now, you likely also noted the links to <a href="http://pubchem.ncbi.nlm.nih.gov/">PubChem</a>, and
indeed, upon publication of a paper, all structures are deposited in the public domain:</p>

<p><img src="/assets/images/nchem6.png" alt="" /></p>

<p>Last but not least, each molecule is available in <a href="http://en.wikipedia.org/wiki/Chemical_Markup_Language">Chemical Markup Language</a>
(with 2D coordinates)! And you know I have been a very happy CML user for a long time (see e.g.
Peter’s recent blog <a href="https://blogs.ch.cam.ac.uk/pmr/2009/03/13/egon-willighagen-and-cml/">Egon Willighagen and CML <i class="fa-solid fa-recycle fa-xs"></i></a>).
BTW, one comment on the CML: the namespace used is the outdated namespace, <strong>not</strong>
the current one (see <a href="http://cmlexplained.blogspot.com/2007/06/there-can-be-only-one-namespace.html">There can be only one (namespace)</a>).
(But the <a href="http://cdk.sf.net/">CDK</a> and Bioclipse will read it anyway.)</p>
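<p>For tools that are less forgiving than the CDK and Bioclipse, the outdated namespace can be rewritten before further processing. The following is a minimal Python sketch, not taken from any of these projects, assuming that the current CML namespace is <code>http://www.xml-cml.org/schema</code>. Because the outdated namespace used on the Nature Chemistry pages is not given here, it is passed in as a parameter, and the helper name <code>upgrade_cml_namespace</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Assumption: the current CML namespace; not taken from the Nature Chemistry pages.
CML_NS = "http://www.xml-cml.org/schema"


def upgrade_cml_namespace(root, old_ns, new_ns=CML_NS):
    """Move every element in old_ns into new_ns, in place.

    Hypothetical helper; returns the number of elements renamed.
    Only element names are rewritten, attributes are left as they are.
    """
    old_prefix = "{" + old_ns + "}"
    renamed = 0
    for elem in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(elem.tag, str) and elem.tag.startswith(old_prefix):
            local_name = elem.tag[len(old_prefix):]
            elem.tag = "{" + new_ns + "}" + local_name
            renamed += 1
    return renamed
```

<p>Usage would be along the lines of <code>root = ET.parse("compound.cml").getroot()</code> followed by <code>upgrade_cml_namespace(root, old_ns)</code>, where the file name and <code>old_ns</code> are placeholders. Namespaced attributes, if any, are left untouched.</p>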

<h2 id="details-matter">Details matter</h2>

<p>So, while the first impression was not shocking, it was a bit deceptive. <em>Nature Chemistry</em>
really changes the publishing of chemistry. But I have bad news too: they need to improve the
HTML they produce.</p>

<p>But before pointing out some missed chances, let me respond, <em>inter alia</em>, to Peter’s recent
work on the Open Source plugin for including semantic chemistry in MS-Word documents
(see <a href="https://blogs.ch.cam.ac.uk/pmr/2009/03/16/how-can-we-publish-semantic-chemical-documents/">How can we publish semantic chemical documents? <i class="fa-solid fa-recycle fa-xs"></i></a>):
Nature Chemistry seems to have done a great job with existing tools. Nevertheless, I fully
back Peter’s comment that, while the plugin is useless without Word, the results produced
with the plugin are extremely Open Standard, and enormously reusable! Indeed, while the
Word file format is only formally a true Open Standard, the file format is plain XML (packaged in a ZIP archive), and
extracting content bearing the CML namespace is trivial.</p>
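<p>To illustrate how little is needed, here is a minimal Python sketch that lists the CML molecules stored in a Word document. It assumes the CML lives in one of the XML parts of the .docx ZIP archive and uses the namespace <code>http://www.xml-cml.org/schema</code>; both are assumptions, the helper name is hypothetical, and this is not the code of Peter’s plugin.</p>

```python
import xml.etree.ElementTree as ET
import zipfile

# Assumption: the namespace used by the embedded CML.
CML_NS = "http://www.xml-cml.org/schema"


def extract_cml_molecules(docx_path, ns=CML_NS):
    """Return every CML molecule element found in the XML parts of a .docx file.

    Hypothetical helper: it simply scans all *.xml parts in the ZIP archive,
    whichever part (document body, custom XML, ...) they happen to be in.
    """
    molecules = []
    with zipfile.ZipFile(docx_path) as archive:
        for part in archive.namelist():
            if not part.endswith(".xml"):
                continue  # skip images and other binary parts
            root = ET.fromstring(archive.read(part))
            # iter() also checks the root element itself.
            molecules.extend(root.iter("{" + ns + "}molecule"))
    return molecules
```

<p>Calling <code>extract_cml_molecules("paper.docx")</code> (a placeholder file name) returns ElementTree elements, which can be serialized again with <code>ET.tostring</code>.</p>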

<p>Which reminds me: if someone from the Nature Chemistry team is reading this, please point
me to a blog post describing which tools actually <em>are</em> involved in publishing a Nature Chemistry paper!
I think we would all like to know.</p>

<p>Now, the <a href="http://en.wikipedia.org/wiki/HTML">HTML</a> has room for improvement. First of all,
a look at the metadata defined for the web page of the article shows a <em>description</em>
and <em>keywords</em> about the journal, not the article, and the same goes for the web pages for
the molecules:</p>

<p><img src="/assets/images/nchem7.png" alt="" /></p>
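<p>Checking this needs nothing special. As a minimal sketch, assuming a saved copy of the article or compound page, the standard library’s <code>html.parser</code> can collect the <em>description</em> and <em>keywords</em> meta elements so they can be compared with the article itself; the class and function names are hypothetical, and fetching the page is left out.</p>

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the content of the description and keywords meta elements.

    Hypothetical helper for inspecting a saved article or compound page.
    """

    WANTED = ("description", "keywords")

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.WANTED:
            self.meta[name] = attributes.get("content") or ""


def page_metadata(html_text):
    """Return a dict with the description/keywords found in html_text."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta
```

<p>If the returned description reads like the journal blurb rather than the article abstract, the page metadata is not article-specific.</p>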

<p>Additionally, the compound details web page has no special markup for the machine-readable
information:</p>

<p><img src="/assets/images/nchem8.png" alt="" /></p>

<p>Or, if it does, it’s still mixed with markup for visually pleasing output:</p>

<p><img src="/assets/images/nchem9.png" alt="" /></p>

<p>Still, the HTML is clean enough to have some regular expressions extract a good deal of
information, and there is also still the PubChem deposition.</p>
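<p>As an example of the regular expression approach, here is a minimal Python sketch that pulls standard InChI strings out of a compound page. It assumes the InChIs appear as plain text or attribute values in the HTML (the actual markup is only shown in the screenshots above), and the pattern is a rough one, not a full InChI validator; the function name is hypothetical.</p>

```python
import re

# Rough pattern: the "InChI=1S/" (or older "InChI=1/") prefix followed by the
# characters InChI layers are built from. It stops at quotes, spaces and tags.
INCHI_PATTERN = re.compile(r"InChI=1S?/[A-Za-z0-9/,;.()+*?\-]+")


def extract_inchis(html_text):
    """Return the InChI strings found in the page, in order, without duplicates."""
    seen = []
    for match in INCHI_PATTERN.findall(html_text):
        if match not in seen:
            seen.append(match)
    return seen
```

<p>A real page may also show the same InChI in several places (text, tooltip, link), which is why duplicates are dropped while keeping the first-seen order.</p>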

<h2 id="beyond-connection-tables">Beyond connection tables</h2>

<p>Like many other chemistry journals, Nature Chemistry does not consider properties of
the molecule interesting, and NMR spectra are hidden in the Supplementary Information.
This paper in particular disregards a lot of machine-readable facts by putting the entire
experimental section in a PDF document. So, the next challenge for Nature Chemistry
will be to get authors to contribute the original spectra (JCAMP-DX, CMLSpect,
etc.) to the supplementary information section. Better still, have the raw data, or even the NMR
peak-atom annotations, deposited in public repositories (see
<a href="https://chem-bla-ics.linkedchemistry.info/2009/03/04/open-nmr-data-raw-curves-and-annotated.html">Open NMR data: raw curves and annotated peak lists <i class="fa-solid fa-recycle fa-xs"></i></a>).</p>

<p>All in all, I am rather positive about the first Nature Chemistry issue, and would like to
thank the editors and paper authors for their efforts in improving the publishing of chemistry!</p>]]></content><author><name>Egon Willighagen</name></author><category term="inchi" /><category term="justdoi:10.1038/nchem.110" /><category term="chemistry" /><category term="jmol" /><summary type="html"><![CDATA[Nature Chemistry just released the first issue with a few free papers, like Asymmetric total syntheses of (+)- and (-)-versicolamide B and biosynthetic implications by Miller et al. (DOI:10.1038/nchem.110).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/nchem4.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/nchem4.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Bioclipse: a powerful Jmol application</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/03/12/bioclipse-powerful-jmol-application.html" rel="alternate" type="text/html" title="Bioclipse: a powerful Jmol application" /><published>2009-03-12T00:00:00+00:00</published><updated>2009-03-12T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/03/12/bioclipse-powerful-jmol-application</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/03/12/bioclipse-powerful-jmol-application.html"><![CDATA[<p>While <a href="http://www.bioclipse.net/">Bioclipse</a> is much more, it could be an interesting alternative to the
<a href="http://www.jmol.org/">Jmol</a> application. It offers:</p>

<ul>
  <li>a scripting console</li>
  <li>a file browser (the <a href="http://www.eclipse.org/">Eclipse</a> way)</li>
  <li>an outline of the file content which allows selections</li>
  <li>a script editor</li>
</ul>

<p>The underlying RCP toolkit has many other interesting features for a Jmol application, but the above is up and running:</p>

<p><img src="/assets/images/jmolBioclipse.png" alt="" /></p>]]></content><author><name>Egon Willighagen</name></author><category term="bioclipse" /><category term="jmol" /><summary type="html"><![CDATA[While Bioclipse is much more, it could be an interesting alternative to the Jmol application. It offers:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/jmolBioclipse.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/jmolBioclipse.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">RSC now allows Jmol in main text of publication… well, almost</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/01/19/rsc-now-allows-jmol-in-main-text-of.html" rel="alternate" type="text/html" title="RSC now allows Jmol in main text of publication… well, almost" /><published>2009-01-19T00:00:00+00:00</published><updated>2009-01-19T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/01/19/rsc-now-allows-jmol-in-main-text-of</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/01/19/rsc-now-allows-jmol-in-main-text-of.html"><![CDATA[<p>Richard Kidd wrote in the <a href="http://prospect.rsc.org/blogs/cw/?p=1315">ChemistryWorldBlog</a> about Henry Rzepa having published two papers in
<a href="http://www.rsc.org/">RSC</a> journals where Jmol is part of the main paper, after having previously used Jmol in supplementary material in ACS journals.
The key here is that <a href="http://www.jmol.org/">Jmol</a> is part of the official text: when you open the paper in a browser, you immediately
get to see live Jmol 3D graphics! Well, so the blog says.</p>

<p>However, when I checked the HTML of the first of the two papers (<em>A computational investigation of the structure of polythiocyanogen</em>,
doi:<a href="http://dx.doi.org/10.1039/b810147g">10.1039/b810147g</a>), I found that the main HTML <strong>still</strong> links to a supplementary page. Progress, but not
perfect either:</p>

<p><img src="/assets/images/henryJmolOnline.png" alt="" /></p>]]></content><author><name>Egon Willighagen</name></author><category term="jmol" /><category term="publishing" /><category term="justdoi:10.1039/b810147g" /><summary type="html"><![CDATA[Richard Kidd wrote in the ChemistryWorldBlog about Henry Rzepa having published two papers in RSC journals where Jmol is part of the main paper, after having previously used Jmol in supplementary material in ACS journals. The key here is that Jmol is part of the official text: when you open the paper in a browser, you immediately get to see live Jmol 3D graphics! Well, so the blog says.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/henryJmolOnline.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/henryJmolOnline.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Open{Data|Source|Standards} is not enough: we need Open Projects</title><link href="https://chem-bla-ics.linkedchemistry.info/2008/11/07/opendatasourcestandards-is-not-enough.html" rel="alternate" type="text/html" title="Open{Data|Source|Standards} is not enough: we need Open Projects" /><published>2008-11-07T00:00:00+00:00</published><updated>2008-11-07T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2008/11/07/opendatasourcestandards-is-not-enough</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2008/11/07/opendatasourcestandards-is-not-enough.html"><![CDATA[<p>The <a href="http://blueobelisk.sourceforge.net/wiki/Main_Page">Blue Obelisk</a> mantra <a href="http://blueobelisk.sourceforge.net/wiki/ODOSOS">ODOSOS</a>,
Open Data, Open Source, Open Standards, is well known, and much cited too. <a href="http://usefulchem.blogspot.com/">Jean-Claude Bradley</a>
popularized <a href="http://en.wikipedia.org/wiki/Open_Notebook_Science">Open Notebook Science</a> (ONS). This has always been nagging me a bit,
because the <a href="http://cdk.sf.net/">CDK</a>, <a href="http://www.jmol.org/">Jmol</a>, JChemPaint and other chemistry projects have done that for much
longer, though we did not use notebooks as much, and so just called it an open source project. It really is no different, IMO, though
surely there are differences.</p>

<p>Anyway, the key thing that ONS, the CDK and Jmol share is that they use an Open Notebook. Not every Open Source or Open Data project does.
Actually, many scientific Open Source projects are not Open Projects! They are more like the Cathedral than the wished-for Bazaar (see
<a href="http://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar">The Cathedral and the Bazaar</a>). So, Open Source (science) projects are certainly not ONS projects by default!</p>

<p>Now, the CDK actually is ONS: it is a Bazaar. The notebooks we use include:</p>

<ul>
  <li>open project via <a href="https://sourceforge.net/mail/?group_id=20024">mailing lists</a></li>
  <li>open methods/results via <a href="https://sourceforge.net/svn/?group_id=20024">subversion</a></li>
  <li>informal reporting via blogs (e.g. <a href="http://rguha.wordpress.com/">Rajarshi</a>, <a href="http://www.steinbeck-molecular.de/steinblog/">Christoph</a>, <a href="http://cdktaverna.wordpress.com/">Thomas</a>, mine)</li>
  <li>informal reporting via <a href="http://www.cdknews.org/">CDK News</a></li>
</ul>

<p>What more would you wish for? That’s not a rhetorical question. Remember that every reader of this blog is on
<a href="https://chem-bla-ics.linkedchemistry.info/2007/11/27/be-in-my-advisory-board-1-being-good.html">my advisory board <i class="fa-solid fa-recycle fa-xs"></i></a>!</p>

<p>Unfortunately, I do not work at a lab bench myself, so I do not produce new knowledge, other than what I extract from existing
data. That’s really a shame, and I really do hope that Jean-Claude or <a href="http://blog.openwetware.org/scienceintheopen">Cameron</a> will send
me a box to measure solubilities (see <a href="http://usefulchem.blogspot.com/2008/10/rdf-triples-for-open-notebook-science.html">here</a>,
<a href="http://usefulchem.blogspot.com/2008/11/ons-solubility-web-query.html">here</a>,
<a href="http://anybody.cephb.fr/perso/lindenb/tmp/jcbradley.rdf">here</a>, and
<a href="http://rguha.wordpress.com/2008/11/06/solubility-queries-and-the-google-visualization-api/">here</a> for first data exploration),
even though I cannot participate in the <a href="http://usefulchem.blogspot.com/2008/11/submeta-open-notebook-science-awards.html">challenge</a>.
(hint, hint :)</p>

<h2 id="from-cathedral-to-bazaar-in-life-sciences">From Cathedral to Bazaar in Life Sciences</h2>

<p>One Cathedral we ran into with <a href="http://www.bioclipse.net/">Bioclipse</a> was <a href="http://www.biocatalogue.org/">BioCatalogue</a>,
which will serve as a website where people can annotate and categorize (web) services. While the project has been around for a while, the
website was rather uninformative. Fortunately, the project is going to open up and become more Bazaar-like. For example, they
have now started a <a href="http://www.biocatalogue.org/wiki">wiki</a> and a
<a href="http://listserv.manchester.ac.uk/cgi-bin/wa?SUBED1=biocatalogue-friends&amp;A=1">mailing list</a>. I hope these efforts will continue,
so that I can contribute from my point of view!</p>

<p>The <a href="http://embraceregistry.net/">EMBRACE Registry</a> is a project with similar goals and a rather nice outcome (which I learned about on
<a href="https://chem-bla-ics.linkedchemistry.info/2008/11/03/embrace-workshop-in-uppsala.html">Monday <i class="fa-solid fa-recycle fa-xs"></i></a>). It is actually anticipated to be replaced by, or merged
with, BioCatalogue. So, all the data I entered, <a href="http://prints.cs.man.ac.uk:8081/category/tags/cheminformatics">cheminformatics workflows</a>
(look, <a href="https://chem-bla-ics.linkedchemistry.info/2008/10/18/chemoinformatics-p0wned-by.html">no ‘o’ <i class="fa-solid fa-recycle fa-xs"></i></a>), will later be available from BioCatalogue too.
That is already my first contribution to BioCatalogue. One enormously interesting feature of the Registry is that it allows uploading
code to test the service. This means the Registry will not only poll whether the service is still online (by checking the WSDL file), but
will also test whether the service behaves properly. My immediate thought is a mashup with <a href="http://www.myexperiment.org/">MyExperiment</a>:
each WSDL entry in the Registry would point to the MyExperiment workflows that use it, and the workflow page would indicate the status of all
the WSDL services it uses. This integration was already anticipated long before I thought about it, as the involved Cathedrals were nicely
located on the same floor in Manchester.</p>

<p>Below is a screenshot from the EMBRACE Registry for the <a href="http://www.chemspider.com/">ChemSpider</a>
<a href="http://prints.cs.man.ac.uk:8081/service/massspecapi">WSDL entry</a> for <a href="http://www.myexperiment.org/workflows/97">a workflow</a>
I <a href="https://chem-bla-ics.linkedchemistry.info/2007/11/26/metabolomics-workflows-in-taverna.html">uploaded <i class="fa-solid fa-recycle fa-xs"></i></a> about a year ago to MyExperiment:</p>

<p><img src="/assets/images/registry.png" alt="" /></p>

<p>BTW, ChemSpider has an Advisory Board of which I am a member, but it is also a classical (and intentional) Cathedral project. We do share common interests though, which makes us collaborate.</p>

<h2 id="why-important">Why Important?</h2>

<p>One recurrent theme in Open Source is <a href="http://en.wikipedia.org/wiki/Given_enough_eyeballs">given enough eyeballs, all bugs are shallow</a>.
This surely applies to science as well. The difference between the two is that in current science the eyes only get to inspect with a delay of at
least 6 months. Current practice is that research is finished (delay), then, once deemed publishable, written up as a paper (delay, and losing
valuable information in the process, as you can read in my blog all the time), and finally published (even more delay). ONS changes that, and so do
Bazaar-like open source projects, such as the CDK, Jmol and Bioclipse. The bugs are present, whether we like it or not, not just in source
code, but in science too. Theories get overthrown, but why should we like the long delays of current scientific good practice? Hate them! Work
around them. Use the Bazaar. Use ONS!</p>

<p>Now, ONS actually needs Open Source, which allows it to deal effectively with the data it produces and to extract new scientific
knowledge from the measurements. Had Rajarshi and Pierre not made their efforts, others could not easily have joined in, leading to
those much hated delays. Bugs should be shallow, and openness allows us to make those bugs visible. We can prove that there is a bug
without having to reproduce the data ourselves, which would lead to those nasty delays again. Just copy the data, compare it to your own, and do your
analysis.</p>

<p>One recent project in open source chemistry that deals with making bugs visible is the web page set up by Andreas Tille for the
<a href="http://alioth.debian.org/projects/debichem">DebiChem project</a>. His page <a href="http://cdd.alioth.debian.org/debichem/bugs/">summarizes the bugs</a>
listed for the chemistry packages in Debian (which includes the Blue Obelisk projects <a href="http://packages.debian.org/lenny/avogadro">Avogadro</a>,
<a href="http://packages.debian.org/lenny/bodr">BODR</a>, <a href="http://packages.debian.org/lenny/libcdk-java">CDK</a>,
<a href="http://packages.debian.org/lenny/chemical-mime-data">Chemical MIME Data</a>,
<a href="http://packages.debian.org/lenny/kalzium">Kalzium</a> and <a href="http://packages.debian.org/lenny/openbabel">OpenBabel</a>):</p>

<p><img src="/assets/images/debichem.png" alt="" /></p>

<p>This data analysis helps the projects being analyzed.</p>

<h2 id="packaging">Packaging</h2>

<p>This brings me to a final topic for this post: packaging using Open Standards. In order to allow those eyeballs to spot bugs, it is of the
utmost importance to package your results in Open Standards, and not just one, but likely many. For Open Source projects this ultimately
means Distribution Packages (deb or rpm). If that goal has been achieved, you know your results can be read by anyone. Software should be
installable (make, ant, cmake, etc), and Data should be readable (no PDF, but RDF, XML, JSON, or whatever standard). Preferably not Excel,
as this is too free format (as Rajarshi also <a href="http://rguha.wordpress.com/2008/11/06/solubility-queries-and-the-google-visualization-api/">indicated</a>),
but with some added conventions it may do well. Blue Obelisk projects are generally doing well in terms of packaging.</p>

<p>For the CDK, which already is reasonably well packaged, I am currently working on <a href="http://cdk.svn.sourceforge.net/viewvc/cdk/cdk-eclipse/trunk/">Eclipse</a>
and <a href="http://cdk.svn.sourceforge.net/viewvc/cdk/cdk-pom/trunk/">Maven2</a> packages. The former is already being used by Bioclipse, while the
latter is aimed at <a href="https://sourceforge.net/projects/cml">Jumbo</a> (which has just seen a
<a href="https://sourceforge.net/project/showfiles.php?group_id=51361">new release</a>. <a href="http://wwmm.ch.cam.ac.uk/blogs/downing/">Jim</a>,
I’m happy to see the CMLDOM/Jumbo split!), <a href="http://www.cdk-taverna.de/">CDK-Taverna</a>, and possibly a third (Paula, what do you plan
to use it for?). The POM export is not fully working yet, but with four research sites involved in this Open Project, I’m sure we’ll work
it out.</p>

<p>The bottom line is, scientific progress would benefit so much from a Bazaar approach. And the key thing is not collaboration; that’s
something you can do in a Cathedral-like fashion too. No, the key thing is to be Open and allow anyone, even your worst nightmare, to
comment on what you do. Let him prove you wrong, openly, that is.</p>

<p>OK, there it is. My open notebook entry for this week. Now you know what I have been up to this week.</p>]]></content><author><name>Egon Willighagen</name></author><category term="odosos" /><category term="chemspider" /><category term="workflow" /><category term="cdk" /><category term="bioclipse" /><category term="cml" /><category term="debian" /><category term="eclipse" /><category term="rdf" /><category term="jmol" /><category term="blue-obelisk" /><summary type="html"><![CDATA[The Blue Obelisk mantra ODOSOS, Open Data, Open Source, Open Standards, is well known, and much cited too. Jean-Claude Bradley popularized Open Notebook Science (ONS). This has always been nagging me a bit, because the CDK, Jmol, JChemPaint and other chemistry projects have done that for much longer, though we did not use notebooks as much, and so just called it an open source project. It really is no different, IMO, though surely there are differences.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/registry.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/registry.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>