<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/workflow.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-02-28T20:19:43+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/workflow.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">Open{Data|Source|Standards} is not enough: we need Open Projects</title><link href="https://chem-bla-ics.linkedchemistry.info/2008/11/07/opendatasourcestandards-is-not-enough.html" rel="alternate" type="text/html" title="Open{Data|Source|Standards} is not enough: we need Open Projects" /><published>2008-11-07T00:00:00+00:00</published><updated>2008-11-07T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2008/11/07/opendatasourcestandards-is-not-enough</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2008/11/07/opendatasourcestandards-is-not-enough.html"><![CDATA[<p>The <a href="http://blueobelisk.sourceforge.net/wiki/Main_Page">Blue Obelisk</a> mantra <a href="http://blueobelisk.sourceforge.net/wiki/ODOSOS">ODOSOS</a>,
Open Data, Open Source, Open Standards, is well known, and much cited too. <a href="http://usefulchem.blogspot.com/">Jean-Claude Bradley</a>
popularized <a href="http://en.wikipedia.org/wiki/Open_Notebook_Science">Open Notebook Science</a> (ONS). This has always been nagging me a bit,
because the <a href="http://cdk.sf.net/">CDK</a>, <a href="http://www.jmol.org/">Jmol</a>, JChemPaint and other chemistry projects have done that for much
longer, though we did not use notebooks as much, so we just called it an open source project. It really is no different, IMO, though
surely, there are differences.</p>

<p>Anyway, the key thing that ONS, the CDK and Jmol share is that they use an Open Notebook. Not every Open Source or Open Data project does.
Actually, many scientific Open Source projects are not Open Projects! They are more like the Cathedral than the wished-for Bazaar (see
<a href="http://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar">The Cathedral and the Bazaar</a>). So, Open Source (science) projects are certainly not ONS projects by default!</p>

<p>Now, the CDK actually is ONS, it is a Bazaar. The notebooks we use include:</p>

<ul>
  <li>open project via <a href="https://sourceforge.net/mail/?group_id=20024">mailing lists</a></li>
  <li>open methods/results via <a href="https://sourceforge.net/svn/?group_id=20024">subversion</a></li>
  <li>informal reporting via blogs (e.g. <a href="http://rguha.wordpress.com/">Rajarshi</a>, <a href="http://www.steinbeck-molecular.de/steinblog/">Christoph</a>, <a href="http://cdktaverna.wordpress.com/">Thomas</a>, mine)</li>
  <li>informal reporting via <a href="http://www.cdknews.org/">CDK News</a></li>
</ul>

<p>What more would you wish for? That’s not a rhetorical question. Remember that every reader of this blog is in
<a href="https://chem-bla-ics.linkedchemistry.info/2007/11/27/be-in-my-advisory-board-1-being-good.html">my advisory board <i class="fa-solid fa-recycle fa-xs"></i></a>!</p>

<p>Unfortunately, I do not work at a workbench myself, so I do not produce new knowledge myself, other than what I extract from existing
data. That’s really a shame, and I really do hope that Jean-Claude or <a href="http://blog.openwetware.org/scienceintheopen">Cameron</a> will send
me a box to measure solubilities (see <a href="http://usefulchem.blogspot.com/2008/10/rdf-triples-for-open-notebook-science.html">here</a>,
<a href="http://usefulchem.blogspot.com/2008/11/ons-solubility-web-query.html">here</a>, and
<a href="http://anybody.cephb.fr/perso/lindenb/tmp/jcbradley.rdf">here</a>,
<a href="http://rguha.wordpress.com/2008/11/06/solubility-queries-and-the-google-visualization-api/">here</a> for first data exploration),
even though I cannot participate in the <a href="http://usefulchem.blogspot.com/2008/11/submeta-open-notebook-science-awards.html">challenge</a>.
(hint, hint :)</p>

<h2 id="from-cathedral-to-bazaar-in-life-sciences">From Cathedral to Bazaar in Life Sciences</h2>

<p>One Cathedral we ran into with <a href="http://www.bioclipse.net/">Bioclipse</a> was <a href="http://www.biocatalogue.org/">BioCatalogue</a>,
which will serve as a website where people can annotate and categorize (web) services. While the project has been around for a while, the
website was rather uninformative. Fortunately, the project is going to open up, and be more Bazaar-like. For example, they
now started a <a href="http://www.biocatalogue.org/wiki">wiki</a> and a
<a href="http://listserv.manchester.ac.uk/cgi-bin/wa?SUBED1=biocatalogue-friends&amp;A=1">mailing list</a>. I hope these efforts will continue,
so that I can contribute from my point of view!</p>

<p>The <a href="http://embraceregistry.net/">EMBRACE Registry</a> is a project with similar goals and a rather nice outcome (which I learned about on
<a href="https://chem-bla-ics.linkedchemistry.info/2008/11/03/embrace-workshop-in-uppsala.html">Monday <i class="fa-solid fa-recycle fa-xs"></i></a>). It is actually anticipated to be replaced by or merged
with BioCatalogue. So, all data I entered, <a href="http://prints.cs.man.ac.uk:8081/category/tags/cheminformatics">cheminformatics workflows</a>
(look, <a href="https://chem-bla-ics.linkedchemistry.info/2008/10/18/chemoinformatics-p0wned-by.html">no ‘o’ <i class="fa-solid fa-recycle fa-xs"></i></a>), will later be available from BioCatalogue too.
That is already my first contribution to BioCatalogue. One enormously interesting feature of the Registry is that it allows uploading of
code to test the service. This means the Registry will not only poll whether the service is still online (by checking the WSDL file), it
will also test if the service behaves properly. Now, immediate thoughts are mashups with <a href="http://www.myexperiment.org/">MyExperiment</a>.
Each WSDL entry in the Registry points to MyExperiment workflows that use them, and the workflow page would indicate the status of all
used WSDL services. This integration was already anticipated long before I thought about it, as the involved Cathedrals were nicely
located on the same floor in Manchester.</p>
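
<p>To make the "is the service still online" poll concrete, here is a minimal sketch in Python. It is not the Registry's actual code: the function names are made up for illustration, and it assumes the service publishes a WSDL 1.1 description. It only covers the availability check; testing whether a service behaves properly would need the uploaded, service-specific test code.</p>

```python
import urllib.request
import xml.etree.ElementTree as ET

# Root element of a WSDL 1.1 description, in ElementTree's {namespace}tag form.
WSDL_ROOT = "{http://schemas.xmlsoap.org/wsdl/}definitions"


def is_wsdl(document: bytes) -> bool:
    """Return True if the bytes parse as XML with a WSDL 1.1 root element."""
    try:
        root = ET.fromstring(document)
    except ET.ParseError:
        return False
    return root.tag == WSDL_ROOT


def service_is_online(wsdl_url: str, timeout: float = 10.0) -> bool:
    """Poll one service (hypothetical helper): fetch its WSDL file and check it."""
    try:
        with urllib.request.urlopen(wsdl_url, timeout=timeout) as response:
            return is_wsdl(response.read())
    except (OSError, ValueError):
        # Network failures and timeouts are OSError subclasses;
        # a malformed URL raises ValueError.
        return False
```

<p>A registry could run <code class="language-plaintext highlighter-rouge">service_is_online</code> on a schedule for every entry and show the result next to each workflow that uses the service.</p>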

<p>Below is a screenshot from the EMBRACE Registry for the <a href="http://www.chemspider.com/">ChemSpider</a>
<a href="http://prints.cs.man.ac.uk:8081/service/massspecapi">WSDL entry</a> for <a href="http://www.myexperiment.org/workflows/97">a workflow</a>
I <a href="https://chem-bla-ics.linkedchemistry.info/2007/11/26/metabolomics-workflows-in-taverna.html">uploaded <i class="fa-solid fa-recycle fa-xs"></i></a> about a year ago to MyExperiment:</p>

<p><img src="/assets/images/registry.png" alt="" /></p>

<p>BTW, ChemSpider has an Advisory Board of which I am a member, but it is also a classical (and intentional) Cathedral project. We do share common interests though, which makes us collaborate.</p>

<h2 id="why-important">Why Important?</h2>

<p>One recurrent theme in Open Source is <a href="http://en.wikipedia.org/wiki/Given_enough_eyeballs">given enough eyeballs, all bugs are shallow</a>.
This surely applies to science as well. The difference between the two is that in current science the eyes only inspect with a delay of at
least 6 months. Current practice is that research is finished (delay), then, when deemed publishable, written up as a paper (delay, and losing
valuable information in the process, as you can read in my blog all the time), and published (even more delay). ONS changes that, and so do
Bazaar-like open source projects, such as the CDK, Jmol and Bioclipse. The bugs are present, whether we like it or not, not just in source
code, but in science too. Theories get overthrown, but why should we like the long delays of current scientific good practice? Hate it! Work
around it. Use the Bazaar. Use ONS!</p>

<p>Now, ONS actually needs Open Source, allowing them to deal effectively with the data they produce; to allow extraction of new scientific
knowledge from the measurements. If Rajarshi and Pierre had not made their efforts, others could not easily join in, leading to
those much hated delays. Bugs should be shallow, and openness allows us to make those bugs visible. We can prove that there is a bug,
without having to reproduce data ourselves, leading to those nasty delays again. Just copy the data, compare it to your own, do your
analysis.</p>

<p>One recent project in open source chemistry dealing with making bugs visible, is the web page set up by Andreas Tille for the
<a href="http://alioth.debian.org/projects/debichem">DebiChem project</a>. His page <a href="http://cdd.alioth.debian.org/debichem/bugs/">summarizes the bugs</a>
listed for the chemistry packages in Debian (which include the Blue Obelisk projects <a href="http://packages.debian.org/lenny/avogadro">Avogadro</a>,
<a href="http://packages.debian.org/lenny/bodr">BODR</a>, <a href="http://packages.debian.org/lenny/libcdk-java">CDK</a>,
<a href="http://packages.debian.org/lenny/chemical-mime-data">Chemical MIME Data</a>,
<a href="http://packages.debian.org/lenny/kalzium">Kalzium</a> and <a href="http://packages.debian.org/lenny/openbabel">OpenBabel</a>):</p>

<p><img src="/assets/images/debichem.png" alt="" /></p>

<p>This data analysis helps the projects being analyzed.</p>

<h2 id="packaging">Packaging</h2>

<p>This brings me to a last topic for this blog post: packaging using Open Standards. In order to allow those eyeballs to spot bugs, it is of the
utmost importance to package your results in Open Standards, and not just one, but likely many. For Open Source projects this ultimately
means Distribution Packages (deb or rpm). If that goal has been achieved, you know your results can be read by anyone. Software should be
installable (make, ant, cmake, etc), and Data should be readable (no PDF, but RDF, XML, JSON, or whatever standard). Preferably not Excel,
as this is too free format (as Rajarshi also <a href="http://rguha.wordpress.com/2008/11/06/solubility-queries-and-the-google-visualization-api/">indicated</a>),
but with some added conventions it may do well. Blue Obelisk projects are generally doing well in terms of packaging.</p>

<p>For the CDK, which already is reasonably well packaged, I am currently working on <a href="http://cdk.svn.sourceforge.net/viewvc/cdk/cdk-eclipse/trunk/">Eclipse</a>
and <a href="http://cdk.svn.sourceforge.net/viewvc/cdk/cdk-pom/trunk/">Maven2</a> packages. The former is already being used by Bioclipse, while the
second aims at <a href="https://sourceforge.net/projects/cml">Jumbo</a> (which has just seen a
<a href="https://sourceforge.net/project/showfiles.php?group_id=51361">new release</a>. <a href="http://wwmm.ch.cam.ac.uk/blogs/downing/">Jim</a>,
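I’m happy to see the CMLDOM/Jumbo split!), <a href="http://www.cdk-taverna.de/">CDK-Taverna</a>, and possibly a third (Paula, what do you plan
to use it for?). The POM export is not fully working yet, but with four research sites involved in this Open Project, I’m sure we’ll work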
to use it?). The POM export is not fully working yet, but with four research sites involved in this Open Project, I’m sure we’ll work
it out.</p>

<p>The bottom line is, scientific progress would benefit so much from a Bazaar approach. And the key thing is not collaboration; that’s
something you can do in a Cathedral-like fashion too. No, the key thing is to be Open and allow anyone, even your worst nightmare, to
comment on what you do. Let him prove you wrong, openly, that is.</p>

<p>OK, there it is. My open notebook entry for this week. Now you know what I have been up to this week.</p>]]></content><author><name>Egon Willighagen</name></author><category term="odosos" /><category term="chemspider" /><category term="workflow" /><category term="cdk" /><category term="bioclipse" /><category term="cml" /><category term="debian" /><category term="eclipse" /><category term="rdf" /><category term="jmol" /><category term="blue-obelisk" /><summary type="html"><![CDATA[The Blue Obelisk mantra ODOSOS, Open Data, Open Source, Open Standards, is well known, and much cited too. Jean-Claude Bradley popularized Open Notebook Science (ONS). This has always been nagging me a bit, because the CDK, Jmol, JChemPaint and other chemistry projects have done that for much longer, though we did not use notebooks as much, so we just called it an open source project. It really is no different, IMO, though surely, there are differences.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/registry.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/registry.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Taverna runs with Classpath 0.91</title><link href="https://chem-bla-ics.linkedchemistry.info/2006/05/18/taverna-runs-with-classpath-091.html" rel="alternate" type="text/html" title="Taverna runs with Classpath 0.91" /><published>2006-05-18T00:00:00+00:00</published><updated>2006-05-18T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2006/05/18/taverna-runs-with-classpath-091</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2006/05/18/taverna-runs-with-classpath-091.html"><![CDATA[<p>Classpath 0.91 <a href="http://www.gnu.org/software/classpath/announce/20060515.html">is released</a> with
<a href="http://jroller.com/page/dgilbert?entry=1_45_million_lines_of">1.45 million</a> lines of code and with
<a href="http://www.kaffe.org/~stuart/japi/htmlout/h-jdk14-classpath.html">98.96%</a> coverage of Java 1.4.2,
and 99.82% of javax.swing. Or, as <a href="http://jroller.com/page/dgilbert?entry=gnu_classpath_0_91">Dave calls it</a>:
0.91 rocks! <a href="https://chem-bla-ics.linkedchemistry.info/2005/11/20/open-source-swing-jchempaint-runs.html">JChemPaint runs again <i class="fa-solid fa-recycle fa-xs"></i></a>
(they fixed the XML parsing problem), and <a href="https://chem-bla-ics.linkedchemistry.info/2005/11/27/open-source-swing-jmol-renderer-runs.html">Jmol still runs <i class="fa-solid fa-recycle fa-xs"></i></a>,
<a href="http://developer.classpath.org/mediation/FreeSwingTestApps">but slow</a>. I also tested
<a href="http://taverna.sourceforge.net/">Taverna</a> which now also starts up, but has an XML parsing error too:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Exception occured whilst loading RDFS! Error on line 2: required string: "?&gt;"
org.jdom.input.JDOMParseException: Error on line 2: required string: "?&gt;"
   at org.jdom.input.SAXBuilder.build(SAXBuilder.java:468)
   at org.jdom.input.SAXBuilder.build(SAXBuilder.java:851)
   at org.embl.ebi.escience.scufl.semantics.RDFSParser.loadRDFSDocument(RDFSParser.java:70)
   at org.embl.ebi.escience.scuflui.workbench.Workbench.main(Workbench.java:128)
   at java.lang.reflect.Method.invokeNative(Native Method)
   at java.lang.reflect.Method.invoke(Method.java:355)
   at org.embl.ebi.escience.scuflui.workbench.WorkbenchLauncher.main(WorkbenchLauncher.java:40)
</code></pre></div></div>

<p>Oh, and rumour has it that <a href="http://www.nongnu.org/gcjwebplugin/">gcjwebplugin</a> can run the Jmol applet now,
except for the JavaScript interaction, that is.</p>]]></content><author><name>Egon Willighagen</name></author><category term="java" /><category term="workflow" /><category term="jchempaint" /><category term="taverna" /><summary type="html"><![CDATA[Classpath 0.91 is released with 1.45 million lines of code and with 98.96% coverage of Java 1.4.2, and 99.82% of javax.swing. Or, as Dave calls it: 0.91 rocks! JChemPaint runs again (they fixed the XML parsing problem), and Jmol still runs, but slow. I also tested Taverna which now also starts up, but has an XML parsing error too:]]></summary></entry><entry><title type="html">The goal: a live chemblaics CD</title><link href="https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd.html" rel="alternate" type="text/html" title="The goal: a live chemblaics CD" /><published>2005-11-18T00:00:00+00:00</published><updated>2005-11-18T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd.html"><![CDATA[<p>This evening I have been looking at the <a href="http://www.knoppix.net/">KNOPPIX</a> customization howto, and ran many of the interesting commands.
I’ve set up an environment with Kalzium, OpenBabel, CDK, jython, <a href="http://pymol.sourceforge.net/">PyMOL</a>, and for development I included gcj and
Eclipse. At some later point I will include kfile_chemical too, but I want to make a deb package first.</p>

<p>Moreover, I also wanted it to include JChemPaint, Jmol and <a href="http://taverna.sourceforge.net/">Taverna</a> (with the CDK extension). However, these
depend on Swing, which is not sufficiently provided by open source Java virtual machines. I attempted gij 4.0, <a href="http://www.kaffe.org/">kaffe</a>
and <a href="http://sablevm.org/">sablevm</a>, all without success.</p>

<p>A live CD with all the open source chemo- and bioinformatics tools would be a real killer. We could take a burned live CD with us to conferences
and have others run our software on their laptop! But we need to stop using Swing. Fortunately, there seems to be a serious project going on to
port JChemPaint and Jmol to a free Java GUI environment, so maybe we can have the live CD up and going before the 2006 conferences start.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cheminf" /><category term="linux" /><category term="java" /><category term="workflow" /><summary type="html"><![CDATA[This evening I have been looking at the KNOPPIX customization howto, and ran many of the interesting commands. I’ve set up an environment with Kalzium, OpenBabel, CDK, jython, PyMOL, and for development I included gcj and Eclipse. At some later point I will include kfile_chemical too, but I want to make a deb package first.]]></summary></entry><entry><title type="html">CDK-Taverna fully recognized</title><link href="https://chem-bla-ics.linkedchemistry.info/2005/10/18/cdk-taverna-fully-recognized.html" rel="alternate" type="text/html" title="CDK-Taverna fully recognized" /><published>2005-10-18T00:00:00+00:00</published><updated>2005-10-18T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2005/10/18/cdk-taverna-fully-recognized</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2005/10/18/cdk-taverna-fully-recognized.html"><![CDATA[<p>After asking about it, Tom explained to me how <a href="http://taverna.sf.net/">Taverna</a> can pick
up the <code class="language-plaintext highlighter-rouge">apiconsumer.xml</code> file from jars: just copy it into the root directory of the jar package. Easy as that.</p>
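
<p>Since a jar is just a zip archive, the "copy it into the root directory" step can also be scripted. The Python sketch below is only an illustration: the helper names are hypothetical, and it assumes you already have an <code class="language-plaintext highlighter-rouge">apiconsumer.xml</code> file on disk and an existing jar to add it to.</p>

```python
import zipfile


def add_to_jar_root(jar_path: str, descriptor_path: str) -> None:
    """Append a file to the top level of an existing jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path, "a") as jar:
        # An arcname without any directory part puts the entry in the jar root.
        jar.write(descriptor_path, arcname="apiconsumer.xml")


def has_root_descriptor(jar_path: str) -> bool:
    """Check that apiconsumer.xml sits in the jar root, not in a subdirectory."""
    with zipfile.ZipFile(jar_path) as jar:
        return "apiconsumer.xml" in jar.namelist()
```

<p>Something like <code class="language-plaintext highlighter-rouge">add_to_jar_root("cdk-taverna.jar", "apiconsumer.xml")</code> should give the same result as running <code class="language-plaintext highlighter-rouge">jar uf cdk-taverna.jar apiconsumer.xml</code> from the directory that holds the file.</p>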

<p>So, users now only need to copy the <code class="language-plaintext highlighter-rouge">cdk-taverna.jar</code> into the <code class="language-plaintext highlighter-rouge">taverna-workbench-1.3/lib/</code> directory and have a nice chemoinformatics
workbench environment. I’ll upload the jar to <a href="http://sourceforge.net/projects/cdk">CDK’s project page</a> right now.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="workflow" /><summary type="html"><![CDATA[After asking about it, Tom explained to me how Taverna can pick up the apiconsumer.xml file from jars: just copy it into the root directory of the jar package. Easy as that.]]></summary></entry></feed>