{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "chem-bla-ics",
  "description": "Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.",
  "home_page_url": "https://chem-bla-ics.linkedchemistry.info/",
  "feed_url": "https://chem-bla-ics.linkedchemistry.info/feed.json",
  "icon": "https://chem-bla-ics.linkedchemistry.info/assets/images/chem-bla-ics_logo.png",
  "language": "en",
  "authors": [
    {
      "name": "Egon Willighagen",
      "url": "https://orcid.org/0000-0001-7542-0286",
      "_orcid": "0000-0001-7542-0286"
    }
  ],
  "items": [
    {
      "id": "https://doi.org/10.59350/bmxve-vry14",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/04/04/swat4hcls-2026.html",
      "title": "SWAT4HCLS 2026",
      "content_html": "<p>A bit over a week ago, <a href=\"https://www.swat4ls.org/workshops/amsterdam2026/\">SWAT4HCLS 2026</a> took place, with the matching\n<a href=\"https://www.swat4ls.org/workshops/amsterdam2026/swat4hcls-biohackathon-2026/\">biohackathon</a> on Thursday (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week.html\">this post</a>.\nI attempted a bit of live coverage on mastodon: <a href=\"https://social.edu.nl/@egonw/116285060969709401\">day 1</a> and\n<a href=\"https://social.edu.nl/@egonw/116289579219485790\">day 2</a>. But it seems the semantic web community interested\nin SWAT4HCLS has not found the fediverse yet. So, make sure to check\n<a href=\"https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/\">this full list of abstracts</a>.</p>\n\n<p>The meeting consisted of <a href=\"https://www.swat4ls.org/workshops/amsterdam2026/programme/keynotes/\">four keynotes</a>, each\none was quite interesting. Cornet gave a nice historic perspective of the venue and of the semantic web field,\nwhich is a great way to welcome the participants to your institute. The talk also touches on the main theme\nof the meeting: clinical data. It is a long standing (and important) research field, but progress is slow.\nCornet <a href=\"https://social.edu.nl/@egonw/116283216644714695\">comments</a> along the lines that <em>we have been talking\nabout reasoning over patient data for more than twenty years, but we still have not solve it</em>.</p>\n\n<p>The problem is really not only privacy, but simple also lack of a common language. 
<a href=\"https://qlever.scholia.wiki/orcid/0000-0003-3248-7899\">Sabine Österle</a> explained this\nfor sharing health/patient data in Switzerland, across 26 cantons and legislations and 4 national languages.\nAnother issue is more technical: running SPARQL across hospitals involves more than just aligning ontologies;\nit also requires (too much) fiddling with SPARQL queries.</p>\n\n<p>There was plenty of other content too, however. For example, I was pleasantly\n<a href=\"https://social.edu.nl/@egonw/116284409447761902\">surprised</a> by the\n<a href=\"https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/#RDF4RiskAssessment_Toolkit_A_Toolkit_for_Converting_Tabular_Research_Data_to_FAIR_RDF_for_Risk_Assessment_and_Life_Sciences\">RDF4RiskAssessment</a>\nwork, the <a href=\"https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/#RO-Crates_for_BioImaging\">RO-Crates for BioImaging</a>,\nand <a href=\"https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/#FDPcrawleR_A_Lightweight_R_Framework_for_Auditing_FAIR_Data_Points_and_FAIR_Virtual_Platforms\">FDPcrawleR</a>.\nAll these projects have direct links to research ongoing in <a href=\"https://www.maastrichtuniversity.nl/research/translational-genomics\">our TGX team</a>.</p>\n\n<p><a href=\"https://qlever.scholia.wiki/orcid/0000-0003-1213-6776\">Hannah Bast</a> gave the second keynote of the first day, about <a href=\"https://qlever.dev/\">QLever</a>\n(doi:<a href=\"https://doi.org/10.1145/3132847.3132921\">10.1145/3132847.3132921</a>). She talked about some of the recent improvements,\nsomething we really <a href=\"https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html\">needed for Scholia</a>.\nShe showed a technical approach to make federated queries faster, though it currently only works between endpoints\nthat both run QLever. 
One thing I am looking forward to is playing with the notion of\n<a href=\"https://docs.qlever.dev/materialized-views/?h=materialize\">materialized views</a>, but the biohackathon\nwas too short to get around to that on Thursday.</p>\n\n<p>The second day kicked off with a keynote by <a href=\"https://qlever.scholia.wiki/orcid/0000-0002-3469-4923\">Janna Hastings</a>,\nwhose work I greatly admire. I was not disappointed: she showed the\n<a href=\"https://www.bciontology.org/\">Behaviour Change Intervention Ontology</a> and <a href=\"https://chebifier.hastingslab.org/\">Chebifier</a>\n(doi:<a href=\"https://doi.org/10.1039/D3DD00238A\">10.1039/D3DD00238A</a>).</p>\n\n<p>The last talk I want to mention in this post is by two researchers working with Michel Dumontier. They\n<a href=\"https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/#Embedding-based_Deduplication_of_Knowledge_Graphs_using_Graph_Neural_Networks\">presented</a>\na study about deduplication in/of knowledge graphs. This is something I want to read in more detail.</p>",
      "summary": "A bit over a week ago, SWAT4HCLS 2026 took place, with the matching biohackathon on Thursday (see this post. I attempted a bit of live coverage on mastodon: day 1 and day 2. But it seems the semantic web community interested in SWAT4HCLS has not found the fediverse yet. So, make sure to check this full list of abstracts.",
      
      "date_published": "2026-04-04T16:54:00+00:00",
      "date_modified": "2026-04-04T16:54:00+00:00",
      "tags": ["swat4ls","mastodon"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1039/D3DD00238A", "doi": "10.1039/D3DD00238A"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1145/3132847.3132921", "doi": "10.1145/3132847.3132921"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/re9j2-hk972",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/03/29/using-compact-identifiers-in-project-reports.html",
      "title": "Using compact identifiers in project reports",
      "content_html": "<p>This document describes how you can improve the FAIR-ness of your project report by using\ncompact identifiers. Of course, it can be applied to any other document too, and has been used\nin, for example, journal articles and online documentation already.</p>\n\n<p>Compact identifiers find a balance between compactness in writing and being a persistent, unique,\nand global identifier. It “is a string constructed by concatenating a namespace prefix, a separating colon,\nand a locally unique identifier (LUI)” (doi:<a href=\"https://doi.org/10.1038/sdata.2018.29\">10.1038/sdata.2018.29</a>).\nFor example, for proteins it can represent the PDB structure <a href=\"https://bioregistry.io/pdb:2gc4\">2gc4</a> as\n<em>pdb:2gc4</em>. There is a clear similarity with the SciCrunch <a href=\"https://rrid.site/\">Research Resource Identifiers</a>\n(RRIDs) as used by several journals, like\n<a href=\"https://elifesciences.org/inside-elife/ff683ecc/rrids-how-did-we-get-here-and-where-are-we-going\">eLife</a>\n(doi:<a href=\"https://doi.org/10.1007/s12021-015-9284-3\">10.1007/s12021-015-9284-3</a>).</p>\n\n<p>When the prefixes are defined by community standards, then a compact identifier can be resolved.\nThere currently are multiple providers of prefix files (doi:<a href=\"https://doi.org/10.1038/sdata.2018.29\">10.1038/sdata.2018.29</a>),\nincluding Identifiers.org (doi:<a href=\"https://doi.org/10.1093/bioinformatics/btaa864\">10.1093/bioinformatics/btaa864</a>)\nand Bioregistry (doi:<a href=\"https://doi.org/10.1038/s41597-022-01807-3\">10.1038/s41597-022-01807-3</a>).\nThe Bioregistry has an overview of more than twenty registries of prefixes and their metadata\n(doi:<a href=\"https://doi.org/10.1038/s41597-022-01807-3\">10.1038/s41597-022-01807-3</a>). The metadata commonly\nincludes information on the URL pattern for each identifier. 
Often this is more than one pattern, as\nthere may be several databases with information for the same identifier.</p>\n\n<p>It is the URL pattern in the database that allows services to <em>resolve</em> the compact identifier\ninto a link to a database. The above registries correspond to three existing <em>resolvers</em> that will take a compact\nidentifier as part of a resolver URL and redirect to the database with the record matching\nthat identifier:</p>\n\n<ul>\n  <li>Name-to-Thing (N2T): <a href=\"https://n2t.net/\">https://n2t.net/</a></li>\n  <li>Identifiers.org: <a href=\"https://identifiers.org/\">https://identifiers.org/</a></li>\n  <li>The Bioregistry: <a href=\"https://bioregistry.io/\">https://bioregistry.io/</a></li>\n</ul>\n\n<p>Each of these URLs can be extended with a compact identifier. For example, the PDB entry mentioned earlier\nor a record from the Catalogue of Life:</p>\n\n<ul>\n  <li><a href=\"https://bioregistry.io/pdb:2gc4\">https://bioregistry.io/pdb:2gc4</a></li>\n  <li><a href=\"https://identifiers.org/col:6MB3T\">https://identifiers.org/col:6MB3T</a> (<code class=\"language-plaintext highlighter-rouge\">col</code> is the prefix for the Catalogue of Life)</li>\n</ul>\n\n<h2 id=\"why-use-in-reports\">Why use them in reports?</h2>\n\n<p>Using persistent identifiers is generally accepted as a good practice that benefits science\nand has been part of the ideas of FAIR data (doi:<a href=\"https://doi.org/10.1038/sdata.2016.18\">10.1038/sdata.2016.18</a>)\nand of Open Science. Compact\nidentifiers make it easy to be precise in reports about what things the reports talk about: they\nare relatively short but very precise at the same time. 
That also has the benefit that they\nare much easier to reuse than labels of things and concepts, which intrinsically have a certain\nlevel of uncertainty; a database entry commonly has a very specific meaning.</p>\n\n<h2 id=\"examples-uses\">Example uses</h2>\n\n<p>Compact identifiers can be used in two ways. The simplest is to just put the\ncompact identifier as plain text in the document, possibly in parentheses\n(with the compact identifier highlighted here in bold):</p>\n\n<ul>\n  <i>This report is only about the experimental data of the human (<b>NCBITaxon:9606</b>) cell lines.</i>\n</ul>\n\n<p>Or:</p>\n\n<ul>\n  <i>We found that BRCA1 (<b>ensembl:ENSG00000012048</b>) played an important role.</i>\n</ul>\n\n<p>Alternatively, you can add a hyperlink with one of the resolvers, for example, Identifiers.org:</p>\n\n<ul>\n  <i>We found that BRCA1 (<b><a href=\"https://identifiers.org/ensembl:ENSG00000012048\">ensembl:ENSG00000012048</a></b>) played an important role.</i>\n</ul>\n\n<h3 id=\"compact-identifiers-for-material-identifiers\">Compact identifiers for material identifiers</h3>\n\n<p>The European Registry of Materials proposes to use the compact identifier for their\nERM identifiers (doi:<a href=\"https://doi.org/10.1186/s13321-022-00614-7\">10.1186/s13321-022-00614-7</a>):</p>\n\n<ul>\n  <i>\n    For example, the NanoSolveIT project registered a material with the ERM00000001 identifier.\n    The full Uniform Resource Identifier (URI) for this compound is\n    https://nanocommons.github.io/identifiers/registry#ERM00000001 which is too long to be used\n    in documentation. 
The corresponding compact identifier <b>erm:ERM00000001</b> is easy to use in written\n    material, analogous to the use of Protein Data Bank (PDB) identifiers for proteins in journals.\n  </i>\n</ul>\n\n<h3 id=\"compact-identifiers-for-citation-intent-annotations\">Compact identifiers for citation intent annotations</h3>\n\n<p>The compact identifier has also been used as the method to include citation intentions in journal\narticles (doi:<a href=\"https://doi.org/10.1186/s13321-020-00448-1\">10.1186/s13321-020-00448-1</a>,\ncompact identifier here highlighted in bold):</p>\n\n<ul>\n  <i>\n    We take advantage here of the ability to add notes to full form [..] references in bibliographies.\n    These are referred to as bibnotes. The content of the note will be strictly formatted: it will use\n    the syntax [<b>cito:usesMethodIn</b>] and formatted in bold. That is, the bibnote starts with the\n    [ character, followed by one of the CiTO types, and ends with the ] character. If you wish to\n    provide more than one annotation, you can repeat this syntax, separated by one or more spaces,\n    for example: [<b>cito:usesMethodIn</b>] [<b>cito:citeAsAuthority</b>].\n  </i>\n</ul>\n\n<p>Note that in this use, the square brackets and bold typeface are used to make them easier to\nrecognize. 
Also, note that this document uses this approach to indicate\nwhy the cited articles are cited.</p>\n\n<h2 id=\"conclusion\">Conclusion</h2>\n\n<p>This document described what a compact identifier is, how it helps linking to online\ndatabases, and how it can be used in written reports as plain text, optionally\nhyperlinked with one of the compact identifier resolvers.</p>\n\n<h3 id=\"acknowledgments\">Acknowledgments</h3>\n\n<p>I thank <a href=\"https://n2t.net/github:tabbassidaloii\">github:tabbassidaloii</a>,\n<a href=\"https://n2t.net/github:cthoyt\">github:cthoyt</a>, and\n<a href=\"https://n2t.net/github:larsgw\">github:larsgw</a> for their comments on\n<a href=\"https://github.com/egonw/compact-ids-in-reports\">this GitHub repo</a>.</p>",
      "summary": "This document describes how you can improve the FAIR-ness of your project report by using compact identifiers. Of course, it can be applied to any other document too, and has been used in, for example, journal articles and online documentation already.",
      
      "date_published": "2026-03-29T00:00:00+00:00",
      "date_modified": "2026-03-29T00:00:00+00:00",
      "tags": ["identifier","semweb","cito"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/sdata.2018.29", "doi": "10.1038/sdata.2018.29"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  ,
                
                  "includesQuotationFrom"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/s12021-015-9284-3", "doi": "10.1007/s12021-015-9284-3"
            , "cito":
              
              
                [ 
                  "obtainsBackgroundFrom"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btaa864", "doi": "10.1093/bioinformatics/btaa864"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/S41597-022-01807-3", "doi": "10.1038/S41597-022-01807-3"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/sdata.2016.18", "doi": "10.1038/sdata.2016.18"
            , "cito":
              
              
                [ 
                  "obtainsBackgroundFrom"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-022-00614-7", "doi": "10.1186/S13321-022-00614-7"
            , "cito":
              
              
                [ 
                  "includesQuotationFrom"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-020-00448-1", "doi": "10.1186/S13321-020-00448-1"
            , "cito":
              
              
                [ 
                  "includesQuotationFrom"
                  
                 ]
              
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "FAIR4ChemNL: Accelerating the adoption of universal data standards in chemistry", "acronym" : "FAIR4ChemNL", "uri" : "doi:10.61686/XVYQV45374" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mztnx-y1770",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week.html",
      "title": "SWAT4HCLS 2026 Amsterdam this week",
      "content_html": "<p>Tomorrow, <a href=\"https://www.swat4ls.org/workshops/amsterdam2026/\">SWAT4HCLS 2026</a> will start, again in Amsterdam.\nThe first SWAT4LS I attended <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/11/21/swat4ls-linking-open-drug-data-to.html\">was also in Amsterdam</a>, and the second meeting in Amsterdam I was <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/12/18/my-swat4ls-poster-about-enanomapper.html\">also there</a>. And I was in <a href=\"https://www.swat4ls.org/workshops/cambridge2015/index.php\">Cambridge</a> (see\n<a href=\"https://chem-bla-ics.blogspot.com/2015/12/swat4ls-in-cambridge.html\">this post</a>),\n<a href=\"https://www.swat4ls.org/workshops/antwerp2018/\">Antwerp</a>  (no post), and at least to one of the two\n<a href=\"https://www.swat4ls.org/workshops/leiden2024/\">Leiden</a> meetings (also no posts, it seems).</p>\n\n<p>I am looking forward to meet old friends, new friends (some whom I never met in person), and\nrecent collaborators (that I never met in person).\nFor those who will not be in Amsterdam, you can follow the meeting on social media with\nthe <a href=\"https://hashtags-hub.toolforge.org/swat4hcls\">hashtag #swat4hcls</a>. 
And there is also\n<a href=\"https://fediwall.biohackrxiv.org/\">this BioHackrXiv Fediwall</a>, for those in the\n<a href=\"https://en.wikipedia.org/wiki/Fediverse\">fediverse</a>.</p>\n\n<h3 id=\"scholia-demo\">Scholia demo</h3>\n\n<p>I will give a demo to update people on the work in the <a href=\"https://github.com/wdscholia/scholia\">Scholia</a> project with\nDaniel Mietchen, Peter Patel-Schneider, Konrad Linden, Johannes Kalmbach,\nLars Willighagen, Wolfgang Fahl, and Hannah Bast (also a keynote speaker in Amsterdam)\nto <a href=\"https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html\">update the SPARQL queries</a>\nwe use to visualize data in <a href=\"https://www.wikidata.org/\">Wikidata</a> to SPARQL 1.1 so that they can run on\n<a href=\"https://qlever.dev/\">QLever</a>.\nThe abstract can be <a href=\"https://commons.wikimedia.org/wiki/File:Scholia_2026_Compliance_with_SPARQL_1.1.pdf\">found in Wikimedia Commons</a>.</p>\n\n<p>This was the outcome of many years of figuring out how to ensure Scholia could remain working. The\n<a href=\"https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2026-02-17/Technology_report\">Wikidata RDF graph split</a>\nhas given us many headaches. So many that when, just before Christmas, it became clear that it could\nbe possible to survive the split, I was so happy that I realized I wanted to share this news. So, we teamed\nup and wrote this demonstration contribution abstract. Thanks to everyone who made this happen!\nJust to be clear, we are not done yet. 
The system is so far only running outside the Wikimedia Foundation\nplatforms.</p>\n\n<p>One of the reviewer comments requested <a href=\"https://qlever.scholia.wiki/event/Q138033585\">a Scholia page for the meeting</a>.\nIt has not been updated for the accepted speakers, but you can look at <a href=\"https://qlever.scholia.wiki/event-series/Q56846035\">pages for past meetings</a>\nto get an idea of what you will find.</p>\n\n<h3 id=\"swat4hcls-biohackathon-2026\">SWAT4HCLS Biohackathon 2026</h3>\n\n<p>There will also be <a href=\"https://www.swat4ls.org/workshops/amsterdam2026/swat4hcls-biohackathon-2026/\">a biohackathon again</a>,\nof course, with the <a href=\"https://index.biohackrxiv.org/tag/SWAT4HCLS26\">option for BioHackrXiv reports</a>.\nThere are already <a href=\"https://www.swat4ls.org/workshops/amsterdam2026/swat4hcls-biohackathon-2026/\">several pitches</a>,\nincluding one that I submitted about Scholia.</p>",
      "summary": "Tomorrow, SWAT4HCLS 2026 will start, again in Amsterdam. The first SWAT4LS I attended was also in Amsterdam, and the second meeting in Amsterdam I was also there. And I was in Cambridge (see this post), Antwerp (no post), and at least to one of the two Leiden meetings (also no posts, it seems).",
      
      "date_published": "2026-03-22T00:00:00+00:00",
      "date_modified": "2026-03-22T00:00:00+00:00",
      "tags": ["rdf","sparql","swat4ls","wikidata"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gw9at-srp84",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/03/08/cdk-2.12.html",
      "title": "CDK 2.12",
      "content_html": "<p><a href=\"https://github.com/cdk/cdk/releases/tag/cdk-2.12\">Version 2.12</a> of the <a href=\"https://cdk.github.io/\">Chemistry Development Kit</a> has been released.\nIt is the last release with contributions by <a href=\"https://www.nwo.nl/en/projects/osf232097\">our NWO Open Science grant</a>.\nThis release adds some nice new APIs:</p>\n\n<ul>\n  <li>harmonize hydrogens to various states: depiction, stereo, minimal, and unsafe (useful for depictions)</li>\n  <li>generate wedge bonds based on coordinates and stereochemistry</li>\n  <li>more Markush / RGroup support</li>\n  <li>atropisomers via CXSMILES</li>\n  <li>sugar extraction</li>\n</ul>\n\n<p>I also update the following libraries/tools to use CDK 2.12:</p>\n\n<ul>\n  <li><a href=\"https://github.com/enanomapper/nanojava/releases/tag/nanojava-2.0.6\">NanoJava 2.16</a></li>\n  <li><a href=\"https://github.com/egonw/bacting/releases/tag/bacting-1.0.10\">Bacting 1.0.10</a> (and the Python pyBacting will follow asap)</li>\n</ul>",
      "summary": "Version 2.12 of the Chemistry Development Kit has been released. It is the last release with contributions by our NWO Open Science grant. This release adds some nice new APIs:",
      
      "date_published": "2026-03-08T00:00:00+00:00",
      "date_modified": "2026-03-08T00:00:00+00:00",
      "tags": ["cdk","openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.18850648", "doi": "10.5281/zenodo.18850648"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Chemistry Development Kit in 2024: improving cheminformatics research", "acronym" : "CDK2024", "uri" : "drc.filenumber:osf232097" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kd793-2fe02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html",
      "title": "Rescuing Scholia #3: We did it!",
      "content_html": "<p>It was not a set up, when I openly <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html\">wondered if we would be able to rescue Scholia in time</a>.\nI honestly did not know. Three weeks and some serious hacking by an international team later <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/12/31/rescuing-scholia-2-getting-close.html\">I was more optimistic</a>.\nActually, just before christmas, we started writing a <a href=\"https://www.swat4ls.org/\">SWAT4HCLS 2026</a> demonstration abstract. This was accepted and\nyou can read the <em>Scholia 2026: Compliance with SPARQL 1.1</em> preprint <a href=\"https://github.com/WolfgangFahl/ScholiaGraphSplitPaper\">here</a> and\n<a href=\"https://commons.wikimedia.org/wiki/File:Scholia_2026_Compliance_with_SPARQL_1.1.pdf\">here</a>.\nThis paper describes the work that had to be done, and I am deeply grateful to everyone who contributed with smaller or\nbigger contributions (Daniel, Peter, Konrad, Johannes, Lars, Wolfgang, Hannah).\nI am merely first author for the demo, and just another contributor to the long series of patches, in a\n<a href=\"https://github.com/WDscholia/scholia/pull/2715\">branch started by Prof. Hannah Bast</a>.</p>\n\n<p>The work actually started long before that, with the <em>Robustifying Scholia</em> grant (see doi:<a href=\"https://doi.org/10.3897/rio.5.e35820\">10.3897/rio.5.e35820</a>),\nwhere we explored alternatives. The Wikidata graph (RDF) split has been long coming, and I can recommend\n<a href=\"https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2026-02-17/Technology_report\">this recent The Signpost article</a>\nby <a href=\"https://disobey.net/@Bluerasberry\">Lane</a> for a good overview. 
So, this would not have been possible without\n<a href=\"https://github.com/WDscholia/scholia/graphs/contributors\">the many people who contributed over the years</a>.\nBut this last sprint really made a difference.</p>\n\n<p>The developments of the QLever software in the past year are very important, and the SPARQL endpoint we run now is live updated,\njust like we knew from the Wikidata Query Service (WDQS). Recent improvements allowed us to replace all the Wikidata- and Blazegraph-specific\naspects of the SPARQL queries, and good discussions led to pragmatic approaches to keep the localization features\nScholia had for displaying query results from Wikidata.</p>\n\n<p>The work is not completed, however. All queries are SPARQL 1.1 now, but some can still be further optimized, and some still\nneed some fixing. For example, I still spot some QIDs here and there, instead of the localized labels that should be shown.\nAlso, we are actively looking into getting everything running again on WMF servers (see <a href=\"https://github.com/WDscholia/scholia/issues/2766\">this overview issue</a>),\nso that <em>scholia.toolforge.org</em> works again.</p>\n\n<p>For now, however, please use <a href=\"https://qlever.scholia.wiki/\">qlever.scholia.wiki</a>.</p>",
      "summary": "It was not a set up, when I openly wondered if we would be able to rescue Scholia in time. I honestly did not know. Three weeks and some serious hacking by an international team later I was more optimistic. Actually, just before christmas, we started writing a SWAT4HCLS 2026 demonstration abstract. This was accepted and you can read the Scholia 2026: Compliance with SPARQL 1.1 preprint here and here. This paper describes the work that had to be done, and I am deeply grateful to everyone who contributed with smaller or bigger contributions (Daniel, Peter, Konrad, Johannes, Lars, Wolfgang, Hannah). I am merely first author for the demo, and just another contributor to the long series of patches, in a branch started by Prof. Hannah Bast.",
      
      "date_published": "2026-02-28T00:00:00+00:00",
      "date_modified": "2026-02-28T00:00:00+00:00",
      "tags": ["scholia","sparql","swat4ls"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3897/RIO.5.E35820", "doi": "10.3897/RIO.5.E35820"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6smn2-ah530",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/02/22/where-do-the-wikipathways-come-from.html",
      "title": "Where do the WikiPathways come from?",
      "content_html": "<p><a href=\"https://en.wikipedia.org/wiki/WikiPathways\">WikiPathways</a> was <a href=\"https://qlever.scholia.wiki/topic/Q7999828#earliest-published-works\">founded in 2008</a>,\nin the year I left Wageningen (and we Nijmegen) and moved to Uppsala, Sweden. When we dediced to move back to The Netherlands in 2012, I got to opportunity\nto join the Department of Bioinformatics (BiGCaT) and work on Open PHACTS. I had visited the group in March 2011 because I had a COST action\nworkshop near Maastricht (about nanoQSAR) and the bioinformatics group did <a href=\"https://wikipathways.org/\">WikiPathways</a>.</p>\n\n<p>When I joined, there were already hundreds of pathways, originating from various collaborations (see below).\nAround the winter break, the question came up who are the people who have drawn all these pathways. And on the new website\nthis is not actually that easy to see. You can <a href=\"https://www.wikipathways.org/browse/table.html\">browse all pathways</a>, or look up\n<a href=\"https://www.wikipathways.org/browse/authors.html\">author profiles</a>, but not all authors have done the same amount of work.\nMoreover, at various points of time, batches of pathways from those collaborators were added. Often, these were added\nby the <code class=\"language-plaintext highlighter-rouge\">MaintBot</code> account, which is routinely hidden, and then the author who shows up as first author, is not even\nthe original author. And then we still have a lot of homology-converted pathways. These are pathways translated to\nsome species from a model species. 
You can find them in <a href=\"https://github.com/wikipathways/wikipathways-homology\">this repository</a>.</p>\n\n<p>But nowadays I do a lot in the WikiPathways project, among other things generating the RDF and maintaining the code that does so.\nAnd I realized that we have author information in the RDF too (created by <a href=\"https://orcid.org/0000-0001-5706-2163\">Alex Pico</a>).\nSo, the idea came up to see who the “first authors” are of the WikiPathways (mind the <em>MaintBot</em> issue), and what we know\nabout them. Many already had their ORCID profiles linked from their profile pages, making it easy to look up their\nexpertise.</p>\n\n<p>Now, that was in January. But it turned out that the author information in the RDF worked fine in the <code class=\"language-plaintext highlighter-rouge\">.ttl</code> file\nof a single pathway, but that the <em>series ordinal</em> (e.g. 1 for being first author) was bound to the author, and\na SPARQL query would not be able to figure out on which pathways someone was first author. I fixed this somewhere\nin January, so in the <a href=\"https://github.com/wikipathways/wikipathways-help/discussions/221\">February 10 release</a> the\nimproved data model was available.</p>\n\n<p>Allow me to show what is now possible, with a few SPARQL queries. 
First, list the authors of a pathway, use\n<a href=\"https://edu.nl/q9txc\">this template</a> for <code class=\"language-plaintext highlighter-rouge\">WP10</code>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\">    </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"w\">  </span><span class=\"nn\">&lt;http://xmlns.com/foaf/0.1/&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wpq</span><span class=\"o\">:</span><span class=\"w\">   </span><span class=\"nn\">&lt;http://www.wikidata.org/prop/qualifier/&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">pav</span><span class=\"o\">:</span><span class=\"w\">   </span><span class=\"nn\">&lt;http://purl.org/pav/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"nv\">?version</span><span class=\"w\"> </span><span class=\"nv\">?ordinal</span><span class=\"w\"> </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span class=\"nv\">?orcid</span><span class=\"w\"> </span><span class=\"nv\">?page</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"k\">VALUES</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span 
class=\"nn\">&lt;https://identifiers.org/wikipathways/WP10&gt;</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">Person</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">hasAuthorship</span><span class=\"w\"> </span><span class=\"nv\">?authorship</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?authorship</span><span class=\"w\"> </span><span class=\"err\">^</span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">hasAuthorship</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wpq</span><span class=\"o\">:</span><span class=\"ss\">series_ordinal</span><span class=\"w\"> </span><span class=\"nv\">?ordinal</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"nn\">pav</span><span class=\"o\">:</span><span class=\"ss\">hasVersion</span><span class=\"w\"> </span><span class=\"nv\">?pathway_</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?pathway_</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">Pathway</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">isAbout</span><span class=\"w\"> </span><span class=\"o\">/</span><span class=\"w\"> </span><span 
class=\"nn\">gpml</span><span class=\"o\">:</span><span class=\"ss\">version</span><span class=\"w\"> </span><span class=\"nv\">?version</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">homepage</span><span class=\"w\"> </span><span class=\"nv\">?page</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">name</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"ss\">identifier</span><span class=\"w\"> </span><span class=\"nv\">?orcid</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">ASC</span><span class=\"p\">(</span><span class=\"nv\">?pathway</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">ASC</span><span class=\"p\">(</span><span class=\"nv\">?ordinal</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>We can see who the 8 people are who contributed to this pathway (we cannot actually see here what they 
contributed), and many\nauthors are members of the WikiPathways review team who focus more on technical quality than the biology. The first author,\nhowever, often is the person who contributed most of the biological knowledge in the pathway, in this case\n<a href=\"https://www.wikipathways.org/authors/A.Pandey\">Akhilesh Pandey</a> from the NetSlim collaboration\n(see doi:<a href=\"https://doi.org/10.1093/database/bar032\">10.1093/database/bar032</a>):</p>\n\n<p><img src=\"/assets/images/wikipathways_authorList.png\" alt=\"\" /></p>\n\n<h2 id=\"collaborations\">Collaborations</h2>\n\n<p>Over time, multiple collaborations have taken place, like the one with NetSlim from the above query. In these collaborations,\nthe knowledge may not be digitized in WikiPathways as GPML by the biological experts. That encoding is regularly done\nby others, but with those experts ensuring the quality. The following collaborations are examples, and\n<a href=\"https://www.wikipathways.org/browse/communities.html\">a fuller list is found online</a>:</p>\n\n<ul>\n  <li><a href=\"https://www.wikipathways.org/communities/wormbase_approved.html\">WormBase</a> (doi:<a href=\"https://doi.org/10.1093/nar/gkt1063\">10.1093/nar/gkt1063</a>)</li>\n  <li><a href=\"https://www.wikipathways.org/communities/lipids.html\">LIPID MAPS</a> (doi:<a href=\"https://doi.org/10.1093/nar/gkad896\">10.1093/nar/gkad896</a>)</li>\n  <li><a href=\"https://www.wikipathways.org/communities/imd.html\">Inherited Metabolic Disorders</a> (doi:<a href=\"https://doi.org/10.1007/978-3-030-67727-5_73\">10.1007/978-3-030-67727-5_73</a>)</li>\n  <li><a href=\"https://www.wikipathways.org/communities/micronutrients.html\">Micronutrients</a> (doi:<a href=\"https://doi.org/10.1007/s12263-010-0192-8\">10.1007/s12263-010-0192-8</a>)</li>\n</ul>\n\n<p>We have collaborated with Reactome on various occasions (e.g. 
see doi:<a href=\"https://doi.org/10.1371/journal.pcbi.1004941\">10.1371/journal.pcbi.1004941</a> and\ndoi:<a href=\"https://doi.org/10.1007/s12263-010-0192-8\">10.1007/s12263-010-0192-8</a>), around plants (e.g. see doi:<a href=\"https://doi.org/10.1186/1939-8433-6-14\">10.1186/1939-8433-6-14</a>),\naround rare diseases in projects like <a href=\"https://www.ejprarediseases.org/\">EJP-RD</a> and <a href=\"https://erdera.org/\">ERDERA</a>, and around SARS-CoV-2.\nFor that, see these communities:</p>\n\n<ul>\n  <li><a href=\"https://www.wikipathways.org/communities/reactome.html\">Reactome</a></li>\n  <li><a href=\"https://www.wikipathways.org/communities/plants.html\">Plants</a> (see also <a href=\"https://doi.org/10.37044/osf.io/m37f2_v1\">this DBCLS BioHackathon 2025 paper</a>)</li>\n  <li><a href=\"https://www.wikipathways.org/communities/rarediseases.html\">Rare Diseases</a></li>\n  <li><a href=\"https://www.wikipathways.org/communities/covid19.html\">COVID-19</a></li>\n</ul>\n\n<p>And then there are pathways in WikiPathways supported by a full paper, but I will leave that for a later moment.</p>\n\n<h2 id=\"author-statistics\">Author statistics</h2>\n\n<p>Back to the authors, because the new RDF model allows a few more nice queries. 
For example, we can check the number\nof pathways with a certain number of authors, and then we find with the following query that there are two pathways\nwith up to 18 authors (<a href=\"https://edu.nl/mhjbw\">try here</a>):</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\">    </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wpq</span><span class=\"o\">:</span><span class=\"w\">   </span><span class=\"nn\">&lt;http://www.wikidata.org/prop/qualifier/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?atLeast</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">COUNT</span><span class=\"p\">(</span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?count</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">Person</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">hasAuthorship</span><span class=\"w\"> </span><span class=\"nv\">?authorship</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?authorship</span><span class=\"w\"> </span><span 
class=\"err\">^</span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">hasAuthorship</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wpq</span><span class=\"o\">:</span><span class=\"ss\">series_ordinal</span><span class=\"w\"> </span><span class=\"nv\">?atLeast</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">GROUP</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"nv\">?atLeast</span><span class=\"w\">\n  </span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">ASC</span><span class=\"p\">(</span><span class=\"nn\">xsd</span><span class=\"o\">:</span><span class=\"ss\">integer</span><span class=\"p\">(</span><span class=\"nv\">?atLeast</span><span class=\"p\">))</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>We can also look at the <a href=\"https://edu.nl/fkwy9\">list of authors</a>, sorted by the number of pathways they are noted as first author on,\nalong with their profile page or ORCID number:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\">    </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"w\">  </span><span class=\"nn\">&lt;http://xmlns.com/foaf/0.1/&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wpq</span><span class=\"o\">:</span><span 
class=\"w\">   </span><span class=\"nn\">&lt;http://www.wikidata.org/prop/qualifier/&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">pav</span><span class=\"o\">:</span><span class=\"w\">   </span><span class=\"nn\">&lt;http://purl.org/pav/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">COUNT</span><span class=\"p\">(</span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?count</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span class=\"nv\">?orcid</span><span class=\"w\"> </span><span class=\"nv\">?page</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"k\">VALUES</span><span class=\"w\"> </span><span class=\"nv\">?ordinal</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"s2\">\"1\"</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">Person</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">hasAuthorship</span><span class=\"w\"> </span><span class=\"nv\">?authorship</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?authorship</span><span class=\"w\"> </span><span class=\"err\">^</span><span class=\"nn\">wp</span><span 
class=\"o\">:</span><span class=\"ss\">hasAuthorship</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wpq</span><span class=\"o\">:</span><span class=\"ss\">series_ordinal</span><span class=\"w\"> </span><span class=\"nv\">?ordinal</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"nn\">pav</span><span class=\"o\">:</span><span class=\"ss\">hasVersion</span><span class=\"w\"> </span><span class=\"nv\">?pathway_</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?pathway_</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">Pathway</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"nn\">dcterms</span><span class=\"o\">:</span><span class=\"ss\">identifier</span><span class=\"w\"> </span><span class=\"nv\">?version</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">homepage</span><span class=\"w\"> </span><span class=\"nv\">?page</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">name</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span 
class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?author_</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"ss\">identifier</span><span class=\"w\"> </span><span class=\"nv\">?orcid</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">GROUP</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"nv\">?ordinal</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span class=\"nv\">?orcid</span><span class=\"w\"> </span><span class=\"nv\">?page</span><span class=\"w\">\n  </span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">DESC</span><span class=\"p\">(</span><span class=\"nv\">?count</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Is this the full story? No, of course not. There are so many details yet uncovered, but it gives a bit more\ninsight into where the biological knowledge in WikiPathways is coming from.</p>\n\n<p>Want more peer review of the content? Then why not help set up a new community? Just ping me or\n<a href=\"https://www.wikipathways.org/authors/Mkutmon\">Martina</a>.</p>",
      "summary": "WikiPathways was founded in 2008, in the year I left Wageningen (and we Nijmegen) and moved to Uppsala, Sweden. When we dediced to move back to The Netherlands in 2012, I got to opportunity to join the Department of Bioinformatics (BiGCaT) and work on Open PHACTS. I had visited the group in March 2011 because I had a COST action workshop near Maastricht (about nanoQSAR) and the bioinformatics group did WikiPathways.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wikipathways_authorList.png",
      "date_published": "2026-02-22T00:00:00+00:00",
      "date_modified": "2026-02-22T00:00:00+00:00",
      "tags": ["wikipathways","rdf","sparql"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/database/bar032", "doi": "10.1093/database/bar032"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/nar/gkt1063", "doi": "10.1093/nar/gkt1063"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/nar/gkad896", "doi": "10.1093/nar/gkad896"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/978-3-030-67727-5_73", "doi": "10.1007/978-3-030-67727-5_73"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/s12263-010-0192-8", "doi": "10.1007/s12263-010-0192-8"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1004941", "doi": "10.1371/journal.pcbi.1004941"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/s12263-010-0192-8", "doi": "10.1007/s12263-010-0192-8"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.37044/osf.io/m37f2_v1", "doi": "10.37044/osf.io/m37f2_v1"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pm3c5-89k94",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/02/21/the-tdcc-nes-col-lab-retreat.html",
      "title": "The TDCC NES Col-Lab Retreat",
      "content_html": "<p>Last autumn two TDCC projects started, <em>FAIR4ChemNL</em> (<a href=\"https://chem-bla-ics.linkedchemistry.info/2026/02/08/open-infrastructures.html\">with the PeerTube channel</a>\nand doi:<a href=\"https://doi.org/10.61686/XVYQV45374\">10.61686/XVYQV45374</a>) and <em>FAIRify for metabolomics data</em>\n(doi:<a href=\"https://doi.org/10.61686/CSGIP04334\">10.61686/CSGIP04334</a>). But I haven’t written much on either yet and what the role is our research group in these projects.</p>\n\n<p>Let’s start with what the TDCC actually are: they are <a href=\"https://tdcc.nl/\">Thematic Digital Competence Centres</a>:</p>\n\n<blockquote>\n  <p>The Thematic Digital Competence Centres (TDCCs) are network-based initiatives set up by NWO and the Dutch academic\ncommunity to broker investments into research data management projects. The three TDCCs are national and discipline\nbased, with one pillar each for the Social Sciences &amp; Humanities (SSH), Natural and Engineering Sciences (NES) and\nLife Sciences &amp; Health (LSH). The networks will help formulate and facilitate projects designed to promote the adoption\nof open data, software and research practices, alongside the development of the necessary expertise.</p>\n</blockquote>\n\n<p>So, where initiatives like <a href=\"https://www.go-fair.org/\">GO FAIR</a> had centers of competencies (the implementation networks),\nthey did not have funding for them. 
This was a main reason why the <em>Chemistry Implementation Network</em> (ChIN,\ndoi:<a href=\"https://doi.org/10.1162/dint_a_00035\">10.1162/dint_a_00035</a>) did not take off.\nThe TDCCs do not provide a lot of money, but enough to support disseminating expertise and promote some key ideas.</p>\n\n<p>The idea is that combined with other efforts, it strengthens the level of FAIR in the Dutch research community.\nI have to say, this is much needed, as the level of FAIR data in journal publications leaves much to be desired:\nit is still mostly absent.</p>\n\n<p>The FAIR4ChemNL project already had a networking activity during the writing of the proposal, a workshop\nback in 2024 that I <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/06/10/two-meetings.html\">blogged about earlier</a>\n(see also <a href=\"https://doi.org/10.5281/zenodo.15050550\">this report</a>).\nThe FAIRify project is coordinated by the group that was key in the <em>Netherlands Metabolomics Center</em> (NMC), now the\n<a href=\"https://metabolomicscentre.nl/\">BeneLux Metabolomics Center</a>. During a postdoc at the NMC in my Wageningen\ndays, we already did a lot of FAIR competency building with <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/metware\">the MetWare project</a>.</p>\n\n<h2 id=\"the-col-lab-retreat\">The Col-Lab Retreat</h2>\n\n<p>The <a href=\"https://tdcc.nl/about-tddc/nes/\">TDCC-NES</a> organized a networking event in August last year,\nthe 2025 <a href=\"https://nescollab.nl/\">TDCC-NES Col-Lab Retreat</a>. I am late with\nreporting on it, but there simply was too much project management that took priority. The meeting was in the\nwonderful Dutch town of Schoorl, and the location is great for collaborative meetings. 
I had been there a year\nearlier for an Open Science Retreat and was happy to go back.</p>\n\n<p>During the unconference-style meeting <a href=\"https://tdcc.nl/creating-space-for-our-community-the-story-of-our-nes-col-lab-retreat/\">various topics were discussed</a>\nin breakout groups, and because of the two TDCC projects, I was particularly interested in the <em>Metadata and interoperability</em>\ntopic. Partly because this is how we can make electronic lab notebooks automatically push metadata to\nregistries (<a href=\"https://www.linkedin.com/in/rory-macneil-68a80011/\">Rory Macneil</a>\nof <a href=\"https://www.researchspace.com/\">RSpace/ResearchSpace</a>, which already integrates with various open\nplatforms, was also in Schoorl), and partly because I wanted to continue exploring <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/nanopub\">nanopublications</a>\nwith <a href=\"https://fediscience.org/@rupdecat\">Christian Meesters</a>, which could be the envelope to distribute\nthe metadata. For the latter, I was looking at the Java library for nanopublications\n(see <a href=\"https://github.com/Nanopublication/nanopub-java/pull/52\">this PR</a>).</p>\n\n<p>The idea that ELNs automatically share metadata about experiments is an attractive one.\nIt would require no involvement from the researcher, would be fully automatic, and would drive interest\n(users, peer reviewers) to experiments and experimental data. Something that is still absurdly hard\nis to search for experiments that measured the melting point of some chemical. How\nawesome would it be if ELNs automatically registered chemicals from the experiment in,\nfor example, <a href=\"https://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>?</p>\n\n<p>We had the idea of applying for a Lorentz Workshop, but the earliest deadline came too soon;\nmaybe it is time to pick up that idea again. 
Interoperability standards already exist, like\nthe aforementioned nanopubs, but also <a href=\"https://www.researchobject.org/ro-crate/\">RO-Crates</a> that are also studied by Jente Houweling\nin the VHP4Safety project (see <a href=\"https://platform.vhp4safety.nl/data\">this Data tab</a> for a preview).</p>",
      "summary": "Last autumn two TDCC projects started, FAIR4ChemNL (with the PeerTube channel and doi:10.61686/XVYQV45374) and FAIRify for metabolomics data (doi:10.61686/CSGIP04334). But I haven’t written much on either yet and what the role is our research group in these projects.",
      
      "date_published": "2026-02-21T00:00:00+00:00",
      "date_modified": "2026-02-21T00:00:00+00:00",
      "tags": ["fair","chemistry","metabolomics","fair4chemnl","fairify","nanopub","crate","pubchem"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1162/DINT_A_00035", "doi": "10.1162/DINT_A_00035"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.15050550", "doi": "10.5281/ZENODO.15050550"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "FAIRify your metabolomics data: achieving convergence on standards for reuse-ready data and workflows", "acronym" : "FAIRify", "uri" : "doi:10.61686/CSGIP04334" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } },{"award": { "title" : "FAIR4ChemNL: Accelerating the adoption of universal data standards in chemistry", "acronym" : "FAIR4ChemNL", "uri" : "doi:10.61686/XVYQV45374" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1ja9h-jem83",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/02/08/open-infrastructures.html",
      "title": "Open Infrastructures #2: the SURF Fediverse",
      "content_html": "<p>When I first started writing this post, I started writing up why scientific communication is important, but because I started\nexplaining what needs improving, and what are underlying causes why change is not happening, it got dark pretty quickly. So,\nI deleted that essay again. Instead, let’s just enjoy the awesome and long list of solutions we have for scientific discourse.\nReaders of my blog can find many posts in the past 20 years about the diversification.\nOne thing I will say before I move one, is a reply to the argument that journal-based peer review is essential to the\nquality of research: if the quality of your research is dependent on your peers, then please rethink why you are doing research.</p>\n\n<p>Now, about the <a href=\"https://en.wikipedia.org/wiki/Fediverse\">fediverse</a>…</p>\n\n<h2 id=\"mastodon-service-by-surf\">Mastodon (service by SURF)</h2>\n\n<p><a href=\"https://en.wikipedia.org/wiki/Mastodon_(social_network)\">Mastodon</a> is one of the more well-known corners of the fediverse,\nand I <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/mastodon\">blogged about it before</a>.\nIt is intrinsically open, while it has extensive options to make things more private. It is like Twitter but then without the\ncentral control. It is unlike Slack, <a href=\"https://en.wikipedia.org/wiki/Zulip\">Zulip</a>, and LinkedIn which has clear walls around communities.\nIt also is unlike past efforts like Google Wave and <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/friendfeed\">FriendFeed</a>\nwhich created much more structured discourse.</p>\n\n<p>But I enjoy Mastodon. 
It has all the good science, the friendly, helpful people, and I have many options to block people,\nfediverse servers, and even individual keywords (you can remove anything “PFAS”, for example, something hard in the real world).\nBut you also have a linear timeline, with just content of the people you follow.</p>\n\n<p>And, with the <a href=\"http://surf.nl/\">SURF</a> <a href=\"https://social.edu.nl/\">social.edu.nl</a> server, every researcher from a SURF-linked\nresearch institute can get an account there via <a href=\"https://www.surf.nl/en/services/identity-access-management/surfconext\">SURFconext</a>\n(the Mastodon solution may need to be activated by your institute first; if so, ask your institute ICT to enable it).\nThe list of accounts on this SURF Mastodon server shows <a href=\"https://social.edu.nl/directory?order=active\">a varied list of people and organisations</a>,\nbut you can also check this list of <a href=\"https://communities.surf.nl/publieke-waarden/artikel/80-ways-to-follow-research-science-and-education-on-mastodon\">80 Ways to follow Research, Science and Education on Mastodon</a>.\nOr <a href=\"https://chem-bla-ics.linkedchemistry.info/2022/11/21/finding-mastodon-accounts-with-wikidata.html\">this list of Wikidata queries</a>.</p>\n\n<p>I think every organization that communicates their research should have at least one open world communication channel,\nand if they then like to keep their walled-garden LinkedIn account too, that is fine. But societal impact for just a select group\nof people feels a bit awkward to me.</p>\n\n<h2 id=\"peertube-service-by-surf\">PeerTube (service by SURF)</h2>\n\n<p>But SURF operates a second fediverse server, one using the <a href=\"https://en.wikipedia.org/wiki/PeerTube\">PeerTube</a> software, also\nextended with the SURFconext interoperability. 
PeerTube is a platform to share videos, like YouTube.\nJust before the winter holiday, I got the opportunity to create two project accounts on SURF’s <a href=\"https://video.edu.nl/\">video.edu.nl</a>,\none for the <a href=\"https://vhp4safety.nl/\">VHP4Safety</a> project and one for the <a href=\"https://tdcc.nl/projects/tdcc-nes-projects/fair4chemnl-accelerating-the-adoption-of-universal-data-standards-in-chemistry/\">FAIR4ChemNL</a>\nproject.</p>\n\n<p>The cool thing actually is that SURFconext has group accounts via <a href=\"https://servicedesk.surf.nl/wiki/spaces/IAM/pages/92668196/SURFconext+Invite+EN\">SURFconext Invite</a>\n(it was earlier called <em>SURFconext Teams</em>), so these two PeerTube channels are operated by two or more\npeople from the project, and the two videos that are now available have not actually been uploaded by me.</p>\n\n<p>But I am very excited we now have channels to share our video communication, <a href=\"https://video.edu.nl/a/vhp4safety/videos\">here for VHP4Safety</a>:</p>\n\n<p><img src=\"/assets/images/peertube_vhp4safety.png\" alt=\"\" /></p>\n\n<p>And <a href=\"https://video.edu.nl/a/fair4chemnl/videos\">here for FAIR4ChemNL</a>:</p>\n\n<p><img src=\"/assets/images/peertube_fair4chemnl.png\" alt=\"\" /></p>\n\n<!-- Communication infrastructure behind the world wide web has been open infrastructure for a long time, including email, the web itself,\nand internet relay chat. Early commercial alternatives, like Compuserve and AOL, created walled gardens using unique information, quite like\nNetflix, HBO, and AppleTV do now. While these disappeared, the commercial need for walls is deep rooted in the Western culture.\nAnd the walled gardens won in the end. They do for streaming, for searching, and increasingly for communication. The latter, of course,\nis causing a lot of social problems, by controlling who can say what to whom. And being operated by huge international companies, they\noften operate outside law. 
Even the European Commission cannot keep them within legal limits.\n\nIt is essential to realize this affects the research community hard. The publishing industry is largely a walled garden: it was\nbefore open access and with APC-that-come-with-30-percent-profit as the norm the walls have not really dropped. If you prefer to\ntalk about the peer review walls, the walls exist just as well: who can do peer review (is allowed inside the wall), who decides\nwhich peer reviewers are important (who gets thrown outside the wall), and why post-publication peer review is not a thing\n(only things inside the wall matter). The walls, unfortunately, are often based on good looks (like journal impact factor,\nthe label \"American\" or \"Society\") and discussions about quality are mostly pushed outside the wall.\n\nYet, communication is a central activity in doing research, and open communication channels are to me an essential part\nof that. If the discussion of good science is limited to those in power, this can only harm science. Of course, retractions are\nrare, fraud even more, and any correlation with anything cannot happen inside the walls (until it does).\nUnfortunately, until we can untangle the notion of peer review from prestige, power, and money, it will not easily change. -->",
      "summary": "When I first started writing this post, I started writing up why scientific communication is important, but because I started explaining what needs improving, and what are underlying causes why change is not happening, it got dark pretty quickly. So, I deleted that essay again. Instead, let’s just enjoy the awesome and long list of solutions we have for scientific discourse. Readers of my blog can find many posts in the past 20 years about the diversification. One thing I will say before I move one, is a reply to the argument that journal-based peer review is essential to the quality of research: if the quality of your research is dependent on your peers, then please rethink why you are doing research.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/peertube_fair4chemnl.png",
      "date_published": "2026-02-08T00:00:00+00:00",
      "date_modified": "2026-02-08T00:00:00+00:00",
      "tags": ["mastodon","peertube","vhp4safety","fair4chemnl"],
      
      "_funding": [{"award": { "title" : "The Virtual Human Platform for Safety Assessment", "acronym" : "VHP4Safety", "uri" : "drc.filenumber:nwa129219272" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } },{"award": { "title" : "FAIR4ChemNL: Accelerating the adoption of universal data standards in chemistry", "acronym" : "FAIR4ChemNL", "uri" : "doi:10.61686/XVYQV45374" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v13h7-7av66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/01/17/chemical-blogs-history.html",
      "title": "Chemical blogs history",
      "content_html": "<p>Like many awesome internet phenomena, <a href=\"https://en.wikipedia.org/wiki/Blog\">blogging started in the late nineties</a>.\n<a href=\"https://doi.org/10.1038/432933a\">Nature</a> <a href=\"https://doi.org/10.1038/ngeo170\">authors</a> <a href=\"https://doi.org/10.1038/ncb0905-845b\">and</a>\n<a href=\"https://doi.org/10.1038/4571058a\">editors</a> recognized the effort early. In 2006 there were already\nmore than 45 million blogs, and at least 50 science blogs made it in the top 50,000 and\n<a href=\"https://doi.org/10.1038/442009a\">5 in the top 3,500</a>.</p>\n\n<p>I started blogging in 2005, around the time many others did, among which many chemists.\nin 2006 I started a website called <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">Chemical blogspace</a>\nusing the <em>Postgenomic.com</em> software. Chemical blogspace extracted which journal articles\nwere discussed (yeah, there is <a href=\"https://www.linkedin.com/pulse/how-did-altmetric-come-euan-adie/\">a causal relationship with altmetrics</a>!),\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">I added recognition of chemicals</a>,\nso that you could <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/04/chemical-blogspace-is-getting-more.html\">follow blog posts talking about a specific chemical</a>.\nI <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/16/lunch-at-nature-hq-with-euan-joanna-ian.html\">visited Euan Adie and others in 2007</a>.\nI had to sunset Chemical blogspace several years later, in a time where blogging seems to\nbe on its return, overtaken by microblogging platforms like Twitter (which died in 2022).</p>\n\n<p>We know now that it didn’t really go away, however. 
If we look at <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/07/21/rogue-scholar-and-more.html\">Rogue Scholar</a>\nwe <a href=\"https://docs.rogue-scholar.org/dashboard\">see there is plenty of activity</a>, indeed.\nI am very interested in restarting something like Chemical blogspace, based on Rogue Scholar.\nThe nice things of Chemical blogspace was that it created a virtual community, and in\nthe end it aggregated and indexed more than 250 chemistry blogs. I would love to see\nmany of them archived on Rogue Scholar, but the blog authors have to\n<a href=\"https://tally.so/r/nPvNK0\">recommend their blog personally here</a>.</p>\n\n<p>You can also just visit many of these blogs to relive the dynamics at the time:</p>\n\n<ul>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2006/12/new-blogs-1.html\">New Blogs #1</a> (2006)</li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2007/01/new-blogs-2.html\">New Blogs #2</a> (2007)</li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2007/02/new-blogs-3.html\">New Blogs #3</a></li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2007/03/new-blogs-4.html\">New Blogs #4</a></li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2007/04/new-blogs-5.html\">New Blogs #5</a></li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2007/05/new-blogs-6.html\">New Blogs #6</a></li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2007/06/these-are-new-blogs-that-entered.html\">New Blogs #7</a></li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2007/10/new-blogs-8.html\">New Blogs #8</a></li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2008/04/new-blogs-9.html\">New Blogs #9</a> (2008)</li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/07/23/new-blogs-10.html\">New Blogs #10 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> (2009)</li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/07/31/new-blogs-11.html\">New Blogs 
#11 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chemicalblogspace.blogspot.com/2009/11/new-blogs-12.html\">New Blogs #12</a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/07/15/cb-new-blogs-13.html\">Cb: New Blogs #13</a> (2010)</li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/22/cb-new-blogs-14.html\">Cb: New Blogs #14</a></li>\n</ul>\n\n<p>A lot has happened since then. There are <a href=\"https://docs.rogue-scholar.org/dashboard\">new platforms</a>.\nBlogger and Wordpress are still the bigger platform, but Hugo, Jekyll, and Quarto are modern, open source\nalternatives. <a href=\"https://www.anildash.com/2026/01/09/how-markdown-took-over-the-world/\">Markdown may have helped</a>\nwith the revival of blogging, making it easier than ever.</p>\n\n<p>What is your current favorite chemistry blog? Love to hear from you!</p>",
      "summary": "Like many awesome internet phenomena, blogging started in the late nineties. Nature authors and editors recognized the effort early. In 2006 there were already more than 45 million blogs, and at least 50 science blogs made it in the top 50,000 and 5 in the top 3,500.",
      
      "date_published": "2026-01-17T00:00:00+00:00",
      "date_modified": "2026-03-20T00:00:00+00:00",
      "tags": ["blog","nature"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/4571058a", "doi": "10.1038/4571058a"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/ngeo170", "doi": "10.1038/ngeo170"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/ncb0905-845b", "doi": "10.1038/ncb0905-845b"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/442009a", "doi": "10.1038/442009a"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/432933a", "doi": "10.1038/432933a"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/0xxqw-90533",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/01/10/where-does-the-wikipathways-cited-in-information-come-from.html",
      "title": "Where does the WikiPathways Cited In information come from?",
      "content_html": "<p>I have been wanting to blog about this since this summer, but with everything going on, I never really got around to it.\nWhat is this <em>Cited In</em> feature of <a href=\"https://wikipathways.org/\">WikiPathways</a> and where does that information come from?\nIf you have not noticed this yet, this is what it looks like for <a href=\"https://www.wikipathways.org/instance/WP4846\">WP4846</a>:</p>\n\n<p><img src=\"/assets/images/wp_cited_in.png\" alt=\"\" /></p>\n\n<p>Recently, I was close to writing up the context, because it is related to a new feature of the profile pages, where you\nnow can look up citations to pathways that you first authored (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/11/30/wikipathways-curation-reports-on-profile-pages.html\">this post</a>).\nAnd it also relates to the data I have been collecting around <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">citation intention annotations</a>:\narticles that cite one of the WikiPathways papers and mention a specific pathway, could be considered <em>cito:usesDataFrom</em>\n(see doi:<a href=\"https://doi.org/10.1186/s13321-023-00683-2\">10.1186/s13321-023-00683-2</a>).</p>\n\n<p>A third angle to citations to specific WikiPathways is the following. WikiPathways is used a lot in data analyses and\nputting experimental data in biological context. How researchers do this varies a lot, in multiple ways. But just\nthinking about this factually, research output cite specific biological pathways. And there are some interesting\nphenomena there. Back in 2015 at the Metabolomics Society meeting in San Francisco (apparently, I only\nblogged about the meeting only <a href=\"https://chem-bla-ics.blogspot.com/2015/06/metsoc2015-converting-smiles-annotation.html\">once</a>?),\nwhen I visited the 500+ posters looking for interesting biological pathways, there were a lot of studies\non different species, different diseases, different toxicities. 
The biological response had one thing in common:\nit always was the TCA cycle that was key (see doi:<a href=\"https://doi.org/10.1096/FJ.11-203091\">10.1096/FJ.11-203091</a> for\na 2012 comparison of TCA models).</p>\n\n<p>Thus, with so many articles mentioning specific pathways and deriving biological knowledge from this, what is\nreasonable to expect? Do we expect <em>co-citation</em> effects? That is, if two articles found the same set of pathways\nof interest to their data, is the data showing a similar biological response? Do we expect a similar thing\nlike the above TCA cycle in metabolomics, something similar to the notion of <em>frequent hitters</em> (see\ndoi:<a href=\"https://doi.org/10.1021/jm010934d\">10.1021/jm010934d</a>)?</p>\n\n<p>Of course, to test this hypothesis we need data, and this is where the <em>Cited In</em> feature comes in. At the time of\nwriting of this blog post, we can see on <a href=\"https://www.wikipathways.org/browse/citedin.html\">this page</a>\nthat 878 pathways have been cited a total of 2715 times. We are getting somewhere. This blog\npost will not analyze this data, which is one reason why I had not blogged about it. But from the\nabove you can understand that I want to :)</p>\n\n<h2 id=\"the-cited-in-feature\">The Cited In feature</h2>\n\n<p>This <em>Cited In</em> feature was introduced along with the new website (see doi:<a href=\"https://doi.org/10.1093/nar/gkad960\">10.1093/nar/gkad960</a>),\nwhere we changed how GPML files are stored and how web pages are created from that.\nBecause we are no longer confined to the MediaWiki platform (which has served the project for very long,\nvery effectively), it is easier to integrate information from other sources. For example,\nfrom literature databases. 
This feature was developed by <a href=\"https://orcid.org/0000-0001-5706-2163\">Alex Pico</a>\nat the Gladstone Institutes (see <a href=\"https://github.com/wikipathways/wikipathways-database/commit/840234adfd581730d86553910c078401351606ce\">this 2022 commit</a>),\nwhere he uses the <a href=\"https://www.ncbi.nlm.nih.gov/books/NBK25497/\">NCBI eUtils API</a> to access\n<a href=\"https://pmc.ncbi.nlm.nih.gov/\">PubMed Central</a>.\nThe data is then collected into <a href=\"https://github.com/wikipathways/wikipathways-database/blob/main/downstream/citedin_lookup.yml\">this YAML file</a>\nwhich then gets used to generate webpage content (like the section in the above screenshot\nand the page mentioning the current statistics).</p>\n\n<h2 id=\"where-is-the-data-coming-from\">Where is the data coming from?</h2>\n\n<p>As just explained, originally the data was only coming from NCBI.\nHowever, because I found many articles citing specific pathways that were not picked up by this\napproach, and I wanted more data, I started searching <a href=\"https://europepmc.org/\">Europe PMC</a>, the European\npartner of PubMed Central. However, I am not automating this. I want to see the data, the articles, and\nhow people cite the pathways. I need to see that so that I can better understand how people are\nusing the data/knowledge from WikiPathways. I cannot keep up with checking why people are citing\nmy own research, but <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html\">I once did</a>.\nI learn(-ed) a lot from that.</p>\n\n<p>I normally use a search that requires the word “WikiPathways” to be\n<a href=\"https://europepmc.org/search?query=wikipathways\">mentioned in the article</a> (in most, but\nnot all of them; citing literature you extend sounds like a core scholarly value, but is factually\nnot systematically complied with), and then manually search for “WP”. 
With close to 1000\nPubMed Central articles mentioning WikiPathways in 2025, most of them full texts,\nI can see if they cite specific pathways. A good number of articles mention the WikiPathways\nidentifier, e.g. the aforementioned <code class=\"language-plaintext highlighter-rouge\">WP4846</code>. If the article only mentions a pathway title,\nI cannot confidently identify which pathway is cited, so I exclude that.</p>\n\n<p>I originally started out manually editing the YAML file where the citations are collected,\nbut by now use <a href=\"https://github.com/wikipathways/wikipathways-database/blob/main/scripts/citedin_fromFile.R\">a script similar to Alex’ R script</a>.\nThis makes it far easier to scale up, as I just have to populate a three column TSV file,\nwhich is used by my R script to update the YAML file. This manual approach ensures that\nI am not looking at text mining results, but see the citation of the WikiPathways identifier\nwith my own eyes. That’s just how I like it.</p>\n\n<p>The full history of the YAML file content can be found on <a href=\"https://github.com/wikipathways/wikipathways-database/commits/main/downstream/citedin_lookup.yml\">this GitHub page</a>\nand <a href=\"https://github.com/wikipathways/wikipathways-database/blame/main/downstream/citedin_lookup.yml\">this <em>git blame</em></a>\ntells you if the information came from PubMed Central via the API, or was added by me:</p>\n\n<p><img src=\"/assets/images/wp_cited_in_git_blame.png\" alt=\"\" /></p>\n\n<p>This is Open Science in action: added transparency and making it easier for anyone to verify,\nso that no one needs to be stuck in (dis)trust.</p>\n\n<p>Of course, as we know from the CiTO ontology and real-world data, there are so\nmany different reasons why journal articles are cited (just <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/08/07/cito-updates.html\">an example</a>) that\nthe data in the YAML file and on the WikiPathways website in the <em>Cited 
In</em> feature\ndoes not have direct meaning. Just like a high citation count for an article or\neven a journal impact factor cannot be directly interpreted (despite so many researchers\njust blindly doing just that).</p>\n\n<h2 id=\"whats-next\">What’s next?</h2>\n\n<p>Well, while I did not do any analysis yet, and do not even know yet how many citations we need to\nreach some level of statistical significance, there are some observations I can mention:</p>\n\n<ul>\n  <li>if your analysis included anything like linking your data to pathways, citing those pathways is\na good way to give credit to the researchers that created that pathway</li>\n  <li>if you cite data, please cite that as accurately as possible, see e.g. DataCite</li>\n  <li>I wish all journal articles citing specific pathways from WikiPathways would include the pathway identifier</li>\n  <li>I congratulate those authors that even mentioned the revision of the pathway! well done!</li>\n</ul>\n\n<p>And about biological interpretation, our group has long published that some genes with\ndifferential data mapping to a pathway does not imply that that pathway is really affected.\nGene-set enrichment and over-representation analysis are a starting point, not a conclusion.\nI wish more people were more aware of the work in our (now)\n<a href=\"https://cris.maastrichtuniversity.nl/en/organisations/translational-genomics/\">Translational Genomics research group</a>.\nLike that of <a href=\"https://orcid.org/0000-0002-7699-8191\">Martina Kutmon</a> (now as\n<a href=\"https://www.maastrichtuniversity.nl/research/maastricht-centre-systems-biology-and-bioinformatics\">MaCSBio<sup>2</sup></a>),\nwhom I have had the pleasure of collaborating with for quite some years now (and long time\narchitect of WikiPathways).</p>\n\n<p>There is so much more I want to write up about WikiPathways, but I will leave it at this\nfor now.</p>",
      "summary": "I have been wanting to blog about this since this summer, but with everything going on, I never really got around to it. What is this Cited In feature of WikiPathways and where does that information come from? If you have not noticed this yet, this is what it looks like for WP4846:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wp_cited_in.png",
      "date_published": "2026-01-10T00:00:00+00:00",
      "date_modified": "2026-01-10T00:00:00+00:00",
      "tags": ["wikipathways","europepmc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/S13321-023-00683-2", "doi": "10.1186/S13321-023-00683-2"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1096/FJ.11-203091", "doi": "10.1096/FJ.11-203091"
            , "cito":
              
              
                [ 
                  "citesAsDataSource"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/jm010934d", "doi": "10.1021/jm010934d"
            , "cito":
              
              
                [ 
                  "obtainsBackgroundFrom"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKAD960", "doi": "10.1093/NAR/GKAD960"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wyz5v-vts33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2026/01/05/plantmetwiki.html",
      "title": "PlantMetWiki: a linked open data service for querying and analyzing plant pathway knowledge",
      "content_html": "<p>Back on October I presented <em>Everything you always wanted to know: plant pathway modelling in WikiPathways</em> (doi:<a href=\"https://doi.org/10.5281/zenodo.18149988\">10.5281/zenodo.18149988</a>)\nat the <em>Knowledge Graphs for Plant and Microbiome Multiomics</em> symposium (see <a href=\"https://web.archive.org/web/20260105060309/https://www.linkedin.com/posts/elena-del-pup-840805164_knowledgegraphs-plantbiology-fairdata-activity-7351538387108978689-bgqh/\">this archived LinkedIn post</a>)\non 14th October 2025 (<a href=\"https://www.youtube.com/watch?v=NgYRHiuBvpc\">youtube recording</a>).\nI had not found time yet to post about this meeting, but it was an awesome list of speakers, regrettable absense of some others, but resulting\nin new contacts and some slowly evolving collaborations.</p>\n\n<p>Previously, plant pathways were somewhat negatively prioritized at our BiGCaT research group. Something with Dutch academic politics. But that\nwas 10 years ago, and with the notion that human health very much involves the exposome, which includes live around humans, I think the\nplant pathway science is important to human health. Even just the human health impacts of drops in biodiversity. Or the impact on our\nnutrition supply chain of climate change.</p>\n\n<p>Anyway, I am happy that <a href=\"https://github.com/elenadelpup\">Elena</a> and <a href=\"https://github.com/DeniseSl22\">Denise</a>\npulled me into a <a href=\"https://github.com/pathway-lod\">collaboration</a> to create an RDF-based knowledge graph about plant pathways.\nTheir idea was to <a href=\"https://plantcyc.org/\">PlantCyc</a> pathways (their license seems to allow that; doi:<a href=\"https://doi.org/10.1093/nar/gkae991\">10.1093/nar/gkae991</a>),\nconvert that to GPML (<a href=\"https://github.com/pathway-lod/Cyc_to_wiki\">by Max</a>) and then to RDF. That last step is where I come in. 
The details will follow later, but Elena announced\nthe project on LinkedIn (<a href=\"https://web.archive.org/web/20260105060958/https://www.linkedin.com/feed/update/urn:li:activity:7407756920041713664/\">archived link</a>),\nso time to blog about it myself too.</p>\n\n<p>I am happy with this effort, not just because we now have pathways in RDF form for more than 500 species, but also\nbecause it requires continued development of the WikiPathways solutions, like GPML and\n<a href=\"https://github.com/PathVisio/libGPML\">libGPML</a> and the RDF generation\ncode, but also BridgeDb (doi:<a href=\"https://doi.org/10.1186/1471-2105-11-5\">10.1186/1471-2105-11-5</a>).\nThe latter provides the identifier mapping infrastructure, but needed to be extended for\nthe new species (something I had to do earlier this year for several <a href=\"https://www.wikipathways.org/search.html?query=caffeine+synthesis\">caffeine synthesis pathways</a>\ndeveloped at the <a href=\"https://2025.biohackathon.org/\">DBCLS BioHackathon 2025</a>).</p>\n\n<p>Lars gave me a tip on how to scale this up (after <a href=\"https://github.com/bridgedb/datasources/commit/be64e5ac120d21fc70f742a090353fb801279b38\">a manual addition</a>):\n<a href=\"https://verifier.globalnames.org/\">verifier.globalnames.org</a> (doi:<a href=\"https://doi.org/10.5281/zenodo.17245658\">10.5281/zenodo.17245658</a>),\nwhich greatly helped me out. It translates species names\ninto identifiers, and their JSON is very rich in that process as well as easy to process. So,\n<a href=\"\">a custom script</a> allowed me to update BridgeDb more efficiently. Highly recommended!</p>\n\n<p>So, the resulting knowledge base is available at <a href=\"https://plantmetwiki.bioinformatics.nl/\">plantmetwiki.bioinformatics.nl</a>\nand looks like this (also big thanks to Marvin for support in setting this up!):</p>\n\n<p><img src=\"/assets/images/plantmetwiki.png\" alt=\"\" /></p>",
      "summary": "Back on October I presented Everything you always wanted to know: plant pathway modelling in WikiPathways (doi:10.5281/zenodo.18149988) at the Knowledge Graphs for Plant and Microbiome Multiomics symposium (see this archived LinkedIn post) on 14th October 2025 (youtube recording). I had not found time yet to post about this meeting, but it was an awesome list of speakers, regrettable absense of some others, but resulting in new contacts and some slowly evolving collaborations.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/plantmetwiki.png",
      "date_published": "2026-01-05T00:00:00+00:00",
      "date_modified": "2026-01-05T00:00:00+00:00",
      "tags": ["wikipathways","gpml","rdf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.18149988", "doi": "10.5281/zenodo.18149988"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/nar/gkae991", "doi": "10.1093/nar/gkae991"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-11-5", "doi": "10.1186/1471-2105-11-5"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.17245658", "doi": "10.5281/zenodo.17245658"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6t2qh-2f839",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/12/31/rescuing-scholia-2-getting-close.html",
      "title": "Rescuing Scholia #2: getting closer",
      "content_html": "<p>Three weeks ago, I wrote a the post <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html\">Rescuing Scholia: will we make it in time?</a>,\nwhere I sketched a future without <a href=\"https://scholia.toolforge.org/\">Scholia</a>. Scholia, started\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2023/01/27/scholia-timeline.html\">almost 10 years ago</a>\nand I think it is worth keeping around longer.</p>\n\n<p>Fortunately, it looks like we will have a working replacement in time before the\n<a href=\"https://www.mediawiki.org/wiki/Wikidata_Query_Service\">WDQS</a> instance with all the\n<a href=\"https://wikidata.org/\">Wikidata</a> triples in a single SPARQL endpoint goes down,\nlikely in a week or so (even tho we may be behind <a href=\"https://openalex.org/works?page=1&amp;filter=cites:w2767995756\">the citation peak</a>).</p>\n\n<p>The work of the past year helped, for exampe, making it easier to configure Scholia for a different\nendpoint and the asynchronous loading of panels (reducing the stress on the SPARQL end point).\nAlready in September, Prof. <a href=\"https://github.com/hannahbast\">Hannah Bast</a> started\n<a href=\"https://github.com/WDscholia/scholia/pull/2715\">a branch</a> for the transition and various\nhackathons this autumn, and the work by <a href=\"https://github.com/KonradLinden\">Konrad Linded</a>\nwho explored and addressed some of the hurdles to take. The tips and suggestions from\nHannah and <a href=\"https://github.com/RobinTF\">RobinTF</a> really made a difference. And also a huge thanks\nto <a href=\"https://orcid.org/0000-0001-9488-1870\">Daniel</a> who kept relentlessly pushing this forward.</p>\n\n<p>When I posted my <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html\">will we make it</a> post,\nthere was a demo instance and a spreadsheet showing the state of each query. The instance\nshowed no human-readable labels. 
This was because the WDQS <code class=\"language-plaintext highlighter-rouge\">wikibase:label</code> service \nwas used a lot, and there is no replacement for that. Getting labels for all relevant\nitems is possible, but makes the queries a lot heavier and made even more queries\nrun out of memory. Various solutions were <a href=\"https://github.com/ad-freiburg/scholia/issues/17\">discussed</a>;\nFinn indicated he <a href=\"https://github.com/ad-freiburg/scholia/issues/17#issuecomment-3605952951\">preferred a macro solution</a>,\nwhich <a href=\"https://github.com/ad-freiburg/scholia/pull/20/changes\">Lars implemented</a> and\nwhich saw some tweaks after that. Then followed a long series of patches by particularly\n<a href=\"https://github.com/pfps\">Peter</a> to update all the SPARQL queries to have them use\nthe new labels macro. But plenty of other things were fixed or newly implemented,\nsuch as <a href=\"https://github.com/WolfgangFahl\">Wolfgang</a>’s <a href=\"https://qlever.scholia.wiki/backend\">/backend</a>\npage.</p>\n\n<p>So, with one week to go, we need your help: as the weekly\n<a href=\"https://www.wikidata.org/wiki/Wikidata:Status_updates/2025_12_29\">Wikidata Status Update</a>\nalready indicated:</p>\n\n<blockquote>\n  <p>this month’s Scholia hackathon has moved Scholia closer to its planned switch to a\nQLever backend. Beta testers can assist by exploring the\n<a href=\"https://qlever.scholia.wiki/\">interim QLever-backed Scholia instance</a>\nand <a href=\"https://github.com/WDscholia/scholia/issues\">reporting any issues</a>.</p>\n</blockquote>\n\n<p>And thanks to <a href=\"https://github.com/Adafede\">Adriano</a> and others who already have!</p>\n\n<p>Now, we are not done yet. The real instance at <a href=\"https://scholia.toolforge.org/\">scholia.toolforge.org</a>\nhas seen ridiculous abuse by scrapers (and the main instance is regularly unusable, to be honest),\nand we have no idea whether the new setup is powerful enough. 
And we need to point to the new servers anyway.\nSo, plenty of work is left to be done in the next few days.</p>\n\n<p>But we are getting close. So, please give <a href=\"https://qlever.scholia.wiki/\">qlever.scholia.wiki</a>\na go, and let us know your observations. As <a href=\"https://en.wikipedia.org/wiki/Linus%27s_law\">Linus’s law</a> states:</p>\n\n<blockquote>\n  <p>Given enough eyeballs, all bugs are shallow.</p>\n</blockquote>",
      "summary": "Three weeks ago, I wrote a the post Rescuing Scholia: will we make it in time?, where I sketched a future without Scholia. Scholia, started almost 10 years ago and I think it is worth keeping around longer.",
      
      "date_published": "2025-12-31T00:00:00+00:00",
      "date_modified": "2025-12-31T00:00:00+00:00",
      "tags": ["wikidata","scholia","sparql","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g30ef-gxm10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/12/29/open-infrastructures.html",
      "title": "Open Infrastructures #1: Research Software Directory",
      "content_html": "<blockquote>\n  <p>Research software is an integral part of scientific investigations.</p>\n</blockquote>\n\n<p>This is what Struck wrote in 2018 in a contribution to the 2018 IEEE 14th International Conference on e-Science (e-Science)\n(doi:<a href=\"https://doi.org/10.1109/eScience.2018.00016\">10.1109/eScience.2018.00016</a>). I very much agree with this,\nand the notion is gaining ground in the academic community. Their paper <em>“identifies challenges, risks and new opportunities\nin research software publication and discovery”</em>.</p>\n\n<p>At the same conference, Spaaks <em>et al.</em> presented a lightning talk about the\n<a href=\"https://research-software-directory.org/software/rsd-ng\">Research Software Directory</a> (RSD),\n<em>“a content management system for research software, which promotes the visibility, reuse, and impact of research software”</em>\n(doi:<a href=\"https://doi.org/10.1109/eScience.2018.00013\">10.1109/eScience.2018.00013</a>).</p>\n\n<p>I wonder who spoke first at the meeting.</p>\n\n<p>Anyway, I learned about RSD a while ago already and have been using it for some of our\ngroup’s research software. We don’t have a collection for our group, but you will find them\nunder the <a href=\"https://research-software-directory.org/organisations/maastricht-university\">Maastricht University organisation page</a>.</p>\n\n<p>And as sketched by Struck and implemented by Spaaks <em>et al.</em>, RSD gives rich context\nto the research software. 
It can track the activity on the project (for GitHub, GitLab,\n<a href=\"https://github.com/research-software-directory/RSD-as-a-service/issues/1605\">Codeberg</a> etc),\ntrack citations to key literature, and can have links to distributions where the software is published\n(like Debian, CRAN, <a href=\"https://github.com/research-software-directory/RSD-as-a-service/issues/1606\">Bioconductor</a>, etc).</p>\n\n<p>This is what it looks like for the <a href=\"https://research-software-directory.org/software/cdk\">Chemistry Development Kit</a>:</p>\n\n<p><img src=\"/assets/images/rsd.png\" alt=\"\" /></p>\n\n<p>I like initiatives like this, as they help the community work out open standards to exchange\nmetadata, and encourage other projects by reusing their APIs.</p>",
      "summary": "Research software is an integral part of scientific investigations.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/rsd.png",
      "date_published": "2025-12-29T00:00:00+00:00",
      "date_modified": "2025-12-29T00:00:00+00:00",
      "tags": ["cdk","openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1109/eScience.2018.00016", "doi": "10.1109/eScience.2018.00016"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1109/eScience.2018.00013", "doi": "10.1109/eScience.2018.00013"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yh369-rr787",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html",
      "title": "Rescuing Scholia: will we make it in time?",
      "content_html": "<p>What <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/01/27/scholia-timeline.html\">started out in 2016 on Twitter</a> became a\n<a href=\"https://meta.wikimedia.org/wiki/Coolest_Tool_Award/Full_history\">(small) award winning</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/scholia\">decade long collaborative project</a>.\nUnfortunately, the future is not clear. We are at odds if it will survice the growth of Wikidata\nand in particularly the <a href=\"https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split\">SPARQL graph split</a>.\nTo be clear, the choice for Blazegraph initially worked great, but after it was bought by a big\ncompany, developed halted. Very unfortunate for Wikidata. Unlike earlier, we no longer have funding, and rewriting Scholia\nat this scale takes a good bit of effort. We already\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/04/20/the-april-2025-scholia-hackathon.html\">held a few hackathons</a>.</p>\n\n<p>So far, we have been able to continue to use a <em>legacy</em> SPARQL endpoint with all the data, but in exactly one month\nthat endpoint will be sunset. And we are <strong>not</strong> ready.</p>\n\n<h2 id=\"rescuing-scholia\">Rescuing Scholia</h2>\n<p>Daniel and Lane have been leading an effort to rescue Scholia. The hackathons were part of this effort. It seems\nthat <a href=\"https://en.wikipedia.org/wiki/QLever\">QLever</a> is the only route left. Earlier efforts to rewrite the more\nthan 350 Scholia SPARQL queries to support the graph split have basically failed. The complexity is far too high.\nQLever, however, provides the full graph and since recently full SPARQL 1.1 support. 
That is also not enough to\nreproduce the full Scholia functionality, but it seems to get us far.\nImportantly, the data may not update as frequently as the <a href=\"https://www.mediawiki.org/wiki/Wikidata_Query_Service\">WDQS</a>,\nand that is another complexity to take into account. Particularly, all the 404 pages.</p>\n\n<p>So, in the next weeks, we have to complete rewriting all those queries as queries that QLever can handle. A team\nof people have done great work already, <a href=\"https://github.com/ad-freiburg/scholia/issues?q=is%3Aissue%20author%3AKonradLinden\">including Konrad</a>.</p>\n\n<p>I hope we make it in time.</p>",
      "summary": "What started out in 2016 on Twitter became a (small) award winning decade long collaborative project. Unfortunately, the future is not clear. We are at odds if it will survice the growth of Wikidata and in particularly the SPARQL graph split. To be clear, the choice for Blazegraph initially worked great, but after it was bought by a big company, developed halted. Very unfortunate for Wikidata. Unlike earlier, we no longer have funding, and rewriting Scholia at this scale takes a good bit of effort. We already held a few hackathons.",
      
      "date_published": "2025-12-08T00:00:00+00:00",
      "date_modified": "2025-12-08T00:00:00+00:00",
      "tags": ["scholia","wikidata","rdf","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s7vw2-r7y02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/11/30/wikipathways-curation-reports-on-profile-pages.html",
      "title": "WikiPathways curation reports on profile pages",
      "content_html": "<p>I have been running automated curation tests for many years now, at least <a href=\"https://chem-bla-ics.linkedchemistry.info/2018/10/11/two-presentations-at-wikipathways-2018.html\">from before 2018</a>.\nBecause it has been done without funding, it has not been as nicely integrated, and depends, for example, first on the RDF generation to be integrated\nin the GitHub Action. So, I still run them regularly (often in the morning during breakfast). Meanwhile, the <a href=\"https://www.wikipathways.org/wikipathways-collection/index2\">curation tests</a>\nhelp the project to monitor and maintain the quality of the pathways. The curation reports have been integrated into pathway pages for some\ntime now.</p>\n\n<p><img src=\"/assets/images/wpCurationBadge.png\" alt=\"\" /></p>\n\n<p>We have now integrated this curation badge into the author and community pages on the (not so) <a href=\"https://www.wikipathways.org/\">new WikiPathways website</a>\ntoo. Authors can now find curation reports for pathways they started and also for the community pages:</p>\n\n<p><img src=\"/assets/images/4a0a20557574c3ae.png\" alt=\"\" /></p>\n\n<p>A second new feature is the “Citations” tab on both pages, which link to <a href=\"https://europepmc.org/\">Europe PMC</a>\nwith a dedicated search for articles mentioning those author or community pathways:</p>\n\n<p><img src=\"/assets/images/270716cef8d30481.png\" alt=\"\" /></p>\n\n<p>We hope you like it!</p>",
      "summary": "I have been running automated curation tests for many years now, at least from before 2018. Because it has been done without funding, it has not been as nicely integrated, and depends, for example, first on the RDF generation to be integrated in the GitHub Action. So, I still run them regularly (often in the morning during breakfast). Meanwhile, the curation tests help the project to monitor and maintain the quality of the pathways. The curation reports have been integrated into pathway pages for some time now.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wpCurationBadge.png",
      "date_published": "2025-11-30T00:00:00+00:00",
      "date_modified": "2025-11-30T00:00:00+00:00",
      "tags": ["wikipathways","curation","europepmc"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v9c2y-4f248",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/10/15/20-years-of-blogging.html",
      "title": "20 years of blogging",
      "content_html": "<p>Today, exactly 20 years ago I <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/15/chem-bla-ics.html\">started this blog</a>.\nTwo years ago I decided to <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html\">upgrade my blog</a>\nto one with version control. A decision I am still very excited about. It allowed me to start innovating my blog again.\nAs part of this, and following the step <a href=\"https://larsgw.blogspot.com/\">Lars</a> <a href=\"https://legacy.rogue-scholar.org/blogs/syntaxus_baccata\">took ealier</a>,\nI registered my blog with <a href=\"https://rogue-scholar.org/\">Rogue Scholar</a> and I started migrating blog posts from\n<em>blogger.com</em> to my new location. I completed the years 2005, 2006, 2007, and 2008 now.</p>\n\n<p>I decided to do this manually, which gives me the opportunity to reflect on my 20 years of blogging and 20 years of\nresearch. It has been an utter joy to read about all the open science and all the innovation that happened around\nthose times, and all the collaborations. For example, the fun of <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">Postgenomics and Chemical blogspace</a>.\nThe early semantic web discussions, FOAF, rewrite of JChemPaint, use of the Chemical Markup Language, and so much more.</p>\n\n<p>At the same time, the two years of Rogue Scholar give new opportunities. Starting with DOIs, blog PDFs and ePUBs,\nto JSON feeds, grant acknowledgements, and CiTO annotations. It also allows me to update links (e.g. to the blog\nof Peter Murray-Rust) and to properly cite other blogs (e.g. that of Rich Apodaca and Henry Rzepa).</p>\n\n<p>I think science blogging is alive and kicking and looking forward to another 20 years of blogging\nand another 20 years of science dissemamination innovation!</p>",
      "summary": "Today, exactly 20 years ago I started this blog. Two years ago I decided to upgrade my blog to one with version control. A decision I am still very excited about. It allowed me to start innovating my blog again. As part of this, and following the step Lars took ealier, I registered my blog with Rogue Scholar and I started migrating blog posts from blogger.com to my new location. I completed the years 2005, 2006, 2007, and 2008 now.",
      
      "date_published": "2025-10-15T00:00:00+00:00",
      "date_modified": "2025-10-15T00:00:00+00:00",
      "tags": ["blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4ce2c-fxh02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/09/28/25-years-of-the-chemistry-development-kit.html",
      "title": "25 years of the Chemistry Development Kit",
      "content_html": "<p>Twenty five years ago the <a href=\"https://cdk.github.io/\">Chemistry Development Kit</a> (CDK) was founded. The Chemistry and Internet (<a href=\"https://www.google.com/search?q=ChemInt2000\">ChemInt2000</a>)\nhad just ended (it ran from 23 to 26 September) and my friend and I had taken the Amtrak night train from Washington to South Bend. At that time there\nwere two leading Java applets for chemistry, <a href=\"https://jchempaint.github.io/\">JChemPaint</a> and <a href=\"http://jmol.org/\">Jmol</a>. I had hacked Chemical Markup\nLanguage support into both of them, and <a href=\"https://chemistry.nd.edu/people/dan-gezelter/\">Dan Gezelter</a> (Jmol and <a href=\"https://openscience.org/\">openscience.org</a>),\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph Steinbeck</a> (JChemPaint), and me took the opportunity of being in North America\nto discuss if we could use a common code base. Chris’ <em>compchem</em> had done something similar. Peter Murray-Rust, who had also attended ChemInt2000\nlike me and Chris did not attend.</p>\n\n<p>I do not remember exactly, but I guess we must have met on the 28th and 29th? Maybe already on Wednesday. During this meeting we discussed a common\ndata model (yes, Jmol used the CDK data model at some point) and somewhere during the meeting we wrote down a name for the project. There was the\nJava Development Kit, so this could be the Chemistry Development Kit. The name stuck.</p>\n\n<p>A quick post like this cannot do credit to the history of the CDK, nor of everyone involved in the past or still is. 
You can browse some of the history\nof the CDK in <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cdk\">my blog</a> and in <a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/category/chemistry-development-kit/\">Chris’ blog</a>.\nIt has been an amazing journey and with a small grant just behind us (with Alyanne de Haan, René van der Ploeg, and Marc Teunis from Hogeschool Utrecht),\nand all the awesome things ongoing (new JChemPaint, various extensions, upgraded downstream tools), the CDK is alive and kicking.</p>\n\n<p>A huge congrats and thanks to everyone (and every company and organization) who contributed code to the CDK on this huge milestone. There are a few people\nthat I want to particularly thank (see the AUTHORS file for all names): Chris, who in the late nineties made a difference with open source in chemistry,\nDan, for Jmol and hosting this memorable meeting at Notre Dame University, Rajarshi Guha, who operated <em>CDK Nightly</em> for many years, well before Travis\nand GitHub Actions, Stefan, Miguel, Gilleain, and Christian, for many years of contributions to the CDK, and John Mayfield, the current\nCDK release manager.</p>",
      "summary": "Twenty five years ago the Chemistry Development Kit (CDK) was founded. The Chemistry and Internet (ChemInt2000) had just ended (it ran from 23 to 26 September) and my friend and I had taken the Amtrak night train from Washington to South Bend. At that time there were two leading Java applets for chemistry, JChemPaint and Jmol. I had hacked Chemical Markup Language support into both of them, and Dan Gezelter (Jmol and openscience.org), Christoph Steinbeck (JChemPaint), and me took the opportunity of being in North America to discuss if we could use a common code base. Chris’ compchem had done something similar. Peter Murray-Rust, who had also attended ChemInt2000 like me and Chris did not attend.",
      
      "date_published": "2025-09-28T00:00:00+00:00",
      "date_modified": "2025-09-28T00:00:00+00:00",
      "tags": ["cdk","jchempaint","jmol","openscience","chemistry"],
      
      "_funding": [{"award": { "title" : "The Chemistry Development Kit in 2024: improving cheminformatics research", "acronym" : "CDK2024", "uri" : "drc.filenumber:osf232097" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/t7kmh-fc360",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/08/24/the-enanomapper-project-deliverables.html",
      "title": "The eNanoMapper project deliverables",
      "content_html": "<p>This is a bit of an administrative post and historic, but keep coming back to the question, where are all the deliverable\nof the (past) project. Now, since many <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/enanomapper\">eNanoMapper</a> project\ndeliverables were public, we were able to release most of them on Zenodo. This post makes an overview.</p>\n\n<p>To former partners and collaborators, thanks for this awesome journey!</p>\n\n<h2 id=\"work-package-1\">Work Package 1</h2>\n\n<p>WP1 was about community interaction and developeing solutions to support the <a href=\"https://www.nanosafetycluster.eu/\">NanoSafety Cluster</a>.</p>\n\n<ul>\n  <li>D1.1 <strong>Requirements Analysis and System Design</strong> (<a href=\"https://zenodo.org/record/375623\">10.5281/zenodo.375623</a>) (CC-BY 4.0 Int)</li>\n  <li>D1.2 <strong>Use Cases and Test Suite</strong> (<a href=\"https://zenodo.org/record/375624\">10.5281/zenodo.375624</a>) (CC-BY 4.0 Int)</li>\n  <li>D1.3 <strong>Sustainability Plan</strong> (not public)</li>\n  <li>D1.4 <strong>User Guidance</strong> (<a href=\"https://zenodo.org/record/375630\">10.5281/zenodo.375630</a>) (CC-BY 4.0 Int)</li>\n  <li>D1.5 <strong>Evaluation</strong> (not public)</li>\n</ul>\n\n<h2 id=\"work-package-2\">Work Package 2</h2>\n\n<p>But the grant call already described one specific need, a common language, so WP2 was all about the eNanoMapper Ontology (doi:<a href=\"https://doi.org/10.1186/S13326-015-0005-5\">10.1186/S13326-015-0005-5</a>).</p>\n\n<ul>\n  <li>D2.1 <strong>Framework and Infrastructure for Ontology development, versioning and dissemination</strong> (<a href=\"https://zenodo.org/record/375633\">10.5281/zenodo.375633</a>) (CC-BY 4.0 Int)</li>\n  <li>D2.2 <strong>Ontology Content Types and Existing Community efforts</strong> (<a href=\"https://zenodo.org/record/375634\">10.5281/zenodo.375634</a>) (CC-BY 4.0 Int)</li>\n  <li>D2.3 <strong>Ontology initial release</strong> (<a 
href=\"https://zenodo.org/record/375635\">10.5281/zenodo.375635</a>) (CC-BY 4.0 Int)</li>\n  <li>D2.4 <strong>Ontology final release</strong> (<a href=\"https://zenodo.org/record/375636\">10.5281/zenodo.375636</a>) (CC-BY 4.0 Int)</li>\n</ul>\n\n<h2 id=\"work-package-3\">Work Package 3</h2>\n\n<p>And the common language had as goal to support data and knowledge exchange, so WP3 was about the eNanoMapper Database (doi:<a href=\"https://doi.org/10.3762/bjnano.6.165\">10.3762/bjnano.6.165</a>).</p>\n\n<ul>\n  <li>D3.1 <strong>Technical Specification and initial implementation of the protocol and data management web services</strong> (<a href=\"https://zenodo.org/record/375637\">10.5281/zenodo.375637</a>) (CC-BY 4.0 Int)</li>\n  <li>D3.2 <strong>Data Management System with extended search capabilities</strong> (<a href=\"https://zenodo.org/record/375639\">10.5281/zenodo.375639</a>) (CC-BY 4.0 Int)</li>\n  <li>D3.3 <strong>Modules and services for linking and integration with third party databases</strong> (<a href=\"https://zenodo.org/record/375813\">10.5281/zenodo.375813</a>) (CC-BY 4.0 Int)</li>\n  <li>D3.4 <strong>ISA-Tab templates for common bioselected set of assays</strong> (<a href=\"https://zenodo.org/record/375814\">10.5281/zenodo.375814</a>) (CC-BY 4.0 Int)</li>\n</ul>\n\n<h2 id=\"work-package-4\">Work Package 4</h2>\n\n<p>WP4 and WP5 were aimed at impact, where WP4 focused on safe-by-design or AI for materials.</p>\n\n<ul>\n  <li>D4.1 <strong>Analysis and Modelling Specifications</strong> (<a href=\"https://zenodo.org/record/346000\">10.5281/zenodo.346000</a>) (CC-BY 4.0 Int)</li>\n  <li>D4.2 <strong>Descriptor Calculation Algorithms and Methods</strong> (<a href=\"https://zenodo.org/record/375609\">10.5281/zenodo.375609</a>) (CC-BY 4.0 Int)</li>\n  <li>D4.3 <strong>nQSAR Modelling infrastructure</strong> (<a href=\"https://zenodo.org/record/375610\">10.5281/zenodo.375610</a>) (CC-BY 4.0 Int)</li>\n  <li>D4.4 <strong>Mechanism-ofaction Modelling 
Tools</strong> (<a href=\"https://zenodo.org/record/375613\">10.5281/zenodo.375613</a>) (CC-BY 4.0 Int)</li>\n  <li>D4.5 <strong>Design of experiments and inter-laboratory testing facilities</strong> (<a href=\"https://zenodo.org/record/375616\">10.5281/zenodo.375616</a>) (CC-BY 4.0 Int)</li>\n  <li>D4.6 <strong>Tools for generating QMRF and QPRF reports</strong> (<a href=\"https://zenodo.org/record/375619\">10.5281/zenodo.375619</a>) (CC-BY 4.0 Int)</li>\n</ul>\n\n<h2 id=\"work-package-5\">Work Package 5</h2>\n\n<p>This was perhaps a separate unique innovation: WP5 was not clearly defined in the outcome. We were going to ask\nthe community, and use results from WP2, WP3, and WP4 to help other NanoSafety Cluster projects. For example, the collaboration\nwith NanoReg was positioned in this WP.</p>\n\n<ul>\n  <li>D5.1 <strong>Integrated Issue Management and Testing system</strong> (not public)</li>\n  <li>D5.2 <strong>User registration, authentication and authorisation</strong> (not public)</li>\n  <li>D5.3 <strong>User applications for importing NanoWiki data</strong> (<a href=\"https://zenodo.org/record/345979\">10.5281/zenodo.345979</a>) (CC-BY 4.0 Int)</li>\n  <li>D5.4 <strong>User application for importing NanoSafety Cluster data</strong> (<a href=\"https://zenodo.org/record/345994\">10.5281/zenodo.345994</a>) (CC-BY 4.0 Int)</li>\n  <li>D5.5 <strong>User application for searching and downloading eNanoMapper data</strong> (<a href=\"https://zenodo.org/record/345995\">10.5281/zenodo.345995</a>) (CC-BY 4.0 Int)</li>\n  <li>D5.6 <strong>User Application for Conformance to Reporting and Curation Standards</strong> (<a href=\"https://zenodo.org/record/345997\">10.5281/zenodo.345997</a>) (CC-BY 4.0 Int)</li>\n  <li>D5.7 <strong>Final report on User Applications</strong> (<a href=\"https://zenodo.org/record/322296\">10.5281/zenodo.322296</a>) (CC-BY 4.0 Int)</li>\n</ul>\n\n<h2 id=\"work-package-6\">Work Package 6</h2>\n\n<p>This was the management++ work 
package.</p>\n\n<ul>\n  <li>D6.1 <strong>eNanoMapper Year 1 Dissemination Report</strong> (<a href=\"https://zenodo.org/record/345960\">10.5281/zenodo.345960</a>) (CC-BY 4.0 Int)</li>\n  <li>D6.2 <strong>eNanoMapper Year 2 Dissemination Report</strong> (<a href=\"https://zenodo.org/record/345974\">10.5281/zenodo.345974</a>) (CC-BY 4.0 Int)</li>\n  <li>D6.3 <strong>eNanoMapper Tutorials</strong> (<a href=\"https://zenodo.org/record/345975\">10.5281/zenodo.345975</a>) (CC-BY 4.0 Int)</li>\n  <li>D6.4 <strong>eNanoMapper Community Development Report</strong> (not public)</li>\n  <li>D6.5 <strong>eNanoMapper Exploitation Report</strong> (not public)</li>\n  <li>D6.6 <strong>eNanoMapper Final Dissemination Report</strong> (<a href=\"https://zenodo.org/record/345976\">10.5281/zenodo.345976</a>) (CC-BY 4.0 Int)</li>\n</ul>\n\n<p>I will probably do this in the future for other projects I have been involved in, but will\nthen maybe focus on deliverables I have co-authored.</p>",
      "summary": "This is a bit of an administrative post and historic, but keep coming back to the question, where are all the deliverable of the (past) project. Now, since many eNanoMapper project deliverables were public, we were able to release most of them on Zenodo. This post makes an overview.",
      
      "date_published": "2025-08-24T00:00:00+00:00",
      "date_modified": "2025-08-24T00:00:00+00:00",
      "tags": ["enanomapper"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3762/BJNANO.6.165", "doi": "10.3762/BJNANO.6.165"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13326-015-0005-5", "doi": "10.1186/S13326-015-0005-5"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375623", "doi": "10.5281/ZENODO.375623"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375624", "doi": "10.5281/ZENODO.375624"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375630", "doi": "10.5281/ZENODO.375630"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375633", "doi": "10.5281/ZENODO.375633"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375634", "doi": "10.5281/ZENODO.375634"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375635", "doi": "10.5281/ZENODO.375635"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375636", "doi": "10.5281/ZENODO.375636"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375637", "doi": "10.5281/ZENODO.375637"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375639", "doi": "10.5281/ZENODO.375639"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375813", "doi": "10.5281/ZENODO.375813"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375814", "doi": "10.5281/ZENODO.375814"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.346000", "doi": "10.5281/ZENODO.346000"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375609", "doi": "10.5281/ZENODO.375609"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375610", "doi": "10.5281/ZENODO.375610"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375613", "doi": "10.5281/ZENODO.375613"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375616", "doi": "10.5281/ZENODO.375616"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.375619", "doi": "10.5281/ZENODO.375619"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.345979", "doi": "10.5281/ZENODO.345979"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.345994", "doi": "10.5281/ZENODO.345994"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.345995", "doi": "10.5281/ZENODO.345995"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.345997", "doi": "10.5281/ZENODO.345997"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.322296", "doi": "10.5281/ZENODO.322296"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.345960", "doi": "10.5281/ZENODO.345960"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.345974", "doi": "10.5281/ZENODO.345974"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.345975", "doi": "10.5281/ZENODO.345975"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.345976", "doi": "10.5281/ZENODO.345976"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "eNanoMapper - A Database and Ontology Framework for Nanomaterials Design and Safety Assessment", "acronym" : "eNanoMapper", "uri" : "cordis.project:604134" }, "funder": { "name": "European Commission", "ror": "00k4n6c32" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hr4y6-kwq16",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/08/18/ai-technologies-in-academia.html",
      "title": "AI Technologies in Academia",
      "content_html": "<p>I have had the <a href=\"https://openletter.earth/open-letter-stop-the-uncritical-adoption-of-ai-technologies-in-academia-b65bba1e\">Open Letter: Stop the Uncritical Adoption of AI Technologies in Academia</a>\nfrom June 27 open for some time now. I thought I wanted to sign it, but got stuck on the first paragraphs multiple times:</p>\n\n<blockquote>\n  <p>With this letter we take a principled stand against the proliferation of so-called ‘AI’ technologies in universities. As an educational institution,\nwe cannot condone the uncritical use of AI by students, faculty, or leadership. We also call for reconsidering any direct financial relationships\nbetween Dutch universities and AI companies.The unfettered introduction of AI technology leads to contravention of the spirit of the EU Al act. It\nundermines our basic pedagogical values and the principles of scientific integrity. It prevents us from maintaining our standards of independence\nand transparency. And most concerning, AI use has been shown to hinder learning and deskill critical thought.</p>\n</blockquote>\n\n<p>These few lines contain for me more than 25 years of research and I know the complexities. Before I can co-sign this letter,\nI need to understand the details. There is no definition of ‘AI’ here and it mentiones the <a href=\"https://eur-lex.europa.eu/legal-content/NL/TXT/?uri=CELEX:32024R1689\">EU AI Act</a>\n(I guess, the letter actually writes “Al” (with an <code class=\"language-plaintext highlighter-rouge\">l</code> of letter) act, I notice now after I read the content in another font),\nbut I have not read the EU AI Act yet (it is 144 pages of legal text).</p>\n\n<h2 id=\"the-legal-context-of-the-open-letter\">The legal context of the Open Letter</h2>\n\n<p>Let me first say, I am not a lawyer (IANAL). 
I am not versed in the specific legal definitions of tightly defined and controlled\nwords.</p>\n\n<p>Reading the <em>EU AI Act</em>, I read a reassuring opening statement (repeated later with more context, links to other laws, etc):</p>\n\n<blockquote>\n  <p>to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection\nof health, safety, fundamental rights as enshrined in the Charter of Fundamental Rights of the European Union (the ‘Charter’),\nincluding democracy, the rule of law and environmental protection, to protect against the harmful effects of AI systems in the Union</p>\n</blockquote>\n\n<p>We clearly see how these things are currently routinely violated.</p>\n\n<blockquote>\n  <p>This Regulation does not apply to AI systems or AI models, including their output, specifically developed and put into service\nfor the sole purpose of scientific research and development.</p>\n</blockquote>\n\n<p>In Dutch this is officially translated to “wetenschappelijk onderzoek”, so <em>scientific research</em> seems to legally\ninclude the humanities, etc, and not be limited to natural sciences [citation needed].</p>\n\n<p>The EU AI Act also outlines a definition of “AI”, leaning towards machine learning, but the border between deterministic,\nrule-based algorithms and machine-learned patterns for predictions remains a bit vague to me. But I can live with it.</p>\n\n<p>The Open Letter’s <em>contravention of the spirit of the EU Al act</em> gets context here too. It has to be the <em>spirit</em>,\nbecause the law does not apply to academia. Good, clarified. The Letter continues with:</p>\n\n<blockquote>\n  <p>It undermines our basic pedagogical values and the principles of scientific integrity.\nIt prevents us from maintaining our standards of independence and transparency. 
And most concerning, AI use has been\nshown to hinder learning and deskill critical thought.</p>\n</blockquote>\n\n<p>Yes, that clearly links to the EU AI Act’s protection of rights. Not explicitly listed, maybe on purpose and maybe for legal reasons,\nare the international human rights, which include the right to benefit from science,\nbut I think this is still in the spirit of the EU AI Act. And if AI fetters our ability to learn (yes, there\nis scientific evidence for that [citation needed]), then it violates the EU AI Act (IANAL).</p>\n\n<h2 id=\"what-the-open-letter-expects\">What the Open Letter expects</h2>\n\n<p>The next part of the Open Letter lists what the signers expect from our universities. I will reflect on each of them.</p>\n\n<blockquote>\n  <p><strong>Resist the introduction of AI in our own software systems</strong>, from Microsoft to OpenAI to Apple. It is not in our interests\nto let our processes be corrupted and give away our data to be used to train models that are not only useless to us, but\nalso harmful.</p>\n</blockquote>\n\n<p>The intrinsic problem, and why I think it is fair to call out these companies, is that, as the letter explains, there\nis a clear conflict of interest. The goal of companies is to make profit (and in a Western world, as much\nas possible), and not any of the human or scientific needs. In this respect, companies like Elsevier\ncould just as well have been mentioned too (see e.g. <a href=\"https://irisvanrooijcogsci.com/2025/08/12/ai-slop-and-the-destruction-of-knowledge/\">this post by Prof. Van Rooij</a>,\nactually 2nd signature on the letter).</p>\n\n<blockquote>\n  <p><strong>Ban AI use in the classroom</strong> for student assignments, in the same way we ban essay mills and other forms\nof plagiarism. 
Students must be protected from de-skilling and allowed space and time to perform their\nassignments themselves.</p>\n</blockquote>\n\n<p>About a year ago, I was pleasantly surprised by the depth of discussion at Maastricht University on how and when\nto use AI, and by default not. This one is really complicated and it matters when and how the AI is used.\nAfter all, the spirit of the EU AI Act expects us to use AI in research (to trigger innovation). So,\nI cannot agree with the literal statement, but I fully agree with the spirit. Particularly combined with\nthe clear “Stop the Uncritical Adoption of AI Technologies in Academia” of the title of the Open Letter.</p>\n\n<p>I read this line like this: AI in the classroom must have a purpose that aligns with the EU AI Act.\nThat means it must not be used for writing essays and reports. I am old enough to remember the\nacademic discussions (at Radboud University) about writing and the clear hesitance among scholars\nabout the use of written assignments: “I want to test their scientific knowledge and reasoning skills,\nnot their ability to write narratives”. And LLMs, like ChatGPT but also the European, more open variants,\nwrite narratives, so the written report or essay is no longer a valid way to assess a student’s\nscientific learning progress.</p>\n\n<p>So, alternatively, we should very carefully and scientifically evaluate which forms of assessment\nwe perform, and banning AI in the classroom may just be distracting from a more fundamental problem.\nAnyways… if you continue using writing assignments to test progress in learning, you must ban\nuse of AI in that process. You must be testing the student, not some piece of software (as a teaching\ninstitute).</p>\n\n<blockquote>\n  <p><strong>Cease normalising the AI hype</strong> and the lies which are prevalent in the technology industry’s framing of\nthese technologies. 
The technologies do not have the advertised capacities and their adoption puts students\nand academics at risk of violating ethical, legal, scholarly, and scientific standards of reliability,\nsustainability, and safety.</p>\n</blockquote>\n\n<p>Sounds like a no-brainer. But I too find my own university uncritically promoting AI. Maybe they tested\nit well, and just forgot to share that. But hey, scientific quality goes all ways.</p>\n\n<blockquote>\n  <p><strong>Fortify our academic freedom</strong> as university staff to enforce these principles and standards in our\nclassrooms and our research as well as on the computer systems we are obliged to use as part of our\nwork. We as academics have the right to our own spaces.</p>\n</blockquote>\n\n<p>Again, a no-brainer. But important to add. It must be said as it is an intrinsic part of\n<a href=\"https://recognitionrewards.nl/\">Recognition &amp; Rewards</a>. If you cannot guarantee academic freedom,\nthen there is something seriously wrong with your R&amp;R.</p>\n\n<blockquote>\n  <p><strong>Sustain critical thinking on AI</strong> and promote critical engagement with technology on a firm\nacademic footing. Scholarly discussion must be free from the conflicts of interest caused by\nindustry funding, and reasoned resistance must always be an option.</p>\n</blockquote>\n\n<p>Yeah, this is something that is underestimated. Part of our academic teaching is this critical thinking.\nIt returns in academic reading (did you already read <em>“What Little Red Riding Hood Can Teach Us about Reading Science”</em>,\ndoi:<a href=\"https://uplopen.com/chapters/e/10.1515/9783110782844-010\">10.1515/9783110782844-010</a>,\nby <a href=\"https://scholar.google.com/citations?user=0KRmIbcAAAAJ&amp;hl=nl&amp;oi=ao\">Monica Gonzalez-Marquez</a> <em>et al.</em>?),\nscientific programming, data analysis, and our teaching has been\nlacking here. Not just for new AI forms, but also for the old algorithms. 
I have seen this, and\nscientific literature is riddled with mistakes, just because our peer reviewers are not sufficiently\nskilled. This will take effort. I know, it was a major part of\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html\">my PhD thesis</a>.</p>\n\n<p>Of course, this is exactly why I have been so active in Open Science. Without Open Science,\nwe cannot work <em>in the spirit</em> of the EU AI Act. It’s nothing new. It’s just that the big money\nhas found in AI a way to profit at the expense of humans.</p>\n\n<p>So, go read that <a href=\"https://openletter.earth/open-letter-stop-the-uncritical-adoption-of-ai-technologies-in-academia-b65bba1e\">Open Letter</a>\nand sign too!</p>",
      "summary": "I have had the Open Letter: Stop the Uncritical Adoption of AI Technologies in Academia from June 27 open for some time now. I thought I wanted to sign it, but got stuck on the first paragraphs multiple times:",
      
      "date_published": "2025-08-18T00:00:00+00:00",
      "date_modified": "2025-08-18T00:00:00+00:00",
      "tags": ["cheminf","chemometrics","openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1515/9783110782844-010", "doi": "10.1515/9783110782844-010"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r4mbg-yyb06",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/08/13/integrating-comments-via-the-fediverse.html",
      "title": "Integrating comments via the Fediverse",
      "content_html": "<p>My <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html\">old blog</a>\nhad (has) comments via the Blogger.com platform, but I did not have anything for the new blog. A couple of options\nare used, like Disqus and <a href=\"https://skerdiberberi.com/blog/utterances\">comments via GitHub</a>. However, these both\nhave the downside of a <a href=\"https://en.wikipedia.org/wiki/Vendor_lock-in\">vendor lock-in</a> and the whole point of moving\nmy blog was to break out from such lock-ins.</p>\n\n<p>Almost half a year ago I read <a href=\"https://tzovar.as/commenting/\">this post</a> by <a href=\"https://tzovar.as/about/\">Bastian</a>\nabout, well, <em>integrating comments via the Fediverse</em>. He explains a solution worked out in 2021 by\n<a href=\"https://fossacademic.tech/2021/12/16/CommentsTest.html\">Robert W. Gehl</a>. I liked this idea and had the blog\npost bookmarked for some time. But I did not get around working it <a href=\"https://github.com/egonw/blog2/commit/006b41822b46a935d159ca276cfab3a6a3b00e40\">out until Monday</a>.\nI had to do some <a href=\"https://github.com/egonw/blog2/commit/cc4f81db22d69d928352ee06ecad889286ae27bf\">CSS fixing</a>\nafter that, but I can now have fediverse-powered comments in my blog by adding a bit of YAML, like this:</p>\n\n<div class=\"language-yaml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"na\">comments</span><span class=\"pi\">:</span>\n  <span class=\"na\">host</span><span class=\"pi\">:</span> <span class=\"s\">social.edu.nl</span>\n  <span class=\"na\">username</span><span class=\"pi\">:</span> <span class=\"s\">egonw</span>\n  <span class=\"na\">id</span><span class=\"pi\">:</span> <span class=\"m\">115009169485329450</span>\n</code></pre></div></div>\n\n<p>I have annotated a few posts now, and will have some curation to do. 
But when I have made such a link,\nthis is what it will look like by default:</p>\n\n<p><img src=\"/assets/images/fedicomments1.png\" alt=\"\" /></p>\n\n<p>After clicking the button (I want to style that a bit better), it shows the fediverse reactions:</p>\n\n<p><img src=\"/assets/images/fedicomments2.png\" alt=\"\" /></p>\n\n<p>There are some things that I would like to improve. For example, clicking the <em>reply</em>, <em>boost</em>, or <em>like</em>\nbuttons doesn’t do anything yet. But there is code around that will show a popup box to redirect you\nto your home fediserver. Let’s see how this evolves.</p>",
      "summary": "My old blog had (has) comments via the Blogger.com platform, but I did not have anything for the new blog. A couple of options are used, like Disqus and comments via GitHub. However, these both have the downside of a vendor lock-in and the whole point of moving my blog was to break out from such lock-ins.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/fedicomments2.png",
      "date_published": "2025-08-13T00:00:00+00:00",
      "date_modified": "2025-08-13T00:00:00+00:00",
      "tags": ["blog","mastodon"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/pr6zx-10397", "doi": "10.59350/pr6zx-10397"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2ss5b-jpr33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/08/11/the-internet-journal-of-chemistry.html",
      "title": "The Internet Journal of Chemistry",
      "content_html": "<p>The <a href=\"https://scholia.toolforge.org/topic/Q27211732\">Internet Journal of Chemistry</a> (IJC, issn:1099-8292) was one of the first scientific journals to get\npublished on the world wide web (part of <em>the Internet</em>), see doi:<a href=\"https://doi.org/10.1080/00987913.2000.10764578\">10.1080/00987913.2000.10764578</a>.\nIssues were published from 1998 to 2004. But because it predates\nsystematic archiving of webpages by libraries, a lot is lost. The nature of the journal, however, makes it unique, and quite\na number of articles are cited a lot, and should be part of the <em>scientific record</em>.\nBut I soon realized it actually is quite hard to track down content of the journal. I knew some articles have been\n<em>author accepted manuscripts</em> online. One of that was my own first (and single) author-article, self-archived on\nZenodo (doi:<a href=\"https://doi.org/10.5281/zenodo.1495470\">10.5281/zenodo.1495470</a>), green open access style.</p>\n\n<p>I wanted to see what I could recover, and here I describe what I did and what could be done next.</p>\n\n<h2 id=\"a-list-of-all-articles\">A list of all articles</h2>\n\n<p>The first step is actually to create a list of all articles published in the IJC and collect as much metadata about\nthem as possible. With just over 100 articles, I decided to use Wikidata, as a machine-readable database, supporting the curation and reporting. I wanted at least\ntwo independent sources, and for Wikidata, use public resources. That means, while Web of Science does have a list of\nall articles, I only used this for validation, and <strong>not</strong> as information source. Instead, I used citations to IJC\narticles and, of course, the Internet Archive (IA). 
It turns out <a href=\"https://web.archive.org/web/*/http://www.ijc.com/abstracts/*\">a query like this</a>\ndoes wonders (well, for the abstracts; I did not find full-texts archived on IA):</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>https://web.archive.org/web/*/http://www.ijc.com/abstracts/*\n</code></pre></div></div>\n\n<p>I found that all but one article had the abstract archived in the IA. Here’s <a href=\"https://web.archive.org/web/20000925050415/http://www.ijc.com/abstracts/abstract2n8.html\">an example</a>:</p>\n\n<p><img src=\"/assets/images/ia_ijc_abstract.png\" alt=\"\" /></p>\n\n<p>This gave me a lot of information to add to Wikidata. Title, publication date, volume, article number, keywords, an abstract,\nand, of course, the list of authors. Some authors I know personally, many I do not. But it did allow me to enter all\narticles into Wikidata along with the authors, as “author” (<a href=\"https://www.wikidata.org/wiki/Property:P50\">P50</a>) or\n“author name string” (<a href=\"https://www.wikidata.org/wiki/Property:P2093\">P2093</a>).</p>\n\n<h2 id=\"the-article-authors\">The article authors</h2>\n\n<p>It also turned out that multiple authors listed their IJC article on their public ORCID profile.\nThat greatly helped identification. I managed to <a href=\"https://w.wiki/Ezda\">link many authors</a> to mostly existing Wikidata items:</p>\n\n<p><img src=\"/assets/images/ijc_authors.png\" alt=\"\" /></p>\n\n<p>I already mentioned that I used Wikidata to collect this information. Besides the <a href=\"https://scholia.toolforge.org/venue/Q27211732\">interactive visualization with Scholia</a>,\nit also gave me the option to track my progress with SPARQL queries. 
For example, <a href=\"https://w.wiki/Ezdf\">this query</a> helped\nme do that author FAIR-ification:</p>\n\n<p><img src=\"/assets/images/ijc_sparql1.png\" alt=\"\" /></p>\n\n<p>You can see here two columns with author information, one for P50 and the other for P2093. There is quite some\nidentification left to be done, and additional information is welcome:</p>\n\n<p><img src=\"/assets/images/ijc_sparql2.png\" alt=\"\" /></p>\n\n<h2 id=\"sources\">Sources</h2>\n\n<p>So, that brings us to this list of sources:</p>\n\n<ul>\n  <li>Internet Archive: abstracts and metadata</li>\n  <li>ORCID profiles: ORCIDs of (some) authors</li>\n  <li>Google Scholar: metadata and citations</li>\n  <li>Web of Science: independent list for external validation</li>\n</ul>\n\n<p>Because there is plenty of work left to be done and I hope the collected information will further spread\nin library collections, I added sources as much as possible. <a href=\"https://w.wiki/Em9i\">This query</a> lists for all\narticles the Web of Science identifier (recorded so that everyone can check the consistency), the link\nto the Internet Archive-d abstract page, and a link to a known full text (five).</p>\n\n<p>If you wonder, neither <a href=\"https://openalex.org/works?page=1&amp;filter=primary_location.source.id:s32147083\">OpenAlex</a>\nnor <a href=\"https://europepmc.org/search?query=JOURNAL%3A%28%22Internet%20Journal%20of%20Chemistry%22%29\">Europe PMC</a> has a full list.</p>\n\n<h2 id=\"whats-next\">What’s next?</h2>\n\n<p>I do not have formal training in archiving, but I am happy with the minimal viable metadata collection.\nI know more can be done (and would love to hear your pointers and suggestions): more author identities,\nbetter coverage of keyword annotation, etc. But I think an important addition is adding citations\nto and from the IJC articles. 
The journal predates efforts like the <a href=\"https://i4oc.org/\">I4OC</a> and\n<a href=\"https://opencitations.net/\">Open Citations</a>, so I may have to manually recover citations from Google Scholar.\nI will have to report on that later. But you can enjoy the citations that are\n<a href=\"https://scholia.toolforge.org/venue/Q27211732#Citations\">already there</a>. And now that we have sufficient metadata,\nI can use this to find more full texts.</p>\n\n<p>Btw, I have made contact with Prof. <a href=\"https://scholia.toolforge.org/author/Q28420106\">Steven Bachrach</a>,\nwho founded the journal and was the Editor-in-Chief.</p>",
      "summary": "The Internet Journal of Chemistry (IJC, issn:1099-8292) was one of the first scientific journals to get published on the world wide web (part of the Internet), see doi:10.1080/00987913.2000.10764578. Issues were published from 1998 to 2004. But because it predates systematic archiving of webpages by libraries, a lot is lost. The nature of the journal, however, makes it unique, and quite a number of articles are cited a lot, and should be part of the scientific record. But I soon realized it actually is quite hard to track down content of the journal. I knew some articles have been author accepted manuscripts online. One of that was my own first (and single) author-article, self-archived on Zenodo (doi:10.5281/zenodo.1495470), green open access style.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ia_ijc.png",
      "date_published": "2025-08-11T00:00:00+00:00",
      "date_modified": "2025-08-11T00:00:00+00:00",
      "tags": ["publishing","wikidata","scholia","europepmc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.1495470", "doi": "10.5281/ZENODO.1495470"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1080/00987913.2000.10764578", "doi": "10.1080/00987913.2000.10764578"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/krw9n-dv417",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/08/09/one-million-iupac-names-4.html",
      "title": "One Million IUPAC names #4: a lot is happening",
      "content_html": "<p>A lot is happening. If you have been following this project more closesly, you may have already seen some interesting updates, but\nI will post it here too. First, a quick recap. In March I started a new <a href=\"http://blueobelisk.org/\">Blue Obelisk</a> project to\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names.html\">collect CCZero IUPAC names</a>\nfrom primary literature (paper still pending). It turned out we can automate that, while legally not violating any laws or licenses.\nIn April I reported on <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/04/27/one-million-iupac-names-2-the-100-thousand-milestone.html\">some tweaks</a>\nboosting the efficiency of the use of the API. I also reported on some possible further steps, including how to use the extracted\nnames to create a larger set. Indeed, in June I could <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/06/09/one-million-iupac-names.html\">report to have passed the 200k IUPAC names</a>,\nwhich with the idea from April gave us more than 1M IUPAC names.</p>\n\n<p>In this post I want to give an update.</p>\n\n<h2 id=\"275k-iupac-names\">275k IUPAC names</h2>\n\n<p>I have continued running the scripts to detect new IUPAC names in full text, open access papers in <a href=\"https://europepmc.org/\">Europe PMC</a>,\nbut something more awesome actually did much more since the <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/06/09/one-million-iupac-names.html\">June post</a>:\nin July I received a <a href=\"https://github.com/BlueObelisk/iupac-names/pull/13\">pull request</a> from <a href=\"https://github.com/mnietfeld\">mnietfeld</a>\nwith more than 40 thousand unique and new IUPAC names from the <a href=\"https://www.beilstein-journals.org/bjoc/\">Beilstein Journal of Organic Chemistry</a>\n(see also <a href=\"https://www.linkedin.com/posts/beilstein-institut_openaccess-bjoc-fair-activity-7351596602660167681-0Z0r/\">their LinkedIn 
post</a> or\n<a href=\"https://archive.is/DZOnP\">this archived version</a> that doesn’t require an account).\nWhile Europe PMC provides these articles too (and actually one of the first I analyzed), a lot of these names come from supplementary\ninformation, not provided by Europe PMC. Thanks!</p>\n\n<p>This is focusing on names from primary literature, but there is more happening. Because I want to restrict the above project to\nnames from primary literature (and supplementary information is still that), I have not been sure what to do with other collections\nyet, and they have been coming in. I have been <a href=\"https://github.com/BlueObelisk/iupac-names/issues?q=is%3Aissue%20label%3Aother\">taking notes</a>\nin the project issue tracker, for future reference (like now, here). I have not forgotten about these!</p>\n\n<h2 id=\"other-large-collections-of-iupac-names\">Other large collections of IUPAC names</h2>\n\n<p><strong>4M, CCZero</strong><br />\nLet’s start with the news yesterday. The <a href=\"https://www.ebi.ac.uk/about/teams/chemical-biology-services/\">Chemical Biology Services team</a>\n<a href=\"https://chembl.blogspot.com/2025/08/unleashing-4-million-iupac-names-into.html\">released 4 million IUPAC names from patent literature as CCZero</a>!\nThe CCZero license/waiver makes it compatible with our list. Their Zenodo release:</p>\n\n<blockquote>\n  <p>… contains IUPAC names text-mined from patents (US, WIPO, EPO, Chinese, Japanese).</p>\n</blockquote>\n\n<p>The post also includes a nice example of the complexity of IUPAC names which makes the counting of unique names tricky:\n<code class=\"language-plaintext highlighter-rouge\">O-methylphenol</code> and <code class=\"language-plaintext highlighter-rouge\">o-methylphenol</code>. 
Thanks, Noel and the rest of the EMBL-EBI team!</p>\n\n<p><strong>2.3 million, CC-BY</strong><br />\nAnd then <a href=\"https://github.com/haydn-jones\">Haydn Jones</a> was one of the earliest <a href=\"https://github.com/BlueObelisk/iupac-names/issues/9\">to chime in</a>,\nand <a href=\"https://doi.org/10.5281/zenodo.15077270\">released 2.3 million IUPAC names</a> under the CC-BY license.</p>\n\n<p><strong>850k, CCZero</strong><br />\nWikidata also turns out to have many IUPAC names. <a href=\"https://github.com/Adafede/\">Adriano</a> found more than 850 thousand IUPAC\nnames, see <a href=\"https://github.com/Adafede/wd-labels-to-iupac\">this project</a>.</p>\n\n<p>Next week I will do some comparisons of the datasets with a clear Creative Commons license.</p>\n\n<h2 id=\"even-more\">Even more</h2>\n\n<p>Beyond these five data releases, there is more. PubChem and other databases have millions of names, but often these are\ngenerated by proprietary software. These IUPAC name collections may be under some license agreement, and thus not compatible\nwith Open Science. This is why it is so important that we very clearly know where these names are coming from.</p>\n\n<p><strong>5-6 million, license unclear</strong><br />\nI also learned about <a href=\"https://chempile.lamalab.org/\">ChemPile</a> about which <a href=\"https://www.linkedin.com/in/adrian-mirza-chem/\">Adrian Mirza</a>\nexplained to me that it has <a href=\"https://www.linkedin.com/feed/update/urn:li:activity:7330626142611062784\">about 5-6 million IUPAC names</a>.\nBut the source of this list of names is not yet clear to me.</p>\n\n<p><strong>Names from PhD theses and preprints</strong><br />\nI also want to give a shout out to <a href=\"https://github.com/BlueObelisk/iupac-names/issues/15\">Peter Murray-Rust</a>’s proposal\nto start extracting IUPAC names from PhD theses. There have been projects to extract chemistry from PhD theses in the\npast, and this will yield a lot of unique names. 
Please ping Peter, if you want to get involved in his idea!</p>\n\n<h2 id=\"whats-next\">What’s next</h2>\n\n<p>I am so excited about all these efforts and very grateful for the contribution by Beilstein. I really hope more Open Science\npublishers will follow, like perhaps the Royal Society of Chemistry for which it should be easy, with their\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html\">Project Prospect</a> background!</p>\n\n<p>I am also excited by the release by ChEMBL under CCZero. That will allow the <a href=\"https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry\">WikiProject Chemistry</a>\nto use this for Wikidata!</p>\n\n<p>So, I have one week left to write the article about the work we started in March. The outlook is bright. I played last\nweek with the Europe PMC full text downloads and can confirm that should yield thousands of additional names from the\nfull texts. A single download file gave me more than two thousand new unique names. I think 500k IUPAC names\nis absolutely within reach with purely the full texts from Europe PMC.</p>\n\n<p>This brings us to the end of 2025. By then, we should have many millions of openly-licensed IUPAC names.\nAnd by March 2026, I hope we will have reached 1M IUPAC names extracted from primary literature. That will require some\ncreativity and enthusiasm, but sounds feasible!</p>",
      "summary": "A lot is happening. If you have been following this project more closesly, you may have already seen some interesting updates, but I will post it here too. First, a quick recap. In March I started a new Blue Obelisk project to collect CCZero IUPAC names from primary literature (paper still pending). It turned out we can automate that, while legally not violating any laws or licenses. In April I reported on some tweaks boosting the efficiency of the use of the API. I also reported on some possible further steps, including how to use the extracted names to create a larger set. Indeed, in June I could report to have passed the 200k IUPAC names, which with the idea from April gave us more than 1M IUPAC names.",
      
      "date_published": "2025-08-09T00:00:00+00:00",
      "date_modified": "2025-09-03T00:00:00+00:00",
      "tags": ["iupac","beilstein","chembl","europepmc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.16755947", "doi": "10.5281/zenodo.16755947"
            , "cito":
              
              
                [ 
                  "citesAsRecommendedReading"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.15077270", "doi": "10.5281/zenodo.15077270"
            , "cito":
              
              
                [ 
                  "citesAsRecommendedReading"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vwd81-p8z85",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/08/06/archiving-but-not-really.html",
      "title": "Archiving, but not really",
      "content_html": "<p><a href=\"https://sauropods.win/@mike\">Mike Taylor</a> wrote up <a href=\"https://doi.org/10.59350/svpow.24000\">a post</a> about the various things a journal article is doing,\nthe first being <em>a scientific report</em>. We put a lot of money in establishing a scientific track record. In the past 30 years\nhow we publish our research and how we archive it has changed significantly. If you read my blog more often, you know I have\nbeen critical of the performance of many publishers. Springer Nature was so disappointing that after 5 years I\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2021/06/11/conflict-of-interest-or-why-i-am.html\">stepped down</a>\nas Editor-in-Chief (of two) of the <a href=\"https://en.wikipedia.org/wiki/Journal_of_Cheminformatics\">Journal of Cheminformatics</a>.\nThere is so much that must be <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/09/16/publishing.html\">done better</a>.</p>\n\n<p>But in the most recent iteration, triggered by some work for <a href=\"https://www.wikipathways.org/\">WikiPathways</a>, I was using\n<a href=\"https://europepmc.org/\">Europe PMC</a> to find articles that\nmention <em>WikiPathways</em> and then search in the full text for the string <code class=\"language-plaintext highlighter-rouge\">WP</code>, as a trigger for the possible mention of\nWikiPathways pathway identifiers, which look like <code class=\"language-plaintext highlighter-rouge\">WP4846</code>. The use of <em>compact (resource) identifiers</em>\n(see doi:<a href=\"https://doi.org/10.1038/sdata.2018.29\">10.1038/sdata.2018.29</a>) is minimal, but at least some articles use identifiers.</p>\n\n<p>That allows me to extend our WikiPathways knowledge graph of <a href=\"https://www.wikipathways.org/browse/citedin\">articles citing specific pathways</a>.\nAt the time of writing, we collected 2509 citations from 440 different articles to 883 different pathways. 
Now,\nI want to blog about that more, but this post is about a related observation.</p>\n\n<h2 id=\"information-loss\">Information loss</h2>\n<p>Now, back in the late nineties I learned about GNU/Linux and after playing with Red Hat and Suse, I settled on Debian.\nOne of the things I learned is that, generally, information corruption (like data loss) is an absolute red flag, a no-go,\na total showstopper.</p>\n\n<p>And then we have this in publishing, the one area where data corruption must also be a no-go:</p>\n\n<p><img src=\"/assets/images/imageResolutionLoss.png\" alt=\"\" /></p>\n\n<p>In this image, the left side shows a screenshot of the publisher version of the article and on the right side\nthe version in <a href=\"https://pmc.ncbi.nlm.nih.gov/\">PubMed Central</a> (PMC). PMC has been an important project to archive full text versions of articles:</p>\n\n<blockquote>\n  <p>11.2 million articles are archived in PMC.</p>\n</blockquote>\n\n<p>So, this is <strong>really bad</strong>! The archived version is not really useful. As a human I already struggle to read the\ndegraded image, let alone an algorithm.</p>\n\n<p>Does that matter? Yes, projects like the awesome\n<a href=\"https://pfocr.wikipathways.org/\">Pathway Figure OCR</a> (see doi:<a href=\"https://doi.org/10.1186/s13059-020-02181-2\">10.1186/s13059-020-02181-2</a>)\ndepend on images being FAIR enough to extract information. (Side note: yes, these images should be vector\ngraphics, but commercial publishers decided about twenty years ago that they could not care enough.)</p>\n\n<p>At this moment, I do not know where the information is lost. Maybe PubMed Central is storing the images in a low\nresolution. Maybe the publisher provides PMC with a low resolution image. But to me, this must be solved as soon\nas possible. 
This is utterly unacceptable.</p>\n\n<p>I wonder what the authors of the article (doi:<a href=\"https://doi.org/10.1186/s13287-025-04166-z\">10.1186/s13287-025-04166-z</a>)\nI took as example think of this.</p>",
      "summary": "Mike Taylor wrote up a post about the various things a journal article is doing, the first being a scientific report. We put a lot of money in establishing a scientific track record. In the past 30 years how we publish our research and how we archive it has changed significantly. If you read my blog more often, you know I have been critical of the performance of many publishers. Springer Nature was so disappointing that after 5 years I stepped down as Editor-in-Chief (of two) of the Journal of Cheminformatics. There is so much that must be done better.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/imageResolutionLoss.png",
      "date_published": "2025-08-06T00:00:00+00:00",
      "date_modified": "2025-08-06T00:00:00+00:00",
      "tags": ["publishing","europepmc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/svpow.24000", "doi": "10.59350/svpow.24000"
            , "cito":
              
              
                [ 
                  "citesAsRecommendedReading"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13059-020-02181-2", "doi": "10.1186/s13059-020-02181-2"
            , "cito":
              
              
                [ 
                  "citesAsRecommendedReading"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13287-025-04166-z", "doi": "10.1186/s13287-025-04166-z"
            , "cito":
              
              
                [ 
                  "describes"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/sdata.2018.29", "doi": "10.1038/sdata.2018.29"
            , "cito":
              
              
                [ 
                  "obtainsBackgroundFrom"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fhb6w-2ge30",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/07/06/pfas-in-the-blood-of-the-dutch-population.html",
      "title": "PFAS in the blood of the Dutch population",
      "content_html": "<p>A recent report by the Dutch <a href=\"https://www.rivm.nl/\">RIVM</a>, <em>PFAS in the blood of the Dutch population</em>\n(doi:<a href=\"https://www.rivm.nl/bibliotheek/rapporten/2025-0094.pdf\">10.21945/RIVM-2025-0094</a>), writes\nthat seven <a href=\"https://scholia.toolforge.org/chemical-class/Q648037\">PFAS</a> compounds are found in blood samples\nof all tested people. Another nine compounds are found in at least 1-in-10 people.\nBecause there is relevant data in the report on the 28 studied PFAS compound, I wanted to\nhave the report more FAIR than it is on the website. Why this report? Well, the chemistry and the\nhistory is fascinating and brutal (I like <a href=\"https://www.youtube.com/watch?v=SC2eSujzrUY\">this Veritasium video</a>).</p>\n\n<p>The history tells me that our society may sound woke and leftish, in reality it is a continous fight\nfor basic human rights. (Something that plenty have been saying for years.)\nIn this case, a healtht life is the human right.</p>\n\n<p>So, what can I do to make this report more FAIR?</p>\n\n<h2 id=\"findable-in-wikidata\">Findable in Wikidata</h2>\n\n<p>Since this report has been <a href=\"https://news.google.com/search?q=PFAS%20in%20the%20blood%20of%20the%20Dutch%20population&amp;hl=en-US&amp;gl=US&amp;ceid=US%3Aen\">mentioned in the news</a>,\nit clearly is notable. The simplest thing to do is thus to just add it <a href=\"https://www.wikidata.org/wiki/Wikidata:Main_Page\">Wikidata</a>.\nBecause the DOI of the report had not been recorded yet, I could not let <a href=\"https://scholia.toolforge.org/\">Scholia</a>\ndo it for me. But doing it manually is only a bit more work: <a href=\"https://www.wikidata.org/wiki/Q135222054\">Q135222054</a>.\nThe provided metadata <a href=\"https://www.rivm.nl/en/news/first-nationwide-study-into-pfas-in-blood\">on the RIVM website</a>\nis minimal.</p>\n\n<p>But we can do more. 
Particularly, because I want people to find this report when they look for knowledge\nabout the 28 studied chemicals, I added <a href=\"https://www.wikidata.org/wiki/Q135222054#P921\">main subject</a> annotation\nusing the information in <em>Table 1</em> in the report. Using Scholia and the CAS registry number in the table,\nI cross-checked that the information in Wikidata is consistent with the report (and vice versa). It was.\nI then added the Dutch name and acronym for most of them. Some already had the name as in the Table.\nThat gives us a nice “Topic scores” plot for <a href=\"https://scholia.toolforge.org/work/Q135222054\">the Scholia page of the report</a>:</p>\n\n<p><img src=\"/assets/images/pfas_report.png\" alt=\"\" /></p>\n\n<p>The central PFAS bubble is also only one <em>main subject</em> but larger because many of the specific PFAS compounds\nare subclassing PFAS. And you may also note many smaller bubbles. These actually come from <em>main subject</em>\nannotations of articles cited from the report, because I added a few of them too. Not all, because many are\nnot in Wikidata (yet).</p>\n\n<h2 id=\"findable-in-wikipathways\">Findable in WikiPathways</h2>\n\n<p>But since 16 of these compounds are readily found in human blood samples, that is handy knowledge when\ndoing metabolomics (on blood samples). Or (and I leave that to a later blog post), we can map the experimental\ndata for Dordrecht versus the rest of The Netherlands to the PFAS compounds. That is relevant to research\nby <a href=\"https://vhp4safety.nl\">VHP4Safety</a>. There are many ways to see if you have PFAS in your dataset,\nbut since we have many controlled lists of genes and metabolites, I added one for common PFAS in human\nblood samples. Well, the 16 common in Dutch blood samples:</p>\n\n<p><img src=\"/assets/images/pfas_wikipathways.png\" alt=\"\" /></p>\n\n<p>Each <em>metabolite</em> here is annotated with its Wikidata identifier, allowing us to map experimental\ndata on top of it. 
And we get links out to other databases almost for free:</p>\n\n<p><img src=\"/assets/images/pfas_wikipathways_outlinks.png\" alt=\"\" /></p>\n\n<p>And the link to Wikidata actually links to Scholia, so for the PFOA in the above example,\nwe can quickly see the boiling point, decomposition point, and melting point of this PFAS.\nAnd literature with undoubtedly even more knowledge about this PFAS:</p>\n\n<p><img src=\"/assets/images/pfas_scholia.png\" alt=\"\" /></p>\n\n<p>Now, these two steps were mostly manual: drawing <a href=\"https://classic.wikipathways.org/index.php/Pathway:WP5579\">WP5579</a>\nin WikiPathways and adding the report annotations (<em>main subject</em> and <em>cites</em>) in Wikidata.</p>\n\n<h2 id=\"findable-in-the-vhp4safety-compound-wiki\">Findable in the VHP4Safety Compound Wiki</h2>\n\n<p>As part of the VHP4Safety project, I am collecting information on chemicals studied in the context\nof toxicology, safety, and risk assessment. Often these are specific collections of compounds studied as a whole.\nThis report is such a collection and provides experimental data on these compounds. So, I want this\nreport to be findable for the <a href=\"https://compoundcloud.wikibase.cloud/\">VHP4Safety Compound Wiki</a> too.\nCreating the collection is a manual step: <a href=\"https://compoundcloud.wikibase.cloud/wiki/Item:Q5145\">Q5145</a>.</p>\n\n<p>Now, because both Wikidata and our VHP4Safety Compound Wiki (a Wikibase instance) are semantic and support SPARQL, I can use it\nto create instructions to link the 28 compounds to the new collection. Now, arguably, that can be\ndone manually too, and maybe faster, but for larger collections this is harder. 
So, I dug up\n<a href=\"https://compoundcloud.wikibase.cloud/wiki/User:Egonw\">my earlier notes</a> and got some useful\nthings together.</p>\n\n<p>This query lists all 28 PFAS linked to the report <a href=\"https://w.wiki/Eepm\">in Wikidata</a>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?pfas</span><span class=\"w\"> </span><span class=\"nv\">?pfasLabel</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q135222054</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P921</span><span class=\"w\"> </span><span class=\"nv\">?pfas</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?pfas</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P31</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q113145171</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">SERVICE</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nn\">bd</span><span class=\"o\">:</span><span class=\"ss\">serviceParam</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">language</span><span class=\"w\"> </span><span class=\"s2\">\"[AUTO_LANGUAGE],mul,en\"</span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span 
class=\"w\">\n</span></code></pre></div></div>\n\n<p>Using federation powers, I can use this for <a href=\"https://edu.nl/ar9wf\">a SPARQL query</a> to match these up with our Wikibase,\nand return the results in QuickStatements that say <em>this VHP compound is part of the VHP collection</em>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wb</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;https://compoundcloud.wikibase.cloud/entity/&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wbt</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;https://compoundcloud.wikibase.cloud/prop/direct/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">SUBSTR</span><span class=\"p\">(</span><span class=\"nb\">STR</span><span class=\"p\">(</span><span class=\"nv\">?cmp</span><span class=\"p\">),</span><span class=\"mi\">45</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?qid</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"nv\">?P21</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nn\">wbt</span><span class=\"o\">:</span><span class=\"ss\">P5</span><span class=\"w\"> </span><span class=\"nv\">?wikidata</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">SERVICE</span><span class=\"w\"> </span><span class=\"nn\">&lt;https://query.wikidata.org/sparql&gt;</span><span class=\"w\"> </span><span class=\"p\">{</span><span 
class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q135222054</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P921</span><span class=\"w\"> </span><span class=\"nv\">?pfas</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?pfas</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P31</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q113145171</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"k\">BIND</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">substr</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?pfas</span><span class=\"p\">),</span><span class=\"mi\">32</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?wikidata</span><span class=\"p\">)</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">BIND</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"s2\">\"Q5145\"</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?P21</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>I actually had to add 5 PFAS compounds in the VHP4Safety Compound Wiki first. 
That follows the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html\">same procedure for how I have been adding chemical compounds to Wikidata</a>\n(see also <a href=\"https://doi.org/10.26434/chemrxiv-2025-53n0w\">this preprint</a>).\nThe input <code class=\"language-plaintext highlighter-rouge\">cas.smi</code> has the (missing) SMILES, Wikidata QID, and English label:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>C(CS(=O)(=O)O)C(C(C(C(C(C(F)(F)F)(F)F)(F)F)(F)F)(F)F)(F)F       Q27063662       6:2 Fluorotelomer sulfonate\nCN(CC(=O)O)S(=O)(=O)C(C(C(C(C(C(F)(F)F)(F)F)(F)F)(F)F)(F)F)(F)F Q126605979      MeFHxSAA\nCN(CC(=O)O)S(=O)(=O)C(C(C(C(F)(F)F)(F)F)(F)F)(F)F       Q126682412      MeFBSAA\nC(=O)(C(C(F)(F)F)(F)OC(C(C(F)(F)F)(F)F)(F)F)O[H]        Q29387971       2,3,3,3-tetrafluoro-2-(heptafluoropropoxy)propanoic acid\nC(C(C(=O)O)(F)F)(OC(C(C(OC(F)(F)F)(F)F)(F)F)(F)F)F      Q81981675       4,8-Dioxa-3H-perfluorononanoic acid\n</code></pre></div></div>\n\n<p>For reference, this is the command line I used to create QuickStatement instructions:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>groovy createWDitemsFromSMILES.groovy <span class=\"nt\">-w</span> compoundcloud.wikibase.cloud <span class=\"nt\">-c</span> Q2368 <span class=\"nt\">-d</span> P5 <span class=\"nt\">-l</span> <span class=\"nt\">-i</span> wikidata <span class=\"nt\">-a</span> P11\n</code></pre></div></div>\n\n<h2 id=\"final-remark\">Final remark</h2>\n\n<p>Are these 16 the only PFAS in our body? With 28 studied out of <a href=\"https://doi.org/10.1021/acs.est.3c04855\">a potential seven million</a>,\nI doubt it.</p>",
      "summary": "A recent report by the Dutch RIVM, PFAS in the blood of the Dutch population (doi:10.21945/RIVM-2025-0094), writes that seven PFAS compounds are found in blood samples of all tested people. Another nine compounds are found in at least 1-in-10 people. Because there is relevant data in the report on the 28 studied PFAS compound, I wanted to have the report more FAIR than it is on the website. Why this report? Well, the chemistry and the history is fascinating and brutal (I like this Veritasium video).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pfas_report.png",
      "date_published": "2025-07-06T00:00:00+00:00",
      "date_modified": "2025-07-06T00:00:00+00:00",
      "tags": ["pfas","chemistry","fair","scholia","wikidata","vhp4safety"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.26434/CHEMRXIV-2025-53N0W", "doi": "10.26434/CHEMRXIV-2025-53N0W"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/acs.est.3c04855", "doi": "10.1021/acs.est.3c04855"
            , "cito":
              
              
                [ 
                  "citesAsRecommendedReading"
                  
                 ]
              
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Virtual Human Platform for Safety Assessment", "acronym" : "VHP4Safety", "uri" : "drc.filenumber:nwa129219272" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c4k5q-h8849",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/06/29/curation-is-an-essential-part-of-doing-research.html",
      "title": "Curation is an essential part of doing research",
      "content_html": "<p>Depending on your exact definition of doing science, keeping track as precise as possible of your observations\nis an essential part of doing science. The precision should be high enough that mistakes are obvious. This pattern is,\nof course, not limited to doing science and we see this in open source development too. Unfortunately, in the\nmodern way of doing science, this is not getting the attention it should get. Worse, with narratives (stories)\nabout the research, in the form of journal articles, are generally considered more important that a precise\ndescription of the observations.</p>\n\n<p>Is that a big issue? Hell, yes. Where do you think the FAIR ideas came from? And why FAIR in ten years has not\nbrought about the change it was hoping for?</p>\n\n<p>For me, my fascination for curation started as a student, around 1995, with the <em>Dictionary on Organic Chemistry</em>.\nAt that time, my interest came from wanting to learn about chemistry and biology. During my M.Sc. and PhD, it was\nobvious how essential it was to derivating correct scientific conclusions from your experiment. Data, knowledge,\nand software alike, imo. And because curation is expensive, not having to repeat it, I prefer to do it as\nOpen Science.</p>\n\n<h2 id=\"curation\">Curation</h2>\n\n<p>Of course, curation has been part of doing science, but to a large extens is separate step from doing science.\nIt is done by database developers, librarians, and chemo- and bioinformaticians. For example, Chemical Abstracts\nService (CAS) <a href=\"https://en.wikipedia.org/wiki/Chemical_Abstracts_Service\">started over 100 years ago</a> and started\nindexing chemical structures in 1965. 
The curation is an ongoing process, <a href=\"https://chem-bla-ics.linkedchemistry.info/2022/05/22/new-cas-common-chemistry-in-2021.html\">also for old records</a>.</p>\n\n<p><a href=\"https://www.biocuration.org/dissemination/who-are-we/\">Biocuration</a> is getting\n<a href=\"https://scholia.toolforge.org/topic/Q54987878#publications-per-year\">more and more attention</a>:</p>\n\n<p><img src=\"/assets/images/biocuration.png\" alt=\"\" /></p>\n\n<p>The recognition and reward that come from having the <a href=\"https://www.biocuration.org/\">International Society for Biocuration</a>\n(ISB, <a href=\"https://scholia.toolforge.org/organization/Q23809291\">Scholia page</a>) should not be underestimated\n(doi:<a href=\"https://doi.org/10.1038/455047A\">10.1038/455047A</a>). Their <a href=\"https://scholia.toolforge.org/event-series/Q106486148\">Annual International Biocuration Conferences</a>\nhave been running since <a href=\"https://scholia.toolforge.org/event/Q109408101\">2005</a>. And with their\nawards, they give the biocuration work recognition and, literally, rewards:</p>\n\n<ul>\n  <li><a href=\"https://scholia.toolforge.org/award/Q106045191\">Biocuration Career Award</a> (2016-2021)</li>\n  <li><a href=\"https://scholia.toolforge.org/award/Q118947746\">Excellence in Biocuration Early Career Award</a> (2022-)</li>\n  <li><a href=\"https://scholia.toolforge.org/award/Q119882229\">Excellence in Biocuration Advanced Career Award</a> (2022-)</li>\n  <li><a href=\"https://scholia.toolforge.org/award/Q106045103\">Exceptional Contribution to Biocuration Award</a> (2017-)</li>\n</ul>\n\n<h2 id=\"my-curation-curriculum-vitae\">My curation Curriculum Vitae</h2>\n\n<p>I don’t have a good <em>curation CV</em>. To a large extent, that is because the curation has been part of a study. The curation\nitself does not get recognized, and only the <em>journal article</em> does. 
With datasets slowly getting more recognition,\nso does data curation, but data curation is not really part of how we do FAIR at this moment, and via this route\nis not getting the attention it deserves.</p>\n\n<p>But since I have been updating <a href=\"https://egonw.github.io/cv/\">my CV anyway</a>, I dug up some curation I am proud\nof:</p>\n\n<ul>\n  <li>the Dictionary on Organic Chemistry, which no longer exists, but it started my Open Science chemistry research</li>\n  <li>the <a href=\"Blue Obelisk Data Repository\">Blue Obelisk Data Repository</a> (BODR), which has been part of various\nGNU/Linux distributions (see also doi:<a href=\"https://doi.org/10.1021/ci050400b\">10.1021/ci050400b</a>).\nA new version is <a href=\"https://chem-bla-ics.blogspot.com/2013/08/the-blue-obelisk-data-repositorys-10.html\">long overdue</a></li>\n  <li>I contributed hundreds of NMR spectra with uncommon nuclei to <a href=\"https://sourceforge.net/projects/nmrshiftdb2/files/data/\">NMRShiftDb</a></li>\n  <li>Wikidata, see <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/05/25/new-preprint-scholia-chemistry-access-to-chemistry-in-wikidata.html\">this preprint</a>,\nbut also many small projects, like adding CXSMILES for polymers, and <a href=\"https://laurendupuis.github.io/Scholia_tutorial/\">main subject annotation in Scholia</a></li>\n  <li>WikiPathways (see <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/wikipathways\">these blog posts</a>), where I started\n<a href=\"https://classic.wikipathways.org/index.php?title=Special:Contributions&amp;dir=prev&amp;target=Egonw&amp;month=&amp;year=\">curating metabolites in 2012</a>,\nset up <a href=\"https://chem-bla-ics.linkedchemistry.info/2018/10/11/two-presentations-at-wikipathways-2018.html\">a computer-assisted curation platform</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2016/07/02/two-apache-jena-sparql-query.html\">using SPARQL</a>, and\nwas an early curator of <a 
href=\"https://chem-bla-ics.linkedchemistry.info/2020/10/31/sars-cov-2-covid-19-and-open-science.html\">SARS-CoV-2 biological processes</a></li>\n  <li>citation intent annotation with the Citation Typing Ontology, see this <a href=\"https://scholia.toolforge.org/cito/\">Scholia overview</a></li>\n  <li>nanosafety ontology and data: the <a href=\"https://github.com/enanomapper/ontologies\">eNanoMapper Ontology</a> (ENMO),\n<a href=\"https://figshare.com/search?q=nanowiki\">NanoWiki</a>, <a href=\"https://nanocommons.github.io/specifications/jrc/\">JRC nanomaterial index</a> and\n<a href=\"https://nanocommons.github.io/erm-database/\">the ERM identifier database</a></li>\n  <li>made RDF for supplementary information (e.g. <a href=\"http://chem-bla-ics.linkedchemistry.info/2018/09/16/data-curation-5-inspiration-95.html\">this NanoE-Tox spreadsheet</a>) and\nfull databases, like <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/04/21/chembl-09-as-rdf.html\">ChEMBL</a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/09/04/nmrshiftdb-enters-rdfopenmoleculesnet-2.html\">NMRShiftDb <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>organized <a href=\"https://chem-bla-ics.linkedchemistry.info/2019/10/14/chemcuration-2019-poster-conference.html\">an online ChemCuration event</a> (inspired by the ISB annual meetings!)</li>\n</ul>\n\n<p>I am also curating my blog, which was <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/08/18/last-post-here-freebie-model-online.html\">originally on blogger.com but is being ported to Markdown with extra annotation</a>.\nThat includes <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html\">updating URLs</a>\nand annotation of blog posts <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/21/viagra-saves-environment.html\">with chemicals</a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/10/24/vhp4safety.html\">grants</a>, and\n<a 
href=\"https://chem-bla-ics.linkedchemistry.info/2025/02/08/cito-for-blog-citations.html\">intention-typed citations</a>.</p>\n\n<h2 id=\"long-tail\">Long tail</h2>\n\n<p>Of course, I have my Wikipedia edits, and have contributed to projects like <a href=\"https://github.com/biopragmatics/bioregistry/commits/main/?author=egonw\">Bioregistry.io</a>,\n<a href=\"https://fairsharing.org/users/596\">FAIRsharing</a>, regularly submit <a href=\"https://form.typeform.com/to/SWoxIY?typeform-source=altmetric.typeform.com\">missed mentions to Altmetric.com</a>,\netc. There is a long tail in curation. And there is a lot of curation hidden in <a href=\"https://scholar.google.com/citations?user=u8SjMZ0AAAAJ&amp;hl=en\">my literature list</a>.</p>\n\n<p>And that long tail matters to me. I want every researcher to pick up the challenge to curate their own\nresearch output. Put your experimental data in databases, add important provenance, get the details right.\nThis is essential to reduce the cost of doing research, and that is more important than ever.</p>\n\n<p>BTW, I must note that our bioinformatics team colleagues too have done a tremendous amount of biocuration,\nin WikiPathways (<a href=\"https://scholia.toolforge.org/author/Q43744369\">Denise</a>, <a href=\"https://scholia.toolforge.org/author/Q28025534\">Freddie</a>,\n<a href=\"https://scholia.toolforge.org/author/Q19851164\">Susan</a>), in nanosafety (<a href=\"https://scholia.toolforge.org/author/Q99306396\">Jeaphianne</a>,\n<a href=\"https://scholia.toolforge.org/author/Q86442640\">Ammar</a>), and in toxicology (<a href=\"https://scholia.toolforge.org/author/Q42369611\">Marvin</a>),\njust to name a few. Often together with B.Sc. and M.Sc. 
students (which <a href=\"https://europepmc.org/article/med/26557796\">can work really well</a>).</p>\n\n<h2 id=\"award-nomination\">Award nomination</h2>\n\n<p>And I hope this makes it clear why I am delighted to have been <a href=\"https://www.biocuration.org/community/biocuration-career-awards/excellence-in-biocuration-advanced-career-award-2025/\">nominated last week</a>\nfor an ISB <em>Excellence in Biocuration Advanced Career Award</em>. The list of past awardees is impressive,\nas are the other nominations:\n<a href=\"https://scholia.toolforge.org/author/Q89869027\">Laurel Cooper</a>, Oregon State University/USA,\n<a href=\"https://scholia.toolforge.org/author/Q57227590\">Steven Marygold</a>, University of Cambridge/UK,\n<a href=\"https://scholia.toolforge.org/author/Q111430202\">Saurabh Raghuvanshi</a>, University of Delhi/India, and\n<a href=\"https://scholia.toolforge.org/author/Q59674797\">Kimberly Van Auken</a>, California Institute of Technology/USA.</p>\n\n<p>It’s an honor to be listed alongside these other nominees and being nominated is a great recognition! With a\n<em>thank you</em> to the person who proposed my nomination.</p>",
      "summary": "Depending on your exact definition of doing science, keeping track as precise as possible of your observations is an essential part of doing science. The precision should be high enough that mistakes are obvious. This pattern is, of course, not limited to doing science and we see this in open source development too. Unfortunately, in the modern way of doing science, this is not getting the attention it should get. Worse, with narratives (stories) about the research, in the form of journal articles, are generally considered more important that a precise description of the observations.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/biocuration.png",
      "date_published": "2025-06-29T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["curation","openscience","nmrshiftdb","europepmc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/455047A", "doi": "10.1038/455047A"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI050400B", "doi": "10.1021/CI050400B"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/18gr7-6mx88",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/06/22/all-biohackrxiv-preprints-and-biohackathon-rss-feeds.html",
      "title": "All BioHackrXiv preprints and BioHackathon RSS feeds",
      "content_html": "<p>One thing I was still missing in <a href=\"https://biohackrxiv.org\">BioHackrXiv</a> was a place with an overview\nof: 1. all biohackathons, 2. all preprints linked to a biohackathon, 3. an RSS feed for new papers of a biohackathon.\nOf course, there is the <a href=\"https://biohackrxiv.org/discover\">BioHackrXiv discover</a> service, but the biohackathon\nis not a metadata field and I cannot filter based on it. And, of course, there is Scholia, but not all preprints\nare notable (so far, a good number had CiTO annotation that at least made them somewhat notable). Thus,\nthey are not all listed in <a href=\"https://scholia.toolforge.org/venue/Q115450084\">this venue page</a> and neither\non <a href=\"https://scholia.toolforge.org/event-series/Q109379759\">this overview of collections of preprints linked to BioHackathon Europe meetings</a>.</p>\n\n<p>Additionally, it also does not have an RSS feed, and\n<a href=\"https://pluralistic.net/2024/10/16/keep-it-really-simple-stupid/read-receipts-are-you-kidding-me-seriously-fuck-that-noise\">we should indeed be using RSS more</a>.\nSo, I hacked something up and impressions were positive. Based on Jekyll and the experiences I had with this\nblog, I modelled individual articles as blog posts and biohackathons as tags. 
That automatically gave me\nthe RSS feeds:</p>\n\n<ul>\n  <li><a href=\"https://index.biohackrxiv.org/feed.xml\">feed for new BioHackrXiv preprints</a> (or <a href=\"https://index.biohackrxiv.org/feed.json\">this JSON Feed</a>)</li>\n  <li><a href=\"https://index.biohackrxiv.org/feed/by_tag/BH21EU.xml\">feed for Europe BioHackathon 2021</a> (based on <a href=\"https://index.biohackrxiv.org/tag/BH21EU\">the BH21EU tag</a>)</li>\n</ul>\n\n<p>It’s still very much in progress, but it’s now live at <a href=\"https://index.biohackrxiv.org/\">index.biohackrxiv.org</a>:</p>\n\n<p><img src=\"/assets/images/biohackrxiv_index.png\" alt=\"\" /></p>\n\n<h2 id=\"extras\">Extras</h2>\n\n<p>Some extra goodies already there include:</p>\n\n<ul>\n  <li>the <a href=\"https://en.wikipedia.org/wiki/Altmetrics\">Altmetric.com donut</a></li>\n  <li>links to <a href=\"https://scholia.toolforge.org/\">Scholia</a></li>\n  <li>links to <a href=\"https://europepmc.org/\">Europe PMC</a></li>\n</ul>\n\n<p>The <a href=\"https://index.biohackrxiv.org/tags/\">overview of biohackathons</a> looks like this (the tag size follows the number\nof preprints for that biohackathon):</p>\n\n<p><img src=\"/assets/images/biohackrxiv_biohackathons.png\" alt=\"\" /></p>",
      "summary": "One thing I was still missing in BioHackrXiv was a place with an overview of: 1. all biohackathons, 2. all preprints linked to a biohackathon, 3. an RSS feed for new papers of a biohackathon. Of course, there is the BioHackrXiv discover service, but the biohackathon is not a metadata field and I cannot filter based on it. And, of course, there is Scholia, but not all preprints are notable (so far, a good number had CiTO annotation that at least made them somewhat notable). Thus, they are not all listed in this venue page and neither on this overview of collections of preprints linked to BioHackathon Europe meetings.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/biohackrxiv_index.png",
      "date_published": "2025-06-22T00:00:00+00:00",
      "date_modified": "2025-06-22T00:00:00+00:00",
      "tags": ["biohackrxiv","europepmc"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6f7he-kxt56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/06/09/one-million-iupac-names.html",
      "title": "One Million IUPAC names #3: the 200 thousand milestone and 1 million IUPAC names",
      "content_html": "<p>I could not find the time earlier to report (<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/06/08/iccs2025-1-back-in-noordwijkerhout.html\">reason</a>),\nbut three weeks ago we passed the fourth milestone release of the CCZero IUPAC names found in literature collection. This release contains\n200026 IUPAC names, 168702 unique names, reflecting 116207 unique InChIKeys. Time for an update of the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names.html\">One Million IUPAC names</a> project.</p>\n\n<p>The current count actually is just above 230 thousand IUPAC names, but further growth may require new approaches,\nsuch as the <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/04/27/one-million-iupac-names-2-the-100-thousand-milestone.html\">four ideas</a>\nI posted earlier. I have gone through all full-text Open Access articles provided by the <a href=\"https://europepmc.org/RestfulWebService\">Europe PMC API</a>.\nNow, this list is not static, but I wanted to start using their <a href=\"https://europepmc.org/downloads\">bulk downloads</a> anyway.</p>\n\n<h2 id=\"the-current-results\">The current results</h2>\n\n<p>I have been looking at the names coming in. Some are short, others long. The complexity is fascinating and I will\nhave to brush up my cheminformatics skills to make chemical space splots and visualize the structural diversity.\nI also note the current workflow does a good job at unicode characters, and we have plenty of names\nlike <code class=\"language-plaintext highlighter-rouge\">ε,ε-carotene-3,3’-dione</code>. 
There are also names that I do not expect to be really valid, like\n<code class=\"language-plaintext highlighter-rouge\">hydroxymethyl methacrylate-</code> that end with a hyphen (41 in total), but their overall count is low.\nAnd OPSIN is happy with it, so the name fits the rules.</p>\n\n<p>The ten longest names (so far) are these (with the lengths 322, 324, 332, 357, 371, 373, 376, 421, 429, and 626):</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>(5Z)-3-ethyl-5-[[4-[15-[7-[(Z)-(3-ethyl-4-oxo-2-sulfanylidene-1,3-thiazolidin-5-ylidene)methyl]-2,1,3-benzothiadiazol-4-yl]-9,9,18,18-tetra(nonyl)-5,14-dithiapentacyclo[10.6.0.03,10.04,8.013,17]octadeca-1(12),2,4(8),6,10,13(17),15-heptaen-6-yl]-2,1,3-benzothiadiazol-7-yl]methylidene]-2-sulfanylidene-1,3-thiazolidin-4-one\n(Z)-[[4-[[(Z)-N’-carbamoyl-N-[2-[2-[2-[[3-[(4S)-6,8-dichloro-2-methyl-3,4-dihydro-1H-isoquinolin-4-yl]phenyl]sulfonylamino]ethoxy]ethoxy]ethyl]carbamimidoyl]amino]butylamino]-[2-[2-[2-[[3-[(4S)-6,8-dichloro-2-methyl-3,4-dihydro-1H-isoquinolin-4-yl]phenyl]sulfonylamino]ethoxy]ethoxy]ethylamino]methylene]urea dihydrochloride\n2-((Z)-2-((6-(4-(6-((Z)-(1-(dicyanomethylene)-5,6-difluoro-3-oxo-1H-inden-2(3H)-ylidene)methyl)-4,4-bis(2-ethylhexyl)-4H-cyclopenta[1,2-b:5,4-b′]dithiophen-2-yl)-2,3-bis(hexyloxy)phenyl)-4-(5,7-diethylundecan-6-yl)-4H-cyclopenta[1,2-b:5,4-b′]dithiophen-2-yl)methylene)-5,6-difluoro-3-oxo-2,3-dihydro-1H-inden-1-ylidene)malononitrile\n(2S,4S,5R,6R)‐5‐acetamido‐2‐[(2S,3R,4R,5S,6R)‐5‐[(2S,3R,4R,5R,6R)‐3‐acetamido‐4,5‐dihydroxy‐6‐(hydroxymethyl)oxan‐2‐yl]oxy‐2‐[(2R,3S,4R,5R,6R)‐4,5‐dihydroxy‐2‐(hydroxymethyl)‐6‐[(E,2S,3R)‐3‐hydroxy‐2‐(octadecanoylamino)octadec‐4‐enoxy]oxan‐3‐yl]oxy‐3‐hydroxy‐6‐(hydroxymethyl)oxan‐4‐yl]oxy‐4‐hydroxy‐6‐[(1R,2R)‐1,2,3‐trihydroxypropyl]oxane‐2‐carboxylic 
acid\n(2R,3S,4R,5R,7S,9S,10S,11R,12S,13R)-7-[(benzylcarbamoyl)oxy]-2-(1-{[(2R,3R,4R,5R,6R)-5-hydroxy-3,4-dimethoxy-6-methyltetrahydro-2H-pyran-2-yl]oxy}propan-2-yl)-10-{[(2S,3R,6R)-3-hydroxy-4-(methoxyimino)-6-methyltetrahydro-2H-pyran-2-yl]oxy}-3,5,7,9,11,13-hexamethyl-6,14-dioxo-12-{[(2S,5R,7R)-2,4,5-trimethyl-1,4-oxazepan-7-yl]oxy}oxacyclotetradecan-4-yl 3-methylbutanoate\n2-[4-[2-[[(2R)-1-[[(4R,7S,10S,13R,16S,19R)-10-(4-aminobutyl)-4-[[(2R,3R)-1,3 dihydroxybutan-2-yl]carbamoyl]-7-[(1R)-1-hydroxyethyl]-16-[(4-hydroxyphenyl)methyl]-13-(1H-indol3-ylmethyl)-6,9,12,15,18-pentaoxo-1,2-dithia-5,8,11,14,17-pentazacycloicos-19-yl]amino]-1-oxo-3 phenylpropan-2-yl]amino]-2-oxoethyl]-7,10-bis(carboxymethyl)-1,4,7,10-tetrazacyclododec-1-yl]acetic acid\n(2R,3S,4R,5R,7S,9S,10S,11R,12S,13R)-12-{[(2R,4R,5S,6S)-4,5-dihydroxy-4,6-dimethyltetrahydro-2H-pyran-2-yl]oxy}-7-hydroxy-2-(1-{[(2R,3R,4R,5R,6R)-5-hydroxy-3,4-dimethoxy-6-methyltetrahydro-2H-pyran-2-yl]oxy}propan-2-yl)-10-{[(2S,3R,6R)-3-hydroxy-4-(methoxyimino)-6-methyltetrahydro-2H-pyran-2-yl]oxy}-3,5,7,9,11,13-hexamethyl-6,14-dioxooxacyclotetradecan-4-yl 3-methylbutanoate\n(2S,4S,5R,6R)‐5‐acetamido‐2‐[(2S,3R,4R,5S,6R)‐5‐[(2S,3R,4R,5R,6R)‐3‐acetamido‐5‐hydroxy‐6‐(hydroxymethyl)‐4‐[(2R,3R,4S,5R,6R)‐3,4,5‐trihydroxy‐6‐(hydroxymethyl)oxan‐2‐yl]oxyoxan‐2‐yl]oxy‐2‐[(2R,3S,4R,5R,6R)‐4,5‐dihydroxy‐2‐(hydroxymethyl)‐6‐[(E,2S,3R)‐3‐hydroxy‐2‐(octadecanoylamino)octadec‐4‐enoxy]oxan‐3‐yl]oxy‐3‐hydroxy‐6‐(hydroxymethyl)oxan‐4‐yl]oxy‐4‐hydroxy‐6‐[(1R,2R)‐1,2,3‐trihydroxypropyl]oxane‐2‐carboxylic acid\n(2R,3S,4R,5R,7S,9S,10S,11R,12S,13R)-12-{[(2R,4R,5S,6S)-4,5-dihydroxy-4,6-dimethyltetrahydro-2H-pyran-2-yl]oxy}-2-(1-{[(2R,3R,4R,5R,6R)-5-hydroxy-3,4-dimethoxy-6-methyltetrahydro-2H-pyran-2-yl]oxy}propan-2-yl)-10-{[(2S,3R,6R)-3-hydroxy-4-(methoxyimino)-6-methyltetrahydro-2H-pyran-2-yl]oxy}-3,5,7,9,11,13-hexamethyl-7-({[2-(2-methyl-5-nitro-1H-imidazol-1-yl)ethyl]carbamoyl}oxy)-6,14-dioxooxacyclotetradecan-4-yl 
3-methylbutanoate\nN-[(2S,3R,4R,5S,6R)-5-[(2S,3R,4R,5S,6R)-3-amino-5-[(2S,3R,4R,5S,6R)-3-amino-5-[(2S,3R,4R,5S,6R)-3-amino-5-[(2S,3R,4R,5S,6R)-3-amino-5-[(2S,3R,4R,5S,6R)-3-amino-5-[(2S,3R,4R,5S,6R)-3-amino-4,5-dihydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl]oxy-2-[(2R,3S,4R,5R,6S)-5-amino-6-[(2R,3S,4R,5R,6R)-5-amino-4,6-dihydroxy-2-(hydroxymethyl)oxan-3-yl]oxy-4-hydroxy-2-(hydroxymethyl)oxan-3-yl]oxy-4-hydroxy-6-(hydroxymethyl)oxan-3-yl]carbamate\n</code></pre></div></div>\n\n<p>That last compound has the InChIKey <code class=\"language-plaintext highlighter-rouge\">DKPKDPKJVDQUPD-XGBIXEJNSA-M</code> and cannot be found in Google nor in PubChem.\nIt looks like this:</p>\n\n<p><img src=\"/assets/images/iupac_626.png\" alt=\"\" /></p>\n\n<p>There are <a href=\"https://pubchem.ncbi.nlm.nih.gov/#query=N-%5B(2S%2C3R%2C4R%2C5S%2C6R)-5-%5B(2S%2C3R%2C4R%2C5S%2C6R)-3-amino-5-%5B(2S%2C3R%2C4R%2C5S%2C6R)-3-amino-5-%5B(2S%2C3R%2C4R%2C5S%2C6R)-3-amino-5-%5B(2S%2C3R%2C4R%2C5S%2C6R)-3-amino-5-%5B(2S%2C3R%2C4R%2C5S%2C6R)-3-amino-5-%5B(2S%2C3R%2C4R%2C5S%2C6R)-3-amino-4%2C5-dihydroxy-6-(hydroxymethyl)oxan-2-yl%5Doxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl%5Doxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl%5Doxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl%5Doxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl%5Doxy-4-hydroxy-6-(hydroxymethyl)oxan-2-yl%5Doxy-2-%5B(2R%2C3S%2C4R%2C5R%2C6S)-5-amino-6-%5B(2R%2C3S%2C4R%2C5R%2C6R)-5-amino-4%2C6-dihydroxy-2-(hydroxymethyl)oxan-3-yl%5Doxy-4-hydroxy-2-(hydroxymethyl)oxan-3-yl%5Doxy-4-hydroxy-6-(hydroxymethyl)oxan-3-yl%5Dcarbamate\">some closely related compounds</a>,\nthough.</p>\n\n<h2 id=\"chemicals-only-published-about-once\">Chemicals only published about once</h2>\n\n<p>Some <a href=\"https://doi.org/10.59350/rzepa.28802\">related data was blogged</a> by <a 
href=\"https://orcid.org/0000-0002-8635-8390\">Henry Rzepa</a> last week,\nwith this quote by Lee from CAS:</p>\n\n<blockquote>\n  <p>38.5% of the current substances have only 1 reference</p>\n</blockquote>\n\n<p>Apparently, based on <a href=\"https://www.cas.org/support/documentation/chemical-substances\">CAS Registry</a> data,\nmore than 1 in 3 chemical structures are published about only once, and a bit less than two in three are published\nabout at least twice. I agree with Henry here: with organic chemistry literature in mind, I would have\nexpected that 38.5% to be higher.</p>\n\n<p>Anyway, since this project is not tracking in which articles IUPAC names are found, I have no data to study this.</p>\n\n<h2 id=\"1-million-iupac-names\">1 million IUPAC names</h2>\n\n<p>So, the primary goal of this project is to reach one million IUPAC names. We are currently at around 23%.\nNot bad, considering we started in February. And we have plenty of untouched literature left.</p>\n\n<p>But I also applied <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/04/27/one-million-iupac-names-2-the-100-thousand-milestone.html\">idea 1</a>,\nthe varying names. The idea is that this way I can explode the number of compounds. For the compound above,\njust enumerating all <code class=\"language-plaintext highlighter-rouge\">OH</code> replacements with <code class=\"language-plaintext highlighter-rouge\">OMe</code> and <code class=\"language-plaintext highlighter-rouge\">OEt</code> would help a lot.</p>\n\n<p>Because I wanted to make sure I could answer positively at the ICCS if we made it to one million\nCCZero IUPAC names, I implemented a very simple enumeration script. Really dumb approach. But the\nresults are interesting. I started with the 200026 names from the milestone. If I\n<a href=\"https://github.com/BlueObelisk/iupac-names/blob/main/explode.groovy\">explode</a> these names,\nI get 1,377,127 IUPAC names, well above the target. 
Even if I remove name variations due to unicode\nvariations for hyphens, I still have 1,162,107 IUPAC names.</p>\n\n<p>Something interesting I cannot fully understand at this moment yet, however, is the following.\nWhen I calculate the number of unique InChIKeys for the milestone, I get 117,726 keys, and when I do\nthis for the list of name variations, I get 203,979 keys. So, while the IUPAC name list is about five\ntimes as long, the list of InChIKeys is not even twice as long. Well, I guess that is why this is called\nresearch.</p>",
      "summary": "I could not find the time earlier to report (reason), but three weeks ago we passed the fourth milestone release of the CCZero IUPAC names found in literature collection. This release contains 200026 IUPAC names, 168702 unique names, reflecting 116207 unique InChIKeys. Time for an update of the One Million IUPAC names project.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/iupac_626.png",
      "date_published": "2025-06-09T00:00:00+00:00",
      "date_modified": "2025-08-16T00:00:00+00:00",
      "tags": ["iupac","textmining","europepmc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/rzepa.28802", "doi": "10.59350/rzepa.28802"
            , "cito":
              
              
                [ 
                  "containsAssertionFrom"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mjxmp-7ra02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/06/08/retracted-articles-cited-in-wikipedia.html",
      "title": "Retracted articles cited in Wikipedia",
      "content_html": "<p>Last week, the <a href=\"https://www.wikidata.org/wiki/Event:Wikidata_and_Sister_Projects\">Wikidata and Sister Projects</a> event tooks place.\nThe presentations are recorded, and I strongly encourage you to check the schedule. One presentation I liked (there are more),\nwas the one by <a href=\"https://www.wikidata.org/wiki/User:Mike_Peel\">Mike Peel</a> with the title\n<em>“Best practices for reusing Wikidata’s data in the Wikimedia Projects”</em>. At some point he walks us through the\n<a href=\"https://en.wikipedia.org/wiki/Template:Cite_Q\">{{Cite Q}}</a> template, <a href=\"https://www.youtube.com/live/xanSjW30g2o?feature=shared&amp;t=1561\">around 26:07</a>.</p>\n\n<p>I learned that this template will highlight when an article cited in Wikipedia is actually retracted (withdrawn or replaced).\nNow, for the past months, I have been using the Crossref API to the <a href=\"http://retractiondatabase.org\">Retraction Watch Database</a>\nand annotated thousands of articles as retracted, using URLs from the database as reference. I use\n<a href=\"https://github.com/egonw/ons-wikidata/blob/main/RetractionWatch/quickstatements.groovy\">this script</a>.</p>\n\n<p>So, that means that this work actually has had a massive impact. Perhaps thousands of (English) Wikipedia\nreaders have seen the results from running that script. That is pretty awesome! This is why we do open science.</p>\n\n<p>But it made me also wonder something else. The Retraction Watch Database has over 60 thousand articles and\nWikidata only about 22 thousand (at the time of writing). What if Wikipedia has an article not in Wikidata?\nWell, obviously, it cannot use <code class=\"language-plaintext highlighter-rouge\">{{Cite Q}}</code>. But wouldn’t we want to have that article in Wikidata? Clearly,\nthe article is notable; at least, in Wikipedia notability-sense. 
So, I was wondering, of those 40 thousand\nretracted articles not in Wikidata, how many are cited in English Wikipedia (to start with).</p>\n\n<p>So, I wrote <a href=\"https://github.com/egonw/ons-wikidata/blob/main/RetractionWatch/listRetractionsNotInWikidata.groovy\">a first script</a>\nthat lists DOIs in the Retraction Watch Database (via the Crossref API downloaded list) that are not found\nin Wikidata. <a href=\"https://github.com/egonw/ons-wikidata/blob/main/RetractionWatch/searchMissingInWikipedia.groovy\">A second script</a>\nuses a Scholia (doi:<a href=\"https://doi.org/10.1007/978-3-319-70407-4_36\">10.1007/978-3-319-70407-4_36</a>) SPARQL query developed by Finn Nielsen that\n<a href=\"https://github.com/WDscholia/scholia/commit/caf2694a4\">uses wikibase:mwapi to do an efficient DOI search</a>.</p>\n\n<p>The results are fascinating and this is the list of DOIs found:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Searching 10.1016/j.engfailanal.2021.105457 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto\nSearching 10.1002/14651858.CD002291 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Antipruritic\nSearching 10.1007/s12115-020-00496-1 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Lawrence_Mead\nSearching 10.1515/9783110619768 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Dead_Eagle_Owl\nSearching 10.1016/j.sbi.2022.102426 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Deborah_F._Kelly\nSearching 10.1371/journal.pone.0240851 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Industrialization_of_China\nSearching 10.1080/00927670309601525 in Wikipedia...\n  ... 
found in this Wikipedia page: https://en.wikipedia.org/wiki/Rouben_Azizian\nSearching 10.1080/14693062.2016.1179616 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Benjamin_K._Sovacool\nSearching 10.1038/s41390-022-02127-3 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Harald_Walach\nSearching 10.1001/jamapediatrics.2021.2659 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Harald_Walach\nSearching 10.1109/CCECE.2007.335 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/List_of_scientific_misconduct_incidents\nSearching 10.1002/14651858.CD001834 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Meningococcal_vaccine\nSearching 10.1007/s11356-021-16530-6 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Kamran_Bagheri_Lankarani\nSearching 10.1016/j.biortech.2023.129044 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Ashok_Pandey\nSearching 10.1002/14651858.CD003614 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Trimetazidine\nSearching 10.1002/14651858.CD003808 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Hall_Technique\nSearching 10.1002/14651858.CD003747 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Venous_thrombosis\nSearching 10.1002/14651858.CD003711 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Myocarditis\nSearching 10.1002/14651858.CD003225 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Prevention_of_migraine_attacks\nSearching 10.1002/14651858.CD003226 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Prevention_of_migraine_attacks\nSearching 10.1002/14651858.CD003498 in Wikipedia...\n  ... 
found in this Wikipedia page: https://en.wikipedia.org/wiki/A2_milk\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Autism\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Autism_therapies\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Casein\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Casomorphin\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Causes_of_autism\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Gluten-free%2C_casein-free_diet\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Gluten-free_diet\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Opioid_excess_theory\nSearching 10.1093/restud/rdy054 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Managerial_economics\nSearching 10.1039/d1nr00388g in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Deborah_F._Kelly\nSearching 10.1139/p90-116 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Canada%27s_Stonehenge\nSearching 10.3791/64256 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Thomas_J._Webster\nSearching 10.1016/j.biortech.2022.127565 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Ashok_Pandey\nSearching 10.1016/j.anbehav.2015.04.001 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Stegodyphus_dumicola\nSearching 10.1016/j.tafmec.2022.103573 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto\nSearching 10.1155/2022/7002630 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Dopamine_receptor_D3\nSearching 10.1007/s11223-017-9884-2 in Wikipedia...\n  ... 
found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto\nSearching 10.1016/j.forsciint.2024.112115 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Peter_A._McCullough\nSearching 10.1002/14651858.CD002778 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Temporomandibular_joint_dysfunction\nSearching 10.1016/j.jcv.2022.105248 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Kay_Davies\nSearching 10.1002/14651858.CD002916 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Bleomycin\nSearching 10.1007/s11756-021-00841-7 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/2021_in_archosaur_paleontology\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Argentinadraco\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Wellnhopterus\nSearching 10.4132/KoreanJPathol.2009.43.4.306 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Cho_Kuk\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Cho_Min_academic_credentials_scandal\nSearching 10.1503/cmaj.80742 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/LetUsTalk\nSearching 10.1002/14651858.CD004125 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Granisetron\nSearching 10.1111/ffe.12616 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto\nSearching 10.1007/s00366-008-0118-x in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Kamran_Daneshjoo\nSearching 10.1002/jmv.28097 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Noora_%28vaccine%29\nSearching 10.1126/science.1070563 in Wikipedia...\n  ... 
found in this Wikipedia page: https://en.wikipedia.org/wiki/Sch%C3%B6n_scandal\nSearching 10.1246/cl.170853 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Enantioselective_Iridium-Catalyzed_C-H_Borylation\nSearching 10.1016/j.marpol.2017.06.032 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Ray_Hilborn\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Tony_J._Pitcher\nSearching 10.1016/j.ijfatigue.2021.106450 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto\nSearching 10.1016/j.crphar.2022 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Desidustat\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Emoxypine\nSearching 10.1007/s10479-023-05261-1 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Artificial_intelligence_marketing\nSearching 10.1257/aer.20210369 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Dividend_tax\nSearching 10.1109/AIMSEC.2011.6010222 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/SAP_CRM\nSearching 10.1002/14651858.CD001103 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Hydrocolloid_dressing\nSearching 10.1007/978-3-642-27708-5 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/IP_Multimedia_Subsystem\nSearching 10.1351/PAC-CON-08-12-06 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Parthenocissus_tricuspidata\nSearching 10.1002/advs.202204315 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Tetrataenite\nSearching 10.1353/sho.2011.0038 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Maus\nSearching 10.1111/jpim.12058 in Wikipedia...\n  ... 
found in this Wikipedia page: https://en.wikipedia.org/wiki/Ulrich_Lichtenthaler\nSearching 10.1136/bcr-2021-241572 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/COVID-19_proxalutamide_trial_in_Brazil\nSearching 10.1038/s44160-022-00068-7 in Wikipedia...\n  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Single-layer_materials\n</code></pre></div></div>",
      "summary": "Last week, the Wikidata and Sister Projects event tooks place. The presentations are recorded, and I strongly encourage you to check the schedule. One presentation I liked (there are more), was the one by Mike Peel with the title “Best practices for reusing Wikidata’s data in the Wikimedia Projects”. At some point he walks us through the {{Cite Q}} template, around 26:07.",
      
      "date_published": "2025-06-08T00:10:00+00:00",
      "date_modified": "2025-06-08T00:10:00+00:00",
      "tags": ["wikidata","wikipedia","doi"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1007/978-3-319-70407-4_36", "doi": "10.1007/978-3-319-70407-4_36"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j6ea0-ycg53",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/06/08/iccs2025-1-back-in-noordwijkerhout.html",
      "title": "ICCS2025: back in Noordwijkerhout",
      "content_html": "<p>This week the 13th <a href=\"https://iccs-nl.org/\">International Conference on Chemical Structures</a> took place\n(see also <a href=\"https://scholia.toolforge.org/event/Q133457282\">this Scholia overview</a> or\n<a href=\"https://scholia.toolforge.org/event-series/Q47501052\">this overview of the full ICCS history</a>). This\nis the conference I first <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/02/open-source-data-mining-in.html\">joined 20 years ago</a>\nas a PhD student <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/06/25/from-archives-my-iccs-2005-poster.html\">presenting a poster</a>\n(see <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/iccs\">these past blog posts</a>).\nOf course, I am actually co-organizer nowadays (actually, co-treasurer). Organizing a meeting with just over\n200 participants, and I like to thank\n<a href=\"https://scholia.toolforge.org/author/Q30086276\">Gerard</a> and <a href=\"https://scholia.toolforge.org/author/Q52125630\">Willem</a>\nin particular, but also <a href=\"https://scholia.toolforge.org/Q134716139\">Pieter</a>,\n<a href=\"https://scholia.toolforge.org/Q47501420\">Marcus</a>, <a href=\"https://scholia.toolforge.org/Q47509033\">Frank</a>,\n<a href=\"https://scholia.toolforge.org/Q134721967\">Jenke</a>, and Frans Koeman who has helped us during the\npast three events.</p>\n\n<p>The meeting started, as usual, wth the <a href=\"https://scholia.toolforge.org/award/Q47508692\">CSA Trust Mike Lynch Award</a>,\nthis year awarded to Prof. 
<a href=\"https://scholia.toolforge.org/Q42717125\">Val Gillet</a>\n(see also <a href=\"https://csa-trust.org/2025/05/23/mike-lynch-award-2025-val-gillet/\">this press release</a>).</p>\n\n<p><a href=\"https://iccs-nl.org/general-information/scientific-program/\">This time</a>, there were the following themes,\nwhere the first was by far the most dominant theme:</p>\n\n<ul>\n  <li>Artificial Intelligence, Machine Learning, and QSAR (five sessions)</li>\n  <li>New Modalities and Large Chemical Data Sets (one session)</li>\n  <li>Advanced Cheminformatics Techniques (two sessions)</li>\n  <li>Integrative Structure-Based Drug Design (two sessions)</li>\n</ul>\n\n<p>My contribution this time was a poster for the <a href=\"https://vhp4safety.nl/\">VHP4Safety</a> project,\nbut more about that later.</p>\n\n<p>Like last time, I have been annotating speakers with identifiers and accounts, if they provided those:</p>\n\n<p><img src=\"/assets/images/iccs_annotated_schedule.png\" alt=\"\" /></p>\n\n<p>As you can see, it also includes PDFs, for both <a href=\"https://iccs-nl.org/general-information/scientific-program/\">talks</a> and\n<a href=\"https://iccs-nl.org/posters/\">posters</a>. At the time of writing,\nI collected PDFs of two presentations and five posters. 
Additions are still most welcome,\nideally with a DOI, so that they can be cited (doi:<a href=\"https://doi.org/10.5281/zenodo.15494630\">10.5281/zenodo.15494630</a>,\ndoi:<a href=\"http://dx.doi.org/10.13140/RG.2.2.27441.90720\">10.13140/RG.2.2.27441.90720</a>,\ndoi:<a href=\"https://doi.org/10.5281/zenodo.15614295\">10.5281/zenodo.15614295</a>, and\ndoi:<a href=\"http://dx.doi.org/10.13140/RG.2.2.36774.23365\">10.13140/RG.2.2.36774.23365</a>)!</p>\n\n<p>Finally, I would like to remind everyone that there is again a\n<a href=\"https://www.biomedcentral.com/collections/ICCS25\">proceedings collection in the Journal of Cheminformatics</a>,\nand presenters of oral and poster presentations are invited to submit their presented\nwork to this collection.</p>",
      "summary": "This week the 13th International Conference on Chemical Structures took place (see also this Scholia overview or this overview of the full ICCS history). This is the conference I first joined 20 years ago as a PhD student presenting a poster (see these past blog posts). Of course, I am actually co-organizer nowadays (actually, co-treasurer). Organizing a meeting with just over 200 participants, and I like to thank Gerard and Willem in particular, but also Pieter, Marcus, Frank, Jenke, and Frans Koeman who has helped us during the past three events.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/iccs_annotated_schedule.png",
      "date_published": "2025-06-08T00:00:00+00:00",
      "date_modified": "2025-06-08T00:00:00+00:00",
      "tags": ["iccs"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.15494630", "doi": "10.5281/zenodo.15494630"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.13140/RG.2.2.27441.90720", "doi": "10.13140/RG.2.2.27441.90720"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.15614295", "doi": "10.5281/zenodo.15614295"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.13140/RG.2.2.36774.23365", "doi": "10.13140/RG.2.2.36774.23365"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Virtual Human Platform for Safety Assessment", "acronym" : "VHP4Safety", "uri" : "drc.filenumber:nwa129219272" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zm558-pd424",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/05/25/new-preprint-scholia-chemistry-access-to-chemistry-in-wikidata.html",
      "title": "New preprint: &quot;Scholia Chemistry: access to chemistry in Wikidata&quot;",
      "content_html": "<p>Two week ago I uploaded a paper that has been in the works for some time. In fact, I first mention it as conference paper\nfor the special issue of the <a href=\"https://scholia.toolforge.org/event/Q47501229\">11th International Conference on Chemical Structures</a>,\nyou know, the meeting held in 2018, of which <a href=\"https://iccs-nl.org/\">the 13th edition</a> starts in 7 days. I had a\n<a href=\"https://doi.org/10.6084/m9.figshare.6356027.v1\">poster</a> at that conference which I described in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2018/08/18/compound-class-identifiers-in-wikidata.html\">this blog post</a>.</p>\n\n<p>In turn, that poster described work of at least three years, going back to\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2015/12/22/new-edition-getting-cas-registry.html\">adding identifiers in 2015</a>\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/01/27/adding-chemical-compound-to-wikidata.html\">chemical structures in early 2016</a>.\nI started <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html\">using scripts two months later</a>.\nThis helped a lot with <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/03/27/migrating-pka-data-from-drugmet-to.html\">migrating pKa data</a>\nfrom a custom Semantic MediaWiki installation to Wikidata and with adding thousands of EPA CompTox\n<a href=\"https://chem-bla-ics.blogspot.com/2017/01/epa-comptox-dashboard-ids-in-wikidata.html\">identifiers in 2017</a>.</p>\n\n<p>But that 2018 conference paper never happened. Because <a href=\"https://chem-bla-ics.linkedchemistry.info/2017/10/15/two-conference-proceedings.html\">Scholia did</a>.\nAnd even on the ICCS poster, Scholia was used to visualize chemistry data in Wikidata. To be honest, not just that,\nof course. 
About a year ago I had a serious go at finishing the paper, and it was sent around to co-authors.\nBut I realized at the time that the paper was lacking good suggestions on how to peer review our\nactual contributions to Wikidata. I could hardly expect readers of the paper to browse the individual\nhistories of all, by then, 1.3 million chemical compounds. And during the holidays I collected a few\ntools, which I had lined up to add to the manuscript.</p>\n\n<p>However, another thing happened: the COVID-19 pandemic. While all the experience helped a lot with getting\nknowledge together around SARS-CoV-2, it also made something else clear: the software behind Wikidata\ndoes not scale well (enough). This led to plans to split the RDF graph representation into two\nseparate SPARQL endpoints. And that breaks many, if not most, of Scholia’s SPARQL queries, including\nthose for the chemistry aspects. The situation in Summer 2024 was that there was a significant\nchance Scholia would not survive the split. And the <em>Scholia Chemistry</em> paper had to wait. You\ncannot publish an article of which the website is gone before it is formally accepted.</p>\n\n<p>Let me make clear, this graph split is not solved and the risk is not gone. But a series of unfunded\nweekend hackathons allowed us to refactor Scholia to give us a chance. 
It started with\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/08/23/scholia.html\">making Scholia more configurable</a>.\nWe had the first hackathons in October and November, and I had\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/04/20/the-april-2025-scholia-hackathon.html\">four more hackathon weekends</a>\nthis April.</p>\n\n<p>The graph split into a main graph and a scholarly graph <a href=\"https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split\">happened on May 9</a>.\nCurrently, we have been granted extra time and can use a legacy server with the full graph, but with a lot\nless hardware, so it is slower. A final patch, merged last week, allows us to define which SPARQL endpoint a query\nshould run on. So, each time we port a SPARQL query, we can directly update Scholia, making the migration\nsomewhat more manageable.</p>\n\n<p>But, with those uncertainties out of the way, it was time to finish the Scholia Chemistry paper!</p>\n\n<p>The preprint (doi:<a href=\"https://doi.org/10.26434/chemrxiv-2025-53n0w\">10.26434/chemrxiv-2025-53n0w</a>) brings\n10 years of research together, and describes details of the used methods not formally peer-reviewed before.\nWe describe in detail how chemical structures are added, the choices of Wikidata on how to\nrepresent chemical structures, how we curate the quality, and how we visualize chemical structures\nand data with Scholia. As you can expect, the Chemistry Development Kit has an important role,\nalong with the InChI.</p>\n\n<p>The paper introduces three new Scholia <em>aspects</em> for chemicals, chemical classes, and elements.\nEach aspect is a template for a page with information about molecular entities and chemical substances,\ncompound classes (like <em>fatty acids</em>), and elements (like carbon). Each template provides relevant\ninformation. 
Of course, any compound, class, or element can also still be opened in the Scholia\n“topic” aspect, listing relevant literature.</p>\n\n<p>With this paper we aim to show that Wikidata is an innovative platform that meets the needs for\na chemical structure database, with detailed data provenance, and scalable community curation.</p>\n\n<p>I welcome your strongest peer review on the preprint. I don’t like settling for anything less.\nHere’s the abstract:</p>\n\n<blockquote>\n  <p>Sharing knowledge on chemicals in the digital age has been the playground of databases such\nas the Chemical Abstract Services and PubChem. Wikipedia complements this field by providing\ncontext to chemicals aimed at a broad audience, but is not easily read by machines. Wikidata\nwas started as a database service to improve the machine readability of the knowledge captured\nin Wikipedia. Wikidata has an open license, application programming interfaces, and a strong\nprovenance model. Scholia uses the features to provide access to chemical knowledge. This\nstudy reviews the chemistry in Wikidata, shows how thousands of new chemicals were added,\nextends Wikidata with new properties for chemical representation and external links to\nadditional databases, and shows how we extended Scholia to represent the chemistry in Wikidata.</p>\n</blockquote>\n\n<p>Thanks to Finn, Denise, Daniel, and Adriano!</p>",
      "summary": "Two weeks ago I uploaded a paper that has been in the works for some time. In fact, I first mentioned it as a conference paper for the special issue of the 11th International Conference on Chemical Structures, you know, the meeting held in 2018, of which the 13th edition starts in 7 days. I had a poster at that conference which I described in this blog post.",
      
      "date_published": "2025-05-25T00:00:00+00:00",
      "date_modified": "2025-05-25T00:00:00+00:00",
      "tags": ["wikidata","scholia","chemistry","iccs"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.6084/m9.figshare.6356027.v1", "doi": "10.6084/m9.figshare.6356027.v1"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.26434/CHEMRXIV-2025-53N0W", "doi": "10.26434/CHEMRXIV-2025-53N0W"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "To support development on Scholia, a software tool to facilitate the exploration and curation of the research literature", "acronym" : "Scholia", "uri" : "https://sloan.org/grant-detail/G-2019-11458" }, "funder": { "name": "Alfred P. Sloan Foundation", "ror": "052csg198" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/t01ew-fed58",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/05/12/intoxicom-training-material-collection.html",
      "title": "INTOXICOM Workshop #4: Training Materials for Toxicology",
      "content_html": "<p>INTOXICOM is an ELIXIR Toxicology Community (doi:<a href=\"https://doi.org/10.12688/f1000research.74502.2\">10.12688/f1000research.74502.2</a>)\nimplementation study, organizing five workshops, each\none with a different topic. The first three workshops were in Utrecht\n(doi:<a href=\"https://doi.org/10.37044/osf.io/un2rw\">10.37044/osf.io/un2rw</a>), Basel, and Uppsala.\nThe <a href=\"https://www.aanmelder.nl/intoxicom-ws-2/wiki/1152037/wp-4-features-ws-4\">fourth workshop</a>\nhas the theme <em>training material</em> and will be held in Birmingham (thanks\n<a href=\"https://www.linkedin.com/in/iseult-lynch-b5856910/\">Iseult</a>!). And one aspect of that is to make\nexisting training material from toxicology projects indexed and therefore more FAIR.</p>\n\n<p>Last week we had a telcon, and I promised to make an overview of some open educational resources\nthat I have (co-)developed over the years. It started during the eNanoMapper project, where\nour Project Technical Adviser was Dr. <a href=\"https://orcid.org/0000-0003-1461-0988\">Cedric Notredame</a>.\nHe commented on us using Word documents for training materials, and that was for me a cue to\n<a href=\"https://github.com/enanomapper/tutorials/commit/5ca03a94a3bd3b60dbb6e080f05cb9faca7eaf69\">start using</a>\nGitHub Pages instead. We later reused this for NanoCommons, and by then GitHub Pages had\nbecome common and the foundation for other online material for toxicology projects.</p>\n\n<p>Here follows an overview of tutorials I contributed to. Some are already registered\nwith TeSS, but I am sure I will learn a lot of new tricks, and particularly about the\nlearning paths.</p>\n\n<p>Hope to see you there! 
Register <a href=\"https://www.aanmelder.nl/intoxicom-ws-2/wiki/1152037/wp-4-features-ws-4\">here</a>.</p>\n\n<h2 id=\"enanomapper\">eNanoMapper</h2>\n\n<ul>\n  <li><a href=\"https://enanomapper.github.io/tutorials/BrowseOntology/Tutorial%20browsing%20eNM%20ontology.html\">Browsing the eNanoMapper ontology with BioPortal, AberOWL and Protégé</a></li>\n  <li><a href=\"https://enanomapper.github.io/tutorials/Entering_and_analysing_nano_safety_data/readme.html\">Entering and analysing nano safety data</a></li>\n  <li><a href=\"https://enanomapper.github.io/tutorials/Added%20ontology%20terms/README.html\">Adding ontology terms</a></li>\n  <li><a href=\"https://enanomapper.github.io/tutorials/Pathway_analysis/Pathway%20analysis.html\">How to use the Pathway module of ArrayAnalysis.org for pathway analysis of microarray data</a></li>\n  <li><a href=\"https://enanomapper.github.io/tutorials/Pathway/readme.html\">How to make a pathway</a></li>\n</ul>\n\n<p>By other eNanoMapper partners:</p>\n\n<ul>\n  <li><a href=\"https://enanomapper.github.io/tutorials/Omics%20descriptors%20calculation/Omics%20descriptors%20calculation%20R%20package.html\">Omics descriptors calculation</a> (by Georgia Tsiliki and Haralambos Sarimveis)</li>\n</ul>\n\n<h2 id=\"nanocommons\">NanoCommons</h2>\n\n<ul>\n  <li><a href=\"https://nanocommons.github.io/tutorials/enteringData/\">Adding nanomaterial data</a></li>\n  <li><a href=\"https://nanocommons.github.io/user-handbook/FAIRification/ten-simple-actions/\">Ten simple actions to make NSC Research Output more Findable</a></li>\n</ul>\n\n<p>By group members:</p>\n\n<ul>\n  <li><a href=\"https://nanocommons.github.io/tutorials/eNanoMapper/\">Adding new term to the eNanoMapper ontology</a> (by Laurent Winckers)</li>\n  <li><a href=\"https://nanocommons.github.io/user-handbook/\">NanoCommons User Guidance Handbook</a> (by Thomas Exner, with contributions by many)</li>\n</ul>\n\n<h2 id=\"fairplus\">FAIRplus</h2>\n\n<p>I add this one as well, as this 
set of recipes below was developed for the eTox use case.</p>\n\n<ul>\n  <li><a href=\"https://w3id.org/faircookbook/FCB007\">InChI and SMILES identifiers for chemical structures</a></li>\n  <li><a href=\"https://w3id.org/faircookbook/FCB080\">Creating InChIKeys for IUPAC names</a></li>\n</ul>\n\n<h2 id=\"vhp4safety\">VHP4Safety</h2>\n\n<ul>\n  <li><a href=\"https://docs.vhp4safety.nl/en/latest/tutorials/cheminfo/intro.html\">Information about chemicals</a></li>\n</ul>\n\n<p>Many more are listed on the <a href=\"https://docs.vhp4safety.nl/e\">VHP4Safety Docs</a> website.</p>",
      "summary": "INTOXICOM is an ELIXIR Toxicology Community (doi:10.12688/f1000research.74502.2) implementation study, organizing five workshops, each one with a different topic. The first three workshops were in Utrecht (doi:10.37044/osf.io/un2rw), Basel, and Uppsala. The fourth workshop has the theme training material and will be held in Birmingham (thanks Iseult!). And one aspect of that is to make existing training material from toxicology projects indexed and therefore more FAIR.",
      
      "date_published": "2025-05-12T00:00:00+00:00",
      "date_modified": "2025-05-12T00:00:00+00:00",
      "tags": ["enanomapper","nanocommons","elixir"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.12688/f1000research.74502.2", "doi": "10.12688/f1000research.74502.2"
            , "cito":
              
              
                [ 
                  "citesAsRecommendedReading"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.37044/osf.io/un2rw", "doi": "10.37044/osf.io/un2rw"
            , "cito":
              
              
                [ 
                  "citesAsRecommendedReading"
                  
                 ]
              
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "eNanoMapper - A Database and Ontology Framework for Nanomaterials Design and Safety Assessment", "acronym" : "eNanoMapper", "uri" : "cordis.project:604134" }, "funder": { "name": "European Commission", "ror": "00k4n6c32" } },{"award": { "title" : "The European Nanotechnology Community Informatics Platform: Bridging data and disciplinary gaps for industry and regulators", "acronym" : "NanoCommons", "uri" : "cordis.project:731032" }, "funder": { "name": "European Commission", "ror": "00k4n6c32" } },{"award": { "title" : "The Virtual Human Platform for Safety Assessment", "acronym" : "VHP4Safety", "uri" : "drc.filenumber:nwa129219272" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dycsw-qeq51",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/04/27/one-million-iupac-names-2-the-100-thousand-milestone.html",
      "title": "One Million IUPAC names #2: the 100 thousand milestone",
      "content_html": "<p>Two and a half months into the <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names.html\">One Million IUPAC Names</a>\nproject, we passed <a href=\"https://github.com/BlueObelisk/iupac-names/releases/tag/milestone-100k\">the third milestone</a>,\nthe one for 100 thousand IUPAC names (doi:<a href=\"https://doi.org/10.5281/zenodo.15266459\">10.5281/zenodo.15266459</a>).\nTime for an update.</p>\n\n<p>This milestone release took a bit longer. Going from 50 to 100 thousand is a bigger step than from 10 to 50\nthousand, but the open access chemistry literature was already done by then. Basically, I ran out of open access\nchemistry publications. The scripts are now finding names in all (open access) literature, and the number of\nnew names per article is a lot lower. Still about 1 in every 20 to 30 articles. But the diversity in names\nis not really going down, which is important.</p>\n\n<p>The first few weeks, I used Google Colab to run a Jupyter notebook, initially created by\n<a href=\"https://cpm.lumc.nl/research/bioinformatics-224/magnus-palmblad-5\">Magnus</a>, but having to process more articles\nto get a reasonable number of new IUPAC names required longer and longer jobs, and then Google Colab\nis not really a good fit (well, the free version anyway). So, I started using a local script. That turned out\nto be able to handle up to 20 thousand articles in one go and runs at least twice as fast. Moreover, I can\nrun three of them in parallel.</p>\n\n<p>And that had impact. With each commit adding around 1000 new IUPAC names, the number of commits went up remarkably\nlast week:</p>\n\n<p><img src=\"/assets/images/iupac-names-commits.png\" alt=\"\" /></p>\n\n<p>At the current speed, I think we’ll make it to 150k soon and I added a new milestone for 200k, which sounds\ndoable in the next three weeks. That also means that 1M extracted IUPAC names from literature has become\na reasonable goal. 
And we can start thinking about the 2, 5, 10, 50 and 100 million IUPAC names. Those are,\nat the current speed, rather unlikely to be reached from the open access literature anytime soon. That brings\nus to the question, what will? Well, I have some ideas.</p>\n\n<h3 id=\"idea-1-name-variations\">Idea 1: name variations</h3>\n\n<p>First, I am figuring out some ways to make variants of names (no, not based on hyphens and spaces; that’s too easy),\nbut actual variations of the chemical structures. For example, I could exhaustively replace “methoxy” with “ethoxy”,\nand iterate the halogens and acyl chain lengths. I have little doubt that I can grow the list with this approach\neasily 5-fold, maybe even 10-fold.</p>\n\n<h3 id=\"idea-2-hallucination\">Idea 2: hallucination</h3>\n\n<p>Another idea is that I could use tools that can generate IUPAC names for a limited set of compounds.\nI once wrote code for alkanes myself and if I can find that, I may be able to generate additional names.\nBut perhaps more realistic is that I train a deep learning model and have it generate names for all compounds in\nWikidata (~1.5 million) or PubChem (&gt;100 million). STOUT needed 81 million compounds\n(doi:<a href=\"https://doi.org/10.1186/s13321-021-00512-4\">10.1186/s13321-021-00512-4</a>), but I don’t need a good model;\nI just need a model that comes up with new, valid names. Hallucinated names, but valid.</p>\n\n<p>While the list of valid names grows, I can retrain the deep-learned model and repeat. As long as the diversity\nremains high enough, one could hypothesize that the deep learning will learn new tricks. And then,\nthat should be a near infinite source of additional names.</p>\n\n<h3 id=\"idea-3-semi-closed-access-literature\">Idea 3: (semi-)closed access literature</h3>\n\n<p>Also, I haven’t touched closed access articles yet. This is all based on the collection of full texts\nin <a href=\"https://europepmc.org/\">Europe PMC</a>. 
For example, I could start with the green open access articles\nin (Dutch) university repositories, particularly those with large chemistry departments. PDF to text\ntools are mature enough that this will provide a new source. Oh, and perhaps PhD theses, which are now\nalso increasingly archived in university repositories under open access. And that reminds me of a Dutch\nproject two decades ago doing exactly that. I wish I remembered the name.</p>\n\n<h3 id=\"idea-4-alternatives-to-oscar4-and-europe-pmc\">Idea 4: alternatives to Oscar4 and Europe PMC</h3>\n\n<p>So, the first round of named entity recognition was with Europe PMC itself, as explained in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names.html\">the first post</a>. The move\nto Oscar4 helped a lot. But there exist many other chemical NER tools, like\n(doi:<a href=\"https://doi.org/10.1093/bioinformatics/btn181\">10.1093/bioinformatics/btn181</a>). And those may\nfind an additional number of names, even with just the literature I already covered.</p>\n\n<p>Well, you get the idea.</p>\n\n<h2 id=\"iccs-poster-rejected\">ICCS poster rejected</h2>\n\n<p>Unfortunately, the <a href=\"https://iccs-nl.org/\">ICCS poster</a> abstract did not make the cut. The score was high enough,\nbut they received many abstracts and had to make a selection (of course, I am part of the ICCS organization,\nand have more details of how it came about). I really like the project, and am eager to write up a paper around\nit.</p>",
      "summary": "Two and a half months into the One Million IUPAC Names project, we passed the third milestone, the one for 100 thousand IUPAC names (doi:10.5281/zenodo.15266459). Time for an update.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/iupac-names-commits.png",
      "date_published": "2025-04-27T00:00:00+00:00",
      "date_modified": "2025-06-09T00:00:00+00:00",
      "tags": ["iupac","textmining","oscar","europepmc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/s13321-021-00512-4", "doi": "10.1186/s13321-021-00512-4"
            , "cito":
              
              
                [ 
                  "citesForInformation"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btn181", "doi": "10.1093/bioinformatics/btn181"
            , "cito":
              
              
                [ 
                  "citesAsPotentialSolution"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/d79ep-rzd32",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/04/20/the-april-2025-scholia-hackathon.html",
      "title": "The April 2025 Scholia hackathon",
      "content_html": "<p>This is the third weekend I am working on Scholia, the first two being part of the <a href=\"https://www.wikidata.org/wiki/Wikidata:Scholia/Events/Hackathon_April_2025#Participants\">April 2025 hackathon</a>. It follows\nlast year’s <a href=\"https://www.wikidata.org/wiki/Wikidata:Scholia/Events/Hackathon_October_2024\">October</a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/11/17/sparql-examples.html\">November</a> hackathons.\nThere is some urgency for this unpaid work, because Wikidata is splitting the RDF into two\nSPARQL endpoints (see <a href=\"https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2024-05-16/Op-Ed\">this op-ed in The Signpost</a>\nand <a href=\"https://finnaarupnielsen.wordpress.com/2024/10/18/scholia-in-the-age-of-the-wikidata-query-service-split/\">this post by Finn</a>).\nThis split has happened, but there is a <em>legacy</em> server for tools that have not been upgraded.</p>\n\n<p>Scholia has not been upgraded. It has more than 350 SPARQL queries, and each one has to be tested\nseparately and updating every query is not trivial. Together with Daniel, Finn, and others, I\nhacked up patches last year to:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2024/08/23/scholia.html\">configure Scholia for the endpoint to use</a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2024/11/17/sparql-examples.html\">create pages for many Scholia SPARQL queries</a></li>\n</ul>\n\n<p>This month I continued working on the second, and I:</p>\n\n<ul>\n  <li><a href=\"https://github.com/WDscholia/scholia/pull/2597\">added titles for chemistry aspect panels</a></li>\n  <li><a href=\"https://github.com/WDscholia/scholia/pull/2589\">use the legacy SPARQL endpoint</a></li>\n</ul>\n\n<p>That second one also showed that the legacy server has more limited resources and users will more\nquickly run into error messages that too many queries are run in parallel. 
Now, users can rerun\nthe query, but then the results table contains the previous error message. Second, you do not want\nthe queries for one aspect to all run at the same time, but rather have Scholia send off each\nquery when its panel becomes visible (and scrolling a page takes a bit of time).</p>\n\n<p>For these issues, I wrote these two patches (yet to be approved and merged):</p>\n\n<ul>\n  <li><a href=\"https://github.com/WDscholia/scholia/pull/2608\">delete the previous error message</a></li>\n  <li><a href=\"https://github.com/WDscholia/scholia/pull/2611\">lazy load the table and iframe panels</a></li>\n</ul>\n\n<p>Now, the iframes already had some aspects of lazy loading, but it turned out that it was mostly\nlazy display, and the queries were still run as soon as possible. This last patch challenged my\nJavaScript skills and I learned about the <code class=\"language-plaintext highlighter-rouge\">Intersection Observer API</code>, a browser technology that allows\nthe page to see what part of the webpage you are looking at right now. Yeah, I can easily\nsee how that does user profiling, but in this case it is just used to fire off the SPARQL\nquery when it becomes relevant. It uses an additional callback function, so I had to\nmake sure Jekyll/Liquid creates custom callback functions for each panel.\nActually, I intended to show the code here, but I am not entirely sure how to escape\nthe code so that Jekyll does not try to run the instructions. 
For now, you have to\n<a href=\"https://github.com/WDscholia/scholia/pull/2611/files\">check the PR</a>.</p>\n\n<h2 id=\"scholia-chemistry-paper\">Scholia Chemistry paper</h2>\n\n<p>The other thing I have been doing is finally finishing up the Scholia Chemistry paper.\nThat actually depended on the maturing of various tools, me figuring out how to characterize\nthe actual amount of content contributed to Wikidata and how to make that transparent,\nand more recently, the above to be able to convince readers Scholia will not die with\nthe graph split. With the above pages, we have, I think, sufficient guarantee it will\nbe around for another few years, at least.</p>\n\n<p>This paper, of which I hope to finish the final draft today, applying\nsome good co-author feedback from last weekend, is the final bit of work done on\nthe Alfred P. Sloan Foundation grant.</p>\n\n<p>We intend to put the paper up as preprint soon and then submit it to a Diamond Open Access\njournal, one that supports CiTO citation intent annotation.</p>",
      "summary": "This is the third weekend I am working on Scholia, the first two being part of the April 2025 hackathon. It follows last year’s October and November hackathons. There is some urgency for this unpaid work, because Wikidata is splitting the RDF into two SPARQL endpoints (see this op-ed in The Signpost and this post by Finn). This split has happened, but there is a legacy server for tools that have not been upgraded.",
      
      "date_published": "2025-04-20T00:00:00+00:00",
      "date_modified": "2025-04-20T14:58:00+00:00",
      "tags": ["scholia","javascript","sparql"],
      
      "_funding": [{"award": { "title" : "To support development on Scholia, a software tool to facilitate the exploration and curation of the research literature", "acronym" : "Scholia", "uri" : "https://sloan.org/grant-detail/G-2019-11458" }, "funder": { "name": "Alfred P. Sloan Foundation", "ror": "052csg198" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mxfta-p1k55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/04/13/updating-links-to-blogs-peter-murray-rust.html",
      "title": "Updating links to blogs: Peter Murray-Rust",
      "content_html": "<p>One reason to <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html\">move my blog</a>\nfrom Blogger to a git-backed repository is that I can update links but that the version history shows exactly what\nchange was made. I have been using three icons: when I can find a new URL for the website, I use <i class=\"fa-solid fa-recycle fa-xs\"></i>;\nwhen I cannot find a new URL, but the old URL is in the <a href=\"https://web.archive.org/\">Internet Archive</a>,\nI use a <i class=\"fa-solid fa-box-archive fa-xs\"></i>;\nfinally, if I cannot find anything to replace the broken link, I use <i class=\"fa-solid fa-link-slash fa-xs\"></i>.\nFor example, the blog of Rich Apodaca is <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/27/archiving_blogs.html\">now archived</a>,\nand I have been updating the many links to his (still running) blog to use DOI links. That has the added benefit\nof <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/30/fair-blog-to-blog-citations.html\">making blog-to-blog citations more FAIR</a>.</p>\n\n<p>Now, another blog I link to a lot, is the blog by Peter Murray-Rust, which has run at various URLs, including\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">http://wwmm.ch.cam.ac.uk/blogs/murrayrust/</a> and now at\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/\">https://blogs.ch.cam.ac.uk/pmr/</a>. My posts on Blogger have a lot of links\nto the <code class=\"language-plaintext highlighter-rouge\">wwmm.ch.cam.ac.uk</code> domain, e.g. <code class=\"language-plaintext highlighter-rouge\">http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=845</code>. The problem\nwith these URLs is that they do not get properly rewritten and all point to a <em>European copyright: Cancel Articles 3, 11 and 13</em>\npost from 2018.</p>\n\n<p>But here too, the Internet Archive is helping. 
It gives me the opportunity to find what <code class=\"language-plaintext highlighter-rouge\">p=845</code> pointed to.\nNow, not all blog posts are archived, and therefore I sometimes need to figure out which post it was using archived\nposts just before or after the post I linked to. Of course, this only works because Peter’s blog is still online\nwith all posts. The first step is to list all pages using the old URL pattern in the archive, e.g. with\n<code class=\"language-plaintext highlighter-rouge\">https://web.archive.org/web/*/http://wwmm.ch.cam.ac.uk/blogs/murrayrust/*</code>.\nHere, the first <code class=\"language-plaintext highlighter-rouge\">*</code> indicates <code class=\"language-plaintext highlighter-rouge\">any date</code>, while the second <code class=\"language-plaintext highlighter-rouge\">*</code> indicates <code class=\"language-plaintext highlighter-rouge\">any URL that started with the preceding</code>.</p>\n\n<p>This gives <a href=\"https://web.archive.org/web/*/http://wwmm.ch.cam.ac.uk/blogs/murrayrust/*\">a list of 1233 posts</a>, which you can filter with the text box on the top right of the list, where I filtered\nhere for URLs with <code class=\"language-plaintext highlighter-rouge\">p=</code>:</p>\n\n<p><img src=\"/assets/images/pmr_internet_archive.png\" alt=\"\" /></p>\n\n<p>However, if the blog post I am looking for is listed, it does not mean it actually has been archived. In the above\nscreenshot, note the <code class=\"language-plaintext highlighter-rouge\">From</code> column, and that date needs to be from before the <code class=\"language-plaintext highlighter-rouge\">wwmm.ch.cam.ac.uk</code> domain stopped being\nused. 
For example, the following post has an Internet Archive snapshot, but from after the move, so the original content is\nnot visible and a redirect message is shown instead, here visible as a green date:</p>\n\n<p><img src=\"/assets/images/pmr_no_luck.png\" alt=\"\" /></p>\n\n<p>This way, I have been able to update various links already, not with a DOI and blog-to-blog citation as for\nRich’s blog, but just with an updated link so that readers of these older posts actually end up on\nthe post I originally linked to, such as\n<a href=\"https://github.com/egonw/blog2/commit/d4c5e02c725ba799609e643f46389ea3b5266f6c#diff-863da47bd5bf4b33d3701c737854d2b3cd30a1802abaa225c78acac72d6d6f83L36-R38\">here</a>:</p>\n\n<p><img src=\"/assets/images/pmr_update.png\" alt=\"\" /></p>",
      "summary": "One reason to move my blog from Blogger to a git-backed repository is that I can update links but that the version history shows exactly what change was made. I have been using three icons: when I can find a new URL for the website, I use ; when I cannot find a new URL, but the old URL is in the Internet Archive, I use a ; finally, if I cannot find anything to replace the broken link, I use . For example, the blog of Rich Apodaca is now archived, and I have been updating the many links to his (still running) blog to use DOI links. That has the added benefit of making blog-to-blog citations more FAIR.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pmr_internet_archive.png",
      "date_published": "2025-04-13T00:00:00+00:00",
      "date_modified": "2025-04-13T00:00:00+00:00",
      "tags": ["blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/143wd-e3m51",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/03/30/cdk2024-4.html",
      "title": "cdk2024 #6: wrapping up already",
      "content_html": "<p>Tomorrow is already the last day of the <a href=\"https://www.nwo.nl/en/researchprogrammes/open-science/open-science-fund\">NWO Open Science</a> grant\nfor the <a href=\"https://cdk.github.io/\">Chemistry Development Kit</a>. We are wrapping up, but I am happy we have a few weeks more\nto finish up the reporting. We held a user group meeting earlier this month (btw, check out the <a href=\"https://doi.org/10.5281/zenodo.15058009\">slides by Jonas</a>),\nand I did a <a href=\"https://github.com/cdk/cdk/pull/1175\">few</a> <a href=\"https://github.com/cdk/cdk/pull/1178\">more</a>\n<a href=\"https://github.com/cdk/cdk/pull/1179\">JUnit</a> <a href=\"https://github.com/cdk/cdk/pull/1180\">testing</a> updates last week:</p>\n\n<p><img src=\"/assets/images/cdk2024_junit.png\" alt=\"\" /></p>\n\n<p>Actually, you see <a href=\"https://github.com/cdk/cdk/pull/1177\">one pull request</a> here that I closed. I accidentally included a\ncircular dependency. Some core CDK functionality is hard to test with an implementation of the data model, but if that\nimplementation depends on the module you are testing, that won’t work (not in Maven anyway). But the good bits got included\nin the next pull request. One of the goals was to improve the code covered by the tests.</p>\n\n<p>This <em>coverage testing</em> has an important code maintenance purpose: it visualizes which code is not checked. Sometimes\ncode is not tested that under normal conditions should have been (a bug) and sometimes it is handling a rare situation\nwhich you want tested too, to make sure that rare case does not get covered by the common code. Thus, the percentage\ncode covered by tests should be as high as is reasonable. 
The pull requests therefore aim to raise that percentage,\nsuch as for this pull request:</p>\n\n<p><img src=\"/assets/images/cdk2024_coverage.png\" alt=\"\" /></p>\n\n<p>Indeed, over the past 12 months, the coverage did improve, perhaps not as much as we would have liked, by 2.32 percentage points\nto 64.96 percent:</p>\n\n<p><img src=\"/assets/images/cdk2024_coverage2.png\" alt=\"\" /></p>\n\n<p>The CDK started routinely using unit testing somewhere in the 2000s, with home-made continuous integration\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/01/nightly-cdk-builds-now-available.html\">as early as 2006</a>,\nshared with the development community every night, thanks to Rajarshi Guha’s effort. We had our\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/28/code-coverage-making-sure-your-code-is.html\">first code coverage results in the same year</a>.\nAnd at some point we had sufficient coverage that it gave us the opportunity to routinely check\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/07/comparing-junit-test-results-between.html\">the impact of a patch</a>.\nOf course, this is exactly what many open source projects do with GitHub Actions nowadays.</p>\n\n<p>Unfortunately, we also found that many of the CDK-using tools we worked on in this grant to get updated to\nuse a (more) recent CDK version do not have such solutions in place. That left us in several cases quite\nin the dark. More about that soon!</p>",
      "summary": "Tomorrow is already the last day of the NWO Open Science grant for the Chemistry Development Kit. We are wrapping up, but I am happy we have a few weeks more to finish up the reporting. We held a user group meeting earlier this month (btw, check out the slides by Jonas), and I did a few more JUnit testing updates last week:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdk2024_junit.png",
      "date_published": "2025-03-30T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["cdk2024","cdk","junit"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.15058009", "doi": "10.5281/zenodo.15058009"
            , "cito":
              
              
                [ 
                  "citesAsRecommendedReading"
                  
                 ]
              
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Chemistry Development Kit in 2024: improving cheminformatics research", "acronym" : "CDK2024", "uri" : "drc.filenumber:osf232097" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hdb72-f7198",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/03/23/orcid-and-ror.html",
      "title": "New preprint: &quot;BioHackEU24 report: ORCID and ROR identifiers in BioHackrXiv reports&quot;",
      "content_html": "<p>This was not my primary hack project during the <a href=\"https://biohackathon-europe.org/\">ELIXIR BioHackathon Europe</a>\nlast autumn, but I really like BioHackrXiv and I got the question if I could have a look at getting\nthe ORCID logo in the generated PDF. The ORCID was already in the YAML metadata of the report markdown,\nso it sounded easy. Well, it was a bit more complicated, but all the nicer to now have the project\nreport online (doi:<a href=\"https://doi.org/10.37044/osf.io/p9u42_v1\">10.37044/osf.io/p9u42_v1</a>). And once the ORCID was working,\nadding the <a href=\"https://ror.org/\">Research Organisation Registry</a> ID was not much harder. Cool to see both used\nin other <a href=\"https://osf.io/preprints/biohackrxiv/discover?sort=-dateCreated\">recent BioHackrXiv reports</a>!</p>\n\n<p><img src=\"/assets/images/biohackrxiv_orcid_ror.png\" alt=\"\" /></p>",
      "summary": "This was not my primary hack project during the ELIXIR BioHackathon Europe last autumn, but I really like BioHackrXiv and I got the question if I could have a look at getting the ORCID logo in the generated PDF. The ORCID was already in the YAML metadata of the report markdown, so it sounded easy. Well, it was a bit more complicated, but all the nicer to now have the project report online (doi:10.37044/osf.io/p9u42_v1). And once the ORCID was working, adding the Research Organisation Registry ID was not much harder. Cool to see both used in other recent BioHackrXiv reports!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/biohackrxiv_orcid_ror.png",
      "date_published": "2025-03-23T00:00:00+00:00",
      "date_modified": "2025-03-23T00:00:00+00:00",
      "tags": ["elixir","biohackrxiv"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.37044/osf.io/p9u42_v1", "doi": "10.37044/osf.io/p9u42_v1"
            , "cito":
              
              
                [ 
                  "describes"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/n7e6r-p1e93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/03/16/CDK-UGM-2.html",
      "title": "cdk2024 #5: Chemistry Development Kit User Group Meeting - Day 2",
      "content_html": "<p>Where <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/03/11/CDK-UGM.html\">the first workshop day</a> had several\ntalks about new and old features of the <a href=\"https://cdk.github.io/\">Chemistry Development Kit</a> (CDK), the second day\nwas a hackathon day. We hacked and we talked. The coding was mostly not in the CDK repository itself,\nbut <a href=\"https://github.com/cdk/cdk/commits/main/?since=2025-03-10&amp;until=2025-03-11\">some things happened there</a> too:</p>\n\n<p><img src=\"/assets/images/cdk_hackathon.png\" alt=\"\" /></p>\n\n<p>Some pointers:</p>\n\n<ul>\n  <li>we worked on the <a href=\"http://cdk.github.io/cdkbook/\">Groovy Cheminformatics with the Chemistry Development Kit</a> book\n    <ul>\n      <li>moved the repository to the GitHub organisation</li>\n      <li>improved the build system</li>\n    </ul>\n  </li>\n  <li>it was explored how the CDK can generate SMILES for glycans</li>\n  <li>continued work on updating tools using the CDK, e.g. <a href=\"https://apps.cytoscape.org/apps/chemviz2\">ChemViz2</a></li>\n  <li>code clean up, e.g. on <a href=\"https://github.com/cdk/cdk/commit/7263da00a86d97f965995bf7e706eadb95b90aa9\">JavaDoc</a>\nand <a href=\"https://github.com/cdk/cdk/commit/f8621e5fc02cfc73f52272b9dcc9354a4b0bc35d\">the XML parsing</a></li>\n  <li>a <a href=\"https://github.com/JChemPaint/jchempaint/releases/tag/3.4b\">JChemPaint release, based on CDK 2.10</a>,\nwith a <a href=\"https://flathub.org/apps/io.github.jchempaint.JChemPaint\">flatpak for easy install on many Linux distributions</a></li>\n</ul>\n\n<p>We further had discussions about a possible change of the license (what it would involve) and OSGi support.\nThe problem there is that Java packages can only exist in one OSGi bundle, and this is currently not the\ncase. We discussed that the current modules were partially set up to clean up dependencies, and generally\nmodularize the CDK (e.g. 
each module could have a separate person responsible). We now want to propose\na larger <code class=\"language-plaintext highlighter-rouge\">core</code> module which covers the common cheminformatics functionality. A final discussion point\nI want to mention is that there are serious hints that we may have the Chemistry Development Kit\nas JavaScript in the browser soon!</p>\n\n<p>I would like to thank everyone who joined the workshop, particularly those that travelled to Maastricht\nfrom the UK, Germany, and Bulgaria. Also thanks to the six participants online who joined on the\nfirst day!</p>",
      "summary": "Where the first workshop day had several talks about new and old features of the Chemistry Development Kit (CDK), the second day was a hackathon day. We hacked and we talked. The coding was mostly not in the CDK repository itself, but some things happened there too:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdk_hackathon.png",
      "date_published": "2025-03-16T00:00:00+00:00",
      "date_modified": "2025-03-16T00:00:00+00:00",
      "tags": ["cdk","openscience","cdk2024"],
      
      "_funding": [{"award": { "title" : "The Chemistry Development Kit in 2024: improving cheminformatics research", "acronym" : "CDK2024", "uri" : "drc.filenumber:osf232097" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e08pe-thb38",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/03/11/CDK-UGM.html",
      "title": "cdk2024 #4: Chemistry Development Kit User Group Meeting - Day 1",
      "content_html": "<p>As part of our <a href=\"https://www.nwo.nl/en/\">Dutch Research Council</a> (NWO) <a href=\"https://www.nwo.nl/en/projects/osf232097\">Open Science grant</a>,\nwe organized a <a href=\"https://cdk.github.io/nwo-openscience-2024/\">Chemistry Development Kit User Group Meeting</a>\n(<a href=\"https://hashtags-hub.toolforge.org/CDK25UGM\">#CDK25UGM</a>), of which yesterday was the “conference” day, and today a hackathon.</p>\n\n<p>I opened the session with a few slides welcoming everyone at Maastricht University (and our\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/01/27/translational-genomics.html\">Dept of Translational Genomics</a>),\nand explaining the NWO grant.\n<a href=\"https://orcid.org/0000-0001-7730-2646\">John Mayfield</a> (<a href=\"https://www.nextmovesoftware.com/\">NextMove</a>) spoke about\n“What’s New” in the Chemistry Development Kit 2.10, e.g. explaining more about the new (much faster) <code class=\"language-plaintext highlighter-rouge\">AtomContainer</code>,\nSMIRKS, and more.</p>\n\n<p>After lunch, <a href=\"https://orcid.org/0000-0003-1554-6666\">Jonas Schaub</a> (<a href=\"https://www.uni-jena.de/en/\">Friedrich Schiller University Jena</a>)\nshowed various projects where the CDK is used, titled “Scaffolds, Functional Groups, Aglycones: Algorithmic Substructure Identification with CDK”\n(see doi:<a href=\"https://doi.org/10.1186/s13321-023-00762-4\">10.1186/s13321-023-00762-4</a>, doi:<a href=\"https://doi.org/10.1186/s13321-022-00656-x\">10.1186/s13321-022-00656-x</a>,\nand doi:<a href=\"https://doi.org/10.1186/s13321-020-00467-y\">10.1186/s13321-020-00467-y</a>).\nLyudvika Radeva (<a href=\"https://www.ideaconsult.net/\">Ideaconsult Ltd</a>, <a href=\"https://uni-plovdiv.bg/en/\">University of Plovdiv</a>) showed\nwhat SYBYL Line Notation (SLN) is and how this is implemented in Ambit (see doi:<a href=\"https://doi.org/10.1002/minf.202100027\">10.1002/minf.202100027</a>).\n<a 
href=\"https://orcid.org/0000-0002-4354-4353\">Sonja Herres-Pawlis</a> (<a href=\"https://www.rwth-aachen.de/\">RWTH Aachen University</a>)\nupdated us with “News from the InChI: making the InChI FAIR and including inorganics”, e.g. showing how\nthey worked out how the InChI is going to handle organometallics, where the bonds and the stereochemistry\nare aspects that were not handled by the current InChI.</p>\n\n<p>After the afternoon coffee break, <a href=\"https://orcid.org/0000-0003-3662-2621\">Zhixu Ni</a> (<a href=\"https://fedorovalab.net/team/zhixu-ni/\">TU Dresden</a>)\nshowed his work on lipid maps characterization and identification. We previously met a few times\nat EpiLipidNET COST action meetings, and it was great to see his continued research on representation\nof lipids and lipid classes in his talk “A Fuzzy Solution for Lipid Structures Using CXSMILES”.</p>\n\n<p>Finally, <a href=\"https://www.linkedin.com/in/matthiasmailaender/\">Matthias Mailänder</a> (<a href=\"https://www.lablicate.com/\">Lablicate GmbH</a>)\ngave a “Live demo of where <a href=\"https://github.com/OpenChrom\">OpenChrom</a> uses the CDK”, and\n<a href=\"https://research.rug.nl/en/persons/yajie-ding\">Yajie Ding</a> (University of Groningen) told us about her\nglycoscience research. There, cheminformatics can also greatly help and the CDK may provide\nthem with solutions.</p>\n\n<p>This really doesn’t do justice to all the discussions, examples, use cases, etc. But it gives you\nan idea. We had 11 people in the room, and were joined online by an additional 6 people.</p>",
      "summary": "As part of our Dutch Research Council (NWO) Open Science grant, we organized a Chemistry Development Kit User Group Meeting (#CDK25UGM), of which yesterday was the “conference” day, and today a hackathon.",
      
      "date_published": "2025-03-11T00:00:00+00:00",
      "date_modified": "2025-03-11T14:00:00+00:00",
      "tags": ["cdk","openscience","cdk2024"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1002/minf.202100027", "doi": "10.1002/minf.202100027"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-022-00656-x", "doi": "10.1186/s13321-022-00656-x"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-023-00762-4", "doi": "10.1186/s13321-023-00762-4"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-020-00467-y", "doi": "10.1186/s13321-020-00467-y"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Chemistry Development Kit in 2024: improving cheminformatics research", "acronym" : "CDK2024", "uri" : "drc.filenumber:osf232097" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tjkf2-k1608",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names.html",
      "title": "One Million IUPAC names",
      "content_html": "<p>Names of chemicals are part of the human user experience when browsing a chemical database. And literature too,\nof course. Chemical names are also not easy to use, and what a chemical name means is not always clear.\nThis is why the <a href=\"https://en.wikipedia.org/wiki/International_Union_of_Pure_and_Applied_Chemistry\">IUPAC</a>\nstarted standardizing nomenclature in chemistry, the <em>IUPAC names</em>. Each IUPAC name uniquely defines\nthe chemical structure it describes. For example, <em>methane</em> is the IUPAC name for the chemical CH<sub>4</sub>.</p>\n\n<p>So, when propagating chemical structures from the <a href=\"https://chem-bla-ics.linkedchemistry.info/2025/02/13/beiltein-journal-has-bioschemas.html\">Beilstein Bioschemas feed</a>,\nI was looking for names, IUPAC or not, ideally the name used in the article. When I asked about this,\nthe question came up if they could autogenerate IUPAC names, for which\n<a href=\"https://doi.org/10.1038/s41598-021-94082-y\">various</a>\n<a href=\"https://doi.org/10.1186/s13321-021-00535-x\">new</a>\n<a href=\"https://doi.org/10.1186/s13321-021-00512-4\">tools</a>\n<a href=\"https://doi.org/10.1186/s13321-024-00941-x\">exist</a>\n(I think I am missing one from an American team, but cannot find the reference),\nalong with multiple established commercial tools.\nBecause the IUPAC nomenclature is a long list of naming rules, priorities, etc, a rule-based\nalgorithm is logical, but newer methods take a deep-learning approach.</p>\n\n<p>Back to the chemical annotation of chemistry literature. This is of obvious interest: you want\nto know where we can read more about a certain chemical. We need the chemical structures in\na database for that, linked to the articles. This is, of course, one of the original studies\nof <em>cheminformatics</em>. 
But authors of the chemical literature do not provide this routinely\n(<a href=\"https://chem-bla-ics.linkedchemistry.info/2025/02/13/beiltein-journal-has-bioschemas.html\">this post</a>\nshows a few exceptions, but it is still all too rare). And then manual and automated curation\nis needed, e.g. done by <a href=\"https://en.wikipedia.org/wiki/Chemical_Abstracts_Service\">Chemical Abstracts</a>.</p>\n\n<p>Third, <a href=\"https://wikidata.org/\">Wikidata</a> has <a href=\"https://scholia.toolforge.org/chemical/\">about 1.4 million</a>\nchemical compounds and many names. A <a href=\"https://www.wikidata.org/wiki/Wikidata:Property_proposal/Pending#IUPAC_name\">property proposal for IUPAC names</a>\nhas been long pending, but once accepted in one form or another, will require IUPAC names too.</p>\n\n<h2 id=\"one-million-iupac-names\">One million IUPAC names</h2>\n\n<p>Thus, the idea came up, can we create a set of 1 million unique IUPAC names found in literature?\nI asked on the <a href=\"https://elixir-europe.org/\">ELIXIR Europe</a> slack channel if <a href=\"https://europepmc.org/\">Europe PMC</a>\nhad such a dataset (doi:<a href=\"https://doi.org/10.1093/nar/gkad1085\">10.1093/nar/gkad1085</a>). I knew they had been adding chemical\n<a href=\"https://scholia.toolforge.org/topic/Q403574\">named-entity recognition</a> (NER) results in\n<a href=\"https://europepmc.org/Annotations\">their annotation API</a>. I learned they used <a href=\"https://www.ebi.ac.uk/chebi/\">ChEBI</a>.\nMelanie Vollmar and Summer Rosonovski of Europe PMC gave useful information and support.\n<a href=\"https://cpm.lumc.nl/research/bioinformatics-224/magnus-palmblad-5\">Magnus Palmblad</a> also replied\nand provided Python code to use the Europe PMC API to fetch names it returns and see if those\nare IUPAC names. Well, that’s easy. 
We have <a href=\"https://opsin.ch.cam.ac.uk/\">OPSIN</a> for that\n(see doi:<a href=\"https://doi.org/10.1021/ci100384d\">10.1021/ci100384d</a>).</p>\n\n<p>Unfortunately, the Europe PMC NER results are not ideal for IUPAC names. Just scanning\nsome 5, 6 organic chemistry journals returned some 8 thousand IUPAC names in open access\narticles. But it quickly started to be too limited: each set of articles returned\nincreasingly few new names. The reason is simple: the NER is too <em>greedy</em> and as a\nresult, does not easily recognize longer IUPAC names. It is too happy with a substring\nof the IUPAC name. For example, when it encounters the IUPAC name <em>5-Bromo-1H-indole-3-carboxylic acid</em>,\nit settles for <em>indole-3-carboxylic acid</em>:</p>\n\n<p><img src=\"/assets/images/greedy.png\" alt=\"\" /></p>\n\n<h2 id=\"open-source-chemistry-analysis-routines\">Open-Source Chemistry Analysis Routines</h2>\n\n<p>During my PhD, in 2003, when I worked a few months with Prof. <a href=\"https://scholia.toolforge.org/author/Q908710\">Peter Murray-Rust</a> (University of Cambridge)\nand Prof. Janet Thornton (EMBL-EBI), I learned about the research by <a href=\"https://scholia.toolforge.org/author/Q28946549\">Sam Adams</a>\n(doi:<a href=\"https://doi.org/10.1039/B411699M\">10.1039/B411699M</a>), <a href=\"https://scholia.toolforge.org/author/Q133040220\">Joe Townsend</a>\n(doi:<a href=\"https://doi.org/10.1039/B411033A\">10.1039/B411033A</a>), and <a href=\"https://scholia.toolforge.org/author/Q90318722\">Peter Corbett</a>\n(doi:<a href=\"https://doi.org/10.1007/11875741_11\">10.1007/11875741_11</a>). 
One of the tools that used\nthis research was (is) <a href=\"https://scholia.toolforge.org/topic/Q133037490\">OSCAR</a>,\nshort for <em>Open-Source Chemistry Analysis Routines</em> (see <a href=\"https://blogs.ch.cam.ac.uk/pmr/2009/05/16/opsin-and-oscar-chemical-language-processing/\">this detailed write up by Peter MR</a>).\nLater, in 2010, I visited Peter again, as a postdoc, in Cambridge, and then\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/15/working-on-oscar-for-three-months.html\">worked on the OSCAR project</a> too.\nAnd while OSCAR did a lot more, the integration of <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/12/26/oscar-training-data-models-etc.html\">Corbett’s NER research</a>\nmade OSCAR the obvious follow-up step in finding IUPAC names in literature.</p>\n\n<p>And because <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/09/27/almost-year-ago-i-started-position-with.html\">OSCAR4 had been integrated into Bioclipse</a>\n(doi:<a href=\"https://doi.org/10.1186/1758-2946-3-41\">10.1186/1758-2946-3-41</a>) and I had this ported to Bacting already\n(doi:<a href=\"https://doi.org/10.21105/joss.02558\">10.21105/joss.02558</a>), using this was trivial.\nThe use of Europe PMC is different now, however, and we are no longer using the Annotations API,\nbut just using it to find open access articles, and to get the full text in XML format.\nThat allows a simple XPath search on <code class=\"language-plaintext highlighter-rouge\">&lt;p&gt;</code> elements, pass the resulting string to OSCAR4,\nand the recognized names are checked with OPSIN.\nAnd with this approach, processing two of the five or six journals we earlier explored,\nwe find another 40+ thousand IUPAC names. 
Quite a success, I am tempted to say.</p>\n\n<h2 id=\"a-blue-obelisk-project\">A Blue Obelisk project</h2>\n\n<p>So, I started a new <a href=\"https://blueobelisk.github.io/\">Blue Obelisk</a> project,\n<a href=\"https://github.com/BlueObelisk/iupac-names\">iupac-names</a>, to collect 1M IUPAC names. For researchers\nto use, learn from, etc. Just IUPAC names. Not even the chemical structure, nor the link to the\narticles. The first is trivial to do with OPSIN, so the matching SMILES do not need to be stored.\nLinking to literature is tricky because of the aforementioned issues, and we only want to know\nwhich (partial) IUPAC names occur in literature. If you really want to know in which articles\nthat IUPAC name is found, you can simply do a search in Europe PMC.</p>\n\n<p>And because we only store IUPAC names, these are very basic facts (this is an IUPAC name, as defined\nby OPSIN being able to generate a SMILES for this structure, and that string occurs in\nsome article) and we can share them as CCZero. We <a href=\"is:issue\" title=\"milestone release\">defined various milestones</a>,\nand I am happy that the first two have been reached within two weeks:</p>\n\n<ul>\n  <li><a href=\"https://github.com/BlueObelisk/iupac-names/releases/tag/milestone-10k\">Milestone 10k</a> (doi:<a href=\"https://doi.org/10.5281/zenodo.14965762\">10.5281/zenodo.14965762</a>)</li>\n  <li><a href=\"https://github.com/BlueObelisk/iupac-names/releases/tag/milestone-50k\">Milestone 50k</a> (doi:<a href=\"https://doi.org/10.5281/zenodo.14978557\">10.5281/zenodo.14978557</a>)</li>\n</ul>\n\n<p>This second milestone has 53848 unique names, but as literature goes, there are interesting\nvariations, some likely because of typesetting leading to spaces added and missing. If\nwe ignore spaces and hyphens, we have 50534 names left (hence the milestone). 
But IUPAC\nnames are also not fully unique, partly because of Unicode character variations and Greek\nletter alternatives, and you may wonder how many different chemical structures this set\nreflects. While not perfect, the Standard InChI gives some lower limit, and we find 36528\nInChIKeys in this second milestone.</p>\n\n<p>Now, we need twenty times as many to reach the 1M IUPAC names, but we have many, many\nmore open access articles to process. The bottleneck seems to be mostly our workflow.</p>\n\n<h3 id=\"can-you-contribute\">Can you contribute?</h3>\n\n<p>Yes, of course! This is an open science project. But please keep in mind the narrow focus of this\nproject: only IUPAC names which can be found in (open access) literature. This project does not accept\nautogenerated names (PubChem would have given us many millions already), nor IUPAC names from existing\ndatabases. Ideally, you are able to show the code you use to extract/find those names in literature.</p>\n\n<h3 id=\"can-i-use-these-names\">Can I use these names?</h3>\n\n<p>First of all, this is what the CCZero license and open science nature of this project is about: reuse.\nWe love to hear how you are using these names, tho, and we encourage you to write up how you\nare using them. You can use <a href=\"https://datacite.org/\">DataCite</a> to cite the release you used,\nand citing this blog post by DOI is also possible.</p>\n\n<h3 id=\"does-it-support-my-language-too\">Does it support my language too?</h3>\n\n<p>No, at this moment it only supports IUPAC names in English. Dutch, French, Spanish, or Chinese\nIUPAC names are valid, but currently not supported. See also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/12/30/text-mining-chemistry-from-dutch-or.html\">this post</a>.</p>\n\n<h3 id=\"will-there-be-a-publication\">Will there be a publication?</h3>\n\n<p>Magnus and I intend so. 
We already submitted an abstract to the <a href=\"https://iccs-nl.org/\">International Conference on Chemical Structures</a>,\nwhich has <a href=\"https://www.biomedcentral.com/collections/ICCS25\">a Collection in the Journal of Cheminformatics</a>.\nIf the abstract gets accepted, of course, we can submit there. Otherwise, we will look for another venue,\nlikely <a href=\"https://en.wikipedia.org/wiki/Diamond_open_access\">diamond open access</a>.</p>\n\n<h3 id=\"where-is-your-script\">Where is your script?</h3>\n\n<p>Ah, fair point. We did not decide on the final license yet. I have used two scripts based on the template\nby Magnus. As soon as we have finalized the license, we will make those available.</p>",
      "summary": "Names of chemicals are part of the human user experience when browsing a chemical database. And literature too, of course. Chemical names are also not easy to use, and what a chemical name means is not always clear. This is why the IUPAC started standardizing nomenclature in chemistry, the IUPAC names. Each IUPAC name uniquely defines the chemical structure it describes. For example, methane is the IUPAC name for the chemical CH4.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/greedy.png",
      "date_published": "2025-03-08T00:00:00+00:00",
      "date_modified": "2025-03-12T00:00:00+00:00",
      "tags": ["iupac","cheminf","oscar","textmining","europepmc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/s41598-021-94082-y", "doi": "10.1038/s41598-021-94082-y"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-021-00512-4", "doi": "10.1186/s13321-021-00512-4"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-021-00535-x", "doi": "10.1186/s13321-021-00535-x"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-024-00941-x", "doi": "10.1186/s13321-024-00941-x"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci100384d", "doi": "10.1021/ci100384d"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1039/B411699M", "doi": "10.1039/B411699M"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1039/B411033A", "doi": "10.1039/B411033A"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/11875741_11", "doi": "10.1007/11875741_11"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1758-2946-3-41", "doi": "10.1186/1758-2946-3-41"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.21105/JOSS.02558", "doi": "10.21105/JOSS.02558"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/nar/gkad1085", "doi": "10.1093/nar/gkad1085"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.14965762", "doi": "10.5281/zenodo.14965762"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.14978557", "doi": "10.5281/zenodo.14978557"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/w4zj3-mbw53",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/02/16/retraction-data-in-wikidata.html",
      "title": "Retracted articles in Wikidata",
      "content_html": "<p>A good number of years ago, a colleague and I explored if we could get access to the <a href=\"http://retractiondatabase.org/\">Retraction Watch Database</a>,\nbut we could not afford it. We have been using data on retractions to curate our databases, like\n<a href=\"https://www.wikipathways.org/\">WikiPathways</a>. A database should not contain knowledge based on (only) a retracted article.\nWikidata, btw, has a small number (499) of statements supported by retracted articles. Similarly, it turns out that I am\n<a href=\"https://w.wiki/8pwe\">citing retracted articles in two papers</a> (and a preprint of one of them).</p>\n\n<p><a href=\"https://www.wikidata.org/\">Wikidata</a> has a good number of retracted articles in their database\n(<a href=\"https://scholia.toolforge.org/statistics\">some 21 thousand at the time of writing</a>). A lot of this data\ncomes from CrossRef, which recently <a href=\"https://www.crossref.org/blog/news-crossref-and-retraction-watch/\">acquired the Retraction Watch Database</a>\n(doi:<a href=\"https://doi.org/10.13003/c23rw1d9\">10.13003/c23rw1d9</a>) and started providing the content as FAIR and Open data.\nWith <a href=\"https://github.com/egonw/ons-wikidata/blob/main/RetractionWatch/quickstatements.groovy\">a Bacting-based script</a>\nI am regularly updating Wikidata with annotations from CrossRef, giving a rich dataset in Wikidata around\nthe queries. Over the past few years I have written various SPARQL queries to show the results which today\nI <a href=\"https://bigcat-um.github.io/sparql-examples/examples/WikidataRetractions/\">collected under a single home</a>:</p>\n\n<p><img src=\"/assets/images/retraction_SPARQL.png\" alt=\"\" /></p>",
      "summary": "A good number of years ago, a colleague and I explored if we could get access to the Retraction Watch Database, but we could not afford it. We have been using data on retractions to curate our databases, like WikiPathways. A database should not contain knowledge based on (only) a retracted article. Wikidata, btw, has a small number (499) of statements supported by retracted articles. Similarly, it turns out that I am citing retracted articles in two papers (and a preprint of one of them).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/retraction_SPARQL.png",
      "date_published": "2025-02-16T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["wikidata","wikipathways"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKAD960", "doi": "10.1093/NAR/GKAD960"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.13003/c23rw1d9", "doi": "10.13003/c23rw1d9"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yyjnz-n5j48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/02/13/beiltein-journal-has-bioschemas.html",
      "title": "Beilstein journals contain Bioschemas",
      "content_html": "<p>Two weeks ago, the <a href=\"https://www.beilstein-journals.org/bjoc/news/LAFGBV6PT5ASC5R7JOKSEXOQYM\">Beilstein Institute announced Bioschemas support in their journals</a>:</p>\n\n<blockquote>\n  <p>We streamline the discoverability of your research by incorporating machine-readable chemical information into many of our published articles.\nThis includes the conversion of chemical structures from submitted ChemDraw files to InChI strings and validating them using open-source tools.</p>\n</blockquote>\n\n<p>The idea is far from new and has been around for two decades. But the <a href=\"https://scholia.toolforge.org/publisher/Q4881267\">two Beilstein journals</a>\n(both <a href=\"https://en.wikipedia.org/wiki/Diamond_open_access\">diamond Open Access</a>) actually integrated it into their active publishing model.\nThat has been trialed and put in action before. For example, there was (is?) <a href=\"https://doi.org/10.59350/ne4rf-wey66\">Project Prospect</a>\n(2007), <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/03/19/nature-chemistry-improves-publishing.html\">chemical structure annotation in Nature Chemistry</a>\n(2009), <a href=\"https://chem-bla-ics.linkedchemistry.info/2014/02/21/slow-publishing-innovation.html\">SMILES in the ACS Journal of Medicinal Chemistry</a>\n(2014) (doi:<a href=\"https://doi.org/10.1021/jm5002056\">10.1021/jm5002056</a>),\nand <em>FAIR chemical structures in the Journal of Cheminformatics</em> (2021) (doi:<a href=\"https://doi.org/10.1186/s13321-021-00520-4\">10.1186/s13321-021-00520-4</a>).</p>\n\n<p>But this announcement is a new step. I like how validation of the chemical structures is part of the approach, and I like\nhow they use the <a href=\"https://bioschemas.org/\">Bioschemas</a> extension of <a href=\"https://schema.org/\">schema.org</a>. 
The last because\nthey use two Bioschemas types/profiles that contributed to or initiated, respectively: <a href=\"https://bioschemas.org/profiles/MolecularEntity/0.5-RELEASE\">MolecularEntity</a>\nand <a href=\"https://bioschemas.org/profiles/ChemicalSubstance/0.4-RELEASE\">ChemicalSubstance</a>.</p>\n\n<p>First stop for me is to check the schema.org annotation with a validation tool, like <a href=\"https://search.google.com/test/rich-results\">Google’s Rich Results Test</a>.\nThat gives an idea how they may have have their search engine pick it up. The test article I was given on LinkedIn is\nXiao <em>et al.</em>’s <em>Molecular diversity of the reactions of MBH carbonates of isatins and various nucleophiles</em>\n(doi:<a href=\"https://doi.org/10.3762/bjoc.21.21\">10.3762/bjoc.21.21</a>) in the <a href=\"https://scholia.toolforge.org/venue/Q2894008\">Beilstein Journal of Organic Chemistry</a>,\nand we indeed <a href=\"https://search.google.com/test/rich-results/result?id=FRW9wBOpXtsMp9TLUV6SfQ\">see the schema.org annotation show up</a>:</p>\n\n<p><img src=\"/assets/images/bjoc_bioschemas.png\" alt=\"\" /></p>\n\n<p>And because of the use of open standards, extracting the information is not so hard with, for example here,\nBacting (doi:<a href=\"https://doi.org/10.21105/joss.02558\">10.21105/joss.02558</a>), based on a 2022 script from the NanoSafety Cluster\nprojects NanoCommons and SbD4Nano:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nd\">@Grab</span><span class=\"o\">(</span><span class=\"n\">group</span><span class=\"o\">=</span><span class=\"s1\">'io.github.egonw.bacting'</span><span class=\"o\">,</span> <span class=\"n\">module</span><span class=\"o\">=</span><span class=\"s1\">'managers-rdf'</span><span class=\"o\">,</span> <span class=\"n\">version</span><span class=\"o\">=</span><span class=\"s1\">'1.0.4'</span><span class=\"o\">)</span>\n<span 
class=\"nd\">@Grab</span><span class=\"o\">(</span><span class=\"n\">group</span><span class=\"o\">=</span><span class=\"s1\">'io.github.egonw.bacting'</span><span class=\"o\">,</span> <span class=\"n\">module</span><span class=\"o\">=</span><span class=\"s1\">'managers-ui'</span><span class=\"o\">,</span> <span class=\"n\">version</span><span class=\"o\">=</span><span class=\"s1\">'1.0.4'</span><span class=\"o\">)</span>\n<span class=\"nd\">@Grab</span><span class=\"o\">(</span><span class=\"n\">group</span><span class=\"o\">=</span><span class=\"s1\">'io.github.egonw.bacting'</span><span class=\"o\">,</span> <span class=\"n\">module</span><span class=\"o\">=</span><span class=\"s1\">'net.bioclipse.managers.jsoup'</span><span class=\"o\">,</span> <span class=\"n\">version</span><span class=\"o\">=</span><span class=\"s1\">'1.0.4'</span><span class=\"o\">)</span>\n\n<span class=\"n\">bioclipse</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">net</span><span class=\"o\">.</span><span class=\"na\">bioclipse</span><span class=\"o\">.</span><span class=\"na\">managers</span><span class=\"o\">.</span><span class=\"na\">BioclipseManager</span><span class=\"o\">(</span><span class=\"s2\">\".\"</span><span class=\"o\">);</span>\n<span class=\"n\">rdf</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">net</span><span class=\"o\">.</span><span class=\"na\">bioclipse</span><span class=\"o\">.</span><span class=\"na\">managers</span><span class=\"o\">.</span><span class=\"na\">RDFManager</span><span class=\"o\">(</span><span class=\"s2\">\".\"</span><span class=\"o\">);</span>\n<span class=\"n\">jsoup</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">net</span><span class=\"o\">.</span><span class=\"na\">bioclipse</span><span class=\"o\">.</span><span class=\"na\">managers</span><span class=\"o\">.</span><span class=\"na\">JSoupManager</span><span class=\"o\">(</span><span 
class=\"s2\">\".\"</span><span class=\"o\">);</span>\n\n<span class=\"n\">articles</span> <span class=\"o\">=</span> <span class=\"o\">[</span>\n   <span class=\"n\">args</span><span class=\"o\">[</span><span class=\"mi\">0</span><span class=\"o\">]</span>\n<span class=\"o\">]</span>\n\n<span class=\"n\">kg</span> <span class=\"o\">=</span> <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">createInMemoryStore</span><span class=\"o\">()</span>\n\n<span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"n\">article</span> <span class=\"k\">in</span> <span class=\"n\">articles</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"n\">htmlContent</span> <span class=\"o\">=</span> <span class=\"n\">bioclipse</span><span class=\"o\">.</span><span class=\"na\">download</span><span class=\"o\">(</span><span class=\"n\">article</span><span class=\"o\">)</span>\n\n    <span class=\"n\">htmlDom</span> <span class=\"o\">=</span> <span class=\"n\">jsoup</span><span class=\"o\">.</span><span class=\"na\">parseString</span><span class=\"o\">(</span><span class=\"n\">htmlContent</span><span class=\"o\">)</span>\n\n    <span class=\"c1\">// application/ld+json</span>\n\n    <span class=\"n\">bioschemasSections</span> <span class=\"o\">=</span> <span class=\"n\">jsoup</span><span class=\"o\">.</span><span class=\"na\">select</span><span class=\"o\">(</span><span class=\"n\">htmlDom</span><span class=\"o\">,</span> <span class=\"s2\">\"script[type='application/ld+json']\"</span><span class=\"o\">);</span>\n\n    <span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"n\">section</span> <span class=\"k\">in</span> <span class=\"n\">bioschemasSections</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n        <span class=\"n\">bioschemasJSON</span> <span class=\"o\">=</span> <span class=\"n\">section</span><span class=\"o\">.</span><span class=\"na\">html</span><span class=\"o\">()</span>\n        <span 
class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">importFromString</span><span class=\"o\">(</span><span class=\"n\">kg</span><span class=\"o\">,</span> <span class=\"n\">bioschemasJSON</span><span class=\"o\">,</span> <span class=\"s2\">\"JSON-LD\"</span><span class=\"o\">)</span>\n    <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n\n<span class=\"n\">turtle</span> <span class=\"o\">=</span> <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">asTurtle</span><span class=\"o\">(</span><span class=\"n\">kg</span><span class=\"o\">);</span>\n\n<span class=\"n\">println</span> <span class=\"s2\">\"#\"</span> <span class=\"o\">+</span> <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">size</span><span class=\"o\">(</span><span class=\"n\">kg</span><span class=\"o\">)</span> <span class=\"o\">+</span> <span class=\"s2\">\" triples detected in the JSON-LD\"</span>\n<span class=\"c1\">// println turtle</span>\n\n\n<span class=\"n\">sparql</span> <span class=\"o\">=</span> <span class=\"s2\">\"\"\"\nPREFIX schema: &lt;http://schema.org/&gt;\nSELECT ?entity ?inchikey ?smiles WHERE {\n  ?entity a schema:MolecularEntity .\n  OPTIONAL { ?entity schema:inChIKey ?inchikey }\n  OPTIONAL { ?entity schema:smiles ?smiles }\n}\n\"\"\"</span>\n\n<span class=\"n\">results</span> <span class=\"o\">=</span> <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">sparql</span><span class=\"o\">(</span><span class=\"n\">kg</span><span class=\"o\">,</span> <span class=\"n\">sparql</span><span class=\"o\">)</span>\n\n<span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"o\">;</span><span class=\"n\">i</span><span class=\"o\">&lt;=</span><span class=\"n\">results</span><span class=\"o\">.</span><span class=\"na\">rowCount</span><span class=\"o\">;</span><span class=\"n\">i</span><span class=\"o\">++)</span> <span 
class=\"o\">{</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"${results.get(i, \"</span><span class=\"n\">inchikey</span><span class=\"s2\">\")}\\t${results.get(i, \"</span><span class=\"n\">smiles</span><span class=\"s2\">\")}\"</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The output is a simple table:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>MGAPJMNPGGTFHJ-JEIPZWNWSA-N     CN1C(=O)/C(=C/2\\C3=CC(=CC=C3N(CC4=CC=CC=C4)C2=O)Cl)/C(=P(C5=CC=CC=C5)(C6=CC=CC=C6)C7=CC=CC=C7)C1=O\nXEWMQVUVGAHESA-UHFFFAOYSA-N     CC1=CC=C(C=C1)NC2=C(C3C4=CC(=CC=C4N(CC5=CC=CC=C5)C3=O)C)C(=O)N(C)C2=O\nUVTJORFYHPGJDZ-PYCFMQQDSA-N     CCCCN1C2=CC=C(C)C=C2/C(=C(\\C#N)/CNC3=CC=C(C)C=C3)/C1=O\nILWGDUYVQRAMMG-PGMHBOJBSA-N     CCCCN1C2=CC=C(C)C=C2/C(=C(\\C#N)/CNC3=CC=C(C=C3)Cl)/C1=O\nCAFIBKBZWJFZCW-FXBPSFAMSA-N     CCCCN1C2=CC=C(C)C=C2/C(=C(\\C#N)/CNC3=CC=CC=C3)/C1=O\nUOJSFLANMVIMBV-UHFFFAOYSA-N     CCCCN1C2=CC=C(C)C=C2C(C3=C(C(=O)N(C)C3=O)NC4=CC=C(C=C4)Cl)C1=O\nVNJBTGZXAGHCSO-OAPYJULQSA-N     COC(=O)/C(=C\\1/C2=C(C=CC=C2)N(CC3=CC=CC=C3)C1=O)/C=P(C4=CC=CC=C4)(C5=CC=CC=C5)C6=CC=CC=C6\nKJXQRAKSOANQTJ-GFMRDNFCSA-N     CC1=CC=C(C=C1)NC/C(=C\\2/C3=C(C=CC=C3)N(CC4=CC=CC=C4)C2=O)/C#N\nIGEBJMZDOPBFGF-UHFFFAOYSA-N     CCCCN1C2=CC=C(C)C=C2C(C3=C(C(=O)N(C)C3=O)NC4=CC=CC=C4)C1=O\nSSANVPNESOMKOM-AWQADKOQSA-N     C1=CC=C(C=C1)CN2C3=CC=C(C=C3/C(=C(/C#N)\\C=P(C4=CC=CC=C4)(C5=CC=CC=C5)C6=CC=CC=C6)/C2=O)Cl\nGEHWHSHQSIOZKL-NVQSTNCTSA-N     CCCCN1C2=CC=C(C=C2/C(=C\\3/C(=P(C4=CC=CC=C4)(C5=CC=CC=C5)C6=CC=CC=C6)C(=O)N(C)C3=O)/C1=O)Cl\nPALRSQOHFLRWDH-UHFFFAOYSA-N     CCCCN1C2=CC=C(C)C=C2C(C3=C(C(=O)N(C)C3=O)NC4=CC=C(C=C4)OC)C1=O\nKBFODZMDSAFLFR-UHFFFAOYSA-N     CN1C(=O)C(=C(C1=O)NC2=CC(=CC=C2)Cl)C3C4=CC(=CC=C4N(CC5=CC=CC=C5)C3=O)Cl\nJCGAVVZYXDJPBU-GFMRDNFCSA-N     CC1=C(C=CC=C1)NC/C(=C\\2/C3=C(C=CC=C3)N(CC4=CC=CC=C4)C2=O)/C#N\nDZFPCPDEQGLPLY-UHFFFAOYSA-N     
CCCCN1C2=CC=C(C)C=C2C(C3=C(C(=O)N(C)C3=O)NC4=CC=C(C)C=C4)C1=O\nXMRNJCJUOXYXJU-DAFNUICNSA-N     CC1=CC=C(C=C1)NC/C(=C\\2/C3=CC(=CC=C3N(CC4=CC=CC=C4)C2=O)C)/C#N\nSSDSNBBHEUUKGI-UHFFFAOYSA-N     CC1=CC=C2C(=C1)C(C3=C(C(=O)N(C)C3=O)N(C)C4=CC=CC=C4)C(=O)N2CC5=CC=CC=C5\nUSFYPRDMNXMWPO-UHFFFAOYSA-N     CCCCN1C2=CC=C(C)C=C2C(C3=C(C(=O)N(C)C3=O)NC4=CC=C(C=C4)Br)C1=O\nXYHTWFULRHTEAG-MUGXBBEHSA-N     CCCCN1C2=CC=C(C)C=C2/C(=C(/C#N)\\C=P(C3=CC=CC=C3)(C4=CC=CC=C4)C5=CC=CC=C5)/C1=O\nXALDZIBHNNIVAM-UHFFFAOYSA-N     CCCCN1C2=CC=C(C)C=C2C(C3=C(C(=O)N(C)C3=O)NC4=C(C=CC=C4)O)C1=O\nTUTWQHBRQPMLME-OAPYJULQSA-N     COC(=O)/C(=C\\1/C2=CC(=CC=C2N(CC3=CC=CC=C3)C1=O)Cl)/C=P(C4=CC=CC=C4)(C5=CC=CC=C5)C6=CC=CC=C6\nIYEHFTMZZMIPRU-UHFFFAOYSA-N     CC1=CC=C(C=C1)NC2=C(C3C4=CC(=CC=C4N(CC5=CC=CC=C5)C3=O)Cl)C(=O)N(C)C2=O\nKBSDGNPLIPXCEX-UHFFFAOYSA-N     CCCCN1C2=CC=C(C)C=C2C(C3=C(C(=O)N(C)C3=O)NCC4=CC=CC=C4)C1=O\nBQGIUMITIGHBSD-UHFFFAOYSA-N     CCCCNC1=C(C2C3=CC(=CC=C3N(CC4=CC=CC=C4)C2=O)C)C(=O)N(C)C1=O\nPNSOLOPHIVUPOZ-MNDPQUGUSA-N     CCCCNC/C(=C\\1/C2=CC(=CC=C2N(CCCC)C1=O)C)/C#N\nHLTBKJRJOIZCMJ-PYCFMQQDSA-N     CCCCN1C2=CC=C(C)C=C2/C(=C(\\C#N)/CN(C)C3=CC=CC=C3)/C1=O\nFFLHFLUBMRBQTB-UHFFFAOYSA-N     CCCCN1C2=CC=C(C=C2C(C3=C(C(=O)N(C)C3=O)NC4=CC=C(C)C=C4)C1=O)F\nFOQOVOLYYARWPA-NKFKGCMQSA-N     C1=CC=C(C=C1)CN2C3=C(C=CC=C3)/C(=C(\\C#N)/CNC4=CC(=CC=C4)Cl)/C2=O\nKLEPCAQFOXJLNV-UHFFFAOYSA-N     CC1=C(C=CC=C1)NC2=C(C3C4=CC(=CC=C4N(CC5=CC=CC=C5)C3=O)Cl)C(=O)N(C)C2=O\n</code></pre></div></div>\n\n<p>That also made me realize that there are not chemical names in the annotation. That would be really useful to move things\nforward. 
Then again, PubChem will likely just generate the IUPAC name, since they have access to such software anyway.\nBeilstein has teamed up with PubChem, which will index it, but I will be interested in seeing how to use this for\n<code class=\"language-plaintext highlighter-rouge\">main subject</code> annotation in <a href=\"https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry\">Wikidata</a>.</p>\n\n<p>A final note for now: the model they use is to annotate the article with chemical substances (<code class=\"language-plaintext highlighter-rouge\">ChemicalSubstance</code>) with\n(one or more?) molecular entities (<code class=\"language-plaintext highlighter-rouge\">MolecularEntity</code>). That is a model that scales well to their other journal,\nthe <a href=\"https://scholia.toolforge.org/venue/Q814756\">Beilstein Journal of Nanotechnology</a>. But scraping that is for another post.</p>",
      "summary": "Two weeks ago, the Beilstein Institute announced Bioschemas support in their journals:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bjoc_bioschemas.png",
      "date_published": "2025-02-13T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["bioschemas","rdf","chemistry","beilstein"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/ne4rf-wey66", "doi": "10.59350/ne4rf-wey66"
            , "cito":
              
              
                [ 
                  "citesForInformation"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-021-00520-4", "doi": "10.1186/s13321-021-00520-4"
            , "cito":
              
              
                [ 
                  "citesForInformation"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/jm5002056", "doi": "10.1021/jm5002056"
            , "cito":
              
              
                [ 
                  "citesForInformation"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3762/bjoc.21.21", "doi": "10.3762/bjoc.21.21"
            , "cito":
              
              
                [ 
                  "usesDataFrom"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.21105/joss.02558", "doi": "10.21105/joss.02558"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/40377-hz881", "doi": "10.59350/40377-hz881"
            , "cito":
              
              
                [ 
                  "citesForInformation"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/cnt8a-3v351",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/02/08/cito-for-blog-citations.html",
      "title": "CiTO for blog citations",
      "content_html": "<p>This is mostly a test, but if it turns out the way I hope it will, likely after a few iterations, it adds\nsupport in my blog for CiTO intent annotations to the DOIs I cite. I\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/30/fair-blog-to-blog-citations.html\">pondered about the earlier</a>.\nIn the <a href=\"https://www.jsonfeed.org/version/1.1/\">JSON Feed</a> it should, at least for now, show up like this:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nl\">\"_references\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\">\n  </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"url\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"https://doi.org/10.59350/er1mn-m5q69\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"doi\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"10.59350/er1mn-m5q69\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"cito\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\"> </span><span class=\"s2\">\"extends\"</span><span class=\"w\"> </span><span class=\"p\">]</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">]</span><span class=\"w\">\n</span></code></pre></div></div>",
      "summary": "This is mostly a test, but if it turns out the way I hope it will, likely after a few iterations, it adds support in my blog for CiTO intent annotations to the DOIs I cite. I pondered about the earlier. In the JSON Feed it should, at least for now, show up like this:",
      
      "date_published": "2025-02-08T00:00:00+00:00",
      "date_modified": "2025-02-08T00:00:00+00:00",
      "tags": ["cito","blog","json"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/er1mn-m5q69", "doi": "10.59350/er1mn-m5q69"
            , "cito":
              
              
                [ 
                  "extends"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2akb8-d4v55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/01/27/translational-genomics.html",
      "title": "s/BiGCaT/Translational Genomics/g",
      "content_html": "<p>With a year of preparation and two years of thinking, on September 1st 2024 the\nDepartment of Bioinformatics, aka BiGCaT, merged with two other departments to\nform the <a href=\"https://www.maastrichtuniversity.nl/research/translational-genomics\">Department of Translational Genomics</a>\n(see also <a href=\"https://www.linkedin.com/feed/update/urn:li:activity:7289584128176336896?utm_source=share&amp;utm_medium=member_desktop\">this LinkedIn announcement</a>).\nThis merger creates many new opportunities while it strenghtens our bioinformatics\nresearch. In fact, I will have more room to focus on the chemical roles in our\n<em>more accurate understanding of biological processes</em>. I am looking forward to\nthe upcoming years!</p>\n\n<p><img src=\"/assets/images/translational_genomics.png\" alt=\"\" /></p>",
      "summary": "With a year of preparation and two years of thinking, on September 1st 2024 the Department of Bioinformatics, aka BiGCaT, merged with two other departments to form the Department of Translational Genomics (see also this LinkedIn announcement). This merger creates many new opportunities while it strenghtens our bioinformatics research. In fact, I will have more room to focus on the chemical roles in our more accurate understanding of biological processes. I am looking forward to the upcoming years!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/translational_genomics.png",
      "date_published": "2025-01-27T00:00:00+00:00",
      "date_modified": "2025-01-27T00:00:00+00:00",
      "tags": ["bioinfo"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/am3yq-9xx77",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/01/26/niche-papers.html",
      "title": "Niche papers and citation intentions",
      "content_html": "<p>I wish I could say I remember the first citation to one of my research articles. I do not. But I do remember\nthe excitement to see why someone was citing my research. What I do remember is that I got a comment around\nthe same time along the lines of this: <em>“why would anyone cite your article if they can download the results\nfor free?”</em> (about open science cheminformatics research). Other times. Indeed, I found out there are many reasons why people are citing and not citing\narticles. The above is one of them (still happens too often). But that’s also an intrinsic property of the\ncurrent publishing model: some papers get cited too much, others get cited too little.</p>\n\n<p><a href=\"https://scholar.social/@dingemansemark\">Mark Dingemanse</a> wrote up a post <em><a href=\"https://doi.org/10.59350/m6erd-7px95\">[i]n praise of niche papers</a></em>,\nsuggesting people to highlight papers that are not cited enough (as proxy for not getting enough attention).\nThey write:</p>\n\n<blockquote>\n  <p>Let’s define niche papers informally as work to be proud of even if it managed to remain a bit obscure;\ngood work that would deserve more readers. Niche papers may not contain the most flashy results. They\nmay not appear in the most glamourous venues. They may be book chapters. They don’t easily gather\ndrive-by citations.</p>\n</blockquote>\n\n<h2 id=\"why-i-found-this-post-interesting\">Why I found this post interesting</h2>\n\n<p>Before I move on to highlighting niche papers (from our group and from others), I want to ponder\na bit more about the rest. The first I learned is that the citation count to articles is a bad measure\nfor the impact (<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/07/when-is-open-source-chemoinformatics.html\">2006 pondering</a>):\narticles using your work may get more citations than your own article. 
For example, the first paper\n(doi:<a href=\"https://doi.org/10.1021/CI025584Y\">10.1021/CI025584Y</a>) about the open science cheminformatics\nlibrary, the <a href=\"https://cdk.github.io\">Chemistry Development Kit</a> (CDK), was originally cited less than\nthe paper about the BRENDA enzyme database (doi:<a href=\"https://doi.org/10.1093/NAR/GKH081\">10.1093/NAR/GKH081</a>)\nusing the CDK for fingerprint calculations (to compare and search enzyme substrates), and later much\nless than MZmine (doi:<a href=\"https://doi.org/10.1186/1471-2105-11-395\">10.1186/1471-2105-11-395</a>)\n(see <a href=\"https://scholia.toolforge.org/works/Q27061829,Q27136473,Q24599948\">this Scholia page</a>):</p>\n\n<p><img src=\"/assets/images/cdk_citations.png\" alt=\"\" /></p>\n\n<p>I think we should stop limiting ourselves to papers and book chapters. We must extend our notion of research output,\nanyway, starting with data and software. This is part of defining what niche is, imo.</p>\n\n<p>The second reason why I liked Mark’s post is the <em>drive-by citations</em>, for which he references\n<a href=\"https://scatter.wordpress.com/2009/04/30/drive-by-citations/\">a 2009 post by andrewperrin</a>, which defined\nsuch citations as</p>\n\n<blockquote>\n  <p>references to a work that make a very quick appearance, extract a very small, specific point from the work,\nand move on without really considering the existence or depth of connection between the student’s work and\nthe cited work.</p>\n</blockquote>\n\n<p>This is something I noted too when analyzing citations to the aforementioned CDK paper. Particularly in the\nearly days, it was cited a lot in a similar way: the citing work was not using the CDK, but ascribed some authority\nto the paper in a <em>very quick appearance, without really considering the cited work</em>. 
The\n<a href=\"https://purl.org/spar/cito\">Citation Typing Ontology</a> (CiTO, doi:<a href=\"https://doi.org/10.1186/2041-1480-1-s1-s6\">10.1186/2041-1480-1-s1-s6</a>)\nhas <em>cito:citesAsAuthority</em> for that (not exactly the same thing,\nand maybe CiTO should have <em>cito:driveByCitation</em> too). Such citations happen a lot: in the past I have\nguesstimated them to make up 20-35% of the citations to an article, and I postulate that\nhigh-journal-impact-factor journals amass a higher ratio than specialist (niche?) journals.</p>\n\n<p>With FAIR citations (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/30/fair-blog-to-blog-citations.html\">this post</a>)\nwe can visualize that ratio, here in <a href=\"https://scholia.toolforge.org/work/Q27061829#cito-incoming\">this Scholia page</a>:</p>\n\n<p><img src=\"/assets/images/cdk_citations_why.png\" alt=\"\" /></p>\n\n<p>It is also obvious that the first CDK paper introduced a new method. But the pattern is not\nlimited to this paper, and with <a href=\"https://scholia.toolforge.org/cito/#article-counts\">just over 2000 citation intentions</a>,\nwe start to get some idea of this pattern:</p>\n\n<p><img src=\"/assets/images/citations_why.png\" alt=\"\" /></p>\n\n<h2 id=\"my-contributed-niche-papers\">My contributed Niche Papers</h2>\n\n<p>That brings me to a first neglected paper, David Shotton’s original conference proceedings <em>CiTO, the Citation Typing Ontology</em>\n(doi:<a href=\"https://doi.org/10.1186/2041-1480-1-S1-S6\">10.1186/2041-1480-1-S1-S6</a>), another paper where citing articles\nare more cited than the original:</p>\n\n<p><img src=\"/assets/images/cito_openalex.png\" alt=\"\" /></p>\n\n<p>A second example is cited even less (only <a href=\"https://openalex.org/works?page=1&amp;filter=cites%3Aw2103581950\">36 times</a>\naccording to OpenAlex), but is a wonderful early example of machine learning on a massive amount of data:\n<em>Genome‐Scale Classification of Metabolic Reactions: A 
Chemoinformatics Approach</em> (doi:<a href=\"https://doi.org/10.1002/anie.200503833\">10.1002/anie.200503833</a>) by\nDiogo Latino and João Aires‐de‐Sousa. My <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html\">2006 blog post</a>\nabout their article did not make a difference. And this is remarkable if you look at how many articles\nare <a href=\"https://scholar.google.com/scholar?hl=en&amp;as_sdt=0,5&amp;as_ylo=2021&amp;q=enzyme+reaction+classification+with+machine+learning\">published now yearly in similar efforts</a>.</p>\n\n<p>From our group, I think the impact of <a href=\"https://scholar.google.com/citations?view_op=list_works&amp;hl=en&amp;hl=en&amp;user=bJYJJVMAAAAJ\">Ryan Miller</a>’s\n<em>Understanding signaling and metabolic paths using semantified and harmonized information about biological interactions</em>\n(doi:<a href=\"https://doi.org/10.1371/journal.pone.0263057\">10.1371/journal.pone.0263057</a>) is not fully appreciated yet. This\npaper describes and validates work by Ryan, Martina Kutmon, Anwesha Bohler, and Andra Waagmeester on modelling biological\ninteractions in a FAIR way. It builds on earlier work, like the <a href=\"https://rdf.wikipathways.org/\">WikiPathways RDF</a>\nwork by Andra (doi:<a href=\"https://doi.org/10.1371/JOURNAL.PCBI.1004989\">10.1371/journal.pcbi.1004989</a>),\nbut zooms in on the interactions and develops a method to assess the quality of the FAIR modelling\nof them. This provides us with a method to evaluate later analyses where these interactions are used.</p>\n\n<p>A second paper from our group which I expected to get more attention is a paper by <a href=\"https://scholar.google.com/citations?hl=en&amp;user=8ZmXyZcAAAAJ\">Ammar</a>\n(doi:<a href=\"https://doi.org/10.1186/s13321-023-00701-3\">10.1186/s13321-023-00701-3</a>) where he looked\ninto personalized binding affinities. 
That is, drugs may bind better to their targets for some\npeople than for others (and therefore work better for some people than for others), and his analysis\nsuggests the impact can be significant. We will learn in time.</p>",
      "summary": "I wish I could say I remember the first citation to one of my research articles. I do not. But I do remember the excitement to see why someone was citing my research. What I do remember is that I got a comment around the same time along the lines of this: “why would anyone cite your article if they can download the results for free?” (about open science cheminformatics research). Other times. Indeed, I found out there are many reasons why people are citing and not citing articles. The above is one of them (still happens too often). But that’s also an intrinsic property of the current publishing model: some papers get cited too much, others get cited too little.",
      "image": "https://chem-bla-ics.linkedchemistry.infoassets/images/cdk_citations.png",
      "date_published": "2025-01-26T00:00:00+00:00",
      "date_modified": "2025-01-26T00:00:00+00:00",
      "tags": ["publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/m6erd-7px95", "doi": "10.59350/m6erd-7px95"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI025584Y", "doi": "10.1021/CI025584Y"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-11-395", "doi": "10.1186/1471-2105-11-395"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKH081", "doi": "10.1093/NAR/GKH081"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1002/anie.200503833", "doi": "10.1002/anie.200503833"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/2041-1480-1-S1-S6", "doi": "10.1186/2041-1480-1-S1-S6"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/JOURNAL.PONE.0263057", "doi": "10.1371/JOURNAL.PONE.0263057"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/JOURNAL.PCBI.1004989", "doi": "10.1371/JOURNAL.PCBI.1004989"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-023-00701-3", "doi": "10.1186/S13321-023-00701-3"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/cf885-kee54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/01/19/blog-updates.html",
      "title": "Blog updates",
      "content_html": "<p><a href=\"https://doi.org/10.59350/nfqxs-qs982\">One-and-a-half years ago</a> I started migration my blog from blogger.com to a Markdown and Git-based blog.\nIt has been a fascinating journey that I do not regret. I love being back in control and not reliant on features of some\n<em>content management system</em>. I learned so much along the way, including <a href=\"https://jekyllrb.com/\">Jekyll</a> and <a href=\"https://jekyllrb.com/docs/liquid/\">Liquid</a>\nto start with, but also <a href=\"https://doi.org/10.59350/nfqxs-qs982\">Fontawesome</a> (for better or worse)m and <a href=\"https://doi.org/10.59350/8x2f1-h6d21\">Goatcounter</a>\nfor GDPR-compatible and privacy-first impact tracking.</p>\n\n<p>But I also greatly enjoy the interaction with the <a href=\"https://rogue-scholar.org/\">Rogue Scholar</a> team (particularly <a href=\"https://blog.front-matter.io/\">Martin Fenner</a>).\nFirst, it has great to be listed on (something like) a blog planet, and to read the collection of blog posts, of course! BTW, also thanks to\n<a href=\"https://larsgw.blogspot.com/\">Lars Willighagen</a> who joined Rogue Scholar earlier than I did. This interaction allowed me\nto take part in various innovations, like archiving and getting DOIs for blog posts, archiving entire blogs (see doi:<a href=\"https://doi.org/10.53731/3c6pm-xbp04\">10.53731/3c6pm-xbp04</a> and\ndoi:<a href=\"https://doi.org/10.59350/vjvdy-6p110\">10.59350/vjvdy-6p110</a>), <a href=\"https://doi.org/10.59350/er1mn-m5q69\">cite blog posts with DOIs</a>,\nreferences in blogs (e.g. 
see doi:<a href=\"https://doi.org/10.53731/m9d5v-xmr74\">10.53731/m9d5v-xmr74</a>),\n<a href=\"https://www.jsonfeed.org/\">JSON Feed</a> (see doi:<a href=\"https://doi.org/10.53731/d6vdvbt-tffmezj\">10.53731/d6vdvbt-tffmezj</a>;\n<a href=\"https://chem-bla-ics.linkedchemistry.info/feed.json\">last 10</a> or <a href=\"https://chem-bla-ics.linkedchemistry.info/archive.json\">full archive</a>),\n<a href=\"https://doi.org/10.59350/1cg8w-qth68\">ORCID support</a>,\nand if things goes well, <a href=\"https://doi.org/10.53731/m9d5v-xmr74\">preregistration of blogpost DOIs with commonmeta</a>.</p>\n\n<p>The JSON Feed is interesting. For example, it includes more specific support for references, something that any scholarly\nblogger should look at:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"err\">...</span><span class=\"w\">\n  </span><span class=\"nl\">\"_references\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\">\n    </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nl\">\"url\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"https://doi.org/10.7717/peerj-cs.214\"</span><span class=\"w\"> </span><span class=\"p\">},</span><span class=\"w\">\n    </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nl\">\"url\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"https://doi.org/10.5281/ZENODO.14562484\"</span><span class=\"w\"> </span><span class=\"p\">},</span><span class=\"w\">\n    </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nl\">\"url\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"https://doi.org/10.5281/ZENODO.14562504\"</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span 
class=\"p\">],</span><span class=\"w\">\n  </span><span class=\"err\">...</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>And the citations get propagated and show up like this in the Rogue Scholar archives:</p>\n\n<p><img src=\"/assets/images/rs_archives.png\" alt=\"\" /></p>\n\n<p>I think we also see the ongoing innovation in action. This is the first time I see the “Unknown title”,\nbut the title is obviously missing from the JSON too. One thing to remember here is that currently my blog does\nnot have this metadata, and when you read my blog, it is <a href=\"https://citation.js.org/\">citation.js</a>\n(doi:<a href=\"https://doi.org/10.7717/peerj-cs.214\">10.7717/peerj-cs.214</a>) that looks up the metadata using the DOI and adds\nthat to the blog post in your browser. Doing this when the HTML is being generated is something\nI still need to learn how to do.</p>",
      "summary": "One-and-a-half years ago I started migration my blog from blogger.com to a Markdown and Git-based blog. It has been a fascinating journey that I do not regret. I love being back in control and not reliant on features of some content management system. I learned so much along the way, including Jekyll and Liquid to start with, but also Fontawesome (for better or worse)m and Goatcounter for GDPR-compatible and privacy-first impact tracking.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/rs_archives.png",
      "date_published": "2025-01-19T00:00:00+00:00",
      "date_modified": "2025-01-19T00:00:00+00:00",
      "tags": ["blog"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.53731/m9d5v-xmr74", "doi": "10.53731/m9d5v-xmr74"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/d6vdvbt-tffmezj", "doi": "10.53731/d6vdvbt-tffmezj"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/3c6pm-xbp04", "doi": "10.53731/3c6pm-xbp04"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.7717/peerj-cs.214", "doi": "10.7717/peerj-cs.214"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fjbv7-53d20",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/01/05/sr24-results.html",
      "title": "Serious Request: the results",
      "content_html": "<p>The last week before the winter break <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/09/sr24.html\">Serious Request took place</a>.\nWe started <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">an action around WikiPathways</a> and\nwe collected 877 euro for <a href=\"https://nl.wikipedia.org/wiki/Stichting_Metakids\">the MetaKids Foundation</a>. In total there were 2612\nactions, many of which brought in a lot more. We ended up in position 928.</p>\n\n<p>But the money was only one part of our “donation” of the MetaKids goal to make 35 percent point more inherited metabolic\ndisorders treatable (which they currently are not), and to address the number one cause of death among Dutch kids.\nBecause our action focussed on getting more biology relevant to metabolic diseases into WikiPathays. For this we set\nup a <a href=\"https://sr24.wikipathways.org/\">WikiPathways SR24 community</a> page, along with a <a href=\"https://www.wikipathways.org/sr24-curation/index2.html\">curation page</a>\nshowing the results of automated curation alerts. Actually, in preparation of the Action, I updated that code\nbase to no longer have two states (succeeded, failed), but four states, depending on the percentage of tests failing\nfor that pathway. This has also been roled out to the <a href=\"https://www.wikipathways.org/\">main WikiPathways website</a>.</p>\n\n<p>In the weekend before our action, I wanted to test my <a href=\"skills\">PathVisio</a> and had a go at a pathway drawing\nfrom a book of which most pathways had already been digitized (see doi:<a href=\"https://doi.org/10.1007/978-3-030-67727-5_73\">10.1007/978-3-030-67727-5_73</a>),\nbut not this one. 
This resulted in a first pathway (wikipathways:<a href=\"https://wikipathways.org/instance/WP5504\">WP5504</a>),\nwhich was later that week greatly extended by <a href=\"https://scholar.google.com/citations?hl=en&amp;user=Le-4tuQAAAAJ\">Denise</a>.\nI also ported the table of chapters from this book to <a href=\"https://blau.wikipathways.org/\">the new WikiPathways community page for the book</a>.</p>\n\n<h2 id=\"a-list-of-genes\">A list of genes</h2>\n\n<p>From <a href=\"https://scholar.google.com/citations?user=6yvglHYAAAAJ&amp;hl=en\">Marek Noga</a> from our university medical center\nI received a pointer to a nice paper with a long list of diseases and matching genes (doi:<a href=\"https://doi.org/10.1002/jimd.12348\">10.1002/jimd.12348</a>)\nwhich provided a great starting point. I started out by making the data from the supplementary files more FAIR\nby <a href=\"https://social.edu.nl/@egonw/113661472648129803\">converting the data into RDF</a>.</p>\n\n<p>With SPARQL I compared the genes (via their HGNC symbols) with the content of WikiPathways:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"w\">      </span><span class=\"nn\">&lt;http://vocabularies.wikipathways.org/wp#&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\">      </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?gene</span><span class=\"w\"> </span><span class=\"nv\">?omim</span><span class=\"w\"> </span><span class=\"nv\">?geneLabel</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  
</span><span class=\"nv\">?gene</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">GeneProduct</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?geneLabel</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nv\">?gene</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">seeAlso</span><span class=\"w\"> </span><span class=\"nv\">?omimIRI</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?omimIRI</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"ss\">identifier</span><span class=\"w\"> </span><span class=\"nv\">?omim</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">contains</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?omimIRI</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"omim:\"</span><span class=\"p\">))</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"nv\">?gene</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">bdbHgncSymbol</span><span class=\"w\"> </span><span class=\"nv\">?hgnc</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span 
class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"k\">SERVICE</span><span class=\"w\"> </span><span class=\"nn\">&lt;https://sparql.wikipathways.org/sparql&gt;</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n      </span><span class=\"nv\">?wpGene</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">bdbHgncSymbol</span><span class=\"w\"> </span><span class=\"nv\">?hgnc</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"o\">!</span><span class=\"nb\">BOUND</span><span class=\"p\">(</span><span class=\"nv\">?wpGene</span><span class=\"p\">))</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">CONTAINS</span><span class=\"p\">(</span><span class=\"nv\">?geneLabel</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\" \"</span><span class=\"p\">))</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>This resulted in a <a href=\"https://docs.google.com/spreadsheets/d/1fWFKXVs9q172eHDpv4OLa0TcHuozTBweDe2_zOLJc-Q/edit?usp=sharing\">spreadsheet with more than 300 genes not in WikiPathways</a>.\nAn analysis by Karen Rothfels and Lisa Matthews showed that the number of genes not found in Reactome\nis only 129. 
Indeed, later analyses showed that Reactome has a few very relevant pathways missing in\nWikiPathways.</p>\n\n<h1 id=\"new-biological-pathways\">New biological pathways</h1>\n\n<p>To figure out which biology to add, it turns out the <a href=\"https://pfocr.wikipathways.org/\">Pathway Figure OCR</a> (doi:<a href=\"https://doi.org/10.1186/s13059-020-02181-2\">10.1186/s13059-020-02181-2</a>)\nand <a href=\"https://www.ndexbio.org/\">NDEX</a> (doi:<a href=\"https://doi.org/10.1093/bioinformatics/btad118\">10.1093/bioinformatics/btad118</a>) tools\nare very useful. They both allow passing a list of genes and return results (sets, pathways, models) relevant to\nthat list. NDEX includes the sets from Pathway Figure OCR, and each of those sets is a set of genes linked to a single\njournal article which included a pathway diagram. I used this on the list of 371 genes not in WikiPathways and the list\nof 129 genes not in Reactome, and identified five articles. It actually turns out that two\nbasically described the same biology and both are captured in the same new pathway\n(wikipathways:<a href=\"https://wikipathways.org/instance/WP5505\">WP5505</a>). 
This pathway includes a good number\nof PIG genes, handling the very specific metabolic conversion of a metabolite.</p>\n\n<h1 id=\"complex-chemistry\">Complex chemistry</h1>\n\n<p>That <a href=\"https://social.edu.nl/@egonw/113678723229529283\">metabolite is complex</a> and databases do not seem to have the structure yet, so I set out\ngenerating a SMILES:</p>\n\n<p><img src=\"/assets/images/b0fbb7ea135318b9.png\" alt=\"\" /></p>\n\n<p>I reported the final SMILES, but I am not happy with it yet, and actually spotted an error already:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>N[Prot]C(=O)NCCOP(=O)([O-])OC[C@@H]1[C@@H](O)[C@H]([R11])[C@H]([R10])[C@@H](O1)O[C@H]1[C@@H]([R8])[C@H](O)[C@@H](C[R9])O[C@H]1OC[C@@H]1[C@@H]([R7])[C@H]([R6])[C@H]([R5])[C@@H](O1)OC[C@@H]1[C@@H](O[M3])[C@H](O)[C@@H](N)[C@H](O1)O[C@@H]1[C@@H](O)[C@H](O)[C@@H](O)[C@@H]([R3])[C@H]1OP(=O)([O-])OC[C@H]([R1])C[R2]\n</code></pre></div></div>\n\n<p>So, for completeness and as backup, here are the fragment SMILES that you can copy/paste into <a href=\"https://www.simolecule.com/cdkdepict/depict.html\">CDK Depict</a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>N[Prot]C(=O)NCCOP(=O)([O-])OC[C@@H]1[C@@H](O)[C@H]([R11])[C@H]([R10])[C@@H](O1)O[C@H]1[C@@H]([R8])[C@H](O)[C@@H](C[R9])O[C@H]1OC[C@@H]1[C@@H]([R7])[C@H]([R6])[C@H]([R5])[C@@H](O1)OC[C@@H]1[C@@H](O[M3])[C@H](O)[C@@H](N)[C@H](O1)O[C@@H]1[C@@H](O)[C@H](O)[C@@H](O)[C@@H]([R3])[C@H]1OP(=O)([O-])OC[C@H]([R1])C[R2]\nN[Prot]C(=O)NCCOP(=O)([O-])O protein-linked ethanolamine phosphate (E0)\n[E0]OC[C@@H]1[C@@H](O)[C@H]([R11])[C@H]([R10])[C@@H](O1)O[M2] Manα1-2 (M1)\n[M1]O[C@H]1[C@@H]([R8])[C@H](O)[C@@H](C[R9])O[C@H]1O[M3] Manα1-6 (M2)\n[M2]OC[C@@H]1[C@@H]([R7])[C@H]([R6])[C@H]([R5])[C@@H](O1)O[G4] Manα1-4 (M3)\n[R4]C[C@@H]1[C@@H](O[M3])[C@H](O)[C@@H](N)[C@H](O1)O[S5] GlCNα1-6 
(G4)\n[G4]O[C@@H]1[C@@H](O)[C@H](O)[C@@H](O)[C@@H]([R3])[C@H]1OP(=O)([O-])OC[C@H]([R1])C[R2] phosphatidylinositol (S5)\n</code></pre></div></div>\n\n<h1 id=\"the-hackathon-day\">The hackathon day</h1>\n\n<p>On Thursday we had a hackathon day at our <a href=\"https://www.maastrichtuniversity.nl/research/translational-genomics\">Translational Genomics department</a>\n(UNS60 building). One of the Action organizers was still travelling back from Germany, but otherwise Tina, Denise, Daan, me, and Marek worked\non Thursday on various things. Tina worked on WP5505, Daan created his first pathways (wikipathways:<a href=\"https://wikipathways.org/instance/WP5507\">WP5507</a>),\nand so did Marek (wikipathways:<a href=\"https://wikipathways.org/instance/WP5506\">WP5506</a>).</p>\n\n<p>We now have 36 pathways on <a href=\"https://sr24.wikipathways.org/\">the community page</a>:</p>\n\n<p><img src=\"/assets/images/sr24_community_pathways.png\" alt=\"\" /></p>\n\n<p>After that hackathon, and to wrap up things, I finalized the update to the curation page, making the output\nlook better (more curation tests now output Markdown) and failing tests now almost all have an explanation page\nshowing how the affected pathway can be improved (to address the issue).</p>\n\n<p>Somewhere next week, the results of the pathways will be available from the <a href=\"https://sparql.wikipathways.org/\">WikiPathways SPARQL endpoint</a>\nand I can then calculate new numbers. 
The number of genes not in WikiPathways should be lower.</p>\n\n<p>Finally, besides these very specific results, we have also created a nice todo list:</p>\n\n<ul>\n  <li>plenty of curation on those 36 pathways remains to be done</li>\n  <li>we still have many genes of interest not in pathways, and we should start stubs in WikiPathways</li>\n  <li>we need a better overview of the mitochondrial biology</li>\n</ul>\n\n<p>And there are also still a few issues open:</p>\n\n<ul>\n  <li>I have a todo item to make a curation SPARQL query available via the automated testing (enhancement)</li>\n  <li>not all interactions end up in the RDF (bug)</li>\n</ul>\n\n<p>That bug actually has significant impact on downstream analyses, I guesstimate.</p>",
      "summary": "The last week before the winter break Serious Request took place. We started an action around WikiPathways and we collected 877 euro for the MetaKids Foundation. In total there were 2612 actions, many of which brought in a lot more. We ended up in position 928.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/b0fbb7ea135318b9.png",
      "date_published": "2025-01-05T00:00:00+00:00",
      "date_modified": "2025-01-17T00:00:00+00:00",
      "tags": ["sr24"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1002/jimd.12348", "doi": "10.1002/jimd.12348"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/978-3-030-67727-5_73", "doi": "10.1007/978-3-030-67727-5_73"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13059-020-02181-2", "doi": "10.1186/s13059-020-02181-2"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btad118", "doi": "10.1093/bioinformatics/btad118"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1cg8w-qth68",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/01/04/isaac-browser-extension.html",
      "title": "ISAAC Chrome Extension",
      "content_html": "<p>In 2022 I had my first experience with the <a href=\"https://isaac.nwo.nl/\">ISAAC database</a>\nby the Dutch <a href=\"https://www.nwo.nl/\">NWO</a> research funding organization. ISAAC is\nwhere you apply for funding and where grants get tracked. As such, research output\nis recorded in this database.</p>\n\n<p>The list of supported research output types in ISAAC is a bit dated, but includes\nscientific articles, books and monographs, book chapters, PhD theses, conference\nproceedings papers, professional publications, publications aimed at a broad audience,\npatents, contracts, and other. With <a href=\"https://recognitionrewards.nl/\">Recognition &amp; Rewards</a>\nin mind, this list should\nbe more diverse. And clearly missing are software and data, because these are\nalready supported by global unique identifiers and dedicated efforts for\nsoftware citations and data citations. FAIR has progressed a lot for these two\ntypes.</p>\n\n<p>When entering new research output to a project in ISAAC, you get asked to fill\nout various HTML forms. For articles, each author is a separate HTML form. All\nin all, quite a bit of work. But with unique identifiers and open APIs, it is a\nwaste of research funding to have to enter this all by hand. Some time earlier\nI heard of browser plugins in the USA that automagically filled out those forms,\nand realized I wanted that too.</p>\n\n<p>Fortunately, Lars Willighagen had done much of the work already with\n<a href=\"https://citation.js.org/\">citation.js</a> (doi:<a href=\"https://doi.org/10.7717/peerj-cs.214\">10.7717/peerj-cs.214</a>),\na JavaScript library that can convert formats like BibTeX into formatted references\n(with <a href=\"https://citationstyles.org/\">Citation Style Language</a> and\n<a href=\"https://citeproc-js.readthedocs.io/\">citeproc-js</a>), but also can support various\nidentifiers to fetch bibliographic metadata. 
All we needed was to integrate that.\nAnd so the <a href=\"https://chromewebstore.google.com/detail/ISAAC%20Chrome%20extension/kiljfbiapahlahhilgcgfkfjnkgggode?hl=en-GB&amp;authuser=0\">ISAAC Chrome extension</a>\nwas born. But the history, technology, and use have not been written up, while we\nhave a solid base of some 50 users who regularly use it. And one user <a href=\"https://chromewebstore.google.com/detail/isaac-chrome-extension/kiljfbiapahlahhilgcgfkfjnkgggode/reviews\">wrote</a>:</p>\n\n<blockquote>\n  <p>Works great. [..] this is the only way publications can reasonably be entered. (translated from Dutch)</p>\n</blockquote>\n\n<p>Actually, maybe we should rename the extension to <em>ISAAC Browser Extension</em>,\nbecause it also works in Brave and Edge.</p>\n\n<h2 id=\"2025-updates\">2025 updates</h2>\n\n<p>The last update was a while ago, and we got reports of some changes on the ISAAC\ndatabase side, and we could confirm at least one of the HTML form identifiers had\nchanged, so we fixed filling out the Open Access status of output. This is released\nas <a href=\"https://github.com/citation-js/isaac-chrome-extension/releases/tag/v1.5.0\">version 1.5.0</a>\n(doi:<a href=\"https://doi.org/10.5281/zenodo.14562484\">10.5281/zenodo.14562484</a>).</p>\n\n<p>Another change is that the ISAAC database now supports listing the <a href=\"https://orcid.org/\">ORCID</a>\nidentifier of authors, and this metadata is increasingly available from research\noutput metadata, and <a href=\"https://github.com/citation-js/isaac-chrome-extension/commit/8306809803ef93f448645fc4ca8c55d4c9bb7c6b\">a single line change</a> was enough for Lars to update the extension\nto automatically fill that out too. This is FAIR in action. 
This version is released\nas <a href=\"https://github.com/citation-js/isaac-chrome-extension/releases/tag/v1.6.0\">version 1.6.0</a>\n(doi:<a href=\"https://doi.org/10.5281/zenodo.14562504\">10.5281/zenodo.14562504</a>) and should\nbe available from the webstore soon.</p>\n\n<h2 id=\"how-it-works\">How it works</h2>\n\n<p>While the ISAAC database does not have an API, at least we found sufficient hooks\nin the HTML to get a reproducible workflow. The foundation of the browser extension\nis global unique identifiers, and it supports DOIs, ISBNs, and PubMed identifiers\nfor research output. For authors, it supports the ORCID. To fetch the metadata,\nit uses online resources Crossref, DataCite, and mEDRA, Google Books and OpenLibrary,\nPubMed and Unpaywall. The first three to fetch metadata for DOIs, the next two for\nISBN numbers, and PubMed for, well, PubMed identifiers. Based on the retrieved\nmetadata it determines which type of research output it should fill out the HTML\nfor.</p>\n\n<p><a href=\"https://unpaywall.org/\">Unpaywall</a> is used to see if the output is, for example,\npublished in a purely open access venue (like a CC-BY-only journal like <a href=\"https://elifesciences.org/\">eLife</a>\nor Nature’s <a href=\"https://www.nature.com/sdata/\">Scientific Data</a>), or published in a\nhybrid journal. The ISAAC database does not have the option to drop a URL (which\ncould be automated with Unpaywall too), but does allow uploading documents into\ntheir database. This last is left to the user.</p>\n\n<h2 id=\"how-to-use-it\">How to use it</h2>\n\n<p>Users would install the browser extension and this would add an add-on icon to\nthe toolbar. The <img src=\"/assets/images/icon.svg\" width=\"16\" alt=\"Icon: Black serif font 'I' on a background of four colored squares: brown, gold, green and platinum\" /> icon shows the various colors of Open Access with an <code class=\"language-plaintext highlighter-rouge\">I</code>, for\nidentifier. 
The user would then log in to the ISAAC database, open their project\ngrant page, and navigate to the Product tab:</p>\n\n<p><img src=\"/assets/images/isaac2025_1.png\" alt=\"\" /></p>\n\n<p>To use the extension, the user would take the following steps.\nFirst, click the “Toevoegen” (“Add”) button, green-blue with white letters in the above\nscreenshot. This would give a page like this:</p>\n\n<p><img src=\"/assets/images/isaac2025_2.png\" alt=\"\" /></p>\n\n<p>Second, and optionally, click one of the types. The metadata retrieved by the extension\ncontains sufficient information to make the right guess, so that this step is optional.\nIf you find that the metadata is wrong and the wrong guess was made, in this step\nyou can first manually indicate the research output type.</p>\n\n<p>Third, on the page shown in the above screenshot (or, optionally, after indicating\nthe output type), click the ISAAC Chrome Extension icon in the browser toolbar\nto give a popup dialog:</p>\n\n<p><img src=\"/assets/images/isaac2025_3.png\" alt=\"\" /></p>\n\n<p>Fourth, select the identifier type (DOI, ISBN, or PubMed) and give the identifier\nitself, and then click Search. For example, for a DOI, it would look like this:</p>\n\n<p><img src=\"/assets/images/isaac2025_4.png\" alt=\"\" /></p>\n\n<p>Fifth, the plugin will then guide you through the ISAAC HTML forms, just like you\nwould do manually, but with the important difference that some forms show in a different\norder. 
But rest assured, it will not submit anything before your final approval.\nFor example, for a journal article I would immediately go to the HTML form for the\nfirst author, which, for a random article, could look like this:</p>\n\n<p><img src=\"/assets/images/isaac2025_5.png\" alt=\"\" /></p>\n\n<p>By clicking “Verder” (“Continue”), the browser extension allows you to add missing metadata\n(for example, the ORCID is not listed for this author in the Crossref metadata\nand the gender is not shared by the publisher), and sometimes you may find yourself\nneeding to correct metadata.</p>\n\n<p>Sixth, after going through all author pages, you will return to the main product\nform, which will look something like this (for a random paper):</p>\n\n<p><img src=\"/assets/images/isaac2025_6.png\" alt=\"\" /></p>\n\n<p>Here you can add the final missing information and upload additional files, like\na PDF of the article. In the above screenshot we find some required fields (red asterisk)\nmissing. In this case, the DOI referred to an article published as “as soon as\npublishable” and the page numbers and issue are simply not known yet. You can also\nsee the Unpaywall metadata in action here.</p>\n\n<p>Seventh, like before, the final submission of this new output is done manually.\nThe ISAAC Chrome Extension requires that manual step on purpose: you are in control.</p>",
      "summary": "In 2022 I had my first experience with the ISAAC database by the Dutch NWO research funding organization. ISAAC is where you apply for funding and where grants get tracked. As such, research output is recorded in this database.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/isaac2025_1.png",
      "date_published": "2025-01-04T00:00:00+00:00",
      "date_modified": "2026-03-29T00:00:00+00:00",
      "tags": ["javascript"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.7717/peerj-cs.214", "doi": "10.7717/peerj-cs.214"
            , "cito":
              
              
                [ 
                  "usesMethodIn"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.14562484", "doi": "10.5281/ZENODO.14562484"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.14562504", "doi": "10.5281/ZENODO.14562504"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [
        
          
            { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" },
          
        
          
            { "name": "Lars Willighagen", "url": "https://orcid.org/0000-0002-4751-4637" }
          
        
        ]
      
    },
    {
      "id": "https://doi.org/10.59350/er1mn-m5q69",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/12/30/fair-blog-to-blog-citations.html",
      "title": "FAIR blog-to-blog citations",
      "content_html": "<p><a href=\"https://chem-bla-ics.linkedchemistry.info/2021/08/28/scholarly-journals-should-use-archived.html\">Linkrot is real</a> and\n<a href=\"https://doi.org/10.59348/1z1p2-nn569\">digital preservation problematic</a>. One reason why I have\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html\">started migrating my blog</a>\nto a more robust platform. That first step gave me version control. This summer my blog was\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/07/21/rogue-scholar-and-more.html\">accepted to Rogue Scholar</a>.\nThat gave me DOIs. And an idea.</p>\n\n<p>Things are coming together, and while commercial publishers (SpringerNature, Elsevier, MDPI, Frontiers, etc)\nare focused on profit (“shareholder value”) instead of the community they serve, Open Science is providing\nworking, real-world, inexpensive, superior FAIR solutions for scientific dissemination. Maybe European\nuniversities are not convinced yet (see <a href=\"https://doi.org/10.59350/1nmwy-nhk20\">Björn’s post</a>), but it is\nhappening.</p>\n\n<p>Two things that are happening are <a href=\"https://openalex.org/\">OpenAlex</a> and <a href=\"https://opencitations.net/\">OpenCitations</a>.\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">CiTO adoption</a> not so much yet, but I am not giving up\nyet. Simply because Open Science doesn’t go away and everything can be picked up tomorrow. 
Each holiday\nI am picking up the Citation Typing Ontology and this holiday the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/04/02/open-science-retreat-2.html\">use of nanopublications for CiTO intent annotation</a>\nof April this year.</p>\n\n<p>Yesterday, I played with the nanopublication templates used by NanoDash, got to the triples of it, and\nended up using the web interface to create <a href=\"https://w3id.org/np/RA43F9EoOuzF0xoNUnCMNyFsfIqlsuWDdPHCnN0wCdCAw\">a derived template</a>\nfrom <a href=\"https://w3id.org/np/RAX_4tWTyjFpO6nz63s14ucuejd64t2mK3IBlkwZ7jjLo\">Tobias’ template from April</a>.\nWhat makes this nanopublication template special is that it uses <a href=\"https://github.com/SPAROntologies/cito\">the CiTO ontology</a>.</p>\n\n<p>The difference is that the original template used <code class=\"language-plaintext highlighter-rouge\">ScholarlyWork</code> as type for the citing resource,\nwhile the derivative uses <code class=\"language-plaintext highlighter-rouge\">CreativeWork</code> from the schema.org namespace, allowing things like this:</p>\n\n<ul>\n  <li>article to software release: <a href=\"https://w3id.org/np/RAzmTPPM7v5Ilgvo-3aFRRZgdD3ImaUB434NtGlfI0G90\">example nanopub</a></li>\n  <li>article to blog: <a href=\"https://w3id.org/np/RAaRH1WhRgirso3JiTUJJ0XcBaRyHI6G4OZPdWBoIf17U\">example nanopub</a></li>\n  <li>blog to article: <a href=\"https://w3id.org/np/RAXL9q3jakrpaDh8oyVaNS1Y7JowmZm4tx4WcdIFMmg8g\">example nanopub</a></li>\n  <li>blog to blog: <a href=\"https://w3id.org/np/RAJOwolZUwUxuvPEhMFiQYHywJdWMWTlt_gnXoUbUBaYY\">example nanopub</a></li>\n</ul>\n\n<p>The last three are possible because of the Rogue Scholar DOIs. Let’s continue with the fourth example,\nthe blog to blog citation. While an URL is a unique, global identifier, the digital preservation depends\non a lot of things. On the other hand, a DOI with the associated metadata is easier to preserve. 
For example,\nbecause it can be spread more easily than the digital object itself.</p>\n\n<p>So, when <a href=\"https://blog.front-matter.io/author/martin/\">Martin Fenner</a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html\">I</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/27/archiving_blogs.html\">started</a>\n<a href=\"https://doi.org/10.53731/3c6pm-xbp04\">archiving</a>\nthe <a href=\"https://depth-first.com/\">Depth-First blog of Rich Apodaca</a> to digitally preserve his blog, <!-- keep link -->\nit also automatically gave the blog posts DOIs. This makes the blog more FAIR, just like it does\nfor my blog. And being more FAIR, we can use the DOIs for other things too, like blog to blog\ncitations with CiTO intent annotation, as nanopublications.\n(Technically, any Springer Nature journal can do this, but they found reasons to not do it.)</p>\n\n<p>So, let’s take <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html\">this blog post</a>.\nI have today updated this to not use <code class=\"language-plaintext highlighter-rouge\">depth-first.com</code> URLs but, following Martin’s example, use the DOIs <!-- keep link -->\nfor those posts instead.</p>\n\n<p>And when I make a nanopublication out of this, I can add the citation intent, and then it looks like\n<a href=\"https://w3id.org/np/RAmETOQXyoS5dYeP8yhJscOrAIimf1RHFnzG2GtziqIQ8\">this</a>:</p>\n\n<p><img src=\"/assets/images/nanopub1.png\" alt=\"\" /></p>\n\n<p>For some reason, the DOIs do not show up as references as they do for this post for the DOIs of the\nposts of Martin Paul Eve, Björn Brembs, and Martin Fenner. 
It does for\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html\">this post citing Depth-First</a>.</p>\n\n<p>So, from now on, I will use DOIs when citing other blog posts, and I hope many other blogs will\nstart using Rogue Scholar or some other service to generate DOIs for single blog posts.\nI also have to figure out if I want to use DOIs to link to posts in my own blog.\nAnd hopefully, OpenCitations will soon accept citations provided by nanopublications.\nWith or without CiTO intent annotations, whichever comes first. Oh, and I cannot wait to see\nthe citations show up in <a href=\"https://www.altmetric.com/\">Altmetric.com</a> :)</p>\n\n<p>Let’s see where this is going.</p>",
      "summary": "Linkrot is real and digital preservation problematic. One reason why I have started migrating my blog to a more robust platform. That first step gave me version control. This summer my blog was accepted to Rogue Scholar. That gave me DOIs. And an idea.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nanopub1.png",
      "date_published": "2024-12-30T00:00:00+00:00",
      "date_modified": "2024-12-30T00:00:00+00:00",
      "tags": ["cito","blog","publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59348/1z1p2-nn569", "doi": "10.59348/1z1p2-nn569"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/1nmwy-nhk20", "doi": "10.59350/1nmwy-nhk20"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/3c6pm-xbp04", "doi": "10.53731/3c6pm-xbp04"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vjvdy-6p110",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/12/27/archiving_blogs.html",
      "title": "Archiving blogs",
      "content_html": "<p>Blogs come and go. Sometimes they move from one location to another. However, blogs have not been systematically\narchived, except perhaps through efforts like the Open Laboratory. Bora Zivkovic gave in 2012\n<a href=\"https://web.archive.org/web/20120713032329/http://blogs.scientificamerican.com/a-blog-around-the-clock/2012/07/10/science-blogs-definition-and-a-history/\">a good overview <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nto which Paul Raeburn <a href=\"https://ksj.mit.edu/tracker-archive/what-was-first-science-blog/\">replied</a>: <em>“If you weren’t\nblogging in the mid-2000s, when all the science bloggers knew and blogrolled each other, you’ve already missed the golden\nage.”</em> I think blogging is as strong as ever, but a lot of blogs have become more like columns in bigger media.\nArchiving of blogs had not been done systematically, though some posts made it into print, for example in\n<a href=\"https://web.archive.org/web/20120114030926/http://blogs.scientificamerican.com/network-central/2011/07/18/open-laboratory-2011-submissions-so-far/\">the Open Laboratory <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nseries. Some copies made it into libraries, e.g. <a href=\"https://search.worldcat.org/en/title/225554926\">2006</a>,\n<a href=\"https://search.worldcat.org/en/title/727023103\">2010</a>, and <a href=\"https://search.worldcat.org/en/title/797975793\">2012</a>.</p>\n\n<p>The two posts from <em>blogs.scientificamerican.com</em> in the first paragraph provide a good example of the problem:\nbitrot. 
The Internet Archive has always been useful for archiving webpages, and that includes blogs.\nI do not believe it has been used systematically either, but at least it helped recover the above two pages.</p>\n\n<p>So, when I discussed <a href=\"https://depth-first.com/\">the blog of Rich Apodaca</a> <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html\">earlier this month</a>,\nthe question came up if we could archive his blog. Besides his <a href=\"https://depth-first.com/\">personal coverage</a> of\nhis cancer, his blog also covers a good bit of open science cheminformatics of the 2000s and 2010s.</p>\n\n<h2 id=\"rogue-scholar\">Rogue Scholar</h2>\n\n<p>This is where <a href=\"https://rogue-scholar.org/\">Rogue Scholar</a> comes in. <a href=\"https://blog.front-matter.io/\">Martin Fenner</a>\ntook up my question and started archiving Rich’s blog, resulting in <a href=\"https://rogue-scholar.org/communities/rapodaca/records?q=&amp;l=list&amp;p=1&amp;s=10&amp;sort=newest\">this ‘community’</a>\ncollecting the blog posts. This is what an archive page for a single blog post looks like:</p>\n\n<p><img src=\"/assets/images/depth-first-on-rogue-scholar.png\" alt=\"\" /></p>\n\n<p>What this archive now has is a DOI for each blog post and archived metadata that will also propagate via DataCite, etc.\nIt does not have PDFs or other copies of the full blog posts yet. There are more than 900 blog posts to create\nPDFs for. Does anyone <a href=\"https://mastodon.social/@egonw/113725573843479243\">have an idea?</a></p>\n\n<p>I will post later this year about formally/semantically linking blogs citing other blogs using DOIs for blog\nposts, for example from Rogue Scholar. 
And probably throw in <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/04/02/open-science-retreat-2.html\">some use of the Citation Typing Ontology</a>.</p>\n\n<p>Anyway, I can recommend everyone to get their blogs listed on Rogue Scholar, for the DOIs and for the automatic\narchiving.</p>",
      "summary": "Blogs come and go. Sometimes they move from one location to another. However, blogs have not been systematically archived, except perhaps through efforts like the Open Laboratory. Bora Zivkovic gave in 2012 a good overview, to which Paul Raeburn replied: “If you weren’t blogging in the mid-2000s, when all the science bloggers knew and blogrolled each other, you’ve already missed the golden age.” I think blogging is as strong as ever, but a lot of blogs have become more like columns in bigger media. Archiving of blogs had not been done systematically, though some posts made it into print, for example in the Open Laboratory series. Some copies made it into libraries, e.g. 2006, 2010, and 2012.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/depth-first-on-rogue-scholar.png",
      "date_published": "2024-12-27T00:00:00+00:00",
      "date_modified": "2024-12-27T00:00:00+00:00",
      "tags": ["blog","openlab"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b76wv-bbn97",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/12/09/sr24.html",
      "title": "Serious Request: &quot;WikiPathways in actie voor MetaKids&quot;",
      "content_html": "<p><a href=\"https://sr24.wikipathways.org/\"><img src=\"/assets/images/sr24.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Screenshot of the 'WikiPathways in actie voor MetaKids' action page.\" /></a>\nEvery day a child is born with an <a href=\"https://imd.wikipathways.org/\">inherited metabolic disorder</a>, and many do not grow old.\n<a href=\"https://metakids.nl/\">MetaKids</a> is a Dutch foundation that collects money and raises awareness, and it is the charity selected\nthis year for the <a href=\"https://npo.nl/\">NPO</a> (Dutch national radio/tv) <a href=\"https://en.wikipedia.org/wiki/NPO_3FM\">3FM</a>\n<a href=\"https://www.npo3fm.nl/kominactie\">Serious Request</a>. This has become <a href=\"https://en.wikipedia.org/wiki/Serious_Request\">a Dutch tradition</a>.\nSerious Request will play music on the radio when people contribute to the fundraiser, and the more money, the\nmore often the music gets played.</p>\n\n<p>But besides this, Serious Request also encourages people to jump into action. And we have jumped into action.</p>\n\n<h2 id=\"what-we-will-do\">What we will do</h2>\n\n<p>In the week when the <a href=\"https://en.wikipedia.org/wiki/Disc_jockey\">DJ</a>s are locked up in their\n<a href=\"https://en.wikipedia.org/wiki/Serious_Request#/media/File:Serious_Request_2008_-_20.jpg\">glass house</a> in\n<a href=\"https://en.wikipedia.org/wiki/Zwolle\">Zwolle</a> just before Christmas, <a href=\"https://scholia.toolforge.org/author/Q56868311\">Dr Laura Steinbusch</a>,\n<a href=\"https://scholia.toolforge.org/author/Q27987764\">Martina Kutmon</a>, <a href=\"https://www.maastrichtuniversity.nl/d-van-beek\">Daan van Beek</a>,\nand I will work on making our open science <a href=\"https://wikipathways.org/\">WikiPathways</a> knowledgebase even better to support\nresearch into these disorders. 
Like we did for COVID-19/SARS-CoV-2 before (see doi:<a href=\"https://doi.org/10.1038/s41597-020-0477-8\">10.1038/s41597-020-0477-8</a>).\nGuided by experts, we will update existing maps (building on the awesome work\n<a href=\"https://doi.org/10.26481/dis.20240624ds\">by Denise Slenter in her PhD</a>) with recent literature, supported by\n<a href=\"https://www.wikipathways.org/sr24-curation/\">computer-assisted data curation</a>, and draw new maps where there\nare knowledge gaps.</p>\n\n<p>Read our full statement here: <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids</a></p>\n\n<p>Part of this will be a workshop day on Thursday 19th of December in Maastricht. Details about that will follow.</p>\n\n<p>In this way, we collect not only money to donate, but we also donate research.</p>\n\n<h2 id=\"how-to-donate\">How to donate</h2>\n\n<p>Well, obviously, it is a fund-raiser. So, please <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">donate here</a>.\nWe have at least one donation with PayPal (not a fan) from outside The Netherlands.</p>\n\n<p>We are currently at 405 euro of our 2500 euro goal. Please <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">help us get a bit closer to that goal</a>.</p>\n\n<h2 id=\"how-can-you-help\">How can you help</h2>\n\n<p>You can help us enormously by spreading the news of the “kom in actie” in your social network, and raising awareness\nfor the cause of MetaKids. 
For example, by sharing our action:</p>\n\n<ul>\n  <li>the action page: <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids</a></li>\n  <li>the “we are working on” and results page: <a href=\"https://www.wikipathways.org/communities/sr24.html\">https://www.wikipathways.org/communities/sr24.html</a></li>\n</ul>\n\n<p>Second, in good open science practice, we welcome you to join our “kom in actie”, and several others have\nalready indicated wanting to do so. There is plenty of work that can be done, and we are documenting\n<a href=\"https://github.com/orgs/wikipathways/projects/2/views/1\">our activity on a project board</a>. Any work that will make\nthe FAIR and open knowledge better or show its power will help. Some ideas of how the knowledge can be used\nare written up in <a href=\"https://link.springer.com/chapter/10.1007/978-3-030-67727-5_73\">this open access chapter</a> by\nDenise, Tina, and me.</p>",
      "summary": "Every day a child is born with an inherited metabolic disorder, and many do not grow old. MetaKids is a Dutch foundation that collects money and raises awareness, and it is the charity selected this year for the NPO (Dutch national radio/tv) 3FM Serious Request. This has become a Dutch tradition. Serious Request will play music on the radio when people contribute to the fundraiser, and the more money, the more often the music gets played.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sr24.png",
      "date_published": "2024-12-09T00:00:00+00:00",
      "date_modified": "2024-12-09T00:00:00+00:00",
      "tags": ["wikipathways","sr24"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/S41597-020-0477-8", "doi": "10.1038/S41597-020-0477-8"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.26481/DIS.20240624DS", "doi": "10.26481/DIS.20240624DS"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/978-3-030-67727-5_73", "doi": "10.1007/978-3-030-67727-5_73"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/myaw4-dtg76",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html",
      "title": "Richard L. Apodaca",
      "content_html": "<p><img src=\"/assets/images/depth_first.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Screenshot of the first Depth-First blog post\" />\nIf you are into openscience chemistry or chemistry blogging, then you probably heard of\n<a href=\"https://orcid.org/0000-0003-3855-9427\">Rich Apodaca</a>’s <a href=\"https://depth-first.com/\">Depth-First blog</a>. <!-- keep link -->\nRich <a href=\"https://doi.org/10.59350/xyp0f-9dt42\">started blogging in 2006 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> but this is not\nhow I discovered his work originally. I know that we at least already had contact in 2005,\nbecause that is when he wrote about an integration between his Octet library and the Chemistry Development Kit\nin the <a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/\">CDK News</a> (volume 2, issue 2),\n<em>CDKTools: The CDK-Octet Bridge</em>. In 2006 he <a href=\"https://doi.org/10.59350/esgte-mv539\">reviewed our use of the Open Journal System for CDK News <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>But I did find we have been blogging about our work a lot. <a href=\"https://www.google.com/search?q=site%3Achem-bla-ics.blogspot.com+rich\">Searching for Rich</a>\ngives false positives, but plenty of discussions of his work. At the same time, <a href=\"https://www.google.com/search?q=site:depth-first.com+egon\">my name shows up multiple times</a> <!-- keep link -->\nin Depth-First too. Looking back at our shared history, we find, for example, Rich has blogged a lot about using the\n<a href=\"https://doi.org/10.59350/50ebs-4zq55\">Chemistry Development Kit in Ruby <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Rich <a href=\"https://depth-first.com/articles/\">blogged about a lot of cheminformatics innovation</a>. 
For example, <!-- keep link -->\nin 2006 <a href=\"https://doi.org/10.59350/pz3p6-fv247\">he was working on multi-atom bonding <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nsuch as in ferrocene, something that is even today not routinely used in cheminformatics. I replied\nto that in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/30/modern-chemistry-in-cdk-beyond-two.html\">this post</a>.\nAnother thing he explored was embedding chemical graph notations in PNG images. In 2007 he\nwrote how to <a href=\"https://doi.org/10.59350/j026p-17z02\">Never Draw the Same Molecule Twice: Image Metadata for Cheminformatics <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThis was picked up by several others, including me with <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/24/jchempaint-too-png-embedded.html\">an implementation in JChemPaint</a>.</p>\n\n<p>Another tool that I really liked was <a href=\"https://web.archive.org/web/20101010030537/http://chempedia.com/\">his Chempedia <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nwhich collected “[f]ree chemical information resources created and reviewed by chemists”. One of the things it did\nwas link chemical names to chemical structures, e.g. for <a href=\"https://web.archive.org/web/20101031093610/http://chempedia.com/substances/0-4825-8876-0064\">this compound <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nAnd because of the open license I was able to generate <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html\">an RDF representation of Chempedia</a>.\nThis resulted perhaps in one of my first online SPARQL endpoints.</p>\n\n<p>One and a half years ago he was <a href=\"https://doi.org/10.59350/5ct28-aaj63\">confronted with health issues <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Rich\nblogged openly about the months after that. Rereading this post is still hard, having seen cancer in action\nwith my mother. 
It turned out to be cancer, <a href=\"https://doi.org/10.59350/g29jj-d3m35\">a brain tumor <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nJust this Thursday I attended a fascinating <sup>2</sup>H NMR presentation, showing how much better\nwe got at recognizing tumors, but Rich’s MRI was obvious. He blogged for months on\n<a href=\"https://doi.org/10.59350/mxqbw-ek659\">his plan</a>. Until <a href=\"https://doi.org/10.59350/6beed-gk067\">the end of May <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nthis year.</p>\n\n<p>Some weeks ago I received confirmation of our fear; he passed away. Richard L. Apodaca was\n<a href=\"https://search.lib.utexas.edu/discovery/fulldisplay?docid=alma991024143089706011&amp;context=L&amp;vid=01UTAU_INST:SEARCH&amp;lang=en&amp;search_scope=MyInst_and_CI&amp;adaptor=Local%20Search%20Engine&amp;tab=Everything&amp;query=any,contains,39207173&amp;sortby=rank\">born in 1968</a>,\ncompleted his PhD at the University of Texas at Austin in 1996 on <em>Studies in enantioselective catalysis:\n(1) a new class of chiral C₂-symmetric bisphenols; (2) Diorganotin dihalides</em> (wikidata:<a href=\"https://scholia.toolforge.org/work/Q131405461\">Q131405461</a>).\nRich published multiple papers in the field of medicinal chemistry (see <a href=\"https://scholia.toolforge.org/author/Q43837652\">his Scholia profile</a>),\nwas very active in open science and <a href=\"https://patents.google.com/?inventor=Richard+Apodaca\">held many patents</a>.\nHis latest work was about <em>Balsa: A Compact Line Notation Based on SMILES</em>\n(see doi:<a href=\"https://doi.org/10.26434/chemrxiv-2022-01ltp\">10.26434/chemrxiv-2022-01ltp</a>).</p>\n\n<p>The <a href=\"https://depth-first.com/\">Depth-First blog</a> has a CC-BY 2.0 license and perhaps <a href=\"https://rogue-scholar.org/\">Rogue Scholar</a> <!-- keep link -->\ncan archive it? It helps us remember Rich and his contributions to open science cheminformatics.</p>",
      "summary": "If you are into openscience chemistry or chemistry blogging, then you probably heard of Rich Apodaca’s Depth-First blog. Rich started blogging in 2006 but this is not how I discovered his work originally. I know that we at least already had contact in 2005, because that is when he wrote about an integration between his Octet library and the Chemistry Development Kit in the CDK News (volume 2, issue 2), CDKTools: The CDK-Octet Bridge. In 2006 he reviewed our use of the Open Journal System for CDK News .",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/depth_first.png",
      "date_published": "2024-12-08T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["openscience","cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.26434/chemrxiv-2022-01ltp", "doi": "10.26434/chemrxiv-2022-01ltp"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/xyp0f-9dt42", "doi": "10.59350/xyp0f-9dt42"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/esgte-mv539", "doi": "10.59350/esgte-mv539"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/50ebs-4zq55", "doi": "10.59350/50ebs-4zq55"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/pz3p6-fv247", "doi": "10.59350/pz3p6-fv247"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/j026p-17z02", "doi": "10.59350/j026p-17z02"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/5ct28-aaj63", "doi": "10.59350/5ct28-aaj63"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/g29jj-d3m35", "doi": "10.59350/g29jj-d3m35"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/6beed-gk067", "doi": "10.59350/6beed-gk067"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/mxqbw-ek659", "doi": "10.59350/mxqbw-ek659"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9mb5c-y3a10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/11/23/version-of-record.html",
      "title": "Version of record, and what Open Access must learn from Open Science",
      "content_html": "<p>Before we go into the learning bit, let’s just revisit what a <em>version of record</em> is. Wikipedia\n<a href=\"https://en.wikipedia.org/wiki/Version_of_record\">describes it</a> as\n“the fully copyedited, typeset and formatted copy of a manuscript as published” (with two references).\nBasically, in the whole scheme of research output, it is a <em>release</em>. It is a tagged version of the\noutput, allowing people to discuss that version specifically, so that we do not run into endless “oh, but\nI meant version <code class=\"language-plaintext highlighter-rouge\">manuscript_rewrite_V2_AE_MB_Fixed.docx</code>”. Really, publishing is not unique at all and\npublishers are doing it wrong.</p>\n\n<p>So, with “version of record” defined, why do we have only one in publishing?</p>\n\n<p>There is absolutely no reason not to have multiple versions of the same narrative, as long as they\nare clearly tagged. Open Science has been doing this for two decades, but publishers have been slacking.\nRetractions are updated versions of the same article, as are corrections, corrigenda, and errata.\nIt is not that publishers do not know how to do it. Hesitantly, they are accepting that preprints\nexist, but publishers tend to frame those as inferior versions. There was a paper earlier this month\nthat looked into how much the versions are really different, and when I find it again, I will add the link.</p>\n\n<p>Of course, money, control, and power likely have a role here. And historic reasons too, I guess. After all,\nwhen you have to print a journal issue and send it by horse carriage to universities around the\nworld, making updates is indeed not trivial.</p>\n\n<h2 id=\"twitter-or-x-or-mastodon-or-bluesky\">Twitter or X (or Mastodon or Bluesky)</h2>\n\n<p>But as an openscientist, I have the urge to keep research output relevant. We do this for data, we\ndo this for community standards, and we do this for research code. Routinely. 
Again, for decades.\nMust open access not learn from open science here?</p>\n\n<p>I <a href=\"https://mastodon.social/@egonw/113252951241453752\">asked that recently on Mastodon</a>.\nSpecifically, should the <em>Ten simple rules for getting started on Twitter as a scientist</em> article\n(doi:<a href=\"https://doi.org/10.1371/journal.pcbi.1007513\">10.1371/journal.pcbi.1007513</a>) not be updated?\nLooking at the number of scientific papers that discuss social media in scientific\ncommunication, ten tips sound to me to be timeless. And I was interested in why or why not the paper\nshould be updated. Content-wise, a trivial update would be to update the name to X, which is the\nname of what was formerly known as Twitter. But then, updating the paper to replace Twitter\nby Bluesky or Mastodon would not be that much work either.</p>\n\n<p>The discussion brought up various aspects of this question (and hereby thanks to all who joined the\ndiscussion!). Is it worth it? Is it legal? Should it be an update, or just a new paper? Who\nshould do it? Do scholars have some responsibility to keep their research relevant? If I string-replace\nTwitter with X, how do I make clear who the original authors are, and what my role is? How do we\nget PLOS to point to the new version (surely not as corrigendum)? I do not have the answers.</p>\n\n<p>But I do see differences between different types of research output and that\nmakes these questions an essential part of <a href=\"https://recognitionrewards.nl/\">Recognition and Rewards</a>.\nIf some types of output have different rules, then we do not give all scholars the same\nchance of recognition. Of course, this is the current situation, and just reflects that academia\nstill has much to do to adopt Open Science.</p>",
      "summary": "Before we go into the learning bit, let’s just revisit what a version of record is. Wikipedia describes it as “the fully copyedited, typeset and formatted copy of a manuscript as published” (with two references). Basically, in the whole scheme of research output, it is a release. It is a tagged version of the output, allowing people to discuss that version specifically, so that we do not run into endless “oh, but I meant version manuscript_rewrite_V2_AE_MB_Fixed.docx”. Really, publishing is not unique at all and publishers are doing it wrong.",
      
      "date_published": "2024-11-23T00:00:00+00:00",
      "date_modified": "2024-11-23T00:00:00+00:00",
      "tags": ["publishing","openaccess","openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1371/JOURNAL.PCBI.1007513", "doi": "10.1371/JOURNAL.PCBI.1007513"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/djm89-5nb39",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/11/17/sparql-examples.html",
      "title": "SPARQL examples: SIB model, software, and patches",
      "content_html": "<p><a href=\"https://akademienl.social/@jerven\">Jerven Bolleman</a> <em>et al.</em> recently <a href=\"https://arxiv.org/abs/2410.06010\">published a great preprint</a>\nabout how to use RDF to give SPARQL queries context by linking them (semantically) with metadata. The context includes\nkeywords, the SPARQL endpoint the query can be run against, and a human-oriented description of the query. A few groups\nhave at recent hackathons been working on using the combination of a SPARQL query and a human-oriented description\nto train large language models, including the group behind this paper. Given that SPARQL is a very small language, I can see\nthis may work well, and that it may support our <a href=\"https://vhp4safety.nl/\">VHP4Safety</a> and\n<a href=\"https://scholia.toolforge.org/\">Scholia</a> projects.</p>\n\n<p>But in addition to the data model for SPARQL as research output (see doi:<a href=\"https://doi.org/10.32388/ZNWI7T.2\">10.32388/ZNWI7T.2</a>),\nthe paper also introduces the <a href=\"https://github.com/sib-swiss/sparql-examples-utils\">sparql-example-utils</a> software that I was\nfirst introduced to at <a href=\"https://www.wikidata.org/wiki/Wikidata:Scholia/Events/Hackathon_October_2024\">the recent October Scholia hackathon</a>.</p>\n\n<p>But I have/had some features I would like to see added. The first is provenance. Who is the author/contributor of the SPARQL\nquery? Is there an open license for it, or is it perhaps public domain? How do I give attribution if I reuse the SPARQL query?\nThese things matter in a modern <a href=\"https://recognitionrewards.nl/\">recognition and rewards</a> world where there is room for\neveryone’s talent. A set of good SPARQL queries may be more valuable than a ten-page Jupyter notebook (and the other way\naround). 
So, I <a href=\"https://github.com/sib-swiss/sparql-examples-utils/pull/24\">started</a>\n<a href=\"https://github.com/sib-swiss/sparql-examples-utils/pull/25\">writing</a>\n<a href=\"https://github.com/sib-swiss/sparql-examples-utils/pull/26\">patches</a>. And I created\n<a href=\"https://github.com/BiGCAT-UM/sparql-examples-utils/releases/tag/v2.0.11-tgx-1\">a custom jar</a> so that I can see these\npatches in action in <a href=\"https://bigcat-um.github.io/sparql-examples/\">our growing list of SPARQL queries</a>\n(here <a href=\"https://bigcat-um.github.io/sparql-examples/examples/WikiPathways/002.html\">a WikiPathways query</a>):</p>\n\n<p><img src=\"/assets/images/sparql-examples-tgx.png\" alt=\"\" /></p>\n\n<p>I started collecting SPARQL queries for <a href=\"https://bigcat-um.github.io/sparql-examples/examples/ChEMBL/\">ChEMBL</a>,\n<a href=\"https://bigcat-um.github.io/sparql-examples/examples/WikiPathways/\">WikiPathways</a>, and\n<a href=\"https://bigcat-um.github.io/sparql-examples/examples/VHP4Safety/\">VHP4Safety</a>. These queries are often part\nof other interfaces but we can easily extract the original SPARQL from the Turtle files behind these pages.</p>",
      "summary": "Jerven Bolleman et al. recently published a great preprint about how to use RDF to give SPARQL queries context by linking them (semantically) with metadata. The context includes keywords, the SPARQL endpoint the query can be run against, and a human-oriented description of the query. A few groups have at recent hackathons been working on using the combination of a SPARQL query and a human-oriented description to train large language models, including the group behind this paper. Given that SPARQL is a very small language, I can see this may work well, and that it may support our VHP4Safety and Scholia projects.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sparql-examples-tgx.png",
      "date_published": "2024-11-17T00:00:00+00:00",
      "date_modified": "2025-03-11T00:00:00+00:00",
      "tags": ["sparql","wikipathways","vhp4safety","chembl","scholia"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.32388/ZNWI7T", "doi": "10.32388/ZNWI7T"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.48550/arXiv.2410.06010", "doi": "10.48550/arXiv.2410.06010"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Virtual Human Platform for Safety Assessment", "acronym" : "VHP4Safety", "uri" : "drc.filenumber:nwa129219272" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yxxp4-r5j24",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/11/10/mastodon-bridge-to-bluesky.html",
      "title": "Mastodon, RSS, BlueSky",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/bluesky.png\" width=\"200\" />\nThe x-odus continues and there is a wave of researchers moving from X to another walled garden called <a href=\"https://en.wikipedia.org/wiki/Bluesky\">Bluesky</a>.\nThis is good and bad. First, it is good that people are leaving X (imho) and it is good that they move to a platform that supports\nopen standards, the <a href=\"https://en.wikipedia.org/wiki/AT_Protocol\">AT Protocol</a>. But I am less sure about moving to another closed source\nplatform. I prefer <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/mastodon\">Mastodon</a>. You can follow Mastodon accounts with their\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/rss\">RSS</a> feeds and that gives BlueSky users the ability to follow me on social media.\nThis is important to me. I have a LinkedIn account too, but you can only follow me there if you have an account there too. To me,\nthat does not align with the Open Science ideals.</p>\n\n<p>But while you can follow my Mastodon accounts <a href=\"https://social.edu.nl/@egonw.rss\">with</a> <a href=\"https://mastodon.social/@egonw.rss\">RSS</a>\n(or just by checking the <a href=\"https://social.edu.nl/@egonw\">two</a> <a href=\"https://mastodon.social/@egonw\">webpages</a>), this is read-only access. That is,\nyou cannot reply. For that, you still need a Mastodon (or Fediverse) account too.</p>\n\n<p>But then there is <a href=\"https://fed.brid.gy/docs\">Bridgy Fed</a>. It <em>“is a decentralized social network bridge. It connects the fediverse,\nthe web, and Bluesky”</em>. I learned about this recently, and it seems to do what it promises. 
Using the AT Protocol, it allows me\nto follow and reply to BlueSky users (if they have enabled the bridge), and BlueSky users can interact with me.</p>\n\n<p>So, if you have BlueSky and want to follow one or both of my Mastodon accounts, check out:</p>\n\n<ul>\n  <li><a href=\"https://bsky.app/profile/egonw.social.edu.nl.ap.brid.gy\">@egonw.social.edu.nl.ap.brid.gy</a> (focused on my research)</li>\n  <li><a href=\"https://bsky.app/profile/egonw.mastodon.social.ap.brid.gy\">@egonw.mastodon.social.ap.brid.gy</a> (more general open science)</li>\n</ul>\n\n<p>But I can only follow them back if they have enabled the bridge too.</p>",
      "summary": "The x-odus continues and there is a wave of researchers moving from X to another walled-garden called Bluesky. This is good and bad. First, it is good that people are leaving X (imho) and it is good that they move to a platform that supports open standards, the AT Protocol. But I am less sure, about moving to another closed source platform. I prefer Mastodon. You can follow Mastodon accounts with their RSS feeds and that gives BlueSky users the ability to follow me on social media. This is important to me. I have a LinkedIn account too, but you can only follow me there if you have an account there too. To me, that does not align with the Open Science ideals.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bluesky.png",
      "date_published": "2024-11-10T00:00:00+00:00",
      "date_modified": "2024-11-10T00:00:00+00:00",
      "tags": ["mastodon"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/acrqt-9y217",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/10/29/suppdata-data-dataset-database.html",
      "title": "Additional files, data, datasets, databases, and published data",
      "content_html": "<p>Open Science doesn’t make publishing easier. That that’s all for the better: our research efforts are complex,\nso why should the publishing be. Sure, I am <strong>not</strong> talking about references formatting or moving the Methods\nsection to the right location, or some silly statement that all authors agree with the manuscript when you are\nthe only author.</p>\n\n<p>No, let’s talk about data. What should you publish? How, and when? And why would you do it in the first\nplace? This is not going to be a post about FAIR either, but instead about when to publish data as additional\nfiles (aka supplementary data), raw data, processed data, as a datasets, or even as a database. That’s a\nlot of types of data, and the differences matter at least for the effort you want to put in.</p>\n\n<p>First, things have changed. We produce a massive amount more data. In the past your data, or at least the\nprocessed data, would be part of your conference talk, your journal article, or your book (chapter).\nOpen Science has changed this: data should be easier to reuse. But that results in new questions; those\nas in the previous paragraph. So, let’s add some context.</p>\n\n<p>Data is very broad and includes digital knowledge. Data can be raw, and the exact numbers collected (e.g.\nby a apparatus) or created by researchers. Processed data is what you get when you process the raw data.\nFor example, raw data may be a FID graph in nucleic magnetic resonance, while processed data would be a\nplot showing intensities versus chemical shifts. Published data is then a list of peaks you put in your\nresults section to support your claim of chemical identity.</p>\n\n<p>A fourth type of data is metadata, and could here be the instrument on which the FID was measured, or\nthe solvent used, etc. This is where it gets complicated, because depending on the researcher who\nprocesses the data, metadata can actually be data itself. 
For example, when you study the chemical\nshift differences in different organic solvents.</p>\n\n<p>At a more social level, the <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/10/21/nasa-tops.html\">Open Science 101</a>\nuses the following categories: primary data as collected/recorded by the researcher, and\n“secondary data typically refers to data that is used by someone different from who collected or generated the data”.\nThis angle of data captures the collaboration aspects of open science, but says more about\nthe processors than the data, I think.</p>\n\n<h2 id=\"monitoring-open-data\">Monitoring Open Data</h2>\n\n<p>A central aspect of doing research is to disseminate the research. Traditionally, this has been\ndisseminating results, hoping they become facts. Increasingly, we realize that this process needs\nimprovement, particularly clearly studies, done, and communicated by the Open Science approaches.</p>\n\n<p>Complementary to this, there is recognition &amp; rewarding (R&amp;R) and the wish to use various kinds of monitoring to\nassess who should be rewarded (and who should be fired), and the monitor is the implementation\nof the recognition. So, how does this work for open data? We can count every open dataset, but\nif thrown on a big pile, that becomes a bad monitor for use in recognition and rewarding.</p>\n\n<p>One idea is to differentiate in what data we monitor. Just raw data? Or processed data?\nHow much intellectual effort went into collecting/recording the data? Should that\nbe part of the monitor and how do you even measure that? Lots of known unknowns here.</p>\n\n<p>But this should not inhibit us from telling the research narrative. 
And maybe we should\njust explore the possible narratives to learn how they may help us monitor work done,\nhow to recognize contributions to the scientific record, and how to use all that in R&amp;R.</p>\n\n<p>I here present some examples from my own research, just to start a narrative.</p>\n\n<h2 id=\"raw-data\">Raw data</h2>\n\n<p>Over the years I have collected and recorded quite a bit of raw data. First data collected in the lab\nand later mostly recorded. Even though I have been doing Open Science since the late nineties,\nI cannot say all my data has been archived well. Even less do I have a “publication list”\nof all my raw data. As an academic community, we have been focusing too much on the scholarly\narticle as the center of the research system (more on that later, because there is awesome\nresearch presented at the Dutch National Open Science Festival).</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.org/03/27/migrating-pka-data-from-drugmet-to.html\">pKa values</a> (not archived, no DOI)</li>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.7075214.v1\">NanoWiki 5</a> (archived, with DOI)</li>\n</ul>\n\n<h2 id=\"processed-data\">Processed data</h2>\n\n<p>As is defined in the <a href=\"https://commission.europa.eu/law/law-topic/data-protection/reform/what-constitutes-data-processing_en\">European laws around GDPR</a>,\nprocessing “includes the collection, recording, organisation, structuring, storage,\nadaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available,\nalignment or combination, restriction, erasure or destruction of [..] data”. As you can see, this is slightly\ndifferent from the first, but in light of protecting citizens, this broader definition makes sense.\nMy point here is that processing should be taken broadly. And data curation, which researchers\nroutinely do, is processing too. 
For any data scientist, this easily takes up 25% of the\ntotal time needed for any data analysis. One of the points of the FAIR principles is to keep\nthat number as low as possible, but that is not really the point here.</p>\n\n<p>When it comes to this kind of data, I like people to have ready access to the results\nof my curation. You will find a lot of processed data like this archived. Some examples of\ndata by me or to which I contributed:</p>\n\n<ul>\n  <li><a href=\"https://doi.org/10.5281/zenodo.13933046\">WikiPathways</a> (archived monthly, with DOIs)</li>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.681678\">ChemPedia RDF</a> (different format than original data, archived, with DOI)</li>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.26931712.v1\">BridgeDb Metabolite ID mapping database</a> (irregular releases, not every one is notable; archived, with DOI)</li>\n</ul>\n\n<p>The last one will look something like this:</p>\n\n<p><img src=\"/assets/images/figshare_bridgedb.png\" alt=\"\" /></p>\n\n<h2 id=\"published-data\">Published data</h2>\n\n<p>And then we have published data, which refers to data presented in a publication, like a journal\narticle. We know this as supplementary data or additional files. Several publishers, like\nBioMedCentral, submit these data automatically to a repository. For example, the\n<a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a> publishes all additional files under a CCZero license on Figshare.\nBut many of these support the narrative of the story, rather than the narrative of the\nresearch question. Of course, journals also have limited expectations of the format and\nmy personal impression is that these are not commonly FAIR. (Open Access is not Open Science.)</p>\n\n<p>Here are some examples of such datasets that I do not see as notable and do not expect\nto be monitored. 
These datasets are part of the journal article, and that narrative is\nalready monitored.</p>\n\n<ul>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.c.3696370_D1.v1\">MOESM1 of PubChemRDF: towards the semantic annotation of PubChem compound and substance databases</a> (Word document with data, with DOI)</li>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.c.3698536_D1.v1\">MOESM1 of XMetDB: an open access database for xenobiotic metabolism</a> (archived Structured Data file with chemical structures, with DOI)</li>\n</ul>\n\n<h2 id=\"databases\">Databases</h2>\n\n<p>And then we have databases provided as interactive websites. This allows other researchers\nto explore the data, before they start processing the data. These typically do not have a DOI themselves,\nthough data can be routinely archived as in the above WikiPathways example.</p>\n\n<p>Databases themselves, as research output, are much harder to archive. And to make them citable,\nresearchers publish journal articles with a narrative that describes the database. The following two\nare such database papers, where the article DOI is a proxy for the database:</p>\n\n<ul>\n  <li><a href=\"https://doi.org/10.1186/1758-2946-5-23\">The ChEMBL database as linked open data</a> (<a href=\"https://chemblmirror.rdf.bigcat-bioinformatics.org/\">online</a>, DOI via article)</li>\n  <li><a href=\"https://doi.org/10.1186/s13321-021-00573-5\">PSnpBind</a> (<a href=\"https://psnpbind.org/\">online</a>, DOI via article)</li>\n</ul>",
      "summary": "Open Science doesn’t make publishing easier. That that’s all for the better: our research efforts are complex, so why should the publishing be. Sure, I am not talking about references formatting or moving the Methods section to the right location, or some silly statement that all authors agree with the manuscript when you are the only author.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/figshare_bridgedb.png",
      "date_published": "2024-10-29T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["data"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.6084/M9.FIGSHARE.7075214.V1", "doi": "10.6084/M9.FIGSHARE.7075214.V1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.13933046", "doi": "10.5281/ZENODO.13933046"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.6084/M9.FIGSHARE.681678", "doi": "10.6084/M9.FIGSHARE.681678"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.6084/M9.FIGSHARE.26931712.V1", "doi": "10.6084/M9.FIGSHARE.26931712.V1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.6084/M9.FIGSHARE.C.3696370_D1.V1", "doi": "10.6084/M9.FIGSHARE.C.3696370_D1.V1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.6084/M9.FIGSHARE.C.3698536_D1.V1", "doi": "10.6084/M9.FIGSHARE.C.3698536_D1.V1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1758-2946-5-23", "doi": "10.1186/1758-2946-5-23"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-021-00573-5", "doi": "10.1186/S13321-021-00573-5"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mch14-dtx11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/10/24/vhp4safety.html",
      "title": "New paper: The Virtual Human Platform for Safety Assessment (VHP4Safety)",
      "content_html": "<p>I have <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/vhp4safety\">not posted a lot</a> about our <a href=\"https://vhp4safety.nl/\">Virtual Human Platform for Safety Assessment</a>\n(VHP4Safety) project yet. Actually, more generally I do not post frequently about the funded projects. This is likely that few of them are Open Science\nby contract and often they have some formal process in place to approve output. That makes open notebook science-style posting about these projects\nhard. One is restricted to previously cleared material.</p>\n\n<p>One such material is the new project paper about VHP4Safety, <em>The Virtual Human Platform for Safety Assessment (VHP4Safety) project: Next generation chemical\nsafety assessment based on human data</em> (doi:<a href=\"https://doi.org/10.14573/altex.2407211\">10.14573/altex.2407211</a>). It is a fun project to work in,\nambitious, and in a vibrant community making steps in open science. That means that a lot of what we is core science, but the science comes\nfrom many different disciplines, and it is as much natural sciences as it is humanities.</p>\n\n<p>So, we somewhere during the project we started organizing hackathons. Some of us had plenty of experience with that already, but these\nare hackathons from fields where this has not been as common, perhaps. But is has been fun, e.g. 
see\n<a href=\"https://www.sciencrew.com/c/9347/a/335221636?title=Advancing_AI_in_Toxicology_Insights_from_the_Third_VHP4Safety_H\">this write up of the third hackathon</a>.</p>\n\n<p>There is a lot more I should be writing about VHP4Safety, and I will try, but for now I will limit it to these pointers:</p>\n\n<ul>\n  <li>the main VHP4Safety website: <a href=\"https://vhp4safety.nl/\">https://vhp4safety.nl/</a></li>\n  <li>our documentation platform: <a href=\"https://docs.vhp4safety.nl/\">https://docs.vhp4safety.nl/</a></li>\n  <li>our catalogue of cloud services: <a href=\"https://cloud.vhp4safety.nl/\">https://cloud.vhp4safety.nl/</a></li>\n  <li>our common language: <a href=\"https://glossary.vhp4safety.nl/\">https://glossary.vhp4safety.nl/</a></li>\n</ul>\n\n<p>And we try to register our solutions as widely as possible, e.g. with national and ELIXIR indices:</p>\n\n<ul>\n  <li>our <a href=\"https://taxila.nl/content_providers/vhp4safety\">Taxila.nl section</a></li>\n  <li>our <a href=\"https://tess.elixir-europe.org/content_providers/vhp4safety\">ELIXIR TeSS section</a></li>\n</ul>",
      "summary": "I have not posted a lot about our Virtual Human Platform for Safety Assessment (VHP4Safety) project yet. Actually, more generally I do not post frequently about the funded projects. This is likely that few of them are Open Science by contract and often they have some formal process in place to approve output. That makes open notebook science-style posting about these projects hard. One is restricted to previously cleared material.",
      
      "date_published": "2024-10-24T00:00:00+00:00",
      "date_modified": "2025-03-11T00:00:00+00:00",
      "tags": ["vhp4safety"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.14573/ALTEX.2407211", "doi": "10.14573/ALTEX.2407211"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Virtual Human Platform for Safety Assessment", "acronym" : "VHP4Safety", "uri" : "drc.filenumber:nwa129219272" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/w7kzh-8y965",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/10/21/nasa-tops.html",
      "title": "NASA Transform to Open Science (TOPS) Open Science 101",
      "content_html": "<p>It was on my radar for some time already, but did not get around to finishing it. But I completed all\nfive modules of the <a href=\"https://openscience101.org/\">NASA Transform to Open Science (TOPS) Open Science 101</a>\n(doi:<a href=\"https://doi.org/10.5281/zenodo.10161527\">10.5281/zenodo.10161527</a>).\nThis Open Science 101 consists of several modules, starting with <em>The Ethos of Open Science</em>, via\n<em>Open Tools and Resources</em>, <em>Open Data</em>, and <em>Open Code</em>, to <em>Open Results</em>.</p>\n\n<p><img src=\"/assets/images/nasa_tops.png\" alt=\"\" /></p>\n\n<p>Now, since I have been practising aspects of science for almost 25 years, I have to admit I was nervous\ndoing this. That probably explains why it took me so long to do it. Just going through the material will\nprobably take 4-8 hours, but there was a lot to reflect on. They also link to many additional resources\nand cite a good bunch of scientific research.</p>\n\n<p>I also like to stress that I like the material very, very much. It is very well designed, covers a lot\nof aspects, and finds a great balance between depth and coverage. Sure, I had some comments here and\nthere, but it higjhlights very well what open science really is, what not, and how the open science\ncommunity is working on reaching the goals, which things work well, and which things need more work.</p>\n\n<p>The material itself is <a href=\"https://github.com/nasa/Transform-to-Open-Science\">open and available from GitHub</a>.</p>",
      "summary": "It was on my radar for some time already, but did not get around to finishing it. But I completed all five modules of the NASA Transform to Open Science (TOPS) Open Science 101 (doi:10.5281/zenodo.10161527). This Open Science 101 consists of several modules, starting with The Ethos of Open Science, via Open Tools and Resources, Open Data, and Open Code, to Open Results.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nasa_tops.png",
      "date_published": "2024-10-21T00:00:00+00:00",
      "date_modified": "2024-10-21T00:00:00+00:00",
      "tags": ["openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.10161527", "doi": "10.5281/zenodo.10161527"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3abda-n1j28",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/09/23/patents-and-impact.html",
      "title": "Patents, societal impact, and sustainability",
      "content_html": "<p>Division 1 of our <a href=\"https://www.maastrichtuniversity.nl/research/school-nutrition-and-translational-research-metabolism\">Institute of Nutrition and Translational Research in Metabolism</a>\n(NUTRIM) held a meeting last week which had a panel discussion on the use of\npatents to bring research to the market, aimed at PhD candidates of the institute.\nPatents are one of the routes to make research output more sustainable. For\nexample, the research output into a new method to study something or make something\noften needs the development into a product. For example, a new multivariate\nstatistics method may need a graphical user interface.\nAs such, the “development” after the research (think, R&amp;D) is often part of the\n<em>sustainability</em> of some research.</p>\n\n<p>Patents, trade secrets, and precompetitive collaboration are three methods that\nhave been used to make research output sustainable. Of course, in addition to\nthe fourth, which is simply the published journal article or book chapter.</p>\n\n<p>This led to the notion that PhD research, if it is to benefit (the Dutch) society,\nthen if needs to get used. There needs to be a market of users. This could be\nother scholars that use the method, use the data (see also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">Citation Typing Ontology</a>\nthat captures such reuse), or could be a product sold to other businesses or\neven a consumer market product.</p>\n\n<p>Filing a patent is often seen as research having societal impact. It captures\nthe notion that one or more people trust the impact enough to invest a considerable\namount of money. BTW, patents allow others to reuse your knowledge, to extend\nit, and to modify it. It is just that the patent limits how you use the results\nof that reuse commercially.</p>\n\n<p>But patents are interesting in another way. 
A mention of your research in a patent means that\nthe people that cited your work in their patent found your research valuable\nenough to list it as support of their patent. This is similar to getting cited\nin another journal article (or book (chapter)), but much closer to society.</p>\n\n<p>Therefore, if you are interested to learn which of the research you do, and the output\nof that research, has an impact on society, scanning patent literature for citations to\nyour work or the work of the research group you work in, can give surprising\nresults. Worst case, it gives you ideas of how the research may benefit society.</p>\n\n<h2 id=\"google-patents\">Google Patents</h2>\n\n<p>Nowadays, there are multiple patent search engines and sometimes they do a lot\nof text mining, e.g. to find patents that mention certain chemical structures.\nBut a general search engine like <a href=\"https://patents.google.com/\">Google Patents</a>\nwill already do you a great service. If you search here on terms related to\nyour research, or your last name, you can find results. If your research project\nhas a unique name, this will, of course, greatly simplify the search.</p>\n\n<p>For example, when I search for <a href=\"https://www.wikipathways.org/\">WikiPathways</a> (our biological WikiPathways\nknowledge graph), it finds <a href=\"https://patents.google.com/?q=(wikipathways)&amp;oq=wikipathways\">over 200 patents that mention it</a>.\nWikiPathways is an Open Science project and there is no patent on our approach,\nbut what this project has done turns out to be important enough for SMEs that\nthey base a patent on it. Of course, the role is often just supportive, just\nlike a journal article citation. 
This is what a results page may look like:</p>\n\n<p><img src=\"/assets/images/google_patent_wikipathways.png\" alt=\"\" /></p>\n\n<h2 id=\"citations-to-specific-article\">Citations to a specific article</h2>\n\n<p>There are also tools that make available text mining results that found which\narticles have been cited in which patent. <a href=\"https://altmetric.com/\">Altmetric.com</a>\nis one of them. For many articles (DOIs) they provide information on where that\narticle (DOI) is mentioned. And they provide a <a href=\"https://www.altmetric.com/about-us/our-data/donut-and-altmetric-attention-score/\">donut to visualize that\nattention</a>.\nOver time, the diversity of mentions they find has gone down, new\nmedia are not added frequently, and Mastodon is a big one missing, but patents\nare still one of the supported resources.</p>\n\n<p>For any DOI you can look up what data Altmetric.com has using this URL\npattern (the example is for the DOI <em>10.1039/D3DD00069A</em>):</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>https://altmetric.com/details/doi/10.1039/D3DD00069A\n</code></pre></div></div>\n\n<p>Maastricht University users can use our <a href=\"https://cris.maastrichtuniversity.nl/\">cris</a>\nwhich provides an HTML page listing all your articles (e.g.\n<a href=\"https://cris.maastrichtuniversity.nl/en/persons/egon-willighagen/publications/\">mine</a>)\nand each has an Altmetric.com donut, with an orange band for patents:</p>\n\n<p><img src=\"/assets/images/altmetrics_patents.png\" alt=\"\" /></p>\n\n<p>We can see here that this article is cited in three patents. You can click\nthe donut to find which patents those are. 
The <em>cris</em> overview page gives\na quick look at which articles (or research lines) are cited in patents.</p>\n\n<p>Also look out for the purple bands, which mark citations in policy documents,\nanother kind of societal impact.</p>\n\n<h2 id=\"potential\">Potential</h2>\n\n<p>For early career researchers with few articles and not a lot of time to\nget cited in patents (or policies), it can also be useful to look at articles\nthat your work is based on, e.g. those of your supervisor.</p>",
      "summary": "Division 1 of our Institute of Nutrition and Translational Research in Metabolism (NUTRIM) held a meeting last week which had a panel discussion on the use of patents to bring research to the market, aimed at PhD candidates of the institute. Patents are one of the routes to make research output more sustainable. For example, the research output into a new method to study something or make something often needs the development into a product. For example, a new multivariate statistics method may need a graphical user interface. As such, the “development” after the research (think, R&amp;D) is often part of the sustainability of some research.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/google_patent_wikipathways.png",
      "date_published": "2024-09-23T00:00:00+00:00",
      "date_modified": "2024-09-23T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7qe60-evp05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/09/16/publishing.html",
      "title": "Better Publishing",
      "content_html": "<p>If you read my blog, it should not surprise you that I have long experimented with technologies\nto improve knowledge dissemination, for example <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">in HTML</a>. And I have <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/publishing\">blogged about publishing</a>\nfrom an author and researcher, and editor perspective, for many years (see <a href=\"https://chem-bla-ics.blogspot.com/search?q=publishing\">this longer list\non my old blog</a>).\nAlso, in the <a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a>\nwe pushed for innovation, including <a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0365-4\">ORCID and GitHub adoption</a> and <a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00448-1\">Citation Typing Ontology adoption</a>.</p>\n\n<p>All of these depend on the publisher to support these efforts. But the big publishers are not good\nat this (see also doi:<a href=\"\">10.5281/zenodo.4926031</a>https://doi.org/10.5281/zenodo.4926031)\nand/or prefer to make 20-30% profit first.</p>\n\n<p>This opens room for innovative publishers. 
We have <a href=\"https://f1000research.com/\">F1000Research</a> pushing open peer review,\n<a href=\"https://pensoft.net/\">PenSoft</a> pushing a new editor,\n<a href=\"https://www.overleaf.com/\">Overleaf</a> bringing collaborative online editing of LaTeX,\nQeios experimenting with a <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/02/qeios-open-dissemination-platform-for.html\">wider range of output types</a>,\nand the <a href=\"https://joss.theoj.org/\">Journal of Open Source Software</a> (JOSS) pioneering\na more open platform for the whole editing process.</p>\n\n<p>And, of course, we have <a href=\"https://en.wikipedia.org/wiki/Diamond_open_access\">Diamond Open Access</a>\npublishers that do not get enough visibility, like <a href=\"https://scipost.org/\">SciPost</a>\nand <a href=\"https://www.beilstein-journals.org/\">Beilstein</a> for natural sciences and\nJOSS for open source.</p>\n\n<h2 id=\"open-journal-systems\">Open Journal Systems</h2>\n\n<p>And there is the <a href=\"https://pkp.sfu.ca/software/ojs/\">Open Journal Systems</a> (OJS), another\neditorial manager, one that has been around for some time now. We use OJS for the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cdknews\">CDK News newsletter</a>.\nBig news this week was that <a href=\"https://pkp.sfu.ca/2024/09/12/ojs-infrastructure-for-open-research-europe/\">OJS has been selected</a>\nas infrastructure to underpin the <a href=\"https://open-research-europe.ec.europa.eu/\">Open Research Europe</a> publishing platform,\nsomething running on F1000Research, <a href=\"https://en.wikipedia.org/wiki/F1000_(publisher)\">bought up by Taylor &amp; Francis in 2020</a>.</p>\n\n<p>I need to catch up with where OJS is technically. Do they support Markdown\nsubmissions? Do they export <a href=\"https://jats.nlm.nih.gov/\">JATS</a>? Do they support CiTO annotations? But this needs\neditors and journals to expect these things. 
Unfortunately, many journals have\na limited expectation of digital knowledge dissemination, and it’s still\nPDF galore.</p>\n\n<h2 id=\"better-publishing\">Better Publishing</h2>\n\n<p>This brings me to the following: should the Dutch universities continue to fund\nthe publisher business, stakeholder profit, or should we invest in open infrastructure\nto benefit our own core business: research and education? I think you understand\nwhat my position is on this. The current big deals we have with the big\npublishers are not really to our benefit and with the upcoming defunding\nwe have to use every euro carefully. And then I prefer to fund a young researcher\ninstead of publisher stakeholders.</p>\n\n<p>I hope you are willing to read the following petition asking the Dutch negotiators\nto very carefully consider what their priorities are and who they represent.\nYou can sign anonymously (if you fear backlash) and you can just read the details\nbehind this well-written petition: there are many references at the bottom to\nsupport the statements I make here, and more.</p>\n\n<p>But I really, really hope you wish for a better future for knowledge dissemination.\nJust think of your next Reviewer 2, that you pay the publisher to have Reviewer 2\nscold you, or the time spent on reference formatting, just because the publisher\nprefers profit over usability.</p>\n\n<p>Join and <a href=\"https://openscienceretreat.eu/call-to-commitment-future-proof-oa-publishing/\">sign</a>!</p>\n\n<p><a href=\"https://openscienceretreat.eu/call-to-commitment-future-proof-oa-publishing/\"><img src=\"/assets/images/0-768x768.jpg\" alt=\"\" /></a></p>",
      "summary": "If you read my blog, it should not surprise you that I have long experimented with technologies to improve knowledge dissemination, for example in HTML. And I have blogged about publishing from an author and researcher, and editor perspective, for many years (see this longer list on my old blog). Also, in the Journal of Cheminformatics we pushed for innovation, including ORCID and GitHub adoption and Citation Typing Ontology adoption.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/0-768x768.jpg",
      "date_published": "2024-09-16T00:00:00+00:00",
      "date_modified": "2024-09-16T00:00:00+00:00",
      "tags": ["publishing","openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/S13321-019-0365-4", "doi": "10.1186/S13321-019-0365-4"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-020-00448-1", "doi": "10.1186/S13321-020-00448-1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.4926030", "doi": "10.5281/ZENODO.4926030"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7hjzg-ngr66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/09/07/wikidata-citations.html",
      "title": "Adding citations between existing articles in Wikidata",
      "content_html": "<p>Scholarly articles provide context to the factualness of statements in <a href=\"https://wikidata.org/\">Wikidata</a>,\nsimilar to the <a href=\"https://en.wikipedia.org/wiki/Citation_needed\">[citation needed]</a> in <a href=\"https://en.wikipedia.org/wiki/\">Wikipedia</a>.\nAnd just like the cited references in each scholarly article itself. The citation network is general seen\nas an essential part of (doing) science, even without <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">citation intention annotation</a>.\nNowadays, citations are mostly open, but this took very serious lobbying by the <a href=\"https://i4oc.org/\">Initiative for Open Citations</a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2018/11/17/join-me-in-encouraging-acs-to-join.html\">not every publisher reacted immediately</a>.\nBut now that they are open, projects like <a href=\"https://opencitations.net/\">OpenCitations</a> are making this citation\nnetwork FAIR.</p>\n\n<p>Therefore, when an article is cited as reference in Wikidata, I think that the articles (and other research output)\ncited in that article is part of the reference. After all, it is really hard to understand any article without the details\nin the cited articles. So, getting these citations between article into Wikidata deepens the knowledge captured\nby Wikidata. Of course, Wikidata is also one of the few places where we can capture the citation intentions at all.</p>\n\n<p>Adding these citations manually is cumbersome but <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/08/08/history-provenance-detail.html\">sometimes needed</a>\nas these citations are not open or not FAIR yet. 
Fortunately, in many cases we can automate the process, for\nwhich I wrote a <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/bioclipse\">Bacting</a>-based\n<a href=\"https://github.com/egonw/ons-wikidata/blob/main/OpenCitations/quickstatements.groovy\">script</a>.\nUntil recently, the script took a single DOI or a list of DOIs as input, and for each DOI\nlooked up in OpenCitations whether it cites other article DOIs and is cited by other DOIs. For the\ncited and citing DOIs it checks if those are in Wikidata and (only) if they are in Wikidata,\nthen it creates QuickStatements. The result can look like <a href=\"https://www.wikidata.org/wiki/Q91911528#P2860\">this</a>:</p>\n\n<p><img src=\"/assets/images/opencitationsImport.png\" alt=\"\" /></p>\n\n<p>The script also needs an OpenCitations token, which you can <a href=\"https://opencitations.net/querying\">get here</a>.\nThis is how I run the code from the command line (with the token in the <code class=\"language-plaintext highlighter-rouge\">TOKEN</code> environment variable),\nfor a single DOI:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>groovy quickstatements.groovy <span class=\"nt\">-t</span> <span class=\"k\">${</span><span class=\"nv\">TOKEN</span><span class=\"k\">}</span> <span class=\"nt\">-d</span> 10.1002/JLAC.18721620110 | <span class=\"nb\">tee </span>output.qs\n</code></pre></div></div>\n\n<p>A list of DOIs is provided as a text file, with one DOI per line. 
I then use the <code class=\"language-plaintext highlighter-rouge\">-l</code> parameter\n(oh, here DOIs of works by <a href=\"https://en.wikipedia.org/wiki/Shyamala_Gopalan\">Shyamala Gopalan</a>, mother of\n<a href=\"https://en.wikipedia.org/wiki/Kamala_Harris\">Kamala Harris</a>):</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>groovy quickstatements.groovy <span class=\"nt\">-t</span> <span class=\"k\">${</span><span class=\"nv\">TOKEN</span><span class=\"k\">}</span> <span class=\"nt\">-l</span> harris_dois.txt | <span class=\"nb\">tee </span>output.qs\n</code></pre></div></div>\n\n<p>But last weekend I created a new feature. To enrich the profiles of authors, for example Nobel Prize\nwinners, mothers of, or <a href=\"https://scholia.toolforge.org/author/Q76784\">famous</a> <a href=\"https://scholia.toolforge.org/author/Q80956\">chemists</a>,\npreviously I would create a list of DOIs; now I have the script do that.</p>\n\n<p>So, today I can add the citation network for any arbitrary author, e.g. <a href=\"https://en.wikipedia.org/wiki/Carolyn_Bertozzi\">Carolyn Bertozzi</a>:\nI just pass the Wikidata QID:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>groovy quickstatements.groovy <span class=\"nt\">-t</span> <span class=\"k\">${</span><span class=\"nv\">TOKEN</span><span class=\"k\">}</span> <span class=\"nt\">-a</span> Q7442 | <span class=\"nb\">tee </span>output.qs\n</code></pre></div></div>\n\n<p>I can imagine that in the future the script will have more such options, to do the same\nfor many authors at some affiliation, or all DOIs for a certain journal.</p>",
      "summary": "Scholarly articles provide context to the factualness of statements in Wikidata, similar to the [citation needed] in Wikipedia. And just like the cited references in each scholarly article itself. The citation network is general seen as an essential part of (doing) science, even without citation intention annotation. Nowadays, citations are mostly open, but this took very serious lobbying by the Initiative for Open Citations and not every publisher reacted immediately. But now that they are open, projects like OpenCitations are making this citation network FAIR.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/opencitationsImport.png",
      "date_published": "2024-09-07T00:00:00+00:00",
      "date_modified": "2024-09-07T00:00:00+00:00",
      "tags": ["wikidata","bioclipse","opencitations"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1002/JLAC.18721620110", "doi": "10.1002/JLAC.18721620110"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/epanj-4t315",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/08/23/scholia.html",
      "title": "Scholia configurability",
      "content_html": "<p><a href=\"https://scholia.toolforge.org/\">Scholia</a> is a visual layer on top of <a href=\"https://wikidata.org/\">Wikidata</a> providing\na rich user experience for browing scholarly research related knowledge. I am using the combinatie\nfor various things, including exploring new research topics (a method, compound, or protein I do not know so much\nabout yet), indexing notable research output (including citations), <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">progress of Citation Typing Ontology\nuptake</a>, etc. This weekend I hope to send around the\nfinal draft for the <em>Scholia Chemistry</em> paper.</p>\n\n<p>Scholia has received a fair share of scholarly and social attention. The Scholia paper has been cited\n<a href=\"https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C5&amp;q=scholia+wikidata&amp;btnG=&amp;oq=scholia\">over 100 times</a> and\nthe websites received about 200 thousand page views each day (though we do not know how to get Toolforge\nto give us sufficient insight into the how and what of that count). There is a Wikipedia template to link\nto Scholia and some of projects I am involved in link Scholia for articles, such as\n<a href=\"https://wikipathways.org/\">WikiPathways</a>.</p>\n\n<p>With that, there is also interest in using it for other Wikibases and perhaps even random SPARQL endpoints.\nThese things are not trivial, as Scholia uses complementary APIs, various URL patterns for some of the\nfunctionality, and generally, all SPARQL queries are tweaked to the Wikidata Blazegraph SPARQL endpoint\nto ensure results are returned in reasonable time. But that last requires use of Blazegraph extensions\nto the SPARQL standard.</p>\n\n<p>All this requires Scholia to become more independent, in a better model-view-controller model. And that\nactually turns out very important at this moment. That is, Wikidata is not a RDF-first database, but\na Wikibase-based store. 
Whenever an edit is made, RDF is generated and the SPARQL endpoint is updated.\nNow, the number of edits in Wikidata is enormous and the fact that the SPARQL endpoint is often at most minutes\nbehind is a huge accomplishment. But the Blazegraph platform cannot keep up with Wikidata.\nBlazegraph is open source, but has been bought up and development stopped from one day to the next.</p>\n\n<p>Therefore, a split of the Wikidata SPARQL platform is <a href=\"https://phabricator.wikimedia.org/T337013\">planned</a>.\nThis split will put one part of\nthe knowledge in one endpoint and the other part in the other. Any query that needs information\nfrom both graphs will have to do a federated SPARQL query. Basically, there are very few Scholia\nqueries that do not need rewriting. My first rewrite actually failed, because the rewriting is not\nobvious and the rewritten query quickly times out. To some extent, this is because now lots of results of subqueries\nneed to be sent over the network from one endpoint to the other. When the combined query basically\ncovers half of each endpoint, that’s a lot of network traffic.</p>\n\n<p>An immediate use case of the configuration is therefore running Scholia against the current three\nendpoints: the current official endpoint, and the two split endpoints under development. 
With\n<a href=\"https://github.com/WDscholia/scholia/pull/2515\">a recent patch</a> <a href=\"@fnielsen@expressional.social\">Finn</a>\nand I worked on, this configuration looks like this (saved as <code class=\"language-plaintext highlighter-rouge\">scholia.ini</code>):</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[query-server]\n# Wikidata:\n#sparql_endpoint = https://query.wikidata.org/sparql\n#sparql_editurl = https://query.wikidata.org/#\n#sparql_embedurl = https://query.wikidata.org/embed.html#\n\n# Wikidata Split Main\nsparql_endpoint = https://query-main.wikidata.org/sparql\nsparql_editurl = https://query-main.wikidata.org/#\nsparql_embedurl = https://query-main.wikidata.org/embed.html#\n\n# Wikidata Split Scholar\n#sparql_endpoint = https://query-scholarly.wikidata.org/sparql\n#sparql_editurl = https://query-scholarly.wikidata.org/#\n#sparql_embedurl = https://query-scholarly.wikidata.org/embed.html#\n</code></pre></div></div>\n\n<p>So, right now, we can test the impact of the split with Scholia and this patch.\nWe would fire up a local instance of Scholia, running against one of the\nsplit endpoints, and use the Toolforge instance as baseline.</p>\n\n<p>Now, on my system I need to use <a href=\"https://python.land/virtual-environments/virtualenv\">Python virtualenv</a>,\nso I first start a Scholia <code class=\"language-plaintext highlighter-rouge\">venv</code>:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">source</span> ~/.venvs/scholia/bin/activate\n</code></pre></div></div>\n\n<p>After that, I can select another endpoint, e.g. 
the <code class=\"language-plaintext highlighter-rouge\">main</code> Wikidata split endpoint (<code class=\"language-plaintext highlighter-rouge\">query-main-experimental.wikidata.org</code>),\nwere it not that they are <a href=\"https://phabricator.wikimedia.org/T371833\">currently offline</a> as part of the transition,\nand run Scholia on a unique port:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>scholia run\n</code></pre></div></div>\n\n<p>Then I can have two browser windows side by side and compare Scholia pages on the current\nScholia instance with those from an instance running against another SPARQL endpoint. For now, I can test how well\nScholia runs on the <a href=\"https://qlever.cs.uni-freiburg.de/wikidata\">QLever instance of Wikidata</a> (superfast, with\ndata updated once a week). Here the configuration I have is not entirely complete, and many\nSPARQL queries do not work against QLever, including anything with graphical depiction. But\nthat said, I can use this configuration:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[query-server]\n# QLever\n#sparql_endpoint = https://qlever.cs.uni-freiburg.de/api/wikidata\n#sparql_editurl = https://qlever.cs.uni-freiburg.de/wikidata/?query=\n#sparql_embedurl = \n</code></pre></div></div>\n\n<p>Then, I can compare, for example, the chemicals statistics of the main Scholia with those of one running\nagainst QLever:</p>\n\n<p><img src=\"/assets/images/scholia_comparison.png\" alt=\"\" /></p>\n\n<p>This query ran without modification. For other queries rewriting is needed, but with this\nsetup we can at least quickly see the differences in the results.</p>",
      "summary": "Scholia is a visual layer on top of Wikidata providing a rich user experience for browing scholarly research related knowledge. I am using the combinatie for various things, including exploring new research topics (a method, compound, or protein I do not know so much about yet), indexing notable research output (including citations), progress of Citation Typing Ontology uptake, etc. This weekend I hope to send around the final draft for the Scholia Chemistry paper.",
      
      "date_published": "2024-08-23T00:00:00+00:00",
      "date_modified": "2024-09-05T00:00:00+00:00",
      "tags": ["scholia","wikidata","sparql"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1007/978-3-319-70407-4_36", "doi": "10.1007/978-3-319-70407-4_36"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7vhj4-ae665",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/08/15/kasabi-archives.html",
      "title": "Kasabi archive at the Internet Archive",
      "content_html": "<p><a href=\"https://www.wikidata.org/wiki/Q128214915\">Kasabi</a> was an innovative RDF publishing platform from around 2011.\n<a href=\"https://web.archive.org/web/20130907095112/http://blog.kasabi.com/about/\">Shortlived</a>, and maybe just too early.\nI published two open datasets there. One was ChEMBL-RDF (see these <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/chembl\">posts</a>).\nThe second was a small data sets called <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/07/06/chempedia-rdf-2-kasabi.html\">ChemPedia</a>,\na open science effort to crowdsource chemical names. This is still very much needed, and possibly Wikidata could fill that gap,\nbut it would first need to be able to handle all labels as statements itself.</p>\n\n<p>Anyway, just before they shutdown because of, I understood, lack of commercial interest, they\n<a href=\"https://archive.org/details/kasabi\">archived all data</a>, including the ChemPedia datasets. I was happy to be reminded about that,\nbecause I am not sure I had archived that data.</p>",
      "summary": "Kasabi was an innovative RDF publishing platform from around 2011. Shortlived, and maybe just too early. I published two open datasets there. One was ChEMBL-RDF (see these posts). The second was a small data sets called ChemPedia, a open science effort to crowdsource chemical names. This is still very much needed, and possibly Wikidata could fill that gap, but it would first need to be able to handle all labels as statements itself.",
      
      "date_published": "2024-08-15T00:00:00+00:00",
      "date_modified": "2024-08-15T00:00:00+00:00",
      "tags": ["semweb","chembl","kasabi","ia","chempedia"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y9chc-zb166",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/08/11/scholarly-discussions.html",
      "title": "Scholarly discussions through the eyes of CiTO (and Wikidata)",
      "content_html": "<p>Diabetes was already discussed in literature back in 1838-1839 (doi:<a href=\"https://doi.org/10.1016/S0140-6736(02)96038-1\">10.1016/S0140-6736(02)96038-1</a>,\ndoi:<a href=\"10.1016/S0140-6736(02)96066-6\">10.1016/S0140-6736(02)96066-6</a>, and doi:<a href=\"https://doi.org/10.1016/S0140-6736(02)83966-6\">10.1016/S0140-6736(02)83966-6</a>).\nThese three papers show a short discussion. Papers were a lot shorter back in the days, and the discussion actually shows why papers are longer now\n(tho I am not sure they really got sufficiently more reproducible, but that’s another discussion).</p>\n\n<p>Traditional citation counts do not make this discussion obvious, but if we make our publishing sufficiently FAIR (it’s far from that, right now),\nthen we can get a step closer. For example, with the <a href=\"https://purl.org/spar/cito\">Citation Typing Ontology</a>\nwe can show how the papers relate to each other:</p>\n\n<p><img src=\"/assets/images/clannyNetwork.png\" alt=\"\" /></p>\n\n<p>This network is based on public knowledge in <a href=\"https://wikidata.org/\">Wikidata</a> and actually can be easily reproduced by anyone\nwith <a href=\"https://w.wiki/AtV9\">this query</a>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">#defaultView:Graph</span><span class=\"w\">\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?focus1</span><span class=\"w\"> </span><span class=\"nv\">?focus1Label</span><span class=\"w\"> </span><span class=\"nv\">?focus2</span><span class=\"w\"> </span><span class=\"nv\">?focus2Label</span><span class=\"w\"> </span><span class=\"nv\">?edgeLabel</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"k\">VALUES</span><span class=\"w\"> 
</span><span class=\"nv\">?focus1</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174475</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174776</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174815</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">VALUES</span><span class=\"w\"> </span><span class=\"nv\">?focus2</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174475</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174776</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174815</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"nv\">?focus1</span><span class=\"w\"> </span><span class=\"nn\">p</span><span class=\"o\">:</span><span class=\"ss\">P2860</span><span class=\"w\"> </span><span class=\"nv\">?citation</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?citation</span><span class=\"w\"> </span><span class=\"nn\">ps</span><span class=\"o\">:</span><span class=\"ss\">P2860</span><span class=\"w\"> </span><span class=\"nv\">?focus2</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"nn\">pq</span><span class=\"o\">:</span><span class=\"ss\">P3712</span><span class=\"w\"> </span><span class=\"nv\">?edge</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?edge</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span 
class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?edgeLabel</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"k\">FILTER</span><span class=\"p\">(</span><span class=\"nb\">LANG</span><span class=\"p\">(</span><span class=\"nv\">?edgeLabel</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"p\">=</span><span class=\"w\"> </span><span class=\"s2\">\"en\"</span><span class=\"p\">)</span><span class=\"w\">\n  </span><span class=\"k\">SERVICE</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nn\">bd</span><span class=\"o\">:</span><span class=\"ss\">serviceParam</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">language</span><span class=\"w\"> </span><span class=\"s2\">\"[AUTO_LANGUAGE],mul,en\"</span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>The two “focus” values are an identical list of the articles I want to see. To make sure to get citations between all of them,\nI have to give them twice.</p>\n\n<p>In the above example I have used <code class=\"language-plaintext highlighter-rouge\">VALUES</code> for this, but I can also generate the controlled list of items between the citations\nI want to visualize with any SPARQL fragment too. 
<a href=\"https://edu.nl/y38rg\">This query</a> does that (or here as\n<a href=\"https://gist.github.com/egonw/b5fb7ae550c1597ff247f70cee8063c8\">GitHub Gist</a>), but it does something else too: it uses a trick I learned\nfrom <a href=\"https://scholia.toolforge.org/author/Q20980928\">Finn Nielsen</a>, from <a href=\"https://github.com/WDscholia/scholia/commit/d34dee85bc12575e0f1891c4e663ef8e2c450083\">this patch</a>\nfrom the <a href=\"https://scholia.toolforge.org/\">Scholia</a> project (doi:<a href=\"https://doi.org/10.1007/978-3-319-70407-4_36\">10.1007/978-3-319-70407-4_36</a>).</p>\n\n<p>Here, I select the articles by replacing the above <code class=\"language-plaintext highlighter-rouge\">VALUES</code> lines with this fragment (<code class=\"language-plaintext highlighter-rouge\">P50</code> is ‘author’ and <code class=\"language-plaintext highlighter-rouge\">Q20895241</code> is me in Wikidata):</p>\n\n<pre><code class=\"language-SPARQL\">  ?focus1 wdt:P50 wd:Q20895241 .\n  ?focus2 wdt:P50 wd:Q20895241 .\n</code></pre>\n\n<p>And, to be honest, then I get this network which is much richer than I expected:</p>\n\n<p><img src=\"/assets/images/willighagen_cito.png\" alt=\"\" /></p>\n\n<p>I wonder how far we can push this. Can we also do this for the <a href=\"https://scholia.toolforge.org/venue/Q6294930\">Journal of Cheminformatics</a>?\nAfter all, this journal had a <a href=\"https://www.biomedcentral.com/collections/cito\">CiTO Pilot</a> and, indeed,\n<a href=\"https://edu.nl/hk8xy\">the results do not disappoint</a>! All I had to do was replace the focus section:</p>\n\n<pre><code class=\"language-SPARQL\">  ?focus1 wdt:P1433 wd:Q6294930 .\n  ?focus2 wdt:P1433 wd:Q6294930 .\n</code></pre>",
      "summary": "Diabetes was already discussed in literature back in 1838-1839 (doi:10.1016/S0140-6736(02)96038-1, doi:10.1016/S0140-6736(02)96066-6, and doi:10.1016/S0140-6736(02)83966-6). These three papers show a short discussion. Papers were a lot shorter back in the days, and the discussion actually shows why papers are longer now (tho I am not sure they really got sufficiently more reproducible, but that’s another discussion).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/clannyNetwork.png",
      "date_published": "2024-08-11T00:00:00+00:00",
      "date_modified": "2024-08-11T00:00:00+00:00",
      "tags": ["cito","wikidata"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/S0140-6736(02)96038-1", "doi": "10.1016/S0140-6736(02)96038-1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/S0140-6736(02)96066-6", "doi": "10.1016/S0140-6736(02)96066-6"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/S0140-6736(02)83966-6", "doi": "10.1016/S0140-6736(02)83966-6"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/978-3-319-70407-4_36", "doi": "10.1007/978-3-319-70407-4_36"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8c1e7-8yp77",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/08/07/cito-updates.html",
      "title": "CiTO updates: Wakefield and WikiPathways",
      "content_html": "<p>This summer I am trying to finish up some smaller projects that I did not have time for to finish, with\nmixed successes. I am combing this with a nice Dutch staycation, and I already cycled in\n<a href=\"https://en.wikipedia.org/wiki/Overijssel\">Overijssel</a> and in south-west <a href=\"https://en.wikipedia.org/wiki/Friesland\">Friesland</a>\nand learning about their histories.\nBut this post is about an update on my Citation Typing Ontology use cases. And I have to say,\na <a href=\"https://www.youtube.com/watch?v=1kD7jkyDr3s\">mention by Silvio Peroni</a> is pretty awesome, thanks!</p>\n\n<p>First, the bad news. I still did not get around to the following to tasks I have. First, I need to write up a\nstep-by-step guide how to create <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/04/02/open-science-retreat-2.html\">CiTO nanopublications</a>\nand matching draft article. Second, I still need to work out how to update the JATS workflow for\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2021/11/15/biohackathon-europe-2021-1-cito.html\">CiTO annotation in BioHackrXiv</a>.</p>\n\n<h2 id=\"wakefield\">Wakefield</h2>\n\n<p>Let’s first start with a dataset. Peroni mentioned a study they did (<a href=\"https://doi.org/10.1007/S11192-021-04097-5\">10.1007/S11192-021-04097-5</a>)\ninto why the famous Wakefield paper\n(doi:<a href=\"https://doi.org/10.1016/S0140-6736(97)11096-0\">10.1016/S0140-6736(97)11096-0</a>) is cited. They published\ntheir data set on Zenodo (doi:<a href=\"https://doi.org/10.5281/zenodo.13166142\">10.5281/zenodo.13166142</a>) with CCZero,\nso I imported it into <a href=\"https://wikidata.org/\">Wikidata</a>. Well, at least the citations\nof articles already in Wikidata. 
I used a Bacting (doi:<a href=\"https://doi.org/10.21105/joss.02558\">10.21105/joss.02558</a>)\n<a href=\"https://gist.github.com/egonw/379c72a49517716712b70bdee0d845ce\">script</a> and it actually was quite short.\nIn the end, this added some 500 new citation intentions to Wikidata, now at almost <a href=\"https://scholia.toolforge.org/cito/\">2000</a>.\nThis is also the third dataset with explicit CiTO intention annotations (see also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html\">this post</a>).</p>\n\n<p>This is what the <a href=\"https://scholia.toolforge.org/work/Q28264479#cito-incoming\">CiTO section of the Wakefield paper</a>\nin <a href=\"https://scholia.toolforge.org/\">Scholia</a> (doi:<a href=\"https://doi.org/10.1007/978-3-319-70407-4_36\">10.1007/978-3-319-70407-4_36</a>)\nnow looks like:</p>\n\n<p><img src=\"/assets/images/wakefieldCitations.png\" alt=\"\" /></p>\n\n<h2 id=\"wikipathways\">WikiPathways</h2>\n\n<p>A second thing I want to show is a potential CiTO intention annotation dataset. Almost two years ago\n<a href=\"https://qoto.org/@xanderpico\">Alex Pico</a> started a new <a href=\"https://wikipathways.org/\">WikiPathways</a>\nfeature as part of the new website (doi:<a href=\"https://doi.org/10.1093/NAR/GKAD960\">10.1093/NAR/GKAD960</a>):\n<a href=\"https://github.com/wikipathways/wikipathways-database/commit/97f7df0057d312f0c332a9ff290c11684bf252d5\">a list of citations to specific pathways</a>\n(in WikiPathways). Alex’ setup is fully automated, using <a href=\"https://www.ncbi.nlm.nih.gov/pmc/\">PubMed Central</a>\nto find mentions in figure captions:</p>\n\n<p><em>Beyond citations to previous WikiPathways journal articles, we have identified 1228 mentions of a total of 582\nunique WikiPathways pathway model identifiers, e.g. 
WP4846, in PubMedCentral articles over the past 13 years.</em></p>\n\n<p>The file format is a pretty basic YAML file:</p>\n\n<p><img src=\"/assets/images/citedin_yaml.png\" alt=\"\" /></p>\n\n<p>Additional mentions are found in the main text and tables in the article. These are not always picked up.\nThese can be added manually. Over the past months and the past two weeks particularly, I have been adding\nadditional mentions, not listed yet. We have now passed 1500 mentions but I cannot easily give the other\nstatistics.</p>\n\n<p>BTW, anyone can add these citations with the ‘edit’ pencil and some Microsoft GitHub editing (but\nas far as I am concerned, please feel free to also just mention the paper on the\n<a href=\"https://github.com/wikipathways/wikipathways-help/discussions\">WikiPathways Community Forum</a>):</p>\n\n<p><img src=\"/assets/images/citedin_website.png\" alt=\"\" /></p>\n\n<p>So, in the next few days I plan to do three things: 1. generate RDF for the YAML file and make that part of the\n<a href=\"https://data.wikipathways.org/current/rdf/\">monthly WikiPathways RDF release</a>; 2. extract citations and\noffer this back to <a href=\"https://opencitations.net/\">the OpenCitations project</a>; and, 3. add the citations\ninto Wikidata. Of course, all with <code class=\"language-plaintext highlighter-rouge\">cito:usesDataFrom</code> :)</p>\n\n<p>There is a fourth thing that I am still thinking about. I can also use the above data to annotate\ncitations to the WikiPathways papers, if they also mention a WikiPathways identifier, as <code class=\"language-plaintext highlighter-rouge\">cito:usesDataFrom</code>,\nbut I cannot fully oversee the implications of that. What do you think?</p>",
      "summary": "This summer I am trying to finish up some smaller projects that I did not have time for to finish, with mixed successes. I am combing this with a nice Dutch staycation, and I already cycled in Overijssel and in south-west Friesland and learning about their histories. But this post is about an update on my Citation Typing Ontology use cases. And I have to say, a mention by Silvio Peroni is pretty awesome, thanks!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wakefieldCitations.png",
      "date_published": "2024-08-07T00:00:00+00:00",
      "date_modified": "2024-08-07T00:00:00+00:00",
      "tags": ["cito","wikipathways","wikidata"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/S0140-6736(97)11096-0", "doi": "10.1016/S0140-6736(97)11096-0"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.21105/JOSS.02558", "doi": "10.21105/JOSS.02558"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/978-3-319-70407-4_36", "doi": "10.1007/978-3-319-70407-4_36"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.13166142", "doi": "10.5281/ZENODO.13166142"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKAD960", "doi": "10.1093/NAR/GKAD960"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/S11192-021-04097-5", "doi": "10.1007/S11192-021-04097-5"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/32j3a-7ae65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/07/31/directed-metabolic-network.html",
      "title": "New paper: &quot;Discovering life&apos;s directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool&quot;",
      "content_html": "<p>I am still catching up with a lot of work, and found out I actually had forgotten to blog about this cool article\nby <a href=\"https://scholar.google.com/citations?user=Le-4tuQAAAAJ&amp;hl\">Denise Slenter</a>: “Discovering life’s directed metabolic (sub)paths to\ninterpret human biochemical markers using the DSMN tool” (doi:<a href=\"https://doi.org/10.1039/D3DD00069A\">10.1039/D3DD00069A</a>).\nThis paper explains how various open science resources (<a href=\"https://www.wikidata.org/\">Wikidata</a>,\n<a href=\"https://reactome.org/\">Reactome</a>, <a href=\"https://www.wikipathways.org/\">WikiPathways</a>) are used to visualize\nthe biological story of the data from two metabolomics experiments archived in MetaboLights.</p>\n\n<p>Using <a href=\"https://neo4j.com/\">Neo4J</a> and <a href=\"https://cytoscape.org/\">Cytoscape</a> she visualizes the data onto a network created with\nRDF and <a href=\"https://en.wikipedia.org/wiki/SPARQL\">SPARQL</a> from the above resources:</p>\n\n<p><img src=\"/assets/images/d3dd00069a-f12_hi-res.png\" alt=\"\" /></p>\n\n<p>The whole approach uses open science, making the work very reproducible. This is essential, as our knowledge\nabout metabolic processes continues to grow, not only for the human lipids, but also from molecular\nimaging technologies. Moreover, a lot of biological detail is yet to be encoded in pathway databases,\nsuch as cellular location of proteins and metabolites, which proteins are expressed in which tissue, or\nthe kinetics of metabolic reactions. All knowledge that can be pulled in via knowledge graphs becomes\nimmediately available by using this <a href=\"https://en.wikipedia.org/wiki/FAIR_data\">FAIR</a> approach.</p>\n\n<p>One last note, the reader may notice a focus on the shortest path. Of course, the biologically relevant\npath may not be the “shortest” path. But from a network analysis perspective that question is purely\nacademic. 
Neo4J, like other tools, supports finding all paths. But validating which path (the shortest\nor any of the longer ones) is biologically most relevant first depends on more biological\nknowledge becoming FAIR. After this, it is just a push of a button.</p>",
      "summary": "I am still catching up with a lot of work, and found out I actually had forgotten to blog about this cool article by Denise Slenter: “Discovering life’s directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool” (doi:10.1039/D3DD00069A). This paper explains how various open science resources (Wikidata, Reactome, WikiPathways) are used to visualize the biological story of the data from two metabolomics experiments archived in MetaboLights.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/d3dd00069a-f12_hi-res.png",
      "date_published": "2024-07-31T00:00:00+00:00",
      "date_modified": "2024-07-31T00:00:00+00:00",
      "tags": ["wikipathways","metabolomics"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1039/D3DD00069A", "doi": "10.1039/D3DD00069A"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8x2f1-h6d21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/07/21/rogue-scholar-and-more.html",
      "title": "GoatCounter, Rogue Scholar and more new things",
      "content_html": "<p>About <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html\">a year ago</a> I started migrating\nmy blogger.com blog to a git-version-controlled, Markdown-based blogging platform. I have to say, it has been a happy year.\nIt actually is awesome to port old blog posts (<a href=\"https://egonw.github.io/blog/\">follow that here</a>) and to see what I have been\nworking on some 17, 18 years ago.</p>\n\n<p>I do have a nasty bug to fix that causes the conversion of the Markdown to HTML to scale badly. The system is doing some indexing at\nthe wrong time, and probably all indexing for each post again. Kudos if you spot it.</p>\n\n<p>But while still being on a Jekyll learning curve, some nice things have happened since I started. This blog started with\nInChIKeys, as demonstrated in <a href=\"https://doi.org/10.59350/fbnx1-9r832\">this post</a>,\nwhich adds <a href=\"https://chem-bla-ics.linkedchemistry.info/molecule/DEIYFTQMQPDXOT-UHFFFAOYSA-N\">this molecule page</a>. On my wishlist\nis still a <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/rss\">CMLRSS</a>-based feed.</p>\n\n<p>Newer are the things I worked on since, which include the following and which readers of my blog may be interested in\nlearning about. First, I started counting visitors again, but with the GDPR-compliant <a href=\"https://goatcounter.com/\">GoatCounter</a>.\nI have been using my social network as an advisory board, and knowing what people find interesting matters to me.</p>\n\n<p>The second thing is listing in <a href=\"https://rogue-scholar.org/\">The Rogue Scholar</a>. 
This is a new platform, like a blog planet, perhaps\na bit like (the late) <a href=\"https://chem-bla-ics.blogspot.com/search?q=%22chemical+blogspace%27\">Chemical blogspace</a> and (the late)\n<a href=\"https://chem-bla-ics.blogspot.com/search?q=%22postgenomic.com%22\">Postgenomic.com</a>, but so far without the extraction of\njournal articles (tho it did start <a href=\"https://doi.org/10.53731/j77gv-54g66\">recognizing some references</a>),\nchemicals, and conferences. Instead, they offer <a href=\"https://doi.org/10.53731/br9f5xa-a556w2t\">archiving</a>\n<a href=\"https://doi.org/10.53731/g60vh-3ng48\">by the Internet Archive</a>, <a href=\"https://doi.org/10.53731/6mkrk-dzh02\">DOIs for your blog posts</a>,\n<a href=\"https://doi.org/10.53731/1dfxr-hs665\">ePub and PDF downloads</a>, and <a href=\"https://doi.org/10.53731/3w1ye-q6z42\">JATS</a>.\nThey just passed the milestone of <a href=\"https://doi.org/10.53731/xkfsa-xkk56\">100 participating blogs</a>!\nPlease do check it out, it’s an awesome service.</p>\n\n<p><img src=\"/assets/images/chemblaics-on-roguescholar.png\" alt=\"\" /></p>\n\n<p>A final thing I want to mention here is that my blog now has an <a href=\"https://chem-bla-ics.linkedchemistry.info/archive/\">archive page</a>,\nwhich sometimes can be useful.</p>\n\n<p>Let’s see what I can say next year, when my blog celebrates its 20th birthday :)</p>",
      "summary": "About a year ago I started migrating my blogger.com blog to a git-version-controlled, Markdown-based blogging platform. I have to say, it has been a happy year. It actually is awesome to port old blog posts (follow that here) and to see what I have been working on some 17, 18 years ago.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemblaics-on-roguescholar.png",
      "date_published": "2024-07-21T00:00:00+00:00",
      "date_modified": "2024-07-21T00:00:00+00:00",
      "tags": ["blog"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/fbnx1-9r832", "doi": "10.59350/fbnx1-9r832"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/j77gv-54g66", "doi": "10.53731/j77gv-54g66"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/br9f5xa-a556w2t", "doi": "10.53731/br9f5xa-a556w2t"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/g60vh-3ng48", "doi": "10.53731/g60vh-3ng48"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/6mkrk-dzh02", "doi": "10.53731/6mkrk-dzh02"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/1dfxr-hs665", "doi": "10.53731/1dfxr-hs665"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/3w1ye-q6z42", "doi": "10.53731/3w1ye-q6z42"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.53731/xkfsa-xkk56", "doi": "10.53731/xkfsa-xkk56"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dtfq8-5x011",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/06/16/cdk2024-3.html",
      "title": "cdk2024 #3: an unexpected downstream project",
      "content_html": "<p>In <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/04/07/cdk2024.html\">the CDK2024</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/05/18/cdk2024-2.html\">grant</a> we wrote about\nupdating various software projects using the <a href=\"https://cdk.github.io/\">Chemistry Development Kit</a>.\nWe even wrote that “[r]equired API changes will be publicly shared and disseminated with the\nGroovy Cheminformatics with the Chemistry Development Kit book (egonw.github.io/cdkbook/)”.\nThe <em>Groovy Cheminformatics with the Chemistry Development Kit</em> book is a project that has\nrun since 2009.</p>\n\n<pre><code class=\"language-git\">commit c5cbf9b5dd49baf582afc595c9cbafc714c5199f\nAuthor: Egon Willighagen &lt;egon.willighagen@gmail.com&gt;\nDate:   Fri Apr 10 12:34:42 2009 +0200\n\n    Initial copy of the current draft; converted into separate project for easier branching\n    for tunes of the book for workshops and sorts\n</code></pre>\n\n<p>The original version was in LaTeX and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2011/02/06/groovy-cheminformatics.html\">sold online via Lulu.com <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBecause all code examples were run (the first public edition had 72 pages with 75 code examples),\nlike RMarkdown or Jupyter Notebooks by design, I was able to\nmake <a href=\"https://chem-bla-ics.blogspot.com/search?q=lulu\">many releases</a>.\nThe big advantage of this was that when <a href=\"https://en.wikipedia.org/wiki/API\">API</a> changes happened,\nthis would be visible by code not compiling or by output changing.</p>\n\n<p>At some point I open sourced the book (doi:<a href=\"https://doi.org/10.6084/M9.FIGSHARE.2057790.V1\">10.6084/M9.FIGSHARE.2057790.V1</a>)\nand then realized that I can <a href=\"https://github.com/egonw/cdkbook/commit/2630699aa280200188f2ae9ef3f0698964926752\">convert the book to Markdown</a>:</p>\n\n<pre><code class=\"language-git\">commit 
2630699aa280200188f2ae9ef3f0698964926752\nAuthor: Egon Willighagen &lt;egon.willighagen@gmail.com&gt;\nDate:   Mon Dec 24 16:59:14 2018 +0100\n\n    Create chapter3.md\n</code></pre>\n\n<p>This is the version available at <a href=\"https://egonw.github.io/cdkbook/\">egonw.github.io/cdkbook/</a>\nfor some time now. So, now that for SMARTCyp I need to update the visualization, I went back to my book of\ncode examples (I have a collection of more than 200 examples), but then found that\nthe chapter on <a href=\"https://egonw.github.io/cdkbook/depiction\">Depiction</a> was missing. I was not\nlooking forward to this, because I know that\nthe code examples predate a massive improvement by <a href=\"https://scholia.toolforge.org/author/Q28796322\">John Mayfield</a>\nof the rendering stack and I never got around to seeing if the examples from the book work well enough\nwith that new API (one is actually updated).</p>\n\n<p>That is when I realized that the <em>Groovy Cheminformatics</em> book actually also is a downstream\nproject that needs updating. I have been doing this already and it’s fairly smooth so that I did\nnot think of including it in the grant, other than updating the\n<a href=\"https://egonw.github.io/cdkbook/migration\">Migration</a> chapter. I now had enough time\nto dive into <a href=\"https://github.com/cdk/nwo-openscience-2024/issues/30\">this project</a>. I need that,\nbecause the goal of the project is also to learn about all the meta science aspects of\nproject maintenance, roles, communication, etc. 
Therefore also this blog post: we need a track\nrecord, to collect data.</p>\n\n<p>Anyway, porting <a href=\"https://egonw.github.io/cdkbook/code/RenderMolecule.code.html\">the first script</a> went fairly easily,\nbut I am now running into a stacktrace:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Processing  RenderSelection.groovyin\ndoing RenderSelection.out ...\norg.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:\n/home/egonw/var/Projects/hub/cdkbook-source/code/RenderSelection.groovy: 39: unable to resolve class ExternalHighlightGenerator\n @ line 39, column 16.\n   generators.add(new ExternalHighlightGenerator());\n                  ^\norg.codehaus.groovy.syntax.SyntaxException: unable to resolve class ExternalHighlightGenerator\n @ line 39, column 16.\n\n</code></pre></div></div>\n\n<p>That brings us to the task of how to find where that class is coming from, which happens\nto be something I already <a href=\"https://github.com/cdk/nwo-openscience-2024/issues/29\">had to write up</a>\nfor <code class=\"language-plaintext highlighter-rouge\">RingSearch</code>. Dependency galore.</p>",
      "summary": "In the CDK2024 grant we wrote about updating various software projects using the Chemistry Development Kit. We even wrote that “[r]equired API changes will be publicly shared and disseminated with the Groovy Cheminformatics with the Chemistry Development Kit book (egonw.github.io/cdkbook/)”. The Groovy Cheminformatics with the Chemistry Development Kit book is a project that has run since 2009.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkDepictChapter.png",
      "date_published": "2024-06-16T00:00:00+00:00",
      "date_modified": "2025-03-11T00:00:00+00:00",
      "tags": ["cdk","grant","cdk2024"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.6084/M9.FIGSHARE.2057790.V1", "doi": "10.6084/M9.FIGSHARE.2057790.V1"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Chemistry Development Kit in 2024: improving cheminformatics research", "acronym" : "CDK2024", "uri" : "drc.filenumber:osf232097" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m9g28-dne38",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/06/10/two-meetings.html",
      "title": "Two meetings: ELIXIR Toxicology and FAIR4ChemNL",
      "content_html": "<p>Noting that in the coming week I am not attending the <a href=\"https://elixir-europe.org/events/elixir-all-hands-2024\">ELIXIR All Hands in Uppsala</a>.\nHaving lived in (and around) Uppsala for more than three years, I am disappointed and with the first stories from colleagues coming\nin even more. But it has been a way too busy year, I have much to finish up, and I need to take care of myself too. I am not 32 anymore.</p>\n\n<p>But in the past two weeks I did attend two workshops. The first was a <a href=\"https://www.aanmelder.nl/intoxicom2024firstworkshop\">workshop</a> by the\n<a href=\"https://elixir-europe.org/communities/toxicology\">ELIXIR Toxicology Community</a>, which was held in Utrecht/NL. The programme was around\nFAIR and included two really nice hands-on sessions where we developed drafts for <a href=\"https://faircookbook.elixir-europe.org/\">FAIR Cookbook</a>\nrecipes (see also doi:<a href=\"https://doi.org/10.1038/s41597-023-02166-3\">10.1038/s41597-023-02166-3</a>) and for\n<a href=\"https://www.go-fair.org/how-to-go-fair/fair-implementation-profile/\">FAIR Implementation Profiles</a>\n(doi:<a href=\"https://doi.org/10.1007/978-3-030-65847-2_13\">10.1007/978-3-030-65847-2_13</a>). We will write up a\n<a href=\"https://biohackrxiv.org/discover\">BioHackrXiv</a> report.</p>\n\n<p>The second workshop was last week, the <a href=\"https://tdcc.nl/evenementen/fair4chemnl-workshop/\">FAIR4ChemNL workshop</a>, which was also held\nin Utrecht/NL. The topic was FAIR in chemistry, and we discussed various aspects. There was a significant participant group from the\nGerman NFDI4Cat project (“Cat” is short for (chemical) catalysis), which recently published a nice analysis of several ontologies\n(doi:<a href=\"https://doi.org/10.1186/s13321-024-00807-2\">10.1186/s13321-024-00807-2</a>). And there was also a lot of mention of RDF and SPARQL.</p>\n\n<p>I think it is time for a new special issue around semantic web technologies.</p>",
      "summary": "Noting that in the coming week I am not attending the ELIXIR All Hands in Uppsala. Having lived in (and around) Uppsala for more than three years, I am disappointed and with the first stories from colleagues coming in even more. But it has been a way too busy year, I have much to finish up, and I need to take care of myself too. I am not 32 anymore.",
      
      "date_published": "2024-06-10T00:00:00+00:00",
      "date_modified": "2024-06-10T00:00:00+00:00",
      "tags": ["elixir","fair","chemistry","rdf","sparql","fair4chemnl"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/S41597-023-02166-3", "doi": "10.1038/S41597-023-02166-3"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/978-3-030-65847-2_13", "doi": "10.1007/978-3-030-65847-2_13"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-024-00807-2", "doi": "10.1186/S13321-024-00807-2"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b4tm0-s7c62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/06/10/linking-fair-to-reuse.html",
      "title": "New paper: FAIR assessment of nanosafety data reusability with community standards",
      "content_html": "<p><a href=\"FAIR assessment of nanosafety data reusability with community standards\">Ammar</a> is finishing up his PhD thesis with his\nresearch on the use of FAIR towards predictive toxicology. Or, “AI ready”, as the term FAIR is now sometimes explained.\nAny computational method needs good data, and just FAIR is not enough. It needs to meet community standards, as formalized\nin R1.3. To me, this includes meeting community standards like minimal reporting standards. Indeed, in the\n<a href=\"https://www.nanosafetycluster.eu/\">EU NanoSafety Cluster</a> the notion that FAIR data also needs to be scientifically\ngood data is well noted.</p>\n\n<p>In this paper (doi:<a href=\"https://doi.org/10.1038/s41597-024-03324-x\">10.1038/s41597-024-03324-x</a>),\nAmmar explores this notion and compiled more than 200 maturity indicators in the category R1.3\nresulting from 12 different community standards. For example, this includes minimal reporting standards. There\nis overlap in needs, but they often also have a different focus. The conclusion here: different (re)use cases\nhave different needs, and data not usable for one use case can be sufficiently FAIR for another. Of course, ideally,\nit would be FAIR enough for all use cases.</p>\n\n<p>Ammar formalizes the maturity indicators and links the common maturity indicators to various use cases.\nThat means that when you determine the indicator values for your data, people can immediately look up how\nthis data can be reused. And, the generator of the data can immediately see how the data would need to be\nimproved to widen the reusability. 
How FAIR can we get?</p>\n\n<p>His proposal has already been further explored in two other papers, one around data sharing\n(doi:<a href=\"https://doi.org/10.1038/s41596-024-00993-1\">10.1038/s41596-024-00993-1</a>, see also\n<a href=\"https://doi.org/10.59350/vfvwq-s0v13\">this blog post</a>) and one around QSAR modelling\n(doi:<a href=\"https://doi.org/10.1016/j.impact.2023.100475\">10.1016/j.impact.2023.100475</a>,\nsee also <a href=\"https://doi.org/10.59350/7zf38-w9670\">this blog post</a>).</p>\n\n<p>The below screenshot shows what an analysis using this approach can look like:</p>\n\n<p><img src=\"/assets/images/41597_2024_3324_Fig3_HTML.png\" alt=\"\" /></p>",
      "summary": "Ammar is finishing up his PhD thesis with his research on the use of FAIR towards predictive toxicology. Or, “AI ready”, as the term FAIR is now sometimes explained. Any computational method needs good data, and just FAIR is not enough. It needs to meet community standards, as formalized in R1.3. To me, this includes meeting community standards like minimal reporting standards. Indeed, in the EU NanoSafety Cluster the notion that FAIR data also needs to be scientifically good data is well noted.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/41597_2024_3324_Fig3_HTML.png",
      "date_published": "2024-06-10T00:00:00+00:00",
      "date_modified": "2024-06-10T00:00:00+00:00",
      "tags": ["fair","toxicology","qsar"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/S41597-024-03324-X", "doi": "10.1038/S41597-024-03324-X"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/J.IMPACT.2023.100475", "doi": "10.1016/J.IMPACT.2023.100475"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/S41596-024-00993-1", "doi": "10.1038/S41596-024-00993-1"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vfvwq-s0v13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/05/27/from-spreadsheets-to-rdf.html",
      "title": "New paper: A template wizard for the cocreation of machine-readable data-reporting to harmonize the evaluation of (nano)materials",
      "content_html": "<p>I was about to call this blog post <em>From spreadsheets to RDF</em>, after <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/05/20/from-papers-to-rdf.html\">the post last week</a>.\nBut then I decided to just use the pattern I typically use. Why I wanted to use that shorter term in the first\nplace was that one of the things I like about the <a href=\"https://sourceforge.net/projects/ambit/\">AMBIT software</a>\n(of OpenTox and eNanoMapper fame) is its\nRDF support (see doi:<a href=\"https://doi.org/10.1186/1756-0500-4-487\">10.1186/1756-0500-4-487</a>). But\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/rdf\">RDF</a>, ontologies,\nthose are hard things. And unlike mathematics, we do not have simple objects like integer numbers or simple\noperators. Well, I think we do, and we talk about them. But there is no obligatory education. Just like\nany biologist needs to know what <em>1 + 2</em> means, I think any biologist needs basic knowledge about how\nknowledge graphs work. But sometimes it feels like a taboo, like cursing in the life sciences church.</p>\n\n<p>So, there we are. This is where spreadsheets come in. If done well, they combine aspects of knowledge graphs\nwith usability and can even cover a good bit of the learnability. This is what is described in this new\npaper about templates in the <a href=\"https://www.nanosafetycluster.eu/\">EU NanoSafety Cluster</a>: <em>A template wizard\nfor the cocreation of machine-readable data-reporting to harmonize the evaluation of (nano)materials</em>\n(doi:<a href=\"https://doi.org/10.1038/s41596-024-00993-1\">10.1038/s41596-024-00993-1</a>).</p>\n\n<p>The learnability comes in with the spreadsheet templates (“this is how we did it”) and a “wizard” around\nit guides the user with the selection of a template but also can provide feedback on the template. The\ntechnical term for that is “validator”, but it can be thought of as a spelling checker. 
Computers are good at\nfinding contradictions (the lack of a pattern), though less good at ranking the alternatives (which is\nthe cause of hallucinations in AI approaches).</p>\n\n<p>And to return to the RDF, software like AMBIT can read these templates, use the semantics linked to the\ntemplate, and make the FAIR static spreadsheets (good for archiving on Zenodo!) available as FAIR interactive\ndata (good for exploration and machine learning), and as RDF (good for data integration).</p>\n\n<p>Congrats to <a href=\"http://orcid.org/0000-0002-4322-6179\">Nina</a> and the various EU NanoSafety Cluster projects!</p>",
      "summary": "I was about to call this blog post From spreadsheets to RDF, after the post last week. But then I decided to just use the pattern I typically use. Why I wanted to use that shorter term in the first place was that one of the things I like about the AMBIT software (of OpenTox and eNanoMapper fame) is its RDF support (see doi:10.1186/1756-0500-4-487). But RDF, ontologies, those are hard things. And unlike mathematics, we do not have simple objects like integer numbers or simple operators. Well, I think we do, and we talk about them. But there is no obligatory education. Just like any biologist needs to know what 1 + 2 means, I think any biologist needs basic knowledge about how knowledge graphs work. But sometimes it feels like a taboo, like cursing in the life sciences church.",
      
      "date_published": "2024-05-27T00:00:00+00:00",
      "date_modified": "2024-05-27T00:00:00+00:00",
      "tags": ["rdf","opentox","fair"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1756-0500-4-487", "doi": "10.1186/1756-0500-4-487"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/S41596-024-00993-1", "doi": "10.1038/S41596-024-00993-1"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jdj8r-h6187",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/05/20/from-papers-to-rdf.html",
      "title": "New paper: From papers to RDF-based integration of physicochemical data and adverse outcome pathways for nanomaterials",
      "content_html": "<p>Making something FAIR is hard, particularly when you do more than making something findable. We’ve seen before that\nmaking something usefully findable <a href=\"https://chem-bla-ics.blogspot.com/2020/10/new-paper-semi-automated-workflow-for.html?q=serena\">requires deep indexing</a>,\nand already that continues to be difficult, because we are not seeing it enough.\nSo, when I thought to convert a <a href=\"https://chem-bla-ics.blogspot.com/2021/05/new-strategy-towards-generation-of.html\">paper led by Hoet’s lab in Leuven</a>\ninto machine-actionable RDF to make it FAIR, I gravely underestimated the amount of work.\n<a href=\"https://scholia.toolforge.org/author/Q99306396\">Jeaphianne</a> et al. did an awesome job on this work\n(doi:<a href=\"https://doi.org/10.1186/s13321-024-00833-0\">10.1186/s13321-024-00833-0</a>).</p>\n\n<p>The idea was simple: write up which nanomaterial (type) activates which molecular initiating event.\nIt would simply annotate each material with a unique identifier to link it to databases like\n<a href=\"https://enanomapper.adma.ai/\">eNanoMapper</a> and <a href=\"https://doi.org/10.3389/fphy.2023.1271842\">NanoCommons</a>\nand it would use unique identifiers for the\n<a href=\"https://chem-bla-ics.blogspot.com/2022/05/new-providing-adverse-outcome-pathways.html\">Adverse Outcome Pathway</a> (AOP) key events.\nAs such, it would make a direct link in the growing linked open data cloud between the AOPs\nand the nanomaterial databases.</p>\n\n<p>Unfortunately, it was quickly discovered that actually reusing this new dataset requires rich annotation (metadata!)\nof the materials and the materials from the source paper were not yet in material databases.\nAnd then the cumbersome work started, resulting in a very rich data model describing the\nkey events, the materials, the assays used, and the original papers themselves:</p>\n\n<p><img src=\"/assets/images/13321_2024_833_Fig1_HTML.png\" alt=\"\" /></p>\n\n<p>But 
the work is not finished yet. The paper assigned <a href=\"https://chem-bla-ics.blogspot.com/2022/09/nanomaterial-identifiers-erm-identifier.html\">ERM identifiers</a>\nto all included materials, and now these need to be added to the new <a href=\"https://nanocommons.github.io/erm-database/\">ERM Identifier Database</a>\nunder development.</p>",
      "summary": "Making something FAIR is hard, particularly when you do more than making something findable. We’ve seen before that making something usefully findable requires deep indexing, and already that continues to be difficult, because we are not seeing it enough. So, when I thought to convert a paper led by Hoet’s lab in Leuven into machine-actionable RDF to make it FAIR, I gravely underestimated the amount of work. Jeaphianne et al. did an awesome job on this work (doi:10.1186/s13321-024-00833-0).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/13321_2024_833_Fig1_HTML.png",
      "date_published": "2024-05-20T00:00:00+00:00",
      "date_modified": "2024-05-20T00:00:00+00:00",
      "tags": ["fair","rdf","erm"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/S13321-024-00833-0", "doi": "10.1186/S13321-024-00833-0"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.14573/ALTEX.2102191", "doi": "10.14573/ALTEX.2102191"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3390/NANO10102068", "doi": "10.3390/NANO10102068"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-022-00614-7", "doi": "10.1186/S13321-022-00614-7"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3389/FPHY.2023.1271842", "doi": "10.3389/FPHY.2023.1271842"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3762/BJNANO.6.165", "doi": "10.3762/BJNANO.6.165"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1089/AIVT.2021.0010", "doi": "10.1089/AIVT.2021.0010"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s1hwk-vj154",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/05/18/cdk2024-2.html",
      "title": "cdk2024 #2: publishing grant proposals",
      "content_html": "<p>Publishing grant proposal is still not very common. The proposal published in Research Ideas and Outcomes)\n(doi:<a href=\"https://doi.org/10.3897/rio.10.e124884\">10.3897/rio.10.e124884</a>) for the\n<a href=\"/2024/04/07/cdk2024.html\">NWO Open Science grant for the CDK</a> is, however, not the first and hopefully not the last.\nInterestingly, it is already cited in (the German) Wikipedia. It is used <a href=\"https://de.wikipedia.org/wiki/Chemistry_Development_Kit\">there</a>\nto support a statement which tools use the Chemistry Development Kit.</p>",
      "summary": "Publishing grant proposal is still not very common. The proposal published in Research Ideas and Outcomes) (doi:10.3897/rio.10.e124884) for the NWO Open Science grant for the CDK is, however, not the first and hopefully not the last. Interestingly, it is already cited in (the German) Wikipedia. It is used there to support a statement which tools use the Chemistry Development Kit.",
      
      "date_published": "2024-05-18T00:00:00+00:00",
      "date_modified": "2025-03-11T00:00:00+00:00",
      "tags": ["cdk","grant","cdk2024"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3897/RIO.10.E124884", "doi": "10.3897/RIO.10.E124884"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Chemistry Development Kit in 2024: improving cheminformatics research", "acronym" : "CDK2024", "uri" : "drc.filenumber:osf232097" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ytkmr-0vv92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/04/07/cdk2024.html",
      "title": "cdk2024 #1: NWO Open Science grant for the Chemistry Development Kit",
      "content_html": "<p>We recently got awarded our <a href=\"https://chem-bla-ics.linkedchemistry.info/2022/03/05/bridgedb-nwo-grant-update-1-first-steps.html\">second <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nNWO Open Science grant (<a href=\"https://www.nwo.nl/en/projects/osf232097\">OSF23.2.097</a>),\nthis time for the <a href=\"https://cdk.github.io/\">Chemistry Development Kit</a> (CDK).\n“We” here is me and <a href=\"https://orcid.org/0000-0003-0896-0906\">Alyanne de Haan</a>, René van der Ploeg, and\n<a href=\"https://orcid.org/0000-0002-3496-6669\">Marc Teunis</a> from Hogeschool Utrecht.\nThe proposal has been submitted for public dissemination in <a href=\"https://riojournal.com/\">RIO Journal</a>, like\n<a href=\"http://localhost:4000/2022/04/17/bridgedb-nwo-grant-update-2-building-up.html\">we did <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwith the first NWO Open Science grant.</p>\n\n<p>The project formally started on April 1 but we had our kick-off meeting in Maastricht on April 4-5.\nWe were joined by Javier and on the second day by Marvin, and Ozan from our <a href=\"https://www.maastrichtuniversity.nl/research/bioinformatics\">BiGCaT research group</a>\nin Maastricht. 
During this hackathon, I gave a (repeat) <a href=\"https://zenodo.org/records/6414204\">presentation</a>\nabout the history of the CDK which also included the problem that software using the CDK does not\nalways use the most recent version.</p>\n\n<p>And that, upgrading tools using the CDK with the latest CDK version, is the main topic of this grant (work package 2, WP2).\nThe full proposal has the focus list of tools, but most of it is also listed in\n<a href=\"https://github.com/cdk/nwo-openscience-2024/issues\">the issue tracker</a> we have set up as project\nmanagement tool on GitHub.</p>\n\n<p>Second, we actually hacked together on the first two tools: one from our focus list, and another that was\n<a href=\"https://github.com/cdk/nwo-openscience-2024/issues/22\">requested we have a look at too</a>: SMARTCyp.\nThe latest version uses <a href=\"https://www.rdkit.org/\">RDKit</a> (doi:<a href=\"https://doi.org/10.1093/bioinformatics/btz037\">10.1093/bioinformatics/btz037</a>),\nbut the original version uses the CDK (doi:<a href=\"https://doi.org/10.1021/ml100016x\">10.1021/ml100016x</a>).</p>\n\n<p>We downloaded the source code of SMARTCyp 2.4.2, started taking <a href=\"https://github.com/cdk/nwo-openscience-2024/blob/main/monitoring/smartcyp.md\">notes</a>,\nJavier <a href=\"https://github.com/cdk/smartcyp\">started</a> a Maven build environment, and we updated a lot of code. We seem quite close to a version that can be tested by\npeople that have integrated SMARTCyp in other tools. This is based on <a href=\"https://github.com/cdk/cdk/releases/tag/cdk-2.9\">CDK 2.9</a>\nand if you ignore the 2D depiction glitch, it looks like it was a nice first choice:</p>\n\n<p><img src=\"/assets/images/smartcyp.png\" alt=\"\" /></p>\n\n<p>On a final note, we plan to carefully record our steps, in an open notebook science approach, with\nthe intention to extract general upgrade steps. 
For example, we will update the\n<a href=\"https://egonw.github.io/cdkbook/migration.html\">Migration</a> section of the\n<a href=\"https://egonw.github.io/cdkbook/\">Groovy Cheminformatics with the Chemistry Development Kit</a> book.</p>",
      "summary": "We recently got awarded our second NWO Open Science grant (OSF23.2.097), this time for the Chemistry Development Kit (CDK). “We” here is me and Alyanne de Haan, René van der Ploeg, and Marc Teunis from Hogeschool Utrecht. The proposal has been submitted for public dissemination in RIO Journal, like we did with the first NWO Open Science grant.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/smartcyp.png",
      "date_published": "2024-04-07T00:00:00+00:00",
      "date_modified": "2025-03-11T00:00:00+00:00",
      "tags": ["grant","cdk","cdk2024"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btz037", "doi": "10.1093/bioinformatics/btz037"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ml100016x", "doi": "10.1021/ml100016x"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "The Chemistry Development Kit in 2024: improving cheminformatics research", "acronym" : "CDK2024", "uri" : "drc.filenumber:osf232097" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/p4g67-ajf20",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/04/02/open-science-retreat-2.html",
      "title": "Open Science Retreat #2: CiTO Nanopublications",
      "content_html": "<p>During the <a href=\"http://chem-bla-ics.linkedchemistry.info/2024/03/31/open-science-retreat-1.html\">Open Science Retreat</a> I organized\na short session where we looking into typing citation intentions using a new nanopublication template. First, let’s describe\nnanopublications (originally used in doi:<a href=\"https://doi.org/10.3233/ISU-2010-0613\">10.3233/ISU-2010-0613</a>) a bit.\nScholia gives <a href=\"https://scholia.toolforge.org/topic/Q57814310\">a nice overview of (macro?)publications on the topic</a>.\nThe <a href=\"https://nanopub.net/\">nanopub.net</a>\nwebsite describes that <em>[a nanopublication is a small knowledge graph snippet with metadata that is treated as an\nindependent (scientific) publication.]</em>. The knowledge graph, it continues, can be anything from an opinion to the link\nbetween a disease and a gene (doi:<a href=\"https://doi.org/10.1109/ESCIENCE.2018.00024\">10.1109/ESCIENCE.2018.00024</a>).</p>\n\n<p>Now, in this post I will document an update of how we can use nanopublications for citation intention annotation, and\ncompare this to existing solutions. I have been collecting and indexing the CiTO intention annotations in Wikidata and\nvisualizing the corpus with Scholia at <a href=\"https://scholia.toolforge.org/cito/\">scholia.toolforge.org/cito/</a>. There are\ncurrently 22 journal articles with explicit CiTO annoation, largely thanks to a <a href=\"https://www.biomedcentral.com/collections/cito\">Journal of Cheminformatics pilot</a>\n(e.g. see doi:<a href=\"https://doi.org/10.1186/s13321-023-00683-2\">10.1186/s13321-023-00683-2</a>). Recently,\nthe preprint/report server <a href=\"https://biohackrxiv.org/discover\">BioHackrXiv</a> started\n<a href=\"https://github.com/biohackrxiv/publication-template\">CiTO support</a> too, also visible in the statistics\non Scholia with another 17 papers. 
A third source is data sets from bibliometric-like studies, as explained\nin <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html\">this post <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Nanopublications\nwould be a fourth solution.</p>\n\n<p>So, why another solution? Datasets, assuming DataCite approaches, have clear provenance, but the overhead\nof and time needed for creating a dataset with citation intent annotations can be limiting. And because nanopublications\ncan be linked to ORCID identifiers, we can even discover which citation intent annotations are created by the original\nauthors of articles. Another advantage is that nanopubs are basically RDF and we can query them easily, allowing\nthe citation intentions to migrate to Wikidata. Scholia already saw an update to recognize nanopublications as\na unique kind of reference (see the new Wikidata property <a href=\"https://www.wikidata.org/wiki/Property:P12545\">Nanopublication identifier (P12545)</a>).</p>\n\n<h1 id=\"nanodash-template\">NanoDash template</h1>\n\n<p>So, if we can make it easy for people to define nanopublications with CiTO citation intent annotations, then we can\nstart formalizing intent annotations from a much wider range of use cases. For example, we can annotate historically\nimportant discussions. Anyone can retrospectively annotate all their own articles, making them more FAIR. 
And if we\nuse DOI links, then it no longer is limited to journal articles, but we can use it for software and data citations too.\nThis is where <a href=\"https://w3id.org/np/RAX_4tWTyjFpO6nz63s14ucuejd64t2mK3IBlkwZ7jjLo\">a recent template</a> comes in, created by\n<a href=\"https://orcid.org/0000-0002-1267-0234\">Tobias Kuhn</a>, one of the main nanopub developers:</p>\n\n<p><img src=\"/assets/images/citoPub.png\" alt=\"\" /></p>\n\n<p>This nanopublication template defines the minimal needs of the assertion, along with useful provenance and nanopub\ninfo. Basically, the assertion defines that one DOI is a ScholarlyWork and, using CiTO, defines that it cites\none or more article works (with DOI). For each citation, one can select any of the known CiTO intent types,\ne.g. ‘extends’ or ‘uses method in’, as in <a href=\"https://w3id.org/np/RA6Rxk1sSOSWxM7A6gW4SjJZRVt4fbY6nShPTAbQ8kce8\">this nanopublication</a>\ncreated with this template:</p>\n\n<p><img src=\"/assets/images/citoPub2.png\" alt=\"\" /></p>\n\n<h2 id=\"sparql-ing-cito-annotations\">SPARQL-ing CiTO annotations</h2>\n\n<p>Besides the template, Tobias also started a SPARQL query to which I added restrictions that the citing and cited\nresources need to have a DOI, giving us <a 
href=\"https://query.knowledgepixels.com/tools/type/2c1cce3f3152738c1009d59251409392aaaa3b0324bcb5fdfb4b7b944b8f0c18/yasgui.html#query=prefix+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0Aprefix+np%3A+%3Chttp%3A%2F%2Fwww.nanopub.org%2Fnschema%23%3E%0Aprefix+npa%3A+%3Chttp%3A%2F%2Fpurl.org%2Fnanopub%2Fadmin%2F%3E%0Aprefix+npx%3A+%3Chttp%3A%2F%2Fpurl.org%2Fnanopub%2Fx%2F%3E%0Aprefix+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0Aprefix+dct%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0A%0Aselect+%3Fnp+%3Flabel+%3Fsubj+%3Fcitationrel+%3Fobj+%3Fdate+where+%7B%0A++graph+npa%3Agraph+%7B%0A++++%3Fnp+npa%3AhasValidSignatureForPublicKey+%3Fpubkey+.%0A++++%3Fnp+dct%3Acreated+%3Fdate+.%0A++++%3Fnp+np%3AhasAssertion+%3Fassertion+.%0A++++optional+%7B+%3Fnp+rdfs%3Alabel+%3Flabel+.+%7D%0A++++filter+not+exists+%7B+%3Fnpx+npx%3Ainvalidates+%3Fnp+%3B+npa%3AhasValidSignatureForPublicKey+%3Fpubkey+.+%7D%0A++++filter+not+exists+%7B+%3Fnp+npx%3AhasNanopubType+npx%3AExampleNanopub+.+%7D%0A++%7D%0A++graph+%3Fassertion+%7B%0A++++%3Fsubj+%3Fcitationrel+%3Fobj+.%0A++++filter(regex(str(%3Fcitationrel)%2C+%22%5Ehttp%3A%2F%2Fpurl.org%2Fspar%2Fcito%2F.*%24%22))%0A++++filter(regex(str(%3Fsubj)%2C+%22doi.org%2F10%22))%0A++++filter(regex(str(%3Fobj)%2C+%22doi.org%2F10%22))%0A++%7D%0A%7D%0A++&amp;contentTypeConstruct=text%2Fturtle&amp;contentTypeSelect=application%2Fsparql-results%2Bjson&amp;endpoint=%2Frepo%2Ftype%2F2c1cce3f3152738c1009d59251409392aaaa3b0324bcb5fdfb4b7b944b8f0c18&amp;requestMethod=POST&amp;tabTitle=Query&amp;headers=%7B%7D&amp;outputFormat=table\">this query</a>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.w3.org/2000/01/rdf-schema#&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span 
class=\"w\"> </span><span class=\"nn\">np</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.nanopub.org/nschema#&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">npa</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/nanopub/admin/&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">npx</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/nanopub/x/&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">xsd</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.w3.org/2001/XMLSchema#&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">dct</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/dc/terms/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">select</span><span class=\"w\"> </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nv\">?label</span><span class=\"w\"> </span><span class=\"nv\">?subj</span><span class=\"w\"> </span><span class=\"nv\">?citationrel</span><span class=\"w\"> </span><span class=\"nv\">?obj</span><span class=\"w\"> </span><span class=\"nv\">?date</span><span class=\"w\"> </span><span class=\"k\">where</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"k\">graph</span><span class=\"w\"> </span><span class=\"nn\">npa</span><span class=\"o\">:</span><span class=\"ss\">graph</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">npa</span><span class=\"o\">:</span><span 
class=\"ss\">hasValidSignatureForPublicKey</span><span class=\"w\"> </span><span class=\"nv\">?pubkey</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">dct</span><span class=\"o\">:</span><span class=\"ss\">created</span><span class=\"w\"> </span><span class=\"nv\">?date</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">np</span><span class=\"o\">:</span><span class=\"ss\">hasAssertion</span><span class=\"w\"> </span><span class=\"nv\">?assertion</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"k\">optional</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?label</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"w\"> </span><span class=\"k\">not</span><span class=\"w\"> </span><span class=\"k\">exists</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?npx</span><span class=\"w\"> </span><span class=\"nn\">npx</span><span class=\"o\">:</span><span class=\"ss\">invalidates</span><span class=\"w\"> </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"nn\">npa</span><span class=\"o\">:</span><span class=\"ss\">hasValidSignatureForPublicKey</span><span class=\"w\"> </span><span class=\"nv\">?pubkey</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span 
class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"w\"> </span><span class=\"k\">not</span><span class=\"w\"> </span><span class=\"k\">exists</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">npx</span><span class=\"o\">:</span><span class=\"ss\">hasNanopubType</span><span class=\"w\"> </span><span class=\"nn\">npx</span><span class=\"o\">:</span><span class=\"ss\">ExampleNanopub</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">graph</span><span class=\"w\"> </span><span class=\"nv\">?assertion</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nv\">?subj</span><span class=\"w\"> </span><span class=\"nv\">?citationrel</span><span class=\"w\"> </span><span class=\"nv\">?obj</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"p\">(</span><span class=\"nb\">regex</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?citationrel</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"^http://purl.org/spar/cito/.*$\"</span><span class=\"p\">))</span><span class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"p\">(</span><span class=\"nb\">regex</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?subj</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"doi.org/10\"</span><span class=\"p\">))</span><span class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"p\">(</span><span class=\"nb\">regex</span><span class=\"p\">(</span><span class=\"nb\">str</span><span 
class=\"p\">(</span><span class=\"nv\">?obj</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"doi.org/10\"</span><span class=\"p\">))</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>This includes 6 citation intentions defined by 4 nanopublications added during the Open Science Retreat:</p>\n\n<ul>\n  <li><a href=\"https://w3id.org/np/RAUjZE1JMu1GAvUQ_fZ4yc9-7sOSCT9xbeS0wYznkKtYk\">RAUjZE1JMu</a> by <a href=\"https://nanodash.knowledgepixels.com/explore?id=https%3A%2F%2Forcid.org%2F0000-0002-7192-1486\">me</a> for a paper by Marija Purgar</li>\n  <li><a href=\"https://nanodash.knowledgepixels.com/explore?id=RAXgI--5gcKskgrnOI1XZoA4b3hu9RbNj3bcc2Zxeos7c\">RAXgI–5gc</a> by <a href=\"https://nanodash.knowledgepixels.com/explore?id=https%3A%2F%2Forcid.org%2F0000-0003-2408-7588\">Christian Meesters</a></li>\n  <li><a href=\"https://nanodash.knowledgepixels.com/explore?id=RATZNhd3l_jN0y8GEi8mLIqy-uVV8tiUZIq2RJtkq6G8A\">RATZNhd3l_j</a> by <a href=\"https://nanodash.knowledgepixels.com/explore?id=https%3A%2F%2Forcid.org%2F0000-0003-4285-690X\">Taichi Oichi</a></li>\n  <li><a href=\"https://nanodash.knowledgepixels.com/explore?id=RA6Q6wxSYyWfA3XwpOBqSNFKgQpM7ZgdVBoU2kSD-CFjw\">RA6Q6wxSYy</a> by <a href=\"https://nanodash.knowledgepixels.com/explore?id=https%3A%2F%2Forcid.org%2F0000-0003-1559-1838\">Niklas Hohmann</a></li>\n</ul>\n\n<h1 id=\"from-nanopublications-to-wikidata\">From nanopublications to Wikidata</h1>\n\n<p>Now, this query also provides me with enough information to propagate the citation intent (a fact?) to Wikidata\nand cite the original nanopublication as reference. With a variation of the above SPARQL query, I can get the\nfive most recent new nanopublications, convert them to QuickStatements, and then enjoy them in Wikidata. 
This\nis written up in <a href=\"https://github.com/egonw/ons-wikidata/blob/main/Nanopubs/createQS.groovy\">this Bacting script</a>.</p>\n\n<p>The script needs to handle some situations. For example, it will not add items for DOIs not already in Wikidata.\nSo, if neither of the two DOIs is known in Wikidata, then nothing gets added. If they both are, then it will\nadd the citation intent. There are alternative solutions, but in practice that doesn’t matter: the QuickStatements\noutput is the same in all situations, and QuickStatements will only add the new information.</p>\n\n<p>This is what it will <a href=\"https://www.wikidata.org/wiki/Q113312162#P2860\">look like in Wikidata</a>:</p>\n\n<p><img src=\"/assets/images/citoPub3.png\" alt=\"\" /></p>\n\n<p>And this is <a href=\"https://scholia.toolforge.org/cito/#articles\">what it looks like</a> (yellow) when we compare the contributions\nfrom nanopublications now with the other sources:</p>\n\n<p><img src=\"/assets/images/citoPubs4.png\" alt=\"\" /></p>",
      "summary": "During the Open Science Retreat I organized a short session where we looking into typing citation intentions using a new nanopublication template. First, let’s describe nanopublications (originally used in doi:10.3233/ISU-2010-0613) a bit. Scholia gives a nice overview of (macro?)publications on the topic. The nanopub.net website describes that [a nanopublication is a small knowledge graph snippet with metadata that is treated as an independent (scientific) publication.]. The knowledge graph, it continues, can be anything from an opinion to the link between a disease and a gene (doi:10.1109/ESCIENCE.2018.00024).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/citoPub.png",
      "date_published": "2024-04-02T00:00:00+00:00",
      "date_modified": "2025-12-24T00:00:00+00:00",
      "tags": ["osr24nl","openscience","cito","nanopub","wikidata"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3233/ISU-2010-0613", "doi": "10.3233/ISU-2010-0613"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1109/ESCIENCE.2018.00024", "doi": "10.1109/ESCIENCE.2018.00024"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-023-00683-2", "doi": "10.1186/S13321-023-00683-2"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/znw1y-zfg25",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/03/31/open-science-retreat-1.html",
      "title": "Open Science Retreat #1: impressions",
      "content_html": "<p>Last week I attended the <a href=\"https://openscienceretreat.eu/\">Open Science Retreat</a> (<a href=\"https://hashtags-hub.toolforge.org/osr24nl\">#osr24nl</a>)\nin a quite and relaxing region in North-Holland. The meeting was how I like all meetings to be (and I count myself lucky many of my meetings\nare like this): open, welcoming, constructive, diverse, and intellectually challenging. Not all scientific meetings are like this\nand it is easy to end up going to obligatory meetings where the discussions are of a different level. Therefore, great thanks to\nthe organizers, but also to all participants, that showed not just to have a hearth for open science (getting pretty common),\nbut also a drive to advocate for open science. Finally, I like to thank the people that joined me in creating nanopublications for\nCiTO annotations (will blog about that later), and <a href=\"https://twitter.com/marija_purgar/status/1773745895508451573\">to Sadik and Marija</a>\nwith whom we worked on exploring using Wikibase for capturing knowledge about research waste in ecology (more about that later too).</p>",
      "summary": "Last week I attended the Open Science Retreat (#osr24nl) in a quite and relaxing region in North-Holland. The meeting was how I like all meetings to be (and I count myself lucky many of my meetings are like this): open, welcoming, constructive, diverse, and intellectually challenging. Not all scientific meetings are like this and it is easy to end up going to obligatory meetings where the discussions are of a different level. Therefore, great thanks to the organizers, but also to all participants, that showed not just to have a hearth for open science (getting pretty common), but also a drive to advocate for open science. Finally, I like to thank the people that joined me in creating nanopublications for CiTO annotations (will blog about that later), and to Sadik and Marija with whom we worked on exploring using Wikibase for capturing knowledge about research waste in ecology (more about that later too).",
      
      "date_published": "2024-03-31T00:00:00+00:00",
      "date_modified": "2024-03-31T00:00:00+00:00",
      "tags": ["osr24nl","openscience","wikibase","cito","nanopub"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zds99-03s42",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/03/17/two-papers.html",
      "title": "Reusing data: two new papers",
      "content_html": "<p>My research is about the interaction of (machine) representation and the impact on the success of\ndata analysis (matchine learning, chemometrics, AI, etc). See the posts\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">about</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/12/molecular-chemometrics-principles-2-be.html\">molecular</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/14/molecular-chemometrics-principles-3.html\">chemometrics</a>.\nThis got me into <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/fair\">FAIR</a>: making data interoperable\nand being able to (really) reuse data is the starting point of doing research.</p>\n\n<p>So, when I get the chance to see something where I worked on to make more FAIR actually being used,\nI love to push the boundaries of FAIR a bit extra. The study of representation of molecules and molecular\nsystems is not quite a popular science, but I find it important. Two new papers got recently published\nto which I contributed from this perspective.</p>\n\n<p>The first paper by Anna Niarakis <i>et al.</i> is about using the SARS-CoV-2/COVID-19 knowledge base we\nhave collected of the past 4 years (doi:<a href=\"https://doi.org/10.3389/fimmu.2023.1282859\">10.3389/fimmu.2023.1282859</a>).\nFor me, this started with a WikiPathways with early knowledge about the virus proteins. I think\nin this and earlier papers, we improved our open science and bioinformatics and are actually\nmore ready for a next pandemic, which inevitably will come.</p>\n\n<p>The second paper by Alfaro Serrano <i>et al.</i> is about how access to data remains key to many\nthings, and this, obviously, includes the Sustainable Development Goals (SDGs)\n(doi:<a href=\"https://doi.org/10.1039/D3SU00148B\">10.1039/D3SU00148B</a>). 
When it comes down\nto the face-off of FAIR versus Open, I think Open has more impact, hands-down.</p>\n\n<p>About the latter, I recently wrote up ten simple actions you can take to make your\nnanosafety research output more FAIR (doi:<a href=\"https://doi.org/10.5281/zenodo.10533126\">10.5281/zenodo.10533126</a>).</p>",
      "summary": "My research is about the interaction of (machine) representation and the impact on the success of data analysis (matchine learning, chemometrics, AI, etc). See the posts about molecular chemometrics. This got me into FAIR: making data interoperable and being able to (really) reuse data is the starting point of doing research.",
      
      "date_published": "2024-03-17T00:00:00+00:00",
      "date_modified": "2024-03-17T00:00:00+00:00",
      "tags": ["covid19","fair","nanosafety","nanocommons"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3389/FIMMU.2023.1282859", "doi": "10.3389/FIMMU.2023.1282859"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1039/D3SU00148B", "doi": "10.1039/D3SU00148B"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.10533126", "doi": "10.5281/ZENODO.10533126"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/57rv7-5m756",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/02/13/wikidata-subsetting.html",
      "title": "New paper: &quot;Wikidata subsetting: approaches, tools, and evaluation&quot;",
      "content_html": "<p>Just before the end of the year, the <em>Wikidata subsetting: approaches, tools, and evaluation</em> paper\nby Seyed Amir Hosseini Beghaeiraveri <em>et al.</em> got published (doi:<a href=\"https://doi.org/10.3233/SW-233491\">10.3233/SW-233491</a>).\nI am really excited our group (i.e.\n<a href=\"https://orcid.org/0000-0002-8399-8990\">Ammar</a> and <a href=\"https://orcid.org/0000-0001-8449-1318\">Denise</a>)\nhas been able to contribute to this. I think it also is a great example\nof the power of hackathons to bring together people.</p>\n\n<p>To me, subsetting of Wikidata (or any large knowledge graph) is important for a couple of reasons.\nFirst, there can be practical reasons. Scholia, for example, is computationally expensive, and the idea\nwe explore in the Alfred P. Sloan Foundation grant for Scholia (doi:<a href=\"https://doi.org/10.3897/rio.5.e35820\">10.3897/rio.5.e35820</a>)\nwas that a subset of Wikidata would make it more performant and potentially\nmore environmental-friendly.</p>\n\n<p>A second reason is more about the scientific process. When doing an analysis and when you want to make\nthe reasoning transparent, you want to share the analyzed data as part of the research output (basically, the “data”).\nFor example, the data may have undergone some curation, or you combined data from two or more different\nsources. And you will want to share this as part of the scientific process. Resharing a full dump\nof the larger knowledge base would not be practical for at least two reasons: duplication of huge data,\nand a lot of unrelated content makes it hard for peers to find the bits of interest to the study.</p>\n\n<p>Subsetting may be useful here. This paper evaluates a number of different subsetting approaches.\nMyself, I am particularly excited about the idea that we can take a shape expression (e.g. <a href=\"https://shex.io\">ShEx</a>)\nas input. 
I still love the idea that I take the SPARQL queries in my analyses, convert them into\nshapes automatically, and then get a subset that returns the exact same results as the queries would\non the full dataset.</p>",
      "summary": "Just before the end of the year, the Wikidata subsetting: approaches, tools, and evaluation paper by Seyed Amir Hosseini Beghaeiraveri et al. got published (doi:10.3233/SW-233491). I am really excited our group (i.e. Ammar and Denise) has been able to contribute to this. I think it also is a great example of the power of hackathons to bring together people.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wikidata_subsetting_features.png",
      "date_published": "2024-02-13T00:00:00+00:00",
      "date_modified": "2024-02-13T00:00:00+00:00",
      "tags": ["wikidata","scholia"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3233/SW-233491", "doi": "10.3233/SW-233491"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3897/RIO.5.E35820", "doi": "10.3897/RIO.5.E35820"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xcvg3-37491",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/01/07/phd-defences.html",
      "title": "PhD Defences: Andra Waagmeester and Marvin Martens",
      "content_html": "<p>2023 has been a long year in which a lot happened. Two EU projects ended (<a href=\"https://riskgone.eu/\">RiskGONE</a>\nand <a href=\"https://nanosolveit.eu/\">NanoSolveIT</a>; more about that in a\nlater post), our group leader <a href=\"https://scholia.toolforge.org/author/Q19845641\">Chris Evelo</a> will retire this year,\nthe <a href=\"https://elixir-europe.org/communities/toxicology\">ELIXIR Toxicology Community</a> started (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2023/06/11/community-activity-2-fairsharing.html\">this post</a>), the\n<a href=\"https://www.wikipathways.org/\">new WikiPathways website</a> launched (see <a href=\"/2023/11/11/wikipathways-nar.html\">this post</a>),\nand a lot, lot more.</p>\n\n<p>But this post is about the upcoming PhD defences of <a href=\"https://scholia.toolforge.org/author/Q19845625\">Andra Waagmeester</a>\nand <a href=\"https://scholia.toolforge.org/author/Q42369611\">Marvin Martens</a>:</p>\n\n<ul>\n  <li><a href=\"https://www.maastrichtuniversity.nl/events/phd-defence-andra-sachinder-waagmeester\">January 16, 16:00</a>: Andra Waagmeester\non “Biological Pathway Abstractions: From Two-Dimensional Drawings to Multidimensional Linked Data”</li>\n  <li><a href=\"https://www.maastrichtuniversity.nl/events/phd-defence-marvin-tlj-martens\">January 29, 16:00</a>: Marvin Martens\non “Adverse Outcome Pathways Coming to Life Exploring New Ways to Support Risk Assessments”</li>\n</ul>\n\n<p>Both meetings have a minisymposium in the morning, related to their thesis topics. I am very much looking forward\nto these meetings. It’s hard to summarize in a few words what they contributed to open science in general and to\nthe data sciences in biology. So, I would rather invite you to join the afternoon PhD defences. 
I think the PhD theses\nwill become freely available after the defence, but you can always check the literature lists on their\nabove-linked Scholia pages.</p>\n\n<p>Or ask them questions on Mastodon: <a href=\"https://social.edu.nl/@Marvin\">@Marvin</a> and <a href=\"https://genomic.social/@Andrawaag\">@Andrawaag</a>.</p>",
      "summary": "2023 has been a long year in which a lot happened. Two EU projects ended (RiskGONE and NanoSolveIT; more about that in a later post), our group leader Chris Evelo will retire this year, the ELIXIR Toxicology Community started (see this post), the new WikiPathways website launched (see this post), and a lot, lot more.",
      
      "date_published": "2024-01-07T00:00:00+00:00",
      "date_modified": "2024-01-07T00:00:00+00:00",
      "tags": ["bigcat"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8pkga-q4n03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/11/11/wikipathways-nar.html",
      "title": "New paper: &quot;WikiPathways 2024: next generation pathway database&quot;",
      "content_html": "<p>This week the next <a href=\"https://wikipathways.org/\">WikiPathways</a> <a href=\"https://academic.oup.com/nar/search-results?f_TocHeadingTitle=Database+Issue\">NAR Database</a>\nissue paper was published (doi:<a href=\"https://doi.org/10.1093/nar/gkad960\">10.1093/nar/gkad960</a>). It is the next\npaper in a series of papers about the evolution of the Open Science project for\nmaking biological pathways available in an Open and FAIR way. This year, it describes\nthe significant move away from <a href=\"https://en.wikipedia.org/wiki/MediaWiki\">MediaWiki</a>.\nIt simply was too costly to keep up with the upstream code base (think: more than 200\nthousand euro costly). This paper describes a transition to a modular system with\n<a href=\"https://en.wikipedia.org/wiki/Jekyll_(software)\">Jekyll</a> and Markdown as\nnew platform technologies. The full details are available as open notebook science:\neverything is basically a git repository.</p>\n\n<p>This is the workflow of what the new platform does when a new pathway (version) gets\nadded to WikiPathways:</p>\n\n<p><img src=\"/assets/images/wp-gpml-change-workflow.png\" alt=\"Workflow that is triggered by an added or changed GPML file, eventually triggering an update of the website.\" /></p>\n\n<p>The upgrade of the whole stack is, however, still in full swing. Not everything has\nmigrated yet; the RDF generation, for example, has not.</p>",
      "summary": "This week the next WikiPathways NAR Database issue paper was published (doi:10.1093/nar/gkad960). It is the next paper in a series of papers about the evolution of the Open Science project for making biological pathways available in an Open and FAIR way. This year, it describes the significant move away from MediaWiki. It simply was too costly to keep up with the upstream code base (think: more than 200 thousand euro costly). This paper describes a transition to a modular system with Jekyll and Markdown as new platform technologies. The full details are available as open notebook science: everything is basically a git repository.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wp-gpml-change-workflow.png",
      "date_published": "2023-11-11T00:00:00+00:00",
      "date_modified": "2023-11-11T00:00:00+00:00",
      "tags": ["wikipathways","git"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKAD960", "doi": "10.1093/NAR/GKAD960"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dtyms-yt012",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/09/24/ai.html",
      "title": "Artificial intelligence for natural product drug discovery",
      "content_html": "<p>Two weeks ago the write-up of a week of scientific discussions around artificial intelligence for natural product drug discovery\nin Leiden at the <a href=\"https://www.lorentzcenter.nl/\">Lorentz Center</a> got published\n(doi:<a href=\"https://doi.org/10.1038/s41573-023-00774-7\">10.1038/s41573-023-00774-7</a>, <a href=\"https://cris.maastrichtuniversity.nl/en/publications/artificial-intelligence-for-natural-product-drug-discovery\">free PDF</a>).</p>\n\n<p><img src=\"/assets/images/ai.png\" alt=\"Part of the copyrighted Figure 1 from the article. I hope this counts as fair use.\" /></p>\n\n<p>Sadly, the meeting was still during the (partial) lockdown, and I think my contribution could have been\nmore extensive. But I am happy I got to pitch the idea of using Wikidata in this area too, taking advantage\nof the work done by the LOTUS (doi:<a href=\"https://doi.org/10.7554/eLife.70780\">10.7554/eLife.70780</a>) team earlier.</p>\n\n<p>And this is key to me: you cannot do statistics, chemometrics, machine learning, or artificial\nintelligence without good-quality linked data. Happy reading!</p>",
      "summary": "Two weeks ago the write-up of a week of scientific discussions around artificial intelligence for natural product drug discovery in Leiden at the Lorentz Center got published (doi:10.1038/s41573-023-00774-7, free PDF).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ai.png",
      "date_published": "2023-09-24T00:00:00+00:00",
      "date_modified": "2024-03-18T00:00:00+00:00",
      "tags": ["cheminf","natprod"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/s41573-023-00774-7", "doi": "10.1038/s41573-023-00774-7"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.7554/eLife.70780", "doi": "10.7554/eLife.70780"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7zf38-w9670",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/09/17/using-fair-for-reuse.html",
      "title": "Using FAIR to select data for reuse",
      "content_html": "<p>This paper got published in July already, but I had not had the time yet to blog about this exciting work by\n<a href=\"https://scholia.toolforge.org/author/Q92131000\">Irini Furxhi</a> and <a href=\"https://scholia.toolforge.org/author/Q86442640\">Ammar Ammar</a>:\n<em>A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory\nquantitative structure activity relationships (QSAR) modeling targeting cellular viability</em>\n(doi:<a href=\"https://doi.org/10.1016/j.impact.2023.100475\">10.1016/j.impact.2023.100475</a>)</p>\n\n<p>The study has two sides to it: first, it looks into how far we are with <a href=\"https://en.wikipedia.org/wiki/Quantitative_structure%E2%80%93activity_relationship\">QSAR</a>\nin the field of nanosafety. We have limited data, but this paper got together 34 data sets, and in the model building\nmany different possible factors are explored. Now, as a scholar, I would really want to know which factors are\nreally important. We have been studying this for some time, e.g. in the past RRegrs paper\n(doi:<a href=\"https://doi.org/10.1186/S13321-015-0094-2\">10.1186/S13321-015-0094-2</a>). Basically, I think we still\ndon’t really understand the relation between the data characteristics and the modelling options. When is\ndata rich enough to move from classification to regression? How much experimental data do we need\nfor the model to capture a certain applicability domain sufficiently?</p>\n\n<p>Actually, I think the rise of deep learning approaches shows us a few things: more data actually does help.\nBut also, with enough data, the representation becomes less important for the overall pattern. There are\neven hints that deep learning needs a certain level of noise. Did anyone study that phenomenon yet?</p>\n\n<p>Now, the reader of this paper will not be disappointed. The design is complex and there are many small hints\nabout what worked and what did not. 
But this gets us to the other side of this story.</p>\n\n<p>The second side of this paper is the question whether the level of FAIR-ness helps this QSAR modelling.\nEarlier, Ammar studied the R1.3 aspects of nanosafety research. The R1.3 guiding principle expects that\n<a href=\"https://www.go-fair.org/fair-principles/r1-3-metadata-meet-domain-relevant-community-standards/\">(Meta)data meet domain-relevant community standards</a>.\nAmmar’s research (preprint doi:<a href=\"https://doi.org/10.26434/CHEMRXIV-2022-L8VK8-V2\">10.26434/CHEMRXIV-2022-L8VK8-V2</a>)\nshows we can link this to actual reuse, where QSAR is one of those use cases.\nIn their July paper, they show how we can integrate the use of the community standards\nin a reproducible way to support nanosafety research.</p>\n\n<p>The following screenshot from the article (Figure 2, CC-BY) shows the relation between R1.3 maturity\nindicators and QSAR variables:</p>\n\n<p><img src=\"/assets/images/qsar-maturity-indicators.jpg\" alt=\"\" /></p>\n\n<p>I think Furxhi and Ammar may actually have introduced a new community standard: this is how nanoQSAR\nresearch should be done from now on. Irini and Ammar, thanks for this great collaboration!</p>",
      "summary": "This paper got published in July already, but I had not had the time yet to blog about this exciting work by Irini Furxhi and Ammar Ammar: A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory quantitative structure activity relationships (QSAR) modeling targeting cellular viability (doi:10.1016/j.impact.2023.100475)",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/qsar-maturity-indicators.jpg",
      "date_published": "2023-09-17T00:00:00+00:00",
      "date_modified": "2023-09-17T00:00:00+00:00",
      "tags": ["fair","qsar"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/J.IMPACT.2023.100475", "doi": "10.1016/J.IMPACT.2023.100475"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-015-0094-2", "doi": "10.1186/S13321-015-0094-2"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.26434/CHEMRXIV-2022-L8VK8-V2", "doi": "10.26434/CHEMRXIV-2022-L8VK8-V2"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pn744-knt64",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/09/09/making-bridgedb-derby-files-with-groovy.html",
      "title": "Making BridgeDb Derby files with Groovy",
      "content_html": "<p>I just want to drop this here. There are various ways to make <a href=\"https://www.bridgedb.org/\">BridgeDb</a> identifier mapping files. Some of the tools\npredate my joining the BiGCaT research group and the BridgeDb project, but this Groovy page is basically what we\nhave been using to create the metabolite identifier mapping databases:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nd\">@Grab</span><span class=\"o\">(</span><span class=\"n\">group</span><span class=\"o\">=</span><span class=\"s1\">'org.bridgedb'</span><span class=\"o\">,</span> <span class=\"n\">module</span><span class=\"o\">=</span><span class=\"s1\">'org.bridgedb.bio'</span><span class=\"o\">,</span> <span class=\"n\">version</span><span class=\"o\">=</span><span class=\"s1\">'3.0.23'</span><span class=\"o\">)</span>\n<span class=\"nd\">@Grab</span><span class=\"o\">(</span><span class=\"n\">group</span><span class=\"o\">=</span><span class=\"s1\">'org.bridgedb'</span><span class=\"o\">,</span> <span class=\"n\">module</span><span class=\"o\">=</span><span class=\"s1\">'org.bridgedb.rdb.construct'</span><span class=\"o\">,</span> <span class=\"n\">version</span><span class=\"o\">=</span><span class=\"s1\">'3.0.23'</span><span class=\"o\">)</span>\n\n<span class=\"kn\">import</span> <span class=\"nn\">java.text.SimpleDateFormat</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">java.util.Date</span><span class=\"o\">;</span>\n\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.IDMapperException</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.DataSource</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.Xref</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.bio.DataSourceTxt</span><span 
class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.rdb.construct.DBConnector</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.rdb.construct.DataDerby</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.rdb.construct.GdbConstruct</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.rdb.construct.GdbConstructImpl4</span><span class=\"o\">;</span>\n\n<span class=\"n\">DataSourceTxt</span><span class=\"o\">.</span><span class=\"na\">init</span><span class=\"o\">()</span>\n\n<span class=\"n\">GdbConstruct</span> <span class=\"n\">database</span> <span class=\"o\">=</span> <span class=\"n\">GdbConstructImpl4</span><span class=\"o\">.</span><span class=\"na\">createInstance</span><span class=\"o\">(</span>\n  <span class=\"s2\">\"test\"</span><span class=\"o\">,</span> <span class=\"k\">new</span> <span class=\"n\">DataDerby</span><span class=\"o\">(),</span> <span class=\"n\">DBConnector</span><span class=\"o\">.</span><span class=\"na\">PROP_RECREATE</span>\n<span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">createGdbTables</span><span class=\"o\">();</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">preInsert</span><span class=\"o\">();</span>\n\n<span class=\"n\">inchikeyDS</span> <span class=\"o\">=</span> <span class=\"n\">DataSource</span><span class=\"o\">.</span><span class=\"na\">getExistingBySystemCode</span><span class=\"o\">(</span><span class=\"s2\">\"Ik\"</span><span class=\"o\">)</span>\n<span class=\"n\">lmDS</span> <span class=\"o\">=</span> <span class=\"n\">DataSource</span><span class=\"o\">.</span><span class=\"na\">getExistingBySystemCode</span><span class=\"o\">(</span><span class=\"s2\">\"Lm\"</span><span class=\"o\">)</span>\n<span class=\"n\">swisslipidsDS</span> <span 
class=\"o\">=</span> <span class=\"n\">DataSource</span><span class=\"o\">.</span><span class=\"na\">getExistingBySystemCode</span><span class=\"o\">(</span><span class=\"s2\">\"Sl\"</span><span class=\"o\">)</span>\n\n<span class=\"n\">String</span> <span class=\"n\">dateStr</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">SimpleDateFormat</span><span class=\"o\">(</span><span class=\"s2\">\"yyyyMMdd\"</span><span class=\"o\">).</span><span class=\"na\">format</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"n\">Date</span><span class=\"o\">());</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"BUILDDATE\"</span><span class=\"o\">,</span> <span class=\"n\">dateStr</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"DATASOURCENAME\"</span><span class=\"o\">,</span> <span class=\"s2\">\"LIPIDMAPS_SWISSLIPIDS\"</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"DATASOURCEVERSION\"</span><span class=\"o\">,</span> <span class=\"s2\">\"LIPID_TEST\"</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"DATATYPE\"</span><span class=\"o\">,</span> <span class=\"s2\">\"Metabolite\"</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"SERIES\"</span><span class=\"o\">,</span> <span class=\"s2\">\"standard_metabolite\"</span><span class=\"o\">);</span>\n\n<span class=\"n\">ref1</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span 
class=\"n\">Xref</span><span class=\"o\">(</span><span class=\"s2\">\"YECLLIMZHNYFCK-RRNJGNTNSA-J\"</span><span class=\"o\">,</span> <span class=\"n\">inchikeyDS</span><span class=\"o\">,</span> <span class=\"kc\">true</span><span class=\"o\">);</span>\n<span class=\"n\">ref2</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">Xref</span><span class=\"o\">(</span><span class=\"s2\">\"LMFA07050035\"</span><span class=\"o\">,</span> <span class=\"n\">lmDS</span><span class=\"o\">,</span> <span class=\"kc\">false</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addGene</span><span class=\"o\">(</span><span class=\"n\">ref1</span><span class=\"o\">)</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addGene</span><span class=\"o\">(</span><span class=\"n\">ref2</span><span class=\"o\">)</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addLink</span><span class=\"o\">(</span><span class=\"n\">ref1</span><span class=\"o\">,</span> <span class=\"n\">ref1</span><span class=\"o\">)</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addLink</span><span class=\"o\">(</span><span class=\"n\">ref1</span><span class=\"o\">,</span> <span class=\"n\">ref2</span><span class=\"o\">)</span>\n\n<span class=\"n\">ref3</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">Xref</span><span class=\"o\">(</span><span class=\"s2\">\"SLM:000000493\"</span><span class=\"o\">,</span> <span class=\"n\">swisslipidsDS</span><span class=\"o\">,</span> <span class=\"kc\">true</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addGene</span><span class=\"o\">(</span><span class=\"n\">ref3</span><span class=\"o\">)</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span 
class=\"na\">addLink</span><span class=\"o\">(</span><span class=\"n\">ref1</span><span class=\"o\">,</span> <span class=\"n\">ref3</span><span class=\"o\">)</span>\n\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">commit</span><span class=\"o\">();</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">finalize</span><span class=\"o\">();</span>\n</code></pre></div></div>\n\n<p>For the people who have worked with BridgeDb Java in the past, note the new SQL schema 4, as used by the\n<code class=\"language-plaintext highlighter-rouge\">GdbConstructImpl4</code>. This schema allows indicating whether an identifier is outdated/retired/etc. This is\nactually the case for the <code class=\"language-plaintext highlighter-rouge\">LMFA07050035</code> identifier, hence the <code class=\"language-plaintext highlighter-rouge\">false</code> parameter in the <code class=\"language-plaintext highlighter-rouge\">new Xref()</code>\ncall.</p>",
      "summary": "I just want to drop this here. There are various ways to make BridgeDb identifier mapping files. Some of the tools predate my joining the BiGCaT research group and the BridgeDb project, but this Groovy page is basically what we have been using to create the metabolite identifier mapping databases:",
      
      "date_published": "2023-09-09T00:00:00+00:00",
      "date_modified": "2023-09-09T00:00:00+00:00",
      "tags": ["groovy","bridgedb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zqtdm-66432",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/09/09/ACSFall2023.html",
      "title": "American Chemical Society Fall 2023 meeting",
      "content_html": "<p>About four weeks ago the <a href=\"https://www.acs.org/meetings/acs-meetings/fall-2023.html\">Fall 2023 American Chemical Society</a>\nmeeting (<a href=\"https://mastodon.social/tags/ACSFall2023\">#ACSFall2023</a>) took place.\nI have attended a few ACS meetings in person and even organized a <a href=\"https://egonw.github.io/acsrdf2010/\">symposium at the 2010 ACS meeting</a>\nin Boston. This time too, I did not participate in person, though visiting San Francisco again would have been nice. I gave\n<a href=\"https://mastodon.social/@egonw@social.edu.nl/110882509829434765\">two</a> <a href=\"https://mastodon.social/@egonw@social.edu.nl/110883271752255923\">presentations</a>\n(slides doi:<a href=\"https://doi.org/10.5281/zenodo.8255394\">10.5281/zenodo.8255394</a>), but have not uploaded my slides of the first presentation to Zenodo yet.</p>\n\n<p>The theme of the meeting was data, and this resulted in a wealth of presentations with cheminformatics. What is striking\nhere is that a lot of work has not changed so much in 20 years, except for the scale. What I missed here was the large open\ndata sets, but generally the level of open science was heartwarming! So many preprint mentions, GitHub repositories, and Zenodo\ndeposits. 
The Blue Obelisk was truly ahead of its time, but it is a delight to see the field of chemistry catch up.\nI could now say a lot about peer review, and why the field is not benefitting from all the experience that exists\nbecause people publish in the wrong journals, but that is for another time.</p>\n\n<p>I attended multiple sessions, which is a bit of a challenge, doing this remotely from Central European Summer Time (CEST).\nOf course, the Sunday started with the <a href=\"https://acs.digitellinc.com/sessions/574129/view\">Chemical informatics (R)evolution: Towards Democratization and Open Science</a>\nsession, where I had my first talk, and later that day the <a href=\"https://acs.digitellinc.com/sessions/573932/view\">Enhance your Data - Smart Ways to Metadata and Knowledge Graphs</a> session,\nwhere I gave a second talk, about <a href=\"https://bioschemas.org/\">Bioschemas</a>’ <code class=\"language-plaintext highlighter-rouge\">ChemicalSubstance</code> and <code class=\"language-plaintext highlighter-rouge\">MolecularEntity</code>. Sadly, I\nhad to leave that meeting early because it was getting too late.</p>\n\n<p>There were so many interesting sessions, I could not attend everything. I also have to go back to all\n<a href=\"https://mastodon.social/tags/ACSFall2023\">my notes</a> and isolate things I want to follow up on, prominently open datasets.</p>\n\n<p>More later.</p>",
      "summary": "About four weeks ago the Fall 2023 American Chemical Society meeting (#ACSFall2023) took place. I have attended a few ACS meetings in person and even organized a symposium at the 2010 ACS meeting in Boston. This time too, I did not participate in person, though visiting San Francisco again would have been nice. I gave two presentations (slides doi:10.5281/zenodo.8255394), but have not uploaded my slides of the first presentation to Zenodo yet.",
      
      "date_published": "2023-09-09T00:00:00+00:00",
      "date_modified": "2023-09-09T00:00:00+00:00",
      "tags": ["acs","scholia","wikidata","acsfall2023","conference"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.8255394", "doi": "10.5281/zenodo.8255394"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/65nqr-3w351",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/08/18/last-post-here-freebie-model-online.html",
      "title": "Last post there / the Freebie model online",
      "content_html": "<p>This is <a href=\"https://chem-bla-ics.blogspot.com/2023/08/last-post-here-freebie-model-online.html\">my last post</a> on blogger.com. At least, that is the plan. It has been a great 18 years. I would like to thank the owners of\nblogger.com, and later Google, for providing this service. I am continuing chem-bla-ics on a new domain:\n<a href=\"https://chem-bla-ics.linkedchemistry.info/\">https://chem-bla-ics.linkedchemistry.info/</a></p>\n\n<p>I, like so many others, struggle with choosing open infrastructure versus the freebie model. Of course, we know these things come\nand go. Google Reader, FriendFeed, Twitter/X (see doi:<a href=\"https://doi.org/10.1038/d41586-023-02554-0\">10.1038/d41586-023-02554-0</a>).\nMy new blog is still using the freebie model: I am hosting it on GitHub. But following the advice from a fellow cheminformatician,\nI now front this with a domain name I own.</p>\n\n<p>See you at <code class=\"language-plaintext highlighter-rouge\">linkedchemistry.info</code>!</p>",
      "summary": "This is my last post on blogger.com. At least, that is the plan. It has been a great 18 years. I would like to thank the owners of blogger.com, and later Google, for providing this service. I am continuing chem-bla-ics on a new domain: https://chem-bla-ics.linkedchemistry.info/",
      
      "date_published": "2023-08-18T00:00:00+00:00",
      "date_modified": "2023-08-18T00:00:00+00:00",
      
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/d41586-023-02554-0", "doi": "10.1038/d41586-023-02554-0"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xr6k8-z4480",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/08/12/boiling-points-in-wikidata.html",
      "title": "Boiling points in Wikidata",
      "content_html": "<p>Some days ago, I started adding boiling points to <a href=\"https://wikidata.org/\">Wikidata</a>, referenced from\n<a href=\"https://scholia.toolforge.org/work/Q22236188\">Basic Laboratory and Industrial Chemicals</a> (wikidata:Q22236188),\n<a href=\"https://scholia.toolforge.org/author/Q18609741\">David R. Lide</a>’s\n‘a CRC quick reference handbook’ from 1993 (well, the edition I have). But Wikidata\n<a href=\"https://www.wikidata.org/wiki/User_talk:Egon_Willighagen#Basic_laboratory_and_industrial_chemicals:_a_CRC_quick_reference_handbook_(Q22236188)\">wants</a>\nthe pressure (wikidata:P2077) at which the boiling point (wikidata:P2102) was measured. Rightfully so. But I had not added those yet,\nbecause it slows me down and can be automated with <a href=\"https://quickstatements.toolforge.org/\">QuickStatements</a>.</p>\n\n<p>I just need a few SPARQL queries to list the statements to which the qualifier needs to be added. Basically, all boiling points that have the\nbook as a reference and do not have the pressure info. 
First, there are values with ‘unknown value’, which results in blank nodes\n(by the time you read this, they likely are already fixed):</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nv\">?bp</span><span class=\"w\"> </span><span class=\"nv\">?pressure</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nn\">p</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">prov</span><span class=\"o\">:</span><span class=\"ss\">wasDerivedFrom</span><span class=\"o\">/</span><span class=\"nn\">pr</span><span class=\"o\">:</span><span class=\"ss\">P248</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q22236188</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">ps</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?bp</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">pq</span><span class=\"o\">:</span><span class=\"ss\">P2077</span><span class=\"w\"> </span><span class=\"nv\">?pressure</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">contains</span><span 
class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?pressure</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"http://\"</span><span class=\"p\">))</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>So, to get the list of boiling points that do not have any P2077 qualifier yet, and for which I want to write the QuickStatements, I use\n<a href=\"https://query.wikidata.org/#SELECT%20%3Fcmp%20WHERE%20%7B%0A%20%20%3Fcmp%20p%3AP2102%20%3FbpStatement%20.%0A%20%20%3FbpStatement%20prov%3AwasDerivedFrom%2Fpr%3AP248%20wd%3AQ22236188%20%3B%0A%20%20%20%20ps%3AP2102%20%3Fbp%20.%0A%20%20MINUS%20%7B%20%3FbpStatement%20pq%3AP2077%20%3Fpressure%20%7D%0A%7D\">this query</a>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nn\">p</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">prov</span><span class=\"o\">:</span><span class=\"ss\">wasDerivedFrom</span><span class=\"o\">/</span><span class=\"nn\">pr</span><span class=\"o\">:</span><span class=\"ss\">P248</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q22236188</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">ps</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span 
class=\"nv\">?bp</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">MINUS</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">pq</span><span class=\"o\">:</span><span class=\"ss\">P2077</span><span class=\"w\"> </span><span class=\"nv\">?pressure</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>At the time of writing, this lists 54 boiling points.</p>\n\n<p>I can have the WDQS create CSV-styled QuickStatements with:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">SUBSTR</span><span class=\"p\">(</span><span class=\"nb\">STR</span><span class=\"p\">(</span><span class=\"nv\">?cmp</span><span class=\"p\">),</span><span class=\"mi\">32</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?qid</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"nv\">?P2102</span><span class=\"w\"> </span><span class=\"nv\">?qal2077</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nn\">p</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">prov</span><span class=\"o\">:</span><span class=\"ss\">wasDerivedFrom</span><span class=\"o\">/</span><span 
class=\"nn\">pr</span><span class=\"o\">:</span><span class=\"ss\">P248</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q22236188</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">ps</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?P2102</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">MINUS</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">pq</span><span class=\"o\">:</span><span class=\"ss\">P2077</span><span class=\"w\"> </span><span class=\"nv\">?pressure</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">BIND</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"s2\">\"101.325U21064807\"</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?qal2077</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Here, the SPARQL variables double as QuickStatement instructions. Finally, note the use of “U21064807”, which refers to the Wikidata item for\nkilopascal (wikidata:Q21064807).</p>\n\n<p>I also need to “add” the boiling point again, to make sure QuickStatements knows which statement to add the qualifier to. I think this\ncan be done better, but I am not sure how to target statements directly. This is not foolproof: I noticed that this approach ignores the\nsituation where there are two statements with the (exact) same boiling point, but different error margins. But I will monitor that\nand correct it manually where needed.</p>",
      "summary": "Some days ago, I started added boiling points to Wikidata, referenced from Basic Laboratory and Industrial Chemicals (wikidata:Q22236188), David R. Lide’s ‘a CRC quick reference handbook’ from 1993 (well, the edition I have). But Wikidata wants pressure (wikidata:P2077) info at which the boiling point (wikidata:P2102) was measured. Rightfully so. But I had not added those yet, because it slows me and can be automated with QuickStatements.",
      
      "date_published": "2023-08-12T00:00:00+00:00",
      "date_modified": "2023-08-12T00:00:00+00:00",
      "tags": ["rdf","wikidata","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kxar2-7z367",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/08/08/history-provenance-detail.html",
      "title": "History, provenance, detail",
      "content_html": "<p>Just a quick note: I just love the level of detail <a href=\"https://www.wikidata.org/\">Wikidata</a> allows us to use. One of the marvels is the\npractices of <code class=\"language-plaintext highlighter-rouge\">named as</code>, which can be used in statements for subject and objects. The notion and importance here is that things are\nreferred to in different ways, and these properties allows us to link the interpretation with the source. For example,\n<a href=\"https://scholia.toolforge.org/author/Q58978\">Max Born</a>’s seminal work <em><a href=\"https://scholia.toolforge.org/work/Q55867811\">Zur Quantenmechanik</a></em>\n(doi:<a href=\"https://doi.org/10.1007/BF01328531\">10.1007/BF01328531</a>) uses a very short notation to cite other literature, as footnotes,\nand DOIs did not exist yet.</p>\n\n<p><img src=\"/assets/images/old_references.png\" alt=\"Screenshot of two references as footnotes on a page with a mathematical formula from the old Born paper from 1925.\" /></p>\n\n<p>So, in Wikidata, you can <a href=\"https://www.wikidata.org/wiki/Q55867811#P2860\">capture this like this</a>:</p>\n\n<p><img src=\"/assets/images/new_old_references.png\" alt=\"Screenshot of the FAIR references from the 1925 Born paper.\" /></p>",
      "summary": "Just a quick note: I just love the level of detail Wikidata allows us to use. One of the marvels is the practices of named as, which can be used in statements for subject and objects. The notion and importance here is that things are referred to in different ways, and these properties allows us to link the interpretation with the source. For example, Max Born’s seminal work Zur Quantenmechanik (doi:10.1007/BF01328531) uses a very short notation to cite other literature, as footnotes, and DOIs did not exist yet.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/old_references.png",
      "date_published": "2023-08-08T00:00:00+00:00",
      "date_modified": "2023-08-08T00:00:00+00:00",
      "tags": ["wikidata","publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1007/BF01328531", "doi": "10.1007/BF01328531"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nq3e0-09a54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/08/04/blog-planets-blogging-about-debian.html",
      "title": "Blog planets: blogging about Debian, GNOME, Wikimedia, FSFE, and many more",
      "content_html": "<p>I am still an avid user of <a href=\"https://en.wikipedia.org/wiki/Category:Web_syndication_formats\">RSS/Atom feeds</a>. I use\n<a href=\"https://feedly.com/\">Feedly</a> daily, partly because of their easy to use app. My blog is part of\n<a href=\"https://planetrdf.com/\">Planet RDF</a>, a <em>blog planet</em>. Blog planets aggregate blogs from many people around a certain topic.\nIt’s like a forum, but open, free, community driven. It’s exactly what the web should be.</p>\n\n<p>It turned out that planets do still exist, so I started a small corner on Wikidata: <a href=\"https://www.wikidata.org/wiki/Q121134938\">Q121134938</a>,\nand a number of <a href=\"https://www.wikidata.org/wiki/Special:WhatLinksHere/Q121134938\">existing blog planets</a>:</p>\n\n<p><img src=\"/assets/images/blog_planets.png\" alt=\"Screenshot of the 'What links here' page for the Wikidata item 'blog planet'.\" /></p>\n\n<p>The software used to run these planets is ancient, though. We need a new generation of software, replacing things like\n<a href=\"https://en.wikipedia.org/wiki/Planet_(software)\">Planet</a>. And I want something people can easily host on GitHub or GitLab Pages or the likes.</p>\n\n<p>I created a minimal shape expression but the Wikidata items for the planets still lack a lot of information that can be added. First,\nwe can think of them as venues, perhaps, where people “publish” their work. Second, we can annotate the blog planets with ‘main subject’\nfor the topics the cover. Or we can list the people that are “author” on the planet; most planets are very transparent about which\nblogs they aggregate.</p>\n\n<p>Love to see where this is going. Who knows? Maybe we will see Postgenomic (see doi:<a href=\"https://doi.org/10.1186/1471-2105-8-487\">10.1186/1471-2105-8-487</a>) and\n<a href=\"https://chem-bla-ics.blogspot.com/search?q=%22chemical+blogspace%22\">Chemical blogspace</a> resurface :)</p>",
      "summary": "I am still an avid user of RSS/Atom feeds. I use Feedly daily, partly because of their easy to use app. My blog is part of Planet RDF, a blog planet. Blog planets aggregate blogs from many people around a certain topic. It’s like a forum, but open, free, community driven. It’s exactly what the web should be.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/blog_planets.png",
      "date_published": "2023-08-04T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["rss","wikidata","cb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-487", "doi": "10.1186/1471-2105-8-487"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nfqxs-qs982",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html",
      "title": "Archiving and updating my blog",
      "content_html": "<p>This blog is <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/15/chem-bla-ics.html\">almost 18 years old</a> now. I have long wanted\nto migrate it to a version control system and at the same time have more control over things. Markdown would be awesome.\nIn the past year, I learned a lot about the power of <a href=\"https://github.com/jekyll/minima\">Jekyll</a> and needed to get more\nexperienced with it to use it for more databases, like we now do for <a href=\"https://wikipathways.org/\">WikiPathways</a>.</p>\n\n<p>So, time to <a href=\"https://egonw.github.io/blog/\">migrate</a> this blog :) This is probably a multiyear project, so feel free to continue\nreading it hear. Why? Because I start with the old posts :) Along the way, I am fixing things, improving it. I still\nhave plenty on my todo list, but already happy with having learned <a href=\"https://fontawesome.com/\">Font Awesome</a>, which makes\nit easy to annotate with how I fixed broken links (or not). I now use three icons: a box for when I use the\nInternet Archive (they can use your donation); a ‘recycle’ icon when I found a new URL for the same page; and a\nbroken URL link for other situations.</p>\n\n<p>This is what it looks like:</p>\n\n<p><img src=\"/assets/images/new_blog.png\" alt=\"Screenshot of the landing page of the new blog platform.\" /></p>",
      "summary": "This blog is almost 18 years old now. I have long wanted to migrate it to a version control system and at the same time have more control over things. Markdown would be awesome. In the past year, I learned a lot about the power of Jekyll and needed to get more experienced with it to use it for more databases, like we now do for WikiPathways.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/new_blog.png",
      "date_published": "2023-07-27T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["markdown","wikipathways"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/t1zwp-21x05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/07/universities-and-open-infrastructures.html",
      "title": "Universities and open infrastructures",
      "content_html": "<p>The role of a university is manifold. Being a place where people can find knowledge and the track record how that knowledge was reached is\noften seen as part of that. Over the past decades universities outsources this role, for example to publishers. This is seeing a lot of\ndiscussion and I am happy to see that the <a href=\"https://www.universiteitenvannederland.nl/\">Dutch Universities</a> are\n<a href=\"/2023/07/06/journal-rankings.html\">taking back control</a> <a href=\"https://www.openaire.eu/next-narcis-dutch-research-portal-on-openaire\">fast now</a>.\nFor example, <a href=\"https://mastodon.social/@Radboud_uni\">Radboud University</a> (&gt;1k followers) already joined the Fediverse (Mastodon etc), making\nthem independent from non-EU law and commercial interests. Scientific journals, Nobel Prize winners, etc\n<a href=\"2022-11-21-finding-mastodon-accounts-with-wikidata.markdown\">already joined too  <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, btw.</p>\n\n<p><a href=\"https://netzpolitik.org/2023/a-call-to-action-universities-of-the-world-into-the-fediverse/\">This effort</a> is calling for more universities\nto go into the direction of open infrastructures. I am looking forward to seeing all Dutch Universities post news on Mastodon, post videos\non PeerTube, etc.</p>\n\n<p>Would it not be awesome if the Fediverse would become the new multidimensional knowledge dissemination and peer review system we have all\nbeen waiting for?</p>\n\n<p><strong>Update</strong>: universities with a Mastodon listed in Wikidata on the world map: <a href=\"https://w.wiki/6zR3\">https://w.wiki/6zR3</a></p>",
      "summary": "The role of a university is manifold. Being a place where people can find knowledge and the track record how that knowledge was reached is often seen as part of that. Over the past decades universities outsources this role, for example to publishers. This is seeing a lot of discussion and I am happy to see that the Dutch Universities are taking back control fast now. For example, Radboud University (&gt;1k followers) already joined the Fediverse (Mastodon etc), making them independent from non-EU law and commercial interests. Scientific journals, Nobel Prize winners, etc already joined too , btw.",
      
      "date_published": "2023-07-07T00:00:00+00:00",
      "date_modified": "2024-01-07T00:00:00+00:00",
      "tags": ["openscience","mastodon"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6n69s-syc29",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/06/journal-rankings.html",
      "title": "Journal Rankings",
      "content_html": "<p>I am pleased to learn that the <a href=\"https://www.universiteitenvannederland.nl/nl_NL/nieuws-detail/nieuwsbericht/915-p-nederlandse-universiteiten-gaan-voortaan-anders-om-met-rankings-p.html\">Dutch Universities start looking at rankings of a more scientific way</a>.\nIt is long overdue that we take scientific peer review of the indicators used in those rankings seriously, instead of hiding beyond\n<a href=\"https://en.wikipedia.org/wiki/Fear,_uncertainty,_and_doubt\">fud</a> around the decline of quality of research.</p>\n\n<p>So, what defines the quality of a journal? Or better, of any scholarly dissemination channel? After all, some databases do better peer review\nthan some journals. Sadly, I am not aware of literature that compares the quality of peer review in databases with that in scientific journals.\nAlso long overdue, in my opinion.</p>\n\n<p>I hope the <a href=\"https://osc-international.com/\">Open Science community</a> will help shape these scholarly dissemination channels, journals included.\nSome ideas, the outlet:</p>\n\n<ul>\n  <li>encourages post-publication peer review</li>\n  <li>communicates the post-publication peer review</li>\n  <li>allows updating easily small fixes and clarifications (no hiding behind the version-of-record)</li>\n  <li>ensures supp info / additional files undergo the same level of peer review</li>\n  <li>use modern solutions for communication (like semantic web technologies)</li>\n  <li>have clear licenses for all aspects of the <a href=\"/2023/07/02/qeios-open-dissemination-platform-for.html\">research output</a></li>\n  <li>actively fight against visual representation only, but provides all data</li>\n  <li>guarantees that supp info / additional files are archived, as the output itself</li>\n  <li>adopts, promotes, requires community standards (including global, unique identifiers)</li>\n</ul>\n\n<p>Okay, these items are pretty broad. 
Many of them are part of FAIR, but that should not surprise you, because <a href=\"https://doi.org/10.1162/dint_r_00024\">FAIR</a>\nis just applying traditional scholarly approaches, like properly keeping notebooks. It’s just a bit more “digital” than we have been taught.</p>\n\n<p>Do we know how to do this? Yes, pretty much. This is not a technical exercise, but one of social change and particularly willingness. Basically, if you want\nto keep the current way of doing things, then you declare that you want unreproducible, low quality research reporting. That’s your academic freedom, of course.\nIf I were a funder or a university, I would also expect a bit more in return for my money.</p>\n\n<p>Let me stress, glossy articles are fine! You do not have to stop that. Media appearances, keynotes, these are all also fine. They are, however,\ncomplementary. We should not continue the habit of fancy narratives as replacement for quality research dissemination. Do both, if you must.</p>",
      "summary": "I am pleased to learn that the Dutch Universities start looking at rankings of a more scientific way. It is long overdue that we take scientific peer review of the indicators used in those rankings seriously, instead of hiding beyond fud around the decline of quality of research.",
      
      "date_published": "2023-07-06T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1162/dint_r_00024", "doi": "10.1162/dint_r_00024"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nq7bp-cqj08",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/02/qeios-open-dissemination-platform-for.html",
      "title": "Qeios, an open dissemination platform for research output",
      "content_html": "<p>A bit over a year ago I got introduced to <a href=\"https://www.qeios.com/\">Qeios</a> when I was asked to review an article by Michie, West, and Hasting:\n<em>“Creating ontological definitions for use in science”</em> (doi:<a href=\"https://doi.org/10.32388/YGIF9B.2\">10.32388/YGIF9B.2</a>). I wrote up my thoughts after\nreading the paper, and the review was posted openly online and got a <a href=\"https://doi.org/10.32388/7MQYM4\">DOI</a>. Not the first platform to do this\n(think F1000), but it is always nice to see some publishers taking publishing seriously. Since then, I reviewed\n<a href=\"https://www.qeios.com/read/ZJ4QDA\">two</a> <a href=\"https://www.qeios.com/read/YCHHA7\">more</a> papers.</p>\n\n<p>One of these latter two was not a more traditional paper, but a different kind of <strong>research output</strong>: a definition, about “<em>Drive-by Curation</em>”\n(doi:<a href=\"https://doi.org/10.32388/KBX9VO\">10.32388/KBX9VO</a>). Now about this output type, collaboratively working on definitions is something core to\nontology development (e.g. see doi:<a href=\"https://doi.org/10.1186/s13326-015-0005-5\">10.1186/s13326-015-0005-5</a>), but there is a clear need to discuss\nterminology. The <a href=\"https://www.h2020gracious.eu/\">GRACIOUS</a> project in the <a href=\"https://www.nanosafetycluster.eu/\">EU NanoSafety Cluster</a> also recognized\nthis and set up a tool for this, their <a href=\"https://terminology-harmonizer.greendecision.eu/\">Terminology Harmonizer</a>\n(doi:<a href=\"https://doi.org/10.1016/j.impact.2021.100366\">10.1016/j.impact.2021.100366</a>).</p>\n\n<p>This GRACIOUS tool, much more than what Qeios does, helps users. 
Unfortunately, and this is where these topics nicely come together, writing definitions,\nthinking about when some zeta potential is different from another zeta potential, and (drive-by) community curation all need transparency.\nI understand it, but landing on a login page is for me a recipe for a silent death, as it prevents people from learning without making a (time)\ninvestment first. That is what Qeios does differently: it is more FAIR.</p>\n\n<p>So, that brings me to my last point in this post. Jente Houweling and I wrote up a definition for “<em>Research Output Management</em>”\n(doi:<a href=\"https://doi.org/10.32388/ZNWI7T\">10.32388/ZNWI7T</a>), based on our discussions about her research insights. See the screenshot below.</p>\n\n<p>It has been reviewed internally, and by one independent peer (doi:<a href=\"https://doi.org/10.32388/C3SJTN\">10.32388/C3SJTN</a>). But we would love to hear\nyour review too. Just follow the instructions online. We are looking forward to reading your thoughts and to refining our definition.</p>\n\n<p><img src=\"/assets/images/qeios_romp.png\" alt=\"Screenshot of the Qeios page for the Research Output Management paper.\" /></p>",
      "summary": "A bit over a year ago I got introduced to Qeios when I was asked to review an article by Michie, West, and Hasting: “Creating ontological definitions for use in science” (doi:10.32388/YGIF9B.2). I wrote up my thoughts after reading the paper, and the review was posted openly online and got a DOI. Not the first platform to do this (think F1000), but it is always nice to see some publishers taking publishing seriously. Since then, I reviewed two more papers.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/qeios_romp.png",
      "date_published": "2023-07-02T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["rom","publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.32388/YGIF9B.2", "doi": "10.32388/YGIF9B.2"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.32388/7MQYM4", "doi": "10.32388/7MQYM4"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.32388/ZJ4QDA", "doi": "10.32388/ZJ4QDA"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.32388/YCHHA7", "doi": "10.32388/YCHHA7"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.32388/KBX9VO", "doi": "10.32388/KBX9VO"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13326-015-0005-5", "doi": "10.1186/S13326-015-0005-5"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/j.impact.2021.100366", "doi": "10.1016/j.impact.2021.100366"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.32388/ZNWI7T", "doi": "10.32388/ZNWI7T"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.32388/C3SJTN", "doi": "10.32388/C3SJTN"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jse00-g6y66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/01/twitter-exits-fair-and-is-no-longer.html",
      "title": "Twitter exits FAIR and is no longer a dissemination solution",
      "content_html": "<p>And just like that, without a warning, Twitter changed policies again, and you now need a Twitter account and be logged in to see public tweets:\n<a href=\"https://www.theverge.com/2023/6/30/23779764/twitter-blocks-unregistered-users-account-tweets\">Twitter has started blocking unregistered users</a>\n(The Verge). Though I learned it first via Mastodon, of course.</p>\n\n<p>For example, this is what happens when you go to <a href=\"http://twitter.com/wikipathways\">twitter.com/wikipathways</a>:</p>\n\n<p><img src=\"/assets/images/twitter_login.png\" alt=\"Screenshot of the Twitter login page.\" /></p>\n\n<p>Fortunately, <a href=\"https://wikipathways.org/\">WikiPathways</a> does have a <a href=\"https://fosstodon.org/@wikipathways\">Mastodon account</a>,\nthat anyone can see without having a Mastodon account. You can even follow WikiPathways’s account with\n<a href=\"https://fosstodon.org/@wikipathways.rss\">its RSS feed</a>. Dissemination should not be paywalled.</p>\n\n<p>Maybe Musk has been talking to Elsevier and Springer Nature.</p>\n\n<p>Tip: <a href=\"https://chem-bla-ics.linkedchemistry.info/2022/11/21/finding-mastodon-accounts-with-wikidata.html\">Finding Mastodon accounts with Wikidata (a few SPARQL queries) <i class=\"fa-solid fa-recycle fa-xs\"></i></a></p>\n\n<p><strong>Update</strong>: <a href=\"https://tweakers.net/nieuws/211364/musk-blokkeren-van-niet-ingelogde-gebruikers-op-twitter-is-tijdelijke-maatregel.html\">Musk</a> said this\nwas a temporary measure. The problem was scraping of content, you know, the content we openly share on Twitter. Maybe they could have done this\nwith APIs. 
Oh wait, they closed those behind a very expensive paywall.</p>\n\n<p><strong>Update 2</strong>: Another rumor is that they forgot to make a deal with a cloud provider and suddenly were left with a fraction of the computing power.</p>\n\n<p><strong>Update 3</strong>: The access has been restored, so you can start scraping/archiving all interesting tweets again.</p>",
      "summary": "And just like that, without a warning, Twitter changed policies again, and you now need a Twitter account and be logged in to see public tweets: Twitter has started blocking unregistered users (The Verge). Though I learned it first via Mastodon, of course.",
      
      "date_published": "2023-07-01T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["twitter","mastodon","wikipathways"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b60bz-ark18",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/06/11/community-activity-2-fairsharing.html",
      "title": "Community activity #2: FAIRsharing",
      "content_html": "<p>Some years ago we started the <a href=\"https://elixir-europe.org/communities/toxicology\">ELIXIR Toxicology Community</a>. It has been an interesting journey,\npartly covered in <a href=\"https://f1000research.com/articles/10-1129/v1\">this whitepaper</a>). We started with interaction we had in several projects already,\nbut particularly the potential. I see this. This series of posts is a number of things toxicology projects can do to benefit from ELIXIR solutions\n(“<a href=\"https://elixir-europe.org/services\">services</a>”). The posts have been sent first to the ELIXIR Toxicology Community mailing list (please join!).</p>\n\n<h3 id=\"history\">History</h3>\n\n<p>In this post, let’s look at <a href=\"https://fairsharing.org/\">FAIRsharing</a>. It is “A curated, informative and educational resource on data and metadata standards,\ninter-related to databases and data policies” [0,1].</p>\n\n<p>The ELIXIR Toxicology Community (we) maintains the toxicology corner of this database and members of our community have been adding toxicology-related\ndatabases, relevant standards. On the side of the policies we are falling a bit short:\n<a href=\"https://fairsharing.org/Toxicology\">fairsharing.org/Toxicology</a>.</p>\n\n<h3 id=\"why-adopt-fairsharing\">Why adopt FAIRsharing</h3>\n\n<p>FAIRsharing is one place where metadata can be shared about your databases. It helps make your resources and research more FAIR and explains people\nhow your work relates to other work (<a href=\"https://fairsharing.org/graph/3496\">fairsharing.org/graph/3496</a>):</p>\n\n<p><img src=\"/assets/images/fairsharing_toxicology.png\" alt=\"Screenshot of the 'collects' graph of the FAIRsharing Toxicology Community.\" /></p>\n\n<h3 id=\"what-you-can-do\">What you can do</h3>\n\n<p>Get an account (with your ORCID or GitHub account) and add resources important to your research, your projects, your work generally. 
Particularly,\n(data) policies and standards you are expected to comply with are useful. Also, links between various resources. For example, if some (project)\ndatabase complies with an important policy or standards, this is worth seeing show up.</p>\n\n<p>Alternatively, join the ELIXIR Toxicology Community <a href=\"https://doi.org/10.1162/dint_r_00024\">mailing list</a> and post the missing resource there,\nor use our issue tracker at <a href=\"https://github.com/elixir-europe/toxicology-community/issues/\">github.com/elixir-europe/toxicology-community/issues/</a>.</p>\n\n<p>Let’s make toxicology more <a href=\"https://doi.org/10.1162/dint_r_00024\">FAIR</a>.</p>\n\n<p>0.<a href=\"https://www.nature.com/articles/s41587-019-0080-8\">https://www.nature.com/articles/s41587-019-0080-8</a>\n1.<a href=\"https://scholia.toolforge.org/work/Q64084285\">https://scholia.toolforge.org/work/Q64084285</a></p>",
      "summary": "Some years ago we started the ELIXIR Toxicology Community. It has been an interesting journey, partly covered in this whitepaper). We started with interaction we had in several projects already, but particularly the potential. I see this. This series of posts is a number of things toxicology projects can do to benefit from ELIXIR solutions (“services”). The posts have been sent first to the ELIXIR Toxicology Community mailing list (please join!).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/fairsharing_toxicology.png",
      "date_published": "2023-06-11T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["elixir","fair","toxicology"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.12688/f1000research.74502.1", "doi": "10.12688/f1000research.74502.1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/s41587-019-0080-8", "doi": "10.1038/s41587-019-0080-8"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1162/dint_r_00024", "doi": "10.1162/dint_r_00024"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r68sc-a1t25",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/05/31/information-retrieval-versus-chatgpt.html",
      "title": "Information Retrieval versus ChatGPT",
      "content_html": "<p>When last week in a large (and relevant) Dutch research event ChatGPT came up, and that this was going to change the world. Even the critiques came up,\nbut were effectively disregarded with “these methods get better very quickly”. This is not untrue, but not really true either. I murmur “not even wrong”.\nI know how hard it is to get computers to find meaningful patters; I did a PhD in this in the early 21st century.</p>\n\n<p>What strikes me, is that ChatGPT is now pitches as an informational retrieval (IR) system. This is a system where it tries to find information, that is,\nit “retrieves” information form a knowledge base. Like SQL or SPARQL. Or like Google Maps. IR about reproducing existing knowledge.</p>\n\n<p>Now, deep learning starts with a different premise: we can find the patterns and in this way compress an unlimited number of facts into a mathematical\nequation, a physical law. That way, you do not have to record if the sun comes up every day. We predict it does. We do not have to record that rain drop\nwill fall (that they do. when they do that actually is something to record). At best, we would record when rain drops start “falling” to the sky.\nThat is, we have the laws of gravitation.</p>\n\n<p>But here lies the problem with systems like ChatGPT: they are as good as their predictive patterns they learned. But they do not retrieve information.\nThey predict information. This is why it doesn’t know about references. It lost the link between predictions and on which shelf the the book was stored.</p>\n\n<p>So, when last week the research event mentioned that lawyers were starting to use it, citing existing work, I was skeptical: that would actually mean\nthey moved ChatGPT into IR. And I already had learned (*) that ChatGPT would predict references, rather than look them up. It’s a prediction method,\nnot an IR method. So, how come it would accurately give citations to court cases.</p>\n\n<p>It didn’t. 
It’s all over the news now. It “hallucinated” legal citations.</p>\n\n<p>Does this matter? I think it does. This is why I moved my research focus after my PhD back to IR, away from the machine learning. Deep learning can only\ngeneralize the facts, so we better start accurately recording facts. This is why I study interoperable and reusable knowledge bases, like WikiPathways,\nWikidata, technologies like RDF in science, etc. Actually, this realization predates my machine learning. I guess I already had this notion when I\nstarted the <em>Woordenboek Organische Chemie</em> back in the nineties.</p>\n\n<p>Someone has to. I just hope the funding for this fundamental aspect of research doesn’t run out. Information Retrieval will remain essential to science\nfor a few decades more.</p>",
      "summary": "When last week in a large (and relevant) Dutch research event ChatGPT came up, and that this was going to change the world. Even the critiques came up, but were effectively disregarded with “these methods get better very quickly”. This is not untrue, but not really true either. I murmur “not even wrong”. I know how hard it is to get computers to find meaningful patters; I did a PhD in this in the early 21st century.",
      
      "date_published": "2023-05-31T00:00:00+00:00",
      "date_modified": "2023-05-31T00:00:00+00:00",
      "tags": ["ml","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/16tan-06d32",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/05/22/paper-fair-cookbook-essential-resource.html",
      "title": "Paper: The FAIR Cookbook - the essential resource for and by FAIR doers",
      "content_html": "<p>I think that if you want to make your knowledge FAIR, you should use an open license and RDF. Simple. Now, not everything is knowledge.\nA lot of data is, but a lot more is not, think raw data. Using RDF to explain a protein sequence is still something that makes me feel uneasy.</p>\n\n<p>However, first, you need to make RDF, you need to make assumptions explicit, you need to decide on meaning. Making RDF is not easy.\nIt’s not hard, just a lot of administration and scientific thinking. What did I measure? What model do I use to describe the chemistry?\nYou know, my research job.</p>\n\n<p>Moreover, not only data should be FAIR. All research output (worth communicating) should be FAIR.</p>\n\n<p>In the past, Andra Waagmeester invited me to co-author a recipe that explains the\n<a href=\"http://www.openphacts.org/specs/2013/WD-rdfguide-20131007/\">general steps of creating RDF</a>. This was during the Open PHACTS project and with Carina Haupt.\nWriting recipes is something getting traction. They are a bit like <a href=\"https://r-pkgs.org/vignettes.html\">vignettes from the R world</a>.</p>\n\n<p>In the past few years the <a href=\"https://cordis.europa.eu/project/id/802750\">FAIRplus project</a> created a\n<a href=\"https://faircookbook.elixir-europe.org/\">FAIR Cookbook</a> with recipes and I wrote a few. Actually, I still have a few to finish,\nfor which I cannot find the time. I retrospect, I spent too much time on perfecting the recipe to finish them earlier. The FAIR Cookbook\nis now a professional venue with editorial board. It is fully open source and welcomes your recipes. Oh, and it is now hosted as ELIXIR service,\nwhich is great to see!</p>\n\n<p>Finally, the <a href=\"https://doi.org/10.1038/s41597-023-02166-3\">The FAIR Cookbook - the essential resource for and by FAIR doers paper</a>\nis out. 
Go read it :)</p>\n\n<p><img src=\"https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41597-023-02166-3/MediaObjects/41597_2023_2166_Fig2_HTML.png?as=webp\" alt=\"Screenshot of a FAIR Cookbook recipe showing the infobox at the top (with reading time, difficulty indicator (4/5 flames), the audience (PIs, ontologists, data scholars), and the author list with ORCID, affiliation, and CReDIT annotation.)\" /></p>\n<center>\nFigure 2 from the article: 'Citability of recipes and identification of and credit for authors; an example is provided.'\n</center>",
      "summary": "I think that if you want to make your knowledge FAIR, you should use an open license and RDF. Simple. Now, not everything is knowledge. A lot of data is, but a lot more is not, think raw data. Using RDF to explain a protein sequence is still something that makes me feel uneasy.",
      "image": "https://chem-bla-ics.linkedchemistry.infohttps://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41597-023-02166-3/MediaObjects/41597_2023_2166_Fig2_HTML.png?as=webp",
      "date_published": "2023-05-22T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["fair"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/s41597-023-02166-3", "doi": "10.1038/s41597-023-02166-3"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jakew-pe809",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html",
      "title": "CiTO updates #4: annotations in datasets",
      "content_html": "<p>Okay, <a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00683-2\">the Pilot</a>\n<a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00684-1\">is over</a> ending with 17 papers, 16 of which have CiTO\nannotations (and so far 4 J.Cheminform. <a href=\"https://doi.org/10.1186/s13321-022-00656-x\">papers</a>\n<a href=\"https://doi.org/10.1186/s13321-022-00673-w\">after</a> <a href=\"https://doi.org/10.1186/s13321-022-00677-6\">the</a>\n<a href=\"https://doi.org/10.1186/s13321-023-00701-3\">pilot</a>), but my interest in the\n<a href=\"http://purl.org/spar/cito\">Citation Typing Ontology</a> continues and we just need\n<a href=\"https://chem-bla-ics.blogspot.com/2023/02/citation-typing-progress-but-we-need.html\">more adoption</a>.</p>\n\n<p><strong>Datasets as source of annotations</strong></p>\n\n<p>So, here’s a quick <a href=\"https://wikidata.org/\">Wikidata</a> update. I have been using Wikidata as infrastructure to collect and share CiTO\nannotations (see also the below “Scholia patch” posts). Some time ago I recovered my CiteULike CiTO annotations and made this\n<a href=\"https://scholia.toolforge.org/work/Q115470140\">available on Zenodo</a> (doi:<a href=\"https://doi.org/10.5281/ZENODO.7368209\">10.5281/zenodo.7368209</a>).</p>\n\n<p>And while thinking about datasets with CiTO annotations, I found two other datasets. One was from an article in Portuguese and one from an\n<a href=\"https://scholia.toolforge.org/work/Q117369886\">article by Peroni et al.</a> with\n<a href=\"https://zenodo.org/record/6885109\">this data file</a>. That data file is actually a zip, but inside the zip file is a CSV file with three\ninteresting columns: <code class=\"language-plaintext highlighter-rouge\">cited_doi</code>, <code class=\"language-plaintext highlighter-rouge\">citing_doi</code>, and <code class=\"language-plaintext highlighter-rouge\">intext_citation.intent</code>. 
There are many more columns and I can highly recommend browsing\nthem. But these are the three I need to add data to Wikidata. The third column is free text, but uses the CiTO labels, making it\nrelatively easy to convert to <a href=\"https://w.wiki/62sR\">citation intentions from Wikidata</a>\n(PS, thanks to <a href=\"https://www.wikidata.org/wiki/User:Fvtvr3r\">Fvtvr3r</a> for adding more!).</p>\n\n<p>So, I had a cleaned file and started writing a Groovy Bioclipse script using <a href=\"https://doi.org/10.21105/joss.02558\">Bacting</a>.\nIt basically does a few things: extract all DOIs, check which ones are in Wikidata, analyze the <code class=\"language-plaintext highlighter-rouge\">intext_citation.intent</code> column content,\nand then generate QuickStatements (see <a href=\"https://gist.github.com/egonw/f74fd3bc1f6361434b042a4cac2a8089\">this gist</a>). Out of the 600\nlines from the input, it creates some 200 new CiTO-annotated citations in Wikidata between\n<a href=\"https://scholia.toolforge.org/work/Q117357537#statements\">some 150 article pairs</a>:</p>\n\n<p><img src=\"/assets/images/Screenshot_20230402_084711.png\" alt=\"\" /></p>\n\n<p>The ability to include CiTO annotations from datasets is another welcome boost for the CiTO statistics in Wikidata.\n<a href=\"https://w.wiki/6XQf\">This SPARQL query</a> shows an overview of sources that support the CiTO intention annotation, but note that\na claim with a CiTO intention may also have CrossRef, PubMed, and COCI as reference. In those cases, they are primarily for\nthe citations and not the intention.</p>\n\n<p>There are <a href=\"https://scholar.social/@egonw/110124747053293502\">now</a> (the <a href=\"https://scholia.toolforge.org/cito/#statistics\">latest stats are here</a>)\n<strong>1202 citation intention</strong> annotations in Wikidata for 992 citations from <strong>405 articles in 199 venues</strong>. 
Of these, 27 articles have\nexplicit annotations in the article itself and are found in 4 venues (two journals and two preprint servers). These annotated citations\nare to 510 articles in 190 different venues. <a href=\"https://github.com/WDscholia/scholia/pull/2271\">This Scholia patch</a> will add a new\nstatistic, the number of datasets providing citation intentions, of which there are (as discussed)\n<a href=\"https://scholia.toolforge.org/topic/Q115470140\">currently</a> <a href=\"https://scholia.toolforge.org/work/Q117357537\">two</a> in Wikidata.\nThe latter two provide intentions for the majority of articles and are depicted in yellow in the below overview.</p>\n\n<p><img src=\"/assets/images/Screenshot_20230402_085317.png\" alt=\"\" /></p>\n\n<p>With an annotation in <a href=\"https://www.wikidata.org/wiki/Q27638524\">a 1938 article by Alan Turing</a>! I ran into this article in November 2011\nnoting an apparent duplicate title in his article list. It turned out an earlier article had a correction with the same name.\nI added <a href=\"https://www.wikidata.org/w/index.php?title=Q27638524&amp;diff=1527020358&amp;oldid=984628387&amp;diffmode=source\">this clarification</a>:</p>\n\n<p><img src=\"/assets/images/Screenshot_20230402_090600.png\" alt=\"\" /></p>\n\n<p>This is very trivial citation intention data that publishers could provide as open data.</p>\n\n<p>Okay, that will do for today. There are actually some really interesting things in the pipeline, but I will have to write about that later. I have some deadlines I should start looking at. 
Below is some extra reading.\nSome more history</p>\n\n<ul>\n  <li>2021: <a href=\"https://chem-bla-ics.linkedchemistry.info/2021/11/15/biohackathon-europe-2021-1-cito.html\">BioHackathon Europe 2021 #1: CiTO annotations in BioHackrXiv <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>2021: <a href=\"https://chem-bla-ics.blogspot.com/2021/03/markdown-template-for-journal-of.html\">Markdown template for the Journal of Cheminformatics with CiTO support</a></li>\n  <li>2020: <a href=\"https://chem-bla-ics.linkedchemistry.info/2020/11/30/cito-updates-3-third-paper-in.html\">CiTO updates #3: third paper in the collection and updated Scholia patch <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>2020: <a href=\"https://chem-bla-ics.linkedchemistry.info/2020/11/01/cito-updates-2-annotation-migration-to.html\">CiTO updates #2: annotation migration to Wikidata and first Scholia patch <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>2020: <a href=\"https://chem-bla-ics.linkedchemistry.info/2020/11/01/cito-updates-1-first-research-paper-in.html\">CiTO updates #1: first research paper in the Journal of Cheminformatics with CiTO annotation published <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>July 2020: <a href=\"https://chem-bla-ics.blogspot.com/2020/07/new-editorial-adoption-of-citation.html\">New Editorial: “Adoption of the Citation Typing Ontology by the Journal of Cheminformatics”</a></li>\n  <li>2015: <a href=\"https://chem-bla-ics.blogspot.com/2015/03/what-youre-doing-is-rather-desperate.html\">“What You’re Doing Is Rather Desperate”</a></li>\n  <li>2012: <a href=\"https://chem-bla-ics.linkedchemistry.info/2012/02/23/cito-citeulike-publishing-innovation.html\">CiTO / CiteULike: publishing innovation <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>2010: <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html\">CiteULike CiTO Use Case #1: Wordles <i class=\"fa-solid 
fa-recycle fa-xs\"></i></a></li>\n  <li>September 2010: <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html\">A list of things I miss in CiteULike <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>",
      "summary": "Okay, the Pilot is over ending with 17 papers, 16 of which have CiTO annotations (and so far 4 J.Cheminform. papers after the pilot), but my interest in the Citation Typing Ontology continues and we just need more adoption.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20230402_085317.png",
      "date_published": "2023-04-02T00:00:00+00:00",
      "date_modified": "2025-12-29T00:00:00+00:00",
      "tags": ["cito","data","scholia"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/s13321-023-00683-2", "doi": "10.1186/s13321-023-00683-2"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-023-00684-1", "doi": "10.1186/s13321-023-00684-1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-022-00656-x", "doi": "10.1186/s13321-022-00656-x"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-022-00673-w", "doi": "10.1186/s13321-022-00673-w"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-022-00677-6", "doi": "10.1186/s13321-022-00677-6"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-023-00701-3", "doi": "10.1186/s13321-023-00701-3"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1162/QSS_A_00222", "doi": "10.1162/QSS_A_00222"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.5155219", "doi": "10.5281/zenodo.5155219"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.21105/joss.02558", "doi": "10.21105/joss.02558"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.7368209", "doi": "10.5281/ZENODO.7368209"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qhhk0-prq78",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/01/27/scholia-timeline.html",
      "title": "Scholia timeline",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/Scholia_work_profile_screenshot_as_of_2018-09-04.png\" width=\"300\" />\nSometimes I think back about how <a href=\"https://scholia.toolforge.org/\">Scholia</a> started, and then I think I remember a\nTwitter discussion. Twitter was a social platform that was unable to fight hate speech. I left it last year in favor\nof <a href=\"https://scholar.social/@egonw\">Mastodon</a>.</p>\n\n<p>Anyway, I did some digging today and found <a href=\"https://web.archive.org/web/20230402075737/https://twitter.com/fnielsen/status/785008295505489920\">this thread <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> from\nOctober 8-9 2016. A few days earlier, Finn has created a profile based on data in Wikidata on his homepage,\n<a href=\"https://twitter.com/egonwillighagen/status/783190125882777600\">which I was very happy about <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>. You can see how\n<a href=\"https://twitter.com/ReaderMeter/status/784810921029881856\">Dario suggests <i class=\"fa-solid fa-link-slash fa-xs\"></i></a> to put that webpage up on Toolforge.\nFor completeness, this is <a href=\"https://github.com/WDscholia/scholia/commit/484104fdf60e4d8384b9816500f2826dbfe064ce.patch\">the first commit</a>,\nOctober 9.</p>\n\n<p>This chat was after <a href=\"https://fosstodon.org/@fnielsen\">@fnielsen</a>’s <a href=\"https://finnaarupnielsen.wordpress.com/2016/09/30/the-wikidata-scholarly-profile-page/\">blog post</a>\nabout the idea of the needed open infrastructure and a possible <a href=\"https://wikidata.org/\">Wikidata</a> solution from\nSeptember 2016. Finally, it was also only half a year before Scholia got\n<a href=\"https://www.nature.com/articles/nature.2017.21800\">mentioned in Nature</a>.</p>\n\n<p>BTW, at the time there still was a focus on bibliographic information. We learned since that the Wikidata platform\ncannot technically meet the needs, at least not at this moment. 
Instead, the focus is now much more on the\nliterature that supports the knowledge in Wikidata and Wikipedia and on making that as interoperable as possible.</p>",
      "summary": "Sometimes I think back about how Scholia started, and then I think I remember a Twitter discussion. Twitter was a social platform that was unable to fight hate speech. I left it last year in favor of Mastodon.",
      
      "date_published": "2023-01-27T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["scholia","twitter","wikidata"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/nature.2017.21800", "doi": "10.1038/nature.2017.21800"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "To support development on Scholia, a software tool to facilitate the exploration and curation of the research literature", "acronym" : "Scholia", "uri" : "https://sloan.org/grant-detail/G-2019-11458" }, "funder": { "name": "Alfred P. Sloan Foundation", "ror": "052csg198" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ncbjx-8yt52",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/01/15/doing-open-science-challenge.html",
      "title": "Doing the &quot;Open Science Challenge&quot;",
      "content_html": "<p><span style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/openscience_challenge2003.png\" /> <br />\nScreenshot of the <a href=\"https://heidiseibold.ck.page/opensciencechallenge\">sign up page</a>.\n</span>\nTriggered by the “reflections on your career” in the announcement I decide to give the <em><a href=\"https://heidiseibold.ck.page/opensciencechallenge\">Open Science Challenge</a></em>\nby <a href=\"https://fosstodon.org/@HeidiSeibold\">Heidi Seibold</a> a try: “12 emails over the course of a month that are designed to help you on your Open Science journey.”</p>\n\n<p>I will post here my replies to the various challenges, by linking to the first Mastodon, allowing you to follow the replies:</p>\n\n<ul>\n  <li>Day 1: <a href=\"https://akademienl.social/@egonw/109670641195409165\">Why am I participating</a></li>\n  <li>Day 2: <a href=\"https://akademienl.social/@egonw/109680882466924235\">Your Open Science peers</a></li>\n  <li>Day 3: <a href=\"https://akademienl.social/@egonw/109692571311027920\">Write down all of your projects</a> and <a href=\"https://akademienl.social/@egonw/109692658433491610\">put them in a (im)portant/(un)passionate matrix</a></li>\n  <li>Day 4: <a href=\"https://akademienl.social/@egonw/109725863715473896\">Stop working on your CV</a></li>\n  <li>Day 5: <a href=\"https://akademienl.social/@egonw/109731992239207995\">Open Materials</a></li>\n  <li>Day 6: <a href=\"https://akademienl.social/@egonw/109771672138102276\">Open Code</a></li>\n  <li>Day 7: <a href=\"https://akademienl.social/@egonw/109891076560277054\">Mindsets that hold you back</a></li>\n  <li>Day 8: <a href=\"https://akademienl.social/@egonw/109930364875787030\">Science Communication</a></li>\n  <li>Day 9: <a href=\"https://akademienl.social/@egonw/109970899019771144\">Social Change</a></li>\n  <li>Day 10: <a href=\"https://akademienl.social/@egonw/110010850838483320\">Open 
Access</a></li>\n  <li>Day 11: <a href=\"https://akademienl.social/@egonw/110044760451238819\">Ethics and Research</a></li>\n  <li>Day 12: <a href=\"https://akademienl.social/@egonw/110128344460387409\">Wrap up</a></li>\n</ul>",
      "summary": "Screenshot of the sign up page. Triggered by the “reflections on your career” in the announcement I decide to give the Open Science Challenge by Heidi Seibold a try: “12 emails over the course of a month that are designed to help you on your Open Science journey.”",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/openscience_challenge2003.png",
      "date_published": "2023-01-15T00:00:00+00:00",
      "date_modified": "2023-01-15T00:00:00+00:00",
      "tags": ["openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v08gc-ney75",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/11/21/finding-mastodon-accounts-with-wikidata.html",
      "title": "Finding Mastodon accounts with Wikidata (a few SPARQL queries)",
      "content_html": "<p>There are multiple initiatives to support the migration from Twitter to Mastodon (see also\n<a href=\"2022-11-12-stwittermastodong.markdown\">this blog post <i class=\"fa-solid fa-recycle fa-xs\"></i></a>). But\n<a href=\"https://wikidata.org/\">Wikidata</a>\nshould not be forgotten here which has been tracking Mastodon accounts of things in their database:</p>\n\n<p><img src=\"/assets/images/Screenshot_20221121_075015.png\" alt=\"Screenshot of a Wikidata query showing the growth in number of Mastodon accounts listed in Wikidata.\" /></p>\n\n<p>So, here are some <a href=\"https://query.wikidata.org/\">Wikidata SPARQL</a> queries to see the uptake:</p>\n\n<ul>\n  <li><a href=\"https://w.wiki/5$3w\">Universities with Mastodon</a></li>\n  <li><a href=\"https://w.wiki/5$42\">All Mastodon accounts in Wikidata</a> (or <a href=\"https://w.wiki/5$4S\">subset with also a Twitter account</a>)</li>\n  <li><a href=\"https://w.wiki/6zFm\">Nobel Prize winners with Mastodon</a></li>\n  <li><a href=\"https://w.wiki/5$4V\">Academic journals with Mastodon</a></li>\n  <li><a href=\"https://w.wiki/5$4a\">People with Mastodon that published in a PLOS journal</a> (you can pick another publisher)</li>\n  <li><a href=\"https://w.wiki/5$4e\">Find your co-authors with your ORCID</a> (just replace my ORCID with yours)</li>\n</ul>\n\n<p>If you find yourself missing, back in April I <a href=\"https://web.archive.org/web/20220428130716/https://threadreaderapp.com/thread/1519193166188007424.html\">tweeted <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> (sorry)\nhow you can find yourself and others in Wikidata and how to add your or their Mastodon account.</p>",
      "summary": "There are multiple initiatives to support the migration from Twitter to Mastodon (see also this blog post ). But Wikidata should not be forgotten here which has been tracking Mastodon accounts of things in their database:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20221121_075015.png",
      "date_published": "2022-11-21T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["mastodon","sparql","wikidata","rdf","orcid"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e0h5f-7cg90",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/11/12/wikidata-script-for-smiles-smarts-and.html",
      "title": "Wikidata script for SMILES, SMARTS, and CXSMILES depiction",
      "content_html": "<p>In August I reported about <a href=\"https://chem-bla-ics.blogspot.com/2022/08/wikidata-now-escapes-smiles-and-cxsmiles.html\">2D depiction of (CX)SMILES in Wikidata via linkouts</a>\n(<a href=\"https://chem-bla-ics.blogspot.com/2017/07/wikidata-visualizes-smiles-strings-with.html\">going back to 2017</a>). Based on a script by\n<a href=\"https://orcid.org/0000-0001-5916-0947\">Magnus Manske</a>, I wrote a <a href=\"https://www.wikidata.org/wiki/User:Egon_Willighagen/cdkdepict_gadget.js\">Wikidata gadget</a>\nthat uses the same <a href=\"https://www.simolecule.com/cdkdepict/depict.html\">CDK Depict</a>\n(<a href=\"https://cdkdepict.cloud.vhp4safety.nl/\">VHP4Safety mirror</a>) to depict the 2D structure in <a href=\"https://wikidata.org/\">Wikidata</a> itself:</p>\n\n<p><img src=\"/assets/images/Screenshot_20221112_130346.png\" alt=\"Depicting of part of a Wikidata page with 2D structures of a canonical SMILES and matching CXSMILES.\" /></p>\n\n<p>Note the depiction of the undefined (CIP) stereochemistry on two atoms. Thanks to\n<a href=\"https://orcid.org/0000-0003-0443-9902\">Adriano</a> and <a href=\"https://nextmovesoftware.com/blog/author/john/\">John</a> for working that out.</p>\n\n<p>More about CXSMILES in Wikidata in <a href=\"https://egonw.github.io/cdk-cxsmiles/\">this Dagstuhl meeting results write up</a>.</p>",
      "summary": "In August I reported about 2D depiction of (CX)SMILES in Wikidata via linkouts (going back to 2017). Based on a script by Magnus Manske, I wrote a Wikidata gadget that uses the same CDK Depict (VHP4Safety mirror) to depict the 2D structure in Wikidata itself:",
      
      "date_published": "2022-11-12T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["wikidata","cdk","cxsmiles","dagstuhl","smiles","vhp4safety"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ejhaf-0k749",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/11/12/stwittermastodong.html",
      "title": "s/Twitter/Mastodon/g",
      "content_html": "<p><img src=\"/assets/images/Mastodon_logotype_(simple)_new_hue.svg.png\" style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Mastodon logo. AGPL source: WikiCommons\" />\nYeah, it has been hard to miss it (see e.g. <a href=\"https://www.nature.com/articles/d41586-022-03668-7\">Should I join Mastodon? A scientists’ guide to Twitter’s rival</a>).\nTwitter is experiencing some turbulence and <a href=\"https://joinmastodon.org/\">Mastodon</a> has become a very attractive, open source,\ncommunity-driven, inclusive alternative. It’s been <a href=\"https://scholia.toolforge.org/topic/Q27986619\">around since 2016</a> and there\nis some <a href=\"https://scholia.toolforge.org/topic/Q27986619\">research literature about it</a> already. I got\n<a href=\"https://chem-bla-ics.blogspot.com/2018/09/mastodon-somewhere-between-twitter-and.html?q=mastodon\">my account in 2018</a>, but did\nnot start actively using it until earlier this year.</p>\n\n<p>It’s a fascinating platform: federated, community driven, and open source. Oh, and it uses an open standard:\n<a href=\"https://en.wikipedia.org/wiki/ActivityPub\">ActivityPub</a>. I have still a lot to learn, but there are some reasons why Mastodon\nis better and some reasons why it is worse than Twitter.</p>\n\n<p>First, how can you follow me:</p>\n\n<ul>\n  <li>main scholarly account: <a href=\"https://social.edu.nl/@egonw\">https://social.edu.nl/@egonw</a></li>\n  <li>politics, foss, hobby account: <a href=\"https://mastodon.social/@egonw\">https://mastodon.social/@egonw</a></li>\n</ul>\n\n<p><strong>Better</strong></p>\n\n<p>Well, this is personal, of course, but the following points makes Mastodon for me a better platform:</p>\n\n<ul>\n  <li>distributed, open standard\n    <ul>\n      <li>e.g. 
no more tweeting of new Zotero entries (soon I hope), just follow my Zotero account</li>\n    </ul>\n  </li>\n  <li>community standards\n    <ul>\n      <li>you can pick; if you don’t like the terms of your current server (read: service provider), just move to another server</li>\n      <li>images must have alternate descriptions on many servers</li>\n    </ul>\n  </li>\n  <li>edit button with version control</li>\n  <li>content warnings</li>\n  <li>ability to hide anything with #caturday (or any other word)</li>\n  <li>detailed annotation of privacy (public, unlisted, etc; no encryption, tho)</li>\n</ul>\n\n<p><strong>Worse</strong></p>\n\n<p>Maybe this category can better be called opportunities. After all, it’s the community that defines how it will evolve, just like Twitter did (which did not originally have hashtags, retweets). One big elephant in the scientific social media world right now is the uncertainty about searching and indexing: will it be useful as (post-publication) platform? will we be able to use it for conference tweeting?</p>\n\n<p>Another aspect is that in some countries mobile internet is deeply coupled with big companies. Think coupling of access with free WhatsApp.</p>\n\n<p>Finally: growing pains. The platform is growing fast, and right now it can be hard to find a server that accepts new accounts.</p>\n\n<p><strong>Tips?</strong></p>\n\n<p>Sure. Start with <a href=\"https://fedi.tips/\">https://fedi.tips/</a>. Have fun! And I love to hear what your tips are :)</p>\n\n<p>Image from <a href=\"https://commons.wikimedia.org/wiki/File:Mastodon_logotype_(simple)_new_hue.svg\">WikiCommons</a>.</p>",
      "summary": "Yeah, it has been hard to miss it (see e.g. Should I join Mastodon? A scientists’ guide to Twitter’s rival). Twitter is experiencing some turbulence and Mastodon has become a very attractive, open source, community-driven, inclusive alternative. It’s been around since 2016 and there is some research literature about it already. I got my account in 2018, but did not start actively using it until earlier this year.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Mastodon_logotype_(simple)_new_hue.svg.png",
      "date_published": "2022-11-12T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["mastodon","twitter"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/d41586-022-03668-7", "doi": "10.1038/d41586-022-03668-7"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/29mez-a1s32",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/10/08/is-your-research-cited-by-nobel-prize.html",
      "title": "Is your research cited by a Nobel prize winner?",
      "content_html": "<p><span style=\"width: 20%; float: right\"><a href=\"https://en.wikipedia.org/wiki/File:Nobel_Prize.png\">\n  <img src=\"https://upload.wikimedia.org/wikipedia/en/e/ed/Nobel_Prize.png?20131011153104\" /></a></span>\nForget the journal impact factor and the H-index. You want your research being used. A first approximation of that is getting cited,\nsure. So, with the Nobel Prize week over (congrats to all winners! the <a href=\"https://www.sciencelink.net/news/nobel-prize-in-physiology-awarded-to-sequencing-of-ancient-genomes/20811.article\">Neanderthaler prize</a>\nactually helped my work in Maastricht this week), let’s figure out of you are cited by a Nobel Prize winner.\nWikidata allows us to figure this out with a SPARQL query\n(<a href=\"https://twitter.com/Adafede/status/1577642035011534850\">created together with Adriano</a>):</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">#title: Are you cited by Nobel Prize winners?</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">MIN</span><span class=\"p\">(</span><span class=\"nv\">?dates</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?date</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"nv\">?work</span><span class=\"w\"> </span><span class=\"nv\">?workLabel</span><span class=\"w\">\n  </span><span class=\"p\">(</span><span class=\"nb\">GROUP_CONCAT</span><span class=\"p\">(</span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?winnerLabel</span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"nb\">SEPARATOR</span><span class=\"w\"> </span><span class=\"p\">=</span><span class=\"w\"> </span><span class=\"s2\">\", \"</span><span class=\"p\">)</span><span class=\"w\"> 
</span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?winners</span><span class=\"p\">)</span><span class=\"w\">\n  </span><span class=\"p\">(</span><span class=\"nb\">COUNT</span><span class=\"p\">(</span><span class=\"k\">DISTINCT</span><span class=\"p\">(</span><span class=\"nv\">?winnerLabel</span><span class=\"p\">))</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?count</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"k\">VALUES</span><span class=\"w\"> </span><span class=\"nv\">?nobel</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q7191</span><span class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q80061</span><span class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q44585</span><span class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q38104</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"nv\">?work</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P50</span><span class=\"o\">/</span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P496</span><span class=\"w\"> </span><span class=\"s2\">\"0000-0002-2627-833X\"</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"c1\"># REPLACE WITH YOUR ORCID id</span><span class=\"w\">\n    </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P577</span><span class=\"w\"> </span><span class=\"nv\">?datetimes</span><span class=\"p\">.</span><span 
class=\"w\">\n  </span><span class=\"p\">[]</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P2860</span><span class=\"w\"> </span><span class=\"nv\">?work</span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P50</span><span class=\"w\"> </span><span class=\"nv\">?winner</span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?winner</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P166</span><span class=\"w\"> </span><span class=\"nv\">?nobel</span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">BIND</span><span class=\"p\">(</span><span class=\"nn\">xsd</span><span class=\"o\">:</span><span class=\"ss\">date</span><span class=\"p\">(</span><span class=\"nv\">?datetimes</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?dates</span><span class=\"p\">)</span><span class=\"w\">\n  </span><span class=\"k\">SERVICE</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nn\">bd</span><span class=\"o\">:</span><span class=\"ss\">serviceParam</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">language</span><span class=\"w\"> </span><span class=\"s2\">\"en\"</span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?winner</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?winnerLabel</span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?work</span><span 
class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?workLabel</span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"k\">GROUP</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"nv\">?work</span><span class=\"w\"> </span><span class=\"nv\">?workLabel</span><span class=\"w\">\n</span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">DESC</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nv\">?count</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Run this query <a href=\"https://w.wiki/5nBX\">here</a>. Notice the ORCID given in the middle: change that to your own ORCID identifier.</p>\n\n<p>Please keep in mind that <a href=\"https://www.wikidata.org/\">Wikidata</a> does not contain all literature (neither do Google Scholar,\nWeb of Science, PubMed) and not all citations.</p>",
      "summary": "Forget the journal impact factor and the H-index. You want your research being used. A first approximation of that is getting cited, sure. So, with the Nobel Prize week over (congrats to all winners! the Neanderthaler prize actually helped my work in Maastricht this week), let’s figure out of you are cited by a Nobel Prize winner. Wikidata allows us to figure this out with a SPARQL query (created together with Adriano):",
      
      "date_published": "2022-10-08T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["wikidata","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/458r6-cmn16",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/08/02/wikidata-now-escapes-smiles-and-cxsmiles.html",
      "title": "Wikidata now escapes SMILES and CXSMILES!",
      "content_html": "<p>In the end it was a very <a href=\"https://www.wikidata.org/w/index.php?title=MediaWiki%3AGadget-AuthorityControl.js&amp;type=revision&amp;diff=1694196586&amp;oldid=1409657932\">simple change</a>\ntoday (huge thanks to <a href=\"https://www.wikidata.org/wiki/User:Nikki\">Nikki</a>!), but <a href=\"https://wikidata.org/\">Wikidata</a>\nnow escapes SMILES and CXSMILES (<a href=\"https://wikidata.org/entity/P10718\">P10718</a>) with the formatter URL\n(<a href=\"https://wikidata.org/entity/P1630\">P1630</a>)!</p>\n\n<p><img src=\"/assets/images/Screenshot_20220802_100605.png\" alt=\"\" /></p>\n\n<p>That means that the link to <a href=\"https://cdkdepict.cloud.vhp4safety.nl/\">CDK Depict</a> now also works for SMILES\n(<a href=\"https://wikidata.org/entity/P233\">P233</a> and <a href=\"https://wikidata.org/entity/P2017\">P2017</a>) with a triple bond in it :) And because\n<a href=\"https://twitter.com/Adafede\">Adriano</a> created the so far missing <code class=\"language-plaintext highlighter-rouge\">formatter URL</code> for CXSMILES, it also\nworks for lipid classes (see <a href=\"https://chem-bla-ics.blogspot.com/2022/08/biology-acps-lipids-cheminformatics-and.html\">my post yesterday</a>),\npolymers, etc :)</p>\n\n<p><img src=\"/assets/images/Screenshot_20220802_100818.png\" alt=\"\" /></p>\n\n<center>CXSMILES for a group of acyl-carrier proteins.</center>\n<p><br /></p>\n\n<p><img src=\"/assets/images/Screenshot_20220802_100841.png\" alt=\"\" /></p>\n\n<center>The <i>formatter URL</i> info added to make the link outs for CXSMILES work. The patch\nby Nikki ensures that characters like # are escaped before the URL is created.</center>\n<p><br /></p>",
      "summary": "In the end it was a very simple change today (huge thanks to Nikki!), but Wikidata now escapes SMILES and CXSMILES (P10718) with the formatter URL (P1630)!",
      
      "date_published": "2022-08-02T00:00:00+00:00",
      "date_modified": "2022-08-02T00:00:00+00:00",
      "tags": ["wikidata","cxsmiles"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ab2rj-qdg37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/08/01/biology-acps-lipids-cheminformatics-and.html",
      "title": "Biology, ACPs, lipids, cheminformatics, and Dagstuhl",
      "content_html": "<p>Already 3 months ago I visited <a href=\"https://www.dagstuhl.de/\">Dagstuhl</a> for the second time. The weather was much better than in the January right before\nthe start of the pandemic. The first I attended the Computational Metabolomics meeting, with the focus From Cheminformatics to Machine Learning, one\nof the things we concerned ourselves with was how to do computation with compound classes (see\n<a href=\"https://drops.dagstuhl.de/opus/volltexte/2020/12403/pdf/dagrep_v010_i001_p144_20051.pdf\">Section 3.6</a> and\n<a href=\"https://egonw.github.io/cdk-cxsmiles/\">this online book</a>). We know how to handle\nSMILES and we know how to the substructure searching with SMARTS, but what if you have compound classes or lipid classes? Biology is a greasy business.</p>\n\n<p>From a <a href=\"https://wikipathways.org/\">WikiPathways</a> there is additional complexity, with modified proteins involved in lipid metabolism, the acyl-carrier\nproteins. They look like this, and the R group is a protein:</p>\n\n<p><img src=\"/assets/images/Screenshot_20220801_180944.png\" alt=\"\" /></p>\n\n<p>We have quite a few of them in WikiPathway and they also show up in <a href=\"https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:5697\">ChEBI</a> (and likely\nReactome), <a href=\"https://www.lipidmaps.org/databases/lmsd/LMFA07060040?LMID=LMFA07060040\">LIPID MAPS</a>, and\n<a href=\"https://www.kegg.jp/entry/C05764\">KEGG</a>.</p>\n\n<p>During this years Dagstuhl we used up one session to continue working on it (report pending). 
Part of the results is that\n<a href=\"https://www.wikidata.org/\">Wikidata</a> (see doi:<a href=\"https://doi.org/10.7554/eLife.52614\">10.7554/eLife.52614</a> and\ndoi:<a href=\"https://doi.org/10.7554/eLife.70780\">10.7554/eLife.70780</a>) now has <a href=\"https://www.wikidata.org/wiki/Property:P10718\">a property for CXSMILES</a>.\nCDK 2.0 (doi:<a href=\"https://doi.org/10.1186/s13321-017-0220-4\">10.1186/s13321-017-0220-4</a>) already supported CXSMILES and the above image is actually created with\n<a href=\"https://github.com/cdk/depict\">CDK Depict</a> (thx to John!).</p>\n\n<p>So, that means I can now start adding all those ACPs to Wikidata :) Here’s <a href=\"https://www.wikidata.org/wiki/Q113377202\">hexadecanoyl-[acp]</a>\n(or this <a href=\"https://scholia.toolforge.org/chemical-class/Q113377202\">Scholia page</a>):</p>\n\n<p><img src=\"/assets/images/Screenshot_20220801_182345.png\" alt=\"\" /></p>",
      "summary": "Already 3 months ago I visited Dagstuhl for the second time. The weather was much better than in the January right before the start of the pandemic. The first I attended the Computational Metabolomics meeting, with the focus From Cheminformatics to Machine Learning, one of the things we concerned ourselves with was how to do computation with compound classes (see Section 3.6 and this online book). We know how to handle SMILES and we know how to the substructure searching with SMARTS, but what if you have compound classes or lipid classes? Biology is a greasy business.",
      
      "date_published": "2022-08-01T00:00:00+00:00",
      "date_modified": "2022-08-01T00:00:00+00:00",
      "tags": ["cdk","chebi","dagstuhl","epilipidnet","kegg","wikipathways","lipidmaps","metabolomics","smiles","wikidata","cxsmiles"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.7554/ELIFE.52614", "doi": "10.7554/ELIFE.52614"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.7554/ELIFE.70780", "doi": "10.7554/ELIFE.70780"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13321-017-0220-4", "doi": "10.1186/S13321-017-0220-4"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wnect-mj679",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/05/22/new-cas-common-chemistry-in-2021.html",
      "title": "new: &quot;CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community&quot;",
      "content_html": "<p>Open Science is happening. The merits are no longer theoretical or idealistic but tangible. Research is faster than ever, more vetted than ever (think PubPeer),\nmore cited than ever. Fairly, not just because of Open Science, but open access causes readership causes impact causes citations. When new people and\norganizations start adopting Open Science this warms my hearth.</p>\n\n<p>So, when I was asked to work with <a href=\"https://www.cas.org/\">Chemical Abstracts Service</a> (CAS) on a new, bigger than ever version of\n<a href=\"https://commonchemistry.cas.org/\">Common Chemistry</a> (which started as a project between CAS and Wikipedia), I welcomed the project. I don’t quite\nremember the first meetings, but roughly my task became to work with the new content and match this against Wikidata and Wikipedia. It aligned well\nwith <a href=\"https://bridgedb.github.io/\">BridgeDb</a>, <a href=\"https://scholia.toolforge.org/\">Scholia</a>, and our metabolomics research, so I even could find\nsufficient research time for it. This work is now published in the <a href=\"https://pubs.acs.org/journal/jcisd8\">JCIM</a>:\n<em>CAS Common Chemistry in 2021: Expanding Access to Trusted Chemical Information for the Scientific Community</em>\n(doi:<a href=\"https://doi.org/10.1021/acs.jcim.2c00268\">10.1021/acs.jcim.2c00268</a>).</p>\n\n<p><img src=\"/assets/images/images_medium_ci2c00268_0003.png\" alt=\"\" /> <br />\n<em>Figure 2 from the article. Detailed record for caffeine in CAS Common Chemistry (image: CC-BY).</em></p>\n\n<p>About Wikidata, the paper writes (CC-BY):</p>\n\n<blockquote>\n  <p>The latest release of CAS Common Chemistry has also supported updates and corrections to CAS RNs in Wikidata and Wikipedia. (22)\nInChIKeys were calculated from CAS SMILES using Bacting 0.0.31 (23) with the Chemistry Development Kit 2.7.1 (24) and were\nmatched with content in Wikidata. The CAS RNs were then compared. 
References to CAS Common Chemistry were added for CAS RNs\nthat matched. Mismatches have been shared with the Wikidata and Wikipedia communities so that they can manually review and\ncorrect the misleading entries using CAS Common Chemistry as a reference. Because Wikidata also curates identifiers from\nother data sources, validated CAS RNs in Wikidata may also be used to cross-reference with other resources. Scripts are\nprovided in the Supporting Information.</p>\n</blockquote>\n\n<p>The alignment is a continuous process, as new chemical compounds get added to Wikidata on a weekly basis. The comparison of\nCommon Chemistry with Wikidata and Wikipedia resulted in a wealth of curation data, e.g. inconsistent CAS numbers linked to\nInChIKeys, where Common Chemistry had a different match than Wikidata or Wikipedia.</p>\n\n<p>CAS registry numbers were not added to Wikidata in this process, only confirmed or reported as different. The latter\nallowed manual curation by the community, which it did. Reports <a href=\"https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Chemistry/CAS_Validation_Results\">look like this</a>.\nWhen an InChIKey-CAS RN combination in Wikidata was confirmed, it was recorded as a reference, like this:</p>\n\n<p><img src=\"/assets/images/Screenshot_20220522_084233.png\" alt=\"\" /> <br />\n<em>Screenshot of Wikidata with two references, one reflecting a confirmation\nby the English Wikipedia (potentially the result of the original Common Chemistry\nproject) and the second as outcome of the now published project.</em></p>\n\n<p>Thanks to everyone on this project and <a href=\"https://orcid.org/0000-0001-9316-9400\">Andrea Jacobs</a>\nparticularly for leading this open science project.</p>",
      "summary": "Open Science is happening. The merits are no longer theoretical or idealistic but tangible. Research is faster than ever, more vetted than ever (think PubPeer), more cited than ever. Fairly, not just because of Open Science, but open access causes readership causes impact causes citations. When new people and organizations start adopting Open Science this warms my hearth.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/images_medium_ci2c00268_0003.png",
      "date_published": "2022-05-22T00:00:00+00:00",
      "date_modified": "2022-05-22T00:00:00+00:00",
      "tags": ["curation","chemistry","cas","bioclipse","cdk","bridgedb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ACS.JCIM.2C00268", "doi": "10.1021/ACS.JCIM.2C00268"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3f7mc-e0a42",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/05/17/new-providing-adverse-outcome-pathways.html",
      "title": "new: &quot;Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content&quot;",
      "content_html": "<p>I am a bit behind with tweeting about new published papers, but let that not reflect that these papers are not very exciting. The first paper is by\n<a href=\"https://scholar.google.com/citations?user=GvOHiicAAAAJ&amp;hl=en\">Marvin</a> an almost-finished PhD candidate in our group and now working as postdoc on the\n<a href=\"https://vhp4safety.nl/\">VHP4Safety</a> project. He has been working on linking adverse outcome pathways (AOPs) with molecular pathways, such as in\n<a href=\"https://www.wikipathways.org/\">WikiPathways</a>. This work was mostly done as part of the EU projects\n<a href=\"https://openrisknet.org/\">OpenRiskNet</a> and <a href=\"https://cordis.europa.eu/project/id/681002\">EUToxRisk <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, during which he disseminated his research in many directions\n(e.g. the second paper in <a href=\"https://chem-bla-ics.blogspot.com/2022/03/contributions-to-two-new-papers-skin.html\">this post</a>). Talking about impact.</p>\n\n<p>He previously already sketched out the ideas of integration the two kinds of pathways in <a href=\"https://doi.org/10.3389/fgene.2018.00661\">this paper</a>.\nThe implementation of this has now been published in the paper <em>Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format\nto Increase Usability and Accessibility of the Content</em> (doi:<a href=\"https://doi.org/10.1089/aivt.2021.0010\">10.1089/aivt.2021.0010</a>).\nIt’s an important piece of our growing molecular life sciences knowledge graph, which already contains data from WikiPathways and ChEMBL. 
Of course, integrated with other SPARQL endpoints, such as NextProt/UniProt, Rhea, etc.</p>\n\n<p>Schematic diagram from the article showing the kind of information in the database:</p>\n\n<p><img src=\"/assets/images/images_medium_aivt.2021.0010_figure1.png\" alt=\"\" /></p>\n\n<p>Marvin writes: <em>“The resulting RDF contains &gt;122,000 triples describing 158 unique properties of &gt;15,000 unique subjects. Furthermore, &gt;3500 link-outs\nwere added to 12 chemical databases, and &gt;7500 link-outs to 4 gene and protein databases. The AOP-Wiki RDF has been made available at\n<a href=\"https://aopwiki.rdf.bigcat-bioinformatics.org\">https://aopwiki.rdf.bigcat-bioinformatics.org</a>”.</em> The last comes with many example queries.</p>",
      "summary": "I am a bit behind with tweeting about new published papers, but let that not reflect that these papers are not very exciting. The first paper is by Marvin an almost-finished PhD candidate in our group and now working as postdoc on the VHP4Safety project. He has been working on linking adverse outcome pathways (AOPs) with molecular pathways, such as in WikiPathways. This work was mostly done as part of the EU projects OpenRiskNet and EUToxRisk , during which he disseminated his research in many directions (e.g. the second paper in this post). Talking about impact.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/images_medium_aivt.2021.0010_figure1.png",
      "date_published": "2022-05-17T00:00:00+00:00",
      "date_modified": "2022-05-17T00:00:00+00:00",
      "tags": ["openrisknet","eutoxrisk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1089/AIVT.2021.0010", "doi": "10.1089/AIVT.2021.0010"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3389/FGENE.2018.00661", "doi": "10.3389/FGENE.2018.00661"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "OpenRiskNet: Open e-Infrastructure to Support Data Sharing, Knowledge Integration and in silico Analysis and Modelling in Risk Assessment", "acronym" : "OpenRiskNet", "uri" : "cordis.project:731075" }, "funder": { "name": "European Commission", "ror": "00k4n6c32" } },{"award": { "title" : "An Integrated European ‘Flagship’ Program Driving Mechanism-based Toxicity Testing and Risk Assessment for the 21st Century", "acronym" : "EU-ToxRisk", "uri" : "cordis.project:681002" }, "funder": { "name": "European Commission", "ror": "00k4n6c32" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5akm5-x6328",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/04/30/the-international-conference-on.html",
      "title": "The International Conference on Chemical Structures scientific program is online!",
      "content_html": "<p><span style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/iccs_2022_sciprog.png\" /> <br />\nPart of the scientific program of the ICCS 2022.\n</span>\nNow that most speakers confirmed their talk by registering for the conference, it was time to upload the preliminary\n<a href=\"https://web.archive.org/web/20220429210627/https://iccs-nl.org/general-information/scientific-program/\">scientific program <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> of the\n<a href=\"https://iccs-nl.org/\">International Conference on Chemical Structures</a>\n(<a href=\"https://hashtags-hub.toolforge.org/2022ICCS\">#2022ICCS <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>The conference will have 5 sessions:</p>\n\n<ul>\n  <li>Analysis of Large Chemical Data Sets,</li>\n  <li>Structure-Activity and Structure-Property Prediction,</li>\n  <li>Dealing with Biological Complexity,</li>\n  <li>Structure-Based Approaches, and</li>\n  <li>Cheminformatics Approaches.</li>\n</ul>\n\n<p>I am also a bit disappointed that the nanoinformatics community did not submit much work. I guess that like the gaps with the\nnanomedicine community, the link with the cheminformatics community is also a bit weak.</p>\n\n<p>I am also delighted that two <a href=\"http://www.bigcat.unimaas.nl/\">BiGCaT</a> researchers (Denise Slenter and Ammar Ammar) will\npresent their research in Noordwijkerhout. I am looking forward to seeing you all in June. You can\n<a href=\"https://iccs-nl.org/general-information/registration/\">register here</a>.\nKeep in mind the number of places is limited and the registration is filling up quickly.</p>",
      "summary": "Part of the scientific program of the ICCS 2022. Now that most speakers confirmed their talk by registering for the conference, it was time to upload the preliminary scientific program of the International Conference on Chemical Structures (#2022ICCS ).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/iccs_2022_sciprog.png",
      "date_published": "2022-04-30T00:00:00+00:00",
      "date_modified": "2025-06-08T00:00:00+00:00",
      "tags": ["iccs"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/t5wrt-55f66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/04/17/bridgedb-nwo-grant-update-2-building-up.html",
      "title": "BridgeDb NWO grant update #2: building up momentum",
      "content_html": "<p><a href=\"/assets/images/bridgedb_nwo_uml.png\"><img src=\"/assets/images/bridgedb_nwo_uml.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"UML diagram showing the steps in a BridgeDb webservice call.\" /></a>\nLast month I <a href=\"https://chem-bla-ics.linkedchemistry.info/2022/03/05/bridgedb-nwo-grant-update-1-first-steps.html\">reported <i class=\"fa-solid fa-recycle fa-xs\"></i></a> on the start of the\n<a href=\"https://www.nwo.nl/en/researchprogrammes/open-science/open-science-fund\">NWO Open Science grant</a> and it is time for an update. First,\nour grant now has a grant number, <a href=\"https://www.nwo.nl/en/projects/203001121\">203.001.121</a>. For a project that is about identifiers,\nhaving a project identifier is a big deal.</p>\n\n<p>Some updates by Denise, Martina, Tooba, Helena, and me:</p>\n\n<ul>\n  <li>the project proposal was accepted and published in RIO Journal (doi:<a href=\"https://doi.org/10.3897/rio.8.e83031\">10.3897/rio.8.e83031</a>)</li>\n  <li>we started drawing various <a href=\"https://github.com/bridgedb/stories\">BridgeDb stories as UML diagrams</a> using\n<a href=\"https://mermaid-js.github.io/\">Mermaid</a></li>\n  <li>updated the documentation in the <a href=\"https://github.com/bridgedb/bridgedb-webservice\">BridgeDb Webservice repository</a></li>\n  <li>an <a href=\"https://github.com/bridgedb/data/commit/172a9c69ef557e7cb065a138f0fc4f5243615188\">Ensembl 104-based gene/protein ID mapping database</a>\n(doi:<a href=\"10.5281/zenodo.6367091\">10.5281/zenodo.6367091</a>)</li>\n  <li>better unit test coverage of the BridgeDb Java library</li>\n  <li>various <a href=\"https://citation-file-format.github.io/\">CITATION.cff</a> updates</li>\n</ul>\n\n<p>There are some further things cooking, including an updated <a href=\"https://github.com/bridgedb/datasources\">datasources.tsv</a> and a few\n<a href=\"https://github.com/bridgedb/BridgeDb/pulls\">pull 
requests</a>. I expect a new release of the BridgeDb Java library before the end of the month.</p>\n\n<p>With these new results, we also updated <a href=\"https://www.isaac.nwo.nl/\">the ISAAC database</a> for the two new products\n(the published proposal and the gene/protein ID mapping database):</p>\n\n<p><img src=\"/assets/images/bridgedb_nwo_isaac.png\" alt=\"\" /></p>\n\n<p>Right now, the ISAAC database does not make it easy to add content. Instead, there is a series of forms that have to be\nfilled in manually, including separate forms for authors. You cannot simply add a DOI. Well, until recently.\n<a href=\"https://orcid.org/0000-0002-4751-4637\">Lars Willighagen</a> and I developed <a href=\"https://chrome.google.com/webstore/detail/isaac-chrome-extension/kiljfbiapahlahhilgcgfkfjnkgggode\">a Chrome browser add-on</a>\nto help out (also works with Brave), using his awesome <a href=\"https://citation.js.org/\">citation-js</a>\n(doi:<a href=\"https://doi.org/10.7717/peerj-cs.214\">10.7717/peerj-cs.214</a>). The above two entries in the database have\nbeen added using this add-on.</p>\n\n<p>We hope it will help other NWO grant holders too and that the add-on becomes obsolete in the near future, because the ISAAC database needs some updates elsewhere too. For example, it does not seem to value open source and open data so much yet:</p>\n\n<p><img src=\"/assets/images/bridgedb_nwo_isaac_output_types.png\" alt=\"\" /></p>\n\n<p>That is a shame.</p>",
      "summary": "Last month I reported on the start of the NWO Open Science grant and it is time for an update. First, our grant now has a grant number, 203.001.121. For a project that is about identifiers, having a project identifier is a big deal.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bridgedb_nwo_isaac.png",
      "date_published": "2022-04-17T00:00:00+00:00",
      "date_modified": "2025-03-14T00:00:00+00:00",
      "tags": ["bridgedb","openscience","isaac"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3897/RIO.8.E83031", "doi": "10.3897/RIO.8.E83031"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.6367091", "doi": "10.5281/ZENODO.6367091"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.7717/peerj-cs.214", "doi": "10.7717/peerj-cs.214"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "BridgeDb and Wikidata: a powerful combination generating interoperable open research", "uri" : "drc.filenumber:203001121" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/aegfe-5zc88",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/03/05/bridgedb-nwo-grant-update-1-first-steps.html",
      "title": "BridgeDb NWO grant update #1: first steps",
      "content_html": "<p>Last year, Denise, Tina, Marvin, and I received an <a href=\"https://www.nwo.nl/en/researchprogrammes/open-science/open-science-fund\">NWO Open Science</a>\ngrant (<a href=\"https://www.nwo.nl/en/projects/203001121\">203.001.121</a>) to improve the long running BridgeDb project, originally developed by Martijn van Iersel\n(see doi:<a href=\"https://doi.org/10.1186/1471-2105-11-5\">10.1186/1471-2105-11-5</a>). Helena joined our group as research software engineer and will work\npart-time on this grant. We started two weeks ago, so time for an update of results:</p>\n\n<ul>\n  <li>the project started after writing our data management and software sustainability plans (mostly, GitHub+Zenodo)</li>\n  <li>the project proposal has been submitted to <a href=\"https://riojournal.com/\">RIO Journal</a></li>\n  <li>created a private project in the <a href=\"https://gitlab.maastrichtuniversity.nl/\">Maastricht University GitLab</a> instance (with all tasks as issues, so that we can monitor progress)</li>\n  <li>first patches by Helena to the <a href=\"https://github.com/bridgedb/bridgedb\">BridgeDb Java library</a></li>\n  <li>factored out the <a href=\"https://github.com/bridgedb/bridgedb-webservice\">BridgeDb Webservice</a> into a separate (unpretty, see topright screenshot) repository, so that the BridgeDb Java library compiles again</li>\n  <li>Marvin update the <a href=\"https://hub.docker.com/layers/bigcatum/bridgedb/3.0.13.20220304/images/sha256-ad373eae152806d0935b751bcd06216732c7e26d3c34efba5e6a388d48c37087?context=explore\">BridgeDb Docker</a> with the latest BridgeDb 3.0.13 and the latest mapping files</li>\n</ul>\n\n<p>It should be noted that FAIRplus has funded Chris’s team to work on identifier mapping too. 
Luc, Lucas, and now Tooba in our team have been working\non Ensembl-based gene/protein identifier mappings and <a href=\"https://fairplus.github.io/the-fair-cookbook/\">FAIRplus Cookbook</a> recipes.</p>\n\n<p>Not bad, this progress in the first two weeks. We are ready now to start writing unit tests for much of the BridgeDb code. There were some, but a lot of the code is used in production without being formally tested. So far, the number of regressions due to updated libraries (dependencies) has been quite manageable. But with the work planned in this grant, we need more sustainable software, and therefore more unit testing. With the BridgeDb Webservice factored out, the code is compiling again and the code coverage testing is running again too.</p>\n\n<p>The BridgeDb Webservice itself needs a rewrite from scratch, or at least the mapping between the underlying code (which we can reuse) and the REST calls does. The library we used here has never been updated. I spent last weekend trying to figure out how to change the code, but gave up after two days. Rewriting is faster.</p>",
      "summary": "Last year, Denise, Tina, Marvin, and I received an NWO Open Science grant (203.001.121) to improve the long running BridgeDb project, originally developed by Martijn van Iersel (see doi:10.1186/1471-2105-11-5). Helena joined our group as research software engineer and will work part-time on this grant. We started two weeks ago, so time for an update of results:",
      
      "date_published": "2022-03-05T00:00:00+00:00",
      "date_modified": "2025-03-14T00:00:00+00:00",
      "tags": ["grant","bridgedb","openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-11-5", "doi": "10.1186/1471-2105-11-5"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "BridgeDb and Wikidata: a powerful combination generating interoperable open research", "uri" : "drc.filenumber:203001121" }, "funder": { "name": "Dutch Research Council", "ror": "04jsz6e67" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5z9yt-vy941",
      "url": "https://chem-bla-ics.linkedchemistry.info/2021/11/15/biohackathon-europe-2021-1-cito.html",
      "title": "BioHackathon Europe 2021 #1: CiTO annotations in BioHackrXiv",
      "content_html": "<p>Serendipity. I did not plan this hack at the <a href=\"https://biohackathon-europe.org/\">BioHackathon Europe 2021</a> but it happened anyway.\nBased on earlier work in the <a href=\"https://www.biomedcentral.com/collections/cito\">Journal of Cheminformatics</a>, extending on the\n<a href=\"https://doi.org/10.7717/peerj-cs.112\">work by Krewinkel et al.</a> I looked into the idea of using the Lua filter for\n<a href=\"https://biohackrxiv.org/\">BioHackrXiv</a>, a preprint server for BioHackathons. Actually, I started by looking at the\nCitation Styling Language file used by the BioHackrXiv tools. But that was just wrong.</p>\n\n<p>Long story short: <a href=\"https://github.com/biohackrxiv/bhxiv-gen-pdf/pull/10\">it worked</a>! Thanks to the encouragements from\n<a href=\"https://github.com/pjotrp\">Pjotr</a> and <a href=\"https://github.com/inutano\">Tazro</a> and suggestions from\n<a href=\"https://twitter.com/larswillighagen/status/1458059589925187585\">Lars</a> and some code on how to\n<a href=\"http://lua-users.org/wiki/TableUtils\">dump a Lua data structure to stdout</a>.</p>\n\n<p>In the Markdown/BibTeX combination you would normally write <code class=\"language-plaintext highlighter-rouge\">[@bibtexkey]</code> to add the reference to the article with the given key\nin the <code class=\"language-plaintext highlighter-rouge\">.bib</code> file. To type the citation (to state the intention why you cite that source), for example because you use a method\nin it, you write <code class=\"language-plaintext highlighter-rouge\">[@usesMethodIn:bibtexkey]</code>. This is different from\n<a href=\"https://github.com/jcheminform/markdown-jcheminf\">how it currently works for the Journal of Cheminformatics</a>,\nwhere the intention cannot be given at citation level yet. You can even use more than one intention, e.g. 
<code class=\"language-plaintext highlighter-rouge\">[@usesMethodIn:extends:bibtexkey]</code>.</p>\n\n<p>If you want to try it, just create a compatible Markdown file with BibTeX file in a new GitHub repository, and post the repository URL on\nthis <a href=\"http://preview.biohackrxiv.org/\">cool preview website</a>.</p>\n\n<p>Here’s what the created PDF could look like:</p>\n\n<p><img src=\"/assets/images/citoBioHackrXiv.png\" alt=\"\" /></p>",
      "summary": "Serendipity. I did not plan this hack at the BioHackathon Europe 2021 but it happened anyway. Based on earlier work in the Journal of Cheminformatics, extending on the work by Krewinkel et al. I looked into the idea of using the Lua filter for BioHackrXiv, a preprint server for BioHackathons. Actually, I started by looking at the Citation Styling Language file used by the BioHackrXiv tools. But that was just wrong.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/citoBioHackrXiv.png",
      "date_published": "2021-11-15T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cito","biohackrxiv","markdown","pandoc","biohackeu12"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.7717/peerj-cs.112", "doi": "10.7717/peerj-cs.112"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q8zfm-45s98",
      "url": "https://chem-bla-ics.linkedchemistry.info/2021/08/28/scholarly-journals-should-use-archived.html",
      "title": "Scholarly journals should use &quot;Archived on&quot; instead of &quot;Accessed on&quot;",
      "content_html": "<p>Publishing habits changes very slowly, too slowly. The whole industry is incredibly inert, which can lead to severe frustration\nas <a href=\"https://chem-bla-ics.linkedchemistry.info/2021/06/11/conflict-of-interest-or-why-i-am.html\">it did for me <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. But sometimes small\nchanges can do so much.</p>\n\n<p>Linkrot, the phenomenon that URLs are not persistent, has been studied, including the in scholarly settings (see\n<a href=\"https://doi.org/10.3998/3336451.0004.210\">1998</a>,\n<a href=\"https://www.jstor.org/stable/20863780\">2000</a>,\n<a href=\"https://doi.org/10.1353/pla.2003.0098\">2003</a>,\n<a href=\"https://doi.org/10.1002/bmb.2003.494031010165\">2006</a>,\n<a href=\"https://doi.org/10.1300/J123v49n03_10\">2008</a>,\n<a href=\"https://doi.org/10.1371/journal.pone.0115253\">2014</a>,\n<a href=\"https://doi.org/10.18329/09757597/2015/8105\">2015</a>,\n<a href=\"https://doi.org/10.1108/GKMC-06-2019-0067\">2000</a>,\n<a href=\"https://journal.code4lib.org/articles/15509\">2021</a>,\nand probably many more). Indeed, scholarly publishers started introducing the following: URLs should be accompanied with an\n“accessed on” statement. Indeed, you can find this in many bibliographic formatting standards.</p>\n\n<p>Indeed, this must change, and we already have a solution <a href=\"https://en.wikipedia.org/wiki/Internet_Archive\">since 1996</a>:\nthe <a href=\"https://archive.org/web/\">Internet Archive</a> (tho the archive goes back much longer). I call all publishers to change\ntheir “Accessed on” to “Archived on”. Two simpel solutions that can compliment each other:</p>\n\n<h2 id=\"authors-archive-upon-submission\">Authors archive upon submission</h2>\n\n<p>This solution is simply introduced by updating author guidelines. 
Surely it will take a bit of time for bibliography software\nto be updated, and for the time being we still write “Accessed on” until there is proper support of “Archived on”.</p>\n\n<h2 id=\"journals-archive-upon-acceptance\">Journals archive upon acceptance</h2>\n\n<p>This solution looks for all URLs in journal articles and archives them. It doesn’t matter if the author already did this,\nbecause the Internet Archive has no trouble handling this:</p>\n\n<p><img src=\"/assets/images/Screenshot_20210828_102732.png\" alt=\"\" /></p>\n\n<center>Screenshot of the WaybackMachine showing <a href=\"https://web.archive.org/web/19990615000000*/sci.kun.nl\">many captures of the sci.kun.nl domain.</a></center>\n<p><br /></p>\n\n<p>BTW, projects like Wikipedia have <a href=\"https://meta.wikimedia.org/wiki/InternetArchiveBot\">automated the process</a> of\narchiving URLs and I see no reason why publishers could not do this.</p>",
      "summary": "Publishing habits changes very slowly, too slowly. The whole industry is incredibly inert, which can lead to severe frustration as it did for me . But sometimes small changes can do so much.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20210828_102732.png",
      "date_published": "2021-08-28T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3998/3336451.0004.210", "doi": "10.3998/3336451.0004.210"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1353/pla.2003.0098", "doi": "10.1353/pla.2003.0098"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1002/bmb.2003.494031010165", "doi": "10.1002/bmb.2003.494031010165"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1300/J123v49n03_10", "doi": "10.1300/J123v49n03_10"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pone.0115253", "doi": "10.1371/journal.pone.0115253"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.18329/09757597/2015/8105", "doi": "10.18329/09757597/2015/8105"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1108/GKMC-06-2019-0067", "doi": "10.1108/GKMC-06-2019-0067"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/daewm-efm18",
      "url": "https://chem-bla-ics.linkedchemistry.info/2021/06/11/conflict-of-interest-or-why-i-am.html",
      "title": "Conflict of Interest. Or why I am stepping down as Editor-in-Chief of the Journal of Cheminformatics.",
      "content_html": "<p><span style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/jcheminfTimeline.png\" /> <br />\nRough timeline of the <br /> Journal of Cheminformatics. <br />\n<i>The linked PDF has linked years <br /> with references. <br /></i>\n</span>\nIn this open letter, I will explain why I intend to step down as Editor-in-Chief of the <em><a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a></em>,\nwhich also happens to be a Springer Nature journal. It took me two years to come to this decision, and it cannot be claimed that I did not carefully\nevaluate the various aspects of it. However, I have now come to the conclusion that the opportunity it gives me to implement my ambition to shape open\nscience chemistry now conflicts with the interests of Springer Nature. I will here outline some of the things I have taken into consideration.</p>\n\n<p>… <a href=\"https://doi.org/10.5281/zenodo.4926030\">read more in the full letter on Zenodo</a>.</p>\n\n<h2 id=\"a-personal-note-and-thank-you\">A personal note and thank you</h2>\n<p>I will remain Editor-in-Chief until the end of the year. The journal is doing awesome things and if I get the chance, I will likely continue help the\njournal become even more open science.</p>\n\n<p>Thanks also to <strong>Chris</strong> and <strong>David</strong> for starting the journal. It has been an honor and pleasure to follow their steps. Also a huge thanks to\n<strong>Rajarshi</strong> with whom we have been Editor-in-Chief for close to five years now. It has been a great pleasure to work so actively on Open Science.\nAlso thanks to <strong>Nina</strong> and <strong>Barbara</strong> to joined me and Rajarshi as associate editor and have helped shape the journal. It is comforting I leave\nthe journal with an excellent team. 
Also, a big thanks to <strong>Samuel</strong> and <strong>Matthew</strong> who have been helping us get the Open Science things done\nand to the current and past <a href=\"https://jcheminf.biomedcentral.com/about/editorial-board\">Editorial Board</a> members for important discussions in the\ntelcons. Finally, a thank you to all the <strong>authors and reviewers</strong> who have all been working so hard to make the journal a success!</p>",
      "summary": "Rough timeline of the Journal of Cheminformatics. The linked PDF has linked years with references. In this open letter, I will explain why I intend to step down as Editor-in-Chief of the Journal of Cheminformatics, which also happens to be a Springer Nature journal. It took me two years to come to this decision, and it cannot be claimed that I did not carefully evaluate the various aspects of it. However, I have now come to the conclusion that the opportunity it gives me to implement my ambition to shape open science chemistry now conflicts with the interests of Springer Nature. I will here outline some of the things I have taken into consideration.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcheminfTimeline.png",
      "date_published": "2021-06-11T00:00:00+00:00",
      "date_modified": "2021-06-11T00:00:00+00:00",
      "tags": ["jcheminf","publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.4926030", "doi": "10.5281/ZENODO.4926030"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/erxg2-sm862",
      "url": "https://chem-bla-ics.linkedchemistry.info/2021/02/16/downloading-all-currently-released.html",
      "title": "Downloading all currently released BridgeDb identifier mapping databases",
      "content_html": "<p>The <a href=\"https://bridgedb.github.io/\">BridgeDb</a> project (doi:<a href=\"https://doi.org/10.1186/1471-2105-11-5\">10.1186/1471-2105-11-5</a>)\n(and <a href=\"https://elixir-europe.org/platforms/interoperability/rirs\">ELIXIR recommended interoperability resource</a>) has several\naims, all around identifier mapping:</p>\n\n<ul>\n  <li>provide a Java API for identifier mapping</li>\n  <li>provide ID mappings (two flavors: with and without semantic meaning)</li>\n  <li>provide services (<a href=\"https://www.bioconductor.org/packages/release/bioc/html/BridgeDbR.html\">R package</a>,\n<a href=\"http://webservice.bridgedb.org/\">OpenAPI webservice</a>)</li>\n  <li>track the history of identifiers</li>\n</ul>\n\n<p>The last one is more recent and two aspects are under development here: secondary identifiers and dead identifiers. More\nabout that in some future post. About the first and the third I am also not going to tell much in this post. Just follow the\nabove links.</p>\n\n<p>I do want to say something in this post about the actually identifier mapping databases, in particular those we distribute as\nApache Derby files, the storage format used by the Java libraries. 
These are the files you download if you want mapping databases\nfor <a href=\"https://pathvisio.github.io/\">PathVisio</a> (doi:<a href=\"https://doi.org/10.1371/journal.pcbi.1004085\">10.1371/journal.pcbi.1004085</a>).\nBridgeDb has mapping files for various things. Some examples, with the databases it maps between:</p>\n\n<ol>\n  <li>genes and proteins: Ensembl, UniProt, NCBI Gene</li>\n  <li>metabolites: HMDB, ChEBI, LIPID MAPS, Wikidata, CAS</li>\n  <li>publications: DOI, PubMed</li>\n  <li>macromolecular complexes: Complex Portal, Wikidata</li>\n</ol>\n\n<p>The BridgeDb API is agnostic to the things it can map identifiers for.</p>\n\n<p><strong>Downloading mapping files</strong>:\nBridgeDb has a <a href=\"https://bioschemas.org/\">BioSchemas</a>-powered\n<a href=\"https://bridgedb.github.io/data/gene_database/\">web page with an overview of the latest released mapping files</a>.\nIt looks like this:</p>\n\n<p><img src=\"/assets/images/bridgedbDownloadsImage.png\" alt=\"\" /></p>\n\n<p>This webpage is a result of the cyber attack in late 2019, which disrupted a good bit of the infrastructure. This is why we\nrenewed the website, including the download page. The new page actually is hosted <a href=\"https://github.com/bridgedb/data\">on GitHub as a Markdown file</a>,\nbut this is where things get interesting. The Markdown file is actually autogenerated from a JSON file with all the info. Everything,\nincluding the BioSchemas annotation, is created from that. Basically, JSON gets converted into Markdown (with a custom script), which\ngets converted into HTML by a GitHub Action/Pages. So, when someone releases a new mapping file on Zenodo or Figshare, they only have\nto send me a pull request with the updated JSON file.</p>\n\n<p>Now, previously, downloading all released mapping files, for example for the BridgeDb webservice, was a bit complicated. The\ninformation was in an HTML file generated by the webserver for a folder. No metadata. 
Nuno wrote code to extract the relevant info\nand download all the files. However, since the information is now available in a public JSON file, it is a lot easier. The\nfollowing code uses wget and jq, two tools readily available on popular operating systems. Have fun!</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c\">#!/bin/bash</span>\n\nwget <span class=\"nt\">-nc</span> https://bridgedb.github.io/data/gene.json\nwget <span class=\"nt\">-nc</span> https://bridgedb.github.io/data/corona.json\nwget <span class=\"nt\">-nc</span> https://bridgedb.github.io/data/other.json\n\njq <span class=\"nt\">-r</span> <span class=\"s1\">'.mappingFiles | .[] | \"\\(.file)=\\(.downloadURL)\"'</span> gene.json <span class=\"o\">&gt;</span> files.txt\njq <span class=\"nt\">-r</span> <span class=\"s1\">'.mappingFiles | .[] | \"\\(.file)=\\(.downloadURL)\"'</span> corona.json <span class=\"o\">&gt;&gt;</span> files.txt\njq <span class=\"nt\">-r</span> <span class=\"s1\">'.mappingFiles | .[] | \"\\(.file)=\\(.downloadURL)\"'</span> other.json <span class=\"o\">&gt;&gt;</span> files.txt\n\n<span class=\"k\">for </span>FILE <span class=\"k\">in</span> <span class=\"si\">$(</span><span class=\"nb\">cat </span>files.txt<span class=\"si\">)</span>\n<span class=\"k\">do\n  </span>readarray <span class=\"nt\">-d</span> <span class=\"o\">=</span> <span class=\"nt\">-t</span> splitFILE<span class=\"o\">&lt;&lt;&lt;</span> <span class=\"s2\">\"</span><span class=\"nv\">$FILE</span><span class=\"s2\">\"</span>\n  <span class=\"nb\">echo</span> <span class=\"k\">${</span><span class=\"nv\">splitFILE</span><span class=\"p\">[0]</span><span class=\"k\">}</span>\n  wget <span class=\"nt\">-nc</span> <span class=\"nt\">-O</span> <span class=\"k\">${</span><span class=\"nv\">splitFILE</span><span class=\"p\">[0]</span><span class=\"k\">}</span> <span class=\"k\">${</span><span class=\"nv\">splitFILE</span><span 
class=\"p\">[1]</span><span class=\"k\">}</span>\n<span class=\"k\">done</span>\n</code></pre></div></div>\n\n<p>Actually, while writing this blog post, I notice the code can be further simplified.</p>",
      "summary": "The BridgeDb project (doi:10.1186/1471-2105-11-5) (and ELIXIR recommended interoperability resource) has several aims, all around identifier mapping:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bridgedbDownloadsImage.png",
      "date_published": "2021-02-16T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["bridgedb","json"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-11-5", "doi": "10.1186/1471-2105-11-5"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1004085", "doi": "10.1371/journal.pcbi.1004085"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/40an0-kne14",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/11/30/cito-updates-3-third-paper-in.html",
      "title": "CiTO updates #3: third paper in the collection and updated Scholia patch",
      "content_html": "<p>Last week <a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00474-z\">the third paper</a> got published\nin the <a href=\"https://www.biomedcentral.com/collections/cito\">Citation Typing Ontology Collection</a> and this weekend\nI finished adding the citation annotations to Wikidata.</p>\n\n<p>While the number of papers in the <a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a> is only\n<a href=\"http://127.0.0.1:8100/venue/Q6294930/cito\">slowly growing</a>, the number of journals receiving annotated citations\nis growing faster. And there are 70 now:</p>\n\n<p><img src=\"/assets/images/Screenshot_20201130_085511.png\" alt=\"\" /></p>\n\n<p>The <a href=\"https://github.com/fnielsen/scholia/pull/1289\">Scholia patch</a> needed for this updated table is not online yet.</p>",
      "summary": "Last week the third paper got published in the Citation Typing Ontology Collection and this weekend I finished adding the citation annotations to Wikidata.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20201130_085511.png",
      "date_published": "2020-11-30T00:00:00+00:00",
      "date_modified": "2020-11-30T00:00:00+00:00",
      "tags": ["cito","scholia"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/s13321-020-00474-z", "doi": "10.1186/s13321-020-00474-z"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "To support development on Scholia, a software tool to facilitate the exploration and curation of the research literature", "acronym" : "Scholia", "uri" : "https://sloan.org/grant-detail/G-2019-11458" }, "funder": { "name": "Alfred P. Sloan Foundation", "ror": "052csg198" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/taepn-e3n12",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/11/28/new-paper-wikipathways-connecting.html",
      "title": "new paper: &quot;WikiPathways: connecting communities&quot;",
      "content_html": "<p><span style=\"width: 45%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/gkaa1024fig2.jpeg\" /> <br />\nThe number of revisions and contributors for all pathways in the human pathway analysis collection.\n</span></p>\n\n<p>The last WikiPathways was already 3 years ago, an often used frequency for Nucleic Acids Research updates.\nSo, time for an update, and what an updates we had: <em>WikiPathways: connecting communities</em>\n(doi:<a href=\"https://doi.org/10.1093/nar/gkaa1024\">10.1093/nar/gkaa1024</a>). This update focuses on the open,\ncollaborative nature of <a href=\"http://wikipathways.org/\">WikiPathway</a> and on the growing role of the portals,\nlike the <a href=\"http://lipids.wikipathways.org/\">lipids portal</a>, the <a href=\"http://aop.wikipathways.org/\">AOP portal</a>,\nthe <a href=\"http://nanomaterials.wikipathways.org/\">nanomaterials portal</a>, and the\n<a href=\"http://iem.wikipathways.org/\">inborn errors of metabolism (IEM) portal</a>. There is also a lot happening\nin the background, to make our tools better (much needed), our curation support better (in the future\navailable in multiple ways), our data model better, and our dissemination even better (e.g. with\nScholia/Toolforge and nanopublications). A huge thanks to <a href=\"https://orcid.org/0000-0003-2230-0840\">Marvin</a>\nand <a href=\"https://orcid.org/0000-0002-7699-8191\">Tina</a> to get everything together. Finally, if you haven’t\nrecently checked the WikiPathways SPARQL endpoint, read <a href=\"https://doi.org/10.1093/nar/gkaa1024\">the paper</a> :)</p>",
      "summary": "The number of revisions and contributors for all pathways in the human pathway analysis collection.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/gkaa1024fig2.jpeg",
      "date_published": "2020-11-28T00:00:00+00:00",
      "date_modified": "2020-11-28T00:00:00+00:00",
      "tags": ["wikipathways","lipidmaps","aop","scholia","nanosolveit","riskgone"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKAA1024", "doi": "10.1093/NAR/GKAA1024"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6v7gp-pfd21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/11/01/cito-updates-2-annotation-migration-to.html",
      "title": "CiTO updates #2: annotation migration to Wikidata and first Scholia patch",
      "content_html": "<p>During the time of the editorial about the Journal of Cheminformatics\n<a href=\"https://chem-bla-ics.blogspot.com/2020/07/new-editorial-adoption-of-citation.html\">Citation Typing Ontology (CiTO) Pilot</a>\nI already worked out a model to add CiTO annotation in Wikidata. It looks like this for\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2020/11/01/cito-updates-1-first-research-paper-in.html\">the first research article with annotation <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<p><img src=\"/assets/images/cito_in_wikidata.png\" alt=\"\" /></p>\n\n<p>At the time I also write some <a href=\"https://github.com/egonw/cito-wikidata-queries\">SPARQL queries against Wikidata</a> to\nsummaries the current use. There are, for example, at this moment <a href=\"https://query.wikidata.org/#SELECT%20%3FcitingArticle%20%3FcitingArticleLabel%20%3FcitedArticle%20%3FcitedArticleLabel%20%3Fintention%20%3FintentionLabel%20%3Fcito%20WHERE%20%7B%0A%20%20%3FcitingArticle%20p%3AP2860%20%3FcitationStatement%20.%0A%20%20%3FcitationStatement%20pq%3AP3712%20%3Fintention%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20ps%3AP2860%20%3FcitedArticle%20.%0A%20%20%3Fintention%20wdt%3AP31%20wd%3AQ96471816%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP2888%20%3Fcito%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D\">128 CiTO annotations in Wikidata</a>\n(with the above model). 
At this moment the citation intention <a href=\"https://query.wikidata.org/embed.html#SELECT%20%3Fcito%20%3FintentionLabel%20(COUNT(DISTINCT%20%3FcitingArticle)%20AS%20%3Fcount)%20WHERE%20%7B%0A%20%20%3FcitingArticle%20p%3AP2860%20%3FcitationStatement%20.%0A%20%20%3FcitationStatement%20pq%3AP3712%20%3Fintention%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20ps%3AP2860%20%3FcitedArticle%20.%0A%20%20%3Fintention%20wdt%3AP31%20wd%3AQ96471816%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP2888%20%3Fcito%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D%20GROUP%20BY%20%3Fcito%20%3Fintention%20%3FintentionLabel%0A%20%20ORDER%20BY%20DESC(%3Fcount)\">“uses method in cited work” is currently the most common</a>.\nAnd 20 journals now have one or more articles with CiTO annotation, with the <a href=\"https://query.wikidata.org/embed.html#SELECT%20%3Fjournal%20%3FjournalLabel%20(COUNT(DISTINCT%20%3FcitingArticle)%20AS%20%3Fcount)%20WHERE%20%7B%0A%20%20%3FcitingArticle%20p%3AP2860%20%3FcitationStatement%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20wdt%3AP1433%20%3Fjournal%20.%0A%20%20%3FcitationStatement%20pq%3AP3712%20%3Fintention%20%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20ps%3AP2860%20%3FcitedArticle%20.%0A%20%20%3Fintention%20wdt%3AP31%20wd%3AQ96471816%20.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22%5BAUTO_LANGUAGE%5D%2Cen%22.%20%7D%0A%7D%20GROUP%20BY%20%3Fjournal%20%3FjournalLabel%0A%20%20ORDER%20BY%20DESC(%3Fcount)\">Journal of Cheminformatics the most</a>\n(no surprise).</p>\n\n<h2 id=\"scholia\">Scholia</h2>\n\n<p>Next up is to enrich <a href=\"https://scholia.toolforge.org/\">Scholia</a>. This may be a bit tricky at this moment, with the annotation not\nbeing very abundant at this moment. 
However, I have started a patch (WIP, <strong>w</strong>ork <strong>i</strong>n <strong>p</strong>rogress) to show CiTO information.\nThe first step is an extension to the venue aspect, here in action (locally) for the Journal of Cheminformatics:</p>\n\n<p><img src=\"/assets/images/cito_jcheminf.png\" alt=\"\" /></p>\n\n<p>What we learn from this bubble graph is that, at this moment, ‘updates the cited work’ is the most common annotation of\narticles citing J. Cheminform. articles. Similar pages will have to be developed for works, authors, etc.</p>\n\n<p>This Scholia work, btw, was funded by the Alfred P. Sloan Foundation under grant number <a href=\"https://sloan.org/grant-detail/8961\">G-2019-11458</a>.</p>",
      "summary": "During the time of the editorial about the Journal of Cheminformatics Citation Typing Ontology (CiTO) Pilot I already worked out a model to add CiTO annotation in Wikidata. It looks like this for the first research article with annotation :",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cito_in_wikidata.png",
      "date_published": "2020-11-01T00:10:00+00:00",
      "date_modified": "2025-12-29T00:00:00+00:00",
      "tags": ["cito","wikidata","scholia"],
      
      "_funding": [{"award": { "title" : "To support development on Scholia, a software tool to facilitate the exploration and curation of the research literature", "acronym" : "Scholia", "uri" : "https://sloan.org/grant-detail/G-2019-11458" }, "funder": { "name": "Alfred P. Sloan Foundation", "ror": "052csg198" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/w03hn-k8p90",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/11/01/cito-updates-1-first-research-paper-in.html",
      "title": "CiTO updates #1: first research paper in the Journal of Cheminformatics with CiTO annotation published",
      "content_html": "<p>After a time of exploration of technical needs, ideas, and plans, the <a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a> launched\n<a href=\"https://chem-bla-ics.blogspot.com/2020/07/new-editorial-adoption-of-citation.html\">its Citation Typing Ontology (CiTO) Pilot</a> this summer\n(doi:<a href=\"https://doi.org/10.1186/s13321-020-00448-1\">10.1186/s13321-020-00448-1</a>). I am very excited about this, because the CiTO tells us why we\nare citing literature. We are a very long way away from publishing industry adoption, but we have to start somewhere. Laeeq Ahmed <em>et al.</em>\npublished a few weeks ago the first research article with CiTO annotation of references\n(“<a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00464-1\">Predicting target profiles with confidence as a service using docking scores</a>”)!</p>\n\n<p><img src=\"/assets/images/Screenshot_20201101_144808.png\" alt=\"\" /></p>\n\n<p>Of course, I also have to show a screenshot of what the annotation actually looks like, so here goes:</p>\n\n<p><img src=\"/assets/images/Screenshot_20201101_145124.png\" alt=\"\" /></p>\n\n<p>Thanks to the authors for adding these annotations!</p>",
      "summary": "After a time of exploration of technical needs, ideas, and plans, the Journal of Cheminformatics launched its Citation Typing Ontology (CiTO) Pilot this summer (doi:10.1186/s13321-020-00448-1). I am very excited about this, because the CiTO tells us why we are citing literature. We are a very long way away from publishing industry adoption, but we have to start somewhere. Laeeq Ahmed et al. published a few weeks ago the first research article with CiTO annotation of references (“Predicting target profiles with confidence as a service using docking scores”)!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20201101_145124.png",
      "date_published": "2020-11-01T00:00:00+00:00",
      "date_modified": "2020-11-01T00:00:00+00:00",
      "tags": ["cito","jcheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/S13321-020-00448-1", "doi": "10.1186/S13321-020-00448-1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-020-00464-1", "doi": "10.1186/s13321-020-00464-1"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/krqxd-hfw15",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/10/31/sars-cov-2-covid-19-and-open-science.html",
      "title": "SARS-CoV-2, COVID-19, and Open Science",
      "content_html": "<div style=\"float: right; width: 300px\">\n<img src=\"/assets/images/Screenshot_20201031_100753.png\" /><br /><br />\n<span style=\"font-size: smaller; text-wrap: break-word;\"><a href=\"https://www.wikipathways.org/index.php/Pathway:WP4846\">WP4846</a> that I started on March 16. It will see a massive overhaul in the next weeks.</span></div>\n<p>Voices are <a href=\"https://www.robertschuwer.nl/?p=3116\">getting stronger</a> over <a href=\"https://twitter.com/RetractionWatch/status/1321512267855339520\">how important Open Science</a>\nis. Insiders have known the advantages for decades. We also know the issues in the transition, but the transition has been steady. Contributing to Open Science is\nsimple: there are plenty of projects where you can contribute without jeopardizing your own research (funding or prestige). My own small contributions have been\ndone without funding too. But I needed to do something. I have been mostly <a href=\"https://chem-bla-ics.blogspot.com/2020/03/sars-cov-2-stuck-at-home-flu-and.html\">self-quarantined since March 6</a>,\nwith only very few exceptions. And I’m so done with it. Like so many other people. It won’t stop me wearing masks when I go shopping (etc).</p>\n\n<p>Reflecting on the past eight months, particularly the last two months have been tough. It’s easier to sit at home and in the garden when it is warm and light outside. But for another 7 weeks or so, the days will only get darker. The past two months were also so busy with grant reporting that I did not get around to much else, even with an uncommonly long stretch of long working weeks, with about 8 weeks of 70-80 hrs of active work in that period. In fact, the past two weeks, with most of the deadlines past, I had a physical reset, and was happy if I made 40 hrs a week.</p>\n\n<p>So, where is my COVID-19 work now, where is it going?</p>\n\n<h2 id=\"molecular-pathways\">Molecular Pathways</h2>\n\n<p>First, what did we achieve? 
First, leveraging the Open Science community I am involved in, I started collaborating, with old friends and while making new\nfriends. I was delighted to see I was not the only one. In fact, somewhere in May/June, I had to give up following all Open Science around COVID-19,\nbecause there was too much.</p>\n\n<p>For example, I was not the only one wanting to describe our slowly developing molecular knowledge of the\n<a href=\"https://scholia.toolforge.org/topic/Q82069695\">SARS-CoV-2</a> virus. While my pathway focused specifically on the confirmed processes for SARS-CoV-2,\nmy colleague Freddie digitized a recent review about other coronaviruses. Check out her work:\n<a href=\"https://www.wikipathways.org/index.php/Pathway:WP4863\">WP4863</a>, <a href=\"https://www.wikipathways.org/index.php/Pathway:WP4864\">WP4864</a>,\n<a href=\"https://www.wikipathways.org/index.php/Pathway:WP4877\">WP4877</a>, <a href=\"https://www.wikipathways.org/index.php/Pathway:WP4880\">WP4880</a>,\nand <a href=\"https://www.wikipathways.org/index.php/Pathway:WP4912\">WP4912</a>. In fact, so much was done by so many people in such a short time, that the\n<a href=\"http://covid.wikipathways.org/\">WikiPathways COVID-19 Portal</a> was set up.</p>\n\n<p>Further reading:</p>\n<ul>\n  <li>Ostaszewski M. COVID-19 Disease Map, a computational knowledge repository of SARS-CoV-2 virus-host interaction mechanisms. bioRxiv. 2020 Oct 28;\n<a href=\"https://doi.org/10.1101/2020.10.26.356014v1\">10.1101/2020.10.26.356014v1</a> (and unversioned <a href=\"https://doi.org/10.1101/2020.10.26.356014\">10.1101/2020.10.26.356014</a>)</li>\n  <li>Ostaszewski M, Mazein A, Gillespie ME, Kuperstein I, Niarakis A, Hermjakob H, et al. COVID-19 Disease Map, building a computational\nrepository of SARS-CoV-2 virus-host interaction mechanisms. Sci Data. 
2020 May 5;7(1):136\n<a href=\"https://doi.org/10.1038/s41597-020-0477-8\">10.1038/s41597-020-0477-8</a></li>\n</ul>\n\n<h2 id=\"interoperability-with-wikidata\">Interoperability with Wikidata</h2>\n\n<div style=\"float: right; width: 300px\"><img src=\"/assets/images/Screenshot_20201031_104646.png\" /></div>\n<p>Because I see an <a href=\"https://chem-bla-ics.blogspot.com/2020/03/new-paper-wikidata-as-knowledge-graph.html\">essential role</a> for\n<a href=\"https://wikidata.org/\">Wikidata</a> in Open Science, and because regular databases did not provide identifiers for the molecular building blocks,\nwe created them in Wikidata. This was essential, because I wanted to use <a href=\"https://scholia.toolforge.org/\">Scholia</a> (see screenshot on the\nright) to track the research output (something that by now has become quite a challenge; btw, check out\n<a href=\"https://laurendupuis.github.io/Scholia_tutorial/\">Lauren’s tutorial on this</a>). This too was\n<a href=\"https://chem-bla-ics.blogspot.com/2020/03/talking-sars-cov-2-with-big-data.html\">still in March</a>. However, because Scholia itself is a\ngeneral tool, I needed shortlists of all SARS-CoV-2 genes, all proteins, etc. So, I created <a href=\"https://egonw.github.io/SARS-CoV-2-Queries/\">this book</a>.\nIt’s autogenerated and auto-updated by taking advantage of SPARQL queries against Wikidata. And I am so excited the book has been translated into\n<a href=\"https://egonw.github.io/SARS-CoV-2-Queries/ja/\">Japanese</a>, <a href=\"https://egonw.github.io/SARS-CoV-2-Queries/pt/\">Portuguese</a>, and\n<a href=\"https://egonw.github.io/SARS-CoV-2-Queries/es/\">Spanish</a>. 
The i18n work is thanks to the\n<a href=\"https://github.com/virtual-biohackathons/covid-19-bh20\">virtual BioHackathon in April</a>, where Yayamamo bootstrapped the framework\nto localize the content.</p>\n\n<p>Also during that BioHackathon, we started a collaboration with <a href=\"https://www.ebi.ac.uk/complexportal/home\">Complex Portal</a>’s\nBirgit, because the next step was to have identifiers for (bio)molecular complexes. This work is still ongoing, but using a\nworkaround we developed for <a href=\"https://wikipathways.org/\">WikiPathways</a> (because complexes in GPML currently cannot have\nidentifiers), we can now link out to Complex Portal, as visible in this screenshot:</p>\n\n<p><img src=\"/assets/images/Screenshot_20200408_235320.png\" alt=\"\" /></p>\n\n<center><div style=\"font-size: smaller; text-wrap: break-word;\">The autophagy initiation complex has the\n<a href=\"https://www.ebi.ac.uk/complexportal/complex/CPX-373\">CPX-373</a> identifier in Complex Portal.</div><br /></center>\n\n<p>Joining the Wikidata effort is simple. Just visit <a href=\"https://www.wikidata.org/wiki/Wikidata:WikiProject_COVID-19\">Wikidata:WikiProject_COVID-19</a>\nand find your thing of interest. Because the past two months have been so crowded, I still did not get around to exploring the\n<a href=\"https://github.com/Knowledge-Graph-Hub/kg-covid-19\">kg-covid-19 project</a>, but it sounds very interesting too!</p>\n\n<p>Further reading:</p>\n<ul>\n  <li>Waagmeester A, Stupp G, Burgstaller-Muehlbacher S, Good BM, Griffith M, Griffith OL, et al. Wikidata as a knowledge graph\nfor the life sciences. eLife. 2020 Mar 17;9:e52614. <a href=\"https://doi.org/10.7554/eLife.52614\">10.7554/eLife.52614</a></li>\n  <li>Waagmeester A, Willighagen EL, Su AI, Kutmon M, Labra Gayo JE, Fernández-Álvarez D, et al. A protocol for adding\nknowledge to Wikidata, a case report. bioRxiv [Internet]. 
2020 Apr 7 [cited 2020 Apr 17];\n<a href=\"https://doi.org/10.1101/2020.04.05.026336\">10.1101/2020.04.05.026336</a></li>\n</ul>\n\n<h2 id=\"computer-assisted-data-curation\">Computer-assisted data curation</h2>\n\n<p>For some years now, I have been working on computer-assisted data curation of WikiPathways, but also Wikidata (for chemical compounds).\nOnce your biological knowledge is machine readable, you can teach machines to recognize common mistakes. Some are basically\nsimple checks, like missing information. But it gets exciting if we take advantage of linked data, and we can have machines check\nconsistency between two or more resources. The better our annotation, the more powerful this computer-assisted data curation becomes.\nChris has been urging me to publish this, but I haven’t gotten around to this yet.</p>\n\n<p>As part of my COVID-19 work, I have started making <a href=\"https://github.com/wikipathways/SARS-CoV-2-WikiPathways/tree/master/reports\">curation reports for specific WikiPathways</a>.\nTo enable this, I worked out how to reuse the testing without JUnit, allowing the tests to be used as a library. That allows creating the reports, but in the future\nwill also allow use directly in PathVisio. A second improvement to the testing stack is that tests are now more easily annotated.\nThat allows specifying tests only to be run for a certain WikiPathways portal.</p>\n\n<p>But a lot remains to be done. I think at this moment I only migrated, perhaps, some 5% of all tests. So, this is very much on my “what is next?” list.</p>\n\n<h2 id=\"what-is-next\">What is next?</h2>\n\n<p>There is a lot I need, want, and should do. Here are some ideas. Maybe you want to beat me to it. Really, I don’t mind being scooped,\nwhen it comes to public health. 
Here goes:</p>\n\n<ol>\n  <li>file SARS-CoV-2 book <a href=\"https://github.com/egonw/SARS-CoV-2-Queries/issues?q=is%3Aissue+is%3Aopen+label%3Atranslations\">translation update requests</a> for some recent updates</li>\n  <li>update the SARS-CoV-2 book with a list of important SNPs</li>\n  <li>add Bioschemas to the SARS-CoV-2 book for individual proteins, genes, etc</li>\n  <li>update WP4846 with recent literature</li>\n  <li>have another ‘main subject’ annotation round for SARS-CoV-2 proteins</li>\n  <li>migrate more pathway tests from JUnit into the testing library</li>\n  <li>write a new test to detect preprints in pathway literature lists and check for journal article versions</li>\n  <li>finish the Dutch translation of the SARS-CoV-2 book</li>\n  <li>write a tool to recognize WikiPathways complexes with matches in Complex Portal</li>\n  <li>write a tool to generate markdown for any WikiPathways with curation suggestions based on content in other resources</li>\n  <li>develop a few HTML+JavaScript pages to summarize WikiPathways COVID-19 Portal content</li>\n</ol>\n\n<p>Am I missing anything? Tweet me or leave a comment here.</p>",
      "summary": "WP4846 that I started on March 16. It will see a massive overhaul in the next weeks. Voices are getting stronger over how important Open Science is. Insiders have known the advantages for decades. We also know the issues in the transition, but the transition has been steady. Contributing to Open Science is simple: there are plenty of projects where you can contribute without jeopardizing your own research (funding or prestige). My own small contributions have been done without funding too. But I needed to do something. I have been mostly self-quarantined since March 6, with only very few exceptions. And I’m so done with it. Like so many other people. It won’t stop me wearing masks when I go shopping (etc).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20201031_100753.png",
      "date_published": "2020-10-31T00:00:00+00:00",
      "date_modified": "2020-10-31T00:00:00+00:00",
      "tags": ["covid19","wikipathways","wikidata"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/S41597-020-0477-8", "doi": "10.1038/S41597-020-0477-8"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1101/2020.10.26.356014", "doi": "10.1101/2020.10.26.356014"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.7554/ELife.52614", "doi": "10.7554/ELife.52614"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1101/2020.04.05.026336", "doi": "10.1101/2020.04.05.026336"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7ts20-k7s71",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/07/03/bioclipse-git-experiences-2-create.html",
      "title": "Bioclipse git experiences #2: Create patches for individual plugins/features",
      "content_html": "<p>This is a series of two posts repeating some content I <a href=\"https://web.archive.org/web/20180821111520/http://wiki.bioclipse.net/index.php?title=Git_Development\">wrote up back in the Bioclipse days</a>\n(see also <a href=\"https://scholia.toolforge.org/topic/Q1769726\">this Scholia page</a>). They both deal with something\nwe were facing: restructuring of version control repositories, while actually keeping the history. For\nexample, you may want to copy or move code from one repository to another. A second use case can be a file\nthat must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work,\nthere will be some specific terminology, but the approach I regularly apply in other situations.</p>\n\n<p>This second post talks about how to migrate code from one repository to another.</p>\n\n<h2 id=\"create-patches-for-individual-pluginsfeatures\">Create patches for individual plugins/features</h2>\n\n<p>While the above works pretty well, a good alternative in situations where you only need to get a\nrepository-with-history for a few plugins, is to use patch sets.</p>\n\n<p>First, initialize a new git repository, e.g. 
bioclipse.rdf:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">mkdir </span>bioclipse.rdf\n<span class=\"nb\">cd </span>bioclipse.rdf\ngit init\nnano README\ngit commit <span class=\"nt\">-m</span> <span class=\"s2\">\"Added README with some basic info about the new repository\"</span> README\n</code></pre></div></div>\n\n<p>Then, for each plugin you need, discover the commit in which the plugin was first committed, using the git-svn repository created earlier:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">cd </span>your.gitsvn.checkout\ngit log <span class=\"nt\">--pretty</span><span class=\"o\">=</span>oneline externals/com.hp.hpl.jena/ | <span class=\"nb\">tail</span> <span class=\"nt\">-1</span>\n</code></pre></div></div>\n\n<p>Then create patches starting from the tree just before that first commit by appending <code class=\"language-plaintext highlighter-rouge\">^1</code> to the commit hash. 
For example, the first patch of the Jena libraries was 06d0eb0542377f958d06892860ea3363e3316389, so I type:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">rm </span>00<span class=\"k\">*</span>.patch\ngit format-patch 06d0eb0542377f958d06892860ea3363e3316389^1 <span class=\"nt\">--</span> externals/com.hp.hpl.jena\n</code></pre></div></div>\n\n<p>(tune the filter when removing old patches if there are more than 99!)</p>\n\n<p>The previous two steps can be combined into a Perl script:</p>\n\n<div class=\"language-perl highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">#!/usr/bin/perl</span>\n<span class=\"k\">use</span> <span class=\"nv\">diagnostics</span><span class=\"p\">;</span>\n<span class=\"k\">use</span> <span class=\"nv\">strict</span><span class=\"p\">;</span>\n\n<span class=\"k\">my</span> <span class=\"nv\">$plugin</span> <span class=\"o\">=</span> <span class=\"nv\">$ARGV</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">];</span>\n\n<span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"o\">!</span><span class=\"nv\">$plugin</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">Syntax: gfp &lt;plugin|feature&gt;</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n  <span class=\"nb\">exit</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n\n<span class=\"nb\">die</span> <span class=\"p\">\"</span><span class=\"s2\">Cannot find plugin or feature </span><span class=\"si\">$plugin</span><span class=\"s2\"> !</span><span class=\"p\">\"</span> <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"o\">!</span><span class=\"p\">(</span><span class=\"o\">-</span><span class=\"nv\">e</span> <span 
class=\"nv\">$plugin</span><span class=\"p\">));</span>\n\n<span class=\"p\">`</span><span class=\"sb\">rm -f *.patch</span><span class=\"p\">`;</span>\n<span class=\"k\">my</span> <span class=\"nv\">$hash</span> <span class=\"o\">=</span> <span class=\"p\">`</span><span class=\"sb\">git log --follow --pretty=oneline </span><span class=\"si\">$plugin</span><span class=\"sb\"> | tail -1 | cut -d' ' -f1</span><span class=\"p\">`;</span>\n<span class=\"nv\">$hash</span> <span class=\"o\">=~</span> <span class=\"sr\">s/\\n|\\r//g</span><span class=\"p\">;</span>\n\n<span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">Plugin: </span><span class=\"si\">$plugin</span><span class=\"s2\"> </span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n<span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">Hash: </span><span class=\"si\">$hash</span><span class=\"s2\"> </span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n<span class=\"p\">`</span><span class=\"sb\">git format-patch </span><span class=\"si\">$hash</span><span class=\"sb\">^1 -- </span><span class=\"si\">$plugin</span><span class=\"p\">`;</span>\n</code></pre></div></div>\n\n<p>Move these patches into your new repository:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">mv </span>00<span class=\"k\">*</span>.patch ../bioclipse.rdf\n</code></pre></div></div>\n\n<p>(tune the filter when moving the patches if there are more than 99! 
Also customize the target folder name to match your situation)</p>\n\n<p>Apply the new patches in your new git repository:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">cd</span> ../bioclipse.rdf\ngit am 00<span class=\"k\">*</span>.patch\n</code></pre></div></div>\n\n<p>(You’re on your own if that fails… and you may have to fall back to the other alternative then)</p>\n\n<p>Repeat those two steps for all plugins you want in your new repository.</p>",
      "summary": "This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring of version control repositories, while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case can be a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but the approach I regularly apply in other situations.",
      
      "date_published": "2020-07-03T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["bioclipse","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a75k0-fjg73",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/07/02/bioclipse-git-experiences-1-strip-away.html",
      "title": "Bioclipse git experiences #1: Strip away unwanted plugins",
      "content_html": "<p>This is a series of two posts repeating some content I <a href=\"https://web.archive.org/web/20180821111520/http://wiki.bioclipse.net/index.php?title=Git_Development\">wrote up back in the Bioclipse days</a>\n(see also <a href=\"https://scholia.toolforge.org/topic/Q1769726\">this Scholia page</a>). They both deal with something\nwe were facing: restructuring of version control repositories, while actually keeping the history. For\nexample, you may want to copy or move code from one repository to another. A second use case can be a file\nthat must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work,\nthere will be some specific terminology, but the approach I regularly apply in other situations.</p>\n\n<p>For this first post, think of a <em>plugin</em> as a subfolder, though it also applies to files.</p>\n\n<h2 id=\"strip-away-unwanted-plugins\">Strip away unwanted plugins</h2>\n\n<p>In this case, you remove everything you do not want in your new git repository. 
Do:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>git clone <span class=\"nt\">--bare</span> <span class=\"nt\">--no-hardlinks</span> old.local.clone/ new.local.clone/\n</code></pre></div></div>\n\n<p>Then use:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>git filter-branch <span class=\"nt\">--index-filter</span> <span class=\"s1\">'git rm -r -q --cached --ignore-unmatch plugins/net.bioclipse.actionHistory plugins/net.bioclipse.analysis'</span> HEAD\n</code></pre></div></div>\n\n<p>It often happens that you need to run the above command several times, in cases when there are many subdirectories to be removed.\nOnce you have removed all the bits you want gone, you can clean up the repository and reduce the size considerably with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code> git repack <span class=\"nt\">-ad</span><span class=\"p\">;</span> git prune\n</code></pre></div></div>",
      "summary": "This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring of version control repositories, while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case can be a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but the approach I regularly apply in other situations.",
      
      "date_published": "2020-07-02T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["bioclipse","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ee0y3-3ez74",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/03/19/new-paper-wikidata-as-knowledge-graph.html",
      "title": "new paper: &quot;Wikidata as a knowledge graph for the life sciences&quot;",
      "content_html": "<p><span style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/default.png\" /> <br />\nA figure from the article, outlining the idea of using SPARQL queries to extract data from the open knowledge base.\n</span>\nAs a reader of my blog, you know I have been doing quite some research <a href=\"https://chem-bla-ics.blogspot.com/search?q=wikidata\">where Wikidata has some role</a>.\nI am preparing a paper on the work I have done around chemicals in Wikidata, based on what I\n<a href=\"https://figshare.com/articles/Wikidata_and_Scholia_as_a_hub_linking_chemical_knowledge/6356027\">presented at the ICCS with a poster</a>.\nSo, I was delighted when <a href=\"https://twitter.com/andrawaag\">Andra</a> and <a href=\"https://twitter.com/andrewsu\">Andrew</a> asked me to contribute\nto a paper outlining the importance of Wikidata to the life sciences. The paper was published in <a href=\"https://elifesciences.org/\">eLife</a>,\nwhich I’m excited about too, as they do a significant amount of publishing innovation.</p>\n\n<p>I’ll keep this post brief, as I have plenty of work to do, among which is SARS-CoV-2 data in Wikidata. Join this project,\nafter you read the paper: <em>Wikidata as a knowledge graph for the life sciences</em> (doi:<a href=\"https://doi.org/10.7554/eLife.52614\">10.7554/eLife.52614</a>,\nor in Scholia):</p>\n\n<ul>\n  <li><a href=\"https://www.wikidata.org/wiki/Wikidata:WikiProject_COVID-19\">Wikidata:WikiProject COVID-19</a>.</li>\n</ul>\n\n<p>I’ll write up some more queries for this eBook now: <a href=\"https://egonw.github.io/SARS-CoV-2-Queries/\">Wikidata Queries around the SARS-CoV-2 virus and pandemic</a>.</p>",
      "summary": "A figure from the article, outlining the idea of using SPARQL queries to extract data from the open knowledge base. As a reader of my blog, you know I have been doing quite some research where Wikidata has some role. I am preparing a paper on the work I have done around chemicals in Wikidata, based on what I presented at the ICCS with a poster. So, I was delighted when Andra and Andrew asked me to contribute to a paper outlining the importance of Wikidata to the life sciences. The paper was published in eLife, which I’m excited about too, as they do a significant amount of publishing innovation.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/default.png",
      "date_published": "2020-03-19T00:00:00+00:00",
      "date_modified": "2020-03-19T00:00:00+00:00",
      "tags": ["wikidata","sparql","rdf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.7554/ELIFE.52614", "doi": "10.7554/ELIFE.52614"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.6084/m9.figshare.6356027.v1", "doi": "10.6084/m9.figshare.6356027.v1"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8ny1p-h0g32",
      "url": "https://chem-bla-ics.linkedchemistry.info/2019/10/14/chemcuration-2019-poster-conference.html",
      "title": "ChemCuration 2019 Poster Conference: Call for Posters",
      "content_html": "<p><span style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/Screenshot_20191014_174204.png\" /> <br />\nTwitter profile.\n</span></p>\n\n<p><em>It giet oan!</em> That it a Frisian phrase for something unlike is going to happen, like and particularly related to the\n<a href=\"https://en.wikipedia.org/wiki/Elfstedentocht\">Elfstedentocht</a>.</p>\n\n<p><strong>ChemCuration 2019</strong> is a go. The <a href=\"https://chemcuration.github.io/chemcuration2019/\">website is online</a>, the\n<a href=\"https://twitter.com/chemcuration\">Twitter account</a> and <a href=\"https://twitter.com/hashtag/chemcur2019\">hashtag are ready</a>,\nwe got a poster prize, and here is the call for posters!</p>\n\n<blockquote>\n  <p>On December 3 the first ChemCuration conference will take place. ChemCuration 2019 is a one day, online-only conference around data curation and curated data in the chemistry domain. During the entire conference day, you can participate by tweeting about the poster that you uploaded, along with the meeting hashtag, and responding to questions about your poster in the 24 hours of the conference day. The poster must be available in an online repository (e.g. Zenodo or Figshare) under the CCZero, CC-BY or CC-BY-SA license prior to the conference.</p>\n\n  <p>This is the meeting scope: anything around data curation and curated data of open science data in chemistry. This includes but is not limited to: 1. a new release of curated open data; 2. FAIR metadata around open data; and 3. open source tools for data curation.</p>\n\n  <p><strong>How do I participate in ChemCuration?</strong><br />\nYou can participate in this online poster conference by presenting your poster on Twitter\nduring the conference day. You do this by first archiving your poster via Figshare or Zenodo,\nwith an open license (e.g. CCZero or CC-BY). 
Then, during the day you tweet an image of\n(part of) your digital poster with the <a href=\"https://twitter.com/hashtag/chemcur2019\">#chemcur2019</a>\nhashtag, a short summary, and a link to your online poster with its DOI. The archived poster\nshould be a regular A0 poster (WxH = 841 x 1189 mm or 33.1 x 46.8 in)</p>\n\n  <p><strong>Do I need to register?</strong><br />\nRegistration is not obligatory to participate. However, if you would like to be eligible\nfor a poster prize, then registration is required, by Nov. 30th, 2019. The registration form\nis found at <a href=\"https://github.com/chemcuration/chemcuration2019/issues/new/choose\">https://github.com/chemcuration/chemcuration2019/issues/new/choose</a></p>\n\n  <p>More information can be found on the website (<a href=\"https://chemcuration.github.io/chemcuration2019/\">https://chemcuration.github.io/chemcuration2019/</a>)\nand on Twitter <a href=\"https://twitter.com/chemcuration\">https://twitter.com/chemcuration</a></p>\n</blockquote>",
      "summary": "Twitter profile.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20191014_174204.png",
      "date_published": "2019-10-14T00:00:00+00:00",
      "date_modified": "2019-10-14T00:00:00+00:00",
      "tags": ["curation","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/preah-ehh72",
      "url": "https://chem-bla-ics.linkedchemistry.info/2019/10/09/chemcuration-small-trick-to-fix-smiles.html",
      "title": "ChemCuration: a small trick to fix the SMILES of glucuronides",
      "content_html": "<p><span style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/Screenshot_20191008_144049.png\" /> <br />\nGlucuronide functional group.\n</span></p>\n\n<p>Now that the <a href=\"https://chemcuration.github.io/chemcuration2019/\">ChemCuration 2019</a> online poster conference is nearing, and\nmy upcoming talks about chemistry in <a href=\"https://wikidata.org/\">Wikidata</a> (also needing curation), and the much longer process\nof curation of metabolite (-like) structures in <a href=\"https://wikipathways.org/\">WikiPathways</a>, I decided that something I\ntweeted earlier this week is actually quite useful, and therefore something I should really write up in my lab notebook.</p>\n\n<p><a href=\"https://en.wikipedia.org/wiki/Glucuronide\">Glucuronide</a> is an example (biological) functional group. And there are several\ndatabases that represent the stereochemistry now always correct. That is an interoperability (and thus FAIR) problem.\nCorrecting this is not trivial, particularly if you have to redraw the same glucuronide group again and again.</p>\n\n<p>So, not looking forward to that, I invested a bit of time to find a <a href=\"http://opensmiles.org/\">SMILES</a> trick. What if I had\na SMILES snippet that I could easily copy/paste and attach to the SMILES of the chemical structure it is attached to? Here\ngoes.</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>O1[C@H](C(O)=O)[C@H]([C@H](O)[C@@H](O)[C@@H]1O9)O.\n</code></pre></div></div>\n\n<p>I just realized that <a href=\"https://twitter.com/egonwillighagen/status/1181573810543321088\">the original 3 I used</a> can better be\na <code class=\"language-plaintext highlighter-rouge\">9</code>, which is less likely to occur in the SMILES of the rest of the molecule. 
The period at the end is also deliberate.\nThat way, I can just copy-paste the SMILES of the rest directly after that period. Then I get a disconnected structure, but\nI only have to put a 9 next to the atom that is binding to the glucuronide. So, let’s say the R group is a methyl group, I get:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>O1[C@H](C(O)=O)[C@H]([C@H](O)[C@@H](O)[C@@H]1O9)O.C9\n</code></pre></div></div>\n\n<p>Now, next stop: <code class=\"language-plaintext highlighter-rouge\">CoA</code> and other common biological tags.</p>",
      "summary": "Glucuronide functional group.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20191008_144049.png",
      "date_published": "2019-10-09T00:00:00+00:00",
      "date_modified": "2019-10-09T00:00:00+00:00",
      "tags": ["chemistry","curation","wikidata","wikipathways","smiles"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8kxrz-48p64",
      "url": "https://chem-bla-ics.linkedchemistry.info/2019/04/07/history-of-term-open-science-1-early.html",
      "title": "History of the term Open Science #1: the early days",
      "content_html": "<p><span style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/Screenshot_20190407_144507.png\" /> <br />\nScreenshot of the <i><a href=\"http://www.citeulike.org/group/20496\">Open Science History</a></i> group on CiteULike.\n</span></p>\n\n<p>Open Science has been around for some time. Before Copyright became a thing, knowledge dissemination was mostly limited by\nhow easy you could get knowledge from one place to another. The introduction of Copyright changed this. No longer the question was\n<strong><em>how to get people to know the new knowledge to how to get people to pay for new knowledge</em></strong>. One misconception, for example,\nis that publishing is a free market. Yes, you can argue that you can publish anywhere you like (theoretically, at least, but\nreality says otherwise), but the monopoly is in getting access: for every new fact (and republishing the same fact is a faux\npas), there is exactly one provider of that fact.</p>\n\n<p>Slowly this is changing, but only slowly. What this really needs, is open licenses, just like open source licenses. Licenses that\nallow fixing typos, allow resharing with your students, etc.</p>\n\n<p>But contrary to what has been prevalent in the <a href=\"https://tools.wmflabs.org/scholia/topic/Q56458321\">Plan S</a> discussion, these\nideas are not new. 
And people have been trying Open Science for more than two decades already.</p>\n\n<p>I have been trying to dig up the oldest references (ongoing effort) of the term Open Science (in the current meaning), and\nhad <a href=\"https://web.archive.org/web/20190404104652/http://www.citeulike.org/group/20496\">a CiteULike group for that</a>.\nBut CiteULike is shutting down, so I will blog the references I found, and add some context.</p>\n\n<p>A first article to mention is this 1998 article that mentions Open Science: <em><a href=\"https://www.jstor.org/stable/116885\">Common Agency Contracting and the Emergence of\n“Open Science” Institutions</a> The American Economic Review, Vol. 88, No. 2. (May 1998),\npp. 15-21 by Paul A. David</em>. Worth reading, but does require reading some of the cited literature.</p>\n\n<p>The following magazine articles took the term Open Science to a wider public, in reply to a conference held at\nBrookhaven National Laboratory:</p>\n\n<ul>\n  <li><a href=\"https://www.chronicle.com/article/the-open-source-movement/3254\">The ‘Open-Source Movement’ Turns Its Eye to Science</a> <em>The Chronicle of Higher Education</em> (5 November 1999) by Vincent Kiernan</li>\n  <li><a href=\"http://www.drdobbs.com/a-natural-home-for-open-source/184411210\">A Natural Home for Open Source</a> 1999 <em>Dr. Dobb’s The World of Software Development</em> (1 October 1999)</li>\n  <li><a href=\"http://www.linuxjournal.com/article/3739\">Open Source/Open Science</a> 1999 <em>Linux Journal</em> (1 February 2000) by Stephen Adler</li>\n</ul>\n\n<p>I would also like to note that the <a href=\"http://openscience.org/\">openscience.org</a> website by\n<a href=\"https://tools.wmflabs.org/scholia/author/Q20900795\">Dan Gezelter</a> went online in the late nineties already, which I have\nused in various of my source code projects, and, of course, also has been used by the\n<a href=\"https://cdk.github.io/\">Chemistry Development Kit</a> from the start.</p>",
      "summary": "Screenshot of the Open Science History group on CiteULike.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20190407_144507.png",
      "date_published": "2019-04-07T00:00:00+00:00",
      "date_modified": "2019-04-07T00:00:00+00:00",
      "tags": ["citeulike","opensource","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c7db7-70j94",
      "url": "https://chem-bla-ics.linkedchemistry.info/2019/03/30/what-metabolites-are-found-in-which.html",
      "title": "What metabolites are found in which species? Nanopublications from Wikidata",
      "content_html": "<p>In December I reported about Groovy <a href=\"https://chem-bla-ics.linkedchemistry.info/2018/12/27/creating-nanopublications-with-groovy.html\">code to create nanopublications <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThis has been running for some time now, extracting nanopubs that assert that some\nmetabolite is found in some species. I send the resulting nanopubs to\n<a href=\"https://scholia.toolforge.org/author/Q42027946\">Tobias Kuhn <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, to populate his\n<em>Growing Resource of Provenance-Centric Scientific Linked Data</em>\n(doi:<a href=\"https://doi.org/10.1109/eScience.2018.00024\">10.1109/eScience.2018.00024</a>,\n<a href=\"https://arxiv.org/pdf/1809.06532.pdf\">PDF</a>).</p>\n\n<p>Each data set comes with <a href=\"http://np.inn.ac/RA6KPZ2qS8joGDOA9EvfcNHeNsg6nI2_T1YePsYMjL9io\">an index pointing to the individual nanopubs</a>,\nand that looks like this:</p>\n\n<p><img src=\"/assets/images/nanopubs.png\" alt=\"\" /></p>\n\n<p>I wonder what options I have to to archive the full set up nanopublications on\nFigshare or Zenodo, and see that DOI show up here…</p>",
      "summary": "In December I reported about Groovy code to create nanopublications . This has been running for some time now, extracting nanopubs that assert that some metabolite is found in some species. I send the resulting nanopubs to Tobias Kuhn , to populate his Growing Resource of Provenance-Centric Scientific Linked Data (doi:10.1109/eScience.2018.00024, PDF).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nanopubs.png",
      "date_published": "2019-03-30T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["nanopub","cheminf","wikidata"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1109/ESCIENCE.2018.00024", "doi": "10.1109/ESCIENCE.2018.00024"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v3szx-3jk73",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/12/27/creating-nanopublications-with-groovy.html",
      "title": "Creating nanopublications with Groovy",
      "content_html": "<p><img style=\"float: right\" width=\"200\" src=\"/assets/images/Screenshot_20181227_075006.png\" />\nYesterday, I struggled some with creating <a href=\"http://nanopub.org/\">nanopublications</a> with <a href=\"https://en.wikipedia.org/wiki/Apache_Groovy\">Groovy</a>.\nMy first attempt was an utter failure, but then I discovered <a href=\"https://twitter.com/txkuhn\">Thomas Kuhn</a>’s\n<a href=\"https://github.com/Nanopublication/nanopub-java/blob/master/src/main/java/org/nanopub/NanopubCreator.java\">NanopubCreator</a>\nand it was downhill from there.</p>\n\n<p>On the right, a depiction is given of a compound found in Taphrorychus bicolor (doi:<a href=\"https://doi.org/10.1002/JLAC.199619961005\">10.1002/JLAC.199619961005</a>).\nPublished in <em>Liebigs Annalen</em>, see <a href=\"https://chem-bla-ics.blogspot.com/2018/12/from-annalen-der-pharmacie-to-european.html\">this post</a>\nabout the history of that journal.</p>\n\n<p>There are two good things about this. First, I now have a <a href=\"https://github.com/egonw/wikidataNanopublications\">code base</a>\nthat I can easily repurpose to make <em>trusty nanopublications</em> (doi:<a href=\"10.1007/978-3-319-07443-6_27\">10.1007/978-3-319-07443-6_27</a>)\nabout anything structured as a table (so can you).</p>\n\n<p>Second, I now about almost 1200 CCZero nanopublications that tell you in which species a certain metabolite\nhas been found. Sourced from <a href=\"https://wikidata.org/\">Wikidata</a>, using <a href=\"https://query.wikidata.org/\">their SPARQL end point</a>.\nThis collection is a bit boring that this moment, and most of them are human metabolites, where the source is either\n<a href=\"https://tools.wmflabs.org/scholia/work/Q28601559\">Recon 2.2</a> or <a href=\"https://wikipathways.org/\">WikiPathways</a>.\nBut I expect (hope) to see more DOIs to show up. 
Think\n<em><a href=\"https://blogs.biomedcentral.com/bmcblog/2018/11/01/challenge-reuse-additional-files-supplementary-information/\">We challenge you to reuse Additional Files</a></em>.</p>\n\n<p>Finally, you are probably interested in learning what one of the created nanopublications looks like, so I put\n<a href=\"https://gist.github.com/egonw/5fb0994cac6f9e851f3857cd306f0890\">a Gist online</a>:</p>\n\n<pre><code class=\"language-trig\">@prefix this: &lt;http://www.bigcat.unimaas.nl/nanopubs/wikidata/tmp/np742.RAwXcetTykN6UPVzBOyatKm30mbT6endXfDrxnarRysL0&gt; .\n@prefix sub: &lt;http://www.bigcat.unimaas.nl/nanopubs/wikidata/tmp/np742.RAwXcetTykN6UPVzBOyatKm30mbT6endXfDrxnarRysL0#&gt; .\n@prefix wd: &lt;http://www.wikidata.org/entity/&gt; .\n@prefix np: &lt;http://www.nanopub.org/nschema#&gt; .\n@prefix has-source: &lt;http://semanticscience.org/resource/SIO_000253&gt; .\n@prefix has-inchikey: &lt;http://semanticscience.org/resource/CHEMINF_000399&gt; .\n@prefix orcid: &lt;http://orcid.org/&gt; .\n@prefix wdt: &lt;http://www.wikidata.org/prop/direct/&gt; .\n@prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt; .\n@prefix pav: &lt;http://purl.org/pav/&gt; .\n@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .\n@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .\n\nsub:Head {\n        this: np:hasAssertion sub:assertion ;\n                np:hasProvenance sub:provenance ;\n                np:hasPublicationInfo sub:pubinfo ;\n                a np:Nanopublication .\n}\n\nsub:assertion {\n        wd:Q15978631 rdfs:label \"Homo sapiens\"@en ;\n                skos:exactMatch &lt;http://purl.obolibrary.org/obo/NCBITaxon_9606&gt; .\n\n        wd:Q27125029 has-inchikey: \"APJYDQYYACXCRM-UHFFFAOYSA-O\" ;\n                rdfs:label \"tryptaminium\"@en ;\n                wdt:P703 wd:Q15978631 .\n}\n\nsub:provenance {\n        sub:assertion has-source: wd:Q2013 , wd:Q28601559 .\n\n        wd:Q28601559 rdfs:label \"Recon 2.2: from reconstruction to model of 
human metabolism\"@en ;\n                owl:sameAs &lt;https://doi.org/10.1007/S11306-016-1051-4&gt; .\n}\n\nsub:pubinfo {\n        this: pav:createdBy orcid:0000-0001-7542-0286 .\n}\n</code></pre>",
      "summary": "Yesterday, I struggled some with creating nanopublications with Groovy. My first attempt was an utter failure, but then I discovered Thomas Kuhn’s NanopubCreator and it was downhill from there.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20181227_075006.png",
      "date_published": "2018-12-27T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["nanopub","wikidata","groovy"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1007/978-3-319-07443-6_27", "doi": "10.1007/978-3-319-07443-6_27"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1002/JLAC.199619961005", "doi": "10.1002/JLAC.199619961005"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bsvz8-n7h68",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/11/17/join-me-in-encouraging-acs-to-join.html",
      "title": "Join me in encouraging the ACS to join the Initiative for Open Citations",
      "content_html": "<p>My research is into abstract representation of chemical information, important for other research to be performed. Indeed, my work\nis generally reused, but knowing which research fields my work is used in, or which societal problems it is helping solve, is not\neasily retrieved or determined. Efforts like <a href=\"https://meta.wikimedia.org/wiki/WikiCite\">WikiCite</a> and\n<a href=\"https://tools.wmflabs.org/scholia/topic/Q45340488\">Scholia</a> do allow me to navigate the citation network, so that I can determine\nwhich research fields my output influences and which diseases are studied with methods I proposed. Here’s a\n<a href=\"https://query.wikidata.org/embed.html#%23defaultView%3AGraph%0ASELECT%0A%20%20%3Ftopic1%20%3Ftopic1Label%20%3Ftopic2%20%3Ftopic2Label%20%3Fcount%0AWITH%20%7B%0A%20%20SELECT%0A%20%20%20%20(COUNT(%3Fwork)%20AS%20%3Fcount)%20%3Ftopic1%20%3Ftopic2%0A%20%20WHERE%20%7B%0A%20%20%20%20%23%20Find%20works%20that%20are%20marked%20with%20main%20subject%20of%20the%20topic.%0A%20%20%20%20%3Fwork%20wdt%3AP2860%2Fwdt%3AP50%20wd%3AQ20895241%20.%0A%20%20%20%20%0A%20%20%20%20%23%20Identify%20co-occuring%20topics.%20%0A%20%20%20%20%3Fwork%20wdt%3AP921%20%3Ftopic1%2C%20%3Ftopic2%20.%20%0A%0A%20%20%20%20%23%20article%20by%20author%0A%20%20%20%20MINUS%20%7B%20%3Fwork%20wdt%3AP50%20wd%3AQ20895241%20.%20%7D%0A%20%20%20%20FILTER%20(%20%3Ftopic1%20!%3D%20%3Ftopic2%20)%0A%20%20%7D%0A%20%20GROUP%20BY%20%3Ftopic1%20%3Ftopic2%0A%20%20ORDER%20BY%20DESC(%3Fcount)%0A%0A%20%20%23%20There%20a%20performance%20problems%20in%20the%20browser%3A%20We%20cannot%20show%20large%20graphs%2C%0A%20%20%23%20so%20we%20put%20a%20limit%20on%20the%20number%20of%20links%20displayed.%0A%20%20LIMIT%20400%0A%0A%7D%20AS%20%25results%0AWHERE%20%7B%0A%20%20INCLUDE%20%25results%0A%20%20%0A%20%20%23%20Label%20the%20results%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%2Cda%2Cde%2Ces%2Cfr%2Cjp%2Cnl%2Cno%2Cru%2Csv%2C
zh%22.%0A%20%20%7D%0A%7D%0A%0A\">network of topics of articles citing my work</a>:</p>\n\n<p><img src=\"/assets/images/exploring.png\" alt=\"\" /></p>\n\n<p>Graphs like this show information on how people are using my work, which in turn allows me to support that use further. But this relies on\nopen citations.</p>\n\n<p>In my opinion, citations are an essential part of our research process. They give us access to important prior work on which a study\nis based, and reflect how a work influences other research or even is essential to that other work. For example, they allow us\nto not repeat earlier published work, while preserving the ability to reproduce the full work. The\n<a href=\"https://i4oc.org/\">Initiative for Open Citations</a> encourages these citations to be publicly available to benefit research, by\nremoving barriers to access this critical part of scholarly communication. While many societies and publishers have joined this\ninitiative, the <a href=\"https://pubs.acs.org/\">American Chemical Society</a> (ACS) has not yet. By not joining, they limit the sharing of\nknowledge for unclear reasons.</p>\n\n<p>And I would really like to see the ACS join this initiative, and have proposed this a few times already. Because they still have\nnot joined the initiative, I have <a href=\"https://www.change.org/p/the-american-chemical-society-to-join-the-initiative-for-open-citations\">started this petition</a>.\nIf you agree, please sign and share it with others.</p>",
      "summary": "My research is into abstract representation of chemical information, important for other research to be performed. Indeed, my work is generally reused, but knowing which research fields my work is used in, or which societal problems it is helping solve, is not easily retrieved or determined. Efforts like WikiCite and Scholia do allow me to navigate the citation network, so that I can determine which research fields my output influences and which diseases are studied with methods I proposed. Here’s a network of topics of articles citing my work:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/exploring.png",
      "date_published": "2018-11-17T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["acs","i4oc","publishing"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3aha2-9ct41",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/11/04/programming-in-life-sciences-23.html",
      "title": "Programming in the Life Sciences #23: research output for the future",
      "content_html": "<p><span style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/15528-illustration-of-a-ten-of-clubs-playing-card-pv.png\" /> <br />\nA random public domain picture with 10 in it.<br />\n</span>\nEnsuring that you and others can understand you research output five years from now requires effort. This is why scholars tend to keep lab\nnotebooks. The computational age has perhaps made us a bit lazy here, but we still make an effort. A series of <em>Ten Simple Rules</em> articles\noutline some of the things to think about:</p>\n\n<ol>\n  <li>Goodman A, Pepe A, Blocker AW, Borgman CL, Cranmer K, Crosas M, et al. <a href=\"https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003542\"><strong>Ten Simple Rules</strong> for the Care and Feeding of Scientific Data</a>. Bourne PE, editor. PLoS Computational Biology. 2014 Apr 24;10(4):e1003542.</li>\n  <li>List M, Ebert P, Albrecht F. <a href=\"https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005265\"><strong>Ten Simple Rules</strong> for Developing Usable Software in Computational Biology</a>. Markel S, editor. PLOS Computational Biology. 2017 Jan 5;13(1):e1005265.</li>\n  <li>Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost F da V, et al. <a href=\"https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004947\"><strong>Ten Simple Rules</strong> for Taking Advantage of Git and GitHub</a>. Markel S, editor. PLOS Computational Biology. 2016 Jul 14;12(7):e1004947.</li>\n  <li>Prlić A, Procter JB. <a href=\"https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002802\"><strong>Ten Simple Rules</strong> for the Open Development of Scientific Software</a>. PLoS Computational Biology. 2012 Dec 6;8(12):e1002802.</li>\n  <li>Sandve GK, Nekrutenko A, Taylor J, Hovig E. 
<a href=\"https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003285\"><strong>Ten Simple Rules</strong> for Reproducible Computational Research</a>. Bourne PE, editor. PLoS Computational Biology. 2013 Oct 24;9(10):e1003285.</li>\n</ol>\n\n<p>Regarding licensing, I can highly recommend reading this book:</p>\n\n<ol>\n  <li>Rosen L. Open Source Licensing [Internet]. 2004. Available from: <a href=\"https://www.rosenlaw.com/oslbook.htm\">https://www.rosenlaw.com/oslbook.htm</a></li>\n</ol>\n\n<p>Regarding Git, I recommend these two resources:</p>\n\n<ol>\n  <li>Wiegley J. Git From the Bottom Up [Internet]. 2017. Available from: <a href=\"https://jwiegley.github.io/git-from-the-bottom-up/\">https://jwiegley.github.io/git-from-the-bottom-up/</a></li>\n  <li>Task 1: How to set up a repository on GitHub [Internet]. 2018. Available from: <a href=\"https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source/blob/master/content_development/Task_1.md\">https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source/blob/master/content_development/Task_1.md</a></li>\n</ol>",
      "summary": "A random public domain picture with 10 in it. Ensuring that you and others can understand you research output five years from now requires effort. This is why scholars tend to keep lab notebooks. The computational age has perhaps made us a bit lazy here, but we still make an effort. A series of Ten Simple Rules articles outline some of the things to think about:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/15528-illustration-of-a-ten-of-clubs-playing-card-pv.png",
      "date_published": "2018-11-04T00:00:00+00:00",
      "date_modified": "2018-11-04T00:00:00+00:00",
      "tags": ["pra3006"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1003542", "doi": "10.1371/journal.pcbi.1003542"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1005265", "doi": "10.1371/journal.pcbi.1005265"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1004947", "doi": "10.1371/journal.pcbi.1004947"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1002802", "doi": "10.1371/journal.pcbi.1002802"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1003285", "doi": "10.1371/journal.pcbi.1003285"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wsyrx-kkf55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/10/11/two-presentations-at-wikipathways-2018.html",
      "title": "Two presentations at WikiPathways 2018 Summit #WP18Summit",
      "content_html": "<p><span style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/430px-Wp18summit.png\" />\n</span></p>\n\n<p>Found my way back to my room a few kilometers from the San Francisco city center, after a third day at the\n<a href=\"https://gladstone.org/WP18Summit\">WikiPathways 2018 Summit</a> at the <a href=\"https://gladstone.org/\">Gladstone Institutes</a>\nin Mission Bay, celebrating 10 years of the project, which I only joined\n<a href=\"http://chem-bla-ics.blogspot.com/2012/01/first-month-back-in-nl.html\">some six and a half years ago</a>.</p>\n\n<p>The Summit was awesome and the whole trip was awesome. The flight was long, with a stop in Seattle. I always get a\nbit nervous of lay-overs (having missed my plane twice before…), but a stop in Seattle is interesting, with a great view of\n<a href=\"https://tools.wmflabs.org/reasonator/?q=Q194057&amp;lang=en\">Mt. Rainier</a>, which is also from an airplane quite a sight.\nAlex picked us up from the airport and the Airbnb is great (HT to Annie for being a great host), from which we can even\nsee the Golden Gate Bridge.</p>\n\n<p>The Sunday was surreal. With some 27 degrees Celsius the choice to visit the beach and stand, for the first time,\nin the Pacific was great. I had the great pleasure to meet Dario and his family and played volleyball at a beach\nfor the first time in some 28 years. Apparently, there was an airshow nearby and several shows were visible from\nour site, including a very long show by the <a href=\"https://www.instagram.com/p/BopjukvhUK1/\">Blue Angels</a>.\nThanks for a great afternoon!</p>\n\n<p>Sunday evening Adam hosted us for an <a href=\"https://www.wikipathways.org/index.php/WikiPathways:Team\">WikiPathways team</a> dinner.\nHis place gave a great view on San Francisco, the Bay Bridge, etc. 
Because Chris was paying attention, we actually got\nto see <a href=\"https://www.space.com/42068-amazing-spacex-rocket-launch-photos-not-aliens.html\">the SpaceX rocket launch</a>\n(no, my photo is not so impressive :). Well, I cannot express in words how cool that is, to see a rocket escape\nEarth’s gravity with your own eyes.</p>\n\n<p>And the Summit had not even started yet.</p>\n\n<p>I will have quite a lot to write up about the meeting itself. It was a great line up of speakers, great workshops,\nawesome discussions, and a high density of very knowledgeable people. I think we need 5M to implement just the ideas\nthat came up in the past three days. And it would be well invested. Anyway, more about that later. Make sure to keep\nan eye on the <a href=\"https://github.com/wikipathways\">GitHub repo for WikiPathways</a>.</p>\n\n<p>That leaves me only, right now, to return to the title of this post. And below they are, my two contributions to this summit:</p>\n\n<p><a href=\"https://doi.org/10.5281/zenodo.3544361\"><img src=\"/assets/images/wpsummit_PDF_slide1_andabit.png\" alt=\"\" /></a></p>\n\n<div style=\"margin-bottom: 5px;\">\n<strong> <a href=\"https://zenodo.org/records/3544361\" target=\"_blank\" title=\"Automated Curation with Internal and External Validation\">Automated Curation with Internal and External Validation</a> </strong> from <strong><a href=\"https://zenodo.org/search?q=metadata.creators.person_or_org.name:%22Willighagen,+Egon%22\" target=\"_blank\">Egon Willighagen</a></strong> </div>\n\n<p><br /></p>\n\n<p><a href=\"https://doi.org/10.5281/zenodo.3544363\"><img src=\"/assets/images/wpsummit_PDF2_slide1_andabit.png\" alt=\"\" /></a></p>\n\n<div style=\"margin-bottom: 5px;\">\n<strong> <a href=\"https://zenodo.org/records/3544363\" target=\"_blank\" title=\"Using WikiPathways with Its Resource Description Framework Format\">Using WikiPathways with Its Resource Description Framework Format</a> </strong> from <strong><a 
href=\"https://zenodo.org/search?q=metadata.creators.person_or_org.name:%22Willighagen,+Egon%22\" target=\"_blank\">Egon Willighagen</a></strong> </div>",
      "summary": "",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/430px-Wp18summit.png",
      "date_published": "2018-10-11T00:00:00+00:00",
      "date_modified": "2025-06-29T00:00:00+00:00",
      "tags": ["curation","wikipathways"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.3544361", "doi": "10.5281/zenodo.3544361"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.3544363", "doi": "10.5281/zenodo.3544363"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b65kv-58g66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/09/16/data-curation-5-inspiration-95.html",
      "title": "Data Curation: 5% inspiration, 95% frustration (cleaning up data inconsistencies)",
      "content_html": "<p><span style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/Screenshot_20180916_170915.png\" /> <br />\nSlice of the spreadsheet in the supplementary info.\n</span></p>\n\n<p>Just some bit of cleaning I scripted today for a number of toxicology end points in a database published some time ago the\nzero-APC Open Access (CC_BY) journal <a href=\"https://www.beilstein-journals.org/bjnano/\">Beilstein of Journal of Nanotechnology</a>,\nNanoE-Tox (doi:<a href=\"https://www.beilstein-journals.org/bjnano/articles/6/183\">10.3762/bjnano.6.183</a>).</p>\n\n<p>The curation I am doing is to redistribute the data in the eNanoMapper database (see doi:<a href=\"https://doi.org/10.3762/bjnano.6.165/\">10.3762/bjnano.6.165</a>)\nand thus with ontology annotation (see doi:<a href=\"https://doi.org/10.1186/s13326-015-0005-5\">10.1186/s13326-015-0005-5</a>):</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">recognizedToxicities</span> <span class=\"o\">=</span> <span class=\"o\">[</span>\n  <span class=\"s2\">\"EC10\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0001263\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"EC20\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0001235\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"EC25\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0001264\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"EC30\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0000599\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"EC50\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0000188\"</span><span class=\"o\">,</span>\n  
<span class=\"s2\">\"EC80\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://purl.enanomapper.org/onto/ENM_0000053\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"EC90\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0001237\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"IC50\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0000190\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"LC50\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0002145\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"MIC\"</span><span class=\"o\">:</span>  <span class=\"s2\">\"http://www.bioassayontology.org/bao#BAO_0002146\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"NOEC\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://purl.enanomapper.org/onto/ENM_0000060\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"NOEL\"</span><span class=\"o\">:</span> <span class=\"s2\">\"http://purl.enanomapper.org/onto/ENM_0000056\"</span>\n<span class=\"o\">]</span>\n</code></pre></div></div>\n\n<p>With 402(!) variants left. 
Many do not have an ontology term yet, and I\n<a href=\"https://github.com/enanomapper/ontologies/issues/143\">filed a feature request</a>.</p>\n\n<p>Units:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">recognizedUnits</span> <span class=\"o\">=</span> <span class=\"o\">[</span>\n  <span class=\"s2\">\"g/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"g/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"g/l\"</span><span class=\"o\">:</span> <span class=\"s2\">\"g/l\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg/ml\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/ml\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg/mL\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/mL\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg/L of food\"</span><span class=\"o\">:</span> <span class=\"s2\">\"µg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"µg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg/mL\"</span><span class=\"o\">:</span> <span class=\"s2\">\"µg/mL\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg Ag/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg Cu/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg Zn/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg dissolved Cu/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"µg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg dissolved Zn/L\"</span><span class=\"o\">:</span> <span 
class=\"s2\">\"µg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg Ag/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"µg/L\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"fmol/L\"</span><span class=\"o\">:</span> <span class=\"s2\">\"fmol/L\"</span><span class=\"o\">,</span>\n\n  <span class=\"s2\">\"mmol/g\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mmol/g\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"nmol/g fresh weight\"</span><span class=\"o\">:</span> <span class=\"s2\">\"nmol/g\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg Cu/g\"</span><span class=\"o\">:</span> <span class=\"s2\">\"µg/g\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg Ag/kg\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/kg\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg Zn/kg\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/kg\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg Zn/kg  d.w.\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/kg\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg/kg of dry feed\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/kg\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"mg/kg\"</span><span class=\"o\">:</span> <span class=\"s2\">\"mg/kg\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"g/kg\"</span><span class=\"o\">:</span> <span class=\"s2\">\"g/kg\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg/g dry weight sediment\"</span><span class=\"o\">:</span> <span class=\"s2\">\"µg/g\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"µg/g\"</span><span class=\"o\">:</span> <span class=\"s2\">\"µg/g\"</span>\n<span class=\"o\">]</span>\n</code></pre></div></div>\n\n<p>Oh, and don’t get me started on actual values, with endpoint values, as ranges, errors, etc. That variety is\nnot the problem, but the lack of FAIR-ness makes the whole really hard to process. 
I now have something like:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">prop</span> <span class=\"o\">=</span> <span class=\"n\">prop</span><span class=\"o\">.</span><span class=\"na\">replace</span><span class=\"o\">(</span><span class=\"s2\">\",\"</span><span class=\"o\">,</span> <span class=\"s2\">\".\"</span><span class=\"o\">)</span>\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">prop</span><span class=\"o\">.</span><span class=\"na\">substring</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">).</span><span class=\"na\">contains</span><span class=\"o\">(</span><span class=\"s2\">\"-\"</span><span class=\"o\">))</span> <span class=\"o\">{</span>\n  <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">addTypedDataProperty</span><span class=\"o\">(</span>\n    <span class=\"n\">store</span><span class=\"o\">,</span> <span class=\"n\">endpointIRI</span><span class=\"o\">,</span> <span class=\"s2\">\"${oboNS}STATO_0000035\"</span><span class=\"o\">,</span>\n    <span class=\"n\">prop</span><span class=\"o\">,</span> <span class=\"s2\">\"${xsdNS}string\"</span>\n  <span class=\"o\">)</span>\n  <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">addDataProperty</span><span class=\"o\">(</span>\n    <span class=\"n\">store</span><span class=\"o\">,</span> <span class=\"n\">endpointIRI</span><span class=\"o\">,</span> <span class=\"s2\">\"${ssoNS}has-unit\"</span><span class=\"o\">,</span> <span class=\"n\">units</span>\n  <span class=\"o\">)</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">prop</span><span class=\"o\">.</span><span class=\"na\">contains</span><span class=\"o\">(</span><span class=\"s2\">\"±\"</span><span class=\"o\">))</span> <span class=\"o\">{</span>\n  <span class=\"n\">rdf</span><span 
class=\"o\">.</span><span class=\"na\">addTypedDataProperty</span><span class=\"o\">(</span>\n    <span class=\"n\">store</span><span class=\"o\">,</span> <span class=\"n\">endpointIRI</span><span class=\"o\">,</span> <span class=\"s2\">\"${oboNS}STATO_0000035\"</span><span class=\"o\">,</span>\n    <span class=\"n\">prop</span><span class=\"o\">,</span> <span class=\"s2\">\"${xsdNS}string\"</span>\n  <span class=\"o\">)</span>\n  <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">addDataProperty</span><span class=\"o\">(</span>\n    <span class=\"n\">store</span><span class=\"o\">,</span> <span class=\"n\">endpointIRI</span><span class=\"o\">,</span> <span class=\"s2\">\"${ssoNS}has-unit\"</span><span class=\"o\">,</span> <span class=\"n\">units</span>\n  <span class=\"o\">)</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">prop</span><span class=\"o\">.</span><span class=\"na\">contains</span><span class=\"o\">(</span><span class=\"s2\">\"&lt;\"</span><span class=\"o\">))</span> <span class=\"o\">{</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"o\">{</span>\n  <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">addTypedDataProperty</span><span class=\"o\">(</span>\n    <span class=\"n\">store</span><span class=\"o\">,</span> <span class=\"n\">endpointIRI</span><span class=\"o\">,</span> <span class=\"s2\">\"${ssoNS}has-value\"</span><span class=\"o\">,</span> <span class=\"n\">prop</span><span class=\"o\">,</span>\n    <span class=\"s2\">\"${xsdNS}double\"</span>\n  <span class=\"o\">)</span>\n  <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">addDataProperty</span><span class=\"o\">(</span>\n    <span class=\"n\">store</span><span class=\"o\">,</span> <span class=\"n\">endpointIRI</span><span class=\"o\">,</span> <span class=\"s2\">\"${ssoNS}has-unit\"</span><span 
class=\"o\">,</span> <span class=\"n\">units</span>\n  <span class=\"o\">)</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>But let me be clear: I can actually do this, add more data to the eNanoMapper database (with\n<a href=\"https://tools.wmflabs.org/scholia/github/vedina\">Nina</a>), only because the developers of this database made their data\navailable under an Open license (CC-BY, to be precise), allowing me to reuse, modify (change format), and redistribute\nit. Thanks to the authors. Data curation is expensive, whether I do it or the authors of the database do. They\nalready did a lot of data curation. But only because of Open licenses, <strong>we only have to do this once</strong>.</p>",
      "summary": "Slice of the spreadsheet in the supplementary info.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20180916_170915.png",
      "date_published": "2018-09-16T00:00:00+00:00",
      "date_modified": "2018-09-16T00:00:00+00:00",
      "tags": ["curation","toxicology","nanosafety"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3762/bjnano.6.183", "doi": "10.3762/bjnano.6.183"
            , "cito":
              
              
                [ 
                  "usesDataFrom"
                  
                 ]
              
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3762/BJNANO.6.165", "doi": "10.3762/BJNANO.6.165"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/S13326-015-0005-5", "doi": "10.1186/S13326-015-0005-5"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e05yz-mhy44",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/09/08/also-new-this-week-google-dataset-search.html",
      "title": "Also new this week: &quot;Google Dataset Search&quot;",
      "content_html": "<p>There was a lot of Open Science news this week. The <a href=\"https://www.blog.google/products/search/making-it-easier-discover-datasets/\">announcement</a>\nof the <a href=\"https://toolbox.google.com/datasetsearch\">Google Dataset Search</a> was one of them:</p>\n\n<p><img src=\"/assets/images/google_dataset_search.png\" alt=\"\" /></p>\n\n<p>Of course, I first tried searching for “<a href=\"https://toolbox.google.com/datasetsearch/search?query=RDF%20chemistry&amp;docid=hiQ14TdWzjx%2FQ37gAAAAAA%3D%3D\">RDF chemistry</a>”\nwhich shows some of my data sets (and a lot more):</p>\n\n<p><img src=\"/assets/images/google_dataset_search2.png\" alt=\"\" /></p>\n\n<p>It picks up data from many sources, such as <a href=\"https://figshare.com/\">Figshare</a> in this image. That means it also works\n(well, sort of, as <a href=\"https://twitter.com/baoilleach/status/1037986030266318848\">Noel O’Boyle noticed</a>) for\nsupplementary information from the <a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a>.</p>\n\n<p>It picks up metadata in several ways, among which <a href=\"http://schemas.org/\">schemas.org</a>. So, next week we’ll see if\nwe can get <a href=\"http://enanomapper.net/\">eNanoMapper</a> extended to spit compatible JSON-LD for its data sets, called “bundles”.</p>\n\n<h2 id=\"integrated-with-google-scholar\">Integrated with Google Scholar?</h2>\n\n<p>While the URL for the search engine does not suggest the service is more than a 20% project, we can\nhope it will stay around like Google Scholar has been. But I do hope they will further integrate it\nwith Scholar. For example, in the above figure, it did pick up that I am the author of that data set\n(well, repurposed from an effort of <a href=\"https://twitter.com/rapodaca\">Rich Apodaca</a>), it did not figure\nout that I am also on Scholar.</p>\n\n<p>So, these data sets do not show up in your Google Scholar profile yet, but they <strong><em>must</em></strong>. 
Time will\ntell where this data search engine is going. There are many interesting features, and given the amount\nof online attention, they won’t stop development just yet, and I expect to discover more and better\nfeatures in the next months. Give it a spin!</p>",
      "summary": "There was a lot of Open Science news this week. The announcement of the Google Dataset Search was one of them:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/google_dataset_search2.png",
      "date_published": "2018-09-08T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["data","google"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7ej1y-tp828",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/08/18/compound-class-identifiers-in-wikidata.html",
      "title": "Compound (class) identifiers in Wikidata",
      "content_html": "<p><span style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/extid-wikidata-histogram.png\" /> <br />\n<a href=\"https://edu.nl/h6kg3\">Bar chart</a> showing the number of compounds with a particular chemical identifier.\n</span>\nI think <a href=\"http://wikidata.org/\">Wikidata</a> is a groundbreaking project, which will have a major impact on science. One of the\nreasons is the open license (CCZero), the very basic approach (<a href=\"http://wikiba.se/\">Wikibase</a>), and the superb community around\nit. For example, setting up your own Wikibase including a cool SPARQL endpoint, is\n<a href=\"https://github.com/wmde/wikibase-docker\">easily done with Docker</a>.</p>\n\n<p>Wikidata has many sub projects, such as <a href=\"http://wikicite.org/\">WikiCite</a>, which captures the collective of primary literature.\nAnother one is the <a href=\"https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry\">WikiProject Chemistry</a>. The two nicely match\nup, I think, making a public database linking chemicals to literature (tho, very much needs to be done here), see my recent\nICCS 2018 poster (doi:<a href=\"https://doi.org/10.6084/m9.figshare.6356027.v1\">10.6084/m9.figshare.6356027.v1</a>, paper pending).</p>\n\n<p>But Wikidata is also a great resource for identifier mappings between chemical databases, something we need for\n<a href=\"https://chem-bla-ics.blogspot.com/2017/11/new-paper-wikipathways-multifaceted.html\">our metabolism pathway research</a>.\nThe mapping, as you may know, are <a href=\"https://chem-bla-ics.blogspot.com/2016/09/metabolite-identifier-mapping-databases.html\">used in the latter</a>\nvia <a href=\"https://www.bridgedb.org/\">BridgeDb</a> and we have been using Wikidata as one of three sources for some time now (the others being\n<a href=\"http://www.hmdb.ca/\">HMDB</a> and <a href=\"https://www.ebi.ac.uk/chebi/\">ChEBI</a>). 
WikiProject Chemistry has a related\n<a href=\"https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry/ChemID\">ChemID</a> effort, and while the wiki page does not show\nmuch recent activity, there is actually a lot of ongoing effort (see <a href=\"https://edu.nl/h6kg3\">plot</a>).\nAnd I’ve been <a href=\"https://chem-bla-ics.blogspot.com/2018/07/lipid-map-identifiers-and.html\">adding my bits</a>.</p>\n\n<h2 id=\"limitations-of-the-links\">Limitations of the links</h2>\n<p>But not every identifier in Wikidata has the same meaning. While they are all classified as ‘external-id’, the actual link may\nhave a different meaning. This, of course, is the essence of scientific lenses, see <a href=\"https://chem-bla-ics.blogspot.com/2013/05/linking-wikipathways-to-binding.html\">this post</a>\nand the papers cited therein. One reason here is the difference in what entries in the various databases mean.</p>\n\n<p>Wikidata has an extensive model, defined by the aforementioned WikiProject Chemistry. For example, it has different concepts\nfor chemical compounds (in fact, the hierarchy is pretty rich) and compound classes. And these are differently modeled. Furthermore,\nit has a model that formalizes that things with a different InChI are different, but even allows things with the same InChI to be\ndifferent, if the need arises. It tries to accurately and precisely capture the certainty and uncertainty of the chemistry. As such,\nit is a powerful system to handle identifier mappings, because databases are not clear, and chemical and biological data are\neven less so: we measure experimentally a characterization of chemicals, but what we put in databases and give names to are specific\nmodels (often chemical graphs).</p>\n\n<p>That model differs from what other (chemical) databases use, or seem to use, because databases do not always indicate what they\nactually have in a record. 
But I think this is a fair guess.</p>\n\n<h2 id=\"chebi\">ChEBI</h2>\n<p>ChEBI (and the matching <a href=\"https://www.wikidata.org/wiki/Property:P683\">ChEBI ID</a>) has entries for chemical classes (e.g.\n<a href=\"https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:35366\">fatty acid</a>) and specific compounds (e.g.\n<a href=\"https://www.ebi.ac.uk/chebi/searchId.do?chebiId=30089\">acetate</a>).</p>\n\n<h2 id=\"pubchem-chemspider-unichem\">PubChem, ChemSpider, UniChem</h2>\n<p>These three resources use the InChI as their central asset. While they do not really have the concept of compound classes so much\n(though increasingly they have classifications), they do have entries where stereochemistry is undefined or unknown. Each\none has its own way to link to other databases, which normally includes tons of structure normalization (see\ne.g. doi:<a href=\"https://doi.org/10.1186/s13321-018-0293-8\">10.1186/s13321-018-0293-8</a> and\ndoi:<a href=\"https://doi.org/10.1186/s13321-015-0072-8\">10.1186/s13321-015-0072-8</a>).</p>\n\n<h2 id=\"hmdb\">HMDB</h2>\n<p>HMDB (and the matching <a href=\"https://www.wikidata.org/wiki/Property:P2057\">P2057</a>) has a biological perspective; the entries\nreflect the biology of a chemical. Therefore, for most compounds, they focus on the neutral forms of compounds. This makes\nlinking to/from other databases, where the compound is not neutral, chemically less precise.</p>\n\n<h2 id=\"cas-registry-numbers\">CAS registry numbers</h2>\n<p>CAS (and the matching <a href=\"https://www.wikidata.org/wiki/Property:P231\">P231</a>) is pretty unique itself, and has identifiers\nfor substances (see <a href=\"https://www.wikidata.org/wiki/Q79529\">Q79529</a>), much more than chemical compounds, and comes with its\nown set of unique features. For example, solutions of a compound have, by design, the same identifier as the compound itself. 
Previously,\nformaldehyde and formalin had different Wikipedia/Wikidata pages, both with the same CAS registry number.</p>\n\n<h2 id=\"limitations-of-the-links-2\">Limitations of the links #2</h2>\n<p>Now, returning to our starting point: limitations in linking databases. If we want FAIR mappings, we need to be as precise\nas possible. Of course, that may mean we need more steps. We can always simplify at will, but we can never have a\ncomputer make the links more complex (well, not without making assumptions, etc.).</p>\n\n<p>And that is why Wikidata is so suitable for linking all these chemical databases: it can distinguish differences when needed,\nand make that explicit. It makes mappings between the databases more <a href=\"https://www.nature.com/articles/sdata201618\">FAIR</a>.</p>",
      "summary": "Bar chart showing the number of compounds with a particular chemical identifier. I think Wikidata is a groundbreaking project, which will have a major impact on science. One of the reasons is the open license (CCZero), the very basic approach (Wikibase), and the superb community around it. For example, setting up your own Wikibase including a cool SPARQL endpoint, is easily done with Docker.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/extid-wikidata-histogram.png",
      "date_published": "2018-08-18T00:00:00+00:00",
      "date_modified": "2025-05-25T00:00:00+00:00",
      "tags": ["wikidata","scholia","chemistry","bridgedb","cas","chebi","chemspider","fair","hmdb","pubchem","rdf","wikicite"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.6084/m9.figshare.6356027.v1", "doi": "10.6084/m9.figshare.6356027.v1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-018-0293-8", "doi": "10.1186/s13321-018-0293-8"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-015-0072-8", "doi": "10.1186/s13321-015-0072-8"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/sdata.2016.18", "doi": "10.1038/sdata.2016.18"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6e3y2-wmy66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2017/12/15/new-paper-integration-among-databases.html",
      "title": "New paper: &quot;Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations&quot;",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/1-s2.0-S2452074817301398-gr1.png\" width=\"200\" alt=\"Figure 1 from the NanoImpact article. CC-BY.\" />\nThe U.S.A and European nanosafety communities have a longstanding history of collaboration. On both sides there are working groups,\n<a href=\"https://nciphub.org/groups/nanowg\">NanoWG</a> and <a href=\"https://www.nanosafetycluster.eu/working-groups/wg-f-data-management.html\">WG-F</a> (previously called\nWG4) of the NanoSafety Cluster. I have been chair of WG4 for about three years and still active in the group, though in the past half year, without\ndedicated funding, less active. That is already changing again with the imminent start of the\n<a href=\"https://twitter.com/iseult5/status/836879814581698560\">NanoCommons</a> project.</p>\n\n<p>One of these collaborations resulted in a series of papers around data curation (see\ndoi:<a href=\"https://doi.org/10.1039/C5NR08944A\">10.1039/C5NR08944A</a> and\ndoi:<a href=\"https://doi.org/10.3762/bjnano.6.189\">10.3762/bjnano.6.189</a>). Part of this effort was also an survey about the state of databases. A good\nnumber of databases responded to the call. It turned out non-trivial to analyse the results and write up a report around it with recommendations.\nThe first version was submitted and rejected, and with fresh leadership, the paper underwent a significant restructuring by\n<a href=\"http://www.codata.org/events/codata-prize/2006-john-rumble-usa\">John Rumble</a> and resubmitted to Elsevier’s\n<a href=\"http://www.sciencedirect.com/science/journal/24520748\">NanoImpact</a> and now online\n(doi:<a href=\"http://dx.doi.org/10.1016/j.impact.2017.11.002\">10.1016/j.impact.2017.11.002</a>).</p>\n\n<p>The paper outlines an overview of challenges and a recommendation to the community on how to proceed. 
That is, basically: how should projects\nlike <a href=\"https://search.data.enanomapper.net/\">eNanoMapper</a>, <a href=\"https://cananolab.nci.nih.gov/caNanoLab/\">caNanoLab</a>, and\n<a href=\"https://www.nanomaterialregistry.org/\">Nanomaterial Registry</a> evolve, and what might the\n<a href=\"https://echa.europa.eu/-/eu-observatory-for-nanomaterials-launched\">European Union Observatory for Nanomaterials</a> (EUON) look like? BTW, a\nsimilar paper by Tropsha et al. was published the other week with a focus on the USA database ecosystem\n(doi:<a href=\"https://doi.org/10.1038/nnano.2017.233\">10.1038/nnano.2017.233</a>).</p>\n\n<p>Have fun reading <a href=\"https://doi.org/10.1016/j.impact.2017.11.002\">it</a>, and if you are working in a related field, please join\neither of the two aforementioned working groups! And a huge thanks to everyone involved, particularly Sandra, John, and Christine.</p>",
      "summary": "The U.S.A and European nanosafety communities have a longstanding history of collaboration. On both sides there are working groups, NanoWG and WG-F (previously called WG4) of the NanoSafety Cluster. I have been chair of WG4 for about three years and still active in the group, though in the past half year, without dedicated funding, less active. That is already changing again with the imminent start of the NanoCommons project.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/1-s2.0-S2452074817301398-gr1.png",
      "date_published": "2017-12-15T00:00:00+00:00",
      "date_modified": "2017-12-15T00:00:00+00:00",
      "tags": ["nanosafety","enanomapper","nanocommons","eunsc"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1039/C5NR08944A", "doi": "10.1039/C5NR08944A"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3762/bjnano.6.189", "doi": "10.3762/bjnano.6.189"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/J.IMPACT.2017.11.002", "doi": "10.1016/J.IMPACT.2017.11.002"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/nnano.2017.233", "doi": "10.1038/nnano.2017.233"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "eNanoMapper - A Database and Ontology Framework for Nanomaterials Design and Safety Assessment", "acronym" : "eNanoMapper", "uri" : "cordis.project:604134" }, "funder": { "name": "European Commission", "ror": "00k4n6c32" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kmvrd-h9e71",
      "url": "https://chem-bla-ics.linkedchemistry.info/2017/11/26/winter-solstice-challenge-what-is-your.html",
      "title": "Winter solstice challenge: what is your Open Knowledge score?",
      "content_html": "<p><img src=\"/assets/images/Robert_Snache_-_Spirithands.net_-_Winter_Solstice_Lunar_Eclipse_Startrails_(by).jpg\" style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Photo of a time laps of a starry night, making the stars show as lines in the sky. Source: Wikimedia, CC-BY 2.0, https://commons.wikimedia.org/wiki/File:Robert_Snache_-_Spirithands.net_-_Winter_Solstice_Lunar_Eclipse_Startrails_(by).jpg)\" />\nHi all, welcome to this winter solstice challenge! Umm, to not give our southern hemisphere colleagues\nnot a disadvantage, as their winter solstice has already passes, you’re up for a summer solstice challenge!</p>\n\n<h2 id=\"introduction\">Introduction</h2>\n\n<p>So, you know <a href=\"http://impactstory.org/\">ImpactStory</a> and <a href=\"http://altmetric.com/\">Altmetric.com</a> (if not,\n<a href=\"https://chem-bla-ics.blogspot.com/search?q=impactstory&amp;max-results=20&amp;by-date=true\">browse</a>\n<a href=\"https://chem-bla-ics.blogspot.com/search?q=altmetric&amp;max-results=20&amp;by-date=true\">my blog</a>);\nthese are wonderful tools to see what people are doing with your work. I hope you already know about\n<a href=\"http://opencitations.net/\">OpenCitations</a>, a collaboration of publishers, CrossRef, and many others, to\nmake all citation data available. They just passed the 50% milestone, congratulations on that amazing\nachievement! For the younger scientists it may be worth realizing that for the past 20 years, at least,\nthis data was copyrighted and not to be used unless you paid. Elsevier is, BTW,\n<a href=\"https://opencitations.wordpress.com/2017/11/24/elsevier-references-dominate-those-that-are-not-open-at-crossref/\">the major culprit</a>\nstill claiming IP on this, but RT this if you are surprised.</p>\n\n<p>So, the reason I introduce both ImpactStory and OpenCitations is the following. Scientific articles are\ndata and knowledge dense documents. 
They would be even denser if we did not redirect the reader to other literature, which may give\na more complete sketch of the context, describe a measurement protocol, describe how certain knowledge\nwas derived, etc. Therefore, just having your article Open Access is not enough: the articles you cite\nshould be Open Access too. That’s the next phase in really making an effort to have\n<a href=\"https://en.wikisource.org/wiki/Universal_Declaration_of_Human_Rights\">all of humanity benefit from the fruits of science</a>.</p>\n\n<p>I know it is hard already to calculate an “Open Access” score, though ImpactStory does a great job at\nthat! So, calculating this for your paper and the papers those papers cite is even harder. You may\nneed to brush up on your algorithm and programming skills.</p>\n\n<h2 id=\"eligibility\">Eligibility</h2>\n\n<p>Anyone is allowed to participate. Submission of your entry is done online, e.g. in your blog, in a public\nwrite up, or even an <a href=\"https://en.wikipedia.org/wiki/Open_notebook_science\">open notebook</a>!\nHowever, you need at least one citable research object. That is, it\nneeds a DOI. Otherwise, I cannot give you the prize (see below). The score should be based on all your\nproducts. Bonus points for those who include software and data citations. Excluding citable objects to\nboost your score (for example, I would have to exclude my book chapters) is seen as cheating the system.</p>\n\n<p><img src=\"/assets/images/800px-Global_key-route_main_paths_for_a_citation_network.svg.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Your article B may cite three articles (C, D, J) but article D also cites articles (F, I). So, your Open Knowledge score is recursive. Source: Wikipedia, CC-BY-SA 4.0, https://commons.wikimedia.org/wiki/File:Global_key-route_main_paths_for_a_citation_network.svg\" /></p>\n\n<h2 id=\"depth\">Depth</h2>\n\n<p>Calculating your Open Knowledge score can be done at multiple levels. 
After all, your article depends on\n(cites) articles, and your software depends on libraries, but those cited articles and software\ndependencies recursively also cite articles and/or software. The complexity is non-trivial, making it\na perfect solstice challenge indeed!</p>\n\n<h2 id=\"prizes\">Prizes</h2>\n\n<p>The prize I have to offer is my continued commitment to Open Science, but that you already get for\nfree and may not be enough of a boon. So, instead, soon after the winter/summer solstice at the end of this year,\nI will blog about your research, boosting your <a href=\"https://en.wikipedia.org/wiki/Altmetrics\">#altmetrics</a>\nscores. Yes, I will actually read and try to understand it!</p>\n\n<p>And because there are the results and the method, neither of which exists yet, there are two categories! I just\n<strong><em>doubled your chance</em></strong> of winning! That’s because humanity is worth it! One prize for the best tool to calculate\nyour Open Knowledge score, and one prize for the researcher with the highest score.</p>\n\n<h2 id=\"audience-prize\">Audience Prize</h2>\n\n<p>If someone feels a need to organize an audience prize, this is very much encouraged! (Assuming Open approaches, of course :)</p>",
      "summary": "Hi all, welcome to this winter solstice challenge! Umm, to not put our southern hemisphere colleagues at a disadvantage, as their winter solstice has already passed, you’re up for a summer solstice challenge!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Robert_Snache_-_Spirithands.net_-_Winter_Solstice_Lunar_Eclipse_Startrails_(by).jpg",
      "date_published": "2017-11-26T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["solstice","altmetrics","opencitations"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9skvt-80331",
      "url": "https://chem-bla-ics.linkedchemistry.info/2017/10/15/two-conference-proceedings.html",
      "title": "Two conference proceedings: nanopublications and Scholia",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/Screenshot_20171015_131507.png\" width=\"300\" alt=\"The nanopublication conference article in Scholia.\" />\nIt takes effort to move scholarly publishing forward. And the traditional publishers have not all shown themselves to\nbe good at that: we’re still basically stuck with machine-broken channels like PDFs and ReadCubes. They seem\nto all love text mining, but only if they can do it themselves.</p>\n\n<p>Fortunately, there are plenty of people who do like to make a difference and like to innovate. I find this\nimportant, because if we do not do it, who will? Two people who make an effort are two researchers who\nrecently published their work as conference proceedings: <a href=\"http://www.tkuhn.org/\">Tobias Kuhn</a> and\n<a href=\"https://github.com/fnielsen\">Finn Nielsen</a>. And I am happy to have been able to contribute to both efforts.</p>\n\n<h2 id=\"nanopublications\">Nanopublications</h2>\n\n<p>Tobias works on <a href=\"https://web.archive.org/web/20171004200524/http://nanopub.org/wordpress/\">nanopublications <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nwhich innovate how we make knowledge machine\nreadable. And I have stressed how important this is in my blog for years. Nanopublications describe how\nknowledge is captured and make it FAIR, and, importantly, they link the knowledge to the research that led to the\nknowledge. His <a href=\"https://doi.org/10.1007/978-3-319-68288-4_26\">recent conference proceedings</a>\ndetails how nanopublications can be used to establish incremental\nknowledge. That is, given two sets of nanopublications, it determines which have been removed, added, and\nchanged. 
The paper continues by outlining how that can be used to reduce, for example, download sizes and how\nit can help establish an efficient change history.</p>\n\n<h2 id=\"scholia\">Scholia</h2>\n\n<p>And Finn developed <a href=\"https://scholia.toolforge.org/\">Scholia <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, an interface not unlike Web-of-Science. But\nthen based on <a href=\"http://wikidata.org/\">Wikidata</a> and therefore fully on CCZero data. And, with a community\nactively adding the full history of scholarly literature and the citations between papers, courtesy of the\n<a href=\"https://i4oc.org/\">Initiative for Open Citations</a>. This is opening up a lot of possibilities: from keeping\ntrack of articles citing your work, to getting alerts about articles publishing new data on your favorite gene or\nmetabolite.</p>",
      "summary": "It takes effort to move scholarly publishing forward. And the traditional publishers have not all shown to be good at that: we’re still basically stuck with machine-broken channels like PDFs and ReadCubes. They seem to all love text mining, but only if they can do it themselves.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20171015_131507.png",
      "date_published": "2017-10-15T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["scholia","nanopub"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.48550/ARXIV.1703.04222", "doi": "10.48550/ARXIV.1703.04222"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/978-3-319-68288-4_26", "doi": "10.1007/978-3-319-68288-4_26"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2016/12/18/my-swat4ls-poster-about-enanomapper.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/12/18/my-swat4ls-poster-about-enanomapper.html",
      "title": "The SWAT4LS poster about eNanoMapper",
      "content_html": "<p><a href=\"http://www.swat4ls.org/\">SWAT4LS</a> was once again a great meeting. I doubt I will find time soon enough to\nwrite up notes, but at least I can post the <a href=\"http://enanomapper.net/\">eNanoMapper</a> poster I presented, which\nis available from <a href=\"https://f1000research.com/\">F1000Research</a>:</p>\n\n<p><img src=\"/assets/images/enanomapper_poster.png\" alt=\"\" /></p>",
      "summary": "SWAT4LS was once again a great meeting. I doubt I will find time soon enough to write up notes, but at least I can post the eNanoMapper poster I presented, which is available from F1000Research:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/enanomapper_poster.png",
      "date_published": "2016-12-18T00:00:00+00:00",
      "date_modified": "2016-12-18T00:00:00+00:00",
      "tags": ["swat4ls","enanomapper","sparql"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.7490/f1000RESEARCH.1113520.1", "doi": "10.7490/f1000RESEARCH.1113520.1"
             }
            
          
        ],
      
      "_funding": [{"award": { "title" : "eNanoMapper - A Database and Ontology Framework for Nanomaterials Design and Safety Assessment", "acronym" : "eNanoMapper", "uri" : "cordis.project:604134" }, "funder": { "name": "European Commission", "ror": "00k4n6c32" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nwnd6-hj737",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/07/02/two-apache-jena-sparql-query.html",
      "title": "Two Apache Jena SPARQL query performance observations",
      "content_html": "<p><span style=\"width: 50%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/jenaSlow.png\" />\n</span></p>\n\n<p>Doing searches in RDF stores is commonly done with SPARQL queries. I have been using this with <a href=\"http://chem-bla-ics.blogspot.nl/2016/06/new-paper-using-semantic-web-for-rapid.html\">the semantic web translation of WikiPathways</a>\nby <a href=\"https://twitter.com/andrawaag\">Andra</a> to find common content issues, though sometimes combined with some additional Java code.\nFor example, find <a href=\"http://www.ncbi.nlm.nih.gov/pubmed\">PubMed</a> identifiers that are not numbers.</p>\n\n<p>Based on <a href=\"http://orcid.org/0000-0003-3477-7443\">Ryan</a>’s work on interactions, a more complex curation query I\nrecently wrote in reply to issues that <a href=\"https://twitter.com/xanderpico\">Alex</a> ran into with converting pathways to\nBioPax, is to find interactions that convert a gene to another gene. Such occurred in <a href=\"http://wikipathways.org/\">WikiPathways</a>\nbecause graphically you do not see the difference. 
I originally had this query:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?organismName</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">as</span><span class=\"w\"> </span><span class=\"nv\">?organism</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"nv\">?page</span><span class=\"w\">\n       </span><span class=\"nv\">?gene1</span><span class=\"w\"> </span><span class=\"nv\">?gene2</span><span class=\"w\"> </span><span class=\"nv\">?interaction</span><span class=\"w\">\n</span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?gene1</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">GeneProduct</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?gene2</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">GeneProduct</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?interaction</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">source</span><span class=\"w\"> </span><span class=\"nv\">?gene1</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">target</span><span class=\"w\"> </span><span class=\"nv\">?gene2</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"k\">a</span><span class=\"w\"> 
</span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">Conversion</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">dcterms</span><span class=\"o\">:</span><span class=\"ss\">isPartOf</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">page</span><span class=\"w\"> </span><span class=\"nv\">?page</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">organismName</span><span class=\"w\"> </span><span class=\"nv\">?organismName</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">ASC</span><span class=\"p\">(</span><span class=\"nv\">?organism</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>This query properly found all gene-gene conversions to be fixed. However, it was also horribly slow with my\n<a href=\"http://junit.org/\">JUnit</a>/<a href=\"https://jena.apache.org/\">Apache Jena</a> setup. The query runs very efficiently on <a href=\"http://sparql.wikipathways.org/\">the Virtuoso-based SPARQL end point</a>.\nI had been trying to speed it up in the past, but without much success. Instead, I ended up batching the\ntesting on our Jenkins instance. 
But this got a bit silly, with, at some point, subsets of fewer than 100 pathways.</p>\n\n<h2 id=\"observation-1\">Observation #1</h2>\n\n<p>So, I <a href=\"https://twitter.com/egonwillighagen/status/748817658758344704\">turned to twitter</a>, and quite soon got\n<a href=\"https://twitter.com/xbib/status/748818534457716736\">three</a> <a href=\"https://twitter.com/jervenbolleman/status/748820145028550656\">useful</a>\n<a href=\"https://twitter.com/soilandreyes/status/748891148182257664\">leads</a>. The first two suggestions did not speed things up, but they helped me rule out possible causes.\nOf course, there is literature about optimizing, like this recent paper by Antonis (doi:<a href=\"http://doi.org/10.1016/j.websem.2014.11.003\">10.1016/j.websem.2014.11.003</a>),\nbut I haven’t been able to convert this knowledge into practical steps either. After ruling out these options (though I kept the\n<code class=\"language-plaintext highlighter-rouge\">sameTerm()</code> suggestion), I realized it had to be the first two triples with the variables <code class=\"language-plaintext highlighter-rouge\">?gene1</code> and <code class=\"language-plaintext highlighter-rouge\">?gene2</code>. 
So,\n<a href=\"https://github.com/BiGCAT-UM/WikiPathwaysCurator/commit/b8283419b252bd8525631d5035d086a15d0773e0\">I tried using <em>FILTER</em> there too</a>,\nresulting in this query:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?interaction</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">source</span><span class=\"w\"> </span><span class=\"nv\">?gene1</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">target</span><span class=\"w\"> </span><span class=\"nv\">?gene2</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">Conversion</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">dcterms</span><span class=\"o\">:</span><span class=\"ss\">isPartOf</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">page</span><span class=\"w\"> </span><span class=\"nv\">?page</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">organismName</span><span class=\"w\"> </span><span class=\"nv\">?organismName</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"o\">!</span><span class=\"nb\">sameTerm</span><span class=\"p\">(</span><span class=\"nv\">?gene1</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"nv\">?gene2</span><span class=\"p\">))</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"k\">EXISTS</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"nv\">?gene1</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">GeneProduct</span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"k\">EXISTS</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"nv\">?gene2</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">GeneProduct</span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">ASC</span><span class=\"p\">(</span><span class=\"nv\">?organism</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>That did it! The time to run a query halved. Not so surprising, in retrospect, but it all depends on the SPARQL engine:\nwhich parts does it run first? Apparently, Jena’s SPARQL engine starts at the top. This seems to be confirmed by\n<a href=\"https://twitter.com/soilandreyes/status/748891148182257664\">the third comment I got</a>. However, I always understood that\nengines can also start at the bottom.</p>\n\n<h2 id=\"observation-2\">Observation #2</h2>\n\n<p>But that’s not all. This speed-up made me wonder about something else. The problem clearly seems to be the engine’s approach to running\nparts of the query. So, what if I remove further choices in what to run first? 
That leads me to\n<a href=\"https://twitter.com/egonwillighagen/status/748844395701506048\">a second observation</a>. It helps significantly if you\nreduce the number of subgraphs it should later “merge”. Instead, if possible, use\n<a href=\"https://www.w3.org/TR/sparql11-query/#propertypaths\">property paths</a>. That, again, roughly halved the runtime of the query.\nI ended up with the query below, which, obviously, no longer gives me access to the pathway resources, but I can live\nwith that:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?interaction</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">source</span><span class=\"w\"> </span><span class=\"nv\">?gene1</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">target</span><span class=\"w\"> </span><span class=\"nv\">?gene2</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">Conversion</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">dcterms</span><span class=\"o\">:</span><span class=\"ss\">isPartOf</span><span class=\"o\">/</span><span class=\"nn\">foaf</span><span class=\"o\">:</span><span class=\"ss\">page</span><span class=\"w\"> </span><span class=\"nv\">?pathway</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">dcterms</span><span class=\"o\">:</span><span class=\"ss\">isPartOf</span><span class=\"o\">/</span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">organismName</span><span class=\"w\"> </span><span class=\"nv\">?organismName</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"o\">!</span><span class=\"nb\">sameTerm</span><span class=\"p\">(</span><span class=\"nv\">?gene1</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"nv\">?gene2</span><span class=\"p\">))</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"k\">EXISTS</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"nv\">?gene1</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">GeneProduct</span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"k\">EXISTS</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"nv\">?gene2</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">GeneProduct</span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">ASC</span><span class=\"p\">(</span><span class=\"nv\">?organism</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>I’m hoping these two observations may help others using Apache Jena for unit and integration testing of RDF generation too.</p>",
      "summary": "",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jenaSlow.png",
      "date_published": "2016-07-02T00:00:00+00:00",
      "date_modified": "2016-07-02T00:00:00+00:00",
      "tags": ["curation","wikipathways","sparql","rdf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/j.websem.2014.11.003", "doi": "10.1016/j.websem.2014.11.003"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/me1j9-t5g38",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/06/25/new-paper-using-semantic-web-for-rapid.html",
      "title": "New Paper: &quot;Using the Semantic Web for Rapid Integration of WikiPathways with Other Biological Online Data Resources&quot;",
      "content_html": "<p><a href=\"http://micelio.be/\">Andra Waagmeester</a> published a paper on his work on a semantic web version of <a href=\"https://wikipathways.org/\">WikiPathways</a>\n(doi:<a href=\"https://doi.org/10.1371/journal.pcbi.1004989\">10.1371/journal.pcbi.1004989</a>). The paper outlines the design decisions, shows\n<a href=\"https://sparql.wikipathways.org/\">the SPARQL endpoint</a>, and gives several example SPARQL queries. These include federated queries, like a mashup\nwith <a href=\"https://www.disgenet.org/\">DisGeNET</a> (doi:<a href=\"https://doi.org/10.1093/database/bav028\">10.1093/database/bav028</a>) and EMBL-EBI’s\n<a href=\"https://www.ebi.ac.uk/gxa/home\">Expression Atlas</a>. That results in nice visualisations like this:</p>\n\n<p><img src=\"/assets/images/journal.pcbi.1004989.g002.PNG\" alt=\"\" /></p>\n\n<p>If you have the relevant information in the pathway, these pathways can help a lot in understanding what is biologically going on.\nAnd, of course, they are used for exactly that a lot.</p>\n\n<h2 id=\"press-release\">Press release</h2>\n\n<p>Because press releases have become an interesting tool in knowledge dissemination, I wanted to learn what it involves to get one out. 
This\ninvolved the people at <a href=\"http://journals.plos.org/ploscompbiol/\">PLOS Computational Biology</a> and the press offices of the Gladstone Institutes\nand our Maastricht University (<a href=\"https://gladstone.org/about-us/news/easy-integration-biological-knowledge-improves-understanding-diseases\">press release 1</a>,\n<a href=\"https://www.maastrichtuniversity.nl/news/easy-integrating-biological-knowledge-improves-understanding-diseases\">press release 2 EN</a>/<a href=\"https://www.maastrichtuniversity.nl/nl/nieuws/eenvoudigere-integratie-van-biologische-kennis-verbetert-begrip-van-ziekten\">NL</a>).\nThere is already one thing I learned in retrospect, and I am pissed with myself that I did not think of this: you should always have a\ngraphic supporting your story. I have been doing this for a long time in my blog now (sometimes I still forget), but did not think of\nthat in the press release. The press release was picked up by three outlets, though all basically as we presented it to them (thanks to\n<a href=\"http://altmetric.com/\">Altmetric.com</a>):</p>\n\n<p><img src=\"/assets/images/pressReleaseUptake.png\" alt=\"\" /></p>\n\n<h2 id=\"sparql\">SPARQL</h2>\n\n<p>But what makes me appreciate this piece of work, and WikiPathways itself, is how it creates a central hub of biological knowledge.\nPathway databases capture knowledge not easily embedded in generally structured (relational) databases. As such, expressing this\nin the RDF format seems simple enough. The thing I really love about this approach is that your queries become machine readable\nstories, particularly when you start using human readable variants of SPARQL for this. And you can\n<a href=\"http://chem-bla-ics.blogspot.nl/2009/08/bioclipse-and-sparql-end-points-2.html\">share these queries with the online scientific community with, for example, myExperiment</a>.</p>\n\n<p>There are two ways in which I have used SPARQL on WikiPathways data for metabolomics: 1. curation; 2. 
statistics. Data analysis\nis harder, because in the RDF world scientific lenses are needed to accommodate the chemical structural-temporal\ncomplexity of metabolites. For curation, we have long used SPARQL for unit tests to support the curation of WikiPathways.\nMoreover, I have manually used the SPARQL end point to find curation tasks. But now that the paper is out, I can blog about\nthis more. For now, <a href=\"http://www.wikipathways.org/index.php/Help:WikiPathways_Sparql_queries\">many example SPARQL queries can be found in the WikiPathways wiki</a>.\nIt features several queries showing statistics, but also some for curation. This is an example query I use to improve the\ninteroperability of WikiPathways with <a href=\"https://wikidata.org/\">Wikidata</a> (also for <a href=\"https://bridgedb.org/\">BridgeDb</a>):</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?metabolite</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?metabolite</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">Metabolite</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?metabolite</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">bdbWikidata</span><span class=\"w\"> </span><span class=\"nv\">?wikidata</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"o\">!</span><span class=\"nb\">BOUND</span><span class=\"p\">(</span><span class=\"nv\">?wikidata</span><span class=\"p\">))</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Feel free to give this query a go at <a href=\"https://sparql.wikipathways.org/\">sparql.wikipathways.org</a>!</p>\n\n<h2 id=\"triptych\">Triptych</h2>\n\n<p>This paper completes a nice triptych of three papers about WikiPathways in the past 6 months. Thanks to\nthe whole community and <a href=\"http://www.wikipathways.org/index.php/Special:ContributionScores\">the very many contributors</a>!\nAll three papers are linked below.</p>",
      "summary": "Andra Waagmeester published a paper on his work on a semantic web version of WikiPathways (doi:10.1371/journal.pcbi.1004989). The paper outlines the design decisions, shows the SPARQL endpoint, and gives several example SPARQL queries. These include federated queries, like a mashup with DisGeNET (doi:10.1093/database/bav028) and EMBL-EBI’s Expression Atlas. That results in nice visualisations like this:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/journal.pcbi.1004989.g002.PNG",
      "date_published": "2016-06-25T00:00:00+00:00",
      "date_modified": "2016-06-25T00:00:00+00:00",
      "tags": ["wikipathways","curation","sparql","rdf","wikidata"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/database/bav028", "doi": "10.1093/database/bav028"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/JOURNAL.PCBI.1004989", "doi": "10.1371/JOURNAL.PCBI.1004989"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKV1024", "doi": "10.1093/NAR/GKV1024"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1004941", "doi": "10.1371/journal.pcbi.1004941"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/n403c-fb376",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/03/27/migrating-pka-data-from-drugmet-to.html",
      "title": "Migrating pKa data from DrugMet to Wikidata",
      "content_html": "<p>In 2010 <a href=\"https://twitter.com/smllmp\">Samuel Lampa</a> and I started a pet project:\ncollecting pK<sub>a</sub> data: he was working on RDF extension of MediaWiki and I like consuming\nRDF data. We started <a href=\"http://drugmet.rilspace.org/wiki/Main_Page\">DrugMet</a>.\nWhen you read this post, this MediaWiki installation may already be down, which\nis why I am migrating the data to <a href=\"https://en.wikipedia.org/wiki/Wikidata\">Wikidata</a>.\nWhy? Because data curation takes effort, I like to play with Wikidata (see\n<a href=\"http://rio.pensoft.net/articles.php?id=7573\">this H2020 proposal</a> by \n<a href=\"https://twitter.com/EvoMRI\">Daniel Mietchen</a> <em>et al.</em>), I like Open Data, and it still\n<a href=\"http://proteinsandwavefunctions.blogspot.nl/2016/03/generating-protonation-states-and.html\">much needed</a>.</p>\n\n<p>We opted for a page with the minimal amount of information. To maximize the speed\nat which we could add information. However, when it came to semantics, we tried\nto be as explicit as possible, and, e.g. use <a href=\"https://doi.org/10.1371/journal.pone.0025513\">the CHEMINF ontology</a>.\nSo, it collected:</p>\n\n<ol>\n  <li>InChIKey (used to show images)</li>\n  <li>the paper it was collected from (identified by a DOI)</li>\n  <li>the value, and where possible, the experimental error</li>\n</ol>\n\n<p>A page typically looks something like this:</p>\n\n<p><img src=\"/assets/images/pKa.png\" alt=\"\" /></p>\n\n<p>While not used on all pages, at some point I even started using templates, and\nI used these two, for molecules and papers:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>{{Molecule\n  |Name=\n  |InChIKey=\n  |DOI=\n  |Wikidata=\n}}\n\n{{Paper\n  |DOI=\n  |Year=\n  |Wikidata=\n}}\n</code></pre></div></div>\n\n<p>These templates, as well as the above screenshot, already contain a spoiler, but\nmore about that later. 
Using MediaWiki functionality it was now easy to make lists,\ne.g. for all pK<sub>a</sub> data (more spoilers):</p>\n\n<p><img src=\"/assets/images/pKa1.png\" alt=\"\" /></p>\n\n<p>I find a database like this very important. It does not capture all the information\nit should be capturing, though, as is clear from <a href=\"https://www.overleaf.com/read/wqfsxrgrrbzx\">the proposal</a>\nsome of us worked on a while back. However, this project was put on hold; I don’t\nhave time for it anymore, and it is not core enough to our department to spend\ntime on writing grant proposals for it.</p>\n\n<p>But I still do not want this data to get lost. Wikidata is something I have\nstarted using, as it is a machine readable CCZero database with an increasing\namount of scientific knowledge. More and more people are working on it, and you\nmust absolutely <a href=\"http://dx.doi.org/10.1093/database/baw015\">read this paper</a>\nabout this very topic (by <a href=\"https://bitbucket.org/sulab/wikidatabots\">a great team</a>\nyou should track, anyway). I am using it myself as a source of identifier mappings\nand more. 
So, migrating the previously collected data to Wikidata makes perfect\nsense to me:</p>\n\n<ol>\n  <li>if a compound is missing, I can easily <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html\">create a new one using Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>if a paper is missing, I can easily <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html\">create a new one using Magnus Manske’s QuickStatements <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>Wikidata has a pretty decent provenance model</li>\n</ol>\n\n<p>I can annotate data with the data source (paper) it came from and also with experimental conditions:</p>\n\n<p><img src=\"/assets/images/pKa2.png\" alt=\"\" /></p>\n\n<p>In fact, you’ll note that the book is a separate Wikidata entry in itself.\nEven better, it’s an ‘edition’ of the book. This is the whole point we make in\nthe above-linked H2020 proposal: Wikidata is not a database specific for one\ndomain, it works for any (scholarly) domain, and seamlessly links all those\ndomains.</p>\n\n<p>Now, to keep track of what data I have migrated, I am annotating DrugMet entries\nwith links to Wikidata: everything with a Wikidata Q-code is already migrated.\nThe above pK<sub>a</sub> table already shows Q-identifiers, but I also created them for all\ndata sources I have used (three of them are two books and\n<a href=\"https://twitter.com/JBiolChem/status/713779938969698305\">one old paper without a DOI</a>):</p>\n\n<p><img src=\"/assets/images/pKa3.png\" alt=\"\" /></p>\n\n<p>I still have quite a number of entries to do, but all the protocols are set up now.</p>\n\n<p>On the downstream side, Wikidata is also great because of\n<a href=\"https://query.wikidata.org/\">its SPARQL endpoint</a>. 
Something that I could not\nget to work some weeks ago, I did manage yesterday (after\n<a href=\"https://twitter.com/arthursmith/status/713730159422095360\">some encouragement from @arthursmith</a>):\nlist all pK<sub>a</sub> statements, including the literature source if available.</p>\n\n<p>If you <a href=\"https://query.wikidata.org/#SELECT%20%3Fwikidata%20%3Fcompound%20%3FpKa%20%3Fsource%20%3Ftitle%20%3Fdoi%20WHERE%20%7B%0A%20%20%3Fwikidata%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2FP1117%3E%20%3Ffoo%20%3B%0A%20%20%20%20rdfs%3Alabel%20%3Fcompound%20.%0A%20%20%3Ffoo%20a%20wikibase%3ABestRank%20%3B%0A%20%20%20%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fstatement%2FP1117%3E%20%3FpKa%20.%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%3Ffoo%20prov%3AwasDerivedFrom%2F%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Freference%2FP248%3E%20%3Fsource%20.%0A%20%20%20%20%3Fsource%20rdfs%3Alabel%20%3Ftitle%20.%0A%20%20%20%20OPTIONAL%20%7B%20%3Fsource%20wdt%3AP356%20%3Fdoi%20.%20%7D%0A%20%20%20%20FILTER(lang(%3Ftitle)%20%3D%20%22en%22)%0A%20%20%7D%0A%20%20FILTER(lang(%3Fcompound)%20%3D%20%22en%22)%0A%7D\">run that query on the Wikidata endpoint</a>,\nyou get a table like this:</p>\n\n<p><img src=\"/assets/images/pKa4.png\" alt=\"\" /></p>\n\n<p>Here we see experimental data from two papers: <a href=\"https://doi.org/10.1021/ja01489a008\">10.1021/ja01489a008</a>\nand <a href=\"https://doi.org/10.1021/ed050p510\">10.1021/ed050p510</a>. This can all be\ndisplayed in much fancier ways, such as histograms or tables with 2D drawings of the\nchemical structures, but I leave that to the reader.</p>",
      "summary": "In 2010 Samuel Lampa and I started a pet project: collecting pKa data: he was working on RDF extension of MediaWiki and I like consuming RDF data. We started DrugMet. When you read this post, this MediaWiki installation may already be down, which is why I am migrating the data to Wikidata. Why? Because data curation takes effort, I like to play with Wikidata (see this H2020 proposal by Daniel Mietchen et al.), I like Open Data, and it still much needed.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pKa4.png",
      "date_published": "2016-03-27T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["wikidata","chemistry"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3897/RIO.1.E7573", "doi": "10.3897/RIO.1.E7573"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/JOURNAL.PONE.0025513", "doi": "10.1371/JOURNAL.PONE.0025513"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/DATABASE/BAW015", "doi": "10.1093/DATABASE/BAW015"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ED050P510", "doi": "10.1021/ED050P510"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/JA01489A008", "doi": "10.1021/JA01489A008"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k8jnz-7fb76",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html",
      "title": "Adding disclosures to Wikidata with Bioclipse",
      "content_html": "<p>Last week the huge, bi-annual ACS meeting took place (<a href=\"https://twitter.com/search?q=%23ACSSanDiego\">#ACSSanDiego</a>),\nduring which commonly new drug (leads) are disclosed. This time too, like this one tweeted by\n<a href=\"https://twitter.com/beth_halford\">Bethany Halford</a>:</p>\n\n<iframe id=\"twitter-widget-3\" scrolling=\"no\" frameborder=\"0\" allowtransparency=\"true\" allowfullscreen=\"true\" class=\"\" title=\"X Post\" src=\"https://platform.twitter.com/embed/Tweet.html?dnt=false&amp;embedId=twitter-widget-3&amp;features=eyJ0ZndfdGltZWxpbmVfbGlzdCI6eyJidWNrZXQiOltdLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X2ZvbGxvd2VyX2NvdW50X3N1bnNldCI6eyJidWNrZXQiOnRydWUsInZlcnNpb24iOm51bGx9LCJ0ZndfdHdlZXRfZWRpdF9iYWNrZW5kIjp7ImJ1Y2tldCI6Im9uIiwidmVyc2lvbiI6bnVsbH0sInRmd19yZWZzcmNfc2Vzc2lvbiI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfZm9zbnJfc29mdF9pbnRlcnZlbnRpb25zX2VuYWJsZWQiOnsiYnVja2V0Ijoib24iLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X21peGVkX21lZGlhXzE1ODk3Ijp7ImJ1Y2tldCI6InRyZWF0bWVudCIsInZlcnNpb24iOm51bGx9LCJ0ZndfZXhwZXJpbWVudHNfY29va2llX2V4cGlyYXRpb24iOnsiYnVja2V0IjoxMjA5NjAwLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X3Nob3dfYmlyZHdhdGNoX3Bpdm90c19lbmFibGVkIjp7ImJ1Y2tldCI6Im9uIiwidmVyc2lvbiI6bnVsbH0sInRmd19kdXBsaWNhdGVfc2NyaWJlc190b19zZXR0aW5ncyI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfdXNlX3Byb2ZpbGVfaW1hZ2Vfc2hhcGVfZW5hYmxlZCI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfdmlkZW9faGxzX2R5bmFtaWNfbWFuaWZlc3RzXzE1MDgyIjp7ImJ1Y2tldCI6InRydWVfYml0cmF0ZSIsInZlcnNpb24iOm51bGx9LCJ0ZndfbGVnYWN5X3RpbWVsaW5lX3N1bnNldCI6eyJidWNrZXQiOnRydWUsInZlcnNpb24iOm51bGx9LCJ0ZndfdHdlZXRfZWRpdF9mcm9udGVuZCI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9fQ%3D%3D&amp;frame=false&amp;hideCard=false&amp;hideThread=false&amp;id=710543705812426752&amp;lang=en&amp;origin=https%3A%2F%2Fchem-bla-ics.blogspot.com%2F2016%2F03%2Fadding-disclosures-to-wikidata-with.html&amp;sessionId=ba8a9ed10d55387ac0f656bfaf73f3a579e1e77a&amp;theme=light&amp;widgetsVersion=2615f7e52b7e0%3A1702314776716&amp;
width=550px\" style=\"position: static; visibility: visible; width: 550px; height: 1311px; display: block; flex-grow: 1;\" data-tweet-id=\"710543705812426752\"></iframe>\n<p><br /></p>\n\n<p>Because getting this information out in the open is important, I think it’s a good idea to add it to\n<a href=\"http://wikidata.org/\">Wikidata</a> (see doi:<a href=\"http://dx.doi.org/10.3897/rio.1.e7573\">10.3897/rio.1.e7573</a>).\nSo, with <a href=\"http://www.bioclipse.net/\">Bioclipse</a> (doi:<a href=\"http://dx.doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a>)\nI redrew the structure:</p>\n\n<p><img src=\"/assets/images/strucutre.png\" alt=\"\" /></p>\n\n<p>I previously blogged about how to <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/01/27/adding-chemical-compound-to-wikidata.html\">add chemicals to Wikidata <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nbut I realized that I wanted to also use Bioclipse to automate this process a bit. So, I wrote this script to generate the SMILES, InChI, and\nInChIKey, double check the compound is not already in Wikidata (using the <a href=\"https://query.wikidata.org/\">Wikidata SPARQL endpoint</a>),\nand look up the <a href=\"https://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> compound identifier (example SMILES).</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">smiles</span> <span class=\"o\">=</span> <span class=\"s2\">\"CCCC\"</span>\n\n<span class=\"n\">mol</span> <span class=\"o\">=</span> <span class=\"n\">cdk</span><span class=\"o\">.</span><span class=\"na\">fromSMILES</span><span class=\"o\">(</span><span class=\"n\">smiles</span><span class=\"o\">)</span>\n<span class=\"n\">ui</span><span class=\"o\">.</span><span class=\"na\">open</span><span class=\"o\">(</span><span class=\"n\">mol</span><span class=\"o\">)</span>\n\n<span class=\"n\">inchiObj</span> <span class=\"o\">=</span> <span class=\"n\">inchi</span><span 
class=\"o\">.</span><span class=\"na\">generate</span><span class=\"o\">(</span><span class=\"n\">mol</span><span class=\"o\">)</span>\n<span class=\"n\">inchiShort</span> <span class=\"o\">=</span> <span class=\"n\">inchiObj</span><span class=\"o\">.</span><span class=\"na\">value</span><span class=\"o\">.</span><span class=\"na\">substring</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">)</span>\n<span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">inchiObj</span><span class=\"o\">.</span><span class=\"na\">key</span> <span class=\"c1\">// key = \"GDGXJFJBRMKYDL-FYWRMAATSA-N\"</span>\n\n<span class=\"n\">sparql</span> <span class=\"o\">=</span> <span class=\"s2\">\"\"\"\nPREFIX wdt: &lt;http://www.wikidata.org/prop/direct/&gt;\nSELECT ?compound WHERE {\n  ?compound wdt:P235 \"$key\" .\n}\n\"\"\"</span>\n\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">bioclipse</span><span class=\"o\">.</span><span class=\"na\">isOnline</span><span class=\"o\">())</span> <span class=\"o\">{</span>\n  <span class=\"n\">results</span> <span class=\"o\">=</span> <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">sparqlRemote</span><span class=\"o\">(</span>\n    <span class=\"s2\">\"https://query.wikidata.org/sparql\"</span><span class=\"o\">,</span> <span class=\"n\">sparql</span>\n  <span class=\"o\">)</span>\n  <span class=\"n\">missing</span> <span class=\"o\">=</span> <span class=\"n\">results</span><span class=\"o\">.</span><span class=\"na\">rowCount</span> <span class=\"o\">==</span> <span class=\"mi\">0</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"o\">{</span>\n  <span class=\"n\">missing</span> <span class=\"o\">=</span> <span class=\"kc\">true</span>\n<span class=\"o\">}</span>\n\n<span class=\"n\">formula</span> <span class=\"o\">=</span> <span class=\"n\">cdk</span><span class=\"o\">.</span><span class=\"na\">molecularFormula</span><span 
class=\"o\">(</span><span class=\"n\">mol</span><span class=\"o\">)</span>\n\n<span class=\"c1\">// Create the Wikidata QuickStatement,</span>\n<span class=\"c1\">// see https://tools.wmflabs.org/wikidata-todo/quick_statements.php</span>\n\n<span class=\"n\">item</span> <span class=\"o\">=</span> <span class=\"s2\">\"LAST\"</span> <span class=\"c1\">// set to Qxxxx if you need to append info,</span>\n              <span class=\"c1\">// e.g. item = \"Q22579236\"</span>\n\n<span class=\"n\">pubchemLine</span> <span class=\"o\">=</span> <span class=\"s2\">\"\"</span>\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">bioclipse</span><span class=\"o\">.</span><span class=\"na\">isOnline</span><span class=\"o\">())</span> <span class=\"o\">{</span>\n  <span class=\"n\">pcResults</span> <span class=\"o\">=</span> <span class=\"n\">pubchem</span><span class=\"o\">.</span><span class=\"na\">search</span><span class=\"o\">(</span><span class=\"n\">key</span><span class=\"o\">)</span>\n  <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">pcResults</span><span class=\"o\">.</span><span class=\"na\">size</span> <span class=\"o\">==</span> <span class=\"mi\">1</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"n\">cid</span> <span class=\"o\">=</span> <span class=\"n\">pcResults</span><span class=\"o\">[</span><span class=\"mi\">0</span><span class=\"o\">]</span>\n    <span class=\"n\">pubchemLine</span> <span class=\"o\">=</span> <span class=\"s2\">\"$item\\tP662\\t\\\"$cid\\\"\"</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n\n<span class=\"k\">if</span> <span class=\"o\">(!</span><span class=\"n\">missing</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"====================\"</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"Already in Wikidata as \"</span> <span class=\"o\">+</span> <span 
class=\"n\">results</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">,</span><span class=\"s2\">\"compound\"</span><span class=\"o\">)</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"====================\"</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"o\">{</span>\n  <span class=\"n\">statement</span> <span class=\"o\">=</span> <span class=\"s2\">\"\"\"\n    CREATE\n    \n    $item\\tDen\\t\\\"chemical compound\\\"\n    $item\\tP233\\t\\\"$smiles\\\"\n    $item\\tP274\\t\\\"$formula\\\"\n    $item\\tP234\\t\\\"$inchiShort\\\"\n    $item\\tP235\\t\\\"$key\\\"\n    $pubchemLine\n  \"\"\"</span>\n\n  <span class=\"n\">println</span> <span class=\"s2\">\"====================\"</span>\n  <span class=\"n\">println</span> <span class=\"n\">statement</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"====================\"</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The output of this script is a <a href=\"https://tools.wmflabs.org/wikidata-todo/quick_statements.php\">QuickStatement</a> for\n<a href=\"http://twitter.org/MagnusManske\">Magnus Manske</a>’s tool: a list of commands to create and edit entries in Wikidata.\nIMPORTANT: it’s not meant to automate editing Wikidata! I only automate creating the input, which I carefully check (e.g. checking that all\nstereochemistry is defined). Note how Bioclipse opens up the structure in a viewer with ui.open(). You need to enable\nthe tool first, but if you have an account, this is not too hard. Of course, the advantage is that it is a lot quicker. 
I have a similar\nscript to create QuickStatements starting with only a <a href=\"https://www.ebi.ac.uk/chembl/\">ChEMBL</a> identifier.</p>\n\n<p>The QuickStatement for GDC-0853 looks like this:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>    CREATE\n    \n    LAST Den \"chemical compound\"\n    LAST P233 \"O=C1C(=CC(=CN1C)c2ccnc(c2CO)N4C(=O)c3cc5c(n3CC4)CC(C)(C)C5)Nc6ncc(cc6)N7CCN(C[C@@H]7C)C8COC8\"\n    LAST P274 \"C37H44N8O4\"\n    LAST P234 \"1S/C37H44N8O4/c1-23-18-42(27-21-49-22-27)9-10-43(23)26-5-6-33(39-17-26)40-30-13-25(19-41(4)35(30)47)28-7-8-38-34(29(28)20-46)45-12-11-44-31(36(45)48)14-24-15-37(2,3)16-32(24)44/h5-8,13-14,17,19,23,27,46H,9-12,15-16,18,20-22H2,1-4H3,(H,39,40)/t23-/m0/s1\"\n    LAST P235 \"WNEODWDFDXWOLU-QHCPKHFHSA-N\"\n    LAST P662 \"86567195\"\n</code></pre></div></div>\n\n<p>The first line creates a new Wikidata item, while the next ones add information about this compound. GDC-0853 is now also\n<a href=\"https://www.wikidata.org/wiki/Q23304817\">Q23304817</a>. I added the label manually afterwards. Note how the Bioclipse script found\nthe PubChem identifier, using the InChIKey. I also use this approach to add compounds to Wikidata that we have in\n<a href=\"http://wikipathways.org/\">WikiPathways</a>.</p>",
      "summary": "Last week the huge, bi-annual ACS meeting took place (#ACSSanDiego), during which commonly new drug (leads) are disclosed. This time too, like this one tweeted by Bethany Halford:",
      
      "date_published": "2016-03-20T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["acs","bioclipse","chembl","inchi","pubchem","wikidata","acssandiego"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-59", "doi": "10.1186/1471-2105-8-59"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3897/RIO.1.E7573", "doi": "10.3897/RIO.1.E7573"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q5a38-51n24",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/01/27/adding-chemical-compound-to-wikidata.html",
      "title": "Adding chemical compounds to Wikidata",
      "content_html": "<p>Adding chemical compounds to <a href=\"https://www.wikidata.org/\">Wikidata</a> is not difficult. You can store the chemical formula\n(<a href=\"https://www.wikidata.org/wiki/Property:P274\">P274</a>), (canonical) <a href=\"http://chem-bla-ics.blogspot.nl/2015/12/the-quality-of-smiles-strings-in.html\">SMILES</a>\n(<a href=\"https://www.wikidata.org/wiki/Property:P233\">P233</a>), InChIKey (<a href=\"https://www.wikidata.org/wiki/Property:P235\">P235</a>) (and InChI\n(<a href=\"https://www.wikidata.org/wiki/Property:P233\">P234</a>), of course), as well various database identifiers (see what I wrote about that\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2015/12/22/new-edition-getting-cas-registry.html\">here <i class=\"fa-solid fa-recycle fa-xs\"></i></a>]). It also allows storing of the provenance, and has predicates\nfor that too.</p>\n\n<p>So, to enter a new structure for a compound, you should enter the compound information to Wikidata. Of course, make sure to create the needed accounts,\nparticularly one for Wikidata (<a href=\"https://www.wikidata.org/w/index.php?title=Special:UserLogin&amp;returnto=Wikidata%3AMain+Page&amp;type=signup\">create account</a>)\n(not sure if the next steps needs a more general Wikimedia account too).</p>\n\n<p><strong>Entering the research paper</strong>: <br />\n<a href=\"https://twitter.com/MagnusManske\">Magnus Manske</a> <a href=\"https://twitter.com/MagnusManske/status/691664308523130880\">pointed</a> me to\n<a href=\"http://tools.wmflabs.org/sourcemd/\">this helper tool</a>. If you have the DOI of the paper, it is easy to add a new paper. This is what the tool shows\nfor doi:<a href=\"http://dx.doi.org/10.1128/AAC.01148-08\">10.1128/AAC.01148-08</a> (but no longer when you try!):</p>\n\n<p><img src=\"/assets/images/smd.png\" alt=\"\" /></p>\n\n<p>You need permission to run this script and the tool will alert you about that, and give the instructions how to get permission. 
After\nI clicked Open in QuickStatements, I got this output, showing me that an entry in Wikidata was created for this paper:</p>\n\n<p><img src=\"/assets/images/smd1.png\" alt=\"\" /></p>\n\n<p>Later, I can use the new Q-code (<a href=\"https://www.wikidata.org/wiki/Q22309806\">Q22309806</a>) as the source for statements about the compound (formula, etc).</p>\n\n<p><strong>Draw your compound and get an InChIKey</strong>: <br />\nThe next step is to draw a compound and get an InChIKey. This can be done with many tools, including\n<a href=\"http://bioclipse.net/\">Bioclipse</a>. Rajarshi opted for alternatives:</p>\n\n<ul>\n<a href=\"https://twitter.com/collabchem\">@collabchem</a> <a href=\"https://twitter.com/egonwillighagen\">@egonwillighagen</a> OSRA or <a href=\"https://t.co/ZIQdgrYsmr\">https://t.co/ZIQdgrYsmr</a>? <br />\n— Rajarshi Guha (@rguha) <a href=\"https://twitter.com/rguha/status/692377715735949313\">January 27, 2016</a>\n</ul>\n\n<p>Then check whether the compound is already in Wikidata. 
You can use this SPARQL query for that, using the InChIKey of the compound (it’s for acetic acid, so it will be found):</p>\n\n<p><img src=\"/assets/images/smd3.png\" alt=\"\" /></p>\n\n<p>For convenience, here is the copy/pastable SPARQL:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.wikidata.org/prop/direct/&gt;</span><span class=\"w\">\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P235</span><span class=\"w\"> </span><span class=\"s2\">\"QTBSBXVTEAMEQO-UHFFFAOYSA-N\"</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p><strong>Entering the compound</strong>: <br />\nIf the compound is not already in Wikidata, it is time to add it. The minimal information you should provide is the following:</p>\n\n<ul>\n  <li>mark the new entry as ‘instance of’ (P) ‘chemical compound’ (Q)</li>\n  <li>the chemical formula and SMILES (use as reference the paper)\n    <ul>\n      <li>add the reference to the paper you entered above</li>\n    </ul>\n  </li>\n  <li>add the InChIKey and/or InChI</li>\n</ul>\n\n<p>The first step is to create a new Wikidata entry. 
The Create new item menu in the left side panel can be used, showing a page like this:</p>\n\n<p><img src=\"/assets/images/smd2.png\" alt=\"\" /></p>\n\n<p>As a label you can use the name used in the paper for the compound, even if it is a code, and as description ‘chemical compound’ will do for now; it can be changed later.</p>\n\n<p>Feel free to add as much information about the compound as you can find. There are some chemically rich entries in Wikidata, such as that for acetic acid\n(<a href=\"https://www.wikidata.org/wiki/Q47512\">Q47512</a>).</p>",
      "summary": "Adding chemical compounds to Wikidata is not difficult. You can store the chemical formula (P274), (canonical) SMILES (P233), InChIKey (P235) (and InChI (P234), of course), as well various database identifiers (see what I wrote about that here ]). It also allows storing of the provenance, and has predicates for that too.",
      
      "date_published": "2016-01-27T00:00:00+00:00",
      "date_modified": "2025-05-25T00:00:00+00:00",
      "tags": ["wikidata","chemistry","bioclipse"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1128/AAC.01148-08", "doi": "10.1128/AAC.01148-08"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vt6nr-28g72",
      "url": "https://chem-bla-ics.linkedchemistry.info/2015/12/22/new-edition-getting-cas-registry.html",
      "title": "New Edition! Getting CAS registry numbers out of WikiData",
      "content_html": "<p><span style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/aceticAcidCAS.png\" /> <br />\nSource: Wikipedia. <a href=\"https://en.wikipedia.org/wiki/File:Acetic_acid.jpg\">CC-BY-SA</a>\n</span>\nApril this year <a href=\"https://chem-bla-ics.linkedchemistry.info/2015/04/10/getting-cas-registry-numbers-out-of.html\">I blogged about an important SPARQL query <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nfor many chemists: getting CAS registry numbers from Wikidata. This is relevant for two reasons:</p>\n\n<ol>\n  <li><a href=\"http://commonchemistry.org/\">CAS works together with Wikimedia</a> on a large, free CAS-to-structure database</li>\n  <li><a href=\"http://wikidata.org/\">Wikidata</a> is <a href=\"https://creativecommons.org/choose/zero/\">CCZero</a></li>\n</ol>\n\n<p>The original effort validated about eight thousand registry numbers, made available via Wikipedia and the\n<a href=\"http://commonchemistry.org/\">Common Chemistry</a> website. However, the effort did not stop there, and Wikipedia\nnow contains many more CAS registry numbers. In fact, Wikidata picked up many of these and now lists almost\ntwenty thousand CAS numbers. That well exceeds what databases are allowed to aggregate and make available.</p>\n\n<p>Since the post in April, Wikidata put online a <a href=\"https://query.wikidata.org/\">new SPARQL end point</a> and\ncreated “direct” property links. 
This way, you lose the provenance information, but the query becomes simpler:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.wikidata.org/prop/direct/&gt;</span><span class=\"w\">\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"nv\">?id</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P231</span><span class=\"w\"> </span><span class=\"nv\">?id</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>The other thing that changed since April is that others and I requested the creation of more compound identifiers,\nand here’s an overview along with the current number of such identifiers in Wikidata:</p>\n\n<ul>\n  <li>CAS registry number (<a href=\"https://www.wikidata.org/wiki/Property:P231\">P231</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20(count(%3Fid)%20as%20%3Fcount)%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP231%20%3Fid%20.%0A%7D%0A\">19420</a></li>\n  <li>PubChem ID (CID) (<a href=\"https://www.wikidata.org/wiki/Property:P662\">P662</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP662%20%3Fid%20.%0A%7D%0A\">16616</a></li>\n  <li>InChI (<a 
href=\"https://www.wikidata.org/wiki/Property:P234\">P234</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP234%20%3Fid%20.%0A%7D%0A\">14312</a></li>\n  <li>ChemSpider ID (<a href=\"https://www.wikidata.org/wiki/Property:P661\">P661</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP661%20%3Fid%20.%0A%7D%0A\">11566</a></li>\n  <li>ChEBI ID (<a href=\"https://www.wikidata.org/wiki/Property:P683\">P683</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP683%20%3Fid%20.%0A%7D%0A\">4313</a></li>\n  <li>KEGG ID (<a href=\"https://www.wikidata.org/wiki/Property:P665\">P665</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP665%20%3Fid%20.%0A%7D%0A\">3983</a></li>\n  <li>Drugbank ID (<a href=\"https://www.wikidata.org/wiki/Property:P715\">P715</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP715%20%3Fid%20.%0A%7D%0A\">2518</a></li>\n  <li>KNApSAcK ID (<a href=\"https://www.wikidata.org/wiki/Property:P2064\">P2064</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP2064%20%3Fid%20.%0A%7D%0A\">9</a></li>\n  <li>HMDB ID (<a 
href=\"https://www.wikidata.org/wiki/Property:P2057\">P2057</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP2057%20%3Fid%20.%0A%7D%0A\">6</a></li>\n  <li>ZINC ID (<a href=\"https://www.wikidata.org/wiki/Property:P2084\">P2084</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20%28count%28%3Fid%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP2084%20%3Fid%20.%0A%7D%0A\">4</a></li>\n  <li>LIPID MAPS ID (<a href=\"https://www.wikidata.org/wiki/Property:P2063\">P2063</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20(count(%3Fid)%20as%20%3Fcount)%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP2063%20%3Fid%20.%0A%7D%0A\">3</a></li>\n  <li>Leadscope ID (<a href=\"https://www.wikidata.org/wiki/Property:P2083\">P2083</a>): <a href=\"https://query.wikidata.org/#PREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0ASELECT%20(count(%3Fid)%20as%20%3Fcount)%20WHERE%20%7B%0A%20%20%3Fcompound%20wdt%3AP2083%20%3Fid%20.%0A%7D%0A\">3</a></li>\n</ul>\n\n<p>Clearly, some identifiers are not well populated yet. 
This is what bots are for, like\n<a href=\"https://bitbucket.org/sulab/wikidatabots/overview\">those used by the Andrew Su team</a>.</p>\n\n<p>Because there is also a predicate for SMILES, we can also create a query that puts the CAS registry\nnumber alongside the SMILES (or any other identifier):</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.wikidata.org/prop/direct/&gt;</span><span class=\"w\">\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"nv\">?id</span><span class=\"w\"> </span><span class=\"nv\">?smiles</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P231</span><span class=\"w\"> </span><span class=\"nv\">?id</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n            </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P233</span><span class=\"w\"> </span><span class=\"nv\">?smiles</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Of course, then the question is, <a href=\"https://chem-bla-ics.blogspot.nl/2015/10/how-to-test-smiles-strings-in.html\">are these SMILES strings valid</a>…\nAnd, importantly, this is nothing compared to the number of chemical compounds we know about, which currently is in\nthe order of 100 million, of which a quarter can be readily purchased:</p>\n\n<p><a 
href=\"https://twitter.com/chem4biology/status/679314144680513536\"><img src=\"/assets/images/twitter_chem4biology_679314144680513536.png\" alt=\"\" /></a></p>\n\n<p><a href=\"https://twitter.com/chem4biology/status/677593362640142336\"><img src=\"/assets/images/twitter_chem4biology_677593362640142336.png\" alt=\"\" /></a></p>",
      "summary": "Source: Wikipedia. CC-BY-SA April this year I blogged about an important SPARQL query for many chemists: getting CAS registry numbers from Wikidata. This is relevant for two reasons:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/aceticAcidCAS.png",
      "date_published": "2015-12-22T00:00:00+00:00",
      "date_modified": "2025-05-25T00:00:00+00:00",
      "tags": ["cas","wikidata","chemistry"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.15200/winn.142867.72538", "doi": "10.15200/winn.142867.72538"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4k55a-8c261",
      "url": "https://chem-bla-ics.linkedchemistry.info/2015/09/27/coding-owl-ontology-in-html5-and-rdfa.html",
      "title": "Coding an OWL ontology in HTML5 and RDFa",
      "content_html": "<p><img src=\"/assets/images/bdbOnto.png\" style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\" />\nThere are many fancy tools to edit ontologies. I like simple editors, like <a href=\"https://en.wikipedia.org/wiki/GNU_nano\">nano</a>. And like any hacker, I can hack\n<a href=\"https://en.wikipedia.org/wiki/Web_Ontology_Language\">OWL</a> ontologies in nano. The hacking implies OWL was never meant to be hacked on a simple text editor;\nI am not sure that is really true. Anyways, <a href=\"https://en.wikipedia.org/wiki/HTML5\">HTML5</a> and <a href=\"https://en.wikipedia.org/wiki/RDFa\">RDFa</a> will do fine, and\nhere is a brief write up. This post will not cover the basics of RDFa and does assume you already know how triples work. If not, read this\n<a href=\"http://www.w3.org/TR/xhtml-rdfa-primer/\">RDFa primer</a> first.</p>\n\n<h2 id=\"the-bridgedb-datasource-ontology\">The BridgeDb DataSource Ontology</h2>\n<p>This example uses the <a href=\"http://www.bridgedb.org/\">BridgeDb</a> DataSource Ontology, created by BridgeDb developers from Manchester University (Christian,\nStian, and Alasdair). The ontology covers describing data sources of identifiers, a technology outlined in the BridgeDb paper by Martijn (see below)\nas well as terms from the Open PHACTS <a href=\"http://www.openphacts.org/specs/datadesc/\">Dataset Descriptions for the Open Pharmacological Space</a>\nby Alasdair et al.</p>\n\n<p>I needed to put this online for <a href=\"https://www.openphacts.org/\">Open PHACTS</a> (BTW,\n<a href=\"https://www.openphacts.org/news-and-events/news-archive/2015/398-open-phacts-wins-the-european-linked-data-contest\">the project won a big award</a>!)\nbecause our previous solution did not work well enough anymore. You may want to see the HTML of the result first. 
You may also want to verify it really is\n<a href=\"http://vocabularies.bridgedb.org/ops\">HTML</a>: here is the <a href=\"https://validator.w3.org/nu/?doc=http://vocabularies.bridgedb.org/ops\">HTML5 validation report</a>.\nAlso, you may be interested in what the ontology in RDF looks like: here is\n<a href=\"http://www.w3.org/2012/pyRdfa/extract?uri=http://vocabularies.bridgedb.org/ops#\">the extracted RDF for the ontology</a>.\nNow follow the HTML+RDFa snippets. First, the ontology details (actually, I have it split up):</p>\n\n<pre>&lt;div <span style=\"color: red;\">about=\"</span><span style=\"background-color: #76a5af; color: white;\">http://vocabularies.bridgedb.org/ops#</span><span style=\"color: red;\">\"\n     typeof=\"</span><span style=\"background-color: #76a5af; color: white;\">owl:Ontology</span><span style=\"color: red;\">\"</span>&gt;\n  &lt;h1&gt;The &lt;span <span style=\"color: red;\">property=\"rdfs:label\"</span>&gt;<span style=\"background-color: #6fa8dc;\"><span style=\"color: white;\">BridgeDb DataSource Ontology</span></span>&lt;/span&gt;\n    (version &lt;span <span style=\"color: red;\">property=\"owl:versionInfo\"</span>&gt;<span style=\"background-color: #6fa8dc; color: white;\">2.1.0</span>&lt;/span&gt;)&lt;/h1&gt;\n  &lt;p&gt;\n    This page describes the BridgeDb ontology. 
Make sure to visit our\n    &lt;a <span style=\"color: red;\">property=\"rdfs:seeAlso\"</span> href=\"<span style=\"background-color: #76a5af;\"><span style=\"color: white;\">http://www.bridgedb.org/</span></span>\"&gt;homepage&lt;/a&gt; too!\n  &lt;/p&gt;\n&lt;/div&gt;\n&lt;p <span style=\"color: red;\">about=\"</span><span style=\"background-color: #76a5af; color: white;\">http://vocabularies.bridgedb.org/ops#</span><span style=\"color: red;\">\"</span>&gt;\n  The OWL ontology can be extracted\n  &lt;a <span style=\"color: red;\">property=\"owl:versionIRI\"</span>\n     href=\"<span style=\"background-color: #76a5af; color: white;\">http://www.w3.org/2012/pyRdfa/extract?uri=http://vocabularies.bridgedb.org/ops#</span>\"&gt;here&lt;/a&gt;.\n  The Open PHACTS specification on\n  &lt;a <span style=\"color: red;\">property=\"rdf:seeAlso\"</span>\n    href=\"<span style=\"background-color: #76a5af;\"><span style=\"color: white;\">http://www.openphacts.org/specs/2013/WD-datadesc-20130912/#bridgedb</span></span>\"\n  &gt;Dataset Descriptions&lt;/a&gt; is also useful.\n&lt;/p&gt;\n</pre>\n\n<p>This is the last time I show the custom color coding, but for a first time it is useful. In red are basically the predicates, where <code class=\"language-plaintext highlighter-rouge\">@about</code>\nindicates a new resource is started, <code class=\"language-plaintext highlighter-rouge\">@typeof</code> defines the <code class=\"language-plaintext highlighter-rouge\">rdf:type</code>, and <code class=\"language-plaintext highlighter-rouge\">@property</code> indicates all other predicates. The blue and\ngreen blobs are literals and object resources, respectively. 
If you work this out, you get this OWL code (more or less):</p>\n\n<div class=\"language-turtle highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nn\">bridgedb:</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">owl:</span><span class=\"n\">Ontology</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nn\">rdfs:</span><span class=\"n\">label</span><span class=\"w\"> </span><span class=\"s\">\"BridgeDb DataSource Ontology\"</span><span class=\"na\">@en</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nn\">rdf:</span><span class=\"n\">seeAlso\n</span><span class=\"w\">    </span><span class=\"nl\">&lt;http://www.openphacts.org/specs/2013/WD-datadesc-20130912/#bridgedb&gt;</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nn\">rdfs:</span><span class=\"n\">seeAlso</span><span class=\"w\"> </span><span class=\"nl\">&lt;http://www.bridgedb.org/&gt;</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nn\">owl:</span><span class=\"n\">versionIRI\n</span><span class=\"w\">    </span><span class=\"nl\">&lt;http://www.w3.org/2012/pyRdfa/extract?uri=http://vocabularies.bridgedb.org/ops#&gt;</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nn\">owl:</span><span class=\"n\">versionInfo</span><span class=\"w\"> </span><span class=\"s\">\"2.1.0\"</span><span class=\"na\">@en</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<h2 id=\"an-owl-class\">An OWL class</h2>\n<p>Defining OWL classes uses the same approach: define the resource it is <code class=\"language-plaintext highlighter-rouge\">@about</code>, define the <code class=\"language-plaintext highlighter-rouge\">@typeof</code>, and give it properties.\nBTW, note that I added an <code class=\"language-plaintext highlighter-rouge\">@id</code> so 
that ontology terms can be looked up using the HTML # functionality. For example:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;div</span> <span class=\"na\">id=</span><span class=\"s\">\"DataSource\"</span>\n  <span class=\"na\">about=</span><span class=\"s\">\"http://vocabularies.bridgedb.org/ops#DataSource\"</span>\n  <span class=\"na\">typeof=</span><span class=\"s\">\"owl:Class\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;h3</span> <span class=\"na\">property=</span><span class=\"s\">\"rdfs:label\"</span><span class=\"nt\">&gt;</span>Data Source<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;p</span> <span class=\"na\">property=</span><span class=\"s\">\"dc:description\"</span><span class=\"nt\">&gt;</span>A resource that defines\n    identifiers for some biological entity, like a gene,\n    protein, or metabolite.<span class=\"nt\">&lt;/p&gt;</span>\n<span class=\"nt\">&lt;/div&gt;</span>\n</code></pre></div></div>\n\n<h2 id=\"an-owl-object-property\">An OWL object property</h2>\n<p>Defining an OWL object property is pretty much the same, but note that we can arbitrarily add additional things, making use of <code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code>,\n<code class=\"language-plaintext highlighter-rouge\">&lt;div&gt;</code>, and <code class=\"language-plaintext highlighter-rouge\">&lt;p&gt;</code> elements. 
The following example also defines the <code class=\"language-plaintext highlighter-rouge\">rdfs:domain</code> and <code class=\"language-plaintext highlighter-rouge\">rdfs:range</code>:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;div</span> <span class=\"na\">id=</span><span class=\"s\">\"aboutOrganism\"</span>\n  <span class=\"na\">about=</span><span class=\"s\">\"http://vocabularies.bridgedb.org/ops#aboutOrganism\"</span>\n  <span class=\"na\">typeof=</span><span class=\"s\">\"owl:ObjectProperty\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;h3</span> <span class=\"na\">property=</span><span class=\"s\">\"rdfs:label\"</span><span class=\"nt\">&gt;</span>About Organism<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;p&gt;&lt;span</span> <span class=\"na\">property=</span><span class=\"s\">\"dc:description\"</span><span class=\"nt\">&gt;</span>Organism for all entities\n    with identifiers from this datasource.<span class=\"nt\">&lt;/span&gt;</span>\n    This property has\n    <span class=\"nt\">&lt;a</span> <span class=\"na\">property=</span><span class=\"s\">\"rdfs:domain\"</span>\n      <span class=\"na\">href=</span><span class=\"s\">\"http://vocabularies.bridgedb.org/ops#DataSource\"</span><span class=\"nt\">&gt;</span>DataSource<span class=\"nt\">&lt;/a&gt;</span>\n    as domain and\n    <span class=\"nt\">&lt;a</span> <span class=\"na\">property=</span><span class=\"s\">\"rdfs:range\"</span>\n      <span class=\"na\">href=</span><span class=\"s\">\"http://vocabularies.bridgedb.org/ops#Organism\"</span><span class=\"nt\">&gt;</span>Organism<span class=\"nt\">&lt;/a&gt;</span>\n    as range.<span class=\"nt\">&lt;/p&gt;</span>\n<span class=\"nt\">&lt;/div&gt;</span>\n</code></pre></div></div>\n\n<p>So, now anyone can host an OWL ontology with dereferencable terms: to remove confusion, I have used the full URLs of the terms in <code 
class=\"language-plaintext highlighter-rouge\">@about</code> attributes.</p>",
      "summary": "There are many fancy tools to edit ontologies. I like simple editors, like nano. And like any hacker, I can hack OWL ontologies in nano. The hacking implies OWL was never meant to be hacked on a simple text editor; I am not sure that is really true. Anyways, HTML5 and RDFa will do fine, and here is a brief write up. This post will not cover the basics of RDFa and does assume you already know how triples work. If not, read this RDFa primer first.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bdbOnto.png",
      "date_published": "2015-09-27T00:00:00+00:00",
      "date_modified": "2015-09-27T00:00:00+00:00",
      "tags": ["ontology","bridgedb","web","rdf","owl"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-11-5", "doi": "10.1186/1471-2105-11-5"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fn8sq-0fg97",
      "url": "https://chem-bla-ics.linkedchemistry.info/2015/07/15/pubchemrdf-semantic-web-access-to.html",
      "title": "PubChemRDF: semantic web access to PubChem data",
      "content_html": "<p><img src=\"/assets/images/s13321-015-0084-4-graphical-abstract.gif\" style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\" />\nGang Fu and Evan Bolton have <a href=\"https://pubchem.ncbi.nlm.nih.gov/rdf/\">blogged</a> about it previously, but their PubChemRDF paper is out now\n(doi:<a href=\"https://doi.org/10.1186/s13321-015-0084-4\">10.1186/s13321-015-0084-4</a>). It very likely defines the largest collection of RDF triples\nusing the <a href=\"http://chem-bla-ics.blogspot.nl/search?q=CHEMINF&amp;max-results=20&amp;by-date=true\">CHEMINF ontology</a> and I congratulate the\nauthors on an increasingly powerful <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> database.</p>\n\n<p>With this major provider of Linked Open Data for chemistry now published, I should soon see where\n<a href=\"http://chem-bla-ics.blogspot.nl/2012/07/isbjrn-4-added-cheminf-support.html\">my Isbjørn stands</a>. The release of this publication is\nalso very timely with respect to the CHEMINF ontology, as I finished a transition from Google to GitHub last week, by moving the important\nwiki pages, including one about “<a href=\"https://github.com/semanticchemistry/semanticchemistry/wiki/Where-is-the-CHEMINF-ontology-used%3F\">Where is the CHEMINF ontology used?</a>”.\nI already added Gang’s paper. A big thanks and congratulations to the PubChem team and my sincere thanks for having been able to contribute to this paper.</p>",
      "summary": "Gang Fu and Evan Bolton have blogged about it previously, but their PubChemRDF paper is out now (doi:10.1186/s13321-015-0084-4). It very likely defines the largest collection of RDF triples using the CHEMINF ontology and I congratulate the authors on an increasingly powerful PubChem database.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/s13321-015-0084-4-graphical-abstract.gif",
      "date_published": "2015-07-15T00:00:00+00:00",
      "date_modified": "2015-07-15T00:00:00+00:00",
      "tags": ["pubchem","rdf","cheminf","ontology"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/S13321-015-0084-4", "doi": "10.1186/S13321-015-0084-4"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2kt7t-91257",
      "url": "https://chem-bla-ics.linkedchemistry.info/2015/04/18/chemistry-central-and-orcid-identifier.html",
      "title": "Chemistry Central and the ORCID identifier",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/orcidTshirt.png\" width=\"200\" />\nIf you are a scientist you have heard about the <a href=\"http://orcid.org/\">ORCID</a> identifier by now. If not, you have\nbeen focusing on groundbreaking research and isolated yourself from the rest of the world, just to make it perfect\nand get that Nobel prize next year. If you have been working on impactful research, Nobel prize-worthy, and have\nbeen blogging and tweeting about your progress, as a good Open Scholar, you know ORCID is the DOI for\n“research contributors” and you already have one yourself, and probably also that T-shirt with your own identifier.\nMine is <a href=\"http://orcid.org/0000-0001-7542-0286\">0000-0001-7542-0286</a>, and\n<a href=\"https://orcid.org/statistics\">almost 1.3M other authors</a> got one too. The list of\n<a href=\"https://en.wikipedia.org/wiki/Wikipedia:ORCID\">ORCIDs on Wikipedia</a> is growing\n(<a href=\"https://www.wikidata.org/wiki/Property:P496\">and Wikidata</a>), thanks to\n<a href=\"https://twitter.com/pigsonthewing\">Andy Mabbett</a>, who also made it possible to add\n<a href=\"http://wikipathways.org/index.php/Template:User_ORCID\">your ORCID on WikiPathways</a>.</p>\n\n<p>Anyway, what I was pleased to see today is that you can now log in with your ORCID identifier to the\n<a href=\"https://www.editorialmanager.com/CHIN/default.aspx\">Chemistry Central article submission system</a> (notice\nthe green icon):</p>\n\n<p><img src=\"/assets/images/orcidChemistryCentral.png\" style=\"width: 90%; display: block; margin-left: auto; margin-right: auto;\" alt=\"Screenshot of the Chemistry Central system login page with the normal username/password text boxes, but also a green ORCID logo to login via ORCID.\" /></p>\n\n<p>Many other publishers allow logging in with your ORCID too, which benefits many:</p>\n\n<ol>\n  <li>authors who just enter a list of ORCID identifiers, instead of a long list of author names 
and affiliations</li>\n  <li>publishers, which have a simpler submission system and get more accurate information about submitters</li>\n  <li>funding agencies which can more easily track what is done with the research funding</li>\n  <li>research institutes which can more easily track what their employees are studying</li>\n</ol>\n\n<p>Don’t have one yet? <a href=\"https://orcid.org/register\">Get your very own ORCID here</a>.</p>",
      "summary": "If you are a scientist you have heard about the ORCID identifier by now. If not, you have been focusing on groundbreaking research and isolated yourself from the rest of the world, just to make it perfect and get that Nobel prize next year. If you have been working on impactful research, Nobel prize-worthy, and have been blogging and tweeting about your progress, as a good Open Scholar, you know ORCID is the DOI for “research contributors” and you already have one yourself, and probably also that T-shirt with your own identifier. Mine is 0000-0001-7542-0286, and almost 1.3M other authors got one too. The list of ORCIDs on Wikipedia is growing (and Wikidata), thanks to Andy Mabbett, who also made it possible to add your ORCID on WikiPathways.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/orcidTshirt.png",
      "date_published": "2015-04-18T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["orcid","wikidata","wikipedia"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2cfqr-fwe26",
      "url": "https://chem-bla-ics.linkedchemistry.info/2015/04/10/getting-cas-registry-numbers-out-of.html",
      "title": "Getting CAS registry numbers out of WikiData",
      "content_html": "<p>I have promised my Twitter followers the <a href=\"https://www.wikidata.org/wiki/Q54871\">SPARQL query</a> you have all been waiting\nfor. Sadly, you had to wait for it for more than two months. I’m sorry about that. But, here it is:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.wikidata.org/entity/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"nv\">?id</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">P231s</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">P231v</span><span class=\"w\"> </span><span class=\"nv\">?id</span><span class=\"w\"> </span><span class=\"p\">]</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>What this query does is ask for all things (let’s call whatever is behind the identifier a “compound”; of course, it can\nbe mixtures, ill-defined chemicals, nanomaterials, etc.) that have a CAS registry identifier. This query results in a nice\ntable of <a href=\"https://www.wikidata.org/\">Wikidata</a> identifiers (e.g. 
<a href=\"https://www.wikidata.org/wiki/Q47512\">Q47512</a> is acetic acid)\nand matching CAS numbers, 16298 of them.</p>\n\n<p>Because Wikidata is not specific to the English Wikipedia, CAS numbers from other origins will show up too. For example, the\nCAS number for N-benzylacrylamide (<a href=\"https://www.wikidata.org/wiki/Q10334928\">Q10334928</a>) is provided by the Portuguese Wikipedia:</p>\n\n<p><img src=\"/assets/images/casPT.png\" alt=\"\" /></p>\n\n<p>I used Peter Ertl’s <a href=\"http://www.cheminfo.org/wikipedia\">cheminfo.org</a> (doi:<a href=\"https://doi.org/10.1186/s13321-015-0061-y\">10.1186/s13321-015-0061-y</a>)\nto confirm this compound indeed does not have an English page, which is somewhat surprising.</p>\n\n<p>The SPARQL query uses a predicate specifically for the CAS registry number (<a href=\"https://www.wikidata.org/wiki/Property:P231\">P231</a>).\nOther identifiers have similar predicates, like for PubChem compound (<a href=\"https://www.wikidata.org/wiki/Property:P662\">P662</a>) and\nChemSpider (<a href=\"https://www.wikidata.org/wiki/Property:P661\">P661</a>). That means Wikidata can become a crowdsourced community resource of\nidentifier mappings, which is one of the things Daniel Mietchen, I, and a few others proposed in this H2020 grant application\n(doi:<a href=\"https://doi.org/10.5281/zenodo.13906\">10.5281/zenodo.13906</a>). 
The SPARQL query is run by the\n<a href=\"http://linkeddatafragments.org/\">Linked Data Fragments</a> platform, which you should really check out too, using the\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> manager I wrote around that.</p>\n\n<p>The full Bioclipse script looks like:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">wikidataldf</span> <span class=\"o\">=</span> <span class=\"n\">ldf</span><span class=\"o\">.</span><span class=\"na\">createStore</span><span class=\"o\">(</span>\n  <span class=\"s2\">\"http://data.wikidataldf.com/wikidata\"</span>\n<span class=\"o\">)</span>\n\n<span class=\"c1\">// P231 CAS</span>\n<span class=\"n\">identifier</span> <span class=\"o\">=</span> <span class=\"s2\">\"P231\"</span>\n<span class=\"n\">type</span> <span class=\"o\">=</span> <span class=\"s2\">\"cas\"</span>\n\n<span class=\"n\">sparql</span> <span class=\"o\">=</span> <span class=\"s2\">\"\"\"\nPREFIX wd: &lt;http://www.wikidata.org/entity/&gt;\n\nSELECT ?compound ?id WHERE {\n  ?compound wd:${identifier}s [ wd:${identifier}v ?id ] .\n}\n\"\"\"</span>\n<span class=\"n\">mappings</span> <span class=\"o\">=</span> <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">sparql</span><span class=\"o\">(</span><span class=\"n\">wikidataldf</span><span class=\"o\">,</span> <span class=\"n\">sparql</span><span class=\"o\">)</span>\n\n<span class=\"c1\">// recreate an empty output file</span>\n<span class=\"n\">outFilename</span> <span class=\"o\">=</span> <span class=\"s2\">\"/Wikidata/${type}2wikidata.csv\"</span>\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">ui</span><span class=\"o\">.</span><span class=\"na\">fileExists</span><span class=\"o\">(</span><span class=\"n\">outFilename</span><span class=\"o\">))</span> <span class=\"o\">{</span>\n  <span class=\"n\">ui</span><span class=\"o\">.</span><span class=\"na\">remove</span><span class=\"o\">(</span><span 
class=\"n\">outFilename</span><span class=\"o\">)</span>\n  <span class=\"n\">ui</span><span class=\"o\">.</span><span class=\"na\">newFile</span><span class=\"o\">(</span><span class=\"n\">outFilename</span><span class=\"o\">)</span>\n<span class=\"o\">}</span>\n\n<span class=\"c1\">// save to a file</span>\n<span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"o\">;</span> <span class=\"n\">i</span><span class=\"o\">&lt;=</span><span class=\"n\">mappings</span><span class=\"o\">.</span><span class=\"na\">rowCount</span><span class=\"o\">;</span> <span class=\"n\">i</span><span class=\"o\">++)</span> <span class=\"o\">{</span>\n  <span class=\"n\">wdID</span> <span class=\"o\">=</span> <span class=\"n\">mappings</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">,</span> <span class=\"s2\">\"compound\"</span><span class=\"o\">).</span><span class=\"na\">substring</span><span class=\"o\">(</span><span class=\"mi\">3</span><span class=\"o\">)</span>\n  <span class=\"n\">ui</span><span class=\"o\">.</span><span class=\"na\">append</span><span class=\"o\">(</span>\n    <span class=\"n\">outFilename</span><span class=\"o\">,</span>\n    <span class=\"n\">wdID</span> <span class=\"o\">+</span> <span class=\"s2\">\",\"</span> <span class=\"o\">+</span> <span class=\"n\">mappings</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">,</span> <span class=\"s2\">\"id\"</span><span class=\"o\">)</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\n\"</span>\n  <span class=\"o\">)</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>BTW, of course, all this depends on work by many others including the core <a href=\"http://tools.wmflabs.org/wikidata-exports/rdf/\">RDF generation</a>\nwith the <a 
href=\"https://www.mediawiki.org/wiki/Wikidata_Toolkit\">Wikidata Toolkit</a>. See also the paper by Erxleben <em>et al.</em>\n(<a href=\"http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf\">PDF</a>).</p>",
      "summary": "I have promised my Twitter followers the SPARQL query you have all been waiting for. Sadly, you had to wait for it for more than two months. I’m sorry about that. But, here it is:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/casPT.png",
      "date_published": "2015-04-10T00:00:00+00:00",
      "date_modified": "2015-04-10T00:00:00+00:00",
      "tags": ["wikidata","chemistry","cas","ldf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/s13321-015-0061-y", "doi": "10.1186/s13321-015-0061-y"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.13906", "doi": "10.5281/ZENODO.13906"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/978-3-319-11964-9_4", "doi": "10.1007/978-3-319-11964-9_4"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vbvzk-2d025",
      "url": "https://chem-bla-ics.linkedchemistry.info/2015/01/11/programming-in-life-sciences-21-2014.html",
      "title": "Programming in the Life Sciences #21: 2014 Screenshots #1",
      "content_html": "<p>December saw the end of this year’s PRA3006 course (aka <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-1-six-day.html\">#mcspils</a>).\nTime to blog some screenshots of the student projects. Like last year, the aim is to use the <a href=\"https://dev.openphacts.org/\">Open PHACTS API</a>\nto collect data with <a href=\"https://github.com/openphacts/ops.js\">ops.js</a> and which should then be visualized in a HTML page, preferably with\n<a href=\"http://d3js.org/\">d3.js</a>. This year, all projects reached that goal.</p>\n\n<h2 id=\"ace-inhibitors\">ACE inhibitors</h2>\n\n<p>The first team (Mischa-Alexander and Hamza) focused on the <a href=\"https://en.wikipedia.org/wiki/ACE_inhibitor\">ACE inhibitors</a> (type:”drug class”) and the\n<a href=\"http://wikipathways.org/index.php/Pathway:WP554\">WP554 from WikiPathways</a>. The use a tree structure to list inhibitors along with their activity:</p>\n\n<p><img src=\"/assets/images/pra3006_screenshot1.png\" alt=\"\" /></p>\n\n<p>The source code for this project is available under a MIT license.</p>\n\n<h2 id=\"diabetes\">Diabetes</h2>\n\n<p>The second team (Catherine and Moritz) looked at compounds hitting diabetes mellitus targets. They take advantage from the new disease API methods and\nfirst ask for all targets for the disease, and then query for all compounds. 
Mind you, the compounds are not filtered by activity, so it mostly shows\ninteractions that real targets.</p>\n\n<p><img src=\"/assets/images/pra3006_screenshot2.png\" alt=\"\" /></p>\n\n<p>This product too is available under the MIT license.</p>\n\n<h2 id=\"tuberculosis\">Tuberculosis</h2>\n\n<p>The third project (Nadia and Loic) also goes from disease to targets and they looked at tuberculosis.</p>\n\n<p><img src=\"/assets/images/pra3006_screenshot3.png\" alt=\"\" /></p>\n\n<h2 id=\"asynchronous-calls\">Asynchronous calls</h2>\n\n<p>If you know ops.js, d3.js, and JavaScript a bit, you know that these projects are not trivial. The remote web service calls are made in an\nasynchronous manner: each call comes with a callback function that gets called when the server returns an answer, at some future point in time.\nTherefore, if you want to visualize, for example, compounds with activities against targets for a particular disease, you need two web service\ncalls, with the second made in the callback function of the first call. Now, try to globally collect the data from that with JavaScript and HTML,\nand make sure to make the visualization call when all information is collected! But even without that, the students need to convert the returned\nweb service answer into a format that d3.js can handle. In short: quite a challenge for six practical days!</p>",
      "summary": "December saw the end of this year’s PRA3006 course (aka #mcspils). Time to blog some screenshots of the student projects. Like last year, the aim is to use the Open PHACTS API to collect data with ops.js and which should then be visualized in a HTML page, preferably with d3.js. This year, all projects reached that goal.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pra3006_screenshot1.png",
      "date_published": "2015-01-11T00:00:00+00:00",
      "date_modified": "2015-01-11T00:00:00+00:00",
      "tags": ["pra3006","openphacts"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/x3nam-7eb49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2015/01/09/royal-society-of-chemistry-grants.html",
      "title": "&quot;Royal Society of Chemistry grants journals access to Wikipedia Editors&quot;",
      "content_html": "<p>The <a href=\"http://www.rsc.org/\">Royal Society of Chemistry</a> and <a href=\"http://www.wikipedia.org/\">Wikipedia</a>\njust released an interesting <a href=\"https://blog.wikimedia.org.uk/2015/01/royal-society-of-chemistry-grants-journals-access-to-wikipedia-editors/comment-page-1/\">press release</a>:</p>\n\n<blockquote>\n  <p>“The Royal Society of Chemistry has announced that it is donating 100 “RSC Gold” accounts –\nthe complete portfolio of their journals and databases – to be used by Wikipedia editors who\nwrite about chemistry. The partnership is part of a wider collaboration between the Society’s\nmembers and staff, Wikimedia UK and the Wikimedia community. The collaboration is working to\nimprove the coverage of chemistry-related topics on Wikipedia and its sister projects.”</p>\n</blockquote>\n\n<p>This leaves me with a lot of questions. I asked these in a comment awaiting moderation:</p>\n\n<blockquote>\n  <p>Can you elaborate on the conditions? Is it limited to wikipedia.org or does it extend to\nother Wikimedia projects, like Wikidata? Does the agreement allow manual lookup of information\nonly, or does it allow text mining on the literature as well as on the database? How should\nI put this in perspective with the UK law that allows text mining, and, in particular, can\nUK Wikipedia editors use text mining anyway, or is that restricted? Is there an overview\nof the details of what is allowed and not allowed, or a list of restrictions otherwise?</p>\n</blockquote>\n\n<p>Details on how to apply to access can be found <a href=\"https://en.wikipedia.org/wiki/Wikipedia:RSC_Gold\">here</a>.</p>",
      "summary": "The Royal Society of Chemistry and Wikipedia just released an interesting press release:",
      
      "date_published": "2015-01-09T00:00:00+00:00",
      "date_modified": "2015-01-09T00:00:00+00:00",
      "tags": ["chemistry","openscience","wikipedia"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/szpy6-v8c44",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/11/16/programming-in-life-sciences-20.html",
      "title": "Programming in the Life Sciences #20: extracting data from JSON",
      "content_html": "<p>I previously wrote about the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-10.html\">JavaScript Object Notation</a>\n(JSON) which has become a de facto standard for sharing data by web services. I personally\nstill prefer something using the <a href=\"https://en.wikipedia.org/wiki/Resource_Description_Framework\">Resource Description Framework</a>\n(RDF) because of its clear link to ontologies, but perhaps\n<a href=\"https://en.wikipedia.org/wiki/JSON-LD\">JSON-LD</a> combines the best of both worlds.</p>\n\n<p>The <a href=\"https://dev.openphacts.org/\">Open PHACTS API</a> support various formats and this\nJSON is the default format used by the <a href=\"https://github.com/openphacts/ops.js\">ops.js</a>\nlibrary. However, the amount of information returned by the Open PHACTS cache is complex,\nand generally includes more than you want to use in the next step. Therefore, it is\nneeded to extract data from the JSON document, which was not covered in the\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-10.html\">post #10</a>\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-11-html.html\">or #11</a>.</p>\n\n<p>Let’s start with the example JSON given in that post, and let’s consider this is the\nvalue of a variable with the name jsonData:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"id\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">1</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"name\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"Foo\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"price\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span 
class=\"mi\">123</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"tags\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\"> </span><span class=\"s2\">\"Bar\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\"Eek\"</span><span class=\"w\"> </span><span class=\"p\">],</span><span class=\"w\">\n    </span><span class=\"nl\">\"stock\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n        </span><span class=\"nl\">\"warehouse\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">300</span><span class=\"p\">,</span><span class=\"w\">\n        </span><span class=\"nl\">\"retail\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">20</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>We can see that this JSON value starts with a map-like structure. We can also see that\nthere is a list embedded, and another map. I guess that one of the reasons why JSON\nhas taken such a flight is how well it integrates with the JavaScript language: selecting\ncontent can be done in terms of core language features, different from, for example,\n<a href=\"https://en.wikipedia.org/wiki/XPath\">XPath</a> statements needed for\n<a href=\"https://en.wikipedia.org/wiki/XML\">XML</a> or <a href=\"https://en.wikipedia.org/wiki/SPARQL\">SPARQL</a>\nfor RDF content. 
This is because the notation just follows core data types of JavaScript\nand data is stored as native data types and objects.</p>\n\n<p>For example, to get the price value from the above JSON code, we use:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">price</span> <span class=\"o\">=</span> <span class=\"nx\">jsonData</span><span class=\"p\">.</span><span class=\"nx\">price</span><span class=\"p\">;</span>\n</code></pre></div></div>\n\n<p>Or, if we want to get the first value in the Bar-Eek list, we use:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">tag</span> <span class=\"o\">=</span> <span class=\"nx\">jsonData</span><span class=\"p\">.</span><span class=\"nx\">tags</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">];</span>\n</code></pre></div></div>\n\n<p>Or, if we want to inspect the warehouse stock:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">inStock</span> <span class=\"o\">=</span> <span class=\"nx\">jsonData</span><span class=\"p\">.</span><span class=\"nx\">stock</span><span class=\"p\">.</span><span class=\"nx\">warehouse</span><span class=\"p\">;</span>\n</code></pre></div></div>\n\n<p>Now, the JSON returned by the Open PHACTS API has a lot more information. This is why the\nonline, interactive documentation is so helpful: it shows the JSON. 
In fact, given that\nJSON is so much used, there are many tools online that help you, such as\n<a href=\"http://jsoneditoronline.org/\">jsoneditoronline.org</a> (yes, it will show error messages\nif the syntax is wrong):</p>\n\n<p><img src=\"/assets/images/debug3.png\" alt=\"\" /></p>\n\n<p>BTW, I also recommend installing a JSON viewer extension for\n<a href=\"https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc?hl=en#sthash.vsIhyalK.dpuf\">Chrome</a>\nor for <a href=\"https://addons.mozilla.org/en-US/firefox/addon/jsonview/\">Firefox</a>. Once you\nhave installed this plugin, you can not just read the JSON on Open PHACTS’\ninteractive documentation page, but also open the Request URL into a separate browser\nwindow. Just copy/paste the URL from this output:</p>\n\n<p><img src=\"/assets/images/json.png\" alt=\"\" /></p>\n\n<p>And with a JSON viewing extension, opening this <em>https://beta.openphacts.org/1.3/pathways/…</em>\nURL in your browser window will look something like:</p>\n\n<p><img src=\"/assets/images/json1.png\" alt=\"\" /></p>\n\n<p>And because these extensions typically use syntax highlighting, it is easier to understand\nhow to access information from within your JavaScript code. 
For example, if we want the\nnumber of pathways in which the compound <a href=\"http://www.conceptwiki.org/concept/index/7e0a4dd4-8160-4906-9db1-fdb300e888ea\">testosterone</a>\n(the link is the <a href=\"http://scholar.google.com/scholar?hl=nl&amp;q=ConceptWiki\">ConceptWiki</a>\nURL in the above example) is found, we can use this code:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">pathwayCount</span> <span class=\"o\">=</span> <span class=\"nx\">jsonData</span><span class=\"p\">.</span><span class=\"nx\">result</span><span class=\"p\">.</span><span class=\"nx\">primaryTopic</span><span class=\"p\">.</span><span class=\"nx\">pathway_count</span><span class=\"p\">;</span>\n</code></pre></div></div>",
      "summary": "I previously wrote about the JavaScript Object Notation (JSON) which has become a de facto standard for sharing data by web services. I personally still prefer something using the Resource Description Framework (RDF) because of its clear link to ontologies, but perhaps JSON-LD combines the best of both worlds.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/debug3.png",
      "date_published": "2014-11-16T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m3gc1-3mv56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/11/16/programming-in-life-sciences-19.html",
      "title": "Programming in the Life Sciences #19: debugging",
      "content_html": "<p>Debugging is the process of finding and removing a fault in your code\n(<a href=\"https://en.wikipedia.org/wiki/Software_bug#Etymology\">the etymology</a> goes further back\nthan the moth story, I learned today). Being able to debug is an essential programming skill,\nand being able to program flawlessly is not enough; the bug can be outside your own code.\n(… there is much that can be written up about module interactions, APIs, documentation, etc,\nthat lead to <em>malfunctioning</em> code …)</p>\n\n<p>While there are full debugging tools, achieving the task of finding where the bug is can\noften be reached with simpler means:</p>\n\n<ol>\n  <li>take notice of error messages</li>\n  <li>add debug statements in your code</li>\n</ol>\n\n<h2 id=\"error-messages\">Error messages</h2>\n\n<p>Keeping track of error messages is the first starting point. This skill is almost an art:\nit requires having seen enough of them to understand how to interpret them. I guess\nerror messages are the worst developed aspect of programming languages, and I do not\nfrequently see programming language tutorials that discuss error messages. The field can\ncertainly improve here.</p>\n\n<p>However, at least error messages in general give an indication where the problem occurs.\nOften by a line number, though this number is not always accurate. The underlying cause of\nthat is that if there is a problem in the code, it is not always clear what\nthe problem is. For example, if there is a closing (or opening) bracket missing somewhere,\nhow can the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/exercise-what-variable-type-would-you.html\">compiler</a>\ndecide what the author of the code meant? Web browsers like Firefox/Iceweasel and\nChrome (Ctrl-Shift-C) have a console that displays compiler errors and warnings:</p>\n\n<p><img src=\"/assets/images/debug1.png\" alt=\"\" /></p>\n\n<p>Another issue is that error messages can be cryptic and misleading. 
For example, the\nabove error message <em>“TypeError: searcher.bytag is not a function example1.html:73”</em>\nis confusing for a starting programmer. Surely, the source code calls <code class=\"language-plaintext highlighter-rouge\">searcher.bytag()</code>\nwhich definitely is a function. So, why does the compiler say it is not?? The bug here,\nof course, is that the function called in the source code is not found: it should be\n<a href=\"https://github.com/openphacts/ops.js/blob/master/src/ConceptWikiSearch.js#L9\">byTag()</a>.</p>\n\n<p>But this bug at least can be detected during interpretation and execution of the code.\nThat is, it is clear to the compiler that it doesn’t know how to handle the code.\nAnother common problem is the situation where the code looks fine (to the compiler),\nbut the data it handles makes the code break down. For example, a variable doesn’t\nhave the expected value, leading to errors (e.g. null pointer-style). Therefore,\nunderstanding the variable values at a particular point in your code can be of\ngreat use.</p>\n\n<h2 id=\"console-output\">Console output</h2>\n\n<p>A simple way to inspect the content of a variable is to use this console visible in\nthe above screenshot. Many programming languages have their custom call to send output\nthere. 
Java has the <code class=\"language-plaintext highlighter-rouge\">System.out.println()</code> and JavaScript has <code class=\"language-plaintext highlighter-rouge\">console.log()</code>:</p>\n\n<p><img src=\"/assets/images/debug2.png\" alt=\"\" /></p>\n\n<p>Thus, if you have some complex bit of code with multiple for-loops, if-else statements,\netc, this can be used to see if some part of your code that you expect to be called\nreally is:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nf\">log</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">He, I'm here!</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>This can be very useful when using asynchronous web service calls! Similarly, see\nwhat the value of some variable is:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">label</span> <span class=\"o\">=</span> <span class=\"nx\">jsonResponse</span><span class=\"p\">.</span><span class=\"nx\">items</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span><span class=\"p\">;</span>\n<span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nf\">log</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">label: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">label</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>Also, because JavaScript is not a <a href=\"https://en.wikipedia.org/wiki/Strong_and_weak_typing\">strongly typed programming</a>\nI frequently find myself inspecting the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/exercise-what-variable-type-would-you.html\">data type</a>\nof 
a variable:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">label</span> <span class=\"o\">=</span> <span class=\"nx\">jsonResponse</span><span class=\"p\">.</span><span class=\"nx\">items</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span><span class=\"p\">;</span>\n\n<span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nf\">log</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">typeof label: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"k\">typeof</span><span class=\"p\">(</span><span class=\"nx\">label</span><span class=\"p\">));</span>\n</code></pre></div></div>\n\n<h2 id=\"conclusion\">Conclusion</h2>\n\n<p>These tools are very useful to find the location of a bug. And this matters. Yesterday,\nI was trying to use the <a href=\"http://chem-bla-ics.blogspot.nl/2014/11/programming-in-life-sciences-18.html\">histogram code in example6.html</a>\nto visualize a set of values with negative numbers (<a href=\"https://en.wikipedia.org/wiki/Zeta_potential\">zeta potentials</a>\nof nanomaterials, to be precise) and I was debugging the issue, trying to find where\nmy code went wrong. I used the above approaches, and the array of values looked in\norder, but different from the original example. But still the histogram was not\nshowing up. 
Well, after hours, and having asked someone else to look at the code\ntoo, and having ruled out many alternatives, she pointed out that the problem was\nnot in the JavaScript part of the code, but in the HTML: I was mixing up how\ndefault JavaScript and the d3.js library add SVG content to the HTML data model.\nThat is, I was using <code class=\"language-plaintext highlighter-rouge\">&lt;div id=\"chart\"&gt;</code>, which works with <code class=\"language-plaintext highlighter-rouge\">document.getElementById(\"chart\").innerHTML</code>,\nbut needed to use <code class=\"language-plaintext highlighter-rouge\">&lt;div class=\"chart\"&gt;</code> with the <code class=\"language-plaintext highlighter-rouge\">d3.select(\".chart\").innerHTML</code>\ncode I was using later.</p>\n\n<p>OK, that bug was on my account. However, it still was not working: I did see a\nhistogram, but it didn’t look good. Again debugging, and after again much too long,\nI found out that this was a bug in the d3.js code that makes it impossible to use\ntheir histogram example code for negative values. Again, once I knew where the bug\nwas, I could Google and quickly found\n<a href=\"http://stackoverflow.com/questions/15388481/d3-js-histogram-with-positive-and-negative-values\">the solution for it on StackOverflow</a>.</p>\n\n<p>So, the workflow of debugging at a top level, looks like:</p>\n\n<ol>\n  <li>find where the problem is</li>\n  <li>try to solve the problem</li>\n</ol>\n\n<p>Happy debugging!</p>",
      "summary": "Debugging is the process of finding and removing a fault in your code (the etymology goes further back than the moth story, I learned today). Being able to debug is an essential programming skill, and being able to program flawlessly is not enough; the bug can be outside your own code. (… there is much that can be written up about module interactions, APIs, documentation, etc, that lead to malfunctioning code …)",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/debug2.png",
      "date_published": "2014-11-16T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/x11tv-jc874",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/11/06/programming-in-life-sciences-17-open.html",
      "title": "Programming in the Life Sciences #17: The Open PHACTS scientific questions",
      "content_html": "<p>I think the authors of the <a href=\"http://www.openphacts.org/\">Open PHACTS</a> proposal made a right choice\nin defining a small set of questions that the solution to be developed could be tested against.\nThe questions being specific, it is much easier to understand the needs. In fact, I suspect it may\neven be a very useful form of requirement analysis, and makes it hard to keep using vague terms.</p>\n\n<p><img src=\"/assets/images/opsSciencyQs.jpg\" alt=\"\" /></p>\n\n<p>Open PHACTS has come up with 20 questions (doi:<a href=\"https://doi.org/10.1016/j.drudis.2013.05.008\">10.1016/j.drudis.2013.05.008</a>;\nOpen Access):</p>\n\n<ol>\n  <li><em>Give me all oxidoreductase inhibitors active &lt;100 nM in human and mouse</em></li>\n  <li><em>Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?</em></li>\n  <li><em>Given a target, find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives</em></li>\n  <li><em>For a given interaction profile – give me similar compounds</em></li>\n  <li><em>The current Factor Xa lead series is characterized by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X</em></li>\n  <li><em>A project is considering protein kinase C alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that could modulate the target directly? I.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. 
PKC) from structured assay databases and the literature</em></li>\n  <li><em>Give me all active compounds on a given target with the relevant assay data</em></li>\n  <li><em>Identify all known protein–protein interaction inhibitors</em></li>\n  <li><em>For a given compound, give me the interaction profile with targets</em></li>\n  <li><em>For a given compound, summarize all ‘similar compounds’ and their activities</em></li>\n  <li><em>Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not)</em></li>\n  <li><em>For my given compound, which targets have been patented in the context of Alzheimer’s disease?</em></li>\n  <li><em>Which ligands have been described for a particular target associated with transthyretin-related amyloidosis, what is their affinity for that target and how far are they advanced into preclinical/clinical phases, with links to publications/patents describing these interactions?</em></li>\n  <li><em>Target druggability: compounds directed against target X have been tested in which indications? Which new targets have appeared recently in the patent literature for a disease? Has the target been screened against in AZ before? What information on in vitro or in vivo screens has already been performed on a compound?</em></li>\n  <li><em>Which chemical series have been shown to be active against target X? Which new targets have been associated with disease Y? Which companies are working on target X or disease Y?</em></li>\n  <li><em>Which compounds are known to be activators of targets that relate to Parkinson’s disease or Alzheimer’s disease</em></li>\n  <li><em>For my specific target, which active compounds have been reported in the literature? 
What is also known about upstream and downstream targets?</em></li>\n  <li><em>Compounds that agonize targets in pathway X assayed in only functional assays with a potency &lt;1 μM</em></li>\n  <li><em>Give me the compound(s) that hit most specifically the multiple targets in a given pathway (disease)</em></li>\n  <li><em>For a given disease/indication, give me all targets in the pathway and all active compounds hitting them</em></li>\n</ol>\n\n<p>Students in the <a href=\"http://chem-bla-ics.blogspot.nl/search/label/%23mscpils\">Programming in the Life Sciences course</a>\nwill this year pick one of these questions as a starting point in the project. The goal is to develop\nan HTML+JavaScript solution that will answer the selected question. There is freedom to tweak the\nquestion to personal interests, of course. By selecting a simpler pharmacological question than last\nyear, more time and effort can be put into visualization and interpretation of the found data.</p>",
      "summary": "I think the authors of the Open PHACTS proposal made a right choice in defining a small set of questions that the solution to be developed could be tested against. The questions being specific, it is much easier to understand the needs. In fact, I suspect it may even be a very useful form of requirement analysis, and makes it hard to keep using vague terms.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/opsSciencyQs.jpg",
      "date_published": "2014-11-06T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/j.drudis.2013.05.008", "doi": "10.1016/j.drudis.2013.05.008"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3r6kd-ev543",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/08/30/on-open-access.html",
      "title": "On Open Access in The Netherlands",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/vsnu.png\" width=\"200\" />\nYesterday, I received a letter from the <a href=\"http://vsnu.nl/\">Association of Universities The Netherlands</a> (VSNU, <a href=\"https://twitter.com/deVSNU\">@deVSNU</a>)\nabout Open Access. The Netherlands is for research a very interesting country: it’s small, meaning we have few resources to establish and maintain\nhigh profile centers. We also believe strong education benefits from distribution, so we have many good universities, rather than a few excelling\nuniversities. Mind you, this clouds that we absolutely do have excelling research institutes and research groups; they just are not concentrated in\none university.</p>\n\n<p>Another important aspect is that all those Dutch universities are expected to compete with each other for funding. As a result I have experienced\nrather interesting collaborations between universities. That’s a downside of a small country: everyone knows each other, often in way too much\ndetail. But my point is that the Dutch can be rather conservative. That kills innovation, and is in my opinion a key reason why\n<a href=\"http://www.rathenau.nl/actueel/nieuwsberichten/2014/08/universiteiten-blijven-hangen-in-de-subtop.html\">we are not breaking into the top 50 of rankings</a>,\nmore than concentration. Concentration of funding in Top research institutes has not been extensively evaluated, but I think the efficiency is\nnot proven higher than previous funding approaches.</p>\n\n<p>Anyway, this letter I received is part of <a href=\"http://vsnu.nl/openaccess/\">their Open Access program</a>. Here too, the Dutch universities are conservative\n(well, relatively from my views, at least). 
Now, the Open Access debate is not so interesting, because it primarily ends up about who pays who\n(boring) and whether we should go gold or green (beside the point, see below), and, sadly, here too many people think about who pays who again\n(still boring).</p>\n\n<p>Therefore, given the outlined importance and impact of Dutch research, I found it relevant to post about the progress of Open Access in my small\ncountry. The letter is <a href=\"http://www.vsnu.nl/files/documenten/Domeinen/Onderzoek/Open%20access/14267%20Open%20Access%20to%20publications%20(ENG).pdf\">available in English</a>.</p>\n\n<p>Basically, the letter is an answer to an earlier letter from our government about Open Access, and it warns about actions that will soon be\nundertaken (so, not really pro-active). However,</p>\n\n<blockquote>\n  <p>“[they] are also appealing to you to continue to advocate free access to your own scientific publications.”</p>\n</blockquote>\n\n<p>Well, I have, not so actively, and maybe this post can be the start of a change. Because what basically bothers me is that the Open Access\ndiscussion, also in The Netherlands, is biased. And indeed, the letter continues with a section about gold and green access. If the VSNU\nreally wants to promote free access to <strong><em>research</em></strong>, it should not even accept green. We all know that it is not about being able to look at (free),\nbut to be able to mix and improve. Reuse. Continue. Stand on shoulders. The fact that this letter focuses on publications only, and does not spend a\nword on reuse, is rather depressing and not giving me even the slightest hint that The Netherlands will break into that Top 50 any time soon.</p>\n\n<p>Overall, the letter is relatively positive for the Open Access movement, though reactive. <a href=\"https://twitter.com/egonwillighagen/status/504973493742891008\">They still have some explanation to do</a>:</p>\n\n<blockquote>\n  <p>“The golden route is more complex. 
However, many believe that in the end it is a\nmore sustainable route to Open Access.”</p>\n</blockquote>\n\n<p>(Or maybe readers can explain to me what is complex about the golden route?)</p>\n\n<p>The following is a rather interesting section, but really only if they had focused on Open Access in its pure form that allows research\nreuse. I think it now leaves you with a low starting point bargaining with resistant publisher lawyers and managers that have long\n<a href=\"http://alexholcombe.wordpress.com/2013/01/09/scholarly-publishers-and-their-high-profits/\">lost the interest of the academics in favor of that of the share holders</a>:</p>\n\n<blockquote>\n  <p>For the past ten years, publishers have been offering journals in package deals referred to as Big Deals. Shortly negotiations with\nthe major publishers about these Big Deals will take place, including Elsevier, Springer and Wiley. The Dutch universities have expressed\ntheir wish to make agreements with these publishers about the transition to Open Access as part of those Big Deals. Universities expect\npublishers to take serious steps to facilitate that transition.</p>\n</blockquote>\n\n<p>I hope the VSNU will clarify what they mean by “serious”. Because they all came up with “me too” solutions (setting up new OA\njournals) without seriously changing their model. No large publisher dared making the flagship journals full gold Open Access. That is\nserious business; all we see now is scribbling in the margin.</p>\n\n<p>Perhaps that is the reason for the wish to be in the top 50. Maybe the VSNU just wants a better bargaining position.</p>\n\n<p>The letter ends with what researchers can do. And with that, they are spot on:</p>\n\n<blockquote>\n  <p>As a researcher, you can play a vital role in the transition to Open Access. We have\nmentioned the possibility of depositing articles in the repository of your own\nuniversity. But there is more. 
It’s important to consider that researchers play a key\nrole in the publishing process: as providers of the scientific content, as reviewers\nand as members of editorial and advisory boards. We hope that wherever possible,\nyou will ask publishers to convert to an Open Access model.</p>\n</blockquote>\n\n<p>What any researcher can already do to promote (proper) Open Access:</p>\n\n<ol>\n  <li>stop reviewing publishing closed-access papers (you have way too many review requests already, and some filtering will not hurt you)</li>\n  <li>stop reviewing publishing for non-gold Open Access journals (a step further than the first item)</li>\n  <li>submit only to full-gold Open Access journals (plenty of options; importantly, the quality and impact of your paper is not dependent on the journal, but on you. If not, you’re just a bad author and researcher and should go back to school or start learning from feedback on your Open Notebook Science, so that you improve your act before you submit; really, it happens to the best of us: multidisciplinary research is hard: you cannot excel in biology and chemistry and statistics and informatics and computer science and data analysis and materials science and be a perfect and creative linguist (well, not all of us, anyway))</li>\n  <li>put your previous mistakenly closed-access papers in university repositories (most Dutch universities have solutions; not all yet)</li>\n  <li>make previously published closed-access papers gold Open Access (yes, you can! 
I am in the process of doing this for the CDK I paper, and other ACS papers will follow)</li>\n  <li><a href=\"https://orcid.org/register\">get an ORCID</a></li>\n  <li>use <a href=\"https://en.wikipedia.org/wiki/Altmetrics\">#altmetrics</a> to see that gold Open Access gives you more impact for your papers too (service providers include <a href=\"https://impactstory.org/\">ImpactStory</a>, <a href=\"http://altmetric.com/\">Altmetric.com</a>, <a href=\"http://www.plumanalytics.com/\">Plum Analytics</a>, etc)</li>\n</ol>\n\n<p>Of course, it is not only about publications. Again, the VSNU would do well to learn that research is not the same as publications.\nBesides sending letters, I think the VSNU can do this to promote Open Science, which is what I hope they are after:</p>\n\n<ol>\n  <li>negotiate with the government and major science and funding agencies (KNAW, NWO) to stop focusing on publications as primary output</li>\n  <li>start focusing on output other than publications (e.g. data sets, software) even if you have not ended negotiations with others, just to set a proper example</li>\n  <li>make research outcomes machine readable (read <a href=\"https://researchkb.wordpress.com/2014/08/26/linked-open-data-at-the-national-library-of-the-netherlands/\">this interesting post from our national library</a>)</li>\n  <li>actively explore business models around Open Science (and not have your universities’ spin-off departments only know about patent law and ignore the rest of the world)</li>\n  <li>adopt the ORCID nationwide, starting Jan 2015</li>\n  <li>start using #altmetrics to get a better perspective on the performance of your members</li>\n</ol>\n\n<p>Of course, I am more than willing to help the VSNU with this transition. 
I can be reached at the\n<a href=\"http://www.bigcat.unimaas.nl/\">Department of Bioinformatics - BiGCaT</a>, <a href=\"http://www.maastrichtuniversity.nl/web/show/id=6265112/langid=42\">NUTRIM</a>,\n<a href=\"http://www.maastrichtuniversity.nl/web/show/id=74338/langid=42\">FHML</a>, <a href=\"http://www.maastrichtuniversity.nl/\">Maastricht University</a>.\nThere are many options I have missed here (like data repositories, data citing, DOIs, and whatever).</p>\n\n<p>PS. <a href=\"https://impactstory.org/EgonWillighagen\">my ImpactStory profile</a> will tell you that more than\n80% of my publications are Open Access. Not all gold yet, but I am working on changing that for some old papers.</p>",
      "summary": "Yesterday, I received a letter from the Association of Universities The Netherlands (VSNU, @deVSNU) about Open Access. The Netherlands is for research a very interesting country: it’s small, meaning we have few resources to establish and maintain high profile centers, we also believe strong education benefits from distribution, so we have many good universities, rather than a few excelling universities. Mind you, this obscures the fact that we absolutely do have excelling research institutes and research groups; they just are not concentrated in one university.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/vsnu.png",
      "date_published": "2014-08-30T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["openaccess","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ghad4-j2824",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/05/14/jean-claude-bradley-blue-obelisk-award.html",
      "title": "Jean-Claude Bradley, Blue Obelisk award winner",
      "content_html": "<p><span style=\"width: 35%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/assets/images/1752-153X-3-14-graphical-abstract.png\" /> <br />\nChemistry in Second Life. DOI:<a href=\"https://doi.org/10.1186/1752-153X-3-14\">10.1186/1752-153X-3-14</a>\n</span>\nThere are nowadays a lot of people talking about Open, about open access, open data, open source. In fact, some discussion on Twitter resulted\nin the realization that it is highly unlikely that any scholar has not taken advantage of Open in some way in their research in the last few\nyears. However, this is mostly due to people who actually do, not to those who talk about it or use it.</p>\n\n<p>One of the few people in chemistry who both promoted Open and did Open was <a href=\"http://usefulchem.blogspot.nl/\">Jean-Claude Bradley</a>.\nYesterday, I heard the sad news that he passed away. This is a great loss to many of us and certainly to the open chemistry community.\nJean-Claude received the <a href=\"https://sourceforge.net/apps/mediawiki/blueobelisk/index.php?title=Blue_Obelisk_Awards\">Blue Obelisk award</a> for\nhis <a href=\"https://en.wikipedia.org/wiki/Open_Notebook_Science\">Open Notebook Science</a> work back in 2007 (I handed him the obelisk at the ACS\nmeeting in Chicago; thanx to <a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2014/05/14/in-memory-of-open-science-pioneer-jean-claude-bradley/\">Chris</a>\nfor taking the picture, and digging it up!) 
and he contributed much to the community, among which his melting point and\n<a href=\"http://chem-bla-ics.blogspot.nl/2008/11/solubility-data-in-bioclipse-1.html\">solubility data</a> for organic compounds.</p>\n\n<p><img src=\"/assets/images/JCBradley-Blue-Obelisk.jpg\" alt=\"\" /></p>\n<center>A proud me handing out the Blue Obelisk award to Jean-Claude in Chicago in 2007.\nCC-BY 2007 <a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2014/05/14/in-memory-of-open-science-pioneer-jean-claude-bradley/\">Christoph Steinbeck</a>.</center>\n\n<p>Jean-Claude and I did some work together, including a book chapter, which I liked being a trained organic chemist myself (well, just a\n<a href=\"http://chem-bla-ics.blogspot.nl/2011/06/chiral-molecules-how-cool-is-sem.html\">6 month minor</a> during my M.Sc. on supramolecular chemistry).\nI was really pleased that he had accepted to become part of the <a href=\"http://enanomapper.net/\">eNanoMapper</a> scientific advisory board, and\nI was very much looking forward to working with him again on the journal side of dissemination of nanosafety research, in his role as\neditor-in-chief of <a href=\"http://journal.chemistrycentral.com/\">Chemistry Central Journal</a>.</p>\n\n<p>Few people leave a big impression on me, but he was certainly one of them.\n<a href=\"http://friendfeed.com/jcbradley\">Let</a> <a href=\"http://www.slideshare.net/jcbradley\">his</a> <a href=\"https://www.youtube.com/user/jeanclaudebradley/videos\">extensive</a>\n<a href=\"http://usefulchem.wikispaces.com/Jean-Claude+Bradley\">work</a> <a href=\"https://twitter.com/jcbradley/\">not</a>\n<a href=\"http://journal.chemistrycentral.com/content/3/1/14\">go</a> <a href=\"http://www.jcheminf.com/content/1/1/9/abstract\">unnoticed</a>;\nthere is still a <a href=\"http://chem-bla-ics.blogspot.nl/2013/09/urgent-open-science-needs-for-drug.html\">lot to do</a> in Open chemistry.</p>\n\n<p><a 
href=\"http://www.chemconnector.com/2014/05/14/in-memory-of-jean-claude-bradley/\">Other</a>\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2014/05/14/in-memory-of-open-science-pioneer-jean-claude-bradley/\">posts</a>\n<a href=\"http://drexel.edu/chemistry/news/archive/mourning-jean-claude-bradley-department-of-chemistry/\">about</a>\n<a href=\"http://cenblog.org/terra-sigillata/2014/05/14/mourning-open-notebook-science-pioneer-dr-jean-claude-bradley/\">this</a> loss.</p>",
      "summary": "Chemistry in Second Life. DOI:10.1186/1752-153X-3-14 There are nowadays a lot of people talking about Open, about open access, open data, open source. In fact, some discussion on Twitter resulted in the realization that it is highly unlikely that any scholar has not taken advantage of Open in some way in their research in the last few years. However, this is mostly due to people who actually do, not to those who talk about it or use it.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/1752-153X-3-14-graphical-abstract.png",
      "date_published": "2014-05-14T00:00:00+00:00",
      "date_modified": "2014-05-14T00:00:00+00:00",
      "tags": ["blue-obelisk","enanomapper","obituary"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1752-153X-3-14", "doi": "10.1186/1752-153X-3-14"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1758-2946-1-9", "doi": "10.1186/1758-2946-1-9"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j5a8t-j7z67",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/02/21/slow-publishing-innovation.html",
      "title": "Slow publishing innovation: SMILES in ACS journals",
      "content_html": "<p>Elsevier is not the only publisher with a <a href=\"https://chem-bla-ics.linkedchemistry.info/2014/02/15/elseviers-new-text-mining-initiative-is.html\">large innovation inertia</a>.\nIn fact, I think many large organizations do, particularly if there are too many interdependencies, causing too long lines. Greg Landrum\n<a href=\"https://plus.google.com/u/0/+GregLandrum/posts/JsgLruHQ6go\">made me aware</a> that one <a href=\"http://acs.org/\">American Chemical Society</a>\njournal is now <a href=\"http://pubs.acs.org/doi/full/10.1021/jm5002056\">going to encourage</a> (not require) machine readable forms of chemical\nstructures to be included in their flagship. The reasoning by Gilson <em>et al.</em> is balanced. It is also 15 years too late. This\nquestion was relevant at the end of the last century. The technologies were already more advanced than what will now be adopted.\n15 years!!! Seriously, that’s close to the time it takes to bring a new drug to the market!</p>\n\n<p>Look at what they suggest and think about it. Include SMILES strings for structures in the paper. I very much welcome this, of course,\neven though I am not a big fan of SMILES at all. They could have said something about <a href=\"http://opensmiles.org/spec/open-smiles.html\">OpenSMILES</a>\ntoo, which is more precise. They do say something about the InChI and InChIKey, but not that the SMILES string can more precisely reflect\nthe drawing. I wonder why they don’t go for a format that can actually capture the image, like CML or an MDL molfile. Then again, a SMILES\ncopy/pastes so nicely. Talking about slow innovation. There is zero technical reason you could not copy/paste an MDL molfile into a\nspreadsheet (and you can with many tools, in fact…)</p>\n\n<p>Now, I still have tons of questions. What tool will be used to validate the correctness and absence of ambiguity before the publication?\nWill the SMILES strings be validated at all? And at what level? 
Will it have to be compatible with particular tools? Does it have to be\ncompatible with OpenSMILES? Under what license will these SMILES be available (can we data mine <a href=\"https://en.wikipedia.org/wiki/Digital_object_identifier\">DOI</a>-SMILES\nlinks and openly share them)? What was the reasoning for finally adopting this? Will the journal also accept submissions where both SMILES\nand other formats are provided? Will they accept or deny SMARTS strings (e.g. for Markush structures)?</p>\n\n<p>All in all, I second the others, and am happy to see this step. I do hope they do not stop here and wait another 15 years for the next step.\nIn fact, they ask for input at <a href=\"mailto:jmc@jmedchem.acs.org\">jmc@jmedchem.acs.org</a>. That is doubly promising!</p>",
      "summary": "Elsevier is not the only publisher with a large innovation inertia. In fact, I think many large organizations do, particularly if there are too many interdependencies, causing too long lines. Greg Landrum made me aware that one American Chemical Society journal is now going to encourage (not require) machine readable forms of chemical structures to be included in their flagship. The reasoning by Gilson et al. is balanced. It is also 15 years too late. This question was relevant at the end of the last century. The technologies were already more advanced than what will now be adopted. 15 years!!! Seriously, that’s close to the time it takes to bring a new drug to the market!",
      
      "date_published": "2014-02-21T00:00:00+00:00",
      "date_modified": "2014-02-21T00:00:00+00:00",
      "tags": ["publishing","smiles","acs"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/jm5002056", "doi": "10.1021/jm5002056"
            , "cito":
              
              
                [ 
                  "discusses"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5e32g-08a89",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/02/15/elseviers-new-text-mining-initiative-is.html",
      "title": "Elsevier&apos;s new text mining initiative is a step sideways",
      "content_html": "<p>Elsevier’s <a href=\"http://www.elsevier.com/about/universal-access/content-mining-policies\">new ideas on text mining</a> are getting a lot of\n<a href=\"http://www.nature.com/news/elsevier-opens-its-papers-to-text-mining-1.14659\">attention</a> now. Sadly, they get it wrong, again.\nOn the bright side, all other publishers, which are <a href=\"http://www.nature.com/news/elsevier-opens-its-papers-to-text-mining-1.14659\">expected to follow this year</a>,\ncan learn from this mistake.</p>\n\n<p>Because if done right, the publishers can even help forward science, despite crippling progress. That sounds harsh, and surely\nthey have done a lot of good for science. In fact, we would not be where we are now without the publishers. But things have\nchanged. With the internet anyone can be a publisher. We see this with blogs, we see this with <a href=\"http://lulu.com/\">Lulu.com</a>.\nAnd, unlike what some misinformed people think, this is independent of peer review. Publishers were important because they\nprovide a channel to disseminate knowledge. But paper publishing is no longer the most efficient way. In fact, in terms\nof value, paper has been overtaken for some years now.</p>\n\n<p>And we need more added value. Not the shipping of the knowledge, but keeping up is the issue. And there too, publishing is\ninefficient: human language is nice for sharing ideas and concepts, but it fails at disseminating raw facts: measured data.\nAnyone who has tried creating a data set to find patterns knows this: extracting the information is a lot of effort, mostly\ncaused by the broken paper publishing model. This is most apparent in some research domains where data repositories exist,\nbut sadly this applies to a small minority of data types.</p>\n\n<p>Now, text mining seems in that sense the wrong question: why try to recover knowledge that should have gone into\nrepositories in the first place. I agree. 
However, we cannot just throw away all the knowledge kept in these papers, and\ncertainly not as long as people keep insisting on seeing only papers as scientific success. We are slowly seeing this\nimprove, but only very slowly. Things that were apparent to me as a student 20 years ago, are the things that scholars are\nstill struggling with today. Depressing indeed, but it does help you grow a good sense of patience.</p>\n\n<p>And now, Elsevier wants to make a step forward, wants to be leading in science dissemination again. And they come up with\nan intermediate solution between actual knowledge dissemination and profit: they come up with a license-model, increasing\ntheir monopoly on knowledge and trying to lure the scientist into a non-commercial license. From a money-making\nperspective this is what society expects from them. From someone who likes to see societal problems solved, this is\ndisappointing. They had a great opportunity to lead the field.</p>\n\n<p>Now, is all bad? Not at all. It’s a step, but not the step I would have liked to see. It will be a success: because the\nCC-BY-NC data that will come out of it, will be part of the web of knowledge. No one will care about the NC part, except\nall those SMEs in Europe that work on products to help society which will find it much harder to collaborate with other\ncompanies, because they cannot share the knowledge they created from analyzing the literature (does Elsevier want a monopoly\nin this analysis?).</p>\n\n<p>Nor will many in the academic community complain. Surely, those that have worried about this, they will. But the scholars\nat universities do not care about NC licenses. After all, universities are not commercial. Asking a student to pay\n30 thousand euro for a year is surely not commercial. That is the consensus. But I note that this consensus has not been\ntried in court, and I am looking forward to the day it will happen. 
Elsevier will likely not challenge this, and silently\naccept this situation. Just like Microsoft never made a big deal out of people copying office versions of their operating\nsystem for use at home: you do not bite the hand that feeds you (too hard). You rather\n<a href=\"http://svpow.com/2013/12/06/elsevier-is-taking-down-papers-from-academia-edu/\">go after others</a>, like\n<a href=\"http://academia.edu/\">Academia.edu</a>. It will not be scholars Elsevier will enforce the NC on, and it will not be large\ncompanies either: if any, it will be the SMEs. Support them, and do not agree with the license.</p>\n\n<p>Well, it was a nice opportunity for Elsevier. I only see my choice to sign <a href=\"http://thecostofknowledge.com/\">The Cost of Knowledge</a>\nreaffirmed.</p>\n\n<p>The choice of the NC clause is totally useless in any context of dissemination. I call for Elsevier to at least add this\noption, if they are serious about improving: text mining is provided to subscribers, via a decent API, adhering to:</p>\n\n<ol>\n  <li>Facts extracted from literature are licensed CCZero and attribution is paid (facts are copyright free in most parts of the world)</li>\n  <li>Output can contain “snippets” of the original text under international “fair use” concepts, and licensed as CC-BY</li>\n</ol>\n\n<p>Any scientist is expected to attribute the source of information in the first place, and it is kind of sad Elsevier is on\nsuch a bad footing with their audience that they feel this must be enforced via a contract, but that is not a problem. I also\nsee no reason to deviate from international law about “fair use”; I do understand this is probably an ill defined concept,\nbut 200 characters seems pretty limited to me, as facts can be spread over sentences longer than this.</p>\n\n<p>I know that many will disagree on the CCZero license, and many will feel awkward about giving away data. It has value, right?\nIt’s your property, right? I am not going to argue against that. 
But personally I do not understand how it aligns with the\nidea of scientific dissemination. Holding back knowledge as part of making knowledge available? How exactly does that make\nsense? Importantly, just like with software, Open is not the same as Without-Cost! Hosting and sharing Open Data also costs\nmoney (particularly, if it is 1 TB of data). Those are different concepts.</p>\n\n<p>However, I also stress that the scholars have a great responsibility here: I call for all Elsevier journal editorial\nboards to not accept this deal either. In fact, all editorial boards have great say in this: it’s them who make a journal\nvaluable. I also call on all scholars to be aware of the consequences of selling away your copyright. That is a choice in the\ncurrent era. There are plenty of means to disseminate your science <em>without</em> (much) cost, and APC is a flawed argument.</p>\n\n<p>The current step by Elsevier, after all the effort from many, is not a step forward, it’s a step sideways. Elsevier,\nI know you can do better. Are you willing?</p>\n\n<p>I am willing, and have been supporting science by making data available as CCZero. However, I also am happy if others\nare not ready for this, or have other reasons not to. It is not always under their control. For example, I have heard\nstories where data has been used by politicians as small change to get industry to test their products for safety.\nI also accept that getting funding as a scholar is hard work, often not paid for, and that it is hard to give away\nyour only security of a future career. Then again, we all know what data is valuable, has already given its value,\nor is of no use to you anymore. And in this latter case I ask you to consider making data available: data of no use\nto you anymore, but that could be valuable to others. 
Make it available, and get cited, and get value out of it,\nyou would not have received when it sat on some hard disk, and probably is lost in five years.</p>\n\n<p>I also fully understand this is my opinion. Thus, not all data I make available is CCZero: I fully respect copyright\nand license from others; in fact, I often feel I do much more than scientists which object to Open licenses, which\njust take data as their own as they please. That is why I insist often on clear copyright and license information.\nBecause if missing, default (local) law applies.</p>\n\n<p>If you want to read more analysis, please refer to the following posts:</p>\n\n<ol>\n  <li><a href=\"http://www.nature.com/news/elsevier-opens-its-papers-to-text-mining-1.14659\">Elsevier opens its papers to text-mining</a></li>\n  <li><a href=\"http://blogs.ch.cam.ac.uk/pmr/2014/02/06/elseviers-tdm-terms-tac-can-they-force-us-to-copyright-data-2/\">#elsevier’s TDM Terms (TaC): Can they force us to copyright data? (2)</a></li>\n  <li><a href=\"http://blogs.ch.cam.ac.uk/pmr/2014/02/10/natures-recent-news-article-on-text-and-data-mining-was-an-unacceptable-marketing-exercise-i-ask-them-to-renounce-licensing/\">Nature’s recent “news” article on Text and Data Mining was unacceptable [redacted]; I ask them to renounce licensing.</a></li>\n  <li><a href=\"http://blogs.ch.cam.ac.uk/pmr/2014/02/10/natures-recent-news-article-on-text-and-data-mining-was-an-unacceptable-marketing-exercise-i-ask-them-to-renounce-licensing/#comment-150096\">“Dear Peter,”, Richard van Noorden</a></li>\n  <li><a href=\"http://blogs.ch.cam.ac.uk/pmr/2014/02/14/reply-to-richard-van-noorden/\">Reply to Richard van Noorden</a></li>\n</ol>",
      "summary": "Elsevier’s new ideas on text mining are getting a lot of attention now. Sadly, they get it wrong, again. On the bright side, all other publishers, which are expected to follow this year, can learn from this mistake.",
      
      "date_published": "2014-02-15T00:00:00+00:00",
      "date_modified": "2014-02-15T00:00:00+00:00",
      "tags": ["publishing","textmining"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/506017a", "doi": "10.1038/506017a"
            , "cito":
              
              
                [ 
                  "citesAsEvidence"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ayrym-a5h65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/12/06/programming-in-life-sciences-13-another.html",
      "title": "Programming in the Life Sciences #13: Another screenshot",
      "content_html": "<p>I got one more source code zip file from the <a href=\"http://www.maastrichtuniversity.nl/web/Schools/MaastrichtScienceProgramme.htm\">Maastricht Science Programme</a>\nstudents (see also the <a href=\"http://chem-bla-ics.blogspot.nl/2013/12/programming-in-life-sciences-12-first.html\">first two screenshots</a>). Vincent and Błażej extended the\n<a href=\"http://d3js.org/\">d3.js</a> tree view, showing classification information from <a href=\"http://www.ebi.ac.uk/chebi/\">ChEBI</a> (they also\n<a href=\"https://github.com/openphacts/ops.js/pull/2\">submitted</a> <a href=\"https://github.com/openphacts/ops.js/pull/3\">three</a>\n<a href=\"https://github.com/openphacts/ops.js/pull/4\">patches</a> to the <a href=\"http://www.openphacts.org/\">Open PHACTS</a> <a href=\"https://github.com/openphacts/ops.js\">ops.js</a>):</p>\n\n<p><img src=\"/assets/images/blazejAndVincent.png\" alt=\"\" /></p>",
      "summary": "I got one more source code zip file from the Maastricht Science Programme students (see also the first two screenshots). Vincent and Błażej extended the d3.js tree view, showing classification information from ChEBI (they also submitted three patches to the Open PHACTS ops.js):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/blazejAndVincent.png",
      "date_published": "2013-12-06T00:10:00+00:00",
      "date_modified": "2013-12-06T00:10:00+00:00",
      "tags": ["pra3006","openphacts","chebi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yj6nf-x5998",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/12/06/programming-in-life-sciences-12-first.html",
      "title": "Programming in the Life Sciences #12: First screenshots",
      "content_html": "<p>Yesterday was the last <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-1-six-day.html\">Programming in the Life Sciences</a> practical day,\nand the 2nd and 3rd year B.Sc. <a href=\"http://www.maastrichtuniversity.nl/web/Schools/MaastrichtScienceProgramme.htm\">MSC</a> students presented their results in the\nafternoon. I am impressed with the results that they reached in only six practical days. I have suggested that they upload the presentations to SlideShare or\n<a href=\"http://figshare.com/\">FigShare</a> (with the advantage that you get a DOI), and asked them to send me their tools. Below are some screenshots.</p>\n\n<p>The first app is by Tim and Taís, and looks up activities from the <a href=\"http://www.openphacts.org/\">Open PHACTS</a> platform and filters them for activities related\nto a set of five anti-oxidants (see also <a href=\"http://figshare.com/articles/Tais_and_Tim_Final_Presentation_pptx/870474\">their FigShare</a>):</p>\n\n<p><img src=\"/assets/images/timAndTais.png\" alt=\"\" /></p>\n\n<p>The next app is by Janneke and Lukas and uses the Open PHACTS <a href=\"https://dev.openphacts.org/\">API</a> to report on single protein targets for the compound\nthe user enters (see also <a href=\"http://www.slideshare.net/lukasfreedaheimfriedeheim/target-search-janneke-mes-lukas-friedeheim\">their SlideShare</a>):</p>\n\n<p><img src=\"/assets/images/jannekeAndLukas.png\" alt=\"\" /></p>\n\n<p>More apps will follow soon.</p>",
      "summary": "Yesterday was the last Programming in the Life Sciences practical day, and the 2nd and 3rd year B.Sc. MSC students presented their results in the afternoon. I am impressed with the results that they reached in only six practical days. I have suggested that they upload the presentations to SlideShare or FigShare (with the advantage that you get a DOI), and asked them to send me their tools. Below are some screenshots.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jannekeAndLukas.png",
      "date_published": "2013-12-06T00:00:00+00:00",
      "date_modified": "2013-12-06T00:00:00+00:00",
      "tags": ["pra3006","openphacts"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.6084/m9.figshare.870474.v1", "doi": "10.6084/m9.figshare.870474.v1"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s6am1-1sa79",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/11/08/looking-for-phd-and-postdoc-to-work-on.html",
      "title": "Looking for a PhD and a Postdoc to work on Open Science Nanosafety",
      "content_html": "<p>I am happy that I got my first research grant awarded (EU FP7), which should start after all the contracts are signed,\netc, sometime in early 2014. The project is about setting up data needs for the analysis of nanosafety studies. And for this,\nI have the two vacancies below available now. If you are keen on doing Open Science (CDK, Bioclipse, OpenTox, WikiPathways, …, …),\nworking within the European <a href=\"http://www.nanosafetycluster.eu/\">NanoSafety Cluster</a>, and have an affinity with understanding the\nsystems biology of nanomaterials, then you may be interested in applying.</p>\n\n<p><strong>PhD position</strong></p>\n\n<p><img src=\"/assets/images/vac1.png\" alt=\"\" /></p>\n\n<p><strong>Postdoc position</strong></p>\n\n<p><img src=\"/assets/images/vac2.png\" alt=\"\" /></p>",
      "summary": "I am happy that I got my first research grant awarded (EU FP7), which should start after all the contracts are signed, etc, sometime in early 2014. The project is about setting up data needs for the analysis of nanosafety studies. And for this, I have the two vacancies below available now. If you are keen on doing Open Science (CDK, Bioclipse, OpenTox, WikiPathways, …, …), working within the European NanoSafety Cluster, and have an affinity with understanding the systems biology of nanomaterials, then you may be interested in applying.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/vac1.png",
      "date_published": "2013-11-08T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["nanosafety","enanomapper","opentox","ontology"],
      
      "_funding": [{"award": { "title" : "eNanoMapper - A Database and Ontology Framework for Nanomaterials Design and Safety Assessment", "acronym" : "eNanoMapper", "uri" : "cordis.project:604134" }, "funder": { "name": "European Commission", "ror": "00k4n6c32" } }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tqs3s-x7289",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-11-html.html",
      "title": "Programming in the Life Sciences #11: HTML",
      "content_html": "<p><a href=\"https://en.wikipedia.org/wiki/HTML\">HTML</a> (HyperText Markup Language), the language of the web,\nis no longer the only language of the web. But it still is the primary language in which source\ncode of webpages is shared. Originally, HTML pages were always static: the only HTML source of a\nweb page was what was downloaded from a website. Nowadays, much of the HTML that is visualized in your\nweb browser is generated on the fly with JavaScript. In fact, that is exactly what you will\nlearn to do in this course.</p>\n\n<p>HTML has many dialects, and HTML5 is the upcoming next version. The features have become so\nextensive that we will not cover half of them; instead, we will stick to the bare\nminimum needed. But even at a minimum, writing a web page with HTML code is basically writing\nsource code. The compiled version is the view of the webpage your web browser shows you. One\nimportant difference is that HTML is much more like a data model representation than it is like\ncomputational instructions. That is, rather than saying things like <code class=\"language-plaintext highlighter-rouge\">put(\"String\", xCoord, yCoord)</code>,\nwe define what is to be shown in what order with general instructions. Well, in pure HTML\nthat is. <a href=\"https://en.wikipedia.org/wiki/CSS\">Cascading Style Sheets</a> (CSS) is quite outside the\nscope of this course.</p>\n\n<p>A minimal HTML page looks like:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html&gt;</span>\n  <span class=\"nt\">&lt;head&gt;</span>\n  <span class=\"nt\">&lt;/head&gt;</span>\n  <span class=\"nt\">&lt;body&gt;</span>\n  Hello world!\n  <span class=\"nt\">&lt;/body&gt;</span>\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>\n\n<p>When we think about this structure, we notice that it is not unlike the key-value maps we\ncovered earlier. 
For example, compare it to this\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-10.html\">JSON</a>:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nl\">\"html\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"head\"</span><span class=\"p\">:{},</span><span class=\"w\">\n    </span><span class=\"nl\">\"body\"</span><span class=\"p\">:{</span><span class=\"err\">value:</span><span class=\"s2\">\"Hello world!\"</span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Even if we introduce HTML attributes:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html&gt;</span>\n  <span class=\"nt\">&lt;head&gt;</span>\n  <span class=\"nt\">&lt;/head&gt;</span>\n  <span class=\"nt\">&lt;body&gt;</span>\n  <span class=\"nt\">&lt;h1&gt;&lt;a</span> <span class=\"na\">name=</span><span class=\"s\">\"hello\"</span><span class=\"nt\">&gt;</span>Hello world!<span class=\"nt\">&lt;/a&gt;&lt;/h1&gt;</span>\n  <span class=\"nt\">&lt;/body&gt;</span>\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>\n\n<p>The JSON equivalent would be:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nl\">\"html\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"head\"</span><span class=\"p\">:{},</span><span class=\"w\">\n    </span><span class=\"nl\">\"body\"</span><span class=\"p\">:{</span><span 
class=\"w\">\n      </span><span class=\"nl\">\"h1\"</span><span class=\"p\">:{</span><span class=\"w\">\n        </span><span class=\"nl\">\"a\"</span><span class=\"p\">:{</span><span class=\"w\">\n          </span><span class=\"err\">attributes:</span><span class=\"p\">{</span><span class=\"nl\">\"name\"</span><span class=\"p\">:</span><span class=\"s2\">\"hello\"</span><span class=\"p\">},</span><span class=\"w\">\n          </span><span class=\"err\">value:</span><span class=\"s2\">\"Hello world!\"</span><span class=\"w\">\n        </span><span class=\"p\">}</span><span class=\"w\">\n      </span><span class=\"p\">}</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>So, while these languages are quite different from programming languages, we can clearly\nsee they have been made up by the same (computer science) people. But in my opinion, this\nis an advantage: we only need to learn the underlying patterns and can then much\nmore easily switch between different languages.</p>\n\n<p>Now, returning to the HTML example, we introduce a bit of terminology. Let’s start with\nthe last example:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;h1&gt;&lt;a</span> <span class=\"na\">name=</span><span class=\"s\">\"hello\"</span><span class=\"nt\">&gt;</span>Hello world!<span class=\"nt\">&lt;/a&gt;&lt;/h1&gt;</span>\n</code></pre></div></div>\n\n<p>This HTML code example shows the <code class=\"language-plaintext highlighter-rouge\">&lt;h1&gt;</code> <strong>element</strong> which has one <strong>child element</strong>\n<code class=\"language-plaintext highlighter-rouge\">&lt;a&gt;</code>. This child element has an <strong>attribute</strong> <code class=\"language-plaintext highlighter-rouge\">@name</code>. 
Elements can contain string content,\nas the <code class=\"language-plaintext highlighter-rouge\">&lt;a&gt;</code> element does, as well as one or more child elements (and any combination of these).\nAttributes can only have string content. The HTML specification defines in detail which\nelements can be child elements of other elements. For example, the <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code> element can\nonly be a child element of <code class=\"language-plaintext highlighter-rouge\">&lt;html&gt;</code>. Similarly, each HTML element can only have specific\nattributes, though some attributes can be attached to any element.</p>\n\n<p>There is plenty of documentation on the web, but there are also tools that can help us write\nHTML. One example is the W3C validator at <a href=\"http://validator.w3.org/\">http://validator.w3.org/</a>. This website\ndetects errors in your HTML code, and is quite helpful if you are new to editing HTML, as\nwell as useful if you have a lot of HTML experience.</p>\n\n<p>HTML elements you may find useful include the following:</p>\n\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;h1&gt;</code>, <code class=\"language-plaintext highlighter-rouge\">&lt;h2&gt;</code>, …, <code class=\"language-plaintext highlighter-rouge\">&lt;h5&gt;</code>: these are headers and can be used to make sections</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;p&gt;</code>: indicates a paragraph</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;div id=\"someID\"&gt;</code>: indicates a section of text. 
The content of any element with an id attribute can be replaced by any appropriate HTML content with JavaScript</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;a href=\"http://...\"&gt;some link&lt;/a&gt;</code>: this is used to make hyperlinks; href means hyperlink reference</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;a name=\"mark1\"&gt;some text&lt;/a&gt;</code>: this is used to create bookmarks, which you can jump to with <code class=\"language-plaintext highlighter-rouge\">&lt;a href=\"#mark1\"&gt;jump to section Mark 1&lt;/a&gt;</code></li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;script&gt;</code>: used to include JavaScript code in your HTML page</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code>: this HTML blob contains metadata, a list of libraries to be loaded, but also JavaScript which is executed before the HTML <code class=\"language-plaintext highlighter-rouge\">&lt;body&gt;</code> is processed</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;body&gt;</code>: this contains the HTML that is depicted in your browser window</li>\n</ul>\n\n<p>Keep the HTML simple; the programming is more important.</p>\n\n<p><strong>Exercise</strong>: below is part of the HTML/JavaScript <a href=\"https://github.com/egonw/mscpils/blob/master/example1.html\">source code</a>\nbehind <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-5.html\">this app</a>.\nPlease indicate which lines are HTML source code, and which are JavaScript.</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">&lt;!DOCTYPE HTML PUBLIC\n  \"-//W3C//DTD HTML 4.01 Transitional//EN\"\n  \"http://www.w3.org/TR/html4/loose.dtd\"&gt;</span>\n<span class=\"nt\">&lt;html&gt;</span>\n<span class=\"c\">&lt;!--\n\nCopyright (c) 2013  Egon Willighagen &lt;egon.willighagen@maastrichtuniversity.nl&gt;\n\n Permission 
is hereby granted, free of charge, to any person\n obtaining a copy of this software and associated documentation\n files (the \"Software\"), to deal in the Software without\n restriction, including without limitation the rights to use,\n copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the\n Software is furnished to do so, subject to the following\n conditions:\n\n The above copyright notice and this permission notice shall be\n included in all copies or substantial portions of the Software.\n\n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n OTHER DEALINGS IN THE SOFTWARE.\n\n--&gt;</span>\n<span class=\"nt\">&lt;head&gt;</span>\n  <span class=\"nt\">&lt;title&gt;</span>OpenPHACTS Jasmine Spec Runner<span class=\"nt\">&lt;/title&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">src=</span><span class=\"s\">\"lib/jquery-1.9.1.min.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"lib/purl.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n\n  <span class=\"c\">&lt;!-- include source files here... 
--&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"src/OPS.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"src/ConceptWikiSearch.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n\n  <span class=\"c\">&lt;!-- setup --&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"c1\">// get the app_key and app_id from the webpage call --&gt;</span>\n<span class=\"kd\">var</span> <span class=\"nx\">prmstr</span> <span class=\"o\">=</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">location</span><span class=\"p\">.</span><span class=\"nx\">search</span><span class=\"p\">.</span><span class=\"nf\">substr</span><span class=\"p\">(</span><span class=\"mi\">1</span><span class=\"p\">);</span>\n<span class=\"kd\">var</span> <span class=\"nx\">prmarr</span> <span class=\"o\">=</span> <span class=\"nx\">prmstr</span><span class=\"p\">.</span><span class=\"nf\">split </span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">&amp;</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"kd\">var</span> <span class=\"nx\">params</span> <span class=\"o\">=</span> <span class=\"p\">{};</span>\n<span class=\"k\">for</span> <span class=\"p\">(</span> <span class=\"kd\">var</span> <span class=\"nx\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span> <span class=\"o\">&lt;</span> <span class=\"nx\">prmarr</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span 
class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"kd\">var</span> <span class=\"nx\">tmparr</span> <span class=\"o\">=</span> <span class=\"nx\">prmarr</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nf\">split</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">=</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n    <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"nx\">tmparr</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">]]</span> <span class=\"o\">=</span> <span class=\"nx\">tmparr</span><span class=\"p\">[</span><span class=\"mi\">1</span><span class=\"p\">];</span>\n<span class=\"p\">}</span>\n  <span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;/head&gt;</span>\n\n<span class=\"nt\">&lt;body&gt;</span>\n  <span class=\"nt\">&lt;h3&gt;</span>Output<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;h3&gt;</span>Search Results<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;p&gt;&lt;div</span> <span class=\"na\">id=</span><span class=\"s\">\"table\"</span><span class=\"nt\">&gt;&lt;/div&gt;&lt;/p&gt;</span>\n  <span class=\"nt\">&lt;h3&gt;</span>Compound Details<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;p&gt;&lt;div</span> <span class=\"na\">id=</span><span class=\"s\">\"details\"</span><span class=\"nt\">&gt;&lt;/div&gt;&lt;/p&gt;</span>\n  <span class=\"nt\">&lt;h3&gt;</span>JSON reply<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;p&gt;&lt;div</span> <span class=\"na\">id=</span><span class=\"s\">\"json\"</span><span class=\"nt\">&gt;</span>Nothing yet<span class=\"nt\">&lt;/div&gt;&lt;/p&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n<span 
class=\"kd\">var</span> <span class=\"nx\">searcher</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nx\">Openphacts</span><span class=\"p\">.</span><span class=\"nc\">ConceptWikiSearch</span><span class=\"p\">(</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">https://beta.openphacts.org</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">app_id</span><span class=\"dl\">\"</span><span class=\"p\">],</span> <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">app_key</span><span class=\"dl\">\"</span><span class=\"p\">]</span>\n<span class=\"p\">);</span>\n<span class=\"kd\">var</span> <span class=\"nx\">callback</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">success</span><span class=\"p\">,</span> <span class=\"nx\">status</span><span class=\"p\">,</span> <span class=\"nx\">response</span><span class=\"p\">){</span>\n  <span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">json</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">stringify</span><span class=\"p\">(</span><span class=\"nx\">response</span><span class=\"p\">);</span>\n  <span class=\"nx\">html</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;table&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span 
class=\"o\">&lt;</span><span class=\"nx\">response</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;tr&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;td&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">Name: &lt;span&gt;</span><span class=\"dl\">\"</span> <span class=\"o\">+</span>\n      <span class=\"nx\">response</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span> <span class=\"o\">+</span>\n      <span class=\"dl\">\"</span><span class=\"s2\">&lt;/span&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/td&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/tr&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/table&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">table</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span 
class=\"o\">=</span> <span class=\"nx\">html</span><span class=\"p\">;</span>\n<span class=\"p\">};</span>\n<span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">byTag</span><span class=\"p\">(</span>\n  <span class=\"dl\">'</span><span class=\"s1\">Aspirin</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">5</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">4</span><span class=\"dl\">'</span><span class=\"p\">,</span>\n  <span class=\"dl\">'</span><span class=\"s1\">07a84994-e464-4bbf-812a-a4b96fa3d197</span><span class=\"dl\">'</span><span class=\"p\">,</span>\n  <span class=\"nx\">callback</span>\n<span class=\"p\">);</span>\n  <span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;/body&gt;</span>\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>",
      "summary": "HTML (HyperText Markup Language), the language of the web, is no longer the only language of the web. But it still is the primary language in which source code of webpages is shared. Originally, HTML pages were always static: the only HTML source of a web page was what was downloaded from a website. Nowadays, much of the HTML that is visualized in your web browser is generated on the fly with JavaScript. In fact, that is exactly what you will learn to do in this course.",
      
      "date_published": "2013-10-30T00:20:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006","html"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xdnrb-rrc91",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-10.html",
      "title": "Programming in the Life Sciences #10: JavaScript Object Notation (JSON)",
      "content_html": "<p>As said, <a href=\"https://en.wikipedia.org/wiki/JSON\">JSON</a> is the format we will use as serialization format\nfor answers given by the <a href=\"https://dev.openphacts.org/docs\">Open PHACTS LDA</a>. The API actually supports\nXML, RDF, HTML, and TSV too, but I think JSON is a good balance between expressiveness and compactness.\nMoreover, and perhaps a much better argument, JSON works very well in a JavaScript environment: it is\nvery easy to convert the serialization into a data model:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">jsonData</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">parse</span><span class=\"p\">(</span><span class=\"nx\">jsonString</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>Now, we previously covered maps. Maps have keys and values: the keys unlock a particular value.\nFor example, take this JavaScript:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">map</span> <span class=\"o\">=</span> <span class=\"p\">{</span> <span class=\"dl\">\"</span><span class=\"s2\">key</span><span class=\"dl\">\"</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">value</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">key2</span><span class=\"dl\">\"</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">value2</span><span class=\"dl\">\"</span> <span class=\"p\">};</span>\n</code></pre></div></div>\n\n<p>We define here a key-value object, and we can access the two values with the two keys:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre 
class=\"highlight\"><code><span class=\"nx\">map</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">key2</span><span class=\"dl\">\"</span><span class=\"p\">];</span> <span class=\"c1\">// == value2</span>\n</code></pre></div></div>\n\n<p>These examples are JavaScript source code. Not a string. The content of the map variable is a data\nstructure. But when we communicate with a web service, we need a (string) serialization of the data\nmodel, because we cannot send around memory pointers (which a variable is) because they are only\nvalid on a single machine.</p>\n\n<p>This is where the JSON format comes in. We can convert the content of the above map variable into a\nstring representation with this code:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">mapStringified</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">stringify</span><span class=\"p\">(</span><span class=\"nx\">map</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>which gives us the following output:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"nl\">\"key\"</span><span class=\"p\">:</span><span class=\"s2\">\"value\"</span><span class=\"p\">,</span><span class=\"nl\">\"key2\"</span><span class=\"p\">:</span><span class=\"s2\">\"value2\"</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>This string looks an awful lot like the JavaScript code we wrote earlier.</p>\n\n<p>And, likewise we can convert the JSON string back into a JavaScript data model again, with:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">mapAgain</span> <span 
class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">parse</span><span class=\"p\">(</span><span class=\"nx\">mapStringified</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>Now, I did warn you earlier that values can be lists and maps itself again, so consider this\nJSON example from Wikipedia:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"id\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">1</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"name\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"Foo\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"price\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">123</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"tags\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\"> </span><span class=\"s2\">\"Bar\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\"Eek\"</span><span class=\"w\"> </span><span class=\"p\">],</span><span class=\"w\">\n    </span><span class=\"nl\">\"stock\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n        </span><span class=\"nl\">\"warehouse\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">300</span><span class=\"p\">,</span><span class=\"w\">\n        </span><span class=\"nl\">\"retail\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">20</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span 
class=\"w\">\n</span></code></pre></div></div>\n\n<p>Here we see that the value behind the stock key is another map, and the value behind the tags\nkey is a list. This creates quite a flexible serialization format, which is happily used by\nOpen PHACTS. (And for the semantic web readers, yes, we can make JSON more semantic. The Open\nPHACTS LDA supports an “rdfjson” format.)</p>",
      "summary": "As said, JSON is the format we will use as serialization format for answers given by the Open PHACTS LDA. The API actually supports XML, RDF, HTML, and TSV too, but I think JSON is a good balance between expressiveness and compactness. Moreover, and perhaps a much better argument, JSON works very well in a JavaScript environment: it is very easy to convert the serialization into a data model:",
      
      "date_published": "2013-10-30T00:10:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006","json"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7skws-4f170",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-9-apis-and.html",
      "title": "Programming in the Life Sciences #9: APIs and Web Services",
      "content_html": "<p>Continuing on the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/exercise-what-variable-type-would-you.html\">theory</a>\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-8-coding.html\">covered</a> in this course,\nthis part will talk about <a href=\"https://en.wikipedia.org/wiki/Application_programming_interface\">application programming interfaces</a>\n(APIs) and <a href=\"https://en.wikipedia.org/wiki/Web_service\">web services</a>.</p>\n\n<h2 id=\"application-programming-interfaces\">Application Programming Interfaces</h2>\n\n<p>APIs define how programs can be used by other programs. An API defines how methods are called and what feedback\nyou can expect. It basically is the combination of documentation and the program itself. But, unlike just any piece\nof software, an API is aimed at outside users, rather than at use within the same program. The API is how you communicate\nbetween programs.</p>\n\n<p>Now, in this course we will see two key types of APIs. The first are the APIs provided by the libraries that we\nuse. For example, we already indicated that we will be using at least the following two libraries,\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-8-coding.html\">ops.js and d3.js</a>.\nThese libraries are a collection of functional bits (e.g. classes and methods). For example, ops.js\ndefines an API which closely wraps the <a href=\"https://dev.openphacts.org/docs\">Open PHACTS Linked Data API</a>\n(LDA) itself. The API requires us to do a few things: 1. create a wrapper for the LDA; 2. define a\ncallback function; 3. 
invoke the actual call.</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">searcher</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nx\">Openphacts</span><span class=\"p\">.</span><span class=\"nc\">ConceptWikiSearch</span><span class=\"p\">(</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">https://beta.openphacts.org</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"nx\">appID</span><span class=\"p\">,</span> <span class=\"nx\">appKey</span>\n<span class=\"p\">);</span>  \n<span class=\"kd\">var</span> <span class=\"nx\">callback</span><span class=\"o\">=</span><span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">success</span><span class=\"p\">,</span> <span class=\"nx\">status</span><span class=\"p\">,</span> <span class=\"nx\">response</span><span class=\"p\">){</span>  \n    <span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">parseResponse</span><span class=\"p\">(</span><span class=\"nx\">response</span><span class=\"p\">);</span>\n<span class=\"p\">};</span>  \n<span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">findCompounds</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">Aspirin</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">20</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">4</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"nx\">callback</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<h2 id=\"web-services\">Web Services</h2>\n\n<p>Web services are a special kind of APIs: they expose an API over the web. 
That imposes some features on\nthese APIs: first, they are based on a web transport layer, commonly\n<a href=\"https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol\">HTTP</a>, but\n<a href=\"https://en.wikipedia.org/wiki/Xmpp\">XMPP</a> is possible too. HTTP is used by your web browser too. Secondly,\nthe web server needs a common communication language to serialize the method call. Here, two key standards\nare used, <a href=\"https://en.wikipedia.org/wiki/XML\">XML</a> and <a href=\"https://en.wikipedia.org/wiki/JSON\">JSON</a>.\nWe will cover these in more detail later. For now, it suffices to think of these as\nenvelopes in which our message is sent. Now, another standardized aspect is how to call the web services.\nFor that, <a href=\"https://en.wikipedia.org/wiki/SOAP\">SOAP</a> and <a href=\"https://en.wikipedia.org/wiki/REST\">REST</a> are\nthe most important standards for the life sciences (though I still think\n<a href=\"http://www.biomedcentral.com/1471-2105/10/279\">Wagener’s XMPP approach</a> is\nworth checking out!). SOAP and REST use XML and JSON as underlying serialization formats.</p>\n\n<p>So, web services are theoretically complex. For this course, most of it is hidden by the client library that will take care of the HTTP and SOAP/REST layers. Students who wish to use Java instead of JavaScript will face the problem that they first need to find a Java client library for the LDA. There is this library, but that needs exploring for use with the latest Open PHACTS LDA. Higher stakes, higher rewards.</p>\n\n<h2 id=\"take-home-message\">Take home message</h2>\n\n<p>Practically, you do not need to know much of the technologies behind web services, just like you do not need to know the machine instructions CPUs follow to run your program. But, it is important to have seen these terms. 
You will run into them, and need enough context to know where and how to find answers to the questions that you will have.</p>\n\n<p>There is one exception: JavaScript Object Notation, JSON. That is the format in which the data is returned by the service, and you will have to handle that. JSON will be the topic of the next post.</p>",
      "summary": "Continuing on the theory covered in this course, this part will talk about application programming interfaces (APIs) and web services.",
      
      "date_published": "2013-10-30T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["pra3006"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-10-279", "doi": "10.1186/1471-2105-10-279"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ckryb-b4v19",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/29/programming-in-life-sciences-8-coding.html",
      "title": "Programming in the Life Sciences #8: coding standards",
      "content_html": "<p>Never underestimate the power of lack of coding standards in code obfuscation. Just try randomly to\nread code you wrote a year ago or four years ago. You’ll be surprised with what you find. Coding\nstandards are like the grammar in writing: they ensure that our message gets understood. Of course,\nthe primary goal is that the CPU understands what you mean, but because programming languages are\nnot your native language, you may not always say what you think you are saying.</p>\n\n<h2 id=\"copyright-and-licensing\">Copyright and Licensing</h2>\n\n<p>The first standard is attribution: if you use the solution of someone else, you write in your source\ncode who wrote the solution. Secondly, you must allow others to do the same. Therefore, you always\nadd your name (and normally email address) to your source code, and under what conditions people\nmay use your code. This is commonly done by assigning a license. Open Source licenses promote\n(scientific) collaboration, and give others the rights to use your solution, redistribute\nmodifications, etc. They may explicitly require attributions, but often not. In a scholarly setting,\nyou always give attribution, even if not required by the license. Remember that software falls\nunder copyright but algorithms typically do not. Copyright/author and license information is typically\nadded to source code using a <a href=\"http://chem-bla-ics.blogspot.nl/2009/06/making-patches-attribution-copyright.html\">header</a>.</p>\n\n<h2 id=\"documentation\">Documentation</h2>\n\n<p>The second thing is to document what your code is supposed to do, what assumptions are made,\nhow people should use it, and preferably under what conditions it will fail. Comments in your\nsource are just as much documentation as a tutorial in Word format. They are complementary, and\ndocumentation must not only be targeted at users, but also at yourself so that you understand\nwhy you added that weird check. 
You will not (have to) remember in two years.</p>\n\n<h2 id=\"coding-standards\">Coding standards</h2>\n\n<p>Just like English has coding standards, programming languages have them too. Both also have styles,\nand the selection of a style is up to the author, but consistency is important. Coding\nstandards you should be thinking about include consistent use of variable and method names,\nkeeping code blocks small, etc. For example, compare the following two code examples which do\nthe same thing:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">method</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">string</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"kd\">var</span> <span class=\"nx\">number</span> <span class=\"o\">=</span> <span class=\"mi\">0</span>\n  <span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"nx\">string</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">string</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">]</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">A</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"nx\">number</span> <span class=\"o\">=</span> <span class=\"nx\">number</span> <span class=\"o\">+</span><span class=\"mi\">1</span> \n  <span class=\"p\">}</span>\n  <span class=\"k\">return</span> <span class=\"nx\">number</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>And this version:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">countTheANucleotides</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">dnaSequence</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"kd\">var</span> <span class=\"nx\">count</span> <span class=\"o\">=</span> <span class=\"mi\">0</span>\n  <span class=\"c1\">// iterate over all nucleotides in the DNA string</span>\n  <span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"nx\">dnaSequence</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">dnaSequence</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">]</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">A</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"nx\">count</span> <span class=\"o\">=</span> <span class=\"nx\">count</span> <span class=\"o\">+</span><span class=\"mi\">1</span> \n  <span class=\"p\">}</span>\n  <span class=\"k\">return</span> <span class=\"nx\">count</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>Which one do you find easier to understand the function of?</p>\n\n<ol>\n  <li>use clear, descriptive variable and method names</li>\n  <li>use source code comments to describe the intention of source code</li>\n  <li>keep source code lines short enough that you can read the full line 
without (horizontal) scrolling</li>\n  <li>keep code blocks short enough that they fit a single screen (say, 25 lines max)</li>\n</ol>\n\n<h2 id=\"unit-testing\">Unit testing</h2>\n\n<p>It is important to realize that what you intend to have the computer calculate is\nsomething different than what your source code actually tells the computer to do. Even\nmore important is to realize that it is not always your fault if the calculation goes\nwrong; in particular, the input you pass to some program can always be crafted such\nthat it will fool your code into doing unintended things.</p>\n\n<p>But, a common cause of misbehaving code is the author. At first (and many, many\ntimes after that) it’s just getting the code to compile: missing semi-colons, typos in\nvariable names, etc, etc. After a bit, and hunting you down to your grave, are bugs\ncaused by unintuitive features of programming languages, libraries you’re using, etc.\nCommon (and often expensive) mistakes include for-loops missing the first or the last\nelement, incorrect conversion of units (<a href=\"https://en.wikipedia.org/wiki/Mars_Climate_Orbiter\">125 M$ expensive!</a>),\netc.</p>\n\n<p>Fortunately, we can call in the help of computers for this too. We have code checking\ntools, and importantly, libraries to help us define (unit) tests. These tests call\nrunning code, and check if the calculated results match our expectations. For\nexample, for JavaScript we could use the <a href=\"https://github.com/jquery/qunit/blob/master/MIT-LICENSE.txt\">MIT-licensed</a>\nqunit. 
For example, we could write the following tests (in qunit):</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nf\">test</span><span class=\"p\">(</span> <span class=\"dl\">\"</span><span class=\"s2\">counting tests</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"kd\">function</span><span class=\"p\">()</span> <span class=\"p\">{</span>\n  <span class=\"nf\">equal</span><span class=\"p\">(</span><span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"nf\">countTheANucleotides</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">AGCT</span><span class=\"dl\">\"</span><span class=\"p\">));</span>\n  <span class=\"nf\">equal</span><span class=\"p\">(</span><span class=\"mi\">4</span><span class=\"p\">,</span> <span class=\"nf\">countTheANucleotides</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">AAAA</span><span class=\"dl\">\"</span><span class=\"p\">));</span>\n  <span class=\"nf\">equal</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"nf\">countTheANucleotides</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">GCGC</span><span class=\"dl\">\"</span><span class=\"p\">));</span>\n<span class=\"p\">});</span>\n</code></pre></div></div>\n\n<p>OK, you get the idea. That other scientists really start to care about these things,\nis shown by these two recent papers:</p>\n\n<ul>\n  <li><a href=\"http://dx.doi.org/10.1371/journal.pcbi.1002802\">Ten simple rules for the open development of scientific software</a></li>\n  <li><a href=\"http://dx.doi.org/10.1371/journal.pcbi.1003285\">Ten simple rules for reproducible computational research</a></li>\n</ul>",
      "summary": "Never underestimate the power of lack of coding standards in code obfuscation. Just try randomly to read code you wrote a year ago or four years ago. You’ll be surprised with what you find. Coding standards are like the grammar in writing: they ensure that our message gets understood. Of course, the primary goal is that the CPU understands what you mean, but because programming languages are not your native language, you may not always say what you think you are saying.",
      
      "date_published": "2013-10-29T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["pra3006"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1002802", "doi": "10.1371/journal.pcbi.1002802"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.1003285", "doi": "10.1371/journal.pcbi.1003285"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7yb0k-mwq04",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/23/exercise-what-variable-type-would-you.html",
      "title": "Programming in the Life Sciences #7: theory",
      "content_html": "<p>No course without some good theory. In <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-1-six-day.html\">this six-day course</a>,\nI plan to cover this computing theory. It’s very practice oriented:</p>\n\n<p><img src=\"/assets/images/theorySlide.png\" alt=\"\" /></p>\n\n<p>That should give them enough of a head start to work on something <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-5.html\">like this</a>.\nThe material will be more extensive, but I’ll give myself a head start, with some initial content.</p>\n\n<h2 id=\"introduction\">Introduction</h2>\n<p>Programming in the Life Sciences is done to solve problems in the life sciences, but only\nproblems that can be solved with pen and paper too. Programming cannot measure metabolites\nin a cell. For that, you need equipment that passes what it measured as input data to\nthe computer.</p>\n\n<p>Instead, the program defines some computation that is done on the computer. For example, noise\nreduction, DNA/RNA/protein sequence alignment, metabolite identification, etc. But all\ncomputation starts with input data.</p>\n\n<p>The program tells the computer what it should do, step by step. Get the data from the LC/MS; find\npeaks; group peaks at the same retention time; match that against a metabolite spectral database;\ndetermine the best match; report the best three matches to the user via the screen. Step by step.</p>\n\n<p>The computer consists of input/output devices (to get data; to present results), various kinds of\nmemory (to remember things), and a central processing unit (CPU) that performs the computation\nsteps.</p>\n\n<p>Considering all this, programming is to define what the computer should do, in a (programming)\nlanguage that the computer understands. Note that I say “the computer understands” rather than\n“the CPU understands”. The CPU only speaks one language (machine instructions). 
But we use a\nhigher level programming language, which is much more compact and easier to read/understand. A\ncompiler translates this higher level language into machine instructions (sometimes several\ncompilers are involved).</p>\n\n<h2 id=\"data-types\">Data Types</h2>\n<p>The programming language says do this, do that. It does not know about data. Fortunately, it\nknows about bits, and bits we can use to store data. That way, we can instruct the CPU to do\nthings like: OK, take the measured LC/MS data, take the MS at retention time 5, then start with\nthe first m/z value, and if that is larger than 10, then… etc. We do not want to hard code the\ndata in our program, so we instruct the CPU to remember it. The computer has various levels of\nmemory that are relevant (ignoring those at a CPU level!): variables stored in the working\nmemory, and data stored on external memory (hard disk, USB disk, LC/MS machine).</p>\n\n<p><em>Exercise: write a program that counts the sum of all numbers starting with 1 up to 50 without\nusing variables.</em></p>\n\n<p>Some programming languages have variable types. This variable is a non-integer number, this\nvariable is a text string. This ensures that you cannot sum up “cat” with 5.3. This is called\nvariable typing. Some programming languages have hard typing (types are defined in the source\ncode), while others have dynamic typing (the program figures it out when it is compiling), and\nsome even no typing at all (the computer will complain when it runs).</p>\n\n<p>Example basic variable types include: strings, integers, floats, and booleans. Strings can be used\nto remember names; integers are needed for counts and iterations (how many m/z values did I\nalready look at again??), and floats are needed for pretty much all scientific data. A boolean is\na yes/no type, or true/false.</p>\n\n<p>Also, variables do not have units. Remember those high school days? <em>“John, six WHAT??”</em>, <em>“Umm,\nsix mole, sir.”</em> Variables do not have units. 
Thus while you cannot calculate the sum of “cat”\nand 5.3, a computer has no problem calculating the sum of six mole and three days.</p>\n\n<h2 id=\"complex-types\">Complex Types</h2>\n\n<p><em>Exercise: What variable type would you use for that photo you took last week of that western blot?</em></p>\n\n<p>It is clear that these basic types don’t suffice. This touches on the topic of computer\nrepresentation. How does a computer keep a western blot in memory? That photo you took with\nyour Android digitized the western blot into a matrix of numbers: if it was a greyscale photo,\nthen a single integer per position.</p>\n\n<p>Programming languages have various complex types, and most even support the definition of\nmore complicated data structures. But the more basic complex types first: the list. A list, vector,\nor array all refer to the same concept: a list of variables, typically of the same type. For\nexample, a mathematical vector is a list of floats (e.g. <code class=\"language-plaintext highlighter-rouge\">float[]</code> in Java, where the\n<code class=\"language-plaintext highlighter-rouge\">[]</code> refers to the list or array nature). A string, actually, which we marked as a “basic”\nvariable type, is really a complex one too: it is a list of characters. That is, the string “cat”\nis a list of three characters. Importantly, each item in the list has an index, and the full list\nhas a length. Depending on the programming language, the first item in the list has index 0 or 1.</p>\n\n<p>As said, a list typically contains variables of the same type, just because it is easier to work\nwith. But the list can contain complex types too. For example, we can create a list of lists (we\nwould write <code class=\"language-plaintext highlighter-rouge\">float[][]</code>). Each element in the top list is a list again; that is, the first\nelement of the outer list is again a list. 
This matches very closely the mathematical matrix.</p>\n\n<p>A second complex type important in this course is the map. A map is basically a list of key-value\npairs, where the keys take the role of the index in lists. Instead of asking for the list item\nwith index 7, we ask for the value behind a certain key. And, like we could make a list of lists,\nwe can also make a map of maps, etc. Keep this in mind! We will use this extensively in this\ncourse.</p>\n\n<h2 id=\"automation\">Automation</h2>\n<p>Now that we know how the CPU uses memory, we turn back to what the processor must do, according\nto our program. First, I mentioned the step by step at the start. This is critical: the processor\nhas a linear progression through the steps it must do. It can only go forward, and only step by\nstep. It cannot go back. Yet, going back is exactly what we appear to write in a for-loop, like in this four-line\nJavaScript example:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"mi\">50</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">)</span> <span class=\"p\">{</span> \n  <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>This code defines the variable sum in the first line, and then starts counting, from 1 up to (but not including) 50, one\nby 
one, and adding that number to the sum. This loop is only for our convenience. This is how the\ncomputer will run this program (and at a CPU machine instruction level it’s even longer):</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n<span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span 
class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"c1\">// ...</span>\n</code></pre></div></div>\n\n<p>OK, I won’t give the full sequence of steps the computer takes. I guess you can see the virtues\nof higher level programming languages :) Importantly, it is a linear list of steps it takes.</p>\n\n<p>Another important control structures in programming languages is the if-statement. This gives us\nthe power of making decisions. 
For example, we can skip the 7 in the above summation:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"mi\">50</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">i</span> <span class=\"o\">==</span> <span class=\"mi\">7</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n     <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>But I have not yet discussed another important concept: the operator. The operator tells the\ncomputer what operation to perform, and how. This last source code example uses various operators: <code class=\"language-plaintext highlighter-rouge\">=</code>, <code class=\"language-plaintext highlighter-rouge\">&lt;</code>, <code class=\"language-plaintext highlighter-rouge\">+</code>, and <code class=\"language-plaintext highlighter-rouge\">==</code>. The first is an assignment operator: it assigns the value ‘0’\nto the variable sum. This operation does not return anything. 
The <code class=\"language-plaintext highlighter-rouge\">&lt;</code> operator compares two\nvariable values, or a variable value with a specific value. For example, the above code compares\nthe value behind the ‘i’ variable with 50; indeed, it does not compare 50 with “i”, which is the\nvariable name. The + operator follows the mathematical + operator for floats and integers; for\nstrings the + operator performs a concatenation: <code class=\"language-plaintext highlighter-rouge\">\"cat\" + \"fish\"</code> is not one less fish, but a\n<code class=\"language-plaintext highlighter-rouge\">\"catfish\"</code>. Note that these two operators, &lt; and +, return a new value. The <code class=\"language-plaintext highlighter-rouge\">&lt;</code> returns a\nboolean (yes, it’s smaller; no, it’s not smaller); the <code class=\"language-plaintext highlighter-rouge\">+</code> returns an integer if it was summing\nintegers, or a string when it concatenated two strings. The <code class=\"language-plaintext highlighter-rouge\">==</code> operator also returns a boolean:\ntrue if the two variables are the same (in general). During the course, we will see several more\noperators. Look out for them!</p>\n\n<p>In some way, this brings us to the next topic: functions and parameters. 
An operator is a special\nkind of function, and that will become more clear if I give an example function:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">function</span> <span class=\"nf\">add</span><span class=\"p\">(</span><span class=\"nx\">first</span><span class=\"p\">,</span> <span class=\"nx\">second</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"kd\">var</span> <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">first</span> <span class=\"o\">+</span> <span class=\"nx\">second</span><span class=\"p\">;</span>\n  <span class=\"k\">return</span> <span class=\"nx\">sum</span><span class=\"p\">;</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>Effectively, we just made an alias function “add” which internally just uses the + operator, with\nthe exact same outcome.</p>\n\n<p><em>Exercise: what would be returned by these two function calls? 1. add(1,2); 2. add(“cat”,\n“fish”);</em></p>\n\n<p>This function example is not so interesting, and only makes the code harder to read. However,\nwhen the “body” of the function becomes larger, it allows you to easily replace a complex list\nof steps with one function call. Consider: <code class=\"language-plaintext highlighter-rouge\">sumAllNumbers(1,50)</code>.</p>\n\n<p>Now, if we collect many such functions, pretty much like books, we get a library. So, that one\nwas easy.</p>\n\n<p>That includes this episode of the <a href=\"http://chem-bla-ics.blogspot.nl/search/label/%23mscpils\">Programming in the Life Sciences</a>\nseries. I will continue later with the theory about Web Services and Clients, Serialization\nformats, and Other.</p>",
      "summary": "No course, with some good theory. In this six-day course, I plan to cover this computing theory. It’s very practice oriented:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/theorySlide.png",
      "date_published": "2013-10-23T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hrzhb-m7g26",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/12/programming-in-life-sciences-6-functions.html",
      "title": "Programming in the Life Sciences #6: functions",
      "content_html": "<p>One key feature of programming languages is the following: first, there is linearity. This is an important point\nthat is not always clear to students who just start to program. In fact, ask yourself what the algorithm is for\ncounting the chairs in the room where you are now sitting. Could a computer do that in the same way? How should\nyour algorithm change? A key point is that the program is run step by step, in a linear way.</p>\n\n<p>However, we very easily jump to functions. In fact, we use so many libraries nowadays, this linearity is not so\nclear anymore. Things just happen with magic library calls. But at the same time, the library calls make our life\na lot easier: by using functions, we group functionality in easy to read and easier to understand blobs.</p>\n\n<p>OK, the previous example showed that we could use the HTML <code class=\"language-plaintext highlighter-rouge\">@onClick</code> attribute to provide further detail.\nBut I did not show how. 
This is how:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">Name: &lt;span onClick=</span><span class=\"se\">\\\"</span><span class=\"s2\">showDetails('</span><span class=\"dl\">\"</span> <span class=\"o\">+</span>\n  <span class=\"nf\">escape</span><span class=\"p\">(</span><span class=\"nx\">dataJSON</span><span class=\"p\">)</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"se\">\\</span><span class=\"s2\">')</span><span class=\"se\">\\\"</span><span class=\"s2\">&gt;</span><span class=\"dl\">\"</span> <span class=\"o\">+</span> \n  <span class=\"nx\">response</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/span&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n</code></pre></div></div>\n\n<p>This code adds the <code class=\"language-plaintext highlighter-rouge\">@onClick</code> attribute and a function call to the <code class=\"language-plaintext highlighter-rouge\">showDetails()</code> method which takes one parameter,\nwhere we pass escaped JSON. That is non-trivial, I understand, and may be due to my limited knowledge of JavaScript.\nThe escaping of the JSON is needed to make quotes match in the generated HTML. In the function later, we can unescape\nit and get the original JSON again. Importantly, the dataJSON data contains all the details I like to show.</p>\n\n<p>Now, these functions need to be defined. Yes, plural, because two functions are used in this code snippet: <code class=\"language-plaintext highlighter-rouge\">showDetails()</code>\nand <code class=\"language-plaintext highlighter-rouge\">escape()</code>. The latter is a built-in JavaScript function. 
The <code class=\"language-plaintext highlighter-rouge\">showDetails()</code> function, however, I made up.\nSo, I had to define it elsewhere in the HTML document, and it looks like:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">showDetails</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">dataJSON</span><span class=\"p\">){</span>\n  <span class=\"nx\">data</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">parse</span><span class=\"p\">(</span><span class=\"nf\">unescape</span><span class=\"p\">(</span><span class=\"nx\">dataJSON</span><span class=\"p\">));</span>\n  <span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">details</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span>\n    <span class=\"nx\">data</span><span class=\"p\">.</span><span class=\"nx\">_about</span><span class=\"p\">;</span>\n<span class=\"p\">};</span>\n</code></pre></div></div>\n\n<p>This example actually gives the exact same output as the code in the previous post, but with one major difference.\nWe now can extend the function as much as we like, but the code to output the list of found compounds does not have\nto get more complex than it already is.</p>",
      "summary": "One key feature of programming languages is the following: first, there is linearity. This is an important point that is not always clear to students who just start to program. In fact, ask yourself what the algorithm is for counting the chairs in the room where you are now sitting. Could a computer do that in the same way? How should your algorithm change? A key point is, is that the program is run step by step, in a linear way.",
      
      "date_published": "2013-10-12T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006","javascript","html"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7yj2v-4sz07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-5.html",
      "title": "Programming in the Life Sciences #5: converting the results into HTML",
      "content_html": "<p>Now that we have <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-4.html\">the communication working</a>\nwith the Open PHACTS LDA, it is time to make a nice GUI. I will not go into details, but we can use basic JavaScript to\niterate over the JSON results, and, for example, create a HTML table:</p>\n\n<p><img src=\"/assets/images/mscpils1_output.png\" alt=\"\" /></p>\n\n<p>In fact, I hooked in some HTML <code class=\"language-plaintext highlighter-rouge\">onClick()</code> functionality so that when you click one of the compound names, you get further\ndetails (under <em>Compound Details</em>), though that only outputs the ConceptWiki URI at this moment. A simple for-loop does\nthe heavy work:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">html</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;table&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"nx\">response</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;tr&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;td&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nx\">dataJSON</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span 
class=\"p\">.</span><span class=\"nf\">stringify</span><span class=\"p\">(</span><span class=\"nx\">response</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">]);</span>\n  <span class=\"c1\">//   dataJSON.replace(/\"/g, \"'\");</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">Name: &lt;span&gt;</span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">response</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/span&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/td&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/tr&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"p\">}</span>\n<span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/table&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">table</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span> <span class=\"nx\">html</span><span class=\"p\">;</span>\n</code></pre></div></div>\n\n<p>So, we’re set to teach the students all the basics of programming: loops, variables, functions, etc.</p>",
      "summary": "Now that we have the communication working with the Open PHACTS LDA, it is time to make a nice GUI. I will not go into details, but we can use basic JavaScript to iterate over the JSON results, and, for example, create a HTML table:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/mscpils1_output.png",
      "date_published": "2013-10-09T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006","html","javascript"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7ex09-4x603",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-4.html",
      "title": "Programming in the Life Sciences #4: communication from within HTML",
      "content_html": "<p>The purpose of a web service is that you give it a question or task, and that it returns an answer. For example, we can ask the\n<a href=\"http://www.openphacts.org/\">Open PHACTS</a> platform what compounds it knows with aspirin in the name. We pass the question (with the\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-2-accounts.html\">API key</a>) and get a list of matching compounds.\nNow, this communication is complex: it happens at many levels, which are spelled out in the\n<a href=\"https://en.wikipedia.org/wiki/Internet_model\">Internet Model</a>. There are various variants of the stack of communication layers,\nbut we are interested mostly in the top layers, at the <em>application layer</em>. In fact, for this course this model only serves as\nsupporting information for those who want to learn more.</p>\n\n<p>Practically, what matters here is how to ask the question and how to understand the answer.</p>\n\n<p>We are supported in these practicalities with JavaScript libraries, in particular the <a href=\"https://github.com/openphacts/ops.js\">ops.js</a>\nlibrary and general <a href=\"https://en.wikipedia.org/wiki/JSON\">JSON</a> functionality provided by most browsers (unless the student decided to use\na <em>different</em> programming language, in which there are different libraries). Personally, I have only very limited JavaScript experience,\nand this mostly goes back to the good old <a href=\"http://www.biomedcentral.com/1471-2105/8/487\">Userscript and Greasemonkey days</a> (wow! the\npaper is actually the <a href=\"http://www.altmetric.com/details.php?citation_id=103983\">4th highest scoring BMC Bioinformatics article!</a>).\nBut because my JavaScript knowledge is limited and rusty, I spent a good part of today, to get a basic example running. Very basic,\nand barely exceeding the communication details. 
That is, this is the output in the browser:</p>\n\n<p><img src=\"/assets/images/mcspils_jsonOutput.png\" alt=\"\" /></p>\n\n<p>So, what does the question look like? The question is actually hardcoded in the HTML source, but the page does take two parameters:\nthe <code class=\"language-plaintext highlighter-rouge\">app_key</code> and <code class=\"language-plaintext highlighter-rouge\">app_id</code> that come <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-2-accounts.html\">with your Open PHACTS account</a>.</p>\n\n<p>The ops.js library helps us, and wraps the Open PHACTS LDA methods in JavaScript methods. Thus, rather than crafting special HTTP calls,\nwe use two JavaScript calls:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">searcher</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nx\">Openphacts</span><span class=\"p\">.</span><span class=\"nc\">ConceptWikiSearch</span><span class=\"p\">(</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">https://beta.openphacts.org</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">app_id</span><span class=\"dl\">\"</span><span class=\"p\">],</span> <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">app_key</span><span class=\"dl\">\"</span><span class=\"p\">]</span>\n<span class=\"p\">);</span>\n<span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">byTag</span><span class=\"p\">(</span>\n  <span class=\"dl\">'</span><span class=\"s1\">Aspirin</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">20</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span 
class=\"s1\">4</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">07a84994-e464-4bbf-812a-a4b96fa3d197</span><span class=\"dl\">'</span><span class=\"p\">,</span>\n  <span class=\"nx\">callback</span>\n<span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>The first statement creates an LDA method object, while the second makes an actual question. I have not defined the callback variable,\nwhich actually is a JavaScript function that looks like:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">callback</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">success</span><span class=\"p\">,</span> <span class=\"nx\">status</span><span class=\"p\">,</span> <span class=\"nx\">response</span><span class=\"p\">){</span>\n  <span class=\"kd\">var</span> <span class=\"nx\">result</span> <span class=\"o\">=</span> <span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">parseResponse</span><span class=\"p\">(</span><span class=\"nx\">response</span><span class=\"p\">);</span>\n  <span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">output</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span>\n    <span class=\"dl\">\"</span><span class=\"s2\">Results: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">stringify</span><span class=\"p\">(</span><span class=\"nx\">result</span><span class=\"p\">);</span>\n<span class=\"p\">};</span>\n</code></pre></div></div>\n\n<p>When the LDA web service returns data, this method gets called, providing asynchronous 
functionality to keep the web page responsive.\nWhen called, it first parses the returned data, and then puts the JSON output as text in the HTML. That is the output shown in\nthe earlier screenshot.</p>\n\n<p>So, hurdle taken. From here on it’s easier. Regular looping over the results, creating some HTML output, etc. The\n<a href=\"https://gist.github.com/egonw/6902776\">full source code</a> of this example is available as a Gist.</p>",
      "summary": "The purpose of a web service is that you give it a question or task, and that it returns an answer. For example, we can ask the Open PHACTS platform what compounds it knows with aspirin in the name. We pass the question (with the API key) and get a list of matching compounds. Now, this communication is complex: it happens at many levels, which are spelled out in the Internet Model. There are various variants of the stack of communication layers, but we are interested mostly in the top layers, at the application layer. In fact, for this course this model only serves as supporting information for those who want to learn more.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/mcspils_jsonOutput.png",
      "date_published": "2013-10-09T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["pra3006","javascript","html","openphacts"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-487", "doi": "10.1186/1471-2105-8-487"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yvptc-3vm13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-3.html",
      "title": "Programming in the Life Sciences #3: the assessment",
      "content_html": "<p>Now that I have wrote out <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-1-six-day.html\">the goals</a>,\nwhat they students will practically do, and how to <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-2-accounts.html\">get started</a>\nwith the <a href=\"http://openphacts.org/\">Open PHACTS</a> platform, I will list how we will assess the students:</p>\n\n<ol>\n  <li>a presentation on the second day, outlining the project and work plan,</li>\n  <li>working source code at the end of the cour\nse,</li>\n  <li>a final presentation, showing the results and conclusions.</li>\n</ol>\n\n<p>Primarily, they will be judged on their acquired programming skills. Working code is the minimum; but code quality will be taken\ninto account too. I will show them how blogging works as a pre-print server for presentations. I hope it will also learn them\nwhat role this has in scientific communication.</p>",
      "summary": "Now that I have wrote out the goals, what they students will practically do, and how to get started with the Open PHACTS platform, I will list how we will assess the students:",
      
      "date_published": "2013-10-09T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hb7ye-mzp21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/08/programming-in-life-sciences-2-accounts.html",
      "title": "Programming in the Life Sciences #2: accounts and API keys",
      "content_html": "<p>I have outlined the scope of the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-1-six-day.html\">six-day course</a>:\nthe students will learn to program while hacking on the <a href=\"https://dev.openphacts.org/docs\">Open PHACTS’ Linked Data API</a> (LDA). The first\nstep is to get an account for the LDA. I have already done that to save time. But these are the steps to take. You go to\n<a href=\"https://dev.openphacts.org/signup\">https://dev.openphacts.org/signup</a>:</p>\n\n<p><img src=\"/assets/images/gscholar1.png\" alt=\"\" /></p>\n\n<p>You then approve the account via your email account and you are set. The account is needed to get an API key. Using this key,\nOpen PHACTS developers can contact you if your scripts go berserk  So, you are kindly invited to make crazy hypotheses and hack the\nhell out of the platform. That’s what I hope my students will do.</p>\n\n<p>To try your new key, go to the documentation page, and open, for example, the <em>SMILES to URL</em> method:</p>\n\n<p><img src=\"/assets/images/mscpils.png\" alt=\"\" /></p>\n\n<p>Here you can see what parameters this LDA method has. We focus now on the <code class=\"language-plaintext highlighter-rouge\">app_id</code> and <code class=\"language-plaintext highlighter-rouge\">app_key</code> fields. Each account comes by default\nwith a, um, default <code class=\"language-plaintext highlighter-rouge\">app_id</code> and default <code class=\"language-plaintext highlighter-rouge\">app_key</code>. 
Just click on the field and select them:</p>\n\n<p><img src=\"/assets/images/mscpils1.png\" alt=\"\" /></p>\n\n<p>Select the defaults and enter a SMILES (try: <a href=\"https://apps.ideaconsult.net:8080/ambit2/depict?search=CC(=O)NC1=CC=C(C=C1)O\">CC(=O)NC1=CC=C(C=C1)O</a>).\nYou can select the format you like (I like Turtle) and you get Linked Data back on this <a href=\"https://rdf.chemspider.com/1906\">compound</a>.</p>\n\n<p>Now, go explore the LDA methods.</p>",
      "summary": "I have outlined the scope of the six-day course: the students will learn to program while hacking on the Open PHACTS’ Linked Data API (LDA). The first step is to get an account for the LDA. I have already done that to save time. But these are the steps to take. You go to https://dev.openphacts.org/signup:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/gscholar1.png",
      "date_published": "2013-10-08T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pra3006","openphacts","javascript","rest"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/shsz6-67p30",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/05/programming-in-life-sciences-1-six-day.html",
      "title": "Programming in the Life Sciences #1: a six day course",
      "content_html": "<p>Our <a href=\"http://www.bigcat.unimaas.nl/\">department</a> will soon start the course Programming in the Life Sciences for a group of some\n10 students from the <a href=\"http://www.maastrichtuniversity.nl/web/Schools/MaastrichtScienceProgramme.htm\">Maastricht Science Programme</a>.\nThis is the first time we give this course, and over the next weeks I will be blogging about this course. First, some information.\nThese are the goals, to use programming to:</p>\n\n<ul>\n  <li>have the ability to recognize various classes of chemical entities in pharmacology and to understand the basic physical and chemical interactions.</li>\n  <li>be familiar with technologies for web services in the life sciences.</li>\n  <li>obtain experience in using such web services with a programming language.</li>\n  <li>be able to select web services for a particular pharmacological question.</li>\n  <li>have sufficient background for further, more advanced, bioinformatics data analyses.</li>\n</ul>\n\n<p>So, this course will be a mix of things. I will likely start with a lecture or too about scientific programming, such as the\nimportance of reproducibility, licensing, documentation, and (unit) testing. To achieve these learning goals we have set a\nproblem. The description is:</p>\n\n<blockquote>\n  <p>In the life sciences the interactions between chemical entities is of key interest. Not only do these play an important role\nin the regulation of gene expression, and therefore all cellular processes, they are also one of the primary approaches in\ndrug discovery. Pharmacology is the science studies the action of drugs, and for many common drugs, this is studying the\ninteraction of small organic molecules and protein targets.</p>\n\n  <p>And with the increasing information in the life sciences, automation becomes increasingly important. Big data and small data\nalike, provide challenges to integrate data from different experiments. 
The Open PHACTS platform provides web services to\nsupport pharmacological research and in this course you will learn how to use such web services from programming languages,\nallowing you to link data from such knowledge bases to other platforms, such as those for data analysis.</p>\n</blockquote>\n\n<p>So, it becomes pretty clear what the students will be doing. They only have six days, so it won’t be much. It’s just to teach\nthem the basic skills. The students are in their 3rd year at the university, and because of the nature of the programme they\nfollow, they have a mixed background in biology, mathematics, chemistry, and physics. So, I have good hope they will surprise me in\nwhat they will get done.</p>\n\n<p>Pharmacology is the basic topic: drug-protein interaction, but the students are free to select a research question. In fact,\nI will not care that much what they like to study, as long as they do it properly. They will start with\n<a href=\"https://dev.openphacts.org/docs\">Open PHACTS’ Linked Data API</a>, but here too, they are free to complement data from the\nOPS cache with additional information. I hope they do.</p>\n\n<p>Now, regarding the technology they will use. The default will be JavaScript, and in the next week I will hack up demo code\nshowing the integration of <a href=\"https://github.com/openphacts/ops.js\">ops.js</a> and <a href=\"http://d3js.org/\">d3.js</a>.\nLet’s see how hard it will be; it’s new to me too. But, if the students\nalready are familiar with another programming language and prefer to use that, I won’t stop them.</p>\n\n<p>(For the Dutch readers, would #mscpils be a good tag?)</p>",
      "summary": "Our department will soon start the course Programming in the Life Sciences for a group of some 10 students from the Maastricht Science Programme. This is the first time we give this course, and over the next weeks I will be blogging about this course. First, some information. These are the goals, to use programming to:",
      
      "date_published": "2013-10-05T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["pra3006","javascript","openphacts"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/h24n0-r8e92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked.html",
      "title": "New Paper: &quot;The ChEMBL database as linked open data&quot;",
      "content_html": "<script src=\"https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js\" type=\"text/javascript\"></script>\n\n<div class=\"altmetric-embed\" data-badge-details=\"right\" data-badge-type=\"donut\" data-doi=\"10.1186/1758-2946-5-23\" style=\"float: right;\"></div>\n\n<p><strong>Update</strong>: Mark wrote up a <a href=\"http://chembl.blogspot.co.uk/2013/05/chembl-chembl-rdf.html\">blog post</a> on the RDF that the ChEMBL team itself.</p>\n\n<p>Yesterday, the paper “The ChEMBL database as linked open data” (doi:<a href=\"https://doi.org/10.1186/1758-2946-5-23\">10.1186/1758-2946-5-23</a>) by\nAndra Waagmeester (<a href=\"https://twitter.com/andrawaag\">@andrawaag</a>), Ola Spjuth (<a href=\"https://twitter.com/ola_spjuth\">@ola_spjuth</a>), Peter Ansell\n(<a href=\"http://twitter.com/p_ansell\">@p_ansell</a>), Antony Williams (<a href=\"https://twitter.com/chemconnector\">@chemconnector</a>), Valery Tkachenko,\nJanna Hastings, Bin Chen (<a href=\"http://twitter.com/binchenindiana\">@binchenindiana</a>), David J Wild (<a href=\"http://twitter.com/davidjohnwild\">@davidjohnwild</a>),\nand me appeared in the OA <a href=\"http://en.wikipedia.org/wiki/Journal_of_Cheminformatics\">JChemInf</a> journal.</p>\n\n<p>I am also indebted to the <a href=\"https://www.ebi.ac.uk/chembl/\">ChEMBL</a> team (<a href=\"http://twitter.com/chembl\">@chembl</a>) for both providing such\nvaluable data under a liberal Open Access license and their critical reading of the manuscript! <strong>Additionally, I would like to stress\nthat the ChEMBL team will create their own RDF version of ChEMBL and that this paper is not describing the version they will release.</strong></p>\n\n<p>BTW, the <a href=\"https://github.com/egonw/chembl-rdf-paper/\">source of the paper</a> is available from GitHub. 
And the\n<a href=\"https://github.com/egonw/chembl.rdf\">(original) scripts to create RDF from the MySQL dump of ChEMBL</a> are also on GitHub.</p>\n\n<p><img src=\"https://media.springernature.com/lw685/springer-static/image/art%3A10.1186%2F1758-2946-5-23/MediaObjects/13321_2012_Article_469_Figa_HTML.gif\" alt=\"\" /></p>\n\n<p>This paper outlines the <a href=\"http://www.jcheminf.com/content/3/1/15\">RDF</a> as it has evolved from various earlier projects. The above\ndiagram visualizes the basic structure (red), various Linked Data resources linked to (blue) and illustrates how various ontologies are used,\nsuch as the <a href=\"http://www.plosone.org/article/info:doi/10.1371/journal.pone.0025513\">CHEMINF</a>, <a href=\"http://bibliontology.com/\">BIBO</a>,\nand <a href=\"http://www.jbiomedsem.com/content/1/S1/S6\">CiTO</a> ontologies.</p>\n\n<p>Additionally, various applications and links developed by the co-authors are described. For example, Peter worked on the use in\n<a href=\"http://bio2rdf.org/\">Bio2RDF</a> and Bin and David on <a href=\"http://cheminfov.informatics.indiana.edu:8080/\">Chem2Bio2RDF</a>. Andra developed\nan extension for his (#altmetric) <a href=\"http://citedin.org/\">CitedIn</a> resource, giving credit to a paper when data in it is extracted into\nChEMBL. Ola, Valery, and Antony developed a <a href=\"http://www.bioclipse.net/decision-support\">Bioclipse Decision Support</a> extension,\nwhich supports a nearest neighbor search in ChEMBL using <a href=\"http://chemspider.com/\">ChemSpider</a>. Of course, Ola also hosts\n<a href=\"http://rdf.farmbio.uu.se/chembl/snorql/\">the SPARQL end point</a> of which you can monitor the uptime at the also cool\n<a href=\"http://labs.mondeca.com/sparqlEndpointsStatus/details/farmbio-chembl.html\">mondeca.com service</a>:</p>\n\n<p><img src=\"/assets/images/mondecaUptime.png\" alt=\"\" /></p>\n\n<p>(Yes, I think I have all the cool buzzwords covered in this paper. 
Sadly, marketing is needed nowadays as a scientist. Where is the\ntime that you could rant on page after page in all your domain specific jargon, not having to worry if your reader would understand\nit immediately, or without a university degree…)</p>\n\n<p>What this paper does not describe, is all the things I did with ChEMBL-RDF in the <a href=\"http://www.openphacts.org/\">Open PHACTS</a> project\n(<a href=\"https://twitter.com/open_phacts\">@Open_PHACTS</a>), which includes the use of <a href=\"http://qudt.org/\">QUDT</a> and the\n<a href=\"https://github.com/egonw/jqudt\">jQUDT</a> library for unit normalization outlined in <a href=\"http://www.bigcat.unimaas.nl/~egonw/units/\">this document</a>\nand the use of VoID for link sets as described in <a href=\"http://www.openphacts.org/specs/2012/WD-datadesc-20121019/\">this document</a>.</p>",
      "summary": "",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/mondecaUptime.png",
      "date_published": "2013-05-09T00:00:00+00:00",
      "date_modified": "2024-08-08T00:00:00+00:00",
      "tags": ["chembl","rdf","cito","cheminf","ontology","chemspider","openphacts"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1758-2946-5-23", "doi": "10.1186/1758-2946-5-23"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1758-2946-3-15", "doi": "10.1186/1758-2946-3-15"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/JOURNAL.PONE.0025513", "doi": "10.1371/JOURNAL.PONE.0025513"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/2041-1480-1-S1-S6", "doi": "10.1186/2041-1480-1-S1-S6"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zmvjh-41p57",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/02/06/citeulike-adds-html-widget-to-embed.html",
      "title": "CiteULike adds a HTML widget to embed citations",
      "content_html": "<div class=\"cul_citation\" id=\"cul_citation_11962023\">\n<a href=\"https://web.archive.org/web/20170317105559/http://www.citeulike.org/user/egonw/article/11962023\"><img class=\"cul_citation_icon\" src=\"/assets/images/cul_icon.gif\" /></a>&nbsp;<span class=\"cul_citation_text\">Spjuth,&nbsp;O.; Carlsson,&nbsp;L.; Alvarsson,&nbsp;J.; Georgiev,&nbsp;V.; Willighagen,&nbsp;E.; Eklund,&nbsp;M.&nbsp;<i>Current Topics in Medicinal Chemistry</i>&nbsp;<b>2012,</b>&nbsp;<i>12,</i>&nbsp;1980-1986.</span><br />\n<br /></div>\n\n<p>Yeah, that looks like <a href=\"https://web.archive.org/web/20170424163633/https://citeulike.org/groupforum/2919/?highlight=40978#msg_40978\">what I asked for <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> :) Thanx, and well done!</p>",
      "summary": "&nbsp;Spjuth,&nbsp;O.; Carlsson,&nbsp;L.; Alvarsson,&nbsp;J.; Georgiev,&nbsp;V.; Willighagen,&nbsp;E.; Eklund,&nbsp;M.&nbsp;Current Topics in Medicinal Chemistry&nbsp;2012,&nbsp;12,&nbsp;1980-1986.",
      
      "date_published": "2013-02-06T00:00:00+00:00",
      "date_modified": "2025-06-22T00:00:00+00:00",
      "tags": ["citeulike"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.2174/156802612804910287", "doi": "10.2174/156802612804910287"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/97y2t-wnh91",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/04/10/emerging-practices-for-mapping-and.html",
      "title": "&quot;Emerging practices for mapping and linking life sciences data using RDF&quot;",
      "content_html": "<p>The “Emerging practices for mapping and linking life sciences data using RDF” (doi:<a href=\"https://doi.org/10.1016/j.websem.2012.02.003\">10.1016/j.websem.2012.02.003</a>)\nis now available online, where I contributed a section on the original workflow for creating <a href=\"https://www.ebi.ac.uk/chembldb/\">ChEMBL</a> triples,\nand contributed to the section about open licensing, referring to <a href=\"http://creativecommons.org/publicdomain/zero/1.0/\">CCZero</a> and the\n<a href=\"http://pantonprinciples.org/\">Panton Principles</a>. Happy reading!</p>\n\n<p>(Yes, it is indeed an Elsevier journal…)</p>",
      "summary": "The “Emerging practices for mapping and linking life sciences data using RDF” (doi:10.1016/j.websem.2012.02.003) is now available online, where I contributed a section on the original workflow for creating ChEMBL triples, and contributed to the section about open licensing, referring to CCZero and the Panton Principles. Happy reading!",
      
      "date_published": "2012-04-10T00:00:00+00:00",
      "date_modified": "2012-04-10T00:00:00+00:00",
      "tags": ["semweb","chembl"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/J.WEBSEM.2012.02.003", "doi": "10.1016/J.WEBSEM.2012.02.003"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y93s2-g7r29",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/03/22/visualizing-metabolite-fluxes-on.html",
      "title": "Visualizing metabolite fluxes on WikiPathways pathways using a PathVisio plugin",
      "content_html": "<p><strong>Visualizing metabolite fluxes on <a href=\"http://wikipathways.org/index.php/WikiPathways\">WikiPathways</a> pathways using a\n<a href=\"http://www.pathvisio.org/\">PathVisio</a> plugin</strong></p>\n\n<p><strong>Presenters</strong>: Anwesha Dutta (student in our group)</p>\n\n<p><img src=\"/assets/images/wp.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" />\n<strong>Date</strong>: Thursday March 29th 2012</p>\n\n<p><strong>Description</strong>:  Biological pathways provide intuitive frameworks to integrate and co-analyze different kinds of biological data, such as system-wide transcriptomic, proteomic, and metabolomic measurements. While insightful, pathway analysis is generally limited to qualitative conclusions, and the analyses can only be as powerful as the curated annotations can enable. Using our open-source pathway analysis platform, PathVisio, we will bridge pathway analysis to the wealth of quantitative approaches already in development for metabolic network modeling, such as flux balance analysis and dynamic simulation. Our focus will be on the visualization of the modeling results, which will be critical for understanding how simulated models correlate with experimental measurements.\nThe same biological processes that are visualized in pathways are also described by quantitative models. For example, the arrows that connect entities within metabolic pathways actually represent metabolite fluxes. The integration of large scale data analysis with modeled or measured fluxomics data, will help to gain more insights into the mechanism of the biological process.</p>\n\n<p>The meeting is in <strong>the BiGCaT course</strong> room (1.302 In H1), UNS50 south wing 1th floor from <strong>16.00 to 17.00</strong>.</p>",
      "summary": "Visualizing metabolite fluxes on WikiPathways pathways using a PathVisio plugin",
      
      "date_published": "2012-03-22T00:00:00+00:00",
      "date_modified": "2012-03-22T00:00:00+00:00",
      "tags": ["wikipathways","pathvisio"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qtjby-n6m67",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/03/04/chembl-13-as-rdf.html",
      "title": "ChEMBL 13 as RDF",
      "content_html": "<p><strong>Update</strong>: this work is now described in <a href=\"https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked.html\">this paper <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Last week, ChEMBL 13 was <a href=\"http://chembl.blogspot.com/2012/02/chembl-13-released.html\">released</a>, with even more data, data fixes,\n<a href=\"ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_13/chembl_13_release_notes.txt\">etc</a>. Since my RDF for\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2011/04/21/chembl-09-as-rdf.html\">ChEMBL 09 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> my workflow has become\n<a href=\"https://github.com/egonw/chembl.rdf/commits/master\">more solid</a> and uses more common ontologies, started using more common ontologies\nand ontologies I just like, such as <a href=\"http://www.plosone.org/article/info:doi/10.1371/journal.pone.0025513\">CHEMINF</a> and\n<a href=\"http://www.jbiomedsem.com/content/1/S1/S6\">CiTO</a>. Below is an overview of the resource types present in the RDF:\nactivities (almost 7M now), chemical entities, assays, targets, and documents.</p>\n\n<p><img src=\"/assets/images/relations.png\" alt=\"\" /></p>\n\n<p>The <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/10/22/chembl-rdf-uploading-data-to-kasabi.html\">data on Kasabi <i class=\"fa-solid fa-recycle fa-xs\"></i></a> will be updated soon,\nand the <a href=\"http://rdf.farmbio.uu.se/chembl/sparql\">SPARQL end point</a> hosted by Uppsala University was updated yesterday, including the\n<a href=\"http://rdf.farmbio.uu.se/chembl/snorql/\">SNORQL frontend</a>:</p>\n\n<p><img src=\"/assets/images/chemblRDF13.png\" alt=\"\" /></p>\n\n<p>The new data is not fully backwards compatible. The changes to the RDF include the use of <code class=\"language-plaintext highlighter-rouge\">cito:citesAsDataSource</code>, more typing\nusing existing ontologies, e.g. 
with <code class=\"language-plaintext highlighter-rouge\">cheminf:CHEMINF_000000</code> and <code class=\"language-plaintext highlighter-rouge\">pro:PR_000000001</code> from the\n<a href=\"http://pir.georgetown.edu/pro/\">PRotein Ontology</a>.</p>\n\n<p>A paper dedicated to the ChEMBL-RDF is in preparation. Existing use cases can be found\n<a href=\"http://www.jbiomedsem.com/content/2/S1/S6\">here</a>.</p>",
      "summary": "Update: this work is now described in this paper .",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/relations.png",
      "date_published": "2012-03-04T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["chembl","rdf","semweb","ontology","cheminf","cito"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1371/JOURNAL.PONE.0025513", "doi": "10.1371/JOURNAL.PONE.0025513"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/2041-1480-1-S1-S6", "doi": "10.1186/2041-1480-1-S1-S6"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/2041-1480-2-S1-S6", "doi": "10.1186/2041-1480-2-S1-S6"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/25dgb-j2y93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/02/23/cito-citeulike-publishing-innovation.html",
      "title": "CiTO / CiteULike: publishing innovation",
      "content_html": "<p>Readers of my blog know I have been using the Citation Typing Ontology, CiTO (doi:<a href=\"http://dx.doi.org/10.1186/2041-1480-1-S1-S6\">10.1186/2041-1480-1-S1-S6</a>).\nI allows me to see <a href=\"http://chem-bla-ics.blogspot.com/2010/02/citing-chemistry-development-kit.html\">how the CDK</a> is\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html\">cited and used <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. CiteULike is currently adding more CiTO more functionality,\nwhich they <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html\">started <i class=\"fa-solid fa-recycle fa-xs\"></i></a> doing almost one and a half years ago.</p>\n\n<p>One of the things, is that the CiTO data added via a certain account, can be downloaded as triples:</p>\n\n<p><img src=\"/assets/images/culcito2.png\" alt=\"\" /></p>\n\n<p>The second is that they are improving the graphics of how it is visualized. E.g. they added an ‘Expand’ link, which I found when they\n<a href=\"https://twitter.com/#!/citeulike/status/172446830666321921\">tweeted</a> they had hidden drag-n-drop, which I haven’t found yet, though.\nClicking that action, will show you the following:</p>\n\n<p><img src=\"/assets/images/culcito.png\" alt=\"\" /></p>\n\n<p>Because CiteULike takes advantage of the <a href=\"http://www.w3.org/TR/owl-ref/#InverseFunctionalProperty-def\">inverse function</a> of the CiTO predictates,\nthey show up with the cited paper too, which is less suitable for the top-down flow graphics:</p>\n\n<p><img src=\"/assets/images/culcito1.png\" alt=\"\" /></p>\n\n<p>To make this advertorial a bit balanced, not all <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html\">my wishes <i class=\"fa-solid fa-recycle fa-xs\"></i></a> have been\nimplemented yet, and the next up from my perspective should be Linked Data. 
There is some Linked Data embedded as RDFa, but the latter is not turning out\nto be the killer I had hoped, and regular RDF entry points should be used.</p>\n\n<p>Each CiteULike entry (post) should get a unique <a href=\"http://en.wikipedia.org/wiki/Internationalized_Resource_Identifier\">IRI</a> (or\n<a href=\"http://en.wikipedia.org/wiki/Uniform_resource_identifier\">URI</a>) and opening that link should give RDF about that post\n(<a href=\"http://www.citeulike.org/groupforum/2191\">wish #10</a>). That is <a href=\"http://en.wikipedia.org/wiki/Dereferenceable_Uniform_Resource_Identifier\">dereferenceability</a>.\nThe RDF can be, for example, in <a href=\"http://bibliontology.com/\">BIBO</a> but there are many alternatives, and I have not been keeping up with which is the best\n(please leave a comment, if you have an opinion on that).</p>\n\n<p>But I like where this is going! Thanx, CiteIReallyLikeThis!</p>",
      "summary": "Readers of my blog know I have been using the Citation Typing Ontology, CiTO (doi:10.1186/2041-1480-1-S1-S6). I allows me to see how the CDK is cited and used . CiteULike is currently adding more CiTO more functionality, which they started doing almost one and a half years ago.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/culcito1.png",
      "date_published": "2012-02-23T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["citeulike","cito","rdf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/2041-1480-1-S1-S6", "doi": "10.1186/2041-1480-1-S1-S6"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yve3j-c9015",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html",
      "title": "Groovy Cheminformatics 4th edition",
      "content_html": "<p>Six month was not quite the amount of time I anticipated between the third and fourth edition, but I finally managed\nto upload edition 1.4.7-0 of my <a href=\"http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/18825420\">Groovy Cheminformatics</a>\nbook. The first three editions sold 37 copies, including two for myself. Enough to feel supported and to continue working on it.</p>\n\n<p>So, this new edition is again thicker, summing up to 152 pages now, which is 28 pages more than\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition.html\">the 3rd edition <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Indeed, the table of contents\nis more than half a page longer in itself, though, just barely, still fitting on four pages. In fact, I had to remove one (new)\nsubsection title, because it would take otherwise two further pages.</p>\n\n<p>The new content is again a mix of sections and chapters. While writing new chapters, I find myself realizing I need to cover\nmore basics. Those get typically added as new sections. I did not get many feature requests, except for one email pointing me\nthe text promised how to interpret and handle failing atom type perception, which explains one of the new sections.\nThe full list of new content is:</p>\n\n<ul>\n  <li>Section 2.1.4: explaining the three flavors of atomic coordinates</li>\n  <li>Extended Section 2.2: added detail about electron counts of bonds (partly in reply to this post by Rich)</li>\n  <li>Chapter 5 “Protein and DNA”: four pages, mostly about PDB files, and the matching CDK data structure</li>\n  <li>Chapter 6 “IChemObjectBuilders”: four pages explaining the four alternative builders CDK 1.4.7 has</li>\n  <li>Section 7.8: a new section with recipes on how to post-process read input, discussing MDL molfiles only now. 
It talks about what information is present in the file format, and what steps must be undertaken to add missing information</li>\n  <li>Section 8.2.4 “No atom type perceived?!”</li>\n  <li>Section 11.4: describes how to depict aromatic rings</li>\n  <li>Section 11.5: describes how to change the background color of depictions</li>\n  <li>Section 13.4: explains how to calculate the Van der Waals volume of molecules</li>\n  <li>Section 18.1.3: discussing the API improvement in the iterating readers</li>\n  <li>Appendix C: a list of all descriptors provided by the CDK</li>\n  <li>Appendix D: a list of file formats known by the CDK, indicating which has readers and writers</li>\n</ul>\n\n<p>On top of that, I improved other bits of the book too, such as the resolution of the depictions of molecules,\nas well as those of various diagrams. Also the number of scripts has seriously gone up, from 94 to 134!</p>\n\n<p>Appendix C is a prelude to a chapter I am already writing, but have not finished yet: a chapter about\ndescriptor calculation. But since I just started a new post-doc position, it may take another six months\nfor that chapter to make it into print.</p>\n\n<p>The paperback is <a href=\"http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/18825420\">available from Lulu.com</a>,\nan on-demand publisher, as well as <a href=\"http://www.lulu.com/product/ebook/groovy-cheminformatics-with-the-chemistry-development-kit/18825437\">this ebook version</a>.</p>",
      "summary": "Six month was not quite the amount of time I anticipated between the third and fourth edition, but I finally managed to upload edition 1.4.7-0 of my Groovy Cheminformatics book. The first three editions sold 37 copies, including two for myself. Enough to feel supported and to continue working on it.",
      
      "date_published": "2012-01-15T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","cdkbook","java","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rz0tz-3wa91",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/11/01/oscar4-paper-text-mining-in-bioclipse.html",
      "title": "Oscar4 paper: text mining in Bioclipse (and everywhere else, of course)",
      "content_html": "<p>The <a href=\"http://www.jcheminf.com/content/3/1/41\">Oscar4 paper</a> (CC-BY, just like the screenshots of the paper below) was out already some days now, but the formatting has finished:</p>\n\n<p><img src=\"/assets/images/oscar4Paper.png\" alt=\"\" /></p>\n\n<p>I spotted a rogue <code class=\"language-plaintext highlighter-rouge\">http://</code> in the code example b) in <a href=\"http://www.jcheminf.com/content/3/1/41#IDAE2JBD\">Appendix B</a>:</p>\n\n<p><img src=\"/assets/images/oscar4Paper2.png\" alt=\"\" /></p>\n\n<p>I’ll see what I can do about that, but the API might evolve a bit anyway.</p>\n\n<p>That leaves me to mention that <a href=\"http://chem-bla-ics.blogspot.com/2011/09/almost-year-ago-i-started-position-with.html\">Bioclipse has an Oscar extension</a>\n(<a href=\"http://www.bioclipse.net/\">Bioclipse</a> has a lot of functionality nowadays, in fact),\nand that I <a href=\"http://chem-bla-ics.blogspot.com/2010/12/text-mining-chemistry-from-dutch-or.html\">blogged several times on Oscar4</a>\nwhen I was working with the other authors on the refactoring last year.</p>",
      "summary": "The Oscar4 paper (CC-BY, just like the screenshots of the paper below) was out already some days now, but the formatting has finished:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/oscar4Paper.png",
      "date_published": "2011-11-01T00:00:00+00:00",
      "date_modified": "2011-11-01T00:00:00+00:00",
      "tags": ["oscar","bioclipse","myexperiment"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1758-2946-3-41", "doi": "10.1186/1758-2946-3-41"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k1860-kks41",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/10/22/chembl-rdf-uploading-data-to-kasabi.html",
      "title": "ChEMBL-RDF: Uploading data to Kasabi with pytassium",
      "content_html": "<p>I reported earlier how to I <a href=\"http://chem-bla-ics.blogspot.com/2011/07/chempedia-rdf-3-uploading-data-to.html\">uploaded the ChemPedia (RIP) data onto Kasabi</a>.\nBut for ChEMBL-RDF I have used the <a href=\"https://github.com/iand/pytassium\">pytassium</a> tool, not just because it has a cool name :) I discovered yesterday,\nhowever, that I did not write down in this lab notebook, what steps I needed to take to reproduce it. And I just wanted to uploaded new triples to the\n<a href=\"http://kasabi.com/dataset/chembl-rdf\">ChEMBL-RDF data set on Kasabi</a>.</p>\n\n<p>The new triples I wanted to upload, link the <a href=\"http://chembl.blogspot.com/2011/08/chembl-11-released.html\">new public CHEMBL identifiers</a>\n(like <a href=\"https://www.ebi.ac.uk/chembldb/index.php/compound/inspect/CHEMBL25\">CHEMBL25 for aspirin</a>) to the internal ChEMBL database identifier I used for\nChEMBL 09 for the URIs. So, I am adding a lot of triples like:</p>\n\n<div class=\"language-turtle highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nl\">&lt;http://data.kasabi.com/dataset/chembl-rdf/09/molecule/m517180&gt;</span><span class=\"w\"> </span><span class=\"nl\">&lt;http://www.w3.org/2002/07/owl#sameAs&gt;</span><span class=\"w\">\n</span><span class=\"nl\">&lt;http://data.kasabi.com/dataset/chembl-rdf/09/chemblid/CHEMBL1&gt;</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>And the pytassium code I use to upload this to Kasabi looks like:</p>\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"n\">pytassium</span>\n<span class=\"kn\">import</span> <span class=\"n\">time</span>\n\n<span class=\"n\">dataset</span> <span class=\"o\">=</span> <span class=\"n\">pytassium</span><span class=\"p\">.</span><span class=\"nc\">Dataset</span><span class=\"p\">(</span><span class=\"sh\">'</span><span 
class=\"s\">chembl-rdf</span><span class=\"sh\">'</span><span class=\"p\">,</span><span class=\"sh\">'</span><span class=\"s\">XXX</span><span class=\"sh\">'</span><span class=\"p\">)</span>\n\n<span class=\"c1\"># Store the contents of a turtle file\n</span><span class=\"n\">dataset</span><span class=\"p\">.</span><span class=\"nf\">store_file</span><span class=\"p\">(</span><span class=\"sh\">'</span><span class=\"s\">chemblids.nt</span><span class=\"sh\">'</span><span class=\"p\">,</span> <span class=\"n\">media_type</span><span class=\"o\">=</span><span class=\"sh\">'</span><span class=\"s\">text/plain</span><span class=\"sh\">'</span><span class=\"p\">)</span>\n</code></pre></div></div>\n\n<p>So, that omission in my log book has been corrected now.</p>",
      "summary": "I reported earlier how to I uploaded the ChemPedia (RIP) data onto Kasabi. But for ChEMBL-RDF I have used the pytassium tool, not just because it has a cool name :) I discovered yesterday, however, that I did not write down in this lab notebook, what steps I needed to take to reproduce it. And I just wanted to uploaded new triples to the ChEMBL-RDF data set on Kasabi.",
      
      "date_published": "2011-10-22T00:00:00+00:00",
      "date_modified": "2011-10-22T00:00:00+00:00",
      "tags": ["kasabi","chembl","semweb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qgrq1-4r761",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/09/27/almost-year-ago-i-started-position-with.html",
      "title": "Bioclipse-Oscar4 - Text mining in Bioclipse",
      "content_html": "<p>Almost a year ago I <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/15/working-on-oscar-for-three-months.html\">started a position <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwith <a href=\"http://blogs.ch.cam.ac.uk/pmr/\">Peter Murray-Rust</a> to work on Oscar for three months (see this overview of results;\na paper by the full Oscar team (Sam, David, Dan, Lezan) is pending, and I’m really happy to have been able to contribute\nbits to the project). Since then, I have had little time :( That’s how it goes, with post-hopping, unfortunately.\nOne thing I did do after that, was write a <a href=\"https://github.com/bioclipse/bioclipse.oscar\">Bioclipse plugin</a>.</p>\n\n<p>I was asked recently via <a href=\"http://www.linkedin.com/in/egonw\">LinkedIn</a> if I was planning a Bioclipse-Oscar plugin, and\nI realized that I forgot to blog about it. So, here goes. The <code class=\"language-plaintext highlighter-rouge\">oscar</code> manager I implemented follows the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/28/oscar4-java-api-chemical-name.html\">Oscar API <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and these\nmethods are available: <code class=\"language-plaintext highlighter-rouge\">extractText()</code>, <code class=\"language-plaintext highlighter-rouge\">findNamedEntities()</code>,  <code class=\"language-plaintext highlighter-rouge\">findResolvedNamedEntities()</code>.</p>\n\n<p>When I wrote the plugin, I also uploaded an <a href=\"http://www.myexperiment.org/workflows/2117.html\">example workflow to MyExperiment</a>.\nThe code is:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">// Demo showing the Oscar text mining functionality</span>\n<span class=\"c1\">// in Bioclipse</span>\n<span class=\"kd\">var</span> <span class=\"nx\">html</span> <span class=\"o\">=</span> <span class=\"nx\">bioclipse</span><span 
class=\"p\">.</span><span class=\"nf\">download</span><span class=\"p\">(</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">http://dx.doi.org/10.3762/bjoc.6.133</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">text/html</span><span class=\"dl\">\"</span>\n<span class=\"p\">)</span>\n<span class=\"kd\">var</span> <span class=\"nx\">text</span> <span class=\"o\">=</span> <span class=\"nx\">oscar</span><span class=\"p\">.</span><span class=\"nf\">extractText</span><span class=\"p\">(</span><span class=\"nx\">html</span><span class=\"p\">);</span>\n<span class=\"c1\">// the next step may take some time, while</span>\n<span class=\"c1\">// initializing the Oscar software for the</span>\n<span class=\"c1\">// first time</span>\n<span class=\"kd\">var</span> <span class=\"nx\">mols</span> <span class=\"o\">=</span> <span class=\"nx\">oscar</span><span class=\"p\">.</span><span class=\"nf\">findResolvedNamedEntities</span><span class=\"p\">(</span><span class=\"nx\">text</span><span class=\"p\">);</span>\n<span class=\"kd\">var</span> <span class=\"nx\">file</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">/Oscar Demo/extractedMols.sdf</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">saveSDFile</span><span class=\"p\">(</span><span class=\"nx\">file</span><span class=\"p\">,</span> <span class=\"nx\">mols</span><span class=\"p\">);</span>\n<span class=\"nx\">ui</span><span class=\"p\">.</span><span class=\"nf\">open</span><span class=\"p\">(</span><span class=\"nx\">file</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>The code will extract chemical entities, and open a molecules table in <a href=\"http://www.bioclipse.net/\">Bioclipse</a>:</p>\n\n<p><img src=\"/assets/images/oscarDemo2.png\" alt=\"\" /></p>",
      "summary": "Almost a year ago I started a position with Peter Murray-Rust to work on Oscar for three months (see this overview of results; a paper by the full Oscar team (Sam, David, Dan, Lezan) is pending, and I’m really happy to have been able to contribute bits to the project). Since then, I have had little time :( That’s how it goes, with post-hopping, unfortunately. One thing I did do after that, was write a Bioclipse plugin.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/oscarDemo2.png",
      "date_published": "2011-09-27T00:00:00+00:00",
      "date_modified": "2025-03-05T00:00:00+00:00",
      "tags": ["oscar","bioclipse","beilstein"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pxxek-shz13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/09/17/inchikey-collision-diy-copypastables.html",
      "title": "InChIKey collision: the DIY copy/pastables",
      "content_html": "<p>About two weeks ago, the ChemConnector blog <a href=\"https://web.archive.org/web/20110928120027/http://www.chemconnector.com/2011/09/01/an-inchikey-collision-is-discovered-and-not-based-on-stereochemistry/\">reported an InChIKey collosion <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\ndetected by <a href=\"https://www.ch.cam.ac.uk/person/jmg11\">Prof. Goodman <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Unlike the previous collision, this one was based solely on the graph and not on\nstereochemistry. The two molecules both have the InChIKey OCPAUTFLLNMYSX-UHFFFAOYSA-N:</p>\n\n<table>\n<tr>\n  <td><img src=\"/assets/images/inchi2.png\" /></td>\n  <td><img src=\"/assets/images/inchi1.png\" /></td>\n</tr>\n</table>\n\n<p>The compounds are really different, the molecular formulas are C<sub>50</sub>H<sub>102</sub>O and C<sub>57</sub>H<sub>114</sub>O\nrespectively. The SMILESes are <span class=\"chem:smiles\">OC(C)C(C)CC(C)C(C)CCC(C)C(C)CCCC(C)C(C)CC(C)C(C)CCCC(C)C(C)CCC(C)C(C)CC(C)CCCCCCC</span>\nand <span class=\"chem:smiles\">O=C(C)CC(C)C(C)CCC(C)CCC(C)C(C)C(C)C(C)C(C)C(C)C(C)C(C)CC(C)C(C)C(C)CC(C)C(C)C(C)CCCCC(C)C(C)CC(C)C(C)C</span>.\nThe IUPAC names are useful to have as copy/pastables too (e.g. with the <a href=\"http://opsin.ch.cam.ac.uk/\">OPSIN</a>-based\n‘<a href=\"http://chem-bla-ics.blogspot.com/2011/02/opsin-used-for-bioclipse-wizard.html\">Molecule from IUPAC name</a>‘-wizard in\n<a href=\"http://bioclipse.net/\">Bioclipse</a> 2.5,\nwhich has been updated to the latest OPSIN version this week): 3,5,6,9,10,14,15,17,18,22,23,26,27,29-tetradecamethylhexatriacontan-2-ol\nand 4,5,8,11,12,13,14,15,16,17,18,20,21,22,24,25,26,31,32,34,35-henicosamethylhexatriacontan-2-one.</p>\n\n<p>I am adding these structures to the <a href=\"http://chem-bla-ics.blogspot.com/2011/03/pharmaceutical-bioinformatics.html\">pharmbio.org course book</a>\nand the matching Bioclipse plugin this weekend.</p>",
      "summary": "About two weeks ago, the ChemConnector blog reported an InChIKey collosion detected by Prof. Goodman . Unlike the previous collision, this one was based solely on the graph and not on stereochemistry. The two molecules both have the InChIKey OCPAUTFLLNMYSX-UHFFFAOYSA-N:",
      
      "date_published": "2011-09-17T00:00:00+00:00",
      "date_modified": "2025-02-23T00:00:00+00:00",
      "tags": ["inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/eg94z-9dg88",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/08/02/my-google-scholar-citations-profile.html",
      "title": "My Google Scholar Citations profile arrived",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Web_of_Science\">Web of Science</a> is my de facto standard for citation statistics (I need these for\n<a href=\"http://vr.se/\">VR</a> grant applications), and defines the lower limit of citations (it is pretty clean, but I do have to ping them now\nand then to fix something). The public front-end of it is <a href=\"http://www.researcherid.com/rid/C-6136-2008\">Researcher ID</a>. There is an\n<a href=\"http://academic.research.microsoft.com/Author/2893110/egon-l-willighagen\">Microsoft initiative</a>, which looks clean but doesn’t work\non Linux for the nicer things, but the coverage of journals is pretty bad in my field, giving a biased (downwards)\n<a href=\"http://en.wikipedia.org/wiki/H-index\">H-index</a>. And\n<a href=\"http://web.archive.org/web/20110815142119/http://www.citeulike.org/user/egonw\">CiteULike <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nand <a href=\"http://www.mendeley.com/profiles/egon-willighagen/\">Mendeley</a> focus more on your publications than on citations (though the former\nhas <a href=\"http://opencitations.wordpress.com/2010/10/21/use-of-cito-in-citeulike/\">great CiTO support</a>!).</p>\n\n<p>Then <a href=\"http://googlescholar.blogspot.com/2011/07/google-scholar-citations.html\">Google Scholar Citations</a> (GSC) shows up. While it\ndoes not look as pretty as competing products, it compensates that with a wide coverage of literature (for example, it supports the\n<a href=\"http://jcheminf.com/\">JChemInf</a>, which Web-of-Science currently does not; and I happen to publish a lot in that journal recently),\nbooks, and reports, while keeping false positives fairly low. Thus, it provides an upper limit of my citations statistics, but one\nI am pretty happy confident about. And my H-index is quite comparable anyway. 
This is what\n<a href=\"http://scholar.google.com/citations?user=u8SjMZ0AAAAJ\">my profile</a> looks like:</p>\n\n<p><img src=\"/assets/images/gsc.png\" alt=\"\" /></p>\n\n<p>So, these statistics have two purposes to me: 1. grant applications, and 2. I like to know what work people base on my research. (Well,\nOK, 3. it helps me understand why I work so hard on too many things.)</p>\n\n<p>Now the question is, will GSC take off? Will it replace <a href=\"http://orcid.org/\">ORCID</a>? Will they join ORCID? Will GSC get a good API?\nWho will write the first <a href=\"http://www.biomedcentral.com/1471-2105/8/487\">userscript</a> to make the GUI fancier? Will GSC support CiTO?\nWill GSC start using microformats or RDFa? What mashups can we expect between bibliographic databases? Will new entries automatically\nbe posted to Google+? Will it have a button to autocreate a blog post when a paper gets cited 100, 500, or 1000 times? Will GSC\nsupport <a href=\"http://friendfeed.com/search?q=%23altmetrics\">#altmetrics</a>?</p>",
      "summary": "Web of Science is my de facto standard for citation statistics (I need these for VR grant applications), and defines the lower limit of citations (it is pretty clean, but I do have to ping them now and then to fix something). The public front-end of it is Researcher ID. There is an Microsoft initiative, which looks clean but doesn’t work on Linux for the nicer things, but the coverage of journals is pretty bad in my field, giving a biased (downwards) H-index. And CiteULike and Mendeley focus more on your publications than on citations (though the former has great CiTO support!).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/gsc.png",
      "date_published": "2011-08-02T00:00:00+00:00",
      "date_modified": "2025-03-08T00:00:00+00:00",
      "tags": ["google","citeulike"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-487", "doi": "10.1186/1471-2105-8-487"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q66d6-pqr12",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition.html",
      "title": "Groovy Cheminformatics 3rd edition",
      "content_html": "<p><strong>Update</strong>: the <a href=\"https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html\">fourth edition <i class=\"fa-solid fa-recycle fa-xs\"></i></a> is out.</p>\n\n<p>I am starting to get the hang of this publishing soon, publishing often thing, and\n<a href=\"http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/16378378\">just uploaded</a>\nedition 1.4.1-0 of the <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/02/06/groovy-cheminformatics.html\">Groovy Cheminformatics <i class=\"fa-solid fa-recycle fa-xs\"></i></a> book.\nThe cover is the same (with one typo fix), and the content is 20 pages thicker. True, six of those pages are isotope\nmasses of all natural isotopes. That leaves 14 pages with this new content:</p>\n\n<ul>\n  <li>Section 2.7 on line notations with 2.7.1 about reading and writing SMILES</li>\n  <li>Section 6.3 about Sybyl (mol2) atom types</li>\n  <li>Section 7.4 on atom numbering with 7.4.1 on Morgan atom numbers, and 7.4.2 on InChI atom numbers</li>\n  <li>Chapter 9 on molecule depiction with the new rendering code, with\n    <ul>\n      <li>Section 9.1 on drawing molecules,</li>\n      <li>Section 9.2 on rendering parameters, and</li>\n      <li>Section 9.3 on the generator API and how to add custom content</li>\n    </ul>\n  </li>\n  <li>Section 11.4 on calculating aromaticity</li>\n  <li>Appendix A.2 listing all Sybyl atom types</li>\n  <li>Appendix B listing all naturally occurring isotopes</li>\n</ul>\n\n<p>Features requests most welcome.</p>",
      "summary": "Update: the fourth edition is out.",
      
      "date_published": "2011-07-31T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","cdkbook","java","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zq2m3-dxp07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/07/13/data-nonotify-or-silent.html",
      "title": "Data, Nonotify, or Silent?",
      "content_html": "<p>I cannot find the bug report just now, but the <a href=\"http://cdk.sf.net/\">CDK</a> has an open problem with change even notification,\nwhere the nonotify classes still caused change event to be sent around.</p>\n\n<p>This was because the nonotify classes extended in a wrong way the data classes. So, I worked today on copying the data class\nimplementations into a new implementation, not extending the data classes, while removing the listener code: the <em>silent</em>\nmodule. I’m not entirely done yet, but close enough to blog about it. While checking things, I ran the\n<a href=\"https://github.com/egonw/cheminfbenchmark\">cheminfbench</a> code on it, with these results:</p>\n\n<p><img src=\"/assets/images/silent.png\" alt=\"\" /></p>\n\n<p>So, removal of the notification listening improves the performance, when reading a 416 entry SD file. I think the difference\nwill be more significant for other tasks, like ring finding.</p>\n\n<p>But, but…?!?! Yeah, this is a rather weird plot indeed… the blue bar should also be lower than the red one! And it used\nto be too… :( Bad regression… hard to unit test too :(</p>\n\n<p>OK, back to some final clean up.</p>\n\n<p><strong>Update</strong>: the clean up is done, and I have now run the fingerprint benchmark from cheminfbench using the new module and\nnonotify. In a situation when change events are much more used (as is with fingerprint calculation), we see that nonotify\nstill improves speed, and that the new silent module shows about the same speed up. We also see that the 1.4.x classes\nare a bit slower than one classes of some 20 months ago. 
That probably reflects\n<a href=\"https://sourceforge.net/tracker/?func=detail&amp;aid=2992921&amp;group_id=20024&amp;atid=120024\">bug 2992921</a> that was recently fixed.\nThe full bar plot:</p>\n\n<p><img src=\"/assets/images/silent1.png\" alt=\"\" /></p>\n\n<p>Red and blue are CDK 1.2.x (as the plot legend says), green and yellow the same for CDK 1.3.x (and both clearly faster than\nthe 1.2 series), and purple and light blue the same for CDK 1.4.0. The last bar is the new silent module, a tad slower\nthan nonotify.</p>\n\n<p><strong>Update 2</strong>: OK, one last update. The performance difference can actually be larger than this. The screenshot below shows\nthe effect of the silent module (blue, yellow) on SMILES generation (without and with lower case formalism, red and green\nrespectively):</p>\n\n<p><img src=\"/assets/images/silent2.png\" alt=\"\" /></p>\n\n<p>If you did not get it yet: if you bring your system to production level, do not use the default implementation,\n<strong><em>unless</em></strong> you really need the change notifications.</p>",
      "summary": "I cannot find the bug report just now, but the CDK has an open problem with change even notification, where the nonotify classes still caused change event to be sent around.",
      
      "date_published": "2011-07-13T00:00:00+00:00",
      "date_modified": "2011-07-13T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/f95v6-r1630",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/07/06/chempedia-rdf-2-kasabi.html",
      "title": "ChemPedia-RDF #2: Kasabi",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/kasabi.png\" width=\"200\" />\n<a href=\"http://beta.kasabi.com/\">Kasabi</a> is a new, RDF hosting service by <a href=\"http://www.talis.com/\">Talis</a>. It’s still in beta, and I have been testing\ntheir beta service with the <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html\">RDF version <i class=\"fa-solid fa-recycle fa-xs\"></i></a> I created of\n<a href=\"http://metamolecular.com/chempedia/\">ChemPedia Substances</a> (the now no longer existing cool web service from\n<a href=\"http://metamolecular.com/\">MetaMolecular</a> to draw and name organic molecules).</p>\n\n<p>Kasabi makes the RDF data available via a few APIs, depending on the APIs selected by the uploader. I picked all five of them, just to see how\nthings work. Of direct interest are the SPARQL end point, but also the option to host the data as dereferencable resources. Cool! That was just\nwhat was missing for me.</p>\n\n<p>Now, using the API requires you to get an account. This will allow Kasabi to control the traffic, and as such creates a business model around\nproviding services around Open Data. I think this approach will work. But just to make clear, this does mean you need to get an account first,\nif you like to play with this data. Once you got an account, you get an API key, and you can append that to any URI with <code class=\"language-plaintext highlighter-rouge\">?apikey=XXXX</code> to\nauthenticate yourself. I think this does mean Kasabi will have to go to a https connection, which is not yet the case. Moreover, you will need\nto subscribe to the data set too. 
That, in fact, with #altmetrics in mind, sounds really interesting :)</p>\n\n<p>The ChemPedia RDF data is available at: <a href=\"http://beta.kasabi.com/dataset/chempedia-rdf\">http://beta.kasabi.com/dataset/chempedia-rdf</a></p>\n\n<p>This web page will give the five APIs, of which the augmentation one is really interesting, but I have not played with it enough yet to say much\nabout it. The idea of that API is to augment RDF you post with data from the data set. Like in <a href=\"http://en.wikipedia.org/wiki/Augmented_reality\">augmented reality</a>.\nThat should be cool for mashups.</p>\n\n<p>Now, the APIs I do understand include this SPARQL end point (remember to add your API key!):</p>\n\n<p><a href=\"http://labs.kasabi.com/explorer/sparql/sparql-endpoint-chempedia-rdf\">http://labs.kasabi.com/explorer/sparql/sparql-endpoint-chempedia-rdf</a></p>\n\n<p>And the Linked Data feature. In the <a href=\"http://chem-bla-ics.blogspot.com/2011/07/chempedia-rdf-3-uploading-data-to.html\">next post</a>, I will\nexplain how I tweaked the original data, how I uploaded it, and how this resulted in the dereferencable resources, like:</p>\n\n<p><a href=\"http://data.kasabi.com/dataset/chempedia-rdf/substances/2-2595-7562-8125.html\">http://data.kasabi.com/dataset/chempedia-rdf/substances/2-2595-7562-8125.html</a></p>\n\n<p>Note the links for RDF/XML, RDF/JSON, and Turtle, directly accessible by replacing the .html extension with .rdf, .json, and .ttl respectively.\nAn API key does not seem required for this, which makes perfect sense.</p>\n\n<p>It took me some chatting with the people from Talis, who have been very helpful, as the whole platform was a bit overwhelming. But, for the first\ntime ever, I actually got Linked Open Data online, in a Linked Data manner.</p>",
      "summary": "Kasabi is a new, RDF hosting service by Talis. It’s still in beta, and I have been testing their beta service with the RDF version I created of ChemPedia Substances (the now no longer existing cool web service from MetaMolecular to draw and name organic molecules).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/kasabi.png",
      "date_published": "2011-07-06T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["semweb","kasabi","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/n4hbf-t3t23",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/06/25/from-archives-my-iccs-2005-poster.html",
      "title": "From the archives: my ICCS 2005 poster",
      "content_html": "<p>Julio and Gert placed their ICCS 2011 <a href=\"http://www.slideshare.net/Gertdus/9th-iccs-noordwijkerhout-8335140\">work <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>\n<a href=\"http://www.slideshare.net/peyron/julio-peironcely-iccs-2011\">online</a>, and today I was going through old CDs (see\n<a href=\"http://chem-bla-ics.blogspot.com/2011/06/from-archives-chemical-web-and-cdk-in.html\">From the archives: Chemical Web, and the CDK in 2004</a>\nand <a href=\"http://chem-bla-ics.blogspot.com/2011/06/chiral-molecules-how-cool-is-sem.html\">Chiral Molecules: how cool is the SEM picture?</a>).\nI also ran into my ICCS 2005 poster, and because that too was before I started blogging, I never posted it online. So, here it is,\nbased on <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html\">my thesis <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<p><img src=\"/assets/images/poster2.png\" alt=\"\" /></p>",
      "summary": "Julio and Gert placed their ICCS 2011 work online, and today I was going through old CDs (see From the archives: Chemical Web, and the CDK in 2004 and Chiral Molecules: how cool is the SEM picture?). I also ran into my ICCS 2005 poster, and because that too was before I started blogging, I never posted it online. So, here it is, based on my thesis :",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/poster2.png",
      "date_published": "2011-06-25T00:00:00+00:00",
      "date_modified": "2025-06-08T00:00:00+00:00",
      "tags": ["iccs"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q97cq-5ah86",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/06/25/chiral-molecules-how-cool-is-sem.html",
      "title": "Chiral Molecules: how cool is the SEM picture?",
      "content_html": "<p>I just found my student thesis in Organic Chemistry from my <a href=\"http://www.ru.nl/\">Nijmegen</a> education. It’s in Dutch, but I’ll\nexplore if I can upload this to <a href=\"http://repository.ubn.ru.nl/\">Radboud University’s DSpace</a>. But I could not resist sharing\nthis nice scanning electron microscope picture :) Look at those amphiphiles show a nice chiral ribbon!</p>\n\n<p><img src=\"/assets/images/coolChemistry.png\" alt=\"\" /></p>\n\n<p>This disk also has quite a few raw spectra (as TIFF images). I’ll try figure out what to do with those. Uploading as Open Data\nto <a href=\"http://chemspider.com/\">ChemSpider</a> is tempting, but I want to make sure I can easily have people download the collection\ntoo (read: programmatically).</p>",
      "summary": "I just found my student thesis in Organic Chemistry from my Nijmegen education. It’s in Dutch, but I’ll explore if I can upload this to Radboud University’s DSpace. But I could not resist sharing this nice scanning electron microscope picture :) Look at those amphiphiles show a nice chiral ribbon!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/coolChemistry.png",
      "date_published": "2011-06-25T00:00:00+00:00",
      "date_modified": "2011-06-25T00:00:00+00:00",
      "tags": ["chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9w84r-evn93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/04/21/chembl-09-as-rdf.html",
      "title": "ChEMBL 09 as RDF",
      "content_html": "<p><em>Update 2021-02</em>: this post is still the second-most read post in my blog. Welcome! Some updates:</p>\n\n<ul>\n  <li>Ammar Ammar in our BiGCaT group has set up a <a href=\"https://chemblmirror.rdf.bigcat-bioinformatics.org/\">new SPARQL endpoint</a>. Please use and tweet. blog, or otherwise let others now how you use the ChEMBL RDF.</li>\n  <li>Since this post I have <a href=\"https://chem-bla-ics.blogspot.com/search/label/chembl\">blogged a lot more about ChEMBL</a>.</li>\n</ul>\n\n<p><em>Update</em>: this work is now written down in <a href=\"https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked.html\">this paper</a>.</p>\n\n<p>I’m having a really bad month, as you can see from the number of posts. Too much to do, too little time. One of the things\nI have been doing in the past weeks is update the RDF for <a href=\"https://www.ebi.ac.uk/chembldb/\">ChEMBL</a>, now up to\nversion 09. The <a href=\"https://web.archive.org/web/20121123055403/http://rdf.farmbio.uu.se/chembl/sparql\">SPARQL end point <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> has not been updated yet (which is\nstill at ChEBML 04), but you can now download the triples for self-hosting here. Like the database itself, the RDF is\navailable under the <a href=\"http://creativecommons.org/licenses/by-sa/3.0/\">CC-SA-BY license</a>, requiring attribution to both\nthe ChEMBL team as well as our efforts to create the RDF (see this\n<a href=\"https://github.com/egonw/chembl.rdf/blob/master/README.markdown\">README</a>).</p>",
      "summary": "Update 2021-02: this post is still the second-most read post in my blog. Welcome! Some updates:",
      
      "date_published": "2011-04-21T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["chembl","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8my4k-rfz51",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/02/06/groovy-cheminformatics.html",
      "title": "Groovy Cheminformatics...",
      "content_html": "<p><strong>Update</strong>: the <a href=\"https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html\">fourth edition <i class=\"fa-solid fa-recycle fa-xs\"></i></a> is out.</p>\n\n<p>Some project are never finished. Neither is this one, but it is never too late to change how things work, so, taking advantage of\npublishing-on-demand, here I introduce the release-soon, release-often equivalent of cheminformatics books, my\n<a href=\"http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/14745007\">Groovy Cheminformatics with the Chemistry Development Kit</a>\nbook:</p>\n\n<p><img src=\"/assets/images/cdkBook.png\" alt=\"\" /></p>\n\n<p>With a serious discount for just being the first edition (1.3.8-0), but still counting at 72 pages with 75 code examples, this edition\nmarks a personal milestone (and probably not much more than that). There remains much to do, but I promised a release by tomorrow, so\nhere it is. Next releases will contain more code examples, more functionality descriptions, and more literature reviewing where such\ncode is used in science. The plan is to make new editions with each new <a href=\"http://cdk.sf.net/\">CDK</a> release, as well as new editions\nwhen I added a new chapter, section, or just paragraph. But, there will not be a Nightly build service anytime soon.</p>\n\n<p>The current table of content is as follows:</p>\n\n<p><img src=\"/assets/images/cdkBookToc1.png\" alt=\"\" /></p>\n\n<p>Now, the book content is <strong><em>not</em></strong> open content. However, it contains nothing that is not available in other means. It’s just the\ncompilation that makes this book interesting, as well as that I put effort in ensuring the code examples remain working.\nFor that, I ask a minor financial contribution.</p>",
      "summary": "Update: the fourth edition is out.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkBook.png",
      "date_published": "2011-02-06T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","java","cheminf","cdkbook"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7dnxr-jv029",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/01/30/github-tip-download-commits-as-patches.html",
      "title": "GitHub Tip: download commits as patches",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/1000px-GitHub.svg.png\" width=\"200\" />\nSome time ago, the brilliant <a href=\"http://github.com/\">GitHub</a> people gave me the following tip. Rajarshi is\n<a href=\"https://sourceforge.net/tracker/index.php?func=detail&amp;aid=3160093&amp;group_id=20024&amp;atid=120024#\">lazy</a>, and might\nfind it interesting. By appending <code class=\"language-plaintext highlighter-rouge\">.patch</code> to the commit URL, a commit can easily be downloaded as patch. That way,\ndevelopers can easily download it with <code class=\"language-plaintext highlighter-rouge\">wget</code> or <code class=\"language-plaintext highlighter-rouge\">curl</code> and apply it locally with <code class=\"language-plaintext highlighter-rouge\">git am</code>,\nwithout having the fetch the full repository.</p>\n\n<p>For example, Dmitry made this commit in his branch, having the URL\n<a href=\"https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625\">https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625</a>.\nThe patch for this commit can then be downloaded at this URL\n<a href=\"https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625.patch\">https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625.patch</a>.</p>",
      "summary": "Some time ago, the brilliant GitHub people gave me the following tip. Rajarshi is lazy, and might find it interesting. By appending .patch to the commit URL, a commit can easily be downloaded as patch. That way, developers can easily download it with wget or curl and apply it locally with git am, without having the fetch the full repository.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/1000px-GitHub.svg.png",
      "date_published": "2011-01-30T00:00:00+00:00",
      "date_modified": "2011-01-30T00:00:00+00:00",
      "tags": ["github"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/x4cvj-yw944",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/01/17/9th-international-conference-on.html",
      "title": "The 9th International Conference on Chemical Structures (ICCS)",
      "content_html": "<p><img src=\"/assets/images/iccs_logo_2011.png\" alt=\"\" /></p>\n\n<p>Later this year the ninth <a href=\"http://www.int-conf-chem-structures.org/home.html\">International Conference on Chemical Structures</a>\n(ICSS) conference will be held in the Netherlands. I had the pleasure of joining this meeting, I think, eight years ago, when\nI was doing my PhD in Nijmegen. Mind you, I did not <em>attend</em> the conference; I helped with the organization ;) That was a good\ndeal, particularly because I got to meet many cheminformaticians while working behind the registration desk ;)</p>\n\n<p>Actually, my gravatar still reflects that meeting, as it is a picture taken on the boat trip on the <a href=\"http://en.wikipedia.org/wiki/Markermeer\">Markermeer</a>.\nThat was one great boat trip: I steered a <em>driemaster</em>, and helped out on the boat on ropes outside the deck, meters above the water.\nCheminformatics can be so nice! The photo was taken during a calmer part of that boat trip :)</p>\n\n<p>Back to the ICCS. It’s one of the bigger cheminformatics meetings, and likely the best after the yearly\n<a href=\"http://www.gdch.de/gcc2010/\">GCC meetings</a>. Mind you, the term cheminformatics reflects more the methods than the domains.\nIndeed, the meeting’s <a href=\"http://www.int-conf-chem-structures.org/call-for-papers.html\">Call for Papers</a> lists many topics\nhighly relevant to <a href=\"http://chem-bla-ics.blogspot.com/2011/01/karolinska-institutet.html\">my position here at KI</a>,\nincluding chemogenomics, (Q)SAR, literature mining, “integration of medical and biological information” (including\nsemantic web technologies), and in-silico analysis of toxicology, drug safety, and adverse events.</p>\n\n<p>Depending on the schedule this year, I may actually submit an abstract based on what we will do in the next year, and see\nwhat happens. The CfP deadline is 31 January.</p>",
      "summary": "",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/iccs_logo_2011.png",
      "date_published": "2011-01-17T00:00:00+00:00",
      "date_modified": "2025-06-08T00:00:00+00:00",
      "tags": ["iccs"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k1tea-hnb50",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/12/30/text-mining-chemistry-from-dutch-or.html",
      "title": "Text mining chemistry from Dutch or Swedish texts",
      "content_html": "<p><a href=\"http://oscar3-chem.sf.net/\">Oscar</a> is a text miner. It mines in text for chemistry.\n<a href=\"https://bitbucket.org/wwmm/oscar4/\">Oscar4</a> is the next iteration of Oscar\ncode that I worked on in the past three months, with Lezan, Sam, and David. I blogged about\naspects of Oscar4 at several occasions:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/15/working-on-oscar-for-three-months.html\">Working on Oscar for three months <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/21/oscar-text-mining-in-taverna.html\">Oscar text mining in Taverna <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2010/10/multiple-unit-test-inheritance-with.html\">Multiple unit test inheritance with JExample</a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/28/oscar4-java-api-chemical-name.html\">Oscar4 Java API: chemical name dictionaries <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/11/18/oscar4-command-line-utilities.html\">Oscar4 command line utilities <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2010/11/installing-oscar.html\">Installing Oscar</a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/11/29/adding-new-dictionary-to-oscar.html\">Adding a new dictionary to Oscar <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2010/12/status-update-on-bjoc-analysis-with.html\">Status update on BJOC analysis with Oscar and ChemicalTagger</a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2010/12/status-update-on-bjoc-analysis-with_11.html\">Status update on BJOC analysis with Oscar and ChemicalTagger #2</a></li>\n  <li><a 
href=\"http://chem-bla-ics.blogspot.com/2010/12/supramolecular-chemistry.html\">Supramolecular chemistry</a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/12/23/status-update-on-bjoc-analysis-with_23.html\">Status update on BJOC analysis with Oscar and ChemicalTagger #3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/12/26/oscar-training-data-models-etc.html\">Oscar: training data, models, etc <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<p>These posts will serve as some initial critical mass for a draft report I plan to finish\ntoday. I might have to blog some further posts with diagrams, here and there. This post is\nactually one of them, and discusses something where Oscar can be expected to go next, now\nthat the design is cleaned up (though this effort is not halted now) and it has become\npossible again to extend it. The over <a href=\"https://hudson.ch.cam.ac.uk/job/oscar4/lastBuild/testReport/\">250 unit tests</a>\nmake this a lot easier too.</p>\n\n<p>One aspect where I expect Oscar to go in 2011 is the support for other languages. To a very\nlarge extent this is based on multi-language support in the dictionaries, as well as having\ntraining data in a particular language. 
This also provides some context to my earlier post\nabout the <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/12/26/oscar-training-data-models-etc.html\">need for an Oscar training data repository <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>This extension opens a number of options: analysis of patent literature in other languages,\nmonitoring of press releases in other languages, and news items in local newspapers, etc.\nFor example, it could analyse <a href=\"http://www.c2w.nl/energierijke-gistcel.119621.lynkx\">this C2W news item</a>\non <a href=\"http://en.wikipedia.org/wiki/Yeast\">yeast</a> cells:</p>\n\n<p><img src=\"/assets/images/c2w.png\" alt=\"\" /></p>\n\n<p>There are many use cases for such localized text mining. And it surely matters for determining\nthe impact of research.</p>\n\n<p>Oscar has various places where language specifics are found. For example, in tokenization of a\ntext. One step here is the detection of sentence ends. This is done in most western languages\nwith a period, exclamation mark, question mark, etc. But periods (dots) are also used in\nabbreviations. Similarly, colons can be used in chemical names. But every language comes\nwith different abbreviations that need to be recognized.</p>\n\n<p>Currently, some abbreviations are found in <a href=\"https://bitbucket.org/wwmm/oscar4/src/005ffa00a69d/oscar4-core/src/main/java/uk/ac/cam/ch/wwmm/oscar/document/NonSentenceEndings.java\">NonSentenceEndings</a>.\nIn the past three months, we have been cleaning up the code, and restructuring the source code,\nmaking it easier to detect such places. This class will likely undergo further refactoring, to\nmake the list of such non-sentence-endings configurable via files or so. What I expect to see,
What I expect to see,\nis that we you initiate Oscar like this:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Oscar</span> <span class=\"n\">oscar</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Oscar</span><span class=\"o\">(</span><span class=\"nc\">Locale</span><span class=\"o\">.</span><span class=\"na\">US</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>This might actually even make a nice student summer project. The biggest challenge will be in making a good\ncorpus of training data, like the SciBorg training data that was used for training Oscar3.</p>\n\n<p>But the whole normalization is tainted with English language specifics too. For example, the normalizer\nwill have to ‘normalize’ the question marks, for which there exist several\n<a href=\"http://en.wikipedia.org/wiki/Question_mark#Stylistic_variants\">unicode variations</a>.\nBut the normalized variant is language dependent. For example, greek and armenian have different characters\n(see <a href=\"http://en.wikipedia.org/wiki/Question_mark#Opening_and_closing_question_marks\">this page</a>),\nand then we have not even started talking about the right to left.</p>\n\n<p>Besides localized dictionaries, this Oscar will also benefit from a localized <a href=\"http://opsin.ch.cam.ac.uk/\">OPSIN</a>.\nIt seem to recognize the Dutch <a href=\"https://opsin.ch.cam.ac.uk/opsin/propaan.png\">propaan</a>, but not\n<a href=\"https://opsin.ch.cam.ac.uk/opsin/benzeen.png\">benzeen</a>. 
I am not going to look at that soon, but if you are\ninterested, I recommend checking out Rich’s\n<a href=\"https://doi.org/10.59350/bbrwt-e5n35\">posts <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"https://doi.org/10.59350/vtadn-tdt17\">about <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"https://doi.org/10.59350/nbtxd-kdz73\">forking <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nOPSIN and writing patches.</p>\n\n<p>Getting Oscar going for other languages is a challenge, but also offers new opportunities. Just email the\n<a href=\"http://sourceforge.net/mailarchive/forum.php?forum_name=oscar3-chem-developers\">oscar mailing list</a>\nif you are interested and need help.</p>",
      "summary": "Oscar is a text miner. It mines in text for chemistry. Oscar4 is the next iteration of Oscar code that I worked on in the past three months, with Lezan, Sam, and David. I blogged about aspects of Oscar4 at several occasions:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/c2w.png",
      "date_published": "2010-12-30T00:00:00+00:00",
      "date_modified": "2025-03-05T00:00:00+00:00",
      "tags": ["oscar","textmining"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/vtadn-tdt17", "doi": "10.59350/vtadn-tdt17"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/nbtxd-kdz73", "doi": "10.59350/nbtxd-kdz73"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/bbrwt-e5n35", "doi": "10.59350/bbrwt-e5n35"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2repm-7m232",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/12/29/converting-json-to-rdfxml-with-groovy.html",
      "title": "Converting JSON to RDF/XML with Groovy",
      "content_html": "<p>Mark’s new <a href=\"http://www.science3point0.com/blog/2010/12/29/cc0-rdf-hosting-for-scientists/\">CCO/RDF hosting functionality</a>\n(see also <a href=\"http://chem-bla-ics.blogspot.com/2010/12/what-should-free-cc0-rdf-hosting-for.html\">my post two days ago</a>)\nrequires <a href=\"http://www.w3.org/TR/REC-rdf-syntax/\">RDF/XML format</a>, so I updated my code to convert the\n<a href=\"http://chempedia.com/substances\">Chempedia Substances</a> data into RDF/XML instead of N3 (I have asked\n<a href=\"http://depth-first.com/\">Rich</a> to put a new download link online). This is the\n<a href=\"http://groovy.codehaus.org/\">Groovy</a> code I used:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"nn\">groovy.xml.MarkupBuilder</span>\n<span class=\"kn\">import</span> <span class=\"nn\">groovy.util.IndentPrinter</span>\n\n<span class=\"n\">input</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">File</span><span class=\"o\">(</span><span class=\"s2\">\"substances.json\"</span><span class=\"o\">)</span>\n<span class=\"n\">json</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">JsonSlurper</span><span class=\"o\">().</span><span class=\"na\">parse</span><span class=\"o\">(</span><span class=\"n\">input</span><span class=\"o\">);</span>\n\n<span class=\"kt\">def</span> <span class=\"n\">writer</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">StringWriter</span><span class=\"o\">()</span>\n<span class=\"kt\">def</span> <span class=\"n\">xml</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">MarkupBuilder</span><span class=\"o\">(</span>\n  <span class=\"k\">new</span> <span class=\"nf\">IndentPrinter</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"n\">PrintWriter</span><span 
class=\"o\">(</span><span class=\"n\">writer</span><span class=\"o\">))</span>\n<span class=\"o\">)</span>\n<span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'rdf:RDF'</span><span class=\"o\">(</span>\n  <span class=\"s1\">'xmlns:rdf'</span><span class=\"o\">:</span>\n    <span class=\"s1\">'http://www.w3.org/1999/02/22-rdf-syntax-ns#'</span><span class=\"o\">,</span>\n  <span class=\"s1\">'xmlns:dc'</span> <span class=\"o\">:</span>\n    <span class=\"s1\">'http://purl.org/dc/elements/1.1/'</span><span class=\"o\">,</span>\n  <span class=\"s1\">'xmlns:iupac'</span> <span class=\"o\">:</span>\n    <span class=\"s1\">'http://www.iupac.org/'</span><span class=\"o\">,</span>\n  <span class=\"s1\">'xmlns:cp'</span> <span class=\"o\">:</span>\n    <span class=\"s1\">'http://rdf.openmolecules.net/chempedia/onto#'</span><span class=\"o\">,</span>\n  <span class=\"s1\">'xmlns:owl'</span> <span class=\"o\">:</span>\n    <span class=\"s1\">'http://www.w3.org/2002/07/owl#'</span>\n<span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"n\">json</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span class=\"n\">substance</span> <span class=\"o\">-&gt;</span>\n    <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'rdf:Description'</span><span class=\"o\">(</span>\n      <span class=\"s1\">'rdf:about'</span><span class=\"o\">:</span> <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">uri</span>\n    <span class=\"o\">)</span> <span class=\"o\">{</span>\n      <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'dc:identifier'</span><span class=\"o\">(</span><span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">gsid</span><span class=\"o\">)</span>\n      <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'owl:sameAs'</span><span class=\"o\">(</span>\n        <span 
class=\"s1\">'rdf:resource'</span> <span class=\"o\">:</span>\n        <span class=\"s1\">'http://rdf.openmolecules.net/?'</span> <span class=\"o\">+</span>\n        <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">inchi</span>\n      <span class=\"o\">)</span>\n      <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'iupac:inchi'</span><span class=\"o\">(</span>\n        <span class=\"s1\">'http://rdf.openmolecules.net/?'</span> <span class=\"o\">+</span>\n        <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">inchi</span>\n      <span class=\"o\">)</span>\n      <span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"kt\">int</span> <span class=\"n\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"o\">;</span> <span class=\"n\">i</span><span class=\"o\">&lt;</span><span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">size</span><span class=\"o\">();</span> <span class=\"n\">i</span><span class=\"o\">++)</span>\n      <span class=\"o\">{</span>\n        <span class=\"n\">naming</span> <span class=\"o\">=</span> <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">);</span>\n        <span class=\"n\">namingURI</span> <span class=\"o\">=</span> <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">+</span> <span class=\"s2\">\"/naming\"</span> <span class=\"o\">+</span> <span class=\"n\">i</span><span class=\"o\">;</span>\n        <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'cp:hasNaming'</span> <span class=\"o\">{</span>\n          <span class=\"n\">xml</span><span class=\"o\">.</span><span 
class=\"s1\">'rdf:Description'</span> <span class=\"o\">{</span>\n            <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'cp:hasName'</span><span class=\"o\">(</span><span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">name</span><span class=\"o\">)</span>\n            <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'cp:hasStatus'</span><span class=\"o\">(</span><span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">status</span><span class=\"o\">)</span>\n            <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'cp:hasScore'</span><span class=\"o\">(</span><span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">score</span><span class=\"o\">)</span>\n          <span class=\"o\">}</span>\n        <span class=\"o\">}</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">}</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n<span class=\"n\">println</span> <span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">toString</span><span class=\"o\">();</span>\n</code></pre></div></div>",
      "summary": "Mark’s new CCO/RDF hosting functionality (see also my post two days ago) requires RDF/XML format, so I updated my code to convert the Chempedia Substances data into RDF/XML instead of N3 (I have asked Rich to put a new download link online). This is the Groovy code I used:",
      
      "date_published": "2010-12-29T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["groovy","chemistry","rdf","json"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pa72q-ykk64",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/12/26/oscar-training-data-models-etc.html",
      "title": "Oscar: training data, models, etc",
      "content_html": "<p><a href=\"https://sourceforge.net/projects/oscar3-chem/\">Oscar</a> uses a Maximum Entropy Markov Model (MEMM) based on <a href=\"http://en.wikipedia.org/wiki/N-gram\">n-grams</a>.\nPeter Corbett has written this up (doi:<a href=\"https://doi.org/10.1186/1471-2105-9-S11-S4\">10.1186/1471-2105-9-S11-S4</a>). So, it basically is statistics\nonce more. If you really want a proper bioinformatics education, so do your PhD at a (proteo)chemometrics department.</p>\n\n<p>N-grams are word parts of n characters. For example, the trigrams of <a href=\"http://en.wikipedia.org/wiki/Acetic_acid\">acetic acid</a>\ninclude <code class=\"language-plaintext highlighter-rouge\">ace</code>, <code class=\"language-plaintext highlighter-rouge\">cid</code>, <code class=\"language-plaintext highlighter-rouge\">tic</code>, <code class=\"language-plaintext highlighter-rouge\">eti</code>, and <code class=\"language-plaintext highlighter-rouge\">aci</code>. N-grams of length four include acid, etic, and acet. The MEMM assigns weights to\nthese n-grams, and based on that decided if something is in deed a <em>named entity</em> (in Oscar terminology). For example,\nconsider the <code class=\"language-plaintext highlighter-rouge\">acet</code> n-gram: acetone should be matched, but the n-gram <code class=\"language-plaintext highlighter-rouge\">facet</code> not.</p>\n\n<p>Put this in perspective in the ongoing refactoring of the Oscar software. We are changing normalization (e.g. converting\nall unicode hyphen alternatives into one specific hyphen), updating the tokenizer (e.g. changing the list of\nnon-sentence-endings like <em>Prof.</em>). It is clear this changes the n-grams typical for chemical-like things. 
Worse,\nthe weights are tuned towards the known n-grams, and statistical models are generally a bit overtrained for the\ndata, or, at least, specific for it.</p>\n\n<p>Now, if the distribution of n-grams changes, the weights in the model need to be updated too, to not degrade\nthe model performance. So, Oscar is useless if we cannot retrain its MEMM component after a refactoring. If\nthat were impossible, we would have effectively created an <em>intellectual monopoly</em>.</p>\n\n<p>Thus, what the Oscar project needs is one or more free sets of annotated literature, which can be used to\ntrain new MEMM models. The SciBorg corpus was used to train the current Oscar3 and Oscar4 models. This data\n(copyright <a href=\"http://rsc.org/\">RSC</a>) will very likely be available under a <a href=\"http://creativecommons.org/licenses/\">Creative Commons</a>\nlicense (RSC++), but may have the NC clause, which would not be good for developing a business model around\nthe opensource Oscar (such as providing a high-performance web service via a subscription service). I have\nrecently written up <a href=\"http://chem-bla-ics.blogspot.com/2010/12/re-why-i-and-you-should-avoid-nc.html\">the problems the NC clause introduces</a>,\nand some <a href=\"http://chem-bla-ics.blogspot.com/2010/12/blog-post.html\">examples of commercial Open Source cheminformatics projects</a>.</p>\n\n<p>We need not focus only on this SciBorg data, however. In fact, we will need multiple models anyway. For\nexample, the SciBorg papers (42, if I am not mistaken) cover a particular kind of literature. So, a model trained on them\nintroduces the risk of being used to analyse papers outside its application domain. Furthermore, I am very\ninterested (and others indicated so too) in using Oscar for other languages. 
Surely, English is the major\nlanguage, but there are many use cases for an Oscar that works for other languages.</p>\n\n<p>Therefore, what we need in the Oscar project is a registry of training (and test) data, itself annotated\nwith metadata about how that data was created (what quality assurance, what kind of named entity types,\nhow many domain experts were involved, etc), test results for those data sets, etc. My time on the Oscar\nproject is almost over, and I have no clue when I will be able to invest the same amount of time into the\nproject as I did in the past three months. But the creation of this registry is a clear step that must be\ntaken in the Oscar4 development.</p>",
      "summary": "Oscar uses a Maximum Entropy Markov Model (MEMM) based on n-grams. Peter Corbett has written this up (doi:10.1186/1471-2105-9-S11-S4). So, it basically is statistics once more. If you really want a proper bioinformatics education, so do your PhD at a (proteo)chemometrics department.",
      
      "date_published": "2010-12-26T00:00:00+00:00",
      "date_modified": "2010-12-26T00:00:00+00:00",
      "tags": ["oscar","textmining"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-9-S11-S4", "doi": "10.1186/1471-2105-9-S11-S4"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8n7nt-fas57",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/12/23/status-update-on-bjoc-analysis-with_23.html",
      "title": "Status update on BJOC analysis with Oscar and ChemicalTagger #3",
      "content_html": "<p>The <a href=\"http://chem-bla-ics.blogspot.com/2010/12/status-update-on-bjoc-analysis-with_11.html\">two</a>\n<a href=\"http://chem-bla-ics.blogspot.com/2010/12/status-update-on-bjoc-analysis-with.html\">earlier</a> posts\nin this series showed screenshots of results of Oscar, but the title also promised results by Lezan’s\n<a href=\"http://www-ucc.ch.cam.ac.uk/products/software/chemicaltagger\">ChemicalTagger</a>. Sam\nhelped with getting the HTML pages online via the Cambridge Hudson installation. Where\nOscar find named entities (chemical compounds, processes, etc), ChemicalTagger finds\nroles, like solvent, acid, base, catalyst. Roles are properties of chemical compounds\nin certain situations. Ethanol is not always a solvent, sometimes it is a Xmas present.\nThe current output is not entirely where I want to go yet, but makes it easy which\nsolvents are frequently found in the BJOC corpus:</p>\n\n<p><img src=\"/assets/images/chemtag1.png\" alt=\"\" /></p>\n\n<p>This screenshot of an analysis of 15 BJOC papers shows that AcOEt (is that the\n<a href=\"http://lab.chempedia.com/questions/427/are-etoac-and-acoet-the-same\">same as EtOAc?</a>)\nis mentioned as solvent three times in <a href=\"http://www.ncbi.nlm.nih.gov/sites/ppmc/articles/PMC1399459\">PMC1399459</a>.\nBrine, however, is mentioned as solvent in three papers.</p>\n\n<p>As said, these <a href=\"https://hudson.ch.cam.ac.uk/job/oscar4-chebi/ws/target/output/bjoc.html\">two</a>\n<a href=\"https://hudson.ch.cam.ac.uk/job/oscar4-chebi/ws/target/output/roles.html\">pages</a> contain\nRDF and the tables are sortable. Hudson recompiles them automatically when I update the\nsource code to create the HTML+RDFa. So, go ahead, send me bug reports, feature requests,\nand patches!</p>",
      "summary": "The two earlier posts in this series showed screenshots of results of Oscar, but the title also promised results by Lezan’s ChemicalTagger. Sam helped with getting the HTML pages online via the Cambridge Hudson installation. Where Oscar find named entities (chemical compounds, processes, etc), ChemicalTagger finds roles, like solvent, acid, base, catalyst. Roles are properties of chemical compounds in certain situations. Ethanol is not always a solvent, sometimes it is a Xmas present. The current output is not entirely where I want to go yet, but makes it easy which solvents are frequently found in the BJOC corpus:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemtag1.png",
      "date_published": "2010-12-23T00:00:00+00:00",
      "date_modified": "2010-12-23T00:00:00+00:00",
      "tags": ["oscar","chemicaltagger","beilstein"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1860-5397-1-11", "doi": "10.1186/1860-5397-1-11"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/w1acd-1d323",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/12/03/chemwriter-google-chrome-and-many-eyes.html",
      "title": "ChemWriter, Google Chrome, and Many Eyes in Open Source",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Linus'_Law\">Linus’ law</a>:</p>\n\n<blockquote>\n  <p>given enough eyeballs, all bugs are shallow.</p>\n</blockquote>\n\n<p><a href=\"http://depth-first.com/\">Rich</a> of <a href=\"http://metamolecular.com/\">MetaMolecular</a> works on Open Source and closed source cheminformatics\nsolutions. <a href=\"http://chemwriter.com/\">ChemWriter</a> is one product he is working on which uses JavaScript and <a href=\"http://en.wikipedia.org/wiki/SVG\">SVG</a>\n(two Open Standards), and recently asked feedback on the new version. Test users found a problem on Google’s\n<a href=\"http://www.google.com/chrome\">Chrome</a> browser, and Rich then <a href=\"https://doi.org/10.59350/x4j7q-m6h98\">did something <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nthat is only possible in an Open Source environment: he downloaded the buggy product (Chrome), started looking for the cause, found it, and\nfiled a <a href=\"http://code.google.com/p/chromium/issues/detail?id=65238\">detailed bug report</a>. Just think that would have happened\nif this problem was in MS Internet Explorer…</p>\n\n<p>Well done!</p>",
      "summary": "Linus’ law:",
      
      "date_published": "2010-12-03T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["opensource"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/x4j7q-m6h98", "doi": "10.59350/x4j7q-m6h98"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v90k2-5a907",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/11/29/adding-new-dictionary-to-oscar.html",
      "title": "Adding a new dictionary to Oscar",
      "content_html": "<p>Say, you have your own dictionary of chemical compounds. For example, like your company’s list of yet-unpublished\n<a href=\"http://chembl.blogspot.com/2010/08/research-code-to-company-name-mapping.html\">internal research codes</a>. Still,\nyou want to index your local <a href=\"http://en.wikipedia.org/wiki/LISTSERV\">listserv</a> to make it easier for your\nemployees to search for particular chemistry you are working on and perhaps related to something done at\nother company sites. This is what Oscar is for.</p>\n\n<p>But, it will need to understand things like <a href=\"http://chembl.blogspot.com/p/research-code-stems.html\">UK-92,480</a>.\nThis is made possible with the Oscar4 refactorings we are currently working on. You only need to\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/28/oscar4-java-api-chemical-name.html\">register a dedicated dictionary <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nOscar4 has a default dictionary which corresponds to the dictionary used by Oscar3, and a dictionary based on\n<a href=\"http://www.ebi.ac.uk/chebi/\">ChEBI</a> (an old version) (see <a href=\"http://bitbucket.org/wwmm/oscar4/src/247b8deef001/oscar4-chemnamedict/src/main/java/uk/ac/cam/ch/wwmm/oscar/chemnamedict/core/\">this folder</a>\nin the source code repository).</p>\n\n<p>Adding a new dictionary is very straightforward: you just implement the <a href=\"http://bitbucket.org/wwmm/oscar4/src/247b8deef001/oscar4-chemnamedict/src/main/java/uk/ac/cam/ch/wwmm/oscar/chemnamedict/IChemNameDict.java\">IChemNameDict</a>\ninterface. 
This is, for example, what the OPSIN dictionary looks like:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">OpsinDictionary</span>\n<span class=\"kd\">implements</span> <span class=\"nc\">IChemNameDict</span><span class=\"o\">,</span> <span class=\"nc\">IInChIProvider</span> <span class=\"o\">{</span>\n\n  <span class=\"kd\">private</span> <span class=\"no\">URI</span> <span class=\"n\">uri</span><span class=\"o\">;</span>\n\n  <span class=\"kd\">public</span> <span class=\"nf\">OpsinDictionary</span><span class=\"o\">()</span> <span class=\"kd\">throws</span> <span class=\"nc\">URISyntaxException</span> <span class=\"o\">{</span>\n    <span class=\"k\">this</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"no\">URI</span><span class=\"o\">(</span>\n      <span class=\"s\">\"http://wwmm.cam.ac.uk/oscar/dictionariy/opsin/\"</span>\n    <span class=\"o\">);</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"c1\">// the URI is somewhat like a namespace</span>\n  <span class=\"kd\">public</span> <span class=\"no\">URI</span> <span class=\"nf\">getURI</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"n\">uri</span><span class=\"o\">;</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"c1\">// there are no stop words defined in this</span>\n  <span class=\"c1\">// dictionary</span>\n  <span class=\"kd\">public</span> <span class=\"kt\">boolean</span> <span class=\"nf\">hasStopWord</span><span class=\"o\">(</span><span class=\"nc\">String</span> <span class=\"n\">queryWord</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"kc\">false</span><span class=\"o\">;</span>\n  <span class=\"o\">}</span>\n\n  <span 
class=\"c1\">// see hasStopWord()</span>\n  <span class=\"kd\">public</span> <span class=\"nc\">Set</span> <span class=\"nf\">getStopWords</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"nc\">Collections</span><span class=\"o\">.</span><span class=\"na\">emptySet</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"c1\">// it has the name in the dictionary if the name</span>\n  <span class=\"c1\">// can be converted into an InChI</span>\n  <span class=\"kd\">public</span> <span class=\"kt\">boolean</span> <span class=\"nf\">hasName</span><span class=\"o\">(</span><span class=\"nc\">String</span> <span class=\"n\">queryName</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"nf\">getInChI</span><span class=\"o\">(</span><span class=\"n\">queryName</span><span class=\"o\">).</span><span class=\"na\">size</span><span class=\"o\">()</span> <span class=\"o\">!=</span> <span class=\"mi\">0</span><span class=\"o\">;</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"c1\">// this dictionary can return InChIs for names</span>\n  <span class=\"c1\">// so, it implements the IInChIProvider interface</span>\n  <span class=\"kd\">public</span> <span class=\"nc\">Set</span> <span class=\"nf\">getInChI</span><span class=\"o\">(</span><span class=\"nc\">String</span> <span class=\"n\">queryName</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"k\">try</span> <span class=\"o\">{</span>\n      <span class=\"nc\">NameToStructure</span> <span class=\"n\">nameToStructure</span> <span class=\"o\">=</span>\n        <span class=\"nc\">NameToStructure</span><span class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">();</span>\n      <span class=\"nc\">OpsinResult</span> <span class=\"n\">result</span> <span class=\"o\">=</span> <span class=\"n\">nameToStructure</span>\n        <span 
class=\"o\">.</span><span class=\"na\">parseChemicalName</span><span class=\"o\">(</span>\n          <span class=\"n\">queryName</span><span class=\"o\">,</span> <span class=\"kc\">false</span>\n        <span class=\"o\">);</span>\n      <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">result</span><span class=\"o\">.</span><span class=\"na\">getStatus</span><span class=\"o\">()</span>\n          <span class=\"o\">==</span> <span class=\"no\">OPSIN_RESULT_STATUS</span><span class=\"o\">.</span><span class=\"na\">SUCCESS</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n        <span class=\"nc\">Set</span> <span class=\"n\">inchis</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">HashSet</span><span class=\"o\">();</span>\n        <span class=\"nc\">String</span> <span class=\"n\">inchi</span> <span class=\"o\">=</span> <span class=\"nc\">NameToInchi</span>\n          <span class=\"o\">.</span><span class=\"na\">convertResultToInChI</span><span class=\"o\">(</span>\n            <span class=\"n\">result</span><span class=\"o\">,</span> <span class=\"kc\">false</span>\n          <span class=\"o\">);</span>\n        <span class=\"n\">inchis</span><span class=\"o\">.</span><span class=\"na\">add</span><span class=\"o\">(</span><span class=\"n\">inchi</span><span class=\"o\">);</span>\n        <span class=\"k\">return</span> <span class=\"n\">inchis</span><span class=\"o\">;</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">}</span> <span class=\"k\">catch</span> <span class=\"o\">(</span><span class=\"nc\">NameToStructureException</span> <span class=\"n\">e</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n      <span class=\"n\">e</span><span class=\"o\">.</span><span class=\"na\">printStackTrace</span><span class=\"o\">();</span>   \n    <span class=\"o\">}</span>\n    <span class=\"k\">return</span> <span class=\"nc\">Collections</span><span class=\"o\">.</span><span 
class=\"na\">emptySet</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"kd\">public</span> <span class=\"nc\">String</span> <span class=\"nf\">getInChIforShortestSMILES</span><span class=\"o\">(</span>\n    <span class=\"nc\">String</span> <span class=\"n\">queryName</span><span class=\"o\">)</span>\n  <span class=\"o\">{</span>\n    <span class=\"nc\">Set</span> <span class=\"n\">inchis</span> <span class=\"o\">=</span> <span class=\"n\">getInChI</span><span class=\"o\">(</span><span class=\"n\">queryName</span><span class=\"o\">);</span>\n    <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">inchis</span><span class=\"o\">.</span><span class=\"na\">size</span><span class=\"o\">()</span> <span class=\"o\">==</span> <span class=\"mi\">0</span><span class=\"o\">)</span> <span class=\"k\">return</span> <span class=\"kc\">null</span><span class=\"o\">;</span>\n    <span class=\"k\">return</span> <span class=\"n\">inchis</span><span class=\"o\">.</span><span class=\"na\">iterator</span><span class=\"o\">().</span><span class=\"na\">next</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"c1\">// since names are converted on the fly, we do</span>\n  <span class=\"c1\">// not enumerate them</span>\n  <span class=\"kd\">public</span> <span class=\"nc\">Set</span> <span class=\"nf\">getNames</span><span class=\"o\">(</span><span class=\"nc\">String</span> <span class=\"n\">inchi</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"nc\">Collections</span><span class=\"o\">.</span><span class=\"na\">emptySet</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n  <span class=\"kd\">public</span> <span class=\"nc\">Set</span> <span class=\"nf\">getNames</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"nc\">Collections</span><span class=\"o\">.</span><span 
class=\"na\">emptySet</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n  <span class=\"kd\">public</span> <span class=\"nc\">Set</span> <span class=\"nf\">getOrphanNames</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"nc\">Collections</span><span class=\"o\">.</span><span class=\"na\">emptySet</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n  <span class=\"kd\">public</span> <span class=\"nc\">Set</span> <span class=\"nf\">getChemRecords</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"nc\">Collections</span><span class=\"o\">.</span><span class=\"na\">emptySet</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n  <span class=\"kd\">public</span> <span class=\"kt\">boolean</span> <span class=\"nf\">hasOntologyIdentifier</span><span class=\"o\">(</span>\n    <span class=\"nc\">String</span> <span class=\"n\">identifier</span><span class=\"o\">)</span>\n  <span class=\"o\">{</span>\n    <span class=\"c1\">// this dictionary does not use ontology</span>\n    <span class=\"c1\">// identifiers</span>\n    <span class=\"k\">return</span> <span class=\"kc\">false</span><span class=\"o\">;</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>Now, you can implement the interface in various ways. You can even have the implementation hook into a SQL database\nwith JDBC, or use something else fancy. The dictionary will be used at various steps of the Oscar4 text analysis\nworkflow.</p>\n\n<p>Mind you, the refactoring is not over yet, and the details may change here and there.</p>\n\n<p>Your comments are most welcome!</p>",
      "summary": "Say, you have your own dictionary of chemical compounds. For example, like your company’s list of yet-unpublished internal research codes. Still, you want to index your local listserv to make it easier for your employees to search for particular chemistry you are working on and perhaps related to something done at other company sites. This is what Oscar is for.",
      
      "date_published": "2010-11-29T00:00:00+00:00",
      "date_modified": "2025-03-05T00:00:00+00:00",
      "tags": ["oscar","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jsfck-t351",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/11/18/oscar4-command-line-utilities.html",
      "title": "Oscar4 command line utilities",
      "content_html": "<p>One goal of my three month project is to take Oscar4 to the community. We want to get it used more, and we need\na larger development community. Oscar4 and the related technologies do a good, sometimes excellent, job, but\nhave to be maintained, just like any other piece of code. To make using it easier, we are developing new APIs,\nas well as two user-oriented applications: <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/21/oscar-text-mining-in-taverna.html\">a Taverna 2 plugin <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand command line utilities. The <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/28/oscar4-java-api-chemical-name.html\">Oscar4 Java API <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nhas slightly evolved in the last three weeks, removing some complexity. In this post, I will introduce the command\nline utilities.</p>\n\n<h2 id=\"oscar4\">Oscar4</h2>\n\n<p>Most people will be mostly interested into the full Oscar4 program, to extract chemical entities. Oscar3 was\nalso capable of extracting data (like <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html\">NMR spectra <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nbut that is not yet being ported. The OscarCLI program takes input, extracts chemicals, and where possible resolves\nthem into connection tables (viz. InChI).</p>\n\n<p>To extract chemicals from a line of text (e.g. 
<em>“This is propane.”</em>, you do:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>java <span class=\"nt\">-cp</span> oscar4-cli-4.0-SNAPSHOT.jar <span class=\"se\">\\</span>\n  uk.ac.cam.ch.wwmm.oscar.oscarcli.OscarCLI <span class=\"se\">\\</span>\n  This is propane.\npropane: <span class=\"nv\">InChI</span><span class=\"o\">=</span>1/C3H8/c1-3-2/h3H2,1-2H3\n</code></pre></div></div>\n\n<p>For larger chunks of texts it is easier to route it via <a href=\"http://en.wikipedia.org/wiki/Standard_streams\">stdin</a>,\nfor which we can use the <code class=\"language-plaintext highlighter-rouge\">-stdin</code> option:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">echo</span> <span class=\"s2\">\"This is propane.\"</span> | <span class=\"se\">\\</span>\n  java <span class=\"nt\">-cp</span> oscar4-cli-4.0-SNAPSHOT.jar <span class=\"se\">\\</span>\n  uk.ac.cam.ch.wwmm.oscar.oscarcli.OscarCLI <span class=\"se\">\\</span>\n  <span class=\"nt\">-stdin</span>\npropane: <span class=\"nv\">InChI</span><span class=\"o\">=</span>1/C3H8/c1-3-2/h3H2,1-2H3\n</code></pre></div></div>\n\n<p>That way, we can easily process large plain text files (output omitted):</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">cat </span>largeFile.txt | <span class=\"se\">\\</span>\n  java <span class=\"nt\">-cp</span> oscar4-cli-4.0-SNAPSHOT.jar <span class=\"se\">\\</span>\n  uk.ac.cam.ch.wwmm.oscar.oscarcli.OscarCLI <span class=\"se\">\\</span>\n  <span class=\"nt\">-stdin</span>\n</code></pre></div></div>\n\n<p>If you prefer RDF output, for further integration, use the <code class=\"language-plaintext highlighter-rouge\">-output text/turtle</code>:</p>\n\n<div class=\"language-shell 
highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">cat </span>largeFile.txt | <span class=\"se\">\\</span>\n  java <span class=\"nt\">-cp</span> oscar4-cli-4.0-SNAPSHOT.jar <span class=\"se\">\\</span>\n  uk.ac.cam.ch.wwmm.oscar.oscarcli.OscarCLI <span class=\"se\">\\</span>\n  <span class=\"nt\">-stdin</span> <span class=\"nt\">-output</span> text/turtle\n</code></pre></div></div>\n\n<p>This returns RDF using the <a href=\"http://code.google.com/p/semanticchemistry/\">CHEMINF</a> ontology like:</p>\n\n<div class=\"language-turtle highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">@prefix</span><span class=\"w\"> </span><span class=\"nn\">dc:</span><span class=\"w\">  </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"kd\">@prefix</span><span class=\"w\"> </span><span class=\"nn\">rdfs:</span><span class=\"w\">  </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"kd\">@prefix</span><span class=\"w\"> </span><span class=\"nn\">ex:</span><span class=\"w\">  </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"kd\">@prefix</span><span class=\"w\"> </span><span class=\"nn\">cheminf:</span><span class=\"w\">  </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"kd\">@prefix</span><span class=\"w\"> </span><span class=\"nn\">sio:</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n\n</span><span class=\"nn\">ex:</span><span class=\"n\">entity0\n</span><span class=\"w\">  </span><span class=\"nn\">rdfs:</span><span class=\"n\">subClassOf</span><span class=\"w\"> </span><span class=\"nn\">cheminf:</span><span class=\"n\">CHEMINF_000000</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nn\">dc:</span><span class=\"n\">label</span><span class=\"w\"> </span><span class=\"s\">\"propane\"</span><span 
class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nn\">cheminf:</span><span class=\"n\">CHEMINF_000200</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\">\n    </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">cheminf:</span><span class=\"n\">CHEMINF_000113</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">sio:</span><span class=\"n\">SIO_000300</span><span class=\"w\"> </span><span class=\"s\">\"InChI=1/C3H8/c1-3-2/h3H2,1-2H3\"</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"p\">]</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>We can also use <a href=\"http://jericho.htmlparser.net/docs/index.html\">Jericho</a> to extract text from HTML pages, which is made\navailable with the <code class=\"language-plaintext highlighter-rouge\">-html</code> option. For example, we can pull in a <a href=\"http://www.beilstein-journals.org/bjoc/\">Beilstein Journal of Organic Chemistry</a>\npaper with <a href=\"http://en.wikipedia.org/wiki/Wget\">wget</a>:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>wget <span class=\"nt\">-qO-</span> https://doi.org/10.3762/bjoc.6.122 | <span class=\"se\">\\</span>\n  java <span class=\"nt\">-cp</span> oscar4-cli-4.0-SNAPSHOT.jar <span class=\"se\">\\</span>\n  uk.ac.cam.ch.wwmm.oscar.oscarcli.OscarCLI <span class=\"se\">\\</span>\n  <span class=\"nt\">-stdin</span> <span class=\"nt\">-html</span>\n</code></pre></div></div>\n\n<p>This will return 271 chemical entities recognized in the text, matching 48 unique chemical structures.</p>",
      "summary": "One goal of my three month project is to take Oscar4 to the community. We want to get it used more, and we need a larger development community. Oscar4 and the related technologies do a good, sometimes excellent, job, but have to be maintained, just like any other piece of code. To make using it easier, we are developing new APIs, as well as two user-oriented applications: a Taverna 2 plugin , and command line utilities. The Oscar4 Java API has slightly evolved in the last three weeks, removing some complexity. In this post, I will introduce the command line utilities.",
      
      "date_published": "2010-11-18T00:00:00+00:00",
      "date_modified": "2025-03-05T00:00:00+00:00",
      "tags": ["oscar","textmining","beilstein"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/npbqm-gfa49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html",
      "title": "CiteULike CiTO Use Case #1: Wordles",
      "content_html": "<p>Last month I reported a <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html\">few things I missed <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin <a href=\"http://www.citeulike.org/\">CiteULike</a>. One of them was support for CiTO (see\ndoi:<a href=\"https://doi.org/10.1186/2041-1480-1-S1-S6\">10.1186/2041-1480-1-S1-S6</a>), a great Citation Typing Ontology.</p>\n\n<p>I promised the CiTO author, <a href=\"http://www.zoo.ox.ac.uk/staff/academics/shotton_dm.htm\">David</a>, my use cases, but have been horribly\nbusy in the past few weeks with my new position, wrapping up my past position, and thinking on my position after Cambridge. But finally, here it is. Based on source code I\n<a href=\"http://github.com/egonw/groovy-citeulike\">wrote and released earlier</a>, the first use case I represent is the\n<a href=\"http://www.wordle.net/\">Wordle</a> one, which I <a href=\"http://chem-bla-ics.blogspot.com/2010/02/wordle-of-titles-of-20-most-recent.html\">showed with manual work in February</a>.</p>\n\n<p>Now that all the data is semantically marked up in CiteULike, I can easily extract all paper titles (or whatever is available in CiteULike) for all papers that cite the first\n<a href=\"http://cdk.sf.net/\">CDK</a> paper (doi:<a href=\"http://dx.doi.org/10.1021/ci025584y\">10.1021/ci025584y</a>). 
Using the JSON interface, I have\n<a href=\"http://github.com/egonw/groovy-citeulike/blob/master/cul2wordleInput.groovy\">this Groovy script</a> to extract all titles:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"nn\">groovyx.net.http.HTTPBuilder</span>\n<span class=\"kn\">import</span> <span class=\"nn\">groovyx.net.http.Method</span>\n<span class=\"kn\">import</span> <span class=\"nn\">static</span> <span class=\"n\">groovyx</span><span class=\"o\">.</span><span class=\"na\">net</span><span class=\"o\">.</span><span class=\"na\">http</span><span class=\"o\">.</span><span class=\"na\">ContentType</span><span class=\"o\">.</span><span class=\"na\">JSON</span>\n\n<span class=\"n\">culUrl</span> <span class=\"o\">=</span> <span class=\"s2\">\"http://www.citeulike.org/\"</span><span class=\"o\">;</span>\n\n<span class=\"n\">citotags</span> <span class=\"o\">=</span> <span class=\"o\">[</span>\n  <span class=\"s2\">\"cito--cites\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"cito--usesMethodIn\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"cito--discusses\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"cito--extends\"</span>\n<span class=\"c1\">// there are more, but these are all</span>\n<span class=\"c1\">// I use right now</span>\n<span class=\"o\">]</span>\n\n<span class=\"n\">papers</span> <span class=\"o\">=</span> <span class=\"o\">[</span>\n  <span class=\"s2\">\"1073448\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"423382\"</span>\n<span class=\"o\">]</span>\n\n<span class=\"n\">http</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">HTTPBuilder</span><span class=\"o\">(</span><span class=\"n\">culUrl</span><span class=\"o\">)</span>\n\n<span class=\"n\">papers</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span 
class=\"n\">paper</span> <span class=\"o\">-&gt;</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"# Processing $paper...\"</span>\n  <span class=\"n\">citotags</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span class=\"n\">tag</span> <span class=\"o\">-&gt;</span>\n    <span class=\"n\">citation</span> <span class=\"o\">=</span> <span class=\"s2\">\"$tag--$paper\"</span><span class=\"o\">.</span><span class=\"na\">toLowerCase</span><span class=\"o\">()</span>\n    <span class=\"n\">http</span><span class=\"o\">.</span><span class=\"na\">request</span><span class=\"o\">(</span><span class=\"n\">Method</span><span class=\"o\">.</span><span class=\"na\">valueOf</span><span class=\"o\">(</span><span class=\"s2\">\"GET\"</span><span class=\"o\">),</span> <span class=\"n\">JSON</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n      <span class=\"n\">uri</span><span class=\"o\">.</span><span class=\"na\">path</span> <span class=\"o\">=</span> <span class=\"s2\">\"/json/user/egonw/tag/$citation\"</span>\n\n      <span class=\"n\">response</span><span class=\"o\">.</span><span class=\"na\">success</span> <span class=\"o\">=</span> <span class=\"o\">{</span> <span class=\"n\">resp</span><span class=\"o\">,</span><span class=\"n\">json</span> <span class=\"o\">-&gt;</span>\n        <span class=\"n\">json</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span class=\"n\">article</span> <span class=\"o\">-&gt;</span>\n          <span class=\"n\">tripleCount</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"o\">;</span>\n          <span class=\"n\">article</span><span class=\"o\">.</span><span class=\"na\">tags</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span class=\"n\">artTag</span> <span class=\"o\">-&gt;</span>\n            <span class=\"k\">if</span> <span class=\"o\">(</span><span 
class=\"n\">artTag</span><span class=\"o\">.</span><span class=\"na\">startsWith</span><span class=\"o\">(</span><span class=\"n\">tag</span><span class=\"o\">))</span> <span class=\"n\">tripleCount</span><span class=\"o\">++</span>\n          <span class=\"o\">}</span>\n          <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">tripleCount</span> <span class=\"o\">&gt;</span> <span class=\"mi\">0</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n            <span class=\"n\">title</span> <span class=\"o\">=</span> <span class=\"n\">article</span><span class=\"o\">.</span><span class=\"na\">title</span>\n            <span class=\"n\">title</span> <span class=\"o\">=</span> <span class=\"n\">title</span><span class=\"o\">.</span><span class=\"na\">replaceAll</span><span class=\"o\">(</span><span class=\"s2\">\"\\\\{\"</span><span class=\"o\">,</span><span class=\"s2\">\"\"</span><span class=\"o\">)</span>\n            <span class=\"n\">title</span> <span class=\"o\">=</span> <span class=\"n\">title</span><span class=\"o\">.</span><span class=\"na\">replaceAll</span><span class=\"o\">(</span><span class=\"s2\">\"\\\\}\"</span><span class=\"o\">,</span><span class=\"s2\">\"\"</span><span class=\"o\">)</span>\n            <span class=\"n\">println</span> <span class=\"s2\">\"$title\"</span>\n          <span class=\"o\">}</span>\n        <span class=\"o\">}</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">}</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The output is two blocks which I can easily copy/paste into Wordle. 
Now, I think I heard one can actually download the Java code, so I am tempted to integrate it later,\nbut for now copy/paste will do fine; after all, the data handling is mostly automated: with a few extra lines I can make such visualizations for any paper\nI annotated in CiteULike with CiTO.</p>\n\n<p><strong>The CDK I paper</strong></p>\n\n<p><img src=\"/assets/images/wordleCDK1.png\" alt=\"\" /></p>\n\n<p><strong>The CDK II paper</strong></p>\n\n<p><img src=\"/assets/images/wordleCDK2.png\" alt=\"\" /></p>\n\n<p>Interesting differences… more statistics will soon follow. See <a href=\"http://chem-bla-ics.blogspot.com/2010/02/further-statistics-on-papers-citing-cdk.html\">Further statistics on the papers citing the CDK</a>\nfor the kind of analyses I have in mind.</p>",
      "summary": "Last month I reported a few things I missed in CiteULike. One of them was support for CiTO (see doi:10.1186/2041-1480-1-S1-S6), a great Citation Typing Ontology.",
      
      "date_published": "2010-10-31T00:00:00+00:00",
      "date_modified": "2025-02-23T00:00:00+00:00",
      "tags": ["cito","citeulike","cdk","wordle"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/2041-1480-1-S1-S6", "doi": "10.1186/2041-1480-1-S1-S6"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI025584Y", "doi": "10.1021/CI025584Y"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/866tq-qv177",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/10/28/oscar4-java-api-chemical-name.html",
      "title": "Oscar4 Java API: chemical name dictionaries",
      "content_html": "<p>Besides getting Oscar used by <a href=\"http://www.ebi.ac.uk/chebi/\">ChEBI</a> (hopefully <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/21/oscar-text-mining-in-taverna.html\">via Taverna <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nmy main task in <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/15/working-on-oscar-for-three-months.html\">my three month Oscar project <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nis to refactor things to make it more modular, and remove some features no longer needed (e.g. an automatically created workspace environment).\nClearly, I need to define a lot of <a href=\"http://chem-bla-ics.blogspot.com/2010/10/multiple-unit-test-inheritance-with.html\">new unit tests</a>\nto ensure my assumptions on how to code works are valid.</p>\n\n<p>So, what are the API requirements set out? These include (but are not limited to):</p>\n\n<ul>\n  <li>have reasonable defaults</li>\n  <li>being able to add custom dictionaries</li>\n  <li>easily change the chemical entity recogniser</li>\n  <li>plugin text normalization (see <a href=\"https://blogs.ch.cam.ac.uk/pmr/2010/10/24/the-absolute-minimum-every-scientist-with-data-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/\">Peter’s post on UNICODE <i class=\"fa-solid fa-recycle fa-xs\"></i></a>)</li>\n</ul>\n\n<p>This week I worked on the dictionary refactoring, and talked with Lezan about the <a href=\"http://www-ucc.ch.cam.ac.uk/products/software/chemicaltagger\">ChemicalTagger</a>\nand trying to get this based on the newer Oscar code (I think we’ll be able to finish that today). 
So, I cleaned up\nsome code I wrote in the first week, and introduced <a href=\"https://bitbucket.org/wwmm/oscar4/src/bf79fd11045c/oscar4-api/src/main/java/uk/ac/cam/ch/wwmm/oscar/Oscar.java\">an Oscar class</a>\nproviding a Java API to the Oscar functionality.</p>\n\n<p>So, to get started with Oscar in your application, you only need to do:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Oscar</span> <span class=\"n\">oscar</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Oscar</span><span class=\"o\">(</span>\n  <span class=\"k\">this</span><span class=\"o\">.</span><span class=\"na\">getClass</span><span class=\"o\">().</span><span class=\"na\">getClassLoader</span><span class=\"o\">()</span>\n<span class=\"o\">);</span>\n<span class=\"n\">oscar</span><span class=\"o\">.</span><span class=\"na\">loadDefaultDictionaries</span><span class=\"o\">();</span>\n<span class=\"nc\">Map</span><span class=\"o\">&lt;</span><span class=\"nc\">NamedEntity</span><span class=\"o\">,</span><span class=\"nc\">String</span><span class=\"o\">&gt;</span> <span class=\"n\">structures</span> <span class=\"o\">=</span>\n  <span class=\"n\">oscar</span><span class=\"o\">.</span><span class=\"na\">getNamedEntities</span><span class=\"o\">(</span>\n    <span class=\"s\">\"Ingredients: acetic acid, water.\"</span>\n  <span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>The ClassLoader is needed because the Oscar class will not generally know how to load custom classes.</p>\n\n<p>You can add additional dictionaries by implementing the <a href=\"https://bitbucket.org/wwmm/oscar4/src/tip/oscar4-chemnamedict/src/main/java/uk/ac/cam/ch/wwmm/oscar/chemnamedict/IChemNameDict.java\">IChemNameDict</a>\ninterface and one or more of <a 
href=\"https://bitbucket.org/wwmm/oscar4/src/tip/oscar4-chemnamedict/src/main/java/uk/ac/cam/ch/wwmm/oscar/chemnamedict/IInChIProvider.java\">IInChIProvider</a>,\n<a href=\"https://bitbucket.org/wwmm/oscar4/src/tip/oscar4-chemnamedict/src/main/java/uk/ac/cam/ch/wwmm/oscar/chemnamedict/ISMILESProvider.java\">ISMILESProvider</a>,\nand <a href=\"https://bitbucket.org/wwmm/oscar4/src/tip/oscar4-chemnamedict/src/main/java/uk/ac/cam/ch/wwmm/oscar/chemnamedict/ICMLProvider.java\">ICMLProvider</a>.\nFor example, adding the OPSIN dictionary would extend the above code to:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Oscar</span> <span class=\"n\">oscar</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Oscar</span><span class=\"o\">(</span>\n  <span class=\"k\">this</span><span class=\"o\">.</span><span class=\"na\">getClass</span><span class=\"o\">().</span><span class=\"na\">getClassLoader</span><span class=\"o\">()</span>\n<span class=\"o\">);</span>\n<span class=\"n\">oscar</span><span class=\"o\">.</span><span class=\"na\">loadDefaultDictionaries</span><span class=\"o\">();</span>\n<span class=\"n\">oscar</span><span class=\"o\">.</span><span class=\"na\">getChemNameDict</span><span class=\"o\">().</span><span class=\"na\">register</span><span class=\"o\">(</span>\n  <span class=\"k\">new</span> <span class=\"nf\">OpsinDictionary</span><span class=\"o\">()</span>\n<span class=\"o\">);</span>\n<span class=\"nc\">Map</span><span class=\"o\">&lt;</span><span class=\"nc\">NamedEntity</span><span class=\"o\">,</span><span class=\"nc\">String</span><span class=\"o\">&gt;</span> <span class=\"n\">structures</span> <span class=\"o\">=</span>\n  <span class=\"n\">oscar</span><span class=\"o\">.</span><span class=\"na\">getNamedEntities</span><span class=\"o\">(</span>\n    <span class=\"s\">\"Ingredients: acetic acid, water.\"</span>\n  <span 
class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>And, I think the <code class=\"language-plaintext highlighter-rouge\">oscar.getChemNameDict()</code> method will be renamed to something like <code class=\"language-plaintext highlighter-rouge\">oscar.getDictionaryRegistry()</code> really soon.</p>",
      "summary": "Besides getting Oscar used by ChEBI (hopefully via Taverna), my main task in my three-month Oscar project is to refactor the code to make it more modular, and remove some features no longer needed (e.g. an automatically created workspace environment). Clearly, I need to define a lot of new unit tests to ensure my assumptions about how the code works are valid.",
      
      "date_published": "2010-10-28T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["oscar","java","chebi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/drfp6-c5p44",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/10/22/cb-new-blogs-14.html",
      "title": "Cb: New Blogs #14",
      "content_html": "<p>Just a few new blogs since <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/07/15/cb-new-blogs-13.html\">#13 in July <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<ul>\n  <li><a href=\"https://communities.acs.org/groups/chemical-abstracts-service-committee/blog\">Chemical Abstracts Service Committee (CCAS)</a>\n  (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=257\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://agilemolecule.wordpress.com/\">Agilemolecule’s Blog</a>\n  (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=258\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://allotrope.fieldofscience.com/\">The Allotrope</a>\n  (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=259\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n</ul>\n\n<p>If you know good chemistry blogs, please contact the author and ask them to email me for inclusion.</p>",
      "summary": "Just a few new blogs since #13 in July :",
      
      "date_published": "2010-10-22T00:00:00+00:00",
      "date_modified": "2026-01-13T00:00:00+00:00",
      "tags": ["cb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7njvw-s6q24",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/10/21/oscar-text-mining-in-taverna.html",
      "title": "Oscar text mining in Taverna",
      "content_html": "<p>One of the goals of my <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/15/working-on-oscar-for-three-months.html\">project in Cambridge <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nis to make <a href=\"http://oscar3-chem.sourceforge.net/\">Oscar</a> available as <a href=\"http://taverna.sf.net/\">Taverna</a> plugin\n(<a href=\"https://bitbucket.org/egonw/oscar4-taverna\">source code</a>, <a href=\"https://hudson.ch.cam.ac.uk/job/oscar4-taverna/\">Hudson build</a>).\nI have progressed somewhat, but still struggling with getting the update site working. The plugin actually installs into\n<a href=\"http://www.mygrid.org.uk/2010/07/taverna-220-workbench-and-command-line-tool-are-released/\">Taverna 2.2.0</a>, but the\nactivities do not show up. While this is work in progress, and the other project goal is refactoring, a current demo\nworkflow looks like:</p>\n\n<p><img src=\"/assets/images/oscarTaverna.png\" alt=\"\" /></p>\n\n<p>Example input would be: <em>This is a list of ethanol, methanol, and 2,4,6-trinitrotoluene.</em></p>\n\n<p>The plain text input can be linked to the pdf2text <a href=\"http://www.slideshare.net/markmoby/sadi-in-taverna-tutorial\">SADI service</a>,\nand the CML is suitable for the <a href=\"http://chem-bla-ics.blogspot.com/2010/03/cdk-taverna-paper-published.html\">CDK-Taverna plugin</a>,\nwhich is currently being updated by Andreas, Achim, and <a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a> for\nTaverna 2.2. As soon as the update site is properly working, I will upload a demo workflow to\n<a href=\"http://www.myexperiment.org/\">MyExperiment.org</a>.</p>\n\n<p>I guess the first next activity (node in the workflow) will be around the dictionaries, as the\n<a href=\"http://opsin.ch.cam.ac.uk/\">OPSIN</a> activity converts only IUPAC names into connection tables. I was told OPSIN parses 97%\nof the IUPAC names it finds, and when it does, it does almost 100% correct. 
Want to challenge the code?\nUse <a href=\"http://opsin.ch.cam.ac.uk/\">this web service</a>.</p>",
      "summary": "One of the goals of my project in Cambridge is to make Oscar available as a Taverna plugin (source code, Hudson build). I have progressed somewhat, but am still struggling with getting the update site working. The plugin actually installs into Taverna 2.2.0, but the activities do not show up. While this is work in progress, and the other project goal is refactoring, a current demo workflow looks like:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/oscarTaverna.png",
      "date_published": "2010-10-21T00:00:00+00:00",
      "date_modified": "2025-03-05T00:00:00+00:00",
      "tags": ["oscar","taverna"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bwg03-1ey37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/10/15/working-on-oscar-for-three-months.html",
      "title": "Working on Oscar for three months",
      "content_html": "<p>As Peter <a href=\"https://blogs.ch.cam.ac.uk/pmr/2010/10/11/update-and-real-excitement/\">announced <i class=\"fa-solid fa-recycle fa-xs\"></i></a> in his blog, and I tweeted earlier, I have started as postdoctoral\nresearch associate in <a href=\"http://www-pmr.ch.cam.ac.uk/wiki/Main_Page\">Peter’s group</a> at the <a href=\"http://www.cam.ac.uk/\">University of Cambridge</a>,\nto work the next three months on <a href=\"https://oscar3-chem.sf.net\">Oscar</a>, a chemical text mining tool. My tasks will focus on programmatical\nplumbing instead of method development, and I am aiming at integration with <a href=\"http://cdktaverna.wordpress.com/installing-cdk-taverna/\">CDK-Taverna</a>\n(see doi:<a href=\"http://dx.doi.org/10.1186/1471-2105-11-159\">10.1186/1471-2105-11-159</a>, and which is currently being ported to\n<a href=\"http://www.taverna.org.uk/\">Taverna 2.2</a> by Andreas). <a href=\"http://sea36.blogspot.com/\">Sam</a> and Lezan having been working on the refactoring\nas well, and will help me out with the gory details of the current code.</p>\n\n<p>The source code of Oscar4 is available from <a href=\"https://bitbucket.org/wwmm/oscar4\">this BitBucket project</a>, and you can monitor the code\nstate on <a href=\"https://hudson.ch.cam.ac.uk/job/oscar4/\">this Hudson page</a>. The project I will be working on, is in collaboration with the\n<a href=\"http://www.ebi.ac.uk/chebi/\">ChEBI</a> project, and today we met up with various people in the group, and set out some really interesting\nuse cases.</p>",
      "summary": "As Peter announced in his blog, and I tweeted earlier, I have started as postdoctoral research associate in Peter’s group at the University of Cambridge, to work the next three months on Oscar, a chemical text mining tool. My tasks will focus on programmatical plumbing instead of method development, and I am aiming at integration with CDK-Taverna (see doi:10.1186/1471-2105-11-159, and which is currently being ported to Taverna 2.2 by Andreas). Sam and Lezan having been working on the refactoring as well, and will help me out with the gory details of the current code.",
      
      "date_published": "2010-10-15T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["oscar","textmining","chebi"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-11-159", "doi": "10.1186/1471-2105-11-159"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g2ds0-81a33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html",
      "title": "A list of things I miss in CiteULike",
      "content_html": "<p>AJCann posted a blog today about what <a href=\"http://scienceoftheinvisible.blogspot.com/2010/09/long-list-of-things-i-dont-like-about.html\">he doesn’t like about Mendeley</a>.\nAbhishek replied that he does not like people complain about one tool, instead of pointing out a good alternative.\n<a href=\"http://www.mendeley.com/\">Mendeley</a> has two alternatives, <a href=\"http://www.zotero.org/\">Zotero</a> and <a href=\"http://www.citeulike.org/\">CiteULike</a> (there is also\n<a href=\"http://connotea.org/\">Connotea</a>, but got behind in evolution).</p>\n\n<p>Agreeing with <a href=\"http://twitter.com/citeulike\">@citeulike</a> and <a href=\"http://twitter.com/abhishektiwari\">@abhishektiwari</a>, as a service provider\nany bad news is good news too: they provide opportunities to improve. So, as encouraged to do so, I reported my long list of things I miss in CiteULike:</p>\n\n<ul>\n  <li>@citeulike ok, one more. wish #18: get readermeter.org to also support citeulike</li>\n  <li>@citeulike wish #17: allow people linking between papers in their libs using CiTO to annotate how they cite papers, see http://ur.ly/lBUO</li>\n  <li>@citeulike wish #16: I think I saw images from some papers, right? 
how about doing that for #biomedcentral journals too?</li>\n  <li>@citeulike wish #15: at the same http://ur.ly/lIGn page, the tag cloud should reflect tag use with font sizing</li>\n  <li>@citeulike wish #14: upon ‘post url’, the first page with extracted information should allow marking as ‘I am author’ (cannot find that)</li>\n  <li>@citeulike (new) wish #12: clicking an account name should get me to a public portal, rather than just his paper list</li>\n  <li>@citeulike good point, wish #13: be more strong on requiring people to tag papers… and use article keywords as default tags</li>\n  <li>@citeulike wish #11: remove ‘no-tag’ from tag clouds</li>\n  <li>@citeulike wish #10: support #RDF export with BIBO and/or PRISM</li>\n  <li>@citeulike wish #9: use #foaf for the RDFa for account pages, and to mark up friends</li>\n  <li>@citeulike wish #8: and more generally, make #citeulike part of the #linkeddata network (provide an #rdf API)</li>\n  <li>@citeulike wish #7: start using RDFa, e.g. with the PRISM ontology</li>\n  <li>@citeulike wish #6: on an article page (like http://ur.ly/lvWk) summarize the network that bookmarked that article, not just the acc names</li>\n  <li>@citeulike wish #5: don’t show the ‘copy’ button for papers that are already in my archive (really a bug)</li>\n  <li>@citeulike indeed, but don’t or do it right… wish #4: allow people to have that link automatically point to an external blog</li>\n  <li>@citeulike wish #3: provide summaries of lists, like article count per journal and article count per year</li>\n  <li>@citeulike well, I’ll use the blog functionality to summarize… wish #2: do not try to be a blogging platform</li>\n  <li>@citeulike (new) wish #1: put automatically focus on text field after clicking search and select all text for easy deletion</li>\n</ul>\n\n<p>The reports are now also available in the <a href=\"http://www.citeulike.org/groupfunc/3124/forums\">fora of CiteULike</a>.</p>",
      "summary": "AJCann posted a blog today about what he doesn’t like about Mendeley. Abhishek replied that he does not like people complaining about one tool instead of pointing out a good alternative. Mendeley has two alternatives, Zotero and CiteULike (there is also Connotea, but it has fallen behind).",
      
      "date_published": "2010-09-17T00:00:00+00:00",
      "date_modified": "2010-09-17T00:00:00+00:00",
      "tags": ["cito","citeulike"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/832vn-qwh10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/14/molecular-chemometrics-principles-3.html",
      "title": "The Molecular Chemometrics Principles #3: stand on shoulders",
      "content_html": "<p>I have blogged about two Molecular Chemometrics principles so far:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">McPrinciple #1: access to data</a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/12/molecular-chemometrics-principles-2-be.html\">McPrinciple #2: be clear in what you mean</a></li>\n</ul>\n\n<p>Peter’s post <a href=\"https://blogs.ch.cam.ac.uk/pmr/2010/08/14/solo10-green-chain-reaction-where-to-store-the-data-dsr-ir-biotorrent-okf-or/\">#solo10: Green Chain Reaction; where to store the data? DSR? IR? BioTorrent, OKF or ??? <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ngives me enough basis to write up a third principle:</p>\n\n<p><strong>Molecular Chemometrics Principles #3</strong>: We make scientific progress if we build on past achievements.</p>\n\n<p>Sounds logical, right? Practically, the way we share our cheminformatics knowledge makes this standing on shoulders pretty difficult.\nBut there is one particular aspect I would like to ask your attention for: you can contribute by making clear what shoulders\nyou would like to stand on. 
That is, where do you prefer to put your effort, and what message would you like to give to your user community.</p>\n\n<p>In the aforelinked post, Peter asks where he should upload his data, and he suggest <a href=\"http://www.biotorrents.net/\">BioTorrent</a> (see my review\n<a href=\"http://chem-bla-ics.blogspot.com/2010/04/bittorrents-for-science.html\">BitTorrents for Science</a>), DSpace, and <a href=\"http://www.ckan.net/\">CKAN</a>.\nNow, his <a href=\"http://www.google.se/search?sourceid=chrome&amp;client=ubuntu&amp;channel=cs&amp;ie=UTF-8&amp;q=%22Green+Chain+Reaction%22\">Green Chain Reaction</a>\nis picked up (see <a href=\"http://researchremix.wordpress.com/2010/08/11/green-chain-reaction-project-putting-my-minutes-where-my-mouth-is/\">these</a>\n<a href=\"http://scienceonlinelondon.wikidot.com/topics:green-chain-reaction\">few</a> <a href=\"https://blogs.ch.cam.ac.uk/pmr/2010/08/14/solo10-green-chain-reaction-much-progress-and-continued-request-for-help/\">blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a> posts),\nand the resulting data should be distributed as much as possible. The exact location does not really matter…</p>\n\n<p>But…</p>\n\n<p>By picking where you upload, you make a statement to your community: “<em>Look guys, we are distributing our data via Foo, because we believe those guys are doing good work! Perhaps you can support them too.</em>”.</p>\n\n<p>This principle does not only apply to data, it applies to things too. For example, when\n<a href=\"http://www.chemspider.com/blog/ichemlabs-and-rsc-chemspider-announce-partnership.html\">iChemLabs and RSC ChemSpider Announce Partnership</a>\nthey do not just improve the user experience of ChemSpider (which I certainly won’t object against), but they also imply\n“<em>Look dudes, your product is just not good enough and we do not want to help you improve it either</em>”.\nOf course, ChemSpider has every right, and for them to succeed it is crucial to make decisions like this. 
Fortunately,\n<a href=\"http://web.chemdoodle.com/installation.php\">ChemDoodle is GPL</a>.</p>\n\n<p>Every project with a user base has the opportunity to support shoulders, if they only visibly stand on them. By merely discussing the\n<em>Green Chain Reaction</em>, I show that I support this social web experiment. You can too. Use these powers wisely. May the McPrinciples be with you.</p>",
      "summary": "I have blogged about two Molecular Chemometrics principles so far:",
      
      "date_published": "2010-08-14T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["mcprinciples","solo10","chemdoodle","chemspider","javascript"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dzqvt-ynv20",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/12/molecular-chemometrics-principles-2-be.html",
      "title": "The Molecular Chemometrics Principles #2: be clear in what you mean",
      "content_html": "<p>I noted <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">earlier this week</a>\nthat <em>[d]uring the week [in <a href=\"/2010/08/06/oxford-2.html\">Oxford <i class=\"fa-solid fa-recycle fa-xs\"></i></a>], someone (name and address is know at the\neditorial office) commented on the fact that my blog posts are somewhat difficult to follow; that is, it’s\noften not clear why I am posting what I am posting</em>. This triggered the start of a series of principles in\nthe field I coined <a href=\"https://doi.org/10.1080/10408340600969601\">Molecular Chemometrics</a>, and the promise\nthat I will try to indicate in each blog post to which of these principles it relates. Just to put things in a bit more\nperspective; to make a bit more clear why I am blogging about that bit; just to be clear in what I mean.</p>\n\n<p>Now, the first principle was about the need for access to data (<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">McPrinciple #1</a>).\nThis principle goes without saying, one would think, but is not widely accepted yet. This is why Open Data promotion is still needed. For example, data in papers\nstill is not freely redistributable, as <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">Peter points out once again</a>.</p>\n\n<p>Anyway, this post is not about McPrinciple #1, but about the second principle.</p>\n\n<p><strong>Molecular Chemometrics Principles #2</strong>: In order to reproduce cheminformatics studies you need to be able to understand the input data.</p>\n\n<p>Readers of my blog will surely recognize this theme. 
Clearly this theme explains my past fetish for the\n<a href=\"http://chem-bla-ics.blogspot.com/search?q=CML\">Chemical Markup Language</a>, and my more recent work on the\n<a href=\"http://chem-bla-ics.blogspot.com/search?q=RDF\">Resource Description Framework</a>.</p>\n\n<p>And it is so easy to jump to conclusions. Easy to make mistakes. And this is not just on the receiving side; the sending\nperson may have accidentally made a mistake, or left something accidentally unclear, causing incorrect assumptions, and\ntherefore errors in the cheminformatics computation. Now, if the data were semantically (clearly) annotated, and the\nmeaning were clear, it would also be trivial to see when a mistake had sneaked in. Think of it as a check bit.</p>\n\n<p>“Well, isn’t this a bit exaggerated,” you might say. Perhaps, perhaps not. A simple, recent example. We all know\n<a href=\"http://www.opensmiles.org/\">SMILES</a>, right? And we all know that lower case element symbols indicate aromaticity, right?\nThat is, c1ccccc1 is aromatic, right? So, what’s the problem then?</p>\n\n<p>Now, consider the SMILES string c1ccc1. Lower case carbon element symbols, so aromatic, right? Oh, wait… That string would be\ncyclobutadiene, whose four π electrons do not satisfy Hückel’s 4n+2 rule.</p>\n\n<p>Therefore, be clear in what you mean. It saves us from a lot of trouble.</p>\n\n<p>Further reading:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">The Molecular Chemometrics Principles #1: access to data</a></li>\n  <li>Molecular Chemometrics, 2006 (doi:<a href=\"https://doi.org/10.1080/10408340600969601\">10.1080/10408340600969601</a>)</li>\n</ul>",
      "summary": "I noted earlier this week that [d]uring the week [in Oxford], someone (name and address are known at the editorial office) commented on the fact that my blog posts are somewhat difficult to follow; that is, it’s often not clear why I am posting what I am posting. This triggered the start of a series of principles in the field I coined Molecular Chemometrics, and the promise that I will try to indicate in each blog post to which of these principles it relates. Just to put things in a bit more perspective; to make a bit more clear why I am blogging about that bit; just to be clear in what I mean.",
      
      "date_published": "2010-08-12T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["mcprinciples","chemometrics","rdf","cml","semweb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1080/10408340600969601", "doi": "10.1080/10408340600969601"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/srwf0-4gf52",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html",
      "title": "The Molecular Chemometrics Principles #1: access to data",
      "content_html": "<p>The meetings in and around Oxford were great! I already wrote that the Predictive Toxicology workshop was brilliant\n(see <a href=\"/2010/08/01/oxford.html\">Oxford… #1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>) and\n<a href=\"/2010/08/06/oxford-2.html\">Oxford… #2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), but I also very, very much enjoyed meeting up\nwith <a href=\"http://www.danhagon.me.uk/blog/\">Dan</a> and <a href=\"http://semanticscience.wordpress.com/\">Nico</a>! During the week, someone\n(name and address is know at the editorial office) commented on the fact that my blog posts are somewhat difficult\nto follow; that is, it’s often not clear why I am posting what I am posting.</p>\n\n<p>Indeed, I am not particularly one of those bloggers who spends trees after trees, in great detail explaining what is going on.\nI do make a lot of use of <a href=\"http://en.wikipedia.org/wiki/Hyperlink\">hyperlinking</a>; much more than the average blogger. I\nactually assume that readers follow links, to read about the perspective of a blog post. But we all know that scientists\ndo not read the cited papers in a paper they are reading, so who am I to assume blog readers would start doing that with blogs :)</p>\n\n<p>Well, since <a href=\"/2010/02/19/open-data-panton-principles.html\">principles seems popular <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, it might be\na good start of my grand scheme that is behind this blog: the Molecular Chemometrics Principles. Hence, this first post about\nthe why. The why is simply to provide a reference frame to what I am blogging about. In the next few posts on these\nMcPrinciples (is that a catchy name, or what?) that will appear over the next two weeks, I will outline the code of\nchem-bla-ics. And, moreover, from now on, I will tag all my posts with the reaons why I make that post. 
I am sure that will\nnot be too helpful for the occasional reader, but for anyone who is serious about chem-bla-ics, this will be a genuine gold\nmine of data for pattern recognition and other data mining.</p>\n\n<p>So, here goes.</p>\n\n<p><strong>Molecular Chemometrics Principles #1</strong>: In order to reproduce cheminformatics studies you need access to the input data.</p>\n\n<p>The reason for this is that statistical modeling very much depends on the data on which modeling was done, patterns\nwere recognized, etc. Therefore, without the input data, it is practically impossible to accurately reproduce results.\nFortunately, the acceptance of the importance of access to data (e.g. as Open Data) is slowly gaining momentum in\nscience.</p>\n\n<p>Further reading: Molecular Chemometrics, 2006 (doi:<a href=\"https://doi.org/10.1080/10408340600969601\">10.1080/10408340600969601</a>)</p>",
      "summary": "The meetings in and around Oxford were great! I already wrote that the Predictive Toxicology workshop was brilliant (see Oxford… #1 and Oxford… #2), but I also very, very much enjoyed meeting up with Dan and Nico! During the week, someone (name and address are known at the editorial office) commented on the fact that my blog posts are somewhat difficult to follow; that is, it’s often not clear why I am posting what I am posting.",
      
      "date_published": "2010-08-09T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["chemometrics","mcprinciples"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1080/10408340600969601", "doi": "10.1080/10408340600969601"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/31apn-15c92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/06/oxford-2.html",
      "title": "Oxford... #2",
      "content_html": "<p>The <a href=\"/2010/08/01/oxford.html\">Predictive Toxicology <i class=\"fa-solid fa-recycle fa-xs\"></i></a> meeting is over. It was a great meeting, by any standard.\nVery much recommended, and many thanx to Barry for the organization! The meeting was a true workshop, with a mix of presentations and getting\nwork done. I participated in a group that looked at mutagenicity of potential anti-malaria drugs from the datasets of GSK and Novartis recently\nreleased as Open Data. We used various tools to predict properties, and plan to make all our results freely available soon. Otherwise, it was\nalso great to meet Nina again (with whom I <a href=\"https://chem-bla-ics.blogspot.com/2010/08/using-bioclipse-to-upload-data-to.html\">talked about OpenTox</a>),\nand to meet other CDK users, including Patrik (<a href=\"https://web.archive.org/web/20100918124243/https://www.farma.ku.dk/smartcyp/\">SMARTCyp <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\ndoi:<a href=\"https://doi.org/10.1021/ml100016x\">10.1021/ml100016x</a>) and David (<a href=\"http://inkspotscience.com/\">Inkspot</a>).</p>\n\n<p>In the afternoon I walked around a bit more in Oxford, did some more shopping… and visited the Apple shop and played with an iPad. It’s\nindeed a great piece of hardware. Looking forward to the first Android versions :)</p>\n\n<p><img src=\"/assets/images/DSCI0107.JPG\" alt=\"\" /></p>",
      "summary": "The Predictive Toxicology meeting is over. It was a great meeting, by any standard. Very much recommended, and many thanx to Barry for the organization! The meeting was a true workshop, with a mix of presentations and getting work done. I participated in a group that looked at mutagenicity of potential anti-malaria drugs from the datasets of GSK and Novartis recently released as Open Data. We used various tools to predict properties, and plan to make all our results freely available soon. Otherwise, it was also great to meet Nina again (with whom I talked about OpenTox), and to meet other CDK users, including Patrik (SMARTCyp, doi:10.1021/ml100016x) and David (Inkspot).",
      
      "date_published": "2010-08-06T00:00:00+00:00",
      "date_modified": "2024-05-18T00:00:00+00:00",
      "tags": ["cdk","oxford","oxfordadmet2010","conference"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ml100016x", "doi": "10.1021/ml100016x"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ap7n7-58v06",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/01/oxford.html",
      "title": "Oxford...",
      "content_html": "<p>Yesterday I arrived in <a href=\"http://en.wikipedia.org/wiki/Oxford\">Oxford</a>, after a 3.5 hour bus transfer from\n<a href=\"http://en.wikipedia.org/wiki/London_Stansted_Airport\">London Stansted</a>. Long, boring ride (though I might have seen a few\n<a href=\"https://web.archive.org/web/20100728051221/http://www.rspb.org.uk/wildlife/birdguide/name/r/redkite/index.aspx\">red kites <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, but seeing that they were near extinct, I am\nwondering what other large bird of prey has strong split tail like a swallow). Showed once more that the UK infrastructure has\nhardly changed since the 19th century. Enjoying an undergraduate room at one of the colleges. Pretty basic, but makes me feel\nmore like a human than a tourist. Yes!, undergraduate students are human too! One of the advantages is you get an excellent\ninternet connection :)</p>\n\n<p>Anyways, going to the <a href=\"https://web.archive.org/web/20111001000000*/http://echeminfo.com/comty_oxfordadmet10\">Predictive Toxicology <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> workshop, thanx to the bursary award I received from\n<a href=\"https://web.archive.org/web/20110207193345/http://echeminfo.com/\">echeminfo <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n(see <a href=\"http://chem-bla-ics.blogspot.com/2010/03/oxford-august-2010-echeminfo-predictive.html\">Oxford, August 2010: eCheminfo Predictive ADME &amp; Toxicology 2010 Workshop</a>).</p>\n\n<p>This afternoon I walked around a bit, watching all the old buildings. But I guess being here without anyone to share it with,\nand that it looks just like <a href=\"http://en.wikipedia.org/wiki/Cambridge\">Cambridge</a>, makes me not-so-much impressed. Moreover, it’s too\nbusy with tourists and people randomly wearing Oxford University sweatshirts. 
Small and nice was the\n<a href=\"http://www.mhs.ox.ac.uk/\">Museum of the History of Science</a>, with some nice chemical pieces, like this one:</p>\n\n<p><img src=\"/assets/images/DSCI0089.JPG\" alt=\"\" /></p>\n\n<p>Buildings like the <a href=\"http://en.wikipedia.org/wiki/Radcliffe_Camera\">Radcliffe Camera</a> are nice on the outside, but closed.\nSeems I have to become a fellow first. This is what it looked like today:</p>\n\n<p><img src=\"/assets/images/DSCI0094.JPG\" alt=\"\" /></p>\n\n<p>Quite interesting too was the Oxford University Press shop. I’m a sucker for books. Apparently, you can just write a book\nand publish it. For example, an extensive list of <a href=\"http://ukcatalogue.oup.com/category/academic/series/general/opr.do\">dictionaries on about anything</a>…\nand since I have been writing several book chapters right now, perhaps this is actually an interesting route…</p>\n\n<p>But the question is, of course, how long will we keep reading books… they’re the\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2008/04/29/why-pdf-is-a-hamburger/\">hamburgers <i class=\"fa-solid fa-recycle fa-xs\"></i></a> of educational material… Kindle and alikes will soon drop in\nprice, and cost some €30 euro. But e-book prices will have to drop too, and I still do not get why an e-book is more expensive than a paperback…\n(see <a href=\"http://chem-bla-ics.blogspot.com/2010/07/amazon-kindle-edition-is-more-expensive.html\">Amazon, the Kindle edition is more expensive than the paperback??</a>).\nBut then again… they are rich, and I am not.</p>\n\n<p>There was some recent talk about the fact that no one can be Open to the full. You either do Open Data or Open Source, and\nmake a living from the rest. That’s where I nicely show I know bullocks of economics. I do\n<a href=\"http://bodr.sf.net/\">BODR</a>, <a href=\"http://cdk.sf.net/\">CDK</a>, … all Open, all for free.</p>\n\n<p>OK. That’s a plus for Oxford… it makes you think about things. 
Perhaps there is something to\n<a href=\"http://en.wikipedia.org/wiki/Morphic_field#Morphogenetic_field\">morphogenetic</a> fields…</p>",
      "summary": "Yesterday I arrived in Oxford, after a 3.5 hour bus transfer from London Stansted. Long, boring ride (though I might have seen a few red kites, but seeing that they were nearly extinct, I am wondering what other large bird of prey has a strongly forked tail like a swallow). Showed once more that the UK infrastructure has hardly changed since the 19th century. Enjoying an undergraduate room at one of the colleges. Pretty basic, but makes me feel more like a human than a tourist. Yes!, undergraduate students are human too! One of the advantages is you get an excellent internet connection :)",
      
      "date_published": "2010-08-01T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["oxford","oxfordadmet2010","publishing","science","toxicology","conference"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qarkm-5py65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/07/15/cb-new-blogs-13.html",
      "title": "Cb: New Blogs #13",
      "content_html": "<p>The <a href=\"http://cb.openmolecules.net/\">Cb</a> software is still holding… I jettinsoned the old post cache, which speeded up the processing of blogs considerably,\nbut the system just doesn’t scale right. Yet, <a href=\"http://www.ghastlyfop.com/blog/2009/03/postgenomic-hiatus.html\">Euan</a> has done a great job, and the Cb site\nhas now been online for some three years! Here are some new blogs included in the aggregation and analysis:</p>\n\n<ul>\n  <li><a href=\"http://dobsonlab.blogspot.com/\">Paul Dobson Research</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=241\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://verpa.wordpress.com/\">Loose Morels » science</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=242\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://theplateisbad.blogspot.com/\">The plate is bad</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=243\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://laborantje.nl/\">Laborantje</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=244\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://alchemoinformatics.blogspot.com/\">alchemoinformatics</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=245\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://webapps.oru.edu/new_php/blog\">Andy’s Blog - Oral Roberts University</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=246\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a 
href=\"http://uucheminfoclub.blogspot.com/\">UU Cheminformatics Journal Club</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=247\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://michaelseery.com/home\">Is this going to be on the exam?</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=248\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://chembioinfo.wordpress.com/\">Asad’s Blog » Work</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=249\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://blogs.nature.com/catalyst/\">Chemical Calisthenics</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=250\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://chemistandcook.blogspot.com/\">Chemistry &amp; Cooking</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=251\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://masterorganicchemistry.wordpress.com/\">Master Organic Chemistry</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=252\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://imagingchemistry.com/\">Imaging Chemistry</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=253\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://clickchemistry.blogspot.com/\">Click Chemistry</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=254\">entry in Cb <i class=\"fa-solid fa-box-archive 
fa-xs\"></i></a>)</li>\n  <li><a href=\"http://joaquinbarroso.wordpress.com/\">Dr. Joaquin Barroso’s Blog</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=255\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n  <li><a href=\"http://tripod.nih.gov/\">Tripod Development</a> (<a href=\"http://web.archive.org/web/2010/http://cb.openmolecules.net/blog_search.php?blog_id=256\">entry in Cb <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)</li>\n</ul>\n\n<p>Happy reading!</p>\n\n<p>BTW, some WordPress feeds are weird, causing the blog post titles to not show up properly in Cb. I’ll investigate this soon.</p>",
      "summary": "The Cb software is still holding… I jettisoned the old post cache, which sped up the processing of blogs considerably, but the system just doesn’t scale right. Yet, Euan has done a great job, and the Cb site has now been online for some three years! Here are some new blogs included in the aggregation and analysis:",
      
      "date_published": "2010-07-15T00:00:00+00:00",
      "date_modified": "2026-01-13T00:00:00+00:00",
      "tags": ["cb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q5sed-jea02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/02/19/open-data-panton-principles.html",
      "title": "Open Data: the Panton Principles",
      "content_html": "<p>The <a href=\"http://blog.okfn.org/2010/02/19/launch-of-the-panton-principles-for-open-data-in-science/\">announcement</a> of the\n<a href=\"http://web.archive.org/web/20100222213041/http://pantonprinciples.org/\">Panton Principles <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"http://opendotdotdot.blogspot.com/2010/02/open-data-question-of-panton-principles.html\">is</a>\n<a href=\"http://web.archive.org/web/20100223064514/http://scienceblogs.com/commonknowledge/2010/02/reaching_agreement_on_the_publ.php\">the <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"http://usefulchem.blogspot.com/2010/02/support-open-data-by-endorsing-panton.html\">big</a>\n<a href=\"http://www.sennoma.net/main/archives/2010/02/panton_principles_for_open_dat.php\">news</a>\n<a href=\"http://www.nextgenerationscience.com/open-access/the-panton-principles-for-open-data-in-science/\">today</a>,\nthough Peter already spoke about them\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2009/05/16/the-panton-principles-a-breakthrough-on-data-licensing-for-public-science/\">in May last year <i class=\"fa-solid fa-recycle fa-xs\"></i></a> (see coverage on\n<a href=\"http://friendfeed.com/search?q=panton+principles\">FriendFeed</a> and\n<a href=\"http://search.twitter.com/search?q=panton+principles\">Twitter</a>). 
The four principles list in their short versions:</p>\n\n<ul>\n  <li>When publishing data make an explicit and robust statement of your wishes.</li>\n  <li>Use a recognized waiver or license that is appropriate for data.</li>\n  <li>If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.</li>\n  <li>Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.</li>\n</ul>\n\n<p>I think these are very workable next steps in Open Date, perhaps even worthy end goals.\n<a href=\"http://web.archive.org/web/20100222084119/http://pantonprinciples.org/endorse\">I endorse them <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.</p>\n\n<p><img src=\"/assets/images/panton.png\" alt=\"Sort of logo for the Panton Principles, showing this name and the text &quot;Principles for Open Data in Science&quot;.\" /></p>\n\n<p><strong>Principle 1: an explicit and robust statement</strong> <br />\nThis is in my opinion the most important principle. Too often you find a database with really useful data, but without\nany clue about what you are allowed to do with this data. Of course, I can contact the authors, get their permission, etc.\nThey probably like it that way, and I can even understand that. However, it does not scale, and it is slow. Even worse is\nthe situation when the original composer gets missing in action. Both are equally valid, but explicit statements just make\nthings easier.</p>\n\n<p><strong>Principle 2: use a waiver or license appropriate for data</strong> <br />\nThis principle is debatable. Very much like the BSD-vs-GPL flamewars, some like copylefting, others do not. There is an\nimportant difference though. 
Software has the concept of interfaces, allowing to more easily share incompatible licenses\ncleanly separated by these interfaces. This, for example, allows you to run proprietary software on a Linux kernel.\nHowever, data sets do not have such a concept. There is not such thing as an interface between two numbers.</p>\n\n<p>This makes the concept of mixing data sets different: because there is no such interface, any mixing can only happen\nbetween compatible licenses. This is one reason behind the choice of very liberal licenses like\n<a href=\"http://creativecommons.org/license/zero\">CC0</a>. This license, or waiver really, allows you to do anything, and most\ncertainly, mix data sets.</p>\n\n<p>And that makes things a lot easier. But then again, while these are nobel goals, I rather see people use a copylefting\nlicenses than no license at all.</p>\n\n<p><strong>Principle 3: non-commercial and other restrictive clauses should not be used</strong> <br />\nI think again making things easier is the goal. The non-commercial clause is interesting, and actually likely an important\none. Consider course material, a course book. Those are commercial. Some even argued that many universities themselves are\nactually commercial entities.</p>\n\n<p><strong>Principle 4: the public domain via PDDL or CCZero is strongly recommended</strong> <br />\nI second these choices over a mere claim claim that the data is public domain. The PD concept has many meanings and not\nthe same in every jurisdiction. In particular, differences between USA and EU law. Waiving these right, which is just\nthe same as claiming public domain, works in any jurisdiction, again, making things a lot easier.</p>\n\n<p><strong>Open Data, Open Source, Open Standards are not goals</strong> <br />\nThe underlying pattern of my comments must be clear: the principles make life easier. 
This is all that Open Source and Open Standards\n(<a href=\"http://blueobelisk.stackexchange.com/questions/231/what-formats-fall-into-open-specification\">whatever</a>\n<a href=\"http://blueobelisk.stackexchange.com/questions/106/which-formats-fall-into-open-data-open-source-and-open-standards\">those</a>\n<a href=\"http://sourceforge.net/mailarchive/forum.php?thread_name=6aeb064b1002162228qcc0603eo8f363a13f7d46805@mail.gmail.com&amp;forum_name=blueobelisk-discuss\">are</a>) are about.</p>\n\n<ul>\n<i><b>The three pillars of the ODOSOS mantra are not goals, but merely the means of making life easier.</b></i>\n</ul>\n\n<p>The Panton Principles certainly make life easier in Open Data, and initiatives like the\n<a href=\"http://esw.w3.org/topic/HCLSIG/LODD/\">Linking Open Drug Data</a> in which I participate will greatly benefit\nfrom people adopting them.</p>\n\n<p>The Principles do not solve all problems. There is still a lot of ‘Open Data’ licensed with unrecommended licenses.\nFor example, the <a href=\"http://chem-bla-ics.blogspot.com/2009/09/open-chemical-data-1-nmrshiftdb.html\">NMRShiftDB</a> uses a\nGNU FDL license, and data from supplementary material of Open Access journal articles is like Creative Commons.</p>\n\n<p><img src=\"/assets/images/panton_is_it_open_data.png\" alt=\"Screenshot of the &quot;Is it Open Data?&quot; website, showing starting points like the &quot;How Does It Work?&quot; button.\" /></p>\n\n<p>Another related initiative should certainly not go unnoticed either: <a href=\"http://www.isitopendata.org/\">Is it Open Data?</a>\nis a service where you can try to resolve what the license is for one of those databases which is not quite\nPanton Principles compatible yet.</p>\n\n<p>OK, one last thing. The <a href=\"http://www.volkskrant.nl/binnenland/article1351058.ece/Krachtmeting_in_kabinet_om_Uruzgan\">Dutch government is bursting</a>,\nand I want to listen to the music. 
With permission, I have been hacking the Panton Principles endorsement page,\nand injected some extra span elements, to make it easier to machine process (again, to make things easier), so\nyou can use the following one-liner to calculate the number of people endorsing the principles:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>wget <span class=\"nt\">-O</span> endorsed.html http://pantonprinciples.org/endorsed.html <span class=\"p\">;</span> xpath <span class=\"nt\">-q</span> <span class=\"nt\">-e</span> <span class=\"s2\">\"//span[@class='signature']/span[@class='Country']/text()\"</span> endorsed.html | <span class=\"nb\">sort</span> | <span class=\"nb\">uniq</span> <span class=\"nt\">-c</span>\n</code></pre></div></div>\n\n<p>The current count is <a href=\"http://pantonprinciples.org/endorse/\">hitting 44 now</a>, and has not quite reached the\n<a href=\"http://friendfeed.com/openchemicaldata/e6236e5a/panton-principles-endorse-open-data-go-visit\">500 I had hoped for</a> yet:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>1 Australia\n1 Canada\n1 Catalonia\n2 Espana\n2 France\n6 Germany\n1 Greece\n1 Italy\n1 Netherlands\n1 New Zealand\n1 Norway\n1 Poland\n1 Slovenia\n1 Sweden\n1 Switzerland\n1 The Netherlands\n9 UK\n1 U.K.\n1 United Kingdom\n1 United States of America\n9 USA\n</code></pre></div></div>\n\n<p>Anyone knows how we can convert this into some nice world map graphics with a few lines of code?</p>\n\n<p>Now, I am looking for a bar in Uppsala to write up some ideas about what specifications are :)</p>",
      "summary": "The announcement of the Panton Principles is the big news today, though Peter already spoke about them in May last year (see coverage on FriendFeed and Twitter). The four principles list in their short versions:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/panton_is_it_open_data.png",
      "date_published": "2010-02-19T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["opendata","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6m8qd-xed40",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/01/28/semantic-web-features-in-bioclipse-22.html",
      "title": "Semantic Web features in Bioclipse 2.2",
      "content_html": "<p><a href=\"http://www.blogger.com/profile/10379047094508592338\">Ola</a> is releasing <a href=\"http://web.archive.org/web/20100111032721/https://bioclipse.net/\">Bioclipse <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"http://sourceforge.net/projects/bioclipse/files/bioclipse2/bioclipse2.2.0/\">2.2.0</a>\ntoday, and asked me to show case the semantic web functionality in Bioclipse. I realized that I do not have a nice page showing the semantic web overview. But I did blog a lot about RDF functionality, so here’s a list of pointers:</p>\n\n<ul>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/11/bioclipse-manager-for-myexperimentorg.html\">Bioclipse Manager for MyExperiment.org</a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/09/bioclipse-rdf-and-defeasible-reasoning.html\">Bioclipse, RDF and defeasible reasoning</a> (see also <a href=\"http://saml.rilspace.com/\">Samuel’s blog</a>)</li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/21/bioclipse-and-sparql-end-points-2.html\">Bioclipse and SPARQL end points #2: MyExperiment <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points.html\">Bioclipse and SPARQL end points <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/22/solubility-data-in-bioclipse-2-handling.html\">Solubility Data in Bioclipse #2: handling RDF <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/27/solubility-data-in-bioclipse-3-finding.html\">Solubility Data in Bioclipse #3: Finding ChEBI IDs <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/03/solubility-data-in-bioclipse-4-finding.html\">Solubility Data in Bioclipse #4: Finding ChEBI IDs (Again, but better)</a></li>\n  <li><a 
href=\"http://chem-bla-ics.blogspot.com/2009/05/me-is-having-bioclipsexmpprdf-fun.html\">/me is having Bioclipse/XMPP/RDF fun</a></li>\n</ul>\n\n<p>Or check this screenshot from <a href=\"http://web.archive.org/web/20130310013833/http://egonw.posterous.com/molecules-in-dbpedia-visualized-with-bioclips\">a Posterous post about a MyExperiment workflow\n<i class=\"fa-solid fa-box-archive fa-xs\"></i></a>:</p>\n\n<p><img src=\"/assets/images/dbPediaMolTable.png\" alt=\"\" /></p>\n\n<p>One thing I have not blogged about yet (I think), is that the Bioclipse RDF manager also understands RDFa now. Well, sort of… it relies on a webservice, but this is what the script looks like:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">model</span> <span class=\"o\">=</span> <span class=\"nx\">rdf</span><span class=\"p\">.</span><span class=\"nf\">createStore</span><span class=\"p\">()</span>\n<span class=\"nx\">rdf</span><span class=\"p\">.</span><span class=\"nf\">importRDFa</span><span class=\"p\">(</span><span class=\"nx\">model</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">http://egonw.github.com/</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n<span class=\"nx\">rdf</span><span class=\"p\">.</span><span class=\"nf\">saveRDFN3</span><span class=\"p\">(</span><span class=\"nx\">model</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">/Virtual/egonw.n3</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n</code></pre></div></div>\n\n<p>With support of SPARQL end points, and reading RDF from web resources directly (RDF/XML, N3, RDFa), Bioclipse is ready for the chemical semantic web.</p>",
      "summary": "Ola is releasing Bioclipse 2.2.0 today, and asked me to show case the semantic web functionality in Bioclipse. I realized that I do not have a nice page showing the semantic web overview. But I did blog a lot about RDF functionality, so here’s a list of pointers:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/dbPediaMolTable.png",
      "date_published": "2010-01-28T00:00:00+00:00",
      "date_modified": "2026-04-11T00:00:00+00:00",
      "tags": ["java","bioclipse","rdf","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/52595-4rj10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/01/25/semantic-chemistry-with-resource.html",
      "title": "Semantic Chemistry with the Resource Description Framework",
      "content_html": "<p><strong>First Call for Papers</strong> <br />\nSemantic Chemistry with the Resource Description Framework <br />\n240th ACS National Meeting &amp; Exposition <br />\nBoston, Massachusetts, August 22-26, 2010 <br />\nCINF Division</p>\n\n<p>We now invite papers for our symposium on the use of the Resource Description Framework (RDF) technologies in semantic knowledge representation\nand data exchange in chemistry at the 240th National Meeting &amp; Exposition of the American Chemical Society (ACS) in Boston this fall.</p>\n\n<p>Semantic Chemistry has been around for a while, but is seeing a revival with the adoption of the Resource Description Framework (RDF) and\nmatching technologies in chemistry. RDF triples provide a simple structure that allow data and knowledge alike to be presented in a single\nframework. Derived technologies include the capturing of ontologies with the Web Ontology Language (OWL) and performing queries with SPARQL.\nA wide variety of free and open source product make it easy to set up servers with large amounts of RDF data, while integration with HTML\nis available too with RDFa.</p>\n\n<p>The RDF symposium at the 240th ACS national meeting in Boston invites submissions of talks about the use of RDF in chemistry and cheminformatics.\nTopics could include the use of OWL ontologies, OWL axioms, reasoning and interference, RDF in user interfaces, such as RDFa in web front ends,\nvisualization, querying systems, and applications thereof, such as linking data sets, compound classification, cloud computing, web services,\ndata aggregation, semantic publishing, and literature mining.</p>\n\n<p>Abstracts may be submitted via http://abstracts.acs.org. You’ll find the RDF session as part of the CINF division symposiums. Submissions open\nJanuary 25, 2010, and the deadline is March 28, 2010. In case of questions, please email Egon Willighagen at egon.willighagen@farmbio.uu.se or\nMartin Braendle at braendle@chem.ethz.ch.</p>",
      "summary": "First Call for Papers Semantic Chemistry with the Resource Description Framework 240th ACS National Meeting &amp; Exposition Boston, Massachusetts, August 22-26, 2010 CINF Division",
      
      "date_published": "2010-01-25T00:00:00+00:00",
      "date_modified": "2010-01-25T00:00:00+00:00",
      "tags": ["acs","acsrdf2010","rdf","chemistry","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jc9ak-pdr79",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/01/21/extracting-rdf-from-chem4word-documents.html",
      "title": "Extracting RDF from Chem4Word documents",
      "content_html": "<p><a href=\"http://jat45.wordpress.com/\">Joe</a> has released the first <a href=\"http://research.microsoft.com/en-us/projects/chem4word/\">Chem4Word</a>\n<a href=\"http://jat45.files.wordpress.com/2010/01/example.docx\">demo file</a>, and has written about how to\n<a href=\"http://jat45.wordpress.com/2010/01/20/extracting-cml-from-a-chem4word-authored-document-java/\">extract the CML with Java</a>\nand <a href=\"http://jat45.wordpress.com/2010/01/21/extracting-cml-from-a-chem4word-authored-document-c/\">with C#</a>.</p>\n\n<p>I haven’t actually gotten around to fiddling with Java, but ran <a href=\"http://strigi.sf.net/\">Strigi</a> against it to extract RDF,\nwhile having the <a href=\"http://neksa.blogspot.com/2007/05/introduction.html\">Strigi-Chemistry</a> plugins installed. This is part of the\n<a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a> that came out:</p>\n\n<div class=\"language-turtle highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nl\">&lt;example-doc.docx&gt;</span><span class=\"w\">\n  </span><span class=\"nl\">&lt;http://freedesktop.org/standards/xesam/1.0/core#title&gt;</span><span class=\"w\">\n    </span><span class=\"s\">\"acetic acid\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"s\">\"(8R,9S,10R,13S,14S,17S)- 17-hydroxy-10,13-dimethyl- 1,2,6,7,8,9,11,12,14,15,16,17-dodecahydrocyclopenta[a] phenanthren-3-one\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"s\">\"testosterone\"</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nl\">&lt;http://freedesktop.org/standards/xesam/1.0/core#version&gt;</span><span class=\"w\">\n    </span><span class=\"s\">\"2\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"s\">\"2\"</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span 
class=\"nl\">&lt;http://rdf.openmolecules.net/0.9#atomCount&gt;</span><span class=\"w\">\n    </span><span class=\"s\">\"8\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"s\">\"49\"</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nl\">&lt;http://rdf.openmolecules.net/0.9#bondCount&gt;</span><span class=\"w\">\n    </span><span class=\"s\">\"7\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"s\">\"52\"</span><span class=\"p\">;</span><span class=\"w\">\n  </span><span class=\"nl\">&lt;http://rdf.openmolecules.net/0.9#molecularFormula&gt;</span><span class=\"w\">\n    </span><span class=\"s\">\"C2H4O2\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"s\">\"C19H28O2\"</span><span class=\"p\">;</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>I believe there is quite some room for improvement, but it’s a start :) Thanx to Joe for posting the public domain test file, so\nthat other projects can start playing with the exciting new technology. I should note, however, that I am not running a Microsoft OS\nnor MS-Word, and the saved document source is the only way I have access to the\n<a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a> right now.</p>",
      "summary": "Joe has released the first Chem4Word demo file, and has written about how to extract the CML with Java and with C#.",
      
      "date_published": "2010-01-21T00:00:00+00:00",
      "date_modified": "2010-01-21T00:00:00+00:00",
      "tags": ["cml","java","rdf","chem4word","strigi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jmd0y-ghc30",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/01/17/installation-howto-for-cdk-taverna-0511.html",
      "title": "Installation HOWTO for CDK-Taverna 0.5.1.1 in Taverna 1.7.2",
      "content_html": "<p><a href=\"http://cdktaverna.wordpress.com/\">Thomas</a> made a <a href=\"http://cdktaverna.wordpress.com/2010/01/17/cdk-taverna-version-0-5-1-1-released/\">new release of CDK-Taverna</a>\nfor the <a href=\"http://www.taverna.org.uk/\">Taverna</a> 1.7.2 release, which is great news as the previous release was for Taverna 1.7.1.</p>\n\n<p>He asked me to test it, and I installed a fresh Taverna install and the new plugin. After that, I used the <a href=\"http://myexperiment.org/\">MyExperiment</a>\nplugin to download one of the <a href=\"http://www.myexperiment.org/search?query=cdk-taverna&amp;type=workflows\">CDK-Taverna workflows Thomas has on MyExperiment</a>,\nand tuned it a bit to use some local input instead of the database. I took some screenshots while at it, and will use those now to talk you through the\ninstallation of Taverna and the <a href=\"http://cdk-taverna.de/\">CDK-Taverna</a> plugin.</p>\n\n<h3 id=\"download-taverna\">Download Taverna</h3>\n\n<p>Taverna 1.7.2 can be downloaded from <a href=\"http://www.mygrid.org.uk/tools/taverna/taverna-1/taverna-download/\">this download page</a>, but I took the\nLinux version from the <a href=\"http://sourceforge.net/projects/taverna/files/taverna/1.7.2/\">SourceForge download site</a>. 
I cannot detail the OS/X or\nWindows installation, but on Linux you simply unzip the downloaded file, and you’re ready to go:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">cd </span>taverna-1.7.2/\n<span class=\"nv\">$ </span>sh runme.sh\n</code></pre></div></div>\n\n<h3 id=\"plugin-installation\">Plugin Installation</h3>\n\n<p>Plugins can be installed using the <em>Plugin manager</em>, which can be accessed via the <em>Tools</em> menu:</p>\n\n<p><img src=\"/assets/images/cdktav4.png\" alt=\"\" /></p>\n\n<p>Clicking <em>Find New Plugins</em> takes you to a second dialog listing known plugin sites, and the default download has several already:</p>\n\n<p><img src=\"/assets/images/cdktav1.png\" alt=\"\" /></p>\n\n<p>The CDK-Taverna update site is available at <em>http://cdk-taverna.de/plugin/</em>, and we can make Taverna aware of this update site by clicking the\n<em>Add Plugin Site</em> button:</p>\n\n<p><img src=\"/assets/images/cdktav.png\" alt=\"\" /></p>\n\n<p>After filling out these values and approving them with the <em>OK</em> button, the site will show up in the dialog showing all available plugins,\nwhere you need to tick the check box in front of the CDK-Taverna plugin name, as done in this screenshot:</p>\n\n<p><img src=\"/assets/images/cdktav2.png\" alt=\"\" /></p>\n\n<p>You can then hit the <em>Install</em> button after which the plugin will be downloaded:</p>\n\n<p><img src=\"/assets/images/cdktav3.png\" alt=\"\" /></p>\n\n<p>After it is done downloading the plugin, you can close the <em>Plugin Sites</em> and <em>Plugin Manager</em> dialogs. I shut down and restarted Taverna with\n<code class=\"language-plaintext highlighter-rouge\">sh runme.sh</code>, but I am not entirely sure this is needed. 
After that, the CDK nodes showed up in the list of Taverna processors:</p>\n\n<p><img src=\"/assets/images/cdktav5.png\" alt=\"\" /></p>\n\n<h3 id=\"myexperiment-plugin\">MyExperiment Plugin</h3>\n\n<p>Using the same Taverna <em>Plugin Manager</em> you can also install the MyExperiment plugin that allows you to search, browse, preview and download\nTaverna workflows from the MyExperiment website from within Taverna itself. I installed the plugin, and then used it to search for CDK workflows\n(and downloaded a QSAR workflow):</p>\n\n<p><img src=\"/assets/images/cdktav6.png\" alt=\"\" /></p>\n\n<p>This is about everything to get you going. It’s not particularly rocket science, but I guess this howto is useful as you get to see what\nyou should expect when setting up a CDK-Taverna environment. If you have further questions, please leave those in the comments section,\nand I’ll try to merge in answers where possible, or otherwise in the reactions too.</p>",
      "summary": "Thomas made a new release of CDK-Taverna for the Taverna 1.7.2 release, which is great news as the previous release was for Taverna 1.7.1.",
      
      "date_published": "2010-01-17T00:00:00+00:00",
      "date_modified": "2010-01-17T00:00:00+00:00",
      "tags": ["cdk","taverna"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dskgb-hdz03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/01/15/warren-delano-and-future-of-pymol.html",
      "title": "Warren DeLano and the future of PyMOL",
      "content_html": "<p>This blog is old and new news. The old news is that <a href=\"http://warrendelano.blogspot.com/2009/11/warren-delano-passes-away.html\">Warren passed away</a> at the\nend of last year, after having successfully shown how OpenSource cheminformatics (and/or bioinformatics) software can be developed in a commercial setting\n(<a href=\"http://delanoscientific.com/\">DeLano Scientific</a>), and <a href=\"http://pymol.org/\">PyMol</a> was a huge success. Warren had a SourceForge account\n(<a href=\"http://sourceforge.net/users/wdelano\">wdelano</a>) for almost 10 years:</p>\n\n<p><img src=\"/assets/images/pymol1.png\" alt=\"\" /></p>\n\n<p>I had not blogged about it before as the news hit me hard. Surely, Warren knew a lot of people and I only was only one of many, but Warren’s memory sticked well.\nI know Warren from the <a href=\"http://www.jmol.org/\">Jmol</a> project, where we talked in the past of coming to an Open Specification for exchanging scenes between\nJmol and PyMol. 
Around the end of my PhD contract we even briefly, but seriously, explored doing a post-doc in his group.</p>\n\n<p>Anyway, lots of people wrote blog posts (in arbitrary order: <a href=\"http://depth-first.com/articles/2009/11/06/warren-delano\">Rich</a>,\n<a href=\"http://www.p212121.com/2009/11/05/passing-of-warren-delano/\">P212121</a>, <a href=\"http://www.macresearch.org/memoriam-warren-l-delano\">MacResearch</a>,\n<a href=\"http://miningdrugs.blogspot.com/2009/11/warren-delano-in-memoriam.html\">Jörg</a>, <a href=\"http://rosettadesigngroup.com/blog/464/reports-of-warren-delano-passing-away-terrible-tragedy-if-true/\">MMB</a>,\n<a href=\"http://shirleywho.wordpress.com/2009/11/07/in-memoriam-warren-delano/\">Shirley</a>, <a href=\"http://pipeline.corante.com/archives/2009/11/17/warren_delano.php\">Derek</a>,\n<a href=\"http://wavefunction.fieldofscience.com/2009/11/warren-delano.html\">Wavefunction</a>, <a href=\"http://www.openscience.org/blog/?p=300\">Dan</a>,\n<a href=\"http://barryhardy.blogs.com/cheminfostream/2009/11/warren-delano-rip.html\">Barry</a>, and probably many more).\nThey have set up a memorial fund which will focus on promoting the Open Source ideas of Warren, including\nan <a href=\"http://www.wldmemorialfund.org/WLDMemorialFund/Warren_L._DeLano_Memorial_Fund.html\">Award</a>.</p>\n\n<h2 id=\"schrödinger\">Schrödinger</h2>\n<p>Yesterday, I was <a href=\"http://rosettadesigngroup.com/blog/545/pymol-schrodinger/\">pinged</a> about\n<a href=\"http://www.schrodinger.com/news/47/\">Schrödinger acquiring PyMol</a>. The press release is, as usual, short on details, but those have\nbecome clearer during the day. 
Schrödinger is not new to Open Source cheminformatics, and has a\n<a href=\"http://www.schrodinger.com/products/14/8/\">product based on KNIME</a>, which is now GPL, but also has a proprietary license\nfor those who wish to license so.</p>\n\n<p>But, unless I missed any other Open Source (-oriented) product, the acquisition of PyMol significantly changes the game for them:\nPyMol is a major Open Source product, bigger than KNIME at the moment, I’d guess. My immediate question about the acquisition was whether\nthey acquired copyrights, and they did, according to <a href=\"http://pymol.svn.sf.net/viewvc/pymol/branches/b099/pymol/layer0/Base.h?r1=3886&amp;r2=3885&amp;pathrev=3886\">this commit</a>:</p>\n\n<p><img src=\"/assets/images/pymol2.png\" alt=\"\" /></p>\n\n<p>This is important as it puts Schrödinger in charge of license changes. Fortunately, they seem rather serious about the Open Source thing, and\nhired an active PyMol developer (Jason), and kept the existing Open Source license:</p>\n\n<p><img src=\"/assets/images/pymol.png\" alt=\"\" /></p>\n\n<p>Therefore, congratulations to Schrödinger for getting seriously into the Open Source community, making them the next\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2059\">Dr Who of PyMol</a>, and\ncongratulations to the family of Warren in ensuring continued development of the PyMol project! It’s heart-warming to see that despite\nthe bad times they are going through, and all the options they had with the PyMol code base, they find time for and strength in\nsupporting Warren’s ideas about the future of cheminformatics. My thoughts are with them!</p>",
      "summary": "This blog is old and new news. The old news is that Warren passed away at the end of last year, after having successfully shown how OpenSource cheminformatics (and/or bioinformatics) software can be developed in a commercial setting (DeLano Scientific), and PyMol was a huge success. Warren had a SourceForge account (wdelano) for almost 10 years:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pymol2.png",
      "date_published": "2010-01-15T00:00:00+00:00",
      "date_modified": "2010-01-15T00:00:00+00:00",
      "tags": ["odosos","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7p4zc-khq82",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/01/06/very-cold-in-uppsala.html",
      "title": "Very cold in Uppsala!",
      "content_html": "<p>There is an image below, that is no longer available, but I have good hopes I will recover it:</p>\n\n<p><img src=\"http://posterous.com/getfile/files.posterous.com/egonw/amaLHDs93cX3EHSg2UHW2oGU2hBlGHtiC3VMq8V5nAmHxZcXA8yWuDDB4JQn/veryCold.jpeg.scaled.500.jpg\" alt=\"\" /></p>",
      "summary": "There is an image below, that is no longer available, but I have good hopes I will recover it:",
      
      "date_published": "2010-01-06T00:00:00+00:00",
      "date_modified": "2026-03-30T00:00:00+00:00",
      "tags": ["uppsala"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/x958j-0xm21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/12/21/blueobelisk-stackexchange-summary-of.html",
      "title": "BlueObelisk StackExchange: summary of the first month",
      "content_html": "<p>The <a href=\"https://web.archive.org/web/20091231032042/http://blueobelisk.stackexchange.com/\">Blue Obelisk StackExchange <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n(BO<sub>x</sub>) has seen a relatively good start,\nbut the number of questions is dropping. The average number of unique visits is about 23-30 now:</p>\n\n<p><img src=\"/assets/images/box.png\" alt=\"\" /></p>\n\n<p>The number of <a href=\"https://web.archive.org/web/20091231024627/http://blueobelisk.stackexchange.com/users\">registered users <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> is not insignificant but also\nhas not been growing much lately:</p>\n\n<p><img src=\"/assets/images/box1.png\" alt=\"\" /></p>\n\n<p>At the same time, the quality of the questions are high, and have real users questions:</p>\n\n<ul>\n  <li><a href=\"http://blueobelisk.stackexchange.com/questions/86/good-way-to-produce-a-table-of-r-groups\">Good way to produce a table of R-groups <i class=\"fa-solid fa-link-slash fa-xs\"></i></a> (yet unanswered!)</li>\n  <li><a href=\"http://blueobelisk.stackexchange.com/questions/91/is-there-an-open-source-pka-or-logd-tool-available\">Is there an open source pKa or LogD tool available <i class=\"fa-solid fa-link-slash fa-xs\"></i></a></li>\n  <li><a href=\"http://blueobelisk.stackexchange.com/questions/81/easiest-way-to-align-two-molecules-from-the-command-line\">Easiest way to align two molecules from the command line? 
<i class=\"fa-solid fa-link-slash fa-xs\"></i></a></li>\n</ul>\n\n<p>The overall state is <a href=\"https://web.archive.org/web/20100413233403/http://blueobelisk.stackexchange.com/questions\">37 questions <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> with about\n<a href=\"http://blueobelisk.stackexchange.com/tags\">50 different tags <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>:</p>\n\n<p><img src=\"/assets/images/box2.png\" alt=\"\" /></p>\n\n<p>To make progress with BO<sub>x</sub>, we primarily need to promote it more as a central point of entry for people who\nwant to know what <em>free</em> tools they can use to do what they need, and for the people who want to contribute to\n<em>ODOSOS cheminformatics</em>, by pointing out the unsolved problems.</p>",
      "summary": "The Blue Obelisk StackExchange (BOx) has seen a relatively good start, but the number of questions is dropping. The average number of unique visits is about 23-30 now:",
      
      "date_published": "2009-12-21T00:00:00+00:00",
      "date_modified": "2026-03-29T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pxwbm-2wn02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/12/19/december-wrap-up-x-mas-holidays-at-last.html",
      "title": "December wrap up. X-mas holidays at last!",
      "content_html": "<p>Wow, I just saw it has been <strong>17 days</strong> since my last post already :( That’s a new record, I think! A lot has happened\nactually, but I have not had time to write up things. Actually, I have still have\n<a href=\"http://chem-bla-ics.blogspot.com/2009/11/swat4ls-wrapping-up-1.html\">SWAT4LS coverage left to do</a> :(</p>\n\n<h3 id=\"latex\">Latex</h3>\n\n<p>Anyway, one of the things our group has been up to in the last two weeks, is writing a book to support of the\n<a href=\"http://www.pharmbio.org/\">free, online Pharmaceutical Bioinformatics course</a>. The material includes a good deal of\ncheminformatics (molecular representation: chemical graph theory, 3D geometries, file formats, line notations, InChI),\nbioinformatics (sequence analysis), and statistics (PLS, PCA, proteochemometrics). All in light of drug discovery.\nOf course, we’re using LaTeX, and I asked around here and there about related things. For example, on\n<a href=\"http://stackoverflow.com/questions/1901213/open-source-latex-environment-for-educational-books\">StackOverflow on educational book styles</a>.\nBut also on <a href=\"http://friendfeed.com/the-life-scientists/5fd17e31/who-can-point-me-to-drug-where-tautomerism-is\">FriendFeed on tautomerism in relation to drug activity</a>.</p>\n\n<h3 id=\"bioclipse\">Bioclipse</h3>\n\n<p>I also hacked up a <a href=\"http://www.bioclipse.net/\">Bioclipse</a> plugin that allows me to convert a Bioclipse matrix\nresource into LaTeX source code, but that will not be part of the Bioclipse 2.2 release, as it requires quite\nsome updating of the statistics functionality. 
BTW, the LaTeX plugin is hosted at <a href=\"http://gitorious.org/~egonw\">Gitorious</a>,\nwhich is a GitHub alternative, but <a href=\"http://stackoverflow.com/questions/1913726/has-gitorious-hooks-for-cia-commit-notification\">does not seem to have post-commit hooks</a>\n:(</p>\n\n<p>Also, the Bioclipse2 paper “Bioclipse 2: A scriptable integration platform for the life sciences” has been\npublished now in BMC Bioinformatics (DOI:<a href=\"https://doi.org/10.1186/1471-2105-10-397\">10.1186/1471-2105-10-397</a>)!</p>\n\n<h3 id=\"new-student\">New student</h3>\n\n<p>I am also happy to have a second student starting in January, who will work primarily on an RDF version of the\n<a href=\"http://chembl.blogspot.com/\">ChEMBL</a> data. Her work will build on the excellent work being done right now by\n<a href=\"http://saml.rilspace.com/\">Samuel on comparing Prolog with DL reasoning</a>.</p>\n\n<h3 id=\"cdk-licensing\">CDK Licensing</h3>\n\n<p>Another thing that required my attention was the problem brought up by Andrew on licensing. There were considerable\nproblems with out-of-date statements the <a href=\"http://cdk.sf.net/\">CDK</a> makes on the license and copyright information\ncertain CDK modules use, and the implications that has on what the CDK project is required to do (e.g. link to source\ncode of third party libraries) and for downstream CDK distributors, like the <a href=\"https://chem-bla-ics.blogspot.com/2009/12/packages.debian.org/libcdk-java\">Debian</a>\nand <a href=\"https://chem-bla-ics.blogspot.com/2009/12/packages.ubuntu.com/libcdk-java\">Ubuntu</a> projects. For example, it\nbecame apparent that the Debian package cannot distribute the XML Schema of <a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a>,\nwhich is licensed <a href=\"http://creativecommons.org/licenses/by-nd/2.5/\">CC-BY-ND</a>, which is not DFSG-compatible. A few bugs\nhave been reported, and work is ongoing to fix the issues.</p>",
      "summary": "Wow, I just saw it has been 17 days since my last post already :( That’s a new record, I think! A lot has happened actually, but I have not had time to write up things. Actually, I still have SWAT4LS coverage left to do :(",
      
      "date_published": "2009-12-19T00:00:00+00:00",
      "date_modified": "2009-12-19T00:00:00+00:00",
      "tags": ["bioclipse","bioinf","cheminfo","latex"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-10-397", "doi": "10.1186/1471-2105-10-397"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m46g1-09v92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/12/02/cdk-131-changes.html",
      "title": "CDK 1.3.1: the changes",
      "content_html": "<p>Two weeks ago, I released <a href=\"http://chem-bla-ics.blogspot.com/2009/11/cdk-124-changes.html\">CDK 1.2.4</a>. <a href=\"https://sourceforge.net/users/anaytamhankar/\">Anay</a>\nreported failures in generating the JavaDoc from the packages, both of which I think I have fixed now; the uploaded 1.2.4.1 packages on SourceForge include these\nfixes.</p>\n\n<p>The 1.2.4 release was soon followed by 1.3.1. Unfortunately, uploading the packages to SourceForge over 3G with\n<a href=\"http://www.google.com/chrome\">Chrome</a> did not work well, so I only finished that today. CDK 1.3.1 is the second release\nin the development branch, and brings in new functionality but also API changes. Here are the changes since the 1.3.0 release:</p>\n\n<ul>\n<li>Bumped version for 1.3.1 release <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c34109572d\">c34109572d</a></li>\n<li>Added some extra lines, hopefully fixing the conflicts all the time <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6dab943dd2\">6dab943dd2</a></li>\n<li>Merged changes from the CDK 1.2.4 release <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c31c297a59\">c31c297a59</a></li>\n<li>Fixed param name <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=743bad345e\">743bad345e</a></li>\n<li>Updated the makefp3d target to work with the current build system <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=bbb78ee581\">bbb78ee581</a></li>\n<li>Set up a branch for the 1.2.4 release <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=4801d79b8c\">4801d79b8c</a></li>\n<li>Fixes bug 2898399. Updates to the SMARTS parser to handle proper matching for explicit hydrogens (including H, 1H, 2H and 3H). SMARTSQueryVisitor updated to take into account different isotopes of H.  Also updated unit tests to take into account proper H matching. 
Added a unit test to further check H matching. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b67d76ac96\">b67d76ac96</a></li>\n<li>Added tests to match hydrogens <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=45a7f54c3d\">45a7f54c3d</a></li>\n<li>Fixed junior issue 1816529: Missing Java5 generics for atomContainers() Iterator <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=484619e35d\">484619e35d</a></li>\n<li>Reworked the tests for bug 2898032. Updated Javadocs for smiles generator <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=7f68b07aa8\">7f68b07aa8</a></li>\n<li>Added unit test to confirm and check for bug 2898032 <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=924b56395e\">924b56395e</a></li>\n<li>Fixed junior issue 1802586: Misuse of assertTrue for tested strings <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=12bec4f992\">12bec4f992</a></li>\n<li>Made the AtomContainerPermutors IAtomContainer implementation independent <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=4748098973\">4748098973</a></li>\n<li>Merge branch 'cdk-1.2.x' <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=8a95d93506\">8a95d93506</a></li>\n<li>Updated UIT to handle single atom queries and added a unit test for bug 2888845. 
Also updated Javadocs to specifically note behavior of single atom queries <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=dfb28054f2\">dfb28054f2</a></li>\n<li>Fixed the dist-large target: removed to no longer existing .libdepends after the log4j module patch <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=9dc13e3c33\">9dc13e3c33</a></li>\n<li>Implemented instantiating custom loggers; example in the unit test class <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=2771eb94db\">2771eb94db</a></li>\n<li>Added the use of the SystemOutLoggingTool as back up <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=acf59538e9\">acf59538e9</a></li>\n<li>Added a ILoggerTool implementation for STDOUT <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=921447a690\">921447a690</a></li>\n<li>Dig up and updated the copyright history <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=a3cc8764b6\">a3cc8764b6</a></li>\n<li>Factored out initialization of the tool, to allow reusing the code for other logger class names <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=2af5f247fb\">2af5f247fb</a></li>\n<li>Moved the log4j.jar depending LoggingTool into a separate module <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=112f64d6a0\">112f64d6a0</a></li>\n<li>Introduces the ILoggingTool interface and a factory so that CDK code no longer needs to depend on LoggingTool which depends on Apache's Log4j library. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c6c8d38a93\">c6c8d38a93</a></li>\n<li>Added generation of java source jars <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=e33fba2af0\">e33fba2af0</a></li>\n<li>Merge branch 'cdk-1.2.x' <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b66f8c7182\">b66f8c7182</a></li>\n<li>Fixed matchers to allow XML without new lines (closes #2832835) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f9a0552430\">f9a0552430</a></li>\n<li>Added unit tests for detection of PubChem XML files. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=571f434a94\">571f434a94</a></li>\n<li>Fixed matchers to allow XML without new lines (closes #2832835) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=a1f25d8629\">a1f25d8629</a></li>\n<li>Added unit tests for detection of PubChem XML files. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1cec794dec\">1cec794dec</a></li>\n<li>Merge branch 'stereo' <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=ffe9576b02\">ffe9576b02</a></li>\n<li>Added reading of E/Z stereochemistry from double bonds in MDL V2000 molfiles. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=cb824f1896\">cb824f1896</a></li>\n<li>A minor fix to clean up a PDMD warning <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=024499e7c2\">024499e7c2</a></li>\n<li>Overwrite unit tests, because there are no change events passed around at all for the NoNotification interface implementations <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=36f295bf8a\">36f295bf8a</a></li>\n<li>Added missing unit tests for IChemModel event propagation for the ICrystal field <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=2993e0c5a0\">2993e0c5a0</a></li>\n<li>Fixed propagation of change events to IChemModel when modifications are made in child IChemObjects <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0c8a88fec8\">0c8a88fec8</a></li>\n<li>Fixed unit tests: the IChemModel.setFoo(null) should actually give a change event on the listener of the IChemModel, and not after unregistering of the Foo object. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b8331764c2\">b8331764c2</a></li>\n<li>Synchronized with the Blue Obelisk version <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=a91062b454\">a91062b454</a></li>\n<li>Added unit test to the function of the new IO setting to force 2D coordinate output. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=4e2b2bf31e\">4e2b2bf31e</a></li>\n<li>Added writer IO option to force writing of 2D coordinates if 3D coordinates are present too, which now are preferably outputted. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0e6aa2cf14\">0e6aa2cf14</a></li>\n<li>Added unit test to verify that if 2D and 3D coordinates are available, the 3D coordinates are outputted. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=56852f8bd5\">56852f8bd5</a></li>\n<li>Changed IBond.get/setStereo() to use a IBond.Stereo enumeration instead of an int (fixes #2855850): <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=46893ed070\">46893ed070</a></li>\n<li>Merge branch 'cdk-1.2.x' <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f0c16b0c76\">f0c16b0c76</a></li>\n<li>Fixed Taglets: only return HTML if the Tag is really given; the toString() method is given for all cases, not just when the tag is found <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1107fb2fba\">1107fb2fba</a></li>\n<li>Added the Mannhold LogP descriptor <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1e6b6cdfb4\">1e6b6cdfb4</a></li>\n<li>Added the Mannhold LogP descriptor to the ontology <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=a7adc9fe5c\">a7adc9fe5c</a></li>\n<li>Fixeda bug which was causing various parts of the DescriptorEngine to fail - it was trying to instantiate a non-descriptor class which happens to reside in the descriptor package directory. 
This fix is a bit kludgy - ideally only descriptors should be in that directory <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0242d9ad67\">0242d9ad67</a></li>\n<li>Fixes ClassCastException when not IMolecule <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6f3e848f9d\">6f3e848f9d</a></li>\n<li>Upgraded to PMD 2.4.5 with many bug fixes, giving more accurate error reports <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f29a66b63a\">f29a66b63a</a></li>\n<li>Added missing dependency on cdk-diff, being used in one of the unit tests <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0e287dd450\">0e287dd450</a></li>\n<li>Fixed methods names to match those in the test class <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=789a314a8e\">789a314a8e</a></li>\n<li>Fixed test method name to match the expected patters, fixing a coverage test fail <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=ac136190d0\">ac136190d0</a></li>\n<li>Removed duplicate code: MolecularFormulaTest now extends AbstractMolecularFormulaTest <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b8651c75c8\">b8651c75c8</a></li>\n<li>Fixed test method annotation to point to the right method <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=bb7d341577\">bb7d341577</a></li>\n<li>Added missing @TestMethod annotation <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f6f759b227\">f6f759b227</a></li>\n<li>Added modules that were missing from the PMD testing <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=073e5ec96b\">073e5ec96b</a></li>\n<li>Added modules that were missing from the doccheck testing <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=10dc19c09b\">10dc19c09b</a></li>\n<li>Added reference to IUPAC 
documentation about stereochemistry visualization. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=56adf239b0\">56adf239b0</a></li>\n<li>Merge branch 'cdk-1.2.x' <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=03e8496d5c\">03e8496d5c</a></li>\n<li>Patch for bug 2843445. Aims to fix generation of NaN coordinates by SDG <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=d1397fe99d\">d1397fe99d</a></li>\n<li>Added missing dependency introduced by the use of AbstractFingerprinterTest in test-standard. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b26eb933e6\">b26eb933e6</a></li>\n<li>Updated the unit test classes for all IFingerprinter implementations to use the new AbstractFingerprinter class; a few unit tests actually fail <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1989fa5c7b\">1989fa5c7b</a></li>\n<li>Extracted an AbstractFingerprinterTest with unit tests that should really apply to all IFingerprinter implementations <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=8bc42dcfc4\">8bc42dcfc4</a></li>\n<li>Clean up of layout. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=5f7cb532ee\">5f7cb532ee</a></li>\n<li>Fix the unit test to not give a 'input must support mark' exception on some platforms, by wrapping the InputStream in a BufferedInputStream. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6f6f41ede3\">6f6f41ede3</a></li>\n<li>Added missing dependencies <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=8759481c19\">8759481c19</a></li>\n<li>Added ioformats to modules to test <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=56289e2dbc\">56289e2dbc</a></li>\n<li>Use StringBuilder to aggregate the field data, which gives an huge performance boost for SD file where multiline field data is found. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=df35f02d32\">df35f02d32</a></li>\n<li>Use StringBuilder to aggregate the field data, which gives an huge performance boost for SD file where very much field data, like the ChEBI_complete.sdf <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=eac8266fe9\">eac8266fe9</a></li>\n<li>Factored out steps in reading the SD file data block <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=678e7ca206\">678e7ca206</a></li>\n<li>Bumped version, to make it clear this is not the 1.2.3 release <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=8c8166a1a2\">8c8166a1a2</a></li>\n<li>Bumped version, to make it clear this is not the 1.3.0 release <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=eeda652998\">eeda652998</a></li>\n<li>Fixed registering on the cdk.threadnonsage tag (closes #2796362) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=d451576275\">d451576275</a></li>\n<li>Removed obsolete pattern from old svnrev tag <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c8f5a727a3\">c8f5a727a3</a></li>\n<li>Fixed JavaDoc to remove traces of the old svnrev Tag <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1a70488b81\">1a70488b81</a></li>\n<li>Synchronized exception message with 
implementation (fixes #2844333) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c70b79cbec\">c70b79cbec</a></li>\n<li>Made class private again, per authors request <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=fa7ba022ee\">fa7ba022ee</a></li>\n<li>Any class will do, not just public, final and abstract <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=dc9e8c5f59\">dc9e8c5f59</a></li>\n<li>Two further compile fixes after the merge with CDK 1.2.x <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3458dee67e\">3458dee67e</a></li>\n<li>Made the class public, to fix a compile problem introduced by the merge with CDK 1.2.x <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=d8170d2f0e\">d8170d2f0e</a></li>\n<li>Added ant task to calculate JavaNCSS code statistics <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=a8b313eace\">a8b313eace</a></li>\n<li>Added JavaNCSS 32.53 (LGPL 3.0) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6753a8ceea\">6753a8ceea</a></li>\n<li>Merged from cdk-1.2.x. Also fixed some conflicts. Not sure why/who changed PharmacophoreMatcherTest to use QueryAtomContainer rather than PharmacophoreQuery <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0d5689f97a\">0d5689f97a</a></li>\n<li>The Pauling Electronegativity is copied in configure as well. I can't see why not copy everything we have. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3fd2b171e8\">3fd2b171e8</a></li>\n<li>Revert \"added a test for bug 2831420\": <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=2c2add68bb\">2c2add68bb</a></li>\n<li>Patch for bug 2843445. 
Aims to fix generation of NaN coordinates by SDG <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=963b0a7980\">963b0a7980</a></li>\n<li>added a test for bug 2831420 <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=5d1522264b\">5d1522264b</a></li>\n<li>added a test for bug #2831420 <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=93536f0d99\">93536f0d99</a></li>\n<li>Made InChIGeneratorFactory a singleton. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=242da910d0\">242da910d0</a></li>\n<li>Layout. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=af4fac7a95\">af4fac7a95</a></li>\n<li>Added bug annotation <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=38d0235bba\">38d0235bba</a></li>\n<li>test case for bug #2846213 <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f84c53b98a\">f84c53b98a</a></li>\n<li>Fixed perception of N.planar3 where N.sp2 was detected, by now taking into account the given hydrogen count. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1714de2663\">1714de2663</a></li>\n<li>Fixed perception of benzene with all single bond, but hydrogen count 1 and bonds flagged aromatic. In this case, the type is C.sp2 not C.sp3. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=05e0be39a0\">05e0be39a0</a></li>\n<li>Added assertions to unit test for values being not null <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=863b0a5325\">863b0a5325</a></li>\n<li>Added two unit tests for the same problem: carbon atom types are not correctly perceived if bond order info is SINGLE only, and hydrogen count and aromaticity flag is set. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f19a451a72\">f19a451a72</a></li>\n<li>Moved class into a org.openscience.cdk package, which seems to work now. I'm puzzled why it did not before. Solved several unit test fails. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b055c6b0b0\">b055c6b0b0</a></li>\n<li>Merge branch 'cdk-1.2.x' of ssh://egonw@cdk.git.sourceforge.net/gitroot/cdk into cdk-1.2.x <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f77db9c186\">f77db9c186</a></li>\n<li>Unsealed the XOM jar to allow having the CustomSerializer <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3b8234020c\">3b8234020c</a></li>\n<li>Fixed Javadocs error <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=e0304bf4bd\">e0304bf4bd</a></li>\n<li>Fixed a wrong javadoc tag. Also removed svn tag in the SMARTS parser JJT file, replaced with git tag <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c8887734af\">c8887734af</a></li>\n<li>Added support for 'public enum's <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=4bf822d57b\">4bf822d57b</a></li>\n<li>corrected bug in bondtools.isStereo(IAtomContainer container, IAtom stereoAtom). A comparision of atom symbols in a nested loop was using the counter of the outer loop twice. Note it worked before, because there is a sort of fallback to Morgan numbers. 
fallback to morgan (fixes #2830287) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=025fb472b8\">025fb472b8</a></li>\n<li>added a new test for bondtools <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=13f72bd406\">13f72bd406</a></li>\n<li>Fixed inconsistency between accepts() and write: also support writing of IAtomContainerSet and IAtomContainer as accepts() indicates (fixes #2827745) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6380578865\">6380578865</a></li>\n<li>General test for testing consistency between write() and accepts(), testing that all accepted IChemObject's can also be written <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f0678eb65a\">f0678eb65a</a></li>\n<li>Added unit test for bug #2826961: inconsistent atom typing for two SMILES. Unit test does not show a fail, ruling out a CDK bug <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=42e45efcd9\">42e45efcd9</a></li>\n<li>Remove erroneous throws statement <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f8cfea8bc3\">f8cfea8bc3</a></li>\n<li>Bug found calculating the exact mass given a molecular formula when it is negative charged. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3d1de45add\">3d1de45add</a></li>\n<li>Fixed reading of the cdk/dict/data/elements.owl database which is now in OWL <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=73225a083a\">73225a083a</a></li>\n<li>Fixed issue 2458210: use assertNotNull(foo) etc instead of assertTrue(foo != null). 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=182afe6670\">182afe6670</a></li>\n<li>Added minimum equivalents for BondManipulator.getMaximumBondOrder() methods <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6e126962ea\">6e126962ea</a></li>\n<li>Fixes asserts: after removal *no* change should be recorded <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3b9fa30041\">3b9fa30041</a></li>\n<li>Added IO option to disable generator of XML declaration statements in the output CML. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=74451b8f0e\">74451b8f0e</a></li>\n<li>Added generics, and consistified code by always returning a List&lt;?&gt; of the same '?'. (And some 80 chars fixes in the JavaDocs.) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=d6337cd596\">d6337cd596</a></li>\n<li>Added unit tests to test that when a [Molecule|Reaction|Ring]Set has been removed from a ChemModel, the ChemModel should unregister as listener. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=63e6c014a1\">63e6c014a1</a></li>\n<li>Added unit tests for event propagation from [Molecule|Reaction|Ring]Sets to ChemModel. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=e01103543b\">e01103543b</a></li>\n<li>More testing of flags. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=abb53842bf\">abb53842bf</a></li>\n<li>Fix for junior job id: [ 1837692 ] Test methods should throw only one Exception. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=8c3853638e\">8c3853638e</a></li>\n<li>Fixed missing imports and wrapped to 80 chars <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=fd2d2df6ef\">fd2d2df6ef</a></li>\n<li>Better excpetion handling in builder3d: <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=bc5837d848\">bc5837d848</a></li>\n</ul>",
      "summary": "Two weeks ago, I released CDK 1.2.4. Anay reported failures in generating the JavaDoc from the packages, both of which I think I have fixed now; the uploaded 1.2.4.1 packages on SourceForge include these fixes.",
      
      "date_published": "2009-12-02T00:00:00+00:00",
      "date_modified": "2009-12-02T00:00:00+00:00",
      "tags": ["cdk","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/78fbd-y6h76",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/25/swat4ls-wrapping-up-1.html",
      "title": "SWAT4LS: wrapping up #1",
      "content_html": "<p>It’s already been five days since the <a href=\"http://www.swat4ls.org/2009/index.php\">SWAT4LS</a> meeting (<a href=\"http://swat4ls.blogspot.com/\">matching blog</a>),\nand I finally got around to writing up my personal summary. I very much enjoyed the <a href=\"http://blueobelisk.stackexchange.com/questions/16/who-will-go-to-swat4ls-and-wants-to-join-a-blue-obelisk-dinner\">Blue Obelisk dinner</a>\non Thursday evening with <a href=\"http://semanticscience.wordpress.com/\">Nico</a>, <a href=\"http://duncan.hull.name/\">Duncan</a>, and\n<a href=\"http://blog.chemdom.com/\">Miguel</a> (the <a href=\"http://cdk.sf.net/\">CDK</a> one).</p>\n\n<p>SWAT4LS was fun, interesting, perhaps too short, but very much appreciated! Thanx to all organizers! During the day various people tweeted the\nmeeting, using the <a href=\"http://search.twitter.com/search?q=%23swat4ls2009\">#swat4ls2009</a> hashtag (forwarded to <a href=\"http://friendfeed.com/swat4ls2009\">a FriendFeed room</a>),\nwhile Nico covered things in various blog posts which I’ll link to below where appropriate. Summaries I have seen so far are from\n<a href=\"http://semanticscience.wordpress.com/2009/11/24/semantic-web-tools-and-applications-for-life-sciences-2009-a-personal-summary/\">Nico</a>\nand <a href=\"http://duncan.hull.name/2009/11/24/swat4ls/\">Duncan</a> (again :), and <a href=\"http://swat4ls.blogspot.com/2009/11/swat4ls-aftermath.html\">the organizers</a>.</p>\n\n<p>The day kicked off with a presentation by Alan Ruttenberg (<a href=\"http://semanticscience.wordpress.com/2009/11/20/swat4ls2009-keynote-alan-ruttenberg-semantic-web-technology-to-support-studying-the-relation-of-hla-structure-variation-to-disease/\">Nico’s coverage</a>).\nIt nicely demonstrated where the semantic web for life sciences is going. 
Particularly interesting was the integration of SPARQL with Jmol in\n<a href=\"http://http//neurocommons.org/page/ImmPort/JmolViz\">ImmPort/JmolViz</a>: it uses Jmol to visualize a PDB entry, while using SPARQL to retrieve atomic\nand residue annotation, using Jmol script (we have to thank another Miguel (the <a href=\"http://www.jmol.org/\">Jmol</a> one) for taking the scripting\nand visualization capabilities <a href=\"http://sourceforge.net/mailarchive/forum.php?thread_name=64707.217.127.90.82.1035878883.squirrel@www.howards.org&amp;forum_name=jmol-developers\">to the next level in 2002</a>).\nIt always makes me proud to see one of the projects I have worked on hit a prominent place in keynote talks at conferences :)</p>\n\n<p>Alan also clarified that <a href=\"http://creativecommons.org/choose/zero\">CC0</a> is not a license, but a statement about the <em>public domain</em> nature\nof data; there is nothing to accept, nothing to live up to. The important thing, and I am sure most of my readers are well aware of that, is\nthat it formalizes the public domain concept by wrapping it in a full CC0 statement. My recommendation to all who want to make (chemical) data\navailable as <em>public domain</em>: use CC0, simply because CC0 works in any country, and it will make a lot of your users very happy.\n<strong>If you cannot claim CC0 because you are not really the owner (as I have seen done), do not claim the data to be public domain either\nthen (which was done)!</strong></p>\n\n<p>There was also mention of the <a href=\"http://www.co-ode.org/ontologies/amino-acid/2009/02/16/\">Amino Acid Ontology</a>, which comes closer to our group’s\nproteochemometrics work, but I have yet to look at whether this can be used for, or linked to, protein descriptors. Also interesting is the idea behind\n<a href=\"https://github.com/alanruttenberg/rdfherd\">RDFHerd <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, a project aiming to distribute RDF data sets as installable packages. 
If I understood\ncorrectly, only <a href=\"http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/\">Virtuoso</a> is supported so far, but this thing can fly, particularly\nif these packages are easily converted into <a href=\"http://www.debian.org/doc/FAQ/ch-pkg_basics.en.html\">Debian packages</a>.</p>\n\n<p>More wrapping up will follow, but I have other business to attend to first.</p>",
      "summary": "It’s already been five days since the SWAT4LS meeting (matching blog), and I finally got around to writing up my personal summary. I very much enjoyed the Blue Obelisk dinner on Thursday evening with Nico, Duncan, and Miguel (the CDK one).",
      
      "date_published": "2009-11-25T00:00:00+00:00",
      "date_modified": "2026-03-29T00:00:00+00:00",
      "tags": ["swat4ls","blue-obelisk","jmol","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/43dc5-arn55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/21/swat4ls-linking-open-drug-data-to.html",
      "title": "SWAT4LS: &quot;Linking Open Drug Data to Cheminformatics and Proteochemometrics&quot;",
      "content_html": "<p>Please find below the presentation I gave today at <a href=\"https://www.swat4ls.org/workshops/amsterdam2009/\">SWAT4LS <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<p><a href=\"https://zenodo.org/records/3551852\"><img src=\"/assets/images/zenodo_swat4ls_2009_lod.png\" alt=\"\" /></a></p>",
      "summary": "Please find below the presentation I gave today at SWAT4LS :",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/zenodo_swat4ls_2009_lod.png",
      "date_published": "2009-11-21T00:00:00+00:00",
      "date_modified": "2026-01-02T00:00:00+00:00",
      "tags": ["swat4ls","rdf","semweb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.3551852", "doi": "10.5281/ZENODO.3551852"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zgafr-mre81",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/20/linking-two-virtuoso-instances-to-one.html",
      "title": "Linking two Virtuoso instances to one Apache server",
      "content_html": "<p><a href=\"http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/\">Virtuoso</a> comes with its own web front end, but I did not want to make that public.\nAdditionally, I actually have two instances running, one for the <a href=\"http://www.gnu.org/copyleft/fdl.html\">GNU FDL</a>\nlicensed <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/09/04/nmrshiftdb-enters-rdfopenmoleculesnet-2.html\">NMRShiftDB <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ndata, and one for the CC0 <a href=\"http://chem-bla-ics.blogspot.com/2009/11/chempedia-rdf-1-sparql-end-point.html\">ChemPedia</a> and\n<a href=\"http://chem-bla-ics.blogspot.com/2009/11/open-notebook-science-solubility-sparql.html\">Solubility</a> data sets.</p>\n\n<p>So, I used <a href=\"http://httpd.apache.org/docs/2.0/mod/mod_proxy.html\">Apache’s proxy module</a> to link to the two Virtuoso instances.\nThese two are set up by simply duplicating the database folder, so that each instance has its own <em>virtuoso.ini</em> config file. 
Modify one\nof two config files to have them run on a different port in the Parameters section, for example 1198 and 1199:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[Parameters]\nServerPort                      = 1199\n</code></pre></div></div>\n\n<p>And assign a different server ports in the HTTPServer section, such as 2290 and 2291:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[HTTPServer]\nServerPort                      = 2291\n</code></pre></div></div>\n\n<p>Then modify the <em>/etc/apache2/mods-enabled/proxy.conf</em> (or whatever equivalent on your system) to have two sections creating two URL rewrites proxying the request to the virtuoso server:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>&lt;Proxy /nmrshiftdb/sparql&gt;\n  RewriteEngine On\n  Allow from all\n  ProxyPass        http://localhost:2290/sparql\n  ProxyPassReverse http://localhost:2290/sparql\n&lt;/Proxy&gt;\n\n&lt;Proxy /cc0/sparql&gt;\n  RewriteEngine On\n  Allow from all\n  ProxyPass        http://localhost:2291/sparql\n  ProxyPassReverse http://localhost:2291/sparql\n&lt;/Proxy&gt;\n</code></pre></div></div>",
      "summary": "Virtuoso comes with its own web front end, but I did not want to make that public. Additionally, I actually have two instances running, one for the GNU FDL licensed NMRShiftDB data, and one for the CC0 ChemPedia and Solubility data sets.",
      
      "date_published": "2009-11-20T00:00:00+00:00",
      "date_modified": "2026-04-11T00:00:00+00:00",
      "tags": ["virtuoso","apache"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tezva-0ty37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/19/open-notebook-science-solubility-sparql.html",
      "title": "Open Notebook Science Solubility: the SPARQL end point",
      "content_html": "<p>The <a href=\"http://onschallenge.wikispaces.com/\">Open Notebook Science Solubility</a> challenge is a project crowdsourcing the solubility\nof organic compounds in non-aqueous solvents. I have been working on RDF-ing this data:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/18/solubility-data-in-bioclipse-1.html\">Solubility Data in Bioclipse #1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/22/solubility-data-in-bioclipse-2-handling.html\">Solubility Data in Bioclipse #2: handling RDF <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/27/solubility-data-in-bioclipse-3-finding.html\">Solubility Data in Bioclipse #3: Finding ChEBI IDs <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/03/solubility-data-in-bioclipse-4-finding.html\">Solubility Data in Bioclipse #4: Finding ChEBI IDs (Again, but better)</a></li>\n</ul>\n\n<p>And this resulted in a joint chapter in the nice <a href=\"http://oreilly.com/catalog/9780596157128\">Beautiful Data</a> book.</p>\n\n<p>What I had not done so far is set up a SPARQL end point for this data, like I did for the\n<a href=\"http://chem-bla-ics.blogspot.com/2009/09/nmrshiftdb-rdf-2-some-statistics.html\">NMRShiftDB data</a>.</p>\n\n<p>Now, however, a Virtuoso-powered <a href=\"http://pele.farmbio.uu.se/cc0/sparql\">SPARQL end point</a> is available, and I hope this\nwill soon get picked up by the other nodes on the ONS Solubility project. It is not an auto-synchronized link, though.</p>\n\n<p>Possible advantages include that the client can perform any query and get the results in various formats,\nincluding JSON. 
For example, follow this link to get all\n<a href=\"http://pele.farmbio.uu.se/cc0/sparql?default-graph-uri=&amp;query=prefix+dc%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%0D%0A%0D%0Aselect+distinct+%3Fs+%3Ftitle+where+\\{\\%0D%0A++%3Fs+a+%3Chttp%3A%2F%2Fspreadsheet.google.com%2Fplwwufp30hfq0udnEmRD1aQ%2Fonto%23Solute%3E+%3B%0D%0A+++++dc%3Atitle+%3Ftitle+.%0D%0A\\}\\%0D%0A%0D%0A&amp;format=application%2Fjson&amp;debug=on&amp;timeout=\">solutes in JSON format</a>.</p>\n\n<p>The matching SPARQL looks like:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">ons</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto#&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">select</span><span class=\"w\"> </span><span class=\"k\">distinct</span><span class=\"w\"> </span><span class=\"nv\">?s</span><span class=\"w\"> </span><span class=\"nv\">?title</span><span class=\"w\"> </span><span class=\"k\">where</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?s</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">ons</span><span class=\"o\">:</span><span class=\"ss\">Solute</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n     </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"ss\">title</span><span class=\"w\"> </span><span class=\"nv\">?title</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span 
class=\"w\">\n</span></code></pre></div></div>",
      "summary": "The Open Notebook Science Solubility challenge is a project crowdsourcing the solubility of organic compounds in non-aqueous solvents. I have been working on RDF-ing this data:",
      
      "date_published": "2009-11-19T00:00:00+00:00",
      "date_modified": "2026-04-11T00:00:00+00:00",
      "tags": ["chemistry","rdf","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qfhff-gen31",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html",
      "title": "ChemPedia RDF #1: the SPARQL end point",
      "content_html": "<p>Well, you might spot a pattern here; yes, another chemical <a href=\"http://pele.farmbio.uu.se/cc0/sparql\">SPARQL end point</a>\n(actually, it shares the end point with the <a href=\"http://chem-bla-ics.blogspot.com/2009/11/open-notebook-science-solubility-sparql.html\">Solubility data</a>).\nThis time around <a href=\"http://depth-first.com/\">Rich</a>’s <a href=\"http://chempedia.com/substances\">ChemPedia</a>. Taking advantage of the\n<a href=\"https://doi.org/10.59350/kprj3-gyg97\">CC0-licensed downloads <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nI have created a small <a href=\"http://groovy.codehaus.org/\">Groovy</a> script (using this <a href=\"http://json-lib.sourceforge.net/\">JSON library</a>)\nto convert the ChemPedia <a href=\"http://en.wikipedia.org/wiki/Json\">JSON</a> into\n<a href=\"http://en.wikipedia.org/wiki/Notation3\">Notation3</a>:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"nn\">net.sf.json.groovy.JsonSlurper</span><span class=\"o\">;</span>\n\n<span class=\"n\">input</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">File</span><span class=\"o\">(</span><span class=\"s2\">\"substances.json\"</span><span class=\"o\">)</span>\n<span class=\"n\">json</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">JsonSlurper</span><span class=\"o\">().</span><span class=\"na\">parse</span><span class=\"o\">(</span><span class=\"n\">input</span><span class=\"o\">);</span>\n\n<span class=\"n\">println</span> <span class=\"s2\">\"@prefix dc: &lt;http://purl.org/dc/elements/1.1/&gt; .\"</span><span class=\"o\">;</span>\n<span class=\"n\">println</span> <span class=\"s2\">\"@prefix cp: &lt;http://rdf.openmolecules.net/chempedia/onto#&gt; .\"</span><span class=\"o\">;</span>\n<span class=\"n\">json</span><span class=\"o\">.</span><span class=\"na\">each</span> 
<span class=\"o\">{</span> <span class=\"n\">it</span> <span class=\"o\">-&gt;</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"&lt;\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt; dc:identifier \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">gsid</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\";\"</span><span class=\"o\">;</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\" &lt;http://www.w3.org/2002/07/owl#sameAs&gt; &lt;http://rdf.openmolecules.net/?\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">inchi</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt;;\"</span><span class=\"o\">;</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"  &lt;http://www.iupac.org/inchi&gt; \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">inchi</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\".\"</span><span class=\"o\">;</span>\n  <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">size</span><span class=\"o\">()</span> <span class=\"o\">&gt;</span> <span class=\"mi\">0</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"kt\">int</span> <span class=\"n\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"o\">;</span> <span class=\"n\">i</span><span class=\"o\">&lt;</span><span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">size</span><span class=\"o\">();</span> <span class=\"n\">i</span><span 
class=\"o\">++)</span> <span class=\"o\">{</span>\n      <span class=\"n\">naming</span> <span class=\"o\">=</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">);</span>\n      <span class=\"n\">namingURI</span> <span class=\"o\">=</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">+</span> <span class=\"s2\">\"/naming\"</span> <span class=\"o\">+</span> <span class=\"n\">i</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span class=\"s2\">\"&lt;\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt; cp:hasNaming \"</span> <span class=\"o\">+</span>\n        <span class=\"s2\">\"&lt;\"</span> <span class=\"o\">+</span> <span class=\"n\">namingURI</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt;.\"</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span class=\"s2\">\"&lt;\"</span> <span class=\"o\">+</span> <span class=\"n\">namingURI</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt; a cp:Naming;\"</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span class=\"s2\">\"  cp:hasName \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">name</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\";\"</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span class=\"s2\">\"  cp:hasStatus \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">status</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\";\"</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span 
class=\"s2\">\"  cp:hasScore \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">score</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\".\"</span><span class=\"o\">;</span>\n    <span class=\"o\">}</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>After uploading it into <a href=\"http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSIndex\">Virtuoso</a> (now using <code class=\"language-plaintext highlighter-rouge\">DB.DBA.TTLP</code> instead of\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/09/04/nmrshiftdb-enters-rdfopenmoleculesnet-2.html\">DB.DBA.RDF_LOAD_RDFXML_MT <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), we can now have our\nregular SPARQL fun with the data from ChemPedia. For example, list the 10 names with the most votes:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">cp</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://rdf.openmolecules.net/chempedia/onto#&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">select</span><span class=\"w\"> </span><span class=\"k\">distinct</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span class=\"nv\">?score</span><span class=\"w\"> </span><span class=\"k\">where</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?s</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">cp</span><span class=\"o\">:</span><span 
class=\"ss\">Naming</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n     </span><span class=\"nn\">cp</span><span class=\"o\">:</span><span class=\"ss\">hasName</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n     </span><span class=\"nn\">cp</span><span class=\"o\">:</span><span class=\"ss\">hasScore</span><span class=\"w\"> </span><span class=\"nv\">?score</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">DESC</span><span class=\"p\">(</span><span class=\"nv\">?score</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">LIMIT</span><span class=\"w\"> </span><span class=\"mi\">10</span><span class=\"w\">\n</span></code></pre></div></div>",
      "summary": "Well, you might spot a pattern here; yes, another chemical SPARQL end point (actually, it shares the end point with the Solubility data). This time around Rich’s ChemPedia. Taking advantage of the CC0-licensed downloads, I have created a small Groovy script (using this JSON library) to convert the ChemPedia JSON into Notation3:",
      
      "date_published": "2009-11-19T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["rdf","sparql","chempedia","nmrshiftdb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/kprj3-gyg97", "doi": "10.59350/kprj3-gyg97"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/22kw4-c3g34",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/18/cdk-124-authors.html",
      "title": "CDK 1.2.4: the authors",
      "content_html": "<p>The <a href=\"http://cdk.sf.net/\">CDK</a> <a href=\"http://chem-bla-ics.blogspot.com/2009/11/cdk-124-changes.html\">1.2.4 changelog</a> I posted earlier was directly\ncreated from git output. <a href=\"http://git-scm.com/\">Git</a> has many features that make such things simple. Here’s a list of authors of\nthe 1.2.4 change set:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>56 Egon Willighagen\n 9 Rajarshi  Guha\n 5 Stefan Kuhn\n 2 mark_rynbeek\n 1 Uli Köhler\n 1 Rajarshi Guha\n 1 Peter Odéus\n 1 Paul Turner\n 1 Miguel Rojas Cherto\n 1 Arvid Berg\n</code></pre></div></div>\n\n<p>This is just the number of commits, and many of mine are logistic in nature. You can also notice that <a href=\"http://blog.rguha.net/\">Rajarshi</a>\nhas changed his name (removed the extraneous space :). Thanks to all the authors for contributing to this release! I am happy to see a few\nnew names in this list, which seems to indicate that people are settling in on the whole move from Subversion to Git.</p>\n\n<p>This list was created with this command adapted from <a href=\"http://stackoverflow.com/questions/1486819/which-git-commit-stats-are-easy-to-pull\">this StackOverflow question</a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>git log --pretty=format:%an cdk-1.2.3..cdk-1.2.4 | awk -- '{ ++c[$0]; } END { for(cc in c) printf \"%5d %s\\n\",c[cc],cc; }' | sort -n -r\n</code></pre></div></div>",
      "summary": "The CDK 1.2.4 changelog I posted earlier was directly created from git output. Git has many features that make such things simple. Here’s a list of authors of the 1.2.4 change set:",
      
      "date_published": "2009-11-18T00:10:00+00:00",
      "date_modified": "2009-11-18T00:10:00+00:00",
      "tags": ["cdk","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tjz2n-7xe87",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/18/cdk-124-changes.html",
      "title": "CDK 1.2.4: the changes",
      "content_html": "<p>Here is the changelog of <a href=\"http://cdk.sf.net/\">CDK</a> 1.2.4 which I am about to upload to <a href=\"http://sourceforge.net/projects/cdk/files/cdk/\">SourceForge</a>:</p>\n\n<ul>\n<li>Fixed param name <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=743bad345e\">743bad345e</a></li>\n<li>Updated the makefp3d target to work with the current build system <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=bbb78ee581\">bbb78ee581</a></li>\n<li>Set up a branch for the 1.2.4 release <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=4801d79b8c\">4801d79b8c</a></li>\n<li>Fixes bug 2898399. Updates to the SMARTS parser to handle proper matching for explicit hydrogens (including H, 1H, 2H and 3H). SMARTSQueryVisitor updated to take into account different isotopes of H.  Also updated unit tests to take into account proper H matching. Added a unit test to further check H matching. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b67d76ac96\">b67d76ac96</a></li>\n<li>Added tests to match hydrogens <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=45a7f54c3d\">45a7f54c3d</a></li>\n<li>Reworked the tests for bug 2898032. Updated Javadocs for smiles generator <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=7f68b07aa8\">7f68b07aa8</a></li>\n<li>Added unit test to confirm and check for bug 2898032 <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=924b56395e\">924b56395e</a></li>\n<li>Updated UIT to handle single atom queries and added a unit test for bug 2888845. 
Also updated Javadocs to specifically note behavior of single atom queries <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=dfb28054f2\">dfb28054f2</a></li>\n<li>Added generation of java source jars <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=e33fba2af0\">e33fba2af0</a></li>\n<li>Fixed matchers to allow XML without new lines (closes #2832835) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f9a0552430\">f9a0552430</a></li>\n<li>Added unit tests for detection of PubChem XML files. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=571f434a94\">571f434a94</a></li>\n<li>Overwrite unit tests, because there are no change events passed around at all for the NoNotification interface implementations <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=36f295bf8a\">36f295bf8a</a></li>\n<li>Added missing unit tests for IChemModel event propagation for the ICrystal field <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=2993e0c5a0\">2993e0c5a0</a></li>\n<li>Fixed propagation of change events to IChemModel when modifications are made in child IChemObjects <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0c8a88fec8\">0c8a88fec8</a></li>\n<li>Fixed unit tests: the IChemModel.setFoo(null) should actually give a change event on the listener of the IChemModel, and not after unregistering of the Foo object. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b8331764c2\">b8331764c2</a></li>\n<li>Added unit test to the function of the new IO setting to force 2D coordinate output. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=4e2b2bf31e\">4e2b2bf31e</a></li>\n<li>Added writer IO option to force writing of 2D coordinates if 3D coordinates are present too, which now are preferably outputted. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0e6aa2cf14\">0e6aa2cf14</a></li>\n<li>Added unit test to verify that if 2D and 3D coordinates are available, the 3D coordinates are outputted. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=56852f8bd5\">56852f8bd5</a></li>\n<li>Fixed Taglets: only return HTML if the Tag is really given; the toString() method is given for all cases, not just when the tag is found <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1107fb2fba\">1107fb2fba</a></li>\n<li>Fixeda bug which was causing various parts of the DescriptorEngine to fail - it was trying to instantiate a non-descriptor class which happens to reside in the descriptor package directory. This fix is a bit kludgy - ideally only descriptors should be in that directory <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0242d9ad67\">0242d9ad67</a></li>\n<li>Fixes ClassCastException when not IMolecule <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6f3e848f9d\">6f3e848f9d</a></li>\n<li>Upgraded to PMD 2.4.5 with many bug fixes, giving more accurate error reports <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f29a66b63a\">f29a66b63a</a></li>\n<li>Added missing dependency on cdk-diff, being used in one of the unit tests <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=0e287dd450\">0e287dd450</a></li>\n<li>Fixed methods names to match those in the test class <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=789a314a8e\">789a314a8e</a></li>\n<li>Fixed test method name to match the expected patters, fixing a coverage test fail <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=ac136190d0\">ac136190d0</a></li>\n<li>Removed duplicate code: MolecularFormulaTest now extends AbstractMolecularFormulaTest <a 
href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b8651c75c8\">b8651c75c8</a></li>\n<li>Fixed test method annotation to point to the right method <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=bb7d341577\">bb7d341577</a></li>\n<li>Added missing @TestMethod annotation <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f6f759b227\">f6f759b227</a></li>\n<li>Added modules that were missing from the PMD testing <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=073e5ec96b\">073e5ec96b</a></li>\n<li>Added modules that were missing from the doccheck testing <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=10dc19c09b\">10dc19c09b</a></li>\n<li>Patch for bug 2843445. Aims to fix generation of NaN coordinates by SDG <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=d1397fe99d\">d1397fe99d</a></li>\n<li>Fix the unit test to not give a 'input must support mark' exception on some platforms, by wrapping the InputStream in a BufferedInputStream. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6f6f41ede3\">6f6f41ede3</a></li>\n<li>Added missing dependencies <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=8759481c19\">8759481c19</a></li>\n<li>Added ioformats to modules to test <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=56289e2dbc\">56289e2dbc</a></li>\n<li>Use StringBuilder to aggregate the field data, which gives an huge performance boost for SD file where multiline field data is found. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=df35f02d32\">df35f02d32</a></li>\n<li>Use StringBuilder to aggregate the field data, which gives an huge performance boost for SD file where very much field data, like the ChEBI_complete.sdf <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=eac8266fe9\">eac8266fe9</a></li>\n<li>Factored out steps in reading the SD file data block <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=678e7ca206\">678e7ca206</a></li>\n<li>Bumped version, to make it clear this is not the 1.2.3 release <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=8c8166a1a2\">8c8166a1a2</a></li>\n<li>Fixed registering on the cdk.threadnonsage tag (closes #2796362) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=d451576275\">d451576275</a></li>\n<li>Removed obsolete pattern from old svnrev tag <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c8f5a727a3\">c8f5a727a3</a></li>\n<li>Fixed JavaDoc to remove traces of the old svnrev Tag <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1a70488b81\">1a70488b81</a></li>\n<li>Synchronized exception message with implementation (fixes #2844333) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c70b79cbec\">c70b79cbec</a></li>\n<li>The Pauling Electronegativity is copied in configure as well. I can't see why not copy everything we have. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3fd2b171e8\">3fd2b171e8</a></li>\n<li>Added bug annotation <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=38d0235bba\">38d0235bba</a></li>\n<li>test case for bug #2846213 <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f84c53b98a\">f84c53b98a</a></li>\n<li>Fixed perception of N.planar3 where N.sp2 was detected, by now taking into account the given hydrogen count. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=1714de2663\">1714de2663</a></li>\n<li>Fixed perception of benzene with all single bond, but hydrogen count 1 and bonds flagged aromatic. In this case, the type is C.sp2 not C.sp3. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=05e0be39a0\">05e0be39a0</a></li>\n<li>Added assertions to unit test for values being not null <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=863b0a5325\">863b0a5325</a></li>\n<li>Added two unit tests for the same problem: carbon atom types are not correctly perceived if bond order info is SINGLE only, and hydrogen count and aromaticity flag is set. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f19a451a72\">f19a451a72</a></li>\n<li>Moved class into a org.openscience.cdk package, which seems to work now. I'm puzzled why it did not before. Solved several unit test fails. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=b055c6b0b0\">b055c6b0b0</a></li>\n<li>Merge branch 'cdk-1.2.x' of ssh://egonw@cdk.git.sourceforge.net/gitroot/cdk into cdk-1.2.x <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f77db9c186\">f77db9c186</a></li>\n<li>Unsealed the XOM jar to allow having the CustomSerializer <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3b8234020c\">3b8234020c</a></li>\n<li>Fixed Javadocs error <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=e0304bf4bd\">e0304bf4bd</a></li>\n<li>Fixed a wrong javadoc tag. Also removed svn tag in the SMARTS parser JJT file, replaced with git tag <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=c8887734af\">c8887734af</a></li>\n<li>Added support for 'public enum's <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=4bf822d57b\">4bf822d57b</a></li>\n<li>corrected bug in bondtools.isStereo(IAtomContainer container, IAtom stereoAtom). A comparision of atom symbols in a nested loop was using the counter of the outer loop twice. Note it worked before, because there is a sort of fallback to Morgan numbers. 
fallback to morgan (fixes #2830287) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=025fb472b8\">025fb472b8</a></li>\n<li>added a new test for bondtools <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=13f72bd406\">13f72bd406</a></li>\n<li>Fixed inconsistency between accepts() and write: also support writing of IAtomContainerSet and IAtomContainer as accepts() indicates (fixes #2827745) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6380578865\">6380578865</a></li>\n<li>General test for testing consistency between write() and accepts(), testing that all accepted IChemObject's can also be written <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f0678eb65a\">f0678eb65a</a></li>\n<li>Added unit test for bug #2826961: inconsistent atom typing for two SMILES. Unit test does not show a fail, ruling out a CDK bug <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=42e45efcd9\">42e45efcd9</a></li>\n<li>Remove erroneous throws statement <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=f8cfea8bc3\">f8cfea8bc3</a></li>\n<li>Bug found calculating the exact mass given a molecular formula when it is negative charged. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3d1de45add\">3d1de45add</a></li>\n<li>Fixed reading of the cdk/dict/data/elements.owl database which is now in OWL <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=73225a083a\">73225a083a</a></li>\n<li>Fixed issue 2458210: use assertNotNull(foo) etc instead of assertTrue(foo != null). 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=182afe6670\">182afe6670</a></li>\n<li>Added minimum equivalents for BondManipulator.getMaximumBondOrder() methods <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=6e126962ea\">6e126962ea</a></li>\n<li>Fixes asserts: after removal *no* change should be recorded <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3b9fa30041\">3b9fa30041</a></li>\n<li>Added IO option to disable generator of XML declaration statements in the output CML. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=74451b8f0e\">74451b8f0e</a></li>\n<li>Added generics, and consistified code by always returning a List&lt;?&gt; of the same '?'. (And some 80 chars fixes in the JavaDocs.) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=d6337cd596\">d6337cd596</a></li>\n<li>Added unit tests to test that when a [Molecule|Reaction|Ring]Set has been removed from a ChemModel, the ChemModel should unregister as listener. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=63e6c014a1\">63e6c014a1</a></li>\n<li>Added unit tests for event propagation from [Molecule|Reaction|Ring]Sets to ChemModel. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=e01103543b\">e01103543b</a></li>\n<li>More testing of flags. <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=abb53842bf\">abb53842bf</a></li>\n<li>Fix for junior job id: [ 1837692 ] Test methods should throw only one Exception. 
<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=8c3853638e\">8c3853638e</a></li>\n<li>Fixed missing imports and wrapped to 80 chars <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=fd2d2df6ef\">fd2d2df6ef</a></li>\n<li>Better excpetion handling in builder3d: <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=bc5837d848\">bc5837d848</a></li>\n<li>Fixed serialization of IAtom's with null formal charge to not cause NullPointerExceptions <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=acc8012b6c\">acc8012b6c</a></li>\n<li>Added unit test for serialization of null formal charges into the MDL molfile format (which currently fails) <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=df57aea10f\">df57aea10f</a></li>\n<li>Updated Javadocs for SMARTS query tool to indicate unsupported features <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=e1da4c0689\">e1da4c0689</a></li>\n<li>Cleaned up source file to remove spurious line endings <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/cdk;a=commit;h=3d7adae977\">3d7adae977</a></li>\n</ul>\n\n<p>This overview was created with this Linux one-liner:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>git log <span class=\"nt\">--oneline</span> cdk-1.2.3.. | <span class=\"nb\">sed</span> <span class=\"s1\">'s/\\([a-f0-9]*\\)\\s\\(.*\\).*/&lt;li&gt;\\2 &lt;a href=\"http:\\/\\/cdk.git.sourceforge.net\\/git\\/gitweb.cgi?p=cdk\\/cdk;a=commit;h=\\1\"&gt;\\1&lt;\\/a&gt;&lt;\\/li&gt;/'</span>\n</code></pre></div></div>",
      "summary": "Here is the changelog of CDK 1.2.4 which I am about to upload to SourceForge:",
      
      "date_published": "2009-11-18T00:00:00+00:00",
      "date_modified": "2009-11-18T00:00:00+00:00",
      "tags": ["cdk","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nvdsb-ygz96",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/11/blueobelisk-stackexchange-com.html",
      "title": "BlueObelisk StackExchange (.com)",
      "content_html": "<p>Oh no, not another communication channel?! We already have <a href=\"http://chem-bla-ics.blogspot.com/2009/09/google-wave-robot-for-cdk-functionality.html\">Google</a>\n<a href=\"http://chem-bla-ics.blogspot.com/2009/10/google-wave-invite-but-you-need-to-work.html\">Wave</a>! (BTW, I have quite some new invites…)</p>\n\n<p>Well, you are right. But I could not resist: <a href=\"http://blueobelisk.stackexchange.com/\">blueobelisk.stackexchange.com</a>…</p>\n\n<p><img src=\"/assets/images/boExchangeScreeny.png\" alt=\"\" /></p>\n\n<p>No, it is not using an Open platform, but plenty of Windows and Max users among us… the data is\n<a href=\"http://creativecommons.org/choose/zero\">CC0</a>.</p>\n\n<p><strong>Update</strong>: any question about Open Data, Open Source, or Open Standards (ODOSOS) is welcome. As well as any question on if and how some chemical\nquestion could be answered with ODOSOS tools. It is <em>not</em> restricted to the Blue Obelisk or the projects under the wings of the Blue Obelisk.\nAll Open Data, Open Source, and Open Standards in chemistry is worth asking about.</p>",
      "summary": "Oh no, not another communication channel?! We already have Google Wave! (BTW, I have quite some new invites…)",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/boExchangeScreeny.png",
      "date_published": "2009-11-11T00:00:00+00:00",
      "date_modified": "2009-11-11T00:00:00+00:00",
      "tags": ["blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/49cn8-6n163",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/07/call-for-collaboration-javadoc.html",
      "title": "Call for Collaboration: JavaDoc validation with OpenJavaDocCheck",
      "content_html": "<p>I reported recently about my efforts to write an <a href=\"http://chem-bla-ics.blogspot.com/2009/10/work-in-progress-open-doccheck.html\">Open Source DocCheck replacement</a>.\nI received the first patches (from <a href=\"http://blog.rguha.net/\">Rajarshi</a>), and brought it online in a <a href=\"http://cdk.sf.net/\">CDK</a> branch (see\n<a href=\"http://pele.farmbio.uu.se/nightly-ojdcheck/ojdcheckSummary.html\">this Nightly page</a>).</p>\n\n<p><img src=\"/assets/images/ojdcheck.png\" alt=\"\" /></p>\n\n<p>This list shows a mix of tests that are now implemented in OpenJavaDocCheck itself, but the third line is actually a test that is plugged in and specific for the CDK. This is an important feature, I think, and allows users of OpenJavaDocCheck to add functionality is that is not interesting to the general public, but very interesting for the JavaDoc being analyzed. Well, at least, it is to our CDK project :)</p>\n\n<p>The current list of tests is still quite small, and consists of these tests:</p>\n\n<ul>\n  <li>test if each class and method has JavaDoc</li>\n  <li>test for missing @return tags</li>\n  <li>test for missing @param tags</li>\n  <li>test for @returns instead of @return</li>\n  <li>test @param template code, such as added by IDEs like Eclipse</li>\n  <li>test @exception template code, such as added by IDEs like Eclipse</li>\n  <li>test for redundant @version tags</li>\n</ul>\n\n<p>I am now <a href=\"http://github.com/egonw/ojdcheck/issues\">seeking feedback</a> on the current code base, and potentially collaboration with writing more\nJavaDoc validation tests. 
There is enough to do, and I have been thinking on tests for:</p>\n\n<ul>\n  <li>spell checking JavaDoc</li>\n  <li>checking for 404s of web pages linked with <code class=\"language-plaintext highlighter-rouge\">&lt;a href&gt;</code> in the JavaDoc</li>\n  <li>well-formedness of the HTML in the webpages</li>\n</ul>\n\n<p>And about:</p>\n\n<ul>\n  <li>a PMD-like system to allow people to choose which testing they want or not</li>\n  <li>an Eclipse plugin</li>\n</ul>",
      "summary": "I reported recently about my efforts to write an Open Source DocCheck replacement. I received the first patches (from Rajarshi), and brought it online in a CDK branch (see this Nightly page).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ojdcheck.png",
      "date_published": "2009-11-07T00:00:00+00:00",
      "date_modified": "2009-11-07T00:00:00+00:00",
      "tags": ["cdk","javadoc"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2ty9f-n0m97",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/04/new-bioclipse-features-kabsch-alignment.html",
      "title": "New Bioclipse Features: Kabsch Alignment, RMSD Distance and Tanimoto Simarlity Matrices",
      "content_html": "<p>We recently submitted a second paper on <a href=\"http://www.bioclipse.net/\">Bioclipse</a>, and have worked hard in the past two weeks on addressing the\nreviewers’ questions (and we love these feature requests! See also these <a href=\"http://bioclipse.blogspot.com/2009/11/download-pdbs-with-bioclipse.html\">two</a>\n<a href=\"http://bioclipse.blogspot.com/2009/11/align-sequences-with-kalign-web-service.html\">blogs</a>). One reviewer seemed very interested in seeing\ndocking available in Bioclipse. While we do not have a full docking feature set up for Bioclipse, we do have functionality to deal with 3D\nstructures, though our researched urged us to focus on the 2D side of cheminformatics so far.</p>\n\n<p>To strengthen our intentions towards the 3D cheminformatics world, we have implemented a few new features, using <a href=\"http://cdk.sf.net/\">CDK</a>\nfunctionality. For example, we added Kabsch aligment and the related RMSD between molecular structures implemented as both popup menus\nas well as manager methods. 
The manager method you can see in action in <a href=\"http://www.myexperiment.org/workflows/937\">MyExperiment workflow 937</a>,\nwhich you can download directly into Bioclipse with one simple command (see\n<a href=\"http://chem-bla-ics.blogspot.com/2009/11/bioclipse-manager-for-myexperimentorg.html\">Bioclipse Manager for MyExperiment.org</a>):</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">smileses</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Array</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">CC(C)C</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">CCCN</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">CCC=O</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n\n<span class=\"kd\">var</span> <span class=\"nx\">unaligned</span> <span class=\"o\">=</span> <span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">createMoleculeList</span><span class=\"p\">();</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"nx\">smileses</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"nx\">mol</span> <span class=\"o\">=</span> <span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">fromSMILES</span><span class=\"p\">(</span><span class=\"nx\">smileses</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">]);</span>\n  <span class=\"nx\">mol</span> <span 
class=\"o\">=</span> <span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">generate3dCoordinates</span><span class=\"p\">(</span><span class=\"nx\">mol</span><span class=\"p\">)</span>\n  <span class=\"nx\">unaligned</span><span class=\"p\">.</span><span class=\"nf\">add</span><span class=\"p\">(</span><span class=\"nx\">mol</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n\n<span class=\"kd\">var</span> <span class=\"nx\">aligned</span> <span class=\"o\">=</span> <span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">kabsch</span><span class=\"p\">(</span><span class=\"nx\">unaligned</span><span class=\"p\">)</span>\n\n<span class=\"nx\">jmol</span><span class=\"p\">.</span><span class=\"nf\">load</span><span class=\"p\">(</span><span class=\"nx\">aligned</span><span class=\"p\">.</span><span class=\"nf\">get</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">));</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"nx\">aligned</span><span class=\"p\">.</span><span class=\"nf\">size</span><span class=\"p\">();</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"nx\">jmol</span><span class=\"p\">.</span><span class=\"nf\">append</span><span class=\"p\">(</span><span class=\"nx\">aligned</span><span class=\"p\">.</span><span class=\"nf\">get</span><span class=\"p\">(</span><span class=\"nx\">i</span><span class=\"p\">));</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>Now, we do have to update the use of Jmol in Bioclipse, and a big overhaul is scheduled for the 2.4 release in February next year. But you\nget the idea.</p>\n\n<p>As said, there are two stories to adding this new functionality. 
The first is that we want all GUI interaction the user performs to be recordable\n(Scientist 1: <em>What did you do to get those nice results?</em> Scientist 2: <em>I pushed that button in that long menu</em>. Scientist 1:\n<em>What button is that?</em> Scientist 2: <em>Wait, I will send you the BSL script with a Google Wave.</em>)</p>\n\n<p>The managers that allow this recording are Bioclipse specific, and also the reason why it would not be trivial to make a general Bioclipse\nplugin for Eclipse… some Spring magic is used to inject the managers into the JavaScript language. Anyway, the second thing is to add\na GUI element, like popup menus. This is a particular area where Eclipse excels. I did have to ask for the details, as I am\nnot using this daily (I’m doing science, not IT), but Ola was kind enough to give me the pointers for it.</p>\n\n<p>The configuration snippet below links the popup action to Bioclipse Navigator content (you know, where your MDL SD, CML, script and other\nfiles show up in Bioclipse). <strong><em>But</em></strong> only if I have selected 3 or more files! And only if those files are actually some molecular\ncontent with 3D coordinates! 
And Bioclipse inherits this functionality by using the Eclipse platform.</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;menuContribution</span>\n  <span class=\"na\">locationURI=</span><span class=\"s\">\"popup:org.eclipse.ui.popup.any?after=additions\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;command</span>\n    <span class=\"na\">commandId=</span><span class=\"s\">\"net.bioclipse.cdk.ui.handlers.kabschAlignment\"</span>\n    <span class=\"na\">label=</span><span class=\"s\">\"Perform Kabsch Alignment\"</span>\n    <span class=\"na\">icon=</span><span class=\"s\">\"icons/molecule2D.png\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;visibleWhen&gt;</span>\n      <span class=\"nt\">&lt;with</span> <span class=\"na\">variable=</span><span class=\"s\">\"selection\"</span><span class=\"nt\">&gt;</span>\n        <span class=\"nt\">&lt;count</span> <span class=\"na\">value=</span><span class=\"s\">\"(2-\"</span><span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;iterate</span> <span class=\"na\">operator=</span><span class=\"s\">\"and\"</span> <span class=\"na\">ifEmpty=</span><span class=\"s\">\"false\"</span><span class=\"nt\">&gt;</span>\n          <span class=\"nt\">&lt;adapt</span> <span class=\"na\">type=</span><span class=\"s\">\"org.eclipse.core.resources.IResource\"</span><span class=\"nt\">&gt;</span>\n            <span class=\"nt\">&lt;or&gt;</span>\n              <span class=\"nt\">&lt;test</span> <span class=\"na\">property=</span><span class=\"s\">\"org.eclipse.core.resources.contentTypeId\"</span>\n                       <span class=\"na\">value=</span><span class=\"s\">\"net.bioclipse.contenttypes.cml.singleMolecule3d\"</span><span class=\"nt\">/&gt;</span>\n              <span class=\"nt\">&lt;test</span> <span class=\"na\">property=</span><span class=\"s\">\"org.eclipse.core.resources.contentTypeId\"</span>\n         
              <span class=\"na\">value=</span><span class=\"s\">\"net.bioclipse.contenttypes.cml.singleMolecule5d\"</span><span class=\"nt\">/&gt;</span>\n              <span class=\"nt\">&lt;test</span> <span class=\"na\">property=</span><span class=\"s\">\"org.eclipse.core.resources.contentTypeId\"</span>\n                       <span class=\"na\">value=</span><span class=\"s\">\"net.bioclipse.contenttypes.mdlMolFile3D\"</span><span class=\"nt\">/&gt;</span>\n            <span class=\"nt\">&lt;/or&gt;</span>\n          <span class=\"nt\">&lt;/adapt&gt;</span>\n        <span class=\"nt\">&lt;/iterate&gt;</span>\n      <span class=\"nt\">&lt;/with&gt;</span>\n    <span class=\"nt\">&lt;/visibleWhen&gt;</span>\n  <span class=\"nt\">&lt;/command&gt;</span>\n<span class=\"nt\">&lt;/menuContribution&gt;</span>\n</code></pre></div></div>\n\n<p>When Bioclipse is run, this looks like:</p>\n\n<p><img src=\"/assets/images/kabsch.png\" alt=\"\" /></p>\n\n<p>And the alignment results will nicely show up in a Jmol viewer (while it is implemented as an Eclipse editor, it is not yet):</p>\n\n<p><img src=\"/assets/images/bioclipseKabsch1.png\" alt=\"\" /></p>\n\n<p>The first screenshot also shows the new pop-up menus for calculating two matrices for 3 or more molecules. One is based on the\n<a href=\"http://en.wikipedia.org/wiki/Root_mean_square_deviation\">RMSD</a> of the 3D coordinates of the atoms in the\n<a href=\"http://blog.rguha.net/?p=113\">MCSS</a> and will create a distance matrix (BTW, Asad’s SMSD work is making its way into the CDK library, and will be available in a\nlater Bioclipse version too). The second new pop-up menu uses the Tanimoto similarity\nmeasure based on CDK fingerprints of the selected chemical graphs. 
If the Bioclipse Statistics feature is installed, the\ncreated <a href=\"http://en.wikipedia.org/wiki/Comma-separated_values\">CSV</a> files will open up in a matrix editor:</p>\n\n<p><img src=\"/assets/images/rmsdMatrix.png\" alt=\"\" /></p>\n\n<p>Kabsch alignment of protein backbones is planned for a later Bioclipse release, and is an important feature for\n<a href=\"http://www.ncbi.nlm.nih.gov/sites/entrez?term=proteochemometrics%20wikberg\">our group’s proteochemometrics work</a>.</p>",
      "summary": "We recently submitted a second paper on Bioclipse, and have worked hard in the past two weeks on addressing the reviewers’ questions (and we love these feature requests! See also these two blogs). One reviewer seemed very interested in seeing docking available in Bioclipse. While we do not have a full docking feature set up for Bioclipse, we do have functionality to deal with 3D structures, though our researched urged us to focus on the 2D side of cheminformatics so far.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseKabsch1.png",
      "date_published": "2009-11-04T01:00:00+00:00",
      "date_modified": "2009-11-04T01:00:00+00:00",
      "tags": ["cheminf","cdk","bioclipse","jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4snrw-p4w70",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/04/milestones.html",
      "title": "Milestones...",
      "content_html": "<p>While I am still looking around for a assisting/associate professor position, there are two milestones around my scientific work I want to briefly\nmention here. This blog is the <strong>500th</strong> blog on <a href=\"http://chem-bla-ics.blogspot.com/\">chem-bla-ics</a>, and the two <a href=\"http://cdk.sf.net/\">CDK</a> papers\nhave combined reached 100+ citations as counted by Web-of-Science, as can be seen on <a href=\"http://www.researcherid.com/rid/C-6136-2008\">my ResearcherID profile</a>.</p>",
      "summary": "While I am still looking around for a assisting/associate professor position, there are two milestones around my scientific work I want to briefly mention here. This blog is the 500th blog on chem-bla-ics, and the two CDK papers have combined reached 100+ citations as counted by Web-of-Science, as can be seen on my ResearcherID profile.",
      
      "date_published": "2009-11-04T00:10:00+00:00",
      "date_modified": "2009-11-04T00:10:00+00:00",
      "tags": ["blog","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c7qy9-qdj95",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/04/bioclipse-manager-for-myexperimentorg.html",
      "title": "Bioclipse Manager for MyExperiment.org",
      "content_html": "<p>Some time ago I wrote about using <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/21/bioclipse-and-sparql-end-points-2.html\">Bioclipse to query to MyExperiment.org SPARQL end point <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nI think I had not mentioned that I have also written a manager to download <a href=\"http://www.myexperiment.org/\">MyExperiment</a>\n<a href=\"http://wiki.bioclipse.net/index.php?title=A_Meta_Language_for_Bioclipse (BSL)\">Bioclipse Scripting Language</a> scripts (though\nthere are no GUI elements yet):</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">&gt;</span> <span class=\"nx\">myexperiment</span><span class=\"p\">.</span><span class=\"nf\">search</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">RDF</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n<span class=\"p\">[</span><span class=\"mi\">921</span><span class=\"p\">,</span> <span class=\"mi\">928</span><span class=\"p\">,</span> <span class=\"mi\">889</span><span class=\"p\">]</span>\n\n<span class=\"o\">&gt;</span> <span class=\"nx\">myexperiment</span><span class=\"p\">.</span><span class=\"nf\">search</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">Kabsch</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n<span class=\"p\">[</span><span class=\"mi\">937</span><span class=\"p\">]</span>\n</code></pre></div></div>\n\n<p>The returned lists give the workflow numbers for matching BSL scripts, which you can then simply download with:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">&gt;</span> <span class=\"kd\">var</span> <span class=\"nx\">file</span> <span class=\"o\">=</span> <span class=\"nx\">myexperiment</span><span class=\"p\">.</span><span class=\"nf\">downloadWorkflow</span><span 
class=\"p\">(</span><span class=\"mi\">937</span><span class=\"p\">)</span>\n<span class=\"nx\">ui</span><span class=\"p\">.</span><span class=\"nf\">open</span><span class=\"p\">(</span><span class=\"nx\">file</span><span class=\"p\">)</span>\n</code></pre></div></div>",
      "summary": "Some time ago I wrote about using Bioclipse to query to MyExperiment.org SPARQL end point . I think I had not mentioned that I have also written a manager to download MyExperiment Bioclipse Scripting Language scripts (though there are no GUI elements yet):",
      
      "date_published": "2009-11-04T00:00:00+00:00",
      "date_modified": "2026-03-19T00:00:00+00:00",
      "tags": ["bioclipse","rdf","sparql","myexperiment"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7862g-njh70",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/21/maintaining-patches-is-fixing-patches.html",
      "title": "Maintaining patches is fixing patches",
      "content_html": "<p>Today I had a question about having to fix patches against upstream changes because those patches were not included upstream yet is <em>not very productive</em>.</p>\n\n<p>However, it is a prominent part of maintaining a code base. In the past 9 year, I <em>and many others</em> have been reworking a lot of <a href=\"http://cdk.sf.net/\">CDK</a>\ncode because of API changes <strong>and</strong> bug fixes in deeper parts of the CDK library. At least half of the work I have done for the CDK is doing this kind of\nfixing of downstream code. This is <em>never</em> trivial, and it is never productive. Well, depends somewhat on your definition of productivity.</p>\n\n<p>Whether productive or not, it is just something that needs to happen. Additionally, it is <strong>not</strong> something you can prevent. I guess one can call this a\n<em>fact of life</em>. Doesn’t make it nice work. Not at all. And most of my frustration with the CDK library is the lack of documentation and unit testing,\nwhich makes such fixing of downstream code hard. This means that the person best suited to do this job, is the one who wrote the patch in the first place.\nThe person who made the comment I mentioned earlier is seeing this from very up close now.</p>\n\n<h3 id=\"code-quality\">Code Quality</h3>\n\n<p>I very much understand his feeling of being unproductive when updating patches; been there, done that. He (that I can disclose) is absolutely right. With\nall the quality assurance functionality I have set up in the past for the CDK, nicely integrated in <a href=\"http://blog.rguha.net/\">Rajarshi</a>’s\n<a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk/nightly;a=summary\">Nightly</a> script, I hope to make it easier\nfor people to write proper maintainable patches. Often these reports are, however, again about doing tasks which make you feel unproductive. 
But I can\nassure you that writing such tools quality assurance tools, like the <a href=\"http://chem-bla-ics.blogspot.com/2009/10/work-in-progress-open-doccheck.html\">OpenJavaDocCheck</a>\nI worked on this weekend, makes you feel even less productive.</p>\n\n<h3 id=\"redesign\">Redesign</h3>\n\n<p>Sometimes making a library better maintainable, includes reworking the design. Almost always this take serious effort, and potentially introduce new\nbugs. At the same time, it always fixes a lot of older bugs and at the same time, of redesigned properly, makes it much easier to fix other bugs and\nallow more functionality to be implemented.</p>\n\n<p>But again, this requires rewriting of downstream patches too. And the one doing the redesign will always get comments about this requiring to make\nunproductive code updates downstream. I have seen this on several occasions in the CDK, such as my rewrite of the atom typing functionality in the\nCDK. (And don’t get any KDE4 developer started on that topic ;) Another <em>fact of life</em>, I guess.</p>",
      "summary": "Today I had a question about having to fix patches against upstream changes because those patches were not included upstream yet is not very productive.",
      
      "date_published": "2009-10-21T00:00:00+00:00",
      "date_modified": "2009-10-21T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j81e1-a3x41",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/20/crossref-writes-up-rss-usage.html",
      "title": "CrossRef writes up RSS usage recommendations",
      "content_html": "<p><a href=\"http://www.crossref.org/CrossTech/2009/10/recommendations_on_rss_feeds_f.html\">CrossTech announced</a> that a <a href=\"http://www.crossref.org/\">CrossRef</a>\nworking group has written a <a href=\"http://oxford.crossref.org/best_practice/rss/\">best practices</a> for the use of RSS feeds by publishers. Nice introduction\nfor anyone who is creating RSS feeds. Only comment I could make, is the lack of other modules. For example, a Chemistry module has been proposed by\nus 5 years ago already (DOI:<a href=\"http://dx.doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>) and about which I blogged on\n<a href=\"http://chem-bla-ics.blogspot.com/search?q=CMLRSS\">several occasions</a>.</p>\n\n<p>Below is the <a href=\"http://cb.openmolecules.net/atom.php?category=&amp;type=latest_inchis\">CMLRSS feed</a> of <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a>.</p>\n\n<p><img src=\"/assets/images/cmlrss_Cb2.png\" alt=\"\" /></p>\n\n<p>Of course, publishers can take advantage of such modules, using the <a href=\"http://www.w3.org/TR/xml-names/\">XML Namespaces</a> technology. The <em>best practices</em>\nuses that for a <a href=\"http://dublincore.org/\">Dublin Core</a> and a <a href=\"http://purl.org/rss/1.0/modules/prism/\">PRISM</a> extension. The here discussed CML\nextension is another one, but the point is, that you can basically plug in any module.</p>",
      "summary": "CrossTech announced that a CrossRef working group has written a best practices for the use of RSS feeds by publishers. Nice introduction for anyone who is creating RSS feeds. Only comment I could make, is the lack of other modules. For example, a Chemistry module has been proposed by us 5 years ago already (DOI:10.1021/ci034244p) and about which I blogged on several occasions.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cmlrss_Cb2.png",
      "date_published": "2009-10-20T00:00:00+00:00",
      "date_modified": "2009-10-20T00:00:00+00:00",
      "tags": ["cml","rss","xml"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qfnkc-xtx89",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/17/work-in-progress-open-doccheck.html",
      "title": "Work in Progress: an Open DocCheck replacement",
      "content_html": "<p>While it is still very much in progress, I have already made more progress than I had hoped for. The <a href=\"http://java.sun.com/j2se/1.5.0/docs/guide/javadoc/doclet/spec/index.html\">JavaDoc Doclet API</a>\nis actually not too difficult to use, though my use will very likely improve more later. The <a href=\"http://cdk.sf.net/\">CDK</a> has been using\n<a href=\"http://java.sun.com/j2se/javadoc/doccheck/\">Sun’s DocCheck</a> utility for testing the library’s JavaDoc quality, but the reports never\nreally satisfied me. Moreover, the most recent version is ancient and because it is <em>closed source</em>, no one can continue on those efforts.\nDocCheck is <a href=\"http://en.wikipedia.org/wiki/Missing_In_Action\">MIA</a>.</p>\n\n<p>Instead, <a href=\"http://pmd.sf.net/\">PMD</a> is given nice overviews of what it believes to be wrong with the CDK, and also provides a decent XML\nformat which allows extraction of information, which is used by, for example, <a href=\"http://pele.farmbio.uu.se/supernightly/\">SuperNightly</a> as\nshowed yesterday in <a href=\"http://chem-bla-ics.blogspot.com/2009/10/pmd-245-installed-in-cdk-12x-branch.html\">PMD 2.4.5 installed in the CDK 1.2.x branch</a>.</p>\n\n<p>I have been pondering about it for a long time now, but writing a JavaDoc checking library is hardly core cheminformatics research;\nat least, you would not get funding for it, despite everyone always complaining about good documentation. <em>Alas</em>.</p>\n\n<p>Last week, I was reviewing some more code, and again saw the very common error of the missing period at the end of the first sentence\nin JavaDoc. This one is sort of important for proper JavaDoc documentation generation, but the complexity of the current DocCheck\nreporting, people are not familiar enough with it. 
Being tired of having to repeat myself, I decided to address the problen, but\ncreating better <a href=\"http://pele.farmbio.uu.se/nightly/javadoc/data/\">Nightly error reporting for the CDK JavaDoc</a>.</p>\n\n<p>So, I started <a href=\"http://github.com/egonw/ojdcheck\">OpenJavaDocCheck</a>, or <em>ojdcheck</em>. As mentioned, I have made quite promising progress,\nand the current version provides the ability to write custom tests (which I plan to use for validating content of\n<a href=\"http://cdk.sourceforge.net/guides/devel/ch01.html\">CDK taglet</a> content), and create XML as well as XHTML which can be saved to any file.\nTo give you a glimps of where things are going, here’s a screenshot of the current XHTML output:</p>\n\n<p><img src=\"/assets/images/ojdcheckXHTML.png\" alt=\"\" /></p>\n\n<p>The current list of tests is really small, and consists of a single test:</p>\n\n<ul>\n  <li>test if each class and method has JavaDoc</li>\n</ul>",
      "summary": "While it is still very much in progress, I have already made more progress than I had hoped for. The JavaDoc Doclet API is actually not too difficult to use, though my use will very likely improve more later. The CDK has been using Sun’s DocCheck utility for testing the library’s JavaDoc quality, but the reports never really satisfied me. Moreover, the most recent version is ancient and because it is closed source, no one can continue on those efforts. DocCheck is MIA.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ojdcheckXHTML.png",
      "date_published": "2009-10-17T00:00:00+00:00",
      "date_modified": "2009-10-17T00:00:00+00:00",
      "tags": ["cdk","javadoc","java","xml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xewvg-pwc38",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/16/pmd-245-installed-in-cdk-12x-branch.html",
      "title": "PMD 2.4.5 installed in the CDK 1.2.x branch",
      "content_html": "<p>Today I installed <a href=\"http://pmd.sf.net/\">PMD</a> 4.2.5 in the <a href=\"http://github.com/egonw/cdk/tree/cdk-1.2.x\">CDK 1.2.x branch</a> which contains\nmostly <a href=\"http://sourceforge.net/project/shownotes.php?release_id=659603&amp;group_id=56262\">bug fixes</a> compared to the 4.2.2 version we had\nearlier. Several of these include false positives: warnings which were not really problems, but tests going bad.</p>\n\n<p><img src=\"/assets/images/supernightly2.png\" alt=\"\" /></p>\n\n<p>The number of these false positives seems to be significant as the number of PMD violations for the CDK 1.2.x branch seems to have\ndropped about 1500! warnings :)</p>",
      "summary": "Today I installed PMD 4.2.5 in the CDK 1.2.x branch which contains mostly bug fixes compared to the 4.2.2 version we had earlier. Several of these include false positives: warnings which were not really problems, but tests going bad.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/supernightly2.png",
      "date_published": "2009-10-16T00:00:00+00:00",
      "date_modified": "2009-10-16T00:00:00+00:00",
      "tags": ["cdk","pmd"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g7yhg-frs33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/15/sparql-end-points-jena-and-bifcontains.html",
      "title": "SPARQL end points, Jena and bif:contains",
      "content_html": "<p>I have been having fun with SPARQL in <a href=\"http://www.bioclipse.net/\">Bioclipse</a> for a while now, and blogged at several occasions:</p>\n\n<ul>\n  <li><a href=\"\">NMRShiftDB enters rdf.openmolecules.net #2: SPARQL end point with Virtuoso</a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/21/bioclipse-and-sparql-end-points-2.html\">Bioclipse and SPARQL end points #2: MyExperiment <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points.html\">Bioclipse and SPARQL end points <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<p>One thing I had not been able to work out, is that <a href=\"http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/\">Virtuoso</a> uses a\n(rather nice) <em>bif:contains</em> extension that support indexing. However, <a href=\"http://jena.sourceforge.net/\">Jena</a> would complain with:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>com.hp.hpl.jena.query.QueryParseException: Line 1, column 31: Unresolved\nprefixed name: bif:contains\n</code></pre></div></div>\n\n<p>Defining the prefix did not solve the problem either, but <a href=\"http://www.linkedin.com/in/ivanmikhailov\">Ivan Mikhailov</a> just\nreplied to my post to the <a href=\"https://sourceforge.net/mailarchive/forum.php?forum_name=virtuoso-users\">virtuoso-user</a> mailing\nlist providing the solution.</p>\n\n<p>The solution is in the fact that <code class=\"language-plaintext highlighter-rouge\">bif:</code> is in its own namespace, which makes it possible to replace <code class=\"language-plaintext highlighter-rouge\">bif:contains</code> by its\nfull reference <code class=\"language-plaintext highlighter-rouge\">&lt;bif:contains&gt;</code>. 
I directly gave that a try in Bioclipse, and just succesfull ran this Bioclipse\nscript snippet:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">rdf</span><span class=\"p\">.</span><span class=\"nf\">sparqlRemote</span><span class=\"p\">(</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">http://bio2rdf.org/sparql</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">SELECT * WHERE {?s ?p ?o . ?o &lt;bif:contains&gt; </span><span class=\"se\">\\\"</span><span class=\"s2\">aspirin</span><span class=\"se\">\\\"</span><span class=\"s2\"> .};</span><span class=\"dl\">\"</span>\n<span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>Thanx, Ivan!</p>",
      "summary": "I have been having fun with SPARQL in Bioclipse for a while now, and blogged at several occasions:",
      
      "date_published": "2009-10-15T00:00:00+00:00",
      "date_modified": "2026-03-19T00:00:00+00:00",
      "tags": ["sparql","rdf","bioclipse","bio2rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rnjv0-tj350",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/09/nmrshiftdb-rdf-3-bio2rdf.html",
      "title": "NMRShiftDB RDF #3: Bio2RDF",
      "content_html": "<p>My might have seen my efforts to convert the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> data into RDF:</p>\n\n<ul>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/09/nmrshiftdb-rdf-2-some-statistics.html\">NMRShiftDB RDF #2: Some statistics</a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/09/nmrshiftdb-rdf-1-spectra-by-inchi.html\">NMRShiftDB RDF #1: Spectra by InChI </a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/09/04/nmrshiftdb-enters-rdfopenmoleculesnet-2.html\">NMRShiftDB enters rdf.openmolecules.net #2: SPARQL end point with Virtuoso <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<p><a href=\"http://bio2rdf.blogspot.com/\">Peter Ansell</a> has shortly after that copied the data into <a href=\"http://bio2rdf.org/\">Bio2RDF</a>,\nbut I had not blogged about that yet. So, here goes. If you have not looked at Bio2RDF yet, this is a good time to do that.\nThe structure of the exposed triples is not perfect, and I just realized I made a beginners mistake, to use a domain name\nin a namespace I have not control over (bad me). The Virtuoso6 faceted browser allows you to navigate the data in Bio2RDF\nby molecule (e.g. <a href=\"http://cu.bio2rdf.org/page/nmrshiftdb_molecule:234\">molecule 234</a>):</p>\n\n<p><img src=\"/assets/images/nmrRDF1.png\" alt=\"\" /></p>\n\n<p>And by spectrum too (e.g. <a href=\"http://cu.bio2rdf.org/page/nmrshiftdb_spectrum:4735\">spectrum 4735</a>):</p>\n\n<p><img src=\"/assets/images/nmrRDF2.png\" alt=\"\" /></p>",
      "summary": "My might have seen my efforts to convert the NMRShiftDB data into RDF:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nmrRDF1.png",
      "date_published": "2009-10-09T00:00:00+00:00",
      "date_modified": "2026-03-19T00:00:00+00:00",
      "tags": ["nmrshiftdb","rdf","bio2rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pdckn-qjy95",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/08/where-are-cdk-131-and-124-releases.html",
      "title": "Where are the CDK 1.3.1 and 1.2.4 releases ?!?",
      "content_html": "<p>You might be wondering what is keeping the <a href=\"http://cdk.sf.net/\">CDK</a> 1.3.1 and 1.2.4 releases. And right you are. When we look at\n<a href=\"http://pele.farmbio.uu.se/supernightly/\">Supernightly</a>, we get a clue (BTW, I hope the <a href=\"http://www.mail-archive.com/cdk-user@lists.sourceforge.net/msg01173.html\">EBI</a>\nnodes will join soon too):</p>\n\n<p><img src=\"/assets/images/supernightly1.png\" alt=\"\" /></p>\n\n<p>Studying this table shows the reasons: there are too many regressions, too many failing unit tests. For example, 1.2.4 (while not yet released,\ncalled 1.2.3.git) has 50 new failing tests. Now, fair enough, this is <a href=\"http://pele.farmbio.uu.se/nightly-1.2.x/test/result-ioformats.html\">mostly because</a>\nof ioformats not being tested in 1.2.3 <strong><em>and</em></strong> most of the fails caused by a bug in the test, not in the code. But that still leaves 20\nother failing tests. Mostly related to known bugs, and for some problems patches are actually available.</p>\n\n<p>These last 22 we also see in the differences between 1.3.0 and 1.3.1 (while not yet released, called 1.3.0.git).\nThat’s because the ioformats modules is not tested in that branch either, pending a new merge with the cdk-1.2.x\nbranch.</p>",
      "summary": "You might be wondering what is keeping the CDK 1.3.1 and 1.2.4 releases. And right you are. When we look at Supernightly, we get a clue (BTW, I hope the EBI nodes will join soon too):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/supernightly1.png",
      "date_published": "2009-10-08T00:00:00+00:00",
      "date_modified": "2009-10-08T00:00:00+00:00",
      "tags": ["cdk","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g0cry-vhk91",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/07/vrse-funded-research-to-be-os-as-of.html",
      "title": "VR.se funded research to be OA as of 2010",
      "content_html": "<p>Happy news from the Swedish <a href=\"http://www.vr.se/\">Vetenskapsradet</a> (via <a href=\"http://scienceblogs.com/clock/2009/10/the_swedish_research_council_m.php\">Coturnix</a>):\nas of next 2010 all peer reviewed journal papers <a href=\"http://www.vr.se/franvetenskapsradet/nyheter/nyhetsarkiv/nyheter2009/nyheter2009/vetenskapsradetkraverfritillgangtillforskningsresultat.5.227c330c123c73dc586800011519.html\">must be Open Access</a>.\nI am not yet VR funded, but involved in a few VR grant applications. Not that that really matters, as I am happily\n<a href=\"http://www.google.se/search?hl=sv&amp;q=site:biomedcentral.com+willighagen&amp;btnG=S%C3%B6k&amp;meta=\">publishing OA already</a>.</p>",
      "summary": "Happy news from the Swedish Vetenskapsradet (via Coturnix): as of next 2010 all peer reviewed journal papers must be Open Access. I am not yet VR funded, but involved in a few VR grant applications. Not that that really matters, as I am happily publishing OA already.",
      
      "date_published": "2009-10-07T00:10:00+00:00",
      "date_modified": "2009-10-07T00:10:00+00:00",
      "tags": ["openaccess"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j7qvv-pnd49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/07/keeping-my-bioclipse-repositories-in.html",
      "title": "Keeping my Bioclipse repositories in sync with upstream",
      "content_html": "<p><a href=\"http://www.bioclipse.net/\">Bioclipse</a> is now split up over several <a href=\"http://en.wikipedia.org/wiki/Git_(software)\">Git</a> repositories (and some\nadditional stuff in even more repositories). This has all to do with each repository now having one person acting as point-of-access. This\nmeans that I have several repositories checked out, which I need to keep synchronized. Now, I am pretty sure there are many solutions (and\nsuggestions very welcome!), but this is the <a href=\"http://en.wikipedia.org/wiki/Bash\">Bash</a> script I have just written to give me an overview of\nthe state of my repositories, hoping it may be useful to others too:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c\">#!/bin/bash</span>\n\n<span class=\"nv\">PLUGINS</span><span class=\"o\">=</span><span class=\"sb\">`</span><span class=\"nb\">ls</span> <span class=\"nt\">-1</span><span class=\"sb\">`</span>\n\n<span class=\"k\">for </span>PLUGIN <span class=\"k\">in</span> <span class=\"nv\">$PLUGINS</span>\n<span class=\"k\">do\n        </span><span class=\"nb\">echo</span> <span class=\"s2\">\"************************************************************* </span><span class=\"nv\">$PLUGIN</span><span class=\"s2\">\"</span>\n        <span class=\"nb\">cd</span> <span class=\"nv\">$PLUGIN</span><span class=\"p\">;</span> git fetch origin<span class=\"p\">;</span> git status<span class=\"p\">;</span> <span class=\"nb\">cd</span> ..\n<span class=\"k\">done</span>\n</code></pre></div></div>",
      "summary": "Bioclipse is now split up over several Git repositories (and some additional stuff in even more repositories). This has all to do with each repository now having one person acting as point-of-access. This means that I have several repositories checked out, which I need to keep synchronized. Now, I am pretty sure there are many solutions (and suggestions very welcome!), but this is the Bash script I have just written to give me an overview of the state of my repositories, hoping it may be useful to others too:",
      
      "date_published": "2009-10-07T00:00:00+00:00",
      "date_modified": "2009-10-07T00:00:00+00:00",
      "tags": ["bioclipse","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r6xmq-16f84",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/06/cdk-molecules-in-rdf.html",
      "title": "CDK Molecules in RDF",
      "content_html": "<p>Yesterday, I finally got around to starting a <a href=\"http://github.com/egonw/cdk/tree/73-rdf\">branch</a> on adding RDF support to the\n<a href=\"http://cdk.sf.net/\">CDK</a>; in particular, write the CDK data model ontology in <a href=\"http://en.wikipedia.org/wiki/Web_Ontology_Language\">OWL</a>\nand serialization to and from RDF using the ontology. The framework is now set up, but I have yet to formalize all bits and pieces\nof the CDK data model in classes and properties. Just as a preview, here is what a very basic bit of CDK model in RDF looks like\n(<a href=\"http://en.wikipedia.org/wiki/Notation3\">N3 format</a>):</p>\n\n<pre><code class=\"language-notation3\">@prefix cdk:     &lt;http://cdk.sourceforge.net/model.owl#&gt; .\n\n&lt;http://cdk.sf.net/model/atom/1&gt;\n      a       cdk:Atom ;\n      cdk:symbol \"C\" .\n\n&lt;http://cdk.sf.net/model/molecule/1&gt;\n      a       cdk:Molecule ;\n      cdk:hasAtom  .\n</code></pre>\n\n<p>Still rather verbose, but very flexible. I have even been thinking of an XHTML+RDFa writer…</p>",
      "summary": "Yesterday, I finally got around to starting a branch on adding RDF support to the CDK; in particular, write the CDK data model ontology in OWL and serialization to and from RDF using the ontology. The framework is now set up, but I have yet to formalize all bits and pieces of the CDK data model in classes and properties. Just as a preview, here is what a very basic bit of CDK model in RDF looks like (N3 format):",
      
      "date_published": "2009-10-06T00:00:00+00:00",
      "date_modified": "2009-10-06T00:00:00+00:00",
      "tags": ["cdk","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/st1rb-pxa93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/01/google-wave-invite-but-you-need-to-work.html",
      "title": "Google Wave Invite: but you need to work on the CDK and the CDKitty robot",
      "content_html": "<p>I just posted to below email to the cdk-user mailing list. Next Monday, I’ll decide.</p>\n\n<blockquote>\n  <p>Hi all,</p>\n\n  <p>unless you have not read any news in the last two days, you will have\nseen that Google is rolling out a second batch of <a href=\"http://en.wikipedia.org/wiki/Google_Wave\">Google Wave</a>\naccounts… I have one invite for someone who wants to co-develop the\nCDKitty robot, which adds <a href=\"http://cdk.sf.net/\">CDK</a>-based functionality to Google Wave…</p>\n\n  <p>The code is at: <a href=\"https://github.com/egonw/cdkitty\">https://github.com/egonw/cdkitty</a></p>\n\n  <p>If you are interested in the account, please email me offline with:</p>\n\n  <ul>\n    <li>how you think you can contribute to the robot</li>\n    <li>why you want to do that</li>\n    <li>how much time you will have for it</li>\n  </ul>\n\n  <p>The position is open to anyway, and consider your email an application\nto the position :) (and, if you are a student, we could even try to\narrange <a href=\"http://www.uu.se/\">Uppsala University</a> credit points, if you can work 20 weeks\nfull time on it).</p>\n\n  <p>Egon\n```</p>\n</blockquote>\n\n<p>BTW, existing Google Wave users can invite the robot by adding <em>chemdevelkit@appspot.com</em>.</p>",
      "summary": "I just posted to below email to the cdk-user mailing list. Next Monday, I’ll decide.",
      
      "date_published": "2009-10-01T00:10:00+00:00",
      "date_modified": "2009-10-01T00:10:00+00:00",
      "tags": ["google","wave","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2zprm-hs481",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/10/01/processing-chebi-mdl-sd-file-with-cdk.html",
      "title": "Processing the ChEBI MDL SD file with the CDK",
      "content_html": "<p><a href=\"http://www.bioclipse.net/\">Bioclipse</a> has a <a href=\"http://pele.farmbio.uu.se/cgi-bin/bugzilla/show_bug.cgi?id=1526\">bug report</a> about browsing\nthe <a href=\"http://www.ebi.ac.uk/chebi/\">ChEBI</a> <a href=\"ftp://ftp.ebi.ac.uk/pub/databases/chebi/SDF/\">SD file</a> in its\n<a href=\"http://bioclipse.blogspot.com/2009/07/working-with-large-sdfiles-in-bioclipse.html\">moltable editor</a>. Some entries make\nBioclipse crash (as reported), or just very sluggish as with my Dell superlapcomputer :)</p>\n\n<p>So, I processed the file with a pure <a href=\"http://pele.farmbio.uu.se/nightly-1.2.3/\">CDK 1.2.3</a> with this\nsmall piece of Groovy script:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.interfaces.*</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.io.*</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.io.iterator.*</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.*</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.tools.manipulator.*</span><span class=\"o\">;</span>\n\n<span class=\"n\">iterator</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">IteratingMDLReader</span><span class=\"o\">(</span>\n  <span class=\"k\">new</span> <span class=\"nf\">File</span><span class=\"o\">(</span><span class=\"s2\">\"ChEBI_complete.sdf\"</span><span class=\"o\">).</span><span class=\"na\">newReader</span><span class=\"o\">(),</span>\n  <span class=\"n\">DefaultChemObjectBuilder</span><span class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">()</span>\n<span class=\"o\">)</span>\n<span class=\"kt\">int</span> <span 
class=\"n\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"o\">;</span>\n<span class=\"kt\">boolean</span> <span class=\"n\">hasNext</span> <span class=\"o\">=</span> <span class=\"kc\">true</span><span class=\"o\">;</span>\n<span class=\"k\">while</span> <span class=\"o\">(</span><span class=\"n\">hasNext</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"n\">i</span><span class=\"o\">++;</span>\n  <span class=\"kt\">long</span> <span class=\"n\">startTime</span> <span class=\"o\">=</span> <span class=\"n\">System</span><span class=\"o\">.</span><span class=\"na\">currentTimeMillis</span><span class=\"o\">();</span>\n  <span class=\"n\">hasNext</span> <span class=\"o\">=</span> <span class=\"n\">iterator</span><span class=\"o\">.</span><span class=\"na\">hasNext</span><span class=\"o\">();</span>\n  <span class=\"n\">IMolecule</span> <span class=\"n\">mol</span> <span class=\"o\">=</span> <span class=\"n\">iterator</span><span class=\"o\">.</span><span class=\"na\">next</span><span class=\"o\">()</span>\n  <span class=\"kt\">long</span> <span class=\"n\">endTime</span> <span class=\"o\">=</span> <span class=\"n\">System</span><span class=\"o\">.</span><span class=\"na\">currentTimeMillis</span><span class=\"o\">();</span>\n  <span class=\"n\">formula</span> <span class=\"o\">=</span> <span class=\"n\">MolecularFormulaManipulator</span><span class=\"o\">.</span><span class=\"na\">getMolecularFormula</span><span class=\"o\">(</span><span class=\"n\">mol</span><span class=\"o\">)</span>\n  <span class=\"kt\">long</span> <span class=\"n\">time</span> <span class=\"o\">=</span> <span class=\"n\">endTime</span> <span class=\"o\">-</span> <span class=\"n\">startTime</span><span class=\"o\">;</span>\n  <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">time</span> <span class=\"o\">&gt;</span> <span class=\"mi\">99</span><span class=\"o\">)</span>\n    <span class=\"n\">println</span> <span 
class=\"n\">i</span> <span class=\"o\">+</span> <span class=\"s2\">\": \"</span> <span class=\"o\">+</span> <span class=\"n\">MolecularFormulaManipulator</span><span class=\"o\">.</span><span class=\"na\">getString</span><span class=\"o\">(</span><span class=\"n\">formula</span><span class=\"o\">)</span> <span class=\"o\">+</span>\n            <span class=\"s2\">\" (\"</span> <span class=\"o\">+</span> <span class=\"n\">endTime</span> <span class=\"o\">+</span> <span class=\"s2\">\"-\"</span> <span class=\"o\">+</span> <span class=\"n\">startTime</span> <span class=\"o\">+</span> <span class=\"s2\">\"=\"</span> <span class=\"o\">+</span> <span class=\"n\">time</span> <span class=\"o\">+</span> <span class=\"s2\">\" ms)\"</span>\n\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>This script times the reading of every entry and reports those that take 100 ms or more to read (in the scripting environment). There are surprising results: H2O takes about 50 seconds, and phosphate about 105 seconds. So, I am quite certain it must be the reading of the metadata, and not the connection table. 
But, this I will explore in more detail now, hoping to come up with a patch for the CDK to speed up reading of such entries.</p>\n\n<p>The full list of timings:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>1: C10H2 (1254375053450-1254375052356=1094 ms)\n152: C20HN7O6 (1254375054779-1254375054125=654 ms)\n592: C3HO3 (1254375056604-1254375055499=1105 ms)\n832: C9NO5 (1254375057016-1254375056823=193 ms)\n879: C20N4 (1254375057381-1254375057039=342 ms)\n1125: R (1254375058293-1254375057528=765 ms)\n1197: C20N7O6 (1254375058612-1254375058372=240 ms)\n1198: C5NO3 (1254375058714-1254375058613=101 ms)\n1243: C5NO4 (1254375063698-1254375058800=4898 ms)\n1272: C21N7O16P3S (1254375067185-1254375063856=3329 ms)\n1277: C23N7O17P3S (1254375067625-1254375067239=386 ms)\n1282: C3NO2S (1254375070673-1254375067650=3023 ms)\n1285: C3O3 (1254375071600-1254375070675=925 ms)\n1290: C2O2 (1254375071802-1254375071608=194 ms)\n1299: H2O (1254375122202-1254375071808=50394 ms)\n1300: H (1254375136668-1254375122202=14466 ms)\n1301: O2 (1254375145270-1254375136670=8600 ms)\n1335: C15N6O5S (1254375150683-1254375145319=5364 ms)\n1343: C10N5O13P3 (1254375298927-1254375150686=148241 ms)\n1349: C2NO2 (1254375301391-1254375298953=2438 ms)\n1351: C34N4O4 (1254375301659-1254375301396=263 ms)\n1509: C6NO2 (1254375302753-1254375302011=742 ms)\n1541: C19N7O6 (1254375303296-1254375302778=518 ms)\n1543: C20HN7O7 (1254375303441-1254375303312=129 ms)\n1609: C9N2O15P3 (1254375303740-1254375303558=182 ms)\n1631: CHO2 (1254375303975-1254375303837=138 ms)\n1632: C4O4 (1254375304127-1254375303976=151 ms)\n1711: C21N7O14P2 (1254375310174-1254375304245=5929 ms)\n1788: C6H3O9P (1254375310555-1254375310387=168 ms)\n1798: C10H2N2O3S (1254375310705-1254375310588=117 ms)\n1808: C6N3O2 (1254375312665-1254375310727=1938 ms)\n1823: C10N5O14P3 (1254375318534-1254375312781=5753 ms)\n1839: C5NO4 (1254375325988-1254375318583=7405 ms)\n1840: C3O3 
(1254375326249-1254375325989=260 ms)\n1848: C10N5O7P (1254375336661-1254375326273=10388 ms)\n1862: C5NOSR2 (1254375337336-1254375336699=637 ms)\n1882: C3N2 (1254375337489-1254375337351=138 ms)\n1893: C6O9P (1254375337626-1254375337501=125 ms)\n1910: C3O6P (1254375337846-1254375337639=207 ms)\n1934: H3N (1254375349713-1254375337921=11792 ms)\n1977: O4S (1254375350045-1254375349902=143 ms)\n1984: CN2O (1254375350174-1254375350050=124 ms)\n2007: C5N (1254375350324-1254375350183=141 ms)\n2015: C5N5O (1254375350493-1254375350329=164 ms)\n2016: C2O (1254375350683-1254375350494=189 ms)\n2018: C27N9O15P2 (1254375351927-1254375350684=1243 ms)\n2020: H2O2 (1254375352124-1254375351928=196 ms)\n2036: C17N3O17P2 (1254375352309-1254375352196=113 ms)\n2095: C10N5O4 (1254375352578-1254375352394=184 ms)\n2137: C14HO4R (1254375353331-1254375352646=685 ms)\n2180: C3NO2 (1254375354199-1254375353469=730 ms)\n2184: C9NO5 (1254375354480-1254375354270=210 ms)\n2194: C6N4O2 (1254375356738-1254375354485=2253 ms)\n2201: C21N7O17P3 (1254375359838-1254375356748=3090 ms)\n2228: C6O2 (1254375360480-1254375359912=568 ms)\n2240: CO2 (1254375363324-1254375360485=2839 ms)\n2327: C5NO2S (1254375370536-1254375363612=6924 ms)\n2348: C14N6O5S (1254375371522-1254375370558=964 ms)\n2359: C9N2O9P (1254375372236-1254375371544=692 ms)\n2367: C9N2O6 (1254375373614-1254375372265=1349 ms)\n2370: C5N5 (1254375373975-1254375373615=360 ms)\n2404: C10N5O5 (1254375374360-1254375374108=252 ms)\n2413: C10N5O10P2 (1254375401639-1254375374373=27266 ms)\n2454: C5O5 (1254375401831-1254375401688=143 ms)\n2455: C5NO2S (1254375407807-1254375401832=5975 ms)\n2470: C11N2O2 (1254375408251-1254375407815=436 ms)\n2494: C4NO3 (1254375409200-1254375408373=827 ms)\n2499: C5O10P2R (1254375412153-1254375409297=2856 ms)\n2525: C4HO7P (1254375412777-1254375412293=484 ms)\n2526: C4N2 (1254375414071-1254375412777=1294 ms)\n2534: C21N7O14P2 (1254375417657-1254375414091=3566 ms)\n2581: C3NO2 (1254375422072-1254375417745=4327 ms)\n2638: 
C4NO4 (1254375424772-1254375422244=2528 ms)\n2680: C5O14P3 (1254375426347-1254375424831=1516 ms)\n2683: C3NO3 (1254375433063-1254375426353=6710 ms)\n2702: C3HO6P (1254375433192-1254375433079=113 ms)\n2749: C4N2O3 (1254375434106-1254375433445=661 ms)\n2755: C10N4O8P (1254375434417-1254375434113=304 ms)\n2756: C5NO2 (1254375436750-1254375434418=2332 ms)\n2779: C4NO2S (1254375437847-1254375436759=1088 ms)\n2803: C5N4 (1254375438991-1254375437968=1023 ms)\n2832: C9NO2 (1254375439226-1254375439026=200 ms)\n2844: C8HNO3 (1254375440463-1254375439238=1225 ms)\n2856: C5O13P3R (1254375441336-1254375440497=839 ms)\n2863: C10O6 (1254375442424-1254375441348=1076 ms)\n2873: C10N5O8P (1254375442560-1254375442433=127 ms)\n2898: C3HO3 (1254375443712-1254375442655=1057 ms)\n2925: C8H4NO6 (1254375443886-1254375443729=157 ms)\n3025: CO3 (1254375444508-1254375444131=377 ms)\n3031: C10N5O11P2 (1254375444810-1254375444601=209 ms)\n3038: C3NO2S (1254375449012-1254375444836=4176 ms)\n3042: C4N2O2 (1254375449224-1254375449066=158 ms)\n3060: C6O2 (1254375449433-1254375449274=159 ms)\n3083: C17N4O9P (1254375450751-1254375449465=1286 ms)\n3088: C34FeN4O4 (1254375452873-1254375450848=2025 ms)\n3111: C9N2O12P2 (1254375454560-1254375452939=1621 ms)\n3119: CNO5P (1254375454774-1254375454563=211 ms)\n3122: C9N3O14P3 (1254375454972-1254375454778=194 ms)\n3184: C3O3 (1254375455362-1254375455053=309 ms)\n3213: CO (1254375455489-1254375455375=114 ms)\n3216: C3HO7P (1254375455662-1254375455490=172 ms)\n3223: C9N2O6 (1254375455850-1254375455737=113 ms)\n3239: C3NO3 (1254375458116-1254375455868=2248 ms)\n3296: C9NO3 (1254375459575-1254375458250=1325 ms)\n3306: S3R (1254375464014-1254375459596=4418 ms)\n3313: C6O6 (1254375464701-1254375464016=685 ms)\n3348: C14HO4 (1254375465102-1254375464766=336 ms)\n3360: C12O11 (1254375465830-1254375465193=637 ms)\n3364: N2 (1254375475917-1254375465872=10045 ms)\n3371: C21N7O17P3 (1254375479243-1254375475920=3323 ms)\n3377: C6N2O2 (1254375481175-1254375479306=1869 
ms)\n3379: C3O6P (1254375482278-1254375481176=1102 ms)\n3390: O10P3 (1254375484356-1254375482286=2070 ms)\n3403: C5N2O3 (1254375486975-1254375484451=2524 ms)\n3499: C9NO3 (1254375487745-1254375487074=671 ms)\n3502: C5O8P (1254375489044-1254375487747=1297 ms)\n3532: C55MgN4O5 (1254375489206-1254375489097=109 ms)\n3537: C5NO4 (1254375494872-1254375489209=5663 ms)\n3546: Fe (1254375507646-1254375494892=12754 ms)\n3554: C5N2O2 (1254375507934-1254375507650=284 ms)\n3566: H2 (1254375508526-1254375508033=493 ms)\n3576: C12ClN4O7P2S (1254375508737-1254375508548=189 ms)\n3577: Mn (1254375511113-1254375508738=2375 ms)\n3582: C11NO6P (1254375511249-1254375511120=129 ms)\n3628: O7P2 (1254375554180-1254375511388=42792 ms)\n3633: O4P (1254375659461-1254375554183=105278 ms)\n3647: C12N4OS (1254375659706-1254375659481=225 ms)\n3664: C8HNO6P (1254375661230-1254375659713=1517 ms)\n3665: C9N4O8P (1254375661450-1254375661231=219 ms)\n3679: Mg (1254375679426-1254375661513=17913 ms)\n3859: C20N7O6 (1254375679768-1254375679522=246 ms)\n3860: C19N7O6 (1254375680069-1254375679769=300 ms)\n4026: Ca (1254375681849-1254375680582=1267 ms)\n4029: CNOR (1254375682983-1254375681850=1133 ms)\n4031: COR2 (1254375686384-1254375682984=3400 ms)\n4038: Cl (1254375686610-1254375686387=223 ms)\n4099: F (1254375687012-1254375686767=245 ms)\n4138: H (1254375722496-1254375687100=35396 ms)\n4163: C6NO2 (1254375722805-1254375722566=239 ms)\n4166: C6N2O2 (1254375724837-1254375722807=2030 ms)\n4167: Mg (1254375746423-1254375724838=21585 ms)\n4229: O (1254375754305-1254375746586=7719 ms)\n4254: H3O4P (1254375771602-1254375754367=17235 ms)\n4263: K (1254375771850-1254375771608=242 ms)\n4265: C5NO2 (1254375772195-1254375771852=343 ms)\n4297: Na (1254375772801-1254375772310=491 ms)\n4311: C4O3R (1254375773107-1254375772835=272 ms)\n4313: S (1254375795116-1254375773109=22007 ms)\n4356: Zn (1254375814849-1254375795263=19586 ms)\n4424: C5O5 (1254375818351-1254375814892=3459 ms)\n4453: C2 
(1254375818489-1254375818369=120 ms)\n4482: C6N3O2 (1254375819699-1254375818525=1174 ms)\n4494: C (1254375821009-1254375819706=1303 ms)\n4519: Co (1254375821358-1254375821068=290 ms)\n4670: C11N2O2 (1254375821817-1254375821583=234 ms)\n4677: C4H2O4 (1254375822301-1254375821824=477 ms)\n4801: Ni (1254375822605-1254375822450=155 ms)\n4912: C5N2O3 (1254375823778-1254375822655=1123 ms)\n5060: C5O5 (1254375824119-1254375823908=211 ms)\n5111: C6O6 (1254375824420-1254375824212=208 ms)\n5143: Cu (1254375824613-1254375824502=111 ms)\n5357: C6N4O2 (1254375826277-1254375824919=1358 ms)\n5368: C9NO5 (1254375826504-1254375826289=215 ms)\n5369: Fe (1254375826620-1254375826505=115 ms)\n5380: C3HO6P (1254375827340-1254375826635=705 ms)\n5398: Na (1254375827949-1254375827359=590 ms)\n5400: K (1254375828174-1254375827951=223 ms)\n5402: Zn (1254375844116-1254375828175=15941 ms)\n5404: Ca (1254375845806-1254375844117=1689 ms)\n5438: HO (1254375846125-1254375845836=289 ms)\n5538: CH3 (1254375847891-1254375846233=1658 ms)\n5548: Ca (1254375861972-1254375847928=14044 ms)\n5560: H2N (1254375866398-1254375861980=4418 ms)\n5693: C10O6 (1254375867526-1254375866499=1027 ms)\n5814: O7P2 (1254375910579-1254375867608=42971 ms)\n5869: C2NOR2 (1254375914124-1254375910633=3491 ms)\n5871: C3NOSR2 (1254375914574-1254375914161=413 ms)\n5873: C6N4OR2 (1254375916764-1254375914575=2189 ms)\n5875: C11N2OR2 (1254375917143-1254375916766=377 ms)\n5877: C4NO3R2 (1254375917710-1254375917145=565 ms)\n5885: C6N2OR2 (1254375919573-1254375917716=1857 ms)\n5889: C5NO3R2 (1254375920901-1254375919576=1325 ms)\n5895: C6N3OR2 (1254375922306-1254375920904=1402 ms)\n5900: C5NO4 (1254375925689-1254375922310=3379 ms)\n5902: C5NO4 (1254375930626-1254375925693=4933 ms)\n5903: C5NO4 (1254375933920-1254375930626=3294 ms)\n5906: C4NO4 (1254375934593-1254375933924=669 ms)\n5907: C4NO4 (1254375935274-1254375934594=680 ms)\n5909: C4NO4 (1254375936451-1254375935274=1177 ms)\n5911: C9NOR2 (1254375936575-1254375936453=122 ms)\n5913: 
C3NO2R2 (1254375940197-1254375936577=3620 ms)\n5914: C3NOSeR2 (1254375940307-1254375940197=110 ms)\n5920: C6NOR2 (1254375940705-1254375940311=394 ms)\n5925: C5N2O2R2 (1254375942662-1254375940737=1925 ms)\n5926: C4NO2R2 (1254375943012-1254375942662=350 ms)\n5939: C4O4 (1254375943199-1254375943061=138 ms)\n5993: C2O2 (1254375943413-1254375943287=126 ms)\n6082: Zn (1254375958993-1254375943449=15544 ms)\n6453: C10N5O13P3 (1254376116554-1254375959193=157361 ms)\n6574: CHO2 (1254376116903-1254376116743=160 ms)\n6706: C5O5 (1254376117248-1254376117131=117 ms)\n7032: C3O4 (1254376117689-1254376117435=254 ms)\n7104: C5NO3R2 (1254376118481-1254376117843=638 ms)\n7252: C5NOSR2 (1254376118731-1254376118630=101 ms)\n7411: C3O3 (1254376120056-1254376119011=1045 ms)\n7465: CR (1254376121752-1254376120212=1540 ms)\n7627: CNO (1254376122089-1254376121887=202 ms)\n7741: C12ClN4OS (1254376122320-1254376122156=164 ms)\n7858: CO2R (1254376122547-1254376122436=111 ms)\n7891: Fe4S4 (1254376122844-1254376122585=259 ms)\n8178: Mn (1254376124399-1254376122960=1439 ms)\n8338: C4NO2 (1254376124643-1254376124480=163 ms)\n9219: C4NO6P (1254376127494-1254376124951=2543 ms)\n9234: C9N3O14P3 (1254376127605-1254376127498=107 ms)\n9235: C10N5O14P3 (1254376132596-1254376127605=4991 ms)\n9305: C3NO6P (1254376143823-1254376132629=11194 ms)\n9311: C6O6 (1254376144017-1254376143825=192 ms)\n9402: C5NOSR (1254376144332-1254376144214=118 ms)\n9427: C4O2R2 (1254376144645-1254376144419=226 ms)\n10281: O10P3 (1254376146646-1254376144942=1704 ms)\n10308: C2OR (1254376148204-1254376146659=1545 ms)\n10453: CHOR (1254376148682-1254376148321=361 ms)\n10506: HOR (1254376149933-1254376148793=1140 ms)\n10589: C21N7O17P3 (1254376150485-1254376150182=303 ms)\n10602: C5N2O2R (1254376150727-1254376150503=224 ms)\n10604: C5N2OR2 (1254376150946-1254376150728=218 ms)\n10614: C5N2O2 (1254376151170-1254376150949=221 ms)\n10617: C5N2OR (1254376151389-1254376151171=218 ms)\n10641: C3O6P (1254376152229-1254376151404=825 
ms)\n10656: C16OR (1254376152505-1254376152235=270 ms)\n10688: C5O5 (1254376155565-1254376152582=2983 ms)\n10690: C3NO5PR2 (1254376166856-1254376155566=11290 ms)\n10729: C12N4O7P2S (1254376167090-1254376166889=201 ms)\n10748: C3NOR2 (1254376171309-1254376167097=4212 ms)\n10756: C34O4 (1254376172041-1254376171320=721 ms)\n10760: C9N2O15P3 (1254376172162-1254376172045=117 ms)\n10786: OR (1254376172419-1254376172169=250 ms)\n10828: C2NO2R (1254376173256-1254376172429=827 ms)\n10830: C2NOR (1254376174784-1254376173258=1526 ms)\n10883: C9NO2R2 (1254376175110-1254376174827=283 ms)\n10899: NR (1254376202420-1254376175114=27306 ms)\n10902: C2O (1254376203938-1254376202454=1484 ms)\n10914: C9NO5 (1254376204203-1254376203942=261 ms)\n11203: C6O6 (1254376204716-1254376204522=194 ms)\n11226: C4HO7P (1254376205190-1254376204732=458 ms)\n11680: C5NO2SR (1254376205649-1254376205392=257 ms)\n11681: C5NOSR (1254376205900-1254376205650=250 ms)\n11916: H (1254376206483-1254376206109=374 ms)\n12216: C5NOR2 (1254376208374-1254376206750=1624 ms)\n12217: C4N2O2R2 (1254376209040-1254376208375=665 ms)\n12680: COSR2 (1254376209601-1254376209314=287 ms)\n13478: C25HN2O19 (1254376210152-1254376209951=201 ms)\n13662: C5O8P (1254376210482-1254376210379=103 ms)\n</code></pre></div></div>",
      "summary": "Bioclipse has a bug report about browsing the ChEBI SD file in its moltable editor. Some entries make Bioclipse crash (as reported), or just very sluggish as with my Dell superlapcomputer :)",
      
      "date_published": "2009-10-01T00:00:00+00:00",
      "date_modified": "2009-10-01T00:00:00+00:00",
      "tags": ["cdk","groovy","chebi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wprrw-p9490",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/23/extension-point-for-running-junit-tests.html",
      "title": "Extension point for running JUnit tests in a RCP Application instance?",
      "content_html": "<p>One thing that has been on my wishlist is to be able to run the unit tests we have for <a href=\"http://www.bioclipse.net/\">Bioclipse</a> from inside a\nrunning Bioclipse instance. That is, we have a Bioclipse Test Suite features on the update site, matching the functional features we have\nthere. Each such test suite would run all <a href=\"http://www.junit.org/\">JUnit</a> tests we have for that feature.</p>\n\n<p>The good thing about this is twofold:</p>\n\n<ol>\n  <li>users can verify that their installation is working as intended</li>\n  <li>the development team can easily run the test suite on foreign systems, without the need to install a fully operational <a href=\"http://www.eclipse.org/\">Eclipse</a> with Bioclipse development workspace</li>\n</ol>\n\n<p>Now, the tricky thing is likely the following. How do we get to run all test suites? That is, I don’t want to need to have to run the suites\nfor each feature separately. Of course, this is exactly what <a href=\"http://www.vogella.de/articles/EclipseExtensionPoint/article.html\">extension points</a>\nare for.</p>\n\n<p>So, my question is, did anyone set up an system like this? And, is there an extension point that allows features to plugin additional\nJUnit test suites into a larger test suite dynamically?</p>",
      "summary": "One thing that has been on my wishlist is to be able to run the unit tests we have for Bioclipse from inside a running Bioclipse instance. That is, we have a Bioclipse Test Suite features on the update site, matching the functional features we have there. Each such test suite would run all JUnit tests we have for that feature.",
      
      "date_published": "2009-09-23T00:00:00+00:00",
      "date_modified": "2009-09-23T00:00:00+00:00",
      "tags": ["eclipse","bioclipse","junit"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sna4h-d4c37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/18/jchempaint-update-merging-of-patches.html",
      "title": "JChemPaint update: merging of patches and CDK statistics",
      "content_html": "<p>With the <a href=\"http://sourceforge.net/apps/mediawiki/cdk/index.php?title=JChemPaintWorkshop2009\">JChemPaint workshop</a> just passed, there is much work from UU and the\nEBI to be integrated. Moreover, <a href=\"http://rguha.wordpress.com/\">Rajarshi</a> just merged a lot of fixes from CDK <a href=\"http://github.com/egonw/cdk/tree/cdk-1.2.x\">1.2.x</a>\ninto the <a href=\"http://github.com/egonw/cdk\">master</a> branch, which will be a big rebase too. That said, I need to do this to recalculate source code statistics for\nthe CDK.</p>\n\n<p>The current set of JChemPaint patches looks like:</p>\n\n<p><img src=\"/assets/images/jcpStatus.png\" alt=\"\" /></p>\n\n<p>The two top most branches (<em>bioclipse-2.1.x</em> and <em>12-ebiStage</em>) are actually staging branches: patches that have not yet been integrated into the\nJChemPaint-Primary branch. Likewise, the <em>0-other</em> branch is a staging branch for patches that are in or up for the review process for CDK <em>master</em>\nitself.</p>\n\n<p>This will mean that I am now going to rebase all these branches once more.</p>",
      "summary": "With the JChemPaint workshop just passed, there is much work from UU and the EBI to be integrated. Moreover, Rajarshi just merged a lot of fixes from CDK 1.2.x into the master branch, which will be a big rebase too. That said, I need to do this to recalculate source code statistics for the CDK.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpStatus.png",
      "date_published": "2009-09-18T00:10:00+00:00",
      "date_modified": "2009-09-18T00:10:00+00:00",
      "tags": ["cdk","jchempaint","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rv6nz-p8j13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/18/my-htmlrdfa-homepage.html",
      "title": "My HTML+RDFa homepage",
      "content_html": "<p>Finally got around to adding a few more bits to my new science homepage: <a href=\"http://egonw.github.com/\">egonw.github.com</a>. Cool thing about this new page is that it is\n<a href=\"http://www.w3.org/TR/rdfa-syntax/\">HTML+RDFa</a>, so, my new <a href=\"http://en.wikipedia.org/wiki/FOAF_%28software%29\">FOAF</a> profile is embedded in the HTML:</p>\n\n<p><img src=\"/assets/images/egonwGithub.png\" alt=\"\" /></p>\n\n<p>Down the bottom is link to extract the RDF triples:</p>\n\n<p><img src=\"/assets/images/egonwGithub1.png\" alt=\"\" /></p>\n\n<p>Next, is to write a piece of code that creates HTML+RDFa+BIBO from a BibTeX file, and to write a plugin for Bioclipse to extract triples from HTML+RDFA.</p>",
      "summary": "Finally got around to adding a few more bits to my new science homepage: egonw.github.com. Cool thing about this new page is that it is HTML+RDFa, so, my new FOAF profile is embedded in the HTML:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/egonwGithub1.png",
      "date_published": "2009-09-18T00:00:00+00:00",
      "date_modified": "2009-09-18T00:00:00+00:00",
      "tags": ["html","rdf","foaf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/d7yv8-1m885",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/17/really-free-chemistry-books.html",
      "title": "Really free chemistry books",
      "content_html": "<p>With pleasure I read <a href=\"http://opendotdotdot.blogspot.com/2009/09/analogue-or-digital-both-please.html\">Analogue or Digital? - Both, Please</a>.\nFunnily, I just created MP3 (or, preferably <a href=\"http://en.wikipedia.org/wiki/Vorbis\">Ogg Vorbis</a>, superior but hardly\nany support by commercial companies, who rather seem to pay license fees) directly from the CD.</p>\n\n<p>Anyway… the blog wanders of to Google introducing searchable books, with many out-of-copyright. I was wondering\nhow many chemistry books the pre-1923 book set included, and that actually sums up to about\n<a href=\"http://books.google.com/books?lr=&amp;as_brr=0&amp;q=chemistry&amp;btnG=Search+Books&amp;as_drrb_is=b&amp;as_minm_is=0&amp;as_miny_is=&amp;as_maxm_is=0&amp;as_maxy_is=1922\">41 thousand books</a>,\njust for the <em>chemistry</em> search term.</p>\n\n<p>There is quite cool stuff there, like the <a href=\"http://books.google.com/books?id=HC_kQAAACAAJ&amp;dq=inauthor:lavoisier&amp;lr=&amp;as_drrb_is=b&amp;as_minm_is=0&amp;as_miny_is=&amp;as_maxm_is=0&amp;as_maxy_is=1922&amp;as_brr=0\">English translation of the works</a>\nof <a href=\"http://en.wikipedia.org/wiki/Antoine_Lavoisier\">Lavoisier</a>.</p>\n\n<p>This is really cool! I can just <a href=\"http://books.google.com/books/download/Elements_of_chemistry.pdf?id=adYKAAAAIAAJ&amp;output=pdf&amp;sig=ACfU3U1wSuUXDwx3MVNlSSWj7BFAjdjApw&amp;source=gbs_v2_summary_r&amp;cad=0\">download this</a>\nonto my eReader (which I don’t have yet anyway, but my Dell laptop will do fine; if only the PDF was broken), but\nthis actually allows me to read all the stuff I read about when doing History of Chemistry in the last year it\nwas given in <a href=\"http://ru.nl/\">Nijmegen</a>, back in 1993. Which was funny in itself, as the course was for second year\nstudents, but one of my introduction tutors suggested me to take it, which I did. It was a great course, by a great\nteacher, btw! 
It is a shame that the course was lost from the curriculum, much like I hated to see electrochemistry\nand cheminformatics lost in Nijmegen. Severe and very regrettable loss of diversity in the education there.</p>\n\n<p>Anyways, I’m going to need hours to browse all the goodies there. Did you spot the\n<a href=\"http://books.google.com/books?id=lndOAAAAMAAJ&amp;q=chemistry&amp;dq=chemistry&amp;lr=&amp;as_drrb_is=b&amp;as_minm_is=0&amp;as_miny_is=&amp;as_maxm_is=0&amp;as_maxy_is=1922&amp;as_brr=0\">1913 copy of the CRC Handbook of Chemistry and Physics</a>\nyet?</p>\n\n<p>I am looking forward to seeing people starting text mining on these books… anyway?</p>",
      "summary": "With pleasure I read Analogue or Digital? - Both, Please. Funnily, I just created MP3 (or, preferably Ogg Vorbis, superior but hardly any support by commercial companies, who rather seem to pay license fees) directly from the CD.",
      
      "date_published": "2009-09-17T00:00:00+00:00",
      "date_modified": "2009-09-17T00:00:00+00:00",
      "tags": ["openscience","publishing"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jwawz-s1c12",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/13/vrms-meme-how-much-non-free-software.html",
      "title": "VRMS meme: how much non-free software are you running?",
      "content_html": "<p>Over at <a href=\"http://planet.ubuntu.com/\">Planet Ubuntu</a> there is a meme running around VRMS\n(<a href=\"http://en.wikipedia.org/wiki/Vrms\">Virtual Richard M. Stallman</a>, brilliant name!) which finds non-free software on\nyour desktop. I uninstalled Sun’s Java6, for which there is the\n<a href=\"http://www.outflux.net/blog/archives/2009/09/12/uninstall-sun-java6/\">OpenJDK6 alternative</a>.</p>\n\n<p>These are my current results:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Non-free packages installed on egonw-laptop\n\nfglrx-modaliases          Identifiers supported by the ATI graphics driver\nlinux-generic             Complete Generic Linux kernel\nlinux-restricted-modules- Non-free Linux 2.6.28 modules helper script\nlinux-restricted-modules- Restricted Linux modules for generic kernels\nskype                     Skype - Take a deep breath\nsun-java5-bin             Sun Java(TM) Runtime Environment (JRE) 5.0 (architectu\nsun-java5-demo            Sun Java(TM) Development Kit (JDK) 5.0 demos and examp\nsun-java5-jdk             Sun Java(TM) Development Kit (JDK) 5.0\nsun-java5-jre             Sun Java(TM) Runtime Environment (JRE) 5.0 (architectu\n\n            Contrib packages installed on egonw-laptop\n\nflashplugin-installer     Adobe Flash Player plugin installer\nflashplugin-nonfree       Adobe Flash Player plugin installer (transitional pack\nmsttcorefonts             transitional dummy package\nttf-mscorefonts-installer Installer for Microsoft TrueType core fonts\n\n  9 non-free packages, 0.4% of 2050 installed packages.\n  4 contrib packages, 0.2% of 2050 installed packages.\n</code></pre></div></div>",
      "summary": "Over at Planet Ubuntu there is a meme running around VRMS (Virtual Richard M. Stallman, brilliant name!) which finds non-free software on your desktop. I uninstalled Sun’s Java6, for which there is the OpenJDK6 alternative.",
      
      "date_published": "2009-09-13T00:00:00+00:00",
      "date_modified": "2009-09-13T00:00:00+00:00",
      "tags": ["java","ubuntu","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kmsvz-pr714",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/11/bioclipse-rdf-and-defeasible-reasoning.html",
      "title": "Bioclipse, RDF and defeasible reasoning",
      "content_html": "<p>Well, I have yet to read the paper in detail, but my <a href=\"http://saml.rilspace.com/\">new student Samuel</a> is going\nto work for 20 weeks on <a href=\"http://saml.rilspace.com/content/first-week-my-degree-project-passed\">defeasible reasoning with DrProlog</a>\nin <a href=\"http://www.bioclipse.net/\">Bioclipse</a>.</p>",
      "summary": "Well, I have yet to read the paper in detail, but my new student Samuel is going to work for 20 weeks on defeasible reasoning with DrProlog in Bioclipse.",
      
      "date_published": "2009-09-11T00:00:00+00:00",
      "date_modified": "2009-09-11T00:00:00+00:00",
      "tags": ["bioclipse","rdf","semweb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g3jct-gta96",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/10/open-chemical-data-1-nmrshiftdb.html",
      "title": "Open Chemical Data #1: NMRShiftDB",
      "content_html": "<p>As I reported earlier, progress is only possible of you can modify and redistribute. This is why Open Data, Open Source, and\nOpen Standards are so important to us <a href=\"http://en.wikipedia.org/wiki/Blue_Obelisk\">Blue Obelisk</a> members. For data,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation.html\">proper licensing <i class=\"fa-solid fa-recycle fa-xs\"></i></a> makes these two\nrequirements possible, but more importantly, make those rights explicit. <a href=\"http://depth-first.com/\">Rich</a> is running the\nnice <a href=\"http://zusammen.metamolecular.com/\">Zusammen</a> blog, but most of his entries are <strong>not</strong> Open Data. Even larger\nchemistry data repositories can be vague and have seemingly contradicting statements.</p>\n\n<p>One project which did it right, was the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a>. They were ahead of their time and did\npick a proper Open license. By current standards not the best data license (the <a href=\"http://www.gnu.org/copyleft/fdl.html\">GNU FDL</a>),\nbut the best at the time. 
To push real Open Chemical Data a bit more, I will create a series much like Rich’ series, but\nwill make the restriction that the sources are clear about what rights they give users and that those include the rights\nto modify and redistribute the data without unreasonable restrictions.</p>\n\n<p>I will not say much about the database itself, and even less now, as I think the <em>NMRShiftDB</em> is well-known amongst my readers.</p>\n\n<p>Moreover, I have set up a <a href=\"http://friendfeed.com/\">FriendFeed</a> room, <a href=\"http://friendfeed.com/openchemicaldata\">Open Chemical Data</a>,\nwhere I will aggregate feeds of new molecules in these databases:</p>\n\n<p><img src=\"/assets/images/ocdGroup.png\" alt=\"\" /></p>\n\n<p>Now, the only problem is, I need candidate for this series, and cannot actually think of a third entry (second being the\n<a href=\"http://spreadsheets.google.com/ccc?key=plwwufp30hfq0udnEmRD1aQ\">Open Notebook Science Solubility</a> data)… Want to help\nme out? Please let me know which chemical database is using a Open Data license.</p>",
      "summary": "As I reported earlier, progress is only possible of you can modify and redistribute. This is why Open Data, Open Source, and Open Standards are so important to us Blue Obelisk members. For data, proper licensing makes these two requirements possible, but more importantly, make those rights explicit. Rich is running the nice Zusammen blog, but most of his entries are not Open Data. Even larger chemistry data repositories can be vague and have seemingly contradicting statements.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ocdGroup.png",
      "date_published": "2009-09-10T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["opendata","friendfeed","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vkj7r-zc905",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/09/art-of-programming-do-not-leak.html",
      "title": "The Art of Programming: do not leak implementation details",
      "content_html": "<p>At the <a href=\"http://sourceforge.net/apps/mediawiki/cdk/index.php?title=JChemPaintWorkshop2009#Work\">JChemPaint workshop</a> here in Uppsala, where we have\nMark from <a href=\"http://www.steinbeck-molecular.de/steinblog/\">Chris</a>’ group as our guest, we encountered an inconsistency in <a href=\"http://cdk.sf.net/\">CDK</a> 1.2,\nwhere the bond stereochemistry did not yet follow the pattern recently adopted of having Class fields, to allow using <em>null</em> to have the semantics\nof undefined. Previously, the defaults for native values were confounded with set values. For example, the formal charge <em>unset</em> and 0 would be have\na field value <em>int = 0</em>.</p>\n\n<p>So, I am now writing a patch which replaces the use of <em>int</em> in <code class=\"language-plaintext highlighter-rouge\">IBond.getStereo()</code>. But instead of going for <code class=\"language-plaintext highlighter-rouge\">Integer</code>, the patch is actually\ngoing to use a enumeration.</p>\n\n<p>Now, getting the the Art of Programming… while writing patches in the CDK, you run into those lovely bits of code, where intention is mixed with\nimplementation details. They should not, and often do not need to, but they typically do. This is actually one reason why we now have a more strict\npeer-review installed. 
Below are two nice examples where intention is mixed with implementation detail.</p>\n\n<h3 id=\"example-1\">Example 1</h3>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">int</span> <span class=\"n\">stereo</span> <span class=\"o\">=</span> <span class=\"n\">container</span><span class=\"o\">.</span><span class=\"na\">getBond</span><span class=\"o\">(</span><span class=\"n\">chiralNeighbours</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">),</span> <span class=\"n\">atom</span><span class=\"o\">).</span><span class=\"na\">getStereo</span><span class=\"o\">();</span>\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">stereo</span> <span class=\"o\">==</span> <span class=\"mi\">0</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"c1\">// do something</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>This code is bad because we have no clue of what this code is supposed to do. When should the if-clause kick in? Be reminded that the <em>int = 0</em> has\nthe confounded meanings of <em>no stereochemistry</em> and perhaps <em>has stereochemistry, but no one ever bothered telling me</em>. So, which of the two situations\ndoes the if clause apply to. So, my patch can only assume that both were applicable (following the actual implementation), though I don’t think that\nmakes sense on an algorithmic level. Had the author used <code class=\"language-plaintext highlighter-rouge\">CDKConstants.STEREO_BOND_NONE</code> (which is the implementation for <em>int = 0</em> for <em>no stereochemistry</em>,\nthen I had known what the implementation was doing. Instead, the author chose to reuse implementation details: a hardcoded 0.</p>\n\n<h3 id=\"example-2\">Example 2</h3>\n\n<p>There is another instance of this problem. 
Look at this lovely piece of code:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">IBond</span> <span class=\"n\">bond</span> <span class=\"o\">=</span> <span class=\"n\">molecule</span><span class=\"o\">.</span><span class=\"na\">getBond</span><span class=\"o\">(</span><span class=\"n\">atomA</span><span class=\"o\">,</span> <span class=\"n\">unplacedAtom</span><span class=\"o\">);</span>\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"nc\">Math</span><span class=\"o\">.</span><span class=\"na\">abs</span><span class=\"o\">(</span><span class=\"n\">bond</span><span class=\"o\">.</span><span class=\"na\">getStereo</span><span class=\"o\">())</span> <span class=\"o\">&lt;</span> <span class=\"mi\">2</span>\n    <span class=\"o\">&amp;&amp;</span> <span class=\"nc\">Math</span><span class=\"o\">.</span><span class=\"na\">abs</span><span class=\"o\">(</span><span class=\"n\">bond</span><span class=\"o\">.</span><span class=\"na\">getStereo</span><span class=\"o\">())</span> <span class=\"o\">!=</span> <span class=\"mi\">0</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>This example also uses hardcoded value, instead of the matching constants. Remember that <em>int = 0</em> had the meaning of no stereochemistry, so I assume\nthis code is determining is stereochemistry is defined for the bond, making nice use that those situations at some point were coded as non-zero values.\nMoreover, it is only interested in a few stereochemistry definitions, and from the implementation I learn (and that actually makes sense at this\nlocation) that it is only interested in those stereochemistry for which the first bond atom is the stereochemical center. This again is leaking\nimplementation details, instead of using semantically meaningful constants.</p>",
      "summary": "At the JChemPaint workshop here in Uppsala, where we have Mark from Chris’ group as our guest, we encountered an inconsistency in CDK 1.2, where the bond stereochemistry did not yet follow the pattern recently adopted of having Class fields, to allow using null to have the semantics of undefined. Previously, the defaults for native values were confounded with set values. For example, the formal charge unset and 0 would be have a field value int = 0.",
      
      "date_published": "2009-09-09T00:00:00+00:00",
      "date_modified": "2009-09-09T00:00:00+00:00",
      "tags": ["java","cdk","jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hx839-f4h59",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/08/updated-bioclipse-sdk-eclipse-35.html",
      "title": "Updated Bioclipse SDK: the Eclipse 3.5 version",
      "content_html": "<p>Last Friday, the Bioclipse 2.1 development series moved to <a href=\"http://update.eclipse.org/downloads/drops/R-3.5-200906111540/eclipse-news-all.html\">Eclipse 3.5</a>,\nso I had to update the <a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse_SDK\">Bioclipse SDK</a> too, which\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/13/making-bioclipse-development-easier-new.html\">we developed earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>With a new Eclipse version also comes new screenshots to talk you through the process of setting up a new\n<a href=\"http://wiki.bioclipse.net/index.php?title=How_to_make_a_manager_2\">Bioclipse manager</a> plugin.</p>\n\n<h3 id=\"step-1\">Step 1</h3>\n\n<p>Right click in your workspace navigator, and choose <em>New -&gt; Project</em>:</p>\n\n<p><img src=\"/assets/images/newBCProject.png\" alt=\"\" /></p>\n\n<h3 id=\"step-2\">Step 2</h3>\n\n<p>And select to create a new <em>Plug-in Project</em>:</p>\n\n<p><img src=\"/assets/images/newBCProject1.png\" alt=\"\" /></p>\n\n<h3 id=\"step-3\">Step 3</h3>\n\n<p>Give a project name, such as <em>net.bioclipse.xml</em>:</p>\n\n<p><img src=\"/assets/images/newBCProject2.png\" alt=\"\" /></p>\n\n<h3 id=\"step-4\">Step 4</h3>\n\n<p>Tune the <code class=\"language-plaintext highlighter-rouge\">ID</code>, <code class=\"language-plaintext highlighter-rouge\">Version</code>, <code class=\"language-plaintext highlighter-rouge\">Name</code>, and <code class=\"language-plaintext highlighter-rouge\">Provider</code> to your liking:</p>\n\n<p><img src=\"/assets/images/newBCProject3.png\" alt=\"\" /></p>\n\n<h3 id=\"step-5\">Step 5</h3>\n\n<p>Then select <em>Bioclipse Manager</em>:</p>\n\n<p><img src=\"/assets/images/newBCProject4.png\" alt=\"\" /></p>\n\n<h3 id=\"step-6\">Step 6</h3>\n\n<p>The next wizard page is specific the the Bioclipse manager, and asks a manager namespace, which will be used as prefix in the JavaScript Console.\nFor 
example, if I make the namespace <code class=\"language-plaintext highlighter-rouge\">xml</code>, then I will type <code class=\"language-plaintext highlighter-rouge\">xml.someMethod()</code> inside the JavaScript. The default manager name is typically\nOK by default:</p>\n\n<p><img src=\"/assets/images/newBCProject5.png\" alt=\"\" /></p>\n\n<p>Then click Finish and let Eclipse set up the new project.</p>\n\n<h3 id=\"step-7\">Step 7</h3>\n\n<p>Because I have not figured out yet how to add <em>Import-Package</em> to the <code class=\"language-plaintext highlighter-rouge\">MANIFEST.MF</code> programmatically, you will have to do this manually. Add\nthe last line of the next screenshot to the MANIFEST.MF of your new plugin:</p>\n\n<p><img src=\"/assets/images/newBCProject6.png\" alt=\"\" /></p>",
      "summary": "Last Friday, the Bioclipse 2.1 development series moved to Eclipse 3.5, so I had to update the Bioclipse SDK too, which we developed earlier .",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/newBCProject2.png",
      "date_published": "2009-09-08T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["bioclipse","eclipse","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3afny-p1c44",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/05/nmrshiftdb-rdf-2-some-statistics.html",
      "title": "NMRShiftDB RDF #2: Some statistics",
      "content_html": "<p>This morning I had some more fun, and since the <a href=\"http://www.ebi.ac.uk/nmrshiftdb/nmrshiftdbhtml/statistics.html\">statistics</a> view on the\n<a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> server is down, I though I could recalculate the statistics myself. Because the current RDF\nversion of the data does not include all information yet, I cannot reproduce all of them. On the other hand, I can determine some other\ninteresting statistics.</p>\n\n<h2 id=\"spectra-per-spectrum-type\">Spectra per spectrum type</h2>\n\n<p>One of the statistics given in the aforementioned page is the number of spectra per nuclei. This can be recalculated with the following SPARQL:</p>\n\n<script src=\"https://gist.github.com/181315.js\"></script>\n\n<p>The results for the 1.3.3 release are:</p>\n\n<table>\n  <thead>\n    <tr>\n      <th>nucleus</th>\n      <th>count</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <td>13C</td>\n      <td>21958</td>\n    </tr>\n    <tr>\n      <td>1H</td>\n      <td>3031</td>\n    </tr>\n    <tr>\n      <td>11B</td>\n      <td>326</td>\n    </tr>\n    <tr>\n      <td>17O</td>\n      <td>131</td>\n    </tr>\n    <tr>\n      <td>15N</td>\n      <td>79</td>\n    </tr>\n    <tr>\n      <td>195Pt</td>\n      <td>68</td>\n    </tr>\n    <tr>\n      <td>19F</td>\n      <td>50</td>\n    </tr>\n    <tr>\n      <td>31P</td>\n      <td>38</td>\n    </tr>\n    <tr>\n      <td>73Ge</td>\n      <td>18</td>\n    </tr>\n    <tr>\n      <td>33S</td>\n      <td>8</td>\n    </tr>\n    <tr>\n      <td>29Si</td>\n      <td>5</td>\n    </tr>\n  </tbody>\n</table>\n\n<p>I am a bit surprised by the count for the silicon NMR spectra, as I would have thought I alone had entered more than just five.</p>\n\n<h2 id=\"molecules-with-the-most-spectra\">Molecules with the most spectra</h2>\n\n<p>It turns out that the molecules have in the 1.3.3 NMRShiftDB release at most 7 spectra, as I can calculate with:</p>\n\n<script 
src=\"https://gist.github.com/181324.js\"></script>\n\n<p>That is going to change, as the paper I am digitizing now (doi:<a href=\"http://dx.doi.org/10.1021/jo971176v\">10.1021/jo971176v</a>) has carbon and\nhydrogen NMR spectra for 7 solvents for each compound :) It should be possible to summarize the number of molecules for each number of\nspectra per molecule, but I did not manage to get this SPARQL to work out well.</p>\n\n<p>BTW, did you know you can find reprint PDFs of a paper (if any; this one happens to have a <a href=\"http://ccc.chem.pitt.edu/wipf/Web/4505.pdf\">PDF copy</a>)\nwith Google using the title in quotes and <code class=\"language-plaintext highlighter-rouge\">filetype:pdf</code>? Try <a href=\"http://www.google.com/search?hl=en&amp;&amp;as_epq=NMR+Chemical+Shifts+of+Common+Laboratory+Solvents+as+Trace+Impurities+&amp;as_oq=&amp;as_eq=&amp;num=10&amp;lr=&amp;as_filetype=pdf&amp;ft=i&amp;as_sitesearch=&amp;as_qdr=all&amp;as_rights=&amp;as_occt=any&amp;cr=&amp;as_nlo=&amp;as_nhi=&amp;safe=images\">this query</a>.\nThe top hit was molecule 10016314 (<a href=\"http://pele.farmbio.uu.se/nmrshiftdb/?moleculeId=10016314\">RDF</a>), which has 4 <sup>13</sup>C\nspectra, one <sup>15</sup>N and two proton NMR spectra.</p>\n\n<h2 id=\"molecules-with-the-most-different-nuclei\">Molecules with the most different nuclei</h2>\n\n<p>As we already saw in the first SPARQL query, there are 11 different nuclei in the database, though carbon and\nhydrogen are by far the most abundant spectra. I like diversity, so one statistic I find interesting is which molecules\nhave spectra with the most different nuclei. 
This is done with the query:</p>\n\n<script src=\"https://gist.github.com/181326.js\"></script>\n\n<p>It shows that molecule 10023801 (<a href=\"http://pele.farmbio.uu.se/nmrshiftdb/?moleculeId=10023801\">RDF</a>) has 5 different NMR types:\n<sup>13</sup>C spectra, one <sup>15</sup>N, <sup>29</sup>Si spectra, one <sup>17</sup>O, and <sup>1</sup>H spectra. Unfortunately,\nthe compound also has chlorines, so it does not qualify as a molecule for which NMR spectra are available for all its elements.</p>",
      "summary": "This morning I had some more fun, and since the statistics view on the NMRShiftDB server is down, I thought I could recalculate the statistics myself. Because the current RDF version of the data does not include all information yet, I cannot reproduce all of them. On the other hand, I can determine some other interesting statistics.",
      
      "date_published": "2009-09-05T00:00:00+00:00",
      "date_modified": "2009-09-05T00:00:00+00:00",
      "tags": ["nmrshiftdb","sparql"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/jo971176v", "doi": "10.1021/jo971176v"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7k8wn-jee62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/05/nmrshiftdb-rdf-1-spectra-by-inchi.html",
      "title": "NMRShiftDB RDF #1: Spectra by InChI",
      "content_html": "<p>Originally, I wanted to include a SPARQL query in <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/09/04/nmrshiftdb-enters-rdfopenmoleculesnet-2.html\">yesterday’s blog post <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nshowing how to retrieve <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> spectra based on an InChIKey, but it failed horribly. I have yet to discover why. This\nmorning I discovered that it is specific to that field, and that using the same approach with InChI is no problem:</p>\n\n<script src=\"https://gist.github.com/181307.js\"></script>",
      "summary": "Originally, I wanted to include a SPARQL query in yesterday’s blog post showing how to retrieve NMRShiftDB spectra based on an InChIKey, but it failed horribly. I have yet to discover why. This morning I discovered that it is specific to that field, and that using the same approach with InChI is no problem:",
      
      "date_published": "2009-09-05T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["nmrshiftdb","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nv925-tje87",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/04/nmrshiftdb-enters-rdfopenmoleculesnet-2.html",
      "title": "NMRShiftDB enters rdf.openmolecules.net #2: SPARQL end point with Virtuoso",
      "content_html": "<p>About 6 months ago I <a href=\"http://chem-bla-ics.blogspot.com/2009/03/nmrshiftdb-enters-rdfopenmoleculesnet.html\">reported</a> about my efforts to RDF-ize the data from the\n<a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a>. Since then, time was consumed by many other things, but now that <a href=\"http://www.bioclipse.net/\">Bioclipse</a> can query\n<a href=\"http://en.wikipedia.org/wiki/SPARQL\">SPARQL</a> end points, that I want to contribute the triple set (it is <a href=\"http://www.gnu.org/copyleft/fdl.html\">GNU FDL</a>-licensed)\nto <a href=\"http://www.bio2rdf.org/\">Bio2RDF</a>, that a student started working in my group (now larger than just me :) on reasoning on life sciences data, and that I\nrecently contributed my <a href=\"http://egonw.posterous.com/nmrshiftdb-1006-contributions-and-counting\">1000th NMR spectrum</a> to the database, I thought it was time to\nfinally reinstall <a href=\"http://www.openlinksw.com/wiki/main/Main/VOSDownload\">Virtuoso</a>.</p>\n\n<p>There are precompiled binaries for <a href=\"https://launchpad.net/~wdaniels/+archive/ppa\">Ubuntu</a> and <a href=\"http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508048\">Debian</a>,\nbut Michel encouraged me to use version 6 when <a href=\"http://chem-bla-ics.blogspot.com/2009/06/michel-dumontier-at-uppsala-university.html\">he visited us</a>.\nAnd so I compiled and install <a href=\"https://sourceforge.net/projects/virtuoso/files/virtuoso-devel/6.0.0-TP1/\">6.0.0.TP1</a> on the public server, while I do have the\nbinary debs for 5.0.12 on my laptop. 
With some basic Apache magic, I hooked up the SPARQL end point of the server to the web:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;Proxy</span> <span class=\"err\">/nmrshiftdb/sparql</span><span class=\"nt\">&gt;</span>\n  RewriteEngine On\n  Allow from all\n  ProxyPass        http://localhost:8890/sparql\n  ProxyPassReverse http://localhost:8890/sparql\n<span class=\"nt\">&lt;/Proxy&gt;</span>\n</code></pre></div></div>\n\n<p>Nice thing about this is, that I can set up multiple servers, allowing me to keep incompatibly licensed data sets apart (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation.html\">Open Data: license, rights, aggregation, clean interfaces? <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), which is\nthe same approach Bio2RDF is taking.</p>\n\n<p>The <a href=\"http://pele.farmbio.uu.se/nmrshiftdb/sparql\">end point</a> now offers about <a href=\"http://pele.farmbio.uu.se/nmrshiftdb/sparql?default-graph-uri=&amp;query=SELECT+count%28*%29+WHERE+{\\%0D%0A++%3Fs+%3Fp+%3Fo+.%0D%0A}&amp;format=text%2Fhtml&amp;debug=on\">278887</a>\ntriples, but this will soon rise as I make more content from the database available in the original SQL database. The data is from the\n<a href=\"https://sourceforge.net/projects/nmrshiftdb/files/nmrshiftdb/1.3.3/\">1.3.3 release</a> by <a href=\"http://www.steinbeck-molecular.de/steinblog/\">Chris</a>’\nteam, and does not include my 1000th spectrum.</p>\n\n<p>Getting the data into the database was not trivial either. The documentation suggests WebDAV, and that indeed worked for me once, after\nusing the <a href=\"http://www.snee.com/bobdc.blog/2009/02/getting-started-using-virtuoso.html\">curl approach suggested here</a>. But upon a second upload, it\ndid again not enter the store. 
The ultimate solution was to use the iSQL interface, with the following SQL:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>DB.DBA.RDF_LOAD_RDFXML_MT(\n  file_to_string_output('/tmp/nmrshiftdb.rdf'), '',\n  'http://pele.farmbio.uu.se/nmrshiftdb'\n);\n</code></pre></div></div>\n\n<p>Scientifically, this progress is not overly interesting, although it makes it very clear that you really should not have to be happy with proprietary\nand non-semantic formats for anything. But, to me, this is mostly a technological success of great importance: I can now share really large sets of\nRDF data.</p>\n\n<p>Querying this data is simple with SPARQL, and the results are available in various formats, such as JSON, which makes it easy to integrate into\nthird-party applications or <a href=\"http://chem-bla-ics.blogspot.com/2009/09/google-wave-robot-for-cdk-functionality.html\">Google Wave robots</a>\n(did I hear someone say <a href=\"http://nmrshifty.appspot.com/\">NMRShifty</a>?). As I have <a href=\"http://chem-bla-ics.blogspot.com/search?q=sparql\">blogged before</a>,\nSPARQL is an excellent tool to aggregate scientific data prior to data analysis. And I will demo more interesting queries later this month.</p>",
      "summary": "About 6 months ago I reported about my efforts to RDF-ize the data from the NMRShiftDB. Since then, time was consumed by many other things, but now that Bioclipse can query SPARQL end points, that I want to contribute the triple set (it is GNU FDL-licensed) to Bio2RDF, that a student started working in my group (now larger than just me :) on reasoning on life sciences data, and that I recently contributed my 1000th NMR spectrum to the database, I thought it was time to finally reinstall Virtuoso.",
      
      "date_published": "2009-09-04T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["rdf","sparql","nmrshiftdb","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/t5hgg-stt44",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/02/open-knowledge-reproducibility-in.html",
      "title": "Open Knowledge: Reproducibility in Cheminformatics with ODOSOS",
      "content_html": "<p>Below are the slides of my presentation of last Monday (see my <a href=\"http://chem-bla-ics.blogspot.com/2009/08/reminder-my-talk-in-frankfurt-on-monday.html\">earlier</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/04/08/open-knowledge-reproducibility-in.html\">blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>):</p>\n\n<p><a href=\"https://zenodo.org/records/2652077\"><img src=\"/assets/images/frankfurthSlides.png\" alt=\"\" /></a></p>",
      "summary": "Below are the slides of my presentation of last Monday (see my earlier blogs ):",
      
      "date_published": "2009-09-02T00:10:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.2652076", "doi": "10.5281/ZENODO.2652076"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/stdr4-6rm16",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/09/02/google-wave-robot-for-cdk-functionality.html",
      "title": "Google Wave robot for CDK functionality",
      "content_html": "<p>I was really happy to hear early last week that I was invited to take part in the <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/17/social-web-does-not-wait-for-bioclipse.html\">Google Wave <i class=\"fa-solid fa-recycle fa-xs\"></i></a> beta,\nand received my account details this Monday, while attending (and <a href=\"http://chem-bla-ics.blogspot.com/2009/09/open-knowledge-reproducibility-in.html\">speaking at</a>) the GDCh\nWissenschaftsforum Chemie 2009. Yesterday was a travel day, and while working on course material for the <a href=\"http://www.pharmbio.org/\">Pharmaceutical Bioinformatics</a> course that\nuses <a href=\"http://www.bioclipse.net/\">Bioclipse</a>, I set up an Eclipse environment for development of a wave robot. <a href=\"http://code.google.com/apis/wave/extensions/robots/java-tutorial.html\">Documentation</a>\nwas very clear, and deployment on <a href=\"http://www.appspot.com/\">Appspot</a> was one click on the appropriate button. Great work from the people from Google! It was all so easy, I could not\nresist pushing things a bit further, and looked carefully at other robots, like <a href=\"http://www.chemspider.com/blog/chemspidey-rides-the-wave-courtesy-of-cameron-neylon.html\">ChemSpidey</a>\nby <a href=\"http://blog.openwetware.org/scienceintheopen/2009/08/27/writing-a-wave-robot-some-thoughts-on-good-practice-for-research-robots/\">Cameron</a> and\n<a href=\"http://blogs.nature.com/wp/nascent/2009/07/igor_a_google_wave_robot_to_ma.html\">Igor</a> by <a href=\"http://www.ghastlyfop.com/blog/\">Euan</a>, to see how text replacement is done,\nand wrote my first functional robot, <em>CDKitty (<strong>chemdevelkit@appspot.com</strong>)</em>:</p>\n\n<p><img src=\"/assets/images/cdkitty.png\" alt=\"\" /></p>\n\n<p>It seems that it is a policy that wave robot names end with <code class=\"language-plaintext highlighter-rouge\">-y</code>, so CDKitty sounded somewhat appropriate. 
Anyways, the robot is not overly functional yet, but it has\na <em>profile</em> (which took some extra googling) and one function <strong><em>mwOf</em></strong>. Add the robot to your wave and prefix a molecular formula with <code class=\"language-plaintext highlighter-rouge\">mwOf:</code>,\nand CDKitty will calculate the molecular weight on the fly. Clearly, this opens up a whole new application world for the <a href=\"http://cdk.sf.net/\">CDK</a>,\nand you can leave feature requests at the <a href=\"http://github.com/egonw/CDKitty/issues\">issue tracker</a> of the <a href=\"http://github.com/egonw/CDKitty\">project home at GitHub</a>.\nPatches are most welcome too! :)</p>\n\n<p>BTW, it seems I messed up the regular expression, which seems not to be including the last digit (filed as <a href=\"http://github.com/egonw/CDKitty/issues/#issue/1\">issue 1</a>).</p>\n\n<p>Almost forgot to add that: many thanx to <a href=\"http://blog.openwetware.org/scienceintheopen/\">Cameron</a> for the insightful discussions we had over applecider,\nWeisse and German dinner on Monday evening!</p>",
      "summary": "I was really happy to hear early last week that I was invited to take part in the Google Wave beta, and received my account details this Monday, while attending (and speaking at) the GDCh Wissenschaftsforum Chemie 2009. Yesterday was a travel day, and while working on course material for the Pharmaceutical Bioinformatics course that uses Bioclipse, I set up an Eclipse environment for development of a wave robot. Documentation was very clear, and deployment on Appspot was one click on the appropriate button. Great work from the people from Google! It was all so easy, I could not resist pushing things a bit further, and looked carefully at other robots, like ChemSpidey by Cameron and Igor by Euan, to see how text replacement is done, and wrote my first functional robot, CDKitty (chemdevelkit@appspot.com):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkitty.png",
      "date_published": "2009-09-02T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["cdk","cheminf","google","wave"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mwcz7-gmj05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/29/reminder-my-talk-in-frankfurt-on-monday.html",
      "title": "Reminder: my talk in Frankfurt on Monday; Want to meet up?",
      "content_html": "<p>Quick and short reminder about my <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/04/08/open-knowledge-reproducibility-in.html\">Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source and Open Standards <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ntalk on Monday. The session is great anyway, with other talks from <a href=\"http://blog.openwetware.org/scienceintheopen/\">Cameron</a>, <a href=\"http://chembl.blogspot.com/\">John</a> and\nsomeone from Berlin on an Open Access HTS system (which reminds me to talk about the <em>Open Access</em> and that the term is tainted).</p>\n\n<p>I still have a free program, other than I want to see <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/17/social-web-does-not-wait-for-bioclipse.html\">Google Wave <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin action (and while I have received my invitation, I have not received a login account yet). There is a potentially interesting talk about\n<em>Second Generation Small Molecule Therapeutics</em> by 15:00. But no plans otherwise for the afternoon and/or evening.</p>\n\n<p>If you like to talk about <a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://www.bioclipse.net/\">Bioclipse</a> and/or the <a href=\"http://blueobelisk.sourceforge.net/wiki/Main_Page\">Blue Obelisk movement</a>.\nOr about my talk on Open Data, Open Standards and Open Source (ODOSOS) in chemoinformatics.</p>\n\n<p>If you happen to be around the Frankfurt Westend campus. In building 4, I think, the Hörsaalzentrum, where the conference is. Please\nlet me know if you like to meet up. 
I hope to be online :), but no promise on that… should work at a Uni location, not?\nLet’s see… This is how to ping me, and don’t worry about redundancy.</p>\n\n<p><strong>Email</strong>: egon.willighagen at gmail dot com <br />\n<strong>IRC</strong>: #cdk at irc.freenode.net <br />\n<strong>Twitter</strong>: <a href=\"http://twitter.com/egonwillighagen\">egonwillighagen</a> <br />\n<strong>Identica</strong>: <a href=\"http://identi.ca/chemblaics\">chemblaics</a> <br />\n<strong>Blog</strong>: just leave a reply to this message <br /></p>",
      "summary": "Quick and short reminder about my Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source and Open Standards talk on Monday. The session is great anyway, with other talks from Cameron, John and someone from Berlin on an Open Access HTS system (which reminds me to talk about the Open Access and that the term is tainted).",
      
      "date_published": "2009-08-29T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["openscience","openaccess","blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/x5ez1-x4t24",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/22/plos-one-and-chemical-blogspace-about.html",
      "title": "PLoS ONE and Chemical blogspace: About no Impact yet",
      "content_html": "<p>Journals in chemistry are pretty well fixed. <em>JACS</em>, <em>Angewandte Chemie</em> are clear leaders. <em>Nature</em> and <em>Science</em> if you have something that\nwill attract many scientists. For the rest many smaller journals exist, dedicated to particular research areas.</p>\n\n<p><img src=\"/assets/images/cbJournalRankings.png\" alt=\"\" /></p>\n\n<p><a href=\"http://www.plosone.org/\">PLoS ONE</a> is a new journal that changes the way science is published: it publishes anything that is scientifically\nsound and does not make any judgement on impact and lets the community deal with that. <a href=\"http://blog.openwetware.org/scienceintheopen/\">Cameron Neylon</a>\nrecently had him taped to discuss <a href=\"http://vimeo.com/5696434\">article-level metrics used at PLoS ONE</a> (see also\n<a href=\"http://shirleywho.wordpress.com/2009/08/06/the-evolution-of-scientific-impact/\">this</a>).</p>\n\n<p>And, PONE (as they affectionately call it) seems to be steadily growing to, at least, become a BIG publisher. Clearly, not dedicating yourself to a small discipline helps. And the IT we have had around for the past 10 years makes this large scale publishing possible. The impact of a paper becomes clear through those article level metrics.</p>\n\n<p>Finding interesting papers, however, may be a bit more difficult. There are dedicated RSS feeds listed at the front page:</p>\n\n<p><img src=\"/assets/images/poneFeeds.png\" alt=\"\" /></p>\n\n<p>And I recently subscribed to <a href=\"http://www.plosone.org/article/browse.action?startPage=0&amp;field=&amp;pageSize=10&amp;catName=Chemistry\">the Chemistry feed</a>\n(<a href=\"http://feeds2.feedburner.com/plosone/Chemistry\">RSS</a>).</p>\n\n<p>One of the sources taken into account for the article-level metrics is Postgenomic.com, and you may be aware that\n<a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> is using the same software. 
However, we ~60 active bloggers have not been paying attention\nto this PONE feed. Well, only 84 papers have appeared in this subsection so far:</p>\n\n<p><img src=\"/assets/images/poneChemistryFeed.png\" alt=\"\" /></p>\n\n<p>… but only one has been cited in Chemical blogspace, which is a bit disappointing:</p>\n\n<p><img src=\"/assets/images/poneCbPaper.png\" alt=\"\" /></p>\n\n<p>So, what are your reasons you do not read this journal yet?</p>\n\n<p>I have spotted one paper which I will soon read and review: <em>How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry</em>\n(doi:<a href=\"http://dx.doi.org/10.1371/journal.pone.0005440\">10.1371/journal.pone.0005440</a>).</p>",
      "summary": "Journals in chemistry are pretty well fixed. JACS, Angewandte Chemie are clear leaders. Nature and Science if you have something that will attract many scientists. For the rest many smaller journals exist, dedicated to particular research areas.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cbJournalRankings.png",
      "date_published": "2009-08-22T00:00:00+00:00",
      "date_modified": "2009-08-22T00:00:00+00:00",
      "tags": ["cb","chemistry","publishing","plos"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1371/journal.pone.0005440", "doi": "10.1371/journal.pone.0005440"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qyshc-pn870",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/21/bioclipse-and-sparql-end-points-2.html",
      "title": "Bioclipse and SPARQL end points #2: MyExperiment",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a> and <a href=\"http://en.wikipedia.org/wiki/SPARQL\">SPARQL</a>\nare two really useful Open Standards. <a href=\"http://github.com/egonw/bioclipse.rdf/tree/master\">Bioclipse-RDF</a> is a\nplugin for <a href=\"http://www.bioclipse.net/\">Bioclipse</a> that provides RDF functionality, among which using remote SPARQL end points.</p>\n\n<p>The <a href=\"http://www.myexperiment.org/\">MyExperiment</a> team has set up an excellent <a href=\"http://rdf.myexperiment.org/\">RDF front end</a>.\nFor example, this is <a href=\"http://rdf.myexperiment.org/User/286\">my MyExperiment account in RDF</a>. The storage gets updated\nonce every day (at this moment), but I’m sure that will become more often in the future. The SPARQL end point\nallows us to make any query against the database that <a href=\"http://rdf.myexperiment.org/ontologies/\">their ontologies</a>\nsupport. The above query showed up 132 workflows when I ran it today.</p>\n\n<h2 id=\"gists\">Gists</h2>\n\n<p>Now, so far I have been using <a href=\"http://gist.github.com/\">Gist</a> to share Bioclipse scripts and I wrote\nsome <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/16/bioclipse-and-gist-integration.html\">Bioclipse GUI elements for downloading such gists <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nTo annotate these gists, <a href=\"http://delicious.com/\">Delicious</a> has been used, and a listing of Bioclipse scripts can be found under the\ntags <a href=\"http://delicious.com/tag/bioclipse+gist\">bioclipse and gist</a>.</p>\n\n<p>MyExperiment also allows sharing workflows, but originally only for <a href=\"http://taverna.sf.net/\">Taverna</a>.\nA recent change, however, made it possible to share other <em>types</em> of workflows too. 
And, MyExperiment\nitself also allows all the annotation which we may want to do.</p>\n\n<p>Now, using the Bioclipse-RDF functionality, I can query the MyExperiment database and use that information\nto do stuff. If this stuff is a Bioclipse script, then I can just download it, as the download link of a\nworkflow is part of the RDF too, as we will see.</p>\n\n<h2 id=\"querying-a-sparql-end-point\">Querying a SPARQL end point</h2>\n\n<p>As we have seen in the <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points.html\">first article of this series <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe RDF manager has a method to query a remote SPARQL end point. The complexity is mostly in formulating the SPARQL (and this one\nhappens to be available as <a href=\"http://www.myexperiment.org/workflows/890\">workflow on MyExperiment too</a>):</p>\n\n<p><img src=\"/assets/images/myExp890.png\" alt=\"\" /></p>\n\n<p>This is worsened by the fact that JavaScript does not have multiline strings, so the backslashes at\nthe end of the lines are JavaScript syntax and not part of the SPARQL. 
To simplify the SPARQL, I will show\nbelow the SPARQL only, and not the Bioclipse script wrapping as is done in the above code snippet.</p>\n\n<h2 id=\"list-all-taverna-2-workflows\">List all Taverna 2 workflows</h2>\n\n<p>Listing all Taverna 2 workflows, as shown in that earlier snippet, is done with the SPARQL:</p>\n\n<script src=\"https://gist.github.com/egonw/172138.js\"></script>\n\n<p>This query asks for a <code class=\"language-plaintext highlighter-rouge\">?workflow</code> and its <code class=\"language-plaintext highlighter-rouge\">?title</code>, and the workflow <code class=\"language-plaintext highlighter-rouge\">?type</code> must be of Class <code class=\"language-plaintext highlighter-rouge\">ContentType</code> as defined in the\n<code class=\"language-plaintext highlighter-rouge\">mebase</code> namespace, and we want to know the <code class=\"language-plaintext highlighter-rouge\">?typetitle</code> of that content type, because we are filtering that using a\n<a href=\"http://en.wikipedia.org/wiki/Regular_expression\">regular expression</a> to contain “Taverna 2”. Well, if you cannot\nfollow this, just <a href=\"http://www.bing.com/search?q=sparql+tutorial&amp;go=&amp;form=QBLH&amp;filt=all\">google for SPARQL</a>,\nand run one of those tutorials which are abundantly present on the web.</p>\n\n<h2 id=\"finding-tags-used-to-annotate-workflows\">Finding tags used to annotate workflows</h2>\n\n<p>To list all tags which have likely to do with metabolomics, I can do:</p>\n\n<script src=\"https://gist.github.com/egonw/172277.js\"></script>\n\n<p>And I can also list all workflows that are tagged like this. 
Because I could not get string matching to work, I used the tag’s URI instead:</p>\n\n<script src=\"https://gist.github.com/egonw/172685.js\"></script>\n\n<h2 id=\"all-myexperiments-users-in-sweden\">All MyExperiments Users in Sweden</h2>\n\n<p>I was also interested in all MyExperiment Users in Sweden, and again, a simple SPARQL tells me where they live:</p>\n\n<script src=\"https://gist.github.com/egonw/172129.js\"></script>\n\n<h2 id=\"finding-duncan-and-pierre\">Finding Duncan and Pierre</h2>\n\n<p>Very easy to find users, such as <a href=\"http://duncan.hull.name/\">Duncan</a>:</p>\n\n<script src=\"https://gist.github.com/egonw/172686.js\"></script>\n\n<p>Or <a href=\"http://plindenbaum.blogspot.com/\">Pierre</a>, who has not listed where he lives:</p>\n\n<script src=\"https://gist.github.com/egonw/172687.js\"></script>\n\n<h2 id=\"my-workflows\">My workflows</h2>\n\n<p>Given a user, it is also easy to get the workflows he <em>owns</em>. Again, I am using my URI instead of combining with a search\nfor my account, because the MyExperiment SPARQL end point is not particularly fast:</p>\n\n<script src=\"https://gist.github.com/egonw/172691.js\"></script>\n\n<p>Earlier in this series:</p>\n\n<ol>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points.html\">Bioclipse and SPARQL end points #1: DBPedia <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ol>",
      "summary": "RDF and SPARQL are two really useful Open Standards. Bioclipse-RDF is a plugin for Bioclipse that provides RDF functionality, among which using remote SPARQL end points.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/myExp890.png",
      "date_published": "2009-08-21T00:00:00+00:00",
      "date_modified": "2025-10-26T00:00:00+00:00",
      "tags": ["bioclipse","rdf","foaf","myexperiment","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1hvtk-7ka60",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/17/social-web-does-not-wait-for-bioclipse.html",
      "title": "The Social Web does not wait for Bioclipse... here comes Google Wave",
      "content_html": "<p><a href=\"https://wave.google.com/\">Google Wave</a> is going to change the web. It’s the end of Google Docs, and likely many other services. It’s\ngoing to be Open Source and being a Wave Provider will not be restricted to Google. This will be enough to make this a success. If\nyou haven’t watched the full video demo yet, please have a look yourself:</p>\n\n<object width=\"560\" height=\"340\"><param name=\"movie\" value=\"https://www.youtube.com/v/v_UyVmITiYQ&amp;hl=en&amp;fs=1&amp;\" />\n  <param name=\"allowFullScreen\" value=\"true\" /><param name=\"allowscriptaccess\" value=\"always\" />\n  <embed src=\"https://www.youtube.com/v/v_UyVmITiYQ&amp;hl=en&amp;fs=1&amp;\" type=\"application/x-shockwave-flash\" allowscriptaccess=\"always\" allowfullscreen=\"true\" width=\"560\" height=\"340\" wmode=\"opaque\" />\n</object>\n\n<p>I left some thoughts and notes on FriendFeed, but they got lost.</p>",
      "summary": "Google Wave is going to change the web. It’s the end of Google Docs, and likely many other services. It’s going to be Open Source and being a Wave Provider will not be restricted to Google. This will be enough to make this a success. If you haven’t watched the full video demo yet, please have a look yourself:",
      
      "date_published": "2009-08-17T00:00:00+00:00",
      "date_modified": "2026-02-21T00:00:00+00:00",
      "tags": ["google","wave"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9zgvr-hfk92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/17/bioclipse-enters-social-web.html",
      "title": "Bioclipse enters the social web",
      "content_html": "<p>The <a href=\"https://en.wikipedia.org/wiki/Open_Notebook_Science\">Open Notebook Science</a> <a href=\"http://spreadsheets.google.com/ccc?key=plwwufp30hfq0udnEmRD1aQ\">Solubility project</a>\nin particular is keen on sharing results using the <a href=\"https://en.wikipedia.org/wiki/Social_web\">Social Web</a>. Last week I reported about the plugin I wrote to access\nthe data on <a href=\"https://www.friendfeed.com/\">FriendFeed</a>.</p>\n\n<p>When someone asked last week on the <a href=\"https://taverna.sf.net/\">Taverna</a> mailing list about a Twitter <a href=\"http://twitter.com/\">node</a>,\nI was surely interested. Though this can hardly be called <em>core research</em>, I, <em>fortunately</em>, had to test\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/08/13/making-bioclipse-development-easier-new.html\">the new Bioclipse SDK <i class=\"fa-solid fa-recycle fa-xs\"></i></a> :)</p>\n\n<p>So, I hacked up a Twitter plugin for <a href=\"https://www.bioclipse.net/\">Bioclipse</a> in no time using <a href=\"https://www.winterwell.com/software/jtwitter.php\">JTwitter</a>\n(license:LGPL), to allow sending tweets to <a href=\"https://twitter.com/egonwillighagen\">my Twitter account <i class=\"fa-solid fa-link-slash fa-xs\"></i></a> (but not yet my\n<a href=\"https://identi.ca/chemblaics\">Identi.ca account</a>):</p>\n\n<p><img src=\"/assets/images/bioclipseTweet1.png\" alt=\"\" /></p>\n\n<p>Or, as copy/pastable script:</p>\n\n<script src=\"https://gist.github.com/169156.js\"></script>\n\n<p>And you can see it really hit Twitter <a href=\"http://search.twitter.com/search?q=bioclipse+social+feature\">here</a> and in this screenshot of my\n<a href=\"http://choqok.gnufolks.org/\">Choqok</a> client:</p>\n\n<p><img src=\"/assets/images/bioclipseTweet.png\" alt=\"\" /></p>",
      "summary": "The Open Notebook Science Solubility project in particular is keen on sharing results using the Social Web. Last week I reported about the plugin I wrote to access the data on FriendFeed.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseTweet.png",
      "date_published": "2009-08-17T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["bioclipse","twitter","friendfeed"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qtb5r-fxm04",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points.html",
      "title": "Bioclipse and SPARQL end points",
      "content_html": "<p>Last week, there was a very interesting thread on the <a href=\"http://dbpedia.org/\">DBPedia</a> mailing list, on using Java for doing remote\n<a href=\"http://en.wikipedia.org/wiki/SPARQL\">SPARQL</a> queries. This was one of the features still missing in <a href=\"http://github.com/egonw/bioclipse.rdf/tree/master\">bioclipse.rdf</a>.\n<a href=\"http://dowhatimean.net/\">Richard Cyganiak</a> replied pointing to the code in Jena, which conveniently does this and which bioclipse.rdf is already using anyway. Next,\n<a href=\"http://iwis.cs.aau.dk/blog/4\">Fred Durao</a> even gave a full code example, relieving me from any further research, resulting in\n<code class=\"language-plaintext highlighter-rouge\">sparqlRemote()</code> now implemented in the <code class=\"language-plaintext highlighter-rouge\">rdf</code> manager:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>&gt; rdf.sparqlRemote(\n  \"http://dbpedia.org/sparql\",\n  \"select distinct ?Concept where{[] a ?Concept } LIMIT 10\"\n);\n[[http://dbpedia.org/ontology/Place], [http://dbpedia.org/ontology/Area],\n[http://dbpedia.org/ontology/City], [http://dbpedia.org/ontology/River],\n[http://dbpedia.org/ontology/Road], [http://dbpedia.org/ontology/Lake],\n[http://dbpedia.org/ontology/LunarCrater],\n[http://dbpedia.org/ontology/ShoppingMall], [http://dbpedia.org/ontology/Park],\n[http://dbpedia.org/ontology/SiteOfSpecialScientificInterest]]\n</code></pre></div></div>\n\n<p>I reported earlier <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/dbpedia-lookup-and-autocomplete-of.html\">two example SPARQL queries for chemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nwhich can now be rewritten as Bioclipse scripts:</p>\n\n<script src=\"https://gist.github.com/168582.js\"></script>\n\n<p>and</p>\n\n<script src=\"https://gist.github.com/168583.js\"></script>",
      "summary": "Last week, there was a very interesting thread on the DBPedia mailing list, on using Java for doing remote SPARQL queries. This was one of the features still missing in bioclipse.rdf. Richard Cyganiak replied pointing to the code in Jena, which conveniently does this and which bioclipse.rdf is already using anyway. Next, Fred Durao even gave a full code example, relieving me from any further research, resulting in sparqlRemote() now implemented in the rdf manager:",
      
      "date_published": "2009-08-16T00:00:00+00:00",
      "date_modified": "2025-11-29T00:00:00+00:00",
      "tags": ["bioclipse","sparql","rdf","dbpedia"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g2qh1-n7q51",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/13/making-bioclipse-development-easier-new.html",
      "title": "Making Bioclipse Development easier: the New Manager Wizard",
      "content_html": "<p>Today, <a href=\"http://jonalv.blogspot.com/\">Jonathan</a>, Carl, Arvid and I made writing managers for <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\na bit easier. Plug-in development in Eclipse is already tricky to learn in itself, and the use of Spring by the Bioclipse managers\nis not helping. And because two new people will start writing a new manager very soon, we thought it was\ntime to lower the activation barrier a bit.</p>\n\n<p>The basic file structure of a Bioclipse manager looks like:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>net.bioclipse.foo/\n|--META-INF\n|   |--MANIFEST.MF\n|   `-- spring\n|       `-- context.xml\n|-- plugin.xml\n|-- .classpath\n|-- .project\n|-- build.properties\n`-- src\n    `-- net\n        `-- bioclipse\n            `-- foo\n                |-- Activator.java\n                `-- business\n                    |-- FooManager.java\n                    |-- FooManagerFactory.java\n                    |-- IFooManager.java\n                    |-- IJavaFooManager.java\n                    `-- IJavaScriptFooManager.java\n</code></pre></div></div>\n\n<p>That is <strong><em>twelve</em></strong> files which need to be just right. I used to copy/paste from an earlier (simple) manager.</p>\n\n<p>But we know and understand that setting up this framework is even more challenging if you have not done this\nat least 10 times before. 
So, today we implemented a <em>New Wizard</em> (source available from this Git repository:\n<a href=\"http://github.com/egonw/bioclipse.sdk/tree/master\">bioclipse.sdk</a>).</p>\n\n<p><img src=\"/assets/images/bioclipseSDK1.png\" alt=\"\" /></p>\n\n<p>It just asks you for a project name:</p>\n\n<p><img src=\"/assets/images/bioclipseSDK2.png\" alt=\"\" /></p>\n\n<p>and a few other settings:</p>\n\n<p><img src=\"/assets/images/bioclipseSDK3.png\" alt=\"\" /></p>\n\n<h2 id=\"installing-the-bioclipse-sdk\">Installing the Bioclipse SDK</h2>\n\n<p>Installing this new plugin is fairly easy, and we have set up an <em>Update Site</em> at\n<a href=\"http://pele.farmbio.uu.se/sdk/\">http://pele.farmbio.uu.se/sdk/</a>. Just add this as an Update Site in Eclipse 3.4.x (which\nis still required for Bioclipse2). It depends on the <a href=\"http://www.eclipse.org/jdt/\">JDT</a> and\n<a href=\"http://www.eclipse.org/pde/\">PDE</a>, which you will likely already have installed, being part of the default\nEclipse RCP release.</p>\n\n<p>Go to the <em>Software Updates</em> in the <em>Help</em> menu:</p>\n\n<p><img src=\"/assets/images/bioclipseSDK4.png\" alt=\"\" /></p>\n\n<p>and pick <em>Add Site…</em>. Enter the aforementioned update site as shown here:</p>\n\n<p><img src=\"/assets/images/bioclipseSDK5.png\" alt=\"\" /></p>\n\n<p>Then, select the Bioclipse plugin:</p>\n\n<p><img src=\"/assets/images/bioclipseSDK6.png\" alt=\"\" /></p>\n\n<p>After you hit <em>Install</em> and Eclipse installs the few tens of kBs of the plugin, the plugin should show up in\nyour installation, like it did in mine:</p>\n\n<p><img src=\"/assets/images/bioclipseSDK.png\" alt=\"\" /></p>\n\n<h2 id=\"implementation-details\">Implementation Details</h2>\n\n<p>Writing the plugin was a challenge to me, and I am happy we were doing this in a hackathon. The Bioclipse-QSAR\nproject already had a New Project wizard, but not for a new Plug-in Project. Some things are just slightly\ndifferent then. 
For example, it turned out that creating a <code class=\"language-plaintext highlighter-rouge\">.classpath</code> cannot be done in the regular way\n(it never showed up), and I had to dig up some internal code of the PDE. Actually, our current implementation\nis still using a few internal classes because of this:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">IClasspathEntry</span><span class=\"o\">[]</span> <span class=\"n\">entries</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">IClasspathEntry</span><span class=\"o\">[</span><span class=\"mi\">3</span><span class=\"o\">];</span>\n<span class=\"nc\">String</span> <span class=\"n\">executionEnvironment</span> <span class=\"o\">=</span> <span class=\"kc\">null</span><span class=\"o\">;</span>\n<span class=\"nc\">ClasspathComputer</span><span class=\"o\">.</span><span class=\"na\">setComplianceOptions</span><span class=\"o\">(</span>\n  <span class=\"n\">project</span><span class=\"o\">,</span>\n  <span class=\"nc\">ExecutionEnvironmentAnalyzer</span><span class=\"o\">.</span><span class=\"na\">getCompliance</span><span class=\"o\">(</span><span class=\"n\">executionEnvironment</span><span class=\"o\">)</span>\n<span class=\"o\">);</span>\n<span class=\"n\">entries</span><span class=\"o\">[</span><span class=\"mi\">0</span><span class=\"o\">]</span> <span class=\"o\">=</span> <span class=\"nc\">ClasspathComputer</span><span class=\"o\">.</span><span class=\"na\">createJREEntry</span><span class=\"o\">(</span><span class=\"n\">executionEnvironment</span><span class=\"o\">);</span>\n<span class=\"n\">entries</span><span class=\"o\">[</span><span class=\"mi\">1</span><span class=\"o\">]</span> <span class=\"o\">=</span> <span class=\"nc\">ClasspathComputer</span><span class=\"o\">.</span><span class=\"na\">createContainerEntry</span><span class=\"o\">();</span>\n<span class=\"nc\">IPath</span> <span 
class=\"n\">path</span> <span class=\"o\">=</span> <span class=\"n\">project</span><span class=\"o\">.</span><span class=\"na\">getProject</span><span class=\"o\">().</span><span class=\"na\">getFullPath</span><span class=\"o\">().</span><span class=\"na\">append</span><span class=\"o\">(</span><span class=\"s\">\"src/\"</span><span class=\"o\">);</span>\n<span class=\"n\">entries</span><span class=\"o\">[</span><span class=\"mi\">2</span><span class=\"o\">]</span> <span class=\"o\">=</span> <span class=\"nc\">JavaCore</span><span class=\"o\">.</span><span class=\"na\">newSourceEntry</span><span class=\"o\">(</span><span class=\"n\">path</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Ideas are most welcome on how to clean up this code, and not make it use internal, non-exported classes.\nFor the Java source files and even the <code class=\"language-plaintext highlighter-rouge\">MANIFEST.MF</code> we are using templates, though I have seen this file being\ncreated programmatically too.</p>\n\n<p>I’m sure we’ll run into some needed plumbing here and there, but that’s what update sites are for, right?\n<em>Release soon, release often</em> is an Open Source concept that works well in the Eclipse world.</p>",
      "summary": "Today, Jonathan, Carl, Arvid and I made writing managers for Bioclipse a bit easier. Plug-in development in Eclipse is already tricky to learn in itself, and the use of Spring by the Bioclipse managers is not helping. And because two new people will start writing a new manager very soon, we thought it was time to lower the activation barrier a bit.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseSDK1.png",
      "date_published": "2009-08-13T00:10:00+00:00",
      "date_modified": "2009-08-13T00:10:00+00:00",
      "tags": ["eclipse","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y1ka2-ad976",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/13/last-call-xep-0244-io-data.html",
      "title": "&quot;LAST CALL: XEP-0244 (IO Data)&quot;",
      "content_html": "<p>Today I received this email, which is a milestone for the <a href=\"http://en.wikipedia.org/wiki/Extensible_Messaging_and_Presence_Protocol\">XMPP</a>\n(aka Jabber) work Johannes, Ola and I have been working on as a SOAP alternative using the intrinsically asynchronous XMPP as transport\nprotocol instead of HTTP as SOAP commonly does (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/31/next-generation-asynchronous.html\">Next generation asynchronous webservices <i class=\"fa-solid fa-recycle fa-xs\"></i></a>):</p>\n\n<blockquote>\n  <p>This message constitutes notice of a Last Call for comments on XEP-0244 (IO Data).</p>\n\n  <p>Abstract: This specification defines an XMPP protocol extension for handling the input to and output from a remote entity.</p>\n\n  <p>URL: <a href=\"https://www.xmpp.org/extensions/xep-0244.html\">http://www.xmpp.org/extensions/xep-0244.html</a></p>\n\n  <p>This Last Call begins today and shall end at the close of business on 2009-09-01.</p>\n\n  <p>Please consider the following questions during this Last Call and send your feedback to the standards @ xmpp.org discussion list:</p>\n\n  <ol>\n    <li>Is this specification needed to fill gaps in the XMPP protocol stack or to clarify an existing protocol?</li>\n    <li>Does the specification solve the problem stated in the introduction and requirements?</li>\n    <li>Do you plan to implement this specification in your code? If not, why not?</li>\n    <li>Do you have any security concerns related to this specification?</li>\n    <li>Is the specification accurate and clearly written?</li>\n  </ol>\n\n  <p>Your feedback is appreciated!</p>\n</blockquote>\n\n<p>There remains quite a lot to do, and you are more than welcome to join in the project. 
There is a Java library and we’ve\nintegrated the specs into <a href=\"http://www.bioclipse.net/\">Bioclipse</a> and <a href=\"http://taverna.sf.net/\">Taverna</a> (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/21/details-behind-calling-xmpp-cloud.html\">Details behind the “Calling XMPP cloud services from Taverna2” <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nbut there is no support for BioCatalogue yet, and no libraries for other programming languages yet.</p>",
      "summary": "Today I received this email, which is a milestone for the XMPP (aka Jabber) work Johannes, Ola and I have been working on as a SOAP alternative using the intrinsically asynchronous XMPP as transport protocol instead of HTTP as SOAP commonly does (see Next generation asynchronous webservices):",
      
      "date_published": "2009-08-13T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["xmpp","bioinfo","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wcms5-hqy57",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/07/searching-pubchem-from-within-bioclipse.html",
      "title": "Searching PubChem from within Bioclipse",
      "content_html": "<p>For the application note which we are about to submit, I was working on improving the <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> API a bit, resulting in new <code class=\"language-plaintext highlighter-rouge\">download</code> methods:</p>\n\n<script src=\"https://gist.github.com/163281.js\"></script>\n\n<p>The search allows using <a href=\"http://pubchem.ncbi.nlm.nih.gov/help.html#PubChem_index\">PubChem Filters</a>, which provide\nmany simple means to restrict the search results. For example, we can search molecules and restrict on the molecular\nweight:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">lists</span> <span class=\"o\">=</span> <span class=\"nx\">pubchem</span><span class=\"p\">.</span><span class=\"nf\">download</span><span class=\"p\">(</span><span class=\"nx\">pubchem</span><span class=\"p\">.</span><span class=\"nf\">search</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">malaria 300:500[MW]</span><span class=\"dl\">\"</span><span class=\"p\">))</span>\n</code></pre></div></div>\n\n<p>Other filters you can use in pubchem.search (provided by PubChem itself) include (with examples):</p>\n\n<ul>\n  <li><strong>[el]</strong>: <code class=\"language-plaintext highlighter-rouge\">pubchem.search(\"Au[el]\")</code></li>\n  <li><strong>[inchi]</strong>: <code class=\"language-plaintext highlighter-rouge\">pubchem.search(\"\\\"InChI=1S/CH4/h1H4\\\"[inchi]\")</code></li>\n  <li><strong>[inchikey]</strong>: <code class=\"language-plaintext highlighter-rouge\">pubchem.search(\"VNWKTOKETHGBQD-UHFFFAOYSA-N[inchikey]\")</code></li>\n  <li><strong>[mimass]</strong>: <code class=\"language-plaintext highlighter-rouge\">pubchem.search(\"375.9785:375.9786[mimass]\")</code></li>\n</ul>\n\n<p>And many, many more… see the linked Filters page.</p>\n\n<p>Now, you 
surely want to look at the hits, for which we use the molecular table editor:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">list</span> <span class=\"o\">=</span> <span class=\"nx\">pubchem</span><span class=\"p\">.</span><span class=\"nf\">download</span><span class=\"p\">(</span><span class=\"nx\">pubchem</span><span class=\"p\">.</span><span class=\"nf\">search</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">375.9785:375.9786[mimass]</span><span class=\"dl\">\"</span><span class=\"p\">))</span>\n<span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">saveSDFile</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">/Virtual/hits.sdf</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"nx\">list</span><span class=\"p\">)</span>\n<span class=\"nx\">ui</span><span class=\"p\">.</span><span class=\"nf\">open</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">/Virtual/hits.sdf</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n</code></pre></div></div>\n\n<p>Resulting in:</p>\n\n<p><img src=\"/assets/images/pubchemSearchResults.png\" alt=\"\" /></p>",
      "summary": "For the application note which we are about to submit, I was working on improving the PubChem Bioclipse API a bit, resulting in new download methods:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pubchemSearchResults.png",
      "date_published": "2009-08-07T00:00:00+00:00",
      "date_modified": "2009-08-07T00:00:00+00:00",
      "tags": ["bioclipse","pubchem"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hf28w-ngd39",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/06/chemspider-fail-1-smiles.html",
      "title": "ChemSpider fail #1: SMILES",
      "content_html": "<p>Cheminformatics is difficult, I know. But I thought I used a simple SMILES when I typed <em>C1CNCC1</em>, but <a href=\"http://www.chemspider.com/\">ChemSpider</a>\ngot it wrong :) The correct structure should be <a href=\"http://en.wikipedia.org/wiki/Pyrrolidine\">pyrrolidine</a>, not\n<a href=\"http://en.wikipedia.org/wiki/Pyrrole\">pyrrole</a>. I always mix up those names, so defaulted to ChemSpider to give me the correct name, which\n<a href=\"http://www.chemspider.com/RecordView.aspx?rid=3fdc7226-0f87-44c5-b367-f3fdbda4bbde\">ChemSpider knows</a> and where it also has the\nSMILES correct… there just seems to be something wrong with their search dialog:</p>\n\n<p><img src=\"/assets/images/chemSpiderFail.png\" alt=\"\" /></p>\n\n<p>There has been <a href=\"http://www.chemspider.com/blog/?p=55\">some talk about a ChemSpider Bugzilla</a>, but I don’t think this has\nmaterialized yet, and I’ll have to default to <em>info-at-chemspider-dot-com</em> …</p>",
      "summary": "Cheminformatics is difficult, I know. But I thought I used a simple SMILES when I typed C1CNCC1, but ChemSpider got it wrong :) The correct structure should be pyrrolidine, not pyrrole. I always mix up those names, so defaulted to ChemSpider to give me the correct name, which ChemSpider knows and where it also has the SMILES correct… there just seems to be something wrong with their search dialog:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemSpiderFail.png",
      "date_published": "2009-08-06T00:00:00+00:00",
      "date_modified": "2009-08-06T00:00:00+00:00",
      "tags": ["chemspider","smiles"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b3t7r-0ge09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/05/running-bioclipse-plugin-unit-tests.html",
      "title": "Running Bioclipse Plugin Unit tests: solving the XPCOM error",
      "content_html": "<p>Sometimes you can feel so stupid. For example, when the answer is right in front of you, but only after many hours you realize the right\nquestion belonging to that answer. For example, take <a href=\"http://ubuntuforums.org/showthread.php?t=920649&amp;page=2\">this answer</a>:</p>\n\n<blockquote>\n  <p>add the line: -Dorg.eclipse.swt.browser.XULRunnerPath=/usr/lib/xulrunner</p>\n</blockquote>\n\n<p>This is the problem I was trying to solve: I’m running 64bit Ubuntu Jaunty with Eclipse 3.4.2 for <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\ndevelopment. The answer above is the correct answer. So, I added the line, both to the <code class=\"language-plaintext highlighter-rouge\">$HOME/eclipse.ini</code> and to the eclipse command line to\nstart the program. But I still could not run Bioclipse plugin unit tests; I kept getting that stupid error:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>org.eclipse.swt.SWTError: XPCOM error -2147467262\nat org.eclipse.swt.browser.Mozilla.error(Mozilla.java:1638)\nat org.eclipse.swt.browser.Mozilla.setText(Mozilla.java:1861)\n</code></pre></div></div>\n\n<p>In retrospect, I was sort of asking the wrong question. I should have asked myself not why I got that XPCOM error even though I was using\nthe solution, but why running the unit tests was not affected by that solution. Realizing that, it became so obvious: the plugin unit\ntesting was using a clean environment, not based on the Eclipse environment I was working in; therefore, adding that line to my Eclipse\nenvironment did not help. Instead, I only had to add that line to the Run Configuration of my plugin unit tests too:</p>\n\n<p><img src=\"/assets/images/eclipseSolution.png\" alt=\"\" /></p>\n\n<p>Surely, there are aspects to this which helped me overlook this solution. For example, I had installed Eclipse freshly yesterday, and then\nit worked fine. 
Only after installing some EMF and GEF features, it stopped working again. Bitten by the correlation/causation\npattern :(</p>",
      "summary": "Sometimes you can feel so stupid. For example, when the answer is right in front of you, but only after many hours you realize the right question belonging to that answer. For example, take this answer:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/eclipseSolution.png",
      "date_published": "2009-08-05T00:20:00+00:00",
      "date_modified": "2009-08-05T00:20:00+00:00",
      "tags": ["bioclipse","ubuntu","eclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9b1ks-02t50",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/05/dear-advisory-board-which-qsar.html",
      "title": "Dear Advisory Board: which QSAR descriptors would you like to see implemented in the CDK?",
      "content_html": "<blockquote>\n  <p>Dear Advisory Board,</p>\n\n  <p>Ola Spjuth has recently been working on an extensive QSAR environment in <a href=\"http://www.bioclipse.net/\">Bioclipse</a>, and molecular descriptors are\nprovided using remote services but also using the <a href=\"http://cdk.sf.net/\">CDK</a>. The CDK has a relatively large collection of QSAR descriptors,\nbut certainly not the full list discussed in the <a href=\"http://www.moleculardescriptors.eu/books/handbook.htm\">Handbook of Molecular Descriptors</a>.</p>\n\n  <p>I’m sure everyone would appreciate a few more descriptors, and I am wondering which ones you would assign priority to. So:</p>\n\n  <p>which QSAR descriptors would you like to see implemented in the CDK?</p>\n\n  <p>Looking forward to hearing from you, preferably as a comment in this blog, or via email to the <a href=\"http://sourceforge.net/mailarchive/forum.php?forum_name=cdk-user\">cdk-user</a>\nmailing list or directly to me otherwise. Make sure to include a full reference to the paper that describes the algorithm.</p>\n\n  <p>Kind regards,</p>\n\n  <p>Egon</p>\n</blockquote>",
      "summary": "Dear Advisory Board, Ola Spjuth has recently been working on an extensive QSAR environment in Bioclipse, and molecular descriptors are provided using remote services but also using the CDK. The CDK has a relatively large collection of QSAR descriptors, but certainly not the full list discussed in the Handbook of Molecular Descriptors. I’m sure everyone would appreciate a few more descriptors, and I am wondering which ones you would assign priority to. So: which QSAR descriptors would you like to see implemented in the CDK? Looking forward to hearing from you, preferably as a comment in this blog, or via email to the cdk-user mailing list or directly to me otherwise. Make sure to include a full reference to the paper that describes the algorithm. Kind regards, Egon",
      
      "date_published": "2009-08-05T00:10:00+00:00",
      "date_modified": "2009-08-05T00:10:00+00:00",
      "tags": ["cdk","qsar","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wkk5p-56y41",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/05/jchempaint-primary-being-picked-up.html",
      "title": "JChemPaint-Primary being picked up...",
      "content_html": "<p>Backporting the <a href=\"http://chem-bla-ics.blogspot.com/2009/07/maintaining-jchempaint-primary-patch.html\">JChemPaint-Primary patch for master</a> to the cdk-1.2.x\nbranch turned out to be fairly easy, but is a major step forward as we now have a patch to extend CDK 1.2.x with rendering support again, a major thing we\nlost when going from the CDK 1.0 to the 1.2 series.</p>\n\n<p>For example, <a href=\"http://www.knime.org/\">KNIME</a> delayed moving to CDK 1.2 because of the lack of the renderer. Another project that really wanted to have the\nrenderer was Ruby CDK (<a href=\"http://github.com/sklemm/rcdk-ng/tree/master\">rcdk-ng</a>, but not the same as the <a href=\"http://cran.r-project.org/web/packages/rcdk/index.html\">R rcdk package</a>\n:), originally <a href=\"http://depth-first.com/articles/2006/10/30/agile-chemical-informatics-development-with-cdk-and-ruby-rcdk-0-3-0\">started by Rich</a>,\nnow maintained by Sebastian Klemm at the <a href=\"http://www.fhnw.ch/lifesciences/ipt\">Institute for Pharmatechnology of the University of Applied Science</a>,\nSwitzerland.</p>\n\n<p>Ruby CDK is a web environment for molecular structures, and based on the new rendering code, it looks like (copyright by Sebastian):</p>\n\n<p><img src=\"/assets/images/compound.png\" alt=\"\" /></p>",
      "summary": "Backporting the JChemPaint-Primary patch for master to the cdk-1.2.x branch turned out to be fairly easy, but is a major step forward as we now have a patch to extend CDK 1.2.x with rendering support again, a major thing we lost when going from the CDK 1.0 to the 1.2 series.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/compound.png",
      "date_published": "2009-08-05T00:00:00+00:00",
      "date_modified": "2009-08-05T00:00:00+00:00",
      "tags": ["jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fht9j-cv384",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/31/maintaining-jchempaint-primary-patch.html",
      "title": "Maintaining the JChemPaint-Primary patch",
      "content_html": "<p>Not so long ago, I finished porting the JChemPaint-Primary <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/branches/jchempaint-primary/\">branch</a>\nto be a <a href=\"http://pele.farmbio.uu.se/cgi-bin/gitweb.cgi?p=jchempaint-primary.git;a=shortlog;h=refs/heads/bioclipse-2.1.x\">patch</a> on top of\n<a href=\"http://cdk.sf.net/\">CDK</a> <a href=\"http://github.com/egonw/cdk/tree/master\">master from our git repository</a>. This means frequent rebasing, to incorporate\nthe latest changes in the CDK <em>master</em> branch. Today, I did such a rebase, after the <a href=\"https://sourceforge.net/projects/cdk/files/cdk%20(development)/1.3.0/\">CDK 1.3.0 release</a>.\nHoping that at least some find this informative, this is what I did. Remember, that the <em>patch</em> is organized around the <em>render</em>\nand <em>control</em> modules, which is why we have so many branches, while merely in linear relationship.</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"c\"># to download all new patches from origin to the master branch:</span>\ngit pull origin master\n<span class=\"c\"># then rebase all patches in the desired order (which makes absolutely no numerical sense)</span>\n<span class=\"c\"># 0, 1, 2, 9, 6, 7, 3, 8, 4, 5, 11, 10</span>\ngit checkout 0-other\ngit rebase master\ngit checkout 1-render\ngit rebase 0-other\ngit checkout 2-renderbasic\ngit rebase 1-render\ngit checkout 9-rendercontrol\ngit rebase 2-renderbasic\ngit checkout 6-control\ngit rebase 9-rendercontrol\ngit checkout 7-controlbasic\ngit rebase 6-control\ngit checkout 3-renderextra\ngit rebase 7-controlbasic\ngit checkout 8-controlextra\ngit rebase 3-renderextra\ngit checkout 4-renderawt\ngit rebase 8-controlextra\ngit checkout 5-rendersvg\ngit rebase 4-renderawt\ngit checkout 11-controlawt\ngit rebase 5-rendersvg\ngit checkout 10-unsorted\ngit rebase 
11-controlawt\n</code></pre></div></div>\n\n<p>This gives me, again, a clean patch against the latest CDK master:</p>\n\n<p><img src=\"/assets/images/jcpPatch.png\" alt=\"\" /></p>",
      "summary": "Not so long ago, I finished porting the JChemPaint-Primary branch to be a patch on top of CDK master from our git repository. This means frequent rebasing, to incorporate the latest changes in the CDK master branch. Today, I did such a rebase, after the CDK 1.3.0 release. Hoping that at least some find this informative, this is what I did. Remember, that the patch is organized around the render and control modules, which is why we have so many branches, while merely in linear relationship.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpPatch.png",
      "date_published": "2009-07-31T00:20:00+00:00",
      "date_modified": "2009-07-31T00:20:00+00:00",
      "tags": ["cdk","jchempaint","git","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/h3v19-vh903",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/31/things-to-check-before-you-consider.html",
      "title": "Things to check before you consider submitting a (final) CDK patch #1",
      "content_html": "<p>Mark the <em>final</em> in the above title; if you merely seek advice on your patch, feel free to send them in whatever state. However, if you bring up your patch for peer review, make sure to have gone through the following steps, in random order:</p>\n\n<ul>\n  <li>be prepared for peer review feedback</li>\n  <li>realize your code will have to be <a href=\"http://www.gnu.org/copyleft/lesser.html\">LGPL</a> or LGPL-compatible</li>\n  <li>make sure the copyright lines are properly updated (see <a href=\"http://chem-bla-ics.blogspot.com/2009/06/making-patches-attribution-copyright.html\">Making patches; Attribution; Copyright and License.</a>)</li>\n  <li>your code is fully unit tested</li>\n  <li>your code does not cause <a href=\"http://pmd.sourceforge.net/\">PMD</a> failures</li>\n  <li>your code is fully JavaDoc-umented\n    <ul>\n      <li>no empty templates</li>\n      <li>JavaDoc for every class field, method and class</li>\n      <li>use of {@link}</li>\n      <li>use of CDK tags @cdk.bug, @cdk.cite, etc</li>\n      <li>period at the end of the first sentence</li>\n      <li>…</li>\n    </ul>\n  </li>\n  <li>make sure all the code still compiles</li>\n  <li>make your code readable\n    <ul>\n      <li>80 characters per line</li>\n      <li>variable names that reflect their purpose and nature</li>\n      <li>no code complexity errors with PMD</li>\n      <li>camelCasing as custom in Java</li>\n      <li>comment your code where appropriate, explaining what your code is supposed to do</li>\n      <li>…</li>\n    </ul>\n  </li>\n</ul>\n\n<p>These are reasons to reject your patch, so better make sure to not have to be reminded of that. The build environment comes with some code to make these\nchecks easier (though not the fixing). 
For example, say I introduced a new module <code class=\"language-plaintext highlighter-rouge\">uff</code> (for the UFF force field):</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">cd </span>cdk/\n<span class=\"nv\">$ </span>ant clean dist-all test-dist-all\n<span class=\"nv\">$ </span>ant <span class=\"nt\">-Dmodule</span><span class=\"o\">=</span>uff test-module\n<span class=\"nv\">$ </span>ant <span class=\"nt\">-f</span> javadoc.xml <span class=\"nt\">-Dmodule</span><span class=\"o\">=</span>uff doccheck-module\n<span class=\"nv\">$ </span>ant <span class=\"nt\">-f</span> pmd.xml <span class=\"nt\">-Dpmd</span>.test<span class=\"o\">=</span>custom <span class=\"nt\">-Dmodule</span><span class=\"o\">=</span>uff test-module\n<span class=\"nv\">$ </span>ant <span class=\"nt\">-f</span> pmd.xml <span class=\"nt\">-Dpmd</span>.test<span class=\"o\">=</span>custom <span class=\"nt\">-Dmodule</span><span class=\"o\">=</span>test-uff test-module\n</code></pre></div></div>",
      "summary": "Mark the final in the above title; if you merely seek advice on your patch, feel free to send them in whatever state. However, if you bring up your patch for peer review, make sure to have gone through the following steps, in random order:",
      
      "date_published": "2009-07-31T00:10:00+00:00",
      "date_modified": "2009-07-31T00:10:00+00:00",
      "tags": ["cdk","rse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7sbjg-3e122",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/31/new-blogs-11.html",
      "title": "New Blogs #11",
      "content_html": "<p>Not that the last two weeks has seen a boost on blog submissions to <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a>;\njust that I was not really finished with <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/07/23/new-blogs-10.html\">New Blogs #10 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<ul>\n  <li><a href=\"http://therealmoforganicsynthesis.blogspot.com/\">The Realm of Organic Synthesis</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=201\">entry in Cb</a>)</li>\n  <li><a href=\"http://carbontube.blogspot.com/\">Carbon Chemistry</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=20\">entry in Cb</a>)</li>\n  <li><a href=\"http://ochemonline.wordpress.com/\">OChemOnline</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=203\">entry in Cb</a>)</li>\n  <li><a href=\"http://polymerprocessing.blogspot.com/\">polymer processing</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=204\">entry in Cb</a>)</li>\n  <li><a href=\"http://metallome.blogspot.com/\">Metallome</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=205\">entry in Cb</a>)</li>\n  <li><a href=\"http://notimetolouse.blogspot.com/\">BKBlog</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=206\">entry in Cb</a>)</li>\n  <li><a href=\"http://sea36.blogspot.com/\">Sam’s notes</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=207\">entry in Cb</a>)</li>\n  <li><a href=\"http://zusammen.metamolecular.com/posts\">Zusammen</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=208\">entry in Cb</a>)</li>\n  <li><a href=\"http://kashthealien.wordpress.com/\">Kashthealien’s Blog</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=209\">entry in Cb</a>)</li>\n  <li><a href=\"http://chemjobber.blogspot.com/\">Chemjobber</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=210\">entry in 
Cb</a>)</li>\n  <li><a href=\"http://www.ch.ic.ac.uk/rzepa/blog\">Henry Rzepa</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=211\">entry in Cb</a>)</li>\n  <li><a href=\"http://www.chemcafe.net/\">ChemCafé</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=213\">entry in Cb</a>)</li>\n  <li><a href=\"http://phoscarb.blogspot.com/\">Phosphorus carbide</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=214\">entry in Cb</a>)</li>\n  <li><a href=\"http://molecularmodelingbasics.blogspot.com/\">Molecular Modeling Basics</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=215\">entry in Cb</a>)</li>\n  <li><a href=\"http://cactus.nci.nih.gov/blog\">/chemical/structure Blog</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=216\">entry in Cb</a>)</li>\n  <li><a href=\"http://cheminfonews.blogspot.com/\">Indiana University Cheminformatics News</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=217\">entry in Cb</a>)</li>\n  <li><a href=\"http://blog.accelrys.com/\">Accelrys | Blog</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=218\">entry in Cb</a>)</li>\n</ul>\n\n<p>Happy reading!</p>",
      "summary": "Not that the last two weeks has seen a boost on blog submissions to Chemical blogspace; just that I was not really finished with New Blogs #10 .",
      
      "date_published": "2009-07-31T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["blog","cb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/p4pxq-3t626",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/23/new-blogs-10.html",
      "title": "New Blogs #10",
      "content_html": "<p>Many new blogs have appeared in <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> since\n<a href=\"http://chemicalblogspace.blogspot.com/2008/04/new-blogs-9.html\">New Blogs #9</a>. I should really\nmake these overviews more often (I left out the new blogs which have not blogged in 2009 yet):</p>\n\n<ul>\n  <li><a href=\"http://graphenelitreviews.blogspot.com/\">Graphene Literature Reviews</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=175\">entry in Cb</a>)</li>\n  <li><a href=\"http://rajcalab.wordpress.com/\">RajcaLab Weblog</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=177\">entry in Cb</a>)</li>\n  <li><a href=\"http://graphiteworks.wordpress.com/\">Making Graphite Work</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=178\">entry in Cb</a>)</li>\n  <li><a href=\"http://infiniflux.blogspot.com/\">infiniflux!</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=179\">entry in Cb</a>)</li>\n  <li><a href=\"http://pmgb.wordpress.com/\">PMGB</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=180\">entry in Cb</a>)</li>\n  <li><a href=\"http://altchemcareers.wordpress.com/\">The road less travelled</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=182\">entry in Cb</a>)</li>\n  <li><a href=\"http://paulingblog.wordpress.com/\">PaulingBlog</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=183\">entry in Cb</a>)</li>\n  <li><a href=\"http://www.inkspotscience.com/blog\">InkSpot. Science. 
On Demand</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=185\">entry in Cb</a>)</li>\n  <li><a href=\"http://kilomentor.chemicalblogs.com/55_kilomentor\">kilomentor</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=189\">entry in Cb</a>)</li>\n  <li><a href=\"http://cheminsilico.blogspot.com/\">Chemistry in silico</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=190\">entry in Cb</a>)</li>\n  <li><a href=\"http://blog.rguha.net/\">So much to do, so little time</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=192\">entry in Cb</a>)</li>\n  <li><a href=\"http://www.robthejob.de/\">Der Molekülblog</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=193\">entry in Cb</a>)</li>\n  <li><a href=\"http://www.selenocisteina.info/\">Selenocisteïna</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=194\">entry in Cb</a>)</li>\n  <li><a href=\"http://chembl.blogspot.com/\">ChEMBL</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=195\">entry in Cb</a>)</li>\n  <li><a href=\"http://dailychem.blogspot.com/\">A Chemistry Question, Daily</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=196\">entry in Cb</a>)</li>\n  <li><a href=\"http://rosettadesigngroup.com/blog\">Macromolecular Modeling Blog</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=197\">entry in Cb</a>)</li>\n  <li><a href=\"http://practicalfragments.blogspot.com/\">Practical Fragments</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=198\">entry in Cb</a>)</li>\n  <li><a href=\"http://bridgeheadcarbons.blogspot.com/\">Bridgehead Carbons</a> (<a href=\"http://cb.openmolecules.net/blog_search.php?blog_id=200\">entry in Cb</a>)</li>\n</ul>",
      "summary": "Many new blogs have appeared in Chemical blogspace since New Blogs #9. I should really make these overviews more often (I left out the new blogs which have not blogged in 2009 yet):",
      
      "date_published": "2009-07-23T00:00:00+00:00",
      "date_modified": "2009-07-23T00:00:00+00:00",
      "tags": ["cb","chemistry","blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5sxn7-4va27",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/20/updating-my-bioclipseqsar-fork-with.html",
      "title": "Updating my bioclipse.qsar fork with Ola&apos;s main branch",
      "content_html": "<p><a href=\"http://github.com/\">GitHub</a> makes forking cheap, and I have a <a href=\"http://github.com/egonw/bioclipse.qsar/tree/master\">fork</a> of the\n<a href=\"http://github.com/olas/bioclipse.qsar/tree/master\">bioclipse.qsar</a> repository (see\n<a href=\"http://chem-bla-ics.blogspot.com/2009/07/bioclipse-moving-to-github-cia-hooks.html\">Bioclipse moving to GitHub: CIA hooks enabled</a>),\nso that I can easily share my patches with Ola for review. Ola can review them and apply them back into his main version.</p>\n\n<p>I was wondering how I could bring my fork synchronized with Ola’s version again, and found the answer in this guide on GitHub.\nIt turns out all I have to do is, though this is locally:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git remote add olas git://github.com/olas/bioclipse.qsar.git\n<span class=\"nv\">$ </span>git pull olas master\n</code></pre></div></div>\n\n<p>This gets me into the following state:</p>\n\n<p><img src=\"/assets/images/githubUpdatingMyFork.png\" alt=\"\" /></p>\n\n<p>This <a href=\"http://www.kernel.org/pub/software/scm/git/docs/gitk.html\">gitk</a> output show that my local <em>master</em> branch is\nidentical to Ola’s master <em>branch</em> on GitHub, while both are three commits ahead of my current <em>master</em> branch at GitHub.</p>\n\n<p>Right after this, I updated my fork at GitHub with a simple <code class=\"language-plaintext highlighter-rouge\">git push</code>, resulting in this gitk output:</p>\n\n<p><img src=\"/assets/images/githubUpdatingMyFork1.png\" alt=\"\" /></p>",
      "summary": "GitHub makes forking cheap, and I have a fork of the bioclipse.qsar repository (see Bioclipse moving to GitHub: CIA hooks enabled), so that I can easily share my patches with Ola for review. Ola can review them and apply them back into his main version.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/githubUpdatingMyFork.png",
      "date_published": "2009-07-20T00:00:00+00:00",
      "date_modified": "2009-07-20T00:00:00+00:00",
      "tags": ["bioclipse","git","github"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qcswn-40p26",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/17/eln-vendor-open-source-stuff-just-works.html",
      "title": "ELN vendor: &quot;The Open Source stuff just works better&quot;",
      "content_html": "<p>Simon Coles is CTO of <a href=\"http://www.amphora-research.com/\">Amphora Research Systems</a> (a company I do not know) and in the\nbusiness of <a href=\"http://en.wikipedia.org/wiki/Electronic_lab_notebook\">Electronic Lab Notebooks</a>. I know nothing about their\nproducts but would like to propagate the <a href=\"http://elnblog.com/2009/07/use-of-open-source-in-commercial-eln-products/\">statements he just made on Open Source</a>\nin reply to a <a href=\"http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&amp;discussionID=5153766&amp;gid=1148517\">question on LinkedIn</a>\n(btw, <a href=\"http://www.linkedin.com/in/egonw\">my LinkedIn account</a>):</p>\n\n<blockquote>\n  <p>We use a lot of Open Source components in our products, and I know we’re not alone.</p>\n</blockquote>\n\n<p>He also gives why they do so, quoting from the full arguments in his blog:</p>\n\n<ul>\n  <li><em>The Open Source stuff just works better</em></li>\n  <li><em>Support is better</em></li>\n  <li><em>Licensing issues go away</em></li>\n  <li><em>It is dramatically cheaper for our customers to deploy</em></li>\n  <li><em>We have much more latitude in deployment options</em></li>\n</ul>\n\n<p>I am not sure if this involves open source cheminformatics, but asked about that… the whole article is worth reading.</p>",
      "summary": "Simon Coles is CTO of Amphora Research Systems (a company I do not know) and in the business of Electronic Lab Notebooks. I know nothing about their products but would like to propagate the statements he just made on Open Source in reply to a question on LinkedIn (btw, my LinkedIn account):",
      
      "date_published": "2009-07-17T00:00:00+00:00",
      "date_modified": "2009-07-17T00:00:00+00:00",
      "tags": ["eln","opensource"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6c3ha-gmy93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/15/bioclipse-moving-to-github-cia-hooks.html",
      "title": "Bioclipse moving to GitHub: CIA hooks enabled",
      "content_html": "<p>Following the CDK and <a href=\"http://chem-bla-ics.blogspot.com/2009/07/jchempaint-primary-moving-to-git.html\">JChemPaint Primary</a>,\n<a href=\"http://www.bioclipse.net/\">Bioclipe</a> moved to Git just after the <a href=\"http://bioclipse.blogspot.com/2009/07/biocipse-20-released.html\">2.0.0 release</a>.</p>\n\n<p>We decided to split up the repositories, and have one <a href=\"http://mndoci.com/blog/2009/05/07/benevolent-dicators-and-scientific-collaboration/\">benevolent dictator</a>,\nor <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/06/21/dr-whos-of-life-sciences.html\">dr. Who <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, for each repository who will maintain\nthe plugins defined in the repository and coordinate development:</p>\n\n<ul>\n  <li><a href=\"http://github.com/olas/bioclipse.core/tree/master\">bioclipse.core</a></li>\n  <li><a href=\"http://github.com/olas/bioclipse.statistics/tree/master\">bioclipse.statistics</a></li>\n  <li><a href=\"http://github.com/olas/bioclipse.ds/tree/master\">bioclipse.ds</a></li>\n  <li><a href=\"http://github.com/olas/bioclipse.qsar/tree/master\">bioclipse.qsar</a></li>\n  <li><a href=\"http://github.com/egonw/bioclipse.cheminformatics/tree/master\">bioclipse.cheminformatics</a></li>\n  <li><a href=\"http://github.com/egonw/bioclipse.rdf/tree/master\">bioclipse.rdf</a></li>\n  <li><a href=\"http://github.com/masak/bioclipse.bioinformatics/tree/master\">bioclipse.bioinformatics</a></li>\n</ul>\n\n<p>Several plugins are still in the SVN world, but a good deal is now Git-ready. 
BTW, this move also adds several new accounts\nto watch on GitHub (see Rich’s <a href=\"http://depth-first.com/articles/2009/07/03/seventeen-github-accounts-to-watch-in-cheminformatics\">17 GitHub accounts to watch on Cheminformatics</a>).</p>\n\n<p><a href=\"http://www.github.com/\">GitHub</a> turns out to be our big friend here, not <a href=\"http://www.sf.net/\">SourceForge</a>, which only supports one\nGit repository. GitHub must have added hooks recently, and I am really happy to see those. The above Bioclipse repositories\nhave hooks turned on for <a href=\"http://cia.vc/stats/project/bioclipse.core\">CIA</a> (so that commit messages end up on our #bioclipse IRC\nchannel) and email (as indicated by the green color):</p>\n\n<p><img src=\"/assets/images/githubHooks.png\" alt=\"\" /></p>\n\n<p>The splitting up was rather interesting indeed. We wanted to keep the complete commit history, but still reduce the git repositories\nconsiderably. This means removing the history of the plugins which should not end up in the repository. Git allows this! Git rules! This\ntime, <a href=\"http://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html\">git filter-branch</a> is our friend and there are\nbasically two options: constructive and destructive. The first copies plugins bit by bit from the old to the new repository.\nThe second does the opposite, and removes bit by bit the stuff you do not want. Depending on the ratio of plugins you want to\nkeep and those you want to remove, one or the other solution is more appropriate. I have summarized the git commands I used in detail on\n<a href=\"http://wiki.bioclipse.net/index.php?title=Git_Development#Making_your_own_feature_Git_repository\">this Bioclipse wiki page</a>.</p>",
      "summary": "Following the CDK and JChemPaint Primary, Bioclipe moved to Git just after the 2.0.0 release.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/githubHooks.png",
      "date_published": "2009-07-15T00:00:00+00:00",
      "date_modified": "2026-03-20T00:00:00+00:00",
      "tags": ["git","bioclipse","github","sourceforge"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vrevg-x5008",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/08/jchempaint-primary-moving-to-git.html",
      "title": "JChemPaint-Primary moving to Git",
      "content_html": "<p>I knew it was going to be painful, but making the <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/branches/jchempaint-primary/\">jchempaint-primary branch</a>\na proper patch to the CDK master branch <strong>is</strong> painful. I am working my way towards setting up <a href=\"http://pele.farmbio.uu.se/cgi-bin/gitweb.cgi?p=jchempaint-primary.git;a=summary\">a git repository</a>\n(<strong>IMPORTANT</strong>: these patches are <strong>not</strong> final yet, and their history <strong>will</strong> change, as I am <em>rebasing</em> regularly to make cleaner patches! Making\ncopies is save, but please hold of any forking and/or branching on top of it until it is final. Thanx.) for the patch, with split ups of the various\nparts into reviewable blobs:</p>\n\n<p><img src=\"/assets/images/jcpGit.png\" alt=\"\" /></p>\n\n<p>As you can see (when you click on the image to enlarge it), I have more or less finished the first drafts of the patch sets (see\n<a href=\"https://sourceforge.net/apps/mediawiki/cdk/index.php?title=JChemPaint_Primary_Patches\">this wiki page</a>) 0-other, 1-render, 2-renderbasic,\n9-rendercontrol, and 6-control. 
The last one does not actually compile properly yet, as I need to abstract an IRenderer interface first.</p>\n\n<p>There are several patch sets that I am still porting, but I hope to finish that this week, after which I’ll continue working on the new\n<code class=\"language-plaintext highlighter-rouge\">IEdit</code> framework in the controller modules recently set up by Arvid.</p>\n\n<p>It will take some time before these patches actually get submitted for review, as there is quite some PMD, DocCheck and unit testing\nwork to be done, as is clear from the <a href=\"http://pele.farmbio.uu.se/nightly-jcp/\">Nightly running on the SVN branch</a>.</p>\n\n<p>Finally, I would like to note that this git repository collapses a lot of work done by developers at both Uppsala University (Arvid, Ola and me)\nand the EBI (<a href=\"http://gilleain.blogspot.com/\">Gilleain</a>, Stefan and now Mark). While the above git history will not reflect those contributions,\nyou can recover this information from the <a href=\"http://chem-bla-ics.blogspot.com/2009/06/making-patches-attribution-copyright.html\">copyright headers</a>.\nI would also like to thank Lars and Sam for their valuable testing!</p>",
      "summary": "I knew it was going to be painful, but making the jchempaint-primary branch a proper patch to the CDK master branch is painful. I am working my way towards setting up a git repository (IMPORTANT: these patches are not final yet, and their history will change, as I am rebasing regularly to make cleaner patches! Making copies is save, but please hold of any forking and/or branching on top of it until it is final. Thanx.) for the patch, with split ups of the various parts into reviewable blobs:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpGit.png",
      "date_published": "2009-07-08T00:00:00+00:00",
      "date_modified": "2009-07-08T00:00:00+00:00",
      "tags": ["jchempaint","git","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qj1xp-mas31",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/07/bioclipse-jchempaint-2.html",
      "title": "Bioclipse-JChemPaint #2",
      "content_html": "<p>Recently, I blogged about <a href=\"http://chem-bla-ics.blogspot.com/2009/06/bioclipse-jchempaint.html\">Bioclipse-JChemPaint</a> of the imminent Bioclipse 2.0.0\nrelease, a complete rewrite of the <a href=\"http://www.bioclipse.net/\">Bioclipse</a> application as published in doi:<a href=\"https://doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a>.\nI also <a href=\"http://chem-bla-ics.blogspot.com/2009/05/bioclipse-beta5-really-last-one-now.html\">blogged</a> about the feature to\nbrowse large <a href=\"http://en.wikipedia.org/wiki/Chemical_table_file\">MDL SF files</a> (Bioclipse 2.1 will have support one or more CML\nconventions for chemical tables). Ola did some <a href=\"http://bioclipse.blogspot.com/2009/07/working-with-large-sdfiles-in-bioclipse.html\">profiling on processing SD files</a>,\nbut also notes that such may be more suitable for the StructureDB</p>\n\n<p>Browsing a large set of structures with there properties gives a quick overview of the data set. It also makes bugs shallow,\nsuch as the one shown below found when I was browsing the <a href=\"http://www.chemblog.org/\">StarLite</a> database:</p>\n\n<p><img src=\"/assets/images/jcpRoundingError.png\" alt=\"\" /></p>\n\n<p>The MDL molfile for structure 55 is available from the <a href=\"http://pele.farmbio.uu.se/cgi-bin/bugzilla/attachment.cgi?id=78&amp;action=edit\">bug report I filed against Bioclipse</a>.</p>",
      "summary": "Recently, I blogged about Bioclipse-JChemPaint of the imminent Bioclipse 2.0.0 release, a complete rewrite of the Bioclipse application as published in doi:10.1186/1471-2105-8-59. I also blogged about the feature to browse large MDL SF files (Bioclipse 2.1 will have support one or more CML conventions for chemical tables). Ola did some profiling on processing SD files, but also notes that such may be more suitable for the StructureDB",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpRoundingError.png",
      "date_published": "2009-07-07T00:10:00+00:00",
      "date_modified": "2009-07-07T00:10:00+00:00",
      "tags": ["bioclipse","jchempaint","cdk","chembl"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-59", "doi": "10.1186/1471-2105-8-59"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zd7a9-m1c11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/07/knowledge-management-ontologies.html",
      "title": "Knowledge Management - Ontologies",
      "content_html": "<p>Chemistry has a bit of background in ontologies, and <a href=\"http://wwmm.ch.cam.ac.uk/blogs/adams/?p=245\">ChemAxiom</a> is certainly not the\nfirst (though I think it is rather promising…). Three years ago I gave a presentation at the CUBIC (now existing as\n<a href=\"http://www.linkedin.com/groups?gid=2071414\">LinkedIn Alumni Group</a>), which is not so extensive, but does have a few interesting\ncitations on the use of ontologies in chemistry on slide 16:</p>\n\n<p><a href=\"https://doi.org/10.5281/zenodo.2667497\"><img src=\"/assets/images/2006_slide.png\" alt=\"Image of the first slide with the title, name, and date\" /></a></p>",
      "summary": "Chemistry has a bit of background in ontologies, and ChemAxiom is certainly not the first (though I think it is rather promising…). Three years ago I gave a presentation at the CUBIC (now existing as LinkedIn Alumni Group), which is not so extensive, but does have a few interesting citations on the use of ontologies in chemistry on slide 16:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/2006_slide.png",
      "date_published": "2009-07-07T00:00:00+00:00",
      "date_modified": "2026-01-23T00:00:00+00:00",
      "tags": ["ontology","chemistry"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.2667498", "doi": "10.5281/ZENODO.2667498"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gq1ew-ehp14",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/07/02/bioclipse-for-cdk-developers-2.html",
      "title": "Bioclipse for CDK Developers #2",
      "content_html": "<p>I <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/15/bioclipse-for-cdk-developers-1.html\">reported earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a> how <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\nallows you to use a script to perceive atom types for the content of the JChemPaint RCP editor. This functionality is now available in the\noutline, and indicates directly if Bioclipse (and the underlying <a href=\"http://cdk.sf.net/\">CDK</a>) understands the chemistry you are drawing. In a\nfuture Bioclipse release, these <em>problems</em> will be visualized more prominently, likely using the Errors/Problems Views available from Eclipse, or otherwise.</p>\n\n<p><img src=\"/assets/images/atomTyping.png\" alt=\"\" /></p>",
      "summary": "I reported earlier how Bioclipse allows you to use a script to perceive atom types for the content of the JChemPaint RCP editor. This functionality is now available in the outline, and indicates directly if Bioclipse (and the underlying CDK) understands the chemistry you are drawing. In a future Bioclipse release, these problems will be visualized more prominently, likely using the Errors/Problems Views available from Eclipse, or otherwise.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/atomTyping.png",
      "date_published": "2009-07-02T00:00:00+00:00",
      "date_modified": "2026-03-07T00:00:00+00:00",
      "tags": ["cdk","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dyxec-j7q96",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/06/30/making-patches-attribution-copyright.html",
      "title": "Making patches; Attribution; Copyright and License.",
      "content_html": "<p>I have discussed this in the past on mailing lists, but realized yesterday that I need to strengthen the message a bit more.\nJust to remove confusion. The below is extracted from an email I sent this morning to the\n<a href=\"http://lists.sourceforge.net/lists/listinf/cdk-user\">cdk-user mailing list</a>, but I’m sure\nyou can apply this to any other open source project. (Disclaimer: I have not studied international law, and the below cannot\nbe used as legal advice. Like you would have! Hahahaha! Let it be pointers :)</p>\n\n<p><strong>1. What is that copyright/license header in that .java source file?</strong><br /></p>\n\n<p>This header looks something along the lines of:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>/* Copyright (C) 2000-2007  Christoph Steinbeck\n *               2001-2007,2009  Egon Willighagen\n *\n * Contact: cdk-devel@lists.sourceforge.net\n *\n * This program is free software; you can redistribute it and/or\n * modify it under the terms of the GNU Lesser General Public License\n * as published by the Free Software Foundation; either version 2.1\n * of the License, or (at your option) any later version.\n * All we ask is that proper credit is given for our work, which includes\n * - but is not limited to - adding the above copyright notice to the beginning\n * of your source code files, and to any copyright notice that you may distribute\n * with programs based on this work.\n *\n * This program is distributed in the hope that it will be useful,\n * but WITHOUT ANY WARRANTY; without even the implied warranty of\n * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
See the\n * GNU Lesser General Public License for more details.\n *\n * You should have received a copy of the GNU Lesser General Public License\n * along with this program; if not, write to the Free Software\n * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.\n */\n</code></pre></div></div>\n\n<p>This header has two major pieces: 1. who has the copyright on this file; 2. what is the license that makes it Open Source.</p>\n\n<p>This is crucial information, and the <a href=\"http://cdk.sf.net/\">CDK</a> has a bad history in keeping track of primarily 1. Many source\nfiles, actually, still list the Chemistry Development Kit Project as copyright owner, which is a false statement, as the CDK\nProject is not a legal entity and in many countries therefore not allowed to own copyright. Moreover, none of the contributors\never signed a legal paper to re-assign copyright to this project anyway, like we do with many of our ACS papers.</p>\n\n<p><strong>2. But doesn’t the Git/SVN/CVS history have this copyright owner information?</strong><br /></p>\n\n<p>It is true that the Git, SVN and CVS histories of the CDK source code contain a lot of information on this. However, this is\nnot helping, because this information is lost when we distribute our source code. And when others distribute our source code\n(e.g. <a href=\"http://packages.debian.org/search?keywords=libcdk-java\">Debian</a> and <a href=\"http://packages.ubuntu.com/search?keywords=libcdk-java\">Ubuntu</a>),\nthey have no means of keeping track of this.</p>\n\n<p>Therefore, we must properly annotate our source files with this information.</p>\n\n<p><strong>3. Why is there a contact email?</strong><br /></p>\n\n<p>Because the CDK project should still be the single point of entry regarding the source code.</p>\n\n<p><strong>4. 
What’s with all those years listed in those copyright ownership lines?</strong><br /></p>\n\n<p>Seriously, I have no clue, but every serious project does it, so there must be some legal reason. Indeed, it sounds logical\nto only list the years when you actually made changes. In the above hypothetical header, I made changes between 2001 and this\nyear, but not in 2008. It might have to do with proper establishment of the end of the copyright period; see below.</p>\n\n<p><strong>5. When must I add my name to the copyright lines?</strong><br /></p>\n\n<p>When you made a non-trivial contribution to the source code, and you must ensure you do this in each such contribution.\nBy adding your name, you make clear that:</p>\n\n<p>a. you are the original author of that contribution (and not someone else)<br />\nb. you release the software under the given license</p>\n\n<p>This is the information (re)distributors need to know if they are working within the boundaries of law.</p>\n\n<p><strong>6. Should I complain about people not adding this information in their patches when reviewing those contributions?</strong><br /></p>\n\n<p>Yes, you should.</p>\n\n<p><strong>7. What about all those files that still list Copyright (C) The CDK Project?</strong><br /></p>\n\n<p>File bug reports. For each file, we need to read the commit history, extract the authors of all non-trivial contributions\nand when those contributions were made, and update the copyright lines.</p>\n\n<p><strong>8. Must the header always list the LGPL as license?</strong><br /></p>\n\n<p>No. The LGPL is our license choice, but if you used code under another (compatible) license written by someone else, that\noriginal license applies, and that original license you need to provide in the header.</p>\n\n<p>Additionally, do not forget to list the original copyright owners.</p>\n\n<p><strong>9. Can I rewrite GPL C++/C code as LGPL for the CDK?</strong><br /></p>\n\n<p>Not entirely related to the above, but relevant. 
I once asked the FSF about this, and rewriting a piece of code in another\nlanguage is <em>not</em> a clean room implementation and does, therefore, not erase original copyright ownership nor license\napplicability. Therefore, we cannot base CDK implementations on, e.g., GPL-licensed C++ code, such as in\n<a href=\"http://openbabel.sf.net/\">OpenBabel</a>.</p>\n\n<p><strong>10. How can I use GPL code in the CDK?</strong><br /></p>\n\n<p>You cannot. All code depending on GPL code must be GPL too. There is the <a href=\"http://code.google.com/p/chemojava/updates/list\">ChemoJava</a>\nproject to hold GPL-licensed CDK-based code, which has a number of classes that use the CDK library and depend on GPL\nlibraries too.</p>\n\n<p>Alternatively, you can try to convince the original authors to relicense. A good recent example here is the UFF force field\nimplementation of OpenBabel, which was relicensed (or dual licensed) as LGPL and is now also available from\n<a href=\"http://www.jmol.org/\">Jmol</a>. Incidentally,\nthis is a reason why it is important to have those copyright lines include the email address, so that in the future people\ncan contact the original authors of code. In a far future, this is even used to decide when copyright no longer applies,\nbecause the original authors are dead by then.</p>\n\n<p>(Thanx to <a href=\"http://baoilleach.blogspot.com/\">Noel</a> for replying to the mailing list!)</p>",
      "summary": "I have discussed this in the past on mailing lists, but realized yesterday that I need to strengthen the message a bit more. Just to remove confusion. The below is extracted from an email I sent this morning to the cdk-user mailing list, but I’m sure you can apply this to any other open source project. (Disclaimer: I have not studied international law, and the below cannot be used as legal advice. Like you would have! Hahahaha! Let it be pointers :)",
      
      "date_published": "2009-06-30T00:10:00+00:00",
      "date_modified": "2009-06-30T00:10:00+00:00",
      "tags": ["opensource","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m0zsx-9ta16",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/06/30/jchempaint-hack-session-at-uppsala.html",
      "title": "JChemPaint hack session at Uppsala",
      "content_html": "<p>Arvid and I had a meeting on the ControllerHub refactoring, to make it modular to the bone. Actually,\nit is the <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/branches/jchempaint-primary/src/main/org/openscience/cdk/controller/IChemModelRelay.java?view=log\">IChemModelRelay</a>\nthat needs refactoring. This is what we wrote down:</p>\n\n<p><img src=\"/assets/images/jcpHackWeek.JPG\" alt=\"\" /></p>\n\n<p>In this picture you can see our priorities (1,2 for Arvid, and A,B for me).</p>\n\n<p><strong>1</strong> The above mentioned relay will get refactored to turn each method into a separate class, which at the same time will provide\nthe undo/redo functionality. We might even have undo at a scripting level :)</p>\n\n<p><strong>2</strong> Second item on Arvid’s list is extending the mouse relay to handle key modifiers, an unfortunate omission in that design.\nThis is needed to support the Ctrl- and Apple’s Command-based selection approach.</p>\n\n<p><strong>A</strong> To ease the footprint reduction of JChemPaint, we are going to split the current <em>render</em> module into <strong>render</strong> and <strong>renderextra</strong>.\nThe second may see future split ups, and both may see name changes, but the first go will split them according to requiring an\nIAtomContainer data model or an IChemModel data model.</p>\n\n<p>This will help us clean up dependencies, by forcing us to have the core functionality not pull in, for example, reaction functionality.\nAdditionally, isotope related rendering will go into <em>renderextra</em>, thus removing the dependency on the IsotopeFactory and the associated\n800kB-sized <em>isotope.xml</em> data file.</p>\n\n<p>This will not immediately help the applet class partitioning and indexing, but it will help us to keep a sane overview of all the stuff\nwe have around.</p>\n\n<p><strong>B</strong> The goal is to merge this work on JChemPaint back into the CDK library, so that we again have a CDK version with a\nfully 
working editing environment, as CDK 1.0 did. However, that requires the code to be stable, which includes full unit\ntesting, no PMD violations, and complete JavaDoc. But, as I wrote this morning to the\n<a href=\"https://lists.sourceforge.net/lists/listinfo/cdk-jchempaint\">cdk-jchempaint mailing list</a>:</p>\n\n<blockquote>\n  <p>But, there is a lot of clean up to do. I counted 497 missing test\nmethods; 326 PMD violations, and saw a lot of missing JavaDoc. This\nmeans, that the current patch is pretty messed up indeed, and we are a\nlong way away from seeing a merge with CDK master :(</p>\n</blockquote>",
      "summary": "Arvid and I had a meeting on the ControllerHub refactoring, to make it modular to the bone. Actually, it is the IChemModelRelay that needs refactoring. This is what we wrote down:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpHackWeek.JPG",
      "date_published": "2009-06-30T00:00:00+00:00",
      "date_modified": "2009-06-30T00:00:00+00:00",
      "tags": ["cdk","jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wsp0d-q1894",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/06/26/michel-dumontier-at-uppsala-university.html",
      "title": "Michel Dumontier at Uppsala University",
      "content_html": "<p><a href=\"http://dumontierlab.com/\">Michel</a> is visiting <a href=\"http://www.farmbio.uu.se/researchgroup.php?fg=1\">our group</a> this week and gave a\nvery exciting talk yesterday on the role of ontologies in drug discovery. This being ongoing research in our group too, the talk was\nwell received by the audience (which was not too large, because after mid-summer, Uppsala is on holiday). For the first time, I\nmicroblogged a talk on <a href=\"http://twitter.com/egonwillighagen\">my Twitter account</a> (using the\n<a href=\"http://search.twitter.com/search?q=dumontieratuppsala\">#dumontieratuppsala</a> tag). I have not got an XSLT ready to convert the\nrelevant items into a nice HTML snippet for embedding in this blog, but will try to do that later. Meanwhile, I also made a\nfew bookmarks here and there, which are available from <a href=\"http://delicious.com/egonw/%23dumontieratupssala\">Delicious</a>.</p>\n\n<p>The rest of the day, we talked about various ontology, bio- and cheminformatics related stuff. We looked at\n<a href=\"http://sadiframework.org/\">SADI</a>, <a href=\"http://www.bioclipse.net/\">Bioclipse</a> (and my RDF extension, see\n<a href=\"http://delicious.com/tag/bioclipse+gist+manager:rdf\">these JavaScripts</a>),\n<a href=\"http://www.bio2rdf.org/\">Bio2RDF</a>, <a href=\"http://rdf.openmolecules.net/?InChI=1/CH4/h1H4\">rdf.openmolecules.net</a>,\nand <a href=\"http://www.openlinksw.com/dataspace/dav/wiki/Main/VOSRDF\">Virtuoso</a>.</p>",
      "summary": "Michel is visiting our group this week and gave a very exciting talk yesterday on the role of ontologies in drug discovery. This being ongoing research in our group too, the talk was well received by the audience (which was not too large, because after mid-summer, Uppsala is on holiday). For the first time, I microblogged a talk on my Twitter account (using the #dumontieratuppsala tag). I have not got an XSLT ready to convert the relevant items into a nice HTML snippet for embedding in this blog, but will try to do that later. Meanwhile, I also made a few bookmarks here and there, which are available from Delicious.",
      
      "date_published": "2009-06-26T00:00:00+00:00",
      "date_modified": "2009-06-26T00:00:00+00:00",
      "tags": ["bio2rdf","rdf","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7dzaj-08p59",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/06/21/dr-whos-of-life-sciences.html",
      "title": "The Dr Who&apos;s of Life Sciences",
      "content_html": "<p>Peter recently wrote up a model of how several <a href=\"http://en.wikipedia.org/wiki/Blue_Obelisk\">Blue Obelisk</a> (please contribute to the page!)\nprojects changed in history: <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2059\">The Doctor Who Model of Open Source</a>. This was later\npicked up by <a href=\"http://opendotdotdot.blogspot.com/2009/06/doctor-who-model-of-open-source.html\">Glyn</a> and then by\n<a href=\"http://tech.slashdot.org/story/09/06/19/1326254/The-Doctor-Who-Model-of-Open-Source?from=rss\">Slashdot</a> (second time Peter got that\nfame; that’s one of the advantages of working at a well-known institute, instead of something like Uppsala University. Besides\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>, <a href=\"http://www.gromacs.org/\">GROMACS</a> and the <a href=\"http://cdk.sf.net/\">CDK</a>,\n<a href=\"http://en.wikipedia.org/wiki/MySQL_AB\">MySQL AB</a> actually has a headquarters here.) Thanx to\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/06/19/geek-knighthood-blue-obelisk-on-slashdot/comment-page-1/#comment-8809\">Chris who pointed me</a>\nto the Slashdot coverage.</p>\n\n<p>Now, the several blogs and the Slashdot item contain interesting discussions on whether the ‘Dr Who’ model is the best model of how\nopen source projects can evolve. Fact is, at least, that the model does not describe a new phenomenon; Peter merely describes how\nthe Blue Obelisk deals with the limited resources we have in cheminformatics, and that the succession of project leaders serves both\nthe scientists’ interests (who are generally not paid for development or, $Foo forbid, maintenance of scientific data analysis methods)\nand the project itself. 
This makes life science open source different from most pure-IT projects: open source academic software\nis always something on the side.</p>\n\n<p>So, when Miguel Howard turned to <a href=\"http://www.jmol.org/\">Jmol</a>, he had seemingly unlimited resources to work on Jmol and he had great\nideas and made them work: Miguel is the father of the now so popular Jmol applet with scripting functionality. It did mean that the\nintegration with the CDK I worked on, as planned by the original Jmol author <a href=\"http://www.openscience.org/blog/\">Dan Gezelter</a>,\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a> and me in 2000, was dropped: the CDK data model was too slow (it is amazing how fast\nJmol is, without using accelerated graphics! See this Nature Precedings paper:\nDOI:<a href=\"http://dx.doi.org/10.1038/npre.2007.50.1\">10.1038/npre.2007.50.1</a>). My attention was better spent on the CDK.</p>\n\n<p>Now, if the need arises, and the current Jmol head Bob loses interest or time, I’ll be available to take over again. That is\nless likely to happen for an older Dr. Who actor. Several Slashdot commenters also pointed out that the model matches the\n‘drummer-in-a-band’ model. I guess, or lead-singer… This moved the discussion to what exactly the model models. Peter writes:</p>\n\n<blockquote>\n  <p>“Instead the Blue Obelisk community seems to have evolved a “Doctor Who” model. You’ll recall that every few years something\nfatal happens to the Doctor and you think he is going to die and there will never be another series. Then he regenerates.\nThe new Doctor has a different personality, a different philosophy (though always on the side of good). It is never clear\nhow long any Doctor will remain unregenerated or who will come after him. 
And this is a common theme in the Blue Obelisk.”</p>\n</blockquote>\n\n<p>This brings me back to the earlier observation I wrote down: science is different, and Peter is right when he says\n<em>you think he is going to die and there will never be another series</em>. This thought is justified for many open source science\nprojects; in Glyn’s blog there is the remark about the lack of data, but I think that if someone were to count the number of dead open\nsource science projects, the outcome would be that the fear is highly justified.</p>\n\n<p>This is likely also the power of the <a href=\"http://en.wikipedia.org/wiki/Blue_Obelisk\">Blue Obelisk</a>: it creates a lively and\nrewarding community of like-minded people, forming an eco-system where the individual projects can flourish. Maybe\nsomeone can come up with a good metaphor for the Blue Obelisk, matching the Dr Who model? The BBC comes to mind: is the BBC\nan eco-system where small TV series can survive?</p>\n\n<p>Anyways, my father used to watch Dr Who, and being compared to Dr Who is much more rewarding than being compared to a\ndrummer in some band.</p>",
      "summary": "Peter recently wrote up a model of how several Blue Obelisk (please contribute to the page!) projects changed in history: The Doctor Who Model of Open Source. This was later picked up by Glyn and then by Slashdot (second time Peter got that fame; that’s one of the advantages of working at a well-known institute, instead of something like Uppsala University. Besides Bioclipse, GROMACS and the CDK, MySQL AB actually has a headquarters here.) Thanx to Chris who pointed me to the Slashdot coverage.",
      
      "date_published": "2009-06-21T00:00:00+00:00",
      "date_modified": "2009-06-21T00:00:00+00:00",
      "tags": ["openscience","blue-obelisk","cdk","jmol"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/NPRE.2007.50.1", "doi": "10.1038/NPRE.2007.50.1"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4ctw6-teq86",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/06/18/bioclipse-jchempaint.html",
      "title": "Bioclipse-JChemPaint",
      "content_html": "<p>The Uppsala and EBI CDK-teams have been working hard on finishing the rewrite of <a href=\"http://jchempaint.sf.net/\">JChemPaint</a> I started with Niels earlier.\nWhile the EBI-team focused on the applet (and Swing application), the Uppsala team, obviously, focused on the SWT side, for integration into\n<a href=\"http://bioclipse.net/\">Bioclipse</a>. The new JChemPaint is reaching a useful state, and below is a quick screenshot of something Arvid has been\nworking on:</p>\n\n<p><img src=\"/assets/images/Picture_4.png\" alt=\"\" /></p>\n\n<p>It shows a periodic table which allows you to drag any element type onto the JChemPaint drawing area. It is using regular drag and drop functionality,\nallowing you to create any arbitrary pseudo atom too. This also paves the way for a template system, allowing you to drag-n-drop fragments onto an\nactive JChemPaint editor.</p>",
      "summary": "The Uppsala and EBI CDK-teams have been working hard on finishing the rewrite of JChemPaint I started with Niels earlier. While the EBI-team focused on the applet (and Swing application), the Uppsala team, obviously, focused on the SWT side, for integration into Bioclipse. The new JChemPaint is reaching a useful state, and below is a quick screenshot of something Arvid has been working on:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Picture_4.png",
      "date_published": "2009-06-18T00:00:00+00:00",
      "date_modified": "2009-06-18T00:00:00+00:00",
      "tags": ["bioclipse","jchempaint","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dv8xh-5dk63",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/06/17/no-pdfs-really-do-suck.html",
      "title": "No, PDFs really do suck!",
      "content_html": "<p>A typical blog post by Peter MR, <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2102\">The ICE-man: Scholary HTML not PDF</a>,\nmade (again) the point that PDF is to data what a hamburger is to a cow, in reply to a blog post by Peter SF, <a href=\"http://ptsefton.com/2009/06/11/trip-report-visit-to-microsoft.htm#id3\">Scholarly HTML</a>.</p>\n\n<p>This led to a <a href=\"http://friendfeed.com/petermr/767254d7/ice-man-scholary-html-not-pdf\">discussion on FriendFeed</a>.\nA couple of misconceptions:</p>\n\n<p><strong>“But how are we going to cite without paaaaaaaaaaaage nuuuuuuuuuuumbers?”</strong><br />\nWe don’t. Many online-only journals can do without; there is DOI. And if that is not enough, the legal business has means of\nidentifying paragraphs, etc, which should provide us with all the methods we could possibly need in science.</p>\n\n<p><strong>Typesetting of PDFs, in most journals, is superior than HTML, which is why I prefer to read a PDF version if it is available. It is nicer to the eyes.</strong><br />\nUmmm… this is supposed to be Science, not a California Glossy. It seems that\n<a href=\"http://shirleywho.wordpress.com/2009/05/11/an-open-letter-to-oprah/\">pretty looks are causing a major body count</a> in\nthe States. Otherwise, HTML+CSS can likely beat any pretty looks of PDF, or at least match them.</p>\n\n<p><strong>As I seem to be the only physicist/mathematician who comments on these sort of things, I feel like a broken record,\nbut math support in browsers currently sucks extremely badly and this is a primary reason why we will continue to use\nPDF for quite some time.</strong><br />\nHTML+<a href=\"http://www.w3.org/Math/\">MathML</a> is well established, and default Firefox browsers have no problem showing mathematical\nequations. 
For years, the <a href=\"http://en.wikipedia.org/wiki/Blue_Obelisk\">Blue Obelisk</a> <a href=\"http://qsar.sourceforge.net/dicts/qsar-descriptors/index.xhtml\">QSAR descriptor ontology</a>\nhas been using such a set up. If you use TeX to author your equations, you can\n<a href=\"http://silas.psfc.mit.edu/mathmltalk/\">convert it to HTML</a> too.</p>\n\n<p><strong>We can mine the data from the PDF text.</strong> Theoretically, yes. Practically, it is money down the drain. PDF is particularly\nnasty here, as it breaks words at the end of a line, and can even make words consist of unlinked series of characters\npositioned at (x,y). PDF, however, can contain a lot of metadata, but that is merely a hack, and an unneeded workaround.\nWorse, it is hardly used regarding chemistry. PDF can contain PNG images which can contain CML; the tools are there, but not\nused, and there are more efficient technologies anyway.</p>\n\n<p>I, for one, agree with Peter on PDF: it really sucks as a scientific communication medium.</p>",
      "summary": "A typical blog post by Peter MR, The ICE-man: Scholary HTML not PDF, made (again) the point that PDF is to data what a hamburger is to a cow, in reply to a blog post by Peter SF, Scholarly HTML.",
      
      "date_published": "2009-06-17T00:00:00+00:00",
      "date_modified": "2009-06-17T00:00:00+00:00",
      "tags": ["publishing"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6xj82-qag38",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/05/21/bioclipse-beta5-really-last-one-now.html",
      "title": "Bioclipse beta5: really the last one now",
      "content_html": "<p><a href=\"http://bioclipse.blogspot.com/2009/05/bioclipse-20-beta5-released.html\">Bioclipse beta 5</a> was just released by Ola, and the team had\nsome bad days over a <a href=\"http://chemicalrcp.blogspot.com/2009/05/eclipse-spring-export-problem-uses.html\">problem</a> that happened after\na merge of <a href=\"http://jonalv.blogspot.com/2009/04/i-just-came-up-with-yet-another-way-of.html\">an important branch</a> regarding the\nmanagers we are using to allow scripting of Bioclipse.</p>\n\n<p><img src=\"/assets/images/starlite.png\" alt=\"\" /></p>\n\n<p>In the end, <a href=\"http://jonalv.blogspot.com/\">Jonathan</a> found a workaround for the problem, even though we still have no clue what\nthe exact cause was. Additionally, Arvid implemented one of the last missing features of the JChemPaint editor, being the ability to\ndraw bonds in any arbitrary direction, and the ability to create a new bond to an already existing atom. This really seems to be the\nlast beta before the 2.0 release candidate. So, head over to <a href=\"http://sourceforge.net/projects/bioclipse\">SourceForge</a> as it is\nnow time to report those smaller things you would like to see improved.</p>\n\n<p>The beta has many really nice features, and we will have much to write about in later blogs. One thing I particularly like is the\nsupport for (really) large SD files; the above screenshot is an 800MB file with <a href=\"http://chembl.blogspot.com/\">StarLite structures</a>,\nthough we also tried files larger than 1GB. There is a <em>2D-Structure</em> tab, which will zoom in on the structure in a regular\nJChemPaint editor.</p>\n\n<p>For the Bioclipse scripting, I can just encourage you to browse this blog for example scripts.</p>\n\n<p>There are many extensions currently being developed, around the globe, which will extend the basic Bioclipse workbench towards\nparticular use cases. 
While surely these will get blogged about in detail later, I do want to briefly mention them. In the works\nare features for: QSAR, Decision Support, Speclipse (NMR and MS spectrum handling), Resource Description Framework, a StructureDatabase,\nMetabolomics, Medea (MS spectrum and fragmentation prediction), XMPP, and much more.</p>\n\n<p>Focus of Bioclipse 2.1 will be towards bioinformatics: sequence handling, BLAST, better PDB/CIF support for protein structures,\nand who knows.</p>",
      "summary": "Bioclipse beta 5 was just released by Ola, and the team had some bad days over a problem that happened after a merge of an important branch regarding the managers we are using to allow scripting of Bioclipse.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/starlite.png",
      "date_published": "2009-05-21T00:00:00+00:00",
      "date_modified": "2009-05-21T00:00:00+00:00",
      "tags": ["chembl","bioclipse","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hvqxm-xnq47",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation.html",
      "title": "Open Data: license, rights, aggregation, clean interfaces?",
      "content_html": "<p>A <a href=\"http://blog.openwetware.org/scienceintheopen/2009/05/15/a-breakthrough-on-data-licensing-for-public-science/\">recent post</a> by\n<a href=\"http://blog.openwetware.org/scienceintheopen/\">Cameron</a> on his visit last week with <a href=\"http://wwmm.ch.cam.ac.uk/blogs/adams/\">Nico</a>,\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> and <a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/\">Jim</a>, discussed\n<a href=\"http://en.wikipedia.org/wiki/Open_data\">Open Data</a> licensing. This led to an interesting discussion on these matters, and\nquestions by me on why people care so much about only public domain data (or licensed with\n<a href=\"http://www.opendatacommons.org/licenses/pddl/1.0/\">PDDL</a> or <a href=\"http://wiki.creativecommons.org/CC0\">CC0</a>).</p>\n\n<p>Open licensing for data has not matured as much as for software, and international law seems to be more confusing about the\nissues. I guess that is because data aggregation has been around since way before the computer era. The PDDL and CC0 both try to\novercome this fuzziness. But there is another issue we need to keep in mind. A lot of useful Data was aggregated and made Open\n<em>before</em> these licenses came about, and uses, for example, the <a href=\"http://www.gnu.org/copyleft/fdl.html\">GNU FDL</a> license, such as\nthe <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a>.</p>\n\n<h2 id=\"rights\">Rights</h2>\n\n<p>Right now, there are two Open Data camps, much like the BSD-vs-GPL wars in Open Source: one that believes in waiving any rights\non the Data, indicating that facts are free; others that believe that data must be protected to not be eaten by big companies\nand lost to the community (e.g. <a href=\"http://friendfeed.com/onssolubility/cf6afd52/should-we-contribute-solubility-data-to\">the WolframAlpha arrangements are suspect</a>).</p>\n\n<p>Of course, both camps are not that far apart, and both believe Open is important. 
Interestingly, there are some noteworthy\ndifferences with the Open Source wars. I see parallels between the two, which point to an important difference: Open Source has\nalgorithms (uncopyrightable) and implementations (copyrightable); Open Data has Data (uncopyrightable) and aggregation\n(copyrightable). Open Source talks mostly about the implementation, not the algorithm; it’s Open Source, not Open Algorithms\nafter all. In cheminformatics it is even often the case that the algorithms are not even specified and that there only truly\nis source.</p>\n\n<p>However, the name Open Data does not make this distinction. Data is fairly cheap and acquisition can be automated and computerized;\nAggregation, on the other hand, requires human involvement: curation and thinking about data models, etc. This is where the added\nvalue is. Consider an assigned NMR spectrum versus the raw data returned from the spectrometer.</p>\n\n<p>It is this added value that people want to protect, not the data itself. I think.</p>\n\n<h2 id=\"aggregation\">Aggregation</h2>\n\n<p>One important argument that tends to show up when people argue for PDDL and CC0 is that it makes data aggregation easier.\nThis is most certainly true: if you can do whatever you like with a blob of data, that also means aggregating it with any other\nblob of data. However, copyleft licenses, like the GNU FDL, require the aggregation to have a compatible license too. It is\nthe license incompatibilities that make this impossible. Or … ?</p>\n\n<p>Open Source has matured to such a point that it is fairly clear what the intended behaviour is, regarding derivatives. An\naggregation of software (typically referred to as a distribution) is only a derivative under certain conditions. This makes\nit possible to run proprietary software on top of GNU/Linux, which uses the GNU GPL but does not require software to run on\ntop of it to be GPL too. 
Unless… unless no clear, well-defined interface has been used, indicating a strong dependency.\nNow, surely, these things have not been confirmed to match actual law in court, but the intentions are clear.</p>\n\n<h2 id=\"clean-data-interfaces\">Clean Data Interfaces?</h2>\n\n<p>Now, if we would translate this to Open Data, would there be the equivalent of a clean interface? Can we build a data\ndistribution with data of various licenses? I think we can! I am not a lawyer and please consider this an invitation\nto discuss these matters…</p>\n\n<p>Let’s start simple… if I put a GNU FDL image in this blog, by linking to it with an open, free, clean HTML interface\n(<code class=\"language-plaintext highlighter-rouge\">&lt;img src=\"\"/&gt;</code>), would that make my blog GNU FDL too? I don’t think so. Surely, I would need to list the copyright owner,\nand actually would be required to put the GNU FDL in my blog too, but I hope linking to the license text would suffice.\n(Let’s skip fair use at this moment, and assume the use goes beyond fair use). Question: am I not using a clean interface,\nand would this not make the image’s license not infect my blog?</p>\n\n<p>A more difficult example: consider <a href=\"http://rdf.openmolecules.net/\">rdf.openmolecules.net</a>, which surely aggregates facts,\nincluding data from the NMRShiftDB and <a href=\"http://dbpedia.org/\">DBPedia</a>. I am using unique identifiers here, the NMRShiftDB\ncompound ID, and the DBPedia URL, which surely is GNU FDL, and use this to make a <code class=\"language-plaintext highlighter-rouge\">&lt;owl:sameAs&gt;</code> statement. Again, please do\nnot consider fair use, which this certainly is. But, let’s say I put in some more DBPedia and NMRShiftDB data in this\naggregation. The GNU FDL data on rdf.openmolecules.net would be separate RDF blocks, with proper dc:license, dc:author\nannotation. But the block would be part of a larger aggregation. 
The clean interface here is\n<a href=\"http://en.wikipedia.org/wiki/Resource_description_framework\">Resource Description Framework</a>.</p>\n\n<p>This second case does not only affect my rdf.openmolecules.net website, but, for example, <a href=\"http://bio2rdf.org/\">bio2rdf.org</a>\nis also in the same situation and aggregates and distributes DBPedia’s GNU FDL data (e.g.\n<a href=\"http://bio2rdf.org/searchns/dbpedia/hexokinase\">hexokinase</a>). Does that make the\nwhole of the bio2rdf database GNU FDL? They too use RDF as a clean interface.</p>\n\n<h2 id=\"call-for-discussion\">Call for Discussion</h2>\n\n<p>Despite what one of the two camps likes to see, the mere fact of added value when making data aggregations will keep\ncopyleft licenses around, and instead of trying to convince everyone of the virtues of PDDL- and CC0-like licenses,\nwe should think about to what extent it really matters.</p>\n\n<p>I can do my data analysis with data sources of various licenses. I can search and retrieve data from various sources\nwith various licenses. What obstacles are really there that disallow us to do science? Do the data interfaces we have\nnow not provide enough technical means to address the license incompatibilities? They have in Open Source, why would\nthat not apply to Open Data too?</p>",
      "summary": "A recent post by Cameron on his visit last week with Nico, Peter and Jim, discussed Open Data licensing. This led to an interesting discussion on these matters, and questions by me on why people care so much about only public domain data (or licensed with PDDL or CC0).",
      
      "date_published": "2009-05-18T00:00:00+00:00",
      "date_modified": "2009-05-18T00:00:00+00:00",
      "tags": ["opendata","nmrshiftdb","rdf","dbpedia","bio2rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a00pn-pjt64",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/05/15/chemspider-and-rsc-where-next.html",
      "title": "ChemSpider and the RSC: where next?",
      "content_html": "<p>Last Monday the <a href=\"http://www.indiana.edu/~cheminfo/network.html\">CHMINF-L</a> brought the news to me that <a href=\"http://chemspider.com/\">ChemSpider</a>\nwas acquired by the <a href=\"http://rsc.org/\">RSC</a> (not the <a href=\"http://rsc.org/AboutUs/News/PressReleases/2009/ChemSpider.asp\">press release</a>).\n<a href=\"http://twitter.com/\">Twitter</a> (<a href=\"http://twitter.com/egonwillighagen/statuses/1763364256\">my Twitter post</a>) and\n<a href=\"http://friendfeed.com/\">FriendFeed</a> (see <a href=\"http://friendfeed.com/search?q=chemspider+rsc&amp;friends=egonw\">this series</a>).</p>\n\n<p>Reading blogs used to be to get the news, but this has changed. Still, blogging gives more freedom, more space. Blogs did soon\nfollow. <a href=\"http://www.steinbeck-molecular.de/steinblog/\">Chris</a> was the first to\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/05/11/chemspider-bought-by-the-royal-society-of-chemistry/\">blog about it</a>:</p>\n\n<blockquote>\n  <p>This is great news and I’m confident that it will be a move to even more openess in chemistry and cheminformatics.\nIt will also allow the RSC to use Tony fantastic tools for even more semantic markup of articles. I’m looking forward\nto talking to everyone about the implications. For now, congratulations, Tony, and congratulations, RSC, for this\ngreat deal.</p>\n</blockquote>\n\n<p>I think <a href=\"http://www.chemspider.com/blog/\">Tony</a> himself <a href=\"http://www.chemspider.com/blog/the-royal-society-of-chemistry-acquires-chemspider.html\">was next</a>:</p>\n\n<blockquote>\n  <p>This is good for us for a number of reasons. Specifically we will no longer have to deal with our very significant\nresource limitations but more than that it lends credence and validation to the work that we have been doing over the\npast 2 years. 
It seems so long ago now but ChemSpider was first unveiled to the world at the ACS Spring meeting 2007.\nWhat began then only as a hobby project is now being recognized by the community as one of the primary resources for\ninternet chemistry.</p>\n</blockquote>\n\n<p>His network and his insight into the required data curation are what I think made ChemSpider a success.</p>\n\n<p>Later views followed from <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1891\">Peter</a>, <a href=\"http://prospect.rsc.org/blogs/cw/?p=1829\">Rich</a> and\n<a href=\"http://blogs.nature.com/thescepticalchymist/2009/05/the_rsc_and_chemspider.html\">Neil</a>. I can only join in the congratulations,\nand expect that only the future will tell us if our cheers are justified.</p>\n\n<h2 id=\"where-next\">Where next?</h2>\n<p>As Tony indicated, the deal will practically mean better support for ChemSpider in terms of computing power, making it\neasier for them to make upgrades, hence better uptime, etc. It may, indeed, also mean more data, provided from RSC archives,\nas <a href=\"http://blogs.nature.com/thescepticalchymist/2009/05/the_rsc_and_chemspider.html\">suggested by Neil</a>. More practically, I\ncan imagine seeing Project Prospect contributing <em>InChI-DOI</em> links to ChemSpider very soon.</p>\n\n<p>And this would be one of the two recommendations I have for ChemSpider at this moment:</p>\n\n<ol>\n  <li>now linked to a publisher, and with both text mining efforts and expertise, focus on these InChI-DOI links, and, in\nparticular, focus on those InChI-DOI links which involve papers that describe measured properties of the molecules;</li>\n  <li>with the increased support, finish the Open Data work done, by making it easy for people to download the\nChemSpider-OpenData subset. This, I believe, is crucial for a wider adoption in the OpenData community, as OpenData\nwhich is practically made impossible to easily download is not Open enough. 
Previous priorities may have been focused\non setting up a viable commercial alternative, but with the RSC backing, this can no longer be a reason to not do this.</li>\n</ol>\n\n<p>Once more, congratulations to the ChemSpider-team and the involved RSC people, and very much looking forward to seeing\nhow this will change chemistry for the better!</p>",
      "summary": "Last Monday the CHMINF-L brought the news to me that ChemSpider was acquired by the RSC (not the press release). Twitter (my Twitter post) and FriendFeed (see this series).",
      
      "date_published": "2009-05-15T00:00:00+00:00",
      "date_modified": "2009-05-15T00:00:00+00:00",
      "tags": ["cheminf","chemspider","opendata"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/347vb-pfw04",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/05/11/which-feature-must-i-install-for.html",
      "title": "Which feature must I install for org.eclipse.zest?",
      "content_html": "<p>Dear lazyweb!</p>\n\n<p>I have been trying to figure out which Eclipse 3.4 feature I must install from the update site to get the <em>org.eclipse.zest</em> plugin\nin my environment.</p>\n\n<p>I installed the <a href=\"http://www.eclipse.org/gef/zest/\">Zest</a> feature (which I am going to use to\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/18/solubility-data-in-bioclipse-1.html\">visualize an RDF network <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), but my workspace still\ncomplained that I did not have the plugin.</p>\n\n<p>Maybe I should rerun <em>Set Target Platform</em> for our product, but I and others in the Bioclipse development community have been\nwondering how we can know what feature to install via the Software Updates… to get a particular plugin on your machine?</p>\n\n<p>Looking forward to hearing from you,</p>\n\n<p>Kind regards,</p>\n\n<p>Egon</p>",
      "summary": "Dear lazyweb!",
      
      "date_published": "2009-05-11T00:10:00+00:00",
      "date_modified": "2026-01-13T00:00:00+00:00",
      "tags": ["bioclipse","eclipse","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8bpsk-rb857",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/05/11/pubchem-cdk.html",
      "title": "PubChem-CDK",
      "content_html": "<p><a href=\"http://pele.farmbio.uu.se/pubchem/\">PubChem-CDK</a> is a project that runs <a href=\"http://cdk.sf.net/\">CDK</a> code on the <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> data.\nAs we speak, a groovy script reads about 100 PubChem Compounds XML entries per second into the database. Mind you, not the SDF they distribute which uses a\ncustom extension to overcome the limits of the real MDL SDF format.</p>\n\n<p>Right now, it has run the atom type perception algorithm on about 1M compounds, and has pretty good coverage of the <em>organic chemistry</em> domain. I will\nanalyze the <a href=\"http://pele.farmbio.uu.se/pubchem/atomtyping/\">results</a> statistically soon, but will likely use this data first to add some missing atom types\nto CDK 1.2.x. BTW, did you know only <strong><em>three</em></strong> <a href=\"http://pele.farmbio.uu.se/pubchem/atomtyping/?element=C\">carbon atoms failed</a>?\nA C<sup>4-</sup> (CID:<a href=\"http://pele.farmbio.uu.se/pubchem/?cid=156031\">156031</a>), a C<sup>3+</sup> (CID:<a href=\"http://pele.farmbio.uu.se/pubchem/?cid=161072\">161072</a>),\nand a C<sup>2+</sup> (CID:<a href=\"http://pele.farmbio.uu.se/pubchem/?cid=161073\">161073</a>). Would your cheminformatics library know what their properties are?</p>\n\n<p>It is a really nice way of browsing PubChem, BTW. For example, did you know there are several boron compounds which have a substructure [N+]-[B+]-[N+]? Yes,\nthree positive charges, <em>next</em> to each other? For example (CID:<a href=\"http://pele.farmbio.uu.se/pubchem/?cid=3612285\">3612285</a>):</p>\n\n<p><img src=\"/assets/images/CID3612285.png\" alt=\"\" /></p>\n\n<p>Well, neither did I. How was it synthesised? What are the spectral properties? How do they stabilise it? What magic counter ion? PubChem, unfortunately,\ndoes not have links to primary literature, and there is no free source for that available. A failure in chemistry. 
The source points to\n<a href=\"http://cdb.ics.uci.edu/index.htm\">ChemDB</a>, but the <a href=\"http://cdb.ics.uci.edu/cgibin/ChemicalDetailWeb.psp?chemical_id=5257702\">entry in that database</a> does\nnot shed light on this either.</p>\n\n<p>Anyway, more on this later. Much more, as I plan to run many CDK algorithms on this code.</p>",
      "summary": "PubChem-CDK is a project that runs CDK code on the PubChem data. As we speak, a groovy script reads about 100 PubChem Compounds XML entries per second into the database. Mind you, not the SDF they distribute which uses a custom extension to overcome the limits of the real MDL SDF format.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/CID3612285.png",
      "date_published": "2009-05-11T00:00:00+00:00",
      "date_modified": "2009-05-11T00:00:00+00:00",
      "tags": ["cdk","pubchem"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1zbfd-d2h61",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/05/08/nomination-cdk-for-sf-community-award.html",
      "title": "Nomination of the CDK for a SF Community Award",
      "content_html": "<p>Just hit the icon below, and explain in 140 characters why you think the CDK should be nominated. Please select the Best Project for Academia,\nand we might stand a chance:</p>\n\n<p><a href=\"http://sourceforge.net/community/cca09/nominate/?project_name=The%20Chemistry%20Development%20Kit&amp;project_url=http://sourceforge.net/projects/cdk/\"><img src=\"/assets/images/cca_nominate.png\" alt=\"\" /></a></p>",
      "summary": "Just hit the icon below, and explain in 140 characters why you think the CDK should be nominated. Please select the Best Project for Academia, and we might stand a chance:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cca_nominate.png",
      "date_published": "2009-05-08T00:00:00+00:00",
      "date_modified": "2009-05-08T00:00:00+00:00",
      "tags": ["cdk","sourceforge"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/cprj5-xnk74",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/05/07/me-is-having-bioclipsexmpprdf-fun.html",
      "title": "/me is having Bioclipse/XMPP/RDF fun",
      "content_html": "<p>Johannes asked me what the <a href=\"http://en.wikipedia.org/wiki/Lipinski%27s_Rule_of_Five\">Lipinski Rule of Five</a> for\n<a href=\"http://en.wikipedia.org/wiki/Farnesol\">farnesol</a> is, in reply to the <a href=\"http://pele.farmbio.uu.se/xmpp-services/index.php/Services\">matching XMPP cloud service</a>.\nThanx to <a href=\"http://dbpedia.org/\">DBPedia</a> for providing a machine readable form of the wikipedia entry:</p>\n\n<p>Here’s the solution (yes, suboptimal, but since we were hacking on XMPP support in\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>) which shows the structure in JChemPaint and <a href=\"http://www.jmol.org/\">Jmol</a>\nas bonus (gist:<a href=\"http://gist.github.com/107507\">107507</a>):</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">// Today, Johannes challenged me to use Bioclipse and XMPP to calculate the Lipinski Rule of Five for</span>\n<span class=\"c1\">// http://en.wikipedia.org/wiki/Farnesol</span>\n<span class=\"nx\">query</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">Farnesol</span><span class=\"dl\">\"</span>\n\n<span class=\"c1\">// Zero: clear the console</span>\n<span class=\"nx\">js</span><span class=\"p\">.</span><span class=\"nf\">clear</span><span class=\"p\">();</span>\n<span class=\"nx\">js</span><span class=\"p\">.</span><span class=\"nf\">print</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">Query: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">query</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"se\">\\n</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n\n<span class=\"c1\">// One: connect to the XMPP hive, and make contact with the CDK descriptor service here in Uppsala</span>\n<span class=\"nx\">xmpp</span><span class=\"p\">.</span><span 
class=\"nf\">connect</span><span class=\"p\">();</span>\n<span class=\"kd\">var</span> <span class=\"nx\">service</span> <span class=\"o\">=</span> <span class=\"nx\">xmpp</span><span class=\"p\">.</span><span class=\"nf\">getService</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">descriptor.ws1.bmc.uu.se</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">discoverSync</span><span class=\"p\">(</span><span class=\"mi\">5000</span><span class=\"p\">);</span>\n<span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getFunctions</span><span class=\"p\">();</span>\n<span class=\"kd\">var</span> <span class=\"nx\">func</span> <span class=\"o\">=</span> <span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getFunction</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">LipinskiRuleOfFive</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n\n<span class=\"c1\">// Two: take advantage of RDF, DBPedia</span>\n<span class=\"nx\">store</span> <span class=\"o\">=</span> <span class=\"nx\">rdf</span><span class=\"p\">.</span><span class=\"nf\">createStore</span><span class=\"p\">()</span>\n<span class=\"nx\">rdf</span><span class=\"p\">.</span><span class=\"nf\">importURL</span><span class=\"p\">(</span><span class=\"nx\">store</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">http://dbpedia.org/data/</span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">query</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"s2\">.rdf</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n<span class=\"nx\">rdf</span><span class=\"p\">.</span><span class=\"nf\">importURL</span><span class=\"p\">(</span><span class=\"nx\">store</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span 
class=\"s2\">http://dbpedia.org/data/</span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">query</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"s2\">/section1/Chembox_Identifiers.rdf</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n\n<span class=\"c1\">// Three: run the SPARQL query and extract the SMILES from the List&lt;List&lt;String&gt;&gt;, and remove</span>\n<span class=\"c1\">// the '@en' suffix</span>\n<span class=\"kd\">var</span> <span class=\"nx\">sparql</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">PREFIX dbprop: &lt;http://dbpedia.org/property/&gt; SELECT ?o WHERE { ?s dbprop:smiles ?o }</span><span class=\"dl\">\"</span>\n<span class=\"nx\">smiles</span> <span class=\"o\">=</span> <span class=\"nx\">rdf</span><span class=\"p\">.</span><span class=\"nf\">sparql</span><span class=\"p\">(</span><span class=\"nx\">store</span><span class=\"p\">,</span> <span class=\"nx\">sparql</span><span class=\"p\">).</span><span class=\"nf\">get</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">).</span><span class=\"nf\">get</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">)</span>\n<span class=\"nx\">smiles</span> <span class=\"o\">=</span> <span class=\"nx\">smiles</span><span class=\"p\">.</span><span class=\"nf\">substring</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"nx\">smiles</span><span class=\"p\">.</span><span class=\"nf\">length</span><span class=\"p\">()</span><span class=\"o\">-</span><span class=\"mi\">3</span><span class=\"p\">)</span>\n\n<span class=\"c1\">// Four: create a CML document</span>\n<span class=\"nx\">propane</span> <span class=\"o\">=</span> <span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">fromSMILES</span><span class=\"p\">(</span><span class=\"nx\">smiles</span><span class=\"p\">);</span>\n<span 
class=\"nx\">js</span><span class=\"p\">.</span><span class=\"nf\">print</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">Molecule SMILES: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">smiles</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"se\">\\n</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n\n<span class=\"c1\">// Five: call the function</span>\n<span class=\"nx\">result</span> <span class=\"o\">=</span> <span class=\"nx\">func</span><span class=\"p\">.</span><span class=\"nf\">invokeSync</span><span class=\"p\">(</span><span class=\"nx\">propane</span><span class=\"p\">.</span><span class=\"nf\">getCML</span><span class=\"p\">(),</span> <span class=\"mi\">900000</span><span class=\"p\">);</span>\n<span class=\"nx\">cmlReturned</span> <span class=\"o\">=</span> <span class=\"nx\">xmpp</span><span class=\"p\">.</span><span class=\"nf\">toString</span><span class=\"p\">(</span><span class=\"nx\">result</span><span class=\"p\">);</span>\n\n<span class=\"c1\">// Six: tune the CML so that the Bioclipse CML reader is happy</span>\n<span class=\"nx\">cmlReturned</span> <span class=\"o\">=</span> <span class=\"nx\">cmlReturned</span><span class=\"p\">.</span><span class=\"nf\">replace</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">xsd:int</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">xsd:integer</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n\n<span class=\"c1\">// Seven: extract the Lipinski Rule of Five score</span>\n<span class=\"nx\">propertyList</span> <span class=\"o\">=</span> <span class=\"nx\">cml</span><span class=\"p\">.</span><span class=\"nf\">fromString</span><span class=\"p\">(</span><span class=\"nx\">cmlReturned</span><span class=\"p\">);</span>\n<span class=\"nx\">value</span> <span class=\"o\">=</span> <span 
class=\"nx\">propertyList</span><span class=\"p\">.</span><span class=\"nf\">getPropertyElements</span><span class=\"p\">().</span><span class=\"nf\">get</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">).</span>\n  <span class=\"nf\">getScalarElements</span><span class=\"p\">().</span><span class=\"nf\">get</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">).</span><span class=\"nf\">getValue</span><span class=\"p\">()</span>\n<span class=\"nx\">js</span><span class=\"p\">.</span><span class=\"nf\">print</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">Lipinski Rule of Five: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">value</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"se\">\\n</span><span class=\"dl\">\"</span><span class=\"p\">)</span>\n\n<span class=\"c1\">// Eight: while at it, let's create a 2D and open in JChemPaint</span>\n<span class=\"nx\">service</span> <span class=\"o\">=</span> <span class=\"nx\">xmpp</span><span class=\"p\">.</span><span class=\"nf\">getService</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">cdk.ws1.bmc.uu.se</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">discoverSync</span><span class=\"p\">(</span><span class=\"mi\">5000</span><span class=\"p\">);</span>\n<span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getFunctions</span><span class=\"p\">();</span>\n<span class=\"nx\">func</span> <span class=\"o\">=</span> <span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getFunction</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">generate2Dcoordinates</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">mol</span> <span class=\"o\">=</span> <span 
class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">fromSMILES</span><span class=\"p\">(</span><span class=\"nx\">smiles</span><span class=\"p\">)</span>\n<span class=\"nx\">result</span> <span class=\"o\">=</span> <span class=\"nx\">func</span><span class=\"p\">.</span><span class=\"nf\">invokeSync</span><span class=\"p\">(</span><span class=\"nx\">mol</span><span class=\"p\">.</span><span class=\"nf\">getCML</span><span class=\"p\">(),</span> <span class=\"mi\">900000</span><span class=\"p\">);</span>\n<span class=\"nx\">cmlReturned</span> <span class=\"o\">=</span> <span class=\"nx\">xmpp</span><span class=\"p\">.</span><span class=\"nf\">toString</span><span class=\"p\">(</span><span class=\"nx\">result</span><span class=\"p\">);</span>\n<span class=\"nx\">mol2d</span> <span class=\"o\">=</span> <span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">fromCml</span><span class=\"p\">(</span><span class=\"nx\">cmlReturned</span><span class=\"p\">);</span>\n<span class=\"nx\">ui</span><span class=\"p\">.</span><span class=\"nf\">open</span><span class=\"p\">(</span><span class=\"nx\">mol2d</span><span class=\"p\">)</span>\n\n<span class=\"c1\">// Nine: oh, and a 3D model in Jmol</span>\n<span class=\"nx\">func</span> <span class=\"o\">=</span> <span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getFunction</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">addExplicitHydrogens</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">result</span> <span class=\"o\">=</span> <span class=\"nx\">func</span><span class=\"p\">.</span><span class=\"nf\">invokeSync</span><span class=\"p\">(</span><span class=\"nx\">mol</span><span class=\"p\">.</span><span class=\"nf\">getCML</span><span class=\"p\">(),</span> <span class=\"mi\">900000</span><span class=\"p\">);</span>\n<span class=\"nx\">mol</span> <span class=\"o\">=</span> <span class=\"nx\">cdk</span><span 
class=\"p\">.</span><span class=\"nf\">fromCml</span><span class=\"p\">(</span><span class=\"nx\">xmpp</span><span class=\"p\">.</span><span class=\"nf\">toString</span><span class=\"p\">(</span><span class=\"nx\">result</span><span class=\"p\">));</span>\n<span class=\"nx\">func</span> <span class=\"o\">=</span> <span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getFunction</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">generate3Dcoordinates</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">result</span> <span class=\"o\">=</span> <span class=\"nx\">func</span><span class=\"p\">.</span><span class=\"nf\">invokeSync</span><span class=\"p\">(</span><span class=\"nx\">mol</span><span class=\"p\">.</span><span class=\"nf\">getCML</span><span class=\"p\">(),</span> <span class=\"mi\">900000</span><span class=\"p\">);</span>\n<span class=\"nx\">mol3d</span> <span class=\"o\">=</span> <span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">fromCml</span><span class=\"p\">(</span><span class=\"nx\">xmpp</span><span class=\"p\">.</span><span class=\"nf\">toString</span><span class=\"p\">(</span><span class=\"nx\">result</span><span class=\"p\">));</span>\n<span class=\"nx\">file</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">/Virtual/foo.cml</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"nx\">ui</span><span class=\"p\">.</span><span class=\"nf\">remove</span><span class=\"p\">(</span><span class=\"nx\">file</span><span class=\"p\">)</span>\n<span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">saveCML</span><span class=\"p\">(</span><span class=\"nx\">mol3d</span><span class=\"p\">,</span> <span class=\"nx\">file</span><span class=\"p\">);</span>\n<span class=\"nx\">ui</span><span class=\"p\">.</span><span class=\"nf\">open</span><span class=\"p\">(</span><span 
class=\"nx\">file</span><span class=\"p\">)</span>\n</code></pre></div></div>",
      "summary": "Johannes asked me what the Lipinski Rule of Five for farnesol is, in reply to the matching XMPP cloud service. Thanx to DBPedia for providing a machine readable form of the wikipedia entry:",
      
      "date_published": "2009-05-07T00:00:00+00:00",
      "date_modified": "2009-05-07T00:00:00+00:00",
      "tags": ["bioclipse","rdf","xmpp"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/atb3h-1x106",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/05/04/thesis-and-copyright-transfer.html",
      "title": "Thesis and copyright transfer",
      "content_html": "<p><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html\">My thesis <i class=\"fa-solid fa-recycle fa-xs\"></i></a> was <em>released</em> slightly over a year ago in print form. The electronic form\nhas not yet been released, because of social and legal barriers. Like many before me, I made the <em>mistake</em> of publishing in journals that require me to reassign copyright. Combine that\nwith the custom to publish papers in your thesis as is, which reduces the work for the manuscript committee, as they know which chapters have undergone\npeer review in the form in which they appear in the thesis. (The layout is different, as I integrated them into the thesis design.)</p>\n\n<p><img src=\"/assets/images/thesisCover.png\" alt=\"\" /></p>\n\n<p>So, my thesis has <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html\">six chapters <i class=\"fa-solid fa-recycle fa-xs\"></i></a> which I cannot redistribute. I am still happy to have\npublished in the <a href=\"http://pubs.acs.org/journal/jcisd8\">Journal of Chemical Information and Modeling</a> (formerly <em>JCICS</em>), and still review papers for that journal; it\nstill is the main forum in cheminformatics, but was recently <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/03/22/journal-of-cheminformatics-i-hope.html\">challenged <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>I have come to the conclusion that copyright transfer will not be my author choice in the future, and the ACS AuthorChoice still seems to require copyright transfer,\nif I do not misunderstand the legal talk.</p>\n\n<p>So, if you would like a copy of those six chapters, send me an email requesting a print copy of my thesis from my defense (I still have a few around). 
(Alternatively,\ndonate to my project to make these chapters AuthorChoice, but the ACS charges quite some money, and Google Adsense is not compensating for that quite yet.)</p>",
      "summary": "My thesis was released slightly over a year ago in print form. The electronic form has not yet been released, because of social and legal barriers. Like many before me, I made the mistake of publishing in journals that require me to reassign copyright. Combine that with the custom to publish papers in your thesis as is, which reduces the work for the manuscript committee, as they know which chapters have undergone peer review in the form in which they appear in the thesis. (The layout is different, as I integrated them into the thesis design.)",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/thesisCover.png",
      "date_published": "2009-05-04T00:00:00+00:00",
      "date_modified": "2026-03-20T00:00:00+00:00",
      "tags": ["career"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/0cybr-c0g39",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/30/new-friendfeed-layout-but-there-is-fix.html",
      "title": "New FriendFeed layout, but there is a fix...",
      "content_html": "<p><a href=\"http://friendfeed.com/\">FriendFeed</a> is the missing link between social [bookmarking|news|…] and <a href=\"http://en.wikipedia.org/wiki/Internet_Relay_Chat\">IRC</a>\n(#cdk on <a href=\"http://www.freenode.net/\">irc.freenode.net</a>); I quite like it. Anyway, as of today, they have a new layout, and that I do not like. No more\nicons for feed types, and big avatar photos. Really, I <em>know</em> what my fellow bloggers look like (even met many of them\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/08/31/science-blogging-2008-london-was-cool.html\">in London last year <i class=\"fa-solid fa-recycle fa-xs\"></i></a>). The rest of\nthe layout is a bit too colourful for my taste.</p>\n\n<p>Fortunately, <a href=\"http://nsaunders.wordpress.com/\">Neil</a> posted <a href=\"http://friendfeed.com/neilfws/3c83a2e5/my-beta-ff-greasemonkey-suite-is-complete\">three very useful</a>\n<a href=\"http://en.wikipedia.org/wiki/Greasemonkey\">GreaseMonkey</a> scripts to clean things up (I use those to link science databases and resources anyway, see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/12/21/christmas-presents.html\">Christmas Presents <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\nDOI:<a href=\"http://dx.doi.org/10.1186/1471-2105-8-487\">10.1186/1471-2105-8-487</a>): <a href=\"http://userscripts.org/scripts/show/46187#\">FriendFeed Service Icons</a>,\n<a href=\"http://userstyles.org/styles/16747\">Cleaner FriendFeed</a>, and <a href=\"http://userstyles.org/styles/16763\">Remove avatars from Friendfeed beta</a>.\nThe last may require changing the script’s target websites to point to the real site rather than the beta server. However, by the time you read this,\nthe script may already be updated.</p>\n\n<p>After installing these, my FriendFeed page looks better again:</p>\n\n<p><img src=\"/assets/images/cleanFF.png\" alt=\"\" /></p>",
      "summary": "FriendFeed is the missing link between social [bookmarking|news|…] and IRC (#cdk on irc.freenode.net); I quite like it. Anyway, as of today, they have a new layout, and that I do not like. No more icons for feed types, and big avatar photos. Really, I know what my fellow bloggers look like (even met many of them in London last year). The rest of the layout is a bit too colourful for my taste.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cleanFF.png",
      "date_published": "2009-04-30T00:00:00+00:00",
      "date_modified": "2026-03-19T00:00:00+00:00",
      "tags": ["friendfeed","javascript"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-487", "doi": "10.1186/1471-2105-8-487"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xwv7k-kg163",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/29/things-to-do.html",
      "title": "Things to do...",
      "content_html": "<p>I know I am lagging behind things… been busy and did not have time to reply to everyone yet. Some TOREPLY’s go back more then a month. Sorry about that!</p>\n\n<p>Some of the things on my TODO list (in random order): <em>Bioclipse2 bug fixes, CDK patch reviewing (e.g. vflib), look at the Jmol-CDK bridge and bring it into\naction in Bioclipse2, RDF for PubChem, convert the Woordenboek Organische Chemie data into in RDF, RDF for NMRShiftDB, align with ChemAxiom, publish about\nthe Bioclipse2 RDF feature, finish the MetWare paper, write a metabolomics feature for Bioclipse2, finish the pKa prediction in the CDK, write 100% coverage\nCDK 2 CML 2 CDK, implement atom parity stereochemistry from SMILES and/or MDL molfiles, use supervised SOMs in QSAR, user supervised SOMs in\nproteochemometrics, study variable influence on supervised SOM models, make my thesis Open in the Radboud University library repository (excluding the\npapers I no longer have copyright on), update the QSAR and algorithm ontologies in OWL, create a web page with life ONS solubility RDF, create an ONS\nsolubility Bioclipse2 feature, study the CDK fingerprint performance compared to the new PubChem fingerprint, make Chemical blogspace aware of the ChemSpider\nwidget, interest people for an unconference in Stockholm or Uppsala, move house, learn to Swedish, get a driver license, implement a memory more-efficient\nCDK interfaces implementation, promote XMPP services which are better than SOAP, and write more papers, work on CMLSpect for metabolomics, finish the CDK\nbook, finally get a grant application approved, read up with literature and summarize in blog, port the Jmol UFF force field code to the CDK, analyze atom\ntyping in the CDK against PubChem and StarLite, compile strigi-chemistry again KDE 4.2.2, finish egonw.github.com homepage, …</em> you know the regular list\nof things to do.</p>\n\n<p>If you happen to be a masters student interested in doing a 
internship/practical here in Uppsala (unpaid, but you will learn so much), just email me.</p>",
      "summary": "I know I am lagging behind things… been busy and did not have time to reply to everyone yet. Some TOREPLY’s go back more then a month. Sorry about that!",
      
      "date_published": "2009-04-29T00:00:00+00:00",
      "date_modified": "2009-04-29T00:00:00+00:00",
      "tags": ["career"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j5hd2-45306",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/24/cdk-workshop-2009-3.html",
      "title": "CDK Workshop 2009 #3",
      "content_html": "<p>Last of my writing on the <a href=\"http://apps.sourceforge.net/mediawiki/cdk/index.php?title=CDK_Workshop_2009\">CDK Workshop</a>. It was great fun meeting all the CDK developers\nand users, and thanx to everyone for all that they contributed, in particular during the unconference part! Yesterday, I had a travel day, and slept 12 hours in one\ngo last nite. This leaves me with a long list of follow emails, CDK patches and many other things to catch up with. But it was more than worth it.</p>\n\n<p>In the next months, we will say which conversations during the workshop will lead to fruitful collaborations and new CDK contributions. I already have a patch around\nfor <code class=\"language-plaintext highlighter-rouge\">@cdk.threadsafe</code> and <code class=\"language-plaintext highlighter-rouge\">@cdk.threadnonsafe</code> in reply to the <a href=\"http://apps.sourceforge.net/mediawiki/cdk/index.php?title=Threading_group\">Threading</a>\nsession at the unconference, which I’ll ask Rajarshi to review.</p>\n\n<p>Earlier in this series:</p>\n\n<ul>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/04/cdk-workshop-2009-2.html\">CDK Workshop 2009 #2</a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/04/cdk-workshop-2009-1.html\">CDK Workshop 2009 #1</a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/04/my-cdk-workshop-2009-course-material-2.html\">My CDK Workshop 2009 Course Material #2</a></li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/04/my-cdk-workshop-2009-course-material.html\">My CDK Workshop 2009 Course Material</a></li>\n</ul>",
      "summary": "Last of my writing on the CDK Workshop. It was great fun meeting all the CDK developers and users, and thanx to everyone for all that they contributed, in particular during the unconference part! Yesterday, I had a travel day, and slept 12 hours in one go last nite. This leaves me with a long list of follow emails, CDK patches and many other things to catch up with. But it was more than worth it.",
      
      "date_published": "2009-04-24T00:00:00+00:00",
      "date_modified": "2009-04-24T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jp5dh-gk632",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/22/my-cdk-workshop-2009-course-material-2.html",
      "title": "My CDK Workshop 2009 Course Material #2",
      "content_html": "<p>I wrote about my course material, and now complement that with the (three) slides:</p>\n\n<iframe class=\"scribd_iframe_embed\" title=\"CDK Workshop 2009 Slides\" src=\"https://www.scribd.com/embeds/14528398/content?start_page=1&amp;view_mode=scroll&amp;access_key=key-aarsvd4ijbwdndi2xr9\" tabindex=\"0\" data-auto-height=\"true\" data-aspect-ratio=\"1.331360946745562\" scrolling=\"no\" width=\"100%\" height=\"600\" frameborder=\"0\"></iframe>\n<p style=\"margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-size: 14px; line-height: normal; display: block;\">\n  <a title=\"View CDK Workshop 2009 Slides on Scribd\" href=\"https://www.scribd.com/doc/14528398/CDK-Workshop-2009-Slides#from_embed\" style=\"color: #098642; text-decoration: underline;\"> CDK Workshop 2009 Slides </a> by\n  <a title=\"View Egon Willighagen's profile on Scribd\" href=\"https://www.scribd.com/user/10744841/Egon-Willighagen#from_embed\" style=\"color: #098642; text-decoration: underline;\"> Egon Willighagen </a>\n</p>",
      "summary": "I wrote about my course material, and now complement that with the (three) slides:",
      
      "date_published": "2009-04-22T00:10:00+00:00",
      "date_modified": "2009-04-22T00:10:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/77t7v-1ed64",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/22/cdk-workshop-2009-2.html",
      "title": "CDK Workshop 2009 #2",
      "content_html": "<p>The second CDK Workshop day started off with application presentations, which are well covered by Chris’ blog posts:</p>\n\n<ul>\n  <li><a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/04/21/john-van-dries-talk-on-cdk-in-virtual-drug-discovery/\">John van Drie’s talk on CDK in Virtual Drug Discovery</a>,</li>\n  <li><a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/04/21/asad-rahman-on-small-molecules-and-reaction-mechanism-rewiring-the-enzyme-space/\">Asad Rahman on Small molecules and reaction mechanism - Rewiring the enzyme space</a>, and</li>\n  <li><a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/04/21/oliver-karch-about-molwind-using-cdk-to-visualize-molecule-spaces-in-a-geospatial-context/\">Oliver Karch about Molwind - Using CDK to Visualize Molecule Spaces in a Geospatial Context</a>.</li>\n</ul>\n\n<p>You can also find <a href=\"http://search.twitter.com/search?q=%23cdkws2009\">coverage on Twitter</a> by\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/\">Jim</a> and <a href=\"http://wwmm.ch.cam.ac.uk/blogs/adams/\">Nico</a>.</p>\n\n<p>The afternoon we had a more development-oriented <a href=\"http://en.wikipedia.org/wiki/Unconference\">unconference</a>. 
It was a bit exciting for me and\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Chris</a> as previous such sessions at CDK workshops involved 5 to 10 participants instead of some\n20+ now, but things nicely self-organized into 5 sessions:</p>\n\n<p><img src=\"/assets/images/unconf1.jpg\" alt=\"\" /></p>\n\n<p><img src=\"/assets/images/unconf2.jpg\" alt=\"\" /></p>\n\n<p><img src=\"/assets/images/unconf3.jpg\" alt=\"\" /></p>\n\n<p>In words, and linking to the coverage in the CDK wiki:</p>\n\n<ul>\n  <li><a href=\"http://apps.sourceforge.net/mediawiki/cdk/index.php?title=JChemPaint/Rendering\">JChemPaint/Rendering</a></li>\n  <li><a href=\"http://apps.sourceforge.net/mediawiki/cdk/index.php?title=Mining_ChEMBL_with_CDK_pharmacophore_stuff\">Mining ChEMBL with CDK pharmacophore stuff</a></li>\n  <li><a href=\"http://apps.sourceforge.net/mediawiki/cdk/index.php?title=Threading_group\">Threading group</a></li>\n  <li><a href=\"http://apps.sourceforge.net/mediawiki/cdk/index.php?title=Top_10_improvements_%28CDK_WS_2009_unconference%29\">Top 10 improvements (CDK WS 2009 unconference)</a></li>\n  <li><a href=\"http://apps.sourceforge.net/mediawiki/cdk/index.php?title=Clojure\">Clojure</a> (see also Jim’s blog <a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/?p=310\">2D Molecule Diagrams in Clojure</a>)</li>\n</ul>",
      "summary": "The second CDK Workshop day started off with application presentations, which are well covered by Chris’ blog posts:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/unconf2.jpg",
      "date_published": "2009-04-22T00:00:00+00:00",
      "date_modified": "2009-04-22T00:00:00+00:00",
      "tags": ["cdk","jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2k1q0-9cz22",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/21/cdk-workshop-2009-1.html",
      "title": "CDK Workshop 2009 #1",
      "content_html": "<p>Coverage on <a href=\"http://pele.farmbio.uu.se/planetcdk/\">Planet CDK</a>, Twitter <a href=\"http://search.twitter.com/search?q=%23cdkws2009\">#cdkws2009</a>,\nFriend Feed’s <a href=\"http://friendfeed.com/rooms/chemistry-development-kit\">CDK room</a>, and on this Wiki page.\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/04/20/egons-introductory-talk-about-getting-started-with-cdk/\">Chris</a>\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/04/20/mark-rijnbeeks-talk-about-cdk-and-databases/\">blogged</a>\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/04/20/ola-spjuth-talks-about-accessing-and-scripting-cdk-from-bioclipse/\">too</a>.</p>\n\n<p><img src=\"/assets/images/DSCI0007.JPG\" alt=\"\" /></p>",
      "summary": "Coverage on Planet CDK, Twitter #cdkws2009, Friend Feed’s CDK room, and on this Wiki page. Chris blogged too.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/DSCI0007.JPG",
      "date_published": "2009-04-21T00:00:00+00:00",
      "date_modified": "2009-04-21T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vhhja-zz453",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/20/my-cdk-workshop-2009-course-material.html",
      "title": "My CDK Workshop 2009 Course Material",
      "content_html": "<p>My CDK Workshop 2009 Course Material:</p>\n\n<iframe class=\"scribd_iframe_embed\" title=\"CDK Workshop 2009 Intro Course Material\" src=\"https://www.scribd.com/embeds/14446588/content?start_page=1&amp;view_mode=scroll&amp;access_key=key-2gvdjm4h4ug0blzqu3bk\" tabindex=\"0\" data-auto-height=\"true\" data-aspect-ratio=\"0.7080062794348508\" scrolling=\"no\" width=\"100%\" height=\"600\" frameborder=\"0\"></iframe>\n<p style=\"margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-size: 14px; line-height: normal; display: block;\">\n  <a title=\"View CDK Workshop 2009 Intro Course Material on Scribd\" href=\"https://www.scribd.com/document/14446588/CDK-Workshop-2009-Intro-Course-Material#from_embed\" style=\"color: #098642; text-decoration: underline;\"> CDK Workshop 2009 Intro Course Material </a> by\n  <a title=\"View Egon Willighagen's profile on Scribd\" href=\"https://www.scribd.com/user/10744841/Egon-Willighagen#from_embed\" style=\"color: #098642; text-decoration: underline;\"> Egon Willighagen </a>\n</p>",
      "summary": "My CDK Workshop 2009 Course Material:",
      
      "date_published": "2009-04-20T00:00:00+00:00",
      "date_modified": "2009-04-20T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wtjzh-frf54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/17/downloading-domoic-acid-from-pubchem.html",
      "title": "Downloading Domoic Acid from PubChem",
      "content_html": "<p>The identity of <a href=\"http://en.wikipedia.org/wiki/Domoic_acid\">domoic acid</a> has been under discussion (see\n<a href=\"http://www.chemspider.com/blog/the-plot-thickens-on-domoic-acid.html\">here</a>, <a href=\"http://www.chemspider.com/blog/where-does-ce-news-source-its-chemical-structures.html\">here</a>\nand <a href=\"http://www.chemspider.com/blog/providing-some-structured-support-with-chemspiders-wikipedia-services.html\">here</a>).\n(And I very much like the <a href=\"http://www.chemspider.com/\">ChemSpider</a> service to make it easy to\n<a href=\"http://www.chemspider.com/blog/providing-some-structured-support-with-chemspiders-wikipedia-services.html\">copy data from ChemSpider into WikiPedia ChemBoxes</a>;\ncheers!)</p>\n\n<p>Now, my practical in next weeks <a href=\"https://apps.sourceforge.net/mediawiki/cdk/index.php?title=CDK_Workshop_2009\">CDK Workshop will</a> use\n<a href=\"http://groovy.codehaus.org/\">Groovy</a> (please install it on your laptop!), and am hacking up example scripts for the course material,\nand came up with this script to download the structure of <a href=\"http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=5282253\">domoic acid</a>\nfrom <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> (CID:5282253):</p>\n\n<script src=\"https://gist.github.com/97067.js\"></script>",
      "summary": "The identity of domoic acid has been under discussion (see here, here and here). (And I very much like the ChemSpider service to make it easy to copy data from ChemSpider into WikiPedia ChemBoxes; cheers!)",
      
      "date_published": "2009-04-17T00:10:00+00:00",
      "date_modified": "2009-04-17T00:10:00+00:00",
      "tags": ["cdk","pubchem","chemspider","wikipedia"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zpbzd-vh907",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/17/cdk-121-released.html",
      "title": "CDK 1.2.1 Released",
      "content_html": "<p>I just released <a href=\"http://cdk.sf.net/\">CDK</a> 1.2.1 (aka <em>The CDK Workshop 2009 Release</em>), which is now available for\n<a href=\"https://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=35118&amp;release_id=676514\">download from SourceForge</a>.\nThe source can be found in <a href=\"http://cdk.git.sourceforge.net/git/gitweb.cgi?p=cdk;a=log;h=130689aea6f054a55ddf45814f42d23d5ca79d78\">our Git repository</a>.\nThe changes since 1.2.0 are mostly bug fixing, new unit tests, and minor clean up here and there:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Fixed bug 2714283, which properly throws an exception when rings are not closed properly. If a ring is not closed with the appropriate ring number, InvalidSmilesException is thrown. Matches Daylight behavior\nFixed bug 2729120 and added unit test\nUpdated comment to fix bug 2768643.\nPartial fix for bug 2719237. Made getBondOrderSum static, added unit test for it\nTypo: proteinl -&gt; protein\nMade class public, to unbreak adding it to the build/*.javafiles\nPartially fixed SMARTS matching for R0. Updated target molecule initialization to explicitly indicate atoms not in a ring and also updated RingMembership atom to do an explicit check when R0 is specified. Partially fixes bug 2587204\nFixed dubious equality test. A private method was checking Double objects via reference. Worked fine when they were null. Fails when we need to compare by value. Code is updated to take it into account. Added unit test (and made the method protected so that it can be tested)\nAdded test method annotation. Completes coverage for data module\nRefactored ChiIndexUtils to make it package private. Cleans up public API, since it is only used by chi descriptor code. Updated all dependent classes. Moved test code (which needs to be filled in!) as well\nCode cleanup of ChiIndexUtils. 
Converted to 1.5 idioms\nClean up of PathTools and added test method annotation, so that core is completely covered\nFixed the previous commit to edit the cdk.keyword line, not the cdk.module line\nMore consistent keywords used\nAdded a test to ensure that Integer objects are compared by value rather than reference\nAdded a test case to check that atom container diffs are correct when using deserialized objects\nFixed IntegerDifference so that it actually checks the integer value rather than references of the Integer object. Fixes the problem whereby an object serialized to disk and then deserialized does not match the original object (i.e., non empty diff string)\nApplied patch #2675819 (Stefan Kuhn): Patch to add a removeReaction to reactionSet\nUse interface instead of implementation\nRemoved an unused import\nUse IAtomContainer instead of IMolecule, as the actual matching is using IAtomContainers already (fixes #2686249)\nFixed a ClassCastException (fixes #2685134)\nAdded source attrib to fix building the Ubuntu .deb\nFixed Help build system: use doclet jars in develjar/; updated for new src folder src/main; removed very outdated use of rt.jar\nRemoved libdepends include for test-ioformats, which does not actually have libdepends\nUpdated so that if a target atom has no symbol (such as pseudo atoms) the match returns false (rather than an NPE)\nFixed proper handling of #n SMARTS querys\nAdded test case for bug 2686473\nAdded note on Ant 1.7.1 required\nFixed a NPE source: 'null == 2' causes an exception, so first test for nullness\nFixed copyright notice for 2009\nFixed duplicate storage of layout templates, which only belong in the sdg module, not extra module too\nMerge branch 'local1.2' of ../../git-svn/cdk\n</code></pre></div></div>\n\n<p>Thanx for all who reported bug reports!</p>",
      "summary": "I just released CDK 1.2.1 (aka The CDK Workshop 2009 Release), which is now available for download from SourceForge. The source can be found in our Git repository. The changes since 1.2.0 are mostly bug fixing, new unit tests, and minor clean up here and there:",
      
      "date_published": "2009-04-17T00:00:00+00:00",
      "date_modified": "2009-04-17T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gt05b-5mq25",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/15/bioclipse2-scripting-3-xlogp.html",
      "title": "Bioclipse2 Scripting #3: XLogP calculatation using a XMPP CDK cloud service",
      "content_html": "<p>In preparation of the <a href=\"https://apps.sourceforge.net/mediawiki/cdk/index.php?title=CDK_Workshop_2009\">CDK workshop</a> next week, here is a small\n<a href=\"http://www.bioclipse.net/\">Bioclipse2</a> script to calculate the XLogP value for a given <a href=\"http://www.opensmiles.org/\">SMILES</a>,\nusing the a CDK-based <a href=\"http://chem-bla-ics.blogspot.com/search?q=xmpp\">XMPP service</a>:</p>\n\n<script src=\"https://gist.github.com/egonw/95987.js\"></script>\n\n<p>Earlier in this series:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/25/bioclipse2-scripting-1-from-smiles-to.html\">Bioclipse2 Scripting #1: from SMILES to a UFF optimized structure in Jmol <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/21/bioclipse2-scripting-2-searching.html\">Bioclipse2 Scripting #2: searching PubChem <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>",
      "summary": "In preparation of the CDK workshop next week, here is a small Bioclipse2 script to calculate the XLogP value for a given SMILES, using the a CDK-based XMPP service:",
      
      "date_published": "2009-04-15T00:10:00+00:00",
      "date_modified": "2026-03-20T00:00:00+00:00",
      "tags": ["xmpp","cdk","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/p57j8-4wd14",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/15/multiple-inheritence-for-content-types.html",
      "title": "Multiple inheritence for content types?",
      "content_html": "<p><a href=\"http://www.bioclipse.net/\">Bioclipse</a> is an environment for handling and processing life sciences data. This data is present\nin files with a wide variety of formats, each of which can contain a particular data type. For example, a we can have a single\nmolecule in <em>MDL molfile</em> and in <em><a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a></em>.</p>\n\n<p>The latter is particularly interesting, as I do not know how to work that out… Firstly, I want the <em>CML (Single Molecule)</em>\ncontent type extend the CML content type, so that a <a href=\"http://chemicalrcp.blogspot.com/2009/01/editing-and-validation-of-cml-documents.html\">validating CML editor</a>\ncan open it with the proper schema, but at the same time I would like to extend it a content type representation a\n<em>Single Molecule</em>. Hence, the multiple inheritance.</p>\n\n<p>This is what the <a href=\"http://bioclipse.svn.sourceforge.net/viewvc/bioclipse/bioclipse2/trunk/plugins/net.bioclipse.cml/plugin.xml\">plugin.xml</a> currently looks like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;extension</span>\n  <span class=\"na\">point=</span><span class=\"s\">\"org.eclipse.core.runtime.contentTypes\"</span><span class=\"nt\">&gt;</span>\n\n  <span class=\"nt\">&lt;content-type</span>\n    <span class=\"na\">base-type=</span><span class=\"s\">\"net.bioclipse.contenttypes.cml\"</span>\n    <span class=\"na\">id=</span><span class=\"s\">\"net.bioclipse.contenttypes.cml.singleMolecule2d\"</span>\n    <span class=\"na\">name=</span><span class=\"s\">\"CML (Single 2D Molecule)\"</span>\n    <span class=\"na\">priority=</span><span class=\"s\">\"high\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;describer</span> <span class=\"na\">class=</span><span class=\"s\">\"net.bioclipse.cml.contenttypes.CmlFileDescriber\"</span><span 
class=\"nt\">&gt;</span>\n      <span class=\"nt\">&lt;parameter</span>\n        <span class=\"na\">name=</span><span class=\"s\">\"dimension\"</span>\n        <span class=\"na\">value=</span><span class=\"s\">\"2D\"</span><span class=\"nt\">/&gt;</span>\n      <span class=\"nt\">&lt;parameter</span>\n        <span class=\"na\">name=</span><span class=\"s\">\"cardinality\"</span>\n        <span class=\"na\">value=</span><span class=\"s\">\"single\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;/describer&gt;</span>\n  <span class=\"nt\">&lt;/content-type&gt;</span>\n\n<span class=\"nt\">&lt;/extension&gt;</span>\n</code></pre></div></div>\n\n<p>Very clearly, a single base-type. Is there any option of multiple inheritance?</p>",
      "summary": "Bioclipse is an environment for handling and processing life sciences data. This data is present in files with a wide variety of formats, each of which can contain a particular data type. For example, a we can have a single molecule in MDL molfile and in CML.",
      
      "date_published": "2009-04-15T00:00:00+00:00",
      "date_modified": "2009-04-15T00:00:00+00:00",
      "tags": ["cdk","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rde9j-mn382",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/13/rednael-cdk-git-for-rajarshis-patches.html",
      "title": "Rednael, CDK Git for Rajarshi&apos;s patches, PubChem SDF",
      "content_html": "<p>Short blog item about some <a href=\"http://cdk.git.sourceforge.net/\">CDK Git</a> updates. Could not get sleep, so might as well spend that time on\n<a href=\"http://cdk.sf.net/\">CDK</a> hacking, not? Reason why I actually could not catch sleep was the news that <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>\nSD files are not regular MDL SD files, but use custom extensions, for example, for dative bonds (see\n<a href=\"ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_sdtags.pdf\">this PDF</a>). This <em>surely</em> explains the weird things I have seen,\nbut, unfortunately, the big SDF button on PubChem does not warn about that. Anyway, thanx for Wolfgang for informing about that\ncustomization!</p>\n\n<p>So, instead I hacked a bit on the CDK, which was about time. Last two weeks have been really busy with finding a new house (which we did),\nand writing two big grant applications (about done). Finally time for cleaning up my TOREPLY list on Gmail. I picked the request of\n<a href=\"http://blog.rguha.net/\">Rajarshi</a> to put online some of his patches, which are now available from\n<a href=\"http://pele.farmbio.uu.se/git/rajarshi.git/\">pele.farmbio.uu.se/git/rajarshi.git</a>, where you will\n<a href=\"https://sourceforge.net/tracker/?func=browse&amp;group_id=20024&amp;atid=320024\">find four of his patches ready for review</a>:\n<em>fp2d</em>, <em>pcore</em>, <em>pubchemfp</em> and <em>cleanpt</em>. 
These are <strong>really</strong> interesting patches!</p>\n\n<p>That brings me to the last thing for today: <a href=\"http://github.com/egonw/rednael/tree/master\">Rednael</a>.\n<a href=\"http://en.wikipedia.org/wiki/Zarah_Leander\">Leander</a> (a nickname already reserved, so I used the reverse) is an\n<a href=\"http://en.wikipedia.org/wiki/Internet_Relay_Chat\">IRC</a> bot for the #cdk channel which notifies us of commits to our main Git repository.\nBack in the old SVN days (time goes so fast :), we had the <a href=\"http://cia.vc/\">CIA</a> (Langley?) use their equipment to monitor SVN commits,\nand <a href=\"http://cia.vc/stats/project/cdk\">report those online</a> and on IRC, but Git is too advanced for them, apparently. So, I wrote\nmy own little bot to do it (see earlier link to GitHub). It can monitor multiple channels and report on multiple RSS feeds per\nchannel. Thus, it is actually not restricted to Git commits alone.</p>",
      "summary": "Short blog item about some CDK Git updates. Could not get sleep, so might as well spend that time on CDK hacking, not? Reason why I actually could not catch sleep was the news that PubChem SD files are not regular MDL SD files, but use custom extensions, for example, for dative bonds (see this PDF). This surely explains the weird things I have seen, but, unfortunately, the big SDF button on PubChem does not warn about that. Anyway, thanx for Wolfgang for informing about that customization!",
      
      "date_published": "2009-04-13T00:00:00+00:00",
      "date_modified": "2009-04-13T00:00:00+00:00",
      "tags": ["cdk","git","pubchem"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qp0bj-ymr70",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/11/groovy-cdk-and-keyword-list.html",
      "title": "Groovy CDK and the Keyword List",
      "content_html": "<p>Today I have been hacking a bit more on the CDK material for the <a href=\"https://apps.sourceforge.net/mediawiki/cdk/index.php?title=CDK_Workshop_2009\">CDK workshop</a>\n(see <a href=\"http://chem-bla-ics.blogspot.com/2009/04/cdk-documentation.html\">CDK - The Documentation</a>). Below are two previews, one with a LaTeX-ified keyword list\n(<a href=\"http://pele.farmbio.uu.se/nightly-1.2.x/keywords.html\">here as HTML</a>):</p>\n\n<iframe class=\"scribd_iframe_embed\" title=\"CDK Keyword List\" src=\"https://www.scribd.com/embeds/14135688/content?start_page=1&amp;view_mode=scroll&amp;access_key=key-2g4d8dgj887ewymskmd6\" tabindex=\"0\" data-auto-height=\"true\" data-aspect-ratio=\"0.7080062794348508\" scrolling=\"no\" width=\"100%\" height=\"600\" frameborder=\"0\"></iframe>\n<p style=\"margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-size: 14px; line-height: normal; display: block;\"> <a title=\"View CDK Keyword List on Scribd\" href=\"https://www.scribd.com/document/14135688/CDK-Keyword-List#from_embed\" style=\"color: #098642; text-decoration: underline;\"> CDK Keyword List </a> by\n  <a title=\"View Egon Willighagen's profile on Scribd\" href=\"https://www.scribd.com/user/10744841/Egon-Willighagen#from_embed\" style=\"color: #098642; text-decoration: underline;\"> Egon Willighagen </a>\n</p>\n\n<p>And here about <a href=\"http://groovy.codehaus.org/\">Groovy</a>, which indeed is groovy:</p>\n\n<iframe class=\"scribd_iframe_embed\" title=\"Groovy CDK\" src=\"https://www.scribd.com/embeds/14135833/content?start_page=1&amp;view_mode=scroll&amp;access_key=key-20tebo1yw1kzhy7slqv1\" tabindex=\"0\" data-auto-height=\"true\" data-aspect-ratio=\"0.7080062794348508\" scrolling=\"no\" width=\"100%\" height=\"600\" frameborder=\"0\"></iframe>\n<p style=\"margin: 12px auto 6px auto; font-family: Helvetica,Arial,Sans-serif; font-size: 14px; line-height: normal; display: block;\"> <a title=\"View Groovy CDK on Scribd\" 
href=\"https://www.scribd.com/document/14135833/Groovy-CDK#from_embed\" style=\"color: #098642; text-decoration: underline;\"> Groovy CDK </a> by\n  <a title=\"View Egon Willighagen's profile on Scribd\" href=\"https://www.scribd.com/user/10744841/Egon-Willighagen#from_embed\" style=\"color: #098642; text-decoration: underline;\"> Egon Willighagen </a>\n</p>",
      "summary": "Today I have been hacking a bit more on the CDK material for the CDK workshop (see CDK - The Documentation). Below are two previews, one with a LaTeX-ified keyword list (here as HTML):",
      
      "date_published": "2009-04-11T00:00:00+00:00",
      "date_modified": "2009-04-11T00:00:00+00:00",
      "tags": ["cdk","cdkbook"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m1are-j7656",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/08/open-knowledge-reproducibility-in.html",
      "title": "&quot;Open Knowledge: Reproducibility in Cheminformatics with Open Data, Open Source and Open Standards&quot;",
      "content_html": "<p>I have submitted today the abstract of my talk at the <a href=\"http://www.gdch.de/vas/tagungen/tg/5580/wissprog/symp.htm\">GDCh-Wissenschaftsforum Chemie 2009</a>\n<a href=\"http://www.dopplr.com/trip/egonw/523957\">in Frankfurt in August</a> as part of the Open Notebook Science/Open Drug Discovery session:</p>\n\n<blockquote>\n  <p>“Open Knowledge: Reproducibility in Cheminformatics with Open Data,\nOpen Source and Open Standards”</p>\n\n  <p>Abstract:\nThe Open paradigms in science have been met with strong criticism.\nNevertheless, support and use of Open models among scientists is\ngrowing. While the Open model is certainly only one approach to doing\nscience, it has a few aspects that make propagation of knowledge more\ntransparent. Indeed, Open Data, Open Source and Open Standards\n(ODOSOS) make it easier to reproduce of knowledge and promote peer\nreview. Various ODOSOS projects will be introduced which improve\nreproducibility in cheminformatics, the underlying science of\nexchanging chemical knowledge. Recent contributions of the Chemistry\nDevelopment Kit, Bioclipse, chemical ontologies and others will be\ndiscussed that add to the repertoire of Open Cheminformatics, and how\nthese contribute to Open Knowledge.</p>\n</blockquote>\n\n<p>The exact details I do not know yet, and likely not before the weekend before the meeting :) But this blog gives a good impression of what you can expect.</p>",
      "summary": "I have submitted today the abstract of my talk at the GDCh-Wissenschaftsforum Chemie 2009 in Frankfurt in August as part of the Open Notebook Science/Open Drug Discovery session:",
      
      "date_published": "2009-04-08T00:00:00+00:00",
      "date_modified": "2009-04-08T00:00:00+00:00",
      "tags": ["odosos","cdk","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g6q22-exr65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/07/nature-jobs-google-map-mashup.html",
      "title": "Nature Jobs - Google Map mashup",
      "content_html": "<p>Nothing much I need to say about the <a href=\"http://www.nature.com/naturejobs/science/map\">NatureJobs Interactive World Map</a>,\nI think. Thanx to <a href=\"http://pimm.wordpress.com/\">Partial Immortalization</a> for the\n<a href=\"http://friendfeed.com/e/7e1c5170-056f-aa14-185d-7a1d2ec6d158/checking-Naturejobs-Interactive-World-Map-http/\">link on FriendFeed</a>!</p>\n\n<p><img src=\"/assets/images/naturejobs.png\" alt=\"\" /></p>",
      "summary": "Nothing much I need to say about the NatureJobs Interactive World Map, I think. Thanx to Partial Immortalization for the link on FriendFeed!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/naturejobs.png",
      "date_published": "2009-04-07T00:00:00+00:00",
      "date_modified": "2009-04-07T00:00:00+00:00",
      "tags": ["science"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rzwcr-mq411",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/04/05/cdk-documentation.html",
      "title": "CDK - The Documentation",
      "content_html": "<p>In preparation for the <a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/02/17/cdk-workshop-at-ebi-on-april-2021/\">CDK workshop</a> later this\nmonth, I am writing up my material for my kick-off presentation of the workshop. So, I better make it good. Using LaTeX at least overcomes my laziness\nwhich always made Word documents look stupid. Even default LaTeX looks good:</p>\n\n<p><img src=\"/assets/images/cdkbook.png\" alt=\"\" /></p>\n\n<p>Clearly, any such documentation quickly becomes outdated, in particular when source code fragments are involved. Yes,\n<a href=\"http://sourceforge.net/mailarchive/forum.php?thread_name=6aeb064b0903130006j2a71a94execd2f09209cd668%40mail.gmail.com&amp;forum_name=cdk-user\">CDK 1.2</a> is\nAPI stable, but only for the core classes. Moreover, I hope that the documentation will survive CDK 1.4 or 2.0 or whatever the next stable version is.</p>\n\n<p>Therefore, I need the source code fragments to be compilable. R has the magnificent <a href=\"http://www.stat.umn.edu/~charlie/Sweave/\">Sweave</a>, and I have wanted something similar for a\nlong time. While I do not have something that powerful yet, at least my current setup allows me to have code that both compiles and\nembeds in the LaTeX. The system allows me to write both Java application code and\n<a href=\"https://en.wikipedia.org/wiki/BeanShell\">BeanShell</a> scripts. No clue yet what I will use in the workshop,\nmaybe even both. Like Sweave, it even saves output, and I can include that in the LaTeX source too. The code fragments can either go in as a verbatim\nsection, or as a listing, depending on what I find more appropriate.</p>",
      "summary": "In preparation for the CDK workshop later this month, I am writing up my material for my kick-off presentation of the workshop. So, I better make it good. Using LaTeX at least overcomes my laziness which always made Word documents look stupid. Even default LaTeX looks good:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkbook.png",
      "date_published": "2009-04-05T00:00:00+00:00",
      "date_modified": "2025-12-09T00:00:00+00:00",
      "tags": ["cdk","cdkbook"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yjwzx-c4d55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/30/starlite-talks-in-uppsala-helenas-open.html",
      "title": "StARlite talks in Uppsala; Helena&apos;s Open Chemogenomics thesis",
      "content_html": "<p><a href=\"http://chembl.blogspot.com/\">John</a> was in <a href=\"http://chembl.blogspot.com/2009/03/hit-sack-pt-viii-first-hotel-linne.html\">Uppsala last Friday</a>, and our group\nhad the pleasure of talking to/with him before he was opponent to Helena defending her thesis on <em>Chemogenomics: Models of Protein-Ligand Interaction Space</em>\n(ISBN:<a href=\"http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-89299\">978-91-554-7430-0</a>). Since we believe we can do tons of really interesting science on John’s\n<a href=\"http://www.ebi.ac.uk/chembl/\">StARlite data</a>, I was excited to talk to him in person. He gave three talks that day, and managed to keep the overlap minimal (yes, not quite an absolute measure, but you get the point). We showed him the efforts of Arvid, Carl, Jonathan and me on converting the StARlite data to RDF, on which I will write shortly.</p>\n\n<p>BTW, Helena’s thesis is partly Open (see Peter’s <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1481\">“should theses be Open?”</a>). Partly, because it has the\nthesis parts which have not been published in journals before. I find this an interesting intermediate solution. Of course, all the really interesting bits\nare missing (the peer-reviewed papers), but at least puts the wrapping material in the Open (introduction, discussion, conclusion). I think I will do the\nsame with my thesis (unless someone funds my papers to become Open Choice).</p>",
      "summary": "John was in Uppsala last Friday, and our group had the pleasure of talking to/with him before he was opponent to Helena defending her thesis on Chemogenomics: Models of Protein-Ligand Interaction Space (ISBN:978-91-554-7430-0). Since we believe we can do tons of really interesting science on John’s StARlite data, I was excited to talk to him in person. He gave three talks that day, and managed to keep the overlap minimal (yes, not quite an absolute measure, but you get the point). We showed him the efforts of Arvid, Carl, Jonathan and me on converting the StARlite data to RDF, on which I will write shortly.",
      
      "date_published": "2009-03-30T00:00:00+00:00",
      "date_modified": "2009-03-30T00:00:00+00:00",
      "tags": ["openscience","chembl"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4q9fe-k6368",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/23/highlighting-console-output-in-eclipse.html",
      "title": "Highlighting Console output in Eclipse with Grep Console",
      "content_html": "<p>I ran into an <a href=\"http://www.eclipse.org/\">Eclipse</a> <a href=\"http://marian.schedenig.name/projects/grep-console/\">Grep Console</a> plugin (EPL license) today that takes\nregular expressions to color output in the Console. Given the amount of output <a href=\"http://www.bioclipse.net/\">Bioclipse</a> and the\n<a href=\"http://cdk.sf.net/\">CDK</a> give when in DEBUG mode, this allows me to highlight those bits I am interested in. For example,\ncomments on the <em>Bioclipse managers</em>:</p>\n\n<p><img src=\"/assets/images/regexpConsole.png\" alt=\"\" /></p>",
      "summary": "I ran into an Eclipse Grep Console plugin (EPL license) today that takes regular expressions to color output in the Console. Given the amount of output Bioclipse and the CDK give when in DEBUG mode, this allows me to highlight those bits I am interested in. For example, comments on the Bioclipse managers:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/regexpConsole.png",
      "date_published": "2009-03-23T00:00:00+00:00",
      "date_modified": "2009-03-23T00:00:00+00:00",
      "tags": ["eclipse","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kzzbq-02828",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/22/journal-of-cheminformatics-i-hope.html",
      "title": "Journal of Cheminformatics: I hope the Instructions to the Authors improve",
      "content_html": "<p>Besides <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/03/19/nature-chemistry-improves-publishing.html\">Nature Chemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, another journal was launched last week (see\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2009/03/17/open-access-journal-of-cheminformatics-now-live/\">here</a> and\n<a href=\"http://blogs.openaccesscentral.com/blogs/ccblog/entry/journal_of_cheminformatics_publishes_launch\">here</a>): the\n<a href=\"http://www.jcheminf.com/\">Journal of Cheminformatics</a>. First of all, congratulations to <a href=\"http://www.steinbeck-molecular.de/steinblog/\">Chris</a>\nand David for their efforts! While the journal only published one research paper yet, it already found\n<a href=\"http://cb.openmolecules.net/journal_search.php?journal_id=Journal%20of%20Cheminformatics\">its place</a> on\n<a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a>. I have two things I want to blog about: <em>data rich publishing</em>, and\n<em>starting the scientific communication</em>.</p>\n\n<h2 id=\"data-rich-publishing\">Data Rich Publishing</h2>\n\n<p>Peter had a <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1326\">detailed blog</a> about why he joined the editorial board:</p>\n\n<blockquote>\n  <p>I take this position with some trepidation as I have grave reservations about the current practice of cheminformatics.\nIt suffers from closed data, closed source and closed standards, and thereby generally poor experimental design, poor\nmetrics and almost always irreproducible results and conclusions which are based on subjective opinions.</p>\n</blockquote>\n\n<p>I strongly agree with this observation, and have discussed my view on this in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html\">my thesis <i class=\"fa-solid fa-recycle fa-xs\"></i></a> (send me an email if you\nwant a copy).</p>\n\n<p>So, what 
has the journal to say about this (see <a href=\"http://www.jcheminf.com/info/instructions/\">Instructions to the Author</a>,\nemphasis mine):</p>\n\n<blockquote>\n  <p>Journal of Cheminformatics recommends, <strong>but does not require</strong>, that the source code of the software should be made\navailable under a suitable open-source license that will entitle other researchers to further develop and extend\nthe software if they wish to do so.</p>\n</blockquote>\n\n<p>Regarding data, they are even less revolutionary; the recommended figure formats (EPS, PDF, PNG) focus on nice graphics instead\nof reuse of data. I also note that I cannot upload data in the <a href=\"http://en.wikipedia.org/wiki/OpenDocument\">Open Document Format</a>,\nor, hey, let’s really push things, in <a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a>. Well, not according to\nthe Instructions. And surely, I can put the [O|R]DF in the supplementary information, anyway. It would also be nice if I could\nuse Jmol as an applet to enrich the graphics, and improve data reusability of the paper, like the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/19/rsc-now-allows-jmol-in-main-text-of.html\">RSC recently started to allow <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Regarding the supplementary information, there is a section on <em>additional files</em>, which, inconveniently, are capped at\n20MB in size. No mention of chemical formats at all, nor any recommendation on semantic formats like\n<a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a> (I wonder when this was discussed with the Editorial Board,\nand where Peter was at the time). How am I going to put online my 500 molecular structure CML file now? (Though it’s good\nto know it is virus scanned ;)</p>\n\n<p>So, why do I vent my concerns about these limitations? I had not blogged about the launch of the journal earlier, because\nI have not made up my mind about it. 
On one side, I am happy to see a journal that promotes (scientific) use of papers,\nand a journal that allows me to keep copyright on the material. However, on the other side, from what the current Instructions\nsuggest, the data I could use from the papers is available only in an old-fashioned way. That’s a lost opportunity and could\nhave killed competition for sure. Instead, the unique selling point is now restricted to using an\n<a href=\"http://www.biomedcentral.com/info/about/openaccess/\">open access license</a>. Nature Chemistry, on the other hand, chose\ndata rich publishing as a selling point (though in competition with things done at the RSC).</p>\n\n<p>The other thing I want to mention about the journal is the following. <a href=\"http://blog.rguha.net/\">Rajarshi</a> blogged about\n<a href=\"http://hackberry.chem.trinity.edu/blog/\">Bachrach</a>’s paper on <em>Chemistry publication - making the revolution</em>\n(DOI:<a href=\"https://doi.org/10.1186/1758-2946-1-2\">10.1186/1758-2946-1-2</a>). Firstly, by adding a link like that for the\nDOI I just gave, Chemical blogspace can pick it up; we need this later. Secondly, the paper actually suggests that\n<em>“[b]y publishing lots of data, available for ready re-use by all scientists, we can radically change the way science\nis communicated and ultimately performed”</em>; this is in strong contrast to what I have seen in the Instructions so far.</p>\n\n<h2 id=\"starting-the-scientific-communication\">Starting the Scientific Communication</h2>\n<p><a href=\"http://depth-first.com/\">Rich</a> <a href=\"http://blog.rguha.net/?p=216#comment-342\">replied</a> to Rajarshi about the requirement\nto log in before someone could make a comment, which he did not like. He suggested alternative ways to prevent SPAM\nand sorts. The choice for this commenting approach may also originate from having an Open discussion, where everyone\ntakes responsibility for what he says. 
The use of OpenID, as Rich suggests, would only partially address that; on the\nother hand, setting up a fake email address is quite common in the blogosphere too.</p>\n\n<p>If Rajarshi had used the DOI to link to Steven’s paper, as said, Chemical blogspace would have recognized\nit. Instead, he chose to link directly to the PDF. This is a typical case of hamburgers in action. However, others\ndid use the DOI when they discussed the first research paper in the journal (DOI:<a href=\"https://doi.org/10.1186/1758-2946-1-3\">10.1186/1758-2946-1-3</a>).\nThese blogs were picked up by Cb and are listed on <a href=\"http://cb.openmolecules.net/paper.php?paper_id=1666\">this page</a>.</p>\n\n<p>Now, I only need to remind you of <em>Userscripts for the Life Sciences</em> (DOI:<a href=\"https://doi.org/10.1186/1471-2105-8-487\">10.1186/1471-2105-8-487</a>)\nto point out that we have the methods to link these comments back to the journal website. The <em>Quotes from Chemical Blogspace and Postgenomic</em>\nscript, in particular, does the hard work (needs GreaseMonkey, the script can be downloaded here; see also\n<a href=\"http://baoilleach.blogspot.com/2007/04/add-quotes-from-postgenomic-and.html\">Noel’s original post</a>). This way,\nwe can read the comments when we visit the <a href=\"http://www.jcheminf.com/content/1/1/3\">paper’s homepage</a>:</p>\n\n<p><img src=\"/assets/images/cbStillWorks.png\" alt=\"\" /></p>\n\n<p>Now, the script has not yet been updated for the new journal (Noel, can you please upload the revision?), so you need\nto edit the source right now and add <code class=\"language-plaintext highlighter-rouge\">http://*.jcheminf.com/*</code> to the list of websites the script acts on:</p>\n\n<p><img src=\"/assets/images/cbStillWorks1.png\" alt=\"\" /></p>",
      "summary": "Besides Nature Chemistry , another journal was launched last week (see here and here): the Journal of Cheminformatics. First of all, congratulations to Chris and David for their efforts! While the journal only published one research paper yet, it already found its place on Chemical blogspace. I have two things I want to blog about: data rich publishing, and starting the scientific communication.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cbStillWorks.png",
      "date_published": "2009-03-22T00:00:00+00:00",
      "date_modified": "2025-12-07T00:00:00+00:00",
      "tags": ["cb","cheminf","cml","userscript","publishing","rdf","jcheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1758-2946-1-2", "doi": "10.1186/1758-2946-1-2"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1758-2946-1-3", "doi": "10.1186/1758-2946-1-3"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-487", "doi": "10.1186/1471-2105-8-487"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e7wjx-fvr63",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/20/preferential-positions-of-phophate.html",
      "title": "Preferential positions of phosphate counter ions",
      "content_html": "<p>A long time ago (‘96 or so?), as a student with the no longer existing CAOS/CAMM (Google shows some traces, like\n<a href=\"https://doi.org/10.3233/CMI-2014-000003\">this chapter describing the centre</a>), I did a short internship\nwith Hilbert Bruijn-Slot (I hope I remember his name correctly), who asked me to look at data in the CSD, and in\nparticular the preferred position of phosphate counter ions. It was fun research, and it almost made it into a paper, had we\nnot been beaten by a few months by a group of Russians who had just published the same.</p>\n\n<p>Today, <a href=\"http://chem-bla-ics.blogspot.com/2009/03/nature-chemistry-improves-publishing.html?showComment=1237542960000#c7670910429973706274\">Neil asked me</a> <!-- keep link -->\nto look at another Nature Chemistry paper (DOI:<a href=\"http://dx.doi.org/10.1038/nchem.100\">10.1038/nchem.100</a>), and in particular\n<a href=\"http://www.nature.com/nchem/journal/v1/n1/compound/nchem.100_ci.html\">its Chemical Compounds table</a>. I could not directly\nspot the thing not in the table I discussed, but did notice the phosphate salts in the table. Not uncommonly, the counter ions are not near the phosphate in this diagram and I wondered how they did this in 3D.</p>\n\n<p>Well, bringing back good memories of that internship I mentioned, <a href=\"http://www.nature.com/nchem/journal/v1/n1/compound/nchem.100_comp5_3d.html\">the 3D model</a>\nshown by <a href=\"http://www.jmol.org/\">Jmol</a> actually does show the salt, and with the two sodiums near the phosphate;\neven better, they sit at very recognisable positions :)</p>\n\n<p><img src=\"/assets/images/phosphateSalt.png\" alt=\"\" /></p>",
      "summary": "A long time ago (‘96 or so?), as a student with the no longer existing CAOS/CAMM (Google shows some traces, like this chapter describing the centre), I did a short internship with Hilbert Bruijn-Slot (I hope I remember his name correctly), who asked me to look at data in the CSD, and in particular the preferred position of phosphate counter ions. It was fun research, and it almost made it into a paper, had we not been beaten by a few months by a group of Russians who had just published the same.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/phosphateSalt.png",
      "date_published": "2009-03-20T00:00:00+00:00",
      "date_modified": "2025-11-19T00:00:00+00:00",
      "tags": ["chemistry","jmol"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/nchem.100", "doi": "10.1038/nchem.100"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.3233/CMI-2014-000003", "doi": "10.3233/CMI-2014-000003"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/40377-hz881",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/19/nature-chemistry-improves-publishing.html",
      "title": "Nature Chemistry improves publishing chemistry: a detailed analysis",
      "content_html": "<p><a href=\"http://www.nature.com/nchem/\">Nature Chemistry</a> just released the first issue with a few free papers,\nlike <em>Asymmetric total syntheses of (+)- and (-)-versicolamide B and biosynthetic implications</em> by Miller et al.\n(DOI:<a href=\"https://doi.org/10.1038/nchem.110\">10.1038/nchem.110</a>).</p>\n\n<p>Now, we’ve seen the Royal Society of Chemistry’s <a href=\"http://chem-bla-ics.blogspot.com/search?q=project+prospect\">Project Prospect</a> <!-- keep link -->\n(see <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html\">RSC: the first publisher to go semantic! <i class=\"fa-solid fa-recycle fa-xs\"></i></a>)\nand ChemSpider’s recent <a href=\"http://www.chemmantis.com/\">ChemMantis</a> system which enriches\nthe papers with machine readable representations of the molecules discussed in those\npapers. The new Nature publication has been in the works for a while, and they\n<a href=\"http://blogs.nature.com/thescepticalchymist/2008/05/jj_day_98_service_with_a_simpl.html\">asked</a>\nthe community before what a Nature Chemistry paper should look like, and I replied in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/05/08/re-what-should-nature-chemistry-paper.html\">Re: What should a Nature Chemistry paper look like? <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<h2 id=\"the-verdict\">The verdict</h2>\n\n<p>So, have they been listening? Is the HTML they produce semantic? Is it data rich? Or is it\njust another hamburger? Well, I am very happy to see some of the suggestions I made picked\nup (though I do not fool myself into believing I am the only one that suggested those\nfeatures). 
A tour of good things, and points for improvement.</p>\n\n<p>The first impression is not shocking; it looks like any other interface, with molecules drawn as images in the paper:</p>\n\n<p><img src=\"/assets/images/nchem3.png\" alt=\"\" /></p>\n\n<p>All structures that are numbered and linked (as in <em>C6-epi-stephacidin A (Compound <strong>13</strong>)</em>)\nhave a hover-over function to pop up a drawing of the structure:</p>\n\n<p><img src=\"/assets/images/nchem4.png\" alt=\"\" /></p>\n\n<p>The popup image is a nice gimmick, but not really semantically useful. The link, however,\nis! It points to a separate supplementary page with further information, which includes\nan image of the 2D structure and, following a link, the 3D structure in <a href=\"http://www.jmol.org/\">Jmol</a>.\nMoreover, it comes with the machine readable representations:</p>\n\n<p><img src=\"/assets/images/nchem5.png\" alt=\"\" /></p>\n\n<p>This is indeed interesting, and a big step forward, though please do note my comments later.\nFor convenience, all molecules with such supplementary information are available from the\nspecial Chemical Compounds section of the paper:</p>\n\n<p><img src=\"/assets/images/nchem2.png\" alt=\"\" /></p>\n\n<p>Excellent! This really is a step forward towards a data-rich paper! Indeed, I will shortly\nwrite up a <a href=\"http://www.bioclipse.net/\">Bioclipse</a> plugin for Nature Chemistry, which\nwill download all molecular structures based on the DOI! 
Anyway, more on that later…\nFor this article, that table looks like:</p>\n\n<p><img src=\"/assets/images/nchem1.png\" alt=\"\" /></p>\n\n<p>By now, you likely also noted the links to <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>, and\nindeed, upon publication of a paper, all structures are deposited in the public domain:</p>\n\n<p><img src=\"/assets/images/nchem6.png\" alt=\"\" /></p>\n\n<p>Last but not least, each molecule is available in the <a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">Chemical Markup Language</a>\n(with 2D coordinates)! And you know I have been a very happy CML user for a long time (see e.g.\nPeter’s recent blog <a href=\"https://blogs.ch.cam.ac.uk/pmr/2009/03/13/egon-willighagen-and-cml/\">Egon Willighagen and CML <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nBTW, one comment on the CML: the namespace used is the outdated namespace, <strong>not</strong>\nthe current one (see <a href=\"http://cmlexplained.blogspot.com/2007/06/there-can-be-only-one-namespace.html\">There can be only one (namespace)</a>).\n(But the <a href=\"http://cdk.sf.net/\">CDK</a> and Bioclipse will read it anyway.)</p>\n\n<h2 id=\"details-matter\">Details matter</h2>\n\n<p>So, while the first impression was not shocking, it was a bit deceptive. <em>Nature Chemistry</em>\nreally changes publishing of chemistry. But I have bad news too. They need to improve the\nHTML they produce.</p>\n\n<p>But before pointing out some missed chances, let me reply <em>inter alia</em> to Peter’s recent\nwork on the Open Source plugin for including semantic chemistry in MS-Word documents\n(see <a href=\"https://blogs.ch.cam.ac.uk/pmr/2009/03/16/how-can-we-publish-semantic-chemical-documents/\">How can we publish semantic chemical documents? <i class=\"fa-solid fa-recycle fa-xs\"></i></a>):\nNature Chemistry seems to have done a great job with existing tools. 
Nevertheless, I fully\nback up Peter’s comment that while the plugin is useless without Word, the results produced\nwith the plugin are extremely Open Standard, and enormously reusable! Indeed, while the\nWord file format is only formally a true Open Standard, the file format is plain XML, and\nextracting content bearing the CML namespace is trivial.</p>\n\n<p>Which reminds me, if someone from the Nature Chemistry team is reading this, please point\nme to a blog about what tools actually <em>are</em> involved in publishing a Nature Chemistry paper!\nI think we would all like to know.</p>\n\n<p>Now, the <a href=\"http://en.wikipedia.org/wiki/HTML\">HTML</a> has room for improvement. First of all,\na look at the metadata defined for the web page of the article shows a <em>description</em>\nand <em>keywords</em> about the journal, not the article, and the same goes for the web pages for\nthe molecules:</p>\n\n<p><img src=\"/assets/images/nchem7.png\" alt=\"\" /></p>\n\n<p>Additionally, the compound details web page has no special markup for the machine readable\ninformation:</p>\n\n<p><img src=\"/assets/images/nchem8.png\" alt=\"\" /></p>\n\n<p>Or, if it does, it’s still mixed with markup for visually pleasing output:</p>\n\n<p><img src=\"/assets/images/nchem9.png\" alt=\"\" /></p>\n\n<p>Still, the HTML is clean enough to have some regular expressions extract a good deal of\ninformation, and there is also still the PubChem deposition.</p>\n\n<h2 id=\"beyond-connection-tables\">Beyond connection tables</h2>\n\n<p>Like many other chemistry journals, Nature Chemistry does not consider properties of\nthe molecule interesting, and NMR spectra are hidden in the Supplementary Information.\nThis paper, in particular, disregards a lot of machine readable facts by putting all\nexperimental section bits in a PDF document. 
So, the next challenge for Nature Chemistry\nwill be to get the authors of papers to contribute the original spectra (JCAMP-DX, CMLSpect,\netc) in the supplementary information section. Better, have the raw data or even the NMR\npeak-atom annotations deposited in public repositories (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/03/04/open-nmr-data-raw-curves-and-annotated.html\">Open NMR data: raw curves and annotated peak lists <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>All in all, I am rather positive about the first Nature Chemistry issue, and would like to\nthank the editors and paper authors for their efforts on improving publishing chemistry!</p>",
      "summary": "Nature Chemistry just released the first issue with a few free papers, like Asymmetric total syntheses of (+)- and (-)-versicolamide B and biosynthetic implications by Miller et al. (DOI:10.1038/nchem.110).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nchem4.png",
      "date_published": "2009-03-19T00:00:00+00:00",
      "date_modified": "2025-12-07T00:00:00+00:00",
      "tags": ["inchi","chemistry","jmol"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/nchem.110", "doi": "10.1038/nchem.110"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yk97t-8s820",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/18/nmrshiftdb-enters-rdfopenmoleculesnet.html",
      "title": "NMRShiftDB enters rdf.openmolecules.net",
      "content_html": "<p>This morning I finished setting up an <a href=\"http://en.wikipedia.org/wiki/RDF\">RDF</a> interface to the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> data\n(see <a href=\"http://pele.farmbio.uu.se/nmrshiftdb/?moleculeId=234\">nmr:234</a>):</p>\n\n<p><img src=\"/assets/images/nmrRDF.png\" alt=\"\" /></p>\n\n<p>I also made links between the new frontend and <a href=\"http://rdf.openmolecules.net/\">rdf.openmolecules.net</a>, making the\n<em>Linked Open Chemistry Data</em> (LOCD) network grow (naming following <a href=\"http://esw.w3.org/topic/HCLSIG/LODD\">Linked Open Drug Data</a>).\nIn comparison with the previous depiction, I added arrows to indicate the direction of the linking. Green nodes still indicate\nsources with an RDF interface; therefore, the LOCD network really consists only of those green nodes:</p>\n\n<p><img src=\"/assets/images/ons2.png\" alt=\"\" /></p>\n\n<p>The link with DBPedia is discussed in <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/17/dbpedia-enters-rdfopenmoleculesnet.html\">DBPedia enters rdf.openmolecules.net <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThe <a href=\"http://github.com/egonw/nmrshiftdb-rdf/tree/master\">source code for the NMRShiftDB-RDF frontend</a> can be found at\n<a href=\"http://www.github.com/\">GitHub</a>.</p>",
      "summary": "This morning I finished setting up an RDF interface to the NMRShiftDB data (see nmr:234):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nmrRDF.png",
      "date_published": "2009-03-18T00:00:00+00:00",
      "date_modified": "2025-11-29T00:00:00+00:00",
      "tags": ["nmrshiftdb","rdf","opendata","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/et43s-kc120",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/14/autogenerating-cml-bindings-for-xmpp.html",
      "title": "Autogenerating CML bindings for XMPP services with XMLBeans",
      "content_html": "<p>I blogged earlier about our efforts to create a better <a href=\"http://en.wikipedia.org/wiki/SOAP\">SOAP</a>\nservice architecture, based on <a href=\"http://en.wikipedia.org/wiki/Jabber\">XMPP</a>:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/21/details-behind-calling-xmpp-cloud.html\">Details behind the “Calling XMPP cloud services from Taverna2” <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/19/calling-xmpp-cloud-services-from.html\">Calling XMPP cloud services from Taverna2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/31/next-generation-asynchronous.html\">Next generation asynchronous webservices <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<p>So, I set up XMPP services for QSAR descriptor calculation, 2D diagram and 3D geometry\ncalculations and a few more, using the <a href=\"http://cdk.sf.net/\">CDK</a>.\n<a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">Chemical Markup Language</a> has been my\nprimary choice for some 10 years now (see <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1241\">Peter’s blog</a>)\nas it allows me to do things I cannot do in other formats.</p>\n\n<p>Now, our XMPP services publish themselves what data types the allow as input and what they output\nin return. They do this by publishing XML Schema to describe the input and output types. My CDK\nservices use CML, so they return the CML schema. Johannes’ <a href=\"http://xws4j.sourceforge.net/\">xws4j</a>\nimplementation of the <a href=\"http://xmpp.org/extensions/xep-0244.html\">IO-DATA</a>\nspecification has an add on that can build bindings to the schema on the fly. 
Now, CML comes with\na good <a href=\"http://www.xom.nu/\">XOM</a>-based binding (called <a href=\"http://wwmm.ch.cam.ac.uk/maven2/cml/cmlxom/\">CMLXOM</a>)\nso this is not strictly necessary, but for less common schemata it is worthwhile: you can always\ncreate bindings for brand new schemata, for older versions, for whatever. Services can even\ncreate their own local schemata, and people will still be able to easily use them. This is to me\na big plus for this architecture.</p>\n\n<p>Anyway, while CMLXOM exists, we wanted to show that the on-the-fly creation of bindings works,\neven for large schemata, such as CML. However, one of the older flavours had a small error in a\nregular expression in a data type CML defines. Johannes therefore asked me to test building\nbindings for the CML schema version used in my services. He advised me to use scomp for this,\nwhich is a command line utility around the <a href=\"http://xmlbeans.apache.org/\">XMLBeans</a>\nlibrary used for the binding generation.</p>\n\n<p>As I am running <a href=\"http://www.ubuntu.com/\">Ubuntu</a>, I preferred installing\n<a href=\"http://packages.ubuntu.com/jaunty/libxmlbeans-java\">the packaged version</a> instead of installing\nthe binary provided by XMLBeans. Now, after I did this, I noticed that this .deb did not install\nthe scomp utility, so I filed a <a href=\"https://bugs.launchpad.net/ubuntu/+source/xmlbeans/+bug/342349\">wishlist bug report</a>.\nEarlier this week I already encountered another bug, but this package being Java, I had a good\nidea on how to fix the bug.</p>\n\n<p>And so I implemented my own wishlist. I’m sure there is room for improvement, as my .deb\npackaging skills are a bit rusty (a very long time ago I was in the Debian New Maintainers\nqueue, but by the time they solved the long queue delays, I was too occupied with other things.\nYes, this was a long time ago already :). 
Anyway, Ubuntu’s <a href=\"http://launchpad.net/\">LaunchPad</a>\nhas a nice feature, called the <a href=\"http://launchpad.net/ubuntu/+ppas\">Personal Package Archives</a>.\nAfter I have finished hacking on the packaging specs in the famous <em>debian/</em>\nfolder and tested the <em>.debs</em> built from it, this service will rebuild the package and put the result up for\ndownload.</p>\n\n<p>Conclusion: a perfect opportunity to finally give this a try. The learning curve was\nsurprisingly shallow, and the result can be seen in <a href=\"https://launchpad.net/~egonw/+archive/ppa\">my personal package archive</a>:</p>\n\n<p><img src=\"/assets/images/ppa.png\" alt=\"\" /></p>\n\n<p>Now, you can easily imagine that I will soon work on packaging stuff I did in the past too, such\nas updating <a href=\"http://packages.ubuntu.com/jaunty/libcdk-java\">libcdk-java</a> and, now that OpenJDK in\nmain can run <a href=\"http://www.jmol.org/\">Jmol</a> reasonably, finally packaging Jmol for main. 
I just hope\nI remember my <a href=\"http://alioth.debian.org/\">Alioth</a> account, so that I can properly contribute to\nthe <a href=\"http://alioth.debian.org/projects/debichem/\">debichem</a> project.</p>\n\n<p>Getting back to running <em>scomp</em> on the CML schema, it works with one minor problem:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>scomp <span class=\"nt\">-src</span> <span class=\"nb\">.</span> <span class=\"nt\">-d</span> <span class=\"nb\">.</span>  cml.xsd\n/home/egonw/tmp/cml/cml.xsd:10098:9: warning: p-props-correct.2.2: maxOccurs must be greater than or equal to 1.\nTime to build schema <span class=\"nb\">type </span>system: 1.792 seconds\nTime to generate code: 3.297 seconds\nTime to compile code: 9.658 seconds\n</code></pre></div></div>\n\n<p>The problem is on line 10098, which reads:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;xsd:sequence</span> <span class=\"na\">minOccurs=</span><span class=\"s\">\"0\"</span> <span class=\"na\">maxOccurs=</span><span class=\"s\">\"0\"</span><span class=\"nt\">&gt;</span>\n</code></pre></div></div>\n\n<p>which can be traced back to line 23 in <a href=\"http://cml.svn.sf.net/viewvc/cml/schema2/trunk/elements/tableHeaderCell.xsd?revision=161&amp;view=markup\">schema2/trunk/elements/tableHeaderCell.xsd</a>.\nI filed a <a href=\"https://sourceforge.net/tracker2/?func=detail&amp;aid=2686810&amp;group_id=51361&amp;atid=463005\">bug report about this</a>.</p>",
      "summary": "I blogged earlier about our efforts to create a better SOAP service architecture, based on XMPP:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ppa.png",
      "date_published": "2009-03-14T00:00:00+00:00",
      "date_modified": "2025-11-29T00:00:00+00:00",
      "tags": ["cml","java","ubuntu","xml","xmpp"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8ggrr-e4m33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/12/bioclipse-powerful-jmol-application.html",
      "title": "Bioclipse: a powerful Jmol application",
      "content_html": "<p>While <a href=\"http://www.bioclipse.net/\">Bioclipse</a> is much more, it could be an interesting alternative to the\n<a href=\"http://www.jmol.org/\">Jmol</a> application. It offers:</p>\n\n<ul>\n  <li>a scripting console</li>\n  <li>a file browser (the <a href=\"http://www.eclipse.org/\">Eclipse</a> way)</li>\n  <li>an outline of the file content which allows selections</li>\n  <li>a script editor</li>\n</ul>\n\n<p>The underlying RCP toolkit has many other interesting features for a Jmol application, but the above is up and running:</p>\n\n<p><img src=\"/assets/images/jmolBioclipse.png\" alt=\"\" /></p>",
      "summary": "While Bioclipse is much more, it could be an interesting alternative to the Jmol application. It offers:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jmolBioclipse.png",
      "date_published": "2009-03-12T00:00:00+00:00",
      "date_modified": "2009-03-12T00:00:00+00:00",
      "tags": ["bioclipse","jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/cxds4-qnq95",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/04/open-nmr-data-raw-curves-and-annotated.html",
      "title": "Open NMR data: raw curves and annotated peak lists",
      "content_html": "<p>Games are known to trigger technical innovation. But recently it also triggered innovation on open chemical databases. Jean-Claude\n<a href=\"http://usefulchem.blogspot.com/2009/03/spectral-game-update.html\">reported</a>:</p>\n\n<blockquote>\n  <p>We are very excited by what we have put together so far. There are currently 457 H NMR, 389 C NMR, 11 IR and 29 NIR spectra. This\nis only possible because of people who submitted their spectra to ChemSpider as Open Data - please keep uploading!</p>\n</blockquote>\n\n<p>Now, the <a href=\"http://nmrshiftdb.org/\">NMRShiftDB</a> also hosts quite a number of NMR spectra, and I have a hobby to submit spectra,\nparticularly for rare nuclei. In particular, I think it is fun to to have as many as possible structures which have spectra for\nall the nuclei in that structure. <a href=\"http://en.wikipedia.org/wiki/Benzene\">Benzene</a> is a simple example for which NMR spectra are\navailable for all nuclei (see <a href=\"http://nmrshiftdb.chemie.uni-mainz.de/portal/js_pane/P-Results/nmrshiftdbaction/showDetailsFromHome/molNumber/7901\">this entry</a>).</p>\n\n<p>Now, the main difference between the NMRShiftDB and <a href=\"http://www.chemspider.com/\">ChemSpider</a> spectral data is the the first are annotated\npeak lists (each shift is assigned to an atom), and the latter are full, but unannotated, spectral curves. So, there are quite a few\nthings you could do here. For example, see which structures which NMR curves are not yet annotated in NMRShiftDB.\n<a href=\"http://www.chemspider.com/blog/\">Antony</a> pointed me to <a href=\"http://www.chemspider.com/spectra.aspx\">this page</a> which is an overview\nof all spectral data in ChemSpider, but that page is difficult to machine process. Partly, because it is a mix of Open and\nProprietary data, and partly because it uses JavaScript to navigate the table. 
(BTW, RDF interfaces to both resources would\nbe much more helpful, and simply allow me to query all molecules which have a spectrum which is Open, and which is not found\nin the NMRShiftDB. I am working on an RDF interface to NMRShiftDB.)</p>\n\n<p>Antony also <a href=\"http://usefulchem.blogspot.com/2009/03/spectral-game-update.html#c6864087472599346745\">asked</a> me to advertise the\noption to upload Open spectral curves to ChemSpider. So, hereby. However, I really do hope ChemSpider will make it easier for\nothers to reuse all the Open Data, as having to machine-browse the linked HTML interface is a waste of ChemSpider computing\nresources.</p>\n\n<p><strong>Update</strong>: the game is now available from <a href=\"http://spectralgame.com/\">spectralgame.com</a>.</p>",
      "summary": "Games are known to trigger technical innovation. But recently it also triggered innovation on open chemical databases. Jean-Claude reported:",
      
      "date_published": "2009-03-04T00:00:00+00:00",
      "date_modified": "2009-03-04T00:00:00+00:00",
      "tags": ["opendata","chemspider","nmr","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qc1ee-v0c37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/03/open-data-versus-capatalism.html",
      "title": "Open Data versus Capatalism?",
      "content_html": "<p><a href=\"http://iandavis.com/blog/\">Ian Davis</a> was recently <a href=\"http://mndoci.tumblr.com/post/82535784/open-data-is-more-important-than-open-source\">quoted</a> saying\n<em>open data is more important than open source</em>, which was pulled (out of context) from <a href=\"http://www.slideshare.net/iandavis/code4lib2009-keynote-1073812\">this presentation</a>.\nThe context was (a slide earlier): <em>Data outlasts code</em>.</p>\n\n<p>As far as I can see, this is <strong>utter nonsense</strong>, even within context of the slide (see also this\n<a href=\"http://friendfeed.com/e/4a8c5012-ee18-2b72-390a-ca3298b3a57c/open-data-is-more-important-than-open/\">discussion on FriendFeed</a>). Obviously, within the context of\n<em>Ian</em> it does makes sense, and I hope he will respond in his blog and explain why he thinks Open Data is more special.</p>\n\n<p>Without code, you have no way of accessing the data. Ask anyone to recover from a hard disk failure. In\n<a href=\"http://blueobelisk.sourceforge.net/wiki/ODOSOS\">ODOSOS</a> (Open Standards, Open Data, Open Source) they are all equal. You need them all for progress.\nYou cannot single out one as being more important than another. Why would you anyway? Politics is all I can think of… All three combine and ensure\nour science is more efficient.</p>\n\n<p><a href=\"http://www.abhishek-tiwari.com/\">Fishy Perspective</a> (what’s in a name) comments on this in <a href=\"http://www.abhishek-tiwari.com/2009/03/data-vendetta.html\">Data Vendetta</a>,\nand I will take one quote out of context:</p>\n\n<blockquote>\n  <p>Organizations are spending lot of money do generate proprietary data to safeguard its competitive edge, why you are convinced that\nthey need to disclose that, no one is here for charity. 
Most the companies have their proprietary data policies, and they release\nthe data in public only when there is sufficient overlap from publicly available databases.</p>\n</blockquote>\n\n<h2 id=\"open-data-versus-capitalism\">Open Data versus Capitalism?</h2>\n\n<p>Companies are about money making, and there is nothing wrong with that. Others work to make the world a better place.</p>\n\n<p>If Rosalind had not shared her data (following <em>Data Vendetta</em>, and not going into whether she did willingly or knowingly), all current\npharmaceutical research would have been delayed by half a year(?), more(?)… who knows. Even that half year would have meant quite a lot\nof dead people. A lot of medicine would not have been discovered or hit the market at the same time. Capitalism is one thing, not good,\nnot bad, orthogonal really. <strong>Capitalism as ideology does not contradict Open Data</strong>. But sharing knowledge as Open Data always has a positive\neffect on mankind.</p>\n\n<p>If you want to make money, please do, as much as you can. But please pick carefully what you want to make money on. Be creative!\nDo some innovation! Be bold! Go where no one has gone before!</p>",
      "summary": "Ian Davis was recently quoted saying open data is more important than open source, which was pulled (out of context) from this presentation. The context was (a slide earlier): Data outlasts code.",
      
      "date_published": "2009-03-03T00:00:00+00:00",
      "date_modified": "2009-03-03T00:00:00+00:00",
      "tags": ["opendata","openscience","odosos"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j770h-q642",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/03/01/solubility-data-in-bioclipse-4-finding.html",
      "title": "Solubility Data in Bioclipse #4: Finding ChEBI IDs (Again, but better)",
      "content_html": "<p>Those who carefully analyzed the second SPARQL query in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/27/solubility-data-in-bioclipse-3-finding.html\">Solubility Data in Bioclipse #3: Finding ChEBI IDs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwill have noticed the use of <em>owl:sameAs</em>. Those who did not, here’s the query again:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">owl</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.w3.org/2002/07/owl#&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">ons</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto#&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">rdfonm</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://rdf.openmolecules.net/#&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?title</span><span class=\"w\"> </span><span class=\"nv\">?chebi</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?solvent</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">ons</span><span class=\"o\">:</span><span class=\"ss\">Solvent</span><span 
class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?solvent</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"ss\">title</span><span class=\"w\"> </span><span class=\"nv\">?title</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?solvent</span><span class=\"w\"> </span><span class=\"nn\">owl</span><span class=\"o\">:</span><span class=\"ss\">sameAs</span><span class=\"w\"> </span><span class=\"nv\">?same</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?same</span><span class=\"w\"> </span><span class=\"nn\">rdfonm</span><span class=\"o\">:</span><span class=\"ss\">chebiid</span><span class=\"w\"> </span><span class=\"nv\">?chebi</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>This syntax is a bit clumsy, considering we said <em>?solvent</em> and <em>?same</em> are the same thing. Fortunately, there are tools that do take\nthis into account. One such tool for <a href=\"http://jena.sourceforge.net/\">Jena</a> (which I use in Bioclipse) is\n<a href=\"http://www.mindswap.org/2003/pellet/\">Pellet</a>. 
I just committed code for Bioclipse to use Pellet, which simplifies the above query to:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">ons</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://spreadsheet.google.com/plwwufp30hfq0udnEmRD1aQ/onto#&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">rdfonm</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://rdf.openmolecules.net/#&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?title</span><span class=\"w\"> </span><span class=\"nv\">?chebi</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?solvent</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">ons</span><span class=\"o\">:</span><span class=\"ss\">Solvent</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?solvent</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"ss\">title</span><span class=\"w\"> </span><span class=\"nv\">?title</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?solvent</span><span class=\"w\"> </span><span class=\"nn\">rdfonm</span><span class=\"o\">:</span><span class=\"ss\">chebiid</span><span class=\"w\"> 
</span><span class=\"nv\">?chebi</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>The key thing here to understand, and I know this is rather abstract, is that the RDF document we build for the ONS Solubility data does\nnot define the relation between Solvent and ChEBI identifiers, but using RDF we know this to be true. Only because the system now understands\nthe <code class=\"language-plaintext highlighter-rouge\">owl:sameAs</code> relation.</p>\n\n<p>Now, Pellet does not stop there, and there are many more statements we can make. Even better, anyone can plug in such relations. Any\ndatabase can define <em>owl:sameAs</em> and other relations, so that we can transparently browse the internet for chemistry in a semantically\nmeaningful way.</p>\n\n<p>I also know that the above is rather technical. For those chemists who have not stopped reading yet, what I would like to hear from you\nis what data you would like to see linked. It does not really matter what, because we can do it all (given\n<a href=\"http://en.wikipedia.org/wiki/Open_data\">Open Data</a>).</p>",
      "summary": "Those who carefully analyzed the second SPARQL query in Solubility Data in Bioclipse #3: Finding ChEBI IDs will have noticed the use of owl:sameAs. Those who did not, here’s the query again:",
      
      "date_published": "2009-03-01T00:00:00+00:00",
      "date_modified": "2025-11-29T00:00:00+00:00",
      "tags": ["sparql","chebi","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gh3np-xbm68",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/27/solubility-data-in-bioclipse-3-finding.html",
      "title": "Solubility Data in Bioclipse #3: Finding ChEBI IDs",
      "content_html": "<p>With the RDF functionality set up in <a href=\"http://www.bioclipse.net/\">Bioclipse</a> (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/22/solubility-data-in-bioclipse-2-handling.html\">Solubility Data in Bioclipse #2: handling RDF <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nwe can start mining the Chemical RDF space. Check out this mashup:</p>\n\n<script src=\"https://gist.github.com/egonw/71677.js\"></script>\n\n<p>What happens in this script is the following:</p>\n\n<ol>\n  <li>Load the ONS Solubility data (line 4-5)</li>\n  <li>ask for all owl:sameAs relations to navigate (line 8-14)</li>\n  <li>load the RDF for the <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/17/dbpedia-enters-rdfopenmoleculesnet.html\">rdf.openmolecule.net <i class=\"fa-solid fa-recycle fa-xs\"></i></a> resources (line 16-26)</li>\n  <li>query for all solvents which have an <a href=\"http://www.ebi.ac.uk/chebi/\">ChEBI</a> identifier (line 28-38)</li>\n</ol>\n\n<p>The output will look like the following (in the future this will be opened as spreadsheet in Bioclipse):</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[[ethanol 40C, CHEBI:16236],\n[acetonitrile, CHEBI:38472],\n[chloroform, CHEBI:35255],\n[methanol 30C, CHEBI:17790],\n[THF, CHEBI:26911],\n[ethanol, CHEBI:16236],\n[ethanol 30C, CHEBI:16236],\n[methanol 40C, CHEBI:17790],\n[methanol, CHEBI:17790]]\n</code></pre></div></div>\n\n<p>Now, this example shows a simple yet powerful feature of how RDF is used nowadays: the ChEBI identifier was not part of the original\n<a href=\"https://spreadsheets.google.com/ccc?key=plwwufp30hfq0udnEmRD1aQ&amp;hl=en\">Solubility spreadsheet at Google Docs</a>. But, taking advantage\nof the unique and <em>resolvable</em> URIs for molecules, when can simply look them up.</p>\n\n<p>Nice, isn’t it?</p>",
      "summary": "With the RDF functionality set up in Bioclipse (see Solubility Data in Bioclipse #2: handling RDF ), we can start mining the Chemical RDF space. Check out this mashup:",
      
      "date_published": "2009-02-27T00:00:00+00:00",
      "date_modified": "2025-11-29T00:00:00+00:00",
      "tags": ["gist","sparql","rdf","chebi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vsb09-mjr40",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/25/rdf-for-chemistry.html",
      "title": "RDF for chemistry",
      "content_html": "<p><a href=\"http://www.iscb.org/cms_addon/conferences/cshals2009/io-informatics-news.php\">C-SHALS 2009</a> (<em>Conference on Semantics in Healthcare and Life Sciences</em>)\nhas just started, and has coverage in a <a href=\"http://cshals.blogspot.com/\">blog</a> and in a <a href=\"http://friendfeed.com/rooms/cshals-2009\">FriendFeed room</a>. It\nnicely coincides with Rich’ blog on <a href=\"https://doi.org/10.59350/a2w3n-cvb94\">What the Heck is the Semantic Web? <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand the RDF work I have recently done on <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/17/dbpedia-enters-rdfopenmoleculesnet.html\">rdf.openmolecules.net <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/22/solubility-data-in-bioclipse-2-handling.html\">Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. (Oh, do I wish I could have attended that\nconference.)</p>\n\n<p>Anyway, I was proudly surprised to see <a href=\"http://chem-bla-ics.blogspot.com/search?q=sechemtic\">sechemtic</a> show up in the\n<a href=\"http://cambridgesemantics.com/2008/09/sem-web-introduction/\">Semantic Web technologies: Introduction and Survey</a> tutorial by Lee Feigenbaum of\n<a href=\"http://cambridgesemantics.com/\">Cambridge Semantics</a>:</p>\n\n<p><img src=\"/assets/images/sechemticCSHALS.png\" alt=\"\" /></p>\n\n<p><strong>Use?</strong> Rich was asking what could be done with RDF for chemistry… here is a <a href=\"http://semantically-challenged.blogspot.com/2008/11/cool-uris-for-molecules.html\">nice mashup by Phil Ashworth</a>:\n<a href=\"http://homepage.ntlworld.com/philnjo/phil/chemdemo/chemdemo1.html\">a Google Map</a> showing the locations where certain chemical can be bought:</p>\n\n<p><img src=\"/assets/images/googleChemMashup.png\" alt=\"\" /></p>",
      "summary": "C-SHALS 2009 (Conference on Semantics in Healthcare and Life Sciences) has just started, and has coverage in a blog and in a FriendFeed room. It nicely coincides with Rich’ blog on What the Heck is the Semantic Web? , and the RDF work I have recently done on rdf.openmolecules.net and Bioclipse . (Oh, do I wish I could have attended that conference.)",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sechemticCSHALS.png",
      "date_published": "2009-02-25T00:00:00+00:00",
      "date_modified": "2025-12-29T00:00:00+00:00",
      "tags": ["google","chemistry","rdf","semweb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/a2w3n-cvb94", "doi": "10.59350/a2w3n-cvb94"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z5rpq-5k922",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/22/solubility-data-in-bioclipse-2-handling.html",
      "title": "Solubility Data in Bioclipse #2: handling RDF",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a> is swiftly becoming the <a href=\"http://en.wikipedia.org/wiki/Lingua_franca\">lingua franca</a> of life sciences\n(see for example [<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/24/one-billion-biochemical-rdf-triples.html\">1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html\">2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>]).\nBioclipse is an excellent platform to visualize results from analysis of the network, both for graph visualization (see\n[<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/18/solubility-data-in-bioclipse-1.html\">3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>]), as well of visualization of domain\nspecific data types (e.g. sequences, molecules, …).</p>\n\n<p>Yesterday I uploaded a Bioclipse feature that adds a <em>rdf</em> manager to handle RDF content, which includes\n<a href=\"http://en.wikipedia.org/wiki/SPARQL\">SPARQL</a> support. The below snippet shows application to the solubility data\n[<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/18/solubility-data-in-bioclipse-1.html\">3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>]:</p>\n\n<p>See also:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/24/one-billion-biochemical-rdf-triples.html\">One Billion Biochemical RDF Triples! <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html\">RDF-ing molecular space <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/18/solubility-data-in-bioclipse-1.html\">Solubility Data in Bioclipse #1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>",
      "summary": "RDF is swiftly becoming the lingua franca of life sciences (see for example [1 ,2 ]). Bioclipse is an excellent platform to visualize results from analysis of the network, both for graph visualization (see [3 ]), as well of visualization of domain specific data types (e.g. sequences, molecules, …).",
      
      "date_published": "2009-02-22T00:00:00+00:00",
      "date_modified": "2026-03-19T00:00:00+00:00",
      "tags": ["bioclipse","sparql","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fja51-1ga17",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/21/bioclipse2-scripting-2-searching.html",
      "title": "Bioclipse2 Scripting #2: searching PubChem",
      "content_html": "<p>This week I have been porting the PubChem plugin for <a href=\"http://www.bioclipse.net/\">Bioclipse</a> 1.2 to the new manager-based architecture. While still working on the Wizards,\nyou can run the following JavaScript in Bioclipse2 from SVN and from the next beta (*):</p>\n\n<script src=\"https://gist.github.com/67462.js\"></script>\n\n<p>*) There was some confusion on the <em>two</em> beta Bioclipse2 releases so far. Some people expected a release without any bugs left. That release is what we planned to call a\n<em>Release Candidate</em>. We agree that the first two betas at least turned out to be more alpha than we actually hoped, and we thank everyone who has given these releases a\ngo. Those who tried several development releases of Bioclipse2 saw a lot of ongoing development, and we are fixing\n<a href=\"http://bugs.bioclipse.net/\">any bug reported</a> on these releases. So, do not hesitate in reporting bugs!</p>\n\n<p>Earlier in this series:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/25/bioclipse2-scripting-1-from-smiles-to.html\">Bioclipse2 Scripting #1: from SMILES to a UFF optimized structure in Jmol <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/04/next-generation-asynchronous.html\">Next generation asynchronous webservices #2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/20/scripting-jchempaint.html\">Scripting JChemPaint <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/15/bioclipse-for-cdk-developers-1.html\">Bioclipse for CDK Developers #1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>",
      "summary": "This week I have been porting the PubChem plugin for Bioclipse 1.2 to the new manager-based architecture. While still working on the Wizards, you can run the following JavaScript in Bioclipse2 from SVN and from the next beta (*):",
      
      "date_published": "2009-02-21T00:00:00+00:00",
      "date_modified": "2025-11-29T00:00:00+00:00",
      "tags": ["bioclipse","pubchem","javascript"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tt4n7-sqs27",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/17/dbpedia-enters-rdfopenmoleculesnet.html",
      "title": "DBPedia enters rdf.openmolecules.net",
      "content_html": "<p>As of tonight, <a href=\"http://rdf.openmolecules.net/\">rdf.openmolecules.net</a> links to the chemistry <a href=\"http://www.dbpedia.org/\">DBPedia</a> (1816 chemical compounds),\nfor which I used the SPARQL given in <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/dbpedia-lookup-and-autocomplete-of.html\">DBPedia: lookup and autocomplete of chemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nIt’s first of several steps to extend rdf.openmolecules.net to link up various chemistry database. The below figure shows the current state, where the green nodes are fully RDF-ied:</p>\n\n<p><img src=\"/assets/images/ons.png\" alt=\"\" /></p>\n\n<p>Drugs are still missing, but will add those too, and since not all entries had InChIs, SMILES were converted using\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/02/10/cdk-12-release-candidate.html\">CDK 1.1.5 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "As of tonight, rdf.openmolecules.net links to the chemistry DBPedia (1816 chemical compounds), for which I used the SPARQL given in DBPedia: lookup and autocomplete of chemistry . It’s first of several steps to extend rdf.openmolecules.net to link up various chemistry database. The below figure shows the current state, where the green nodes are fully RDF-ied:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ons.png",
      "date_published": "2009-02-17T00:00:00+00:00",
      "date_modified": "2025-11-29T00:00:00+00:00",
      "tags": ["rdf","dbpedia","sparql","inchi","smiles","chebi","cb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zv2f4-ac581",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/15/bioclipse-for-cdk-developers-1.html",
      "title": "Bioclipse for CDK Developers #1",
      "content_html": "<p>Ola has released the <a href=\"http://bioclipse.blogspot.com/2009/02/bioclipse-20-beta2-released.html\">second beta for Bioclipse 2.0</a>.\nThings are getting along, and I will not go into details on the <a href=\"http://bioclipse.blogspot.com/2008/08/bioclipse-20-alpha01-released.html\">molecules table Arvid is working on</a>,\nthe 1GB+ SD file support, the <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/12/30/editing-and-validation-of-cml-documents.html\">validating CML editor <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/04/next-generation-asynchronous.html\">support for XMPP services <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nor the <a href=\"http://bioclipse.blogspot.com/2009/02/bioclipse-20-beta2-released.html\">brand new welcome page</a>\nwhich will guide new users around in what Bioclipse has to offer.</p>\n\n<p>This blog will focus on what <a href=\"http://www.bioclipse.net/\">Bioclipse</a> has to offer <a href=\"http://cdk.sf.net/\">CDK</a> developers.</p>\n\n<p>While Bioclipse 1.x (doi:<a href=\"https://doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a>) was a prototype that showed the\npower if integrating different bio- and cheminformatics tools, Bioclipse2 was designed from scratch, taking advantage of\nthe latest <a href=\"http://wiki.eclipse.org/index.php/Rich_Client_Platform\">Eclipse RCP</a> technologies. More importantly, the\nteam in Uppsala decided to have all functionality work via managers, allowing all actions to be recorded. <em>And</em>,\nscripting of Bioclipse. 
I blogged earlier about <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/20/scripting-jchempaint.html\">scripting JChemPaint <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/25/bioclipse2-scripting-1-from-smiles-to.html\">creating UFF optimized 3D structures from SMILES <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/16/bioclipse-and-gist-integration.html\">Example scripts <i class=\"fa-solid fa-recycle fa-xs\"></i></a> can be found on\nGitHub (this is <a href=\"http://github.com/blog/317-scripting-bioclipse\">their coverage</a>), and are\n<a href=\"http://delicious.com/tag/bioclipse+gist\">indexed on Delicious</a>.</p>\n\n<h2 id=\"r-for-cheminformatics\">R for cheminformatics</h2>\n\n<p>The fact that we can script everything makes Bioclipse an ideal platform for doing cheminformatics: we have access to a variety of\ncheminformatics libraries, <em>and</em> the means to visualize results via <a href=\"http://jchempaint.sf.net/\">JChemPaint</a> and\n<a href=\"http://www.jmol.org/\">Jmol</a>. It is like R for cheminformatics: Bioclipse being the R command line, Bioclipse plugins the R\npackages. Eclipse provides a mechanism called <em>Update Sites</em>, which makes something like CRAN redundant. Back to the Chemistry\nDevelopment Kit.</p>\n\n<p>Over the next weeks, I will blog about scripts aimed at CDK developers and people who want to learn more on how the CDK\ninternals work. This series assumes Bioclipse 2.0 beta2 (or better) and the CDK Feature installed. 
I’ll be using the Gist\nwidget to embed scripts in this blog, but you can always download the Gist directly into Bioclipse, with the GUI as described\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/16/bioclipse-and-gist-integration.html\">here <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Bioclipse uses JavaScript (maybe other scripting languages in the future. File a wishlist report if you would like to see Jython,\nBeanShell or other support in the <a href=\"http://bugs.bioclipse.net/\">Bioclipse bug track system</a>.)\nBioclipse managers are visible using special variables, such as:</p>\n\n<table>\n  <tr>\n    <td><span style=\"font-weight:bold;\">Bioclipse Feature</span></td>\n    <td>ui</td>\n    <td>Bioclipse UI interaction</td>\n  </tr>\n  <tr>\n    <td><span style=\"font-weight:bold;\">Cheminformatics Feature</span></td>\n    <td>cdk</td>\n    <td>CDK functionality</td>\n  </tr>\n  <tr>\n    <td></td>\n    <td>jmol</td>\n    <td>Jmol functionality</td>\n  </tr>\n  <tr>\n    <td><span style=\"font-weight:bold;\">CDK Feature</span></td>\n    <td>cdx</td>\n    <td>CDK Developer functionality</td>\n  </tr>\n</table>\n\n<p>Bioclipse scripting has TAB completion support, so you can type cdk. (notice the dot at the end) to see which methods the cdk manager provides.</p>\n\n<h2 id=\"debugging-cdks-atom-type\">Debugging CDK’s Atom Type</h2>\n\n<p>As I wrote last week with the email on the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html\">first CDK 1.2 release candidate <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe new CDK atom typer is a core component of the new CDK. The new implementation covers all atom types used in CDK 1.0, and many more.\nIn particular, <a href=\"http://chemistry-to-informatics.blogspot.com/\">Miguel</a> boosted support for charged and radical atom types.</p>\n\n<p>However, the atom types in your data set may not be covered, or perception fails otherwise. That happens. 
Bioclipse2 makes\ndebugging of this important step in cheminformatics quite insightful. The following script reads a molecule from SMILES,\nvisualizes the 2D diagram in JChemPaint, and perceives atom types: The atom type perception results are returned to the JavaScript\nconsole, and if there are <em>nulls</em> given, then the CDK algorithm did not find a matching atom type for that atom. If you are\nsure your cheminformatics representation is in order, I welcome a bug report\n<a href=\"http://sourceforge.net/tracker2/?atid=120024&amp;group_id=20024&amp;func=browse\">here</a>.</p>\n\n<p>CDK developers can take advantage of this functionality to eliminate possible causes of why a certain algorithm fails. CDK atom typing is used for a variety of algorithms, including counting implicit hydrogens, which many other algorithms need to know.</p>\n\n<h2 id=\"how-does-the-cdk-read-a-smiles\">How does the CDK read a SMILES</h2>\n\n<p>A use case for people who want to know if a particular SMILES feature is read or to make sure it is read correctly:\nThis script uses the <em>diff</em> functionality introduced in CDK 1.2, and shows two aspects of the SMILES specification: 1. it\npicked up the isotope information given in the second SMILES; 2. the second SMILES does not include the implicit hydrogen\ncount, which the SMILES specification then defaults as zero.</p>\n\n<h2 id=\"summary\">Summary</h2>\n\n<p>The CDK managers in Bioclipse (<em>cdk</em> and <em>cdx</em>) expose functionality of the CDK, and allow using it in Bioclipse’s rich\nvisual workbench environment.</p>",
      "summary": "Ola has released the second beta for Bioclipse 2.0. Things are getting along, and I will not go into details on the molecules table Arvid is working on, the 1GB+ SD file support, the validating CML editor , the support for XMPP services , or the brand new welcome page which will guide new users around in what Bioclipse has to offer.",
      
      "date_published": "2009-02-15T00:00:00+00:00",
      "date_modified": "2025-10-26T00:00:00+00:00",
      "tags": ["bioclipse","cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-59", "doi": "10.1186/1471-2105-8-59"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/445ka-1jq43",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/12/blogger-degrading.html",
      "title": "Blogger Degrading...",
      "content_html": "<p>Did others notice this too? The <a href=\"http://www.blogger.com/\">blogger.com</a> <em>Links to this post</em> functionality seems seriously broken…\nonce a rather useful feature, it has now degradated to a useless state:</p>\n\n<p><img src=\"/assets/images/bloggerDegrading.png\" alt=\"\" /></p>\n\n<p>I’m quite sure a post of <a href=\"http://phylogenomics.blogspot.com/2008/10/mccain-palin-going-after-fruit-flies.html\">last October</a> cannot\nlink to the <a href=\"http://usefulchem.blogspot.com/2009/02/substructure-searching-on-ons.html\">Substructure searching on ONS solubility data</a>\nitem Jean-Claude posted today.</p>",
      "summary": "Did others notice this too? The blogger.com Links to this post functionality seems seriously broken… once a rather useful feature, it has now degradated to a useless state:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bloggerDegrading.png",
      "date_published": "2009-02-12T00:00:00+00:00",
      "date_modified": "2009-02-12T00:00:00+00:00",
      "tags": ["blog","google"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wbb63-gv107",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/11/dbpedia-lookup-and-autocomplete-of.html",
      "title": "DBPedia: lookup and autocomplete of chemistry",
      "content_html": "<p>On the <a href=\"http://dbpedia.org/\">DBPedia</a> <a href=\"https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion\">discussion mailing list</a> there was a post on a\nnice web page which allows you to look up things, and which features a autocomplete edit field. The below screenshot show lookup of molecular structures:</p>\n\n<p><img src=\"/assets/images/dbpediaAutocomplete.png\" alt=\"\" /></p>\n\n<p>If you are not ware of this, adding content to DBPedia is as easy as adding something to <a href=\"http://www.wikipedia.org/\">WikiPedia</a>. Literally: DBPedia is\nthe <a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a> flavour of WikiPedia. It extracts the information from the info boxes, as I\ndiscussed before (see <a href=\"http://chem-bla-ics.blogspot.com/2007/08/molecules-in-wikipedia.html\">Molecules in Wikipedia</a>).</p>\n\n<p>BTW, one can take advantage of DBPedia to see what WikiPedia has to offer in terms of chemistry. For example, to list all molecules which have a SMILES, one can use this simple SPARQL query:</p>\n\n<script src=\"http://gist.github.com/57559.js\"></script>\n\n<p>Or, to list those which have an InChI:</p>\n\n<script src=\"https://gist.github.com/57571.js\"></script>\n\n<p>And this is actually quite useful, e.g. it can be used in quality control. Running the above queries will show up several broken SMILES and InChIs. I have not had time to fix those yet, so please go ahead and beat me to those fixes, and get some WikiPedia Fame :) Alternatively, invert the queries and add missing InChIs, PubChem CID or SMILES. When I have a bit more free time again, after the new stable CDK and Bioclipse releases, I’ll runs these analyses again, and summarize them in a web page.</p>",
      "summary": "On the DBPedia discussion mailing list there was a post on a nice web page which allows you to look up things, and which features a autocomplete edit field. The below screenshot show lookup of molecular structures:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/dbpediaAutocomplete.png",
      "date_published": "2009-02-11T00:00:00+00:00",
      "date_modified": "2009-02-11T00:00:00+00:00",
      "tags": ["rdf","dbpedia","wikipedia","smiles","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9k5sa-fgb67",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/10/cdk-12-release-candidate.html",
      "title": "CDK 1.2 Release Candidate",
      "content_html": "<p>I released <a href=\"http://cdk.sf.net/\">CDK</a> 1.1.5 today. Below is the email I sent to the\n<a href=\"http://sourceforge.net/mailarchive/forum.php?forum_name=cdk-user\">cdk-user mailing list</a>:</p>\n\n<blockquote>\n  <p>Hi all,</p>\n\n  <p>I am happy to be able to announce the first Release Candidate for CDK 1.2.</p>\n\n  <p>Everyone using using CDK 1.0 is suggest to upgrade to this release,\nwhich has fewer bugs, is much better tested, and is faster too. It\nalso comes with API changes, and a full changelog is not available\n(yet). However, the CDK developers are available on this mailing list\nand on IRC to help you port CDK 1.0 applications to CDK 1.2. Two\ndifferences in particular I would like to point out at this moment:</p>\n\n  <h4 id=\"1-explicit-atom-typing\">1. explicit atom typing</h4>\n\n  <p>CDK 1.0 did atom typing at various places to perform its function,\nleading to inconsistencies and bugs. CDK 1.2 introduces a new atom\ntyping module which isolates atom typing from other algorithms.\nConsequently, the CDK will be more critical on your code and your\ndata: where the old code might have silently eaten incorrect input,\nthe new implementation complains: expect exceptions! The actual atom\ntype list used in CDK 1.2 is more complete than the ones used in CDK\n1.0; however, it is not unlikely that you will find no atom type\nperceived for a clearly valid atom type. Please report such cases.</p>\n\n  <p>And I really want to stress this: in every instance where CDK 1.2, CDK\n1.0 would have failed too, though it might have not complained about\nit.</p>\n\n  <h4 id=\"2-no-rendering-functionality\">2. no rendering functionality</h4>\n\n  <p>The new rendered under development (see the <a href=\"http://sourceforge.net/mailarchive/forum.php?forum_name=cdk-jchempaint\">cdk-jchempaint mailing\nlist</a>)\nhas not made the CDK 1.2.0 release. However, it is expected to\nbe available in a later CDK 1.2.x release. 
If you really need the\ngraphics functionality, please contact me. Bioclipse2 is an example\nproject which combines CDK 1.2 with the new rendering code.</p>\n\n  <h2 id=\"contributions\">Contributions</h2>\n\n  <p>This release features contributions from a larger developer group than\never before. In particular, I would like to welcome those who have\npicked up JuniorJobs, and provided other smaller patches! A full list\nof authors is available from:</p>\n\n  <p><a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/tags/cdk-1.1.5/AUTHORS\">http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/tags/cdk-1.1.5/AUTHORS</a></p>\n\n  <p>(If you see your name missing (sorry!), please just email me)</p>\n\n  <p>If you like to contribute too, there are many ways. The JuniorJobs is\njust an example and are available from:</p>\n\n  <p><a href=\"http://sourceforge.net/tracker/?group_id=20024&amp;atid=997721\">http://sourceforge.net/tracker/?group_id=20024&amp;atid=997721</a></p>\n\n  <h2 id=\"download\">Download</h2>\n\n  <p>CDK 1.2 RC1 is available from SourceForge as CDK 1.1.5:</p>\n\n  <p><a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=57806\">http://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=57806</a></p>\n\n  <p>Alternatively, you can download the release from SVN:</p>\n\n  <p><a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/tags/cdk-1.1.5/\">http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/tags/cdk-1.1.5/</a></p>\n\n  <h2 id=\"bugs\">Bugs</h2>\n\n  <p>As said, this CDK release is the most tested CDK release ever, with\nmore than 10 thousand unit tests! However, there are open (minor)\nissues, which you can see reported at Nightly:</p>\n\n  <p><a href=\"http://pele.farmbio.uu.se/nightly-1.2.x/\">http://pele.farmbio.uu.se/nightly-1.2.x/</a></p>\n\n  <p>The number of failing unit tests is below 1%, and in the same range as\nthe number of failing tests for CDK 1.0. 
Importantly, these are\ntypically fails of unit tests which are not available in the CDK 1.0\nunit test suite; that is, many of the failing unit tests in CDK 1.0\nare <em>not</em> failing in CDK 1.2 (it really is rewarding to upgrade!)</p>\n\n  <p>However, if you find additional bugs (or just have wishlists), you can\nreport these with our SourceForge bug tracker at:</p>\n\n  <p><a href=\"http://sourceforge.net/tracker/?group_id=20024&amp;atid=120024\">http://sourceforge.net/tracker/?group_id=20024&amp;atid=120024</a></p>\n\n  <h2 id=\"documentation\">Documentation</h2>\n\n  <p>Over the next weeks I hope to compose a somewhat useful list of\nchanges. I have not made up my mind yet how that will take shape,\nmaybe as a list of blogs, which I’ll aggregate later. Dunno yet.\nSuggestions and contributions welcome :)</p>\n\n  <p>JavaDoc for the release is not yet available on SF for download\n(working on that), but available for the cdk1.2.x / branch at:</p>\n\n  <p><a href=\"http://pele.farmbio.uu.se/nightly-1.2.x/api/\">http://pele.farmbio.uu.se/nightly-1.2.x/api/</a></p>\n\n  <p>OK, that wraps it up for now. Just reply if you have questions.</p>\n\n  <p>Egon</p>\n</blockquote>",
      "summary": "I released CDK 1.1.5 today. Below is the email I sent to the cdk-user mailing list:",
      
      "date_published": "2009-02-10T00:00:00+00:00",
      "date_modified": "2025-10-20T00:00:00+00:00",
      "tags": ["cdk","java","sourceforge"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/24q7a-eg105",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/05/where-can-i-host-my-experimental-data.html",
      "title": "Where can I host my experimental data? Open Submission Chemistry Databases #1",
      "content_html": "<p><a href=\"http://depth-first.com/\">Rich</a> just posted an interesting read on <a href=\"http://depth-first.com/articles/2009/02/04/web-centric-science\">Web-Centric Science</a>,\nafter a <a href=\"http://therealmoforganicsynthesis.blogspot.com/2009/02/throwing-down-gauntlet-for-my-fellow.html\">gauntlet thrown down</a> by\n<a href=\"http://therealmoforganicsynthesis.blogspot.com/\">The Realm of Organic Synthesis</a> (TROS).</p>\n\n<p>I agree that this still is a problem: where can (organic) chemists host their data? TROS hints at <a href=\"http://wikipedia.org/\">Wikipedia</a>,\nbut an encyclopedia is not always the most suited place for cutting edge chemistry (article can easily be biased, contain (science)\npolitical views, etc…). I would suggest a blog would be a good start, and if proper markup would be used services like\n<a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> would automatically aggregate it.</p>\n\n<p>However, something less volatile might be interesting. So, what we need is an overview of web databases where experimental chemistry\ndata can be hosted. 
I’ll start one, and annotate resources with license, on <a href=\"http://delicious.com/egonw\">delicious.com</a>,\nusing the tags <a href=\"http://delicious.com/tag/chemistry+web+database+open+submission\">chemistry +web +database +open +submission</a>,\nand regularly summarize things here.</p>\n\n<p>In the table below, the last column indicates the most liberal license you can use to host your data:</p>\n\n<table>\n  <tbody>\n    <tr>\n      <td><span style=\"font-weight:bold;\">database</span></td>\n      <td><span style=\"font-weight:bold;\">data type</span></td>\n      <td><span style=\"font-weight:bold;\">license</span></td>\n    </tr>\n    <tr>\n      <td><a href=\"http://nmrshiftdb.ice.mpg.de/\">NMRShiftDB</a></td>\n      <td>NMR spectra</td>\n      <td><a href=\"http://www.gnu.org/copyleft/fdl.html\">GNU FDL</a></td>\n    </tr>\n    <tr>\n      <td><a href=\"http://www.chemspider.com/\">ChemSpider</a></td>\n      <td>Structures, links to papers, spectra</td>\n      <td><a href=\"http://www.opendefinition.org/\">open data</a></td>\n    </tr>\n    <tr>\n      <td><a href=\"http://sord.nl/\">SORD</a></td>\n      <td>Organic Reactions</td><td>?</td>\n    </tr>\n  </tbody>\n</table>\n\n<p>There are some obvious gaps here, if you consider a typical experimental section. What to do with a measured melting point, IR spectra, mass spectral information, and measured elemental composition?</p>",
      "summary": "Rich just posted an interesting read on Web-Centric Science, after a gauntlet thrown down by The Realm of Organic Synthesis (TROS).",
      
      "date_published": "2009-02-05T00:10:00+00:00",
      "date_modified": "2025-10-20T00:00:00+00:00",
      "tags": ["chemistry","opendata","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sja4z-njx48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/05/why-cant-sourceforge-just-remember-me.html",
      "title": "Why can&apos;t SourceForge just remember me?",
      "content_html": "<p>I do not typically make complaints in my blog, so consider this a request for advice in good practices ;)</p>\n\n<p>My problem is that I have to log in on <a href=\"http://www.sf.net/\">SourceForge</a> every day, even if I tick the ‘Remember me’ switch. I do understand that\naccount log ins do need some time out… but less than a day? One cause of problems seems to be if I connect via a different network, but\n<a href=\"http://en.wikipedia.org/wiki/HTTP_cookie\">Cookies</a> should not be affected by that? Am I doing something wrong here, or does SourceForge?</p>\n\n<p>Are others having the same problems?</p>",
      "summary": "I do not typically make complaints in my blog, so consider this a request for advice in good practices ;)",
      
      "date_published": "2009-02-05T00:00:00+00:00",
      "date_modified": "2009-02-05T00:00:00+00:00",
      "tags": ["sourceforge"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qf4d9-79d36",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/02/04/youtube-for-chemistry.html",
      "title": "YouTube for Chemistry",
      "content_html": "<p><a href=\"http://www.chemspider.com/\">ChemSpider</a> has set up <a href=\"http://www.chemspider.com/blog/why-are-chemical-structures-like-youtube-videos.html\">embeddable chemistry widget</a>\n(per <a href=\"http://blog.openwetware.org/scienceintheopen/\">Cameron</a>’s idea), much like <a href=\"http://youtube.com/\">YouTube</a>. I just have to try that.\nUnlike YouTube, you need to be registered and logged in to use the functionality (I hope the requirement will be dropped):</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"http://www.chemspider.com/csjsapi.ashx?op=img&amp;amp;tk=3d178e75-a272-4d60-8ca9-5b1183a0e746&amp;amp;id=171&amp;amp;w=120&amp;amp;p=1&amp;amp;eid=%22azijnzuur%22\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n</code></pre></div></div>\n\n<p>There is an option to have ChemSpider link back to blog, and I will have to figure out how to enable\n<a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> to extract the InChI from the underlying JavaScripts.</p>\n\n<p><strong>Update</strong>: I noticed that the ChemSpider server was a bit sluggish this morning, and that loading my blog page halts at loading the\nJavaScript… Tony, I suggest to use some Ajax magic here, with a really fast JavaScript download (using an almost static bit of\nJavaScript), and then a Ajax to access to slower bits, which might involve image generation and database lookup.</p>\n\n<p><strong>Update2</strong>: the feature was already under development before Cameron asked about it.</p>\n\n<p><strong>Update3</strong>: the script is no longer working, and I made the code visible instead, for historic reasons.</p>",
      "summary": "ChemSpider has set up embeddable chemistry widget (per Cameron’s idea), much like YouTube. I just have to try that. Unlike YouTube, you need to be registered and logged in to use the functionality (I hope the requirement will be dropped):",
      
      "date_published": "2009-02-04T00:00:00+00:00",
      "date_modified": "2025-10-19T00:00:00+00:00",
      "tags": ["chemspider"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c9h5h-sdq16",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/01/27/metware-presenation-at-metabolomics.html",
      "title": "MetWare presentation at Metabolomics Workshop",
      "content_html": "<p>I gave <a href=\"https://doi.org/10.5281/zenodo.3366041\">this</a> 10 minute presentation on\n<a href=\"https://sourceforge.net/projects/metware/\">MetWare <i class=\"fa-solid fa-recycle fa-xs\"></i></a> this afternoon at the\n<a href=\"http://www.elixir-europe.org/page.php?page=metabolomics_workshop\">International Metabolomics Workshop <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>:</p>\n\n<p><a href=\"https://doi.org/10.5281/zenodo.3366041\"><img src=\"/assets/images/metware_uppsala_2009.png\" alt=\"\" /></a></p>",
      "summary": "I gave this 10 minute presentation on MetWare this afternoon at the International Metabolomics Workshop :",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/metware_uppsala_2009.png",
      "date_published": "2009-01-27T00:00:00+00:00",
      "date_modified": "2025-10-22T00:00:00+00:00",
      "tags": ["metware","metabolomics"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.3366041", "doi": "10.5281/ZENODO.3366041"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qvacd-hd936",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/01/23/statistics-on-development-community.html",
      "title": "Statistics on the Development Community",
      "content_html": "<p>Git is nice. Nicer to some than to other, that is true. <a href=\"http://gitready.com/\">GitReady</a> just learned me how to\ncalculate commit message in a few seconds: <code class=\"language-plaintext highlighter-rouge\">git shortlog -s -n</code>. For the whole\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> and <a href=\"http://cdk.sf.net/\">CDK</a> commit history. Seconds.\nHere they are.</p>\n\n<h2 id=\"cdk\">CDK</h2>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  6337  egonw\n  1446  rajarshi\n   616  shk3\n   540  steinbeck\n   421  miguelrojasch\n   139  chhoppe\n   115  eoc21\n    98  kaihartmann\n    97  mfe4\n    74  labarta\n    69  tohel\n    43  f_marighetti\n    40  ospjuth\n    39  nielsout\n    34  sushil_ronghe\n    29  archvile18\n    29  maet\n    26  djiao\n    24  michaelthoward\n    22  dirk49\n    17  Egon Willighagen\n    16  martitm\n    15  jhao\n    14  stomkinson\n    12  benedikta\n    12  dleidert\n    12  telto\n    11  thomaskuhn\n    10  sea36\n    10  yz237\n     7  zzzgggrrr\n     6  speleo3\n     5  jharter\n     5  mario_baseda\n     4  akrassavine\n     4  gilleain\n     4  mekovich\n     4  ulif\n     4  yeldar\n     3  edgarl\n     3  jonalv\n     3  petermr\n     2  drzz\n     2  jakt\n     2  sithmein\n     2  vedina\n     1  Kaihartmann\n</code></pre></div></div>\n\n<h2 id=\"bioclipse\">Bioclipse</h2>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>   939  ospjuth\n   466  jonalv\n   438  egonw\n   293  shk3\n   266  goglepox\n   263  carl_masak\n    93  edrin_t\n    67  rklancer\n    44  biocoder\n    23  gilleain\n    23  miguelrojasch\n    21  Annzi\n     9  Egon Willighagen\n     1  grantsparks\n</code></pre></div></div>\n\n<p>There indeed is a bit of contributor duplication, but pretty neat. 
I did not have a full Jmol git around, so those statistics will have to follow.</p>",
      "summary": "Git is nice. Nicer to some than to other, that is true. GitReady just learned me how to calculate commit message in a few seconds: git shortlog -s -n. For the whole Bioclipse and CDK commit history. Seconds. Here they are.",
      
      "date_published": "2009-01-23T00:00:00+00:00",
      "date_modified": "2009-01-23T00:00:00+00:00",
      "tags": ["cdk","bioclipse","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1zmv1-tnn62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/01/21/details-behind-calling-xmpp-cloud.html",
      "title": "Details behind the &quot;Calling XMPP cloud services from Taverna2&quot;",
      "content_html": "<p>On Monday I showed <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/19/calling-xmpp-cloud-services-from.html\">two screenshot <i class=\"fa-solid fa-recycle fa-xs\"></i></a> showing our\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/01/19/calling-xmpp-cloud-services-from.html\">new XMPP-based web/cloud services <i class=\"fa-solid fa-recycle fa-xs\"></i></a> in action\ninside <a href=\"http://taverna.sf.net/\">Taverna</a>.</p>\n\n<p>I promised details, but realize I have actually already posted a lot of them <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/31/next-generation-asynchronous.html\">in October <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<blockquote>\n  <p>Johannes ideas led to the <a href=\"http://xmpp.org/extensions/xep-0244.html\">IO-DATA proposal</a> (XEP-0244), which is currently\nmarked experimental and being discussed on the ws-xmpp mailing list. He gathered a few people around him to get it going,\nresulting in working stuff! Yeah!</p>\n</blockquote>\n\n<p><a href=\"http://miningdrugs.blogspot.com/\">Joerg</a> <a href=\"http://friendfeed.com/e/a15e79ac-92ce-4b16-81d9-8f7b6ec1ea24/chem-bla-ics-Calling-XMPP-cloud-services-from/\">asked</a>\n<em>Could you post more results, what is it, why do we need it, e.g. why are you mentioning SOAP and cloud? Do not know enough to see the bonus right now.</em></p>\n\n<p><strong>What is it?</strong> IO-DATA is a protocol on top of the XMPP protocol to allow machine-to-machine communication. Actually,\nmuch like SOAP, RPC, and other platforms. How IO-DATA differs lies to some extend to the transport layer: instead of\nusing HTTP, it used the XMPP transport protocol, also used for Jabber chat clients. It basically allows clients like\nTaverna to chat with services running elsewhere.</p>\n\n<p><strong>Why do we need it?</strong> Most services run over HTTP, making them web services. 
This is convenient, because there is\nmuch infrastructure around, like web browsers. REST services also take advantage of this. However, for heavy\ncomputing this sometimes leads to problems. For example, routers are known to have timeouts on HTTP connections.\nTo solve this, SOAP services often introduce a polling mechanism. IO-DATA takes a different approach. Instead of\nhaving to ask all the time how a calculation is doing, you can just wait for the service to send you a message\nwhen it is done. Instead of working around the lack of asynchronous aspects, IO-DATA introduces these in the protocol.</p>\n\n<p>Other interesting features include that IO-DATA integrates the interface formats for services into the service\nitself (SOAP needs WSDL for this), and that it features service discovery via DISCO. The latter is done with SOAP\ntoo, for example with UDDI and BioMoby. BioMoby also adds strong data typing for input and output of services.</p>\n\n<p>IO-DATA addresses the data typing by allowing clients to ask the service what XML Schema it uses for input and output.\nWhile XML Schema has alternatives, which may be preferred in some situations, it does allow strong data typing\nand supports <a href=\"http://friendfeed.com/e/2d322ac5-a5b9-4336-b421-fede0eb8e192/Hi-Guys-I-m-looking-for-an-exhaustive-resource-of/\">a lot of formats in life sciences</a>\n(which I’ll summarise soon).</p>\n\n<p>Moreover, if there just happens not to be a suitable schema around, you can just define one yourself, which can\nbe as simple as a single element wrapper around some custom text-based format. You worry about supporting many\nformats? Well, no need. Johannes’ xws4j library, which I used for the Taverna plugin too, allows compiling Java\nbinding code. 
Bioclipse’s script environment allows you to do this on the fly: you find a service, ask for the\nschema, compile bindings for input and output, set up the input with the input binding, send it off to the service,\nand use the output binding for convenient access to the computation results. Without having to reboot Bioclipse.\nIsn’t that <strong>cool</strong>? Can your software do that? (See <a href=\"http://gist.github.com/22185\">this example Gist</a>: the io\nfactory creates the binding).</p>\n\n<p><strong>Why do I mention SOAP and the cloud?</strong> It should be clear from the above why I mention SOAP: IO-DATA offers the same\nfunctionality, but more conveniently, we think. I mention cloud here to refer to cloud computing, which is doing\ncomputation in the cloud, a synonym for the internet (see\n<a href=\"http://en.wikipedia.org/wiki/Cloud_computing\">Cloud Computing @ Wikipedia</a>). Because IO-DATA does\nnot use HTTP, we do not feel we can call it a web service. Instead, cloud computing is a more general term, not\ntied to any particular architecture. IO-DATA is just one possible architecture, one we think is promising for\nlife science applications.</p>",
      "summary": "On Monday I showed two screenshot showing our new XMPP-based web/cloud services in action inside Taverna.",
      
      "date_published": "2009-01-21T00:00:00+00:00",
      "date_modified": "2025-10-26T00:00:00+00:00",
      "tags": ["xmpp","taverna"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g8r2b-7d676",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/01/19/calling-xmpp-cloud-services-from.html",
      "title": "Calling XMPP cloud services from Taverna2",
      "content_html": "<p>SMILES (<em>CCC</em>) in, mass out. Yes, we can now call XMPP/IO-DATA cloud services with Taverna2 :)</p>\n\n<p><img src=\"/assets/images/t1.png\" alt=\"\" /></p>\n\n<p><img src=\"/assets/images/t2.png\" alt=\"\" /></p>\n\n<p>Details will follow, but here’s the <a href=\"http://github.com/egonw/xws-taverna/tree/master\">source code</a>.</p>",
      "summary": "SMILES (CCC) in, mass out. Yes, we can now call XMPP/IO-DATA cloud services with Taverna2 :)",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/t1.png",
      "date_published": "2009-01-19T00:10:00+00:00",
      "date_modified": "2009-01-19T00:10:00+00:00",
      "tags": ["taverna","xmpp"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bha3t-gtx57",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/01/19/rsc-now-allows-jmol-in-main-text-of.html",
      "title": "RSC now allows Jmol in main text of publication... well, almost",
      "content_html": "<p>Richard Kidd wrote in the <a href=\"http://prospect.rsc.org/blogs/cw/?p=1315\">ChemistryWorldBlog</a> about Henry Rzepa to have published two papers in\n<a href=\"http://www.rsc.org/\">RSC</a> journals where Jmol is part of the main paper, after having used Jmol in extra material in ACS journals before.\nThe key here is that the <a href=\"http://www.jmol.org/\">Jmol</a> is part of the official text… when you open the paper in a browser, you immediately\nget to see the Jmol live, 3D graphics! Well, so it is said in the blog.</p>\n\n<p>However, when I checked the HTML of the first of the two papers (<em>A computational investigation of the structure of polythiocyanogen</em>,\ndoi:<a href=\"http://dx.doi.org/10.1039/b810147g\">10.1039/b810147g</a>). The main HTML <strong>still</strong> links to a supplementary page. Progress, but not\nperfect either:</p>\n\n<p><img src=\"/assets/images/henryJmolOnline.png\" alt=\"\" /></p>",
      "summary": "Richard Kidd wrote in the ChemistryWorldBlog about Henry Rzepa to have published two papers in RSC journals where Jmol is part of the main paper, after having used Jmol in extra material in ACS journals before. The key here is that the Jmol is part of the official text… when you open the paper in a browser, you immediately get to see the Jmol live, 3D graphics! Well, so it is said in the blog.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/henryJmolOnline.png",
      "date_published": "2009-01-19T00:00:00+00:00",
      "date_modified": "2009-01-19T00:00:00+00:00",
      "tags": ["jmol","publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1039/b810147g", "doi": "10.1039/b810147g"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gkcs6-62t68",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/01/16/bioclipse-and-gist-integration.html",
      "title": "Bioclipse and Gist integration",
      "content_html": "<p>As you might have read, <a href=\"http://www.bioclipse.net/\">Bioclipse</a> has scripting support (see for example, <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/20/scripting-jchempaint.html\">Scripting JChemPaint <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nand that we have been collection them on <a href=\"http://gist.github.com/\">Gist</a> and indexing them on Delicious with the tags <a href=\"http://delicious.com/tag/bioclipse+gist\">bioclipse and gist</a>.\nThis provides a nice overview of what you can do with the current SVN version of Bioclipse2. And, hopefully, when released, allow users to quickly learn about Bioclipse features,\nallow people to share scripts etc. Think of it as <a href=\"http://myexperiment.org/\">MyExperiment.org</a> for Bioclipse.</p>\n\n<p>Now, what was missing until today, was easy access to gists in Bioclipse itself. No <code class=\"language-plaintext highlighter-rouge\">gist.load(33421)</code> yet. There still is not, but I uploaded earlier today a Wizard for it.\n(The manager will follow later). 
Right-click on an open Project, select New -&gt; Other, and pick <em>Download Gist</em>:</p>\n\n<p><img src=\"/assets/images/gist3.png\" alt=\"\" /></p>\n\n<p>and click <em>Next</em>:</p>\n\n<p><img src=\"/assets/images/gist4.png\" alt=\"\" /></p>\n\n<p>Then, just type the number of the Gist you want to open in Bioclipse, for example <a href=\"http://gist.github.com/18315\">18315</a> (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/25/bioclipse2-scripting-1-from-smiles-to.html\">Bioclipse2 Scripting #1: from SMILES to a UFF optimized structure in Jmol <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nand click another <em>Next</em> to select a file name and location:</p>\n\n<p><img src=\"/assets/images/gist5.png\" alt=\"\" /></p>\n\n<p>The current code does require you to know the Gist number, so you’ll need a web browser to look it up, but we do have search facilities in mind. Also, while the code\nattempts to do so, the resulting Gist is not automatically opened in an editor (a bug). Another idea is to just install the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/24/git-eclipse-integration.html\">egit <i class=\"fa-solid fa-recycle fa-xs\"></i></a> plugin in Bioclipse :)</p>",
      "summary": "As you might have read, Bioclipse has scripting support (see for example, Scripting JChemPaint ), and that we have been collection them on Gist and indexing them on Delicious with the tags bioclipse and gist. This provides a nice overview of what you can do with the current SVN version of Bioclipse2. And, hopefully, when released, allow users to quickly learn about Bioclipse features, allow people to share scripts etc. Think of it as MyExperiment.org for Bioclipse.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/gist3.png",
      "date_published": "2009-01-16T00:00:00+00:00",
      "date_modified": "2025-10-26T00:00:00+00:00",
      "tags": ["bioclipse","git","github"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/saeyx-84w55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/01/15/editing-and-validation-of-pubchem-xml.html",
      "title": "Editing and Validation of PubChem XML documents",
      "content_html": "<p>With the general framework set up for <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/12/30/editing-and-validation-of-cml-documents.html\">editing and validation of CML documents <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nit was fairly easy to support the <a href=\"ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem.xsd\">PubChem XML file format schema too</a>.</p>\n\n<p>With the upcoming Bioclipse2 beta (scheduled next Friday), all you need to install on top of the Bioclipse2 core is the new XML feature.</p>",
      "summary": "With the general framework set up for editing and validation of CML documents , it was fairly easy to support the PubChem XML file format schema too.",
      
      "date_published": "2009-01-15T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["pubchem","xml","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ms0pq-arf10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/12/30/editing-and-validation-of-cml-documents.html",
      "title": "Editing and Validation of CML documents in Bioclipse",
      "content_html": "<p>One advantage of using XML is that one can rely on good support in libraries for functionality. When\nparsing XML, one does not have to take care of the syntax, and focus on the data and its semantics.\nThis comes at the expense of verbosity, though, but having the ability to express semantics explicitly\nis a huge benefit for flexibility.</p>\n\n<p>So, when <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> and Henry put their first documents online about the Chemical Markup Language\n(CML), I was thrilled, even though is actually was still SGML when I encountered it. The work predates the\n<a href=\"http://www.w3.org/TR/1998/REC-xml-19980210\">XML recommendation</a>. As I\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/02/jchempaint-history-cml-patches-in-1999.html\">recently blogged <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, in ‘99\nI wrote patches for Jmol and JChemPaint to support CML, which were published as preprint in the\n<a href=\"http://www.sciencedirect.com/preprintarchive\">Chemical Preprint Server</a> in a paper in 2000 in the\n<a href=\"http://hackberry.trinity.edu/IJC/\">Internet Journal of Chemistry</a>. Neither of the two has survived.</p>\n\n<p>Anyway, the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> makes heavy use of CML, and \n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> supports it too. Now, Bioclipse is based on the <a href=\"http://www.eclipse.org/\">Eclipse</a>\n<a href=\"http://wiki.eclipse.org/index.php/Rich_Client_Platform\">Rich Client Platform</a> architecture, for which\nthere exist quite a few XML tools in the <a href=\"http://www.eclipse.org/webtools/\">Web Tools Platform</a> (WTP).\nAmong these, a validation, content assisting XML editor. This means, I get red markings when I make my\nXML document not-well-formed or invalid. 
Just a quick recap: well-formedness means that the XML document\nhas a proper syntax: one root node, properly closed tags, quotes around attribute values, etc. Validity,\nhowever, means that the document is not only well-formed, but also hierarchically organized according to some specification.</p>\n\n<p>Enter CML. CML is such a specification, first with DTDs, but after the introduction of XML Namespaces with\nXML Schema (see <a href=\"http://cmlexplained.blogspot.com/2007/06/there-can-be-only-one-namespace.html\">There can be only one (namespace)</a>).\nThe WTP can use this XML Schema for validation, and this is of great help in learning the CML language.\nPressing Ctrl-space in Bioclipse will now show you what allowed content can be added at the current character\nposition.</p>\n\n<p>Yes, Bioclipse can do this now (in SVN, at least). This has been on my wishlist for at least two years now, but\nI never really found the right information. Now, three days ago <a href=\"http://intellectualcramps.blogspot.com/\">David</a>\nwrote about <a href=\"http://intellectualcramps.blogspot.com/2008/12/end-of-year-cramps.html\">End of Year Cramps</a>\nin which he describes some of his work on the WTP for autocomplete for XPath queries. He <em>see[s] a brighter\nfuture for XML at eclipse over the next year. I hope that those in the eclipse and XML community will help\nto continue to improve the basic support, so that first class commercial quality applications that leverage\nthis support can continue to be built.</em></p>\n\n<p>That was enough of a statement for me to <a href=\"http://intellectualcramps.blogspot.com/2008/12/end-of-year-cramps.html?showComment=1230451020000#c4332753586396921531\">ask in the comments</a>\nhow to make the WTP XML editor aware of the CML XML Schema. 
It already picked up XML Schemas with\n<code class=\"language-plaintext highlighter-rouge\">xsi:schemaLocation</code>, but I needed something that worked without such statements in the XML document itself.\nDavid explained to me that I could use the <a href=\"http://intellectualcramps.blogspot.com/2008/12/end-of-year-cramps.html?showComment=1230498780000#c4628316622126916885\">org.eclipse.wst.xml.catalog extension</a>.\nThis was really easy, and <a href=\"http://bioclipse.svn.sourceforge.net/viewvc/bioclipse/bioclipse2/trunk/plugins/net.bioclipse.cml/plugin.xml?r1=8101&amp;r2=8100&amp;pathrev=8101\">committed to Bioclipse SVN</a> as:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;extension</span>\n  <span class=\"na\">point=</span><span class=\"s\">\"org.eclipse.wst.xml.core.catalogContributions\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;catalogContribution&gt;</span>\n    <span class=\"nt\">&lt;uri</span> <span class=\"na\">name=</span><span class=\"s\">\"http://www.xml-cml.org/schema\"</span>\n          <span class=\"na\">uri=</span><span class=\"s\">\"schema24/schema.xsd\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;/catalogContribution&gt;</span>\n<span class=\"nt\">&lt;/extension&gt;</span>\n</code></pre></div></div>\n\n<p>However, that does not make the WTP XML editor available in the Bioclipse application yet. 
Not even in\nthe “Open With”… So, I set up a <a href=\"http://bioclipse.svn.sourceforge.net/viewvc/bioclipse/bioclipse2/trunk/features/net.bioclipse.cml_feature/\">CML Feature</a>.\nAfter a follow-up question, it turned out that the CML content type of Bioclipse was already a subtype of the\nXML type (see ):</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;extension</span>\n  <span class=\"na\">point=</span><span class=\"s\">\"org.eclipse.core.runtime.contentTypes\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;content-type</span>\n    <span class=\"na\">base-type=</span><span class=\"s\">\"org.eclipse.core.runtime.xml\"</span>\n    <span class=\"na\">id=</span><span class=\"s\">\"net.bioclipse.contenttypes.cml\"</span>\n    <span class=\"na\">name=</span><span class=\"s\">\"Chemical Markup Language (CML)\"</span>\n    <span class=\"na\">file-extensions=</span><span class=\"s\">\"cml,xml\"</span>\n    <span class=\"na\">priority=</span><span class=\"s\">\"normal\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;/content-type&gt;</span>\n<span class=\"nt\">&lt;/extension&gt;</span>\n</code></pre></div></div>\n\n<p>So, the only remaining problem was to actually get the WTP XML editor as part of the Bioclipse application.\nThe new CML Feature takes care of that (I hope the export and building the update site work too, but\nthat’s yet untested), by importing the relevant plugins and features. Last night, however, I ended up with\none stacktrace which gave me little clue on which plugin I was still missing.</p>\n\n<p>Therefore, I headed to #eclipse and actually met David of the blog that started this again. 
He asked\n<a href=\"http://nitind.blogspot.com/\">nitind</a> to think about it too, and they helped me pin down the issue.\nThe relevant bit of the stacktrace turned out to be:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Caused by: java.lang.IllegalStateException\n at org.eclipse.core.runtime.Platform.getPluginRegistry(Platform.java:774)\n at org.eclipse.wst.common.componentcore.internal.impl.WTPResourceFactoryRegistry$ResourceFactoryRegistryReader.(WTPResourceFactoryRegistry.java:275)\n at org.eclipse.wst.common.componentcore.internal.impl.WTPResourceFactoryRegistry.(WTPResourceFactoryRegistry.java:61)\n at org.eclipse.wst.common.componentcore.internal.impl.WTPResourceFactoryRegistry.(WTPResourceFactoryRegistry.java:55)\n ... 37 more\n</code></pre></div></div>\n\n<p>This referred to this bit of code of Eclipse’s Platform.java:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Bundle</span> <span class=\"n\">compatibility</span> <span class=\"o\">=</span> <span class=\"nc\">InternalPlatform</span><span class=\"o\">.</span><span class=\"na\">getDefault</span><span class=\"o\">()</span>\n  <span class=\"o\">.</span><span class=\"na\">getBundle</span><span class=\"o\">(</span><span class=\"nc\">CompatibilityHelper</span><span class=\"o\">.</span><span class=\"na\">PI_RUNTIME_COMPATIBILITY</span><span class=\"o\">);</span>\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">compatibility</span> <span class=\"o\">==</span> <span class=\"kc\">null</span><span class=\"o\">)</span>\n  <span class=\"k\">throw</span> <span class=\"k\">new</span> <span class=\"nf\">IllegalStateException</span><span class=\"o\">();</span>\n</code></pre></div></div>\n\n<p>So, the plugin I turned out to be missing was <em>org.eclipse.core.runtime.compatibility</em>. 
Apparently,\nsome parts of the WTP that the XMLEditor is using still use Eclipse 2.x technology.</p>\n\n<p><img src=\"/assets/images/cmlValid.png\" alt=\"\" /></p>\n\n<p>This screenshot shows the WTP XMLEditor in action in Bioclipse on a CML file. It shows the document\ncontents with the ‘Design’ tab, which also shows allowed content, as derived from the XML Schema for\nCML. Also, note that the Outline and Properties views automatically come for free, which allows more\ndetail and navigation of the content.</p>\n\n<p><img src=\"/assets/images/cmlContentAssisting.png\" alt=\"\" /></p>\n\n<p>This screenshot shows the ‘Source’ tab for the same file, where I deliberately changed the value of\nthe @id attribute of the first atom. The value does not validate against the regular expression defined\nin the CML schema for @id attribute values. It also shows the content assisting in action. At any\nlocation in the CML file, I can hit Ctrl-Space, and the editor will show me which content I can add\nat that location.</p>\n\n<p>This makes Bioclipse a perfect tool to craft CML documents and learn the language.</p>",
      "summary": "One advantage of using XML is that one can rely on good support in libraries for functionality. When parsing XML, one does not have to take care of the syntax, and focus on the data and its semantics. This comes at the expense of verbosity, though, but having the ability to express semantics explicitly is a huge benefit for flexibility.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cmlValid.png",
      "date_published": "2008-12-30T00:10:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cml","bioclipse","xml","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hnkbp-nnn23",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/12/30/11-years-of-debian.html",
      "title": "11 Years of Debian",
      "content_html": "<p>11 years ago, a day more or less, I bought an the special issue of <a href=\"http://www.chip.de/\">CHIP</a> which <a href=\"http://lists.debian.org/debian-user/1998/01/msg00359.html\">shipped Debian 1.3.1</a>.\nI think I’ve tried <a href=\"http://www.opensuse.org/\">SuSe</a> and <a href=\"http://www.redhat.com/\">RedHat</a> earlier that year, but this <a href=\"http://www.debian.org/\">Debian</a> release made me switch away\nfrom proprietary products 98% (taxes I still had to do with Windows98). Right now, I am mostly running <a href=\"http://www.ubuntu.com/\">Ubuntu</a>, which leans heavily on the work of the Debian project.</p>\n\n<p>I celebrated by installing a prerelease of Lenny, Debian’s next stable release, but still testing now, in a virtual box with <a href=\"http://www.virtualbox.org/\">VirtualBox</a>.\nWorks like a charm, and will allow me in 2009 to finally pick up some packaging work for Debian, and maybe, finally, get Jmol available in Debian main.</p>",
      "summary": "11 years ago, a day more or less, I bought an the special issue of CHIP which shipped Debian 1.3.1. I think I’ve tried SuSe and RedHat earlier that year, but this Debian release made me switch away from proprietary products 98% (taxes I still had to do with Windows98). Right now, I am mostly running Ubuntu, which leans heavily on the work of the Debian project.",
      
      "date_published": "2008-12-30T00:00:00+00:00",
      "date_modified": "2008-12-30T00:00:00+00:00",
      "tags": ["linux","debian"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xgy88-kfs02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/12/26/state-of-cdk-120.html",
      "title": "State of CDK 1.2.0...",
      "content_html": "<p>The reason why I have not blogged in more than two weeks, was that I was hoping to blog about the CDK 1.2.0 release. This was originally aimed at\nSeptember, slipped into October, November and then December. There were only three show stoppers (see <a href=\"https://apps.sourceforge.net/mediawiki/cdk/index.php?title=CDK_1.2_TODO\">this wiki page</a>),\none of which the <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/interfaces/IChemObject.html\">IChemObject</a>\ninterfaces were not properly tested.</p>\n\n<p>The problem was that the unit tests for the methods in superinterfaces were not applied to implementations of subinterfaces.\nFor example, the unit test for <code class=\"language-plaintext highlighter-rouge\">IElement.getSymbol()</code> was not applied to the class <code class=\"language-plaintext highlighter-rouge\">Atom</code>, which implements <code class=\"language-plaintext highlighter-rouge\">IAtom</code> which\nis a subinterfaces of <code class=\"language-plaintext highlighter-rouge\">IElement</code>.</p>\n\n<p>In fixing this, I had to take some hurdles. For example: the unit test classes used a set up following the implementations; CDK 1.2.x has three\nimplementations of the interfaces: data, datadebug and nonotify. The last does not send around update notifications, and rough tests indicate it\nis about 10% faster. The second implementation sends messages to the debugger for every modification of the data classes, which is, clearly,\nuseful for debugging purposes.</p>\n\n<p>However, the JUnit4 test classes were basically doing the same. The unit test <code class=\"language-plaintext highlighter-rouge\">DebugAtomTest</code> inherited form <code class=\"language-plaintext highlighter-rouge\">AtomTest</code>, and only overwrote\ncustomizations. 
<code class=\"language-plaintext highlighter-rouge\">AtomTest</code>, itself, inherited from <code class=\"language-plaintext highlighter-rouge\">ElementTest</code>. That’s where things got broken. In the single implementation set up, this\nwould have been fine, but to allow testing of all three implementations, <code class=\"language-plaintext highlighter-rouge\">getBuilder()</code> had to be used.</p>\n\n<p>And when I implemented that, I did not realize that <code class=\"language-plaintext highlighter-rouge\">ElementTest</code> would do a test like:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">IElement</span> <span class=\"n\">element</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newElement</span><span class=\"o\">();</span>\n<span class=\"c1\">// test IElement functionality</span>\n</code></pre></div></div>\n\n<p>However, while the use of the builder ensures testing of all three implementations, it does <strong>not</strong> run these tests on <code class=\"language-plaintext highlighter-rouge\">IAtom</code> implementations.</p>\n\n<p>Then followed a long series of patches to get this fixed. 
One major first patch was to define unit test frameworks like <code class=\"language-plaintext highlighter-rouge\">AbstractElementTest</code>,\nwhich formalized running unit tests on any implementation, as I noticed that quite a few tests were still testing one particular implementation.\nThis allowed <code class=\"language-plaintext highlighter-rouge\">DebugElementTest</code> to extend <code class=\"language-plaintext highlighter-rouge\">AbstractElementTest</code>, instead of <code class=\"language-plaintext highlighter-rouge\">ElementTest</code>, which would now extend <code class=\"language-plaintext highlighter-rouge\">AbstractElementTest</code> too.</p>\n\n<p>OK, with that out of the way, it was time to fix running the unit test for <code class=\"language-plaintext highlighter-rouge\">IElement.getSymbol()</code> on <code class=\"language-plaintext highlighter-rouge\">IAtom.getSymbol()</code>, which required the\nremoval of the use of <code class=\"language-plaintext highlighter-rouge\">IChemObjectBuilder</code> implementations. So, I introduced <code class=\"language-plaintext highlighter-rouge\">newChemObject()</code>, which would return a fresh instance of the\nactually tested implementation. That is, <code class=\"language-plaintext highlighter-rouge\">DebugAtomTest</code> would return a new <code class=\"language-plaintext highlighter-rouge\">DebugAtom</code>, and the <code class=\"language-plaintext highlighter-rouge\">getSymbol()</code> test would now run on\n<code class=\"language-plaintext highlighter-rouge\">DebugAtom</code> and not <code class=\"language-plaintext highlighter-rouge\">DebugElement</code>. Good.</p>\n\n<p>No, not good. 
The actual implementation I was using looks like:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">DebugElementTest</span> <span class=\"kd\">extends</span> <span class=\"nc\">AbstractElementTest</span> <span class=\"o\">{</span>\n  <span class=\"nd\">@BeforeClass</span> <span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"kt\">void</span> <span class=\"nf\">setup</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"n\">setChemObject</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">DebugElement</span><span class=\"o\">());</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n\n<span class=\"kd\">public</span> <span class=\"kd\">abstract</span> <span class=\"kd\">class</span> <span class=\"nc\">AbstractElementTest</span> <span class=\"kd\">extends</span> <span class=\"nc\">AbstractChemObjectTest</span> <span class=\"o\">{</span>\n  <span class=\"nd\">@Test</span> <span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">testGetSymbol</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"nc\">IElement</span> <span class=\"n\">element</span> <span class=\"o\">=</span> <span class=\"o\">(</span><span class=\"nc\">IElement</span><span class=\"o\">)</span><span class=\"n\">newChemObject</span><span class=\"o\">();</span>\n    <span class=\"c1\">// do testing</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n\n<span class=\"kd\">public</span> <span class=\"kd\">abstract</span> <span class=\"kd\">class</span> <span class=\"nc\">AbstractChemObjectTest</span> <span class=\"o\">{</span>\n  <span class=\"c1\">// static, because it is set from the static @BeforeClass method</span>\n  <span class=\"kd\">private</span> <span class=\"kd\">static</span> <span class=\"nc\">IChemObject</span> <span class=\"n\">testedObject</span><span class=\"o\">;</span>\n  <span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"kt\">void</span> <span class=\"nf\">setChemObject</span><span class=\"o\">(</span><span class=\"nc\">IChemObject</span> <span class=\"n\">object</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"n\">testedObject</span> <span class=\"o\">=</span> <span class=\"n\">object</span><span class=\"o\">;</span>\n  <span class=\"o\">}</span>\n  <span class=\"c1\">// returns a fresh copy of the registered instance for each test</span>\n  <span class=\"kd\">public</span> <span class=\"nc\">IChemObject</span> <span class=\"nf\">newChemObject</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"o\">(</span><span class=\"nc\">IChemObject</span><span class=\"o\">)</span><span class=\"n\">testedObject</span><span class=\"o\">.</span><span class=\"na\">clone</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span> <span class=\"c1\">// just imagine it has try/catch here too</span>\n\n  <span class=\"c1\">// and here the tests for the IChemObject API</span>\n  <span class=\"nd\">@Test</span> <span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">testGetProperties</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"nc\">IChemObject</span> <span class=\"n\">object</span> <span class=\"o\">=</span> <span class=\"o\">(</span><span class=\"nc\">IChemObject</span><span class=\"o\">)</span><span class=\"n\">newChemObject</span><span class=\"o\">();</span>\n    <span class=\"c1\">// do testing</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>Excellent! No.</p>\n\n<p>Well, yes. The above system works, but made many unit tests fail, because of bugs in <code class=\"language-plaintext highlighter-rouge\">clone()</code> methods. 
The full scope still has to be explored,\nbut at least <code class=\"language-plaintext highlighter-rouge\">IPolymer.clone()</code> is not doing what I would expect it to do. Either I am wrong, and need to override the clone unit tests\nof superinterfaces in <code class=\"language-plaintext highlighter-rouge\">AbstractPolymerTest</code>, or the implementation needs fixing. I emailed the cdk-devel mailing list and filed a bug\nreport. But having about 1000 unit tests fail because clone is broken is something I did not like, not least because it makes bug fixing\nmore difficult.</p>\n\n<p>So, the next step was to find an approach that did not require clone, which gave some interesting insights into the Java language. JUnit4 requires\nthe <code class=\"language-plaintext highlighter-rouge\">@BeforeClass</code> method to be static. This means I cannot have a non-static <code class=\"language-plaintext highlighter-rouge\">DebugElementTest</code> method return an instance. And, you\ncannot override a static method! That had never occurred to me in the past. 
<code class=\"language-plaintext highlighter-rouge\">DebugElementTest.newChemObject()</code> does not override\n<code class=\"language-plaintext highlighter-rouge\">AbstractChemObjectTest.newChemObject()</code>, which is somewhere upstream.</p>\n\n<p>But, after discussing matters with Carl, I ended up with this approach:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"kd\">abstract</span> <span class=\"kd\">class</span> <span class=\"nc\">AbstractChemObjectTest</span> <span class=\"kd\">extends</span> <span class=\"nc\">CDKTestCase</span> <span class=\"o\">{</span>\n  <span class=\"kd\">private</span> <span class=\"kd\">static</span> <span class=\"nc\">ITestObjectBuilder</span> <span class=\"n\">builder</span><span class=\"o\">;</span>\n  <span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"kt\">void</span> <span class=\"nf\">setTestObjectBuilder</span><span class=\"o\">(</span><span class=\"nc\">ITestObjectBuilder</span> <span class=\"n\">builder</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"nc\">AbstractChemObjectTest</span><span class=\"o\">.</span><span class=\"na\">builder</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">;</span>\n  <span class=\"o\">}</span>\n  <span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"nc\">IChemObject</span> <span class=\"nf\">newChemObject</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"nc\">AbstractChemObjectTest</span><span class=\"o\">.</span><span class=\"na\">builder</span><span class=\"o\">.</span><span class=\"na\">newTestObject</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n\n<span class=\"kd\">public</span> <span class=\"kd\">interface</span> <span class=\"nc\">ITestObjectBuilder</span> 
<span class=\"o\">{</span>\n  <span class=\"kd\">public</span> <span class=\"nc\">IChemObject</span> <span class=\"nf\">newTestObject</span><span class=\"o\">();</span>\n<span class=\"o\">}</span>\n\n<span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">DebugAtomTest</span> <span class=\"kd\">extends</span> <span class=\"nc\">AbstractAtomTest</span> <span class=\"o\">{</span>\n  <span class=\"nd\">@BeforeClass</span> <span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"kt\">void</span> <span class=\"nf\">setUp</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"n\">setTestObjectBuilder</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">ITestObjectBuilder</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n      <span class=\"kd\">public</span> <span class=\"nc\">IChemObject</span> <span class=\"nf\">newTestObject</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n        <span class=\"k\">return</span> <span class=\"k\">new</span> <span class=\"nf\">DebugAtom</span><span class=\"o\">();</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">});</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>",
      "summary": "The reason why I have not blogged in more than two weeks, was that I was hoping to blog about the CDK 1.2.0 release. This was originally aimed at September, slipped into October, November and then December. There were only three show stoppers (see this wiki page), one of which the IChemObject interfaces were not properly tested.",
      
      "date_published": "2008-12-26T00:00:00+00:00",
      "date_modified": "2008-12-26T00:00:00+00:00",
      "tags": ["cdk","junit","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dm9c0-tzs46",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/12/08/peer-reviewed-cheminformatics-2-code.html",
      "title": "Peer reviewed Cheminformatics #2: Code review for the Chemistry Development Kit",
      "content_html": "<p>Peer review is an important component of open source development, and recently there was the discussion the other way around, if open source is\nrequired for peer review. Depends on your definition of peer review: No, if you restrict peer review to what it is in publishing (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/12/re-open-source-peer-review.html\">Re: Open Source != peer review <i class=\"fa-solid fa-recycle fa-xs\"></i></a>); Yes, if we really want to speed up\ncheminformatics evolution and assume unrestricted, open peer review where reviewers can openly publish there review report with all the greasy\ndetails (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/07/22/peer-reviewed-chemoinformatics-why.html\">Peer reviewed Chemoinformatics: Why OpenSource Chemoinformatics should be the default <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>The <a href=\"http://www.chemistry-development-kit.org/\">CDK</a> has a strong history of peer review. Patches have been available from\n<a href=\"http://cdk.svn.sf.net/\">SVN</a> from the start, and later we instantiated a mailing list so that people could easily monitor code changes, and\nI have actually being doing this since the start, scanning the code patches, knowing that a lot of code is backed up by unit tests to detect\nregressions. Anyone can review CDK code in this manner, just by subscribing to the <a href=\"https://sourceforge.net/mailarchive/forum.php?forum_name=cdk-commits\">cdk-commits</a>\nmailing list. If one has questions or comments on a patch, a reply to <a href=\"https://sourceforge.net/mailarchive/forum.php?forum_name=cdk-devel\">cdk-devel</a>\nis all that is needed to get things going.</p>\n\n<p>About a year ago, CDK development had become so extensive that code review in this manner was no longer the way forward (though still possible,\nand still used). 
However, it turned out that it was all too easy to overlook a patch or just click it away in busy times. This was experienced\nby some developers who previously monitored the cdk-commit messages sketched above. So, we moved to a more formal patching system where any\nnon-trivial patching is done in a SVN branch. Once the primary developer is happy about the branch, (s)he requests a review by other developers.\nThese can leave comments in the source code, reply to the mailing list, or leave comments in the <a href=\"https://sourceforge.net/tracker2/?group_id=20024&amp;atid=320024\">CDK patch tracker</a>.\nThis more formal work habit got into action about half a year ago already.</p>\n\n<p>A <a href=\"https://sourceforge.net/mailarchive/forum.php?thread_name=200812041823.25546.stefan.kuhn%40ebi.ac.uk&amp;forum_name=cdk-devel\">recent message</a>\nfrom Stefan makes clear that this tracker has some room for improvements. For example, there is no automatic email to cdk-devel when a patch\nhas not been tended to for a longer period of time. And, I do not see a simple way of doing this with the SourceForge bug track system.</p>\n\n<p>But, what I can do, is define a number of groups to represent the state of the patch. So, I defined:</p>\n\n<ul>\n  <li><a href=\"https://sourceforge.net/tracker2/?func=browse&amp;group_id=20024&amp;atid=320024&amp;status=1&amp;artgroup=896890\">Needs Review</a>: this patch has not been reviewed (sufficiently) yet</li>\n  <li><a href=\"https://sourceforge.net/tracker2/?func=browse&amp;group_id=20024&amp;atid=320024&amp;status=1&amp;artgroup=896891\">Accepted</a>: but not yet applied to SVN yet. 
When applied, the patch report is simply closed</li>\n  <li><a href=\"https://sourceforge.net/tracker2/?func=browse&amp;group_id=20024&amp;atid=320024&amp;status=1&amp;artgroup=896892\">Needs Revision</a>: the reviewers like to see changes made to the patch</li>\n</ul>\n\n<p><img src=\"/assets/images/cdkPatching.png\" alt=\"\" /></p>\n\n<p>Not perfect, but a step forward in tracking the state of patches.</p>",
      "summary": "Peer review is an important component of open source development, and recently there was the discussion the other way around, if open source is required for peer review. Depends on your definition of peer review: No, if you restrict peer review to what it is in publishing (see Re: Open Source != peer review ); Yes, if we really want to speed up cheminformatics evolution and assume unrestricted, open peer review where reviewers can openly publish there review report with all the greasy details (see Peer reviewed Chemoinformatics: Why OpenSource Chemoinformatics should be the default ).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkPatching.png",
      "date_published": "2008-12-08T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wngzb-y3329",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/12/05/cheminformatics-benchmark-project-1.html",
      "title": "Cheminformatics Benchmark Project #1",
      "content_html": "<p>Yesterday’s blog about <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/12/04/who-says-java-is-not-fast.html\">Who says Java is not fast?!? <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ncaused quite some feedback (thanx to all commenters!) with several good points. Of course, a table like that in the cinfony paper\n(see also the comments in the blogs by <a href=\"http://baoilleach.blogspot.com/2008/12/cinfony-paper-published-in-chemistry.html\">Noel</a>\n(the author) and <a href=\"https://doi.org/10.59350/1ph8m-fj607\">Rich <i class=\"fa-solid fa-recycle fa-xs\"></i></a>). Many things determine why the CDK\nmight be fastest in that table for SDF iterating. Suggestions have been that OpenBabel and RDKit may be doing much more than simple reading; Java might actually take advantage of the second core for caching file content.</p>\n\n<p><a href=\"http://www.simbiosys.ca/blog/\">ZZ</a> observed something I overlooked: calculating the molecular mass in CDK is by far slowest\nof all three toolkit, though people have suggestions on why that may is.</p>\n\n<h2 id=\"benchmarking\">Benchmarking</h2>\n\n<p>The correct way to compare toolkits, open source, proprietary, free, commercial, is to have a proper benchmark toolkit for\ncheminformatics. That’s what I am suggesting here: <a href=\"http://github.com/egonw/cheminfbenchmark/tree/master\">a project to define simple and fair benchmarks</a>.\nIt’s an open project, and anyone can contribute in order to keep tests balanced in impartial towards any tested toolkit.</p>",
      "summary": "Yesterday’s blog about Who says Java is not fast?!? caused quite some feedback (thanx to all commenters!) with several good points. Of course, a table like that in the cinfony paper (see also the comments in the blogs by Noel (the author) and Rich ). Many things determine why the CDK might be fastest in that table for SDF iterating. Suggestions have been that OpenBabel and RDKit may be doing much more than simple reading; Java might actually take advantage of the second core for caching file content.",
      
      "date_published": "2008-12-05T00:00:00+00:00",
      "date_modified": "2025-12-19T00:00:00+00:00",
      "tags": ["cheminf","cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/1ph8m-fj607", "doi": "10.59350/1ph8m-fj607"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j9em9-aad11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/12/04/who-says-java-is-not-fast.html",
      "title": "Who says Java is not fast?!?",
      "content_html": "<p>While performance tests actually show that for even core numerical calculations Java is at par with C in terms of speeds,\nand sometimes even hits Fortran-like speeds, people keep think that Java is not fast. I only invite you to test that yourself.</p>\n\n<p>Meanwhile, I would like to take the opportunity to advertise <a href=\"http://baoilleach.blogspot.com/\">Noel</a>’s\n<a href=\"http://code.google.com/p/cinfony/\">cinfony</a> paper in <a href=\"http://www.journal.chemistrycentral.com/home/\">CCJ</a>\n(doi:<a href=\"https://doi.org/10.1186/1752-153X-2-24\">10.1186/1752-153X-2-24</a>) which features these speed measurements\n(from the paper, CC-BY license):</p>\n\n<p><img src=\"/assets/images/javaIsFast.png\" alt=\"\" /></p>\n\n<p>I have to say that these numbers surprised me, as the <a href=\"http://cdk.sf.net/\">CDK</a> is hardly optimized for speed at al…</p>",
      "summary": "While performance tests actually show that for even core numerical calculations Java is at par with C in terms of speeds, and sometimes even hits Fortran-like speeds, people keep think that Java is not fast. I only invite you to test that yourself.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/javaIsFast.png",
      "date_published": "2008-12-04T00:10:00+00:00",
      "date_modified": "2008-12-04T00:10:00+00:00",
      "tags": ["java","cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1752-153X-2-24", "doi": "10.1186/1752-153X-2-24"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/56vn2-zgc48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/12/04/short-variables-and-lack-of-comments.html",
      "title": "Short variables and lack of comments...",
      "content_html": "<p>… a source code reviewer nightmare. The must-read <a href=\"http://www.lwn.net/\">lwn.net</a> ran a <a href=\"http://lwn.net/Articles/308566/\">nice open letter</a>\nto a Linux kernel developer. I’d like to cite this bit about code review (see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/12/re-open-source-peer-review.html\">Re: Open Source != peer review <i class=\"fa-solid fa-recycle fa-xs\"></i></a>):</p>\n\n<blockquote>\n  <p>[Andrew Morton] had a number of concrete requests - such as documenting the user-space ABI and the network protocol - which have not been\nsatisfied. He also asked for better code documentation in general:</p>\n\n  <ul>\n  So please. Go through all the code and make it tell a story. Ask yourself \"how would I explain all this to a kernel developer who is sitting next to me\".\n  It's important, and it's an important skill.\n</ul>\n</blockquote>\n\n<p>This is important indeed! This is also why CDK quality assurance tends to complain about short variables. While an for-next index <code class=\"language-plaintext highlighter-rouge\">i</code> is clear enough, <code class=\"language-plaintext highlighter-rouge\">ac</code> for an\n<code class=\"language-plaintext highlighter-rouge\">IAtomContainer</code> is quite useless, as it does not explain what the purpose of the container is. BTW, a longer name like <code class=\"language-plaintext highlighter-rouge\">atomContainer</code>\ndoes not really help here. Maybe I will wrote a unit test for that…</p>",
      "summary": "… a source code reviewer nightmare. The must-read lwn.net ran a nice open letter to a Linux kernel developer. I’d like to cite this bit about code review (see also Re: Open Source != peer review ):",
      
      "date_published": "2008-12-04T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kcxg1-z9t12",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/30/parallel-building-cdk.html",
      "title": "Parallel building the CDK",
      "content_html": "<p>Some time ago, I added parallel building targets for <a href=\"http://cdk.sf.net/\">CDK</a>’s <a href=\"http://ant.apache.org/\">Ant</a> <code class=\"language-plaintext highlighter-rouge\">build.xml</code>. Now that I am setting up a\n<a href=\"http://pele.farmbio.uu.se/nightly/\">Nightly</a> for the <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/branches/jchempaint-primary/\">jchempaint-primary</a> branch,\nand really only want to report on the CDK modules <code class=\"language-plaintext highlighter-rouge\">control</code> and <code class=\"language-plaintext highlighter-rouge\">render</code>, I need the build system to use a properties files to define which modules\nshould be compiled.</p>\n\n<p>So, I hacked a bit on the build system, and made use of two <a href=\"http://ant-contrib.sourceforge.net/\">ant-contrib</a> tasks, <code class=\"language-plaintext highlighter-rouge\">if</code> and <code class=\"language-plaintext highlighter-rouge\">foreach</code> which in the first\nplace reduce the size of the <code class=\"language-plaintext highlighter-rouge\">build.xml</code>, but also provide means for parallelization. Earlier, it was using the <code class=\"language-plaintext highlighter-rouge\">parallel</code> task of\nAnt itself for this (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/23/cdk-module-dependencies-2.html\">CDK Module dependencies #2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>The build dependencies between CDK modules are fairly complex, and typically this complexity increases upon bug fixing etc. Ideally, the build dependencies\nwill be calculated on runtime, instead of being hard-coded right now, and I will explore this in the near future.</p>\n\n<p>These dependencies can be used to build some of the module in parallel, but not all. This causes speed up of the compilation not to scale linearly with the number of\nthreads or cores. 
The <a href=\"http://spreadsheets.google.com/ccc?key=pdSdCFxXYzmOnssF4qVzwCw&amp;hl=en\">below build times</a> are calculated for three replicates, on a four\ncore machine:</p>\n\n<p><img src=\"/assets/images/buildtimes.png\" alt=\"\" /></p>\n\n<p>Going from one to two threads certainly pays of, but going to 4 shows only a three second speed up. The four processor cores were not utilized 100%,\nso I also attempted 2 threads core, but that showed zero improvement.</p>",
      "summary": "Some time ago, I added parallel building targets for CDK’s Ant build.xml. Now that I am setting up a Nightly for the jchempaint-primary branch, and really only want to report on the CDK modules control and render, I need the build system to use a properties files to define which modules should be compiled.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/buildtimes.png",
      "date_published": "2008-11-30T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pmapp-aex40",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/24/software-is-method-meme.html",
      "title": "Software is a Method (Meme)",
      "content_html": "<ol>\n  <li>it provides a recipe to approach (scientific) questions</li>\n  <li>let’s you cook up a (scientific) answer</li>\n  <li>you can use it as a black box (like an <a href=\"http://en.wikipedia.org/wiki/Orbitrap\">orbitrap</a>)</li>\n  <li>you can refine existing methods (well, some can, others don’t)</li>\n  <li>it has an error (but I do not believe it is normally distributed)</li>\n</ol>\n\n<p>Now, to me it’s trivial to work put how Open Source supports this.</p>",
      "summary": "it provides a recipe to approach (scientific) questions let’s you cook up a (scientific) answer you can use it as a black box (like an orbitrap) you can refine existing methods (well, some can, others don’t) it has an error (but I do not believe it is normally distributed)",
      
      "date_published": "2008-11-24T00:00:00+00:00",
      "date_modified": "2008-11-24T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yhdxa-ft783",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/20/scripting-jchempaint.html",
      "title": "Scripting JChemPaint",
      "content_html": "<p>Today and tomorrow, Stefan, <a href=\"http://gilleain.blogspot.com/\">Gilleain</a>, Arvid and I are having a <a href=\"https://apps.sourceforge.net/mediawiki/cdk/index.php?title=JChemPaintWorkshop2008\">JChemPaint Developers Workshop</a>\nin Uppsala, to sprint the development of JChemPaint3, for which <a href=\"http://progz-jchem.blogspot.com/\">Niels</a> layed out the foundation already a long time ago.</p>\n\n<p>Gilleain and Arvid are merging their branches into a <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/branches/jchempaint-primary/\">single code base</a>,\nwhile Stefan is working on the <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/jchempaint/trunk/\">Swing application and applet</a>. The Bioclipse SWT-based widget is being developed for\n<a href=\"http://bioclipse.svn.sourceforge.net/viewvc/bioclipse/bioclipse2/trunk/plugins/net.bioclipse.cdk.jchempaint.view/\">Bioclipse2</a>.</p>\n\n<p>The new design separates widget/graphics toolkit specifics from the chemical drawing and editing logic. Regarding the editing functionality, this basically comes down to have a\nsemantically meaningful edit API. This allows us to convert both Swing and SWT mouse events into things like <code class=\"language-plaintext highlighter-rouge\">addAtom(\"C\", atom)</code>, which would add a carbon to an already\nexisting <em>atom</em>. However, without too much phantasy, it allows adding a scripting language. This is what I have been working on. 
Right now, the following API is available\nfrom the Bioclipse2 JavaScript console (via the <em>jcp</em> namespace, in random order):</p>\n\n<ul>\n  <li>ICDKMolecule jcp.getModel()</li>\n  <li>IAtom getClosestAtom(Point2d)</li>\n  <li>setModel(ICDKMolecule) <em>(for really fancy things)</em></li>\n  <li>removeAtom(IAtom)</li>\n  <li>IBond getClosestBond(Point2d)</li>\n  <li>updateView() <em>(all edit command issue this automatically)</em></li>\n  <li>addAtom(String,Point2d)</li>\n  <li>addAtom(String,IAtom) <em>(which works out coordinates automatically)</em></li>\n  <li>Point2d newPoint2d(double,double)</li>\n  <li>updateImplicitHydrogenCounts()</li>\n  <li>moveTo(IAtom, Point2d)</li>\n  <li>setSymbol(IAtom,String)</li>\n  <li>setCharge(IAtom,int)</li>\n  <li>setMassNumber(IAtom,int)</li>\n  <li>addBond(IAtom,IAtom)</li>\n  <li>moveTo(IBond,Point2d)</li>\n  <li>setOrder(IBond,IBond.Order)</li>\n  <li>setWedgeType(IBond,int)</li>\n  <li>IBond.Order getOrder(int)</li>\n  <li>zap() <em>(sort of <code class=\"language-plaintext highlighter-rouge\">sudo rm -Rf /*</code>)</em></li>\n  <li>cleanup() <em>(calculate 2D coordinates from scratch)</em></li>\n  <li>addRing(IAtom,int)</li>\n  <li>addPhenyl(IAtom)</li>\n</ul>\n\n<p>This API (many more method will follow) is not really aimed at the end user, who will simply point and click. The goal of this scripting language is, at least at this moment,\nto test the underlying implementation using Bioclipse. Future applications, however, may include simple scripts which use some logic to convert the editor content. For example,\nreplacing a t-butyl fragment into a pseudo atom “t-Bu”. The key thing to remember, is that this will allow Bioclipse to have non-CDK-based programs act on the JChemPaint editor\ncontent (e.g. using <code class=\"language-plaintext highlighter-rouge\">getModel()</code> and <code class=\"language-plaintext highlighter-rouge\">setModel(ICDKMolecule)</code>). 
More on that later.</p>\n\n<p>A simple script could look like: Or, as screenshot:</p>\n\n<p><img src=\"/assets/images/jcpScripting.png\" alt=\"\" /></p>",
      "summary": "Today and tomorrow, Stefan, Gilleain, Arvid and I are having a JChemPaint Developers Workshop in Uppsala, to sprint the development of JChemPaint3, for which Niels layed out the foundation already a long time ago.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpScripting.png",
      "date_published": "2008-11-20T00:00:00+00:00",
      "date_modified": "2008-11-20T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ydgw3-52978",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/18/solubility-data-in-bioclipse-1.html",
      "title": "Solubility Data in Bioclipse #1",
      "content_html": "<p>I am working on converting <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a>’s <a href=\"https://spreadsheets.google.com/ccc?key=plwwufp30hfq0udnEmRD1aQ&amp;hl=en\">Solubility</a>\ndata to <a href=\"http://www.w3.org/RDF/\">RDF</a> (after <a href=\"http://plindenbaum.blogspot.com/\">Pierre</a>’s model, see <a href=\"http://usefulchem.blogspot.com/2008/10/rdf-triples-for-open-notebook-science.html\">here</a>,\n<a href=\"http://usefulchem.blogspot.com/2008/11/ons-solubility-web-query.html\">here</a>, and <a href=\"http://anybody.cephb.fr/perso/lindenb/tmp/jcbradley.rdf\">here</a>,\n<a href=\"http://rguha.wordpress.com/2008/11/06/solubility-queries-and-the-google-visualization-api/\">here</a> for first data exploration), so that I can integrate it with data from\n<a href=\"http://dbpedia.org/About\">DBPedia</a>, <a href=\"http://www.freebase.com/\">Freebase</a>, <a href=\"http://rdf.openmolecules.net/\">rdf.openmolecules.net</a>, etc.\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> will be the workbench in which this will be visualized, and just got graph depiction online using\n<a href=\"http://www.eclipse.org/gef/zest/\">Zest</a>. The screenshot does not show the RDF yet, but that will follow soon:</p>\n\n<p><img src=\"/assets/images/bioclipseRDF.png\" alt=\"\" /></p>\n\n<p>Next stops:</p>\n\n<ul>\n  <li>create a Eclipse package for Jena</li>\n  <li>read the Solubility data (does anyone know a Java library to read from Google Docs?)</li>\n  <li>create a virtual database of Solubility compounds (possibly StructureDB-based)</li>\n  <li>Use the CDK to autoextract chemical triples</li>\n</ul>",
      "summary": "I am working on converting Jean-Claude’s Solubility data to RDF (after Pierre’s model, see here, here, and here, here for first data exploration), so that I can integrate it with data from DBPedia, Freebase, rdf.openmolecules.net, etc. Bioclipse will be the workbench in which this will be visualized, and just got graph depiction online using Zest. The screenshot does not show the RDF yet, but that will follow soon:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseRDF.png",
      "date_published": "2008-11-18T00:00:00+00:00",
      "date_modified": "2008-11-18T00:00:00+00:00",
      "tags": ["bioclipse","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/msk9v-2s364",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/12/re-open-source-peer-review.html",
      "title": "Re: Open Source != peer review",
      "content_html": "<p>Andrew has an <a href=\"http://dalkescientific.com/writings/diary/archive/2008/11/11/open_source_is_not_peer_review.html\">interesting thread</a>\non the content of a slide of a recent presentation. In the comments you can read the back and forth on things; indeed, there are very many\naspects to things and he did ask a very complex question, of which he assumed that I understood what he was asking, and I indeed assumed\ntoo that I understood what he was asking:</p>\n\n<blockquote>\n  <p>Some argue that doing good computational-based science requires open source. The argument is that scientists need to review the source\ncode in order to verify that it works correctly. How, they argue, can you review someone else’s paper if you can’t review the source code\nused to make that paper?</p>\n\n  <p>I like open source. (My talk goes into the philosophical differences between “open source” and “free software.”) I think there should be\nsupport for peer review. But I don’t understand why the ability to see the source code, in order to review it for scientific quality,\nrequires the right to redistribute the source code to others.</p>\n</blockquote>\n\n<p>So, I <em>assumed</em> he was interested in hearing why people thing open source benefits open source. Misinterpreting the last two words, I\nthough access to the code and the ability to redistribute code I find bad in my peer review. There was another incorrect assumption on my\nside: I had open peer review in mind, as I like so much about open source projects, instead of the peer review as in paper peer review,\nprior to the preprint server age. Another thing I understood incorrectly, was that he was only referring to computational packages, not\ncheminformatics in general. My mistake. Being from a GCC meeting, I assumed the latter.</p>\n\n<p>Therefore, a lot of miscommunication. I agree to a large extend with Andrews analysis: peer review is certainly possible without Open\nSource. 
Actually, this matches closely with the discussion of Cathedral versus Bazaar open source projects (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/07/opendatasourcestandards-is-not-enough.html\">my post earlier this week <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nHe argues that current open source (cheminformatics) projects do not have enough eyeballs, and indicates that money buys eyeballs. Indeed it does.</p>\n\n<p>However, the original argument I wanted to make, but failed to, is that Open Source (any kind of access to the source code) is a strict\nrequirement for reviewing the implementation. We do not want black boxes.</p>\n\n<p>How you organize this access to the source code is another thing, and the topic of much of the discussion in Andrew’s blog. There are many\nsolutions, but all include some sort of access to the source code. Redistribution is not a requirement, though, if the review is only\nsent upstream, as is common in reviewing papers.</p>\n\n<p>I feel that Open Source is a solution worth fighting for, but I do understand the argument that funding of this approach remains a\nproblem. Open Source cheminformatics is the equivalent of a preprint server; one solution to peer review, a good one, I think, not the\nonly one. The parallels are seemingly even stronger: you cannot review a paper by just reading the abstract and the conclusion: a paper\nis not a black box either.</p>\n\n<p>Anyway… just the tip of the iceberg touched in the discussion. Feel free to join in.</p>",
      "summary": "Andrew has an interesting thread on the content of a slide of a recent presentation. In the comments you can read the back and forth on things; indeed, there are very many aspects to things and he did ask a very complex question, of which he assumed that I understood what he was asking, and I indeed assumed too that I understood what he was asking:",
      
      "date_published": "2008-11-12T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["odosos","opensource","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pmme3-yzj05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/10/finding-commit-that-causes-regressions.html",
      "title": "Finding the commit that causes the regressions...",
      "content_html": "<p><a href=\"http://apps.sourceforge.net/mediawiki/cdk/index.php?title=CDK_1.2_TODO\">CDK 1.1.x</a> releases are well in progress,\nbut a recent commit broke a number of unit tests. Here comes\n<a href=\"http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html\">git-bisect</a>.</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git checkout <span class=\"nt\">-b</span> my-local-1.2 cdk1.2.x\n<span class=\"nv\">$ </span>git bisect start\n<span class=\"nv\">$ </span>git bisect bad\n<span class=\"nv\">$ </span>git bisect good 8219139e9236ab8036e9d08c13fcd0482d500c79\n</code></pre></div></div>\n\n<p>These lines indicate that the current version (HEAD) is broken, and that revision 8219139e9236ab8036e9d08c13fcd0482d500c79 was OK. Now,\n<code class=\"language-plaintext highlighter-rouge\">git-bisect</code> does the proper thing, and starts in the middle, allowing me to run my tests, and issue a <code class=\"language-plaintext highlighter-rouge\">git bisect bad</code> or\n<code class=\"language-plaintext highlighter-rouge\">git bisect good</code> depending on whether my test fails or not. The test I am running is:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>ant clean dist-all test-dist-all jarTestdata\n<span class=\"nv\">$ </span>ant <span class=\"nt\">-Dmodule</span><span class=\"o\">=</span>smarts test-module\n<span class=\"nv\">$ </span>git bisect <span class=\"o\">[</span>good|bad]\n</code></pre></div></div>\n\n<p>So, if I had to inspect 1024 commits, I’d found the bad commit in 10 times running this test suite. For the culprit I was after it was\n6 times. 
The outcome was this commit, which I had already suspected and emailed the\n<a href=\"https://lists.sourceforge.net/lists/listinfo/cdk-devel\">cdk-devel</a> mailing list about:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[fa49ac603c36908f341b25d52a78435cdb8ca4d3] atomicNumber set as default (Integer) CDKConstants.UNSET\n</code></pre></div></div>",
      "summary": "CDK 1.1.x releases are well in progress, but a recent commit broke a number of unit tests. Here comes git-bisect.",
      
      "date_published": "2008-11-10T00:00:00+00:00",
      "date_modified": "2008-11-10T00:00:00+00:00",
      "tags": ["cdk","git"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kmk49-mj610",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/07/opendatasourcestandards-is-not-enough.html",
      "title": "Open{Data|Source|Standards} is not enough: we need Open Projects",
      "content_html": "<p>The <a href=\"http://blueobelisk.sourceforge.net/wiki/Main_Page\">Blue Obelisk</a> mantra <a href=\"http://blueobelisk.sourceforge.net/wiki/ODOSOS\">ODOSOS</a>,\nOpen Data, Open Source, Open Standards, is well known, and much cited too. <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude Bradley</a>\npopularized the <a href=\"http://en.wikipedia.org/wiki/Open_Notebook_Science\">Open Notebook Science</a> (ONS). This has always been nagging me a bit,\nbecause the <a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://www.jmol.org/\">Jmol</a>, JChemPaint and other chemistry projects have done that for much\nlonger, though we did not use notebooks as much, so called it just an open source project. It really is no different, IMO, though\nsurely, there are differences.</p>\n\n<p>Anyway, the key thing which ONS and CDK and Jmol share, is that they use an Open Notebook. Not every Open Source or Open Data project does.\nActually, many scientific Open Source are not open Projects! They are more like the Cathedral than the wished-for Bazaar (see\n<a href=\"http://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar\">The Cathedral and the Bazaar</a>). So, Open Source (science) projects are certainly not ONS projects by default!</p>\n\n<p>Now, the CDK actually is ONS, it is a Bazaar. The notebooks we use include:</p>\n\n<ul>\n  <li>open project via <a href=\"https://sourceforge.net/mail/?group_id=20024\">mailing lists</a></li>\n  <li>open methods/results via <a href=\"https://sourceforge.net/svn/?group_id=20024\">subversion</a></li>\n  <li>informal reporting via blogs (e.g. <a href=\"http://rguha.wordpress.com/\">Rajarshi</a>, <a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a>, <a href=\"http://cdktaverna.wordpress.com/\">Thomas</a>, mine)</li>\n  <li>informal reporting via <a href=\"http://www.cdknews.org/\">CDK News</a></li>\n</ul>\n\n<p>What more would you wish for? That’s not a rhetorical question. 
Remember that every reader of this blog is on\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/27/be-in-my-advisory-board-1-being-good.html\">my advisory board <i class=\"fa-solid fa-recycle fa-xs\"></i></a>!</p>\n\n<p>Unfortunately, I do not work at a workbench myself, so I do not produce new knowledge myself, other than what I extract from existing\ndata. That’s really a shame, and I really do hope that Jean-Claude or <a href=\"http://blog.openwetware.org/scienceintheopen\">Cameron</a> will send\nme a box to measure solubilities (see <a href=\"http://usefulchem.blogspot.com/2008/10/rdf-triples-for-open-notebook-science.html\">here</a>,\n<a href=\"http://usefulchem.blogspot.com/2008/11/ons-solubility-web-query.html\">here</a>, and\n<a href=\"http://anybody.cephb.fr/perso/lindenb/tmp/jcbradley.rdf\">here</a>,\n<a href=\"http://rguha.wordpress.com/2008/11/06/solubility-queries-and-the-google-visualization-api/\">here</a> for first data exploration),\neven though I cannot participate in the <a href=\"http://usefulchem.blogspot.com/2008/11/submeta-open-notebook-science-awards.html\">challenge</a>.\n(hint, hint :)</p>\n\n<h2 id=\"from-cathedral-to-bazaar-in-life-sciences\">From Cathedral to Bazaar in Life Sciences</h2>\n\n<p>One Cathedral we ran into with <a href=\"http://www.bioclipse.net/\">Bioclipse</a> was <a href=\"http://www.biocatalogue.org/\">BioCatalogue</a>,\nwhich will serve as a website where people can annotate and categorize (web) services. While the project has been around for a while, the\nwebsite was rather uninformative. Fortunately, the project is going to open up, and be more Bazaar-like. For example, they\nhave now started a <a href=\"http://www.biocatalogue.org/wiki\">wiki</a> and a\n<a href=\"http://listserv.manchester.ac.uk/cgi-bin/wa?SUBED1=biocatalogue-friends&amp;A=1\">mailing list</a>. 
I hope these efforts will continue,\nso that I can contribute from my point of view!</p>\n\n<p>The <a href=\"http://embraceregistry.net/\">EMBRACE Registry</a> is a project with similar goals and a rather nice outcome (which I learned about on\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/03/embrace-workshop-in-uppsala.html\">Monday <i class=\"fa-solid fa-recycle fa-xs\"></i></a>). It is actually anticipated to be replaced by or merged\nwith BioCatalogue. So, all data I entered, <a href=\"http://prints.cs.man.ac.uk:8081/category/tags/cheminformatics\">cheminformatics workflows</a>\n(look, <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/18/chemoinformatics-p0wned-by.html\">no ‘o’ <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), will later be available from BioCatalogue too.\nThat is already my first contribution to BioCatalogue. One enormously interesting feature of the Registry is that it allows uploading of\ncode to test the service. This means the Registry will not only poll if the service is still online (by checking the WSDL file), it\nwill also test if the service behaves properly. Now, immediate thoughts are mashups with <a href=\"http://www.myexperiment.org/\">MyExperiment</a>.\nEach WSDL entry in the Registry would point to MyExperiment workflows that use it, and the workflow page would indicate the status of all\nused WSDL services. 
This integration was already anticipated long before I thought about it, as the involved Cathedrals were nicely\nlocated on the same floor in Manchester.</p>\n\n<p>Below is a screenshot from the EMBRACE Registry for the <a href=\"http://www.chemspider.com/\">ChemSpider</a>\n<a href=\"http://prints.cs.man.ac.uk:8081/service/massspecapi\">WSDL entry</a> for <a href=\"http://www.myexperiment.org/workflows/97\">a workflow</a>\nI <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/26/metabolomics-workflows-in-taverna.html\">uploaded <i class=\"fa-solid fa-recycle fa-xs\"></i></a> about a year ago to MyExperiment:</p>\n\n<p><img src=\"/assets/images/registry.png\" alt=\"\" /></p>\n\n<p>BTW, ChemSpider has an Advisory Board of which I am a member, but it is also a classical (and intentional) Cathedral project. We do share common interests though, which makes us collaborate.</p>\n\n<h2 id=\"why-important\">Why Important?</h2>\n\n<p>One recurrent theme in Open Source is <a href=\"http://en.wikipedia.org/wiki/Given_enough_eyeballs\">given enough eyeballs, all bugs are shallow</a>.\nThis surely applies to science as well. The difference between the two is that in current science the eyes only inspect with a delay of at\nleast 6 months. Current practice is that research is finished (delay), and when deemed publishable, written up as a paper (delay, and losing\nvaluable information in the process, as you can read in my blog all the time), and published (even more delay). ONS changes that, and so do\nBazaar-like open source projects, such as the CDK, Jmol and Bioclipse. The bugs are present, whether we like it or not, not just in source\ncode, but in science too. Theories get overthrown, but why should we like the long delays of current scientific good practice? Hate it! Work\naround it. Use the Bazaar. 
Use ONS!</p>\n\n<p>Now, ONS actually needs Open Source, so that its practitioners can deal effectively with the data they produce and extract new scientific\nknowledge from the measurements. If Rajarshi and Pierre had not made their efforts, others could not easily join in, leading to\nthose much hated delays. Bugs should be shallow, and openness allows us to make those bugs visible. We can prove that there is a bug\nwithout having to reproduce data ourselves, which would lead to those nasty delays again. Just copy the data, compare it to your own, do your\nanalysis.</p>\n\n<p>One recent project in open source chemistry dealing with making bugs visible is the web page set up by Andreas Tille for the\n<a href=\"http://alioth.debian.org/projects/debichem\">DebiChem project</a>. His page <a href=\"http://cdd.alioth.debian.org/debichem/bugs/\">summarizes the bugs</a>\nlisted for the chemistry packages in Debian (which include the Blue Obelisk projects <a href=\"http://packages.debian.org/lenny/avogadro\">Avogadro</a>,\n<a href=\"http://packages.debian.org/lenny/bodr\">BODR</a>, <a href=\"http://packages.debian.org/lenny/libcdk-java\">CDK</a>,\n<a href=\"http://packages.debian.org/lenny/chemical-mime-data\">Chemical MIME Data</a>,\n<a href=\"http://packages.debian.org/lenny/kalzium\">Kalzium</a> and <a href=\"http://packages.debian.org/lenny/openbabel\">OpenBabel</a>):</p>\n\n<p><img src=\"/assets/images/debichem.png\" alt=\"\" /></p>\n\n<p>This data analysis helps the projects being analyzed.</p>\n\n<h2 id=\"packaging\">Packaging</h2>\n\n<p>This brings me to a last topic for this blog post: packaging using Open Standards. In order to allow those eyeballs to spot bugs, it is of the\nutmost importance to package your results in Open Standards, and not just one, but likely many. For Open Source projects this ultimately\nmeans Distribution Packages (deb or rpm). If that goal has been achieved, you know your results can be read by anyone. 
Software should be\ninstallable (make, ant, cmake, etc.), and Data should be readable (no PDF, but RDF, XML, JSON, or whatever standard). Preferably not Excel,\nas this is too free-form (as Rajarshi also <a href=\"http://rguha.wordpress.com/2008/11/06/solubility-queries-and-the-google-visualization-api/\">indicated</a>),\nbut with some added conventions it may do well. Blue Obelisk projects are generally doing well in terms of packaging.</p>\n\n<p>For the CDK, which already is reasonably well packaged, I am currently working on <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk-eclipse/trunk/\">Eclipse</a>\nand <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk-pom/trunk/\">Maven2</a> packages. The former is already being used by Bioclipse, while the\nlatter is aimed at <a href=\"https://sourceforge.net/projects/cml\">Jumbo</a> (which has just seen a\n<a href=\"https://sourceforge.net/project/showfiles.php?group_id=51361\">new release</a>. <a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/\">Jim</a>,\nI’m happy to see the CMLDOM/Jumbo split!), <a href=\"http://www.cdk-taverna.de/\">CDK-Taverna</a>, and possibly a third (Paula, what do you plan\nto use it for?). The POM export is not fully working yet, but with four research sites involved in this Open Project, I’m sure we’ll work\nit out.</p>\n\n<p>The bottom line is, scientific progress would benefit so much from a Bazaar approach. And the key thing is not collaboration; that’s\nsomething you can do in a Cathedral-like fashion too. No, the key thing is to be Open and allow anyone, even your worst nightmare, to\ncomment on what you do. Let him prove you wrong, openly, that is.</p>\n\n<p>OK, there it is. My open notebook entry for this week. Now you know what I have been up to this week.</p>",
      "summary": "The Blue Obelisk mantra ODOSOS, Open Data, Open Source, Open Standards, is well known, and much cited too. Jean-Claude Bradley popularized the Open Notebook Science (ONS). This has always been nagging me a bit, because the CDK, Jmol, JChemPaint and other chemistry projects have done that for much longer, though we did not use notebooks as much, so called it just an open source project. It really is no different, IMO, though surely, there are differences.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/registry.png",
      "date_published": "2008-11-07T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["odosos","chemspider","workflow","cdk","bioclipse","cml","debian","eclipse","rdf","jmol","blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/esdth-ef628",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/04/next-generation-asynchronous.html",
      "title": "Next generation asynchronous webservices #2",
      "content_html": "<p>Getting back to some webservice stuff (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/31/next-generation-asynchronous.html\">part #1 of this series <i class=\"fa-solid fa-recycle fa-xs\"></i></a>)…\nactually, I’ll use <em>cloud service</em> from now on, since <em>web service</em> is reserved for SOAP/WSDL (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/11/03/embrace-workshop-in-uppsala.html\">my EMBRACE presentation <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nLet me present this bit of JavaScript I just ran in Bioclipse2:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">xws</span><span class=\"p\">.</span><span class=\"nf\">connect</span><span class=\"p\">();</span>\n<span class=\"nx\">service</span> <span class=\"o\">=</span> <span class=\"nx\">xws</span><span class=\"p\">.</span><span class=\"nf\">getService</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">cdk.ws1.bmc.uu.se</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">discoverSync</span><span class=\"p\">(</span><span class=\"mi\">9000</span><span class=\"p\">);</span>\n<span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getFunctions</span><span class=\"p\">();</span>\n<span class=\"nx\">f</span> <span class=\"o\">=</span> <span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getFunction</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">calculateMass</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">ios</span> <span class=\"o\">=</span> <span class=\"nx\">f</span><span class=\"p\">.</span><span class=\"nf\">getIoSchemataSync</span><span class=\"p\">(</span><span class=\"mi\">9000</span><span class=\"p\">);</span>\n<span 
class=\"nx\">iof</span> <span class=\"o\">=</span> <span class=\"nx\">xws</span><span class=\"p\">.</span><span class=\"nf\">getIoFactory</span><span class=\"p\">(</span><span class=\"nx\">ios</span><span class=\"p\">);</span>\n<span class=\"nx\">smiDoc</span> <span class=\"o\">=</span> <span class=\"nx\">iof</span><span class=\"p\">.</span><span class=\"nf\">createSmilesDocument</span><span class=\"p\">();</span>\n<span class=\"nx\">smiDoc</span><span class=\"p\">.</span><span class=\"nf\">setSmiles</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">CCC</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">result</span> <span class=\"o\">=</span> <span class=\"nx\">f</span><span class=\"p\">.</span><span class=\"nf\">invokeSync</span><span class=\"p\">(</span><span class=\"nx\">smiDoc</span><span class=\"p\">.</span><span class=\"nf\">toString</span><span class=\"p\">(),</span> <span class=\"mi\">9000</span><span class=\"p\">);</span>\n<span class=\"nx\">obj</span> <span class=\"o\">=</span> <span class=\"nx\">iof</span><span class=\"p\">.</span><span class=\"nf\">getOutputObject</span><span class=\"p\">(</span><span class=\"nx\">result</span><span class=\"p\">);</span>\n<span class=\"nf\">print</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">Mass: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span>  <span class=\"nx\">obj</span><span class=\"p\">.</span><span class=\"nf\">getStringValue</span><span class=\"p\">())</span>\n</code></pre></div></div>\n\n<p>At first, it might look a bit verbose to just calculate the mass of a molecule, and it is, and it is not even written in XML. Hahahaha</p>\n\n<p>Anyway, the code rocks, thanx to Johannes’ great work on his <a href=\"https://sourceforge.net/projects/xws4j/\">xws4j</a> library! I’ll explain\nthe script. The first line gets Bioclipse online using a <em>Jabber</em> account, which you set via Bioclipse’ preferences pages. 
The next few\nlines allow you to connect to a cloud service, this one running on <code class=\"language-plaintext highlighter-rouge\">ws1.bmc.uu.se</code> and called <code class=\"language-plaintext highlighter-rouge\">cdk</code>. With the <code class=\"language-plaintext highlighter-rouge\">getFunctions()</code> method\nwe query which functions are available, called ports in WSDL if I am not mistaken, from which we pick the <code class=\"language-plaintext highlighter-rouge\">calculateMass</code> one.</p>\n\n<p>And then the action begins. One nice feature of the <a href=\"http://xmpp.org/extensions/xep-0244.html\">IO-DATA proposal</a> is that the function\nitself defines the XML Schema it uses for input and output, and does not rely on WSDL to do that (or maybe recent SOAP specs allow that\ntoo). So, we query the function for its schemata, and then something funky happens: we order the xws4j library to create a\ndata model on the fly for this service! From this we get a Java data model for the service. This allows us to use <code class=\"language-plaintext highlighter-rouge\">createSmilesDocument()</code>\nand <code class=\"language-plaintext highlighter-rouge\">setSmiles()</code>. That’s function-specific stuff!</p>\n\n<p>Of course, we do not have to do that. For example, the second function I wrote (<code class=\"language-plaintext highlighter-rouge\">generate3Dcoordinates</code>) eats and spits out CML, and I’d\nrather rely on CMLDOM or CDK as the data model then. But more on that later…</p>\n\n<p>The Bioclipse xws4j plugin actually puts the data model in my workspace, so that I can easily introspect the API:</p>\n\n<p><img src=\"/assets/images/autogeneration.png\" alt=\"\" /></p>\n\n<p>The last three lines invoke the function (synchronously, as it’s cheap), and get the mass from the function output. BTW, I should\nstress that a function does not require any specific implementation regarding synchronous or asynchronous calls. 
You write <strong>one</strong>\nfunction, and can call it in either way you like. The library hides all IO-DATA details around that.</p>",
      "summary": "Getting back to some webservice stuff (see part #1 of this series )… actually, I’ll use cloud service from now on, since web service is reserved for SOAP/WSDL (see my EMBRACE presentation ). Let me present this bit of JavaScript I just ran in Bioclipse2:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/autogeneration.png",
      "date_published": "2008-11-04T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["xmpp","java","bioclipse","web"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wh1bq-ptb19",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/11/03/embrace-workshop-in-uppsala.html",
      "title": "EMBRACE workshop in Uppsala",
      "content_html": "<p>This Monday and Tuesday I will attend the EMBRACE workshop <a href=\"http://teacher.bmc.uu.se/webservicesworkshop/Welcome.html\">Understanding, creating and deploying EMBRACE compliant WebServices</a>.\nI will present there the ongoing work in Bioclipse to support services and web services in particular. The sheets of the presentation will look like:</p>\n\n<p><a href=\"https://doi.org/10.5281/zenodo.2647677\"><img src=\"/assets/images/bioclipseWebserviceEmbrace.png\" alt=\"\" /></a></p>",
      "summary": "This Monday and Tuesday I will attend the EMBRACE workshop Understanding, creating and deploying EMBRACE compliant WebServices. I will present there the ongoing work in Bioclipse to support services and web services in particular. The sheets of the presentation will look like:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseWebserviceEmbrace.png",
      "date_published": "2008-11-03T00:00:00+00:00",
      "date_modified": "2008-11-03T00:00:00+00:00",
      "tags": ["bioclipse","openscience","web"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/zenodo.2647677", "doi": "10.5281/zenodo.2647677"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7gx9j-dn554",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/31/next-generation-asynchronous.html",
      "title": "Next generation asynchronous webservices",
      "content_html": "<p>Johannes joined a <a href=\"http://wiki.bioclipse.net/index.php?title=Outcome_of_the_Bioclipse_autumn_workshop_2006\">Bioclipse Workshop</a> a long time\nago, and introduced the participants to the idea of using <a href=\"http://xmpp.org/\">XMPP</a> (aka Jabber) for asynchronous web services.\n<a href=\"http://en.wikipedia.org/wiki/SOAP_(protocol)\">SOAP</a> is commonly user to run webservices over <a href=\"http://en.wikipedia.org/wiki/HTTP\">HTTP</a>,\nbut via (SMTP) email and XMPP is possible too (see <a href=\"http://xmpp.org/extensions/xep-0072.html\">SOAP over XMPP</a>). Using HTTP as transport\nlayer has problems. The biggest problem, is possibly that HTTP connections are timed out, e.g. by intermediate router. This makes it\nrather unsuited for long running jobs. Workarounds are easy to come up with, and <em>polling</em> is a common solution.</p>\n\n<p>Johannes ideas solve this limitation by using the general XMPP protocol for chatting:</p>\n\n<dl>\n<dt><span style=\"color:red; font-weight: bold\">client</span></dt>\n  <dd>he, can you do something for me?</dd>\n<dt><span style=\"color:darkgreen; font-weight: bold\">service</span></dt>\n  <dd>sure, I can do <b>generate3Dcoordinates</b> and <b>generateSMILES</b>.</dd>\n<dt><span style=\"color:red; font-weight: bold\">client</span></dt>\n  <dd>ah, nice! what input does <b>generate<a href=\"http://www.opensmiles.org/\">SMILES</a></b> take? and the output?</dd>\n<dt><span style=\"color:darkgreen; font-weight: bold\">service</span></dt>\n  <dd>input: <a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a>, output a simple string.</dd>\n<dt><span style=\"color:red; font-weight: bold\">client</span></dt>\n  <dd>ok, here's the CML</dd>\n<dt><span style=\"color:darkgreen; font-weight: bold\">service</span></dt>\n  <dd>I'm done now. 
sorry that it took 10 minutes, but I'm running Vista...</dd>\n<dt><span style=\"color:red; font-weight: bold\">client</span></dt>\n  <dd>excellent, please send me the results</dd>\n<dt><span style=\"color:darkgreen; font-weight: bold\">service</span></dt>\n  <dd>ok, here is the SMILES for <a href=\"http://en.wikipedia.org/wiki/Lacosamide\">lacosamide</a>: <span class=\"chem:smiles\">CC(=O)N[C@H](COC)C(=O)NCC1=CC=CC=C1</span></dd>\n</dl>\n\n<p>Well, the important bit is in the last line. A job may take long, even on clusters. The client might have to reboot meanwhile (possibly\nbecause of critical security updates)… the <em>service</em> will just continue, and send you a message when done. If you just happen to be\noffline, it will send a message when you are back online.</p>\n\n<p>Johannes’ ideas led to the <a href=\"http://xmpp.org/extensions/xep-0244.html\">IO-DATA proposal</a> (XEP-0244), which is currently marked experimental\nand being discussed on the <a href=\"http://mail.jabber.org/mailman/listinfo/ws-xmpp\">ws-xmpp</a> mailing list. He gathered a few people around\nhim to get it going, resulting in working stuff! Yeah!</p>\n\n<h2 id=\"chemistry-development-kit-xws\">Chemistry Development Kit XWS</h2>\n\n<p>Besides contributing to the proposal, I am also involved in this project by writing XMPP-webservices for the\n<a href=\"https://cdk.github.io/\">CDK</a>. 
This brings me to <a href=\"https://cdk.svn.sourceforge.net/svnroot/cdk/cdk-xws/trunk@12888\">cdk-xws</a>, which is\nthe project to bring CDK functionality online as webservices using IO-DATA.</p>\n\n<p><img src=\"/assets/images/cdkxwsPsi.png\" alt=\"\" /></p>\n\n<p>This shows three nodes, the first being the CDK service, with two functions, of which I have implemented only one so far.</p>\n\n<p>For the curious, this is what the XMPP messages look like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;iq</span> <span class=\"na\">from=</span><span class=\"s\">\"egonw@ws1.bmc.uu.se/home\"</span>\n    <span class=\"na\">id=</span><span class=\"s\">\"JSO-0.12.5-6\"</span>\n    <span class=\"na\">to=</span><span class=\"s\">\"cdk.ws1.bmc.uu.se\"</span>\n    <span class=\"na\">type=</span><span class=\"s\">\"set\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;command</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"http://jabber.org/protocol/commands\"</span>\n           <span class=\"na\">action=</span><span class=\"s\">\"execute\"</span>\n           <span class=\"na\">node=</span><span class=\"s\">\"calculateMass\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;iodata</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"urn:xmpp:tmp:io-data\"</span>\n            <span class=\"na\">type=</span><span class=\"s\">\"input\"</span><span class=\"nt\">&gt;</span>\n      <span class=\"nt\">&lt;in&gt;</span>\n        <span class=\"nt\">&lt;smiles</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"urn:xws:cdk:input\"</span><span class=\"nt\">&gt;</span>CCC<span class=\"nt\">&lt;/smiles&gt;</span>\n      <span class=\"nt\">&lt;/in&gt;</span>\n    <span class=\"nt\">&lt;/iodata&gt;</span>\n  <span class=\"nt\">&lt;/command&gt;</span>\n<span class=\"nt\">&lt;/iq&gt;</span>\n<span class=\"nt\">&lt;iq</span> <span class=\"na\">from=</span><span 
class=\"s\">\"cdk.ws1.bmc.uu.se\"</span>\n    <span class=\"na\">id=</span><span class=\"s\">\"JSO-0.12.5-6\"</span>\n    <span class=\"na\">to=</span><span class=\"s\">\"egonw@ws1.bmc.uu.se/home\"</span>\n    <span class=\"na\">type=</span><span class=\"s\">\"result\"</span>\n    <span class=\"na\">xml:lang=</span><span class=\"s\">\"en\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;command</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"http://jabber.org/protocol/commands\"</span>\n           <span class=\"na\">node=</span><span class=\"s\">\"calculateMass\"</span>\n           <span class=\"na\">sessionid=</span><span class=\"s\">\"XWS-1\"</span>\n           <span class=\"na\">status=</span><span class=\"s\">\"completed\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;iodata</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"urn:xmpp:tmp:io-data\"</span>\n            <span class=\"na\">type=</span><span class=\"s\">\"output\"</span><span class=\"nt\">&gt;</span>\n      <span class=\"nt\">&lt;out&gt;</span>\n        <span class=\"nt\">&lt;mass&gt;</span>36.032207690364004<span class=\"nt\">&lt;/mass&gt;</span>\n      <span class=\"nt\">&lt;/out&gt;</span>\n    <span class=\"nt\">&lt;/iodata&gt;</span>\n    <span class=\"nt\">&lt;note</span> <span class=\"na\">type=</span><span class=\"s\">\"info\"</span><span class=\"nt\">&gt;</span>Done<span class=\"nt\">&lt;/note&gt;</span>\n  <span class=\"nt\">&lt;/command&gt;</span>\n<span class=\"nt\">&lt;/iq&gt;</span>\n</code></pre></div></div>",
      "summary": "Johannes joined a Bioclipse Workshop a long time ago, and introduced the participants to the idea of using XMPP (aka Jabber) for asynchronous web services. SOAP is commonly user to run webservices over HTTP, but via (SMTP) email and XMPP is possible too (see SOAP over XMPP). Using HTTP as transport layer has problems. The biggest problem, is possibly that HTTP connections are timed out, e.g. by intermediate router. This makes it rather unsuited for long running jobs. Workarounds are easy to come up with, and polling is a common solution.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkxwsPsi.png",
      "date_published": "2008-10-31T00:00:00+00:00",
      "date_modified": "2025-09-10T00:00:00+00:00",
      "tags": ["bioclipse","xmpp","soap","http"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/av0xr-bgf16",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/31/embedding-gists-in-blogs.html",
      "title": "Embedding Gists in blogs",
      "content_html": "<p><a href=\"http://holtsblog.blogspot.com/2008/10/embedding-gist-in-your-blog.html\">Mark</a> pointed me to the embed functionality of\n<a href=\"http://gist.github.com/\">Gist</a>, product on <a href=\"https://github.com/\">GitHub <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwhere I host <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/20/gittodo-support-for-freemind-graphical.html\">some todo software <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand a <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/09/30/git-mirror-for-cdk.html\">git mirror of CDK 1.2.x <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>So, the other day, when I blogged about <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/25/bioclipse2-scripting-1-from-smiles-to.html\">Bioclipse2 scripts <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nI should have embedded the script like this:</p>\n\n<script src=\"https://gist.github.com/18315.js\"></script>",
      "summary": "Mark pointed me to the embed functionality of Gist, product on GitHub where I host some todo software and a git mirror of CDK 1.2.x .",
      
      "date_published": "2008-10-31T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["github"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5azgt-k8a94",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/25/bioclipse2-scripting-1-from-smiles-to.html",
      "title": "Bioclipse2 Scripting #1: from SMILES to a UFF optimized structure in Jmol",
      "content_html": "<p>After some difficulties this week with making an export of <a href=\"http://cdk.sf.net/\">CDK</a> plugins in the\n<a href=\"http://www.bioclipse.net/\">Bioclipse2</a> <em>Cheminformatics feature</em> of with the <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk-eclipse/trunk/\">cdk-eclipse</a>\nsoftware, I got the following cute Bioclipse2 script up and running:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">dimethylether</span> <span class=\"o\">=</span> <span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">fromSMILES</span><span class=\"p\">(</span> <span class=\"dl\">\"</span><span class=\"s2\">COC</span><span class=\"dl\">\"</span> <span class=\"p\">);</span>\n<span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">addExplicitHydrogens</span><span class=\"p\">(</span> <span class=\"nx\">dimethylether</span> <span class=\"p\">);</span>\n<span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">generate3dCoordinates</span><span class=\"p\">(</span> <span class=\"nx\">dimethylether</span> <span class=\"p\">);</span>\n\n<span class=\"c1\">// save as CML</span>\n<span class=\"nx\">cdk</span><span class=\"p\">.</span><span class=\"nf\">saveCML</span><span class=\"p\">(</span> <span class=\"nx\">dimethylether</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">/Virtual/dimethylether.cml</span><span class=\"dl\">\"</span> <span class=\"p\">);</span>\n<span class=\"nx\">ui</span><span class=\"p\">.</span><span class=\"nf\">open</span><span class=\"p\">(</span> <span class=\"dl\">\"</span><span class=\"s2\">/Virtual/dimethylether.cml</span><span class=\"dl\">\"</span> <span class=\"p\">);</span> <span class=\"c1\">// this should open a JmolEditor</span>\n\n<span class=\"nx\">jmol</span><span class=\"p\">.</span><span class=\"nf\">minimize</span><span 
class=\"p\">();</span>\n</code></pre></div></div>\n\n<p>You can see four of my favorite cheminformatics tools integrated: CDK is used to convert a SMILES into a connection table, to add explicit\nhydrogens, and to create initial 3D coordinates (with the code from Christian Hoppe, and thanx to Stefan for fixing that code in the\nCDK 1.1.x branch!). Then, <a href=\"http://cml.sourceforge.net/\">CMLDOM</a> is used to create and save a CML document, which is then opened in a\n<a href=\"http://www.jmol.org/\">Jmol</a> editor in Bioclipse.</p>\n\n<p>A variation of this script is visible in the following screenshot:</p>\n\n<p><img src=\"/assets/images/mashupCmldomJmolCDK.png\" alt=\"\" /></p>\n\n<p>This and other Bioclipse2 scripts I will post on <a href=\"http://gist.github.com/\">Gist</a>, a sort of <a href=\"http://en.wikipedia.org/wiki/Pastebin\">pastebin</a>\nsupporting version history, and I’ll tag them with <em>bioclipse gist</em> on <a href=\"http://delicious.com/egonw/\">delicious</a>, so that you can always\nbrowse them, comment on them, or add your own gists at\n<a href=\"http://delicious.com/tag/bioclipse+gist\">http://delicious.com/tag/bioclipse+gist</a>.</p>",
      "summary": "After some difficulties this week with making an export of CDK plugins in the Bioclipse2 Cheminformatics feature of with the cdk-eclipse software, I got the following cute Bioclipse2 script up and running:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/mashupCmldomJmolCDK.png",
      "date_published": "2008-10-25T00:00:00+00:00",
      "date_modified": "2008-10-25T00:00:00+00:00",
      "tags": ["bioclipse","cml","cdk","jmol","eclipse","github","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sejc5-h3m89",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/24/git-eclipse-integration.html",
      "title": "Git-Eclipse integration",
      "content_html": "<p>Recently, I have been blogging about <a href=\"http://git.or.cz/\">Git</a>:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/10/20/gittodo-support-for-freemind-graphical.html\">GitToDo support for Freemind <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/09/30/git-mirror-for-cdk.html\">Git mirror for the CDK <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/09/07/cdk-development-with-branches-using-git.html\">CDK development with branches using Git <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/31/offline-cdk-development-using-git-svn.html\">Offline CDK development using git-svn <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<p>One concern expressed by people was the lack of integration with IDEs. Now, an <a href=\"http://git.or.cz/gitwiki/EclipsePlugin\">Eclipse plugin</a>\nseems well on its way:</p>\n\n<p><img src=\"/assets/images/gitEclipse.png\" alt=\"\" /></p>\n\n<p>With a experimental update site (<a href=\"http://www.jgit.org/update-site\">http://www.jgit.org/update-site</a>),\nthe plugin is just an Eclipse reboot away.</p>\n\n<p>Now, the plugin is still in its early stages and many <a href=\"http://git.or.cz/gitwiki/EclipsePluginWishlist\">open feature requests</a>,\nbut fortunately the bug tracker can easy be <a href=\"http://code.google.com/p/egit/wiki/ConfiguringMylyn\">integrated with Mylyn</a>,\nand is <a href=\"http://repo.or.cz/w/egit.git?a=log\">still actively developed</a>.</p>\n\n<p>Cheers to Shawn and Robin for their work!</p>",
      "summary": "Recently, I have been blogging about Git:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/gitEclipse.png",
      "date_published": "2008-10-24T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["git","eclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a10pf-47781",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/20/bugzilla-eclipse-ide-integration-mylyn.html",
      "title": "Bugzilla Eclipse IDE integration: Mylyn",
      "content_html": "<p>A new environment means new tools. <a href=\"http://www.bioclipse.net/\">Bioclipse</a> is Eclipse RCP-based, so colleagues work with Eclipse and are much more into\nEclipse too. For example, into <a href=\"http://www.eclipse.org/mylyn/\">Mylyn</a>. Mylyn is a tool to track tasks and assign context to them. The tasks I am interested\nin (for this blog item), is fixing bug reports. Mylyn is rather suited for this, as it allows linking Java source files to bug reports. With a growing list\nof <em>projects</em> in my navigator, browsing them becomes difficult because the list is way too long. Mylyn allows me to only show those source files which are\nactually related to the bug I am fixing. Cool!</p>\n\n<p>However, SourceForge, our bug tracker, integrates, but to too limited functionality. <a href=\"http://www.bugzilla.org/\">Bugzilla</a>, though, has excellent integration.\nAnd curious about what that would look like, I installed Bugzilla on an Ubuntu system. Which failed. Due to a bug know for two years already! Anyway, two\ntweaks to the system got it working!</p>\n\n<ol>\n  <li>Work around the password in the postinstall script (see <a href=\"http://ph.ubuntuforums.com/showthread.php?t=625588\">here</a>)</li>\n  <li>Set up a /bugs/ link (see <a href=\"http://ubuntuforums.org/showthread.php?t=405283\">here</a>)</li>\n</ol>\n\n<p>This is Bugzilla as viewed in Mylyn:</p>\n\n<p><img src=\"/assets/images/bugzilla.png\" alt=\"\" /></p>\n\n<p>(The bug content is derived from <a href=\"https://bugs.launchpad.net/bugs/1\">Ubuntu bug #1.</a>)</p>",
      "summary": "A new environment means new tools. Bioclipse is Eclipse RCP-based, so colleagues work with Eclipse and are much more into Eclipse too. For example, into Mylyn. Mylyn is a tool to track tasks and assign context to them. The tasks I am interested in (for this blog item), is fixing bug reports. Mylyn is rather suited for this, as it allows linking Java source files to bug reports. With a growing list of projects in my navigator, browsing them becomes difficult because the list is way too long. Mylyn allows me to only show those source files which are actually related to the bug I am fixing. Cool!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bugzilla.png",
      "date_published": "2008-10-20T00:10:00+00:00",
      "date_modified": "2008-10-20T00:10:00+00:00",
      "tags": ["eclipse","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zdj58-w3989",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/20/gittodo-support-for-freemind-graphical.html",
      "title": "GitToDo support for Freemind: graphical mapping of important things on my schedule",
      "content_html": "<p>About a week ago, I hooked up my <a href=\"http://github.com/egonw/gtd/tree/master\">GitToDo</a> software with\n<a href=\"http://freemind.sourceforge.net/wiki/index.php/Main_Page\">Freemind</a>. This allows me to organize the projects\nI am working on, without having to code this in GitToDo directly. I also immediately take advantage of visualization,\nfor example, adding an icon for projects with one or more TODO items marked TODAY or URGENT:</p>\n\n<p><img src=\"/assets/images/GitToDoFreemind.png\" alt=\"\" /></p>\n\n<p>Keeping my GitToDo repository synchronized is as easy as typing:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>gtd-freemind-update\ngtd-freemind-show\n</code></pre></div></div>",
      "summary": "About a week ago, I hooked up my GitToDo software with Freemind. This allows me to organize the projects I am working on, without having to code this in GitToDo directly. I also immediately take advantage of visualization, for example, adding an icon for projects with one or more TODO items marked TODAY or URGENT:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/GitToDoFreemind.png",
      "date_published": "2008-10-20T00:00:00+00:00",
      "date_modified": "2008-10-20T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dvb9w-vg635",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/20/chemical-editing.html",
      "title": "Chemical Editing...",
      "content_html": "<p>As you might have seen, we, <a href=\"http://www.bioclipse.net/\">Uppala</a> and the <a href=\"http://www.ebi.ac.uk/steinbeck/\">EBI</a>, are working on the next generation\n<a href=\"https://apps.sourceforge.net/mediawiki/cdk/index.php?title=JChemPaint\">JChemPaint</a>. JChemPaint is an editor, and therefore, consists of a model\n(<code class=\"language-plaintext highlighter-rouge\">IChemModel</code>), a view (<code class=\"language-plaintext highlighter-rouge\">IRenderer</code>) and a controller (<code class=\"language-plaintext highlighter-rouge\">IController</code>). See the many posts in <a href=\"http://gilleain.blogspot.com/\">Gilleain’s blog</a>.</p>\n\n<p>For the renderer I have set up a <a href=\"https://apps.sourceforge.net/mediawiki/cdk/index.php?title=JChemPaint_Rendering_Modules\">wiki page</a>\nwhich I’ll be hacking in the next days, which shows how a <code class=\"language-plaintext highlighter-rouge\">IChemObject</code> content should be rendered in JChemPaint. It looks like:</p>\n\n<p><img src=\"/assets/images/jcpRenderingRequirements.png\" alt=\"\" /></p>\n\n<p>The <code class=\"language-plaintext highlighter-rouge\">IController</code> is a rather important part too, and like the <code class=\"language-plaintext highlighter-rouge\">IRenderer</code> bit of JChemPaint, needs a major overhaul. The new design,\ndiscussed by Gilleain <a href=\"http://gilleain.blogspot.com/2008/09/interface-relays-and-controller-modules.html\">here</a> and\n<a href=\"http://gilleain.blogspot.com/2008/09/current-controller-architecture.html\">here</a>, should, IMHO, look like:</p>\n\n<p><img src=\"/assets/images/Jcp_editing.png\" alt=\"\" /></p>\n\n<p>In this diagram, the gestures can come from any input device, mouse, tracking ball, Wiimote, and will result in events in some\nwidget library (SWT, AWT shown). 
The old JChemPaint, converted the Swing <code class=\"language-plaintext highlighter-rouge\">MouseEvent</code>’s directly into <code class=\"language-plaintext highlighter-rouge\">IChemObject</code> modifications,\nmaking the code incompatible with SWT. This is why the <em>Chemical Editing Events</em> layer must be added.</p>\n\n<p>Events in this layer look like <code class=\"language-plaintext highlighter-rouge\">addAtom(attachementAtom, coordinates)</code> and <code class=\"language-plaintext highlighter-rouge\">setFormalCharge(atom, newCharge)</code>.\nThe link to scripting should be clear now, and will help use write unit tests for this layer.</p>",
      "summary": "As you might have seen, we, Uppala and the EBI, are working on the next generation JChemPaint. JChemPaint is an editor, and therefore, consists of a model (IChemModel), a view (IRenderer) and a controller (IController). See the many posts in Gilleain’s blog.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpRenderingRequirements.png",
      "date_published": "2008-10-20T00:00:00+00:00",
      "date_modified": "2008-10-20T00:00:00+00:00",
      "tags": ["jchempaint","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5905j-fpe35",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/20/chem-bla-ics-turns-3.html",
      "title": "Chem-bla-ics turns 3!",
      "content_html": "<p>Five days ago, my chem-bla-ics blog turned 3. Here’s the <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/15/chem-bla-ics.html\">first post <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nIt defined:</p>\n\n<blockquote>\n  <p><strong>chemblaics</strong> <em>is the application of open source software in cheminformatics, chemometrics, proteochemometrics, etc, making\nexperimental results reproducable and validatable.</em></p>\n</blockquote>\n\n<p>Much has changed to the field since that post, for the better of chemical sciences.</p>",
      "summary": "Five days ago, my chem-bla-ics blog turned 3. Here’s the first post . It defined:",
      
      "date_published": "2008-10-20T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["blog","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/srpzr-0ar34",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/18/chemoinformatics-p0wned-by.html",
      "title": "Chemoinformatics p0wned by cheminformatics... #2",
      "content_html": "<p>Some time ago <a href=\"http://baoilleach.blogspot.com/\">Noel</a> <a href=\"http://baoilleach.blogspot.com/2008/07/chemoinformatics-p0wned-by.html\">ran a poll</a>\non <em>chemoinformatics</em> and <em>cheminformatics</em>, so I set up a poll too in <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/07/09/chemoinformatics-p0wned-by.html\">part #1 of this series <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThe outcome is clear:</p>\n\n<p><img src=\"/assets/images/cheminfoPoll.png\" alt=\"\" /></p>\n\n<p>The <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/28/blue-obelisk-in-obernai-at.html\">Obernai meeting <i class=\"fa-solid fa-recycle fa-xs\"></i></a> strongly suggested <em>chemoinformatics</em>\n[<a href=\"https://en.wikipedia.org/wiki/Cheminformatics\">1</a>], but the start of the open access <a href=\"http://jcheminf.com/\">Journal of Cheminformatics</a>\nis the killer. I can no longer resist: I’ll follow the wish from my <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/27/be-in-my-advisory-board-1-being-good.html\">advisory board <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand the general trend around the world (<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/08/06/mapping-peoples-interest-google-insight.html\">except India <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>The journal’s editor-in-chief is David Wild, while <a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph Steinbeck</a>\nseems to be going to <a href=\"http://friendfeed.com/e/44ff87a4-940e-32d7-8745-9459ee2664ef/Christoph-Steinbeck-is-now-Editor-in-Chief-Europe/\">lead the European branch</a>.\nPeople seem <a href=\"http://usefulchem.blogspot.com/2008/10/journal-of-cheminformatics.html\">to like</a>\n<a href=\"http://baoilleach.blogspot.com/2008/10/journal-of-cheminformatics-new-open.html\">the idea</a>. 
The journal will clearly be in\ndirect competition for market share with the <a href=\"http://pubs.acs.org/journals/jcisd8/index.html\">JCIM</a>,\n<a href=\"http://www3.interscience.wiley.com/journal/104557877/home?CRETRY=1&amp;SRETRY=0\">QSAR &amp; Combinatorial Science</a>,\nand even the open access <a href=\"http://www.journal.chemistrycentral.com/subjects/cheminformaticsandmolecularmodelling\">Chemistry Central Journal</a>.\nInteresting to see where this is going…</p>",
      "summary": "Some time ago Noel ran a poll on chemoinformatics and cheminformatics, so I set up a poll too in part #1 of this series . The outcome is clear:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cheminfoPoll.png",
      "date_published": "2008-10-18T00:00:00+00:00",
      "date_modified": "2025-10-05T00:00:00+00:00",
      "tags": ["cheminf","jcheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xahx7-yw817",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/07/jmol-116-rc-18-in-bioclipse.html",
      "title": "Jmol 11.6 RC 18 in Bioclipse",
      "content_html": "<p>Just updated <a href=\"http://www.bioclipse.net/\">Bioclipse2</a> with <a href=\"http://www.jmol.org/\">Jmol</a> 11.6 RC 18:</p>\n\n<p><img src=\"/assets/images/bcJmol1.6RC18.png\" alt=\"\" /></p>\n\n<p>Now <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/09/24/moved-to-sweden-post-doc-in-bioclipse.html\">working in Uppsala <i class=\"fa-solid fa-recycle fa-xs\"></i></a> makes Bioclipse\nmy default life sciences platform, and I’ll be porting older Bioclipse1 plugins to Bioclipse2, which has a much better architecture.</p>\n\n<p>Bioclipse2 does not have a native Jmol Console, but script commands can easily be run with <code class=\"language-plaintext highlighter-rouge\">jmol.run()</code> (written by Jonathan).\nI wonder if it would be hard to have a JmolScript view like this JavaScript Console… The outline on the right (written by Ola)\nallows me to navigate the Jmol data model.</p>",
      "summary": "Just updated Bioclipse2 with Jmol 11.6 RC 18:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bcJmol1.6RC18.png",
      "date_published": "2008-10-07T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["bioclipse","jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mnn5d-bwg40",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/06/pka-prediction-or-how-to-convert-jcim.html",
      "title": "pKa prediction, or, how to convert a JCIM paper into Java",
      "content_html": "<p>Lee et al. published last week a paper on pK<sub>a</sub> prediction (doi:<a href=\"https://doi.org/10.1021/ci8001815\">10.1021/ci8001815</a>). As the paper says,\n<em>the pKa, and in particular the ionic state of a molecule at physiological pH, affects pharmacokinetics and pharmacodynamics</em>. The paper describes\na (binary) decision tree using presence or absence of SMARTS substructures to traverse the tree, allowing prediction of monoprotic molecules.</p>\n\n<p>Now, the paper’s Supplementary Information contains the full model. I’d rather rebuild the model, but the full training set does not seem available.\nStill, the paper’s model shows comparible predictive power as commercial models, so I’d say it would be a welcome addition to the\n<a href=\"http://cdk.sf.net/\">CDK</a>.</p>\n\n<p>And as the CDK already has a <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/smiles/smarts/parser/SMARTSParser.html\">SMARTS parser</a>,\nadding this model should be easy enough. So, here goes :) First, let us outline the API:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cm\">/* $Revision$ $Author$ $Date$\n *\n * Copyright (C) 2008  Egon Willighagen &lt;egonw@users.sf.net&gt;\n *\n * Contact: cdk-devel@list.sourceforge.net\n *\n * This program is free software; you can redistribute it and/or\n * modify it under the terms of the GNU Lesser General Public License\n * as published by the Free Software Foundation; either version 2.1\n * of the License, or (at your option) any later version.\n *\n * This program is distributed in the hope that it will be useful,\n * but WITHOUT ANY WARRANTY; without even the implied warranty of\n * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  
See the\n * GNU Lesser General Public License for more details.\n *\n * You should have received a copy of the GNU Lesser General Public License\n * along with this program; if not, write to the Free Software\n * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.\n */</span>\n<span class=\"kn\">package</span> <span class=\"nn\">org.openscience.cdk.charges.pka</span><span class=\"o\">;</span>\n\n<span class=\"cm\">/**\n * Tool to predict a molecule's pKa. The class implements\n * the algorithm published by Lee et al. {@cdk.cite Lee2008}\n * which is based on a SMARTS-based decision tree, trained\n * with 1693 monoprotic compounds.\n *\n * @cdk.module extra\n */</span>\n<span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">PkaPredictor</span> <span class=\"o\">{</span>\n\n  <span class=\"cm\">/**\n   * Predicts the pKa value of a molecule.\n   *\n   * @param  container IMolecule to predict the pKa for\n   * @return           The predicted pKa\n   *\n   * @throws CDKException upon failure of the prediction algorithm\n   */</span>\n  <span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"kt\">float</span> <span class=\"nf\">predict</span><span class=\"o\">(</span><span class=\"nc\">IMolecule</span> <span class=\"n\">container</span><span class=\"o\">)</span> <span class=\"kd\">throws</span> <span class=\"nc\">CDKException</span> <span class=\"o\">{}</span>\n\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The first line is picked up by SVN, which will add the revision number, the last commiter and when the last commit happened. The third\nline is important: it indicates who has the right or need to be asked permission to modify the license, if ever needed. If people\nprovide patches to the code, they are added to this list. The rest of the source file header includes a general contact email address,\nand the <em>LGPL v2</em> license the CDK uses. 
The <em>package</em> declaration puts it in the <em>cdk.charges.pka</em> package, which seemed appropriate.</p>\n\n<p>The class JavaDoc contains two CDK specific tags. The tag <code class=\"language-plaintext highlighter-rouge\">{@cdk.cite Lee2008}</code> is used to point to the literature reference database in\n<code class=\"language-plaintext highlighter-rouge\">doc/refs/cheminf.bibx</code>. When the HTML JavaDoc is compiled, the full reference gets included in the HTML. The other tag,\n<code class=\"language-plaintext highlighter-rouge\">@cdk.module</code>, is used by the CDK build system to determine in which <a href=\"http://chem-bla-ics.blogspot.com/2008/03/cdk-module-dependencies-2.html?showComment=1207512720000\">CDK module</a> <!-- keep link -->\nthe class should end up; <code class=\"language-plaintext highlighter-rouge\">extra</code> in this case. The method’s JavaDoc is pretty standard.</p>\n\n<p>Next, we need some logic to traverse the tree and look up the predicted pK<sub>a</sub>, and I implemented this as:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"kt\">float</span> <span class=\"nf\">predict</span><span class=\"o\">(</span><span class=\"nc\">IMolecule</span> <span class=\"n\">container</span><span class=\"o\">)</span> <span class=\"kd\">throws</span> <span class=\"nc\">CDKException</span> <span class=\"o\">{</span>\n  <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">node1</span> <span class=\"o\">==</span> <span class=\"kc\">null</span><span class=\"o\">)</span> <span class=\"n\">initalize</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">DecisionTreeNode</span> <span class=\"n\">node</span> <span class=\"o\">=</span> <span class=\"n\">node1</span><span class=\"o\">;</span>\n  <span class=\"c1\">// traverse down the tree until we end up in a leaf</span>\n  <span class=\"k\">while</span> <span class=\"o\">(!</span><span class=\"n\">node</span><span class=\"o\">.</span><span class=\"na\">isTerminal</span><span class=\"o\">())</span> <span class=\"o\">{</span>\n    <span class=\"n\">node</span> <span class=\"o\">=</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"na\">decide</span><span class=\"o\">(</span><span class=\"n\">container</span><span class=\"o\">);</span>\n  <span class=\"o\">}</span>\n  <span class=\"k\">return</span> <span class=\"n\">node</span><span class=\"o\">.</span><span class=\"na\">getValue</span><span class=\"o\">();</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The root node of the tree is called <code class=\"language-plaintext highlighter-rouge\">node1</code>, and I explain its initialization later. Then, the code traverses the tree by asking\neach node to decide whether the SMARTS substructure is present or not. It returns a new <code class=\"language-plaintext highlighter-rouge\">DecisionTreeNode</code> matching the presence\nor absence. At some point, the terminal node is reached, and we can ask this node for the associated prediction value.</p>\n\n<h2 id=\"the-java-version-of-the-decision-tree\">The Java version of the Decision Tree</h2>\n\n<p>The paper’s supplementary information contains the tree encoded like this:</p>\n\n<pre><code class=\"language-csv\">1,0,,,5.9131093\n2,1,[#G6H]C(=O),1,3.6849957                                                                   \n3,1,[#G6H]C(=O),0,7.206913 \n</code></pre>\n\n<p>That is, each line lists the node identifier, the parent identifier, the SMARTS query, presence (1) or absence (0), and the\nnode value. Actually, a bit more, but these are the important bits for now. The first line is the root node <code class=\"language-plaintext highlighter-rouge\">node1</code>, and the\nsecond and third lines are the two children of the root node. 
If the <code class=\"language-plaintext highlighter-rouge\">[#G6H]C(=O)</code> substructure is present, then <code class=\"language-plaintext highlighter-rouge\">node2</code> applies, and\nthe predicted value would be 3.6849957; if the substructure is absent, then <code class=\"language-plaintext highlighter-rouge\">node3</code> applies, and the predicted pK<sub>a</sub> would be 7.206913.</p>\n\n<p>Now, these nodes and their interdependencies are encoded in the initialize() method as:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">private</span> <span class=\"kd\">static</span> <span class=\"kt\">void</span> <span class=\"nf\">initalize</span><span class=\"o\">()</span> <span class=\"kd\">throws</span> <span class=\"nc\">CDKException</span> <span class=\"o\">{</span>\n  <span class=\"n\">node1</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">DecisionTreeNode</span><span class=\"o\">(</span><span class=\"mf\">5.9131093f</span><span class=\"o\">,</span> <span class=\"mf\">17.32f</span><span class=\"o\">);</span>                             \n  <span class=\"nc\">DecisionTreeNode</span> <span class=\"n\">node2</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">DecisionTreeNode</span><span class=\"o\">(</span><span class=\"mf\">3.6849957f</span><span class=\"o\">,</span> <span class=\"mf\">5.9569998f</span><span class=\"o\">);</span>        \n  <span class=\"nc\">DecisionTreeNode</span> <span class=\"n\">node3</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">DecisionTreeNode</span><span class=\"o\">(</span><span class=\"mf\">7.206913f</span><span class=\"o\">,</span> <span class=\"mf\">17.32f</span><span class=\"o\">);</span>             \n  <span class=\"n\">node1</span><span class=\"o\">.</span><span class=\"na\">setChildNodes</span><span class=\"o\">(</span><span class=\"s\">\"[#G6H]C(=O)\"</span><span 
class=\"o\">,</span> <span class=\"n\">node2</span><span class=\"o\">,</span> <span class=\"n\">node3</span><span class=\"o\">);</span>                             \n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The second argument in the <code class=\"language-plaintext highlighter-rouge\">DecisionTreeNode</code> constructor is the value range for the node, and is an indication of the\nvariance of the prediction value.</p>\n\n<p>A simple Perl script can convert the file from the supplementary information into Java source code. With more than\n1500 nodes in the tree, this beats manual hacking up of the tree.</p>\n\n<h2 id=\"the-junit4-test\">The JUnit4 test</h2>\n\n<p>The unit tests now looks like:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">package</span> <span class=\"nn\">org.openscience.cdk.charges.pka</span><span class=\"o\">;</span>\n\n<span class=\"cm\">/**\n * Unit test to test the functionality of the {@link PkaPredictor}.\n *                                                                 \n * @author     egonw                                               \n * @cdk.module test-extra                                          \n */</span>                                                                \n<span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">PkaPredictorTest</span> <span class=\"kd\">extends</span> <span class=\"nc\">NewCDKTestCase</span> <span class=\"o\">{</span>             \n\n  <span class=\"nd\">@Test</span> <span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">testThrowsNoException</span><span class=\"o\">()</span> <span class=\"kd\">throws</span> <span class=\"nc\">Exception</span> <span class=\"o\">{</span>\n    <span class=\"nc\">IMolecule</span> <span class=\"n\">methane</span> <span class=\"o\">=</span> <span class=\"nc\">NoNotificationChemObjectBuilder</span><span 
class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">().</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n    <span class=\"nc\">IAtom</span> <span class=\"n\">carbon</span> <span class=\"o\">=</span> <span class=\"n\">methane</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">methane</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">carbon</span><span class=\"o\">);</span>\n\n    <span class=\"kt\">float</span> <span class=\"n\">result</span> <span class=\"o\">=</span> <span class=\"nc\">PkaPredictor</span><span class=\"o\">.</span><span class=\"na\">predict</span><span class=\"o\">(</span><span class=\"n\">methane</span><span class=\"o\">);</span>\n    <span class=\"c1\">// the actual value depends on the number of nodes I actually added,</span>\n    <span class=\"c1\">// but I *do* know the min and max without having to have all nodes</span>\n    <span class=\"c1\">// implemented</span>\n    <span class=\"nc\">Assert</span><span class=\"o\">.</span><span class=\"na\">assertTrue</span><span class=\"o\">(</span><span class=\"n\">result</span> <span class=\"o\">&lt;</span> <span class=\"mf\">15.526</span><span class=\"o\">);</span>\n    <span class=\"nc\">Assert</span><span class=\"o\">.</span><span class=\"na\">assertTrue</span><span class=\"o\">(</span><span class=\"n\">result</span> <span class=\"o\">&gt;</span> <span class=\"o\">-</span><span class=\"mf\">0.6659999</span><span class=\"o\">);</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>Note that I cannot assert the real prediction value until the full decision tree has been implemented in\nthe class, but I do note 
the full range and thus test for that. You may have noticed that several methods\nthrow <code class=\"language-plaintext highlighter-rouge\">CDKException</code>s, which would be caused by SMARTS expressions the CDK cannot handle…</p>\n\n<h2 id=\"smarts-problems\">SMARTS problems…</h2>\n\n<p>Now, the SMARTS used in the supplementary information indeed do not work with the CDK SMARTS engine;\nthe paper indicates that they used <a href=\"http://www.chemcomp.com/\">MOE</a>, which extends the original\n<a href=\"http://www.daylight.com/\">Daylight</a> SMARTS. So, if you ever wondered about the forking risk of Open Standards…</p>\n\n<p>So far, I have identified these three patterns that are used in the paper’s model but cannot be parsed by the CDK engine:</p>\n\n<ol>\n  <li><strong>[i]</strong> an sp2-hybridized carbon (aromatic or delocalized)</li>\n  <li><strong>[#G6]</strong> matches carbon and sulfur, so seems to indicate a group in the periodic table</li>\n  <li><strong>[#X]</strong> no idea… (no internet at home yet, so cannot Google either)</li>\n</ol>\n\n<p>The #G syntax can be rewritten in an OR form, and possibly the others too. However, I’d rather see the\nCDK SMARTS engine support these industry-adopted extensions.</p>\n\n<h2 id=\"conclusion\">Conclusion</h2>\n\n<p>The CDK shows its power as a <em>development kit</em>, and allowed me to hack up the code of the paper on a\ncasual Saturday evening (sitting on the couch next to a fire in our kacheloven with a glass of beer).\nWriting up this blog was done the next day.</p>\n\n<p>Once the missing SMARTS patterns have been added to the CDK (or proper replacements have been\ndefined), I’ll compare the test set results of the paper with the CDK implementation. I will probably\nalso convert the test set results from the supplementary information into unit tests (the SI\ncontains SMILES, experimental and predicted values).</p>",
      "summary": "Last week, Lee et al. published a paper on pKa prediction (doi:10.1021/ci8001815). As the paper says, the pKa, and in particular the ionic state of a molecule at physiological pH, affects pharmacokinetics and pharmacodynamics. The paper describes a (binary) decision tree using presence or absence of SMARTS substructures to traverse the tree, allowing pKa prediction for monoprotic molecules.",
      
      "date_published": "2008-10-06T00:00:00+00:00",
      "date_modified": "2008-10-06T00:00:00+00:00",
      "tags": ["cdk","chemistry"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci8001815", "doi": "10.1021/ci8001815"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ehmhy-2rw24",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/02/who-likes-my-friendfeed-posts-most.html",
      "title": "Who likes my FriendFeed posts most...",
      "content_html": "<p><a href=\"http://comments.deasil.com/\">Felix</a> has a <a href=\"http://comments.deasil.com/2008/09/30/ff-likes-me-meter/#\">small tool</a> on his website to show me\n(or anyone else) who likes what I post on <a href=\"http://friendfeed.com/egonw\">my FriendFeed account</a>:</p>\n\n<p><img src=\"/assets/images/whoLikesMe.png\" alt=\"\" /></p>\n\n<p>Which actually is <a href=\"http://mndoci.com/blog/\">Deepak</a>…</p>",
      "summary": "Felix has a small tool on his website to show me (or anyone else) who likes what I post on my FriendFeed account:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/whoLikesMe.png",
      "date_published": "2008-10-02T00:00:00+00:00",
      "date_modified": "2008-10-02T00:00:00+00:00",
      "tags": ["friendfeed"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r8d8f-55c02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/02/jchempaint-history-cml-patches-in-1999.html",
      "title": "JChemPaint history: CML patches in 1999",
      "content_html": "<p>There was some talk about the history of chemoinformatics toolkits by\n<a href=\"http://baoilleach.blogspot.com/2008/09/overview-of-cheminformatics-toolkits.html\">Noel</a> and\n<a href=\"http://www.dalkescientific.com/writings/diary/archive/2008/09/20/euroqsar.html\">Andrew</a>, which made\nme wonder on the exact history of <a href=\"http://www.jmol.org/\">Jmol</a> and\n<a href=\"http://sf.net/project/jchempaint\">JChemPaint</a>. Below is the email\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a> dug up from his archives:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>X-Mozilla-Status: 1011\nX-Mozilla-Status2: 00000000\nMessage-ID: &lt;372ECD5E.53A49584@ice.mpg.de&gt;\nDate: Tue, 04 May 1999 12:35:10 +0200\nFrom: Christoph Steinbeck\nReply-To: steinbeck@ice.mpg.de\nOrganization: Max-Planck-Institute of Chemical Ecology\nX-Mailer: Mozilla 4.51 [en] (WinNT; I)\nX-Accept-Language: en\nMIME-Version: 1.0\nTo: Egon Willighagen\nSubject: Re: Participating in JChemPaint\nReferences: &lt;000701be9613$34cf52e0$8e74ae83@catv6142.extern.kun.nl&gt;\nContent-Type: text/plain; charset=us-ascii\nContent-Transfer-Encoding: 7bit\n\n&gt; Egon Willighagen wrote:\n&gt;\n&gt; Dear Christoph Steinbeck,\n&gt;\n&gt; Yesterday I visited your site on JChemPaint. I like to contribute some\n&gt; of my expertise on\n&gt; Java and CML (1).\n&gt;\n&gt; CML is a markup language that is able to contain chemical information.\n&gt; It can contain for example physical properties, for which I use CML in\n&gt; my Dictionary on Organic Chemistry (2).\n&gt; But is also might contain spectra, bibliographic references etc. 
And\n&gt; of course 2D and 3D\n&gt; structural information.\n&gt;\n&gt; Therefore I propose to write both CML-input and -output procedures for\n&gt; the JChemPaint project.\n&gt;\n&gt; I hope to hear from you soon.\n&gt;\n&gt; Yours sincerely,\n&gt;\n&gt; Egon Willighagen\n&gt;\n&gt; 1. http://www.xml-cml.org/\n&gt; 2. http://www.sci.kun.nl/sigma/Chemisch/Woordenboek/\n\nDear Egon,\n\nthanks very much for your mail and your offer to write CML-input and\noutput routines for JChemPaint.\nThat really sounds great to me and I will give you access to our CVS\ntree as soon as we have discussed the details.\n\nCheers,\n\nChris\n\n--C. S.\nDr. Christoph Steinbeck (http://www.ice.mpg.de/~stein)\nMPI of Chemical Ecology, Tatzendpromenade 1a, 07745 Jena, Germany\nTel: +49(0)3641 643644 - MoPho: +49(0)177 8236510 - Fax: +49(0)3641\n643665\n\nWhat is man but that lofty spirit - that sense of enterprise.\n.. Kirk, \"I, Mudd,\" stardate 4513.3..\n</code></pre></div></div>\n\n<p>Now, my email must have been triggered by the <a href=\"http://freshmeat.net/projects/jchempaint/\">announcement of JChemPaint on FreshMeat.net</a>,\nwhich is the oldest public record of JChemPaint I have found so far:</p>\n\n<p><img src=\"/assets/images/fmJChemPaint.png\" alt=\"\" /></p>",
      "summary": "There was some talk about the history of chemoinformatics toolkits by Noel and Andrew, which made me wonder on the exact history of Jmol and JChemPaint. Below is the email Christoph dug up from his archives:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/fmJChemPaint.png",
      "date_published": "2008-10-02T00:00:00+00:00",
      "date_modified": "2008-10-02T00:00:00+00:00",
      "tags": ["jmol","jchempaint","cml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6h7hy-t8615",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/01/cherry-picking-commits-from-cdk-trunk.html",
      "title": "Cherry-picking commits from CDK trunk: how to make a reasonable commit message",
      "content_html": "<p>Some of you have heard me complain about the commit messages resulting from <code class=\"language-plaintext highlighter-rouge\">git cherry-pick</code>, which allows me to apply patches from\n<a href=\"http://cdk.sf.net/\">CDK</a> <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/\">trunk</a> to a branch without needing to do\na full merge of what happens in trunk. The commit messages would be identical to the originals, which made it seem that the original patches were mine.</p>\n\n<p>However, this is how I can modify those messages:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git commit <span class=\"nt\">--amend</span>\n</code></pre></div></div>\n\n<p>This allows me to convert a mere <em>refactored a method</em> into <em>Applied patch from trunk (rev 12479): [shk3] refactored a method</em>.</p>",
      "summary": "Some of you have heard me complain about the commit messages resulting from git cherry-pick, which allows me to apply patches from CDK trunk to a branch without needing to do a full merge of what happens in trunk. The commit messages would be identical to the originals, which made it seem that the original patches were mine.",
      
      "date_published": "2008-10-01T00:00:00+00:00",
      "date_modified": "2008-10-01T00:00:00+00:00",
      "tags": ["git","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/assax-tmx60",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/09/30/git-mirror-for-cdk.html",
      "title": "Git mirror for the CDK",
      "content_html": "<p>While slowly <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/09/24/moved-to-sweden-post-doc-in-bioclipse.html\">merging with Sweden <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and with ADSL due to\nreach my house in about two weeks, I am enjoying my new office space and using <a href=\"http://git.or.cz/\">Git</a> to upload patches to the CDK. Christoph\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2008/08/26/linus-on-git-on-google-techtalks/\">wondered</a> if we should switch CDK\nfrom SVN to Git. A few developers objected, for various reasons: no native Windows clients (though <a href=\"http://code.google.com/p/msysgit/\">msysgit</a>\nmight be the solution), no (stable) plugins for Eclipse, IDEA(?), etc.</p>\n\n<p>I <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/09/07/cdk-development-with-branches-using-git.html\">made the switch <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and am really happy about it.</p>\n\n<p>Anyway, one thing holding me back from switching the full CDK project was the need for a central place where we could host our Git repository. Now,\nGitHub does just that, and after inquiring with them about the 100MB limit, <a href=\"http://github.com/mojombo\">Tom</a> emailed me:</p>\n\n<blockquote>\n  <p>Hi Egon,</p>\n\n  <p>We’d love to have your open source project on GitHub. The 100MB is currently a soft limit, so you won’t have any problems uploading\na larger repo. 
We hope you enjoy GitHub!</p>\n\n  <p>Tom Preston-Werner <br />\ngithub.com/mojombo</p>\n</blockquote>\n\n<p>So, I created an <a href=\"http://github.com/egonw\">account</a> (I’m happy there are so few <em>Egon</em>s in the world :), and\n<a href=\"http://github.com/egonw/cdk/tree/master\">uploaded the CDK 1.2 branch</a>, which, for now at least, will serve as a mirror only, while\nSVN will be the primary repository.</p>\n\n<p>You can easily check it out with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git clone git://github.com/egonw/cdk.git\n</code></pre></div></div>\n\n<p>I am not sure yet how you can email me your patches, but I know it is possible and will report on this later. This mirror is important to those who want to play with Git, as one no longer requires git-svn, dropping one dependency.</p>\n\n<p>Now, it does put some extra work on my side, as I need to keep the CDK SVN repository (or, better, my git-svn copy of it) synchronized with the Git repository, but this turned out to be fairly easy:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">cd </span>GitHub/cdk\n<span class=\"nv\">$ </span>git pull ../../SourceForge/git-svn/cdk my-local-1.2\n<span class=\"nv\">$ </span>git push\n</code></pre></div></div>\n\n<p>So, does this mean no goodies for people who stick to SVN? No, there are some, like this\n<a href=\"http://github.com/egonw/cdk/graphs/punch_card\">PunchCard</a>:</p>\n\n<p><img src=\"/assets/images/punchCard.png\" alt=\"\" /></p>",
      "summary": "While slowly merging with Sweden, and with ADSL due to reach my house in about two weeks, I am enjoying my new office space and using Git to upload patches to the CDK. Christoph wondered if we should switch CDK from SVN to Git. A few developers objected, for various reasons: no native Windows clients (though msysgit might be the solution), no (stable) plugins for Eclipse, IDEA(?), etc.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/punchCard.png",
      "date_published": "2008-09-30T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["git","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ht6ah-j3602",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/09/24/moved-to-sweden-post-doc-in-bioclipse.html",
      "title": "Moved to Sweden: Post-doc in the Bioclipse group of Prof. Jarl Wikberg",
      "content_html": "<p>The reason I have not been able to blog much lately is that my family and I have been moving to\n<a href=\"http://en.wikipedia.org/wiki/Uppsala\">Uppsala</a>, Sweden, where I’ll start a postdoc in the\n<a href=\"http://www.farmbio.uu.se/researchgroup.php?fg=1\">group of Jarl Wikberg</a> @ <a href=\"http://www.bmc.uu.se/\">BMC</a> @\n<a href=\"http://en.wikipedia.org/wiki/Uppsala_University\">Uppsala University</a>. There I’ll work on chemoinformatics\nin drug design, and on the use of <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\nin particular.</p>\n\n<p>More blogging when I have more frequent internet access again…</p>",
      "summary": "The reason I have not been able to blog much lately is that my family and I have been moving to Uppsala, Sweden, where I’ll start a postdoc in the group of Jarl Wikberg @ BMC @ Uppsala University. There I’ll work on chemoinformatics in drug design, and on the use of CDK and Bioclipse in particular.",
      
      "date_published": "2008-09-24T00:00:00+00:00",
      "date_modified": "2008-09-24T00:00:00+00:00",
      "tags": ["cdk","cheminf","chemometrics","career","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vcd40-00e56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/09/09/friendfeed-for-chemistry-development.html",
      "title": "FriendFeed for the Chemistry Development Kit",
      "content_html": "<p><a href=\"http://www.friendfeed.com/\">FriendFeed</a> is a nice aggregation service allowing discussion of items posted from delicious,\nblogs, and any other RSS-based feed (e.g. <a href=\"http://www.friendfeed.com/egonw\">my feed</a>). It also has a <em>room</em> concept, where\npeople can post items around a topic, for example a conference like <a href=\"http://beta.friendfeed.com/rooms/science-blogging-2008\">Science Blogging 2008 London</a>,\nor a project like the <a href=\"http://www.friendfeed.com/rooms/chemistry-development-kit\">CDK</a>:</p>\n\n<p><img src=\"/assets/images/cdkFF.png\" alt=\"\" /></p>\n\n<p>I have associated the room with the RSS feeds of the CDK <a href=\"http://sourceforge.net/tracker2/?group_id=20024&amp;atid=120024\">bug tracker</a>\nand the <a href=\"http://www.steinbeck-molecular.de/cdknews/index.php/CDKNews/issue/view/10\">CDK News ASAP</a>,\nand will shortly add the <a href=\"http://cia.vc/stats/project/cdk/cdk\">commit messages feed</a>.</p>",
      "summary": "FriendFeed is a nice aggregation service allowing discussion of items posted from delicious, blogs, and any other RSS-based feed (e.g. my feed). It also has a room concept, where people can post items around a topic, for example a conference like Science Blogging 2008 London, or a project like the CDK:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkFF.png",
      "date_published": "2008-09-09T00:00:00+00:00",
      "date_modified": "2008-09-09T00:00:00+00:00",
      "tags": ["cdk","friendfeed"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vt3yd-8nb03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/09/07/cdk-development-with-branches-using-git.html",
      "title": "CDK development with branches using Git",
      "content_html": "<p><a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a> pointed me to a <a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2008/08/26/linus-on-git-on-google-techtalks/\">video on Git by Linus</a>.\n<a href=\"http://cdk.sf.net/\">CDK</a> is now using branches extensively in development, and just set up a branch for the upcoming 1.2.0 release\nlater this year (end of October, see <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/branches/cdk-1.2.x/\">cdk-1.2.x</a>). Christoph has just\n<a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2008/09/01/creating-and-reviewing-patches-in-the-chemistry-development-kit-cdk/\">reviewed</a>\nthe branch containing the API move to <a href=\"http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Iterable.html\">Iterable</a>. This patch now\nallows to do this (which would really deserve a blog item by itself):</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"nc\">IAtom</span> <span class=\"n\">atom</span> <span class=\"o\">:</span> <span class=\"n\">molecule</span><span class=\"o\">.</span><span class=\"na\">atoms</span><span class=\"o\">())</span> <span class=\"o\">{</span>\n  <span class=\"nc\">System</span><span class=\"o\">.</span><span class=\"na\">out</span><span class=\"o\">.</span><span class=\"na\">println</span><span class=\"o\">(</span><span class=\"s\">\"Symbol: \"</span> <span class=\"o\">+</span> <span class=\"n\">atom</span><span class=\"o\">.</span><span class=\"na\">getSymbol</span><span class=\"o\">());</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>Now, while branching in SVN is easy (<code class=\"language-plaintext highlighter-rouge\">svn copy</code>), merging is a pain, something Miguel and I found out in the last half year, where\nhe and I experimented with using branches in development (see 
also <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/05/02/comparing-junit-test-results-between.html\">Comparing Branches <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nWe discovered that porting bug fixes from trunk to a branch, or just keeping the branch synchronized with trunk, simply does not work.\nAnd merging itself, after a while, became a tedious process. So, when watching Linus’ movie on Git where he mentions being able to\nmerge several branches a day, I knew I had to switch. A full switch for the CDK <a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2008/08/26/linus-on-git-on-google-techtalks/#comment-1294\">depends</a>\non an always accessible repository (I have been thinking about <a href=\"http://github.com/blog\">GitHub</a>; anyone with an opinion on that?).</p>\n\n<p>However, you can start using Git without a central Git repository, including branch support. This blog by\n<a href=\"http://www.jukie.net/~bart/blog/svn-branches-in-git\">Bart</a> has the juicy details, which I’ll apply here to CDK, for easy\ncopy/pasting. This replaces the earlier writing on\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/31/offline-cdk-development-using-git-svn.html\">Offline CDK development using git-svn <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>First step is to get yourself a Git mirror of SVN (which will take a long time; do it overnight(s)):</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git svn clone https://cdk.svn.sourceforge.net/svnroot/cdk/cdk/ <span class=\"nt\">-T</span> trunk <span class=\"nt\">-b</span> branches <span class=\"nt\">-t</span> tags\n<span class=\"nv\">$ </span>git gc\n</code></pre></div></div>\n\n<p>The second command compresses commits to reduce the size of your local Git copy, resulting in a cdk folder of about 300MB. 
Enter the\ndirectory, and check that it has the default <code class=\"language-plaintext highlighter-rouge\">master</code> branch:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git branch\n<span class=\"k\">*</span> master\n</code></pre></div></div>\n\n<p>In SVN one must always run svn update before starting to code. Similarly, with Git you run the following (I found this important for keeping your local repository consistent):</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git svn rebase\n</code></pre></div></div>\n\n<p>Committing has not changed, and a simple change would go via:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>nano build.xml\n<span class=\"nv\">$ </span>git commit <span class=\"nt\">-m</span> <span class=\"s2\">\"Changed something, but too lazy to write up what I actually changed\"</span> build.xml\n<span class=\"nv\">$ </span>git svn dcommit\n</code></pre></div></div>\n\n<h2 id=\"branches\">Branches</h2>\n\n<p>Now, before we move to setting up branches, one must realize that there are SVN branches and (local) Git branches. Keep that in mind, and\nremember that we have Git to work out how to keep them synchronized. 
To check the Git branches one uses <code class=\"language-plaintext highlighter-rouge\">git branch</code> as shown above; to\nview the SVN branches, however, we type the following (which should produce quite a long list for the CDK; only a few are listed below):</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git branch <span class=\"nt\">-r</span>\n trunk\n tags/cdk-2003-Oct-17\n cdk-1.2.x\n mesprague-iterators\n</code></pre></div></div>\n\n<p>Here, the first is <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/\">CDK trunk</a>, the second a tag\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/tags/cdk-2003-Oct-17/\">tags/cdk-2003-Oct-17</a>, and the last two are the branches\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/branches/cdk-1.2.x/\">cdk-1.2.x</a> and <em>mesprague-iterators</em> (no longer existing).\nI am not sure why the <em>branches/</em> is missing here; some git-svn magic I presume.</p>\n\n<p>Now, to create local Git branches that are synchronized with the SVN <em>cdk-1.2.x</em> and <em>cdk-1.0.x</em> branches, we type:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git checkout <span class=\"nt\">-b</span> my-local-1.2 cdk-1.2.x\n<span class=\"nv\">$ </span>git checkout <span class=\"nt\">-b</span> my-local-1.0 cdk-1.0.x\n<span class=\"nv\">$ </span>git branch\n<span class=\"k\">*</span> master\n  my-local-1.0\n  my-local-1.2\n</code></pre></div></div>\n\n<p>You can now easily change branches with <code class=\"language-plaintext highlighter-rouge\">git checkout &lt;BRANCH&gt;</code>, and check which SVN path you are working against with <code class=\"language-plaintext highlighter-rouge\">git log -1</code>:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git 
checkout my-local-1.2\n<span class=\"nv\">$ </span>git log <span class=\"nt\">-1</span>\ncommit 93bd0b22bbad31897eed6686e5b208c5e23505f7\nAuthor: egonw\nDate:   Sun Sep 7 08:13:38 2008 +0000\n\n    Fixed inline citation <span class=\"o\">(</span>closes <span class=\"c\">#1987947)</span>\n\n\n    git-svn-id: https://cdk.svn.sourceforge.net/svnroot/cdk/cdk/branches/cdk-1.2.x@12215 eb4e18e3-b210-0410-a6ab-dec725e4b171\n</code></pre></div></div>\n\n<p>Inspection of the output shows the <code class=\"language-plaintext highlighter-rouge\">git-svn-id</code> line, which indicates that this patch was indeed committed against <em>cdk/branches/cdk-1.2.x</em>.</p>\n\n<p>With this setup, I can easily change between trunk and branches, backport patches from trunk to the <em>cdk-1.2.x</em>\nbranch (using <code class=\"language-plaintext highlighter-rouge\">git cherry-pick</code>), and merge all commits to the branch into trunk using:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>git checkout master\n<span class=\"nv\">$ </span>git merge cdk-1.2.x\n</code></pre></div></div>\n\n<p>Git does an excellent job here. It recognizes when the branch was last merged with trunk, and will not attempt to apply\npatches twice. Even better, it also recognizes patches that were backported from trunk to the branch, and will not attempt\nto merge those either.</p>\n\n<p>The result: I can easily merge branches now, generally speeding up CDK development! For example, it reduces the time\nbetween someone submitting a patch and me applying it to <em>trunk</em> (or <em>cdk-1.2.x</em> in case of a bug fix). I just set up a\nlocal branch, apply the patch, and tune until I am happy; I do not keep trunk unstable, as I am doing this in a\nseparate branch. 
Similarly, if people develop their patch in an SVN branch, I can just as easily switch branches\n(as described above) and check things before I merge.</p>\n\n<h2 id=\"setting-up-new-svn-branches\">Setting up new SVN branches</h2>\n\n<p>As far as I know, <code class=\"language-plaintext highlighter-rouge\">git-svn</code> cannot create or delete SVN branches. But this is easy enough with the SVN commands:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>svn copy https://cdk.svn.sourceforge.net/svnroot/cdk/cdk/trunk https://cdk.svn.sourceforge.net/svnroot/cdk/cdk/branches/egonw-mynewbranch\n<span class=\"nv\">$ </span>git svn fetch\n<span class=\"nv\">$ </span>git checkout <span class=\"nt\">-b</span> my-local-newbranch egonw-mynewbranch\n<span class=\"nv\">$ </span><span class=\"c\"># hack in my-local-newbranch</span>\n<span class=\"nv\">$ </span>git commit <span class=\"nt\">-a</span>\n<span class=\"nv\">$ </span>git svn dcommit\n<span class=\"nv\">$ </span>git checkout master\n<span class=\"nv\">$ </span>git merge my-local-newbranch\n<span class=\"nv\">$ </span>svn remove https://cdk.svn.sourceforge.net/svnroot/cdk/cdk/branches/egonw-mynewbranch\n</code></pre></div></div>\n\n<p>Enough for now.</p>",
      "summary": "Christoph pointed me to a video on Git by Linus. CDK is now using branches extensively in development, and just set up a branch for the upcoming 1.2.0 release later this year (end of October, see cdk-1.2.x). Christoph has just reviewed the branch containing the API move to Iterable. This patch now allows to do this (which would really deserve a blog item by itself):",
      
      "date_published": "2008-09-07T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk","svn"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s6bgg-e7a38",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/09/01/ubiquity-fun-entering-semantic-markup.html",
      "title": "Ubiquity fun: entering semantic markup as easy as running a Ubiquity command",
      "content_html": "<p>Now, the <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/09/01/ubiquity-fun-resolving-dois.html\">DOI ubiquity scripts <i class=\"fa-solid fa-recycle fa-xs\"></i></a> I just blogged about,\nwas just the beginning of things. Me exploring the environment and learning the JavaScript language.</p>\n\n<p>I start to become really interesting when we use these technologies to improve things. I am still not sure people will like the\ncommand line nature, but at least I will be a happy user. This is the setting: I’m blogging about some chemistry, like to add an\nInChI (or InChIKey) and add that <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">cool sechemtic markup <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\npeople have been blogging about, but I do not know (or want to know) the HTML details for that.</p>\n\n<p>Well, no worries, no more. Here comes <em>sechemtic-inchi</em> (<a href=\"http://blueobelisk.sourceforge.net/people/egonw/sechemtic-inchi.html\">installer here</a>)!</p>\n\n<h2 id=\"step-1\">Step 1</h2>\n\n<p>I type in the InChI I want in my blog (example showing that of <a href=\"http://en.wikipedia.org/wiki/Methane\">methane</a>):</p>\n\n<p><img src=\"/assets/images/ubiSechemticStep1.png\" alt=\"\" /></p>\n\n<p>And, I select the InChI:</p>\n\n<p><img src=\"/assets/images/ubiSechemticStep1.5.png\" alt=\"\" /></p>\n\n<p>After which I hit the Ubiquity shortcut (ALT-SPACE on Linux) and I type <em>sechemtic-inchi</em>:</p>\n\n<p><img src=\"/assets/images/ubiSechemticStep2.png\" alt=\"\" /></p>\n\n<p>And, viola, there is my RDFa HTML code for chemistry:</p>\n\n<p><img src=\"/assets/images/ubiSechemticStep3.png\" alt=\"\" /></p>\n\n<p>Now, with only minor amounts of fantasy, you can imagine where this is going: SMILES, InChiKey, etc, etc. 
Hook it up with\n<a href=\"http://rguha.wordpress.com/2008/08/30/ubiquity-and-chemical-information/\">chemistry webservices</a> to autoconvert\nSMILES to InChIKeys, and Bob’s your uncle.</p>",
      "summary": "Now, the DOI ubiquity scripts I just blogged about, was just the beginning of things. Me exploring the environment and learning the JavaScript language.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ubiSechemticStep1.5.png",
      "date_published": "2008-09-01T00:10:00+00:00",
      "date_modified": "2025-10-05T00:00:00+00:00",
      "tags": ["rdf","javascript","web","ubiquity"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gfjz7-ykc12",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/09/01/ubiquity-fun-resolving-dois.html",
      "title": "Ubiquity fun: resolving DOIs",
      "content_html": "<p>Now, I’m really after something else, but here’s my first <a href=\"https://wiki.mozilla.org/Labs/Ubiquity\">Ubiquity</a>\n<a href=\"http://blueobelisk.sourceforge.net/people/egonw/ubiquity.html\">scripts</a>. It allow you to select a DOI on any\nweb page (which really only makes sense if it is not already a hyperlink), you hit ALT-SPACE (Linux), CTRL-SPACE (Windows),\nor whatever the shortcut is on your operating system, and type <em>resolve-doi</em> and it will automatically convert the DOI\ninto a hyperlink to look up the paper.</p>\n\n<p>What I am actually interested in, is being able to use this command in a blog editing environment; however, I\nhave not managed to get that working in one command. And because I am apparently not able to put in two ubiquity\ncommands in blog items, you need to go to <a href=\"http://blueobelisk.sourceforge.net/people/egonw/ubiquity.html\">this page</a>.</p>\n\n<p>Second warning. I have only tried them with Ubiquity 0.1, not 0.1.1, or even later.</p>\n\n<p>For the curious, the script looks like:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">CmdUtils</span><span class=\"p\">.</span><span class=\"nc\">CreateCommand</span><span class=\"p\">({</span>\n  <span class=\"na\">name</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">resolve-doi</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">homepage</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">http://chem-bla-ics.blogspot.com/</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">author</span><span class=\"p\">:</span> <span class=\"p\">{</span> <span class=\"na\">name</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">Egon Willighagen</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span 
class=\"na\">email</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">egon.willighagen@gmail.com</span><span class=\"dl\">\"</span><span class=\"p\">},</span>\n  <span class=\"na\">description</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">Resolves a DOI into a URL</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">license</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">GPL</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">takes</span><span class=\"p\">:</span> <span class=\"p\">{</span><span class=\"dl\">\"</span><span class=\"s2\">doi</span><span class=\"dl\">\"</span><span class=\"p\">:</span> <span class=\"nx\">noun_arb_text</span><span class=\"p\">},</span>\n\n  <span class=\"na\">preview</span><span class=\"p\">:</span> <span class=\"kd\">function</span><span class=\"p\">(</span> <span class=\"nx\">pblock</span><span class=\"p\">,</span> <span class=\"nx\">doi</span> <span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"kd\">var</span> <span class=\"nx\">msg</span> <span class=\"o\">=</span> <span class=\"dl\">'</span><span class=\"s1\">Inserts a URL for the DOI: ${doi}</span><span class=\"dl\">'</span><span class=\"p\">;</span>\n    <span class=\"kd\">var</span> <span class=\"nx\">d</span> <span class=\"o\">=</span> <span class=\"nx\">doi</span><span class=\"p\">.</span><span class=\"nx\">text</span> <span class=\"o\">||</span> <span class=\"nx\">CmdUtils</span><span class=\"p\">.</span><span class=\"nf\">getSelection</span><span class=\"p\">();</span>\n    <span class=\"nx\">pblock</span><span class=\"p\">.</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span> <span class=\"nx\">CmdUtils</span><span class=\"p\">.</span><span class=\"nf\">renderTemplate</span><span class=\"p\">(</span><span class=\"nx\">msg</span><span class=\"p\">,</span> <span 
class=\"p\">{</span><span class=\"na\">doi</span><span class=\"p\">:</span> <span class=\"nx\">d</span><span class=\"p\">});</span>\n  <span class=\"p\">},</span>\n\n  <span class=\"na\">execute</span><span class=\"p\">:</span> <span class=\"kd\">function</span><span class=\"p\">(</span> <span class=\"nx\">doi</span> <span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"kd\">var</span> <span class=\"nx\">msg</span> <span class=\"o\">=</span> <span class=\"dl\">'</span><span class=\"s1\">&lt;a href=\"http://dx.doi.org/${doi}\"&gt;${doi}&lt;/a&gt;</span><span class=\"dl\">'</span><span class=\"p\">;</span>\n    <span class=\"kd\">var</span> <span class=\"nx\">d</span> <span class=\"o\">=</span> <span class=\"nx\">doi</span><span class=\"p\">.</span><span class=\"nx\">text</span> <span class=\"o\">||</span> <span class=\"nx\">CmdUtils</span><span class=\"p\">.</span><span class=\"nf\">getSelection</span><span class=\"p\">();</span>\n    <span class=\"kd\">var</span> <span class=\"nx\">newText</span> <span class=\"o\">=</span> <span class=\"nx\">CmdUtils</span><span class=\"p\">.</span><span class=\"nf\">renderTemplate</span><span class=\"p\">(</span><span class=\"nx\">msg</span><span class=\"p\">,</span> <span class=\"p\">{</span><span class=\"na\">doi</span><span class=\"p\">:</span> <span class=\"nx\">d</span><span class=\"p\">});</span>\n    <span class=\"nx\">CmdUtils</span><span class=\"p\">.</span><span class=\"nf\">setSelection</span><span class=\"p\">(</span><span class=\"nx\">newText</span><span class=\"p\">);</span>\n  <span class=\"p\">}</span>\n<span class=\"p\">})</span>\n</code></pre></div></div>\n\n<p>Comments on this code most welcome! It’s <a href=\"http://www.gnu.org/copyleft/gpl.html\">GPL</a>. 
Details can be found in\n<a href=\"https://wiki.mozilla.org/Labs/Ubiquity/Ubiquity_0.1_Author_Tutorial\">this tutorial</a> and examples in\n<a href=\"http://rguha.wordpress.com/2008/08/30/ubiquity-and-chemical-information/\">Rajarshi’s blog</a>.</p>",
      "summary": "Now, I’m really after something else, but here’s my first Ubiquity scripts. It allow you to select a DOI on any web page (which really only makes sense if it is not already a hyperlink), you hit ALT-SPACE (Linux), CTRL-SPACE (Windows), or whatever the shortcut is on your operating system, and type resolve-doi and it will automatically convert the DOI into a hyperlink to look up the paper.",
      
      "date_published": "2008-09-01T00:00:00+00:00",
      "date_modified": "2008-09-01T00:00:00+00:00",
      "tags": ["javascript","web","doi","ubiquity"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/32fcj-fd465",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/31/creating-cmlreact-from-usefulchem-ugi.html",
      "title": "Creating CMLReact from UsefulChem Ugi Reactions",
      "content_html": "<p><a href=\"http://blog.openwetware.org/scienceintheopen/\">Cameron</a>, <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a> and I were invited to\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a>’s place in Cambridge, where we are now hacking on CMLReact for the\n<a href=\"http://usefulchem.wikispaces.com/exp023\">Ugi reactions</a> Jean-Claude has been working on. I just finished a script that uses the\nCDK and Sam’s interface to the <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/inchi/package-frame.html\">InChI library</a>\nto convert a list of four reactants and one Ugi product into CMLReact (doi:<a href=\"http://dx.doi.org/10.1021/ci0502698\">10.1021/ci0502698</a>).\nThe full <a href=\"https://en.wikipedia.org/wiki/BeanShell\">BeanShell</a> script looks like:</p>\n\n<pre><code class=\"language-beanshell\">#!/usr/bin/bsh\n\nimport java.io.File;\nimport java.io.FileReader;\nimport java.io.BufferedReader;\n\nimport org.openscience.cdk.*;\nimport org.openscience.cdk.exception.*;\nimport org.openscience.cdk.inchi.*;\nimport org.openscience.cdk.interfaces.*;\nimport org.openscience.cdk.io.CMLWriter;\n\nimport org.openscience.cdk.libio.cml.Convertor;\nimport org.xmlcml.cml.element.CMLReaction;\n\nimport net.sf.jniinchi.INCHI_RET;\n\nInChIGeneratorFactory factory = new InChIGeneratorFactory();\n// Get InChIToStructure\n\nFile file = new File(\"inchi.ugi.txt\"); // five inchis expected, last being the product\nBufferedReader reader = new BufferedReader(new FileReader(file));\n\nString first = reader.readLine();\nString second = reader.readLine();\nString third = reader.readLine();\nString fourth = reader.readLine();\nString product = reader.readLine();\n\nSystem.out.println(\"First: \" + first);\nIMolecule firstAC;\n{\n  InChIToStructure intostruct = factory.getInChIToStructure(first, DefaultChemObjectBuilder.getInstance());\n\n  INCHI_RET ret = 
intostruct.getReturnStatus();\n  if (ret == INCHI_RET.WARNING) {\n    // Structure generated, but with warning message\n    System.out.println(\"InChI warning: \" + intostruct.getMessage());\n  } else if (ret != INCHI_RET.OKAY) {\n    // Structure generation failed\n    throw new CDKException(\"Structure generation failed: \" + ret.toString()\n      + \" [\" + intostruct.getMessage() + \"]\");\n  }\n\n  firstAC = new Molecule(intostruct.getAtomContainer());\n}\n\nSystem.out.println(\"Second: \" + second);\nIMolecule secondAC;\n{\n  InChIToStructure intostruct = factory.getInChIToStructure(second, DefaultChemObjectBuilder.getInstance());\n\n  INCHI_RET ret = intostruct.getReturnStatus();\n  if (ret == INCHI_RET.WARNING) {\n    // Structure generated, but with warning message\n    System.out.println(\"InChI warning: \" + intostruct.getMessage());\n  } else if (ret != INCHI_RET.OKAY) {\n    // Structure generation failed\n    throw new CDKException(\"Structure generation failed: \" + ret.toString()\n      + \" [\" + intostruct.getMessage() + \"]\");\n  }\n\n  secondAC = new Molecule(intostruct.getAtomContainer());\n}\n\nSystem.out.println(\"Third: \" + third);\nIMolecule thirdAC;\n{\n  InChIToStructure intostruct = factory.getInChIToStructure(third, DefaultChemObjectBuilder.getInstance());\n\n  INCHI_RET ret = intostruct.getReturnStatus();\n  if (ret == INCHI_RET.WARNING) {\n    // Structure generated, but with warning message\n    System.out.println(\"InChI warning: \" + intostruct.getMessage());\n  } else if (ret != INCHI_RET.OKAY) {\n    // Structure generation failed\n    throw new CDKException(\"Structure generation failed: \" + ret.toString()\n      + \" [\" + intostruct.getMessage() + \"]\");\n  }\n\n  thirdAC = new Molecule(intostruct.getAtomContainer());\n}\n\nSystem.out.println(\"Fourth: \" + fourth);\nIMolecule fourthAC;\n{\n  InChIToStructure intostruct = factory.getInChIToStructure(fourth, DefaultChemObjectBuilder.getInstance());\n\n  
INCHI_RET ret = intostruct.getReturnStatus();\n  if (ret == INCHI_RET.WARNING) {\n    // Structure generated, but with warning message\n    System.out.println(\"InChI warning: \" + intostruct.getMessage());\n  } else if (ret != INCHI_RET.OKAY) {\n    // Structure generation failed\n    throw new CDKException(\"Structure generation failed: \" + ret.toString()\n      + \" [\" + intostruct.getMessage() + \"]\");\n  }\n\n  fourthAC = new Molecule(intostruct.getAtomContainer());\n}\n\nSystem.out.println(\"Product: \" + product);\nIMolecule productAC;\n{\n  InChIToStructure intostruct = factory.getInChIToStructure(product, DefaultChemObjectBuilder.getInstance());\n\n  INCHI_RET ret = intostruct.getReturnStatus();\n  if (ret == INCHI_RET.WARNING) {\n    // Structure generated, but with warning message\n    System.out.println(\"InChI warning: \" + intostruct.getMessage());\n  } else if (ret != INCHI_RET.OKAY) {\n    // Structure generation failed\n    throw new CDKException(\"Structure generation failed: \" + ret.toString()\n      + \" [\" + intostruct.getMessage() + \"]\");\n  }\n\n  productAC = new Molecule(intostruct.getAtomContainer());\n}\n\nIReaction ugiReaction = new Reaction();\nugiReaction.addReactant(firstAC);\nugiReaction.addReactant(secondAC);\nugiReaction.addReactant(thirdAC);\nugiReaction.addReactant(fourthAC);\nugiReaction.addProduct(productAC);\n\nStringWriter stringWriter = new StringWriter();\nCMLWriter cmlWriter = new CMLWriter(stringWriter);\n\ncmlWriter.write(ugiReaction);\ncmlWriter.close();\nSystem.out.println(stringWriter.toString());\n</code></pre>\n\n<p>My apologies for the code duplication, but I have not tried inline functions in BeanShell yet… You can\nmonitor the efforts at <a href=\"http://docs.google.com/Doc?id=dq5m5bs_12hb8d2wcw\">Google Docs</a>.</p>",
      "summary": "Cameron, Jean-Claude and I were invited to Peter’s place in Cambridge, where we are now hacking on CMLReact for the Ugi reactions Jean-Claude has been working on. I just finished a script that uses the CDK and Sam’s interface to the InChI library to convert a list of four reactants and one Ugi product into CMLReact (doi:10.1021/ci0502698). The full BeanShell script looks like:",
      
      "date_published": "2008-08-31T00:20:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cml","cdk","inchi","usefulchem"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci0502698", "doi": "10.1021/ci0502698"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b8qbg-ss716",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/31/ugichem2cml.html",
      "title": "UgiChem2CML",
      "content_html": "<p>The nice thing about a hacksession, is that you have something to write about. Below a screenshot of a\n<a href=\"https://en.wikipedia.org/wiki/Ugi_reaction\">Ugi reaction</a> in <a href=\"http://www.bioclipse.net/\">Bioclipse</a>…\nnote the <em>source</em> tab of the editor, which holds the CML. Now, JChemPaint can do reactions too (I did that in 2003\nin Peter’s group, but seems to be offline at this moment), but this was the quick hack to do the CMLReact in\n<a href=\"http://docs.google.com/Doc?id=dq5m5bs_12hb8d2wcw\">Google Docs</a> (or soon to be):</p>\n\n<p><img src=\"/assets/images/ugiBioclipse.png\" alt=\"\" /></p>\n\n<p>And this is us this afternoon:</p>\n\n<p><img src=\"/assets/images/CIMG0503_s.JPG\" alt=\"\" /></p>",
      "summary": "The nice thing about a hacksession, is that you have something to write about. Below a screenshot of a Ugi reaction in Bioclipse… note the source tab of the editor, which holds the CML. Now, JChemPaint can do reactions too (I did that in 2003 in Peter’s group, but seems to be offline at this moment), but this was the quick hack to do the CMLReact in Google Docs (or soon to be):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/CIMG0503_s.JPG",
      "date_published": "2008-08-31T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["bioclipse","cml","ugi","usefulchem"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1kj8h-r5g51",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/31/science-blogging-2008-london-was-cool.html",
      "title": "Science Blogging 2008 London was Cool!",
      "content_html": "<p>Definately not a <em>first post</em>, but here are my experiences of my <a href=\"http://network.nature.com/group/sciblog2008\">first blogging conference</a>\n(see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/08/29/leaving-to-science-blogging-2008-london.html\">this <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/07/10/going-to-science-blogging-2008-london.html\">this <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe latter using semantic markup for the event): it was fun! My suggested unconference was not chosen, because I, as I usually do,\nfocus to much on how instead of why one wants to do something. Nevertheless, I got to say my things, so I won’t complain. While I\nhave not noted a vivid live coverage in blogosphere of the conference, several people were\n<a href=\"https://web.archive.org/web/20080910235107/https://friendfeed.com/rooms/science-blogging-2008\">live covering the meeting <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\non <a href=\"https://web.archive.org/web/20080828171106/http://friendfeed.com/\">FriendFeed <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>. Really nice, because you can comment on statements the speaker makes, while he is talking.\nPeople have been using the <em>sciblog</em> tag, which should give you enough hits in the various aggregators and social sites.</p>\n\n<p>The main thing I liked about this conference was the chance to meet fellow bloggers. I am not so much interested in why others blog,\nand generally not reading blogs about the scientific life. I have written up in the past why I blog, so read\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/11/why-do-i-blog.html\">that <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nWhat does interest me is how we can enhance blogs to make them\neasier to aggregate, search through, retrieve data, etc, etc. 
What I’d like to be able to do is read a blog item, note that it is\nabout a topic I like, go off into <a href=\"http://taverna.sf.net/\">Taverna</a> or <a href=\"http://www.bioclipse.net/\">Bioclipse</a> (possibly via\n<a href=\"http://rguha.wordpress.com/2008/08/30/ubiquity-and-chemical-information/\">Ubiquity</a>), and hit the <em>get me that data blob</em> button.\nNow, I don’t mind it being hidden behind a paper, being on Google Data, or whatever, I just want to simply hit that button.</p>\n\n<p>Returning readers of this blog know that semantic chemistry is something I have worked on in the past, but while\n<a href=\"https://web.archive.org/web/20080723211045/http://cb.openmolecules.net/\">Chemical blogspace <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> has a nice\n<a href=\"https://web.archive.org/web/20080723221106/http://cb.openmolecules.net/inchis.php\">people-blogged-about-this-molecule <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nsection, it has not really picked up. The main reason is that people cannot or do not want to add semantic markup. Now, the one thing\nI liked most about the conference discussions yesterday (the pub was too noisy for me to reasonably chat with anyone) was the proposal\nto use Ubiquity for adding these semantics. So, commands like <code class=\"language-plaintext highlighter-rouge\">addSechemticMarkup</code>, <code class=\"language-plaintext highlighter-rouge\">convertSMILESIntoInChIKey</code>, that sort of\nthing… The cool thing here is that it is blogging service independent. It works for anything inside Firefox, including\nwikis, email, knols, whatever. Now, one obstacle is that Ubiquity involves a command line; and we know how much people dislike\ncommand lines, but I’m sure they will come up with Guiquity. Actually, maybe this is the\n<a href=\"http://www.kaply.com/weblog/2008/04/29/update-on-activities-microformats-and-operator/\">activities</a> that Mike has been talking about…</p>",
      "summary": "Definately not a first post, but here are my experiences of my first blogging conference (see also this and this , the latter using semantic markup for the event): it was fun! My suggested unconference was not chosen, because I, as I usually do, focus to much on how instead of why one wants to do something. Nevertheless, I got to say my things, so I won’t complain. While I have not noted a vivid live coverage in blogosphere of the conference, several people were live covering the meeting on FriendFeed . Really nice, because you can comment on statements the speaker makes, while he is talking. People have been using the sciblog tag, which should give you enough hits in the various aggregators and social sites.",
      
      "date_published": "2008-08-31T00:00:00+00:00",
      "date_modified": "2025-08-31T00:00:00+00:00",
      "tags": ["blog","science","friendfeed"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ztzj3-29n36",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/29/leaving-to-science-blogging-2008-london.html",
      "title": "Leaving to Science Blogging 2008 London",
      "content_html": "<p>Have to leave to the airport any second now for the <a href=\"https://web.archive.org/web/20080508230626/http://network.nature.com/group/sciblog2008\">Science Blogging 2008 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nin London, so nothing much I shall say. Hope to see you tomorrow at the Royal Institute!</p>\n\n<p>Update: <a href=\"https://web.archive.org/web/20080901224608/http://friendfeed.com/rooms/science-blogging-2008\">live coverage <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nat <a href=\"https://web.archive.org/web/20080902020422/http://friendfeed.com/\">Friend Feed <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.</p>",
      "summary": "Have to leave to the airport any second now for the Science Blogging 2008 in London, so nothing much I shall say. Hope to see you tomorrow at the Royal Institute!",
      
      "date_published": "2008-08-29T00:00:00+00:00",
      "date_modified": "2025-08-21T00:00:00+00:00",
      "tags": ["blog","science"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9se4x-ewm11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/26/metware-screenshot-propagating-xml.html",
      "title": "MetWare screenshot: propagating XML Schema data types",
      "content_html": "<p>Just a quick screenshot. Remember our use of <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/05/16/metware-status-report.html\">SKOS in MetWare <i class=\"fa-solid fa-recycle fa-xs\"></i></a>?\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/08/26/metware-screenshot-propagating-xml.html\">Steffen <i class=\"fa-solid fa-recycle fa-xs\"></i></a> has been working on creating integrated\n<a href=\"http://en.wikipedia.org/wiki/JavaServer_Faces\">JSF pages</a>, while I am focusing on autogeneration of blobs. The below screenshot is\nsuch a blob, called a UI component in JSF, which allows easy embedding the the aggregations Steffen is working on.</p>\n\n<p>Autogeneration of web content benefits greatly from well defined input, including data types.\n<a href=\"http://www.metware.org/\">MetWare</a> uses <a href=\"http://en.wikipedia.org/wiki/XML_Schema_(W3C)#Data_Types\">XML Schema Data Types</a>\nfor this, as <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/08/21/metware-screenshot-spectrum-support-2.html\">mentioned ealier <i class=\"fa-solid fa-recycle fa-xs\"></i></a> when I briefly\nmentioned generation of search pages. That example showed the creation of range input on <code class=\"language-plaintext highlighter-rouge\">xsd:integer</code> types. 
The below screenshot\nshows the different output for <code class=\"language-plaintext highlighter-rouge\">xsd:string</code> (input text box) and <code class=\"language-plaintext highlighter-rouge\">xsd:boolean</code>:</p>\n\n<p><img src=\"/assets/images/boolDataType.png\" alt=\"\" /></p>\n\n<p>Now, this example is not really shocking, but MetWare defines additional types, for example an InChI data type:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;simpleType</span> <span class=\"na\">name=</span><span class=\"s\">\"inchi\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;restriction</span> <span class=\"na\">base=</span><span class=\"s\">\"string\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;pattern</span> <span class=\"na\">value=</span><span class=\"s\">\"InChI=1/.*\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;/restriction&gt;</span>\n<span class=\"nt\">&lt;/simpleType&gt;</span>\n</code></pre></div></div>\n\n<p>This allows me to tweak the HTML output created by the JSF pages to include <a href=\"http://microformats.org/\">microformats</a> to support\nthe <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/05/05/cb-comments-for-inchis.html\">Sechemtic <i class=\"fa-solid fa-recycle fa-xs\"></i></a> userscript (see also\ndoi:<a href=\"https://doi.org/10.1186/1471-2105-8-487\">10.1186/1471-2105-8-487</a>).</p>\n\n<p>Or, to provide a drop down box, listing the allowed values:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;simpleType</span> <span class=\"na\">name=</span><span class=\"s\">\"deviceVendor\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;restriction</span> <span class=\"na\">base=</span><span class=\"s\">\"string\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;enumeration</span> <span 
class=\"na\">value=</span><span class=\"s\">\"BioCrates\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;enumeration</span> <span class=\"na\">value=</span><span class=\"s\">\"Bruecker\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;/restriction&gt;</span>\n<span class=\"nt\">&lt;/simpleType&gt;</span>\n</code></pre></div></div>",
      "summary": "Just a quick screenshot. Remember our use of SKOS in MetWare ? Steffen has been working on creating integrated JSF pages, while I am focusing on autogeneration of blobs. The below screenshot is such a blob, called a UI component in JSF, which allows easy embedding the the aggregations Steffen is working on.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/boolDataType.png",
      "date_published": "2008-08-26T00:00:00+00:00",
      "date_modified": "2025-08-26T00:00:00+00:00",
      "tags": ["metware","xml","skos"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-487", "doi": "10.1186/1471-2105-8-487"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5gzjj-p9s54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/21/metware-screenshot-spectrum-support-2.html",
      "title": "MetWare screenshot: spectrum support #2",
      "content_html": "<p>As <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/08/20/metware-screenshot-spectrum-support.html\">promised yesterday <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, here’s the pretty\nvisualization of the mass spectrum, using JavaScript from the <a href=\"http://www.ebi.ac.uk/pride/#soft\">PRIDE project</a>:</p>\n\n<p><img src=\"/assets/images/msGUI.png\" alt=\"\" /></p>\n\n<p>Note the manual adding of peaks at 10 and 100 m/z to get the real peaks somewhere in the middle instead of on the left and right border of the graph.</p>\n\n<p>Meanwhile, the search page is now autogenerated too, and the types of searches allowed (min, max in the picture) again depends on the\nXML Scheme data type defined in the <a href=\"http://metware.svn.sourceforge.net/viewvc/metware/BigMet/trunk/src/main/onto/metware.skos?content-type=text%2Fxml&amp;revision=HEAD\">MetWare SKOS</a>:</p>\n\n<p><img src=\"/assets/images/mwSearch.png\" alt=\"\" /></p>",
      "summary": "As promised yesterday , here’s the pretty visualization of the mass spectrum, using JavaScript from the PRIDE project:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/msGUI.png",
      "date_published": "2008-08-21T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["metware","metabolomics","xml","skos"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9m75g-4rh11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/20/metware-screenshot-spectrum-support.html",
      "title": "MetWare screenshot: spectrum support",
      "content_html": "<p>Not visually attractive, but that will be solved when Steffen gets his hands on it. For now, I’m happy with a table formatting.\nReason: it uses XML Schema to define a dataType, which is recognized by our code generators in <a href=\"http://www.metware.org/\">MetWare</a>\n(see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/05/16/metware-status-report.html\">this presentation <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), and used to create a easy to use Java API, which, in turn, can be used in this JSF snippet:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;h:dataTable</span> <span class=\"na\">value=</span><span class=\"s\">\"#{metobservCharacterizationMassspectrum.spectralPoints.points}\"</span> <span class=\"na\">var=</span><span class=\"s\">\"specpoint\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;h:column&gt;</span>\n    <span class=\"nt\">&lt;f:facet</span> <span class=\"na\">name=</span><span class=\"s\">\"header\"</span><span class=\"nt\">&gt;&lt;h:outputText</span> <span class=\"na\">value=</span><span class=\"s\">\"m/z's\"</span><span class=\"nt\">/&gt;&lt;/f:facet&gt;</span>\n    <span class=\"nt\">&lt;h:outputText</span> <span class=\"na\">value=</span><span class=\"s\">\"#{specpoint.mz}\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;/h:column&gt;</span>\n  <span class=\"nt\">&lt;h:column&gt;</span>\n    <span class=\"nt\">&lt;f:facet</span> <span class=\"na\">name=</span><span class=\"s\">\"header\"</span><span class=\"nt\">&gt;&lt;h:outputText</span> <span class=\"na\">value=</span><span class=\"s\">\"Intensities\"</span><span class=\"nt\">/&gt;&lt;/f:facet&gt;</span>\n    <span class=\"nt\">&lt;h:outputText</span> <span class=\"na\">value=</span><span class=\"s\">\"#{specpoint.intensity}\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;/h:column&gt;</span>\n<span 
class=\"nt\">&lt;/h:dataTable&gt;</span>\n</code></pre></div></div>\n\n<p>The <code class=\"language-plaintext highlighter-rouge\">&lt;dataTable&gt; @value</code> points (via the <code class=\"language-plaintext highlighter-rouge\">faces-config.xml</code>) to the <code class=\"language-plaintext highlighter-rouge\">MetobservCharacterizationMassspectrumBean</code>, which has a\n<code class=\"language-plaintext highlighter-rouge\">getSpectralPoints()</code> method (autocreated from the <code class=\"language-plaintext highlighter-rouge\">&lt;skos:Concept&gt;</code> <code class=\"language-plaintext highlighter-rouge\">SpectralPoints</code>, which has a convenience method\n<code class=\"language-plaintext highlighter-rouge\">List&lt;SpectralPoint&gt; getPoints()</code>).</p>\n\n<p><code class=\"language-plaintext highlighter-rouge\">SpectralPoint</code> in turn has the methods <code class=\"language-plaintext highlighter-rouge\">getIntensity()</code> and <code class=\"language-plaintext highlighter-rouge\">getMz()</code>, also used in the above JSF snippet. For convenience,\n<code class=\"language-plaintext highlighter-rouge\">SpectralPointArray</code> also has two other methods: <code class=\"language-plaintext highlighter-rouge\">double[] getIntensities()</code> and <code class=\"language-plaintext highlighter-rouge\">double[] getMzs()</code> (which I’ll have to\nrename to reuse the code for NMR support :).</p>\n\n<p>So, here’s the outcome:</p>\n\n<p><img src=\"/assets/images/msTable.png\" alt=\"\" /></p>\n\n<p>Final note: given the dataType, the MetWare bean also has the logic to convert the data back and forth into an SQL serialization,\nwhich may eventually use base64 encoding, but currently looks like <em>61.0,100.0;62.0,1.1</em>, as defined by the regular expression of\nthe XSD dataType for spectralPointArray.</p>",
      "summary": "Not visually attractive, but that will be solved when Steffen gets his hands on it. For now, I’m happy with a table formatting. Reason: it uses XML Schema to define a dataType, which is recognized by our code generators in MetWare (see also this presentation ), and used to create a easy to use Java API, which, in turn, can be used in this JSF snippet:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/msTable.png",
      "date_published": "2008-08-20T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["metware","xml","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9xesf-et382",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/14/profiling-cdk-atom-typer.html",
      "title": "Profiling the CDK atom typer",
      "content_html": "<p>I was doing some profiling (<a href=\"http://yourkit.com/\">YourKit</a> and Eclipse3.4) of the <a href=\"http://cdk.sf.net/\">CDK</a> atom typer, and it turns out\nthat most time is spend on the perception of nitrogen atom types, which seems to be caused by the loadClassInternal() method of the JVM\n(<em>java-1.5.0-sun-1.5.0.16</em> on Ubuntu Hardy):</p>\n\n<p><img src=\"/assets/images/loadClassInternal.png\" alt=\"\" /></p>",
      "summary": "I was doing some profiling (YourKit and Eclipse3.4) of the CDK atom typer, and it turns out that most time is spend on the perception of nitrogen atom types, which seems to be caused by the loadClassInternal() method of the JVM (java-1.5.0-sun-1.5.0.16 on Ubuntu Hardy):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/loadClassInternal.png",
      "date_published": "2008-08-14T00:00:00+00:00",
      "date_modified": "2008-08-14T00:00:00+00:00",
      "tags": ["cdk","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/na9qg-p4f95",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/06/scientific-progress-is-primary-human.html",
      "title": "Scientific progress is a primary human need",
      "content_html": "<p><a href=\"http://mndoci.com/blog\">Deepak</a> <a href=\"http://friendfeed.com/e/bb0667e2-7fbb-f3f5-3b15-696e9af8a492/Is-your-web-service-open-source/\">asked me</a> to comment\non his blog post <a href=\"http://mndoci.com/blog/2008/08/05/is-your-web-service-open-source/\">Is your web service open source?</a>. With a slight delay,\nI did on <a href=\"http://friendfeed.com/e/bb0667e2-7fbb-f3f5-3b15-696e9af8a492/Is-your-web-service-open-source/\">FriendFeed</a>. I’ll copy it here.</p>\n\n<p>The question is about getting return-on-investment: if I developed a new algorithm (or new efficient implementation), how can I make\nsome money with that, to feed me, continue development, maybe just maintenance. And, how does that work for scientific software, which\ncan best be opensource? So I replied:</p>\n\n<blockquote>\n  <p>Deepak, did not have time to read it earlier. I have not worked out monetizing open source chemoinformatics. As a scientist,\nI take the position that any implementation must be open source; that’s mere consequence of the scientific requirements for\npeer review and reproducibility. I do understand that further research has to be funded; by making code proprietary, the guy\ndoing the further research is the original author. That’s not necessarily the right thing for scientific progress.</p>\n\n  <p>As a human being, I need feeding. So, I certainly understand making code proprietary, as I have not seen much success in funding ROI\nvia support, though I do think this is the way to go, for scientific software that is. Web services are clear services, sort of\nconsultancy with human involvement. And consultancy is proven technology. Sell access to your service. Anyone can theoretically set\nit up, but practically… so you basically sell your IT expertise.</p>\n\n  <p>A third aspect is user friendly GUIs. Say, ChemOffice, say Bioclipse. These are also scientifically not interesting to develop. 
Bioclipse,\nbeing open source, is an interesting example. The core is open, free, any one can contribute, <em>and</em> embed that cool new algorithm easily.\nThis ‘plugin’ can be proprietary and sold commercially. No scientific shame, but with a chance for getting some ROI.</p>\n\n  <p>Science should be open, and never be a source of capitalism. I am not against capitalism, but I find it rather unethical to say, sure,\nyour starving (dying from AIDS, whatever) guy, surely I can help; it will just cost ya. Making money because people like buying big cars,\nPokemon cards, want their DNA sequenced, sure, no problem. But don’t start making money from primary needs. Scientific progress is a\nprimary need.</p>\n</blockquote>\n\n<p>Some more details on my background on these issues can be found in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/12/13/i-dont-blame-individuals-in-commercial.html\">I don’t blame Individuals in Commercial Chemoinformatics <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/14/why-odosos-is-important.html\">Why ODOSOS is important <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "Deepak asked me to comment on his blog post Is your web service open source?. With a slight delay, I did on FriendFeed. I’ll copy it here.",
      
      "date_published": "2008-08-06T00:00:00+00:00",
      "date_modified": "2025-08-26T00:00:00+00:00",
      "tags": ["cheminf","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zsyxz-jvy06",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/06/mapping-peoples-interest-google-insight.html",
      "title": "Mapping Peoples Interest: Google Insight Search",
      "content_html": "<p><a href=\"https://google.com/\">Google</a> has a new service: <a href=\"http://www.google.com/insights/search/\">Google Insight Search</a>, and I was wondering if\nit could tell me to use <em>chemoinformatics or cheminformatics</em>… No, it can’t. In both there is a declining interest (only chemoinformatics\nshown):</p>\n\n<p><img src=\"/assets/images/chemoinf.trend.png\" alt=\"\" /></p>\n\n<p>More interesting is that the interest in chemoinformatics only comes from India:</p>\n\n<p><img src=\"/assets/images/chemoinf.map.png\" alt=\"\" /></p>\n\n<p>This tool holds for both flavors too.</p>",
      "summary": "Google has a new service: Google Insight Search, and I was wondering if it could tell me to use chemoinformatics or cheminformatics… No, it can’t. In both there is a declining interest (only chemoinformatics shown):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemoinf.map.png",
      "date_published": "2008-08-06T00:00:00+00:00",
      "date_modified": "2008-08-06T00:00:00+00:00",
      "tags": ["google","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j4d10-0jt05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/03/end-of-theory-data-deluge-makes.html",
      "title": "&quot;The End of Theory: The Data Deluge Makes the Scientific Method Obsolete&quot;",
      "content_html": "<p>The thought triggering editorial <a href=\"http://www.wired.com/science/discoveries/magazine/16-07/pb_theory\">“The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”</a>\nby <a href=\"http://www.wired.com/services/feedback/letterstoeditor\">Chris Anderson</a> can’t have escaped your attention. I was shocked when I read the title\nand the comments made on the blogosphere and on <a href=\"http://friendfeed.com/\">FriendFeed</a>.</p>\n\n<p>How can he say that?! There is no analysis of data anymore?!? Don’t we need to understand why X correlated with Y?!? Etc etc.</p>\n\n<p>So, when I read <a href=\"http://miningdrugs.blogspot.com/2008/08/data-models-or-both.html\">yet another comment</a>, by my respected opensource\nchemoinformatician <a href=\"http://miningdrugs.blogspot.com/\">Joerg</a>, I just had to read the piece myself. Joerg disagrees with the statement\nfrom Chris’ editorial that</p>\n\n<blockquote>\n  <p>[c]orrelation supersedes causation, and science can advance even without coherent models, unified theories, or really any\nmechanistic explanation at all.</p>\n</blockquote>\n\n<p>At first, I would agree with Joerg. It’s nonsense; any QSAR modeler can explain in details the dangers of overfitting, extrapolation,\netc, etc. Not to mention that basically zero mathematical modeling methods can create a statistical signification non-zero regression model with less than 50-100 chemical structures (chemical diversity dependent, etc).</p>\n\n<p>Ok, back to the editorial. There are some arguments on Google, tons of data. Number of incoming links as measure of page importance (brilliant choice, but actually a model, IMHO, which Chris seems to step over). Tons of data. Oh, mentioned that already.</p>\n\n<p>Mmmmm… but wait. Tons of data? The editorial actually refers to <a href=\"http://en.wikipedia.org/wiki/Petabyte\">petabytes</a>:\n<em>Petabytes are stored in the cloud</em>. 
(Whatever the cloud is… just another buzzword,\n<a href=\"http://news.slashdot.org/article.pl?sid=08/08/02/2224217&amp;from=rss\">trademarked too</a>, it seems).</p>\n\n<h2 id=\"eureka-chris-is-right-joerg-is-wrong\">Eureka! Chris is right, Joerg is wrong!</h2>\n\n<p>Yes! Then it hit me, Chris is actually correct in his statement, and I was wrong (and Joerg too). If we move away from 50-100 molecules\nin our QSAR training, but use 10k of chemically alike molecules, then our modeling approaches (if capable of handling the matrices)\nwould have a much, much smaller chance for overfitting, extrapolation (there is much, much more interpolation now), etc. The chances\nof getting random correlation become insignificant! Actually, Chris is making the argument QSAR modelists have been making for decades:\nwe do not know the mode of action in detail, but we can make, given enough training data, a reasonable regression model to predict the\naction! Joerg and I have been making the same argument as Chris in our PhD theses! We do not need theory; our QSAR regressions make\ntheory obsolete! (Well, surely, we’d still prefer the theory behind the action, but we lack the measuring techniques to see what\nactually is happening. Joerg, still agreeing with you, so to say ;)</p>\n\n<p>Except for one thing. Joerg and I suggested ‘enough’ molecules are required for statistically sound regression. Chris, on the other hand,\neven makes the point that regression is no longer needed at all at the petabyte scale: we just look up what is happening. Does this hold\nfor chemistry? For QSAR? A petabyte of data equals, at say 10kB of data per structure (maybe less if we use InChI and neglect conformer info), about\n100.000.000.000 structures. About 5000 times <a href=\"http://chemspider.com/\">ChemSpider</a>, if not miscounting the zeros (we don’t care about a\nten-fold at this scale anymore). Maybe, maybe not. 
Maybe chemical space is too diverse for that, considering a petabyte of chemical\nstructures is enormously insignificant to the full druggable space (was about 10⁶⁰, not?)</p>\n\n<p>But not at all? This lookup approach is actually commonly used in chemoinformatics! Even at a way-below-petabyte scale:\nHOSE-code-based NMR prediction is a nice example of this! We do not theorize on the chemical carbon NMR shift, we just look it up!</p>\n\n<p>Certainly worth reading, this <a href=\"http://www.wired.com/\">Wired</a> editorial!</p>\n\n<p>PS. One last remark on the title… I’d say the <em>scientific method</em> is more than just making theories… I feel a bit left\nout as data analyst… :( I guess the title should have said ‘one of the Scientific Methods’…</p>",
      "summary": "The thought triggering editorial “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” by Chris Anderson can’t have escaped your attention. I was shocked when I read the title and the comments made on the blogosphere and on FriendFeed.",
      
      "date_published": "2008-08-03T00:00:00+00:00",
      "date_modified": "2008-08-03T00:00:00+00:00",
      "tags": ["cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/d2nsd-pf553",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/08/01/online-multiplayer-metabolomics-game.html",
      "title": "Online, multiplayer metabolomics game!",
      "content_html": "<p>I was just organizing my <a href=\"http://delicious.com/egonw/toread\">toread</a>s, when I found this link: <a href=\"http://www.metabolaspel.nl/\">metabolaspel.nl</a>,\nan online, multiplayer metabolomics game! It’s in Dutch, but I guess anyone will get the idea :) Two teams, each may have two players, fight\neach other in sugar-fat conversion, by tuning the metabolism parameters:</p>\n\n<p><img src=\"/assets/images/mbgame.png\" alt=\"\" /></p>\n\n<p>The game board should look familiar:</p>\n\n<p><img src=\"/assets/images/mbgame2.png\" alt=\"\" /></p>\n\n<p>I finally found a worthy follow up for <a href=\"http://en.wikipedia.org/wiki/Civilization_(computer_game)\">Civilization</a> :)</p>",
      "summary": "I was just organizing my toreads, when I found this link: metabolaspel.nl, an online, multiplayer metabolomics game! It’s in Dutch, but I guess anyone will get the idea :) Two teams, each may have two players, fight each other in sugar-fat conversion, by tuning the metabolism parameters:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/mbgame2.png",
      "date_published": "2008-08-01T00:00:00+00:00",
      "date_modified": "2008-08-01T00:00:00+00:00",
      "tags": ["metabolomics"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bsv98-6bz80",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/26/cdk-literature-5.html",
      "title": "CDK Literature #5",
      "content_html": "<p>Time flies. Another CDK Literature (see also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html\">#1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/14/cdk-literature-2.html\">#2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/01/03/cdk-literature-3.html\">#3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/01/06/cdk-literature-4.html\">#4 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nQuite a few papers have been published again, and I’ll briefly discuss a few of them.</p>\n\n<h2 id=\"detection-of-iupac-names\">Detection of IUPAC names</h2>\n<p>Klinger et al. have written a paper on detection of IUPAC names. As long as semantic markup languages are not the default,\nthis remains important. Remaining problems include correctly finding boundaries in summaries of chemical. The\n<a href=\"http://cdk.sf.net/\">CDK</a> has been used to create <a href=\"http://www.opensmiles.org/\">SMILES</a>.\n<em>Roman Klinger, Corinna Kolárik, Juliane Fluck, Martin Hofmann-Apitius, Christoph M. Friedrich, Detection of IUPAC and\nIUPAC-like chemical names, Bioinformatics 2008 24(13):i268-i276; doi:<a href=\"https://doi.org/10.1093/bioinformatics/btn181\">10.1093/bioinformatics/btn181</a></em></p>\n\n<h2 id=\"structure-elucidation\">Structure elucidation</h2>\n<p>Elyashberg, <a href=\"http://www.chemspider.com/blog/\">Williams</a> and Martin wrote a review on structure elucidation and discuss\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Steinbeck</a>’s Seneca software, which uses components of the CDK, though\nthe CDK is not directly mentioned.\n<em>M.E. Elyashberg, A.J. Williams, G.E. 
Martin, Computer-assisted structure verification and elucidation tools in NMR-based\nstructure elucidation, Progress in Nuclear Magnetic Resonance Spectroscopy, 2008, 53(1-2):1-104,\ndoi:<a href=\"https://doi.org/10.1016/j.pnmrs.2007.04.003\">10.1016/j.pnmrs.2007.04.003</a></em></p>\n\n<h2 id=\"opensource-distributed-chemical-computing\">Opensource Distributed Chemical Computing</h2>\n<p>Karthikeyan et al. have published <a href=\"http://moltable.ncl.res.in/chemstar/\">ChemStar</a>, an opensource distributed chemical\ncomputing system, built on top of the Java Remote Method Invocation architecture, used by the original Seneca too. The\nCDK paper and <a href=\"http://prdownloads.sourceforge.net/cdk/cdknews3.2.pdf?download\">Fechner/Guha’s CDK News</a> paper are\ncited in relation to a ChemStar application of benchmarking QSAR descriptors. The article does not seem to mention\nthe opensource license, nor have I yet found a source package download.\n<em>M. Karthikeyan, S. Krishnan, A.K. Pandey, A. Bender, A. Tropsha, Distributed Chemical Computing Using ChemStar:\nAn Open Source Java Remote Method Invocation Architecture Applied to Large Scale Molecular Data from PubChem,\nJ. Chem. Inf. Model., 48 (4), 691–703, 2008. <a href=\"https://doi.org/10.1021/ci700334f\">10.1021/ci700334f</a></em></p>\n\n<h2 id=\"tavernas-apiconsumer\">Taverna’s APIConsumer</h2>\n<p>Taverna has several means of making functionality available to the workflow engine. SOAP and BioMoby are two\nprominent ones. The APIConsumer is another one, and is described in this paper. The\n<a href=\"http://www.cdk-taverna.de/\">CDK-Taverna</a> project, led by <a href=\"http://cdktaverna.wordpress.com/\">Thomas Kuhn</a>,\nis mentioned as another project that uses this approach.\n<em>Peter Li, Tom Oinn, Stian Soiland, Douglas B. 
Kell, Automated manipulation of systems biology models using\nlibSBML within Taverna workflows, Bioinformatics 2008 24(2):287-289, doi:<a href=\"https://doi.org/10.1093/bioinformatics/btm578\">10.1093/bioinformatics/btm578</a></em></p>\n\n<h2 id=\"docking-for-substrate-identification\">Docking for Substrate Identification</h2>\n<p>Favia et al. use docking to recognize interesting substrates for short-chain dehydrogenases/reductases. The CDK’s\nfingerprinter is used to describe intermolecular similarity, by calculating the Tanimoto distances between the\nbit strings.\n<em>Angelo D. Favia, Irene Nobeli, Fabian Glaser, Janet M. Thornton, Molecular Docking for Substrate Identification:\nThe Short-Chain Dehydrogenases/Reductases, Journal of Molecular Biology, 2008, 375(3):855-874,\ndoi:<a href=\"http://dx.doi.org/10.1016/j.jmb.2007.10.065\">10.1016/j.jmb.2007.10.065</a></em></p>",
      "summary": "Time flies. Another CDK Literature (see also #1 , #2 , #3 , #4 ). Quite a few papers have been published again, and I’ll briefly discuss a few of them.",
      
      "date_published": "2008-07-26T00:00:00+00:00",
      "date_modified": "2025-08-26T00:00:00+00:00",
      "tags": ["cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btn181", "doi": "10.1093/bioinformatics/btn181"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/j.pnmrs.2007.04.003", "doi": "10.1016/j.pnmrs.2007.04.003"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci700334f", "doi": "10.1021/ci700334f"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btm578", "doi": "10.1093/bioinformatics/btm578"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/j.jmb.2007.10.065", "doi": "10.1016/j.jmb.2007.10.065"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4a21q-zaj66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/23/molecular-qsar-descriptors-in-cdk.html",
      "title": "Molecular QSAR descriptors in the CDK",
      "content_html": "<p>Rajarshi has patched trunk last night with his work to address a few practical issues in the molecular descriptor\nmodule of the <a href=\"http://cdk.sf.net/\">CDK</a> (and I <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/07/23/commercial-qsar-modeling-sorry-already.html\">peer reviewed this work yesterday <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nOne major change is that the <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/qsar/IMolecularDescriptor.html\">IMolecularDescriptor</a>\n<code class=\"language-plaintext highlighter-rouge\">calculate()</code> method no longer throws an <code class=\"language-plaintext highlighter-rouge\">Exception</code>, but returns <code class=\"language-plaintext highlighter-rouge\">Double.NaN</code> instead. The Exception is stored in the <code class=\"language-plaintext highlighter-rouge\">DescriptorValue</code> for convenience.\nThis simplifies the QSAR descriptor calculation considerably, and, importantly, makes it more robust to the input. Though only by propagating errors into\ndescriptor matrix. <em>Just make sure your molecular structures have explicit hydrogens and 3D coordinates, and you’re fine.</em></p>\n\n<p>Anyway, Rajarshi also added a new page to <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">CDK Nightly</a>\nto <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/dnames.html\">list the available descriptors</a>:</p>\n\n<p><img src=\"/assets/images/descNames.png\" alt=\"\" /></p>",
      "summary": "Rajarshi has patched trunk last night with his work to address a few practical issues in the molecular descriptor module of the CDK (and I peer reviewed this work yesterday ). One major change is that the IMolecularDescriptor calculate() method no longer throws an Exception, but returns Double.NaN instead. The Exception is stored in the DescriptorValue for convenience. This simplifies the QSAR descriptor calculation considerably, and, importantly, makes it more robust to the input. Though only by propagating errors into descriptor matrix. Just make sure your molecular structures have explicit hydrogens and 3D coordinates, and you’re fine.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/descNames.png",
      "date_published": "2008-07-23T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk","qsar"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ejb9x-yf544",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/23/commercial-qsar-modeling-sorry-already.html",
      "title": "Commercial QSAR modeling? Sorry, already patented...",
      "content_html": "<p><a href=\"http://www.freepatentsonline.com/y2001/0049585.html\">QSAR has been patented</a> in 2001 (US patent 20010049585).</p>\n\n<p>Claim 1:</p>\n\n<blockquote>\n  <p>A method for predicting a set of chemical, physical or biological features related to chemical substances or\nrelated to interactions of chemical substances using a system comprising a plurality of prediction means, the\nmethod comprising using at least 16 different individual prediction means, thereby providing an individual\nprediction of the set of features for each of the individual prediction means and predicting the set of features\non the basis of combining the individual predictions, the combining being performed in such a manner that the\ncombined prediction is more accurate on a test set than substantially any of the predictions of the individual\nprediction means.</p>\n</blockquote>\n\n<p>They use averaging or weighted averaging of the individual predictions (claim 2). Oh, and just in case you think\nyou are clever and you use 17, 32, etc individual predictions. Sorry, no luck either; you have to use way beyond\n1M individual predictions according to the following claim ;)</p>\n\n<p>Claim 2:</p>\n\n<blockquote>\n  <p>A method according to claim 1, wherein the number of different predictions means is at least 20, such as at least\n30, such as at least 40, 50, 75, 100, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000, 2500, 3000, 4000, 5000,\n6000, 7000, 8000, 9000, 10,000, 15,000, 20,000, 30,000, 40,000, 50,000, 100,000, 200,000, 500,000, 1,000,000.</p>\n</blockquote>\n\n<p>What can I say about this? Please leave your opinion in the comments…</p>",
      "summary": "QSAR has been patented in 2001 (US patent 20010049585).",
      
      "date_published": "2008-07-23T00:00:00+00:00",
      "date_modified": "2008-07-23T00:00:00+00:00",
      "tags": ["qsar"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7z23e-kmf12",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/22/peer-reviewed-chemoinformatics-why.html",
      "title": "Peer reviewed Chemoinformatics: Why OpenSource Chemoinformatics should be the default",
      "content_html": "<p>The battle for scientific publishing is continuing: openaccess, peer reviewing, how much does it cost, who should\npay it, is the data in papers copyrighted, etc, etc.</p>\n\n<p>The battle for chemoinformatics, however, has not even started yet. The Blue Obelisk paper\n(doi:<a href=\"https://doi.org/10.1021/ci050400b\">10.1021/ci050400b</a>) has gotten a lot of attention, and citations. But\nclosed source chemoinformatics is doing fine, and have not really openly taken a standpoint against open source\nchemoinformatics. Actually, <a href=\"http://www.cambridgesoft.com/\">CambridgeSoft</a> just\n<a href=\"http://www.biospace.com/news_story.aspx?StoryID=103955&amp;full=1\">received a good investment</a>. I wonder how this\ninvestment will be used, and where the ROI will come from. More closed data and closed algorithms? Focus on\nservices? Early access privileges? At least they had something convincing.</p>\n\n<p>There are many degrees of openness, and many business models. I value open source chemoinformatics, or chemblaics,\nas I call it. There is a striking similarity between publishing and chemoinformatics. Both play an important role\nin the progress of sciences. A big difference is that (independent) peer review of published results is done in\nscientific publishing, but not generally to chemoinformatics. Surely, algorithms are published… Ah, no; they are\nnot. They are described. Ask any chemoinformatician why this subtle difference is causing headaches…</p>\n\n<p>Let me just briefly stress the difference between core chemoinformatics, and GUI applications. 
The first <em>must</em>\nbe opensource, to allow <strong>independent</strong> <em>Peer Review</em>; the latter is just nice to have as opensource.\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> is the GUI (doi:<a href=\"https://doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a>),\nwhile the <a href=\"http://cdk.sf.net/\">CDK</a> is our peer-reviewed chemoinformatics library\n(pmid:<a href=\"http://www.ncbi.nlm.nih.gov/pubmed/16796559\">16796559</a>). I would also like to stress that the CDK is\n<a href=\"http://www.gnu.org/licenses/lgpl.html\">LGPL</a>, allowing the opensource chemoinformatics library to be used in\nproprietary GUI software. We deliberately chose this license, to allow embedding in proprietary code. The\n<a href=\"http://icodons.com/iCODONS/introducing-java-molecular-descriptor-library/\">Java Molecular Descriptor Library</a>\nof <a href=\"http://icodons.com/\">iCODONS</a> is an example of this (that is, AFAIK it’s not opensource).</p>\n\n<p>So, getting back to that CambridgeSoft investment. I really hope they seek the ROI in the added value of the\nuser friendly GUI, and not in the chemoinformatics algorithm implementations, which, IMHO, should be peer-reviewed,\nthus open source. Meanwhile, I will continue working on the CDK project to provide open source chemoinformatics\nalgorithm implementations, for use in opensource <em>and</em> proprietary chemoinformatics GUIs.</p>",
      "summary": "The battle for scientific publishing is continuing: openaccess, peer reviewing, how much does it cost, who should pay it, is the data in papers copyrighted, etc, etc.",
      
      "date_published": "2008-07-22T00:00:00+00:00",
      "date_modified": "2008-07-22T00:00:00+00:00",
      "tags": ["cheminf","blue-obelisk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI050400B", "doi": "10.1021/CI050400B"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-59", "doi": "10.1186/1471-2105-8-59"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.2174/138161206777585274", "doi": "10.2174/138161206777585274"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/20fkj-9h122",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/15/metabolomics-needs-you.html",
      "title": "Metabolomics needs you",
      "content_html": "<p>Over on <a href=\"http://metabolomicsineu.blogspot.com/\">Metabolomics In Europe</a> I posted a ad for an\n<a href=\"http://metabolomicsineu.blogspot.com/2008/07/researcher-bioinformatics-for.html\">open metabolomics position in our group</a>.\nGo check it out!</p>",
      "summary": "Over on Metabolomics In Europe I posted a ad for an open metabolomics position in our group. Go check it out!",
      
      "date_published": "2008-07-15T00:00:00+00:00",
      "date_modified": "2008-07-15T00:00:00+00:00",
      "tags": ["metabolomics"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e63j2-r6057",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/10/going-to-science-blogging-2008-london.html",
      "title": "Going to Science Blogging 2008: London",
      "content_html": "<div typeof=\"event:Vevent\" xmlns:event=\"http://www.w3.org/2002/12/cal#\">On <span property=\"event:dtstart\" content=\"2008-08-30\">Saturday 30th of August</span>\nI'll be in <span property=\"event:location\">London</span> attending the <a href=\"http://network.nature.com/forum/sciblog2008\" property=\"event:summary\">Science Blogging 2008</a>\nevent. The Monday following that, I'll meet friends at the EBI, but Sunday is empty so far. I'd love to meet up that Sunday, so just ping me if interested.</div>\n<p><br />\nOh, and this blog is using RDFa to markup the event, as discussed <a href=\"http://www.pemberton.nl/vandf/2008/06/how-to-do-hcalendar-in-rdfa.html\">here</a>.</p>",
      "summary": "On Saturday 30th of August I'll be in London attending the Science Blogging 2008 event. The Monday following that, I'll meet friends at the EBI, but Sunday is empty so far. I'd love to meet up that Sunday, so just ping me if interested. Oh, and this blog is using RDFa to markup the event, as discussed here.",
      
      "date_published": "2008-07-10T00:00:00+00:00",
      "date_modified": "2008-07-10T00:00:00+00:00",
      "tags": ["blog","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/n7grt-dxe17",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/09/chemoinformatics-p0wned-by.html",
      "title": "Chemoinformatics p0wned by cheminformatics...",
      "content_html": "<p><a href=\"http://baoilleach.blogspot.com/\">Noel</a> had a <a href=\"http://baoilleach.blogspot.com/2008/07/chemoinformatics-p0wned-by.html\">40 people vote over chemoinformatics versus cheminformatics</a>.\nWhat do you think?</p>\n\n<p>I have thrown in two extra options: <strong>chemblaics</strong> (from my blog: <em>chemblaics (pronounced chem-bla-ics) is the science that uses computers\nto address and possibly solve problems in the area of chemistry, biochemistry and related fields. The big difference between chemblaics and\nareas as chem(o)?informatics, chemometrics, computational chemistry, etc, is that chemblaics only uses open source software, making\nexperimental results reproducable and validatable.</em>) and <strong>bioinformatics</strong> (in case you believe all is life sciences now).</p>",
      "summary": "Noel had a 40 people vote over chemoinformatics versus cheminformatics. What do you think?",
      
      "date_published": "2008-07-09T00:00:00+00:00",
      "date_modified": "2008-07-09T00:00:00+00:00",
      "tags": ["cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/x7ztm-91j37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/05/svn-commit-hooks-down-for-cdk-and.html",
      "title": "SVN commit hooks down for CDK and Bioclipse",
      "content_html": "<p>SourceForge has been playing with system upgrades again, and in an attempt to debug the failing CIA\ncommits on IRC, I reinstalled the hooks for <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://www.bioclipse.net/\">Bioclipse</a>,\nso that now all hooks seem to fail, including the email hook… Apparently, it is a known bug, e.g. see\n<a href=\"https://sourceforge.net/tracker/index.php?func=detail&amp;aid=2011306&amp;group_id=1&amp;atid=200001\">this bug report</a>.\nI assume SF will fix this soon.</p>\n\n<p>On the bright side, I also noted an updated webpage for <a href=\"http://sourceforge.net/community/forum/forum.php?id=11&amp;page\">SF uptime/problem tracker</a>,\nwhere it is also reported that stats are currently down for upgrade. There also has an\n<a href=\"http://sourceforge.net/community/forum/rss.php?forum=11\">RSS feed</a>, which I recommend as a good monitoring\ntool for SF site problems.</p>",
      "summary": "SourceForge has been playing with system upgrades again, and in an attempt to debug the failing CIA commits on IRC, I reinstalled the hooks for CDK and Bioclipse, so that now all hooks seem to fail, including the email hook… Apparently, it is a known bug, e.g. see this bug report. I assume SF will fix this soon.",
      
      "date_published": "2008-07-05T00:00:00+00:00",
      "date_modified": "2008-07-05T00:00:00+00:00",
      "tags": ["cdk","bioclipse","rss"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pnvde-gex62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/07/04/moving-to-sweden-improving-cdk-support.html",
      "title": "Moving to Sweden: Improving CDK support in Bioclipse",
      "content_html": "<p><span style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\">\n<img src=\"/blog//assets/images/nmc.png\" />\n</span>\nThis autumn I will end my current post-doc position at <a href=\"http://www.pri.wur.nl/\">Plant Research International</a> in the\n<a href=\"http://www.ab.wur.nl/\">Applied Bioinformatics</a> group and at <a href=\"http://www.biometris.nl/\">Biometris</a> (both part of\n<a href=\"http://www.wur.nl/\">Wageningen University</a>) funded by the <a href=\"http://www.metabolomicscentre.nl/\">Netherlands Metabolomics Center</a>\n(lot’s of vacancies), where I had a good time, and collaborated in several projects within the NMC with much pleasure.</p>\n\n<p>However, personal circumstances strengthened an older wish of me and my family to seek the adventure of living abroad,\nand a vacancy was available in the <a href=\"http://www.farmbio.uu.se/researchgroup.php?fg=1\">group of Prof. Wikberg</a>.\nSo, we are moving to <a href=\"http://en.wikipedia.org/wiki/Sweden\">Sweden</a>. There, I will extend my research on effectively combining\nchemoinformatics (sometimes misspelled as <em>cheminformatics</em> ;) and chemometrics, as I did in my PhD, which fits well with the\ndevelopment of <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/10/11/are-chemogenomics-and.html\">proteochemotrics <i class=\"fa-solid fa-recycle fa-xs\"></i></a> methodology and\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> as platform to transform scientific hypotheses into data queries.</p>",
      "summary": "This autumn I will end my current post-doc position at Plant Research International in the Applied Bioinformatics group and at Biometris (both part of Wageningen University) funded by the Netherlands Metabolomics Center (lot’s of vacancies), where I had a good time, and collaborated in several projects within the NMC with much pleasure.",
      "image": "https://chem-bla-ics.linkedchemistry.info/blog//assets/images/nmc.png",
      "date_published": "2008-07-04T00:00:00+00:00",
      "date_modified": "2025-10-05T00:00:00+00:00",
      "tags": ["career","metabolomics","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/n2k7b-f1d65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/06/29/cdk-community-developers-members-and.html",
      "title": "The CDK Community: Developers, Members, and Users",
      "content_html": "<p>An open source project is as good as its community. <a href=\"http://www.jmol.org/\">Jmol</a> has a brilliant community, but <a href=\"http://cdk.sf.net/\">CDK</a> is not doing\nbad either, in general at least; some CDK projects could use some more user feedback, such as <a href=\"http://www.cdk-taverna.net/\">CDK-Taverna</a>\n(site down at the time of writing, but see the <a href=\"http://cdktaverna.wordpress.com/2008/05/30/cdk-taverna-version-050-release/\">blog</a>).</p>\n\n<p>There are actually quite a few things going on within the CDK Community. There are a few active or less active projects:</p>\n\n<ul>\n  <li>CDK Library <em>(The main and most well-known component))</em></li>\n  <li>CDK-Taverna <em>(just mentioned)</em></li>\n  <li><a href=\"http://cdknews.org/\">CDK News</a> <em>(Not active enough, please contribute!)</em></li>\n  <li><a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">CDK Nightly</a> <em>(Rajarshi’s nightly build and check service, for trunk/ and <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly-1.0.x/\">cdk1.0.x/</a>)</em></li>\n  <li>JChemPaint <em>(For which a <a href=\"http://chem-bla-ics.blogspot.com/search?q=jchempaint\">new version is being developed</a>.)</em></li>\n</ul>\n\n<p>(Even withing the CDK Library, several threads are ongoing, but I will report on that at some later stage).</p>\n\n<p>A full <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/\">list of projects is found in SVN</a>. A recent new project is <em>CDK Policy</em>,\nwhich will attempt to formalize code development such that the library becomes better maintainable. 
One of the first things\nthe draft does, is formalize roles within the community.</p>\n\n<h2 id=\"cdk-members-and-cdk-developers\">CDK Members and CDK Developers</h2>\n<p>Basically, anyone with write access to <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/\">CDK’s SVN</a> may call himself <em>CDK Developer</em>\n(<a href=\"http://sourceforge.net/project/memberlist.php?group_id=20024\">57 in total</a>). So, what if you contributed patches? Then you may\ncall yourself <em>CDK Member</em> (well, as suggested in the draft policy), like anyone else who is subscribed to the <em>cdk-devel</em>\nmailing list.</p>\n\n<p>This is an important fact. Anyone can subscribe to the list, and directly becomes an active CDK Member; he/she gets a voice. The\npolicy proposes that the difference between member and developer lies in whether the person has accepted the policy.\nTherefore, according to the draft policy, <em>anyone</em> who <strong>accepts</strong> the policy gets SVN write access, making the CDK not just\nOpen Source, but an Open Community too. The policy then organizes the maintainability of the software development.</p>\n\n<h2 id=\"cdk-users\">CDK Users</h2>\n<p>A CDK User is basically anyone who uses any of the CDK products, but in particular those subscribed to the cdk-user mailing list.</p>\n\n<p>A limited overview of developers, members and users can be found on <a href=\"http://cdk.sourceforge.net/maps/people/\">this Google map</a>:</p>\n\n<p><img src=\"/assets/images/cdkMap.png\" alt=\"\" /></p>\n\n<p>Just email me (or cdk-user) your latitude/longitude to have yourself or your research group (URL) linked on this map as user\nor developer.</p>",
      "summary": "An open source project is as good as its community. Jmol has a brilliant community, but CDK is not doing badly either, in general at least; some CDK projects could use some more user feedback, such as CDK-Taverna (site down at the time of writing, but see the blog).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkMap.png",
      "date_published": "2008-06-29T00:00:00+00:00",
      "date_modified": "2008-06-29T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/895qm-mnq80",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/06/18/httpchem-bla-icsblogspotcom200805develo.html",
      "title": "The SWT JChemPaint (viewing) widget",
      "content_html": "<p>In addition to this <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/05/19/development-of-new-jchempaint.html\">Swing-based screenshot of JChemPaint <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nhere’s a SWT widget in action (lower right corner):</p>\n\n<p><img src=\"/assets/images/jcp3inAction.png\" alt=\"\" /></p>\n\n<p>Not quite as beautiful as the <a href=\"http://metamolecular.com/chemwriter/\">ChemWriter</a>, but a start.</p>",
      "summary": "In addition to this Swing-based screenshot of JChemPaint , here’s a SWT widget in action (lower right corner):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcp3inAction.png",
      "date_published": "2008-06-18T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk","jchempaint","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kytwf-2c912",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/06/17/graphical-overview-of-my-bookmarks.html",
      "title": "Graphical overview of my bookmarks",
      "content_html": "<p><a href=\"http://mndoci.com/blog/\">Deepak</a> informed me about <a href=\"http://wordle.net/\">Wordle</a> via a <a href=\"http://friendfeed.com/e/28644d08-3c21-11dd-88bc-003048343a40/Wordle-mndociondelicious/\">FriendFeed notice</a>,\nwhich can make nice visualisations of tag clouds. Here’s the one for <a href=\"http://del.icio.us/egonw\">my del.icio.us account</a>:</p>\n\n<p><img src=\"/assets/images/wordle.png\" alt=\"\" /></p>\n\n<p>You can clearly see I have quite some reading up to do :)</p>",
      "summary": "Deepak informed me about Wordle via a FriendFeed notice, which can make nice visualisations of tag clouds. Here’s the one for my del.icio.us account:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wordle.png",
      "date_published": "2008-06-17T00:00:00+00:00",
      "date_modified": "2008-06-17T00:00:00+00:00",
      "tags": ["friendfeed"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j9rb9-ns27",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/06/05/recovering-full-mass-spectra-from-gc-ms_05.html",
      "title": "Recovering full mass spectra from GC-MS data #2",
      "content_html": "<p>Steffen reminded me over email that the particular machine only has a 1 dalton accuracy, and that the 150ppm\nparameter setting is somewhat inappropriate. As <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/06/04/recovering-full-mass-spectra-from-gc-ms.html\">seen yesterday <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nit works fine for larger peaks, but fails for low intensity peaks. So, I reran the <code class=\"language-plaintext highlighter-rouge\">centWave</code> peak detection with 750, 1000 and 1250 ppm,\nand that indeed make <a href=\"http://masspec.scripps.edu/xcms/xcms.php\">XCMS</a> recover many more metabolites, and, also important, with more\nextracted ion chromatograms per metabolite, yielding a more accurate mass spectrum. At the same time, I notice that profiles are not\nas clean as before, but that’s where the peak fitting with (Modified) Gaussians come into play.</p>\n\n<p>The original 150ppm results:</p>\n\n<p><img src=\"/assets/images/ionChromPlot4.png\" alt=\"\" /></p>\n\n<p>The 750ppm results:</p>\n\n<p><img src=\"/assets/images/map5.png\" alt=\"\" /></p>\n\n<p>And for 1000ppm (1250ppm did not further improve):</p>\n\n<p><img src=\"/assets/images/map6.png\" alt=\"\" /></p>",
      "summary": "Steffen reminded me over email that the particular machine only has a 1 dalton accuracy, and that the 150ppm parameter setting is somewhat inappropriate. As seen yesterday , it works fine for larger peaks, but fails for low intensity peaks. So, I reran the centWave peak detection with 750, 1000 and 1250 ppm, and that indeed make XCMS recover many more metabolites, and, also important, with more extracted ion chromatograms per metabolite, yielding a more accurate mass spectrum. At the same time, I notice that profiles are not as clean as before, but that’s where the peak fitting with (Modified) Gaussians come into play.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/map6.png",
      "date_published": "2008-06-05T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["metabolomics","rstats"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r7fav-92885",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/06/04/recovering-full-mass-spectra-from-gc-ms.html",
      "title": "Recovering full mass spectra from GC-MS data",
      "content_html": "<p>One aspect not covered in detail by the <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1134\">ongoing</a>\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1133\">discussion</a> on <a href=\"http://www.simbiosys.ca/blog/2008/06/03/research-and-software-testing/\">unit</a>\n<a href=\"http://www.simbiosys.ca/blog/2008/06/03/quality-in-chemical-software-the-debate-continues/\">testing</a> quality control for scientific software,\nis detecting regressions. This is the most important reason why unit testing is superior to random testing. Putting someone behind a\nkeyboard to tests things is nice, but this process has to be repeated, as the testing has to be repeated over and over again. Just\nto make sure it works for whatever new input, for whatever refactoring, for whatever new cool feature.</p>\n\n<p>Another advantage of unit testing over random testing, is in the fact that it provides you with statistics (lies, damn lies, and statistics).\nThese statistics do give some insight where to start looking, though if really written properly, each unit test has the ability to test a\nsingle line of functional code. That’s where code coverage testing is useful, and should be part of the process too. I have no idea what\ncommercial chemoinformatics software vendors do regarding quality control, but I assume they make heavily use of code coverage too.</p>\n\n<p><a href=\"http://www.simbiosys.ca/blog/\">SimBioSys Blog</a> mentioned the use of annual software competitions. That’s important indeed, and provides\nnice means to compare options, but it is not unit testing; it’s a macroscopic test of functionality, and has little means to identifying\nunderlying fails. Is a bad CASP score caused by wrong isotope masses used in the force field, or by the approach? I’m sure no one can tell.</p>\n\n<p>Anyway, refactoring is a principle activity of software engineers, and unit testing manages that process in some way. 
Taking unit testing\nto the extreme, any new coding starts with writing the unit tests for the new API. Only when those are finished, the functionality is\nactually implemented.</p>\n\n<p>Ok, now something less boring. GC-MS-based <a href=\"http://en.wikipedia.org/wiki/Metabolite\">Metabolomics</a> with metabolite identity in particular.\nThough I believe there are other uses too, for example, in between sample alignment, the recovery of a full mass spectrum is particularly\nimportant for metabolite identification of new, yet unknown compounds (yes, even dereplication is already non-trivial, because of the lack\nof free (open data preferably), machine accessible (open standards!) database of mass spectra (using different ionization methods). Look\nup by monoisotopic mass is possible in, for example, <a href=\"http://www.chemspider.com/\">ChemSpider</a> (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/26/metabolomics-workflows-in-taverna.html\">this blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), but look up via full spectrum is less common.\nThe number of databases are growing, and likewise the openness and accessibility. Who know what the\n<a href=\"http://www.metabolomicscentre.nl/\">Netherlands Metabolomics Center</a>’s support platform will be able to offer in a year or two.</p>\n\n<p>Now, unit tests could, for example, tests that some algorithm can deconvolute the following GC-MS data:</p>\n\n<p><img src=\"/assets/images/ionChromPlot3.png\" alt=\"\" /></p>\n\n<p>The red line is the TIC for the chromatogram, while the black lines are the extracted ion chromatograms of individual m/z (ion) peaks in\nthe mass spectral dimension, and, only of peaks detected using <a href=\"http://masspec.scripps.edu/xcms/xcms.php\">XCMS</a> using the new centWave\nmethod by Ralf. Now, the results are not perfect in this diagram, but it does seem to recognize all five eluting metabolites (that’s the\namount I would guess are eluting). 
However, I am more interested in the methods ability to recover all m/z peaks for each metabolite, to\nallow me to identify the structure, or at least make a best possible educated guess, or, with a bit of luck the compound is already known,\nI can dereplicate it against some database (Guesses differ, but, particularly in plant metabolomics, more than half of the metabolites we\ncan now detect have a yet undetermined structure).</p>\n\n<p>Now, I’m sure any method will be able to deconvolute these compounds. They are well separated, show a nice gaussian shape, and deconvolution\nbased on just the chromatographic domain will likely already work. It starts to become more difficult for the low intensity peaks, those with\nlow signal-to-noise ratios, or those with a different elution profile (e.g. peak tailing). Deconvolution typically requires some peak shape\n(commonly Gaussian or Exponential-Modified Gaussian), while experimental data typically does not have that. Jim Downing recently introduced\nme to the term ‘Long Tail Science’ (via <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=938\">this blog from Peter</a>):</p>\n\n<blockquote>\n  <p>Jim Downing cam up with the idea of “Long Tail Science”. The Long Tail is the observation that in the modern web the tail of\nthe distribution is often more important than the few large players. Large numbers of small units is an important concept. And\nit’s complimentary and complementary.</p>\n</blockquote>\n\n<p>This <em>long tail</em> of ion chromatograms is what I am interested. I do not care about the usual suspects, I want to learn about the 80%\nunknown metabolites that are found in samples. What can we learn about those? 
What is in the long tail of detected metabolites?</p>\n\n<p>Now, I know I am not an expert in tuning <code class=\"language-plaintext highlighter-rouge\">centWave</code> parameters, so I might as well be passing garbage in, but the method is not robust\nagainst me:</p>\n\n<div class=\"language-R highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">xr</span><span class=\"w\"> </span><span class=\"o\">&lt;-</span><span class=\"w\"> </span><span class=\"n\">read</span><span class=\"p\">(</span><span class=\"n\">file</span><span class=\"o\">=</span><span class=\"s2\">\"someClosedData.cdf\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">p</span><span class=\"w\"> </span><span class=\"o\">&lt;-</span><span class=\"w\"> </span><span class=\"n\">findPeaks</span><span class=\"p\">(</span><span class=\"n\">xr</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">method</span><span class=\"o\">=</span><span class=\"s2\">\"centWave\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">ppm</span><span class=\"o\">=</span><span class=\"m\">150</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">peakwidth</span><span class=\"o\">=</span><span class=\"nf\">c</span><span class=\"p\">(</span><span class=\"m\">5</span><span class=\"p\">,</span><span class=\"m\">25</span><span class=\"p\">))</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>But it fails to detect some of the metabolites in the long tail:</p>\n\n<p><img src=\"/assets/images/ionChromPlot4.png\" alt=\"\" /></p>\n\n<p>As you can observe from the low S/N ratio on the red TIC line, you can notice that we are at low intensity metabolites. Assuming some\npeak shape is at such noise levels much more difficult than with better S/N ratios. 
The <code class=\"language-plaintext highlighter-rouge\">centWave</code> method uses a really nice non-parametric approach\nhere… well, not entirely, otherwise it would not have failed over my parameter settings :) Steffen/Ralf, what was the DOI again?</p>\n\n<p>Now, I found these missed metabolites by manually browsing the data, data exploration. SBS <a href=\"http://www.simbiosys.ca/blog/2008/06/03/research-and-software-testing/\">wrote</a>\n<em>[t]here are four distinct type of tests: Func, Speed, Error and Robust</em>. I believe the above situation is really a fifth class: it is\nnot a true <em>functional test</em>, but it is not a true <em>robustness test</em> either. The input is valid, but of such a kind that it moves towards\ninvalid input. Scientific data is a continuous spectrum of input (remember the <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/06/03/good-scientists-pimp-there-research-was.html\">oil in, oil out <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nSoftware must be tested against such borderline data; repeatedly, over and over again, for any version, for any code change, for any\nplatform.</p>\n\n<p>Oh, and code quality has nothing to do with trust. Give me statistics (I can interpret the scope of them) over any trust assurance.</p>",
      "summary": "One aspect not covered in detail by the ongoing discussion on unit testing quality control for scientific software, is detecting regressions. This is the most important reason why unit testing is superior to random testing. Putting someone behind a keyboard to test things is nice, but this process has to be repeated, as the testing has to be repeated over and over again. Just to make sure it works for whatever new input, for whatever refactoring, for whatever new cool feature.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ionChromPlot3.png",
      "date_published": "2008-06-04T00:00:00+00:00",
      "date_modified": "2025-08-26T00:00:00+00:00",
      "tags": ["metabolomics","rstats"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r5p0q-bb580",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/06/03/good-scientists-pimp-there-research-was.html",
      "title": "Good Scientists Pimp there Research (was: Damn, I&apos;m boring...)",
      "content_html": "<p>Define good. Let me say that up front. Good scientists, that is, if you say successful researchers are good scientists, secure\ngood funding. Getting good funding requires doing the most relevant research (define relevant). Or, to put it bluntly, being a\nsuccessful researcher requires to pimp your research. Doing boring research is nice for you, good for a Nobel prize if it\nturns out to have a cool spin off, but doesn’t buy you research success.</p>\n\n<p>And, boy, <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1130\">am I boring or what</a>?\nReally, Peter is right… I am boring. He is also right that doing quality control doesn’t result in research output. He writes:</p>\n\n<blockquote>\n  <p>But it’s essential for modern knowledge-driven science. The chemical software and data industry has no tradition of quality.\nI’ve known it for 30 years and I’ve never seen a commercial company output quality metrics. I have never seen a commercial\ncompany publish results of roundtripping. That’s another really boring and apparently pointless operation where you take a\nfile A, convert it to B and then convert it back to A’. What’s the point? Well A and A’ should be the same. With most\ncommercial software you get loss. If you are lucky it’s only whitespace. But it’s more likely to be hydrogens or charges\nor whatever.</p>\n</blockquote>\n\n<p>There is a huge gap between <em>should</em> and <em>is</em>. This is partly caused that it is much easier to throw garbage at some piece of\ncode, than to start an <a href=\"http://usefulchem.blogspot.com/2008/05/not-ugi-product.html\">Ugi reaction</a> on something which is\nnaturally broken. Take a moment to think about that. 
<em>Garbage in, garbage out</em> is still common in information technology,\nbut I have never heard of the saying <em>oil in, oil out</em>.</p>\n\n<p><strong>Scientists do not have the ability to recognize garbage (when it comes to data)!</strong> It’s statistics, that other boring\nstuff I do. Even worse, it’s the programmers fault if someone throws garbage at my software and it doesn’t return 42.</p>\n\n<p>Unit testing is a poor mans approach to handle garbage. As are diagnostics tools, like\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/06/01/finding-differences-between.html\">finding the difference between IChemObjects <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(and <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/06/03/finding-differences-between_03.html\">#2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>). Unit tests define common\n<em>garbage in</em> (of course, also <em>valid input in</em>, but that does not make it less boring) and tests that the method does\nnot freak out; that the tested method gives the user some friendly message like\n<em>“Hi there! I’m your friendly computer program. I am sorry to inform you that it’s garbage you just passed along. Please clean it up first.”</em></p>\n\n<p>Anyway, back to my boring research…</p>",
      "summary": "Define good. Let me say that up front. Good scientists, that is, if you say successful researchers are good scientists, secure good funding. Getting good funding requires doing the most relevant research (define relevant). Or, to put it bluntly, being a successful researcher requires to pimp your research. Doing boring research is nice for you, good for a Nobel prize if it turns out to have a cool spin off, but doesn’t buy you research success.",
      
      "date_published": "2008-06-03T00:10:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9tk1a-d6c07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/06/03/finding-differences-between_03.html",
      "title": "Finding differences between IChemObjects #2",
      "content_html": "<p><a href=\"http://cdk.sf.net/\">CDK</a> QSAR descriptors are not allowed to change the input [molecule|atom|bond], and I\nrecently added a unit tests (rev <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk?view=rev&amp;revision=11138\">11138</a>) for that to the\nabstract class <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/src/test/org/openscience/cdk/qsar/descriptors/atomic/AtomicDescriptorTest.java?view=log\">AtomicDescriptorTest</a>.</p>\n\n<p>After some code clean up of the diff module code earlier this morning (in anticipation of the rain stopping), I applied this patch\n(rev <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk?view=rev&amp;revision=11269\">11269</a>) that <code class=\"language-plaintext highlighter-rouge\">noModification</code> unit test:</p>\n\n<div class=\"language-diff highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code> public void testCalculate_NoModifications() throws Exception {\n   IAtomContainer mol = someoneBringMeSomeWater();\n   IAtom atom = mol.getAtom(1);\n<span class=\"gd\">-  String priorString = atom.toString();\n</span><span class=\"gi\">+  IAtom clone = (IAtom)mol.getAtom(1).clone();\n</span>   descriptor.calculate(atom, mol);\n<span class=\"gd\">-  String afterString = atom.toString();\n</span><span class=\"gi\">+  String diff = AtomDiff.diff(clone, atom);\n</span>   assertEquals(\n<span class=\"gd\">-    \"The descriptor must not change the passed bond in any respect.\",\n-    priorString, afterString\n</span><span class=\"gi\">+    \"The descriptor must not change the passed bond in any respect, but found this diff: \" + diff,\n+    0, diff.length()\n</span>   );\n }\n</code></pre></div></div>\n\n<p>This is a nice example of where the new <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/06/01/finding-differences-between.html\">diff module <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nis useful. 
Instead of dumping two long IAtom.toString()s, the test now gives output like:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>AtomDiff(AtomTypeDiff(, NULL/H, NC:0/1, V:0/1))\n</code></pre></div></div>\n\n<p>This indicates (yes, a bit cryptic) that the formal neighbor count (NC) and the valence (V) fields have been modified, in addition\nto that first field; I don’t know what that one refers to. Indeed, the output still needs a bit more tuning :)</p>",
      "summary": "CDK QSAR descriptors are not allowed to change the input [molecule|atom|bond], and I recently added a unit tests (rev 11138) for that to the abstract class AtomicDescriptorTest.",
      
      "date_published": "2008-06-03T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk","qsar"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e0rrq-d0863",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/06/01/finding-differences-between.html",
      "title": "Finding differences between IChemObjects",
      "content_html": "<p><a href=\"http://cdk.sf.net/\">CDK</a> <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/\">trunk</a> is getting into shape, thanx to the many people\nwho contribute to this, and special thanx to Miguel for cleaning up his code related to charge, resonance, and ionization potential\ncalculations!</p>\n\n<p>At the moment, I am focusing at two issues:</p>\n\n<ol>\n  <li>QSAR descriptors that change the input (causing other descriptors to randomly fail)</li>\n  <li>Cloning of IChemObject (for which last week a rather serious bug was found)</li>\n</ol>\n\n<p>Until some days ago, the CDK had one main method to introspect <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/interfaces/IChemObject.html\">IChemObject</a>s:\ntheir <code class=\"language-plaintext highlighter-rouge\">toString()</code> results. However, finding the difference between <code class=\"language-plaintext highlighter-rouge\">IChemObjects</code> using this approach is not trivial, particularly if\nthere are several differences.</p>\n\n<p>So, I started a new module called diff. If two objects are identical, it returns a zero-length <code class=\"language-plaintext highlighter-rouge\">String</code>. 
If not, it lists the changes\nbetween the two objects, in a way much like that of the IChemObjects <code class=\"language-plaintext highlighter-rouge\">toString()</code> methods.</p>\n\n<p>For example, consider this bit of code:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">IChemObject</span> <span class=\"n\">atom1</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">ChemObject</span><span class=\"o\">();</span>\n<span class=\"nc\">IChemObject</span> <span class=\"n\">atom2</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">ChemObject</span><span class=\"o\">();</span>\n<span class=\"n\">atom2</span><span class=\"o\">.</span><span class=\"na\">setFlag</span><span class=\"o\">(</span><span class=\"nc\">CDKConstants</span><span class=\"o\">.</span><span class=\"na\">ISAROMATIC</span><span class=\"o\">,</span> <span class=\"kc\">true</span><span class=\"o\">);</span>\n<span class=\"nc\">String</span> <span class=\"n\">result</span> <span class=\"o\">=</span> <span class=\"nc\">ChemObjectDiff</span><span class=\"o\">.</span><span class=\"na\">diff</span><span class=\"o\">(</span> <span class=\"n\">atom1</span><span class=\"o\">,</span> <span class=\"n\">atom2</span> <span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>The result value then looks like:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>ChemObjectDiff(, flag5:F/T)\n</code></pre></div></div>\n\n<p>Now, the output will likely change a bit over time. But at least I now have an easier-to-use approach for debugging and writing\nunit tests. Don’t be surprised to see <code class=\"language-plaintext highlighter-rouge\">test-*</code> modules start depending on the new <code class=\"language-plaintext highlighter-rouge\">diff</code> module.</p>",
      "summary": "CDK trunk is getting into shape, thanx to the many people who contribute to this, and special thanx to Miguel for cleaning up his code related to charge, resonance, and ionization potential calculations!",
      
      "date_published": "2008-06-01T00:00:00+00:00",
      "date_modified": "2008-06-01T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1bbpk-5ye80",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/05/19/development-of-new-jchempaint.html",
      "title": "Development of the new JChemPaint",
      "content_html": "<p>A quick screenshot, after some work on the JChemPaint code based on <a href=\"http://cdk.sf.net/\">CDK</a> <code class=\"language-plaintext highlighter-rouge\">trunk/</code>. Nothing much to see, but a rather small\ncode base, which is good. Today, I have set up <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/\">cdk/cdk/trunk/</a> and\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/jchempaint/trunk/\">cdk/jchempaint/trunk</a> as Eclipse plugins, allowing the second to depend on the first.\nSo, no more use of svn:externals. This is what it now looks like, and basically formalizes the end result of\n<a href=\"http://progz-jchem.blogspot.com/\">Niels</a>’ work of last year:</p>\n\n<p><img src=\"/assets/images/jchempaintNew.png\" alt=\"\" /></p>\n\n<p>A possible spin of is that <a href=\"http://bioclipse.net/\">Bioclipse</a>2 can use these plugins too, instead of defining plugins itself.</p>\n\n<p>To reproduce the above screenshot, just import <code class=\"language-plaintext highlighter-rouge\">cdk/cdk/trunk and</code> <code class=\"language-plaintext highlighter-rouge\">cdk/jchempaint/trunk</code> into Eclipse, and run the <code class=\"language-plaintext highlighter-rouge\">TestEditor</code> from the\nJChemPaint plugin.</p>",
      "summary": "A quick screenshot, after some work on the JChemPaint code based on CDK trunk/. Nothing much to see, but a rather small code base, which is good. Today, I have set up cdk/cdk/trunk/ and cdk/jchempaint/trunk as Eclipse plugins, allowing the second to depend on the first. So, no more use of svn:externals. This is what it now looks like, and basically formalizes the end result of Niels’ work of last year:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jchempaintNew.png",
      "date_published": "2008-05-19T00:00:00+00:00",
      "date_modified": "2008-05-19T00:00:00+00:00",
      "tags": ["jchempaint","cdk","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dr1b3-t3k63",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/05/16/metware-status-report.html",
      "title": "Metware Status Report",
      "content_html": "<p>Following many, many others, I finally got myself a <a href=\"http://www.slideshare.net/\">SlideShare</a>\n<a href=\"https://web.archive.org/web/20091129144404/https://www.slideshare.net/egonw/\">account <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nand uploaded a <a href=\"https://web.archive.org/web/20091125183518/http://www.slideshare.net/egonw/metware\">recent presentation <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\non <a href=\"http://metware.sf.net/\">MetWare</a>, our metabolomics data warehouse project. Some spoilers: SQL, RDF/SKOS, JSF.</p>\n\n<p><a href=\"https://zenodo.org/records/2639469\"><img src=\"/assets/images/metware_presentation_2008.png\" alt=\"\" /></a></p>",
      "summary": "Following many, many others, I finally got myself a SlideShare account and uploaded a recent presentation on MetWare, our metabolomics data warehouse project. Some spoilers: SQL, RDF/SKOS, JSF.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/metware_presentation_2008.png",
      "date_published": "2008-05-16T00:00:00+00:00",
      "date_modified": "2025-07-28T00:00:00+00:00",
      "tags": ["metware","rdf","ontology","java"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.5281/ZENODO.2639469", "doi": "10.5281/ZENODO.2639469"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1x0d5-we437",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/05/10/john-wilbanks-replies-to.html",
      "title": "John Wilbanks replies to the ChemSpider/OpenData discussion",
      "content_html": "<p>Not long after I posted my view on things, <a href=\"http://network.nature.com/blogs/user/wilbanks/2008/05/10/chemspider-good-intentions-and-the-fog-of-licensing\">John posted his reply</a>\non the ChemSpider/OpenData discussion. His comment was merely to illustrate an internal advice to some organization, which got accidentally leaked. Anyway, a must read,\n<a href=\"http://sciencecommons.org/projects/publishing/open-access-data-protocol/\">with two</a>\n<a href=\"http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/\">good links</a> to further reading on open data licensing.</p>\n\n<p>His blog mentions the concept of <em>public domain</em>, where data might be dumped, but I always understood that the US public domain concept is different from that of\nmainland-EU, German law in particular. This second ‘good link’ points to a license which formalizes this ‘public domain’ idea. And reading it, I realize that I\nhave read it before. But I had completely forgot about it.</p>\n\n<p>A quick reread of these two links, tells me that it indeed is BSD-versus-GPL all over again; with the <a href=\"http://sciencecommons.org/\">Science Commons</a>\nlicense on the BSD side, and CC-BY-SA at the GPL side. The first surely makes the life easier of aggregators who wish to combine licenses. Can’t argue with that.</p>\n\n<p>Then again… what’s wrong with a bit of viral character in the license? What’s wrong with the statement that ‘you may use my data, if I may use your aggregated\ndata with the same license’? That limits your what you practically can do, but does not limit your freedoms.</p>",
      "summary": "Not long after I posted my view on things, John posted his reply on the ChemSpider/OpenData discussion. His comment was merely to illustrate an internal advice to some organization, which got accidentally leaked. Anyway, a must read, with two good links to further reading on open data licensing.",
      
      "date_published": "2008-05-10T00:00:00+00:00",
      "date_modified": "2008-05-10T00:00:00+00:00",
      "tags": ["chemspider","opendata","copyright"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zebnk-k6z97",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/05/10/does-chemspider-really-violate-open.html",
      "title": "Does ChemSpider really violate Open Data with CC SA?",
      "content_html": "<p><a href=\"http://www.chemspider.com/\">ChemSpider</a> <a href=\"http://www.chemspider.com/blog/it-appears-chemspider-does-bad-by-using-creative-commons-licenses.html\">is afraid</a>\nthey are doing something bad because they release their data as <a href=\"http://creativecommons.org/licenses/by-sa/3.0/\">CC-BY-SA</a>.\nBecause, John Wilbanks says in Peter’s blog:</p>\n\n<blockquote>\n  <p>I would add to it that I’d like to see a meaningful discussion of the\nrisks of Share Alike and Attribution on <strong>data integration</strong>. Chemspider’s\nmove to CC-BY-SA fits into this discussion nicely - it’s a total\nviolation of the open data protocol we laid out at SC, which says “Don’t\nUse CC Licenses on Data” - <strong>but it does conform inside the broader OKD.</strong></p>\n</blockquote>\n\n<p>Now, let’s take this into pieces.</p>\n\n<ol>\n  <li>John notes that ChemSpider is in compliance with the <a href=\"http://www.opendefinition.org/1.0/\">OKD</a>. This means, that ChemSpider thinks\nabout Open Data just like the <a href=\"http://en.wikipedia.org/wiki/Open_Knowledge_Foundation\">Open Knowledge Foundation</a> does. I’ve scanned\nthrough the OKD, and it indeed seems to support the BY and SA clauses of the CC. So, Chemspider did not do a bad thing.</li>\n  <li>Data integration is tricky: you have to keep track of license information on an entry-by-entry level. For each fact, you keep to track the\nsource, and associate the source with it’s original license. For example, the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a>\ninformation in ChemSpider should be <a href=\"http://www.gnu.org/copyleft/fdl.html\">GNU FDL</a>.</li>\n  <li>OpenX licenses may be viral. This holds for the <a href=\"http://www.gnu.org/licenses/gpl.html\">GNU GPL</a> as well as for the CC-BY-SA.\nNothing new there. 
It just requires that, when you would like to incorporate the ChemSpider data into a larger database, that database\nhas to be CC-BY-SA too, or likely at least CC-SA.</li>\n</ol>\n\n<p>Summarizing, I think ChemSpider did a good thing, and that ChemSpider does <strong>not</strong> violate the OpenData idea, but instead, that the CC-BY-SA and\nthe OKD violate John’s requirements for integrating data resources (apparently based on a two-year legal study). That has nothing to do with ChemSpider.</p>\n\n<p>Now, people will always have different opinions on Openness. The original BSD license had a\n<a href=\"http://en.wikipedia.org/wiki/BSD_License#UC_Berkeley_advertising_clause\">restrictive ‘advertisement’ clause</a>, not Open enough for at least the\n<a href=\"http://www.debian.org/social_contract#guidelines\">Debian Free Software Guidelines</a> (DFSG), while still open source. The clause was\nlater removed from the BSD license.</p>\n\n<p>Another <a href=\"http://www.debian.org/\">Debian</a> example is Firefox, which is named <a href=\"http://packages.debian.org/iceweasel\">IceWeasel</a> in Debian,\nbecause the ‘license’ on the Firefox name is not open enough.</p>\n\n<p>Another problem with the definition of Openness is the viral aspect of some licenses (see earlier). For some, the GPL is not open enough,\nbecause it does not give people the freedom to license their software as they like, something the BSD and MIT licenses do allow.\nThere is ongoing debate (and that should be ongoing) on how much <em>freedom</em> a license must provide to be called Open. The whole OpenAccess\ndiscussion is similar (see e.g. <a href=\"http://www.google.com/search?q=strong+weak+open+access+site%3Awwmm.ch.cam.ac.uk&amp;btnG=Search\">Peter’s story on this</a>),\nwhere the discussion on the minimal amount of freedom is even worse.</p>\n\n<p>Should we worry about ChemSpider being ‘only’ CC-BY-SA? Maybe. 
Data is not software, but I disagree that a viral license would be OK for software but NOT for data. That’s just BSD-versus-GPL all over again. I am happy about OpenBabel being GPL, and I am happy about ChemSpider being CC-BY-SA too.</p>\n\n<p>All that said, these discussions are important. And creating good definitions of what freedoms are required is crucial in deciding whether something is Open. The Blue Obelisk does not have/use such definitions yet, and we should start discussing this, and define Blue Obelisk ODOSOS Guidelines. Please no funny jokes about how we can boogy then :)</p>\n\n<p>Now, looking forward to hearing what you think about these issues… Looking forward to the other blog items!</p>",
      "summary": "ChemSpider is afraid they are doing something bad because they release their data as CC-BY-SA. Because, John Wilbanks says in Peter’s blog:",
      
      "date_published": "2008-05-10T00:00:00+00:00",
      "date_modified": "2008-05-10T00:00:00+00:00",
      "tags": ["chemspider","copyright","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g89pc-kz779",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/05/08/re-what-should-nature-chemistry-paper.html",
      "title": "Re: What should a Nature Chemistry paper look like?",
      "content_html": "<p><a href=\"http://blogs.nature.com/thescepticalchymist/author/neil_withers/\">Neil</a> <a href=\"http://blogs.nature.com/thescepticalchymist/2008/05/jj_day_98_service_with_a_simpl.html\">wondered</a>\n<em>“what a <a href=\"http://www.nature.com/nchem/\">Nature Chemistry</a> paper should look like”</em>, and asked the following questions.\nBelow are my answers.</p>\n\n<p><strong>1. HTML vs PDF: does anyone read the HTML articles? Do you read the PDF on-screen or print it out?</strong> <br />\nI typically read the HTML to scan if a paper is interesting for me. But because electronic paper is still too\nexpensive, I typically make a print of the PDF. I would love to print the HTML instead, if only it was not clouded\nwith advertisement, link menu’s etc. Many websites have a ‘Print View’ with just the content. Nicely layed out,\nbut without the menus/etc. NC should adopt this feature (or did I miss that option?).</p>\n\n<p><strong>2. Big vs little graphics: what does everyone else think about the tiny size of the graphics in ACS html articles?</strong> <br />\nI hate the small figures, because they make scanning the HTML more difficult.</p>\n\n<p><strong>3a. Tagging/’semantic web’: what do you think about the toys on the RSC’s Project Prospect?</strong> <br />\nI love tagging and semantic work up. Just browse my blog. I blogged a bit about <a href=\"http://chem-bla-ics.blogspot.com/search?q=project+prospect\">Project Prospect</a> <!-- keep link -->\nin the past, and also about using <a href=\"http://chem-bla-ics.blogspot.com/search?q=RDFa\">RDFa for semantic markup of chemistry</a>.  <!-- keep link -->\nI must also mention the nice semantic work by the <a href=\"http://www.beilstein-journals.org/\">Beilstein Journal</a>. Check the HTML\nsource for all the semantics and the link to the papers RDF version. 
I discussed some of that work\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/04/22/quality-publishing-endnote-versus.html\">earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p><strong>3b. What kind of things would you like to see tagged/linked to other content in Nature Chemistry?</strong> <br />\nI’d really like to see Nature pick up social tagging. For example, <a href=\"http://blogs.nature.com/wp/nascent/\">Euan/Ian/etc</a>\ncan tell you how tags from blogs/etc can be used to find other relevant literature. Show\n<a href=\"http://www.connotea.org/\">Connotea</a> tags for NC papers on the NC website. Show related literature based on tag matching.\nI also recommend taking advantage of <a href=\"http://www.postgenomic.com/\">Postgenomic.com</a> and <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a>\nto complement papers with user comments, or at least link to them (just like linking to <a href=\"http://www.f1000biology.com/browse/\">F1000</a>).\nRegarding domain knowledge: link to whatever open databases are present, and encourage authors to provide links to public databases,\ne.g. by providing InChIs for molecules they describe, PDB identifiers, etc, etc.</p>\n\n<p><strong>4. 3D molecular structures: do these help your understanding of a paper?</strong> <br />\nAbsolutely! Henry Rzepa and Christopher Braddock recently showed how one can take advantage of\n<a href=\"http://www.jmol.org/\">Jmol</a> to explain what is going on (doi:<a href=\"https://doi.org/10.1021/np0705918\">10.1021/np0705918</a>),\nbut the ACS forgot to make it part of the main text :) A brilliant recent use of Jmol in explaining chemistry is\n<a href=\"http://proteopedia.org/wiki/index.php/Main_Page\">Proteopedia</a>, which uses <em>Jmol scripts</em> to visualize statements\nin the textual description in the wiki.</p>\n\n<p><strong>5. 
How useful to you are InChIs and SMILES?</strong> <br />\nWhile there is an <a href=\"http://opensmiles.org/\">OpenSMILES</a> project (part of the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk movement</a>)\nto standardize SMILES, I’d go for InChI, and InChIKey if you mind the length of the InChI itself.</p>\n\n<p><strong>6. Forward linking: do you use it? Would you use an RSS feed that alerted you to new citations of a particular paper.</strong> <br />\nI am not sure what forward linking is, so cannot comment on that. However, I would use RSS feeds to alert me of new citations of\na particular paper. Right now, I am relying on <a href=\"http://scientific.thomson.com/products/wos/\">Web-of-Science</a> to do this for me,\nbut RSS feeds are an excellent alternative. BTW, I was not aware of such feeds yet, and they could use some advertisement!</p>\n\n<p><strong>7. Would you actually comment on papers if there was a comments box at the end?</strong> <br />\nNo, I would rather comment in my blog instead. That would place the comments in some perspective. See also my comment\non question 3b.</p>\n\n<p><strong>8. We really like the <a href=\"http://www.biochemj.org/bj/ev/381/0329/bj3810329_ev.htm\">Biochemical Society’s HTML article style</a> – do you?</strong> <br />\nNo, please do not inherit that layout. The use of frames should be discouraged anyway. It seems to be used to easily add\ninteractivity, but I am positive that Ajax/etc can be used to do all this inline.</p>",
      "summary": "Neil wondered “what a Nature Chemistry paper should look like”, and asked the following questions. Below are my answers.",
      
      "date_published": "2008-05-08T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["publishing","chemistry","inchi"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/np0705918", "doi": "10.1021/np0705918"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fwsac-99191",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/05/03/wicked-chemistry-and-unit-testing.html",
      "title": "Wicked chemistry and unit testing",
      "content_html": "<p>After a discussion on starting development releases for <a href=\"http://cdk.sf.net/\">CDK</a> on <a href=\"https://lists.sourceforge.net/lists/listinfo/cdk-devel\">cdk-devel</a>,\nthe discussion continued on the state of the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html\">CDK atom typer <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\n<a href=\"http://dtp.nci.nih.gov/branches/itb/itb_index.html\">Dan</a> and <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/\">Rajarshi</a>\nhave done tests in the past against <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> and its DTP/NCI subset. Rajarshi made his\nanalysis part of <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">CDK Nightly</a>,\nand provides but <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">a summary</a>\n(which seems broken: zero fails) and a <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/tmp/dtp-atype-report.txt\">detailed list</a>.</p>\n\n<p>Dan, do I understand correctly that those <em>Structure Evaluation:No Comparision - Unparameterized Atom - S.</em> lines in the\n<strong>Depositor-Supplied Comments</strong> section on PubChem are based on CDK trunk? That would be a great honor! Anyways…</p>\n\n<p>The amount of atom types we use to describe the chemistry we observe is overwhelming (even without charged or radical atoms).\nAnd, most atom type lists are quite limited in what they represent. However, having an explicit list allows the computer to\ndecide if it can do reasonable calculations on a structure. <strong>Always filter your data to screen for unrecognized atom types,\nbefore heading of to, for example, QSAR calculations!</strong></p>\n\n<p>Now, many fails are because of the incomplete CDK atom type list (e.g. 
Au in <a href=\"http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=413374\">SID:413374</a>),\nor because the atom typer code has a bug (e.g. <a href=\"http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=403517\">SID:403517</a>).\nAnd these screenings against PubChem provide a nice priority list. However, others are either because the SDF format used\ncannot represent the chemistry (e.g. <a href=\"http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=420394\">SID:420394</a>), or the\nentry is plain wrong (e.g. <a href=\"http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=301178\">SID:301178</a>). The latter\ntwo types of fails, I am annotating using <a href=\"http://del.icio.us/egonw/pubchem+check-valency\">del.icio.us/egonw/pubchem+check-valency</a>\nfor others to comment on (just tag the same page using <a href=\"http://del.icio.us/\">del.icio.us</a>, and I’ll see the comments show up).</p>\n\n<h2 id=\"unit-testing\">Unit testing</h2>\n\n<p>For the first two types of fails, basically three things need to be done:</p>\n\n<ul>\n  <li>add the atom type to <a href=\"http://cdk.svn.sourceforge.net/viewvc/*checkout*/cdk/cdk/trunk/src/main/org/openscience/cdk/config/data/cdk_atomtypes.xml?content-type=text%2Fxml\">the ontology</a></li>\n  <li>write a unit test for <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/src/test/org/openscience/cdk/atomtype/CDKAtomTypeMatcherTest.java?view=log\">CDKAtomTypeMatcherTest</a></li>\n  <li>add perception code to <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/atomtype/CDKAtomTypeMatcher.html\">CDKAtomTypeMatcher</a></li>\n</ul>\n\n<p>Because we cannot use <a href=\"http://www.opensmiles.org/\">SMILES</a> or file readers for writing these tests (then we would get confounding\nof error sources), we have to hard-code the chemical structure, which may be a bit cumbersome.</p>\n\n<p>Unless you use the <a 
href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/io/CDKSourceCodeWriter.html\">CDKSourceCodeWriter</a>!\nThis <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/io/IChemObjectWriter.html\">IChemObjectWriter</a>\ncreates CDK source code, starting with an IMolecule. Now, because our bug reports are derived from fails against the PubChem\nscreening, we can simply use this <a href=\"http://www.beanshell.org/\">BeanShell</a> code to download a structure from\nPubChem and convert it to CDK source code:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"err\">#</span><span class=\"o\">!/</span><span class=\"n\">usr</span><span class=\"o\">/</span><span class=\"n\">bin</span><span class=\"o\">/</span><span class=\"n\">bsh</span>\n\n<span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.Molecule</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.io.MDLV2000Reader</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.io.CDKSourceCodeWriter</span><span class=\"o\">;</span>\n\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">bsh</span><span class=\"o\">.</span><span class=\"na\">args</span><span class=\"o\">.</span><span class=\"na\">length</span> <span class=\"o\">==</span> <span class=\"mi\">0</span> <span class=\"o\">||</span> <span class=\"n\">bsh</span><span class=\"o\">.</span><span class=\"na\">args</span><span class=\"o\">[</span><span class=\"mi\">0</span><span class=\"o\">]</span> <span class=\"o\">==</span> <span class=\"kc\">null</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"nc\">System</span><span class=\"o\">.</span><span class=\"na\">out</span><span class=\"o\">.</span><span class=\"na\">println</span><span 
class=\"o\">(</span><span class=\"s\">\"Syntax: pubchem2unittest.bsh [CID]\\n\"</span><span class=\"o\">);</span>\n  <span class=\"nc\">System</span><span class=\"o\">.</span><span class=\"na\">exit</span><span class=\"o\">(</span><span class=\"mi\">0</span><span class=\"o\">);</span>\n<span class=\"o\">}</span>\n\n<span class=\"nc\">String</span> <span class=\"n\">cid</span> <span class=\"o\">=</span> <span class=\"n\">bsh</span><span class=\"o\">.</span><span class=\"na\">args</span><span class=\"o\">[</span><span class=\"mi\">0</span><span class=\"o\">];</span>\n<span class=\"nc\">String</span> <span class=\"n\">urlString</span> <span class=\"o\">=</span> <span class=\"s\">\"http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?disopt=SaveSDF&amp;cid=\"</span> <span class=\"o\">+</span> <span class=\"n\">cid</span><span class=\"o\">;</span>\n\n<span class=\"no\">URL</span> <span class=\"n\">url</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"no\">URL</span><span class=\"o\">(</span><span class=\"n\">urlString</span><span class=\"o\">);</span>\n\n<span class=\"nc\">MDLV2000Reader</span> <span class=\"n\">reader</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">MDLV2000Reader</span><span class=\"o\">(</span><span class=\"n\">url</span><span class=\"o\">.</span><span class=\"na\">openStream</span><span class=\"o\">());</span>\n<span class=\"nc\">Molecule</span> <span class=\"n\">mol</span> <span class=\"o\">=</span> <span class=\"n\">reader</span><span class=\"o\">.</span><span class=\"na\">read</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Molecule</span><span class=\"o\">());</span>\n\n<span class=\"nc\">StringWriter</span> <span class=\"n\">stringWriter</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">StringWriter</span><span class=\"o\">();</span>\n<span class=\"nc\">CDKSourceCodeWriter</span> <span class=\"n\">writer</span> <span 
class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">CDKSourceCodeWriter</span><span class=\"o\">(</span><span class=\"n\">stringWriter</span><span class=\"o\">);</span>\n<span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">write</span><span class=\"o\">(</span><span class=\"n\">mol</span><span class=\"o\">);</span>\n<span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">close</span><span class=\"o\">();</span>\n\n<span class=\"nc\">System</span><span class=\"o\">.</span><span class=\"na\">out</span><span class=\"o\">.</span><span class=\"na\">print</span><span class=\"o\">(</span><span class=\"n\">stringWriter</span><span class=\"o\">.</span><span class=\"na\">toString</span><span class=\"o\">());</span>\n</code></pre></div></div>\n\n<p>For example, I am currently debugging a sulphur atom type perception problem, for which the simplest\nsubstructure looks like (<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/05/03/wicked-chemistry-and-unit-testing.html\">sid=12279910 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<code class=\"language-plaintext highlighter-rouge\">InChI=1/C2H7NS/c1-4(2)3/h3H,1-2H3)</code>:</p>\n\n<p><img src=\"/assets/images/sid12279910.png\" alt=\"\" /></p>\n\n<p>I can convert this PubChem entry to CDK source code with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>bsh <span class=\"nt\">-classpath</span> dist/jar/cdk-svn-20080221.jar tools/pubchem2unittest.bsh 12279910\n</code></pre></div></div>\n\n<p>Resulting in this output which I can copy/paste into my unit test:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">IMolecule</span> <span class=\"n\">mol</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Molecule</span><span class=\"o\">();</span>\n<span 
class=\"nc\">IAtom</span> <span class=\"n\">a1</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"S\"</span><span class=\"o\">);</span>\n<span class=\"n\">a1</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">2.866</span><span class=\"o\">,</span> <span class=\"mf\">0.25</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a1</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a2</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"N\"</span><span class=\"o\">);</span>\n<span class=\"n\">a2</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">3.7321</span><span class=\"o\">,</span> <span class=\"mf\">0.75</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a2</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a3</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n<span class=\"n\">a3</span><span 
class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">2.0</span><span class=\"o\">,</span> <span class=\"mf\">0.75</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a3</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a4</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n<span class=\"n\">a4</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">2.866</span><span class=\"o\">,</span> <span class=\"o\">-</span><span class=\"mf\">0.75</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a4</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a5</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"H\"</span><span class=\"o\">);</span>\n<span class=\"n\">a5</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">2.31</span><span class=\"o\">,</span> <span class=\"mf\">1.2869</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span 
class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a5</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a6</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"H\"</span><span class=\"o\">);</span>\n<span class=\"n\">a6</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">1.4631</span><span class=\"o\">,</span> <span class=\"mf\">1.06</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a6</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a7</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"H\"</span><span class=\"o\">);</span>\n<span class=\"n\">a7</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">1.69</span><span class=\"o\">,</span> <span class=\"mf\">0.2131</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a7</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a8</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span 
class=\"o\">(</span><span class=\"s\">\"H\"</span><span class=\"o\">);</span>\n<span class=\"n\">a8</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">2.246</span><span class=\"o\">,</span> <span class=\"o\">-</span><span class=\"mf\">0.75</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a8</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a9</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"H\"</span><span class=\"o\">);</span>\n<span class=\"n\">a9</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">2.866</span><span class=\"o\">,</span> <span class=\"o\">-</span><span class=\"mf\">1.37</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a9</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a10</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"H\"</span><span class=\"o\">);</span>\n<span class=\"n\">a10</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">3.486</span><span 
class=\"o\">,</span> <span class=\"o\">-</span><span class=\"mf\">0.75</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a10</span><span class=\"o\">);</span>\n<span class=\"nc\">IAtom</span> <span class=\"n\">a11</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"H\"</span><span class=\"o\">);</span>\n<span class=\"n\">a11</span><span class=\"o\">.</span><span class=\"na\">setPoint2d</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Point2d</span><span class=\"o\">(</span><span class=\"mf\">4.269</span><span class=\"o\">,</span> <span class=\"mf\">0.44</span><span class=\"o\">));</span>  <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">a11</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b1</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a1</span><span class=\"o\">,</span> <span class=\"n\">a2</span><span class=\"o\">,</span> <span class=\"no\">DOUBLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b1</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b2</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a1</span><span 
class=\"o\">,</span> <span class=\"n\">a3</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b2</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b3</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a1</span><span class=\"o\">,</span> <span class=\"n\">a4</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b3</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b4</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a2</span><span class=\"o\">,</span> <span class=\"n\">a11</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b4</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b5</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a3</span><span class=\"o\">,</span> <span class=\"n\">a5</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span 
class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b5</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b6</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a3</span><span class=\"o\">,</span> <span class=\"n\">a6</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b6</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b7</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a3</span><span class=\"o\">,</span> <span class=\"n\">a7</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b7</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b8</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a4</span><span class=\"o\">,</span> <span class=\"n\">a8</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b8</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span 
class=\"n\">b9</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a4</span><span class=\"o\">,</span> <span class=\"n\">a9</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b9</span><span class=\"o\">);</span>\n<span class=\"nc\">IBond</span> <span class=\"n\">b10</span> <span class=\"o\">=</span> <span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">getBuilder</span><span class=\"o\">().</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">a4</span><span class=\"o\">,</span> <span class=\"n\">a10</span><span class=\"o\">,</span> <span class=\"no\">SINGLE</span><span class=\"o\">);</span>\n<span class=\"n\">mol</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">b10</span><span class=\"o\">);</span>\n</code></pre></div></div>",
      "summary": "After a discussion on starting development releases for CDK on cdk-devel, the discussion continued on the state of the CDK atom typer . Dan and Rajarshi have done tests in the past against PubChem and its DTP/NCI subset. Rajarshi made his analysis part of CDK Nightly, and provides but a summary (which seems broken: zero fails) and a detailed list.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sid12279910.png",
      "date_published": "2008-05-03T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk","pubchem"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/frwk8-r6898",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/05/02/comparing-junit-test-results-between.html",
      "title": "Comparing JUnit test results between CDK trunk/ and a branch #2",
      "content_html": "<p>I reported earlier on how to compare unit test results <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/07/comparing-junit-test-results-between.html\">between CDK trunk and a branch <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nLater, I noted that the diff typically overestimates the fail count, when unit tests had been moved to a different module. Therefore, a sort has\nto be added. The code is also updated for the SVN directory restructuring:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>cdk cdk/\n<span class=\"nv\">$ </span>cdk trunk/\n<span class=\"nv\">$ </span>ant <span class=\"nt\">-lib</span> develjar/junit-4.3.1.jar <span class=\"nt\">-logfile</span> ant.log test-all\n<span class=\"nv\">$ </span><span class=\"nb\">cd</span> ../branches/miguelrojasch-CMLReact\n<span class=\"nv\">$ </span>ant <span class=\"nt\">-lib</span> develjar/junit-4.3.1.jar <span class=\"nt\">-logfile</span> ant.log test-all\n<span class=\"nv\">$ </span><span class=\"nb\">cd</span> ..\n<span class=\"nv\">$ </span><span class=\"nb\">grep </span>Testcase trunk/reports/<span class=\"k\">*</span>.txt | <span class=\"nb\">cut</span> <span class=\"nt\">-d</span><span class=\"s1\">':'</span> <span class=\"nt\">-f2</span>,3 | <span class=\"nb\">sort</span> <span class=\"o\">&gt;</span> trunk.results\n<span class=\"nv\">$ </span><span class=\"nb\">grep </span>Testcase branches/miguelrojasch-CMLReact/reports/<span class=\"k\">*</span>.txt | <span class=\"nb\">cut</span> <span class=\"nt\">-d</span><span class=\"s1\">':'</span> <span class=\"nt\">-f2</span>,3 | <span class=\"nb\">sort</span> <span class=\"o\">&gt;</span> branch.results\n<span class=\"nv\">$ </span>diff <span class=\"nt\">-u</span> trunk.results branch.results\n</code></pre></div></div>\n\n<p>Obviously, you can still use wc for counting changes:</p>\n\n<div class=\"language-shell 
highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>diff <span class=\"nt\">-u</span> trunk.results branch.results | <span class=\"nb\">grep</span> <span class=\"s2\">\"^-Testcase\"</span> | <span class=\"nb\">wc</span> <span class=\"nt\">-l</span>\n<span class=\"nv\">$ </span>diff <span class=\"nt\">-u</span> trunk.results branch.results | <span class=\"nb\">grep</span> <span class=\"s2\">\"^+Testcase\"</span> | <span class=\"nb\">wc</span> <span class=\"nt\">-l</span>\n</code></pre></div></div>\n\n<p>A second improvement would be to take advantage of the ant.log files that are created anyway. Using the\n<a href=\"http://www.beanshell.org/\">BeanShell</a> tool <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/tools/extractTestStats.bsh?revision=10760&amp;view=markup\">tools/extractTestStats.bsh <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nrevision 10760 (see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/02/27/cdk-is-now-available-from-your-nearest.html\">this blog on bsh</a>):</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>bsh trunk/tools/extractTestStats.bsh trunk/ant.log | <span class=\"nb\">grep </span>run | <span class=\"nb\">grep</span> <span class=\"nt\">-v</span> total | <span class=\"nb\">grep</span> <span class=\"nt\">-v</span> antlogFile | <span class=\"nb\">cut</span> <span class=\"nt\">-d</span><span class=\"s1\">' '</span> <span class=\"nt\">-f1-4</span> | <span class=\"nb\">sort</span> <span class=\"o\">&gt;</span> trunk.overview\n<span class=\"nv\">$ </span>bsh trunk/tools/extractTestStats.bsh branches/miguelrojasch-CMLReact/ant.log | <span class=\"nb\">grep </span>run | <span class=\"nb\">grep</span> <span class=\"nt\">-v</span> total | <span class=\"nb\">grep</span> <span class=\"nt\">-v</span> antlogFile | <span class=\"nb\">cut</span> <span class=\"nt\">-d</span><span 
class=\"s1\">' '</span> <span class=\"nt\">-f1-4</span> | <span class=\"nb\">sort</span> <span class=\"o\">&gt;</span> branch.overview\n<span class=\"nv\">$ </span>diff <span class=\"nt\">-u</span> trunk.overview branch.overview\n</code></pre></div></div>",
      "summary": "I reported earlier on how to compare unit test results between CDK trunk and a branch . Later, I noted that the diff typically overestimates the fail count, when unit tests had been moved to a different module. Therefore, a sort has to be added. The code is also updated for the SVN directory restructuring:",
      
      "date_published": "2008-05-02T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk","junit"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/13cvj-rgs34",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/30/metware-skos-and-java-server-faces.html",
      "title": "MetWare, SKOS and Java Server Faces",
      "content_html": "<p>The <a href=\"http://metware.sf.net/\">MetWare</a> components are slowly coming together. The RAW data upload facility prototype went into beta stage,\nwhile the <a href=\"http://chem-bla-ics.blogspot.com/search?q=SKOS\">SKOS</a> has proven really useful for various things. <!-- keep link --></p>\n\n<p>Because of being compatible with various Java libraries and tools, we decided some time ago to use Java. We also wanted to start of with a\nHTML GUI to MetWare, which led us to <a href=\"http://java.sun.com/javaee/javaserverfaces/\">Java Server Faces</a>. Not being so fond of Tomcat (e.g.\nuse by the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a>), I was not sure how that would turn out, but Steffen was rather positive about it. And I like it :)</p>\n\n<p><img src=\"/assets/images/metwareJSF2.png\" alt=\"\" /></p>\n\n<p>The source code for this screenshot is rather simple:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;table&gt;</span>\n  <span class=\"nt\">&lt;tr</span> <span class=\"na\">valign=</span><span class=\"s\">\"top\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;td&gt;&lt;br/&gt;</span>Monoisotopic Mass:<span class=\"nt\">&lt;br/&gt;</span>\n          min=<span class=\"nt\">&lt;inputText</span> <span class=\"na\">id=</span><span class=\"s\">\"monomassmin\"</span> <span class=\"na\">value=</span><span class=\"s\">\"#{metidMetaboliteQuery.monoisotopicMassMin}\"</span><span class=\"nt\">/&gt;&lt;br/&gt;</span>\n          max=<span class=\"nt\">&lt;inputText</span> <span class=\"na\">id=</span><span class=\"s\">\"monomassmax\"</span> <span class=\"na\">value=</span><span class=\"s\">\"#{metidMetaboliteQuery.monoisotopicMassMax}\"</span><span class=\"nt\">/&gt;&lt;br/&gt;</span>\n        <span class=\"nt\">&lt;commandButton</span> <span class=\"na\">value=</span><span class=\"s\">\"Search\"</span> <span 
class=\"na\">id=</span><span class=\"s\">\"submit\"</span> <span class=\"na\">action=</span><span class=\"s\">\"#{metidMetaboliteQuery.search}\"</span><span class=\"nt\">/&gt;</span>\n\n        <span class=\"c\">&lt;!--  search results --&gt;</span>\n        <span class=\"nt\">&lt;p&gt;&lt;dataTable</span> <span class=\"na\">value=</span><span class=\"s\">\"#{metidMetaboliteQuery.results}\"</span> <span class=\"na\">var=</span><span class=\"s\">\"mbolite\"</span><span class=\"nt\">&gt;</span>\n           <span class=\"nt\">&lt;facet</span> <span class=\"na\">name=</span><span class=\"s\">\"caption\"</span><span class=\"nt\">&gt;</span>Search Results...<span class=\"nt\">&lt;/facet&gt;</span>\n           <span class=\"nt\">&lt;column&gt;</span>\n             <span class=\"nt\">&lt;facet</span> <span class=\"na\">name=</span><span class=\"s\">\"header\"</span><span class=\"nt\">&gt;&lt;outputText</span> <span class=\"na\">value=</span><span class=\"s\">\"Monoisotopic mass\"</span><span class=\"nt\">/&gt;&lt;/facet&gt;</span>\n             <span class=\"nt\">&lt;outputText</span> <span class=\"na\">value=</span><span class=\"s\">\"#{mbolite.monoisotopicMass}\"</span><span class=\"nt\">/&gt;</span>\n           <span class=\"nt\">&lt;/column&gt;</span>\n           <span class=\"nt\">&lt;column&gt;</span>\n             <span class=\"nt\">&lt;facet</span> <span class=\"na\">name=</span><span class=\"s\">\"header\"</span><span class=\"nt\">&gt;&lt;outputText</span> <span class=\"na\">value=</span><span class=\"s\">\"InChIKey\"</span><span class=\"nt\">/&gt;&lt;/facet&gt;</span>\n             <span class=\"nt\">&lt;outputText</span> <span class=\"na\">value=</span><span class=\"s\">\"#{mbolite.inchikey}\"</span><span class=\"nt\">/&gt;</span>\n           <span class=\"nt\">&lt;/column&gt;</span>\n         <span class=\"nt\">&lt;/dataTable&gt;&lt;/p&gt;</span>\n        <span class=\"nt\">&lt;/td&gt;</span>\n    <span class=\"nt\">&lt;td</span> <span 
class=\"na\">width=</span><span class=\"s\">\"25%\"</span><span class=\"nt\">&gt;</span>\n      <span class=\"nt\">&lt;b&gt;&lt;outputText</span> <span class=\"na\">id=</span><span class=\"s\">\"tabelName\"</span> <span class=\"na\">value=</span><span class=\"s\">\"#{metidMetabolite.prefLabel}\"</span><span class=\"nt\">/&gt;</span>:<span class=\"nt\">&lt;/b&gt;</span>\n      <span class=\"nt\">&lt;br/&gt;</span>\n      <span class=\"nt\">&lt;outputText</span> <span class=\"na\">id=</span><span class=\"s\">\"tabelDef\"</span> <span class=\"na\">value=</span><span class=\"s\">\"#{metidMetabolite.definition}\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;/td&gt;</span>\n  <span class=\"nt\">&lt;/tr&gt;</span>\n<span class=\"nt\">&lt;/table&gt;</span>\n</code></pre></div></div>\n\n<p>The key concept here is that JSF uses <a href=\"https://en.wikipedia.org/wiki/JavaBeans\">Java Beans</a>, which are referred to in the above example with code like <code class=\"language-plaintext highlighter-rouge\">#{bean.field}</code>\nfor bean fields, and with <code class=\"language-plaintext highlighter-rouge\">#{bean.method}</code> for bean methods, assuming a bean exists with <code class=\"language-plaintext highlighter-rouge\">getField()</code>, <code class=\"language-plaintext highlighter-rouge\">setField()</code> and <code class=\"language-plaintext highlighter-rouge\">method()</code>.\nThe <code class=\"language-plaintext highlighter-rouge\">&lt;h:outputText&gt;</code> elements are JSF tags that work out the bean details and create HTML in the output. So much for a really brief intro.</p>\n\n<h2 id=\"the-metware-beans\">The Metware Beans</h2>\n\n<p>It is clear that Java beans for Metware would be useful, and this is what I have been working on for the last few weeks.\nThe relevant beans for the above example are automagically created from the SKOS, complemented with extra bits of RDF\nfor the additional details, like field data type, mapping to SQL tables, and an example value. 
This all works very\nsmoothly (the code to <code class=\"language-plaintext highlighter-rouge\">load()</code> and <code class=\"language-plaintext highlighter-rouge\">save()</code> into the SQL database is automatically generated too!) as you can see in\nthe above example. The screenshot shows matches from a (local) live SQL metabolomics database. The text on the right\nside is directly taken from the SKOS.</p>\n\n<p>Now, the bean library allows integration with other tools too, though this cannot be found in our current roadmap.\nBut, for example, I have been thinking about a simple <a href=\"http://www.bioclipse.net/\">Bioclipse</a> wrapper around these\nbeans. What is on our roadmap involves workflows for metabolomics.</p>",
      "summary": "The MetWare components are slowly coming together. The RAW data upload facility prototype went into beta stage, while the SKOS has proven really useful for various things.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/metwareJSF2.png",
      "date_published": "2008-04-30T00:00:00+00:00",
      "date_modified": "2025-09-13T00:00:00+00:00",
      "tags": ["metware","java","bioclipse","skos","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gqaxm-b3d59",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/28/jchempaint-for-bioclipse2.html",
      "title": "JChemPaint for Bioclipse2",
      "content_html": "<p>Today <a href=\"http://bioclipse.blogspot.com/\">Ola</a>, Jonathan and I have a mini-hack session on getting <a href=\"http://jchempaint.sf.net/\">JChemPaint</a>\nsupport ported from Bioclipse1 to <a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse2\">Bioclipse2</a>. And, we made some progress:</p>\n\n<p><img src=\"/assets/images/jcpInBC2.png\" alt=\"\" /></p>\n\n<p>I’m sure there is still a lot to do, but this is promising… :)</p>\n\n<p>Oh, and BTW, this is based on the JChemPaint 2.3 / CDK 1.0.2 branch, in case you are interested in those details.</p>",
      "summary": "Today Ola, Jonathan and I have a mini-hack session on getting JChemPaint support ported from Bioclipse1 to Bioclipse2. And, we made some progress:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpInBC2.png",
      "date_published": "2008-04-28T00:10:00+00:00",
      "date_modified": "2008-04-28T00:10:00+00:00",
      "tags": ["bioclipse","jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/harap-fxc33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/28/blog-comments-no-peer-reviews.html",
      "title": "Blog Comments? No, Peer Reviews!",
      "content_html": "<p>Via <a href=\"http://www.coronene.com/blog/\">Carbon-Based Curiosities</a>’s blogroll I found a number of new blogs (on top of the\n<a href=\"http://chemicalblogspace.blogspot.com/2008/04/new-blogs-9.html\">list I posted yesterday</a>), and just added them to\n<a href=\"http://cb.openmolcules.net/\">Chemical blogspace</a>. This is something I found in <a href=\"http://infiniflux.blogspot.com/\">Infiniflux!</a>:</p>\n\n<p><img src=\"/assets/images/peerReviews.png\" alt=\"\" /></p>\n\n<p>Blog comments? No, Peer Reviews! Nice thought, Joel! I’ll copy that, if you don’t mind.</p>",
      "summary": "Via Carbon-Based Curiosities’s blogroll I found a number of new blogs (on top of the list I posted yesterday), and just added them to Chemical blogspace. This is something I found in Infiniflux!:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/peerReviews.png",
      "date_published": "2008-04-28T00:00:00+00:00",
      "date_modified": "2008-04-28T00:00:00+00:00",
      "tags": ["blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/37jh5-ybw75",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/27/comments-on-rethinking-software-access.html",
      "title": "Comments on &apos;Rethinking software access&apos;",
      "content_html": "<p><a href=\"http://mndoci.com/blog/blog/\">bbgm</a> was <a href=\"http://mndoci.com/blog/2008/04/26/rethinking-software-access/\">rethinking software access</a>.\nThe blog observes:</p>\n\n<ol>\n  <li>current commercial licensing is unfriendly towards home science</li>\n  <li>bench tools do not easily allow mash ups</li>\n</ol>\n\n<h2 id=\"about-1\">About 1</h2>\n<p>Actually, much of the work I have been doing in opensource chemoinformatics was done as ‘home’ science; I started as organic chemist\nstudent, and later data analyst, while the CDK/Jmol/JChemPaint was something I did at home because I liked, and needed it. I started\nin 1995 working on a website to aid my organic chemistry studies, the <em>Woordenboek Organische Chemie</em> (open data). And, I needed\nsemantic tools for 2D and 3D display of molecular structure. Commercial offerings were not an option, for me as student, so I got\ninvolved with the <a href=\"http://cml.sourceforge.net/\">Chemical Markup Language</a>, <a href=\"http://www.jmol.org/\">Jmol</a> and\n<a href=\"http://jchempaint.sf.net/\">JChemPaint</a> in 1997-98.</p>\n\n<p>Note, that in that time free academic licenses were rarer than now. I always had, and still have, the feeling that those clauses are\njust there to give academics a reason to support non-opensource tools. Also note that a lot of commercial offerings started as\nincorporation of the code base of some PhD work. Not uncommonly, the PhD would simply be hired by the company.</p>\n\n<p>Fact is, commercial chemoinformatics licenses are indeed unfriendly for scientists who maintain related hobbies at home. And,\ngiven my experience, I appreciate your worries: the high costs for those tools, which I certainly could not afford with my student\nfunding, drove me to the opensource ideas many, many years ago.</p>\n\n<h2 id=\"about-2\">About 2</h2>\n<p>The second issue brought up, regards the ability to make mash ups. 
Open source and open standards are indeed important to make\nmash ups, though the former only helps you work around lack of use of open standards. Using web services contributes to the\nsolution as it has a well-defined, open standard interface. Open source is particularly important for reproducibility of scientific\nresults (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html\">my thesis <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), and is the opposite of\nproprietary software, not commercial software. So, it seems bbgm is just looking for\n<a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> projects.</p>\n\n<p>On a practical note, I think that <a href=\"http://www.bioclipse.net/\">Bioclipse</a> might just be what you are looking for: it integrates\nlocal services and services on the internet alike. Particularly, the upcoming\n<a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse2\">Bioclipse2</a> is strong at this, and supports\n<a href=\"http://wiki.bioclipse.net/index.php?title=Bc_webservices\">SOAP</a>, BioMart,\n<a href=\"http://wiki.bioclipse.net/index.php?title=BioMoby_plugin\">BioMoby</a> for online services (also\n<a href=\"http://bioclipse.blogspot.com/2008/03/general-service-infrastructure-in.html\">see this</a>), as well as\n<a href=\"http://www.r-project.org/\">R</a>, BioJava, CDK, Jmol as local services. You can even\n<a href=\"http://wiki.bioclipse.net/index.php?title=Run_Workflows_inside_Bioclipse\">run Taverna workflows</a> from within Bioclipse, if you\nlike. Mash ups can be done in various ways. Hardcore Java coders would go the RCP plugin way, for example\n<a href=\"http://bioclipse.blogspot.com/2008/04/jnanotube-nanotube-plugin-for-bioclipse.html\">this nanotube example</a>. Others will prefer\n<a href=\"http://bioclipse.blogspot.com/2008/01/complete.html\">scripting languages</a>, such as JavaScript and Ruby (in addition to R and\nJmol scripting). 
Or, you might record the things you did graphically as a script, using the\n<a href=\"http://bioclipse.blogspot.com/2008/03/recording-progress.html\">recording feature</a>.</p>\n\n<p>Of course, there are other solutions… Bioclipse is just one, one to which I contributed.</p>\n\n<h2 id=\"about-running-webservices\">About running webservices…</h2>\n<p>Running webservices basically means being a hosting provider, and requires some commercial model. One complicating problem is that,\nor so it is said, large groups within the potential user base, i.e. the pharma industry, do not even like sending\ntheir highly secret data over an <code class=\"language-plaintext highlighter-rouge\">httpS://</code> line to the outside world.</p>\n\n<p><a href=\"http://cheminfo.informatics.indiana.edu/~rguha/\">Rajarshi</a> and the rest of the Indiana group have been running chemoinformatics\nwebservices. They might be the provider you are looking for.</p>\n\n<h2 id=\"conclusion\">Conclusion</h2>\n<p>All I can say to bbgm: “Yes, your two thoughts are indeed issues, and many from within the Blue Obelisk community have been\naddressing them.” Oh, and we will not stop either. <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> recently gave in\nNature a nice overview of what we, Blue Obelisk members, have been cooking up:\n<a href=\"http://www.nature.com/nature/journal/v451/n7179/full/451648a.html\">Chemistry for Everyone</a>, and <em>that</em> includes the\nhobby scientist.</p>",
      "summary": "bbgm was rethinking software access. The blog observes:",
      
      "date_published": "2008-04-27T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["opensource","cml","jmol","jchempaint"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/451648a", "doi": "10.1038/451648a"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tkxq7-gn425",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/24/more-cdk-ruby-users.html",
      "title": "More CDK-Ruby users...",
      "content_html": "<p>Via <a href=\"http://depth-first.com/\">Rich</a>’ blog, I was <a href=\"http://depth-first.com/articles/2008/04/23/campdepict-building-a-simple-smiles-depict-web-application-with-jruby-structure-cdk-and-camping\">informed</a>\nabout the work by <a href=\"http://goeslightly.blogspot.com/\">goesLightly</a> on <a href=\"http://goeslightly.blogspot.com/2008/04/campdepict-jruby-cdk-and-camping.html\">CampDepict</a>,\na Ruby-based application which uses the <a href=\"http://cdk.sf.net/\">CDK</a> for SMILES parsing and 2D diagram generation. With\n<code class=\"language-plaintext highlighter-rouge\">cdk-20060714.jar</code> it’s using pretty ancient code, and I have not seen a screenshot.</p>\n\n<p>Anyway, it’s nice to see another blogger into the CDK :)</p>",
      "summary": "Via Rich’ blog, I was informed about the work by goesLightly on CampDepict, a Ruby-based application which uses the CDK for SMILES parsing and 2D diagram generation. With cdk-20060714.jar it’s using pretty ancient code, and I have not seen a screenshot.",
      
      "date_published": "2008-04-24T00:00:00+00:00",
      "date_modified": "2008-04-24T00:00:00+00:00",
      "tags": ["cdk","ruby"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jgk66-e2a52",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/22/quality-publishing-endnote-versus.html",
      "title": "Quality Publishing: EndNote versus InChIs",
      "content_html": "<p>Some publishers hesitate a bit, but others go full speed ahead into the electronic publishing era.\n<a href=\"http://baoilleach.blogspot.com/\">Noel</a> commented on my <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/04/21/open-access-open-data-leads-to-added.html\">post about OA/OD inviting added value <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<blockquote>\n  <p>I heard a talk by the <a href=\"http://www.rsc.org/\">RSC</a> at the ACS, saying that their RSS feeds contain\n<a href=\"http://www.iupac.org/inchi/\">InChI</a>s now! Just thought I’d throw that out there :-)</p>\n</blockquote>\n\n<p>The RSC <a href=\"http://www.rsc.org/Publishing/Journals/ProjectProspect/\">Project Prospect</a> is ahead of other\npublishers, for <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html\">over a year already <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nAdding InChIs to RSS feeds are a cheap way of adding machine-readable chemistry to ones publishing\npipeline; adding CML would allow much more detail (see\n<a href=\"http://chem-bla-ics.blogspot.com/search?q=CMLRSS\">this overview of CMLRSS information in my blog</a>). <!-- keep link --></p>\n\n<p>But, importantly, it allows third-parties to efficiently set up DOI-InChI tables. Cheap (Asian?) workers become\nrather expensive, when compared to machine mining to create such databases. Sure, the authoring becomes\nsomewhat more expensive, but who will argue that scientists might be a bit more precise in what they\npublish. 
I, for sure, would love to see authors focus on adding InChIs to experimental sections, rather\nthan on getting EndNote to put the comma, bold and upper casing in the right place, to meet\njournal standards.</p>\n\n<p>Another publisher who takes its job seriously is <a href=\"http://www.beilstein-journals.org/\">Beilstein</a>.\nStephan recently showed me some of the things they are up to, like information-rich figures (yes,\nyou’ll have access to the source, and can identify the molecular structures in reaction schemes). He\nalso showed me the RDF that is now available by default for all their articles. For example, for\nDOI:<a href=\"http://dx.doi.org/10.1186/1860-5397-3-50\">10.1186/1860-5397-3-50</a>,\n<a href=\"http://www.beilstein-journals.org/bjoc/content/rdf/1860-5397-3-50.rdf\">the RDF is available here</a>.\nIt’s indicated in the HTML with:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;link</span> <span class=\"na\">rel=</span><span class=\"s\">'alternate'</span> <span class=\"na\">type=</span><span class=\"s\">'text/rdf'</span> <span class=\"na\">title=</span><span class=\"s\">'RDF'</span> <span class=\"na\">href=</span><span class=\"s\">'http://www.beilstein-journals.org/bjoc/content/rdf/1860-5397-3-50.rdf'</span><span class=\"nt\">/&gt;</span>\n</code></pre></div></div>\n\n<p>There is, actually, also a lot of citation information available in the <code class=\"language-plaintext highlighter-rouge\">&lt;meta&gt;</code> tags in the HTML,\nbut apparently not the right stuff yet to have <a href=\"http://www.zotero.org/\">Zotero</a> pick it up nicely\n(not sure what this Firefox plugin is actually looking for). 
No chemistry in the RDF it seems,\nbut there are <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/10/my-foaf-network-3-my-publications.html\">BIBO <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/foaf\">FOAF <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and Dublin Core.</p>\n\n<p>Main suggestion to Stephan, right now, would be to include InChIs in the RDF and RSS feed.</p>\n\n<p>Disclaimer: <a href=\"http://chem-bla-ics.blogspot.com/search?q=project+prospect+CUBIC\">Colin</a>, behind <!-- keep link -->\nProject Prospect, visited our group when I was still in Cologne; Stephan contributed code bits\nto the CDK project, e.g. <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/math/Matrix.html\">this Matrix class</a>.</p>\n\n<p>Oh, <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/16/lunch-at-nature-hq-with-euan-joanna-ian.html\">Nature <i class=\"fa-solid fa-recycle fa-xs\"></i></a> is,\nof course, also a publisher who is actively getting into the electronic publishing age.</p>",
      "summary": "Some publishers hesitate a bit, but others go full speed ahead into the electronic publishing era. Noel commented on my post about OA/OD inviting added value :",
      
      "date_published": "2008-04-22T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["publishing","inchi","cml","rss","foaf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1860-5397-3-50", "doi": "10.1186/1860-5397-3-50"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1h7ay-3kv21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/21/open-access-open-data-leads-to-added.html",
      "title": "Open Access / Open Data leads to added value",
      "content_html": "<p>Two companies recently showed two things:</p>\n\n<ul>\n  <li>open access and open data allow adding value</li>\n  <li>adding value is easier by forking</li>\n</ul>\n\n<p><a href=\"http://depth-first.com/\">Rich</a>’ <a href=\"http://metamolecular.com/\">MetaMolecular</a> set up <a href=\"https://doi.org/10.59350/nef42-jrb26\">Chempedia <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwhich combines a substructure-searchable chemical <a href=\"http://wikipedia.org/\">Wikipedia</a>. There is also a\n<a href=\"http://chempedia.net/articles/new\">page to make links</a> to new Wikipedia monographs. Not sure why Rich chose CAS instead of the InChI,\ngiven the recent <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/09/chemical-object-identifier-or-freedom.html\">controversy on validity of CAS numbers in Wikipedia <i class=\"fa-solid fa-recycle fa-xs\"></i></a>…\nrealize that this page is for new monograph, of which the CAS number is likely not verified yet, or? On the other hand, the InChI or InChIKey is\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/16/molecules-in-wikipedia-without-inchis-3.html\">not so abundant in Wikipedia yet <i class=\"fa-solid fa-recycle fa-xs\"></i></a> (I really must make an updated list).</p>\n\n<p><a href=\"http://www.chemspider.com/\">ChemSpider</a> has been using a similar approach to add value to existing resources. The interesting thing in\nthis case, is that these substructure searchable versions, have an interesting spin off: it allows ChemSpider to build a valuable\nDOI-InChI table. 
So far, I spotted:</p>\n\n<ul>\n  <li><a href=\"http://iucr.chemspider.com/\">iucr.chemspider.com</a> (<a href=\"http://www.chemspider.com/blog/chemspider-rolls-out-website-connected-to-international-union-of-crystallography.html\">Antony’s story</a>)</li>\n  <li><a href=\"http://molbank.chemspider.com/\">molbank.chemspider.com</a> (<a href=\"http://www.chemspider.com/blog/one-more-dedicated-chemspider-website-molbank.html\">Antony’s story</a>)</li>\n  <li><a href=\"https://web.archive.org/web/20081121192530/http://motd.chemspider.com/Chemical-Structure.1020.html\">motd.chemspider.com <i class=\"fa-solid fa-archive fa-xs\"></i></a> (<a href=\"http://www.chemspider.com/blog/dedicated-search-pages-for-subsets-of-data.html\">Antony’s story</a>)</li>\n</ul>\n\n<p>If you wonder how to integrate all data again when things are so distributed, just consider\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/12/21/christmas-presents.html\">userscripts <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "Two companies recently showed two things:",
      
      "date_published": "2008-04-21T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["chempedia","openscience","chemspider","rdf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/nef42-jrb26", "doi": "10.59350/nef42-jrb26"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ea32r-wsp54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/15/make-all-research-results-cc-by.html",
      "title": "&quot;Make all research results CC-BY&quot;",
      "content_html": "<p>While I do not agree in details on the statement made by <a href=\"http://archiv.twoday.net/\">Klaus</a>, I agree with his intentions,\nand happy to propagate the <a href=\"http://archiv.twoday.net/stories/4851871/\">mantra</a>, like\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1038\">others</a> <a href=\"http://researchremix.wordpress.com/2008/04/10/make-all-research-results-cc-by/\">did</a>\nbefore me:</p>\n\n<blockquote>\n  <p>MAKE ALL RESEARCH RESULTS CC-BY</p>\n</blockquote>\n\n<p>The details I disagree with:</p>\n\n<ul>\n  <li>no need for shouting; we can all perfectly well read it in lower case</li>\n  <li>CC-BY is not required; any open data license will do</li>\n</ul>\n\n<p>Now, I know some of you disagree, and I understand the costs for maintaining and curating a database. But, if all\nresearch results would be freely available, these costs can be shared by the community, and we could\n<em>all stand on the shoulders of giants</em>.</p>",
      "summary": "While I do not agree in details on the statement made by Klaus, I agree with his intentions, and happy to propagate the mantra, like others did before me:",
      
      "date_published": "2008-04-15T00:00:00+00:00",
      "date_modified": "2008-04-15T00:00:00+00:00",
      "tags": ["openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7tfeg-dvs86",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/09/metware-developers-meeting-in-halle.html",
      "title": "The MetWare developers meeting in Halle",
      "content_html": "<p>Today starts the <a href=\"http://metware.sf.net/\">MetWare</a> developers meeting, hosted by <a href=\"http://www.ipb-halle.de/de/forschung/stress-und-entwicklungsbiologie/forschungsgruppen/bioinformatik-massenspektrometrie/\">Steffen Neumann</a>,\nat the <a href=\"http://www.ipb-halle.de/de/\">Leibniz-Institut für Pflanzenbiochemie</a>. Steffen’s group and the <a href=\"http://www.ab.wur.nl/\">Applied Bioinformatics</a>\ngroup where I now work, are co-developing an opensource platform for metabolomics data management. Not really a full LIMS system,\nbut a system to keep track of all the facts about the experiments and samples we would use when analyzing the data in order to\nfind new chemistry, biomarkers, etc (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/22/metware-metabolomics-database-project.html\">this earlier blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ntoo). Good news is, that BioAssist is developing a support platform for the <a href=\"http://www.metabolomicscentre.nl/\">NMC</a>,\nand plans to use MetWare as a main component.</p>\n\n<p>OK, off to catch my train now. See you online (#metware @ irc.freenode.net); the wiki has an\n<a href=\"http://metware.wiki.sourceforge.net/AgendaApril08\">agenda for the meeting</a>.</p>",
      "summary": "Today starts the MetWare developers meeting, hosted by Steffen Neumann, at the Leibniz-Institut für Pflanzenbiochemie. Steffen’s group and the Applied Bioinformatics group where I now work, are co-developing an opensource platform for metabolomics data management. Not really a full LIMS system, but a system to keep track of all the facts about the experiments and samples we would use when analyzing the data in order to find new chemistry, biomarkers, etc (see this earlier blog too). Good news is, that BioAssist is developing a support platform for the NMC, and plans to use MetWare as a main component.",
      
      "date_published": "2008-04-09T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["metware","metabolomics"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/f2ajz-tmr63",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/07/cdkmetabolomicschemometrics.html",
      "title": "The CDK/Metabolomics/Chemometrics Unconference results",
      "content_html": "<p>As <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/04/03/t-plus-18-hours-dr-and-preparing-for.html\">announced earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, Miguel, Velitchka,\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a> and I held a small <a href=\"http://cdk.sf.net/\">CDK</a>/Metabolomics/Chemometrics\nunconference. We started late, and did not have an evening program, resulting in not overly much results. However, we did do\n<em><a href=\"http://chem-bla-ics.blogspot.com/search?q=molecular+chemometrics\">molecular chemometrics</a></em>. <!-- keep link --></p>\n\n<p>We used the <a href=\"http://www.r-project.org/\">R statistics software</a> together with Rajarshi’s <a href=\"http://cran.r-project.org/web/packages/rcdk/index.html\">rcdk</a>\npackage (an R wrapper around the CDK library) and Ron’s (my PhD supervisor) <a href=\"http://cran.r-project.org/web/packages/pls/index.html\">PLS</a>\npackage (see <a href=\"http://www.jstatsoft.org/v18/i02/\">this paper</a>), to predict retention indices for a number of metabolites.</p>\n\n<p>We ended up with this R script:</p>\n\n<div class=\"language-R highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">library</span><span class=\"p\">(</span><span class=\"s2\">\"rJava\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">library</span><span class=\"p\">(</span><span class=\"s2\">\"rcdk\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">library</span><span class=\"p\">(</span><span class=\"s2\">\"pls\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">mols</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"n\">load.molecules</span><span class=\"p\">(</span><span class=\"s2\">\"data_cdk.sdf\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span 
class=\"n\">selection</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"n\">get.desc.names</span><span class=\"p\">()</span><span class=\"w\">\n</span><span class=\"n\">selection</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"n\">selection</span><span class=\"p\">[</span><span class=\"o\">-</span><span class=\"n\">which</span><span class=\"p\">(</span><span class=\"n\">selection</span><span class=\"o\">==</span><span class=\"s2\">\"org.openscience.cdk.qsar.descriptors.molecular.AminoAcidCountDescriptor\"</span><span class=\"p\">)]</span><span class=\"w\">\n</span><span class=\"n\">x</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"n\">eval.desc</span><span class=\"p\">(</span><span class=\"n\">mols</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">selection</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">verbose</span><span class=\"o\">=</span><span class=\"kc\">TRUE</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">x2</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"n\">x</span><span class=\"p\">[,</span><span class=\"n\">apply</span><span class=\"p\">(</span><span class=\"n\">x</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"m\">2</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"k\">function</span><span class=\"p\">(</span><span class=\"n\">a</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"nf\">all</span><span class=\"p\">(</span><span class=\"o\">!</span><span class=\"nf\">is.na</span><span class=\"p\">(</span><span class=\"n\">a</span><span class=\"p\">))})]</span><span class=\"w\">\n</span><span class=\"n\">y</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> 
</span><span class=\"n\">read.table</span><span class=\"p\">(</span><span class=\"s2\">\"data_cdk_RI\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">input</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"n\">data.frame</span><span class=\"p\">(</span><span class=\"n\">x2</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">y</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">pls.model</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"n\">plsr</span><span class=\"p\">(</span><span class=\"n\">V1</span><span class=\"w\"> </span><span class=\"o\">~</span><span class=\"w\"> </span><span class=\"n\">.</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"m\">50</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">data</span><span class=\"o\">=</span><span class=\"n\">input</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">validation</span><span class=\"o\">=</span><span class=\"s2\">\"CV\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">summary</span><span class=\"p\">(</span><span class=\"n\">pls.model</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">plot</span><span class=\"p\">(</span><span class=\"n\">RMSEP</span><span class=\"p\">(</span><span class=\"n\">pls.model</span><span class=\"p\">))</span><span class=\"w\">\n</span><span class=\"n\">plot</span><span class=\"p\">(</span><span class=\"n\">pls.model</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">ncomp</span><span class=\"o\">=</span><span class=\"m\">20</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">abline</span><span class=\"p\">(</span><span class=\"m\">0</span><span class=\"p\">,</span><span class=\"m\">1</span><span class=\"p\">,</span><span 
class=\"w\"> </span><span class=\"n\">col</span><span class=\"o\">=</span><span class=\"s2\">\"red\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">plot</span><span class=\"p\">(</span><span class=\"n\">pls.model</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\"loadings\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">comps</span><span class=\"o\">=</span><span class=\"m\">1</span><span class=\"o\">:</span><span class=\"m\">2</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">savehistory</span><span class=\"p\">(</span><span class=\"s2\">\"finalHistory.R\"</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>The <code class=\"language-plaintext highlighter-rouge\">AminoAcidCountDescriptor</code> threw us a <code class=\"language-plaintext highlighter-rouge\">NullPointerException</code> and there were a few NAs in the resulting matrix. The CV results were\nnot so good as Velitchka’s best models, but still a good start:</p>\n\n<p><img src=\"/assets/images/riPred.png\" alt=\"\" /></p>\n\n<p>No variable selection; 200 objects, 190 variables.</p>\n\n<p>Questions:</p>\n\n<ul>\n  <li>Can we do this in <a href=\"http://www.bioclipse.net/\">Bioclipse2</a> too?</li>\n  <li>Can we improve the default CDK descriptor parameters to maximize the column count?</li>\n  <li>Rajarshi, what would be involved to write some wrapper code for atomic descriptors for rcdk?</li>\n</ul>",
      "summary": "As announced earlier , Miguel, Velitchka, Christoph and I held a small CDK/Metabolomics/Chemometrics unconference. We started late, and did not have an evening program, resulting in not overly much results. However, we did do molecular chemometrics.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/riPred.png",
      "date_published": "2008-04-07T00:10:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cdk","defense","phd","metabolomics","cheminf","chemometrics"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.18637/jss.v018.i02", "doi": "10.18637/jss.v018.i02"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/865j7-crn03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/07/legal-advice-needed-nih-restricting.html",
      "title": "Legal Advice Needed: the NIH restricting access to our CC-licensed research results",
      "content_html": "<p>In reply to <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1026\">Peter’s</a> <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1025\">news</a>\nthat the NIH’s <a href=\"http://www.pubmedcentral.nih.gov/\">PubMed Central</a> (PMC) does not allow machine retrieval of content, I was wondering\nabout this section in the CC license of much of the PMC content, such as <a href=\"https://doi.org/10.1186/1471-2105-8-487\">our paper on userscripts</a>\n(section 4a of the <a href=\"http://creativecommons.org/licenses/by/2.0/legalcode\">CC-BY 2.0</a>):</p>\n\n<blockquote>\n  <p>You may not distribute, publicly display, publicly perform, or publicly digitally perform the Work with any technological measures\nthat control access or use of the Work in a manner inconsistent with the terms of this License Agreement.</p>\n</blockquote>\n\n<p>CC-BY 3.0 reads differently, but has similar aims.</p>\n\n<p>Let me make clear that I value machine readable publications much more than free (gratis, as-in-free-beer) publications. Now, the\nNIH initiative now just is ‘Free Access’. An interesting step, but not one I care much about; not in relation to science anyway.</p>\n\n<p>Now, Peter indicates that the NIH has put in place ‘technological measures to control access’ to the distribution of\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/12/21/christmas-presents.html\">our work on userscripts <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(<a href=\"http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;pubmedid=18154664\">the PMC entry</a>). That is in clear violation\nof the CC license.</p>\n\n<p>I know that other NIH initiatives do allow this, such as PMC OAI, but that’s just an ‘auxiliary service’. 
It may come down\nto technical details, but some text on the PMC website is at least inaccurate:</p>\n\n<blockquote>\n  <p>Crawlers and other automated processes may NOT be used to systematically retrieve batches of articles from the PMC web site.\nBulk downloading of articles from the main PMC web site, in any way, is prohibited because of copyright restrictions.</p>\n</blockquote>\n\n<p>The way it is described right now reads like: <em>You may not drive a car</em>. Next paragraph. <em>But, if you have a driver’s license,\nwe will approve</em>. Or, translated to this example: <em>You may only use this and that article, but only a few of them</em>.\nNext paragraph. <em>Unless you use the following technical hole in the measure we took to disallow you access</em>.</p>\n\n<p>What the PMC website should indicate, instead, is that text mining is allowed for the PMC OAI subset, but that they would highly\nprefer that you use the PMC OAI or PMC FTP routes. This is the least they have to do.</p>\n\n<p>No matter what, I still have the feeling that any technical obstacles are disallowed by the CC license. Is there any legal expert here\nwho can explain to me whether the CC license allows controlling how people have access to my material?</p>",
      "summary": "In reply to Peter’s news that the NIH’s PubMed Central (PMC) does not allow machine retrieval of content, I was wondering about this section in the CC license of much of the PMC content, such as our paper on userscripts (section 4a of the CC-BY 2.0):",
      
      "date_published": "2008-04-07T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["pubmed","pmc","copyright"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-487", "doi": "10.1186/1471-2105-8-487"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wgawt-w6g88",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/04/t-plus-51-hours-short-photo-impression.html",
      "title": "T plus 51 hours: a short photo impression",
      "content_html": "<p>I normally do not do these kinds of blog items, but, in reply to <a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2008/04/03/congratulations-egon/#comment-327\">Christoph’s blog</a>,\nhere’s an overview of the ceremony (see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/04/01/t-minus-26-hours-defending-open-source.html\">T-26 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2008/04/03/t-plus-18-hours-dr-and-preparing-for.html\">T+18 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>):</p>\n\n<p><img src=\"/assets/images/vga_E112.JPG\" alt=\"\" /></p>\n\n<p>This is the doctorate certificate Christoph mentioned, with also Karin and our kids:</p>\n\n<p><img src=\"/assets/images/vga_E179.JPG\" alt=\"\" /></p>\n\n<p>And, <a href=\"http://www.oortjeshekken.nl/\">here</a> (<a href=\"http://maps.google.com/maps?f=q&amp;hl=en&amp;geocode=&amp;q=Erlecomsedam+4,+ooij,+netherlands&amp;sll=51.857623,5.93914&amp;sspn=0.046967,0.146942&amp;ie=UTF8&amp;ll=51.864169,5.933647&amp;spn=0.01174,0.036736&amp;t=h&amp;z=15\">map</a>)\nwas the dinner in the evening:</p>\n\n<p><img src=\"/assets/images/vga_E227.JPG\" alt=\"\" /></p>",
      "summary": "I normally do not do these kinds of blog items, but, in reply to Christoph’s blog, here’s an overview of the ceremony (see also T-26 and T+18 ):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/vga_E112.JPG",
      "date_published": "2008-04-04T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["defense","phd"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wbyha-x7e76",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/03/t-plus-18-hours-dr-and-preparing-for.html",
      "title": "T plus 18 hours: dr and preparing for the afterparty, umm ^w^w^w, CDK/Metabolomics/Chemometrics unconference",
      "content_html": "<p>I am doctor now; I shall now be <a href=\"http://taaladvies.net/taal/advies/tekst/21#6\">addressed as</a> <em>weledelzeergeleerde</em> Egon;\ntranslating to something like <em>quite-noble-very-knowledgeable</em>, hahahaha. I’ll put up a few photo’s of the ceremony, which\nis actually quite formal at the <a href=\"http://www.ru.nl/\">Radboud University</a>, later.</p>\n\n<p>With this blog item, I would to thank everyone who left a message, sent email, etc with good luck messages. Very much\nappreciated! I’d also like to thank my supervisors, promotores <a href=\"http://www.cac.science.ru.nl/people/lbuydens/index.html\">Lutgarde Buydens</a> and\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter Murray-Rust</a> (he mentions the event <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1019\">here</a>),\nand <a href=\"http://www.cac.science.ru.nl/people/rwehrens/index.html\">Ron Wehrens</a> for their confidence in me and their guidance\non the path towards the post-doc life. I also thank all those who attended my defense; I had a brilliant day, and actually\nenjoyed talking to those who took place in my promotion committee and who asked me the not-really-nasty-questions about\nmy work.</p>\n\n<h2 id=\"cdk-chemometrics-in-metabolomics-unconference\">CDK-Chemometrics in Metabolomics Unconference</h2>\n\n<p>For today, I organized a small, informal <a href=\"http://en.wikipedia.org/wiki/Unconference\">unconference</a>, oriented around the\n<a href=\"http://cdk.sf.net/\">CDK</a>, chemometrics and metabolomics. I’m certain we will be online much of the day, as we typically\ndo. The meeting will start around 10:00 <a href=\"http://en.wikipedia.org/wiki/Central_European_Summer_Time\">CEST</a>, but we’ll\nattend a seminar by <a href=\"http://www.ki.si/index.php?id=844\">Marjana Novič</a> at 11:00 CEST. 
If you happen to be in\n<a href=\"http://en.wikipedia.org/wiki/Nijmegen\">Nijmegen</a>, just drop in on the Analytical Chemistry department.\nOtherwise, join the #cdk chat channel in the irc.freenode.net network.</p>\n\n<p>What we’ll do?? Hey, it’s an unconference; we have no idea yet :)</p>",
      "summary": "I am doctor now; I shall now be addressed as weledelzeergeleerde Egon; translating to something like quite-noble-very-knowledgeable, hahahaha. I’ll put up a few photo’s of the ceremony, which is actually quite formal at the Radboud University, later.",
      
      "date_published": "2008-04-03T00:00:00+00:00",
      "date_modified": "2008-04-03T00:00:00+00:00",
      "tags": ["defense","cheminf","chemometrics","phd"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2b0cr-22997",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/01/t-minus-26-hours-defending-open-source.html",
      "title": "T minus 26 hours: defending open source chemoinformatics (and more)",
      "content_html": "<p>In about 26 hours from now, I will be <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html\">defending my PhD thesis <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nFollow that link to read the summary; I was thinking if publishing my introduction and discussion (the rest has been published in peer-reviewed\njournals) on <a href=\"http://precedings.nature.com/\">Nature Precedings</a>; would that be a good idea? Otherwise, I’ll post it in my blog. If you just\nhappen to want to attend the public defense, it’s\n<a href=\"http://maps.google.com/maps?f=q&amp;hl=en&amp;geocode=&amp;q=Comeniuslaan+2,+6525+Nijmegen,+Nijmegen+(Gelderland),+Netherlands&amp;sll=37.0625,-95.677068&amp;sspn=28.114729,75.234375&amp;ie=UTF8&amp;ll=51.820699,5.857548&amp;spn=0.002673,0.009184&amp;t=h&amp;z=17&amp;iwloc=addr\">here</a>:</p>\n\n<p><img src=\"/assets/images/aula.png\" alt=\"\" /></p>",
      "summary": "In about 26 hours from now, I will be defending my PhD thesis . Follow that link to read the summary; I was thinking if publishing my introduction and discussion (the rest has been published in peer-reviewed journals) on Nature Precedings; would that be a good idea? Otherwise, I’ll post it in my blog. If you just happen to want to attend the public defense, it’s here:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/aula.png",
      "date_published": "2008-04-01T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cheminf","chemometrics","phd"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e7qrk-ypg78",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/23/cdk-module-dependencies-2.html",
      "title": "CDK Module dependencies #2",
      "content_html": "<p>A bit over 2 years ago I published a <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/12/06/uml-diagram-of-cdk-module-dependencies.html\">UML diagram showing the dependencies between CDK modules <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nSince then I lot of new modules have been defined, added or factored out from the extra module (click to zoom):</p>\n\n<p><img src=\"/assets/images/cdkdeps.dot.png\" alt=\"\" /></p>\n\n<p>These kind of diagrams help us maintain the library, and apply some design goals, as explained in the first post on this.</p>\n\n<p>If one compares the two diagrams, one sees that fewer code depends on the <code class=\"language-plaintext highlighter-rouge\">data</code> module, but it is also clear that still a\nlot of them do. Another issue that had not properly addressed yet, is that a lot of modules still depend on the <code class=\"language-plaintext highlighter-rouge\">extra</code> module,\nwhich aggregates everything that had not been assigned elsewhere.</p>\n\n<h2 id=\"parallelism\">Parallelism</h2>\n\n<p>This diagram also helped me use the [Ant <parallel> task](http://ant.apache.org/manual/CoreTasks/parallel.html) to allow compiling\nCDK modules in parallel, instead of sequentially. Multicore machines can take advantage of that, and reduce the overall computation\ntime. Full parallelism is not possible, and it is well visualized by the above diagram that there basically 12 sequential\ncompilation steps in which one or more modules can be compiled. Further clean up of the module dependencies, will reduce this\nnumber, and further reduce the computation time on multicore machines.</parallel></p>\n\n<p>Now, graph analysis could pinpoint the most troublesome nodes, but it would not surprise me that extra would be amongst them. 
But\nthe following items are worth looking at too:</p>\n\n<ul>\n  <li>why does <code class=\"language-plaintext highlighter-rouge\">qsar</code> have to depend on <code class=\"language-plaintext highlighter-rouge\">charges</code>?</li>\n  <li>why does <code class=\"language-plaintext highlighter-rouge\">sdg</code> (the 2D layout code) depend on <code class=\"language-plaintext highlighter-rouge\">io</code> code?</li>\n  <li>can <code class=\"language-plaintext highlighter-rouge\">isomorphism</code> and <code class=\"language-plaintext highlighter-rouge\">formula</code> be made independent of <code class=\"language-plaintext highlighter-rouge\">data</code>?</li>\n  <li>why does <code class=\"language-plaintext highlighter-rouge\">reaction</code> depend on <code class=\"language-plaintext highlighter-rouge\">sdg</code>?</li>\n  <li>why does <code class=\"language-plaintext highlighter-rouge\">forcefield</code> depend on <code class=\"language-plaintext highlighter-rouge\">qsaratomic</code>?</li>\n</ul>\n\n<p>Some of these issues are rather practical, but it is these kinds of analyses that help us clean up the CDK library.</p>",
      "summary": "A bit over 2 years ago I published a UML diagram showing the dependencies between CDK modules . Since then I lot of new modules have been defined, added or factored out from the extra module (click to zoom):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkdeps.dot.png",
      "date_published": "2008-03-23T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5zbfp-9wb13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/22/be-in-my-advisory-board-3-jchempaint.html",
      "title": "Be in my Advisory Board #3: JChemPaint widgets?",
      "content_html": "<p><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/01/15/be-in-my-advisory-board-2-jchempaint.html\">As promised <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, I am working on JChemPaint. I have progressed in cleaning up\nthe <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/cdk/trunk/\">CDK trunk/</a> repository by removing traces of the old JChemPaint applet and application. And,\nimportantly, removed the <code class=\"language-plaintext highlighter-rouge\">GeometryTools</code> class that took rendering coordinates. The history here is that the original <code class=\"language-plaintext highlighter-rouge\">GeometryTools</code> was renamed to\n<code class=\"language-plaintext highlighter-rouge\">GeometryToolsInternalCoordinates</code>, but is now available as <code class=\"language-plaintext highlighter-rouge\">GeometryTools</code> again. I still have to merge Niels’ additions with it, though. And,\nI have set up a new <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/jchempaint/trunk/\">JChemPaint trunk/</a> where I have moved\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/jchempaint/trunk/src/main/org/openscience/jchempaint/TestEditor.java?view=log\">Niels’ demo editor</a>.</p>\n\n<p>Main goal for the next weeks is to further clean up things, and get the new JChemPaint project further up and going. There are, however, some new\nchoices for focus now. <a href=\"http://www.bioclipse.net/\">Bioclipse</a> needs a SWT widget, the applet would need a Swing widget, and an application could\nbe based on that too, while I could even create a Qt widget, so that in the foreseeable future we can have JChemPaint on our cell phones. 
So,\nmight <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/27/be-in-my-advisory-board-1-being-good.html\">my advisory board <i class=\"fa-solid fa-recycle fa-xs\"></i></a> (that can be you too) take the\nopportunity to advise me in these matters, and indicate what you would prefer?</p>\n\n<h2 id=\"the-sqt-widget\">The SWT Widget</h2>\n\n<p>For Bioclipse mostly. Bioclipse provides a perfect opportunity to replace the old JChemPaint application (not applet) with an attractive and\npowerful GUI.</p>\n\n<h2 id=\"the-swing-application\">The Swing Application</h2>\n\n<p>Maybe you’d rather see the old JChemPaint application reinstated, with the less attractive Swing-based GUI. I’d really suggest the Bioclipse\napproach, so if you pick this option please explain in the comments of this item why I should do this.</p>\n\n<h2 id=\"the-qt-widget\">The Qt Widget</h2>\n\n<p>The <a href=\"http://trolltech.com/products/qt/jambi\">Qt lib comes with Java support</a>, and this might be an interesting alternative. Besides being able\nto make a Qt-based application, the widget would also make it easier to port JChemPaint to the cell phone and to the\n<a href=\"http://www.kde.org/\">KDE desktop</a>.</p>\n\n<h2 id=\"the-applet\">The Applet</h2>\n\n<p>The applet is important, and requires a Swing or AWT widget. Personally, I’d rather focus on the SWT widget first, as that is a place where\nno good alternative is available. On the applet side, we compete with the JME applet and <a href=\"http://metamolecular.com/chemwriter/\">Rich’s nice applet</a>.</p>\n\n<p>I do intend to provide an applet version, but this request for advice is for setting priorities.</p>",
      "summary": "As promised , I am working on JChemPaint. I have progressed in cleaning up the CDK trunk/ repository by removing traces of the old JChemPaint applet and application. And, importantly, removed the GeometryTools class that took rendering coordinates. The history here is that the original GeometryTools was renamed to GeometryToolsInternalCoordinates, but is now available as GeometryTools again. I still have to merge Niels’ additions with it, though. And, I have set up a new JChemPaint trunk/ where I have moved Niels’ demo editor.",
      
      "date_published": "2008-03-22T00:00:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["jchempaint","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/f6q3q-d2e52",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/19/my-foaf-network-5-sparql-ing-my-network.html",
      "title": "My FOAF network #5: SPARQL-ing my network",
      "content_html": "<p><a href=\"http://www.foaf-project.org/\">FOAF rulez</a>: it’s RDF. With RDF comes <a href=\"http://www.w3.org/TR/rdf-sparql-query/\">SPARQL</a>.\nSPARQL needs a query engine, however. And there comes <a href=\"http://openrdf.org/\">OpenRDF</a> which created Sesame. I have\nto catch the train in about 15 minutes, so will not elaborate too much, but here are some\n<a href=\"http://www.openrdf.org/doc/sesame2/2.0.1/users/index.html\">Sesame 2.0.1</a> work:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>&gt; create native.\nPlease specify values for the following variables:\nRepository ID [native]: foafRepo\nRepository title [Native store]: FOAF Repository\nTriple indexes [spoc,posc]:\nRepository created\n&gt; open foafRepo\n</code></pre></div></div>\n\n<p>Creates me a new RDF storage and opens it.</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>foafRepo&gt; load http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf .\nLoading data...\nData has been added to the repository (606 ms)\n</code></pre></div></div>\n\n<p>Loads my FOAF file. Now, a simple SPARQL query that finds me all friends that now someone with the nick egonw:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>foafRepo&gt; sparql\n\nBASE &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf&gt;\nPREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;\n\nSELECT ?s ?o\nWHERE { ?s foaf:knows ?o ; foaf:nick \"egonw\" . 
}\n\n.\nEvaluating query...\n+-------------------------------------+-------------------------------------+\n| s                                   | o                                   |\n+-------------------------------------+-------------------------------------+\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#HenryRzepa&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#CarstenNiehaus&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#RajarshiGuha&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#JeanClaudeBradley&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#GeoffHutchison&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#ChristophSteinbeck&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#PeterMurrayRust&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#TobiasHelmus&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#StefanKuhn&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#MartinEklund&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#JohannesWagener&gt;|\n| 
&lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#JarlWikberg&gt;|\n| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#me&gt;| &lt;http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf#JeromePansanel&gt;|\n+-------------------------------------+-------------------------------------+\n13 result(s) (15 ms)\n</code></pre></div></div>\n\n<p>Not very pretty, but rather accurate.</p>\n\n<p>More SPARQL fun later. Do try this at home, but make sure to <a href=\"http://openrdf.org/forum/mvnforum/viewthread?thread=1641\">not put a period at the end of a line in your SPARQL query</a>! :)</p>",
      "summary": "FOAF rulez: it’s RDF. With RDF comes SPARQL. SPARQL needs a query engine, however. And there comes OpenRDF which created Sesame. I have to catch the train in about 15 minutes, so will not elaborate too much, but here are some Sesame 2.0.1 work:",
      
      "date_published": "2008-03-19T00:00:00+00:00",
      "date_modified": "2008-03-19T00:00:00+00:00",
      "tags": ["foaf","rdf","sparql"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5gmnt-8b189",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/18/my-foaf-network-4-tabulating-my.html",
      "title": "My FOAF network #4: Tabulating my publications",
      "content_html": "<p><a href=\"http://dowhatimean.net/\">Richard</a> informed me (via <a href=\"http://planetrdf.com/\">Planet RDF</a>) about <a href=\"http://dowhatimean.net/2008/03/tabulator-does-n3\">N3 support in Tabulator</a>.\n<a href=\"http://www.w3.org/DesignIssues/Notation3.html\">N3</a> is a more compressed version of RDF/XML, which I have been using so far, but both are\n<a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a>. Now, I don’t plan to use N3 for my FOAF experimenting, but two things caught\nmy eye in the nice blog item.</p>\n\n<p>First, it has a very useful tip on <code class=\"language-plaintext highlighter-rouge\">.htaccess</code> which you can use to teach Apache about MIME types, even when you do not have root access.\nSo, I added this <code class=\"language-plaintext highlighter-rouge\">.htaccess</code> file to <a href=\"http://blueobelisk.sourceforge.net/people/egonw/\">blueobelisk.sourceforge.net/people/egonw/</a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>AddType application/rdf+xml;charset=utf-8 .xrdf\n</code></pre></div></div>\n\n<p>Now, you can also access my <a href=\"http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf\">FOAF file with the MIME type</a> set to\n<code class=\"language-plaintext highlighter-rouge\">application/rdf+xml</code>. And, <a href=\"http://blueobelisk.sourceforge.net/people/egonw/biblio.xrdf\">my bibliography too</a>. Now, the latter becomes\ninteresting when you have <a href=\"http://dig.csail.mit.edu/2007/tab/\">Tabulator</a> installed in your Firefox. Instead of applying the XSLT,\nFirefox will now show it like this:</p>\n\n<p><img src=\"/assets/images/tabulator.png\" alt=\"\" /></p>\n\n<p>And, in the <em>under the hood</em> mode it looks like:</p>\n\n<p><img src=\"/assets/images/tabulator1.png\" alt=\"\" /></p>\n\n<p>Now, my FOAF file does not seem to work well. 
Not sure what goes wrong there, but given the fact that Tabulator seems to be able to\nrecurse into referenced RDF files, I think it nicely complements what we already have.</p>\n\n<p>Wow, it seems Web3.0/WebNG is really going to happen this year!</p>",
      "summary": "Richard informed me (via Planet RDF) about N3 support in Tabulator. N3 is a more compressed version of RDF/XML, which I have been using so far, but both are RDF. Now, I don’t plan to use N3 for my FOAF experimenting, but two things caught my eye in the nice blog item.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/tabulator1.png",
      "date_published": "2008-03-18T00:00:00+00:00",
      "date_modified": "2008-03-18T00:00:00+00:00",
      "tags": ["foaf","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wxszb-3fk40",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/11/sugammadex-molecular-condom.html",
      "title": "Sugammadex: the molecular condom",
      "content_html": "<p>Two things I like blogging: 1. the turn-over of information; 2. the informal nature. There are\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/11/why-do-i-blog.html\">more <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThe turn-over is optimized by commonly: 1. short blog items; 2. easily allows scanning tons of headlines; 3. often full of links if you want to know the details.</p>\n\n<p>Today, my eye was caught by <a href=\"http://gaussling.wordpress.com/2008/03/10/sugammadex-buzz-for-organon/\">Sugammadex Buzz for Organon</a> over at\n<a href=\"http://gaussling.wordpress.com/\">Lamentations on Chemistry</a>. The reason was <a href=\"http://www.organon.nl/\">Organon</a>, which is just around the corner here.\nThey had <a href=\"http://www.organon.com/Media/Press_Releases/2008_01_02_Schering_Plough_announces_new_drug_application_for_sugammadex_assigned_priority_review_status_by_U_S_FDA.asp?ComponentID=197129&amp;SourcePageID=8237#1\">news</a>\nabout a new drug.</p>\n\n<p>Getting to the second reason, I like the informal nature. Just to make sure I checked the press release, but it was really Gaussling that called\n<a href=\"http://en.wikipedia.org/wiki/Sugammadex\">Sugammadex</a> a <em>molecular condom</em>. This is funny for (at least) two reasons. First, it points\n(intentionally?) to the birth control drugs of Organon; second, it is right on with how the drug works.</p>",
      "summary": "Two things I like blogging: 1. the turn-over of information; 2. the informal nature. There are more . The turn-over is optimized by commonly: 1. short blog items; 2. easily allows scanning tons of headlines; 3. often full of links if you want to know the details.",
      
      "date_published": "2008-03-11T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["drugdiscovery","blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3qtf0-89n45",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/10/my-foaf-network-3-my-publications.html",
      "title": "My FOAF network #3: My publications",
      "content_html": "<p><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/03/08/my-foaf-network-2-xslt-for-html-gui.html\">As promised <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, I’ll write a bit about using\n<em>Bibliographic Ontology Specification</em> (BIBO) over as <a href=\"http://bibliontology.com/\">bibliontology.com</a>. I have written a\n<a href=\"http://blueobelisk.sourceforge.net/people/egonw/bibo2xhtml.xsl\">basic XSLT</a> to create a HTML GUI (open the\n<a href=\"http://blueobelisk.sourceforge.net/people/egonw/biblio.xml\">RDF source</a> in e.g. Firefox). Really basic: it only converts articles,\nand even assumes some conventions I found in examples in the <a href=\"http://wiki.bibliontology.com/index.php/Examples\">BIBO wiki</a>.\nI have not spotted a BIBO validator yet, so guessing a bit. The BibTeX mapping examples are under discussion, but provide some\ninsight to those who are used to using that (<a href=\"http://jabref.sourceforge.net/\">JabRef</a> users, for example).</p>\n\n<p>So, if I understood the specs enough, the following should be valid BIBO (at least it is\n<a href=\"http://www.w3.org/RDF/Validator/ARPServlet?URI=http%3A%2F%2Fblueobelisk.sourceforge.net%2Fpeople%2Fegonw%2Fbiblio.xml&amp;PARSE=Parse+URI%3A+&amp;TRIPLES_AND_GRAPH=PRINT_TRIPLES&amp;FORMAT=PNG_EMBED\">valid RDF</a>):</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">&lt;?xml version=\"1.0\" encoding=\"UTF-8\"?&gt;</span>\n<span class=\"err\">&lt;</span>?xml-stylesheet type=\"text/xsl\"\n                 href=\"http://blueobelisk.sourceforge.net/people/egonw/bibo2xhtml.xsl\"\n                 ?&gt;\n<span class=\"nt\">&lt;rdf:RDF</span>\n      <span class=\"na\">xmlns:rdf=</span><span class=\"s\">\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"</span>\n      <span class=\"na\">xmlns:rdfs=</span><span class=\"s\">\"http://www.w3.org/2000/01/rdf-schema#\"</span>\n      <span 
class=\"na\">xmlns:bibo=</span><span class=\"s\">\"http://purl.org/ontology/biblio/\"</span>\n      <span class=\"na\">xmlns:dc=</span><span class=\"s\">\"http://purl.org/dc/elements/1.1/\"</span>\n      <span class=\"na\">xmlns:dcterms=</span><span class=\"s\">\"http://purl.org/dc/terms/\"</span>\n      <span class=\"na\">xmlns:foaf=</span><span class=\"s\">\"http://xmlns.com/foaf/0.1/\"</span>\n<span class=\"nt\">&gt;</span>\n\n  <span class=\"nt\">&lt;bibo:Journal</span> <span class=\"na\">rdf:about=</span><span class=\"s\">\"urn:issn:1471-2105\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;dc:title&gt;</span>BMC Bioinformatics<span class=\"nt\">&lt;/dc:title&gt;</span>\n  <span class=\"nt\">&lt;/bibo:Journal&gt;</span>\n\n  <span class=\"nt\">&lt;bibo:Article</span> <span class=\"na\">rdf:about=</span><span class=\"s\">\"http://dx.doi.org/10.1186/1471-2105-8-59\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;dc:title&gt;</span>Bioclipse: an open source workbench for chemo- and bioinformatics<span class=\"nt\">&lt;/dc:title&gt;</span>\n    <span class=\"nt\">&lt;dc:date&gt;</span>2007-02-22<span class=\"nt\">&lt;/dc:date&gt;</span>\n    <span class=\"nt\">&lt;dc:isPartOf</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"urn:issn:1471-2105\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;bibo:volume&gt;</span>8<span class=\"nt\">&lt;/bibo:volume&gt;</span>\n    <span class=\"nt\">&lt;bibo:doi&gt;</span>10.1186/1471-2105-8-59<span class=\"nt\">&lt;/bibo:doi&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span 
class=\"s\">\"Ola Spjuth\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>1<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span class=\"s\">\"Tobias Helmus\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>2<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span class=\"s\">\"Egon Willighagen\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>3<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span 
class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span class=\"s\">\"Stefan Kuhn\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>4<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span class=\"s\">\"Martin Eklund\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>5<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span class=\"s\">\"Johannes Wagener\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>6<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span 
class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span class=\"s\">\"Peter Murray-Rust\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>7<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span class=\"s\">\"Christoph Steinbeck\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>8<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n    <span class=\"nt\">&lt;bibo:contribution&gt;</span>\n      <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n        <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"nt\">&lt;bibo:contributor&gt;&lt;foaf:Person</span> <span 
class=\"na\">foaf:name=</span><span class=\"s\">\"Jarl Wikberg\"</span><span class=\"nt\">/&gt;&lt;/bibo:contributor&gt;</span>\n        <span class=\"nt\">&lt;bibo:position&gt;</span>9<span class=\"nt\">&lt;/bibo:position&gt;</span>\n      <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n\n  <span class=\"nt\">&lt;/bibo:Article&gt;</span>\n\n<span class=\"nt\">&lt;/rdf:RDF&gt;</span>\n</code></pre></div></div>\n\n<p>There are some notable things about this markup:</p>\n\n<ol>\n  <li>It is <strong>very</strong> verbose, even by XML standards!</li>\n  <li>It’s RDF from the ground up</li>\n  <li>It reuses many other ontologies</li>\n</ol>\n\n<p>In particular, the authors section is very verbose. However, it also nicely reuses <a href=\"http://www.foaf-project.org/\">FOAF</a>\nhere. This makes it really powerful. For example, I could have used this bit:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;bibo:contribution&gt;</span>\n  <span class=\"nt\">&lt;bibo:Contribution&gt;</span>\n    <span class=\"nt\">&lt;bibo:role</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://purl.org/ontology/bibo/roles/author\"</span> <span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;bibo:contributor</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://blueobelisk.sourceforge.net/people/egonw/foaf.xml#me\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;bibo:position&gt;</span>3<span class=\"nt\">&lt;/bibo:position&gt;</span>\n  <span class=\"nt\">&lt;/bibo:Contribution&gt;</span>\n<span class=\"nt\">&lt;/bibo:contribution&gt;</span>\n</code></pre></div></div>\n\n<p>This would <em>semantically link</em> this publication to whatever information I have on myself published in my FOAF file.</p>\n\n<p>Now, the reason why I have not done this yet, is that the XSLT did not properly 
load the XML from my foaf file:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;xsl:variable</span> <span class=\"na\">name=</span><span class=\"s\">\"foafURI\"</span> <span class=\"na\">select=</span><span class=\"s\">\"substring-before(bibo:contributor/@rdf:resource, '#')\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;xsl:variable</span> <span class=\"na\">name=</span><span class=\"s\">\"authorID\"</span> <span class=\"na\">select=</span><span class=\"s\">\"substring-after(bibo:contributor/@rdf:resource, '#')\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;xsl:variable</span> <span class=\"na\">name=</span><span class=\"s\">\"foafDoc\"</span> <span class=\"na\">select=</span><span class=\"s\">\"document($foafURI)\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;xsl:value-of</span> <span class=\"na\">select=</span><span class=\"s\">\"$foafDoc//foaf:Person[@rdf:ID=$authorID]\"</span><span class=\"nt\">/&gt;</span>\n</code></pre></div></div>\n\n<p>The XSLT processor <em>xsltproc</em> (version 1.1.22 on Ubuntu 8.04) gives this error:\n<code class=\"language-plaintext highlighter-rouge\">warning: failed to load external entity \"http://blueobelisk.sourceforge.net/people/egonw/foaf.xml\"</code>.\nBut if I make it a relative URI, it does work, both with xsltproc and with Firefox online.</p>\n\n<p>Another reason not to do it like that, is that one loses control of the citation content. 
What I will do soon, is use\nthis set up, making <a href=\"http://www.researcherid.com/\">researcherid.com</a> obsolete (see also\n<a href=\"http://plindenbaum.blogspot.com/2008/01/thomson-scientific-launches.html\">these</a>\n<a href=\"http://nsaunders.wordpress.com/2008/01/17/researcher-id/\">three</a>\n<a href=\"http://mndoci.com/blog/2008/01/17/researcherid-doesnt-seem-like-all-that/\">blogs</a>):</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;bibo:contributor&gt;</span>\n  <span class=\"nt\">&lt;foaf:Person</span> <span class=\"na\">foaf:name=</span><span class=\"s\">\"Egon Willighagen\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;rdfs:seeAlso</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://blueobelisk.sourceforge.net/people/egonw/biblio.xml#me\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;/foaf:Person&gt;</span>\n<span class=\"nt\">&lt;/bibo:contributor&gt;</span>\n</code></pre></div></div>\n\n<p>Just in case you are wondering, <em>“why the ### does he not simply use BibTeX?”</em>, the answer is RDF.\nNo RDF, no SPARQL, no GLORY. 
Just think how easy it will become to run queries like:</p>\n\n<ul>\n  <li>which of those I have published with run a blog?</li>\n  <li>which of those I have published with are going to that conference in Boston in September?</li>\n  <li>which of those I have published with have friends who published about topics around these keywords?</li>\n  <li>etc…</li>\n</ul>\n\n<p>All that becomes very easy now.</p>\n\n<p>BTW, this is how I link to my bibliography from my FOAF:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;foaf:publications</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://blueobelisk.sourceforge.net/people/egonw/biblio.xml\"</span><span class=\"nt\">/&gt;</span>\n</code></pre></div></div>",
      "summary": "As promised , I’ll write a bit about using Bibliographic Ontology Specification (BIBO) over as bibliontology.com. I have written a basic XSLT to create a HTML GUI (open the RDF source in e.g. Firefox). Really basic: it only converts articles, and even assumes some conventions I found in examples in the BIBO wiki. I have not spotted a BIBO validator yet, so guessing a bit. The BibTeX mapping examples are under discussion, but provide some insight to those who are used to using that (JabRef users, for example).",
      
      "date_published": "2008-03-10T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["foaf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ed8nt-n6a25",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/09/chemical-object-identifier-or-freedom.html",
      "title": "The Chemical Object Identifier; or, the freedom to identify chemicals",
      "content_html": "<p>IUPAC chemical names, <a href=\"http://opensmiles.org/\">SMILES</a> and InChIs are too long. <a href=\"http://en.wikipedia.org/wiki/International_Chemical_Identifier#InChIKey\">InChIKeys</a>\nare not unique enough because of safety reasons (<em>you have a 1 in 10 billion chance of blowing up your building</em>; well, odds are actually much, much lower than\ngetting hit by Osama or friends, let alone a car). Wikipedia URIs do not cover enough chemical space.</p>\n\n<p>However, we need short identifier. Why, actually? Computers don’t care about long identifiers. Systems can be integrated. A web link is easy to make. But we do.\nA bottle on the shelf does not have a HTML interface. And you do not have a scanner to read the chemical structure from a 2D barcode (see\nDOI:<a href=\"https://doi.org/10.1021/ci049758i\">10.1021/ci049758i</a>).</p>\n\n<p>The <a href=\"http://en.wikipedia.org/wiki/CAS_registry_number\">CAS registry number</a> has serviced this purpose for a long time. 
For example, as used on bottles visible\nin this picture (copyright: <a href=\"http://creativecommons.org/licenses/by-sa/3.0/\">CC BY-SA</a>, <a href=\"http://blog.openwetware.org/scienceintheopen/\">Science in the Open</a>):</p>\n\n<p><img src=\"/assets/images/cas-number.png\" alt=\"\" /></p>\n\n<p>Now, when <a href=\"http://www.chemspider.com/blog/cas-discourages-using-scifinder-to-help-curate-wikipedia-structures-and-cas-numbers.html\">Anthony reported</a> that CAS,\nthe organization that builds the proprietary lookup service and has done an amazing job in the past, does not wish to see CAS numbers in Wikipedia\ncurated by means of the official database (it violates the <em>end user agreement</em> one has to sign before one can use the database), the blogging community\nreacted (<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=997\">here</a>,\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1000\">here</a>,\n<a href=\"http://www.chemconnector.com/chemunicating/the-curation-of-almost-5000-structures-on-wikipedia.html\">here</a>,\n<a href=\"https://doi.org/10.63485/54wv5-hs388\">here <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"http://miningdrugs.blogspot.com/2008/03/cas-numbers-are-not-public-domain-are.html\">here</a>).</p>\n\n<p>Personally, I agree with the CAS standpoint. It’s been a proprietary database which people have been supporting financially for years, and for which they thoughtfully signed\nthe license agreement. So, don’t complain afterwards. If you <em>really</em> want to, <strong>end the agreement and object to the license</strong>. 
I\n<a href=\"http://www.chemspider.com/blog/cas-discourages-using-scifinder-to-help-curate-wikipedia-structures-and-cas-numbers.html#comment-24101\">commented in the original blog</a>:</p>\n\n<blockquote>\n  <p>In 1995 I started a Dutch website on organic chemistry [1] and the CAS number was as useful as it is now, and already then we knew we were not allowed\nto compose a database of CAS numbers. Not sure about the legal state of that, but our university had a license; not sure if students had access, but\ndo not believe so. Anyway, building a substantial list of CAS number was not allowed. So, we looked for other means of identifying molecular structures,\nwhich led us to CML… this was around ‘96-’97 or so, at least before XML was released, and we started using CML actually when it was still in a more\nobscure SGML format :) Yeah, the XML recommendation was much appreciated!</p>\n\n  <p>OK, so back to your blog item. You can imagine that the comment in WP by CAS does not surprise me at all; nothing really new. If they would allow this,\nit would set a precedence…</p>\n\n  <p>The solution is, however, fairly easy. Use InChI(Key), PubChem CID, or ChemSpider CID; the latter two are on the same level as CAS numbers. CAS registry\nnumbers are overrated. Not sure if they still hand out CAS numbers to mixture too… (I guess not).</p>\n\n  <p>Oh, and I agree with Cpt. Renault… people should really abide to legal requirements. Period. If you don’t like them, quit the legal agreement.\nAs simple as that.</p>\n\n  <p>1.<a href=\"http://www.woc.science.ru.nl/\">http://www.woc.science.ru.nl/</a></p>\n</blockquote>\n\n<p>Here, I tend to disagree with <a href=\"http://www.chemspider.com/blog/cas-discourages-using-scifinder-to-help-curate-wikipedia-structures-and-cas-numbers.html#comment-24233\">Will who wrote</a>\nthat “<em>They are just numbers. i.e. descriptors</em>”. The CAS number only makes sense with a (curated) look up table; making it tightly\nlinked to the CAS database. 
While theoretically you may be allowed to copy numbers from that database, the license agreement strictly\nforbids that. A court would have to decide which right takes precedence, but my vote is on the agreement, which you\nthoughtfully signed. So, I tend to agree with Joerg who wrote that\n<a href=\"http://miningdrugs.blogspot.com/2008/03/cas-numbers-are-not-public-domain-are.html\">CAS number are not public domain, are they?</a></p>\n\n<p>An interesting bit in that blog item is <a href=\"http://miningdrugs.blogspot.com/2008/03/cas-numbers-are-not-public-domain-are.html#c3452086141400278558\">the comment he left himself</a>:</p>\n\n<blockquote>\n  <p>I just realized that <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=999\">Peter</a> has also commented on it. And storing 10000 CAS\nnumbers and structures is allowed? What happens, if a journal reaches this limit? Just imagine they publish 1000 papers with 100\nCAS numbers for each article? I do not get this!</p>\n</blockquote>\n\n<p>Interesting indeed. This gets me back to a question I was recently confronted with: <em>How would I use chemical literature in the current\nage?</em> Well, what about this hypothetical <a href=\"http://taverna.sf.net/\">Taverna</a> workflow:</p>\n\n<ul>\n  <li>Node 1: get me a list of journals expected to contain CAS registry numbers (such as the <a href=\"http://pubs.acs.org/journals/jcisd8/index.html\">JCIM</a>)</li>\n  <li>Node 2: for each, get me all publications of the last 25 years</li>\n  <li>Node 3: process all articles and count cited CAS registry numbers per journal</li>\n  <li>Node 4: complain if count_per_journal &gt; 10000</li>\n</ul>\n\n<p>Anyway. Common agreement seems to be that we can opt to do without the CAS registry number. 
The PubChem ID seems a reasonable\ncandidate, and has been suggested <a href=\"http://blog.openwetware.org/scienceintheopen/2008/03/08/what-to-use-as-a-the-primary-key-for-chemicals/\">here</a>\nand <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=999\">here</a>. The ChemSpider ID could be an option too, though ChemSpider content is\nperiodically added to PubChem.</p>\n\n<p>I’d also like to bring in the suggestion of having a <em>Chemical Object Identifier</em>: like the DOI, the COI is a simple alpha-numerical\nidentifier, with a one-to-one connection to the InChI and, unlike the InChIKey, as unique as the InChI itself, but requiring a lookup\nservice. And the latter I can offer: <a href=\"http://rdf.openmolecules.net/\">http://rdf.openmolecules.net/</a>. It’s a free (as in Open)\nresource, where we can provide this lookup service. It would be really easy to create a new COI when an InChI is passed for which no COI has been\nassigned yet. A PHP page to do the reverse lookup is easy too. Interested? I can have it going by the end of the month. It comes\nwith full RDF support, so ready for the <a href=\"http://markclittle.blogspot.com/2006/05/web-ng.html\">Web-NG</a>.</p>",
      "summary": "IUPAC chemical names, SMILES and InChIs are too long. InChIKeys are not unique enough because of safety reasons (you have a 1 in 10 billion chance of blowing up your building; well, odds are actually much, much lower than getting hit by Osama or friends, let alone a car). Wikipedia URIs do not cover enough chemical space.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cas-number.png",
      "date_published": "2008-03-09T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["cas","cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci049758i", "doi": "10.1021/ci049758i"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.63485/54wv5-hs388", "doi": "10.63485/54wv5-hs388"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/p1n99-9sa03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/08/my-foaf-network-2-xslt-for-html-gui.html",
      "title": "My FOAF network #2: XSLT for a HTML GUI",
      "content_html": "<p>Because the ACS meeting where <a href=\"http://www.ch.ic.ac.uk/rzepa/\">Henry</a> will present something about FOAF in chemistry,\nis nearing very fast now (here’s the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/26/my-foaf-network-1-foafexplorer.html\">first blog it this series <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nit becomes urgent to beef up the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> FOAF network, now consisting of\n<a href=\"http://blueobelisk.sourceforge.net/wiki/Members\">7 members</a>. All do now show up in\n<a href=\"http://blueobelisk.sourceforge.net/people/egonw/\">the FOAFExplorer</a>:</p>\n\n<p><img src=\"/assets/images/foafExplorer.png\" alt=\"\" /></p>\n\n<p>Now, to make sure that my FOAF is in order, I set up the regular XML/RDF toolchain, using <em>xmllint</em> to validate the XML and\nRDF syntax, and <a href=\"http://blueobelisk.sourceforge.net/people/egonw/foaf2xhtml.xsl\">XSLT</a> to convert the FOAF to human readable HTML.\nUsing the <code class=\"language-plaintext highlighter-rouge\">?xml-stylesheet?</code> syntax this also provide the basic HTML GUI when accessing\n<a href=\"http://blueobelisk.sourceforge.net/people/egonw/foaf.xml\">the FOAF file</a> using Firefox. BTW, I had to rename the file to\nmake the SourceForge web server aware that the file is an XML file, so that it nicely sets the MIME type.</p>\n\n<p>BTW, I suggest all to validate your FOAF with <a href=\"http://www.w3.org/RDF/Validator/\">this RDF validator</a>, because some of\nus got some work to do to make them valid:</p>\n\n<ul>\n  <li>Mine is having some encoding issue</li>\n  <li>Henry’s has some 8 errors</li>\n</ul>\n\n<p>The others are actually fine.</p>\n\n<p>While the XSLT is getting along quite nicely, I got serious other work to do. 
The <a href=\"http://strigi.sf.net/\">Strigi</a>-based\nFOAF indexer is sort of working and gets FOAF documents recursively, but I want it to index our publications and presentation\nslides too. Now, FOAF has a <em>foaf:publications</em> tag, which I thought might be suitable. But after chatting with (new)\nfriends on the #foaf IRC channel (<a href=\"http://chatlogs.planetrdf.com/foaf/2008-03-08\">the log</a>), it became clear that the\nscope of that element is to point to some other file (<em>foaf:Document</em>) which lists the publications, such as an HTML output\ncreated from BibTeX.</p>\n\n<p>That is, the following syntax is not quite what appears to be intended:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;foaf:publications&gt;</span>\n  <span class=\"nt\">&lt;foaf:Document</span> <span class=\"na\">rdf:about=</span><span class=\"s\">\"http://dx.doi.org/10.1186/1471-2105-8-59\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;dc:title&gt;</span>Bioclipse: an open source workbench for chemo-\n      and bioinformatics<span class=\"nt\">&lt;/dc:title&gt;</span>\n    <span class=\"nt\">&lt;dc:author</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"#me\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;/foaf:Document&gt;</span>\n<span class=\"nt\">&lt;/foaf:publications&gt;</span>\n</code></pre></div></div>\n\n<p>The <a href=\"http://bibliontology.com/\">Bibliontology</a> was suggested and seems a rather good candidate to draft a separate\nbut RDF/OWL-based publication list. The server was down at the time of writing, but the Google cache showed the\nscope nicely. The <a href=\"http://groups.google.com/group/bibliographic-ontology-specification-group\">Google group is active</a>\nand the server should <a href=\"http://chatlogs.planetrdf.com/foaf/2008-03-08#T23-05-19\">go back online shortly</a>.</p>\n\n<p>OK, enough for now. More will follow in this series shortly. 
Such as a HTML GUI for\n<a href=\"http://blueobelisk.sourceforge.net/people/egonw/biblio.xml\">my publication list in Bibliontology format</a>.</p>",
      "summary": "Because the ACS meeting where Henry will present something about FOAF in chemistry, is nearing very fast now (here’s the first blog it this series ), it becomes urgent to beef up the Blue Obelisk FOAF network, now consisting of 7 members. All do now show up in the FOAFExplorer:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/foafExplorer.png",
      "date_published": "2008-03-08T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["semweb","blue-obelisk","foaf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ne3f5-6kt41",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/03/metabolomics-ontologies-skos-ified.html",
      "title": "Metabolomics Ontologies: SKOS-ified the ArMet specification",
      "content_html": "<p>The <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/22/metware-metabolomics-database-project.html\">MetWare project <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nis going to make use of ontology\ntechnologies to control the content of the database, and a first step is to convert <a href=\"http://metware.svn.sourceforge.net/viewvc/metware/trunk/metware/design/\">our MetWare database design</a>\ninto something using a formal ontology language. I have played with <a href=\"http://en.wikipedia.org/wiki/Web_Ontology_Language\">OWL</a>\nin the past (see for example\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/04/24/bioclipse-now-allows-qsar-descriptor.html\">its use in Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nbut was not overly happy with it in all situations.</p>\n\n<p>Then I read about <a href=\"http://en.wikipedia.org/wiki/SKOS\">SKOS</a>, Simplified Knowledge Organisation System. Unlike OWL, SKOS is less strict on relations\nbetween concepts being marked up. Often these concepts are loosely bound, instead following a strict <em>is_a</em> hierarchy.\n<a href=\"http://www.armet.org/\">ArMet</a> is a Metabolomics knowledge system which does not have a strong hierarchy, and SKOS seemed to me to be the most\nsuitable markup candidate. 
So, I SKOS-ified the ArMet specification, resulting in <a href=\"http://metware.svn.sourceforge.net/viewvc/*checkout*/metware/trunk/metware/design/onto/armet.skos?revision=HEAD&amp;content-type=text%2Fxml\">this rather simple document</a>.\nThe document is SKOS, but has an associated <a href=\"http://metware.svn.sourceforge.net/viewvc/*checkout*/metware/trunk/metware/design/onto/skos2html.xsl?revision=HEAD&amp;content-type=text%2Fxml\">skos2html.xsl</a>\n<a href=\"http://en.wikipedia.org/wiki/XSLT\">XSLT stylesheet</a>, so that Firefox converts it to XHTML on the fly.</p>\n\n<p>An entry looks like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;skos:Concept</span> <span class=\"na\">rdf:about=</span><span class=\"s\">\"GenotypeID\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;skos:prefLabel&gt;</span>genotypeID<span class=\"nt\">&lt;/skos:prefLabel&gt;</span>\n\n  <span class=\"nt\">&lt;skos:definition&gt;</span>A unique identifier for the genotype.<span class=\"nt\">&lt;/skos:definition&gt;</span>\n  <span class=\"nt\">&lt;skos:broader</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"GenotypeProperty\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;/skos:Concept&gt;</span>\n</code></pre></div></div>\n\n<p>The full SKOS specification allows capturing much of what we want to do, including i18n via the label system, loose hierarchical relations via\n<em>skos:broader</em>, and the concept of <em>skos:Collection</em> to aggregate concepts. Where needed, it allows borrowing from other languages. For example,\nto link concepts from MetWare to the original ArMet specification, <em>owl:sameAs</em> can be used.</p>",
      "summary": "The MetWare project is going to make use of ontology technologies to control the content of the database, and a first step is to convert our MetWare database design into something using a formal ontology language. I have played with OWL in the past (see for example its use in Bioclipse ), but was not overly happy with it in all situations.",
      
      "date_published": "2008-03-03T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["bioclipse","metware","ontology","semweb","owl","xml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z6s2y-1an09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/01/jane-find-me-interesting-journals.html",
      "title": "Jane, find me interesting journals, please.",
      "content_html": "<p><a href=\"http://bioinformatics.oxfordjournals.org/\">Bioinformatics</a> just published a paper from Schuemie and Kors (Erasmus University/NL,\n<a href=\"http://www.biosemantics.org/\">BioSemantics group</a>): <em>Jane: suggesting journals, finding experts</em> (doi:<a href=\"https://doi.org/10.1093/bioinformatics/btn006\">10.1093/bioinformatics/btn006</a>):</p>\n\n<blockquote>\n  <p><a href=\"http://biosemantics.org/jane/index.php\">Jane</a> (Journal/Author Name Estimator) is a freely available web-based application that,\non the basis of a sample text (e.g. the title and abstract of a manuscript), can suggest journals and experts who have published\nsimilar articles.</p>\n</blockquote>\n\n<p>Having just gone into a different research field, I appreciate Jane as a useful tool to learn to find my way around in relevant\nliterature. Based on, for example, the abstract of an article I find interesting, it finds me appropriate journals and authors.\nThe next screenshot shows the results for the abstract of the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> paper\n(doi:<a href=\"https://doi.org/10.1021/ci050400b\">10.1021/ci050400b</a>):</p>\n\n<p><img src=\"/assets/images/jane.png\" alt=\"\" /></p>\n\n<p>The <em>Show articles</em> feature as well as the journal annotation are rather useful to get a quick overview of what is being suggested.\nThe list of authors seems, at first sight, populated by co-authors, and lacks any form of annotation.\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/26/my-foaf-network-1-foafexplorer.html\">Room for FOAF here <i class=\"fa-solid fa-recycle fa-xs\"></i></a>? They used PubMed as content\nprovider, and text mining to <em>align</em> articles, but nothing really semantic, despite the group’s name. The output does not seem to\nprovide semantics either.</p>",
      "summary": "Bioinformatics just published a paper from Schuemie and Kors (Erasmus University/NL, BioSemantics group): Jane: suggesting journals, finding experts (doi:10.1093/bioinformatics/btn006):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jane.png",
      "date_published": "2008-03-01T00:10:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["publishing","foaf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI050400B", "doi": "10.1021/CI050400B"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btn006", "doi": "10.1093/bioinformatics/btn006"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/87t55-whn79",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html",
      "title": "TODO: April 2nd, defend my PhD work",
      "content_html": "<p>In 4.5 weeks, on Wednesday April 2 (13:30 precisely, <a href=\"http://maps.google.com/maps?f=q&amp;hl=en&amp;geocode=&amp;q=comeniuslaan+2,+nijmegen,+nederland&amp;sll=37.0625,-95.677068&amp;sspn=25.010803,75.234375&amp;ie=UTF8&amp;ll=51.820852,5.857548&amp;spn=0.002374,0.009184&amp;t=h&amp;z=17&amp;iwloc=addr\">Aula, Comeniuslaan 2, Nijmegen</a>)\nI will publicly defend my PhD work performed in the <a href=\"http://www.cac.science.ru.nl/\">Analytical Chemistry group</a> of\n<a href=\"http://scholar.google.nl/scholar?as_q=&amp;num=10&amp;btnG=Search+Scholar&amp;as_epq=&amp;as_oq=&amp;as_eq=&amp;as_occt=any&amp;as_sauthors=LMC+Buydens&amp;as_publication=&amp;as_ylo=&amp;as_yhi=&amp;as_allsubj=all&amp;hl=en&amp;lr=\">Prof. Lutgarde Buydens</a>\nat the <a href=\"http://www.ru.nl/\">Radboud University Nijmegen</a>:</p>\n\n<p><img src=\"/assets/images/thesisCover.png\" alt=\"\" /></p>\n\n<h2 id=\"table-of-contents\">Table of Contents</h2>\n\n<ol>\n  <li>Introduction</li>\n  <li>Molecular Chemometrics (doi:<a href=\"https://doi.org/10.1080/10408340600969601\">10.1080/10408340600969601</a>)</li>\n  <li>1D NMR in QSPR(doi:<a href=\"https://doi.org/10.1021/ci050282s\">10.1021/ci050282s</a>)</li>\n  <li>Comparing Crystals (doi:<a href=\"https://doi.org/10.1107/S0108768104028344\">10.1107/S0108768104028344</a>)</li>\n  <li>Supervised SOMs (doi:<a href=\"https://doi.org/10.1021/cg060872y\">10.1021/cg060872y</a>)</li>\n  <li>Chemical Metadata in RSS (doi:<a href=\"https://doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>)</li>\n  <li>Interoperability (doi:<a href=\"https://doi.org/10.1021/ci050400b\">10.1021/ci050400b</a>, the Blue Obelisk paper)</li>\n  <li>Discussion and Outlook</li>\n</ol>\n\n<p>Chapters 2, 3, 4, and 5 are first author papers, while for chapters 6 and 7 I am just co-author.</p>\n\n<h2 id=\"summary\">Summary</h2>\n\n<p>Chemometrics and chemoinformatics play important roles in the analysis and modeling of molecular data. 
In particular, in understanding and\nprediction of properties of molecules and molecular systems. Both chemometrics and chemoinformatics apply statistics, machine learning and\ninformatics methodologies to chemical questions, though they originate from different backgrounds. Where chemometrics had its origins in the\nextraction of information from chemical experiments, chemoinformatics had roots in the representation of chemical data for storage in\ndatabases. The technological advances in chemistry and biochemistry in the past decades have led, however, to a flood of data and new\nquestions, and the data analysis and modeling have become more complex. The standing challenge in data analysis and data exchange is how\nto represent the molecular features relevant to the problem at hand. This representation of molecular information is the topic of this\nthesis.</p>\n\n<p>Chapter 1 introduces the field of data analysis and modeling of molecular data and describes the aforementioned importance of representation\nof relevant features. It discusses different approaches to molecular representation, such as line notations, chemical graphs, and quantum\nchemical models. Each of these has limitations when used in data analysis and modeling. Numerical representations are then introduced, which\nallow the application of statistical and mathematical modeling approaches. These numerical representations are commonly derived from chemical\ngraph and quantum chemical representations. CoMFA and the classification of enzyme reactions are examples where the choice of molecular\nrepresentation as well as the analysis method are important.</p>\n\n<p>The term <em>molecular chemometrics</em> is coined in Chapter 2 for the field that applies statistical modeling methods to molecular structure.\nThe chapter reviews the advances made in this field in recent years. 
New numerical descriptors for molecules are discussed, as well as approaches to\nrepresent molecules in more complex systems like crystal structures and reactions. Molecular descriptors are used in similarity and diversity\nanalysis. The applications of new methods for structure-activity and structure-property modeling, and dimension reduction are described. An\noverview of recent approaches in model validation shows new insights and approaches to estimate the performance of classification and regression\nmodels. The last section of this chapter lists new databases and introduces new methods that improve the extraction of chemical data from\ndatabases and repositories. Semantic markup languages improve the exchange of data, and new methods have been introduced to extract molecular\nproperties from text documents.</p>\n\n<p>Chapter 3 studies the use, proposed in the literature, of 1D <sup>13</sup>C and <sup>1</sup>H NMR spectra as molecular descriptors. These spectra\nare known to describe features relevant to physical properties like solubility and boiling point. The NMR representation is studied for the\npredictive powers of its PLS models for three structure-property data sets. The results indicate that proton NMR is not suitable for building\nQSPR models in combination with PLS. Carbon NMR-based models, however, do give reasonable QSPR models, and the regression vectors for the\ncarbon NMR data correlate with spectral regions relevant to molecular fragments. Nevertheless, the predictive power of the carbon NMR-based\nmodels is still lower than that of models based on common molecular descriptors. It is concluded that NMR spectra should not be considered the first\nchoice when making predictive models in general, and that proton NMR should probably not be used at all.</p>\n\n<p>A computational method to calculate similarities between crystal structures based on a new representation is introduced in Chapter 4. 
While\na reference method is perfectly able to identify structures with high similarity, it fails to recognize the different similarities between\ntwo similar structures and two completely different structures. This makes it very difficult for a clustering algorithm to organize small\nclusters of identical and highly similar structures into larger clusters. The new representation of crystal structures introduced in this\nchapter shows a much smoother transition in similarity values when crystal structures go from identical, via similar, to\ndissimilar structures. Clustering a set of simulated polymorphic structures of estrone, and classification of a set of experimental\ncephalosporin structures reproduce the expected clustering and classification.</p>\n\n<p>Chapter 5 uses supervised self-organizing maps to cluster crystal structures represented by their powder diffraction patterns and one or\nmore properties. The topological structure of the resulting maps not only depends on the similarity of the diffraction data, but also on\nthe properties of interest, such as cell volume, space group, and lattice energy. This approach is used to analyze and visualize large\nsets of crystal structures, and the results show that these supervised maps not only give a better mapping but can also be used to predict\ncrystal properties based on the diffraction patterns, and for subset selection in polymorph prediction. The two applications in\ncrystallography show that suitable representations and similarity measures that allow data analysis and modeling of molecular crystal data\nare now available. 
Both approaches are flexible enough to open up a new field of research; especially combinations with other classification\nschemes for crystal structures, such as those based on hydrogen bonding patterns, come to mind.</p>\n\n<p>Chapter 6 introduces and discusses a method that allows information-rich distribution of molecular data between machines, such as measuring\ndevices and computers. Existing approaches often imply undocumented or poorly documented semantics, which may lead to information loss. CMLRSS is\nproposed and combines two existing web standards: Rich Site Summaries (RSS), also known as RDF Site Summaries, and the Chemical Markup\nLanguage (CML). Here, RSS is used as the transport layer, while CML is used to contain the chemical information. CML supports a wide range of\nchemical data, including molecular (crystal) structures, reaction schemes, and experimental data such as NMR spectra. It is shown that\nthis semantic representation allows automated dissemination of chemical data, and is increasingly used to exchange data between web\nresources.</p>\n\n<p>Chapter 7 describes a communal effort to realize interoperability in chemical informatics, which is called the Blue Obelisk movement.\nThis movement currently consists of more than ten smaller and larger, open source and open data projects all related to chemoinformatics\nand chemistry in general. To increase the reproducibility of molecular representations, this chapter introduces a collaborative dictionary\nof chemoinformatics algorithms, and a public repository of chemical data of general interest, including data for chemical elements and\nisotopes (boiling points, colors, electron affinities, masses, covalent radii, etc.), definitions of atom types, and more. With the\navailability of a standard set of atomic properties, open source algorithms and open data (for example via CMLRSS feeds), it is much\neasier to reproduce and validate published results in molecular chemometrics. 
Results from Chapter 3 show that such an ability is no luxury.</p>\n\n<p>The last chapter summarizes the efforts in this thesis and how they address the challenges in molecular chemometrics. This thesis shows\nthe strong interaction between representation and the methods used for data analysis: molecular representations need to capture relevant\ninformation and be compatible with the statistical methods used to analyze the data. The chapters review molecular\nrepresentations and put focus on model validation using statistics, visualization methods, and standardization approaches.</p>",
      "summary": "In 4.5 weeks, on Wednesday April 2 (13:30 precisely, Aula, Comeniuslaan 2, Nijmegen) I will publicly defend my PhD work performed in the Analytical Chemistry group of Prof. Lutgarde Buydens at the Radboud University Nijmegen:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/thesisCover.png",
      "date_published": "2008-03-01T00:00:00+00:00",
      "date_modified": "2008-03-01T00:00:00+00:00",
      "tags": ["cheminf","chemometrics","phd"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1080/10408340600969601", "doi": "10.1080/10408340600969601"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI050282S", "doi": "10.1021/CI050282S"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1107/S0108768104028344", "doi": "10.1107/S0108768104028344"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CG060872Y", "doi": "10.1021/CG060872Y"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI050400B", "doi": "10.1021/CI050400B"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7f9p1-47v65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/02/27/wheres-maize-genome-torrent.html",
      "title": "Where&apos;s the maize genome torrent?!?",
      "content_html": "<p><a href=\"http://slashdot.org/\">/.</a> just posted <a href=\"http://science.slashdot.org/science/08/02/26/1938210.shtml\">a story about the maize genome</a>\njust published, for which the sequences can be downloaded from <a href=\"http://ftp.maizesequence.org/20080107/\">this FTP site</a>. The files\nare not that large at all. But it makes me wonder… where are the <a href=\"http://en.wikipedia.org/wiki/BitTorrent_%28protocol%29\">.torrent files</a>\nfor the sequenced genomes? Here’s <a href=\"http://politigenomics.blogspot.com/2008/02/your-people-call-it-corn.html\">Davids catch</a> on the story.</p>\n\n<p><strong>Update</strong>: <a href=\"http://www.openhelix.com/blog/?p=165\">OpenHelix discusses the matching genome browser</a>, and indicates that\n<a href=\"http://www.openhelix.com/blog/?p=182\">hundreds of genomes</a> are actively being studied. The\n<a href=\"http://appliedbioinformatics.wur.nl/index.php?option=com_frontpage&amp;Itemid=1\">group where I work</a> recently\n<a href=\"http://www.genomeweb.com/issues/news/143543-1.html\">bought a 454</a> (reg needed, it seems) and participates in the race.\n<a href=\"http://agro.biodiver.se/2008/02/nibbles-peas/\">Nibbles links the cork event to square peas</a>…\nbut my biological background prohibits me to see that link…</p>",
      "summary": "/. just posted a story about the maize genome just published, for which the sequences can be downloaded from this FTP site. The files are not that large at all. But it makes me wonder… where are the .torrent files for the sequenced genomes? Here’s Davids catch on the story.",
      
      "date_published": "2008-02-27T00:10:00+00:00",
      "date_modified": "2008-02-27T00:10:00+00:00",
      "tags": ["bioinfo"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g5s78-zqk90",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/02/27/cdk-is-now-available-from-your-nearest.html",
      "title": "CDK is now available from your nearest Debian mirror",
      "content_html": "<p><a href=\"https://chem-bla-ics.linkedchemistry.info/2008/02/20/cdk-close-to-entering-debian.html\">Some days have passed <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand the Debian mirrors have\nnow picked up the <a href=\"http://cdk.sf.net/\">CDK</a> package (unstable only so far), allowing you to <code class=\"language-plaintext highlighter-rouge\">sudo aptitude install libcdk-java</code> from\nyour favorite local mirror. The details are available from this <a href=\"http://packages.debian.org/libcdk-java\">packages.debian.org/libcdk-java</a>\npage. The fact that it is listed as contrib is a small mistake; the package is really <em>main</em> material.</p>\n\n<p>Now, also make sure to install BeanShell (<code class=\"language-plaintext highlighter-rouge\">sudo aptitude install bsh</code>), which allows you to start scripting the CDK. For example,\nconsider this simple script:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"nn\">org.openscience.cdk.Atom</span><span class=\"o\">;</span>\n<span class=\"nc\">Atom</span> <span class=\"n\">atom</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n<span class=\"n\">print</span><span class=\"o\">(</span><span class=\"n\">atom</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Save this as content of a file <code class=\"language-plaintext highlighter-rouge\">simpleExample.bsh</code>, and run the bsh program to run the script. 
You will have to set the\n<code class=\"language-plaintext highlighter-rouge\">CLASSPATH</code>, so the full command looks like this on my Linux desktop:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">CLASSPATH</span><span class=\"o\">=</span>/usr/share/java/cdk-interfaces.jar:/usr/share/java/cdk-core.jar:/usr/share/java/cdk-data.jar:/usr/share/java/vecmath1.2-1.14.jar bsh simpleExample.bsh\n</code></pre></div></div>\n\n<p>A wrapper script <code class=\"language-plaintext highlighter-rouge\">cdkbsh</code> that adds the CLASSPATH seems desirable here :) But you get the point.</p>\n\n<p>Interestingly, BeanShell also comes with a graphical user interface, as well as a command-line-based scripting environment.\nBoth make perfect setups for quickly testing some code. The GUI version <code class=\"language-plaintext highlighter-rouge\">xbsh</code> looks like this (don’t forget to set the CLASSPATH):</p>\n\n<p><img src=\"/assets/images/cdkbsh.png\" alt=\"\" /></p>",
      "summary": "Some days have passed , and the Debian mirrors have now picked up the CDK package (unstable only so far), allowing you to sudo aptitude install libcdk-java from your favorite local mirror. The details are available from this packages.debian.org/libcdk-java page. The fact that it is listed as contrib is a small mistake; the package is really main material.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkbsh.png",
      "date_published": "2008-02-27T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["cdk","debian"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dmspa-wyb25",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/02/20/cdk-close-to-entering-debian.html",
      "title": "CDK close to entering Debian",
      "content_html": "<p><a href=\"http://gnu.wildebeest.org/diary-man-di/\">Michael Koch</a> (aka man-di) and <a href=\"http://www.wgdd.de/\">Daniel Leidert</a> (as part of the\n<a href=\"http://alioth.debian.org/projects/pkg-java/\">pkg-java team</a>) have worked on packaging the <a href=\"http://cdk.sf.net/\">CDK</a>. The ran into\nsome issues, such as the CDK build system not perfectly compatible with the Debian java libraries in <em>/usr/share/java</em>.\nBoth detection of the available libraries as well as putting them in the classpath, caused trouble with the\n<a href=\"http://build-common.alioth.debian.org/\">CDBS</a>-based build system wrapping around the <a href=\"http://ant.apache.org/\">Ant</a>\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/branches/cdk-1.0.x/build.xml?view=log\">build.xml</a> (note the many commit this weekend ;).</p>\n\n<p>The result is noteworthy: <a href=\"http://ftp-master.debian.org/new/cdk_1:1.0.1.91-1.html\">CDK has entered the Debian NEW queue</a>. This\nmeans that the Debian experts will check that CDK is really ready to enter Debian. Licenses will be checked, for example. This\nhas been one of my long standing wishes, and I am happy that Michael got around to getting things done. Cheers!</p>",
      "summary": "Michael Koch (aka man-di) and Daniel Leidert (as part of the pkg-java team) have worked on packaging the CDK. The ran into some issues, such as the CDK build system not perfectly compatible with the Debian java libraries in /usr/share/java. Both detection of the available libraries as well as putting them in the classpath, caused trouble with the CDBS-based build system wrapping around the Ant build.xml (note the many commit this weekend ;).",
      
      "date_published": "2008-02-20T00:00:00+00:00",
      "date_modified": "2008-02-20T00:00:00+00:00",
      "tags": ["cdk","debian"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kk9bs-x2a77",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/02/06/simple-open-bug-track-system-social.html",
      "title": "Simple, Open Bug Track System: social bookmarking",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/\">Jim</a> replied to the <a href=\"http://chem-bla-ics.blogspot.com/2008/01/why-chemistry-rich-rss-feeds-matter.html#c9123182507496435262\">request by Anthony in my blog</a> <!-- keep link -->\nfor a bug track system for <a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/\">CrystalEye</a> (in beta), after a discussion on the CIF\nprocessing pipeline (see [here] <i class=\"fa-solid fa-recycle fa-xs\"></i>(https://chem-bla-ics.linkedchemistry.info/2008/01/30/why-chemistry-rich-rss-feeds-matter.html),\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=943\">here</a>, <a href=\"http://www.chemspider.com/blog/why-we-cant-publish-scraped-crystaleye-data-yetand-science-commons-declare-a-protocol-for-implementing-open-access-data.html\">here</a>\nand <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=946\">here</a>).</p>\n\n<p>Instead of setting up a BTS at <a href=\"http://www.sf.net/\">SourceForge</a>, locally with <a href=\"http://www.bugzilla.org/\">Bugzilla</a>, or at\n<a href=\"http://www.launchpad.net/\">LaunchPad</a>, he <a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/?p=171\">suggested to use</a>\n<a href=\"http://www.connotea.org/\">Connotea</a>:</p>\n\n<blockquote>\n  <p>To report a problem in CrystalEye, simply bookmark an example of the problem with the tag “crystaleyeproblem”, using the\nDescription field to describe the problem. All the problems will appear on the tag feed.</p>\n\n  <p>When we fix the problem we’ll add the tag “crystaleyefixed” to the same bookmark. 
If you subscribe to this feed, you’ll\nknow to remove the crystaleyeproblem tag.</p>\n\n  <p>In the fullness of time, we’re planning to use connotea tags to annotate structures where full processing hasn’t been\npossible (uncalculatable bond orders, charges etc).</p>\n</blockquote>\n\n<p>Now, Connotea is advertised as <em>[f]ree online reference management for all researchers, clinicians and scientists</em>,\nand I have never really been happy with any HTML page ending up in the system. I would counter the suggestion by proposing social\nbookmarking websites meant for any HTML page (not just publications), such as <a href=\"http://del.icio.us/\">Del.icio.us</a>\n(see <a href=\"http://del.icio.us/search/?fr=del_icio_us&amp;p=crystaleye&amp;type=all\">their list of CrystalEye bookmarks</a>).</p>\n\n<p>Anyway, it does not really matter, and Connotea has an open API to query the database. This will allow Jim to write a simple\nuserscript to enhance each CrystalEye page with a list of bug reports. That will allow every CrystalEye visitor to see what\nothers have commented on it. In that respect, many other things can be envisioned… Getting comments on the paper behind the\ncrystal structure from <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> and <a href=\"http://www.postgenomic.com/\">Postgenomic</a>,\n…</p>",
      "summary": "Jim replied to the request by Anthony in my blog for a bug track system for CrystalEye (in beta), after a discussion on the CIF processing pipeline (see [here] (https://chem-bla-ics.linkedchemistry.info/2008/01/30/why-chemistry-rich-rss-feeds-matter.html), here, here and here).",
      
      "date_published": "2008-02-06T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["crystal"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kktab-e6159",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/02/05/performance-c-c-c-java-perl-and-python.html",
      "title": "Performance: C, C++, C#, Java, Perl and Python",
      "content_html": "<p>Mathieu Fourment (et al.) just published a paper on some performance testing on 6 programming languages in\n<a href=\"http://www.biomedcentral.com/bmcbioinformatics\">BMC Bioinformatics</a>: <em>A comparison of common programming languages used\nin bioinformatics</em> (doi:<a href=\"https://doi.org/10.1186/1471-2105-9-82\">10.1186/1471-2105-9-82</a>). The below figure is from\nthe paper, for a sequence alignment exercise (copyright with paper authors, OpenAccess license of journal):</p>\n\n<p><img src=\"/assets/images/alignment.gif\" alt=\"\" /></p>\n\n<p>Nothing shocking, I’d say; Java is similar in performance to C++.</p>\n\n<p>What I’d love to have seen, was the performance of compiled Java too, using the java compiler (<em>gcj</em>) which comes with\nGCC 4.1.1. No idea why that was left out. One could also question why they did not use the 1.6 JVM of Sun,\nwhich is more faster (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/01/cdk-and-java-6-beta.html\">these results on running the CDK unit tests <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nAnd, a major omission is Fortran.</p>\n\n<p>Anyway, the authors provide <a href=\"http://www.bioinformatics.org/benchmark/\">the source code</a>, so we can easily test\nourselves the effects of that.</p>\n\n<p>BTW, first post? :) <strong>update:</strong> At least I beat <a href=\"http://cszamudio.spaces.live.com/blog/cns!9BCF6F9D6772B8F5!1742.entry\">Carlos</a>.</p>",
      "summary": "Mathieu Fourment (et al.) just published a paper on some performance testing on 6 programming languages in BMC Bioinformatics: A comparison of common programming languages used in bioinformatics (doi:10.1186/1471-2105-9-82). The below figure is from the paper, for a sequence alignment exercise (copyright with paper authors, OpenAccess license of journal):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/alignment.gif",
      "date_published": "2008-02-05T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["java"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-9-82", "doi": "10.1186/1471-2105-9-82"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nks4k-n7e69",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/02/02/defining-development-goals-launchpad.html",
      "title": "Defining Development Goals: LaunchPad complements SourceForge",
      "content_html": "<p>Today, Miguel (who made <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/02/02/10000-cdk-commits.html\">the 10000th CDK commit <i class=\"fa-solid fa-recycle fa-xs\"></i></a>) and I gave\n<a href=\"http://launchpad.net/\">LaunchPad</a> a go, because if offers a nice GUI for planning and monitoring source code development. We\nhave set up a <a href=\"https://launchpad.net/~cdk-developers\">CDK team</a> and a <a href=\"https://launchpad.net/cdk/\">CDK project</a>. LaunchPad\nhas overlap with <a href=\"http://www.sf.net/\">SourceForge</a> functionality, but they idea is not to duplicate functionality. Moreover,\nwe do not translate the CDK either, so that LaunchPad functionality is not useful either. Not for the CDK at least; maybe for\n<a href=\"http://www.jmol.org/\">Jmol</a> and <a href=\"http://www.bioclipse.net/\">Bioclipse</a>?</p>\n\n<p>However, we are interested in the task management system of LaunchPad. While the CDK project is currently maintaining a\n<a href=\"http://sourceforge.net/tracker/?atid=631143&amp;group_id=20024&amp;func=browse\">Project Maintenance Tasks</a> tracker, it does not have\nthe feature richness of the LaunchPad equivalent. The latter allows us to link tasks with series goals. We currently basically\nhave two series: the cdk1.0.x/ branch, and trunk. Miguel and I have been working on getting the\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/qsar/descriptors/molecular/IPMolecularDescriptor.html\">ionization potential prediction</a>\nin trunk working, which involves about all the code Miguel wrote during his PhD thesis with\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a>. And, this is one of the goal of the next stable CDK series\n(replacing the 1.0.x series). 
This is something we can easily define in LaunchPad:</p>\n\n<p><img src=\"/assets/images/trunkSeriesGoals.png\" alt=\"\" /></p>\n\n<p>Getting the IP-prediction code updated for the new CDK atom types and other changes, and making it CDK stable involves\nquite a long list of tasks with dependencies between them. For example, I can’t continue\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/branches/egonw/charge/\">cleaning up the partial charge prediction code</a> before\nthe <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/branches/miguelrojasch/reaction/\">resonance structure generator in the reaction module</a>\nis working properly again. This in turn depends on me adding missing radical and charge atom types, which in turn depends\non expected atom types, which Miguel had to implement. And the latter is actually what he was committing around\nthe 10000th commit.</p>\n\n<p>Now, Miguel and I will try to manage this development in trunk using LaunchPad. It allows us to define all these\nsmaller tasks, but, more importantly, the dependencies between them:</p>\n\n<p><img src=\"/assets/images/ipPredictionDeps.png\" alt=\"\" /></p>\n\n<p>As such, LaunchPad gives us the means to manage this complex development. It shows what we’re facing, how far we\nhave progressed, and much, much more:</p>\n\n<p><img src=\"/assets/images/taskMaintenance.png\" alt=\"\" /></p>\n\n<p>This goes well beyond what SourceForge has to offer; this will be an interesting experiment. I do not anticipate dropping\nSourceForge at all (just in case you were wondering…); they have generally served us very, very well; and completely\nfree too! (LaunchPad is free too.) As far as I can see, they form a perfect complement. 
Like a ligand and an enzyme, like\nopensource and <a href=\"http://precedings.nature.com/documents/39/version/1\">open notebook science</a>, or like a\nMammoth and an ice field.</p>\n\n<p>Speaking about ONS… <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a>, not sure if LaunchPad would be open to projects\nwithout source code too…</p>",
      "summary": "Today, Miguel (who made the 10000th CDK commit ) and I gave LaunchPad a go, because if offers a nice GUI for planning and monitoring source code development. We have set up a CDK team and a CDK project. LaunchPad has overlap with SourceForge functionality, but they idea is not to duplicate functionality. Moreover, we do not translate the CDK either, so that LaunchPad functionality is not useful either. Not for the CDK at least; maybe for Jmol and Bioclipse?",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ipPredictionDeps.png",
      "date_published": "2008-02-02T00:10:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/w1hpp-ks316",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/02/02/10000-cdk-commits.html",
      "title": "10000 CDK commits!",
      "content_html": "<p>It has happened. Just a few minutes ago. The <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk?view=rev&amp;revision=10000\">10000th commit</a>\nto the <a href=\"http://cdk.sf.net/\">CDK</a> source code repository. Miguel was the lucky(?) one. From our IRC channel #cdk on the\n<a href=\"http://www.freenode.net/\">irc.freenode.net</a> network:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[19:55]  cdk: miguelrojasch * r10000 /branches/miguelrojasch/\n  reaction/src/org/openscience/cdk/ (2 files in 2 dirs): Removed Flags. \n  They were not used anymore.\n</code></pre></div></div>\n\n<p>And a screenshot:</p>\n\n<p><img src=\"/assets/images/commit10000.png\" alt=\"\" /></p>\n\n<p>The first source code was actually only added with the <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk?view=rev&amp;revision=5\">5th commit</a>,\nmade by <a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a>, days after our meeting with\nDan (of <a href=\"http://www.jmol.org/\">Jmol</a> fame) at Notre Dame in September 2000.</p>\n\n<p>The full list of people who contributed to this enormous success is\n<a href=\"http://www.ohloh.net/projects/380/contributors?page=1\">provided by OHLOH</a>.</p>",
      "summary": "It has happened. Just a few minutes ago. The 10000th commit to the CDK source code repository. Miguel was the lucky(?) one. From our IRC channel #cdk on the irc.freenode.net network:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/commit10000.png",
      "date_published": "2008-02-02T00:00:00+00:00",
      "date_modified": "2008-02-02T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/cbnqa-gdc49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/30/why-chemistry-rich-rss-feeds-matter.html",
      "title": "Why chemistry-rich RSS feeds matter...",
      "content_html": "<p>Peter <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=943\">wrote up an item</a> on Nick’s\n<a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/\">CrystalEye’s RSS feed</a>, and I have been enthusiastic about\nchemistry-enriched RSS feeds for some time. CMLRSS has the chemical data inline in the RSS; see\nDOI:<a href=\"http://dx.doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>, the use of CMLRSS in Chemical blogspace\ndescribed <a href=\"http://chemicalblogspace.blogspot.com/2007/01/cb-gets-cmlrss-feed.html\">here</a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/04/30/improved-cmlrss-feed-for-chemical.html\">here <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/03/06/progress-with-cmlrss-plugin-for.html\">CMLRSS support in Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Nick’s RSS feed does not put the chemistry inline, but does link to the raw CML file:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;entry&gt;</span>\n  <span class=\"nt\">&lt;title&gt;</span>No title supplied<span class=\"nt\">&lt;/title&gt;</span>\n  <span class=\"nt\">&lt;link</span> <span class=\"na\">rel=</span><span class=\"s\">\"enclosure\"</span> <span class=\"na\">href=</span><span class=\"s\">\"http://wwmm.ch.cam.ac.uk/crystaleye/summary//acs/inocaj/2008/3/data/ic702497x/ic702497xsup1_THP4-SINC-publ/ic702497xsup1_THP4-SINC-publ.complete.cml.xml\"</span> <span class=\"na\">hreflang=</span><span class=\"s\">\"en\"</span> <span class=\"nt\">/&gt;</span>\n  <span class=\"c\">&lt;!-- much more, that I skipped for brevity --&gt;</span>\n<span class=\"nt\">&lt;/entry&gt;</span>\n</code></pre></div></div>\n\n<p>The example shown by Peter was nicely chosen: something is wrong with that example. 
It uncovers a\nbug in the pipeline that could have been uncovered by a simple agent monitoring the RSS feed.\nThat is why this technology is important! It allows pipelining of information between services.</p>\n\n<p>Anyway, before you read on, check the <a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/summary/acta/e/2008/01-00/data/xu2383/xu2383sup1_I/xu2383sup1_I.cif.summary.html\">structure in the example</a>\nyourself <em>(Bis(pyrimidine-2-carboxylato-K2N,O)copper(II))</em>.</p>\n\n<p>Done? Checked it? You saw the problem, right? Good.</p>\n\n<p>I have scanned the CIF source, but that does not seem to contain the problem. It nicely shows a\ngeneral limitation of commonly used chemoinformatics tools: the lack of proper atom typing (a\nproblem I have been looking into for the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a>;\nsee <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html\">Atom Typing in the CDK <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/06/evidence-of-aromaticity.html\">Evidence of Aromaticity <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>You will have noted that the 2D diagram in Peter’s blog is charged. I checked the\n<a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/summary/acta/e/2008/01-00/data/xu2383/xu2383sup1_I/xu2383sup1_I.complete.cml.xml\"><em>complete</em> CML source code</a>\nfor the CrystalEye entry, and that contains the charges on the two oxygens bound to the copper\ntoo. However, the copper is not charged. That leads to a rather unlikely situation; that is,\nthe crystal structure would just about attract the whole laboratory to itself in the blink of an\neye: there is nothing to balance the double-negative charge! 
It is conveniently summarized in this bit of the CML:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;formula</span> <span class=\"na\">formalCharge=</span><span class=\"s\">\"-2\"</span> <span class=\"na\">concise=</span><span class=\"s\">\"C 10 H 6 Cu 1 N 4 O 4 -2\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;atomArray</span> <span class=\"na\">elementType=</span><span class=\"s\">\"C H Cu N O\"</span> <span class=\"na\">count=</span><span class=\"s\">\"10.0 6.0 1.0 4.0 4.0\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;/formula&gt;</span>\n</code></pre></div></div>\n\n<p>Now, I also checked the <a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/summary/acta/e/2008/01-00/data/xu2383/xu2383sup1_I/xu2383sup1_I.raw.cml.xml\"><em>raw</em> CML</a>;\nthat seems to be unaffected too. So, the bug must be somewhere in the software that converts\nthe raw CML into complete CML. And, before the InChI calculation, because that one is wrong\ntoo. An agent scanning the RSS feed would have detected this. Someone interested in writing\nup a grant proposal on this?</p>\n\n<p>BTW, the system is not awfully wrong: the negative charge on the acidic carboxyl groups is\nto be expected. But if the bond between the oxygen and the copper had been coordinating,\nnot covalent, and the copper had been +2, then it would have been fine. Because many chemoinformatics\ntools do not really have support for dative bonds, a covalent bond could be drawn, but then\nthe oxygens should be uncharged… right? :)</p>\n\n<p>Oh, and surely, one can do much, much more with those feeds. I blogged about that earlier in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/24/automatic-classification-of-thousands.html\">Automatic Classification of thousands of Crystal Structures <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "Peter wrote up an item on Nick’s CrystalEye’s RSS feed, and I have been enthusiastic about chemistry-enriched RSS feeds for some time. CMLRSS has the chemical data inline in the RSS; see DOI:10.1021/ci034244p, the use of CMLRSS in Chemical blogspace described here and here , and the CMLRSS support in Bioclipse .",
      
      "date_published": "2008-01-30T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["rss","chemistry"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g3166-hv249",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/23/my-phd-thesis-in-color-and-grayscale.html",
      "title": "My PhD Thesis: in color and grayscale",
      "content_html": "<p>Wednesday is my regular day off from my metabolomics work, and today I am finalizing the layout of my thesis, which I’ll\ndefend on April 2. The print version will feature grayscale images with some of them in color too. However, the PDF\nversion that will end up in our university repository should have color prints. So, while halfway creating suitable\ngrayscale versions of the image, I realized I was not doing it properly. I was replacing the images; so, I lost the\ncolor version. Not good.</p>\n\n<p>But wait, LaTeX can do more; why not have a color and a grayscale option? Here comes <code class=\"language-plaintext highlighter-rouge\">optional.sty</code>. By adding\n<code class=\"language-plaintext highlighter-rouge\">\\usepackage{optional}</code> I can add to the source (from <code class=\"language-plaintext highlighter-rouge\">book.tex</code>):</p>\n\n<div class=\"language-latex highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">\\begin{figure}</span>[bt]\n<span class=\"nt\">\\begin{center}</span>\n  <span class=\"k\">\\subfigure</span><span class=\"na\">[]</span><span class=\"p\">{</span>\n    <span class=\"k\">\\label</span><span class=\"p\">{</span>fig:benzene:a<span class=\"p\">}</span>\n    <span class=\"k\">\\opt</span><span class=\"p\">{</span>color<span class=\"p\">}{</span><span class=\"k\">\\includegraphics</span><span class=\"na\">[width=0.4\\textwidth]</span><span class=\"p\">{</span>intro/benzoCompounds<span class=\"p\">_</span>color<span class=\"p\">}}</span>\n    <span class=\"k\">\\opt</span><span class=\"p\">{</span>grayscale<span class=\"p\">}{</span><span class=\"k\">\\includegraphics</span><span class=\"na\">[width=0.4\\textwidth]</span><span class=\"p\">{</span>intro/benzoCompounds<span class=\"p\">}}</span>\n  <span class=\"p\">}</span>\n  <span class=\"k\">\\hspace</span><span class=\"p\">{</span>2cm<span class=\"p\">}</span>\n  <span 
class=\"k\">\\subfigure</span><span class=\"na\">[]</span><span class=\"p\">{</span>\n    <span class=\"k\">\\label</span><span class=\"p\">{</span>fig:benzene:b<span class=\"p\">}</span>\n    <span class=\"k\">\\includegraphics</span><span class=\"na\">[width=0.18\\textwidth]</span><span class=\"p\">{</span>intro/Ferrocene-2D<span class=\"p\">}</span>\n  <span class=\"p\">}</span>\n<span class=\"nt\">\\end{center}</span>\n<span class=\"k\">\\caption</span><span class=\"p\">{</span>a) 2D diagrams of the two possible resonance structures of a compound\nwith a phenyl ring. Both diagrams refer to the same compounds, but the depicted\ngraph representations are not identical. b) 2D diagram of ferrocene, which,\nlike all organometallic compounds,\nis difficult to represent with classical chemoinformatics approaches.<span class=\"p\">}</span>\n<span class=\"k\">\\label</span><span class=\"p\">{</span>fig:benzene<span class=\"p\">}</span>\n<span class=\"nt\">\\end{figure}</span>\n</code></pre></div></div>\n\n<p>Ferrocene was already black-and-white, so no worry about that. And, it is just the red colored hydroxyl group.\nBut it serves the point :)</p>\n\n<p>Which then allows me to run pdflatex to create a color version and a grayscale version:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>pdflatex <span class=\"s2\">\"</span><span class=\"se\">\\d</span><span class=\"s2\">ef</span><span class=\"se\">\\U</span><span class=\"s2\">seOption{color}</span><span class=\"se\">\\i</span><span class=\"s2\">nput{book}\"</span>\npdflatex <span class=\"s2\">\"</span><span class=\"se\">\\d</span><span class=\"s2\">ef</span><span class=\"se\">\\U</span><span class=\"s2\">seOption{grayscale}</span><span class=\"se\">\\i</span><span class=\"s2\">nput{book}\"</span>\n</code></pre></div></div>\n\n<p>/me is happy</p>",
      "summary": "Wednesday is my regular day off from my metabolomics work, and today I am finalizing the layout of my thesis, which I’ll defend on April 2. The print version will feature grayscale images with some of them in color too. However, the PDF version that will end up in our university repository should have color prints. So, while halfway creating suitable grayscale versions of the image, I realized I was not doing it properly. I was replacing the images; so, I lost the color version. Not good.",
      
      "date_published": "2008-01-23T00:00:00+00:00",
      "date_modified": "2008-01-23T00:00:00+00:00",
      "tags": ["latex","phd"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3kvks-pdb73",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/20/java-server-pages-with-cdk.html",
      "title": "Java Server Pages with CDK functionality",
      "content_html": "<p>Setting up interactive web pages can be done in many way. <a href=\"http://en.wikipedia.org/wiki/JavaServer_Pages\">Java Server Pages</a> are just one of them.\nThey are quite similar to PHP pages or <a href=\"https://doi.org/10.59350/4gxzp-tds81\">Ruby <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand combine plain HTML (and likely any other output) code with fragments of code; Java source code in this case.</p>\n\n<p><a href=\"http://www.ubuntu.com/\">Ubuntu</a>’s <a href=\"http://packages.ubuntu.com/gutsy/web/tomcat5.5\">tomcat5.5</a> package installs quite easily, and sets up a server\nat port 8180. I still have to figure out how to nicely integrate it with the Apache server on port 80, though. Suggestions much appreciated.</p>\n\n<p>From then on, one can add new JSP pages by creating a ‘webapp’ in <code class=\"language-plaintext highlighter-rouge\">/usr/share/tomcat5.5-webapps</code>. The basic structure looks like:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>mb-examples.xml\nmb-examples/index.jsp\nmb-examples/WEB-INF/\nmb-examples/WEB-INF/classes/\nmb-examples/WEB-INF/lib/\n</code></pre></div></div>\n\n<p>Just copying the <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">large CDK jar</a> (the one with all the third party libraries)\ninto <code class=\"language-plaintext highlighter-rouge\">WEB-INF/lib/</code> did not work for me, but unjaring it into <code class=\"language-plaintext highlighter-rouge\">WEB-INF/classes/</code> seem to work fine.</p>\n\n<p>Then, you can just add Java code using the CDK library for what ever you like. The following (simple) example JSP page, takes one parameter,\na molecular formula. 
This could be the input given in a FORM, but the below page does not deal with that situation yet:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"err\">&lt;</span>%@ page import=\"java.util.*,org.openscience.cdk.*,org.openscience.cdk.tools.*\" %&gt;\n<span class=\"cp\">&lt;!doctype html public \"-//w3c//dtd html 4.0 transitional//en\"&gt;</span>\n<span class=\"nt\">&lt;html&gt;</span>\n<span class=\"err\">&lt;</span>%\n   String mf = request.getParameter( \"mf\" );\n%&gt;\n<span class=\"nt\">&lt;head&gt;</span>\n   <span class=\"nt\">&lt;meta</span> <span class=\"na\">http-equiv=</span><span class=\"s\">\"Content-Type\"</span> <span class=\"na\">content=</span><span class=\"s\">\"text/html; charset=iso-8859-1\"</span><span class=\"nt\">&gt;</span>\n   <span class=\"nt\">&lt;meta</span> <span class=\"na\">name=</span><span class=\"s\">\"Author\"</span> <span class=\"na\">content=</span><span class=\"s\">\"E.L. Willighagen\"</span><span class=\"nt\">&gt;</span>\n   <span class=\"nt\">&lt;title&gt;</span>Metabolomics Examples<span class=\"nt\">&lt;/title&gt;</span>\n<span class=\"nt\">&lt;/head&gt;</span>\n\n<span class=\"nt\">&lt;body</span> <span class=\"na\">bgcolor=</span><span class=\"s\">\"#FFFFFF\"</span><span class=\"nt\">&gt;</span>\n\n<span class=\"nt\">&lt;table&gt;</span>\n<span class=\"nt\">&lt;tr&gt;</span>\n<span class=\"nt\">&lt;td&gt;</span>Molecular Formula:<span class=\"nt\">&lt;/td&gt;</span>\n<span class=\"nt\">&lt;td&gt;</span><span class=\"err\">&lt;</span>%= mf %&gt;<span class=\"nt\">&lt;/td&gt;</span>\n<span class=\"nt\">&lt;/tr&gt;</span>\n\n<span class=\"err\">&lt;</span>%\nMFAnalyser analyser = new MFAnalyser(mf, new Molecule());\ndouble accurateMass = Math.round(analyser.getMass()*10000.0)/10000.0;\n%&gt;\n\n<span class=\"nt\">&lt;tr&gt;</span>\n<span class=\"nt\">&lt;td&gt;</span>Mono-isotopic Accurate Mass:<span class=\"nt\">&lt;/td&gt;</span>\n<span 
class=\"nt\">&lt;td&gt;</span><span class=\"err\">&lt;</span>%= accurateMass %&gt;<span class=\"nt\">&lt;/td&gt;</span>\n<span class=\"nt\">&lt;/tr&gt;</span>\n<span class=\"nt\">&lt;/table&gt;</span>\n\n<span class=\"nt\">&lt;/body&gt;</span>\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>\n\n<p>Now, a lot of improvement can be achieved. For example, the <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code> stuff can be split out in a <code class=\"language-plaintext highlighter-rouge\">header.include</code>. And, after proper integration with the\nApache server, <a href=\"http://httpd.apache.org/docs/2.0/misc/rewriteguide.html\">rewrites</a> could be used to create a REST service. But, the above is just to give you an idea.</p>\n\n<p>In case you wonder, this work is related to the opensource <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/22/metware-metabolomics-database-project.html\">MetWare <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ndatabase software development our group is involved in.</p>",
      "summary": "Setting up interactive web pages can be done in many way. Java Server Pages are just one of them. They are quite similar to PHP pages or Ruby , and combine plain HTML (and likely any other output) code with fragments of code; Java source code in this case.",
      
      "date_published": "2008-01-20T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["java","cdk","metware"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/4gxzp-tds81", "doi": "10.59350/4gxzp-tds81"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/53bd8-ecn45",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/15/be-in-my-advisory-board-2-jchempaint.html",
      "title": "Be in my Advisory Board #2: JChemPaint development",
      "content_html": "<p>No idea who the 22 persons are who were <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/27/be-in-my-advisory-board-1-being-good.html\">willing to join my advisory board <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nbut they advised me to finish the JChemPaint work <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/09/20/swt-view-with-new-jchempaint.html\">Niels worked on this summer <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<p><img src=\"/assets/images/board1.png\" alt=\"\" /></p>\n\n<p>Like my current main hobby project (<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html\">atom typing in the CDK <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), the JChemPaint project will\nbe performed in my non-working hours mostly. A reasonable ETA is, therefore, end of this summer. Main discussion on will be done on the\n<a href=\"https://lists.sourceforge.net/lists/listinfo/cdk-jchempaint\">cdk-jchempaint</a> mailing list. Also note that the\n<a href=\"http://jchempaint.sf.net/\">JChemPaint project</a> at SourceForge is deprecated, because the code is now included in the\n<a href=\"http://cdk.sf.net/\">CDK project</a>.</p>\n\n<p>Over the next weeks, I will post more questions in the poll of this blog regarding the JChemPaint development. So, watch that space.</p>",
      "summary": "No idea who the 22 persons are who were willing to join my advisory board , but they advised me to finish the JChemPaint work Niels worked on this summer :",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/board1.png",
      "date_published": "2008-01-15T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["cdk","jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/es4e7-esh15",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/06/cdk-literature-4.html",
      "title": "CDK Literature #4",
      "content_html": "<p>Fourth in the <em>CDK Literature</em> series. Really, a follow up on <a href=\"https://chem-bla-ics.linkedchemistry.info/2008/01/03/cdk-literature-3.html\">#3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> which I\nwanted to get out, even though not really finished yet. But, after 3 comes 4, not 3b. Maybe 3.1, but that suggests at least 3.2-3.9 too,\nlet alone full R (that was supposed to the space of all reals…) I’ll stick to positive non-zero integers.\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html\">#1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/14/cdk-literature-2.html\">#2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> are still available too.</p>\n\n<p>Another thing I should remark is that this series does not provide full reviews of the cited papers. Instead, it provides a list of papers\nthat cite one of the two CDK papers (doi:<a href=\"https://doi.org/10.1021/ci025584y\">10.1021/ci025584y</a> and\ndoi:<a href=\"https://doi.org/10.2174/138161206777585274\">10.2174/138161206777585274</a>). It is worth repeating that these two articles are not the\nonly article which describe CDK source code, and maybe I should start listing papers that cite other articles that discuss CDK source\ncode. Anyway, the other papers discussing CDK source code are listed in an\n<a href=\"http://cdk.wiki.sourceforge.net/Literature\">overview I maintain in the CDK wiki</a> (now on\n<a href=\"http://www.sf.net/\">SourceForge</a>), but have not updated it for #4 and #3 yet.</p>\n\n<h3 id=\"organic-reaction-ontology\">Organic Reaction Ontology</h3>\n\n<p>Punnaivanam Sankar and Gnanasekaran Aghila published a paper where they propose a knowledge framework for mechanisms of organic reactions,\nand used an XML framework combined with a ontology for the semantics. JChemPaint and the CDK are cited as opensource tools that support\nreactions. 
<br />\n<em>Punnaivanam Sankar, Gnanasekaran Aghila, Ontology Aided Modeling of Organic Reaction Mechanisms with Flexible and Fragment Based XML\nMarkup Procedures, J. Chem. Inf. Model., 2007, 47(5):1747-1762, doi:<a href=\"http://dx.doi.org/10.1021/ci700043u\">10.1021/ci700043u</a></em></p>\n\n<h3 id=\"more-qsar\">More QSAR</h3>\n\n<p>Ma et al. have published a (Chinese) QSAR paper where CDK descriptors have been used as molecular representation of a data set with 212 ligands\nfor the <a href=\"http://en.wikipedia.org/wiki/P-glycoprotein\">P-glycoprotein</a>. Models have been built with Random Forests, with classification\nsuccess rates for the test set of around 85%. <br />\n<em>Guang-Li Ma, Xiao-Ping Zhao, Yi-Yu Cheng, Identification of P-gp substrates using a random forest method based on chemistry development\nkit descriptors, Chemical J. of Chinese Universities-Chinese, 2007, 28(10):1885-1888</em></p>\n\n<h2 id=\"chemical-databases\">Chemical Databases</h2>\n\n<p><a href=\"http://www.biotec.or.th/ISL/SMOL/\">sMOL</a> is GPL-licensed software for setting up a small molecule database.\nThe software uses JChemPaint, OpenBabel, JOELib and the CDK for\nchemoinformatics functionality, and R and Weka for statistical analyses. I have not locally installed it yet, but the\n<a href=\"http://www.biotec.or.th/ISL/SMOL/PDF/smol-userguide.pdf\">User Guide</a> shows really nice screenshots. The\n<a href=\"http://www.biotec.or.th/ISL/SMOL/PDF/smol-installer-guide.pdf\">Installer Guide</a> shows a quite polished product too.\nNot sure how open the project is to contributions from others (patches, translations, etc.), but I will ask. 
<br />\n<em>Supawadee Ingsriswang, Eakasit Pacharawongsakda, sMOL Explorer: an open source, web-enabled database and exploration tool for Small\nMOLecules datasets, Bioinformatics, 2007, 23(18):2498-2500, doi:<a href=\"http://dx.doi.org/10.1093/bioinformatics/btm363\">10.1093/bioinformatics/btm363</a></em></p>\n\n<h2 id=\"free-tools\">Free Tools</h2>\n\n<p>Bruno Villoutreix wrote an overview of free (as in free beer) services to aid virtual screening. It cites the CDK, Jmol, and OpenBabel as\ntools, along with a long list of free but proprietary tools. It does explicitly plead for opensource docking and scoring tools, and,\nas such, is potentially useful in grant proposals. <br />\n<em>Bruno Villoutreix, Nicolas Renault, David Lagorce, Olivier Sperandio, Matthieu Montes, Maria Miteva, Free Resources to Assist\nStructure-Based Virtual Ligand Screening Experiments, Current Protein and Peptide Science, 2007, 8(4):381-411,\ndoi:<a href=\"http://dx.doi.org/10.2174/138920307781369391\">10.2174/138920307781369391</a></em></p>\n\n<h2 id=\"metabolomics\">Metabolomics</h2>\n\n<p>Fangping Mu et al. have set up a <a href=\"http://www.genome.jp/kegg/\">KEGG</a>-derived database with annotated reactions where atoms between\nreactants and products are mapped, to help data analysis of isotopomeromics data. The CDK rendering features are used for\nvisualization purposes. The software also builds on\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2.html\">BioMeta, work by Martin Ott <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\npresented at last year’s CDK Workshop. <br />\n<em>Fangping Mu, Robert Williams, Clifford Unkefer, Pat Unkefer, James Faeder, William Hlavacek, Carbon-fate maps for metabolic reactions,\nBioinformatics, 2007, 23(23):3193-3199, doi:<a href=\"http://dx.doi.org/10.1093/bioinformatics/btm498\">10.1093/bioinformatics/btm498</a></em></p>\n\n<p>I have two more papers lined up, but do not have access to Current Pharmaceutical Design.</p>",
      "summary": "Fourth in the CDK Literature series. Really, a follow up on #3 which I wanted to get out, even though not really finished yet. But, after 3 comes 4, not 3b. Maybe 3.1, but that suggests at least 3.2-3.9 too, let alone full R (that was supposed to the space of all reals…) I’ll stick to positive non-zero integers. #1 and #2 are still available too.",
      
      "date_published": "2008-01-06T00:00:00+00:00",
      "date_modified": "2025-08-17T00:00:00+00:00",
      "tags": ["cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.2174/138161206777585274", "doi": "10.2174/138161206777585274"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI025584Y", "doi": "10.1021/CI025584Y"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci700043u", "doi": "10.1021/ci700043u"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btm363", "doi": "10.1093/bioinformatics/btm363"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.2174/138920307781369391", "doi": "10.2174/138920307781369391"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rkkty-a3w29",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/03/cdk-literature-3.html",
      "title": "CDK Literature #3",
      "content_html": "<p>Third in a series summarizing literature citing one of the two CDK articles. See also <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html\">#1  <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/14/cdk-literature-2.html\">#2  <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<h2 id=\"reviews\">Reviews</h2>\n<p>Two reviews have recently appeared which cite the CDK. Ricard Stefani has written a review in Portuguese of the many NMR-based elucidation tools\non computer-aided structure elucidation. The CDK is cited as a general chemoinformatics tool. It also cites\n<a href=\"http://sourceforge.net/projects/seneca\">SENECA</a> which uses CDK’s <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/structgen/\">structure generators</a>. <br />\n<em>Ricardo Stefani, Paulo Nascimento, Fernando Da Costa, Computer-aided structure elucidation of organic compounds: Recent advances,\nQuimica Nova, 2007, 30(5):1347-1356, 2007, doi:<a href=\"https://doi.org/10.1590/S0100-40422007000500048\">10.1590/S0100-40422007000500048</a></em></p>\n\n<p>Dimitris Agrafiotis has written a overview of the current state of chemoinformatics, and the CDK is cited\nas tool to calculate molecular descriptors. (<a href=\"http://miningdrugs.blogspot.com/\">Jörg</a> is co-author, and\n<a href=\"http://miningdrugs.blogspot.com/2007/05/recent-advances-in-chemoinformatics.html\">he blogged about this article</a> too). <br />\n<em>Dimitris Agrafiotis, Deepak Bandyopadhyay, Jörg Wegner, Herman van Vlijmen, Recent advances in chemoinformatics, J. Chem. Inf. 
Model.,\n2007, 47(4):1279-1293, doi:<a href=\"https://doi.org/10.1021/ci700059g\">10.1021/ci700059g</a></em></p>\n\n<h2 id=\"1h-proton-coupling-prediction\">1H proton coupling prediction</h2>\n<p>I wrote up a separate blog item on the article <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/10/janocchio-jmol-and-cdk-based-1h.html\">Janocchio: Jmol and CDK based 1H coupling constant prediction  <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwritten by David Evans at Eli Lilly. <br />\n<em>David Evans, Michael Bodkin, Richard Baker, Gary Sharman, Janocchio - a Java applet for viewing 3D structures and calculating NMR\ncouplings and NOEs, Magnetic Resonance in Chemistry, 2007, 45(7):595-600, doi:<a href=\"https://doi.org/10.1002/mrc.2016\">10.1002/mrc.2016</a></em></p>\n\n<h2 id=\"qsar\">QSAR</h2>\n<p>Quantitative-structure-activity-relationship (QSAR) modeling projects are finding their way to the CDK too. Dmitry\nKonovalov cites the CDK as a free source (as in gratis) for descriptor calculation and touches on the problem of reproducibility\nof descriptor calculations. Unfortunately, it does not discuss initiatives like the descriptor ontology as is\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/05/recent-developments-of-chemistry.html\">discussed in the second CDK article  <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nor the efforts discussed in the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> paper (doi:<a href=\"https://doi.org/10.1021/ci050400b\">10.1021/ci050400b</a>),\nsuch as the Blue Obelisk Data Repository which aim to improve this reproducibility. <br />\n<em>Dmitry Konovalov, Danny Coomans, Eric Deconinck, Yvan Vander Heyden, Benchmarking of QSAR models for blood-brain\nbarrier permeation, J. Chem. Inf. 
Model., 2007, 47(4):1648-1656, doi:<a href=\"https://doi.org/10.1021/ci700100f\">10.1021/ci700100f</a></em></p>\n\n<h2 id=\"soap-webservices\">SOAP webservices</h2>\n<p>Xiao Dong and the rest of the <a href=\"http://cheminfo.informatics.indiana.edu/\">Indiana team</a> have set up SOAP webservices,\nmany of them wrapping CDK functionality, such as descriptor calculation, 2D similarity and fingerprint calculations, and\n2D structure depiction. They also set up a service for <a href=\"http://ambit.acad.bg/toxTree/\">toxTree</a>, which itself uses\nthe CDK too. <br />\n<em>Xiao Dong, Kevin Gilbert, Rajarshi Guha, Randy Heiland, Jungkee Kim, Marlon Pierce, Geoffrey Fox, David Wild,\nWeb service infrastructure for chemoinformatics, J. Chem. Inf. Model., 2007, 47(4):1303-1307,\ndoi:<a href=\"https://doi.org/10.1021/ci6004349\">10.1021/ci6004349</a></em></p>",
      "summary": "Third in a series summarizing literature citing one of the two CDK articles. See also #1 and #2 .",
      
      "date_published": "2008-01-03T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1590/S0100-40422007000500048", "doi": "10.1590/S0100-40422007000500048"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci700059g", "doi": "10.1021/ci700059g"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1002/mrc.2016", "doi": "10.1002/mrc.2016"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI050400B", "doi": "10.1021/CI050400B"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci700100f", "doi": "10.1021/ci700100f"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci6004349", "doi": "10.1021/ci6004349"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5qk4g-0qm07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/02/open-lab-2007-results.html",
      "title": "Open Lab 2007 results",
      "content_html": "<p>The results for the <a href=\"http://scienceblogs.com/clock/2008/01/open_lab_2007_the_winning_entr.php\">Open Lab 2007 are out <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nI participated in this endeavor as judge, and read 75 of the 486 blog items, focusing on the sections <em>chemistry,\nblogging, publishing, politics of science</em>, and a number of blog items with few reviews when I passed them.</p>\n\n<p>I am happy to see that one of the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/12/04/my-open-laboratory-2007-submissions.html\">chemistry submission I made myself <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nmade it into the anthology: the <a href=\"http://depth-first.com/\">Depth-First</a> item on\n<a href=\"https://doi.org/10.59350/rpn9h-qay37\">SMILES and Aromaticity: Broken? <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nCongratulations, Rich!</p>",
      "summary": "The results for the Open Lab 2007 are out . I participated in this endeavor as judge, and read 75 of the 486 blog items, focusing on the sections chemistry, blogging, publishing, politics of science, and a number of blog items with few reviews when I passed them.",
      
      "date_published": "2008-01-02T00:00:00+00:00",
      "date_modified": "2025-02-15T00:10:00+00:00",
      "tags": ["openlab"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/rpn9h-qay37", "doi": "10.59350/rpn9h-qay37"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2j4mh-gjk19",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/02/collaborative-work-with-bioclipse.html",
      "title": "Collaborative work with Bioclipse",
      "content_html": "<p><a href=\"http://bioclipse.blogspot.com/\">Ola</a> blogged about something he is working on for Bioclipse2. The next major series\nof <a href=\"http://bioclipse.net/\">Bioclipse</a> releases will use the <a href=\"http://wiki.eclipse.org/index.php/Rich_Client_Platform\">RCP</a>-based\nresource architecture, which allows better integration with other RCP plugins, such as the\n<a href=\"http://subclipse.tigris.org/\">Subclipse</a> plugin which allows one to browse <a href=\"http://subversion.tigris.org/\">Subversion</a>\nrepositories directly in Bioclipse. That is cool! Check out the <a href=\"http://bioclipse.blogspot.com/2008/01/subversion-in-bioclipse2.html\">screenshot he posted in his blog</a>.</p>\n\n<p>Now, this kind of integration is important. Subversion is a tool to collaboratively work on data, which can be\nopen source (e.g. the Bioclipse source code), open data (e.g. the Blue Obelisk Data Repository), or any other\nkind. However, unlike tools like <a href=\"http://docs.google.com/\">Google Docs</a>, Bioclipse with Subversion support\nprovides you with a <em>rich</em> client to process your data. No need any longer to put SMILES into a spreadsheet;\njust put the full 3D structure or NMR spectrum in your joint resource set. This is much more suited for\nOpen Notebook Science, right <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a>? Just put in the raw data as\nit came out of the spectrometer, and let Bioclipse deal with data extraction. Oh, did you know that\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/22/text-mining-for-chemistry-using-oscar3.html\">Bioclipse has Oscar3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nintegrated (which has not been updated to the latest release, though)?</p>\n\n<p>Why bother with Wikis and Google Docs if you have Bioclipse? 
Why, even, bother with\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2008/01/02/how-to-create-interactive-maps-and-graphs/\">ICE <i class=\"fa-solid fa-recycle fa-xs\"></i></a>?</p>",
      "summary": "Ola blogged about something he is working on for Bioclipse2. The next major series of Bioclipse releases will use the RCP-based resource architecture, which allows better integration with other RCP plugins, such as the Subclipse plugin which allows one to browse Subversion repositories directly in Bioclipse. That is cool! Check out the screenshot he posted in his blog.",
      
      "date_published": "2008-01-02T00:00:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      "tags": ["bioclipse","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/23egf-b0r74",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/21/christmas-presents.html",
      "title": "Christmas presents...",
      "content_html": "<p>Our Christmas tree has not been decorated yet, but the presents are there: the <em>BMC Bioinformatics paper</em> on\nuserscripts in life sciences, Bioclipse 1.2.0, a long list of blogs to rate, and a very nice overview from\n<a href=\"http://www.warr.com/\">Wendy Warr</a> on <a href=\"http://www.qsarworld.com/qsar-workflow1.php\">workflow environments</a>,\ndiscussing and comparing different offerings like <a href=\"http://www.scitegic.com/products/overview/index.html\">Pipeline Pilot</a>,\n<a href=\"http://taverna.sf.net/\">Taverna</a>, and <a href=\"http://www.knime.org/\">KNIME</a>.</p>\n\n<h2 id=\"userscripts\">Userscripts</h2>\n\n<p>The paper on userscripts describes how Greasemonkey scripts can be used to combine different information sources\n(DOI:<a href=\"https://doi.org/10.1186/1471-2105-8-487\">10.1186/1471-2105-8-487</a>). A trailer:</p>\n\n<blockquote>\n  <p><strong>Background</strong> <br />\nThe web has seen an explosion of chemistry and biology related resources in the last 15 years: thousands of scientific journals,\ndatabases, wikis, blogs and resources are available with a wide variety of types of information. There is a huge need to aggregate\nand organise this information. However, the sheer number of resources makes it unrealistic to link them all in a centralised manner.\nInstead, search engines to find information in those resources flourish, and formal languages like Resource Description Framework\nand Web Ontology Language are increasingly used to allow linking of resources. A recent development is the use of userscripts to\nchange the appearance of web pages, by on-the-fly modification of the web content. This opens possibilities to aggregate information\nand computational results from different web resources into the web page of one of those resources.</p>\n</blockquote>\n\n<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> et al. 
have been using this technology for\n<a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/\">CrystalEye</a> too, but the paper was in a finalizing state when the\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/08/15/crystaleye-greasemonkey/\">userscript was announced <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, unfortunately.</p>\n\n<h2 id=\"bioclipse-120\">Bioclipse 1.2.0</h2>\n\n<p>The other present is the <a href=\"http://bioclipse.blogspot.com/2007/12/bioclipse-120-released.html\">Bioclipse 1.2.0</a> release, for which\nthe QSAR feature is a great new addition (see my blog the other day with an overview of blog items detailing\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/12/20/molecular-qsar-descriptors-in-cdk.html\">my participation in that feature <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\n<a href=\"http://bioclipse.blogspot.com/\">Ola</a> et al. have done a great job with <a href=\"http://wiki.bioclipse.net/index.php?title=Charting_plugin\">the plot functionality</a>,\nwhich is very nice for scatter plotting calculated descriptors. This release is likely going to be the last one in the Bioclipse 1\nseries, except for bug fix releases, so this release also means I can start contributing to the Bioclipse 2 series. Recent\nitems in the Bioclipse blog show a bright future, with project-based resource handling and better scripting (R, ruby,\nJavaScript, BeanShell?).</p>\n\n<p>BTW, we never have presents under the tree; we have <a href=\"http://en.wikipedia.org/wiki/Sinterklaas\">Sinterklaas</a>.</p>",
      "summary": "Our Christmas tree has not been decorated yet, but the presents are there: the BMC Bioinformatics paper on userscripts in life sciences, Bioclipse 1.2.0, a long list of blogs to rate, and a very nice overview from Wendy Warr on workflow environments, discussing and comparing different offerings like Pipeline Pilot, Taverna, and KNIME.",
      
      "date_published": "2007-12-21T00:00:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      "tags": ["bioclipse","userscript"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-487", "doi": "10.1186/1471-2105-8-487"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/47rre-wwc20",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/20/molecular-qsar-descriptors-in-cdk.html",
      "title": "The molecular QSAR descriptors in the CDK",
      "content_html": "<p>Pending the release of <a href=\"http://www.bioclipse.net/\">Bioclipse 1.2.0</a>, Ola asked me to do some additional feature\nimplementation for the QSAR feature, such as having the filenames as labels in the descriptor matrix. See also\nthese earlier items:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/18/ore-qsar-in-bioclipse-joelib-extension.html\">More QSAR in Bioclipse: the JOELib extension <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/26/further-bioclipse-qsar-functionality.html\">Further Bioclipse QSAR functionality development <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/27/qsar-plugin-for-bioclipse-getting-in.html\">QSAR plugin for Bioclipse getting in shape <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/04/24/bioclipse-now-allows-qsar-descriptor.html\">Bioclipse now allows QSAR descriptor selection <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/03/bioclipse-workshop-short-but.html\">Bioclipse Workshop: short but productive <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<p>(How more open notebook science can you get?)</p>\n\n<p>But I ran into some trouble when both <a href=\"http://joelib.sf.net/\">JOElib</a> and <a href=\"http://cdk.sf.net/\">CDK</a> descriptors\nwere selected, or Ola really. 
Now, there is not much I plan to do on the JOElib code, but at least I could investigate\nthe CDK code.</p>\n\n<p>The QSAR descriptor framework has been published in the <em>Recent developments of the chemistry development kit (CDK) -\nan open-source java library for chemo- and bioinformatics</em> paper (DOI:<a href=\"https://doi.org/10.2174/138161206777585274\">10.2174/138161206777585274</a>).\nHowever, while most molecular descriptors had JUnit tests for at least the <code class=\"language-plaintext highlighter-rouge\">calculate()</code> method, full\nand proper module testing was not set up. This involves rough coverage testing and test methods for all\nmethods in the classes.</p>\n\n<p>So, I set up a new CDK module called <code class=\"language-plaintext highlighter-rouge\">qsarmolecular</code>, and added the coverage test class\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/test/QsarmolecularCoverageTest.java?revision=9638&amp;view=markup\">QsarmolecularCoverageTest</a>.\nThis class is really short and basically only requires a module to be set up, as reflected by the line:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">private</span> <span class=\"kd\">final</span> <span class=\"kd\">static</span> <span class=\"nc\">String</span> <span class=\"no\">CLASS_LIST</span> <span class=\"o\">=</span> <span class=\"s\">\"qsarmolecular.javafiles\"</span><span class=\"o\">;</span>\n</code></pre></div></div>\n\n<p>The actual functionality is inherited from the <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/test/CoverageTest.java?revision=9638&amp;view=markup\">CoverageTest</a>.\nThe coverage testing requires, unlike tools like <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/28/code-coverage-making-sure-your-code-is.html\">Emma <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nfor which <a 
href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">reports are generated by Nightly</a>,\na certain naming scheme (explained in <em>Development Tools. 1. Unit testing</em> in\n<a href=\"http://www.cdknews.org/\">CDK News</a> 2.2).</p>\n\n<p>Now, testing for a lot of the methods in the <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/qsar/IMolecularDescriptor.java?revision=9170&amp;view=markup\">IMolecularDescriptor</a>\nand <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/qsar/IDescriptor.java?revision=9170&amp;view=markup\">IDescriptor</a>\ninterfaces is actually identical for all descriptors. Therefore, I wrote a\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/test/qsar/descriptors/molecular/MolecularDescriptorTest.java?revision=9653&amp;view=markup\">MolecularDescriptorTest</a>\nand made all JUnit test classes for the molecular descriptors extend this new class. This means that by writing only 10 new tests,\nwith 29 assert statements, for the 45 molecular descriptor classes, 450 new unit tests are run without special effort, making the\ntotal number of unit tests run each night by Nightly for <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/\">trunk/</a>\npass 4500.</p>\n\n<p>Now, this turned out to be necessary. I count 52 new failing tests, which should hit Nightly in the next 24 hours.</p>",
      "summary": "Pending the release of Bioclipse 1.2.0, Ola asked me to do some additional feature implementation for the QSAR feature, such as having the filenames as labels in the descriptor matrix. See also these earlier items:",
      
      "date_published": "2007-12-20T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["cdk","qsar","bioclipse"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.2174/138161206777585274", "doi": "10.2174/138161206777585274"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ry980-qya21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/19/test-results-for-cdk-10x-branch.html",
      "title": "Test results for the CDK 1.0.x branch",
      "content_html": "<p>The <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> has never really been without any bugs, which is reflected in the number\nof failing JUnit tests. For <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/\">trunk/</a> this is today 106 failing tests\n(<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/junitsummary.html\">live stats</a>). For the stable\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/branches/cdk-1.0.x/\">cdk-1.0.x/</a> branch, however, the number of failing tests\nis not much lower: 64 failing tests today (<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly-1.0.x/junitsummary.html\">live stats</a>).</p>\n\n<p>Overall, only a low percentage of the tests fails (&lt;2% for cdk-1.0.x/ and &lt;3% for trunk/), and, more importantly, it is\nparticular algorithms that are typically broken. For example, in the structgen module 8 tests fail, for both CDK versions.\nIn the <code class=\"language-plaintext highlighter-rouge\">cdk-1.0.x/</code> branch it is the valency checker code that causes quite a few fails, which I discussed in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html\">Atom typing in the CDK <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and which is the reason for\nthe atom type perception refactoring in progress in trunk/ (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/06/evidence-of-aromaticity.html\">Evidence of Aromaticity <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nNot all code in trunk/ has been updated yet, and this causes quite a few failing tests for <code class=\"language-plaintext highlighter-rouge\">trunk/</code> in the <code class=\"language-plaintext highlighter-rouge\">reaction</code>,\n<code class=\"language-plaintext highlighter-rouge\">qsarAtomic</code> and <code class=\"language-plaintext highlighter-rouge\">qsarBond</code> modules.</p>\n\n<p>Back to the <code 
class=\"language-plaintext highlighter-rouge\">cdk-1.0.x/</code> branch. Previous CDK releases tended to have around 40 failing tests, so I was worried about\nthe number of tests failing now. Maybe backported patches cause additional fails? To study that I had my machine run\nthe JUnit tests for all revisions of the <code class=\"language-plaintext highlighter-rouge\">cdk-1.0.x/</code> branch since the branch was made in commit\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk?view=rev&amp;revision=8343\">8343</a>. The result looks like:</p>\n\n<p><img src=\"/assets/images/results.png\" alt=\"\" /></p>\n\n<p>Indeed, it is a number of backports that cause the clear increase in bugs between commit 9044 and 9058. Nothing particular I can see, and worse, the intermediate revisions do not compile and do not have test results:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>104 9044 3731  84  73  979.709  0\n105 9045    0   0   0    0.000  0\n106 9046    0   0   0    0.000  0\n107 9047    0   0   0    0.000  0\n108 9048    0   0   0    0.000  0\n109 9049    0   0   0    0.000  0\n110 9050    0   0   0    0.000  0\n111 9051    0   0   0    0.000  0\n112 9052    0   0   0    0.000  0\n113 9053    0   0   0    0.000  0\n114 9054    0   0   0    0.000  0\n115 9055    0   0   0    0.000  0\n116 9056    0   0   0    0.000  0\n117 9057    0   0   0    0.000  0\n118 9058 3740 104 146  989.566  0\n</code></pre></div></div>\n\n<p>I should have taken more care when merging in these patches, even though they are supposed to fix issues:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Merged r8697: Add a method to the query atom container creator which creates an\n  queryatomcontainer. 
This replaces each pseudoatom to an anyatom.\nMerged r8699 and r8700: Added test file by Volker (see cdk-user) for the shortest path problem;\n  JUnit test provided by Volker Haehnke (haehnke - bioinformatik uni-frankfurt de), somewhat\n  rewritten.\nMerged r8701: Renamed a variable to comply with http://en.wikipedia.org/wiki/Dijkstra's_algorithm\nMerged r8751: Bug fixes for bugs #1783367 'SmilesParser incorrectly assigns double bonds' and \n  #1783381 'SmilesParser uses Molecule instead of IMolecule'. Test case for bug #1783367.\nMerged r8754 and r8773: Fix and test case for bug #1783547 and #1783546 'Lost aromaticity in \n  SmilesParser with Biphenyl and Benzene'\nMerged r8774: Add a MDL RXN reader which uses the MDLV2000Reader instead of the MDLReader\nMerged r8775, r8776, r8777: bug fixes for #150354 #1783774 #1778479 in the SmilesParser, \n  SmilesGenerator and MDLWriter/PseudoAtom.\nMerged r8791: Code for v,mass atom two digits mass atom and exception handeling\nMerged r8800: Fixed reading of MDL molfiles with exactly 12 columns (==valid) in the bond block\nMerged r8802: Made a little more memory efficient by removing unnesscary cloning operations\nMerged r8803: Fixed it so that we make a deep copy of the input molecule\nMerged r8809: Added code to work on a local copy of theinput molecule\nMerged r8811: Updated Javadocs\nMerged 8824 8821 8820 8819 8817 8816: Added code to properly work on a local copy\n</code></pre></div></div>\n\n<p>I’m quite sure it must be the deep-cloning fix ported from the commits 8800-8824. 
I already fixed a number of bugs in the IP calculation\ncode, which still accounts for a good deal of the failing tests in the cdk-1.0.x/ branch (and affects trunk/ too), as can be seen by the drop in\nbugs just after the big increase:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>r9079 | egonw | 2007-10-15 13:24:10 +0200 (Mon, 15 Oct 2007) | 1 line\n\nRenamed container to localClone to clear up code. Fixed a bug where the uncloned atoms was\nsearched in the cloned atomcontainer. More bugs like this are in the code. Miguel is contacted\nabout this problem.\n------------------------------------------------------------------------\nr9082 | egonw | 2007-10-15 13:48:15 +0200 (Mon, 15 Oct 2007) | 1 line\n\nRenamed container to localClone to clear up code. Fixed a bug where the uncloned atoms was\nsearched in the cloned atomcontainer.\n</code></pre></div></div>\n\n<p>The big drop in number of fails is caused by the removal of the SMARTS code from the branch, which has been present since\nthe start of the branch (see <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/branches/cdk-1.0.x/src/org/openscience/cdk/smiles/?pathrev=8343\">this page</a>).</p>\n\n<p>From this analysis I conclude that CDK 1.0.2 can soon be released, with the note that the ionization potential calculation\nis not safe to use.</p>",
      "summary": "The Chemistry Development Kit has never really been without any bugs, which is reflected in the number of failing JUnit tests. For trunk/ this is today 106 failing tests (live stats). For the stable cdk-1.0.x/ branch, however, the number of failing tests is not much lower: 64 failing tests today (live stats).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/results.png",
      "date_published": "2007-12-19T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yxjrz-jz465",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/17/open-data-getting-more-recognition.html",
      "title": "Open Data getting more recognition",
      "content_html": "<p>The OD part of <a href=\"http://blueobelisk.sourceforge.net/wiki/ODOSOS\">ODOSOS</a> is getting more and more attention, and it\nseems that <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a>’s Open Data battle is paying off (see his\n<a href=\"http://en.wikipedia.org/w/index.php?title=Open_Data&amp;oldid=84679322\">original OpenData article in Wikipedia</a>): an\n<a href=\"http://www.opendatacommons.org/odc-public-domain-dedication-and-licence/\">open data specific license</a> has reached\nthe beta stage (see this <a href=\"http://creativecommons.org/weblog/entry/7917\">announcement</a>).</p>\n\n<p>The idea behind this license seems to come down to:</p>\n\n<blockquote>\n  <p><strong>Facts are free</strong>. The Rightsholder takes the position that factual information is not covered by Copyright. This\nDocument however covers the Work in jurisdictions that may protect the factual information in the Work by Copyright,\nand to cover any information protected by Copyright that is contained in the Work.</p>\n</blockquote>\n\n<p>I am looking forward to seeing how this license will be picked up by the community. <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>\nmay be a good candidate to use this license; to formalize their dump into the public domain. Not just yet, though,\nbecause things might still change. It is said that a wiki will be set up to ask for feedback. 
Paul has written\n<a href=\"http://blogs.talis.com/nodalities/2007/12/licensing_open_data_creative_c.php\">a nice writeup on the history of this license</a>.</p>\n\n<p>I particularly like the quote by Tim O’Reilly from <a href=\"http://radar.oreilly.com/archives/2006/07/four_big_ideas_about_open_sour.html\">this blog</a>:</p>\n\n<blockquote>\n  <p>One day soon, tomorrow’s Richard Stallman will wake up and realize that all the software distributed in the world\nis free and open source, but that he still has no control to improve or change the computer tools that he relies\non every day. They are services backed by collective databases too large (and controlled by their service providers)\nto be easily modified. Even data portability initiatives such as those starting today merely scratch the surface,\nbecause taking your own data out of the pool may let you move it somewhere else, but much of its value depends on\nits original context, now lost.</p>\n</blockquote>\n\n<p>In the past I have <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/18/open-data-misconception-1-you-do-not.html\">argued for the CC-BY license <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand so does Peter in <a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/12/15/deepak-singh-educating-people-about-data-ownership/\">this recent comment <i class=\"fa-solid fa-recycle fa-xs\"></i></a> on a post by Deepak on\n<a href=\"http://mndoci.com/blog/2007/12/15/educating-people-about-data-ownership/\">educating people about data ownership</a>.\nInterestingly, the new license proposes to remove ownership as a solution to <em>free the data</em>.</p>",
      "summary": "The OD part of ODOSOS is getting more and more attention, and it seems that Peter’s Open Data battle is paying off (see his original OpenData article in Wikipedia): an open data specific license has reached the beta stage (see this announcement).",
      
      "date_published": "2007-12-17T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["odosos","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c2vv2-thb23",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/13/i-dont-blame-individuals-in-commercial.html",
      "title": "I don&apos;t blame Individuals in Commercial Chemoinformatics",
      "content_html": "<p>The <a href=\"http://www.chemspider.com/blog/?p=302\">comment I left</a> in the <a href=\"http://www.chemspider.com/blog/\">ChemSpider</a> blog,\nwas probably a bit blunt. ChemSpider announced having licensed software from <a href=\"http://www.eyesopen.com/\">OpenEye</a>. I\nhave seen such announcements more often, but am intrigued about the nature of such announcements. Is it bad that\nChemSpider is using OpenEye software? Certainly not. But it is surprising that they <em>“announced today they had entered\ninto an agreement that will <strong>allow</strong> the incorporation of a number of OpenEye’s products into ChemZoo’s online\nchemistry database and property prediction service, ChemSpider”</em> (emphasis mine).</p>\n\n<p>Is it really special that you buy software and then use it? Maybe, it increasingly is, with a number of good software\nproducts freely available. Even many proprietary products are freely available, sometimes to a selected group only,\nthough. Or, is there some license behind this that restricts you in what you may and may not do with it?</p>\n\n<p>Anyway, I made the somewhat inconsiderate comment: <em>“Amazing! (Forgive me that I [have] not read every bit…) But,\namazing! A press release for the fact that one may use software ;)”.</em></p>\n\n<p>Anthony replied with these lines: <em>“Yes, I think it is amazing that companies of this caliber are willing to provide\ntheir tools at no cost to systems like ChemSpider”</em>. He read my sarcasm correctly. I find it absurd that the future of\nchemoinformatics is left to the goodwill of benevolent companies. 
Chemoinformatics is way too important, and in way\ntoo crappy a state, to be kept as a proprietary toy of industry; that’s\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/14/why-odosos-is-important.html\">something I argued before <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Let me try to explain where my sarcasm is coming from.</p>\n\n<h2 id=\"i-do-not-blame-individuals-in-commercial-chemoinformatics\">I do <em>not</em> blame Individuals in Commercial Chemoinformatics</h2>\n\n<p>There is nothing wrong with getting paid for what you do. I get paid for the software I develop too, though\nmost of my contributions to the <a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://jmol.org/\">Jmol</a> and even some of my\ncontributions to <a href=\"http://www.bioclipse.net/\">Bioclipse</a> I have made as a hobby, in my spare time, unpaid. Nothing\nwrong with a good hobby, I would say.</p>\n\n<p>But I do not blame people for not doing the same. Neither do I blame myself for making a reasonable living in\nThe Netherlands, unlike all those poor bastards who struggle to make it to the next month, like\n<a href=\"http://en.wikipedia.org/wiki/Poverty_in_the_United_States\">many in the United States</a>. But I do not like the\nsituation. Neither do I blame people for being religious, though I really dislike several of the things the\n<em>Church</em> is trying to make people believe (such as that\n<a href=\"http://www.guardian.co.uk/aids/story/0,7369,1059068,00.html\">the HIV virus can get through condoms</a>).\nI hate the situation.</p>\n\n<h2 id=\"i-do-not-dislike-the-commercial-model\">I do <em>not</em> dislike the Commercial Model</h2>\n\n<p>People have to make a living. I do; anyone does. I do feel, however, there is a difference between making a\nliving because you work, and getting money because you happen to be at the right side of the money flow. 
There\nis a difference between a baker getting up at 5am every morning to feed a village, and someone selling a thin\nslice of bread via eBay to a poor African soul who just received his/her OLPC laptop. Not that I think this\nreally applies to the ChemSpider/OpenEye deal; just to make a statement about commercialism.</p>\n\n<p>The Bill Gates foundation spending a lot of money on scientific research is what the Dutch would call <code class=\"language-plaintext highlighter-rouge\">een sigaar uit eigen doos</code>.\nThis translates to something like getting a present you paid for yourself. Literally, ‘to get a cigar from one’s own box’.\nBut that’s another story.</p>\n\n<h2 id=\"i-hate-the-situation\">I hate the situation</h2>\n\n<p>I hate the situation that research for new drugs is so expensive, and medicine likewise. I hate it that\npharmaceutical industry cannot sell these drugs cheaply to developing countries, because they will be sold\nexpensively in western markets. But I do not blame the scientists working in pharma industry.</p>\n\n<p>I hate the situation that scientific results cannot be reproduced independently, because software is being\nused as black box. But I do not blame the guy who wrote the code.</p>\n\n<p>I hate the situation that I cannot contribute to the excellent products around, because they disallow me to\ndiscuss my work with others. But I do not blame the guy who sold me the license.</p>\n\n<p>I hate the situation that many very qualified scientists have to find post-doc after post-doc before they\ngive up and go to industry. I hate the situation that the better scientist you are, the less science you\nactually do, because all time is spent on getting further funds. But I do not blame those who paid for\nthose temporary post-doc positions.</p>\n\n<p>I hate the situation that people have to use commercial models for their scientific contributions, just to\nmake a living, even though they would have loved to contribute that to mankind. 
But I do not blame them for\nwanting to be able to fulfill their primary living requirements (and those of their families).</p>\n\n<p>I hate the situation that I review papers for free for commercial publishers, just to help science progress.\nI do blame myself for not having stopped doing that yet.</p>\n\n<p>But I do not blame ChemSpider for buying or using commercial products. I do not blame the people working\nat OpenEye for making a living. But I do find it absurd that we have to be amazed that scientific software\nis put to work.</p>\n\n<p>I apologize for being blunt, but I cannot apologize for disliking the current situation chemoinformatics is in.</p>",
      "summary": "The comment I left in the ChemSpider blog, was probably a bit blunt. ChemSpider announced having licensed software from OpenEye. I have seen such announcements more often, but am intrigued about the nature of such announcements. Is it bad that ChemSpider is using OpenEye software? Certainly not. But it is surprising that they “announced today they had entered into an agreement that will allow the incorporation of a number of OpenEye’s products into ChemZoo’s online chemistry database and property prediction service, ChemSpider” (emphasis mine).",
      
      "date_published": "2007-12-13T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/smfvd-pdy70",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/10/tagging-thesauri-or-ontologies.html",
      "title": "Tagging, thesauri or ontologies?",
      "content_html": "<p>Controlled vocabularies, hierarchies, microformats, RDF. <a href=\"\">Nico Adams</a> pointed me to\n<a href=\"http://www.youtube.com/watch?v=-4CV05HyAbM\">this excellent video</a>:</p>\n\n<iframe width=\"560\" height=\"315\" src=\"https://www.youtube-nocookie.com/embed/-4CV05HyAbM?si=pVgGAYB9ztr06NmN\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen=\"\"></iframe>\n\n<p>It’s a really nifty piece of work, which goes into the differences between thesauri, controlled vocabularies,\nand, as such, ontologies on the one hand, and social tagging systems on the other. Both have their virtues; it is fuzzy logic versus ODEs\nall over again. Whether one is better than the other depends only on the problem at hand. For example, can\nyou imagine social tagging in atom typing prior to performing force field calculations? Or, a 150-term\nontology to annotate the scientific content of your literature archive?</p>\n\n<h2 id=\"more-from-where-they-come-from\">More from where they come from…</h2>\n\n<p>The video appears to have been made by the <a href=\"http://mediatedcultures.net/ksudigg/\">Digital Ethnography</a> group, which\nhas made several <a href=\"http://mediatedcultures.net/youtube.htm\">more movies</a>. Certainly something I’m going to\ncheck out over the winter holidays (I guess I am quite a bit more religious about ODOSOS than about gods).</p>\n\n<p>Nico wrote: <em>As long as we appreciate that there may be more than one top node…</em>. 
I am not entirely sure,\nbut I think he refers to thesauri, which are a particular form of ontology where basically the only relations\nthat can be found are <code class=\"language-plaintext highlighter-rouge\">is-a</code> or <code class=\"language-plaintext highlighter-rouge\">is-parent-of</code>, resulting in a hierarchy of controlled terminology with one\ntop node (such as the Gene Ontology). Ontologies can and should be much richer if we really want to take\nadvantage of our information technologies, just like we do with any graph mining. Why mould reality in a\ntight hierarchy?</p>\n\n<h2 id=\"chemical-ontologies\">Chemical ontologies</h2>\n\n<p>Peter has <a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/12/09/ontologies-in-physics-and-chemistry/\">not seen the movie yet <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nbut replied with a recent comment he had on <a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a>:</p>\n\n<blockquote>\n  <p>Ebs and Michael had reviewed CML and questioned why the key concepts were atoms, molecules, electron,\nsubstances, whereas they suggested it would have been better to start from reactions. I think that’s\na very clear difference in orientation between endurants and perdurants. Although chemists publish\nreactions, most of the emphasis is on (new) substances and their properties. CML is designed to map\ndirectly onto the way chemists seem to think - at least in their public communication - e.g. through\ndocuments. Of course we can also do reactions in CML, but even there the emphasis is often on the\ncomponents.</p>\n</blockquote>\n\n<p>The suggestion by Ebs and Michael is indeed quite surprising: ontologies try to capture knowledge and express it in a small set of terms, each with an accurate and non-overlapping meaning (orthogonal, if you wish). 
Now, the terms carbon, nitrogen, oxygen, and the other 104 elements are quite accurate and rather different from each other, at least from a chemical point of view. Sure, bonding is more difficult, and let’s not start about aromaticity. But to question atoms, bonds or electrons as key concepts??</p>",
      "summary": "Controlled vocabularies, hierarchies, microformats, RDF. Nico Adams pointed me to this excellent video:",
      
      "date_published": "2007-12-10T00:00:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      "tags": ["ontology"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dw744-0sa70",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/07/open-source-open-data-at-european.html",
      "title": "Open Source, Open Data at the European Bioinformatics Institute",
      "content_html": "<p>I was pleased to hear that <a href=\"http://www.steinbeck-molecular.de/steinblog/index.php/2007/12/06/steinbeck-group-moves-to-european-bioinformatics-institute-ebi-in-january-2008/\">Christoph will move to the EBI</a>\nearly next year. Christoph has been working on Open Source and Open Data chemoinformatics since at least 1997. I first got in contact with\nChristoph when I wrote code for JChemPaint (which Christoph developed) to be able to read\n<a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">Chemical Markup Language</a> (CML). This also got me into contact with\n<a href=\"http://www.nd.edu/~gezelter/Main/index.html\">Dan Gezelter</a>, who is the original author of <a href=\"http://www.jmol.org/\">Jmol</a>,\nto which I also added CML support. And, of course, with Henry and <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a>,\nwho first developed CML. This was <strong>before</strong> XML was an official recommendation, and I have worked with CML files which you\nwould no longer recognize. It was in Dan’s office that the CDK was founded, where Christoph, Dan and I designed data\nclasses to replace the JChemPaint and Jmol data classes. Both JChemPaint and Jmol were rewritten afterwards, but for\nJmol it was later decided that more tuned classes were needed to achieve the required performance for the live rendering\nof tens of thousands of atoms.</p>\n\n<p>Well, Christoph has done much other Open Source and Open Data work, including the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a>,\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>, and Seneca, a tool for computer-aided structure elucidation (CASE). The scientific\nimpact of Christoph’s work is considerable. 
When I realize that much of his past work was setting out foundations, and\nthat these foundations have been found to be solid, I am happy to hear that he can now start to apply his work to life\nscience problems, where current methods are failing.</p>\n\n<p>Christoph, cheers!</p>",
      "summary": "I was pleased to hear that Christoph will move to the EBI early next year. Christoph has been working on Open Source and Open Data chemoinformatics since at least 1997. I first got in contact with Christoph when I wrote code for JChemPaint (which Christoph developed) to be able to read Chemical Markup Language (CML). This also got me into contact with Dan Gezelter, who is the original author of Jmol, to which I also added CML support. And, of course, with Henry and Peter, who first developed CML. This was before XML was an official recommendation, and I have worked with CML files which you would no longer recognize. It was in Dan’s office that the CDK was founded, where Christoph, Dan and I designed data classes to replace the JChemPaint and Jmol data classes. Both JChemPaint and Jmol were rewritten afterwards, but for Jmol it was later decided that more tuned classes were needed to achieve the required performance for the live rendering of tens of thousands of atoms.",
      
      "date_published": "2007-12-07T00:00:00+00:00",
      "date_modified": "2007-12-07T00:00:00+00:00",
      "tags": ["bioinfo","jchempaint","jmol","bioclipse","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/btg4h-bg647",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/04/my-open-laboratory-2007-submissions.html",
      "title": "My Open Laboratory 2007 submissions",
      "content_html": "<p><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/14/last-call-for-open-laboratory-2007.html\">As promised <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, here is my\nlist of submissions for the <a href=\"http://scienceblogs.com/clock/2007/11/open_laboratory_2008_last_call.php\">Open Laboratory 2007</a>:</p>\n\n<ul>\n  <li><a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/07/14/open-data-is-critical-for-reproducible-research/\">Open Data is critical for Reproducible Research <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"http://www.thechemblog.com/?p=678\">If you ever made something fluoresce after you did a reaction with a transition metal…</a></li>\n  <li><a href=\"http://pipeline.corante.com/archives/2007/11/02/one_for_the_brave.php\">One For the Brave</a></li>\n  <li><a href=\"http://curlyarrow.blogspot.com/2007/04/fun-with-singlet-oxygen.html\">Fun with singlet oxygen</a></li>\n  <li><a href=\"https://doi.org/10.59350/rpn9h-qay37\">SMILES and Aromaticity: Broken? <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"http://totallysynthetic.com/blog/?p=785\">Resveratrol-Based Natural Products</a></li>\n  <li><a href=\"http://usefulchem.blogspot.com/2007/02/making-anti-malarials-feb-2007-update.html\">Making Anti-Malarials: Feb 2007 Update</a></li>\n</ul>\n\n<p>BTW, even though <a href=\"http://www.scienceblogs.com/strangerfruit/2007/11/open_lab_2007.php\">the judges have started</a>\ntheir way through the submissions, you can still <a href=\"http://openlab.wufoo.com/forms/submission-form/\">submit entries</a>.</p>",
      "summary": "As promised, here is my list of submissions for the Open Laboratory 2007:",
      
      "date_published": "2007-12-04T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["openlab"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/rpn9h-qay37", "doi": "10.59350/rpn9h-qay37"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/afvyc-7bq58",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/03/web2o-open-chemistry-and-chemblaics.html",
      "title": "Web2O, Open Chemistry, and Chemblaics",
      "content_html": "<p>Chemistry World December issue features a nice item on the future of data in chemistry:\n<a href=\"http://www.rsc.org/chemistryworld/Issues/2007/December/SurfingWeb20.asp\">Surfing Web2O</a>; Peter\n<a href=\"https://doi.org/10.63485/9emta-f1x80\">gave an excerpt <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and Peter\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/12/03/survey-of-open-chemistry-in-chemistry-world/\">commented on it <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>The article discusses many of the things that have been happening in the field of chemical data. It\ntouches <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a>’s work on Open Notebook Science, and then moves to\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a>’s Open Data, mentions a number of other blogs and the\n<a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a>. Via some video efforts, it ends up with Mitch’\n<a href=\"http://www.chemmunity.com/\">Chemmunity</a>, which has the coolest Captcha I have seen so far:</p>\n\n<p><img src=\"/assets/images/coolestCaptcha.png\" alt=\"\" /></p>\n\n<p>It also cites <a href=\"http://depth-first.com/\">Rich</a>’ blog item on <a href=\"http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases\">32 free chemical databases</a>,\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a>’s <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB.org</a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html\">Project Prospect <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand CML which recently saw <a href=\"http://cmlexplained.blogspot.com/2007/09/chemical-markup-language-for-spectra.html\">its 7th research paper</a>.</p>\n\n<p>Of course, this is the arena of <em>chemblaics</em>, but unfortunately <a href=\"http://chem-bla-ics.blogspot.com/\">my blog</a> <!-- keep link -->\nis not cited (though 
my name is mentioned). So, what is wrong with my blog??</p>",
      "summary": "Chemistry World December issue features a nice item on the future of data in chemistry: Surfing Web2O; Peter gave an excerpt, and Peter commented on it.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/coolestCaptcha.png",
      "date_published": "2007-12-03T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["openscience","nmrshiftdb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.63485/9emta-f1x80", "doi": "10.63485/9emta-f1x80"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e298f-r8n49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/27/be-in-my-advisory-board-1-being-good.html",
      "title": "Be in my Advisory Board #1: being a good Open Science citizen",
      "content_html": "<p>I recently saw that blogger.com blogs gained <a href=\"http://buzz.blogger.com/2007/07/polls-out-of-draft.html\">a poll feature</a>.\nFrom now on, I will try to be a bit more Open Science, in addition to Open Source. From now on, <em>you</em> can be in my\nAdvisory Board. To do so, vote on my next chemblaics (aka Open Source Chemoinformatics) project. The poll can be found on\nthe left side of this blog. Associated with each poll, which I may run more or less frequently depending on the time of\nyear, will be one blog post where I introduce the options. Options not mentioned, or completely different things\nyou would like to suggest I do, can be left as comments to these items.</p>\n\n<h2 id=\"finishing-the-new-jchempaint-code\">Finishing the new JChemPaint code</h2>\n<p>The goal of this option is to use the <a href=\"http://progz-jchem.blogspot.com/\">code written by Niels</a> in his\n<a href=\"http://www.programmeerzomer.nl/\">ProgrammeerZomer</a> project to implement a new JChemPaint based on Java2D and\nindependent of the widget set used (Swing/AWT/SWT/…).</p>\n\n<h2 id=\"cml-roundtripping-of-the-cdk-data-model\">CML-roundtripping of the CDK data model</h2>\n<p>The goal of this project is to ensure that all information the CDK data model can hold can be roundtripped in\n<a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a>.</p>\n\n<h2 id=\"integrating-inchi-nestedvm-in-bioclipse\">Integrating InChI-NestedVM in Bioclipse</h2>\n<p><a href=\"http://depth-first.com/\">Rich</a> is, besides an excellent blogger, also someone who is not afraid to try new things.\nRecently, he experimented with compiling the <a href=\"https://doi.org/10.59350/vhefz-rc472\">InChI library into a Java executable <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> already is able to generate <a href=\"http://iupac.org/inchi/\">InChI</a>s,\nusing the code written by Sam Adams for the <a href=\"http://cdk.sf.net/\">CDK</a>, but an InChI/NestedVM 
plugin for Bioclipse\ncould make a nice showcase.</p>\n\n<h2 id=\"writing-cdk-news-articles\">Writing CDK News articles</h2>\n<p>On the other hand, you might find that I should focus on getting a new <a href=\"http://cdknews.org/\">CDK News</a> issue out,\nfor which we are still lacking (finished) contributions.</p>\n\n<p>It’s up to you. Deadline in about two weeks; still got some other things to finish :)</p>",
      "summary": "I recently saw that blogger.com blogs gained a poll feature. From now on, I will try to be a bit more Open Science, in addition to Open Source. From now on, you can be in my Advisory Board. To do so, vote on my next chemblaics (aka Open Source Chemoinformatics) project. The poll can be found on the left side of this blog. Associated with each poll, which I may run more or less frequently depending on the time of year, will be one blog post where I introduce the options. Options not mentioned, or completely different things you would like to suggest I do, can be left as comments to these items.",
      
      "date_published": "2007-11-27T00:00:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/vhefz-rc472", "doi": "10.59350/vhefz-rc472"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xxceb-3gc32",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/26/metabolomics-workflows-in-taverna.html",
      "title": "Metabolomics workflows in Taverna",
      "content_html": "<p>My current job description is to speed up metabolomics data analysis, and I finally got around to making a first\nrelevant workflow for <a href=\"http://taverna.sf.net/\">Taverna</a>, using the\n<a href=\"http://www.chemspider.com/blog/?p=260\">webservices just posted over at ChemSpider</a>:</p>\n\n<p><img src=\"/assets/images/chemspiderWorkflow.png\" alt=\"\" /></p>\n\n<p>I uploaded the <a href=\"http://myexperiment.org/workflows/97\">source to MyExperiment</a>, so anyone can play with it.\nThere is much to improve, such as using <a href=\"http://cdk-taverna.de/\">CDK-Taverna</a> for further analysis of the results.</p>\n\n<p>I am not sure if opening the workflow in your Taverna installation will automatically set up the WSDL scavenger\nfor the <a href=\"http://www.chemspider.com/MassSpecAPI.asmx\">ChemSpider services</a>, which are available in an HTTP version too,\nbtw. If not, right click on the <em>Available Processors</em> folder, and pick <em>Add new WSDL scavenger</em>… and point it to the\nURL <em>http://www.chemspider.com/MassSpecAPI.asmx?WSDL</em>. The result should look like:</p>\n\n<p><img src=\"/assets/images/chemspiderWorkflow1.png\" alt=\"\" /></p>\n\n<p>Oh, and please note this comment:</p>\n\n<blockquote>\n  <p>These services are offered free of charge to our users during this period of testing, validation and feedback. Some of\nthese services will be made available commercially in the future and we are proactively informing you of our intention to\ndo this. It is likely that these services will remain available to academia at no charge. Please contact us at\nfeedbackATchemspiderDOTcom with feedback and questions.</p>\n</blockquote>\n\n<p>So, I do not know when my workflow will stop working.</p>",
      "summary": "My current job description is to speed up metabolomics data analysis, and I finally got around to making a first relevant workflow for Taverna, using the webservices just posted over at ChemSpider:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemspiderWorkflow.png",
      "date_published": "2007-11-26T00:00:00+00:00",
      "date_modified": "2007-11-26T00:00:00+00:00",
      "tags": ["taverna","chemspider","metabolomics"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j6f3x-a2q13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/22/metware-metabolomics-database-project.html",
      "title": "MetWare: metabolomics database project started on SourceForge",
      "content_html": "<p>The Applied Bioinformatics group at <a href=\"http://www.pri.wur.nl/NL/\">PRI</a>, where I now work in <a href=\"http://en.wikipedia.org/wiki/Wageningen\">Wageningen</a>,\nand the group of <a href=\"http://www.ipb-halle.de/de/forschung/stress-und-entwicklungsbiologie/forschungsgruppen/bioinformatik-massenspektrometrie/\">Steffen Neumann</a>\nin Halle have started the <a href=\"http://metware.sf.net/\">MetWare</a> project on <a href=\"http://sf.net/\">SourceForge</a> to develop\nopen source databases for metabolomics data.</p>\n\n<p>The database design will be based on and ideally compatible with proposed standards like ArMet (DOI:<a href=\"https://doi.org/10.1038/nbt1041\">10.1038/nbt1041</a>)\nand those recently written up by the <a href=\"http://msi-workgroups.sourceforge.net/\">Metabolomics Standards Initiative</a>\n(see the issue around DOI:<a href=\"https://doi.org/10.1007/s11306-007-0070-6\">10.1007/s11306-007-0070-6</a>).</p>\n\n<p>One important design goal is that the project will use <a href=\"http://www.biomart.org/\">BioMart</a>, which will allow easy\nintegration of the database content in data analysis programs like <a href=\"http://taverna.sf.net/\">Taverna</a>\nand <a href=\"http://www.r-project.org/\">R</a> using the <a href=\"http://www.bioconductor.org/packages/2.1/bioc/html/biomaRt.html\">biomaRt</a>\npackage (see DOI:<a href=\"http://dx.doi.org/10.1093/bioinformatics/bti525\">10.1093/bioinformatics/bti525</a>).</p>\n\n<p>Though the software will be open source, it is not yet clear how much data will be open.</p>",
      "summary": "The Applied Bioinformatics group at PRI, where I now work in Wageningen, and the group of Steffen Neumann in Halle have started the MetWare project on SourceForge to develop open source databases for metabolomics data.",
      
      "date_published": "2007-11-22T00:00:00+00:00",
      "date_modified": "2007-11-22T00:00:00+00:00",
      "tags": ["metabolomics","metware"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/nbt1041", "doi": "10.1038/nbt1041"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1007/s11306-007-0070-6", "doi": "10.1007/s11306-007-0070-6"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/bti525", "doi": "10.1093/bioinformatics/bti525"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ef7mq-x2k96",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/20/when-standards-fail.html",
      "title": "When standards fail...",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/?p=150\">Jim shows</a> that some people do not think webservices standards are complex enough in themselves:</p>\n\n<blockquote>\n  <p><a href=\"http://uszla.me.uk/space/HomePage\">Toby</a> provided a tonic: How do you know when you’re solving the wrong problem? When your solution involves a\n<a href=\"http://www.active-endpoints.com/documents/documents/1/WS-HumanTask-v1.pdf\">133 page standard with a section entitled “Human Task Behavior and State Transitions”</a>,\njust to allow a system to give tasks to people.</p>\n</blockquote>",
      "summary": "Jim shows that some people do not think webservices standards are complex enough in themselves:",
      
      "date_published": "2007-11-20T00:00:00+00:00",
      "date_modified": "2007-11-20T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hfw6p-d6p02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/19/r-based-genetic-algorithm.html",
      "title": "An R-based genetic algorithm",
      "content_html": "<p>During my PhD I wrote a simple but <a href=\"http://cran.r-project.org/src/contrib/Descriptions/genalg.html\">effective genetic algorithm</a> package for\n<a href=\"http://www.r-project.org/\">R</a>. Because there was a bug recently found, and there is interest in extending the functionality,\nI have set up a <a href=\"http://sourceforge.net/\">SourceForge</a> project called\n<a href=\"http://sourceforge.net/projects/genalg\">genalg</a>.</p>\n\n<p>The package provides GA support for binary and real-value chromosomes (and integer chromosomes is something that will\nbe added soon), and allows to use custom evaluation functions. Here is some example code:</p>\n\n<div class=\"language-R highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\"># optimize two values to match pi and sqrt(50)</span><span class=\"w\">\n</span><span class=\"n\">evaluate</span><span class=\"w\"> </span><span class=\"o\">&lt;-</span><span class=\"w\"> </span><span class=\"k\">function</span><span class=\"p\">(</span><span class=\"n\">string</span><span class=\"o\">=</span><span class=\"nf\">c</span><span class=\"p\">())</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"n\">returnVal</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"kc\">NA</span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"k\">if</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nf\">length</span><span class=\"p\">(</span><span class=\"n\">string</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"o\">==</span><span class=\"w\"> </span><span class=\"m\">2</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n        </span><span class=\"n\">returnVal</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span 
class=\"nf\">abs</span><span class=\"p\">(</span><span class=\"n\">string</span><span class=\"p\">[</span><span class=\"m\">1</span><span class=\"p\">]</span><span class=\"o\">-</span><span class=\"nb\">pi</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"o\">+</span><span class=\"w\"> </span><span class=\"nf\">abs</span><span class=\"p\">(</span><span class=\"n\">string</span><span class=\"p\">[</span><span class=\"m\">2</span><span class=\"p\">]</span><span class=\"o\">-</span><span class=\"nf\">sqrt</span><span class=\"p\">(</span><span class=\"m\">50</span><span class=\"p\">));</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">else</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n        </span><span class=\"n\">stop</span><span class=\"p\">(</span><span class=\"s2\">\"Expecting a chromosome of length 2!\"</span><span class=\"p\">);</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n    </span><span class=\"n\">returnVal</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"n\">monitor</span><span class=\"w\"> </span><span class=\"o\">&lt;-</span><span class=\"w\"> </span><span class=\"k\">function</span><span class=\"p\">(</span><span class=\"n\">obj</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"c1\"># plot the population</span><span class=\"w\">\n    </span><span class=\"n\">xlim</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"nf\">c</span><span class=\"p\">(</span><span class=\"n\">obj</span><span class=\"o\">$</span><span class=\"n\">stringMin</span><span class=\"p\">[</span><span class=\"m\">1</span><span class=\"p\">],</span><span class=\"w\"> </span><span class=\"n\">obj</span><span class=\"o\">$</span><span class=\"n\">stringMax</span><span 
class=\"p\">[</span><span class=\"m\">1</span><span class=\"p\">]);</span><span class=\"w\">\n    </span><span class=\"n\">ylim</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"nf\">c</span><span class=\"p\">(</span><span class=\"n\">obj</span><span class=\"o\">$</span><span class=\"n\">stringMin</span><span class=\"p\">[</span><span class=\"m\">2</span><span class=\"p\">],</span><span class=\"w\"> </span><span class=\"n\">obj</span><span class=\"o\">$</span><span class=\"n\">stringMax</span><span class=\"p\">[</span><span class=\"m\">2</span><span class=\"p\">]);</span><span class=\"w\">\n    </span><span class=\"n\">plot</span><span class=\"p\">(</span><span class=\"n\">obj</span><span class=\"o\">$</span><span class=\"n\">population</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">xlim</span><span class=\"o\">=</span><span class=\"n\">xlim</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">ylim</span><span class=\"o\">=</span><span class=\"n\">ylim</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">xlab</span><span class=\"o\">=</span><span class=\"s2\">\"pi\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">ylab</span><span class=\"o\">=</span><span class=\"s2\">\"sqrt(50)\"</span><span class=\"p\">);</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"n\">rbga.results</span><span class=\"w\"> </span><span class=\"o\">=</span><span class=\"w\"> </span><span class=\"n\">rbga</span><span class=\"p\">(</span><span class=\"nf\">c</span><span class=\"p\">(</span><span class=\"m\">1</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"m\">1</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"nf\">c</span><span class=\"p\">(</span><span class=\"m\">5</span><span class=\"p\">,</span><span class=\"w\"> </span><span 
class=\"m\">10</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"n\">monitorFunc</span><span class=\"o\">=</span><span class=\"n\">monitor</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"n\">evalFunc</span><span class=\"o\">=</span><span class=\"n\">evaluate</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">verbose</span><span class=\"o\">=</span><span class=\"kc\">TRUE</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">mutationChance</span><span class=\"o\">=</span><span class=\"m\">0.01</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">plot</span><span class=\"p\">(</span><span class=\"n\">rbga.results</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">plot</span><span class=\"p\">(</span><span class=\"n\">rbga.results</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">type</span><span class=\"o\">=</span><span class=\"s2\">\"hist\"</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"n\">plot</span><span class=\"p\">(</span><span class=\"n\">rbga.results</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"n\">type</span><span class=\"o\">=</span><span class=\"s2\">\"vars\"</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>",
      "summary": "During my PhD I wrote a simple but effective genetic algorithm package for R. Because there was a bug recently found, and there is interest in extending the functionality, I have set up a SourceForge project called genalg.",
      
      "date_published": "2007-11-19T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["rstats","chemometrics"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qsqn1-qtk65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/16/molecules-in-wikipedia-without-inchis-3.html",
      "title": "Molecules in Wikipedia without InChIs #3",
      "content_html": "<p>Third in the series of blogs about molecules in <a href=\"http://www.wikipedia.org/\">Wikipedia</a> without an\n<a href=\"http://www.iupac.org/inchi/\">InChI</a> (see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html\">#1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/11/molecules-in-wikipedia-without-inchis.html\">#2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nThere a certainly false positives, but here’s the updated list:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>http://www.en.wikipedia.org/wiki/AZD2171\nhttp://www.en.wikipedia.org/wiki/Alizarin\nhttp://www.en.wikipedia.org/wiki/Allantoin\nhttp://www.en.wikipedia.org/wiki/Allylamine\nhttp://www.en.wikipedia.org/wiki/Alpha-ethyltryptamine\nhttp://www.en.wikipedia.org/wiki/Anthraquinone\nhttp://www.en.wikipedia.org/wiki/Aspartame\nhttp://www.en.wikipedia.org/wiki/Barium_sulfate\nhttp://www.en.wikipedia.org/wiki/Biotin\nhttp://www.en.wikipedia.org/wiki/Boron_nitride\nhttp://www.en.wikipedia.org/wiki/Botox\nhttp://www.en.wikipedia.org/wiki/Bremelanotide\nhttp://www.en.wikipedia.org/wiki/CAS_registry_number\nhttp://www.en.wikipedia.org/wiki/Cadmium_sulfide\nhttp://www.en.wikipedia.org/wiki/Carminic_acid\nhttp://www.en.wikipedia.org/wiki/Celestine_%28mineral%29\nhttp://www.en.wikipedia.org/wiki/Cellulose\nhttp://www.en.wikipedia.org/wiki/Chemical\nhttp://www.en.wikipedia.org/wiki/Chemical_file_format\nhttp://www.en.wikipedia.org/wiki/Cheminformatics\nhttp://www.en.wikipedia.org/wiki/Chloramine\nhttp://www.en.wikipedia.org/wiki/Chloroethane\nhttp://www.en.wikipedia.org/wiki/Cinnamic_acid\nhttp://www.en.wikipedia.org/wiki/Crabtree's_catalyst\nhttp://www.en.wikipedia.org/wiki/DDT\nhttp://www.en.wikipedia.org/wiki/DMAP\nhttp://www.en.wikipedia.org/wiki/Dimethicone#Applications\nhttp://www.en.wi
kipedia.org/wiki/Dimethyl_amine\nhttp://www.en.wikipedia.org/wiki/Dimethyl_sulfide\nhttp://www.en.wikipedia.org/wiki/Dimethylethanolamine\nhttp://www.en.wikipedia.org/wiki/Dioxine\nhttp://www.en.wikipedia.org/wiki/Diphenylamine\nhttp://www.en.wikipedia.org/wiki/Dmso\nhttp://www.en.wikipedia.org/wiki/EDTA\nhttp://www.en.wikipedia.org/wiki/Eschenmoser%27s_salt\nhttp://www.en.wikipedia.org/wiki/Ethylene_carbonate\nhttp://www.en.wikipedia.org/wiki/Folate\nhttp://www.en.wikipedia.org/wiki/Formic_acid\nhttp://www.en.wikipedia.org/wiki/HMPA\nhttp://www.en.wikipedia.org/wiki/Hafnium(IV)_oxide\nhttp://www.en.wikipedia.org/wiki/Heavy_water\nhttp://www.en.wikipedia.org/wiki/Hexafluoroisopropanol\nhttp://www.en.wikipedia.org/wiki/Hydrogen_cyanide\nhttp://www.en.wikipedia.org/wiki/Hydrogen_cyanide#Hydrogen_cyanide_as_a_chemical_weapon\nhttp://www.en.wikipedia.org/wiki/Hydrogen_peroxide\nhttp://www.en.wikipedia.org/wiki/Hydroxyapatite\nhttp://www.en.wikipedia.org/wiki/Hydroxybenzotriazole\nhttp://www.en.wikipedia.org/wiki/IUPAC_nomenclature_of_inorganic_chemistry\nhttp://www.en.wikipedia.org/wiki/Indole\nhttp://www.en.wikipedia.org/wiki/Interferon_beta-1a\nhttp://www.en.wikipedia.org/wiki/J%C3%B6ns_Jakob_Berzelius\nhttp://www.en.wikipedia.org/wiki/Lawesson%27s_reagent\nhttp://www.en.wikipedia.org/wiki/Lewisite\nhttp://www.en.wikipedia.org/wiki/MTBE\nhttp://www.en.wikipedia.org/wiki/Maitotoxin\nhttp://www.en.wikipedia.org/wiki/Menthol\nhttp://www.en.wikipedia.org/wiki/Merck_Index\nhttp://www.en.wikipedia.org/wiki/Mescaline\nhttp://www.en.wikipedia.org/wiki/Metaldehyde\nhttp://www.en.wikipedia.org/wiki/Methionylalanylthreonyl...leucine\nhttp://www.en.wikipedia.org/wiki/Methyl_amine\nhttp://www.en.wikipedia.org/wiki/Methyl_salicylate\nhttp://www.en.wikipedia.org/wiki/Molecular_Query_Language\nhttp://www.en.wikipedia.org/wiki/N-butyllithium\nhttp://www.en.wikipedia.org/wiki/Nafion\nhttp://www.en.wikipedia.org/wiki/Nitrous_oxide\nhttp://www.en.wikipedia.org/wiki/Octanitrocubane\nhttp:
//www.en.wikipedia.org/wiki/Organic_chemistry\nhttp://www.en.wikipedia.org/wiki/Organic_chemistry#Molecular_structure_elucidation\nhttp://www.en.wikipedia.org/wiki/P4O10\nhttp://www.en.wikipedia.org/wiki/Paraldehyde\nhttp://www.en.wikipedia.org/wiki/Penicillin\nhttp://www.en.wikipedia.org/wiki/Peroxyacetic_acid\nhttp://www.en.wikipedia.org/wiki/Phenol\nhttp://www.en.wikipedia.org/wiki/Physical_science\nhttp://www.en.wikipedia.org/wiki/Piperidine\nhttp://www.en.wikipedia.org/wiki/Potassium_chloride\nhttp://www.en.wikipedia.org/wiki/Psilocybin\nhttp://www.en.wikipedia.org/wiki/Pubchem\nhttp://www.en.wikipedia.org/wiki/Quinine_total_synthesis\nhttp://www.en.wikipedia.org/wiki/Resveratrol\nhttp://www.en.wikipedia.org/wiki/Rhodamine\nhttp://www.en.wikipedia.org/wiki/Salvia_divinorum\nhttp://www.en.wikipedia.org/wiki/Selenium_dioxide\nhttp://www.en.wikipedia.org/wiki/Silicon_carbide\nhttp://www.en.wikipedia.org/wiki/Skatole\nhttp://www.en.wikipedia.org/wiki/Skeletal_formula\nhttp://www.en.wikipedia.org/wiki/Soman\nhttp://www.en.wikipedia.org/wiki/Splenda\nhttp://www.en.wikipedia.org/wiki/Standard_atomic_weight\nhttp://www.en.wikipedia.org/wiki/Subgraph_isomorphism_problem\nhttp://www.en.wikipedia.org/wiki/Sulfur_hexafluoride\nhttp://www.en.wikipedia.org/wiki/Sulfur_mustard\nhttp://www.en.wikipedia.org/wiki/TBHQ\nhttp://www.en.wikipedia.org/wiki/Tabun_(nerve_agent)\nhttp://www.en.wikipedia.org/wiki/Teicoplanin\nhttp://www.en.wikipedia.org/wiki/Tetra-ethyl_lead\nhttp://www.en.wikipedia.org/wiki/Tetraazidomethane\nhttp://www.en.wikipedia.org/wiki/Tetrachloroethylene\nhttp://www.en.wikipedia.org/wiki/Thiomersal\nhttp://www.en.wikipedia.org/wiki/Titanium_dioxide\nhttp://www.en.wikipedia.org/wiki/Tourmaline\nhttp://www.en.wikipedia.org/wiki/Uric_acid\nhttp://www.en.wikipedia.org/wiki/VX_%28nerve_agent%29\nhttp://www.en.wikipedia.org/wiki/Valence_%28chemistry%29\nhttp://www.en.wikipedia.org/wiki/benzylbromide\nhttp://www.en.wikipedia.org/wiki/cortisone\nhttp://www.en.wikipedia.o
rg/wiki/epothilone\nhttp://www.en.wikipedia.org/wiki/piperidine\nhttp://www.en.wikipedia.org/wiki/stilbene\nhttp://www.wikipedia.org/wiki/Phosgene\n</code></pre></div></div>",
      "summary": "Third in the series of blogs about molecules in Wikipedia without an InChI (see also #1 and #2). There are certainly false positives, but here’s the updated list:",
      
      "date_published": "2007-11-16T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["wikipedia","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/krkta-dya88",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/14/last-call-for-open-laboratory-2007.html",
      "title": "Last Call for Open Laboratory 2007",
      "content_html": "<p><a href=\"http://pbeltrao.blogspot.com/\">Pedro</a> <a href=\"http://pbeltrao.blogspot.com/2007/11/last-call-for-open-laboratory-2007.html\">reminded me</a>\nof the last call for <a href=\"http://scienceblogs.com/clock/2007/11/open_laboratory_2008_last_call.php\">Open Laboratory 2007</a>,\nwhich prints the best blog items of 2007 in book form. The list of chemistry contributions is not so large yet, so\n<a href=\"http://openlab.wufoo.com/forms/submission-form/\">go ahead and nominate</a> some of the cool chemical blog items of the last year.</p>\n\n<p>I will post my shortlist later this week.</p>",
      "summary": "Pedro reminded me of the last call for Open Laboratory 2007, which prints the best blog items of 2007 in book form. The list of chemistry contributions is not so large yet, so go ahead and nominate some of the cool chemical blog items of the last year.",
      
      "date_published": "2007-11-14T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["openlab"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bw5s8-f0t43",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/12/scintilla-and-postgenomiccom-on-linux.html",
      "title": "Scintilla and Postgenomic.com on Linux 2.6.17+",
      "content_html": "<p>That’s why blogging works! I reported last Friday on <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/09/using-nintendo-wii-for-serious-science.html\">using my Wii for reading <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"http://scintilla.nature.com/\">Scintilla</a> and <a href=\"http://www.postgenomic.com/\">Postgenomic.com</a>.\n<a href=\"http://chem-bla-ics.blogspot.com/2007/11/using-nintendo-wii-for-serious-science.html#c9163469498026561308\">Alf replied</a>: <!-- keep link --></p>\n\n<blockquote>\n  <p>It is the Linux kernel, yes: TCP window scaling was switched on by default in kernels since about a year ago\n(and in Vista too, I think), and one of our routers or firewalls doesn’t like it. We’re trying to get them\nupgraded, but it takes a while…</p>\n</blockquote>\n\n<p>Ah, the trick words: TCP window scaling. A quick google turned up a <a href=\"http://inodes.org/blog/2006/09/06/tcp-window-scaling-and-kernel-2617/\">workaround in John’s Tidbits blog</a>:</p>\n\n<blockquote>\n  <p>There are 2 quick fixes. First you can simply turn off windows scaling all together by doing</p>\n\n  <p>echo 0 &gt; /proc/sys/net/ipv4/tcp_window_scaling</p>\n\n  <p>but that limits your window to 64k. Or you can limit the size of your TCP buffers back to pre 2.6.17 kernel values which means a wscale value of about 2 is used which is acceptable to most broken routers.</p>\n\n  <p>echo “4096 16384 131072” &gt; /proc/sys/net/ipv4/tcp_wmem\necho “4096 87380 174760” &gt; /proc/sys/net/ipv4/tcp_rmem</p>\n\n  <p>The original values would have had 4MB in the last column above which is what was allowing these massive windows.</p>\n\n  <p>In a thread somewhere which I can’t find anymore Dave Miller had a great quote along the lines of</p>\n\n  <p>“I refuse to workaround it, window scaling has been part of the protocol since 1999, deal with it.”</p>\n</blockquote>\n\n<p>That worked for me. 
I think Dave Miller is right, but can’t resist reading Scintilla and Postgenomic.com on my desktop too ;)</p>",
      "summary": "That’s why blogging works! I reported last Friday on using my Wii for reading Scintilla and Postgenomic.com. Alf replied:",
      
      "date_published": "2007-11-12T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["linux","postgenomic"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/cymvx-zfa89",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/09/using-nintendo-wii-for-serious-science.html",
      "title": "Using the Nintendo Wii for serious science...",
      "content_html": "<p>On my desktop, the <a href=\"http://scintilla.nature.com/\">Scintilla</a> and <a href=\"http://postgenomic.com/\">Postgenomic.com</a> websites\ndo not work. It is not a browser problem, but has something to do with TCP/IP packets not reaching their destination:\nthe browser. Euan told me they are aware of the problem, but apparently have not found a solution yet.</p>\n\n<p>However, my <a href=\"http://wii.nintendo.com/\">Wii</a> does not have the problem, which makes me wonder if it is a disagreement\nbetween the <a href=\"http://www.nature.com/\">Nature</a> server and my <a href=\"http://packages.ubuntu.com/gutsy/metapackages/linux\">Linux kernel</a>…\nAnyway, this is what the two websites look like (first Scintilla, then Postgenomic.com):</p>\n\n<p><img src=\"/assets/images/dsci0190.jpg\" alt=\"\" /></p>\n\n<p><img src=\"/assets/images/dsci0191.jpg\" alt=\"\" /></p>\n\n<p>(BTW, that was one <a href=\"https://doi.org/10.59350/v4pc6-9v569\">very nice piece of work by Rich <i class=\"fa-solid fa-recycle fa-xs\"></i></a>!\nMake sure to also read the <a href=\"https://doi.org/10.59350/vhefz-rc472\">follow up <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>The only real disadvantage is that it does not integrate well with the things I do daily. If I see some interesting\npost, and would like to tag it on <a href=\"http://del.icio.us/egonw\">my del.icio.us account</a>, I have to google for it on\nmy desktop :(</p>\n\n<p>(You thought I was going to talk about <a href=\"http://folding.stanford.edu/\">F@H</a> or so, didn’t you? :)</p>",
      "summary": "On my desktop, the Scintilla and Postgenomic.com websites do not work. It is not a browser problem, but has something to do with TCP/IP packets not reaching their destination: the browser. Euan told me they are aware of the problem, but apparently have not found a solution yet.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/dsci0190.jpg",
      "date_published": "2007-11-09T00:00:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      "tags": ["nintendo","linux","postgenomic"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/v4pc6-9v569", "doi": "10.59350/v4pc6-9v569"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/vhefz-rc472", "doi": "10.59350/vhefz-rc472"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y9gch-mzn51",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/08/cytoscape-in-amsterdam.html",
      "title": "Cytoscape in Amsterdam",
      "content_html": "<p>Right at this moment I am listening to Andrew Hopkins from Dundee on chemical opportunities in systems biology, at the\n<a href=\"http://www.cytoscape.org/\">Cytoscape</a> <a href=\"https://web.archive.org/web/20071214220051/https://cytoscape.org/retreat2007/programme.php#nov_8\">conference in Amsterdam <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nAnyone who wants to meet up over lunch or coffee break?</p>",
      "summary": "Right at this moment I am listening to Andrew Hopkins from Dundee on chemical opportunities in systems biology, at the Cytoscape conference in Amsterdam. Anyone who wants to meet up over lunch or coffee break?",
      
      "date_published": "2007-11-08T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["cytoscape"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zs3tv-pp865",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/07/comparing-junit-test-results-between.html",
      "title": "Comparing JUnit test results between CDK trunk/ and a branch",
      "content_html": "<p>I have started using branches for non-trivial patches, like removing the HückelAromaticityDetector, in favor of the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/06/evidence-of-aromaticity.html\">new CDKHückelAromaticityDetector <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. I am doing\nthis in my personal <a href=\"http://cdk.svn.sf.net/svnroot/cdk/branches/egonw/remove-non-cdkatomtype-code/\">remove-non-cdkatomtype-code</a>\nbranch, where I can quietly work on the patch until I am happy about it. I make sure to keep it synchronized with\ntrunk with regular <code class=\"language-plaintext highlighter-rouge\">svn merge</code> commands.</p>\n\n<p>Now, the goal is that my branch only fixes failing JUnit tests and does not create new regressions. To compare the\nresults between two versions of the <a href=\"\">CDK</a>, I use these commands:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>$ cd cdk/trunk/cdk\n$ ant -lib develjar/junit-4.3.1.jar -logfile ant.log test-all\n$ cd ../../branches/egonw/remove-non-cdkatomtype-code/\n$ ant -lib develjar/junit-4.3.1.jar -logfile ant.log test-all\n$ cd ../../..\n$ grep Testcase branches/egonw/remove-non-cdkatomtype-code/reports/*.txt | cut -d':' -f2,3 &gt; branch.results\n$ grep Testcase trunk/cdk/reports/*.txt | cut -d':' -f2,3 &gt; trunk.results\n$ diff -u trunk.results branch.results\n</code></pre></div></div>\n\n<p>The last diff command gives me a quick overview of what has changed. 
To get the statistics, I can do:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>$ diff -u trunk.results branch.results | grep \"^-Testcase\" | wc -l\n$ diff -u trunk.results branch.results | grep \"^+Testcase\" | wc -l\n</code></pre></div></div>\n\n<p>The first gives me the number of JUnit tests which are no longer failing, while the second\ngives me the number of tests which are newly failing. Ideally, the second is zero. Unfortunately, that is not yet the case :)</p>",
      "summary": "I have started using branches for non-trivial patches, like removing the HückelAromaticityDetector, in favor of the new CDKHückelAromaticityDetector. I am doing this in my personal remove-non-cdkatomtype-code branch, where I can quietly work on the patch until I am happy about it. I make sure to keep it synchronized with trunk with regular svn merge commands.",
      
      "date_published": "2007-11-07T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["cdk","junit"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9fkkg-fxz59",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/06/evidence-of-aromaticity.html",
      "title": "Evidence of Aromaticity",
      "content_html": "<p>I have been working on a new atom type perception engine for the <a href=\"http://cdk.sf.net/\">CDK</a>, after having decided that\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html\">the existing atom type lists <i class=\"fa-solid fa-recycle fa-xs\"></i></a> were not sufficient for\nthe algorithms we have in the CDK. The <a href=\"http://cdk.svn.sourceforge.net/viewvc/*checkout*/cdk/trunk/cdk/src/org/openscience/cdk/config/data/cdk_atomtypes.xml?revision=9288\">new list</a>\nis growing in size, and basically contains four properties (besides element and formal charge):</p>\n\n<ol>\n  <li>number of bonded neighbors</li>\n  <li>number of pi bonds (or double bond equivalents)</li>\n  <li>number of lone pairs</li>\n  <li>hybridization state</li>\n</ol>\n\n<p>This seems to be a minimal and accurate set to cover a rather good deal of chemoinformatics. I have yet to make the mappings\nof the new atom type list with existing lists for force fields, and radicals are missing too. However, the following\nalgorithms in the CDK seem to translate rather well:</p>\n\n<ul>\n  <li>hydrogen adding</li>\n  <li>aromaticity detection (Hückel rules)</li>\n</ul>\n\n<p>I still have to rework the double bond perception.</p>\n\n<h2 id=\"aromaticity\">Aromaticity</h2>\n\n<p>Now, aromaticity is a fuzzy concept, and there is no general agreement on what it is. Some say it is smelly compounds, others\nsay ring systems which obey the Hückel rule. 
Based on the new atom type list, I have rewritten the Hückel aromaticity\ndetector and it applies these rules:</p>\n\n<ul>\n  <li>only single rings and two fused non-spiro rings</li>\n  <li>4n+2 electrons</li>\n  <li>no ring atoms with double bonds that are not in the ring too</li>\n</ul>\n\n<p>This approach differs in two ways from the old code: it no longer tries to test all ring systems, which required using\nthe <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/ringsearch/AllRingsFinder.html\">CDK AllRingsFinder algorithm</a>\nwhich combinatorially generates all possible ring systems. The new code only considers ring systems with up to two single\nrings. Aromaticity beyond that is even less well defined than aromaticity in general.</p>\n\n<p>The other difference is that the ring system must not have ring atoms which have a double bond which is not part of the\nring too. The classical example is benzoquinone (InChI=1/C6H4O2/c7-5-1-2-6(8)4-3-5/h1-4H) which is not aromatic, even\nthough it conforms to the 4n+2 rule (image from <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>):</p>\n\n<p><img src=\"/assets/images/cid4650.png\" alt=\"\" /></p>\n\n<h2 id=\"evidence-of-aromaticity\">Evidence of Aromaticity</h2>\n\n<p>The final rule, of course, is what nature tells us is aromatic and what is not. There are many other details to\naromaticity than I just covered. For example, take azulene (InChI=1/C10H8/c1-2-5-9-7-4-8-10(9)6-3-1/h1-8H). All\natoms are aromatic, but not all bonds (also <a href=\"https://pubchem.ncbi.nlm.nih.gov/compound/9231\">PubChem</a>):</p>\n\n<p><img src=\"/assets/images/cid9231.png\" alt=\"\" /></p>\n\n<p>These things are complex, but the rise of <a href=\"http://en.wikipedia.org/wiki/Open_Data\">Open Data</a> helps us out, as well\nas increasing computing power. 
<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> has been running two rather\nprojects which may help us out: <a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/\">CrystalEye</a> (Nick: no blog?) and\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/11/02/open-nmr-update-and-requests-for-input/\">OpenNMR <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>NMR shifts will give us experimental backup on our notion of aromaticity, and so do bond lengths. I\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/11/02/open-nmr-update-and-requests-for-input/#comment-1139\">asked Peter about this <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and whether OpenNMR\npredicted shifts could indeed confirm aromaticity of compounds, and <a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/11/05/open-nmr-how-good-is-the-prediction/\">he replied <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand showed that the predicted spectra could be used to distinguish between C-C and C=C bonds.</p>\n\n<p>I commented the following (which was in moderation at the time of writing), and that gets us to experimental\nevidence for aromaticity:</p>\n\n<blockquote>\n  <p>Thanx for the elaborate answer. What I had in mind was the question whether NMR shift predictions can be\nused to tell me if a certain ring system is aromatic or not, and in case of fused rings, which atoms and\nwhich bonds are aromatic and which not. I’m sure the prediction error for 1H NMR shifts is well below 2ppm,\nand more in the order of 0.2ppm.</p>\n</blockquote>\n\n<blockquote>\n  <p>But maybe I should be asking, can I use CrystalEye to decide if ring systems are “aromatic”, and in case\nof two rings fused together (non-spiro), which atoms and bonds are aromatic and which not. Aromaticity is\na fuzzy concept, with various definitions. 
I would be interested in linking what the expert considers\n‘aromatic’ (or SMILES, or the CDK, or …) with what the QM chemistry (via bond lengths or NMR shift\npredictions) and crystal structures (via bond lengths) have to teach us. The null hypothesis being that\nthe bonds are not delocalized (bond length) and that no ring current is found (NMR shifts, 1H in particular).</p>\n</blockquote>\n\n<blockquote>\n  <p>Regarding those bond lengths, ‘aromatic’ bonds show a bond length in between that of single and double bonds\n(e.g. see <a href=\"http://www.chem.swin.edu.au/modules/mod2/bondlen.html\">this random pick</a>). The CrystalEye data\ndoes not reflect that really, and only <a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/bondlengths/C-C-after-protocol.svg\">a trimodal histogram</a>\nshows up. Indeed, the C#C peak is <em>very</em> low, around 1.2A :) Apparently, the triple C#C bond order is\nunderrepresented in present-day crystallography.</p>\n</blockquote>\n\n<blockquote>\n  <p>Maybe aromatic C:C bonds are underrepresented too, or can the absence of a peak around 1.40A be explained\notherwise? I would at least have expected a shoulder or deviation in peak shape of the peak at 1.37A.</p>\n</blockquote>\n\n<p>This is what the histogram looks like (for archival reasons):</p>\n\n<p><img src=\"/assets/images/trimodalCC.png\" alt=\"\" /></p>",
      "summary": "I have been working on a new atom type perception engine for the CDK, after having decided that the existing atom type lists were not sufficient for the algorithms we have in the CDK. The new list is growing in size, and basically contains four properties (besides element and formal charge):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/trimodalCC.png",
      "date_published": "2007-11-06T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["cdk","aromaticity","crystal"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vkwhn-4tw77",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/05/glueing-biomoby-services-together-with.html",
      "title": "Glueing BioMoby services together with JavaScript in Bioclipse",
      "content_html": "<p><a href=\"http://www.blogger.com/profile/10379047094508592338\">Ola</a> has been doing a good job of integrating\n<a href=\"http://biomoby.org/\">BioMoby</a> support into Bioclipse. Earlier he completed a GUI for running BioMOBY\nservices, and added more recently <a href=\"http://bioclipse.blogspot.com/2007/11/scripting-biomoby-in-bioclipse.html\">a JavaScript wrapper too</a>,\nusing the <a href=\"http://wiki.bioclipse.net/index.php?title=Bc_rhino\">Rhino plugin</a> developed by Johannes.</p>\n\n<p>For example:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">console</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">net</span><span class=\"p\">.</span><span class=\"nx\">bioclipse</span><span class=\"p\">.</span><span class=\"nx\">util</span><span class=\"p\">.</span><span class=\"nx\">BioclipseConsole</span><span class=\"p\">;</span>\n<span class=\"nx\">moby</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">net</span><span class=\"p\">.</span><span class=\"nx\">bioclipse</span><span class=\"p\">.</span><span class=\"nx\">biomoby</span><span class=\"p\">.</span><span class=\"nx\">ui</span><span class=\"p\">.</span><span class=\"nx\">scripts</span><span class=\"p\">.</span><span class=\"nx\">MobyServiceScripting</span><span class=\"p\">;</span>\n<span class=\"nx\">biojava</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">net</span><span class=\"p\">.</span><span class=\"nx\">bioclipse</span><span class=\"p\">.</span><span class=\"nx\">biojava</span><span class=\"p\">.</span><span class=\"nx\">scripts</span><span class=\"p\">.</span><span class=\"nx\">BioJavaScripting</span><span class=\"p\">;</span>\n\n<span class=\"nx\">prot</span><span class=\"o\">=</span><span 
class=\"nx\">moby</span><span class=\"p\">.</span><span class=\"nf\">downloadGenbank</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">NCBI_GI</span><span class=\"dl\">\"</span><span class=\"p\">,</span><span class=\"dl\">\"</span><span class=\"s2\">111076</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"nx\">seq</span><span class=\"o\">=</span><span class=\"nx\">biojava</span><span class=\"p\">.</span><span class=\"nf\">parseString</span><span class=\"p\">(</span><span class=\"nx\">prot</span><span class=\"p\">);</span>\n<span class=\"nx\">fasta</span><span class=\"o\">=</span><span class=\"nx\">biojava</span><span class=\"p\">.</span><span class=\"nf\">toFasta</span><span class=\"p\">(</span><span class=\"nx\">seq</span><span class=\"p\">);</span>\n\n<span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nf\">writeToConsole</span><span class=\"p\">(</span><span class=\"nx\">fasta</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>Today, he explained <a href=\"http://bioclipse.blogspot.com/2007/11/adding-scripting-commands-to-bioclipse.html\">how to create convenience JavaScript shortcuts</a>,\nto reduce the typing.</p>\n\n<p>Screenshots and status of the Bioclipse-BioMoby work is <a href=\"http://wiki.bioclipse.net/index.php?title=BioMoby_plugin\">available from the wiki</a>.</p>",
      "summary": "Ola has been doing a good job of integrating BioMoby support into Bioclipse. Earlier he completed a GUI for running BioMOBY services, and added more recently a JavaScript wrapper too, using the Rhino plugin developed by Johannes.",
      
      "date_published": "2007-11-05T00:00:00+00:00",
      "date_modified": "2007-11-05T00:00:00+00:00",
      "tags": ["bioclipse","javascript"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fvbts-sc941",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/31/offline-cdk-development-using-git-svn.html",
      "title": "Offline CDK development using git-svn",
      "content_html": "<p>While <a href=\"http://subversion.tigris.org/\">Subversion</a> is a significant improvement over <a href=\"http://www.nongnu.org/cvs/\">CVS</a>, they both require a\ncentral server. That is, they do not allow me to commit changes when I am not connected to that server. This is annoying when being on a\nlong train ride, or somewhere else without internet connectivity. I can pile up all my changes, but that would yield one big ugly patch.</p>\n\n<p>Therefore, I tried <a href=\"http://www.selenic.com/mercurial/wiki/\">Mercurial</a> where each client is a server too. The version I used, however, did\nnot have the move command, so it put me back into the old CVS days where I lost the history of a file when I reorganize my archive.</p>\n\n<h2 id=\"git\">Git</h2>\n\n<p>Then <a href=\"http://git.or.cz/\">Git</a>, the version control system developed by <a href=\"http://en.wikipedia.org/wiki/Linus_Torvalds\">Linus Torvalds</a> when he\nfound that existing tools did not do what he wanted to do. It seems a rather good product, though with a somewhat steeper learning curve,\nbecause of the far more flexible architecture (see <a href=\"http://www.kernel.org/pub/software/scm/git/docs/tutorial.html\">this tutorial</a>).\nWell, <a href=\"http://kernel.org/doc/local/git-quick.html\">it works for the Linux kernel</a>, so must be good :)</p>\n\n<p>Now, <a href=\"http://www.sf.net/\">SourceForge</a> does not have Git support yet, so we use Subversion. <a href=\"http://www.flavio.castelli.name/\">Flavio</a>,\nof <a href=\"http://strigi.sf.net/\">Strigi</a> fame, however, <a href=\"http://www.flavio.castelli.name/howto_use_git_with_svn\">introduced me to git-svn</a>.\nThat was almost two months ago already, but I finally made some time to try it out. 
I think I like it.</p>\n\n<p>This is what I did to make <a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/.classpath?r1=8523&amp;r2=9271\">a commit to CDKs SVN repository</a>:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">sudo </span>aptitude <span class=\"nb\">install </span>git-svn git-core\n<span class=\"nv\">$ </span><span class=\"nb\">mkdir</span> <span class=\"nt\">-p</span> git-svn/cdk-trunk\n<span class=\"nv\">$ </span><span class=\"nb\">cd </span>git-svn/cdk-trunk\n<span class=\"nv\">$ </span>git-svn init https://cdk.svn.sourceforge.net/svnroot/cdk/trunk/cdk\n<span class=\"nv\">$ </span>git-svn fetch <span class=\"nt\">-rHEAD</span>\n<span class=\"nv\">$ </span>nano .classpath\n<span class=\"nv\">$ </span>git add .classpath\n<span class=\"nv\">$ </span>git commit\n<span class=\"nv\">$ </span>git-svn dcommit\n</code></pre></div></div>\n\n<p>The first git-svn command initializes a log Git repository based on the SVN repository. The <code class=\"language-plaintext highlighter-rouge\">git-svn fetch</code> command makes a local copy of\nthe SVN repository content defined in the previous command. Local changes are, by default, not commited; unless one explicitly git adds\nthem to a patch. Once a patch is ready you can do all sorts of interesting things with them, among with commit them to the local Git\nrepository with <code class=\"language-plaintext highlighter-rouge\">git commit</code>.</p>\n\n<p>Now, these kind of commits are on the local repository, and I do not require internet access for that. 
When I am connected again, I can\nsynchronize my local changes with the SVN repository with the <code class=\"language-plaintext highlighter-rouge\">git-svn dcommit</code> command.</p>\n\n<p>A final important command is <code class=\"language-plaintext highlighter-rouge\">git-svn rebase</code>, which is used to update the local Git repository with changes others made to the SVN repository.</p>",
      "summary": "While Subversion is a significant improvement over CVS, they both require a central server. That is, they do not allow me to commit changes when I am not connected to that server. This is annoying when being on a long train ride, or somewhere else without internet connectivity. I can pile up all my changes, but that would yield one big ugly patch.",
      
      "date_published": "2007-10-31T00:00:00+00:00",
      "date_modified": "2007-10-31T00:00:00+00:00",
      "tags": ["git","svn","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9qsxx-j6z92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/29/biospider-another-molecule-search.html",
      "title": "BioSpider: another molecule search engine",
      "content_html": "<p>I just ran into <a href=\"http://biospider.ca/\">BioSpider</a>. Unlike <a href=\"http://www.chemspider.com/\">ChemSpider</a>, BioSpider crawls the\ninternet (well, <a href=\"http://redpoll.pharmacy.ualberta.ca/~knox/biospider2/sources.html\">this list of sources really</a>) to find\ninformation, and depending on what it finds it continues the search. Below is a screenshot of an intermediate point after\nstarting with the InChI of methane:</p>\n\n<p><img src=\"/assets/images/biospider.png\" alt=\"\" /></p>\n\n<p>After the search it generates a long HTML page with all the information it found on the molecule you queried for.\nThis approach is much more scalable than storing all in one database.</p>\n\n<p>This crawling of information is something I was working on myself a bit too, and I think this is a good approach.\nHowever, I think the use of a central website is not the right approach. Instead, the search should be distributed too:\nthe crawling should be done on the client machine; it should be done in <a href=\"http://taverna.sf.net/\">Taverna</a> or\n<a href=\"http://bioclipse.net/\">Bioclipse</a> instead.</p>\n\n<p>My conclusion: excellent idea, bad implementation.</p>",
      "summary": "I just ran into BioSpider. Unlike ChemSpider, BioSpider crawls the internet (well, this list of sources really) to find information, and depending on what it finds it continues the search. Below is a screenshot of an intermediate point after starting with the InChI of methane:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/biospider.png",
      "date_published": "2007-10-29T00:00:00+00:00",
      "date_modified": "2007-10-29T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/djg2t-d2x55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/26/my-foaf-network-1-foafexplorer.html",
      "title": "My FOAF network #1: the FOAFExplorer",
      "content_html": "<p>In this series I will introduce the technologies behind my FOAF network. <a href=\"http://www.foaf-project.org/\">FOAF</a>\nmeans Friend-of-a-Friend and</p>\n\n<blockquote>\n  <p>[t]he Friend of a Friend (FOAF) project is creating a Web of machine-readable pages describing people,\nthe links between them and the things they create and do.</p>\n</blockquote>\n\n<p><a href=\"http://blueobelisk.sourceforge.net/people/egonw/foaf.xrdf\">My FOAF file</a> (draft) will give you details on\nwho I am, who I collaborate with (and other types of friends), which conferences I am attending, what I\npublished etc. That is, I’ll try to keep it updated. BTW, <a href=\"http://xmlns.com/foaf/spec/\">FOAF is an RDF language</a>.</p>\n\n<h2 id=\"foafexplorer\">FOAFExplorer</h2>\n\n<p><a href=\"http://plindenbaum.blogspot.com/\">Pierre</a> has done some <a href=\"http://plindenbaum.blogspot.com/search?q=FOAF\">excellent FOAF work</a>\nin the past, and <a href=\"http://plindenbaum.blogspot.com/2006/01/myfoafexplorer-browse-your-foaf.html\">developed</a> the\n<a href=\"http://www.urbigene.com/foafexplorer/\">MyFOAFExplorer</a>, and also developed\n<a href=\"http://plindenbaum.blogspot.com/2006/01/scifoaf-cited-at-pacific-symposium-on.html\">a tool to create a FOAF network</a>\nbased on the <a href=\"http://www.ncbi.nlm.nih.gov/sites/entrez\">PubMed database</a>, called\n<a href=\"http://www.urbigene.com/foaf/\">SciFOAF</a>. The latter is neat, but does not allow putting\nall these personal details in the FOAF files. 
However, the output could be a starting\npoint.</p>\n\n<p>Back to FOAFExplorer, this is <a href=\"http://blueobelisk.sourceforge.net/people/egonw/\">what the FOAFExplorer shows for my network</a>:</p>\n\n<p><img src=\"/assets/images/foaf_egonw.png\" alt=\"\" /></p>\n\n<p>I’m a bit lonely, even though I have linked to two friends in my FOAF file, of which one has a FOAF\nfile too (<a href=\"http://www.ch.ic.ac.uk/rzepa/rzepa.xrdf\">Henry</a>):</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;foaf:knows&gt;</span>\n  <span class=\"nt\">&lt;foaf:Person</span> <span class=\"na\">rdf:ID=</span><span class=\"s\">\"HenryRzepa\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;foaf:name&gt;</span>Henry Rzepa<span class=\"nt\">&lt;/foaf:name&gt;</span>\n    <span class=\"nt\">&lt;rdfs:seeAlso</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://www.ch.ic.ac.uk/rzepa/rzepa.xrdf\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;/foaf:Person&gt;</span>\n<span class=\"nt\">&lt;/foaf:knows&gt;</span>\n<span class=\"nt\">&lt;foaf:knows&gt;</span>\n  <span class=\"nt\">&lt;foaf:Person</span> <span class=\"na\">rdf:ID=</span><span class=\"s\">\"PeterMurrayRust\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;foaf:name&gt;</span>Peter Murray-Rust\n    <span class=\"nt\">&lt;foaf:mbox_sha1sum&gt;</span>926d6f8ed367bdded26353a05e80b4f0ce18230d...\n  <span class=\"nt\">&lt;/foaf:Person&gt;</span>\n<span class=\"nt\">&lt;/foaf:knows&gt;</span>\n</code></pre></div></div>\n<p>I guess the FOAFExplorer does not browse into my network. More on that in later items in this series.</p>",
      "summary": "In this series I will introduce the technologies behind my FOAF network. FOAF means Friend-of-a-Friend and",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/foaf_egonw.png",
      "date_published": "2007-10-26T00:00:00+00:00",
      "date_modified": "2007-10-26T00:00:00+00:00",
      "tags": ["semweb","foaf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/t5hwa-4eg16",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/24/one-billion-biochemical-rdf-triples.html",
      "title": "One Billion Biochemical RDF Triples!",
      "content_html": "<p>That must be a record! Eric Jain wrote on <a href=\"http://lists.w3.org/Archives/Public/public-semweb-lifesci/\">public-semweb-lifesci</a>:</p>\n\n<blockquote>\n  <p>The latest release of the <a href=\"http://www.expasy.uniprot.org/\">UniProt</a> protein database contains just over a\nbillion triples*! PRESS RELEASE :-)</p>\n\n  <p>The data is all available via the (Semantic or otherwise) Web:</p>\n\n  <p><a href=\"http://beta.uniprot.org/\">http://beta.uniprot.org/</a></p>\n\n  <p>…or can be bulk-downloaded from:</p>\n\n  <p><a href=\"ftp://ftp.uniprot.org/\">ftp://ftp.uniprot.org/</a></p>\n\n  <ul>\n    <li>Counting some reification statements, and assuming no overlap between\n“named graphs”.</li>\n  </ul>\n\n  <p>P.S. This should be the last you’ll hear from me on this topic – I’m off\nto new adventures…</p>\n</blockquote>\n\n<p>I surely hope this is not the last we hear of this huge RDF collection.</p>",
      "summary": "That must be a record! Eric Jain wrote on public-semweb-lifesci:",
      
      "date_published": "2007-10-24T00:10:00+00:00",
      "date_modified": "2025-03-23T00:00:00+00:00",
      "tags": ["uniprot","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5nnev-0bt31",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/24/my-blog-turned-2.html",
      "title": "My blog turned 2",
      "content_html": "<p>A bit over two years ago I posted my first blog item, <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/15/chem-bla-ics.html\">Chem-bla-ics <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nintroducing the topic of my blog. In January this year I <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/11/why-do-i-blog.html\">explained why I like blogging <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "A bit over two years ago I posted my first blog item, Chem-bla-ics , introducing the topic of my blog. In January this year I explained why I like blogging .",
      
      "date_published": "2007-10-24T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/byehz-56r07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/19/bob-improved-pov-ray-export-of-jmol.html",
      "title": "Bob improved the POV-Ray export of Jmol",
      "content_html": "<p>Bob has set up a new interface between the data model and the <a href=\"http://www.jmol.org/\">Jmol</a> renderer, which\nallows him to define other types of export too. One of these is a <a href=\"http://www.povray.org/\">POV-Ray</a> export,\nwhich allows creating high-quality images for papers. Jmol has had POV-Ray export for a long time now,\nbut never included the secondary structures or other more recent visual features. <a href=\"http://pymol.sourceforge.net/\">PyMOL</a>\nis well-known for its POV-Ray feature, and often used to create publication quality protein prints. The\nscript command to create a POV-Ray input file takes the output image size as parameters:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>write povray 400 600   # width 400, height 600\n</code></pre></div></div>\n\n<p>Here’s a screenshot of a protein with surface:</p>\n\n<p><img src=\"/assets/images/t2.pov.png\" alt=\"\" /></p>\n\n<p>And here a MO of water:</p>\n\n<p><img src=\"/assets/images/watermo.pov.png\" alt=\"\" /></p>\n\n<p>Note the shading. More examples are available <a href=\"http://www.stolaf.edu/people/hansonr/temp/jmol/povray.htm\">here</a>.</p>",
      "summary": "Bob has set up a new interface between the data model and the Jmol renderer, which allows him to define other types of export too. One of these is a POV-Ray export, which allows creating high-quality images for papers. Jmol has had POV-Ray export for a long time now, but never included the secondary structures or other more recent visual features. PyMOL is well-known for its POV-Ray feature, and often used to create publication quality protein prints. The script command to create a POV-Ray input file takes the output image size as parameters:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/t2.pov.png",
      "date_published": "2007-10-19T00:00:00+00:00",
      "date_modified": "2007-10-19T00:00:00+00:00",
      "tags": ["jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pd0g8-bzs86",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/18/ore-qsar-in-bioclipse-joelib-extension.html",
      "title": "More QSAR in Bioclipse: the JOELib extension",
      "content_html": "<p>I added a <a href=\"http://www.bioclipse.net/\">Bioclipse</a> plugin for <a href=\"http://joelib.sourceforge.net/\">JOELib</a> (GPL, by\n<a href=\"http://miningdrugs.blogspot.com/\">Joerg</a>) which comes with many QSAR descriptors, several of which are now\navailable in the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/26/further-bioclipse-qsar-functionality.html\">QSAR feature of Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i>\n</a>:</p>\n\n<p><img src=\"/assets/images/JOELibQSAR.png\" alt=\"\" /></p>\n\n<p>Meanwhile, the <a href=\"http://bioclipse.blogspot.com/\">Bioclipse team in Uppsala</a> has set up the obligatory\nscatter plot functionality, but I leave that screenshot for them to show. Therefore, time for integration\nwith <a href=\"http://www.r-project.org/\">R</a>.</p>",
      "summary": "I added a Bioclipse plugin for JOELib (GPL, by Joerg) which comes with many QSAR descriptors, several of which are now available in the QSAR feature of Bioclipse :",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/JOELibQSAR.png",
      "date_published": "2007-10-18T00:10:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["bioclipse","qsar","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qvsjx-vxn18",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/18/open-data-misconception-1-you-do-not.html",
      "title": "Open Data Misconception #1: you do not get cited for your contributions",
      "content_html": "<p>The <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/16/chemspider-suse-gnulinux-of-chemical.html\">Open Data/ChemSpider <i class=\"fa-solid fa-recycle fa-xs\"></i></a> debate is continuing,\nand <a href=\"http://baoilleach.blogspot.com/\">Noel</a> wondered in the <a href=\"http://www.chemspider.com/blog/\">ChemSpider Blog</a> item on the\n<a href=\"http://www.chemspider.com/blog/?p=208\">Open Data spectra in ChemSpider</a>. The spectra in ChemSpider come from four persons,\ntwo of which released their data as Open Data (Robert and <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a>)\nand two as proprietary data.</p>\n\n<p>One of the two is Gary who expressed <a href=\"http://www.chemspider.com/blog/?p=208#comment-3648\">his concerns in the ChemSpider blog</a>\nthat people would not cite his contributions if he would release the data as Open Data:</p>\n\n<blockquote>\n  <p>In principle, someone could download an assortment of spectra for a given molecule, calculate some other spectra,\nand then write a paper without ever recording a single NMR spectrum of their own. Would they then include the\nindividual who deposited the spectra as a co-author or even acknowledge the source of the spectra that they used?\nWho knows.</p>\n</blockquote>\n\n<p>It is a misconception that releasing your Open Data will cause a situation that your scientific work is not acknowledged\n(citing statistics is the crude mechanism we use for that). First of all, using results without acknowledgment is called\n<strong>plagiarism</strong> (which is ethically wrong by any standard). But this is not a feature of Open Data, it is found in any\nform of science. 
Recall Herr Schön.</p>\n\n<p>Some months back I advised another chemical database that had similar concerns, and I pointed the owners,\nas I commented to Gary, to the <a href=\"http://creativecommons.org/licenses/by/2.5/\">CC-BY license</a> which has an explicit\nAttribution (BY) clause:</p>\n\n<blockquote>\n  <p><strong>Attribution</strong>. You must attribute the work in the manner specified by the author or licensor (but not in any way\nthat suggests that they endorse you or your use of the work).</p>\n</blockquote>\n\n<p>Using this license, plagiarism would not just be (scientifically) unethical, it would be illegal too, because it would\nbreak the license agreement. This even allows one to bring the case to court, if you like. (BTW, I was recently informed\nthat the database had switched to the CC-BY license!)</p>",
      "summary": "The Open Data/ChemSpider debate is continuing, and Noel wondered in the ChemSpider Blog item on the Open Data spectra in ChemSpider. The spectra in ChemSpider come from four persons, two of which released their data as Open Data (Robert and Jean-Claude) and two as proprietary data.",
      
      "date_published": "2007-10-18T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["chemspider","opendata"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/j6dh2-02n14",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/16/lunch-at-nature-hq-with-euan-joanna-ian.html",
      "title": "Lunch at Nature HQ (with Euan, Joanna, Ian and Ålf)",
      "content_html": "<p>On my way back from the <a href=\"http://chem-bla-ics.blogspot.com/search/label/Taverna0710\">Taverna workshop</a> I visited Nature HQ, as\n<a href=\"http://blogs.nature.com/wp/nascent/2007/10/lunch_with_egon_willighagen.html\">Ian reported about on Nascent</a>. It was a (too) short\nmeeting, but very nice to meet <a href=\"http://network.nature.com/blogs/user/euan\">Euan</a> (finally; he wrote the\n<a href=\"http://postgenomic.com/\">postgenomic.com</a> software which I use for <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a>),\n<a href=\"http://network.nature.com/blogs/user/joannascott\">Joanna</a> (whom I met in Chicago already, where she had\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/03/26/acs-chicago-day-1.html\">two <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/03/29/acs-chicago-day-3.html\">presentations <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand is responsible for <a href=\"http://blogs.nature.com/wp/nascent/2007/08/events_on_second_nature.html\">Second Nature</a>),\n<a href=\"http://network.nature.com/blogs/user/U3DF456C6\">Ian</a> (who works on <a href=\"http://connotea.org/\">Connotea</a>,\nand <a href=\"http://network.nature.com/blogs/user/U3DF456C6/2007/10/08/molecule-tagging-with-connotea\">commented</a> on\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/09/19/tagging-molecules-mashup-of-connotea.html\">my tagging molecule blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>)\nand <a href=\"http://network.nature.com/profile/alf\">Ålf</a> (who works on <a href=\"http://scintilla.nature.com/\">Scintilla</a>) and\nbriefly <a href=\"http://network.nature.com/profile/timo\">Timo</a> (who rules them all). 
BTW, I had a simple but delicious pasta.</p>\n\n<p>First, let me note that if I had to name a favorite molecule, it would be <a href=\"http://en.wikipedia.org/wiki/Acetic_acid\">acetic acid</a>,\nnot <a href=\"http://en.wikipedia.org/wiki/Ascorbic_acid\">ascorbic acid</a>. The reason why it would be my favorite is that acetic acid was the first\norganic molecule I put in the <a href=\"http://www.woc.science.ru.nl/\">Woordenboek Organische Chemie</a> in 1995.</p>\n\n<p>We discussed a number of things, regarding the things we do. One of these was tagging molecules. Ian used\n<em>http://rdf.openmolecules.net/?info:inchi/InChI=1/CH4/h1H4</em> instead of <em>http://rdf.openmolecules.net/?InChI=1/CH4/h1H4</em>.\nThe first was not yet picked up by <a href=\"http://rdf.openmolecules.net/\">rdf.openmolecules.net</a> but I fixed that.</p>\n\n<p>We also discussed linking molecular structures with scientific literature. The discussions in blogspace of this week\nshow that doing that by using computer programs is not appreciated by publishers (see\n<a href=\"http://www.chemspider.com/blog/?p=204\">here</a>,\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/10/13/outrage-repurposing-open-access-material-is-allowed-without-explicit-permission/\">here <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/10/16/why-green-open-access-does-not-support-text-and-data-mining/\">here <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/10/15/indexing-open-access-and-free-access-articles/\">here <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"http://www.chemspider.com/open-chemistry-web/?p=4\">here</a>, and\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/10/14/opsinoscar-you-us-we-please-help/\">here <i class=\"fa-solid fa-recycle fa-xs\"></i></a>)\n(The publishers seem to prefer to send off a PDF to India or China.)</p>\n\n<p>I proposed that the InChI would be part of the publication, for all 
molecules mentioned in the article. If a\njournal can require exact bibliography and experimental section formats, they can certainly require InChIs too.\nThere are few programs left which cannot autogenerate an InChI, and the chemist draws the structures anyway.\nHowever, the software used in the editorial process does not support linking InChIs with a PDF (if that software\nwould have been opensource …).</p>\n\n<p>So, the best current option seems to be social tagging mechanisms, and this is what we talked about. Just use\nConnotea (or any other service) and tag your molecule with a DOI:</p>\n\n<p><img src=\"/assets/images/doiTagDelicious.png\" alt=\"\" /></p>\n\n<p>and</p>\n\n<p><img src=\"/assets/images/connoteaTagDelicious1.png\" alt=\"\" /></p>\n\n<p>This tagging is done manually. No machines involved in that. Nothing the publishers can do about this. No ChemRefer needed.\nBut this will allow us to start building a database with links between papers and molecules, which we badly need. BTW, we will\nnot have to start from scratch. The <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> already contains many links, which is open data!</p>\n\n<p>Now, you might notice the informal semantics of the <code class=\"language-plaintext highlighter-rouge\">doi:</code> prefix. That’s something I hereby propose, as it allows\nservices to pick up the content more easily. You might also note the <em>incorrect</em> DOI in Connotea. The reason for\nthat is that Connotea does not yet support a ‘/’ in a tag. I\n<a href=\"http://sourceforge.net/tracker/index.php?func=detail&amp;aid=1814491&amp;group_id=133040&amp;atid=726030\">reported that problem</a>.</p>",
      "summary": "On my way back from the Taverna workshop I visited Nature HQ, as Ian reported about on Nascent. It was a (too) short meeting, but very nice to meet Euan (finally; he wrote the postgenomic.com software which I use for Chemical blogspace), Joanna (whom I met in Chicago already, where she had two presentations , and is responsible for Second Nature), Ian (who works on Connotea, and commented on my tagging molecule blog ) and Ålf (who works on Scintilla) and briefly Timo (who rules them all). BTW, I had a simple but delicious pasta.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/doiTagDelicious.png",
      "date_published": "2007-10-16T00:10:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["connotea","chemistry","inchi","doi","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fvykg-vc55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/16/chemspider-suse-gnulinux-of-chemical.html",
      "title": "ChemSpider: the SuSE GNU/Linux of chemical databases?",
      "content_html": "<p>A molecular structure without any properties is meaningless. Structure generators can easily build up a database of molecules of\nunlimited size. 30 million in CAS, 20 million in <a href=\"http://www.chemspider.com/\">ChemSpider</a> or\n15 million in <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> is nothing yet. The value comes in when linking those structures with\nexperimental properties.</p>\n\n<p>Now, chemical industry, academia and publishers have done their best in the past 50 years to maintain such databases, and decided\nthat a commercial model was the best option to maintain such databases. This was true 50 years ago, but no longer is. ICT has\nprogressed so much that a 20M database can be stored on a local hard disc, or site repository anyway. Moreover, and more importantly,\ncreating a database like this is much cheaper now. These ICT developments threaten the stone age chemical databases around now.\nCurrent approaches can easily build cheap and Open chemical databases; if we only all wanted.</p>\n\n<p>ChemSpider is attempting to set up the largest free chemical database, by mixing both Open data, as well as proprietary data.\nAs such, they are attempting to achieve what <a href=\"http://www.novell.com/linux/\">SuSE</a> and other commercial GNU/Linux distributions are\ntrying to do: create a valuable product by complementing Open data with proprietary data when that adds value. That is, I think\nthey are doing this. SuSE, for example, includes proprietary video drivers. ChemSpider, for example, contains proprietary molecular\nproperties computed by ACD/Labs software (BTW, some of which can be done with Open tools too, as I will show shortly.)</p>\n\n<p>Now, this poses quite a challenge: different licenses, different copyright holders, requirements to provide access to the source\n(for the Open data), etc, all in one system. 
Quite a challenge indeed, because ChemSpider is now required to track copyright and\nlicense information for each bit of information. GNU/Linux distributions do this by using a package (.deb, .rpm) approach. And,\nthe sheer size of the database poses strong requirements if people start downloading the whole lot.</p>\n\n<p>ChemSpider has <a href=\"http://www.chemspider.com/blog/?p=207\">had their share of critique</a>, but they are learning, and trying to find a way to\nset up a sustainable environment for what they want to do. That might involve a revenue stream from clients if there is no\ngovernmental organization, academic institute or some society stepping in to provide financial means. A valid question would be\nwhy they did not set up a non-profit organization. But neither did SuSE, RedHat and Mandriva, and that has not stopped those\nfrom contributing to Open source.</p>\n\n<p>I have no idea where ChemSpider will end up (consider that a request for a copy of the full set of Open Data), but am happy to\nhelp them distribute Open data, and even help them replace proprietary bits with open equivalents, which I’m sure they are open\nto. With respect to proprietary bits they are redistributing, I understand they can only relay the ODOSOS message to the\ncommercial partners from which they get those proprietary bits, and hope they are doing so. ChemSpider has the great opportunity\nto show that releasing and contributing chemical data as Open Data does not conflict with a healthy self-sustainable business\nmodel.</p>",
      "summary": "A molecular structure without any properties is meaningless. Structure generators can easily build up a database of molecules of unlimited size. 30 million in CAS, 20 million in ChemSpider or 15 million in PubChem is nothing yet. The value comes in when linking those structures with experimental properties.",
      
      "date_published": "2007-10-16T00:00:00+00:00",
      "date_modified": "2007-10-16T00:00:00+00:00",
      "tags": ["chemspider"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q3qes-phk35",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/14/why-odosos-is-important.html",
      "title": "Why ODOSOS is important",
      "content_html": "<p>I value <a href=\"https://blueobelisk.github.io/odosos.html\">ODOSOS <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nvery highly: they are a key component of science, and scientific research,\nthough not every scientist sees this importance yet. I strongly believe that scientific progress is held back because of scientific\nresults not being open; it’s putting us back into the days of alchemy, where experiments were like black boxes and procedures kept\nsecret. It was not until the alchemists started to properly write down procedures that it, as a science, took off. Now, with\nchemoinformatics in mind, we have the opportunity to write down our procedures in high detail.</p>\n\n<p>I keep wondering what the state of drug research would be, if the previous generation of chemoinformaticians would have valued ODOSOS\nas much as I do. Now, with a close relative being diagnosed last week with a form of cancer with low five-year survival rates, I cannot\nget more angry about those who want to make (unreasonable) money by selling scientific research. A 1M bonus <em>is</em> unreasonable.\nI can have 10 post-docs work on chemoinformatics research for the same period; I can have them work on drug design for various\nkinds of cancer.</p>\n\n<p>Therefore, I will continue to use every opportunity to convince people of ODOSOS, and will continue to develop new methods to improve\naccurate exchange of scientific data and experimental results. I will help people where I can to distribute open data, even if the\nwhole project is not 100% ODOSOS. For example, the <a href=\"https://cdk.github.io/\">Chemistry Development Kit <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nis open source itself (LGPL) which\ndoes allow embedding into proprietary software. 
This does not mean that I will contribute to the proprietary software, and I am actually\nproud of not having done so in the last 10 years.</p>\n\n<p>I will continue to advise people how to make their work more ODOSOS, even if they cannot make the full transition. I will also continue\nto make sure that all my scientific results are ODOSOS, as there is no other kind of science. To set a good example, and, hopefully,\nto lead the way.</p>\n\n<p>This is why I am a proud member of the <a href=\"https://www.blueobelisk.org/\">Blue Obelisk</a>.</p>",
      "summary": "I value ODOSOS very highly: they are a key component of science, and scientific research, though not every scientist sees this importance yet. I strongly believe that scientific progress is held back because of scientific results not being open; it’s putting us back into the days of alchemy, where experiments were like black boxes and procedures kept secret. It was not until the alchemists started to properly write down procedures that it, as a science, took off. Now, with chemoinformatics in mind, we have the opportunity to write down our procedures in high detail.",
      
      "date_published": "2007-10-14T00:00:00+00:00",
      "date_modified": "2025-03-08T00:00:00+00:00",
      "tags": ["openscience","blue-obelisk","opendata","opensource"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/62aqh-f9y91",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/14/complife2007-utrechtnl-day-1-and-2.html",
      "title": "CompLife2007, Utrecht/NL. Day 1 and 2",
      "content_html": "<p><a href=\"http://www.complife.com/\">CompLife 2007</a> was held 1.5 weeks ago in Utrecht, The Netherlands. The number of participants was much\nlower than last year in Cambridge. <a href=\"http://bioclipse.blogspot.com/\">Ola</a> and I gave a tutorial on <a href=\"http://bioclipse.net/\">Bioclipse</a>,\nand Thorsten one on <a href=\"http://www.knime.org/\">KNIME</a>. Since a visit to Konstanz to meet the KNIME developers, I had not been able to\ndevelop a KNIME plugin, but this was a nice opportunity to finally do so. I managed to do so, and wrote up a plugin that takes\nInChIKeys and then goes off to <a href=\"http://www.chemspider.com/\">ChemSpider</a> to download MDL molfiles:</p>\n\n<p><img src=\"/assets/images/knime_chemspider.png\" alt=\"\" /></p>\n\n<p>Why ChemSpider? Arbitrary. Done PubChem in the past already. Moreover, ChemSpider has the largest database of molecular structures\nand in that sense important to my research.</p>\n\n<p>Why KNIME? Played with <a href=\"http://taverna.sf.net/\">Taverna</a> in the past, and expect to do much more work on Taverna in the coming year\n(see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/08/taverna-workshop-hinxton-uk.html\">this <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/10/08/taverna-workshop-day-1-update.html\">this <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nMoreover, KNIME got a CDK plugin already,\nand the KNIME developers contributed valuable feedback to the CDK project in the last year. It was about time that I contributed\nsomething back, though the current functionality is quite limited. 
KNIME has a better architectural design than Taverna1, but will\nface tough competition with Taverna2, due next year.</p>\n\n<h2 id=\"the-presentations\">The presentations</h2>\n\n<p>Heringa gave a presentation on network analysis, and discussed the scale-free network, hub nodes, etc, after which he gave an\nexample on the 14-3-3 PPI family which has both promoting and inhibiting capabilities. Fraser presented work on improving\nmicroarray data analysis, by reducing non-random background noise. <a href=\"http://timon.info/wiki/Wiki.jsp?page=News.pub\">Schroeter</a>\npresented the use of Gaussian process modeling in QSAR studies, which allows estimation of error bars (see\nDOI:<a href=\"https://doi.org/10.1002/cmdc.200700041\">10.1002/cmdc.200700041</a>). I did not find the results very convincing,\nbut the method sounds interesting. Larhlimi presented research on network analysis of metabolic networks. His approach finds\nso-called minimal forward direction cuts, which identify critical parts in the network if one is interested in repressing\ncertain metabolic processes. Hofto presented some work on the use of DFT for proteins, and picked up that one has to do things\ncritically to be able to reproduce binding affinities. Combinations of DFT or MM with QM are becoming popular to model binding\nsites. Van Lenthe presented such an approach on the second day of CompLife.</p>\n\n<p>By far the most interesting talk at the conference was the insightful presentation by <a href=\"http://bioinformatics.bio.uu.nl/ph/\">Paulien Hogeweg</a>.\nShe apparently coined the term <em>bioinformatics</em>. Anyway, she had an exciting presentation on feed-forward loops in relation to\nevolution, and showed a correlation between jumps in FFL motifs and biodiversity. 
She also warned us about the Monster of\nLoch Ness syndrome, where computational models may indicate large underlying processes, which do not really exist.\nBut that should be a problem that most of my readers should be aware of. She introduced evolutionary modeling, to put further\nrestrictions on the models, to reduce the chance of finding monsters.</p>\n\n<p>Hussong had an interesting presentation too, if one is interested in analysis of GC/MS or LC/MS data. He introduced a\nhard-modeling approach for proteomics data using wavelet technology. His angle on this was to use a wavelet that represents\nthe isotopic pattern of a protein mass spectrum. Interestingly, the wavelet had negative intensities, something which one\nwill never find in mass spectra. However, I seem to recall a mathematical restriction on wavelets that would forbid taking\nthe squared version of the function. He indicated that the code is available via\n<a href=\"http://open-ms.sourceforge.net/index.php\">OpenMS</a>.</p>\n\n<p>Jensen, finally, presented his work at the <a href=\"http://www-ucc.ch.cam.ac.uk/\">UCC</a> on Markov models for protein folding, where\nhe uses the <em>mean first passage time</em> as observable to analyze processes in folding state space. This allows him to\ncompare different modeling approaches and, for example, to predict how many time steps are needed to reach folding.\nBeing able to measure characteristics of certain modeling methods, one is able to make an objective comparison. Something\nwhich allows <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/09/13/outscoring-old-science.html\">a fair competition <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "CompLife 2007 was held 1.5 weeks ago in Utrecht, The Netherlands. The number of participants was much lower than last year in Cambridge. Ola and I gave a tutorial on Bioclipse, and Thorsten one on KNIME. Since a visit to Konstanz to meet the KNIME developers, I had not been able to develop a KNIME plugin, but this was a nice opportunity to finally do so. I managed to do so, and wrote up a plugin that takes InChIKeys and then goes off to ChemSpider to download MDL molfiles:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/knime_chemspider.png",
      "date_published": "2007-10-14T00:00:00+00:00",
      "date_modified": "2025-08-10T00:00:00+00:00",
      "tags": ["knime","chemspider"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1002/cmdc.200700041", "doi": "10.1002/cmdc.200700041"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/58bs7-cpq93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/08/taverna-workshop-day-1-update.html",
      "title": "Taverna Workshop, Day 1 Update",
      "content_html": "<p>The second part of the morning session featured a presentation by Sirisha Gollapudi, who spoke about mining\nbiological graphs, such as protein-protein interaction networks and metabolic pathways. Pattern detection\nfor nodes with only one edge, and cycles etc, using Taverna. An example data set she worked on is the Palsson human\nmetabolism (doi:<a href=\"https://doi.org/10.1073/pnas.0610772104\">10.1073/pnas.0610772104</a>); she mentioned that this\nmetabolite data set contains <a href=\"http://en.wikipedia.org/wiki/Cocaine\">cocaine</a> :) Neil Chue Hong finished with\nan introduction on the <a href=\"http://www.omii.ac.uk/\">OMII-UK</a> which is co-host of this meeting.</p>\n\n<p>After lunch Mark Wilkinson introduced <a href=\"http://biomoby.org/\">BioMoby</a>, which we actually use in Wageningen already.\nI have tried to use <a href=\"http://biomoby.open-bio.org/CVS_CONTENT/moby-live/Java/docs/\">jMoby</a> to set up services\nbased on the <a href=\"http://cdk.sf.net/\">CDK</a>, but failed so far. Will talk with Mark on that. Next was my presentation,\nand I spoke about <a href=\"http://www.cdk-taverna.de/\">CDK-Taverna</a>, <a href=\"http://www.bioclipse.net/\">Bioclipse</a> and some\npeculiarities of chemoinformatics workflows, like the importance of intermediate interaction, the need to\nvisualize the data and complex, information rich data. Bioclipse is seeing\n<a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse2\">an integration of BioMoby and of Taverna</a>.</p>\n\n<p>After the coffee break Marco Roos spoke about <a href=\"http://myexperiment.org/\">myExperiment</a> and his work on text\nmining. 
I unfortunately missed this presentation, as I was meeting with people from the EBI who work on the\n<a href=\"http://www.ebi.ac.uk/thornton-srv/databases/MACiE/\">MACiE database</a> (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/17/chemical-reactions-in-cml.html\">this blog item <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>A discussion session afterwards introduced a few more Taverna uses, and encountered technical problems.\nTaverna2 is actually going to be quite interesting, with a data caching system between work processors, and a\npowerful scheme of annotation of processors, which will allow rating, finding local services, etc. More on\nthat tomorrow. Dinner time now :)</p>",
      "summary": "The second part of the morning session featured a presentation by Sirisha Gollapudi, who spoke about mining biological graphs, such as protein-protein interaction networks and metabolic pathways. Pattern detection for nodes with only one edge, and cycles etc, using Taverna. An example data set she worked on is the Palsson human metabolism (doi:10.1073/pnas.0610772104); she mentioned that this metabolite data set contains cocaine :) Neil Chue Hong finished with an introduction on the OMII-UK which is co-host of this meeting.",
      
      "date_published": "2007-10-08T00:10:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["taverna","ebi","cdk","bioclipse","myexperiment"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1073/pnas.0610772104", "doi": "10.1073/pnas.0610772104"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/w817a-rtj32",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/08/taverna-workshop-hinxton-uk.html",
      "title": "Taverna Workshop, Hinxton, UK",
      "content_html": "<p>I arrived at the <a href=\"http://www.ebi.ac.uk/\">EBI</a> last night for the <a href=\"http://taverna.sf.net/\">Taverna</a> workshop, during which the design\nof Taverna2 is presented and workflow examples are discussed. Several ‘colleagues’ from Wageningen and the SARA computing center\nin Amsterdam are present, along with many other interesting people. This afternoon is my presentation.</p>\n\n<p>Paul Fisher just presented his PhD work on using workflows to improve the throughput of QTL matching against pathway information and\nphenotype. One interesting note was their function to make biological informational studies more reproducible. He retrieves the\nversions of the online databases explicitly in the workflow, so that they get stored in the workflow output.</p>",
      "summary": "I arrived at the EBI last night for the Taverna workshop, during which the design of Taverna2 is presented and workflow examples are discussed. Several ‘colleagues’ from Wageningen and the SARA computing center in Amsterdam are present, along with many other interesting people. This afternoon is my presentation.",
      
      "date_published": "2007-10-08T00:00:00+00:00",
      "date_modified": "2007-10-08T00:00:00+00:00",
      "tags": ["taverna","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/za3b6-jg770",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/01/how-blogosphere-changes-publishing.html",
      "title": "How the blogosphere changes publishing",
      "content_html": "<p>Peter <a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/09/30/open-grant-writing-can-the-chemical-blogosphere-help-with-agents-and-eyeballs/\">is writing up a 1FTE grant proposal <i class=\"fa-solid fa-recycle fa-xs\"></i></a> for someone to work\non the question how automatic agents and, more interestingly, the blogosphere are changing, no improving, the\ndissemination of scientific literature. He wants our input. To make his work easy, I’ll tag this item <code class=\"language-plaintext highlighter-rouge\">pmrgrantproposal</code>\nand would ask everyone to do the same (Peter unfortunately did not suggest a tag himself). Here are pointers to\nblog items I wrote, related to the four themes Peter identifies.</p>\n\n<h3 id=\"the-blogosphere-oversees-all-major-open-discussion\">The blogosphere oversees all major Open discussion</h3>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/07/open-text-mining-interface-and.html\">Open Text Mining Interface and Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2006/01/11/uspto-considers-open-source-software.html\">USPTO considers open source software prior art <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/09/07/new-inchi-software-beta-license-issues.html\">New InChI software beta: license issues resolved and InChIKey <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/09/28/smiles-to-become-open-standard.html\">SMILES to become an Open Standard <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<h3 id=\"the-blogosphere-cares-about-data\">The blogosphere cares about data</h3>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/02/uncertainty-in-nmr-based-3d-protein.html\">Uncertainty in NMR based 3D protein models <i class=\"fa-solid 
fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/09/21/re-acs-rss-feeds-are-messed-up.html\">re: ACS RSS feeds are messed up <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/11/molecules-in-wikipedia-without-inchis.html\">Molecules in Wikipedia without InChIs <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<h3 id=\"important-bad-science-cannot-hide\">Important bad science cannot hide</h3>\n\n<p>I do not feel much like pointing to bad scientific articles, but want to point to the enormous amount of literature\nbeing discussed in <a href=\"http://cb.openmolecules.net/papers.php\">Chemical blogspace</a>:\n60 <em>active</em> chemical blogs discussed just over 1300 peer-reviewed papers from\n213 scientific journals in less than 10 months. The top 5 journals have 133, 78, 68, 57 and 48 papers discussed in\n22, 24, 10, 11 and 18 different blogs respectively. (Peter, if you need more in depth statistics, just let me\nknow…)</p>\n\n<p>Two examples where I discuss not-bad-at-all scientific literature:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/24/automatic-classification-of-thousands.html\">Automatic Classification of thousands of Crystal Structures <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/14/cdk-literature-2.html\">CDK Literature #2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>\n\n<h3 id=\"open-notebook-science\">Open Notebook Science</h3>\n\n<p>I regularly blog about the chemoinformatics research I do in my blog. 
A few examples from the last half year:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/02/03/cdk-workshop-days-3-and-4.html\">CDK Workshop - Days #3 and #4 <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/05/30/weka-decision-trees-to-java-conversion.html\">Weka Decision Trees to Java Conversion <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html\">RDF-ing molecular space <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html\">Atom typing in the CDK <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/26/further-bioclipse-qsar-functionality.html\">Further Bioclipse QSAR functionality development <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>",
      "summary": "Peter is writing up a 1FTE grant proposal for someone to work on the question how automatic agents and, more interestingly, the blogosphere are changing, no improving, the dissemination of scientific literature. He wants our input. To make his work easy, I’ll tag this item pmrgrantproposal and would ask everyone to do the same (Peter unfortunately did not suggest a tag himself). Here are pointers to blog items I wrote, related to the four themes Peter identifies.",
      
      "date_published": "2007-10-01T00:00:00+00:00",
      "date_modified": "2025-03-02T00:00:00+00:00",
      "tags": ["publishing","pmrgrantproposal","cb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k274f-na534",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/30/complife2007-utrechtnl-taverna.html",
      "title": "CompLife2007, Utrecht/NL; Taverna, EBI/Hinxton/UK",
      "content_html": "<p>Two working days left before I’m off to two conferences. First, next Thursday/Friday, the two-day <a href=\"http://www.inf.uni-konstanz.de/complife07/\">CompLife2007</a>\nin Utrecht/NL, with sessions on genomics, systems biology, medical information and data analysis. And, on the second day, tutorials on\n<a href=\"http://knime.org/\">KNIME</a> and <a href=\"http://cdk.sf.net/\">CDK</a>/<a href=\"http://www.bioclipse.net/\">Bioclipse</a>. I will try to orient as much as possible around\nMS-based metabolomics, and metabolite identity in particular. <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/28/complife06-day-1.html\">Last year <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nthe conference was very interesting.</p>\n\n<p>The Monday/Tuesday after that, I will present the CDK-<a href=\"http://taverna.sourceforge.net/\">Taverna</a> integration I worked on in 2005 (see e.g.\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/18/taverna-runs-with-classpath-091.html\">Taverna on Classpath <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/18/cdk-taverna-fully-recognized.html\">CDK-Taverna fully recognized <i class=\"fa-solid fa-recycle fa-xs\"></i></a>) at the\n<a href=\"http://taverna.sourceforge.net/index.php?doc=workshop.html\">Taverna meeting</a>, before Thomas continued on that,\nleading to the <a href=\"http://cdk-taverna.de/\">cdk-taverna.de</a> plugin website. If time permits, I will prepare an example\nworkflow from metabolomics. Unlike previous times I went to Cambridgeshire, I won’t fly into Stansted, but take the\n<a href=\"http://www.eurostar.com/\">EuroStar</a> instead. I am very much looking forward to that. Unfortunately, I will not have time\nto visit Cambridge itself, this time :(</p>",
      "summary": "Two working days left before I’m off to two conferences. First, next Thursday/Friday, the two-day CompLife2007 in Utrecht/NL, with sessions on genomics, systems biology, medical information and data analysis. And, on the second day, tutorials on KNIME and CDK/Bioclipse. I will try to orient as much as possible around MS-based metabolomics, and metabolite identity in particular. Last year the conference was very interesting.",
      
      "date_published": "2007-09-30T00:00:00+00:00",
      "date_modified": "2025-10-05T00:00:00+00:00",
      "tags": ["cdk","taverna","knime","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8c47c-8dq17",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/28/smiles-to-become-open-standard.html",
      "title": "SMILES to become an Open Standard",
      "content_html": "<p>Craig James wants <a href=\"http://sourceforge.net/mailarchive/forum.php?thread_name=46FA7584.9080902%40emolecules.com&amp;forum_name=blueobelisk-discuss\">to make SMILES an open standard</a>,\nand this has been received with much enthusiasm. SMILES (<a href=\"http://en.wikipedia.org/wiki/SMILES\">Simplified molecular input line entry specification</a>)\nis a de facto standard in chemoinformatics, but the specification is not overly clear, which Craig wants to address. The\n<a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/smiles-spec/trunk/smiles_spec.html\">draft</a> is CC-licensed and will be discussed on the new\n<a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> <a href=\"http://sourceforge.net/mailarchive/forum.php?forum_name=blueobelisk-smiles\">blueobelisk-smiles</a>\nmailing list.</p>\n\n<p>Illustrative is my confusion about the sp2 hybridized atoms, which use lower case element symbols in SMILES. Very often this is seen as\nindicating aromaticity. I have written up <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=smiles_aromaticity\">the arguments supporting both views</a>\nin the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=start\">CDK wiki</a>. I held the position that lower case elements indicated\nsp2 hybridization, and the CDK SMILES parser was converted accordingly some years ago. A recent discussion, however, stirred up the\ndiscussion once more (which led to the aforementioned wiki page).</p>\n\n<p>You can imagine my excitement when I looked up the meaning in the new draft. It states: <em>The formal meaning of a lowercase “aromatic”\nelement in a SMILES string is that the atom is in the sp2 electronic state. When generating a normalized SMILES, all sp2 atoms are\nwritten using a lowercase first character of the atomic symbol. 
When parsing a SMILES, a parser must note the sp2 designation of each\natom on input, then when the parsing is complete, the SMILES software must verify that electrons can be assigned without violating the\nvalence rules, consistent with the sp2 markings, the specified or implied hydrogens, external bonds, and charges on the atoms.</em></p>",
      "summary": "Craig James wants to make SMILES an open standard, and this has been received with much enthusiasm. SMILES (Simplified molecular input line entry specification) is a de facto standard in chemoinformatics, but the specification is not overly clear, which Craig wants to address. The draft is CC-licensed and will be discussed on the new Blue Obelisk blueobelisk-smiles mailing list.",
      
      "date_published": "2007-09-28T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["smiles","blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/48njk-xw931",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/24/googles-view-in-history.html",
      "title": "Google&apos;s view in history",
      "content_html": "<p>Pierre pointed me to <a href=\"http://plindenbaum.blogspot.com/2007/09/google-viewtimeline.html\">Google’s view:timeline</a> feature,\nwhich shows the search results on a time line, by recognizing phrases like “on 25 September 2000…”. This is\n<a href=\"http://www.google.com/views?q=%22chemistry+development+kit%22+view%3Atimeline&amp;btnGt=Search&amp;esrch=RefinementBarTopViewTabs\">its view</a>\non the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a>:</p>\n\n<p><img src=\"/assets/images/googleTimeline.png\" alt=\"\" /></p>",
      "summary": "Pierre pointed me to Google’s view:timeline feature, which shows the search results on a time line, by recognizing phrases like “on 25 September 2000…”. This is its view on the Chemistry Development Kit:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/googleTimeline.png",
      "date_published": "2007-09-24T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["google","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bqg35-shz93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/21/re-acs-rss-feeds-are-messed-up.html",
      "title": "re: ACS RSS feeds are messed up",
      "content_html": "<p>A couple of people now confirmed the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/09/16/acs-rss-feeds-are-messed-up.html\">problem with the ACS journal RSS feeds <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBeing back behind my desktop machine, I can post the obligatory screenshot:</p>\n\n<p><img src=\"/assets/images/acsRSSproblem.png\" alt=\"\" /></p>\n\n<p>The feed for Chemical Biology shows 79 feed items and the first one was an <em>Environ. Sci. Technol.</em> paper. The first of the\n108 papers listed in the feed for Molecular Pharmaceutics is a paper from <em>J.Phys.Chem.C.</em></p>",
      "summary": "A couple of people now confirmed the problem with the ACS journal RSS feeds. Being back behind my desktop machine, I can post the obligatory screenshot:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/acsRSSproblem.png",
      "date_published": "2007-09-21T00:10:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["acs","rss"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/xtm6b-5gf91", "doi": "10.59350/xtm6b-5gf91"
            , "cito":
              
              
                [ 
                  "repliesTo"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/atgfm-4qc39",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/21/tuning-google-results.html",
      "title": "Tuning Google Results?",
      "content_html": "<p>I was just about to install <a href=\"http://subclipse.tigris.org/\">Subclipse</a> (for the millionth time), and\ngoogled for the update site details:</p>\n\n<p><img src=\"/assets/images/googleTuning.png\" alt=\"\" /></p>\n\n<p>Does anyone know how you can get Google to pick up, or how it detects, the Download/UpdateSite/etc pages,\nshown as direct links below the primary hit? Are HTML <code class=\"language-plaintext highlighter-rouge\">&lt;link&gt;</code> elements used for that? Or does it use\ncertain meta data, microformats, …?</p>",
      "summary": "I was just about to install Subclipse (for the millionth time), and googled for the update site details:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/googleTuning.png",
      "date_published": "2007-09-21T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["google","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/cwqb0-3yj62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/20/swt-view-with-new-jchempaint.html",
      "title": "SWT View with the new JChemPaint",
      "content_html": "<p>The second <a href=\"http://programmeerzomer.nl/\">Programmeerzomer</a>, and the second summer of code for me, will end tomorrow with a presentation by\n<a href=\"http://progz-jchem.blogspot.com/\">Niels on his new JChemPaint code</a>. The summer is over before you know it. One of the goals was\nmaking the JChemPaint editor Swing-independent and easier to integrate with SWT widgets.</p>\n\n<p>So, I hacked up the last bits of Bioclipse code. However, the CDK version in the net.bioclipse.cdk is still CDK 1.0, and Niels’\ncode requires CDK trunk/. So I copied in the <code class=\"language-plaintext highlighter-rouge\">cdk.jar</code> and <code class=\"language-plaintext highlighter-rouge\">cdk-jchempaint.jar</code> from <code class=\"language-plaintext highlighter-rouge\">trunk/</code> into the <em>net.bioclipse.cdk.progz</em>\nplugin, only to find out that that gives binary problems: the plugin would still depend on <em>net.bioclipse.cdk.ui</em> which depends\non the CDK 1.0 plugin…</p>\n\n<p><img src=\"/assets/images/swtJCP.png\" alt=\"\" /></p>\n\n<p>To get something going, I removed the dependency on CDKResource. So, the screenshot above is a bit artificial: it shows a\nstatic picture and the View does not react to ISelectionEvents. But that is a Bioclipse/CDK issue, and not caused by Niels’\ncode. Additionally, it does not handle mouse events or so. For that, I need to make it an SWT Editor first.</p>",
      "summary": "The second Programmeerzomer, and the second summer of code for me, will end tomorrow with a presentation by Niels on his new JChemPaint code. The summer is over before you know it. One of the goals was making the JChemPaint editor Swing-independent and easier to integrate with SWT widgets.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/swtJCP.png",
      "date_published": "2007-09-20T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["bioclipse","jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dvdc0-ps191",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/19/tagging-molecules-mashup-of-connotea.html",
      "title": "Tagging Molecules: a mashup of Connotea and RDF",
      "content_html": "<p>Using the InChI and the new <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html\">rdf.openmolecules.net <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwebsite, it is now possible to tag molecules. And if you use <a href=\"http://www.connotea.org/\">Connotea</a> for that, your tags will\neven show up on the rdf.openmolecules.net website. For example, at the time of writing,\n<a href=\"http://en.wikipedia.org/wiki/Methane\">methane</a> was tagged with <em>alkanes</em> and <em>gas</em>.</p>\n\n<p>The trick I use is that rdf.openmolecules.net gives every molecule a unique HTTP <a href=\"http://en.wikipedia.org/wiki/URL\">URL</a>.\nThis simple web2.0 approach offers an enormous amount of possibilities. The simplest application is that you can tag your molecules\nwith a set label, such as <em>my-tomato-set</em>; after all, Connotea is account based. In this way, you can do open notebook QSAR studies (though the activities would still be missing).</p>\n\n<p>The aforementioned example, however, gives two classifications. <em>Methane is an alkane</em> and <em>Methane is a gas</em> (at room temperature).\nNot very well determined semantics, but it is web2.0, not the semantic web.</p>\n\n<p>Interestingly, given some loosely defined semantics, one can use Connotea to link a molecule to a certain publication. For example,\nI can define <em><a href=\"http://en.wikipedia.org/wiki/Estrone\">Estrone</a> is cited in the article with PubMed ID 15659855</em> using the\ntag <em>pmid:15659855</em>. 
I’m sure using the DOI would work too, using the tag <a href=\"https://doi.org/10.1107/S0108768104028344\">doi:10.1107/S0108768104028344</a>.\nI have not used these informal semantics in the rdf.openmolecules.net website yet, but if there is such interest,\nI can have such functionality hacked in minutes.</p>\n\n<p>BTW, did anyone see <a href=\"http://www.geneontology.org/\">Gene Ontology</a> terms being used in social bookmarking services?\nFor example, seeing a link to the PDB database with a tag <em>go:0008152</em>? Would be a bit cryptic, and, really, in this\ncase rather minimalistic on information.</p>\n\n<h2 id=\"whats-next\">What’s next?</h2>\n\n<p>Now comes the tedious task of converting the QSAR data sets I used in my PhD research with these tags. It’s really\nsomething I wanted to do for a while now. Next on my <a href=\"http://en.wikipedia.org/wiki/TODO\">TODO</a> list is the\nGreasemonkey script that adds the tags from Connotea to PubChem.</p>",
      "summary": "Using the InChI and the new rdf.openmolecules.net website, it is now possible to tag molecules. And if you use Connotea for that, your tags will even show up on the rdf.openmolecules.net website. For example, at the time of writing, methane was tagged with alkanes and gas.",
      
      "date_published": "2007-09-19T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["rdf","connotea","inchi"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1107/S0108768104028344", "doi": "10.1107/S0108768104028344"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xtm6b-5gf91",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/16/acs-rss-feeds-are-messed-up.html",
      "title": "ACS RSS feeds are messed up",
      "content_html": "<p>Every beginning is difficult. The ACS must know that, but <a href=\"http://blog.everydayscientist.com/?p=661\">they still blame Google</a>.\nIn this blog, <a href=\"http://blog.everydayscientist.com/\">Everyday Scientist</a> mentions that the ACS RSS journal TOC feeds are\nsometimes messed up. I noticed that too, but lived with it. The ACS generally is a very professional organization, but\nwhen I read they told ES that his Google RSS client was the problem, I just had to confirm his problems, hoping that\nsome ACS representative can relay the message to their IT department.</p>\n\n<p><a href=\"http://akregator.kde.org/\">Akregator</a> is the RSS client that I use, and I noticed the exact same problem ES noted:\nevery now and then, two or three times a month, the RSS feed for one journal gives the content of another journal.\nFor example, I get the TOC of Chemical Reviews in the RSS feed of the <a href=\"http://pubs.acs.org/journals/jcisd8/index.html\">JCIM</a>.\nThings like that.</p>\n\n<p>Now, because Akregator has absolutely nothing to do with Google, it cannot be Google who is messing up the RSS feeds.\nBecause ES is having the same issues I have, it cannot be Akregator either. Ergo, it is the RSS feed system of the\nACS itself that is messed up.</p>",
      "summary": "Every beginning is difficult. The ACS must know that, but they still blame Google. In this blog, Everyday Scientist mentions that the ACS RSS journal TOC feeds are sometimes messed up. I noticed that too, but lived with it. The ACS generally is a very professional organization, but when I read they told ES that his Google RSS client was the problem, I just had to confirm his problems, hoping that some ACS representative can relay the message to their IT department.",
      
      "date_published": "2007-09-16T00:10:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["acs","rss"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s8dts-h8t13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/16/why-plain-qsar-is-not-enough-for-me.html",
      "title": "Why plain QSAR is not enough for me...",
      "content_html": "<p><a href=\"http://chemistrylabnotebook.blogspot.com/\">Amanda</a> had a very nice post on <a href=\"http://chemistrylabnotebook.blogspot.com/2007/09/on-thursday-helen-blackwell-from.html\">Small molecules that modulate quorum sensing</a>.\nIt’s the perfect read for a Sunday morning, when you have a view looking down on Strasbourg from a hill in the\n<a href=\"http://en.wikipedia.org/wiki/Black_forrest\">Black Forrest</a>. Biology fascinates me, particularly when small\nmolecules are involved. And the molecular signaling used by these bacteria is just delightful. Make sure to\nread up on the small squids in 96-well plates too! (And we are worried about <a href=\"http://www.dierenwelzijn-nederland.nl/varkensflats.htm\">varkensflats</a>!\nThat’s put in perspective :) These very small squids have a symbiosis with bacteria that light up under\ncertain conditions, and this squid species learned how to control that lightning. Nerdy facts like this\nadds that coolness factor that outliers in QSAR lack.</p>\n\n<h2 id=\"small-molecule-macroarrays\">Small-molecule macroarrays</h2>\n\n<p>Another bit in Amanda’s blog catched my eye too: the small-molecule macroarray. I had not seen that term before,\nand looked up the paper by Brown et al. <em>Rapid Identification of Antibacterial Agents Effective against\nStaphylococcus aureus Using Small-Molecule Macroarrays</em> (DOI:<a href=\"https://doi.org/10.1016/j.chembiol.2007.03.006\">10.1016/j.chembiol.2007.03.006</a>).\nLike the more famous (gene expression) microarray, this SMMs are arrays of wells where small molecules are\nconnected to a planer cellulose support system, after which the antibacterial activity can be measure.\nNow, I do have to read up on this technology. For example, are the small-molecule inhibitors released\ninto the assay medium at some point? That is, they will need to find their way to whatever protein it\ninhibits, as the protein will not go to the support system. 
Can anyone explain to me how the inhibition\ntakes place?</p>",
      "summary": "Amanda had a very nice post on Small molecules that modulate quorum sensing. It’s the perfect read for a Sunday morning, when you have a view looking down on Strasbourg from a hill in the Black Forrest. Biology fascinates me, particularly when small molecules are involved. And the molecular signaling used by these bacteria is just delightful. Make sure to read up on the small squids in 96-well plates too! (And we are worried about varkensflats! That’s put in perspective :) These very small squids have a symbiosis with bacteria that light up under certain conditions, and this squid species learned how to control that lightning. Nerdy facts like this adds that coolness factor that outliers in QSAR lack.",
      
      "date_published": "2007-09-16T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["qsar"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/j.chembiol.2007.03.006", "doi": "10.1016/j.chembiol.2007.03.006"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dze3k-zky17",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/13/outscoring-old-science.html",
      "title": "Outscoring old science",
      "content_html": "<p>Rich <a href=\"https://doi.org/10.59350/6aep1-v9455\">posted a nice quote <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nthe other day on the introduction of the forward pass in football some 100 years ago, and linked that to sciences. I commented\nwith the remark that the outscoring is the problem:</p>\n\n<blockquote>\n  <p>The big question is: how do we measure our outscore. The other football teams would not have switched too, if the success of the St.Luois team if the outscore was obscured.</p>\n\n  <p>In openaccess publications, there is a slight outscore: higher impact for openaccess publications. But I do not feel this effect is as pronounced as in the football example.</p>\n\n  <p>You got a good statistics to impress people new the forward-pass in science?</p>\n</blockquote>\n\n<p>Just after that, I read this blog by Antony on <a href=\"http://www.chemspider.com/blog/?p=132\">survival-of-the-fittest chemical search engine</a>.\nEven though the measurement of the score is easy, these statistics can easily be obfuscated. Independent rankings, like Google Rank\nand Alexa Rank, may help.</p>\n\n<p>However, what we really need is a direct competition. Us against them, old against new. I don’t mind to be in either group, as long\nas it is the fittest. But, we urgently need to define what fittest is. Agreeing with\n<a href=\"http://blogs.nature.com/wp/nascent/2007/09/prism_publishers_and_researche_1.html\">Timo’s statement</a>\n(e.g. <em>“It therefore troubled me that the initial counterattacks on PRISM were themselves often lacking in nuance and discrimination.”</em>),\nwe need exact measures to do the discrimination. Each team prepares for the game, plays the competition, indepdent scoring, and there\nis your 142-11 outscore. 
PRISM versus PloS, <a href=\"http://www.chemspider.com/blog/?p=126\">Modgraph versus ACD/Labs</a>,\nCDK against OpenBabel, KNIME versus Taverna, JOELib versus Dragon, microformats versus RDFa, openscience versus patents, PLS versus SVR,\ngemini versus single-tail surfactants, … Bring on those competitions! Let the score be clear (open), fair, and discriminating!</p>\n\n<p>Maybe this is something we should set up with the <a href=\"http://www.bluobelisk.org/\">Blue Obelisk</a>: a yearly competition, with various\ncategories (think: databases, prediction, modeling, …), with scientifically relevant judging.</p>\n\n<p>May the best team win!</p>",
      "summary": "Rich posted a nice quote the other day on the introduction of the forward pass in football some 100 years ago, and linked that to sciences. I commented with the remark that the outscoring is the problem:",
      
      "date_published": "2007-09-13T00:00:00+00:00",
      "date_modified": "2025-01-30T00:00:00+00:00",
      "tags": ["publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/6aep1-v9455", "doi": "10.59350/6aep1-v9455"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/cqv2x-hph36",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/07/new-inchi-software-beta-license-issues.html",
      "title": "New InChI software beta: license issues resolved and InChIKey",
      "content_html": "<p>The <a href=\"http://www.iupac.org/inchi/\">IUPAC/NIST team</a> made a beta release of the next InChI software release:</p>\n\n<blockquote>\n  <p>The principal new features of this release are:</p>\n\n  <ol>\n    <li>A fixed-length (25-character) condensed digital representation of the Identifier to be known as InChIKey. In particular, this will:\n      <ul>\n        <li>facilitate web searching, previously complicated by unpredictable breaking of InChI character</li>\n        <li>strings by search engines</li>\n        <li>allow development of a web-based InChI lookup service</li>\n        <li>permit an InChI representation to be stored in fixed length fields</li>\n        <li>make chemical structure database indexing easier</li>\n        <li>allow verification of InChI strings after network transmission.</li>\n      </ul>\n    </li>\n    <li>Restructured InChI-generating software that separates key steps in its creation from an input chemical structure file. Among other uses, this allows checking of intermediate results to enable easier testing and development of InChI-based applications.</li>\n    <li>Bug fixes designed to withstand malicious attempts to attack a Web server by providing a specially designed InChI string input to InChI binaries.</li>\n  </ol>\n\n  <p>We would welcome reports of your experiences with this new release and, of course, any problems.</p>\n</blockquote>\n\n<h2 id=\"inchikey\">InChIKey</h2>\n\n<p>A had heard about the InChIKey extension earlier, and it solves the issue some people have with the InChI: it is too\nlong. Well, molecules can have many atoms indeed. It is important to realize the InChIKey is not a replacement:\nit simply is not unique. The collision probability is calculated to be rather small, though. But clashes may occur,\nand sees from the above statistics quite likely for the number of molecules estimated to be drug-like, which is\nestimated at ~10<sup>60</sup>. 
Moreover, these are theoretical probabilities which may not apply to the subset\nof molecules we actually tend to look at.</p>\n\n<p>Anyway, the InChIKey is not a unique identifier, and never use it as such; that’s what you need to remember.</p>\n\n<p>An interesting feature is the addition of a check character, which enables some verification of typos. Nothing is said\nabout collision clashes there, which exist too. And the fixed length has its virtues too. That said, it certainly helps as\n<a href=\"http://www.chemspy.com/chemistry-news/googling-inchikeys.html\">sort of prefiltering</a>.\n<a href=\"http://usefulchem.blogspot.com/2007/07/indexing-molecules-in-second-life.html#c9154415358635329736\">Google does a quite decent lookup of InChIs nowadays</a>,\nand there is a growing amount of semantic markup of InChIs like <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">use of microformats <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nas RDF/RDFa, stored in HTML @alt attributes, <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/24/jchempaint-too-png-embedded.html\">embedded in PNG images <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nto address the issues of the InChI length.</p>\n\n<p>Two final comments, and I hope Alan, Steve, Igor, Steve and Dmitrii will pick this up:</p>\n\n<ol>\n  <li>the InChIKey lost the version layer, which will cause trouble when the InChI moves to a next version (as in\nInChI=2/…). I would really like to see InChIKey=1/RYYVLZVUVIJVGH-UHFFFAOYAW as key instead.</li>\n  <li>an online service to validate the key using the check character would be most welcome</li>\n</ol>\n\n<h2 id=\"lgpl-license\">LGPL license</h2>\n\n<p>Not reported in the above announcement is the fact that this release also addresses an issue brought forward by the\nopensource community. 
License ambiguity has been addressed, and it is reported that the release now clearly states the LGPL license in the distribution as well as source code headers. This will make packaging for, for example, Linux distributions possible.</p>\n\n<h2 id=\"modularization\">Modularization</h2>\n\n<p>One of the reasons why there has not been a Java port developed was the lack of modularization in the InChI software. This apparently has now been added, and I am very interested in reading about the effective modules available now. In particular, the canonicalization is interesting. The resulting atom ordering finds its use in chemoinformatics algorithms, and a standard for that is most welcome.</p>\n\n<p>Maybe now is the time to develop a Java version of the software.</p>",
      "summary": "The IUPAC/NIST team made a beta release of the next InChI software release:",
      
      "date_published": "2007-09-07T00:10:00+00:00",
      "date_modified": "2025-01-27T00:00:00+00:00",
      "tags": ["inchi","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5q96w-9e910",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/07/double-charging-your-readers-quite.html",
      "title": "Double-charging your readers: quite unacceptable indeed",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> has been doing an excellent job in advocating\n<a href=\"https://blueobelisk.github.io/odosos.html\">ODOSOS <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand one of his posts even <a href=\"http://yro.slashdot.org/article.pl?sid=07/09/04/1341248\">hit Slashdot</a>.</p>\n\n<p>Meanwhile, blogspace has been flooded with dislike of the <a href=\"http://web.archive.org/web/20071005133015/http://www.prismcoalition.org/\">PRISM intiative <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n(e.g. see also the <a href=\"https://doi.org/10.63485/p636w-2cx89\">other Peter’s blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>). The website is so sad, it is almost funny again; but on second\nthought, it is so sad, you wonder the world will end because of WOIII or because of a total halt of scientific progress. It’s so sad,\nit is hard to decide between the real webpage and <a href=\"http://web.archive.org/web/20071028151743/http://pisdcoalition.org/\">this parody <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> which is the fake one.</p>\n\n<p>Wiley seems to be the king of commercial exploitation. While the sue over 6 data points seemed to be an incident, they now try to get\ntheir reading public pay twice for published material: once for reading the paper (well, if you exclude incidental,\noh-I-m-sorry-our-IT-department-messed-up attempts to have readers pay for open access papers; or was that another publisher?), and\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=554\">once for accessing the data (spectra) in that paper</a>.</p>\n\n<h2 id=\"update\">Update</h2>\n\n<p>I am likely a bit too harsh on Wiley here. They do and have done an excellent job on dissemination of scientific knowledge. 
I\njust think that it would suit them well to allow taking advantage of current ICT/chemoinformatics technologies to improve the\nadvance of science; I would say that should be a goal of a scientific publisher. Instead, they do not give explicit permission\nto reuse data from their publications, unless it involves the commercial exploitation of that database. Sure, curation is\nexpensive, but chemoinformatics has advanced, and <em>very much</em> can be done with an uncurated database. There are enough people\ninterested in setting up free databases, without that costing Wiley a penny. Why not allow that? Wiley is surely aware of this\ninterest, so it is there turn now to act.</p>",
      "summary": "Peter has been doing an excellent job in advocating ODOSOS , and one of his posts even hit Slashdot.",
      
      "date_published": "2007-09-07T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["openaccess","publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.63485/p636w-2cx89", "doi": "10.63485/p636w-2cx89"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xa92q-jv085",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/09/02/jchempaint-hack-thon.html",
      "title": "A JChemPaint Hack-a-thon",
      "content_html": "<p><a href=\"http://progz-jchem.blogspot.com/\">Niels</a> and I held a JChemPaint <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=jcp_hack_20070902\">hack-a-thon</a>\ntoday (<a href=\"http://moritz.faui2k3.org/irclog/out.pl?channel=cdk;date=2007-09-02\">the IRC log</a>). We had a quite ambitious agenda:</p>\n\n<ul>\n  <li>make the renderer modular</li>\n  <li>make the controller modular</li>\n  <li>make a controller interface with Swing + SWT implementations</li>\n</ul>\n\n<p>All this to make the <a href=\"http://www.mdpi.org/molecules/html/50100093.htm\">JChemPaint</a> editor module of the <a href=\"https://doi.org/10.1021/ci025584y\">CDK</a>\nmore easily integrate with non-Swing widget environments. We achieved to make about 50% of these goals: the controller is now modular, and the\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/controller/Controller2DHub.java\">Controller2DHub</a> (soon going to deprecate\nthe old <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/controller/Controller2D.java\">Controller2D</a>) no longer receives\nSwing mouse events, but local events by implementing the new <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/controller/IMouseEventRelay.java\">IMouseEventRelay</a>\ninterface.</p>\n\n<p>Controller modules implement the new <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/controller/IController2DModule.java\">IController2DModule</a>\ninterface. This modularization allows a clean up of CDK source code, making it more readable and easier to maintain. This was attempted in the past by setting up\nan <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/controller/AbstractController2D.java\">AbstractController2D</a> and a\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/controller/SimpleController2D.java\">SimpleController2D</a>. 
The new approach,\nhowever, allows separate modules to be made for each rendering mode, which are independent anyway. The old code still needs to be ported to the\nnew architecture, and this is expected to happen in the next two weeks.</p>\n\n<p>Another clean up in the architecture is that the controller modules no longer directly act on the IChemModel, but use a new (badly named)\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/controller/IChemModelRelay.java\">IChemModelRelay</a> interface, making\nthe architecture more closely adhere to the <a href=\"http://en.wikipedia.org/wiki/Model-view-controller\">MVC concept</a>. The IChemModelRelay API\ncurrently contains only two methods, but this is expected to expand considerably, because all current JChemPaint edit actions will\nhave to be passed via this interface.</p>\n\n<p>If you want to give the new architecture a test run, look for the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/renderer/progz/TestEditor.java\">TestEditor</a>\napplication. At the time of writing, it uses a demo module, the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/renderer/progz/DumpClosestObjectToSTDOUTModule.java\">DumpClosestObjectToSTDOUTModule</a>,\nwhich dumps the nearest IAtom to STDOUT.</p>",
      "summary": "Niels and I held a JChemPaint hack-a-thon today (the IRC log). We had a quite ambitious agenda:",
      
      "date_published": "2007-09-02T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["jchempaint"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI025584Y", "doi": "10.1021/CI025584Y"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ppk0b-r0607",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/28/nxclient-on-ubuntu-gutsy.html",
      "title": "NXClient on Ubuntu Gutsy",
      "content_html": "<p>If you, like me, already upgrade to <a href=\"http://www.ubuntu.com/\">Ubuntu</a> <a href=\"https://wiki.ubuntu.com/GutsyGibbon\">Gutsy</a>,\nand use <a href=\"http://www.nomachine.com/download.php\">nxclient</a> for remote login (highly recommended, though proprietary code),\nyou might run into the problem that the login no longer works, returning the message “Cannot find KDE environment.”. Ubuntu’s\n<a href=\"http://www.launchpad.net/\">Lauchpad</a> (generally an excellent service) was rather uncooperative and disregarded a bug report\nabout the problem, I found the solution with <code class=\"language-plaintext highlighter-rouge\">grep -ri kde /usr/NX</code>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>/usr/NX/etc/node.cfg:CommandStartKDE=\"/usr/bin/dbus-launch --exit-with-session startkde\"\n</code></pre></div></div>\n\n<p>The solution was to:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">sudo </span>aptitude <span class=\"nb\">install </span>dbus-x11\n</code></pre></div></div>\n\n<p>which contains the <code class=\"language-plaintext highlighter-rouge\">dbus-launch</code> executable, formerly found in the <em>dbus</em> package itself. I assume it works for\n“Cannot find GNOME environment” too.</p>",
      "summary": "If you, like me, already upgrade to Ubuntu Gutsy, and use nxclient for remote login (highly recommended, though proprietary code), you might run into the problem that the login no longer works, returning the message “Cannot find KDE environment.”. Ubuntu’s Lauchpad (generally an excellent service) was rather uncooperative and disregarded a bug report about the problem, I found the solution with grep -ri kde /usr/NX:",
      
      "date_published": "2007-08-28T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["linux","kde"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3j307-5v841",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/27/xcms-on-ubuntu-feisty.html",
      "title": "XCMS on Ubuntu Feisty",
      "content_html": "<p>I just installed <a href=\"http://metlin.scripps.edu/download/packages/xcms_1.9.2.tar.gz\">XCMS 1.9.2</a> on my Ubuntu system.\n<a href=\"http://metlin.scripps.edu/download/\">XCMS</a> is a GPL-ed <a href=\"http://www.r-project.org/\">R</a> package for metabolomics data analysis.\nJust for the record, you need to install the <a href=\"http://ubuntuguide.org/wiki/Ubuntu:Feisty\">Feisty</a> packages for\n<a href=\"http://packages.ubuntu.com/feisty/source/netcdf\">NetCDF</a>:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">sudo </span>aptitude <span class=\"nb\">install </span>netcdfg-dev libnetcdf3\nR CMD INSTALL <span class=\"nt\">--library</span><span class=\"o\">=</span>/usr/local/lib/R/site-library xcms_1.9.2.tar.gz\n</code></pre></div></div>",
      "summary": "I just installed XCMS 1.9.2 on my Ubuntu system. XCMS is a GPL-ed R package for metabolomics data analysis. Just for the record, you need to install the Feisty packages for NetCDF:",
      
      "date_published": "2007-08-27T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["linux","metabolomics"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rxpr6-t1772",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/24/jchempaint-too-png-embedded.html",
      "title": "JChemPaint too: PNG embedded connectivity tables",
      "content_html": "<p>Rich blogged about Firefly <a href=\"https://doi.org/10.59350/j026p-17z02\">embedding MDL molfiles in PNG images <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nwhich I found <a href=\"https://doi.org/10.59350/wgy8j-brx45\">really <i class=\"fa-solid fa-recycle fa-xs\"></i></a> cool.\nRich and Noel later showed how that metadata <a href=\"https://doi.org/10.59350/wgy8j-brx45\">can be retrieved again <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\npossibly <a href=\"http://baoilleach.blogspot.com/2007/08/access-embedded-molecular-information.html\">with Python</a>.</p>\n\n<p>But I did not like that <a href=\"http://depth-first.com/articles/tag/firefly\">Firefly</a> could do this, and <a href=\"http://www.mdpi.org/molecules/html/50100093.htm\">JChemPaint</a> not.\nSo, I started hacking. First I discovered I had to get rid of the use of <a href=\"http://java.sun.com/javase/technologies/desktop/media/jai/\">JAI</a>; then I had to adapt the\nJChemPaintPanel <code class=\"language-plaintext highlighter-rouge\">takeSnaphot()</code> API to return a <code class=\"language-plaintext highlighter-rouge\">RendererImage</code>; and finally, I had to figure out how to write the extra metadata. 
Now, Firefly is not opensource\n(yet), so it took me some time to figure out how that was done, and this is how:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">ImageWriter</span> <span class=\"n\">writer</span> <span class=\"o\">=</span> <span class=\"nc\">ImageIO</span><span class=\"o\">.</span><span class=\"na\">getImageWriters</span><span class=\"o\">(</span>\n  <span class=\"k\">new</span> <span class=\"nf\">ImageTypeSpecifier</span><span class=\"o\">(</span><span class=\"n\">awtImage</span><span class=\"o\">),</span> <span class=\"s\">\"png\"</span>\n<span class=\"o\">).</span><span class=\"na\">next</span><span class=\"o\">();</span>\n<span class=\"nc\">ImageTypeSpecifier</span> <span class=\"n\">specifier</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">ImageTypeSpecifier</span><span class=\"o\">(</span><span class=\"n\">awtImage</span><span class=\"o\">);</span>\n<span class=\"nc\">IIOMetadata</span> <span class=\"n\">meta</span> <span class=\"o\">=</span> <span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">getDefaultImageMetadata</span><span class=\"o\">(</span> <span class=\"n\">specifier</span><span class=\"o\">,</span> <span class=\"kc\">null</span> <span class=\"o\">);</span>\n\n<span class=\"nc\">Node</span> <span class=\"n\">node</span> <span class=\"o\">=</span> <span class=\"n\">meta</span><span class=\"o\">.</span><span class=\"na\">getAsTree</span><span class=\"o\">(</span> <span class=\"s\">\"javax_imageio_png_1.0\"</span> <span class=\"o\">);</span>\n<span class=\"nc\">IIOMetadataNode</span> <span class=\"n\">tExtNode</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">IIOMetadataNode</span><span class=\"o\">(</span><span class=\"s\">\"tEXt\"</span><span class=\"o\">);</span>\n<span class=\"nc\">IIOMetadataNode</span> <span class=\"n\">tExtEntryNode</span> <span 
class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">IIOMetadataNode</span><span class=\"o\">(</span><span class=\"s\">\"tEXtEntry\"</span><span class=\"o\">);</span>\n<span class=\"n\">tExtEntryNode</span><span class=\"o\">.</span><span class=\"na\">setAttribute</span><span class=\"o\">(</span> <span class=\"s\">\"keyword\"</span><span class=\"o\">,</span> <span class=\"s\">\"molfile\"</span> <span class=\"o\">);</span>\n<span class=\"n\">tExtEntryNode</span><span class=\"o\">.</span><span class=\"na\">setAttribute</span><span class=\"o\">(</span> <span class=\"s\">\"value\"</span><span class=\"o\">,</span> <span class=\"n\">mdlMolfile</span><span class=\"o\">);</span>\n<span class=\"n\">tExtNode</span><span class=\"o\">.</span><span class=\"na\">appendChild</span><span class=\"o\">(</span><span class=\"n\">tExtEntryNode</span><span class=\"o\">);</span>\n<span class=\"n\">node</span><span class=\"o\">.</span><span class=\"na\">appendChild</span><span class=\"o\">(</span><span class=\"n\">tExtNode</span><span class=\"o\">);</span>\n<span class=\"n\">meta</span><span class=\"o\">.</span><span class=\"na\">mergeTree</span><span class=\"o\">(</span><span class=\"s\">\"javax_imageio_png_1.0\"</span><span class=\"o\">,</span> <span class=\"n\">node</span><span class=\"o\">);</span>\n<span class=\"nc\">ImageOutputStream</span> <span class=\"n\">ios</span> <span class=\"o\">=</span> <span class=\"nc\">ImageIO</span><span class=\"o\">.</span><span class=\"na\">createImageOutputStream</span><span class=\"o\">(</span>\n  <span class=\"k\">new</span> <span class=\"nf\">FileOutputStream</span><span class=\"o\">(</span><span class=\"n\">filename</span><span class=\"o\">)</span>\n<span class=\"o\">);</span>\n<span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">setOutput</span><span class=\"o\">(</span><span class=\"n\">ios</span><span class=\"o\">);</span>\n<span class=\"n\">writer</span><span class=\"o\">.</span><span 
class=\"na\">write</span><span class=\"o\">(</span> <span class=\"n\">meta</span><span class=\"o\">,</span> <span class=\"k\">new</span> <span class=\"nc\">IIOImage</span><span class=\"o\">(</span><span class=\"n\">awtImage</span><span class=\"o\">,</span> <span class=\"kc\">null</span><span class=\"o\">,</span> <span class=\"n\">meta</span><span class=\"o\">),</span> <span class=\"kc\">null</span> <span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Now I can create my own test files for the <a href=\"http://neksa.blogspot.com/2007/08/strigi-now-extracts-chemical.html\">Strigi’s ability to extract chemical metadata from PNG images</a>.\nHere is the JChemPaint generator PNG image for <a href=\"http://en.wikipedia.org/wiki/Benzophenone\">benzophenone</a>:</p>\n\n<p><img src=\"/assets/images/mdlTest.png\" alt=\"\" /></p>\n\n<p>Another issue, unrelated to this patch, is that writing PNG images changes the location of the structure in the JChemPaint editor,\nand that the placing of the element symbol in image writing is seriously broken. But that will soon be solved with\n<a href=\"https://progz-jchem.blogspot.com/\">Niels’ new renderer</a>.</p>\n\n<p>The metadata looks like:</p>\n\n<p><img src=\"/assets/images/jcpPNGmolfile.png\" alt=\"\" /></p>\n\n<p>(Newlines are lost in the XML display.)</p>\n\n<p>JChemPaint does not yet write InChIs, and it also does not open PNG images for input yet (as Firefly does).</p>",
      "summary": "Rich blogged about Firefly embedding MDL molfiles in PNG images , which I found really cool. Rich and Noel later showed how that metadata can be retrieved again , possibly with Python.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpPNGmolfile.png",
      "date_published": "2007-08-24T00:00:00+00:00",
      "date_modified": "2025-02-08T00:00:00+00:00",
      "tags": ["jchempaint","cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/j026p-17z02", "doi": "10.59350/j026p-17z02"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/wgy8j-brx45", "doi": "10.59350/wgy8j-brx45"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4k6ht-k2z12",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/24/automatic-classification-of-thousands.html",
      "title": "Automatic Classification of thousands of Crystal Structures",
      "content_html": "<p>Clustering and classification of crystal structures is hot. Parkin hit the <a href=\"http://www.rsc.org/Publishing/Journals/CE/article.asp?doi=b710869a\">front cover</a>\nof <a href=\"http://www.rsc.org/Publishing/Journals/ce/\">CrystEngComm</a> with a story on <em>Comparing entire crystal structures: structural genetic fingerprinting</em>\n(DOI:<a href=\"https://doi.org/10.1039/b704177b\">10.1039/b704177b</a>). Now, the story itself, while rather interesting and well written, has three major flaws:</p>\n\n<ol>\n  <li>the data set it way too small</li>\n  <li>the proposed proof-of-concept is not novel at all</li>\n  <li>they do not cite me</li>\n</ol>\n\n<p>Well, the latter sounds a bit boohoo, and it is :) (BTW, I do like this paper.)</p>\n\n<p>They propose the work as proof-of-concept, but use a very artificial data set of only 12 crystal structures (<a href=\"http://en.wikipedia.org/wiki/Benzene\">benzene</a>\nand eleven <a href=\"http://en.wikipedia.org/wiki/Polycyclic_aromatic_hydrocarbon\">polycyclic aromatic hydrocarbons</a>, like\n<a href=\"http://en.wikipedia.org/wiki/Naphthalene\">naphtalene</a>, <a href=\"http://en.wikipedia.org/wiki/Anthracene\">anthracene</a>,\n<a href=\"http://en.wikipedia.org/wiki/Phenanthrene\">phenanthrene</a>, <a href=\"http://en.wikipedia.org/wiki/Triphenylene\">triphenylene</a>,\n<a href=\"https://en.wikipedia.org/wiki/Pyrene\">pyrene</a>, <a href=\"https://en.wikipedia.org/wiki/Perylene\">perylene</a>, and <a href=\"https://en.wikipedia.org/wiki/Coronene\">coronene</a>).\nWhile such a small set does make a nice example where you can still list all similarities (<code class=\"language-plaintext highlighter-rouge\">0.5*N*(N-1)</code>), it is really too artificial.</p>\n\n<p>Now, you may wonder if I am in the position to criticize this shortcoming, but I think I am. 
As part of my PhD\nwork, I analyzed this problem myself, and two years ago published the paper <em>Method for the computational comparison\nof crystal structures</em> (DOI:<a href=\"https://doi.org/10.1107/S0108768104028344\">10.1107/S0108768104028344</a>). Apparently,\nParkin was not aware of this publication and did not cite it. I should have gone to a crystallography conference\nwith a poster, and advertised my work more. In this paper, I analyzed a data set with 48 crystal structures, manually\nvalidated by visual inspection, resulting in having to compare 1128 (!) crystal structure pairs. Took me two full weeks\nbehind a Silicon Graphics. Yes, I really understand why they took only 12 structures :)</p>\n\n<p>However, there is more prior art. While my approach was based on a new radial distribution function-based whole\ncrystal structure descriptor, my supervisor (<a href=\"http://www.cac.science.ru.nl/people/rwehrens/index.html\">Ron</a>) used\nthe more common powder diffraction pattern and showed in <em>Representing Structural Databases in a Self-Organising Map</em>\n(DOI:<a href=\"https://doi.org/10.1107/S0108768105020331\">10.1107/S0108768105020331</a>) that it is a good enough descriptor for\nclustering of thousands of crystal structures using a <a href=\"http://en.wikipedia.org/wiki/Self-organizing_map\">self-organizing map</a>\n(SOM).</p>\n\n<p>Last week, my second paper in crystallography appeared: <em>Supervised Self-Organizing Maps in Crystal Property and\nStructure Prediction</em> (DOI:<a href=\"https://doi.org/10.1021/cg060872y\">10.1021/cg060872y</a>). In this paper, we show how\nsupervised SOMs (see DOI:<a href=\"https://doi.org/10.1016/j.chemolab.2006.02.003\">10.1016/j.chemolab.2006.02.003</a>) can be\nused for supervised classification and even for property prediction. 
Note that these supervised SOMs are <em>truly</em>\nsupervised, unlike many earlier modifications of the unsupervised SOMs: the training is supervised.</p>\n\n<p>Finally, another advantage of this last work: the code is open source. The code for the unsupervised SOMs is available as\nan <a href=\"http://r-project.org/\">R</a> package: <a href=\"http://cran.r-project.org/src/contrib/Descriptions/kohonen.html\">kohonen</a>; and for\npowder diffraction patterns: <a href=\"http://cran.r-project.org/src/contrib/Descriptions/wccsom.html\">wccsom</a>. Details can be found in\n<a href=\"http://cran.r-project.org/doc/Rnews/Rnews_2006-3.pdf\">this R News issue</a>. The first package is not actually limited to\ncrystal structures, and can be used for any clustering problem. However, the articles mentioned here make use of simulated\ndiffraction patterns, and I am not sure there are open source tools to generate those.</p>\n\n<p>BTW, I would still be interested in teaming up with <a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/index.html\">CrystalEye</a> in\none way or another, and couple these data analysis methods to live streams of new crystal structures. Nick, let me\nknow if you are interested in idea exchange.</p>\n\n<p>Getting back to Parkin’s paper, I do like the work. Hirshfeld surfaces are an interesting tool to visualize packing\ncharacteristics, and using them to describe a crystal structure sounds like an interesting idea indeed. I just hope\nthat the method scales properly.</p>",
      "summary": "Clustering and classification of crystal structures is hot. Parkin hit the front cover of CrystEngComm with a story on Comparing entire crystal structures: structural genetic fingerprinting (DOI:10.1039/b704177b). Now, the story itself, while rather interesting and well written, has three major flaws:",
      
      "date_published": "2007-08-24T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["crystal"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1039/b704177b", "doi": "10.1039/b704177b"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1107/S0108768104028344", "doi": "10.1107/S0108768104028344"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1107/S0108768105020331", "doi": "10.1107/S0108768105020331"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CG060872Y", "doi": "10.1021/CG060872Y"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/j.chemolab.2006.02.003", "doi": "10.1016/j.chemolab.2006.02.003"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ve3jf-qk712",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/22/dapagliflozin-molecular-structure.html",
      "title": "Dapagliflozin: the molecular structure",
      "content_html": "<p>An anonymous reader <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/03/14/what-is-dapagliflozin.html\">reported <i class=\"fa-solid fa-recycle fa-xs\"></i></a> that the\n<a href=\"http://www.ama-assn.org/\">American Medical Association</a> <a href=\"http://www.ama-assn.org/ama1/pub/upload/mm/365/dapagliflozin.pdf\">published</a>\nthe structure of dapagliflozin. Here are the details.</p>\n\n<p><img src=\"/assets/images/dapagliflozin.png\" alt=\"\" /></p>\n\n<p>The full name is <em>(2S,3R,4R,5S,6R)-2- [4-chloro-3-(4-ethoxybenzyl)phenyl]-6-(hydroxymethyl)tetrahydro-2H-pyran-3,4,5-triol</em>\nand the PDF report the CAS number <code class=\"language-plaintext highlighter-rouge\">461432-26-8</code>, and\nInChI=1S/C21H25ClO6/c1-2-27-15-6-3-12(4-7-15)9-14-10-13(5-8-16(14)22)21-20(26)19(25)18(24)17(11-23)28-21/h3-8,10,17-21,23-26H,2,9,11H2,1H3/t17-,18-,19+,20-,21+/m1/s1.</p>\n\n<p>I have added this information to Wikipedia, see the <a href=\"http://en.wikipedia.org/wiki/Dapagliflozin\">Dapagliflozin</a> entry.</p>",
      "summary": "An anonymous reader reported that the American Medical Association published the structure of dapagliflozin. Here are the details.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/dapagliflozin.png",
      "date_published": "2007-08-22T00:10:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/3et2a-pkv75", "doi": "10.59350/3et2a-pkv75"
            , "cito":
              
              
                [ 
                  "repliesTo"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/w3eex-30y69",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/22/operator-08-released-new-sechemtic-user.html",
      "title": "Operator 0.8 released: a new Sechemtic user script",
      "content_html": "<p><a href=\"http://www.kaply.com/weblog/\">Mike</a> released <a href=\"http://www.kaply.com/weblog/2007/08/21/operator-08-is-available/\">Operator 0.8</a>,\nwhich picks up RDF (RDFa en eRDF) from HTML pages, and adds actions to it. I <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/27/chemical-rdfa-with-operator-in-firefox.html\">blogged earlier about the beta <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand wrote a script for it for <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">chemical RDFa <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nAt this moment, <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> and <a href=\"http://rdf.openmolecules.net/?InChI=1/CH4/h1H4\">RDF for Molecular Space</a>\n(see <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html\">this blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>) are using chemical RDFa to semantically markup molecular information.</p>\n\n<p>The new Operator release (<a href=\"https://addons.mozilla.org/en-US/firefox/addon/4106\">download</a>) has one notable API change:\nit now uses “RDF” as key for semantic information; the add-on now supports eRDF too. So, when installing or updating\nto version 0.8, you also need to update the Sechemtic user script to <a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/operator/tags/1.1/sechemtic_rdfa_operator.js\">version 1.1</a>\n<a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/operator/tags/\">or better</a>.</p>\n\n<p>Installing Operator scripts is a bit more work than Greasemonkey userscripts. Save the script to your home directory,\nor any other place you can easily find on the hard disk. 
After installing the Operator add-on, click the <em>Options</em> button:</p>\n\n<p><img src=\"/assets/images/options.png\" alt=\"\" /></p>\n\n<p>For the RDFa script to work, you need to make sure that the <em>Display style</em> is set to <em>Data formats</em>:</p>\n\n<p><img src=\"/assets/images/options1.png\" alt=\"\" /></p>\n\n<p>Then you can go to the <em>User Scripts</em> tab, and use the <em>New</em> button to add the script you downloaded and saved to your hard disk earlier:</p>\n\n<p><img src=\"/assets/images/options2.png\" alt=\"\" /></p>\n\n<p>Then, after rebooting Firefox (looks like MS-Windows :(), you can go to Chemical blogspace and\n<a href=\"http://cb.openmolecules.net/inchis.php\">look up molecules</a>, and see output like that described in\n<a href=\"http://chemicalblogspace.blogspot.com/2007/06/rdfa-operator-in-action-on-cb.html\">RDFa Operator in action on Cb</a>.</p>",
      "summary": "Mike released Operator 0.8, which picks up RDF (RDFa en eRDF) from HTML pages, and adds actions to it. I blogged earlier about the beta and wrote a script for it for chemical RDFa . At this moment, Chemical blogspace and RDF for Molecular Space (see this blog ) are using chemical RDFa to semantically markup molecular information.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/options1.png",
      "date_published": "2007-08-22T00:00:00+00:00",
      "date_modified": "2025-03-30T00:00:00+00:00",
      "tags": ["semweb","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/f33et-e6n35",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/13/touchgraphing-my-blog.html",
      "title": "Touchgraphing my blog",
      "content_html": "<p>Via <a href=\"https://web.archive.org/web/20071101070909/http://www.lexical.org.uk/planetscifoo/\">SciFoo Planet <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n(from <a href=\"https://pimm.wordpress.com/2007/08/11/scifoo-links-visualized-by-touchgraph-google-browser/\">Partial immortalization <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)\nI learned about <a href=\"http://www.touchgraph.com/TGGoogleBrowser.html\">TouchGraph Google</a> (Peter\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=496\">brought it into Chemical blogspace</a>).\nIt’s cool, though not open source. Here’s the touch graph for my blog:</p>\n\n<p><img src=\"/assets/images/touchGraph.png\" alt=\"\" /></p>\n\n<p>As you can see, plenty of <a href=\"https://www.blogspot.com\">blogspot</a> bloggers around me, among which,\nin purple, <a href=\"http://usefulchem.blogspot.com/\">Useful Chemistry</a>. Funny thing is, each time I\nrepeat the Google search, the output is different. Oh, and make sure to drag one of the halos\naround; that will keep you procrastinating for the whole afternoon :)</p>",
      "summary": "Via SciFoo Planet (from Partial immortalization ) I learned about TouchGraph Google (Peter brought it into Chemical blogspace). It’s cool, though not open source. Here’s the touch graph for my blog:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/touchGraph.png",
      "date_published": "2007-08-13T00:10:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y11ff-5he48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/13/centralized-or-decentralized.html",
      "title": "Centralized or decentralized?",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> wondered if data should be stored <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=497\">centralized or decentralized</a>,\nwhen <a href=\"http://mndoci.com/blog/\">Deepak</a> <a href=\"http://mndoci.com/blog/2007/08/12/freebase-at-scifoo/\">blogged</a> about\n<a href=\"http://freebase.com/\">Freebase</a> and <a href=\"http://www.metaweb.com/\">Metaweb</a>. Now, I haven’t really looked into these\ntwo projects, but the question of centralized versus decentralized is interesting. It’s MySQL versus the world\nwide web; it’s the PubChem compound ID versus the InChI; it’s <a href=\"http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4\">http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4</a>\nversus <code class=\"language-plaintext highlighter-rouge\">info:inchi/InChI=1/CH4/h1H4</code> (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html\">RDF-ing molecular space <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>Both have advantages and disadvantages (everything does). Google has a huge experience with massive data, and\nis the centralized version of the distributed world wide web. Personally, I tend towards the decentralized\nversion of things. Scales better. The chemical RDF community showed some concerns about scalability of triple\nstores (see e.g. Taylor et al. <em>Bringing Chemical Data onto the Semantic Web</em>, <strong>2006</strong>, DOI <a href=\"https://doi.org/10.1021/ci050378m\">10.1021/ci050378m</a>).\nNow, their tests went up to some 30M triples, which is barely enough to store the InChI, PubChem compound ID, and one chemical name.</p>\n\n<p>So, how would this work for molecules then? I am leaning towards a system where one can query resources about\none molecule, and work ones way through molecular space. 
Using KEGG, reaction databases, similarity stores,\none could move from molecule to molecule, and add bits of RDF along the way, filling a local RDF store around\nthe actual query I have in mind. For example, if I want to verify that the mass spectrum I found really belongs\nto the molecular structure I have in mind, I would look up in the resources I know about all triples that\nrelate to the putative structure, and do my queries from there. That’s what I would do… (and will do, but\nmore on that later…)</p>",
      "summary": "Peter wondered if data should be stored centralized or decentralized, when Deepak blogged about Freebase and Metaweb. Now, I haven’t really looked into these two projects, but the question of centralized versus decentralized is interesting. It’s MySQL versus the world wide web; it’s the PubChem compound ID versus the InChI; it’s http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4 versus info:inchi/InChI=1/CH4/h1H4 (see RDF-ing molecular space ).",
      
      "date_published": "2007-08-13T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["inchi","semweb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci050378m", "doi": "10.1021/ci050378m"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/se3te-3tf95",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/11/molecular-connectivity-tables-in-images.html",
      "title": "Molecular Connectivity Tables in Images",
      "content_html": "<p>Rich blogged about to <a href=\"https://doi.org/10.59350/wgy8j-brx45\">Never Draw the Same Molecule Twice: Viewing Image Metadata <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin which he shows his molecular editor outputting images of molecular structure where the connectivity table\nof structure is embedded in the image. His molecular editor can read the image again, and will automatically\npick up the embedded connection table. Noel showed that such can not only be done in Java, but\n<a href=\"http://baoilleach.blogspot.com/2007/08/access-embedded-molecular-information.html\">in Python too</a>.</p>\n\n<p>This is important progress, though I would still like to see <a href=\"http://iupac.org/inchi/\">InChI</a>s in the\ndocuments, and/or the data files as supplementary information. Actually, I would even more like to\nsee that all experimental sections not just list the structure name, but give the InChI. An important\nspin-off is that when giving spectral information, the atom numbering given by InChI can be used to\nassociate NMR shifts, and IR wavenumbers to atoms and atom groups, removing the ambiguity in those\nassociations as we are used to find in literature.</p>\n\n<p>Chemistry Central is <a href=\"http://blogs.openaccesscentral.com/blogs/ccblog/entry/symyx_technologies_to_acquire_mdl\">looking into improving the submission process</a>\nfor molecular data, and hereby request the commenting on, taking into account in ongoing internal\ndiscussings, and incorporation of these approaches in the editorial requirements for CC publications:</p>\n\n<ul>\n  <li>including the connection table as metadata in images</li>\n  <li>including the InChI in experimental sections for newly synthesized molecules</li>\n  <li>use InChI atom numbering to associate NMR shifts with atoms in these experimental sections</li>\n</ul>\n\n<p>I will shortly blog an example experimental section incorporating the InChI.</p>",
      "summary": "Rich blogged about to Never Draw the Same Molecule Twice: Viewing Image Metadata in which he shows his molecular editor outputting images of molecular structure where the connectivity table of structure is embedded in the image. His molecular editor can read the image again, and will automatically pick up the embedded connection table. Noel showed that such can not only be done in Java, but in Python too.",
      
      "date_published": "2007-08-11T00:10:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["publishing","chemistry","inchi"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/wgy8j-brx45", "doi": "10.59350/wgy8j-brx45"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qkszs-g5j41",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/11/molecules-in-wikipedia-without-inchis.html",
      "title": "Molecules in Wikipedia without InChIs",
      "content_html": "<p>I reported last week about the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/02/molecules-in-wikipedia.html\">Molecules in Wikipedia <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand the plethora of templates used. <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> has also been using\n<a href=\"http://en.wikipedia.org/\">Wikipedia</a> URLs as molecular identifier and extracting InChIs from the wiki pages (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html\">Using Wikipedia to recognize Molecules in Blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nSeveral people have shown interest in adding InChIs for molecules in Wikipedia, so here’s a new version of a\nlist it molecules without InChIs:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>http://www.en.wikipedia.org/wiki/Hydrogen_cyanide#Hydrogen_cyanide_as_a_chemical_weapon -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/P-Phenylenediamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Valence_%28chemistry%29 -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Nitrous_oxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cytisine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Disulfur_decafluoride -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Mescaline -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Lewisite -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Sulfur_mustard -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tryptamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Interferon_beta-1a -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Methyl_isocyanate -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Anthraquinone -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tocopherol -&gt; but no 
InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cinnamic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tryptamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Psilocybin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Alphamethyltryptamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Alpha-ethyltryptamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Allylamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Ergosterol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Squalene -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Sulfur_hexafluoride -&gt; but no InChI/CID\n</code></pre></div></div>\n\n<p>Strictly speaking, the list should be longer, as the code that produced this list actually is also happy\nwhen a PubChem compound identifier (CID) is given. The previous list is also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html\">still online <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "I reported last week about the Molecules in Wikipedia and the plethora of templates used. Chemical blogspace has also been using Wikipedia URLs as molecular identifier and extracting InChIs from the wiki pages (see Using Wikipedia to recognize Molecules in Blogspace ). Several people have shown interest in adding InChIs for molecules in Wikipedia, so here’s a new version of a list it molecules without InChIs:",
      
      "date_published": "2007-08-11T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["wikipedia","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jv7te-h3w77",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/02/molecules-in-wikipedia.html",
      "title": "Molecules in Wikipedia",
      "content_html": "<p>I do not care about physical and chemical properties in <a href=\"http://wikipedia.org/\">Wikipedia</a>, as I can easily extract them from other sources.\nThe main value of Wikipedia for molecules is, I think, that it describes the history of a molecule. Additionally, the Wikipedia URL is a\nnice unique molecular identifier (for example <em><a href=\"http://en.wikipedia.org/wiki/Lactose\">http://en.wikipedia.org/wiki/Lactose</a></em>) given certain\nconditions, and many <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html\">bloggers are using it as such <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBut, it only is a useful identifier if one (and only one) InChI is stated on the wiki page.</p>\n\n<p>Now that I am <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html\">RDF-ing molecular space <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, I was\n<a href=\"http://del.icio.us/url/e24b896a3398220b76d47f59dbdc2634\">again</a> interested in <a href=\"http://dbpedia.org/docs/\">dbpedia</a>, a RDF version of Wikipedia.\nSee these two <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/quality-of-chemical-database.html\">blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"http://radar.oreilly.com/archives/2007/03/different_appro_1.html\">items</a> and Peter’s very nice\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=333\">dbpedia, RDF and SPARQL - for chemistry</a> item.\n<a href=\"http://www.scs.carleton.ca/~cleger\">Christian</a> is picking this up, and extending dbpedia for support for the various chemical boxes.</p>\n\n<h2 id=\"wikipedia-templates\">Wikipedia Templates</h2>\n\n<p>I have spotted a couple of templates: <a href=\"http://en.wikipedia.org/w/index.php?title=Template:Drugbox\">Drugbox</a>,\n<a href=\"http://en.wikipedia.org/w/index.php?title=Template:Chembox\">Chembox</a>, <a 
href=\"http://en.wikipedia.org/w/index.php?title=Template:Chembox_new\">Chembox new</a>,\nof which the last one seems to be the most recent, and has extensions for explosives and drugs. The\n<a href=\"http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemicals\">WikiProject Chemicals</a> does not mention it though. Does anyone know the status?\nIs <em>chembox new</em> the way forward and going to replace the older <em>chembox</em>? I hope so, because only the newer one has InChI in\nthe list of official fields. Or is <em>chembox new</em> simply an extension of <em>chembox</em> itself?</p>\n\n<p>Somewhere between 1000 and 1500 entries use the <em>chembox new</em> and another 1000 to 1500 use <em>chembox</em> but I assume there is\nconsiderable overlap. Additionally, Christian noted that there still seem to be molecules in Wikipedia which do not use a\ntemplate at all, and counted some 1900 molecules using various lists. If you want to keep a closer eye on chemistry in\ndbpedia, you should subscribe to the <a href=\"http://sourceforge.net/mailarchive/forum.php?forum_name=dbpedia-discussion\">dbpedia-discussion</a>\nmailing list.</p>",
      "summary": "I do not care about physical and chemical properties in Wikipedia, as I can easily extract them from other sources. The main value of Wikipedia for molecules is, I think, that it describes the history of a molecule. Additionally, the Wikipedia URL is a nice unique molecular identifier (for example http://en.wikipedia.org/wiki/Lactose) given certain conditions, and many bloggers are using it as such . But, it only is a useful identifier if one (and only one) InChI is stated on the wiki page.",
      
      "date_published": "2007-08-02T00:00:00+00:00",
      "date_modified": "2025-08-10T00:00:00+00:00",
      "tags": ["chemistry","wikipedia","rdf","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/81926-4bz44",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/01/excel-messes-up-your-data-analysis.html",
      "title": "Excel messes up your data analysis :)",
      "content_html": "<p>Well, no wonder: Excel is meant to be used to process money flows. Anyway, <a href=\"http://del.icio.us/greyarea\">greyarea</a> pointed me to\n<a href=\"http://itre.cis.upenn.edu/~myl/languagelog/archives/002912.html\">this nice blog item</a> from March 2006. It discusses a 2004 article in\n<a href=\"http://www.biomedcentral.com/bmcbioinformatics\">BMC Bioinformatics</a> <em>Mistaken Identifiers: Gene name errors can be introduced\ninadvertently when using Excel in bioinformatics</em> by Barry Zeeberg et al. (DOI:<a href=\"https://doi.org/10.1186/1471-2105-5-80\">10.1186/1471-2105-5-80</a>).\nHence, the importance of semantics and proper markup languages. The quotes are illustrative:</p>\n\n<blockquote>\n  <p>When we were beta-testing [two new bioinformatics programs] on microarray data, a frustrating problem occurred repeatedly: Some\ngene names kept bouncing back as “unknown.” A little detective work revealed the reason: … A default date conversion feature in\nExcel … was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in\nEsophageal Cancer 1] was being converted to ‘1-DEC.’ Figure 1 lists 30 gene names that suffer an analogous fate.<br /><br /></p>\n\n  <p>…<br /><br /></p>\n\n  <p>There is another default conversion problem for RIKEN clone identifiers identifiers of the form nnnnnnnEnn, where n denotes a\ndigit. These identifiers are comprised of the serial number of the plate that contains the library, information on plate status,\nand the address of the clone. A search … identified more than 2,000 such identifiers out of a total set of 60,770. 
For example,\nthe RIKEN identifier “2310009E13” was converted irreversibly to the floating-point number “2.31E+13.” A non-expert user might\nwell fail to notice that approximately 3% of the identifiers on a microarray with tens of thousands of genes had been converted\nto an incorrect form, yet the potential for 2,000 identifiers to be transmogrified without notice is a considerable concern. Most\nimportant, these conversions to an internal date representation or floating-point number format are irreversible; the original\ngene name cannot be recovered.</p>\n</blockquote>\n\n<p>Is this the article that made all bioinformaticians turn to R?</p>",
      "summary": "Well, no wonder: Excel is meant to be used to process money flows. Anyway, greyarea pointed me to this nice blog item from March 2006. It discusses a 2004 article in BMC Bioinformatics Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics by Barry Zeeberg et al. (DOI:10.1186/1471-2105-5-80). Hence, the importance of semantics and proper markup languages. The quotes are illustrative:",
      
      "date_published": "2007-08-01T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["bioinfo","excel"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-5-80", "doi": "10.1186/1471-2105-5-80"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gh6e5-t2g74",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/31/optical-chemical-structure-recognition.html",
      "title": "Optical Chemical Structure Recognition",
      "content_html": "<p>Days after the release of <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/20/osra-gpl-ed-molecule-drawing-to-smiles.html\">OSRA <i class=\"fa-solid fa-recycle fa-xs\"></i></a> last week, I saw\nthe <em>optical chemistry structure recognition</em> on the front page of my favorite Dutch <a href=\"http://www.slashdot.org/\">/.</a> equivalent,\n<a href=\"http://tweakers.net/\">Tweakers.net</a>, <em><a href=\"http://life.tweakers.net/nieuws/48640/Duitsers-leren-computer-chemische-structuren-herkennen.html\">Duitsers leren computer chemische structuren herkennen</a></em>,\nwritten by <a href=\"http://tweakers.net/plan/crew/134\">René Gerritsen</a>. The article discusses the Fraunhofer Institute’s\n<a href=\"http://www.scai.fraunhofer.de/chemocr.html\">ChemoCR</a>, which was, IIRC, presented as poster at last year’s\n<a href=\"http://scholle.oc.uni-kiel.de/users/cic/tagungen/workshop06/\">German Conference on Chemoinformatics</a> (to be held again\n<a href=\"http://www.gdch.de/gcc2007/\">this year</a>). Meanwhile, the CCL.net mailing list had a\n<a href=\"http://www.ccl.net/cgi-bin/ccl/day-index.cgi?2007+07+20\">discussion on the alternatives</a> too; I think it is fair to say that\nthe chemical community realizes the importance of these tools. Below is a short overview of the available tools, including some\nimportant information regarding integration into workflows.</p>\n\n<h2 id=\"chemocr\">ChemoCR</h2>\n<p>ChemoCR seems to be proprietary software, as I could not find any download, and InfoChem seems to be the party to sell licenses.\nThe <a href=\"http://tweakers.net/ext/i/1185808728.gif\">screenshot</a> in the Tweakers.net article seems to show that is is written\nin Java, but that hardly matters if not open source. The project is said to have started three years ago.</p>\n\n<h2 id=\"clide\">CLiDE</h2>\n<p><a href=\"http://www.simbiosys.ca/clide/\">CLiDE</a> is another commercial (expensive) program to do the job. 
It was developed more\nthan ten years ago, and the <a href=\"http://dx.doi.org/10.1021/ci9601022\">most recent scientific publication</a> is from 1997\n(as the webpage states).</p>\n\n<h2 id=\"osra\">OSRA</h2>\n<p><a href=\"http://cactus.nci.nih.gov/osra/\">OSRA</a> (see my <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/20/osra-gpl-ed-molecule-drawing-to-smiles.html\">previous blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>)\nis open source and uses the GPL license. It is written in C++. It is not as feature complete as ChemoCR yet, but that\nwill surely come. It is surely the youngest of these projects.</p>\n\n<h2 id=\"kekule\">Kekule</h2>\n<p>I have not picked up a copy of the paper <a href=\"http://dx.doi.org/10.1021/ci00008a018\">Kekule: OCR-optical chemical (structure) recognition</a>\ncited by <a href=\"http://www.chemspider.com/blog/?p=83\">Tony</a>, so I cannot say much about that right now.</p>\n\n<p>It is obvious that only OSRA lends itself to embedding in reproducible workflows. Debra Banville\n<a href=\"http://dx.doi.org/10.1016/S1359-6446(05)03682-2\">reviewed the two commercial programs CLiDE and ChemoCR</a>\nlast year, along with a few other text mining tools in chemoinformatics. I am curious about her opinion of\nthe new open source tools in this arena.</p>",
      "summary": "Days after the release of OSRA last week, I saw the optical chemistry structure recognition on the front page of my favorite Dutch /. equivalent, Tweakers.net, Duitsers leren computer chemische structuren herkennen, written by René Gerritsen. The article discusses the Fraunhofer Institute’s ChemoCR, which was, IIRC, presented as poster at last year’s German Conference on Chemoinformatics (to be held again this year). Meanwhile, the CCL.net mailing list had a discussion on the alternatives too; I think it is fair to say that the chemical community realizes the importance of these tools. Below is a short overview of the available tools, including some important information regarding integration into workflows.",
      
      "date_published": "2007-07-31T00:10:00+00:00",
      "date_modified": "2025-10-11T00:00:00+00:00",
      "tags": ["cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci9601022", "doi": "10.1021/ci9601022"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci00008a018", "doi": "10.1021/ci00008a018"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/S1359-6446(05)03682-2", "doi": "10.1016/S1359-6446(05)03682-2"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9r29x-8y455",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html",
      "title": "RDF-ing molecular space",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a> might be the solution we are looking for to get a grip\non the huge amount of information we are facing. <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/05/11/microformats-in-chemistry.html\">microformats <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/27/chemical-rdfa-with-operator-in-firefox.html\">RDFa <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, are just solutions along the way,\nand Gleaning Resource Descriptions from Dialects of Languages (<a href=\"http://www.w3.org/2004/01/rdxh/spec\">GRDDL</a>) might be\nan important tool to get the web RDF-ied.</p>\n\n<p>One important aspect of RDF is that <a href=\"http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref\">any resource has a unique URI</a>.\nThese make look like a URL or even like <code class=\"language-plaintext highlighter-rouge\">urn:doi:10.1186/1471-2105-8-59</code>. The recent blogs by Pierre\n(<em><a href=\"http://plindenbaum.blogspot.com/2007/07/url-1-lsid-1.html\">URL +1, LSID -1</a></em>) and Roderic\n(<em><a href=\"http://iphylo.blogspot.com/2007/06/rethinking-lsids-versus-http-uri.html\">Rethinking LSIDs versus HTTP URI</a></em>)\nillustrate the pro and cons of the different alternatives.</p>\n\n<h2 id=\"bioguid\">bioGUID</h2>\n\n<p>As usual, the bioinformaticians are less conservative and ahead of chemists in trying new options, and several interesting\nwebsite have emerged. For example, <a href=\"http://bioguid.info/\">bioGUID</a> makes the bridge between a simple URI and a resolvable URL.\nAnd, importantly, it spit RDF. 
This is the output for <a href=\"http://bioguid.info/doi:10.1109/MIS.2006.62\">http://bioguid.info/doi:10.1109/MIS.2006.62</a>:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">&lt;?xml version=\"1.0\" encoding=\"utf-8\"?&gt;</span>\n<span class=\"cp\">&lt;?xml-stylesheet type=\"text/xsl\" href=\"http://bioguid.info/xsl/html.xsl\"?&gt;</span>\n<span class=\"nt\">&lt;rdf:RDF</span> <span class=\"na\">xmlns:bioguid=</span><span class=\"s\">\"http://bioguid.info/schema/0.1/\"</span> \n  <span class=\"na\">xmlns:rdfs=</span><span class=\"s\">\"http://www.w3.org/2000/01/rdf-schema#\"</span>\n  <span class=\"na\">xmlns:rss=</span><span class=\"s\">\"http://purl.org/rss/1.0/\"</span> \n  <span class=\"na\">xmlns:prism=</span><span class=\"s\">\"http://prismstandard.org/namespaces/1.2/basic/\"</span>\n  <span class=\"na\">xmlns:dcterms=</span><span class=\"s\">\"http://purl.org/dc/terms/\"</span> \n  <span class=\"na\">xmlns:dc=</span><span class=\"s\">\"http://purl.org/dc/elements/1.1/\"</span>\n  <span class=\"na\">xmlns:rdf=</span><span class=\"s\">\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;rdf:Description</span> <span class=\"na\">rdf:about=</span><span class=\"s\">\"http://bioguid.info/doi:10.1109/MIS.2006.62\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;rdf:type</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://bioguid.info/schema/0.1/Publication\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;rdfs:comment&gt;</span>Generated by transforming XML returned by CrossRef's\n      OpenURL service.<span class=\"nt\">&lt;/rdfs:comment&gt;</span>\n    <span class=\"nt\">&lt;dc:creator&gt;</span>Shadbolt<span class=\"nt\">&lt;/dc:creator&gt;</span>\n    <span class=\"nt\">&lt;dc:title&gt;</span>The Semantic Web Revisited<span class=\"nt\">&lt;/dc:title&gt;</span>\n    
<span class=\"nt\">&lt;dcterms:issued&gt;</span>2006<span class=\"nt\">&lt;/dcterms:issued&gt;</span>\n\n    <span class=\"nt\">&lt;prism:publicationDate&gt;</span>2006<span class=\"nt\">&lt;/prism:publicationDate&gt;</span>\n    <span class=\"nt\">&lt;dc:identifier</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"doi:10.1109/MIS.2006.62\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;rdfs:comment&gt;</span>info URI scheme<span class=\"nt\">&lt;/rdfs:comment&gt;</span>\n    <span class=\"nt\">&lt;dc:identifier</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"info:doi/10.1109/MIS.2006.62\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;rdfs:comment&gt;</span>CrossRef resolver<span class=\"nt\">&lt;/rdfs:comment&gt;</span>\n    <span class=\"nt\">&lt;rss:link&gt;</span>http://dx.doi.org/10.1109/MIS.2006.62<span class=\"nt\">&lt;/rss:link&gt;</span>\n    <span class=\"nt\">&lt;prism:publicationName&gt;</span>IEEE Intelligent Systems<span class=\"nt\">&lt;/prism:publicationName&gt;</span>\n\n    <span class=\"nt\">&lt;prism:volume&gt;</span>21<span class=\"nt\">&lt;/prism:volume&gt;</span>\n    <span class=\"nt\">&lt;prism:number&gt;</span>3<span class=\"nt\">&lt;/prism:number&gt;</span>\n    <span class=\"nt\">&lt;prism:startingPage&gt;</span>96<span class=\"nt\">&lt;/prism:startingPage&gt;</span>\n    <span class=\"nt\">&lt;prism:issn&gt;</span>10947167<span class=\"nt\">&lt;/prism:issn&gt;</span>\n  <span class=\"nt\">&lt;/rdf:Description&gt;</span>\n<span class=\"nt\">&lt;/rdf:RDF&gt;</span>\n</code></pre></div></div>\n\n<p>(BTW, interesting is the use of XSLT to create HTML; it’s doing the opposite of GRDDL! And this is probably the right way. Cheers Roderic!)</p>\n\n<h2 id=\"inchi\">InChI</h2>\n\n<p>I wanted something similar for molecules. The unique identifier is the <a href=\"http://iupac.org/inchi/\">InChI</a>, of course. 
The InChI itself is\nnot a proper URI, so I set up a webpage to work around that (if only I had realized this some time ago, I would have urged IUPAC to use\nthe prefix ‘inchi:’ instead of ‘InChI=’). The result is, currently, looking like\n<a href=\"http://cb.openmolecules.net/rdf/rdf.php?InChI=1/CH4/h1H4\">http://cb.openmolecules.net/rdf/rdf.php?InChI=1/CH4/h1H4</a>.\nI do not use a XSLT yet, but will do so shortly. The RDF looks like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;rdf:RDF</span>\n<span class=\"na\">xmlns:rdf=</span><span class=\"s\">\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"</span>\n<span class=\"na\">xmlns:iupac=</span><span class=\"s\">\"http://www.iupac.org/\"</span><span class=\"nt\">&gt;</span>\n\n<span class=\"nt\">&lt;rdf:Description</span>\n <span class=\"na\">rdf:about=</span><span class=\"s\">\"http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4\"</span><span class=\"nt\">&gt;</span>\n\n <span class=\"nt\">&lt;iupac:inchi&gt;</span>InChI=1/CH4/h1H4<span class=\"nt\">&lt;/iupac:inchi&gt;</span>\n\n <span class=\"nt\">&lt;pubchem:cid</span> <span class=\"na\">xmlns:pubchem=</span><span class=\"s\">\"http://pubchem.ncbi.nlm.nih.gov/#\"</span><span class=\"nt\">&gt;</span>297<span class=\"nt\">&lt;/pubchem:cid&gt;</span>\n <span class=\"nt\">&lt;pubchem:name</span> <span class=\"na\">xmlns:pubchem=</span><span class=\"s\">\"http://pubchem.ncbi.nlm.nih.gov/#\"</span><span class=\"nt\">&gt;</span>methane<span class=\"nt\">&lt;/pubchem:name&gt;</span>\n <span class=\"nt\">&lt;cb:discussedBy</span> <span class=\"na\">xmlns:cb=</span><span class=\"s\">\"http://cb.openmolecules.net/#\"</span><span class=\"nt\">&gt;</span>http://chemistrylabnotebook.blogspot.com/2007/04/space-final-frontier.html<span class=\"nt\">&lt;/cb:discussedBy&gt;</span>\n <span class=\"nt\">&lt;cb:discussedBy</span> <span class=\"na\">xmlns:cb=</span><span 
class=\"s\">\"http://cb.openmolecules.net/#\"</span><span class=\"nt\">&gt;</span>http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=299<span class=\"nt\">&lt;/cb:discussedBy&gt;</span>\n <span class=\"nt\">&lt;cb:discussedBy</span> <span class=\"na\">xmlns:cb=</span><span class=\"s\">\"http://cb.openmolecules.net/#\"</span><span class=\"nt\">&gt;</span>http://chem-bla-ics.blogspot.com/2006/12/smiles-cas-and-inchi-in-blogs.html<span class=\"nt\">&lt;/cb:discussedBy&gt;</span>\n <span class=\"nt\">&lt;cb:discussedBy</span> <span class=\"na\">xmlns:cb=</span><span class=\"s\">\"http://cb.openmolecules.net/#\"</span><span class=\"nt\">&gt;</span>http://chem-bla-ics.blogspot.com/2007/02/invisible-inchis.html<span class=\"nt\">&lt;/cb:discussedBy&gt;</span>\n\n<span class=\"nt\">&lt;/rdf:Description&gt;</span>\n\n<span class=\"nt\">&lt;/rdf:RDF&gt;</span>\n</code></pre></div></div>\n\n<p>The system uses PHP to create the output, and has a basic plugin system: a plugin basically spits out an RDF fragment for\nthe given InChI, and at this moment it only has a plugin for <a href=\"http://cb.openmolecules.net/\">Cb</a>, but I plan a few more.\nIt needs some tuning and any and all feedback is most welcome. Note that the actual URI might change a bit.</p>",
      "summary": "RDF might be the solution we are looking for to get a grip on the huge amount of information we are facing. microformats , and RDFa , are just solutions along the way, and Gleaning Resource Descriptions from Dialects of Languages (GRDDL) might be an important tool to get the web RDF-ied.",
      
      "date_published": "2007-07-31T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["chemistry","rdf","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/d98p9-06w80",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/26/further-bioclipse-qsar-functionality.html",
      "title": "Further Bioclipse QSAR functionality development",
      "content_html": "<p>I had some time to <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/27/qsar-plugin-for-bioclipse-getting-in.html\">work some more on the QSAR functionality <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin <a href=\"http://www.bioclipse.net/\">Bioclipse</a>. There is still much to do, but it is getting there. The calculation of a QSAR descriptor data matrix</p>\n\n<p><img src=\"/assets/images/qsarJob.png\" alt=\"\" /></p>\n\n<p>This screenshot shows that multi-resource selection is now working, and that the calculation is now a Job. The resulting matrix looks like:</p>\n\n<p><img src=\"/assets/images/qsarJob1.png\" alt=\"\" /></p>\n\n<p>Things that remain to be done:</p>\n\n<ul>\n  <li>work on a SDF resource</li>\n  <li>a graph view for the matrix</li>\n  <li><a href=\"http://www.r-project.org/\">R</a> functionality for the matrices</li>\n  <li><a href=\"http://joelib.sf.net/\">JOELib</a> support</li>\n</ul>",
      "summary": "I had some time to work some more on the QSAR functionality in Bioclipse. There is still much to do, but it is getting there. The calculation of a QSAR descriptor data matrix",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/qsarJob.png",
      "date_published": "2007-07-26T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","qsar","bioclipse","joelib"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/arc2j-1ha32",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/20/osra-gpl-ed-molecule-drawing-to-smiles.html",
      "title": "OSRA: GPL-ed molecule drawing to SMILES convertor",
      "content_html": "<p>Igor wrote a message to the <a href=\"http://www.ccl.net/chemistry/sub_unsub.shtml\">CCL mailing list</a> about\n<a href=\"http://cactus.nci.nih.gov/osra/\">OSRA</a>:</p>\n\n<blockquote>\n  <p>We would like to announce a new addition to the set of chemoinformatics tools available from the Computer-Aided Drug Design Group\nat the NCI-Frederick. OSRA is a utility designed to convert graphical representations of chemical structures, such as they appear\nin journal articles, patent documents, textbooks, trade magazines etc., into SMILES.<br /><br /></p>\n\n  <p>OSRA can read a document in any of the over 90 graphical formats parseable by ImageMagick (GIF, JPEG, PNG, TIFF, PDF, PS etc.) and\ngenerate the SMILES representation of the molecular structure images encountered within that document.</p>\n</blockquote>\n\n<p>The email does not give any information on the fail rate, but the demo they provide via the\n<a href=\"http://cactus.nci.nih.gov/cgi-bin/osra/index.cgi\">webinterface</a> does show some minor glitches (the bromine is not recognized):</p>\n\n<p><img src=\"/assets/images/osra.png\" alt=\"\" /></p>\n\n<p>The source reuses <a href=\"http://openbabel.sf.net/\">OpenBabel</a> and uses the GPL license. The value equal to that of text mining tools like\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/22/text-mining-for-chemistry-using-oscar3.html\">OSCAR3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand together they sounds like the Jordan and Pippen of mining chemical literature.</p>",
      "summary": "Igor wrote a message to the CCL mailing list about OSRA:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/osra.png",
      "date_published": "2007-07-20T00:10:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["cheminf","openbabel"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1wgmy-mfr06",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/20/screencasts-for-life-science.html",
      "title": "Screencasts for life science informatics",
      "content_html": "<p><a href=\"http://mndoci.com/blog/\">Deepak</a> blogged about <a href=\"http://mndoci.com/blog/2007/07/18/bioscreencastcom-02/\">screencasting for bio topics</a>,\nconcentrated at <a href=\"https://web.archive.org/web/20070701050807/http://bioscreencast.com/\">bioscreencast.com <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nof which he is co-owner. I guess it is like a YouTube for\nbioinformatics thingies. <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a> picked this up very quickly (seen on\n<a href=\"http://cb.openmolecules.net/\">Cb</a>? At least I did.), and already uploaded a screencast,\n<a href=\"https://web.archive.org/web/*/http://bioscreencast.com/bsc_movwin.html*\">demoing JSpecView <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>\nwritten by <a href=\"http://wwwchem.uwimona.edu.jm:1104/chrl.html\">Robert</a>. I wonder if he will upload the\n<a href=\"http://usefulchem.blogspot.com/2006/07/cml-in-rss-feeds.html\">screencasts he made for</a>\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> too? (hint, hint … :)</p>\n\n<p>I have no idea if this site will be a success, but at least it has the right ingredients: tags, flash movies, clean UI, a\n<a href=\"http://bioscreencast.wordpress.com/\">blog to monitor technological changes and improvements</a>, and a page to\nrequest screencasts (with voting). What I only miss is a one summary page for each screencast to which I can\neasily link, for example for my <a href=\"http://del.icio.us/egonw\">del.icio.us</a> account.</p>",
      "summary": "Deepak blogged about screencasting for bio topics, concentrated at bioscreencast.com of which he is co-owner. I guess it is like a YouTube for bioinformatics thingies. Jean-Claude picked this up very quickly (seen on Cb? At least I did.), and already uploaded a screencast, demoing JSpecView written by Robert. I wonder if he will upload the screencasts he made for Bioclipse too? (hint, hint … :)",
      
      "date_published": "2007-07-20T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["bioinfo"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4xfms-7nn46",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/16/cdk-data-model-1.html",
      "title": "The CDK data model #1",
      "content_html": "<p>The <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> has a rich set of data classes, each of which is\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/interfaces/IChemObject.java\">defined by an interface</a>.\nWhile the classes for atoms, bonds and a connectivity table are fairly straightforward, but beyond that it is sometimes\nnot entirely clear. I will now discuss all interfaces in a series of blog items. I’ll start with the IChemFile.\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a>, please correct me if I move to far away from our Notre Dame board sketch.</p>\n\n<h2 id=\"ichemfile\">IChemFile</h2>\n\n<p>The <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/IChemFile.html\">IChemFile</a> is the class to\nhold a chemical document, e.g. a MDL molfile or a PDB file. The idea of this class is that it can hold anything we\ncan expect from a chemical document. But nothing beyond that either; a XHTML document with embedded CML is outside\nthe scope of a IChemFile. You might wonder why the <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/io/IChemObjectReader.html\">IChemObjectReaders</a>\nnot always just return a IChemFile. That would be a fair point, any many actually do, but somethings it is handier\nto return an IMolecule. A reader for MDL molfiles would be expected to return a IMolecule.</p>\n\n<p>However, a document may contain much more, and the approach taken by the CDK is that a file contains one or more\nmodels. A MDL molfile is an example document with one model, while a MDL SD file would be a document with more than\none model.</p>\n\n<h2 id=\"ichemsequence\">IChemSequence</h2>\n\n<p>However, the IChemFile can hold more than one <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/IChemSequence.html\">IChemSequence</a>.\nNow, I honestly cannot remember why that is; a single IChemSequence should be enough. 
And, I actually do not remember\nmore than one IChemSequence being used. (Anyone?) As said, the IChemSequence contains IChemModels, and nothing more\nreally. The interface therefore just contains the basic logic of a list. Let’s move on.</p>\n\n<h2 id=\"ichemmodel\">IChemModel</h2>\n\n<p>The <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/IChemModel.html\">IChemModel</a> is much more interesting.\nIn the CDK a model is defined as anything that occurs in one actual volume of 3D (or 2D) space. A CIF file with a\ncrystal structure is, therefore, one IChemModel. A supramolecular aggregation of lipids, e.g. a mono- or bilayer,\nwould be an IChemModel too. This could be a time step in a molecular dynamics run. Additionally, the IChemModel may\nalso be a chemical reaction, possibly a multistep reaction. It could be, for example, an enzyme reaction mechanism\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/17/chemical-reactions-in-cml.html\">entry from the MACiE database <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThese three types of content are captured in the ICrystal, IMoleculeSet, and IReactionSet.</p>\n\n<h2 id=\"some-examples\">Some Examples</h2>\n\n<p>A CIF file would be read as an IChemFile containing an IChemSequence with one IChemModel containing an ICrystal.\nAn MDL molfile would be read as an IChemFile containing an IChemSequence with one IChemModel containing an\nIMoleculeSet with one IMolecule. An MDL SD file, however, would be read as an IChemFile with an\nIChemSequence with as many IChemModels as there are molecules in the SD file; and, each IChemModel would\ncontain an IMoleculeSet with only one IMolecule. This is counter-intuitive, because one may expect the SD file,\nwhich is a set of molecules, to be stored in an IMoleculeSet.</p>\n\n<p>Enough for tonight. More later. 
For the impatient, previously I wrote up a short blog about\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/12/cdk-data-classes-and-change.html\">the update notification scheme in the CDK interfaces <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "The Chemistry Development Kit has a rich set of data classes, each of which is defined by an interface. While the classes for atoms, bonds and a connectivity table are fairly straightforward, but beyond that it is sometimes not entirely clear. I will now discuss all interfaces in a series of blog items. I’ll start with the IChemFile. Christoph, please correct me if I move to far away from our Notre Dame board sketch.",
      
      "date_published": "2007-07-16T00:10:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["cdk","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m9z3j-c9093",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/16/open-science-notebook-10-years-ago.html",
      "title": "The Open Science Notebook 10 years ago",
      "content_html": "<p>So, with <a href=\"http://mndoci.com/blog/2007/07/14/does-the-open-research-world-need-a-single-access-point/\">all</a>\n<a href=\"http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html\">these</a>\n<a href=\"http://3quarksdaily.blogs.com/3quarksdaily/2006/11/the_future_of_s.html\">people</a>\n<a href=\"https://doi.org/10.59350/sk1yv-zxp51\">blogging <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"http://www.sennoma.net/main/archives/2007/07/giving_open_notebook_science_a.php\">about</a>\n<a href=\"http://scilib.typepad.com/science_library_pad/2007/06/thinking-about-.html\">the</a>\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/06/08/alicias-open-science-thesis/\">Open <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"http://lccccollegeenglish.blogspot.com/2007/05/gary-hermans-open-notebook-science.html\">Science</a>\n<a href=\"https://doi.org/10.63485/c9h8f-ee251\">Notebook <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(yes, each word is one distinct blog) it is worth looking back in time. To make clear what I put\nunder the OSN: a notebook in which experimental details and outcome are written down.\nSo, what did the OSN look like almost ten years ago?</p>\n\n<p>It looked like the early open source chemoinformatics projects, such as\n<a href=\"http://sourceforge.net/users/steinbeck/\">CompChem and JMDraw</a> set up by\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a> (the SourceForge projects have, unfortunately,\nbeen deleted; so I cannot link to the original project pages). JChemPaint and Jmol also originate from\nthose years.</p>\n\n<p>These projects were OSNs <em>avant le lettre</em>: an experiment in chemoinformatics is the definition of a\nnew (or reformulation of an old) algorithm, writing down the experiment (source code in this code),\nuploaded into a repository (Open Science!) 
for everyone to comment on, possibly sending around an\nannouncement for discussion to a mailing list, and reporting the outcome (preferably in a peer-reviewed\njournal). While I am ranting^Wtalking about the issues, chemoinformatics is in the luxurious situation\nthat reproducibility of a procedure is <strong>much</strong> easier,\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/07/14/more-potential-reproducibility/\">except for the missing data part <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Just wanted to say that OSN is really nothing new, not to chemistry anyway. Maybe for lab chemists.\n<a href=\"http://drexel-coas-talks-mp3-podcast.blogspot.com/\">Jean-Claude</a> has been very successful in\n<a href=\"https://doi.org/10.1038/npre.2007.39.1\">promoting these open science ideas</a> among lab chemists,\nand I congratulate him on the exposure in all those magazine interviews lately. Cheers!</p>\n\n<h2 id=\"open-science-versus-open-source\">Open Science versus Open Source</h2>\n\n<p>Oh, and let me make the distinction between open source in general and open science. Much of the\ncurrent open source software in chemistry(/chemoinformatics) is <strong>not</strong> open science. Open science\nmeans that every step in the development process is open, whereas many chemoinformatics programs\nare <em>dumped</em> into the open source sphere at the end. That is not the way it should be.</p>\n\n<p>For the lab chemists: <em><a href=\"http://en.wikipedia.org/wiki/%5EW\">^W is a shortcut for ‘delete the previous word’</a></em>.</p>",
      "summary": "So, with all these people blogging about the Open Science Notebook (yes, each word is one distinct blog) it is worth looking back in time. To make clear what I put under the OSN: a notebook in which experimental details and outcome are written down. So, what did the OSN look like almost ten years ago?",
      
      "date_published": "2007-07-16T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["cdk","openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/npre.2007.39.1", "doi": "10.1038/npre.2007.39.1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/sk1yv-zxp51", "doi": "10.59350/sk1yv-zxp51"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.63485/c9h8f-ee251", "doi": "10.63485/c9h8f-ee251"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/17nhx-htt54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/14/cdk-literature-2.html",
      "title": "CDK Literature #2",
      "content_html": "<p>Second in a series of articles summarizing articles that cite one of the main CDK articles for\n<a href=\"http://www.cdknews.org/\">CDK News</a>. The <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html\">first CDK Literature <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwas already half a year ago, so it was about time.</p>\n\n<h2 id=\"bioclipse\">Bioclipse</h2>\n\n<p>Nothing much I have to say about that. Just <a href=\"http://chem-bla-ics.blogspot.com/search?q=Bioclipse\">browse my blog</a> and\nyou’ll see that it heavily uses CDK, JChemPaint and Jmol. See also the <a href=\"http://bioclipse.blogspot.com/\">Bioclipse blog</a>. <br />\n<em>Ola Spjuth, Tobias Helmus, Egon Willighagen, Stefan Kuhn, Martin Eklund, Johannes Wagener, Peter Murray-Rust,\nChristoph Steinbeck, Jarl Wikberg, Bioclipse: an open source workbench for chemo- and bioinformatics, BMC Bioinformatics,\n2007, 8(59), doi:<a href=\"https://doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a></em></p>\n\n<h2 id=\"proteomics-in-20052006\">Proteomics in 2005/2006</h2>\n\n<p>Review article on proteomics which mentions the CDK and JChemPaint in the data analysis section, but it does not cite them.\nIt does cite the Bioclipse article though. <br />\n<em>Jeffrey Smith, Jean-Philippe Lambert, Fred Elisma, Daniel Figeys, Proteomics in 2005/2006: Developments, applications\nand challenges, Analytical Chemistry, 2007, 79(12):4325-4343, doi:<a href=\"https://doi.org/10.1021/ac070741j\">10.1021/ac070741j</a></em></p>\n\n<h2 id=\"combinatorial-enumeration\">Combinatorial Enumeration</h2>\n\n<p>Article by Andreas on <a href=\"http://gecco.org.chemie.uni-frankfurt.de/smilib/index.html\">SmiLib</a> (BSD-like license) which\nis library for combinatorial enumeration using building blocks. The CDK is used for the addition of explicit\nhydrogens and the creation of MDL SD files. 
Andreas mentions in the article that the CDK’s SMILES parser ignores\nstereo chemistry. <br />\n<em>Andreas Schüller, Volker Hänke, Gisbert Schneider, SmiLib v2.0: A Java-Based Tool for Rapid Combinatorial Library\nEnumeration, QSAR &amp; Combinatorial Science, 2007, 26(3):407-410, doi:<a href=\"https://doi.org/10.1002/qsar.200630101\">10.1002/qsar.200630101</a></em></p>\n\n<h2 id=\"molecular-query-language\">Molecular Query Language</h2>\n\n<p>This article is also from the group of Gisbert. Ewgenij introduces an open standard SMARTS replacement, covered in\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/30/cdk-news.html\">CDK News in 2005 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. There is an interface to the CDK, but the\nlicense of the reference implementation makes it impossible to distribute it with the CDK itself. This is rather\nunfortunate, because if it would have been possible, a number of implementations in the CDK, such as atom type\nperception, could be based on MQL. See also <a href=\"http://miningdrugs.blogspot.com/2007/01/molecular-query-languages-flexmol-mql.html\">Jörgs blog on MQL</a>. <br />\n<em>Ewgenij Proschak, Jörg Wegner, Andreas Schüller, Gisbert Schneider, Uli Fechner, J. Chem. Inf. Model., 2007, 47(2):295-301,\ndoi:<a href=\"https://doi.org/10.1021/ci600305h\">10.1021/ci600305h</a></em></p>\n\n<h2 id=\"golden-rules-in-mass-spectroscopy\">Golden Rules in Mass Spectroscopy</h2>\n\n<p>Tobias Kind wrote about structure elucidation using mass spectra, and discusses MolGen and CDK’s <code class=\"language-plaintext highlighter-rouge\">DeterministicStructureGenerator</code>,\nand mentions problems with both generators. He has been in contact with the CDK and recently did\n<a href=\"http://sourceforge.net/tracker/index.php?func=detail&amp;aid=1743861&amp;group_id=20024&amp;atid=120024\">extensive tests</a>. 
<br />\n<em>Tobias Kind and Oliver Fiehn, Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass\nspectrometry, BMC Bioinformatics, 2007, 8:105, doi:<a href=\"https://doi.org/10.1186/1471-2105-8-105\">10.1186/1471-2105-8-105</a></em></p>",
      "summary": "Second in a series of articles summarizing articles that cite one of the main CDK articles for CDK News. The first CDK Literature was already half a year ago, so it was about time.",
      
      "date_published": "2007-07-14T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-59", "doi": "10.1186/1471-2105-8-59"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ac070741j", "doi": "10.1021/ac070741j"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-105", "doi": "10.1186/1471-2105-8-105"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1002/qsar.200630101", "doi": "10.1002/qsar.200630101"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci600305h", "doi": "10.1021/ci600305h"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tbd0q-67564",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/13/inter-and-extrapolation-nmr-shift.html",
      "title": "Inter- and Extrapolation: the NMR shift prediction debate",
      "content_html": "<p>Chemical blogspace has seen a lengthy discussion on <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/quality-of-chemical-database.html\">the quality of a few NMR shift prediction programs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand Ryan wanted to make <a href=\"http://acdlabs.typepad.com/my_weblog/2007/07/final-note-on-t.html\">a final statement</a>. Down his blog item\nhe had this quote from Jeff, discussing the use of the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> as external test set:</p>\n\n<blockquote>\n  <p>“Of course customers are really interested in how accurately a prediction program can predict THEIR molecules - not a collection of external data such as NMRShiftDB.”</p>\n</blockquote>\n\n<p>I’m sure none of us knows what weird chemistry people are doing; we will never know what the overlap of the NMRShiftDB test\nset with the customer data set is. The quote suggests it is low, but we simply do not know.</p>\n\n<h2 id=\"interpolation-and-extrapolation\">Interpolation and Extrapolation</h2>\n\n<p>The accuracy of prediction models is very difficult to grasp, and one can only estimate it; using a test set.\nIf few data is available, one may opt for using the training set as test set too, and gives an estimate if the\nmodeling method is able to predict at all. However, the outcome of this exercise is the worst possible estimate\nyou can make. So, when possible you use an independent test set, which does not contain any molecules that were\npresent in the training set. 
(Actually, one could even suggest that this must happen on a shift level, but that\ngives problems with HOSE-code based prediction.)</p>\n\n<p>Now, what Ryan stresses in his <a href=\"http://acdlabs.typepad.com/my_weblog/2007/07/final-note-on-t.html\">latest blog item</a>\nis that if prediction test results for the various available methods do not explicitly state the amount of overlap\nbetween the training and test set, one cannot draw any conclusions. Agreed. I would, however, like to tune this\neven a bit further, after reading the stupid quote (of course, taken out of context). What Jeff probably aimed\nat is that the prediction accuracy is only meaningful to a customer if there is considerable overlap between the customer’s\ndata set and the test set, which is what the model makers do not know.</p>\n\n<p>And the overlap actually goes beyond the overlap in terms of molecular identity. It is really the overlap in terms\nof molecular substructures that matters: a database with alkanes but no phenyl rings will more accurately predict\nother alkanes not present in the training set (interpolation), but will not accurately predict compounds with\nphenyl rings (extrapolation). What the customer needs is that his personal data set does not require extrapolation.\nThat is what matters.</p>\n\n<p>It is interesting to realize, however, that the NMRShiftDB allows you to upload your molecules, or alternatively,\nyou download the software (it’s open source) and the data (it’s open data) if you don’t want to send your molecules\nover the internet, and the NMRShiftDB software will automatically take into account your own data set.</p>\n\n<p>Thus, if you are working on a series of related molecules, you can extend the NMRShiftDB data set with already\nelucidated structures, reducing the prediction error for your related, as yet unknown derivatives. It is that easy\nto include prior/expert knowledge in the NMRShiftDB. 
I believe the ACD/Labs software allows this too, so the\nquote is really meaningless. Not correct, not wrong, simply says nothing.</p>\n\n<h2 id=\"open-data-open-source-open-standards\">Open Data, Open Source, Open Standards</h2>\n\n<p>Now, the various releases of the ACD/Labs software show a simple, understandable trend that increasing the number\nof data you use for the training set, reduces the prediction error. That’s because of various reasons I will not\ngo into in this item. The ACD/Labs NMR databases are expensive, because they have to manually extract and validate\nthe data from literature (see <a href=\"http://acdlabs.typepad.com/my_weblog/2007/06/the_purgatory_d.html\">The Purgatory Database</a>);\nso, during my PhD I only bought the CNMR and HNMR prediction packages. (Off topic: two weeks after I received my\ncopies of the software, ACD/Labs released a new version, which they kindly sent me a copy of too. Common in\nopensource, but much appreciated at that time. Cheers, <a href=\"http://www.acdlabs.com/\">ACD/Labs</a>!)</p>\n\n<p>The ACD/Labs databases are likely expensive because of various reasons. And this is where the ODOSOS concept of the\n<a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> comes in. <strong>Open Data</strong>: if publishers would not copyright their data,\nNMR databases would be much cheaper to set up (see <a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/07/12/do-authors-want-to-give-publishers-a-monopoly-over-their-data/\">this thread in Peter’s blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>);\nassuming ACD/Labs has to pay publishers for actually setting up their database. <strong>Open Source</strong>: the various Blue\nObelisk projects provide the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html\">tools to automatically create a purgatory NMR database <i class=\"fa-solid fa-recycle fa-xs\"></i></a>;\nno humans needed for that any more. 
<strong>Open Standards</strong>: the data from the NMRShiftDB can be downloaded in various\nformats, among which CMLSpect. Being able to easily read the data, made it possible that we actually have this\ndiscussion. Sure, the open data part of the NMRShiftDB is crucial too! But the database could have used an obscure,\nbinary, undocumented, with many software tweaks and special cases, <code class=\"language-plaintext highlighter-rouge\">.doc</code>-like format, which no one could support.</p>\n\n<p>Clearly, ODOSOS gives all, even proprietary, NMR prediction tools a boost, and I am very happy to see that happen.\nIt is the point that we, the Blue Obelisk Movement, are trying to make for some time now.</p>",
      "summary": "Chemical blogspace has seen a lengthy discussion on the quality of a few NMR shift prediction programs , and Ryan wanted to make a final statement. Down his blog item he had this quote from Jeff, discussing the use of the NMRShiftDB as external test set:",
      
      "date_published": "2007-07-13T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["nmr","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k7q92-prm35",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/08/that-big-pile-of-paper.html",
      "title": "That big pile of paper...",
      "content_html": "<p>Everyone of use knows that big pile of paper on your desk that contains the things we want to read, scan or just\nbrowse. I even have <a href=\"http://del.icio.us/egonw/toread\">an electronic equivalent</a>. Another pile contains leaflets\nand glossy folders from conferences, like the <a href=\"http://chem-bla-ics.blogspot.com/search?q=ACS+Chicago\">ACS meeting in Chicago</a>.\nOK, going to get rid of those last ones, and will shortly put the links here.</p>\n\n<p>The first leaflet is from <a href=\"http://www.chemistrycentral.com/\">Chemistry Central</a>, one of the open access publishers.\nActually, not just open access as in free access, but open access as in freedom to reuse it. One things I noticed is this text:\n<em>Our submission system also allows authors to upload figures and reactions schemes in ChemDraw or ISIS/Draw file formats</em>.\nWhat about CMLReact and CML itself? Those are formats I can author with my <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a>\ntools.</p>\n\n<p>Then there is the proprietary <a href=\"http://www.strandls.com/sarchitect/\">Sarchitect</a> in the area of QSAR/QSPR/ADMET.\nNo idea about the scope or whatever. Oh, make sure to check out <a href=\"http://www.qsarworld.com/\">QSAR world</a>,\nwhere <a href=\"http://andygoesus.blogspot.com/\">Andreas</a> has a column too. I also have some information on the\n<a href=\"http://www.rsc.org/virtuallibrary\">RSC Virtual Library</a> which provides free access to the RSC journals for\nRSC member. But I am not. <a href=\"http://www.epa.gov/greenchemistry\">Green Chemistry</a> is nice for the environment,\nof course, but according to the <a href=\"http://www.epa.gov/\">EPA</a>, it’s about more: <em>Cleaner, cheaper, smarter chemistry</em>.\nWhy, oh why, does this financial incentive have to be present all the time? Are we, humans, really that stupid?</p>\n\n<p>I’m sure I had more advertorials, but these must have been the highlights.</p>",
      "summary": "Everyone of use knows that big pile of paper on your desk that contains the things we want to read, scan or just browse. I even have an electronic equivalent. Another pile contains leaflets and glossy folders from conferences, like the ACS meeting in Chicago. OK, going to get rid of those last ones, and will shortly put the links here.",
      
      "date_published": "2007-07-08T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xfcxw-qf593",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/06/standing-on-shoulders-of-blue-obelisk.html",
      "title": "Standing on the shoulders of ... the Blue Obelisk",
      "content_html": "<p><a href=\"http://blog-msb.embo.org/blog/\">The Seven stones</a> wondered <a href=\"http://blog-msb.embo.org/blog/2007/07/what_would_you_do_with_a_petaf_1.html\">what to do with a petaflop in science</a>,\nin response to <a href=\"http://www.declanbutler.info/blog/\">Declan</a>’s <a href=\"http://dx.doi.org/10.1038/448006a\">The petaflop challenge</a> in Nature.\nDeclan discusses in this commentary the increase in computing power and the necessity of parallel programming to make use of it.\nNow, I do have some ideas (e.g. enumerating metabolomic space, mining the RDF graph of our collective biological and chemical\nknowledge base for the one hundred most supported contradictions), but that is not what I want to talk about. It is this fragment\nfrom Declan’s piece:</p>\n\n<blockquote>\n  <p>“I’m amazed at what he can do just using open-source libraries,” [Horst Simon] says. Although there are exceptions, such as\nhigh-energy physics and bioinformatics, many labs keep their software development close to their chests, for fear that their\ncompetitors will put it to better use and get the credit for the academic application of the program. There is little\nincentive to get the software out there, says Simon, and such attitudes plague development.</p>\n</blockquote>\n\n<p>This is something that is very familiar to many of us: developing algorithms for scientific problems is not appreciated.\nIt worries me very much the way the scientific community currently deals with algorithms and data; it seems the community\ndoes not care about correctness or improvement at all, as long as the result illustrates what they think the (bio)chemical\nreality has to offer. 
At least, that is what effectively happens if they do not give proper credit to the scientific\nimportance of software development.</p>\n\n<p>Of course, scientific credibility of software depends on the open source nature of the software:\n“Given enough eyeballs, all bugs are shallow”, <a href=\"http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/\">The Cathedral and the Bazaar</a>,\nE.S. Raymond. Or, in more traditional wording: science, and scientific software, must be reproducible and/or\nfalsifiable. The <a href=\"http://www.blueobelisk.org/\">Blue Obelisk Movement</a> is trying to achieve this\n(DOI:<a href=\"https://doi.org/10.1021/ci050400b\">10.1021/ci050400b</a>).</p>\n\n<h2 id=\"the-open-source-challenge\">The open source challenge</h2>\n\n<p>Therefore, I hereby challenge all experimental chemists and biologists to acknowledge the amount of scientific software\nthey already use, and give credit where credit is due. I challenge them to stand up and say that chemo- and\nbioinformaticians provide the methods they rely on daily to achieve their goals. I challenge them to say that\nthey stand on the shoulders of scientific software developers.</p>\n\n<p>The article should not have been called <em>The petaflop challenge</em>, but <em>The open source challenge</em>.</p>",
      "summary": "The Seven stones wondered what to do with a petaflop in science, in response to Declan’s The petaflop challenge in Nature. Declan discusses in this commentary the increase in computing power and the necessity of parallel programming to make use of it. Now, I do have some ideas (e.g. enumerating metabolomic space, mining the RDF graph of our collective biological and chemical knowledge base for the one hundred most supported contradictions), but that is not what I want to talk about. It is this fragment from Declan’s piece:",
      
      "date_published": "2007-07-06T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["blue-obelisk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/448006a", "doi": "10.1038/448006a"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci050400b", "doi": "10.1021/ci050400b"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wm7aq-eqz10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html",
      "title": "Atom typing in the CDK",
      "content_html": "<p>Atom typing is one of principal activities in chemoinformatics. Atom types provide additional information that cannot be derived\nfrom the connection table that is being used, or may define what force fields terms should be used. This makes perception of\natom types very important.</p>\n\n<p>The <a href=\"http://cdk.sf.net/\">CDK</a> has a few places where atom types are perceived. The <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/tools/HydrogenAdder.java\">HydrogenAdder</a>\nand <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/tools/ValencyChecker.java\">ValencyChecker</a> are two examples.\nGetting the perception wrong, makes it impossible to correctly add hydrogens (of course, hydrogen should always be explicit!) For a\nlong time, these perception algorithms have been embedded in the classes that used them, but efforts have been undertaken to refactor\nthe algorithms into separate classes. These can be found in the package <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/atomype/\">cdk/atomtype/</a>.</p>\n\n<h2 id=\"different-applications-different-scheme\">Different applications, different scheme</h2>\n\n<p>Now, the CDK can be a bit confusing with respect to the HydrogenAdder and <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/tools/IValencyChecker.java\">IValencyChecker</a>.\nOriginally, the CDK had only one atom type list, the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/valency_atomtypes.xml\">StructGen Atom Types</a>.\nThis list was used by the deterministic structure generator (and still is), and only defined atom types for neutral atoms, and does not know anything about hybridization states.</p>\n\n<p>The first bug reports dropped in when people applied the HydrogenAdder to charged molecules. 
However, as said, charged atoms were not defined and the algorithm failed,\nnot silently, just gave the wrong answer. Therefore, the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/valency_atomtypes.xml\">Valency Atom Types</a>\nlist was set up, which does include charged atoms. Everyone happy again.</p>\n\n<p>Later, bugs were reported about the SMILES parser, which comes with additional problems: bond orders are not explicit, and have to be\ndeduced from the connectivity; atom type perception is the only way to decide how many bonds an atom should have, and with what bond\norder. However, SMILES defines hybridization states, and the CDK did not have an atom type list with hybridization information. So,\nwhile the Valency Atom Types list was extended from the StructGen Atom Type List, a new list was created extending from the Valency\nAtom Type list: the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/hybridization_atomtypes.xml\">Hybridization Atom Types</a>\nlist.</p>\n\n<p>Since then, applications asked for other atom type lists, such as the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/mm2_atomtypes.xml\">MM2</a>,\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/mmff94_atomtypes.xml\">MMFF94</a>,\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/pdb_atomtypes.xml\">PDB</a>, and\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/mol2_atomtypes.xml\">Sybyl</a> atom\ntypes. The first two are used for the force field code in the CDK, while the latter two are used for the respective\nIChemObjectReaders.</p>\n\n<h2 id=\"junit-testing-the-perceivers\">JUnit testing the perceivers</h2>\n\n<p>Not all applications actually already make use of the new atom type perception classes in cdk.atomtype. 
It is desirable that these are well tested\nbefore they replace code in the classes that use those atom types. Therefore, Rajarshi and I have been working on JUnit test suites. The\nlatest step in this process was that I transformed the test classes to extend a new JUnit4-based\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/test/atomtype/AbstractAtomTypeTest.java\">AbstractAtomTypeTest</a> class.\nNew in this class is that it reports which atom types in the atom type list have been tested, and the test will fail if not all atom types\nare tested. The StructGen Atom Types list is mostly covered now, but for all other lists tests still have to be written (monitor the progress\non <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/test/result-core.html\">CDK Nightly</a>).</p>\n\n<p>For the MOL2 atom type list, there is no Java implementation of the IAtomTypeMatcher, but we have Fortran code that can be ported (provided\nby Martin Ott). Anyone interested?</p>",
      "summary": "Atom typing is one of principal activities in chemoinformatics. Atom types provide additional information that cannot be derived from the connection table that is being used, or may define what force fields terms should be used. This makes perception of atom types very important.",
      
      "date_published": "2007-07-01T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","junit"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8hkrb-cb907",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/27/chemical-rdfa-with-operator-in-firefox.html",
      "title": "Chemical RDFa with Operator in the Firefox toolbar",
      "content_html": "<p>December last year <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">I proposed the use of microformats and RDFa <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nfor simple semantic markup of molecular information. I linked that with the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">InChI extension for the Postgenomic.com software <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nfor <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> and wrote these tools to work with the markup:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html\">wrote a Greasemonkey script to automatically link to webservices <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,</li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/02/chemistry-in-html-javascript-from.html\">explained how that script can be used on the server <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and</li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/05/05/cb-comments-for-inchis.html\">adapted a Greasemonkey script to show blog items related to molecules <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</li>\n</ul>\n\n<p>All using the new semantic markup.</p>\n\n<p>Of the two, I think RDFa has the best future. Then I <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/05/11/added-my-hcard-to-my-blog.html\">discovered Operator <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nwritten by <a href=\"http://www.kaply.com/weblog/\">Mike</a>. While the Greasemonkey scripts already allow me to link to, for example, PubChem and eMolecules,\nthe <a href=\"https://addons.mozilla.org/en-US/firefox/addon/4106\">Operator Firefox Addon</a> allowed me to open vCards incorporated in HTML pages directly\nto my address book client. 
Thus, I could open chemistry directly in <a href=\"http://bioclipse.net/\">Bioclipse</a> too!</p>\n\n<p>That was the idea, at least. I contacted Mike, and he asked me to wait until the first 0.8 releases, which he\n<a href=\"http://www.kaply.com/weblog/2007/06/04/operator-08a-is-available/\">announced earlier this month</a>.\nThis version allows user scripts to be written, which define how RDFa should be handled. And with his patience and help, this was the result:</p>\n\n<p><img src=\"/assets/images/pubchemRDFa.png\" alt=\"\" /></p>\n\n<p>The HTML is almost <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html\">as explained before <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and looks like:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"http://www.w3.org/2002/06/xhtml2/\"</span><span class=\"nt\">&gt;</span>\n\n<span class=\"nt\">&lt;h1&gt;</span>Chemical RDFa with Operator<span class=\"nt\">&lt;/h1&gt;</span>\n\n<span class=\"nt\">&lt;div</span> <span class=\"na\">about=</span><span class=\"s\">\"#chem_123\"</span> <span class=\"na\">xmlns:chem=</span><span class=\"s\">\"http://www.blueobelisk.org/chemistryblogs/\"</span><span class=\"nt\">&gt;</span>\n  Methane has the following identifier: <span class=\"nt\">&lt;span</span> <span class=\"na\">property=</span><span class=\"s\">\"chem:inchi\"</span><span class=\"nt\">&gt;</span>InChI=1/CH4/h1H4<span class=\"nt\">&lt;/span&gt;</span>\n<span class=\"nt\">&lt;/div&gt;</span>\n\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>\n\n<p>It is important here to wrap the statement in a <code class=\"language-plaintext highlighter-rouge\">&lt;div&gt;</code> element and to add the <code class=\"language-plaintext highlighter-rouge\">@about</code> attribute to it, defining the Subject. 
Moreover,\nyou need to use the <code class=\"language-plaintext highlighter-rouge\">@property</code> attributes instead of <code class=\"language-plaintext highlighter-rouge\">@class</code>. The content of this attribute defined the Predicate, and the content of the\n<code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code> element is the Object, completing the RDF triple.</p>\n\n<p>Operator detects these RDFa statements from the HTML, and creates a new menu item <em>Search in Pubchem</em> using this piece of code:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">pubchem_inchi</span> <span class=\"o\">=</span> <span class=\"p\">{</span>\n  <span class=\"na\">description</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">Search in PubChem</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">short</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">PubChem</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">scope</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n    <span class=\"na\">semantic</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n      <span class=\"dl\">\"</span><span class=\"s2\">RDFa</span><span class=\"dl\">\"</span> <span class=\"p\">:</span>  <span class=\"p\">{</span>\n        <span class=\"na\">property</span> <span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">http://www.blueobelisk.org/chemistryblogs/inchi</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n        <span class=\"na\">defaultNS</span> <span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">http://www.blueobelisk.org/chemistryblogs/</span><span class=\"dl\">\"</span>\n      <span class=\"p\">}</span>\n    <span class=\"p\">}</span>\n  <span class=\"p\">},</span>\n  
<span class=\"na\">doAction</span><span class=\"p\">:</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">semanticObject</span><span class=\"p\">,</span> <span class=\"nx\">semanticObjectType</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">semanticObjectType</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">RDFa</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n      <span class=\"k\">return</span> <span class=\"dl\">\"</span><span class=\"s2\">http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&amp;DB=pccompound&amp;term=%22</span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">semanticObject</span><span class=\"p\">.</span><span class=\"nx\">inchi</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"s2\">%22[InChI]</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"p\">}</span>\n  <span class=\"p\">}</span>\n<span class=\"p\">};</span>\n\n<span class=\"nx\">SemanticActions</span><span class=\"p\">.</span><span class=\"nf\">add</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">pubchem_inchi</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"nx\">pubchem_inchi</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>You can reproduce this by installing Operator 0.8a in Firefox, saving the script to a file in your home directory, and\nreading it via the Operator “Options” dialog. Make sure to also set the <em>Display Style</em> in the <em>General</em> tab of the dialog to\n<em>Data formats</em>. Only then will the RDFa magic kick in.</p>\n\n<p>Adding support for eMolecules, ChemSpider and whatever else we like is easy now. 
What I still need to explore (or ask Mike),\nis how I can trigger the <em>Open With/Save As</em> dialog of Firefox.</p>",
      "summary": "December last year I proposed the use of microformats and RDFa for simple semantic markup of molecular information. I linked that with the InChI extension for the Postgenomic.com software for Chemical blogspace and wrote these tools to work with the markup:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pubchemRDFa.png",
      "date_published": "2007-06-27T00:10:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["pubchem","rdf","userscript","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ht8k2-aed07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/27/qsar-plugin-for-bioclipse-getting-in.html",
      "title": "QSAR plugin for Bioclipse getting in shape",
      "content_html": "<p>Over the last few weeks I continued the work on getting (descriptor-based) <a href=\"http://en.wikipedia.org/wiki/QSAR\">QSAR</a>/QSPR implemented in\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>. <a href=\"http://joelib.sf.net/\">JOELib</a> (GPL) and the <a href=\"http://cdk.sf.net/\">CDK</a> (LGPL) being two prominent\nopensource engines that can calculate molecular descriptors, and <a href=\"http://ambit.acad.bg/\">AMBIT</a> a front-end.</p>\n\n<p>To be able to do QSAR/QSPR model building from start to end in Bioclipse, I worked in April\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/04/24/bioclipse-now-allows-qsar-descriptor.html\">on an architecture for selecting descriptors <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBeing busy with so many things, it took me some time to get around to completing that, but here are the screenshots:</p>\n\n<p><img src=\"/assets/images/bioQSAR1.png\" alt=\"\" /></p>\n\n<p>The funny characters and the whitespace is gone. Right now, it still only lists one provider, but I plan to add JOELib plugin soon.\nThe list of actual descriptors is provided by the extension.</p>\n\n<p>What Bioclipse then does, is have the extension calculate the descriptor values for the selected <code class=\"language-plaintext highlighter-rouge\">CDKResource</code> in the BioNavigator\nusing the selected descriptors. This will then create a new <code class=\"language-plaintext highlighter-rouge\">MatrixResource</code> in the Bioclipse workspace (currently called\nqsarResult.jam), and which is opened in the Matrix editor:</p>\n\n<p><img src=\"/assets/images/bioQSAR1.png\" alt=\"\" /></p>\n\n<p>There is still enough work left to do. For example, the columns are not yet labeled according to the descriptor name, and\nselecting more than one <code class=\"language-plaintext highlighter-rouge\">CDKResource</code> in the navigator does not give a multirow matrix yet.</p>",
      "summary": "Over the last few weeks I continued the work on getting (descriptor-based) QSAR/QSPR implemented in Bioclipse. JOELib (GPL) and the CDK (LGPL) being two prominent opensource engines that can calculate molecular descriptors, and AMBIT a front-end.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioQSAR.png",
      "date_published": "2007-06-27T00:00:00+00:00",
      "date_modified": "2025-08-10T00:00:00+00:00",
      "tags": ["bioclipse","qsar","cdk","ambit"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a1np2-c3x03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/25/test-file-repository-and-relaxng.html",
      "title": "Test File Repository and RelaxNG",
      "content_html": "<p>Last week I started the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> <a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/ctfr/trunk/\">Chemical Test File Repository</a>,\na repository of <a href=\"http://www.opensource.org/licenses\">OSI-approved-licence</a>d test files (from various sources) to improve interoperability between\nchemoinformatics software.</p>\n\n<p>Following a discussion on the mailing list earlier, a directory hierarchy has been set up, and each files contains an index.xml to describe\nthe content. In case of a directory with actual test files, it may look like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;dir</span> <span class=\"na\">name=</span><span class=\"s\">\"asn/pubchem/valid\"</span> <span class=\"na\">xmlns:dc=</span><span class=\"s\">\"http://purl.org/dc/elements/1.1/\"</span><span class=\"nt\">&gt;</span>\n\n  <span class=\"nt\">&lt;chemfiles&gt;</span>\n\n    <span class=\"nt\">&lt;file</span> <span class=\"na\">name=</span><span class=\"s\">\"cid1.asn\"</span> <span class=\"na\">valid=</span><span class=\"s\">\"yes\"</span><span class=\"nt\">&gt;</span>\n       <span class=\"nt\">&lt;dc:format&gt;</span>chemical/x-asn-pubchem<span class=\"nt\">&lt;/dc:format&gt;</span>\n       <span class=\"nt\">&lt;dc:source&gt;</span>PubChem<span class=\"nt\">&lt;/dc:source&gt;</span>\n       <span class=\"nt\">&lt;dc:creator&gt;</span>Unknown<span class=\"nt\">&lt;/dc:creator&gt;</span>\n       <span class=\"nt\">&lt;dc:rights&gt;</span>PublicDomain<span class=\"nt\">&lt;/dc:rights&gt;</span>\n       <span class=\"nt\">&lt;test</span> <span class=\"na\">by=</span><span class=\"s\">\"CDK\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;/file&gt;</span>\n\n  <span class=\"nt\">&lt;/chemfiles&gt;</span>\n\n<span class=\"nt\">&lt;/dir&gt;</span>\n</code></pre></div></div>\n\n<p>As is clear, <a 
href=\"http://en.wikipedia.org/wiki/Dublin_core\">Dublin Core</a> is reused for much of the metadata.</p>\n\n<p>To ensure some quality, the XML must be valid and not just well-formed, so that I can set up XSLT stylesheets to create XHTML indices and\nsummaries. Therefore, I wanted to set up a schema for the index.xml files. My first thought was to use <a href=\"http://en.wikipedia.org/wiki/Xml_schema\">XML Schema</a>,\nwhich supports XML Namespaces and has well-defined (and extensible) data types. I have hacked in it in the past, but the details have slipped my mind.\nI already worked with DTDs in 1998, around the time that the XML specification was declared a recommendation. Originating from the SGML years,\nthe DTD format is not XML based, has no knowledge of namespaces, and supports only a limited number of data types.</p>\n\n<p>Then there is <a href=\"http://en.wikipedia.org/wiki/RELAX_NG\">RELAX NG</a>. It is XML based, uses the same data types as XML Schema, and has support for namespaces.\nSince I had to look up the specs for either DTD or XML Schema for the details anyway (e.g. on how to allow the DC namespace in the main namespace),\nwhy not try something new? Well, I was amazed. RELAX NG has the syntactic simplicity of DTD, but the functionality of XML Schema. So,\nin 30 minutes I hacked up an XML spec for the test file repository, including a (too short) list of recognized MIME types. It is just a combination of some\n<code class=\"language-plaintext highlighter-rouge\">&lt;element&gt;</code>, <code class=\"language-plaintext highlighter-rouge\">&lt;attribute&gt;</code>, <code class=\"language-plaintext highlighter-rouge\">&lt;oneOrMore&gt;</code>, etc. elements. The result is available as <a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/ctfr/trunk/schema.relaxng\">schema.relaxng</a>\nin SVN.</p>",
      "summary": "Last week I started the Blue Obelisk Chemical Test File Repository, a repository of OSI-approved-licenced test files (from various sources) to improve interoperability between chemoinformatics software.",
      
      "date_published": "2007-06-25T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["blue-obelisk","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mbw05-t5888",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/25/nature-should-host-our-electronic-lab.html",
      "title": "Nature should host our Electronic Lab Notebooks",
      "content_html": "<p><a href=\"http://pbeltrao.blogspot.com/\">Pedro</a> suggested in <a href=\"http://network.nature.com/\">Nature Network</a>s <a href=\"http://network.nature.com/forum/whats-next\">What’s Next</a>\nforum that Nature should add a new service for scientists: hosting electronic lab notebooks. And I think this will be a killer application.\nI am rather excited about the idea, and feel ashamed not putting one-and-one together myself. We have our\n<a href=\"http://www.blueobelisk.org/\">chemoinformatics tools</a> and <a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a>\nis just around the corner, that combined with <a href=\"http://hdl.handle.net/10042/23\">semantic wikis</a>, and we have <em>science of the 21st century</em>.\nThis is <a href=\"http://network.nature.com/forums/whats-next/5?page=6#reply-508\">my reply</a> posted on Nature Network:</p>\n\n<blockquote>\n  <p>Pedro, that might be an interesting idea: Nature hosting ELN. with much content, I have been maintaining a wiki in my previous postdoc,\nas replacement for the old paper notebook. Allows me to make links etc. I plan to do this in my new postdoc too, maybe even with a\nRDF-enabled wiki, to have agents automatically verify what I enter for inconsistencies. These things are already possible; just a\nmatter of doing it.</p>\n\n  <p>If Nature would host such a service (RDF-enabled, and integrated with their other pages), they have a true killer for me: I write\nmy ELN items, and for each page I decide if I want to make it public; since it is a wiki, I can keep it private until happy about\nthe results, or, simply, until the experiment has finished. Then, by clicking a button it would become CC+attribution and\nautomatically end up in Nature Preceedings. 
The full integration of Scintilla/Postgenomic/Connotea comes in when making links to\nbackground material.</p>\n\n  <p>The RDF is important for validating what I write, and I can imagine that Nature has an extensive set of default agents (of course,\nin addition to spell checking etc :). These agents check if the chemical reaction equations makes sense (conservation of mass,\natom count, etc), that NMR/MS spectra and other experimental properties are consistent with that equation, and whatever else\nwe can come up with. The tools for this validation are available, and basically only the glue is missing.</p>\n</blockquote>",
      "summary": "Pedro suggested in Nature Networks What’s Next forum that Nature should add a new service for scientists: hosting electronic lab notebooks. And I think this will be a killer application. I am rather excited about the idea, and feel ashamed not putting one-and-one together myself. We have our chemoinformatics tools and RDF is just around the corner, that combined with semantic wikis, and we have science of the 21st century. This is my reply posted on Nature Network:",
      
      "date_published": "2007-06-25T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["nature","eln","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4n5zx-19r45",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/22/archiving-spectra-use-inchi-and-cml.html",
      "title": "Archiving spectra: use InChI and CML",
      "content_html": "<p><a href=\"http://acdlabs.typepad.com/my_weblog/\">Ryan</a> blogged in <a href=\"http://acdlabs.typepad.com/my_weblog/2007/06/archive_this.html\">Archive This</a>\nabout some advices from ACD on how to store spectra in your electronic lab notebook.</p>\n\n<h2 id=\"use-inchi\">Use InChI</h2>\n\n<p>This reminded me of a <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html\">discussion I had with with Colin <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwhen he was at the CUBIC, which was about experimental sections. I proposed that the <a href=\"http://www.iupac.org/inchi/\">InChI</a> should have a\nprominent place in the experimental section. An important argument for this is that it allows well-defined atom numbering to be used when\nwriting down the NMR bits in that section: the InChI gives a unique numbering, so that the numbering used in the experimental section\nbecomes author neutral. Because the InChI puts the carbons up front, the <sup>13</sup>C NMR details get numbers from 1-13, or whatever\nthe carbon count is. For proton NMR it is not difficult either, they are simply numbered according to the heavy atom to which they are\nattached. For situations where two hydrogens attached to the same heavy atom have different shifts, then a and b can still be used.\nThe numbers are easily added to 2D diagrams anyway.</p>\n\n<p>If software vendors (e.g. <a href=\"http://www.acdlabs.com/\">ACD</a> and <a href=\"http://bioclipse.net/\">Bioclipse</a>) and publishers (e.g. 
ACS,\n<a href=\"http://www.rsc.org/Publishing/Journals/ProjectProspect/\">RSC</a>, <a href=\"http://www.chemistrycentral.com/\">Chemistry Central</a>) could adopt this\nproposal, then experimental sections would immediately be more machine-parsable and ready for automatic processing, such as discussed in\nmy blog item <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html\">Chemical Archeology: OSCAR3 to NMRShiftDB.org <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand by <a href=\"http://www.acscinf.org/dbx/mtgs/232nm/232cinfprogram.asp\">Christoph at the ACS meeting</a>, available as\n<a href=\"http://acscinf.org/docs/meetings/232nm/presentations/232nm101.pdf\">PDF</a> and this 18MB\n<a href=\"http://acscinf.org/docs/meetings/232nm/presentations/232nm101.mp3\">MP3</a>.</p>\n\n<h2 id=\"use-cml\">Use CML</h2>\n\n<p>Even better is to use <a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a> for this, or CMLSpect to be precise (the paper is accepted,\nand should appear soon). This XML-based language allows the full semantic markup of all the experimental details and all the interesting\nassignments you want to archive. I would like to <strong>challenge ACD</strong> to follow Bioclipse’s lead and provide export as CMLSpect for spectral\nassignments and markup of experimental details, in addition to the PDF in whatever format they prefer. Cheers for the work by Tobias\nand Stefan on spectrum support in Bioclipse!</p>",
      "summary": "Ryan blogged in Archive This about some advices from ACD on how to store spectra in your electronic lab notebook.",
      
      "date_published": "2007-06-22T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["cml","inchi","nmr"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yeg40-ebt62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/new-job-post-doc-at-wur-on-ms-based.html",
      "title": "A new job: post-doc at the WUR on MS based structure elucidation",
      "content_html": "<p>On July 1st I will start a post-doc in <a href=\"http://en.wikipedia.org/wiki/Wageningen\">Wageningen</a>, The Netherlands at the\n<a href=\"http://www.wur.nl/\">WUR</a>. More precisely, with a post-doc in the group of Prof. Van Eeuwijk at <a href=\"http://www.biometris.wur.nl/UK/\">Biometris</a>,\ncooperating with the group of Prof. Hall at <a href=\"http://www.pri.wur.nl/UK/\">Plant Research International</a> (PRI), within the framework of\nthe new <a href=\"http://www.metabolomicscentre.nl/\">Netherlands Metabolomics Center</a>. The topic will be structure elucidation using mass\nspectral data originating from the experimental department of PRI, and will be a nice follow up on the work on SENECA\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/04/06/cubic-period-is-over.html\">I have been doing last year <i class=\"fa-solid fa-recycle fa-xs\"></i></a> in the group of\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Dr. Christoph Steinbeck</a> at the <a href=\"https://www.cubic.uni-koeln.de/\">CUBIC</a>.</p>",
      "summary": "On July 1st I will start a post-doc in Wageningen, The Netherlands at the WUR. More precisely, with a post-doc in the group of Prof. Van Eeuwijk at Biometris, cooperating with the group of Prof. Hall at Plant Research International (PRI), within the framework of the new Netherlands Metabolomics Center. The topic will be structure elucidation using mass spectral data originating from the experimental department of PRI, and will be a nice follow up on the work on SENECA I have been doing last year in the group of Dr. Christoph Steinbeck at the CUBIC.",
      
      "date_published": "2007-06-19T00:20:00+00:00",
      "date_modified": "2025-08-10T00:00:00+00:00",
      "tags": ["career","metabolomics"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/49wqj-62k11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/quality-of-chemical-database.html",
      "title": "Quality of Chemical Database",
      "content_html": "<p>Lately, <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> has seen an interesting discussion on the quality of opendata and free chemical database (over\n<a href=\"https://doi.org/10.59350/jy0f5-7m219\">32 free resources now <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), such as the\n<a href=\"http://nmrshiftdb.org/\">NMRShiftDB.org</a>. For example, see <a href=\"http://www.chemspider.com/blog/?p=44\">Antony’s view on the NMRShiftDB</a>\nand <a href=\"http://nmrpredict.orc.univie.ac.at/csearch_summary/more_or_less_than_250_errors.html\">Robien’s analysis</a>.</p>\n\n<p><a href=\"http://en.wikipedia.org/wiki/Open_data\">Opendata</a> makes such quality assurance possible, and I am happy that the NMRShiftDB was\nexplored like this; the found problems can be reported and corrected. If correcting them upstream is difficult, opendata allows\none to make a better derivative; that’s what opendata is about. For example, <a href=\"http://biometa.cmbi.ru.nl/\">BioMeta</a>\n(DOI:<a href=\"https://doi.org/10.1186/1471-2105-7-517\">10.1186/1471-2105-7-517</a>) took data from KEGG and corrected a lot of molecular\nproblems (like reaction balancing, stereo chemistry, etc).</p>\n\n<p>I have contributed almost 900 spectra to the NMRShiftDB, and I am sure I may have made a mistake here and there. But my submission is verified\nby a reviewer, and furthermore, users of the database can report inconsistencies via the NMRShiftDB.org website. Now, I have focused on uncommon\nNMR nuclei, like <sup>11</sup>B, <sup>195</sup>Pt and <sup>29</sup>Si (see the <a href=\"http://nmrshiftdb.ice.mpg.de/nmrshiftdbhtml/statistics.html\">stats</a>),\nwhich tend to have only one peak. Nothing much that can go wrong; still, one or two errors were catched by the reviewer.</p>\n\n<h2 id=\"ensuring-data-quality\">Ensuring data quality</h2>\n\n<p>Humans make errors, but not even only when data is entered; they make mistakes checking data too. 
Not much can\nbe done about that, other than using computers to find patterns. This is exactly what Robien did: he used his software,\nwhich implements common patterns, to find entries in the database that did not comply with those patterns.</p>\n\n<p>Automated quality assurance requires an easy-to-use, machine-readable interface. For example, CMLRSS\n(DOI:<a href=\"https://doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>) can be used for running new entries in databases\nagainst known patterns. But other interfaces are most welcome too. Rich recently\n<a href=\"https://doi.org/10.59350/zwnp1-qy767\">discussed the new PUG interface <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nwhich offers an interface to <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>.</p>\n\n<p>German scientists offer an RDF interface to <a href=\"http://wikipedia.org/\">Wikipedia</a>: <a href=\"http://dbpedia.org/\">DBPedia</a>.\nInformal semantic markup in Wikipedia, such as the <a href=\"http://en.wikipedia.org/wiki/Wikipedia:Infobox_templates\">Infobox template</a>,\n<a href=\"http://dbpedia.org/docs/\">is used to create triples</a>. It’s a shame that the <a href=\"http://en.wikipedia.org/wiki/Template:Chembox\">ChemBox</a>\nis not used yet, which would make <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html\">detecting molecules in blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\neven easier.</p>",
      "summary": "Lately, Chemical blogspace has seen an interesting discussion on the quality of opendata and free chemical database (over 32 free resources now ), such as the NMRShiftDB.org. For example, see Antony’s view on the NMRShiftDB and Robien’s analysis.",
      
      "date_published": "2007-06-19T00:10:00+00:00",
      "date_modified": "2025-08-10T00:00:00+00:00",
      "tags": ["opendata","chemistry","pubchem","rdf","nmrshiftdb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-7-517", "doi": "10.1186/1471-2105-7-517"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/jy0f5-7m219", "doi": "10.59350/jy0f5-7m219"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/zwnp1-qy767", "doi": "10.59350/zwnp1-qy767"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c0pvq-c1988",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html",
      "title": "Using Wikipedia to recognize Molecules in Blogspace",
      "content_html": "<p>Only few people are <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">using InChI’s to indicate the molecules the blog about <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(prominent exceptions are <a href=\"http://usefulchem.blogspot.com/\">Useful Chemistry</a> and <a href=\"http://www.scienceblogs.com/moleculeoftheday/\">Molecule of the Day</a>).\nConsequently, the number of detected molecules (without using OSCAR3) in <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> has been low.</p>\n\n<p>Fortunately, many more people use links to <a href=\"http://wikipedia.org/\">Wikipedia</a> to identify the molecules that talk about. And some of these pages\nuse the <a href=\"http://en.wikipedia.org/wiki/Template:Chembox\">ChemBox template</a> which actually might contain a\n<a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> CID or even an <a href=\"http://www.iupac.org/inchi/\">InChI</a>. This has increased the\n<a href=\"http://cb.openmolecules.net/inchis.php\">molecular content of Chemical blogspace</a> considerably.</p>\n\n<p>There is also, however, a good list of molecules in Wikipedia for which no CID or InChI is given:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>http://www.en.wikipedia.org/wiki/Hafnium(IV)_oxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cubane -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/water -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/oxidane -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Carminic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Alizarin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/AIBN -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/piperidine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/hydroxide -&gt; but no 
InChI/CID\nhttp://www.en.wikipedia.org/wiki/tetrahydrocannabinol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Epibatidine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/cortisone -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Eschenmoser%27s_salt -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/pyrrole -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/anthracene -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/benzylbromide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Skatole -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Teicoplanin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Methyl_violet -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Penicillin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Aspartame -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Splenda -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Sucrose -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Rhodamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Ascorbic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tabun_(nerve_agent) -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Soman -&gt; but no InChI/CID\nhttp://www.wikipedia.org/wiki/Phosgene -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/AZD2171 -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Heavy_water -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/MTBE -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Biotin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Spermine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Silicon_carbide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/stilbene -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Methyl_salicylate -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Dmso -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/DMF -&gt; but no 
InChI/CID\nhttp://www.en.wikipedia.org/wiki/Acetonitrile -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/HMPA -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Phenol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/TBHQ -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/MTBE -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Salvia_divinorum -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/salvinorin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tetrahydrocannabinol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Selenium_dioxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Piperidine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Resveratrol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/P4O10 -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Dimethyl_sulfide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Folate -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Hydroxybenzotriazole -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Hydrogen_cyanide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Peroxyacetic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/epothilone -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/paraquat -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/N-butyllithium -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Nafion -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Boron_nitride -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Triclosan -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Hydrogen_peroxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cholesterol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/DMAP -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/aniline -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Phenol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Ascorbic_acid -&gt; but no 
InChI/CID\nhttp://www.en.wikipedia.org/wiki/Nicotine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tetra-ethyl_lead -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Acetophenone -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Ethanol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Acetaldehyde -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/EDTA -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Menthol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Formic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Octanitrocubane -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/VX_%28nerve_agent%29 -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tetraazidomethane -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Lawesson%27s_reagent -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Hexafluoroisopropanol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cellulose -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Bremelanotide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cellulose -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Dimethicone#Applications -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Shikimic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Methyl_amine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Dimethyl_amine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/DDT -&gt; but no InChI/CID\n</code></pre></div></div>\n\n<p>I really would like to start adding InChI’s for these molecules to Wikipedia, but someone needs to enlighten me about\nthe state of ChemBox? Can the InChI be added to the template, or should the InChI be given elsewhere on the page?\nAdding such small bits is easier than <a href=\"http://mndoci.com/blog/2007/06/17/writing-something-on-wikipedia/\">writing a full entry</a>.</p>",
      "summary": "Only few people are using InChI’s to indicate the molecules the blog about (prominent exceptions are Useful Chemistry and Molecule of the Day). Consequently, the number of detected molecules (without using OSCAR3) in Chemical blogspace has been low.",
      
      "date_published": "2007-06-19T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["wikipedia","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yt9bg-zdp74",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/16/payed-summer-jobs-in-chemoinformatics.html",
      "title": "Payed summer jobs in chemoinformatics",
      "content_html": "<p>Last year the <a href=\"http://www.programmeerzomer.nl/\">Programmeerzomer.nl</a> sponsored one summer student to work on <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\n(see the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/20/dutch-summer-of-code-sponsors.html\">announcement <i class=\"fa-solid fa-recycle fa-xs\"></i></a>). The Programmeerzomer is much like the\n<a href=\"http://kemistry-desktop.blogspot.com/2007/04/chemical-semantic-desktop.html\">Google Summer of Code where I mentor Alexandr</a>. However, it is much\nsmaller and oriented at just the <a href=\"http://en.wikipedia.org/wiki/Netherlands\">NL area</a>: both the student and the mentor needs to be Dutch,\nbut the opensource project does not.</p>\n\n<p>Rob worked last year on a <a href=\"http://wiki.bioclipse.net/index.php?title=Ghemical_plugin\">Ghemical plugin for Bioclipse</a> (see\n<a href=\"http://www.programmeerzomer.nl/interviews/rob_schellhorn/_rp_links1_elementId/1_1257\">this interview in Dutch</a>). The architecture for doing\ncalculations (the Compute plugin) is still being used within several other plugins. This year I got assigned two students: one for\nBioclipse and one for <a href=\"http://www.jmol.org/\">Jmol</a>.</p>\n\n<p>I have no idea at this moment what ideas the students picked from the lists in the wikis (see the\n<a href=\"http://wiki.jmol.org:81/index.php/ProgrammeerZomer\">Jmol project idea</a> and <a href=\"http://wiki.bioclipse.net/index.php?title=SummerOfCode\">Bioclipse idea</a>\nlists). 
There is a meeting scheduled on the 25th.</p>\n\n<p>The ideas include:</p>\n<ul>\n  <li>Jmol\n    <ul>\n      <li>an SWT widget for Eclipse RCP-based applications</li>\n      <li><a href=\"http://wiki.jmol.org:81/index.php/SoCPharmacophores\">Pharmacophore rendering</a></li>\n      <li>Support for PDB in XML</li>\n      <li><a href=\"http://wiki.jmol.org:81/index.php/SoCAjaxJS\">Ajax-enabled JavaScript library</a> (for the applet)</li>\n    </ul>\n  </li>\n  <li>Bioclipse\n    <ul>\n      <li><a href=\"http://wiki.bioclipse.net/index.php?title=Validating_CML_editor\">a validating CML editor</a></li>\n      <li><a href=\"http://wiki.bioclipse.net/index.php?title=Gromacs_plugin\">a GROMACS plugin</a></li>\n      <li><a href=\"http://wiki.bioclipse.net/index.php?title=Sequence_editor\">a sequence editor</a></li>\n      <li><a href=\"http://wiki.bioclipse.net/index.php?title=SummerOfCode\">webservices over Jabber’s XMPP</a></li>\n      <li>Beanshell and Jython scripting plugins</li>\n    </ul>\n  </li>\n</ul>\n\n<p>If you have a suggestion, it would be much appreciated if you could add it to the wiki pages linked above. Make sure to leave a comment on this blog item too, announcing the new idea!</p>",
      "summary": "Last year the Programmeerzomer.nl sponsored one summer student to work on Bioclipse (see the announcement ). The Programmeerzomer is much like the Google Summer of Code where I mentor Alexandr. However, it is much smaller and oriented at just the NL area: both the student and the mentor needs to be Dutch, but the opensource project does not.",
      
      "date_published": "2007-06-16T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["jmol","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rxzh8-9yr48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/10/janocchio-jmol-and-cdk-based-1h.html",
      "title": "Janocchio: Jmol and CDK based 1H coupling constant prediction",
      "content_html": "<p>While looking up a reference for <a href=\"http://firstglance.jmol.org/\">FirstGlance in Jmol</a>, I found <a href=\"https://sourceforge.net/projects/janocchio/\">Janocchio</a>,\na <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://www.jmol.org/\">Jmol</a> based tool for prediction of coupling constants,\n<a href=\"https://doi.org/10.1002/mrc.2016\">recently published</a> in <a href=\"http://www3.interscience.wiley.com/cgi-bin/jhome/3767\">Magnetic Resonance in Chemistry</a>.\nIt’s written by Evans, Bodkin, Baker and Sharman (from <a href=\"http://lilly.com/\">Eli Lilly</a>) and licensed LGPL. It is one of those rare contributions of\npharmaceutical industry, and I can only deeply appreciate this contribution.</p>\n\n<p>A quote from the article:</p>\n\n<blockquote>\n  <p>It was therefore decided to create a Java application and applet,\n‘JAva NOe and Coupling Calculator with Handy Interactive Operation’\n(Janocchio), using the open source libraries of the molecular viewer Jmol\nand the Chemical Development Kit (CDK). It aims to provide a simple and\nintuitive way to calculate both the NOEs and couplings.</p>\n</blockquote>\n\n<p>Release 1.0.1 of last May uses an old Jmol, and the CDK release from 26 August 2005. A bit outdated, and I am wondering if it would\nbe a lot of work to integrate this into Bioclipse. <a href=\"http://wiki.bioclipse.net/index.php?title=SummerOfCode\">Maybe a summer job</a>?</p>",
      "summary": "While looking up a reference for FirstGlance in Jmol, I found Janocchio, a CDK and Jmol based tool for prediction of coupling constants, recently published in Magnetic Resonance in Chemistry. It’s written by Evans, Bodkin, Baker and Sharman (from Eli Lilly) and licensed LGPL. It is one of those rare contributions of pharmaceutical industry, and I can only deeply appreciate this contribution.",
      
      "date_published": "2007-06-10T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["jmol","cdk","nmr"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1002/mrc.2016", "doi": "10.1002/mrc.2016"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/16aw3-e6n02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/09/preprint-servers-cps-failed-how-will.html",
      "title": "Preprint servers: the CPS failed, how will Nature Precedings do?",
      "content_html": "<p>Some 7 years ago, following successes in physics, <a href=\"http://chemweb.com/\">ChemWeb.com</a>\n<a href=\"http://www.prnewswire.co.uk/cgi/news/release?id=10870\">launched the Chemistry Preprint Server (CPS)</a>,\nand <a href=\"https://doi.org/10.1021/ci025627a\">Warr evaluated</a> it in a JCIM article three years later.\nShe wrote about ‘lessons learned’, but the only one seemed to have been that chemistry was not\nready for it, as <a href=\"http://www.iucr.org/iucr-top/lists/epc-l/msg00790.html\">the project shutdown in 2004</a>.\nThe <a href=\"http://www.sciencedirect.com/preprintarchive?url=/CPS\">archives are still available</a>,\nfortunately, and you may find it amusing to look up my or some other submission.</p>\n\n<p>Now, <a href=\"http://blogs.nature.com/wp/nascent/2007/06/coming_soon_nature_precedings.html\">Nascent wrote that Nature is setting up</a>\n<a href=\"http://precedings.nature.com/\">Nature Precedings</a>, which was earlier\n<a href=\"http://pbeltrao.blogspot.com/2007/06/nature-preceedings-pre-print-server-for.html\">noted by Pedro</a>.\nThe <a href=\"https://doi.org/10.1038/447614a\">official announcement</a> was published as an editorial in\nNature. This being a Nature initiative, and not focused on just chemistry, I am sure it will do\nbetter than CPS. BTW, media coverage is <a href=\"http://www.connotea.org/user/timo/tag/Precedings\">tracked in a social way</a>.</p>\n\n<p>I might <a href=\"http://network.nature.com/groups/bioinformatics/notice/2007/06/08/nature-precedings-contributors-wanted\">request an test account</a>;\nI do have an old half-finished manuscript that I never got around to finishing. While still relevant,\nit could use some community input; this preprint server would be the perfect tool. That’s how my first\nmanuscript ended up on CPS too :)</p>",
      "summary": "Some 7 years ago, following successes in physics, ChemWeb.com launched the Chemistry Preprint Server (CPS), and Warr evaluated it in a JCIM article three years later. She wrote about ‘lessons learned’, but the only one seemed to have been that chemistry was not ready for it, as the project shutdown in 2004. The archives are still available, fortunately, and you may find it amusing to look up my or some other submission.",
      
      "date_published": "2007-06-09T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["publishing","nature","connotea"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci025627a", "doi": "10.1021/ci025627a"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/447614a", "doi": "10.1038/447614a"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8rxpp-1e755",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/08/scientific-literature-searching-ranking.html",
      "title": "Scientific Literature: searching, ranking, storage",
      "content_html": "<p>Dealing with scientific literature has been one important theme in <a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>.\nFor example, ranking articles and how to store your personal PDF archive has been topics of discussion. In this blog I will\nsummarize bits of the discussion, and my personal view on things.</p>\n\n<h2 id=\"searching\">Searching</h2>\n\n<p>Searching literature is traditionally done in systems like Chemical Abstracts and Web-of-Science. The open nature of a\ngrowing number of repositories (e.g. the Dutch <a href=\"http://www.darenet.nl/en/page/language.view/search.page\">DARE</a>) and\nindexing facilities like <a href=\"http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed\">PubMed</a> make these proprietary tools\nobsolete.</p>\n\n<p>It is incorrect to assume that these payed services are the only trustworthy sources. Even WoS fails to make the all\nlinks between entries in the database. For example, I am aware of two missing citations to articles I have written,\neven though both the cited and the citing article is available in the system. One of the citing articles was in the\n<a href=\"http://www3.interscience.wiley.com/cgi-bin/jhome/26737?CRETRY=1&amp;SRETRY=0\">Angewandte Chemie</a>!</p>\n\n<p>Additionally, some search services, like <a href=\"http://scholar.google.com/\">Google Scholar</a>, have the advantage that they\nfind copies and close variants of articles in proprietary articles on home pages and in open repositories. Today,\nI learned about <a href=\"http://en.scientificcommons.org/\">Scientific Commons</a> which indexes and links to a staggering\n1.5M publications, using, among others, PubMed and university repositories. 
Where possible it links directly\nto PDF versions of the article.</p>\n\n<h2 id=\"ranking\">Ranking</h2>\n\n<p><a href=\"http://www.chemicalforums.com/index.php?topic=17653.msg67580#msg67580\">Mitch set up</a> <a href=\"http://chemrank.com/\">ChemRank</a>,\nto which <a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/05/30/ranking-chemistry-and-blogosphere-metrics/\">Peter <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, the <a href=\"http://www.thechemblog.com/?p=552\">ChemBlog</a>\nand <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/05/30/chemrank-ranking-scientific-literature.html\">I replied <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Afterwards,\nI learned that other services are available too that allow voting and commenting on articles, in addition to setting up an online personal literature\ndatabase.</p>\n\n<p>Apparently, <a href=\"http://www.citeulike.org/\">CiteULike</a> (CUL) supports this too. In contrast to ChemRank, CUL requires\na login, which I personally see as an advantage, because I can browse literature bookmarked by other accounts I trust.\nThere is also <a href=\"http://www.connotea.org/\">Connotea</a> but I never liked that site that much (e.g. it allows bookmarking\nany web page); <a href=\"https://doi.org/10.59350/6zgf4-2wb06\">Rich has his comments too <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nI would also like to mention <a href=\"http://www.biowizard.com/\">BioWizard</a> which is based on the PubMed content, which actually\ncovers a good deal of chemistry literature nowadays too.</p>\n\n<h2 id=\"local-storage\">Local Storage</h2>\n\n<p>The systems mentioned above can be used as an alternative to offline bibliographic database systems, like EndNote and\n<a href=\"http://jabref.sf.net/\">JabRef</a>. 
The latter is my favorite: it is based on BibTeX, which I use for my LaTeX-based\npublications, is opensource, and contains <a href=\"http://www.ohloh.net/accounts/2934/contributions/557\">a few patches</a>\nfrom yours truly. Jungfreudlich wondered <a href=\"http://www.jungfreudlich.de/2007/05/20/how-are-your-paper-files-organized/\">how people organized their PDF archive</a>\nand <a href=\"http://www.jungfreudlich.de/2007/05/20/how-are-your-paper-files-organized/#comment-3199\">I commented how I do it</a>:</p>\n\n<ul>\n  <li>a directory hierarchy based on journal name and year</li>\n  <li>file names that include the last name of the first author and the year</li>\n  <li>JabRef for the bibliographic database</li>\n  <li><a href=\"http://strigi.sf.net/\">Strigi</a> for full text search</li>\n</ul>\n\n<p><a href=\"http://miningdrugs.blogspot.com/2007/05/literature-management.html\">Jörg</a> and\n<a href=\"http://www.thepowerofgoo.net/2007/05/20/organizing-pdfs-papers/\">the power of goo</a> replied too.</p>\n\n<h2 id=\"mashups\">Mashups</h2>\n\n<p>I have accounts on several online tools now (with some duplication which I don’t like), and I have no idea which of\nthe options will stay around. Time will tell. The good news is that the open character of many of these allows making\nmashups, and generally integrating tools. For example, JabRef allows downloading citations from PubMed, and Noel\n<a href=\"http://baoilleach.blogspot.com/2007/05/supporting-information-available-as.html\">suggested using Greasemonkey scripts to link to the supplementary information for his articles</a>,\ninstead of using the mechanisms journals have. 
I can see the advantage of this, as, for example,\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/05/09/access-to-and-re-use-of-open-data-in-chemistry-impressions/\">Wiley takes full copyright of the data in SI material <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nwhile Noel’s mechanism would keep the data open.</p>\n\n<p>For now, however, I would very much like to see a meta service where I can query rankings and comments for\narticles using any or all of the above tools.</p>",
      "summary": "Dealing with scientific literature has been one important theme in Chemical blogspace. For example, ranking articles and how to store your personal PDF archive has been topics of discussion. In this blog I will summarize bits of the discussion, and my personal view on things.",
      
      "date_published": "2007-06-08T00:00:00+00:00",
      "date_modified": "2025-08-10T00:00:00+00:00",
      "tags": ["citeulike","publishing","chemistry","connotea"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/6zgf4-2wb06", "doi": "10.59350/6zgf4-2wb06"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z2dqj-ejb68",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/05/blue-obelisk-corner-in-chemical.html",
      "title": "A Blue Obelisk corner in Chemical Blogspace",
      "content_html": "<p>I just finished setting up a <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> section for <a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>,\nas future replacement for the current <a href=\"http://www.blueobelisk.org/planetbo/\">Planet Blue Obelisk</a> (unless someone wants to take over that webpage).\nThe only thing really missing is a RSS feed for <a href=\"http://wiki.cubic.uni-koeln.de/cb/posts.php?category=Blue%20Obelisk\">recent posts</a> for just\nthe <a href=\"http://wiki.cubic.uni-koeln.de/cb/blogs.php?category=Blue%20Obelisk\">Blue Obelisk member blogs</a> (BTW, just email me if you want to be\nlisted as BO member with your blog too; the BO community is very open!).</p>\n\n<p>For now, you will have to do with <a href=\"http://wiki.cubic.uni-koeln.de/cb/index.php?category=Blue%20Obelisk\">this page</a>:</p>\n\n<p><img src=\"/assets/images/cbbo.png\" alt=\"\" /></p>\n\n<p>An additional flaw is that it also shows molecules for other blogs.</p>\n\n<p><strong><em>Update</em></strong>: the RSS feed for a specific category was already available, but just not from the FireFox URL bar. Instead, it is\ngiven on the right side of the posts page when you selected a category. Here a shortcut for the RSS for\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/atom.php?category=Blue%20Obelisk&amp;type=latest_posts\">posts from the Blue Obelisk category</a>.</p>",
      "summary": "I just finished setting up a Blue Obelisk section for Chemical blogspace, as future replacement for the current Planet Blue Obelisk (unless someone wants to take over that webpage). The only thing really missing is a RSS feed for recent posts for just the Blue Obelisk member blogs (BTW, just email me if you want to be listed as BO member with your blog too; the BO community is very open!).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cbbo.png",
      "date_published": "2007-06-05T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cb","blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/t125s-b8827",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/03/finding-email-with-strigi-in-tar.html",
      "title": "Finding email with Strigi in .tar backups",
      "content_html": "<p>Now that <a href=\"http://chemicalblogspace.blogspot.com/2007/05/uploaded-source-code-to-sf-svn.html\">my CUBIC desktop machine is shutting down</a>,\nI made the necessary backups, among a mail.tar for my mail correspondence of about a year. About 500MB in size for almost 8700 files.\n<a href=\"http://strigi.sf.net/\">Strigi</a> is a perfect tool to help me find messages in this archive, as it will recurse into the .tar archive,\nand even into email attachements. I created an index just for the archive with:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>strigicmd create -t clucene -d index/ mail.tar\n</code></pre></div></div>\n\n<p>It took Strigi about 30 seconds to index the whole archive. That’s good performance!</p>\n\n<p>Now, Strigi indexes content full text, but also uses a controlled vocabulary (among which\n<a href=\"http://kemistry-desktop.blogspot.com/2007/04/chemical-semantic-desktop.html\">one specifically for chemistry</a>).\nSo I can search for email messages which have article in the subject with:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>strigicmd query -t clucene -d index/ email.subject:article\n</code></pre></div></div>\n\n<p>However, <code class=\"language-plaintext highlighter-rouge\">From:</code> and <code class=\"language-plaintext highlighter-rouge\">To:</code> content was not yet extracted. That was easily patched. This allows me to find correspondence between me and, for example, Christoph:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>strigicmd query -t clucene -d index/ email.to:Christoph AND email.from:Egon\n</code></pre></div></div>",
      "summary": "Now that my CUBIC desktop machine is shutting down, I made the necessary backups, among a mail.tar for my mail correspondence of about a year. About 500MB in size for almost 8700 files. Strigi is a perfect tool to help me find messages in this archive, as it will recurse into the .tar archive, and even into email attachements. I created an index just for the archive with:",
      
      "date_published": "2007-06-03T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["kde"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6fa88-qpk80",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/30/chemrank-ranking-scientific-literature.html",
      "title": "ChemRank: ranking scientific literature",
      "content_html": "<p><a href=\"http://blog.chemicalforums.com/\">Mitch</a> <a href=\"http://www.chemicalforums.com/index.php?topic=17653\">just launched</a>\n<a href=\"http://www.chemrank.com/\">ChemRank</a>, a website where we can comment on and vote thumbs up or down for scientific articles.\nGood initiative I think. Some thoughts:</p>\n\n<ul>\n  <li>please include the DOI for each article overview on the front page (see <a href=\"http://baoilleach.blogspot.com/2007/04/add-quotes-from-postgenomic-and.html\">why</a>)</li>\n  <li>make the content <a href=\"http://en.wikipedia.org/wiki/Open_data\">opendata</a>, e.g. using the <a href=\"http://en.wikipedia.org/wiki/Creative_Commons\">CC license</a></li>\n  <li>provide a means to refer to other literature to back up comments and ranking</li>\n  <li>provide an API to make mashups (like that of <a href=\"http://blueobelisk.svn.sourceforge.net/viewvc/blueobelisk/cb/trunk/interface/api.php?revision=11&amp;view=markup\">Chemical blogspace for use in Greasemonkey scripts</a>)</li>\n  <li>make the website source code opensource (JSON, RDF come to mind)</li>\n  <li>use microformats where possible (for <a href=\"https://addons.mozilla.org/nl/firefox/addon/4106\">Operator</a> and FF3)</li>\n  <li>at least provide means for tagging articles</li>\n  <li>provide browsing by journal</li>\n  <li>import articles from Connotea/NatureNetwork/etc</li>\n</ul>\n\n<p>Please consider there as feature requests, and not as critique. Two of these are already listed in the\n<a href=\"http://www.chemicalforums.com/index.php?topic=17653\">developers wishlist</a>. I will likely come up with more later :)</p>",
      "summary": "Mitch just launched ChemRank, a website where we can comment on and vote thumbs up or down for scientific articles. Good initiative I think. Some thoughts:",
      
      "date_published": "2007-05-30T00:20:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bnwdb-pa057",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/30/weka-decision-trees-to-java-conversion.html",
      "title": "Weka Decision Trees to Java Conversion",
      "content_html": "<p>Some time ago I wrote a small Perl script to convert a decision tree created with <a href=\"http://www.cs.waikato.ac.nz/~ml/weka/\">Weka</a> in the\n<a href=\"http://www.cs.waikato.ac.nz/~ml/weka/arff.html\">ARFF format</a> to Java source code, for use in the\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/qsar/descriptors/molecular/IPMolecularDescriptor.html\">ionization potential prediction</a>\nin <a href=\"http://cdk.sf.net/\">CDK</a>. The advantage is that Weka is no longer used are runtime, and that there is no model that needs to be loaded and interpreted. Instead, it is simple Java code that does the work, much faster.</p>\n\n<p>This is the code:</p>\n\n<div class=\"language-perl highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">#!/usr/bin/perl</span>\n<span class=\"c1\">#</span>\n<span class=\"c1\"># Copyright 2007 (C) Egon Willighagen</span>\n<span class=\"c1\"># License: GPL</span>\n\n<span class=\"k\">use</span> <span class=\"nv\">diagnostics</span><span class=\"p\">;</span>\n<span class=\"k\">use</span> <span class=\"nv\">strict</span><span class=\"p\">;</span>\n\n<span class=\"k\">my</span> <span class=\"nv\">$filename</span> <span class=\"o\">=</span> <span class=\"nv\">$ARGV</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">];</span>\n\n<span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">double result = 0.0;</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n<span class=\"nb\">open</span><span class=\"p\">(</span><span class=\"nv\">INPUT</span><span class=\"p\">,</span> <span class=\"p\">\"</span><span class=\"s2\">&lt;</span><span class=\"si\">$filename</span><span class=\"p\">\");</span>\n<span class=\"k\">my</span> <span class=\"nv\">$level</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n<span class=\"k\">my</span> <span 
class=\"nv\">$prevLevel</span> <span class=\"o\">=</span> <span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"k\">while</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$line</span> <span class=\"o\">=</span> <span class=\"o\">&lt;</span><span class=\"nv\">INPUT</span><span class=\"o\">&gt;</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"nv\">$line</span> <span class=\"o\">=~</span> <span class=\"sr\">s/\\n//g</span><span class=\"p\">;</span>\n  <span class=\"nv\">$level</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n  <span class=\"k\">while</span> <span class=\"p\">(</span><span class=\"nv\">$line</span> <span class=\"o\">=~</span> <span class=\"sr\">/^\\|\\s*(.*)/</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"nv\">$level</span><span class=\"o\">++</span><span class=\"p\">;</span>\n    <span class=\"nv\">$line</span> <span class=\"o\">=</span> <span class=\"err\">$</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span>\n  <span class=\"k\">my</span> <span class=\"nv\">$else</span> <span class=\"o\">=</span> <span class=\"p\">\"\";</span>\n  <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nv\">$prevLevel</span> <span class=\"o\">==</span> <span class=\"nv\">$level</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"nv\">$else</span> <span class=\"o\">=</span> <span class=\"p\">\"</span><span class=\"s2\">else </span><span class=\"p\">\";</span>\n  <span class=\"p\">}</span> <span class=\"k\">elsif</span> <span class=\"p\">(</span><span class=\"nv\">$prevLevel</span> <span class=\"o\">&lt;</span> <span class=\"nv\">$level</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"c1\"># we increase one level at a time</span>\n    <span class=\"k\">for</span> <span class=\"p\">(</span><span 
class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span class=\"p\">(</span><span class=\"nv\">$level</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">);</span> <span class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n    <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">{</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n    <span class=\"nv\">$prevLevel</span> <span class=\"o\">=</span> <span class=\"nv\">$level</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n    <span class=\"c1\"># this is a bit more tricky: we possibly need more than</span>\n    <span class=\"c1\"># one end bracket</span>\n    <span class=\"k\">my</span> <span class=\"nv\">$diff</span> <span class=\"o\">=</span> <span class=\"nv\">$prevLevel</span> <span class=\"o\">-</span> <span class=\"nv\">$level</span><span class=\"p\">;</span>\n    <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$closes</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$closes</span><span class=\"o\">&lt;</span><span class=\"nv\">$diff</span><span class=\"p\">;</span> <span class=\"nv\">$closes</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n      <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span 
class=\"p\">(</span><span class=\"nv\">$prevLevel</span><span class=\"o\">-</span><span class=\"nv\">$closes</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">);</span> <span class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n      <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">}</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n    <span class=\"p\">}</span>\n    <span class=\"nv\">$prevLevel</span> <span class=\"o\">=</span> <span class=\"nv\">$level</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span>\n  <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nv\">$line</span> <span class=\"o\">=~</span> <span class=\"sr\">/:/</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"k\">my</span> <span class=\"p\">(</span><span class=\"nv\">$if</span><span class=\"p\">,</span> <span class=\"nv\">$then</span><span class=\"p\">)</span> <span class=\"o\">=</span> <span class=\"nb\">split</span><span class=\"p\">(\"</span><span class=\"s2\">:</span><span class=\"p\">\",</span><span class=\"nv\">$line</span><span class=\"p\">);</span>\n    <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span class=\"nv\">$level</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n    <span class=\"c1\"># FIXME: java-fy $then</span>\n    <span 
class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nv\">$then</span> <span class=\"o\">=~</span> <span class=\"sr\">/([\\d|_]*)\\s*\\(([^\\)]*)\\)/</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n      <span class=\"k\">my</span> <span class=\"nv\">$result</span> <span class=\"o\">=</span> <span class=\"err\">$</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n      <span class=\"k\">my</span> <span class=\"nv\">$stats</span> <span class=\"o\">=</span> <span class=\"err\">$</span><span class=\"mi\">2</span><span class=\"p\">;</span>\n      <span class=\"nv\">$result</span> <span class=\"o\">=~</span> <span class=\"sr\">s/_/\\./g</span><span class=\"p\">;</span>\n      <span class=\"k\">print</span> <span class=\"nv\">$else</span> <span class=\"o\">.</span> <span class=\"p\">\"</span><span class=\"s2\">if (</span><span class=\"si\">$if</span><span class=\"s2\">) { result = </span><span class=\"si\">$result</span><span class=\"s2\">; // </span><span class=\"si\">$stats</span><span class=\"s2\"> }</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n    <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n      <span class=\"k\">print</span> <span class=\"nv\">$else</span> <span class=\"o\">.</span> <span class=\"p\">\"</span><span class=\"s2\">if (</span><span class=\"si\">$if</span><span class=\"s2\">) { result = </span><span class=\"si\">$then</span><span class=\"s2\">; }</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n    <span class=\"p\">}</span>\n  <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n    <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span class=\"nv\">$level</span><span class=\"p\">;</span> <span 
class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n    <span class=\"k\">print</span> <span class=\"nv\">$else</span> <span class=\"o\">.</span> <span class=\"p\">\"</span><span class=\"s2\">if (</span><span class=\"si\">$line</span><span class=\"s2\">)</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n  <span class=\"p\">}</span>\n<span class=\"p\">}</span>\n\n<span class=\"c1\"># OK, now add the rest of the closing brackets</span>\n<span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$closes</span><span class=\"o\">=</span><span class=\"nv\">$prevLevel</span><span class=\"p\">;</span> <span class=\"nv\">$closes</span><span class=\"o\">&gt;</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$closes</span><span class=\"o\">--</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span class=\"p\">(</span><span class=\"nv\">$closes</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">);</span> <span class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n  <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">}</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>",
      "summary": "Some time ago I wrote a small Perl script to convert a decision tree created with Weka in the ARFF format to Java source code, for use in the ionization potential prediction in CDK. The advantage is that Weka is no longer used are runtime, and that there is no model that needs to be loaded and interpreted. Instead, it is simple Java code that does the work, much faster.",
      
      "date_published": "2007-05-30T00:10:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["java","cheminf","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/whq65-gdm56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/29/jcim-is-linking-to-planet-blue-obelisk.html",
      "title": "The JCIM is linking to Planet Blue Obelisk??",
      "content_html": "<p>I use <a href=\"http://www.google.com/analytics/\">Google Analytics</a> to analyze the visitors of my blogs and of\n<a href=\"http://blueobelisk.org/planetbo/\">Planet Blue Obelisk</a> too. Now, for the past couple of weeks, the webpage of the\n<a href=\"http://pubs.acs.org/journals/jcisd8/index.html\">Journal of Chemical Information and Modeling</a> is\nshowing up as refering site:</p>\n\n<p><img src=\"/assets/images/jcim-bo-link.png\" alt=\"\" /></p>\n\n<p>What is going on here ?!?! This is really no fake, but cannot find an actual link when I visit the journal\nwebpage either…</p>\n\n<p><strong>Update</strong>: When looking at the logs, it becomes even weirder. Nothing shows up, so I can only assume this is a\nglitch in the Google Analytics system :( What I did see in the log, was referrals pointing to Chemical\nBlogspace :) That must be the user script in action.</p>",
      "summary": "I use Google Analytics to analyze the visitors of my blogs and of Planet Blue Obelisk too. Now, for the past couple of weeks, the webpage of the Journal of Chemical Information and Modeling is showing up as refering site:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcim-bo-link.png",
      "date_published": "2007-05-29T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jtzzf-jfz50",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/25/numbers-are-copyrighted.html",
      "title": "Numbers are copyrighted?",
      "content_html": "<p>I just read on <a href=\"http://www.blueobelisk.org/planetbo/\">Planet Blue Obelisk</a> <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a>’s\ndisturbing news (via <a href=\"https://doi.org/10.63485/mppz2-19243\">Suber <i class=\"fa-solid fa-recycle fa-xs\"></i></a>) that\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/05/24/sued-for-10-data-points/\">Wiley thinks it can copyright a set of numbers <i class=\"fa-solid fa-recycle fa-xs\"></i></a> (also known as data).\nThat is a sad milestone in scientific publishing. It reminds me of the recent internet hype about a long number recently\nflooding the internet (and notably <a href=\"http://www.del.icio.us/\">del.icio.us</a>) related to watching DVDs you legally bought.\nSome details can be found in this <a href=\"http://www.lwn.net/\">Linux Weekly News</a> article on\n<a href=\"http://lwn.net/Articles/233660/\">How Debian packages a number</a>.</p>\n\n<p>Interestingly, this is really not problems just regarding commercial publishers, or closed access publishing or so. Yesterday,\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a> and I working on getting <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html\">the NMR spectrum text mining <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ngoing in <a href=\"http://www.bioclipse.net/\">Bioclipse</a> again for the <a href=\"http://teacher.bmc.uu.se/BioclipseWS07/\">workshop</a>,\nwe noticed that the open access <a href=\"http://bjoc.beilstein-journals.org/\">Beilstein Journal of Organic Chemistry</a>,\ndoes not make <a href=\"http://en.wikipedia.org/wiki/Open_Data\">Open Data</a> reality either: the experimental sections are\ngenerally (all?) excluded from the main text in HTML and obscured in .doc files in the supplementary information.</p>\n\n<p>BTW, this makes me wonder if organic chemists still consider the experimental properties of molecules novel science.</p>",
      "summary": "I just read on Planet Blue Obelisk Peter’s disturbing news (via Suber ) that Wiley thinks it can copyright a set of numbers (also known as data). That is a sad milestone in scientific publishing. It reminds me of the recent internet hype about a long number recently flooding the internet (and notably del.icio.us) related to watching DVDs you legally bought. Some details can be found in this Linux Weekly News article on How Debian packages a number.",
      
      "date_published": "2007-05-25T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["copyright"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.63485/mppz2-19243", "doi": "10.63485/mppz2-19243"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/d3fty-h1t71",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/11/added-my-hcard-to-my-blog.html",
      "title": "Added my hCard to my blog",
      "content_html": "<p>Getting back on microformats (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/05/11/microformats-in-chemistry.html\">yesterday <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nI added my <a href=\"http://microformats.org/wiki/hcard\">hCard</a> to the bottom of my blog:</p>\n\n<p><img src=\"/blog//assets/images/hCard2.png\" alt=\"\" /></p>\n\n<p>I will likely populate it a bit more soon (after holiday in Sweden).</p>\n\n<p>Now, if you had the Firefox plugin <a href=\"https://addons.mozilla.org/en-US/firefox/addon/4106\">Operator</a> installed, you would\nhave my contact information show up in your FF toolbar, like this:</p>\n\n<p><img src=\"/blog//assets/images/hCard1.png\" alt=\"\" /></p>\n\n<p>Note the ‘Export Contact’ button in the toolbar. This will automatically create a vCard which I can directly open in\nmy address book (I use the KDE addressbook). Very nice integration!</p>\n\n<p>Now, I already asked the author how the plugin could be extended to support chemical microformats. Just think of the\nfeature “Export Molecule (137)” (e.g. to <a href=\"http://www.bioclipse.net/\">Bioclipse</a>), when reading a HTML version of paper\nin one of the <a href=\"http://www.rsc.org/Publishing/Journals/ProjectProspect/\">Project Prospect</a> enabled journals :)</p>",
      "summary": "Getting back on microformats (see yesterday ), I added my hCard to the bottom of my blog:",
      "image": "https://chem-bla-ics.linkedchemistry.info/blog//assets/images/hCard1.png",
      "date_published": "2007-05-11T00:10:00+00:00",
      "date_modified": "2025-08-10T00:00:00+00:00",
      "tags": ["microformat","blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rt0kd-kdw56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/11/microformats-in-chemistry.html",
      "title": "Microformats in chemistry...",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> blogged some days ago about <a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/05/08/microformats-in-the-chemical-blogosphere-the-chemical-sematic-web-has-arrived/\">microformats and how they could be used in chemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBeing late and a bit absent minded, I added a short comment that <a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>\n<a href=\"http://chemicalblogspace.blogspot.com/2006/12/hacking-inchi-support-into-cb.html\">supports</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">microformats for chemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and that\n<a href=\"http://chemicalblogspace.blogspot.com/2007/02/latest-blogged-molecules-on-front-page.html\">chemistry is harvested from that</a>,\nand actually <a href=\"http://chemicalblogspace.blogspot.com/2007/01/cb-gets-cmlrss-feed.html\">semantically distributed again using CMLRSS</a>.</p>\n\n<p>In reply to my comment, he wrote <a href=\"https://blogs.ch.cam.ac.uk/pmr/2007/05/09/chemical-microformats-have-arrived-some-time-ago/\">a follow up <i class=\"fa-solid fa-recycle fa-xs\"></i></a> highlighting one of blog items linked\nabove (thanx for that!). Accidentally, he also published my Gmail account and IP address, which was really just for the blog owner to\nsee who did the comment, and not for the world to harvest. This is a moment I am not so happy that Peter’s blog is so popular ;) Peter,\nmaybe be a bit more careful with copy/pasting next time.</p>\n\n<p>Peter and Henry (still not in blogspace?) have been <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=301\">doing things along these lines for years now</a>,\noften in different contexts. But getting these things going is a bit trickier. 
Actually, the uptake of the chemical microformats\nhas been limited, and at least one alternative mechanism is being used: put the InChI in the <code class=\"language-plaintext highlighter-rouge\">@alt</code> attribute on the <code class=\"language-plaintext highlighter-rouge\">&lt;img&gt;</code> element.\nOther alternatives are possible too, such as recognizing molecules (or whatever else) based on a link to wikipedia; linking to\nentries in <a href=\"http://www.wikipedia.org/\">wikipedia</a> is popular in Chemical blogspace.</p>\n\n<p>One problem in getting microformats accepted, especially among chemists, is the availability of tools, meaning dedicated plugins\nfor blogging software to ease adding microformats to a blog item. You’d be surprised how uncommon raw HTML editing has become in the\nlast 10 years. <a href=\"http://structuredblogging.org/\">::: Structured Blogging :::</a> is a provider of such tools. On the user side,\nthere is <a href=\"https://addons.mozilla.org/en-US/firefox/addon/4106\">this nice Firefox plugin</a>, that can extract information available in\nmicroformats, though Firefox3 is supposed to support some microformats natively.</p>\n\n<p>Just today, <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=309\">Peter also blogged about a Berners-Lee presentation</a> with the nice\ncircular phenomena in all these web technologies. The diagrams nicely visualize the complex social aspects of these new technologies.\n(I’m sure they apply to chemoinformatics too… who makes a chemoinfo variant?) RDF is the way to go; it’s the machine interpretable\n(well, more accurate) <em>microformat</em>. All sorts of information is becoming available as RDF. For example, check out\n<a href=\"http://www.l3s.de/~siberski/bibtex2rdf/\">bibtex2rdf</a>, <a href=\"http://dbpedia.org/docs/#intro\">Wikipedia as RDF</a>,\n<a href=\"http://dev.isb-sib.ch/projects/uniprot-rdf/\">uniprotRDF</a>, and <a href=\"http://bioguid.info/\">BioGUID</a>. 
Moreover,\n<a href=\"http://www.w3.org/TR/2007/CR-grddl-20070502/\">GRDDL</a> might mave this even more common.\nI have been maintaining a <a href=\"http://del.icio.us/egonw/rdf\">bookmark list of RDF things happening</a>, check it out,\nthe list is <em>social</em> <strong><em>and</em></strong> <em>using microformats</em>.</p>",
      "summary": "Peter blogged some days ago about microformats and how they could be used in chemistry . Being late and a bit absent minded, I added a short comment that Chemical blogspace supports microformats for chemistry , and that chemistry is harvested from that, and actually semantically distributed again using CMLRSS.",
      
      "date_published": "2007-05-11T00:00:00+00:00",
      "date_modified": "2025-12-29T00:00:00+00:00",
      "tags": ["rdf","microformat","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tsx3t-7tp17",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/06/preparing-chemoinformatics-workshop.html",
      "title": "Preparing a Chemoinformatics workshop",
      "content_html": "<p>After handing in a new draft of my PhD manuscript with my co-promotors last friday, and a week before we leave for Sweden, it is\ntime to start finishing up the material for my one hour workshop on chemoinformatics in general and QSAR/QSPR in particular for the\n<a href=\"http://teacher.bmc.uu.se/BioclipseWS07\">Bioclipse Workshop</a>.</p>\n\n<p><a href=\"http://plindenbaum.blogspot.com/2007/05/does-this-remind-you-of-anything.html\">Pierre blogged</a> about this movie. It looks relevant:</p>\n\n<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/xFAWR6hzZek\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen=\"\">\n</iframe>",
      "summary": "After handing in a new draft of my PhD manuscript with my co-promotors last friday, and a week before we leave for Sweden, it is time to start finishing up the material for my one hour workshop on chemoinformatics in general and QSAR/QSPR in particular for the Bioclipse Workshop.",
      
      "date_published": "2007-05-06T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z1vb6-w9v12",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/05/cb-comments-for-inchis.html",
      "title": "Cb comments for InChI&apos;s",
      "content_html": "<p>About a year ago <a href=\"http://pbeltrao.blogspot.com/2006/05/postgenomics-script-for-firefox-i-am.html\">Pedro wrote a Greasemonkey script</a>\nto add comments from <a href=\"http://www.postgenomic.com/\">PostGenomic.com</a> to table of contents of scientific journals.\n<a href=\"http://baoilleach.blogspot.com/2007/04/add-quotes-from-postgenomic-and.html\">Noel extended</a> it with support for\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a> (see also <a href=\"http://chemicalblogspace.blogspot.com/2007/03/jacs-toc-featuring-your-review.html\">this earlier item</a>).\nNow, the later website is maintained by me, and I\n<a href=\"http://chemicalblogspace.blogspot.com/2006/12/hacking-inchi-support-into-cb.html\">extended the aggregator software with molecule support</a>,\nfor example to show <em>hot</em> <a href=\"http://chemicalblogspace.blogspot.com/2007/02/latest-blogged-molecules-on-front-page.html\">molecules on the frontpage</a>\n(at some point <a href=\"http://www.ghastlyfop.com/blog/2007/05/quick-notices.html\">my patches will be backported into mainstream</a>.\nEuan, why not invite me to London HQ in, say, June?).</p>\n\n<p>So, when we can show comments from blogosphere for journal articles, why can’t we do that for molecules too? Sure we can.\nJust needs some hacking. Right, and done that today. 
The script works for <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>:</p>\n\n<p><img src=\"/assets/images/cb_inchi_greasemonkey1.png\" alt=\"\" /></p>\n\n<p>Works for any <code class=\"language-plaintext highlighter-rouge\">&lt;a href&gt;</code> element with a URL to PubChem like <em>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&amp;DB=pccompound&amp;term=%22InChI=1/CH4/h1H4%22[InChI]</em>.\nBTW, while the URL is not very readable, this might actually be a good way to <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/02/20/invisible-inchis.html\">hide InChIs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthough I am sure Google will not index this InChI either.</p>\n\n<p>And it also works for <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">semantically marked up InChI’s (using either microformats or RDFa) <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<p><img src=\"/assets/images/cb_inchi_greasemonkey.png\" alt=\"\" /></p>\n\n<p>You’ll notice here that it is friendly with my\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html\">Sechemtic script to make links to Google and PubChem <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>The tools to make this happen involve a new Greasemonkey script (based on Noel’s code), and a few patches to the Postgenomic.com software.\nThe user script can be downloaded <a href=\"http://userscripts.org/scripts/show/9002\">here</a>. An entry on the\n<a href=\"http://wiki.cubic.uni-koeln.de/bowiki/index.php/Using_Javascript_and_Greasemonkey_for_Chemistry\">Blue Obelisk userscript page</a>\nwill follow; check that page for more goodies.</p>",
      "summary": "About a year ago Pedro wrote a Greasemonkey script to add comments from PostGenomic.com to table of contents of scientific journals. Noel extended it with support for Chemical blogspace (see also this earlier item). Now, the later website is maintained by me, and I extended the aggregator software with molecule support, for example to show hot molecules on the frontpage (at some point my patches will be backported into mainstream. Euan, why not invite me to London HQ in, say, June?).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cb_inchi_greasemonkey1.png",
      "date_published": "2007-05-05T00:00:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      "tags": ["cb","inchi","userscript","rdf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ztgma-ppp65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/30/improved-cmlrss-feed-for-chemical.html",
      "title": "Improved CMLRSS feed for Chemical blogspace",
      "content_html": "<p>While adding the <a href=\"http://wiki.cubic.uni-koeln.de/cb/blog_search.php?blog_id=116\">116th blog (David Bradley’s Chemistry News)</a>\nto <a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a> (see also <a href=\"http://chemicalblogspace.blogspot.com/2007/04/new-blogs-5.html\">New Blogs #5</a>,\nI noticed that <a href=\"http://www.sciencebase.com/science-blog/\">David</a> is using\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">semantic markup for InChI’s <i class=\"fa-solid fa-recycle fa-xs\"></i></a> (thanx!).</p>\n\n<p>That urged me to finally clean up <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">my InChI extension<i class=\"fa-solid fa-recycle fa-xs\"></i></a> for the\n<a href=\"http://www.postgenomic.com/\">Postgenomic.com</a> software. One important step was to create a PHP page with only one InChI, such as\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/inchi.php?id=61\">this one</a>. That would solve the something broken links in the CMLRSS feed,\nbecause of characters in InChIs that Apache cannot handle as the PHP page expects. Once that was done, I also pimped up the\nCMLRSS feed itself: I added a human-friendly name, the title of the blog item discussing the molecule, and the picture Cb\ndownloads from <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>:</p>\n\n<p><img src=\"/assets/images/cmlrss_Cb2.png\" alt=\"\" /></p>\n\n<p>Of course the feed is still <a href=\"http://chem-bla-ics.blogspot.com/search?q=CMLRSS\">CML enabled</a>.</p>",
      "summary": "While adding the 116th blog (David Bradley’s Chemistry News) to Chemical blogspace (see also New Blogs #5, I noticed that David is using semantic markup for InChI’s (thanx!).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cmlrss_Cb2.png",
      "date_published": "2007-04-30T00:00:00+00:00",
      "date_modified": "2025-07-30T00:00:00+00:00",
      "tags": ["cb","rss","cml","inchi","semweb","pubchem"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vw63b-w5m79",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/27/ex-cubic-get-together.html",
      "title": "Ex-CUBIC get-together",
      "content_html": "<p>Yesterday and today I was in Cologne to meet with other ex-CUBIC researchers from <a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a>’s\n<a href=\"http://almost.cubic.uni-koeln.de/jrg\">research group on chemoinformatics</a> (and <a href=\"http://kemistry-desktop.blogspot.com/2007/04/gsoc-meeting-with-alexandr.html\">with Alexandr</a>).\nNot all former group members where there, but on the other hand we were complemented with Pascal:</p>\n\n<p><img src=\"/assets/images/DSCI0173.JPG\" alt=\"\" /></p>\n\n<p>(Yes, the sun was <strong>very</strong> bright :)</p>\n\n<p>The program was consisted of a couple of group things, like making a short list of articles to write up in the next\nfew months. Yesterday evening ended in a very nice Biergarten called the <a href=\"http://www.kneipen-suche.com/koeln-altenberger_hof-2541.html\">Altenberger Hof</a>.</p>",
      "summary": "Yesterday and today I was in Cologne to meet with other ex-CUBIC researchers from Christoph’s research group on chemoinformatics (and with Alexandr). Not all former group members where there, but on the other hand we were complemented with Pascal:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/DSCI0173.JPG",
      "date_published": "2007-04-27T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cheminf","career"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hs2p8-r1c84",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/24/bioclipse-now-allows-qsar-descriptor.html",
      "title": "Bioclipse now allows QSAR descriptor selection",
      "content_html": "<p>In preparation for the <a href=\"http://teacher.bmc.uu.se/BioclipseWS07/Welcome.html\">Embrace Workshop for Bioclipse</a> in May, I am working on the QSAR functionality of\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>. A nice extension point got set up some time ago, called\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/03/bioclipse-workshop-short-but.html\">DescriptorProvider <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand implemented by plugins to allow calculation of one or more descriptors for the selected molecules. Now, the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/07/11/matrix-support-in-bioclipse.html\">functionality for the resulting matrix <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nhas been around for some time too.</p>\n\n<p>What had not been available yet, was some GUI stuff to select descriptors to calculate, and the actual calculation. While the latter is yet to be\nhooked up, the selection of descriptors is now available:</p>\n\n<p><img src=\"/assets/images/bioclipseDescriptorSelection.png\" alt=\"\" /></p>\n\n<p>Interesting here is the use of OWL. CDK’s <code class=\"language-plaintext highlighter-rouge\">DescriptorEngine</code> provides a simple API written by Rajarshi that interfaces to the dictionary support\nfor OWL (which CDK offers in addition to CML based dictionaries). All CDK descriptors are written up in OWL (the\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/dict/data/descriptor-algorithms.owl?view=markup\">source file</a>\nand the <a href=\"http://qsar.sourceforge.net/dicts/qsar-descriptors/index.xhtml\">HTML version</a>).\nYou’ll notice the weird characters in the screenshot; there something goes wrong with the encoding when reading the OWL.</p>",
      "summary": "In preparation for the Embrace Workshop for Bioclipse in May, I am working on the QSAR functionality of Bioclipse. A nice extension point got set up some time ago, called DescriptorProvider , and implemented by plugins to allow calculation of one or more descriptors for the selected molecules. Now, the functionality for the resulting matrix has been around for some time too.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseDescriptorSelection.png",
      "date_published": "2007-04-24T00:00:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qd21w-71g30",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/23/cdk-10-milestone-after-7-year-of.html",
      "title": "CDK 1.0: a milestone after 7 year of development",
      "content_html": "<p>Last night, I <a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024\">released CDK 1.0</a> as the previous release candidate\ndid not show up new major problems. It is far from a perfect release (see these still <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=cdk1.0\">TODO</a>’s\nand <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">Nightly</a>, run by\n<a href=\"http://cheminfoclub.blogspot.com/\">Rajarshi</a>), but the core is pretty solid.</p>\n\n<p>I would warmly thank everyone who has contributed to the project in one way or another (I worked more on maintainance than\nimplementing functionality), as it has been a great pleasure to make CDK releases. <a href=\"http://www.ohloh.net/\">OHLOH</a> runs a rather nice\ndeveloper <a href=\"http://www.ohloh.net/projects/380/analyses/latest/contributors\">hall of fame for the CDK</a>. You’ll see that\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a>’s research group is the major contributor. User contributions, however,\nare equally important and played a bug role in the quite <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/junitsummary.html\">large set of JUnit tests</a>\nwe have now (3300+).</p>\n\n<p>Another reason why this is an important milestone, is that it is the last release I am creating. I wrote on the user list:</p>\n\n<blockquote>\n  <p>In advance of the actual CDK 1.0 release, thanx very much to all that contributed big <em>and</em> small ! It was a great 7 years of open source\nchemoinformatics development!</p>\n\n  <p>Hey, that actually sounds like I am stepping down… Well, it <em>is</em> time for a new generation to step up indeed. I won’t leave the project,\nbut being CDK News editor, CDK release manager, CDK code developer is a bit much for doing outside office hours. 
I feel that I have clearly\nenough made my point for open source chemoinformatics, and it is time for something else… which will very likely involve the CDK, but\nlikely more as user only… I was hoping in the past few years, that the transition would go smoothly, and have been trying to get people\ninterested in various emails, including this one; however, being humans, we wait for the catastrophe and only after that we’re shocked and\nstart doing something about it. So, yeah, I’m forced to make this drastic announcement: CDK 1.0 will be the last CDK release <em>I</em> will make.</p>\n</blockquote>\n\n<p>So, who wants to take over? Someone will have to. I, however, will put my focus on other things. Very likely involving the CDK, as there\nare still many things I want to do. Some things I have on my list:</p>\n\n<ul>\n  <li>the Java2D based 2D renderer/editor</li>\n  <li>more accurate atom type perception</li>\n  <li>more articles for CDK News</li>\n  <li>the book “CDK for Dummies”</li>\n  <li>improved structure generator</li>\n  <li>validation</li>\n  <li>…</li>\n</ul>",
      "summary": "Last night, I released CDK 1.0 as the previous release candidate did not show up new major problems. It is far from a perfect release (see these still TODO’s and Nightly, run by Rajarshi), but the core is pretty solid.",
      
      "date_published": "2007-04-23T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["cdk","cheminf","junit"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ytpss-fw666",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/21/clustering-web-search-results.html",
      "title": "Clustering web search results",
      "content_html": "<p>The Dutch <a href=\"http://www.intermediair.nl/\">Intermediair</a> magazine of this week had a letter sent by a reader introducing\n<a href=\"http://clusty.com/\">Clusty</a>, a web search engine that clusters the results. It does a pretty good job for\n‘<a href=\"http://clusty.com/search?input-form=clusty-simple&amp;v%3Asources=webplus&amp;query=egon+willighagen\">egon willighagen</a>’:</p>\n\n<p><img src=\"/assets/images/clusty1.png\" alt=\"\" /></p>\n\n<p>It seems to use other engine to do the searching and focus on the clustering. Source engine exclude Google, and include\n<a href=\"http://gigablast.com/\">Gigablast</a>, <a href=\"http://www.msn.com/\">MSN</a> and <a href=\"http://wikipedia.org/\">Wikipedia</a>.</p>\n\n<p>For <em>chemoinformatics</em> it comes up with the following top 10 clusters: ‘Drug Discovery’, ‘Structure’, ‘Cheminformatics’,\n‘Research’, ‘Books’, ‘Conference, German’, ‘Textbook, Gasteiger’, ‘Laboratory’, ‘Handbook of Chemoinformatics’, and\n‘School’. Quite acceptable and useful clustering.</p>\n\n<p>This might be the next step in googling. Rich, it also might solve <a href=\"https://doi.org/10.59350/evjkt-07173\">your problem <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:\nsearching for ‘ruby chemoinformatics’ does <strong>not</strong> give a ‘Depth First’ or ‘Rich Apodaca’ cluster :)</p>",
      "summary": "The Dutch Intermediair magazine of this week had a letter sent by a reader introducing Clusty, a web search engine that clusters the results. It does a pretty good job for ‘egon willighagen’:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/clusty1.png",
      "date_published": "2007-04-21T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["google","cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/evjkt-07173", "doi": "10.59350/evjkt-07173"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z8a1a-5vh57",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/06/cubic-period-is-over.html",
      "title": "CUBIC period is over",
      "content_html": "<p>The end of the CUBIC has come, and so did the end of my 1-year postdoc in the group of <a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph Steinbeck</a>.\nIt would have been much better if the group could have continued for one or two more years, so that we could harvest the fruit of the work done in\nthe past years. Only having been group member since April 1 2006, I mostly contributed work to <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\n(doi:<a href=\"https://doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a>), CMLSpect (submitted), and integrating Miguel’s mass spectrum prediction\ntoolkit into SENECA (doi:<a href=\"https://doi.org/10.1021/ci000407n\">10.1021/ci000407n</a>) for structure elucidation. The latter topic is rather exciting\nand when the method shows powerful enough, this will have a major impact on the field of <a href=\"http://en.wikipedia.org/wiki/Metabolite\">metabolomics</a>.</p>\n\n<p>BTW, importantly, my CUBIC email address is no longer valid, so please use one of my many other email addresses, e.g. my SourceForge one, or\nmy Gmail account.</p>",
      "summary": "The end of the CUBIC has come, and so did the end of my 1-year postdoc in the group of Christoph Steinbeck. It would have been much better if the group could have continued for one or two more years, so that we could harvest the fruit of the work done in the past years. Only having been group member since April 1 2006, I mostly contributed work to Bioclipse (doi:10.1186/1471-2105-8-59), CMLSpect (submitted), and integrating Miguel’s mass spectrum prediction toolkit into SENECA (doi:10.1021/ci000407n) for structure elucidation. The latter topic is rather exciting and when the method shows powerful enough, this will have a major impact on the field of metabolomics.",
      
      "date_published": "2007-04-06T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["bioclipse","cml"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-8-59", "doi": "10.1186/1471-2105-8-59"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci000407n", "doi": "10.1021/ci000407n"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fmqmf-03x28",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/29/acs-chicago-day-3.html",
      "title": "ACS Chicago - Day #3",
      "content_html": "<p>Tuesday promised to be an interesting day: an interesting ‘Scientific Communication’ CINF session in the morning and early\nafternoon. And, rather important to me, the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> dinner that night, just after another\nCINF party, where I chatted with a few others about options of a chemistry equivalent of the <a href=\"http://code.google.com/soc/\">Google Summer of Code</a>;\nwho knows what happens this summer, but start thinking about ideas on how to increase the web experience of chemistry journal web pages.</p>\n\n<h2 id=\"the-gsoc\">The GSoC</h2>\n\n<p>Now that I am talking about the GSoC, you might have realized that the <a href=\"http://cdk.sf.net/\">CDK</a> and\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> did not make it as mentoring organization. While I had not seriously expected it,\nall the enthusiasm from within both projects including several interested students, I was a bit hoping for getting\naccepted with at least on of them. Meanwhile, <a href=\"http://www.kde.org/\">KDE</a>, as expected, is approved, and actually contains\ntwo interesting chemistry project ideas too. One is about a 3D viewer/editor for which 7 students send Google a proposal,\nand the other about text mining of chemical content on the desktop, using <a href=\"http://www.vandenoever.info/software/strigi/\">Strigi</a>\n(two students). Both topics have one excellent proposal, who do good in the ranking process. So, we might have some\nchemistry in the GSoC after all.</p>\n\n<h2 id=\"cinf\">CINF</h2>\n\n<p>OK, back to the ACS meeting. 
Fahrenbach had a presentation on blogging too, but I don’t remember anything special about it.\nThe <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/03/26/acs-chicago-day-1.html\">CHED <i class=\"fa-solid fa-recycle fa-xs\"></i></a> session was more elaborate on the whole topic,\nand since you are a reader of chemical blogs, you all know about this anyway :) Loney introduced\n<a href=\"http://biotechexchange.org/\">biotechexchange.org</a> which is building a social network around biotechnology. There are other\ncommunity sites like this, and my major <em>problem</em> with these community building efforts is that they are too well defined.\nI much prefer to work in a more open environment where I can get in contact just as easily with people outside some specific\ntopic. For the rest, the set of technologies is rather comprehensive.</p>\n\n<p>Frenkel spoke about the imminent success of <a href=\"http://trc.nist.gov/ThermoML.html\">ThermoML</a>, which is now being supported by\nvendors and publishers, smoothing the whole dissemination of data supported by this format. It is basically what\n<a href=\"http://www.xml-cml.org/\">CML</a> is attempting to achieve in molecular structure data. Day is having a good go at this with\ncrystal data, and <a href=\"http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/CMLCrystBase\">his CrystalEye project</a> is supposed to be\nlaunched next month.</p>\n\n<p>Hey, at Microsoft.com, gave a rather manager-level presentation, with very little value for someone into the field of\n‘data lifecycle and curation’. Rather disappointing at a scientific conference, or am I judging the ACS conference here?\nIf Microsoft is getting interested in chemoinformatics that might be a good thing, as long as they are OK with open\nsource, open data and open standards. 
Who knows…</p>\n\n<p>Rzepa had his presentation on the semantic wiki, which he, in similar form, <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/14/german-conference-on-chemoinformatics_14.html\">held at the German Chemoinformatics Conference\ntoo <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. New, I think, were the slides\non reasoning based on the content of the wiki. That was rather interesting. If we would all make our chemical knowledge\navailable as <a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a>, then this can become a big thing very\nsoon. I skipped the presentation of Renear on ontologies, though it was actually one that I had hand picked; but I was\nsimply too tired. Will watch the podcast when available. (BTW, are they making podcasts for the CINF session only, or\nfor the whole ACS meeting?)</p>\n\n<p>In the afternoon, I also followed just a subset of the presentations. The last one was by Scott <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/03/26/acs-chicago-day-1.html\">who spoke earlier on\nSecond Life <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. I’m really interested in seeing where this\nis going, though I have my reservations about whether this is the right medium for mining chemical knowledge. Today she spoke on the\nsocial bookmarking and podcasting initiatives at Nature and <a href=\"http://network.nature.com/\">Nature Network</a>. The latter is\na social site, like BioTechExchange, but not limited to one specific topic, and more interdisciplinary (my account\nat <a href=\"http://network.nature.com/profile/U6151BCD6\">NN</a>). 
I blogged <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/02/23/nature-network-v2-cannot-create-new.html\">about some early issues <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nsome time ago.</p>\n\n<h2 id=\"jmol\">Jmol</h2>\n\n<p>Jabri showed us how <a href=\"http://www.jmol.org/\">Jmol</a> is adding value to the <a href=\"http://pubs.acs.org/journals/acbcct/index.html\">ACS Chemical Biology</a>\njournal. Yes, that’s what she said. An opensource tool, developed by people in their free time, is making an ACS journal\nmore valuable. I am very happy to hear that, and it strongly supports our view that opensource chemoinformatics is very\nimportant. Some more support from established organizations might be in order indeed!</p>\n\n<h2 id=\"blue-obelisk\">Blue Obelisk</h2>\n\n<p>It has become a habit to organize a <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> dinner to talk about opensource, opendata,\nopenscience, and the future. Actually, we secretly talk about taking over the world, but I can’t say that. (Neither can\nI tell anything about our secret rituals. The protocol for becoming a member of our society is quite simple though: be\nclear about your opinion that ODOSOS is the future.) Dinner was great, and it was great talking to several older and\nnewer members of the movement. Cheers all! Oh, there were also some awards involved, but I hope Peter and Christoph will\nblog about that and post the pictures.</p>",
      "summary": "Tuesday promised to be an interesting day: an interesting ‘Scientific Communication’ CINF session in the morning and early afternoon. And, rather important to me, the Blue Obelisk dinner that night, just after another CINF party, where I chatted with a few others about options of a chemistry equivalent of the Google Summer of Code; who knows what happens this summer, but start thinking about ideas on how to increase the web experience of chemistry journal web pages.",
      
      "date_published": "2007-03-29T00:20:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      "tags": ["acs"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6t4jd-n0892",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/29/acs-chicago-day-2.html",
      "title": "ACS Chicago - Day #2",
      "content_html": "<p>The wetter was much better today. This is a view on downtown from the walking bridge between Lake Side and McCormick\nbuildings of the conference site:</p>\n\n<p><img src=\"/assets/images/dsci0028.jpg\" alt=\"\" /></p>\n\n<h2 id=\"cinf-morning\">CINF morning</h2>\n\n<p>Yeah, more CINF session reports; I’m a chemoinformatician, remember. Chen showed us around in the latest changes in\n<a href=\"http://cdb.ics.uci.edu/CHEM/Web/\">ChemDB</a>, such as retrosynthesis planning. Banik shows a patented method for\nshowing differences in a set of spectra, though his examples were not really impressive; if the method is really\npowerful, the examples might have been picked a bit more careful. And I have to say, in retrospect, I found the\npresentations in the CINF sessions typically of lower quality than I had expected for the big ACS meeting. Fortunately,\nmeeting all the people here makes more than up for that. Guha presented a potentially powerful method to cluster\nlarge and huge data sets with a method that approximated SVD by splitting up the full matrices into many smaller ones.</p>\n\n<h2 id=\"laptop-lane\">Laptop Lane</h2>\n\n<p>The idea was that us bloggers met up, but that did not quite work out. No problem though. Spoke about many things\nwith several people. <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> pointed me to an email from Noel on getting\n<a href=\"http://chemicalblogspace.blogspot.com/2007/03/jacs-toc-featuring-your-review.html\">blog comments on JACS/ChemComm/JCIM papers on the table of contents</a>.\nAnd somewhere that day, <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a> pointed me to\n<a href=\"http://usefulchem.blogspot.com/2007/03/communicating-chemistry-at-acs.html\">more chemistry in Second Life</a>\n(set up by <a href=\"http://bethssecondlife.blogspot.com/\">Beth</a>). 
Both are great follow-ups on\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/03/26/acs-chicago-day-1.html\">the blog/wiki session by CHED yesterday <i class=\"fa-solid fa-recycle fa-xs\"></i></a>!\nAround 17:00 we left for one of the receptions in the W hotel in one of the WOW rooms, though the view was not that spectacular:</p>\n\n<p><img src=\"/assets/images/dsci0037.jpg\" alt=\"\" /></p>\n\n<h2 id=\"bulls\">Bulls</h2>\n\n<p>I did not stay very long at the party, as I had to leave for the Bulls game against Portland. That was fun indeed!\nIf I had known my first breakfast was going to cost 22 dollars, I would have bought a better ticket, but now I had a\n<em>Stand 1</em> ticket which is so high up in the stadium that even the security people had no idea where to send\nme :) The view was still more than good enough to feel the pain/joy of the blocks and dunks-in-your-face:</p>\n\n<p><img src=\"/assets/images/dsci0042.jpg\" alt=\"\" /></p>",
      "summary": "The weather was much better today. This is a view of downtown from the walking bridge between the Lake Side and McCormick buildings of the conference site:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/dsci0028.jpg",
      "date_published": "2007-03-29T00:10:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      "tags": ["acs"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pjej9-6ab48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/26/acs-chicago-day-1.html",
      "title": "ACS Chicago - Day #1",
      "content_html": "<p>I was happy to notice just a minute ago that the first blog items covering the\n<a href=\"http://acswebcontent.acs.org/nationalmeeting/chicago2007/home.html\">ACS meeting</a> are popping up: C&amp;EN has set up a\n<a href=\"http://cen07.wordpress.com/\">dedicated blog about the meeting</a>, Nature’s Sceptical Caterine\n<a href=\"http://blogs.nature.com/thescepticalchymist/\">wrote she has reached the meeting too</a>, Richard wrote about the\n<a href=\"http://www.rscweb.org/blogs/cw/\">scent of bugs in wine</a> (or so), and\n<a href=\"http://www.rscweb.org/blogs/cw/\">Kyle won’t make it other than tomorrow</a>. Additionally, Nature is running a\n<a href=\"http://blogs.nature.com/news/blog/conference_reports/american_chemical_society/\">coverage of the ACS meeting</a>.\nOn the reader side, Paul is <a href=\"http://blog.chembark.com/2007/03/25/acs-07-chicago/\">hoping that Whitesides</a>\nwill be blogged about.</p>\n\n<p>My first day at the conference was interesting. The huge facility makes navigation a bit problematic, and we\nseemed to make it a habit to explore the wrong end of the building before heading in the right direction.\nThere are a lot of maps in the ACS On-Site Meeting Program, but a nice overview map is lacking. Anyway, I\nspent the morning session in the ‘blog, wiki, and podcast session’, and the afternoon in CINF session honoring\n<a href=\"http://www.informatics.indiana.edu/people/profiles.asp?u=wiggins\">Prof. Wiggins</a>.</p>\n\n<h2 id=\"ched\">CHED</h2>\n\n<p>Vogel was the first speaker in the CHED C Section morning session, and spoke about blogs and RSS feeds in general.\n<a href=\"http://www.chemicalforums.com/index.php?topic=13540.msg62586#msg62586\">Mitch’ Yahoo Pipes hackup</a> was mentioned\nin one of the talks in this morning session. Currano followed with a discussion on social bookmarking, and so did\nPence who focussed on the function in education. 
Francl put chemical blogging in some perspective which led to a\nshort discussion on the difference in idea between blogs and wikis. Gelder and Picione spoke about podcasting as\nmultimedia blogs. Scott presented recent work by Nature in exploring Second Life technologies, and mentioned the\nchemistry on their island, which happened to host <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/06/chemoblogs-2.html\">a session of the First Online EMBL PhD Symposium last year <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBradley spoke about how he integrated blogs and wikis into their practicals. The atmosphere of the session was\nrelaxed and the discussion lively.</p>\n\n<h2 id=\"cinf\">CINF</h2>\n\n<p>The downside of all these parallel sessions is that it is bound to give clashes. It’s apparently even supposed to,\nbecause the ACS website private schedule assistant is made to make you aware and resolve such clashes. So, just as\nI had to skip the CINF morning session honoring Wiggins, I had to skip the CHED session on social networking that continued\nthe CHED morning session. For example, I had to miss the presentation by <a href=\"http://www.ch.ic.ac.uk/rzepa/confchem06/\">Rzepa on the semantic wiki</a>\n(Henry, I hope to have made up for it, by plugging your work here :)</p>\n\n<p>Murray-Rust was the first speaker of the CINF Section A afternoon session, and talked about mashups, text mining\nand other things done in Cambridge. He also mentioned recent Greasemonkey scripts using comments from and\nenhancing our chemical blogs, now <a href=\"http://wiki.cubic.uni-koeln.de/bowiki/index.php/Using_Javascript_and_Greasemonkey_for_Chemistry\">described at the Blue Obelisk website</a>.\n(Especially the Chemical blogspace enhanced TOC of chemistry journals is nice.) 
Wild spoke about\n<a href=\"http://djwild.info/acs07/\">integrating text mining and chemoinformatics tools</a>, and showed a mockup of a\n‘by the way’ system for PubChem, where a PubChem entry would be enhanced with ‘BTW, did you know that these 7\narticles mention this molecule, and that … etc’. These things are going to happen this year. Heller held his\nusual talk on InChI and PubChem, though the content has slightly changed since the last two versions I’ve\nseen (not the message, though). Doman gave a practical example backing up earlier statements that\ntoo much information is lost in the publication process. Heritage showed Elsevier/MDL’s view on the future of\nchemoinformatics, and accurately touched where it is currently failing. Amusingly, he pointed out that Elsevier,\nthe publisher, would love to see more accurate QSAR/QSPAR/VS/etc models; ironically, it is, actually, for a\nlarge part caused by data not ending up in publications that predictive models are not as accurate as they\ncould be. So, while looking at the chemoinformaticians/metricians, they should really be looking at themselves.</p>\n\n<p>Some of these presentations mentioned directly or indirectly things I worked on. Thanx for doing that! Because I\nonly knew that there was funding for going to this meeting after the poster submission deadline had closed, I\ndo not have the opportunity to present my work myself.</p>\n\n<p>The evening is for the traditional parties, time to eat, drink, network, make deals and try to convince others\nabout the virtues of <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/10/28/opensource-chemistry-and-opensource.html\">ODOSOS <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>A last reminder: tomorrow afternoon at 13:00 at Laptop Lane in the exposition area is a meeting of chemical\nbloggers. Please join and chat IRL for once! :)</p>",
      "summary": "I was happy to notice just a minute ago that the first blog items covering the ACS meeting are popping up: C&amp;EN has set up a dedicated blog about the meeting, Nature’s Sceptical Caterine wrote she has reached the meeting too, Richard wrote about the scent of bugs in wine (or so), and Kyle won’t make it until tomorrow. Additionally, Nature is running a coverage of the ACS meeting. On the reader side, Paul is hoping that Whitesides will be blogged about.",
      
      "date_published": "2007-03-26T00:00:00+00:00",
      "date_modified": "2025-04-12T00:00:00+00:00",
      "tags": ["acs"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fx8a8-qnp55",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/25/arrived-in-chicago.html",
      "title": "Arrived in Chicago...",
      "content_html": "<p>I arrived in Chicago yesterday afternoon. Much warmed than the cold Chicago the ACS promised me,\nso my winter coat was really not necessary. Is this global warming? Or was the ACS simply wrong?\nAnyway, very foggy indeed, just like the <a href=\"http://wiki.cubic.uni-koeln.de/cb/blog_search.php?timeframe=10y&amp;blog_id=44\">Chemistry World blog wrote</a>:</p>\n\n<p><img src=\"/assets/images/dsci0027.jpg\" alt=\"\" /></p>\n\n<p>There were several other Dutch chemists on the plane, among which a few formed postdocs from Nijmegen,\nwho I knew from the time I was still a M.Sc. student in organic chemistry. The plane was nice too, a\nBoeing 747, the first time I flew with one. OK, there now is the new Airbus, so the Boeing has lost\nsome of its prestige, but here’s the image anyway:</p>\n\n<p><img src=\"/blog//assets/images/dsci0024.jpg\" alt=\"\" /></p>\n\n<p>The <a href=\"http://www.youtube.com/watch?v=6iTqwPj5ChE\">380 is actually supposed to be in Chicago</a>,\nbut I did not see it. OK, going for breakfast now, and to the ACS conference site afterwards.</p>",
      "summary": "I arrived in Chicago yesterday afternoon. Much warmed than the cold Chicago the ACS promised me, so my winter coat was really not necessary. Is this global warming? Or was the ACS simply wrong? Anyway, very foggy indeed, just like the Chemistry World blog wrote:",
      "image": "https://chem-bla-ics.linkedchemistry.info/blog//assets/images/dsci0024.jpg",
      "date_published": "2007-03-25T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["acs"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/240x4-a7y42",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/22/chicago-bulls-here-i-come.html",
      "title": "Chicago (Bulls), here I come!",
      "content_html": "<p>I had some fun today with making prints of reservations etcetera for my trip to the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings%5Cchicago2007%5Chome.html\">ACS conference in Chicago</a>.\nWent over to the website to make a print of the location of the hotel I am in.\n(<a href=\"http://chicago.intercontinental.com/\">Intercontinental Chicago</a>: in case you want to leave me a message to\nmeet up over breakfast or so.) Anyway, so at the ACS website I found a notice that the ACS Housing people\nclosed down and that I should contact the hotel directly. Fine, no problem. Oh wait, my hotel is not in the\nlist. No worries, I just enter my last name and acknowledgment number. Huh, they don’t know me?? Already\nworried about which bridge to use as backup alternative, I emailed the organization which now takes care\nof it, being answered some 15 minutes later that they no longer do the hotel administration for that ACS\nconference anymore. That indeed rang some bell; I went over back to the ACS webpage, and this time found\nthe correct ACS housing webpage. I had been using one from a previous ACS conference. Yeah, one of my\nfinest hours :) Things are sorted out now, as I had already email the hotel too. Things are fine, and so\nmy nerviness activity is back to normal. 
(If you care to reproduce, just go to the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings\\national\\international.html\">page for International Visitors</a>\nlinked from the Chicago conference homepage, scroll down to “Preparing for Your ACS Meeting Experience” and click the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings\\national\\housing.html\">Hotel Information link</a>.\nMakes sense, because the international guests already know how things work :) And, yes, I could have seen it mention\nSA in the subtitle, I know.)</p>\n\n<h2 id=\"my-acs-schedule\">My ACS Schedule</h2>\n\n<p>My schedule is pretty regular, filled mostly with CINF and COMP presentations. The Monday and Wednesday\nafternoons are empty, though there were some <a href=\"http://chemicalblogspace.blogspot.com/2007/03/chemical-blogspace-getting-physical-at.html\">plans to meet up with bloggers</a>\n(<a href=\"http://gaussling.wordpress.com/2007/03/07/bloggenvolk/\">and here</a>) on Monday afternoon, which I still\nthink we should do, even though <a href=\"http://gaussling.wordpress.com/2007/03/19/bloggenvolk-acs-chicago-meeting-minus-gaussling/\">Gaussling had to back out</a>.\nI hereby suggest we meet at 13:00 at <a href=\"http://map.mapnetwork.com/tradeshow/chicago/acs/\">Laptop Lane on the ACS Show Floor</a>.\nSunday evening there are all sorts of parties, and I and some colleagues are going to the party for\ninternational guests at the Sheraton Chicago. Monday evening is reserved for the <a href=\"http://www.nba.com/bulls/\">Bulls</a>.\nI am indeed some 10 years late, but happy to finally be able to visit an NBA stadium during the season. They play\nagainst Portland on Monday, and against Detroit on Tuesday. 
I figure that game would have been nicer, but\nTuesday evening is reserved for the <a href=\"http://blueobelisk.org/\">Blue Obelisk</a> social event at the\n<a href=\"http://hardly.cubic.uni-koeln.de/pipermail/blue-obelisk/2007-March/001125.html\">South Water Kitchen</a>.</p>\n\n<p>Suggestions, like <a href=\"http://blind-science.blogspot.com/2007/03/if-i-were-going-to-chicago.html\">these from Carmen</a>\nare most welcome!</p>",
      "summary": "I had some fun today with making prints of reservations etcetera for my trip to the ACS conference in Chicago. Went over to the website to make a print of the location of the hotel I am in. (Intercontinental Chicago: in case you want to leave me a message to meet up over breakfast or so.) Anyway, so at the ACS website I found a notice that the ACS Housing people closed down and that I should contact the hotel directly. Fine, no problem. Oh wait, my hotel is not in the list. No worries, I just enter my last name and acknowledgment number. Huh, they don’t know me?? Already worried about which bridge to use as backup alternative, I emailed the organization which now takes care of it, being answered some 15 minutes later that they no longer do the hotel administration for that ACS conference anymore. That indeed rang some bell; I went over back to the ACS webpage, and this time found the correct ACS housing webpage. I had been using one from a previous ACS conference. Yeah, one of my finest hours :) Things are sorted out now, as I had already email the hotel too. Things are fine, and so my nerviness activity is back to normal. (If you care to reproduce, just go to the page for International Visitors linked from the Chicago conference homepage, scroll down to “Preparing for Your ACS Meeting Experience” and click the Hotel Information link. Makes sense, because the international guests already know how things work :) And, yes, I could have seen it mention SA in the subtitle, I know.)",
      
      "date_published": "2007-03-22T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["acs","chemistry","blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7fs8y-v1g48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/16/pipelining-chemical-information-with.html",
      "title": "Pipelining chemical information with Yahoo Pipes",
      "content_html": "<p>Chemists are picking up <a href=\"http://pipes.yahoo.com/\">Yahoo Pipes</a>, or, as Noel calls them,\n<a href=\"http://www.mail-archive.com/blue-obelisk@hardly.cubic.uni-koeln.de/msg00120.html\">Pipeline Pilot for RSS feeds</a>.\nI tend to agree, as the source of the workflows are closed, that is, at least require registering to the Yahoo webpage.</p>\n\n<p>Several chemical applications have been developed since. One was developed by <a href=\"http://msblog.kermitmurray.com/\">Kermit</a>\nwho wrote an <a href=\"http://msblog.kermitmurray.com/2007/02/yahoo-pipes-mass-spectrometry.html\">aggregator for mass spectrometry journal articles</a>.\nAnd <a href=\"http://www.chemicalforums.com/index.php?action=profile;u=2\">Mitch</a> has set up a\n<a href=\"http://www.chemicalforums.com/index.php?topic=13458.msg62253#msg62253\">similar feature for ACS journals</a>.</p>\n\n<p>Now, what I am really waiting for, are the first applications that deal with molecular structures, and a\npipe that alerts me about publications in which molecules are discussed matching a certain\n<a href=\"https://doi.org/10.1021/ci600305h\">MQL molecular query for an interesting substructure</a>.</p>",
      "summary": "Chemists are picking up Yahoo Pipes, or, as Noel calls them, Pipeline Pilot for RSS feeds. I tend to agree, as the source of the workflows are closed, that is, at least require registering to the Yahoo webpage.",
      
      "date_published": "2007-03-16T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["rss","chemistry","publishing"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci600305h", "doi": "10.1021/ci600305h"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3et2a-pkv75",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/14/what-is-dapagliflozin.html",
      "title": "What is dapagliflozin?",
      "content_html": "<p><a href=\"http://www.qdinformation.com/qdisblog\">QDIS</a> blogged about <a href=\"http://www.qdinformation.com/qdisblog/2007/01/11/bristol-myers-and-astrazeneca-in-1-billion-drug-pact/\">Bristol-Myers and AstraZeneca teaming up for a new drug called\ndapagliflozin</a>. Now,\ndapagliflozin is, this week, the most used search keyword in <a href=\"http://www.google.com/\">Google</a>, leading to\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>.</p>\n\n<p>I wondered what the chemical structure of this compound is. The <a href=\"http://www.astrazeneca.com/\">AstraZeneca</a> and\n<a href=\"http://www.bms.com/\">Bristol-Myers Squibb</a> websites don’t say. Since everything in pharma is patented I went to the US\npatent database and a search for <em>DPP-4 AND inhibitor</em> found the patents <a href=\"http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=DPP-4&amp;s2=inhibitor&amp;OS=DPP-4+AND+inhibitor&amp;RS=DPP-4+AND+inhibitor\">6,995,183</a>\nand <a href=\"http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=2&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=DPP-4&amp;s2=inhibitor&amp;OS=DPP-4+AND+inhibitor&amp;RS=DPP-4+AND+inhibitor\">6,995,180</a>.\nBut that does not help me either.</p>\n\n<p>Does anyone know the chemical structure of this compound? Just the InChI would be fine…</p>",
      "summary": "QDIS blogged about Bristol-Myers and AstraZeneca teaming up for a new drug called dapagliflozin. Now, dapagliflozin is, this week, the most used search keyword in Google, leading to Chemical blogspace.",
      
      "date_published": "2007-03-14T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/swa1d-q3q68",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/08/fast-molecular-similarity-with-new-3d.html",
      "title": "Fast molecular similarity with a new 3D shape descriptor",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing\">Jim</a> reported about <a href=\"http://www.dspace.cam.ac.uk/handle/1810/183858\">SPECTRa</a>\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/?p=79\">being in the news</a> and <a href=\"http://www.slashdot.org/\">./</a> about\n<a href=\"http://developers.slashdot.org/developers/07/03/08/1638241.shtml\">Toward a 3D Search Engine</a>. These two items have in\ncoming that they deal with the article <em>Ultrafast shape recognition for similarity search in molecular databases</em> by\nBallester and Richards (DOI:<a href=\"https://doi.org/10.1098/rspa.2007.1823\">10.1098/rspa.2007.1823</a>). The NewScientist wrote\nup <a href=\"http://www.newscientisttech.com/article/dn11283-novel-search-engine-matches-molecules-in-a-flash.html\">their angle on it</a>,\nwith a quote from <a href=\"http://www.ch.ic.ac.uk/local/organic/mod/\">Henry Rzepa</a>.</p>\n\n<p>The article proposes a new shape descriptor which is requires little computational resources to be calculated. It consists\nof 12 numbers describing the shape, and a simple similarity measure converts it into similarities. The results shown in\nthe article, and replicated in the NewScientist article linked above, are interesting enough for me to wonder if I could\n<a href=\"http://cia.navi.cx/stats/author/f_marighetti\">Federico</a>, one of our <a href=\"http://almost.cubic.uni-koeln.de/jrg/\">CUBIC</a>\nstudents, to work on this in the last two weeks of his practical.</p>\n\n<p>BTW, <a href=\"http://andygoesus.blogspot.com/\">Andreas</a>, don’t those review articles (viz.\nDOI:<a href=\"https://doi.org/10.1039/b409813g\">10.1039/b409813g</a>) work out good for your citation count ;)</p>",
      "summary": "Jim reported about SPECTRa being in the news and ./ about Toward a 3D Search Engine. These two items have in coming that they deal with the article Ultrafast shape recognition for similarity search in molecular databases by Ballester and Richards (DOI:10.1098/rspa.2007.1823). The NewScientist wrote up their angle on it, with a quote from Henry Rzepa.",
      
      "date_published": "2007-03-08T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1098/rspa.2007.1823", "doi": "10.1098/rspa.2007.1823"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1039/b409813g", "doi": "10.1039/b409813g"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7dazs-avy60",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/23/nature-network-v2-cannot-create-new.html",
      "title": "Nature Network v2: cannot create a new group",
      "content_html": "<p><a href=\"http://blogs.nature.com/wp/nascent/2007/02/nature_network_v2_is_live.html\">Nascent</a> reported that\n<a href=\"http://network.nature.com/\">Nature Network v2</a> has gone life. Never too anxious to try something new,\nI created an account and signed in. I even joined two groups: <em>Bioinformatics</em> and <em>Semantic Web for the Life Sciences</em>.</p>\n\n<p>But, when I tried to create a new group, the system fails. I promised me to send me email for confirmation.\nTried it twice via my <a href=\"http://www.sf.net/\">Sourceforge</a> email account. No email. I then changed my email\nfor my Nature account to my Gmail address. Still no email…</p>\n\n<p>I am not located in Boston or London, is that the problem? Is being ‘global’ not good enough? Is the requirement\nto have two ‘o’s in the name? Cologne then, maybe?</p>\n\n<h2 id=\"missing-features\">(Missing) Features</h2>\n\n<p>For the rest, the system seems interesting. I am not too fond of having to create accounts all over the place\n(<em>what was the password again???</em>), but looks promising. The thing I missed most when filling out my profile\nwas a feature to import the list of my publications from <a href=\"http://www.connotea.org/\">Connotea</a>.</p>\n\n<p>Another thing I missed, was the ability to mention my blog(s) in my profile. May I put this in as request too?\nBTW, is there a group or forum on Nature Network where I can file these things?</p>",
      "summary": "Nascent reported that Nature Network v2 has gone life. Never too anxious to try something new, I created an account and signed in. I even joined two groups: Bioinformatics and Semantic Web for the Life Sciences.",
      
      "date_published": "2007-02-23T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["nature","connotea"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ezjxg-p8m13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/20/invisible-inchis.html",
      "title": "Invisible InChI&apos;s",
      "content_html": "<p>Some <a href=\"http://www.iupac.org/inchi/\">InChI</a>’s are short, such as that for methane: <span class=\"chem:inchi\">InChI=1/CH4/h1H4</span>.\nOthers are long (think <a href=\"http://chem-bla-ics.linkedchemistry.info/2006/03/31/inchis-in-latex-and-cdk-news.html\">crambin <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), and you don’t\nwant to show them inline. Or you just want to show them anyway, but still want the chemistry to be understood. Here come the\ninvisible InChI’s.</p>\n\n<h2 id=\"alt-text-for-images\">Alt text for images</h2>\n\n<p>One solution is to put the InChI as content of the @alt attribute of the HTML <code class=\"language-plaintext highlighter-rouge\">&lt;img&gt;</code> element. This has the downside that it\nhas no explicit semantic meaning. For example, the <a href=\"http://scienceblogs.com/moleculeoftheday/\">Molecule Of The Day</a> blog is using\nthis approach. It’s an excellent start, but not the solution.</p>\n\n<h2 id=\"as-keyword\">As Keyword</h2>\n\n<p>Another option is to put it in as keyword, in the HTML <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code> element: <code class=\"language-plaintext highlighter-rouge\">&lt;meta name=\"keywords\" content=\"InChI=1/CH4/h1H4\"&gt;</code>.\nBut Google does not index this, so the use is restricted.</p>\n\n<h2 id=\"invisible-text\">Invisible text</h2>\n\n<p>The most promosing alternative, however, is to put it in using the <code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code> element, in combination with microformats or RDFa,\nLike this: <span class=\"chem:inchi\" style=\"font-size: 0%; visibility: hidden;\">InChI=1/CH4/h1H4</span>.\nIt does not show up, does it? 
But it is really there, as you would see, if you have\n<a href=\"http://chem-bla-ics.linkedchemistry.info/2006/12/19/chemistry-in-html-greasemonkey-again.html\">the special Greasemonkey <i class=\"fa-solid fa-recycle fa-xs\"></i>\n</a> installed.</p>\n\n<p>This is the HTML code for this example:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;span</span> <span class=\"na\">class=</span><span class=\"s\">\"chem:inchi\"</span> <span class=\"na\">style=</span><span class=\"s\">\"font-size: 0%; visibility: hidden;\"</span><span class=\"nt\">&gt;</span>InChI=1/CH4/h1H4<span class=\"nt\">&lt;/span&gt;</span>\n</code></pre></div></div>\n\n<p>The <code class=\"language-plaintext highlighter-rouge\">@style</code> attribute marks the text’s visibility as hidden, and the font-size is set to 0%. It is important not to set it\nto zero itself, because many web browsers do not interpret zero font size correctly, and take the default font size instead.</p>\n\n<p>This should solve the standing problem that we would like to include the InChI’s in our blogs, if it would just not be so\nlong and unreadable. Just hide it.</p>\n\n<p><strong>Update</strong>: Daniel <a href=\"https://web.archive.org/web/20070514085137/https://chem-bla-ics.blogspot.com/2007/02/invisible-inchis.html#comment-6321491648638004528\">informed <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> <!-- keep link -->\nme that Google won’t index text marked ‘visibility: hidden’ and may even mark your webpage as spam :( Not the solution either.\nRead the comments for more thoughts.</p>",
      "summary": "Some InChI’s are short, such as that for methane: InChI=1/CH4/h1H4. Others are long (think crambin ), and you don’t want to show them inline. Or you just want to show them anyway, but still want the chemistry to be understood. Here come the invisible InChI’s.",
      
      "date_published": "2007-02-20T00:00:00+00:00",
      "date_modified": "2025-04-27T00:00:00+00:00",
      "tags": ["inchi","html"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bpnj5-40e86",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/19/pimp-my-javadoc.html",
      "title": "Pimp my JavaDoc",
      "content_html": "<p><a href=\"http://miningdrugs.blogspot.com/\">Jörg</a>’s PhD book <em>Data Mining und Graph Mining auf molekularen Graphen - Chemoinformatik und\nmolekulare Kodierungen für ADME/Tox-QSAR-Analysen</em> has a dump of the JavaDoc of the <code class=\"language-plaintext highlighter-rouge\">GroupContributionPredictor</code> in\n<a href=\"http://joelib.sf.net/\">JOELib</a> (Figure 3.2, page 43). There are two nice things to the shown JavaDoc: 1. it has links to\n<a href=\"http://www.wikipedia.org/\">Wikipedia</a>; 2. it has a Further Reading section.</p>\n\n<p>Now, the <a href=\"http://cdk.sf.net/\">CDK</a> already links to a bibliography for some time now. However, it would just give a BibTex\nkey, and link to a webpage created from a <a href=\"http://bibtexml.sf.net/\">BibTeXML</a> file in which we store all references\n(<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/doc/refs/cheminf.bibx?view=log\">cdk/doc/refs/cheminf.bibx</a>).\nPutting the full citation inline makes the JavaDoc more informative, but I wanted to preserve the <code class=\"language-plaintext highlighter-rouge\">@cdk.cite</code>\nmechanism we were using.</p>\n\n<p>This weekend I hacked up a nice CDKCiteDoclet that would read the BibTeXML file with <a href=\"http://www.xom.nu/\">XOM</a>,\nand convert items to HTML to put into the pimped JavaDoc:</p>\n\n<p><img src=\"/assets/images/pimpedJavaDoc.png\" alt=\"\" /></p>",
      "summary": "Jörg’s PhD book Data Mining und Graph Mining auf molekularen Graphen - Chemoinformatik und molekulare Kodierungen für ADME/Tox-QSAR-Analysen has a dump of the JavaDoc of the GroupContributionPredictor in JOELib (Figure 3.2, page 43). There are two nice things to the shown JavaDoc: 1. it has links to Wikipedia; 2. it has a Further Reading section.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pimpedJavaDoc.png",
      "date_published": "2007-02-19T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","javadoc","literature"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5w2j5-jfr13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/17/is-that-jmol-in-that-d-wave-demo.html",
      "title": "Is that Jmol in that D-Wave demo?",
      "content_html": "<p><a href=\"http://science.slashdot.org/article.pl?sid=07/02/15/1417236&amp;from=rss\">Slashdot reported</a> on\n<a href=\"http://www.dwavesys.com/\">D-Wave</a>’s recent demo of their 16-<a href=\"http://en.wikipedia.org/wiki/Qubit\">qubit</a>\nquantum computing system. <a href=\"http://kwc.org/blog/archives/2007/2007-02-14.dwave_demo.html\">Video’s of the demo</a>\ncan be watched on <a href=\"http://video.google.com/\">Google Video</a>. The <a href=\"http://video.google.com/videoplay?docid=-291541120357804188&amp;hl=en\">second video</a>\ndemonstrates the use of the machine in similarity searching:</p>\n\n<p><img src=\"/assets/images/dwaveDemo.png\" alt=\"\" /></p>\n\n<p>Now, that screenshot does look like <a href=\"http://jmol.sf.net/\">Jmol</a>.\nThe companies website does not give the answer, <a href=\"http://scottaaronson.com/blog/?p=198\">though Scott mentions C and Java front end software</a>.</p>\n\n<p>So, let’s ask the source: Dear dr. <a href=\"http://dwave.wordpress.com/\">Rose</a>, is it Jmol what we see in that demo?</p>",
      "summary": "Slashdot reported on D-Wave’s recent demo of their 16-qubit quantum computing system. Video’s of the demo can be watched on Google Video. The second video demonstrates the use of the machine in similarity searching:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/dwaveDemo.png",
      "date_published": "2007-02-17T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["jmol","quantum"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k9nar-6wt60",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/04/writing-up-my-phd-introduction-chapter.html",
      "title": "Writing up my PhD introduction chapter...",
      "content_html": "<p>The last twelve months or so, I have been doing two jobs (excluding hobbies of mine, such as\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>): my postdoc in <a href=\"http://almost.cubic.uni-koeln.de/jrg\">the group of Christoph Steinbeck</a>\non computer aided structure elucidation, and finishing my PhD. The topic of my PhD is about the interplay between chemoinformatics\nand chemometrics: the first being strong in dealing with molecular structures, the latter strong in data analysis and mining,\noriginally on experimental data. Really, I focused on a few existing problems, such as how to represent and analyze large\nlibraries of crystal structures, the use of NMR spectra in QSAR studies, and two more practical problems regarding reproducibility\nof scientific results, which includes communication of data, and transferability of algorithms. Actually, I also studied fragment\nmining in QSAR for a set of transfactants, but that has not lead to firm results yet.</p>\n\n<p>The below diagram shows how I see the interplay between both fields:</p>\n\n<p><img src=\"/assets/images/rodeDraad.png\" alt=\"\" /></p>",
      "summary": "The last twelve months or so, I have been doing two jobs (excluding hobbies of mine, such as Chemical blogspace): my postdoc in the group of Christoph Steinbeck on computer aided structure elucidation, and finishing my PhD. The topic of my PhD is about the interplay between chemoinformatics and chemometrics: the first being strong in dealing with molecular structures, the latter strong in data analysis and mining, originally on experimental data. Really, I focused on a few existing problems, such as how to represent and analyze large libraries of crystal structures, the use of NMR spectra in QSAR studies, and two more practical problems regarding reproducibility of scientific results, which includes communication of data, and transferability of algorithms. Actually, I also studied fragment mining in QSAR for a set of transfactants, but that has not lead to firm results yet.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/rodeDraad.png",
      "date_published": "2007-02-04T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kvq10-snm07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/03/cdk-workshop-days-3-and-4.html",
      "title": "CDK Workshop - Days #3 and #4",
      "content_html": "<p>Days #3 and #4 of the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=spring2007workshop\">CDK Workshop</a> have been\nquite busy indeed, and I have not been able to summarize them so far. After <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2.html\">a rather interesting day #2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe third day was the last one with scheduled presentations. Kai Hartmann showed how he used the CDK in his systems\nbiology research, and contributed the code he wrote to predict Gibbs energies based on fragment contributions.\nMiguel Rojas showed his MS prediction work, which is based on the CDK too.</p>\n\n<p>Much of the rest of day and Thursday continued on the work started yesterday: making the 3D structure builder a\nsingleton class, and applying and testing an optimization for the AllRingsFinder to address\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2.html\">molecules like Choloyl-CoA <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. The trick\nbasically consists of applying the all rings finding algorithm to isolated systems only. The effect is\nconsiderable: the total computation time for Choloyl-CoA decreases by a 93 fold! We found that the\nfingerprints used in the template library for the 3D structure builder are outdated, and Christoph worked\non updating that, which required searching into old archives to find the tool to do just this.</p>\n\n<p>Because the above performance fix did not fix the current slow SMILES parsing, Kai looked at the\n<code class=\"language-plaintext highlighter-rouge\">DeduceBondOrderTool</code> which is the slow component, and optimized the used algorithm by reusing determined\nmolecular ring systems. Nevertheless, on users requests, a time out mechanism is now available for SMILES\nparsing. Additionally, several of the bugs found on the second workshop day have been fixed. 
Meanwhile,\nI was distracted by other things. For example, fixing <a href=\"http://www.bioclipse.net/\">Bioclipse</a> bugs for\n<a href=\"http://bioclipse.blogspot.com/2007/02/bioclipse-101-released.html\">the version 1.0.1 released yesterday</a>.\nThe SENECA tool is not forgotten either, and last weekend I made some good progress with it,\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=15\">which Christoph blogged about</a>.</p>",
      "summary": "Days #3 and #4 of the CDK Workshop have been quite busy indeed, and I have not been able to summarize them so far. After a rather interesting day #2 , the third day was the last one with scheduled presentations. Kai Hartmann showed how he used the CDK in his systems biology research, and contributed the code he wrote to predict Gibbs energies based on fragment contributions. Miguel Rojas showed his MS prediction work, which is based on the CDK too.",
      
      "date_published": "2007-02-03T00:00:00+00:00",
      "date_modified": "2025-08-10T00:00:00+00:00",
      "tags": ["cdk","smiles"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ne4rf-wey66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html",
      "title": "RSC: the first publisher to go semantic!",
      "content_html": "<p>Just announced: <a href=\"http://web.archive.org/web/20070211195109/http://www.rsc.org/Publishing/Journals/ProjectProspect/index.asp\">the RSC goes semantic <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>!\nColin Batchelor was here at the CUBIC last autumn, where we discussed issues involved, mostly relating to\nexperimental section of organic chemistry syntheses, and NMR and MS spectra in particular, so I knew that\nthis was coming our way. The announcement writes:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>RSC Publishing, the publishing arm of the Royal Society of Chemistry, is\npleased to announce a new initiative for its journals. From February\n2007 electronic RSC journal papers will be enhanced so that their data\ncan be read, indexed and intelligently searched by machine, a first step\ntowards the \"semantic web\".\n\nReaders will be able to click on named compounds and scientific concepts\nin an electronic journal article to download structures, understand\ntopics, or link through to electronic databases; compounds and ontology\nterms will be published as RSS feeds enabling automated discovery of\nrelevant research.\n\nThe initiative, coined 'Project Prospect', is the first of its scope\nfrom a primary research publisher. Developed together with UK academics\nbased at the Unilever Centre of Molecular Informatics and the Computing\nLaboratory at Cambridge University, the Project uses InChIs (IUPAC's\nInternational Chemical Identifier for compounds); OBO ontology terms\n(Open Biomedical Ontologies: a hierarchical classification of biomedical\nterms) such as the Gene Ontology (GO) and the related Sequence Ontology\n(SO); terms from the IUPAC Gold Book; and CML (Chemical Markup Language:\na means to describe molecular information in a structured form).\n\nThis is a completely free service for authors and readers of RSC\njournals. 
The enhanced articles have an at a glance HTML view with\nadditional features accessed by a tool box. Downloadable compound\nstructures and printer friendly versions will be available via this new\nservice.\n</code></pre></div></div>\n\n<p>Colin, cheers!</p>",
      "summary": "Just announced: the RSC goes semantic ! Colin Batchelor was here at the CUBIC last autumn, where we discussed issues involved, mostly relating to experimental section of organic chemistry syntheses, and NMR and MS spectra in particular, so I knew that this was coming our way. The announcement writes:",
      
      "date_published": "2007-02-01T00:00:00+00:00",
      "date_modified": "2024-10-14T00:00:00+00:00",
      "tags": ["semweb","chemistry","publishing"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/f7vn0-kqz26",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2.html",
      "title": "CDK Workshop - Day #2",
      "content_html": "<p>Because of other obligations, I was unable to attend the first day of the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=spring2007workshop\">CDK Workshop</a>,\nthough Christoph had set up Skype so that at least I could hear the talks from <a href=\"http://www.inf.uni-konstanz.de/bioml/staff/berthold/\">Prof. Berthold</a>\n(Konstanz, Germany) about <a href=\"http://www.knime.org/\">KNIME</a> and <a href=\"http://almost.cubic.uni-koeln.de/cosi/curriculumVitae_zielesny.htm\">Prof. Zielesny</a>\nabout <a href=\"http://cdk-taverna.de/\">CDK-Taverna</a>.</p>\n\n<p>Today, Miguel Rojas and Stefan Kuhn discussed their research. Miguel showed the state of mass spectrum prediction using the <a href=\"http://cdk.sf.net/\">CDK</a>\nand the MEDEA plugin for <a href=\"http://www.bioclipse.net/\">Bioclipse</a>. Stefan demonstrated the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a>\nand a new lab systems for NMR experiment scheduling and management system based on that. <a href=\"http://www2.cmbi.ru.nl/who-and-where/staff/27/\">Dr. Ott</a>\n(Nijmegen, Netherlands) showed the <a href=\"http://biometa.cmbi.ru.nl/\">BioMeta Database</a> which contains metabolite and reaction information derived from the\n<a href=\"http://www.genome.jp/kegg/ligand.html\">KEGG</a>, but which fixes a set of chemical problems in the latter (see also the article,\nDOI:<a href=\"https://doi.org/10.1186/1471-2105-7-517\">10.1186/1471-2105-7-517</a>).</p>\n\n<p>The afternoons of CDK workshops traditionally have discussion sessions and hackathons. 
Two groups were formed: one consisted of the KNIME guys who,\ntogether with Miguel and Federico, focused on QSAR descriptor calculations in KNIME, while Stefan, Martin and I looked at the fingerprinter\npeculiarities that Martin found (see also this <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/archive/cdknews2.2.article22.pdf\">CDK News article</a>),\nand came up with a possible further performance improvement of the AllRingsFinder. Because one class of molecules that is causing trouble consists of two\nring systems connected by a long linker, like Choloyl-CoA (below), we anticipate that splitting the molecule up into ring systems prior to using the\nSSSR algorithm should speed up the complete all-ring finding process.</p>\n\n<p><img src=\"/assets/images/choloyl-coa.png\" alt=\"\" /></p>\n\n<p>Currently, the spanning tree is calculated before deciding on using the SSSR finder, which, we think, can be used to partition the molecule\ninto separate ring systems. On each of them, then, the further steps of the ring search can be applied.</p>\n\n<p>After dinner (pasta/pizza), during the Spanish-German handball game, we continued the hacking and discussions, now focusing as a whole group\non QSAR descriptors in KNIME. 
We looked at each descriptor and decided if it should go into a QSAR calculator node, or even in a node of its own.</p>\n\n<h2 id=\"bugs-found\">Bugs found</h2>\n<p>I won’t close this blog entry without giving a list of problems we found in the current CDK; some minor and small, some more troublesome.\nHere goes: typos all over the place; the OrderQueryBond lacks a return statement in an else clause; the Mol2Reader does not mark atom and\nbond aromaticity properly and reads a single bond as aromatic, and an aromatic bond as single; the Renderer2D does not always highlight\nboth atoms when hovering over a bond; SmilesGenerator.parseBond() should output bond orders correctly; the SSSR finder seems to have a\nmessed up if-else statement for the ringBondCount limit of 37; the BondCount descriptor should count all bonds by default, not just the\nsingle bonds; <code class=\"language-plaintext highlighter-rouge\">IDescriptor.getParameters()</code> should return null instead of <code class=\"language-plaintext highlighter-rouge\">Object[0]</code>; several programs use the SYBYL atomtype S.o2, while\nthe specification and the CDK config define S.O2; the IP descriptor now returns a variable length descriptor.</p>",
      "summary": "Because of other obligations, I was unable to attend the first day of the CDK Workshop, though Christoph had set up Skype so that at least I could hear the talks from Prof. Berthold (Konstanz, Germany) about KNIME and Prof. Zielesny about CDK-Taverna.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/choloyl-coa.png",
      "date_published": "2007-01-30T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","kegg","knime","smiles","taverna","nmrshiftdb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2105-7-517", "doi": "10.1186/1471-2105-7-517"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ccz4k-vge50",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/24/osmb2007-day-1-venture-capital.html",
      "title": "OSMB2007 Day #1: venture capital, scientific blogger and Kepler",
      "content_html": "<p>The second day just started of the <a href=\"http://www.heise.de/veranstaltungen/2007/ho_osb/en/\">Open Source Meets Business</a>,\nand now actually listening to the PHP talk, but here is a short update on day 1, which was the investment summit. It\nwas not so crowded, but especially the talks from the venture capitalists were interesting. During lunch we actually\ntalked to one in person, which was insightful. I will be putting up links to interesting sites mentioned during this\nconference on my <a href=\"http://del.icio.us/egonw/OSMB2007\">delicious account</a>.</p>\n\n<p>Nothing much more I can tell about this, except for a few general quotes:</p>\n\n<ul>\n  <li>2% of the downloaders become paying customers</li>\n  <li>an active community is important, cherish it</li>\n  <li>support as business model is not interesting for venture capatilists</li>\n  <li>don’t think you understand the legal implications</li>\n</ul>\n\n<p>Noteworthy is that we have free wireless at the conference site :) So I downloaded a recent\n<a href=\"http://drexel-coas-talks-mp3-podcast.blogspot.com/2007/01/nc-science-blogging-conference.html\">presentation by Jean-Claude about his open science work and blogging efforts</a>,\nwhich I enjoyed watching very much. I skyped with my wife and children, and I booked a hotel for the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings%5Cchicago2007%5Chome.html\">ACS meeting in March in Chicago</a>,\nas chances are high that I will attend that meeting.</p>\n\n<p>Last night it started snowing, and it is completely white outside right now. The temperature has dropped to normal\nwinter season, which made the burritos in downtown Nuernberg extra nice. 
Later today, Christoph’s\n<a href=\"http://www.chemoinformatics.org/\">COSI</a> talk is scheduled, and I was delighted to learn via\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a> that\n<a href=\"http://cszamudio.spaces.live.com/Blog/cns!9BCF6F9D6772B8F5!1461.entry\">Carlos blogged about it yesterday</a>!\nCheers Carlos! In the same blog he also mentions that he is integrating the <a href=\"http://cdk.sf.net/\">CDK</a>\nwith something called Kepler. Carlos, if you read this: what is the URL for Kepler?</p>",
      "summary": "The second day just started of the Open Source Meets Business, and now actually listening to the PHP talk, but here is a short update on day 1, which was the investment summit. It was not so crowded, but especially the talks from the venture capitalists were interesting. During lunch we actually talked to one in person, which was insightful. I will be putting up links to interesting sites mentioned during this conference on my delicious account.",
      
      "date_published": "2007-01-24T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","acs","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jsz01-vf544",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/24/blogging-and-press.html",
      "title": "Blogging and the Press",
      "content_html": "<p>Today at the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/24/osmb2007-day-1-venture-capital.html\">OSMB <i class=\"fa-solid fa-recycle fa-xs\"></i></a> we had again a good\nlunch again, and Rachel Sterne joined our table. She works at a New York based start up\n<a href=\"http://groundreport.com/articles.php?id=274\">Ground Report</a>, which is a news website where anyone, including bloggers,\ncan post news stories. Not links to news stories, as on <a href=\"http://slashdot.org/\">Slashdot</a>, but actual news stories.\nStories that can be committed are not restricted to any topic, or country, or whatever. The good news is that the\nrevenues out of advertisement is shared with the people that submit the stories, 50/50 even, if I understood correctly.\nThe more visitor hits your story gets, the bigger your part of the revenue is.</p>\n\n<p>Now, the reason why I advertise this, is that <a href=\"http://blog.chembark.com/2007/01/22/blogging-creds/\">Paul recently blogged about the status of bloggers as members of the\npress</a>. ACS does not seem to think so, though even\n<a href=\"http://www.pulitzer.org/resources/onlinerel.html\">the Pulizer organization disagrees</a>. The ACS requires that\nfreelancers are connected to an news organization, and I am wondering wether they would accept Ground Report as such…</p>",
      "summary": "Today at the OSMB we had again a good lunch again, and Rachel Sterne joined our table. She works at a New York based start up Ground Report, which is a news website where anyone, including bloggers, can post news stories. Not links to news stories, as on Slashdot, but actual news stories. Stories that can be committed are not restricted to any topic, or country, or whatever. The good news is that the revenues out of advertisement is shared with the people that submit the stories, 50/50 even, if I understood correctly. The more visitor hits your story gets, the bigger your part of the revenue is.",
      
      "date_published": "2007-01-24T00:00:00+00:00",
      "date_modified": "2025-04-27T00:00:00+00:00",
      "tags": ["acs","blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ta3z2-pc970",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/22/open-source-meets-business-2007.html",
      "title": "Open Source Meets Business 2007",
      "content_html": "<p>Today I leave for a two day visit at the <a href=\"http://www.heise.de/veranstaltungen/2007/ho_osb/en/\">Open Source Meets Business</a>\nconference in Nürnberg, where <a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a> will speak about the\n<a href=\"http://chemoinformatics.org/\">Chemoinformatics OpenSource Initiative</a> (COSI). If you happen to go to that meeting too,\nlet’s try to meet!</p>",
      "summary": "Today I leave for a two day visit at the Open Source Meets Business conference in Nürnberg, where Christoph will speak about the Chemoinformatics OpenSource Initiative (COSI). If you happen to go to that meeting too, let’s try to meet!",
      
      "date_published": "2007-01-22T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cheminf","opensource"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g22fv-gtc07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html",
      "title": "CDK Literature #1",
      "content_html": "<p>For each <a href=\"http://www.cdknews.org/\">CDK News</a> I try to write up what CDK related literature has been published\nrecently, but I failed to do so for the last two issues. In order to not postpone writing it up until close to\nthe deadline, I will write up things here, so that I can copy-paste it later for CDK News.</p>\n\n<h2 id=\"oxidoreductase-catalyzed-reactions\">Oxidoreductase-catalyzed reactions</h2>\n\n<p>Mu <em>et al.</em> analyzed about 2000 oxidation/reduction reactions from <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html\">KEGG <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nusing the <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://joelib.sf.net/\">JOELib</a> for the chemoinformatics bits. The reactions were grouped into\n12 subclasses, and SVM was used to train models to distinguish reactants from non-reactants. It seems that there were not independent\ntest sets used, but cross-validation indicates that there approach is possible. The works uses CDK’s HydrogenAdder,\nUniversalIsomorphismTester, and unnamed QSAR descriptors. It would be interesting to see how it compares to\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html\">the work of Aires-de-Sousa <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<h2 id=\"cognate-ligands\">Cognate ligands</h2>\n\n<p>Bashton <em>et al.</em> took a different approach in analyzing the metabolome. They looked at the correlation of ligand structure with enzyme\ndomains, and propose a method to identify cognate ligands, that is, ligands that are present in vivo and are required for a functional\nmetobolome. The CDK is used for calculating fingerprints and used for calculating maximal common substructures (MCSS). The paper notes\nthat the MCSS is not necessarily of biochemical relevance, indicating that there is room for pharmacophore like concept in the CDK.</p>",
      "summary": "For each CDK News I try to write up what CDK related literature has been published recently, but I failed to do so for the last two issues. In order to not postpone writing it up until close to the deadline, I will write up things here, so that I can copy-paste it later for CDK News.",
      
      "date_published": "2007-01-14T00:00:00+00:00",
      "date_modified": "2025-04-27T00:00:00+00:00",
      "tags": ["cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/btl535", "doi": "10.1093/bioinformatics/btl535"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/j.jmb.2006.09.041", "doi": "10.1016/j.jmb.2006.09.041"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ke8wz-vks94",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/11/why-do-i-blog.html",
      "title": "Why do I blog?",
      "content_html": "<p><a href=\"http://www.chemicalforums.com/index.php?topic=12307.msg57384#msg57384\">Mitch blogged</a> about a comment Bethany Halford,\nAssociate Editor of <a href=\"http://pubs.acs.org/cen/\">C&amp;EN</a>, <a href=\"http://www.thechemblog.com/?p=360#comment-1889\">left in The Chem Blog</a>.\nShe is writing an opinion piece on chemistry blogs, and is wondering why I blog, whether I use a nickname, and if my\nemployer knows I blog. So, here goes.</p>\n\n<h2 id=\"why-do-i-blog\">Why do I blog?</h2>\n\n<p>I started blogging <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/15/chem-bla-ics.html\">in October 2005 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> to reduce my workload:\ninvolved in open source chemoinformatics projects, I quite often emailed to mailing lists about interesting websites/projects/events\netc. Not uncommonly to multiple lists, which required me to tune the email to the list. I realized that blogging about it, would\nmake it possible to no longer post it to mailing lists, and, therefore, reduce my workload. A second reason is that I post\ntricks there, so that I have them available in a central place, and to post questions that, hopefully, others can answer.\nAs such, it is a way of communicating with fellow scientists, without the need the specifically address them. Open, free and fast.</p>\n\n<p>Deliberately, I did not start a personally diary blog, but a blog about my work as chemoinformatician. Nevertheless, the nature of\nblogging allows to give what you write a personal twist. 
What stresses the scientific nature of my blog, and of many others, is that blogging\nscientists often cite and discuss literature, which nicely leads to scientific blog aggregators like <a href=\"http://postgenomic.com/\">Postgenomic.com</a>\nand <a href=\"http://web.archive.org/web/20070115091132/http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace <i class=\"fa-solid fa-archive fa-xs\"></i></a>,\nwhich summarize the scientific literature being discussed in\nthe blogosphere. The latter even <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/04/chemical-blogspace-is-getting-more.html\">recently started to blog about molecules being discussed <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThere are even blogs which specialize in discussing literature, such as <a href=\"http://cheminfoclub.blogspot.com/\">the blog by Rajarshi, Gary and David</a>.</p>\n\n<h2 id=\"why-do-i-not-use-a-nickname\">Why do I not use a nickname?</h2>\n\n<p>In my blogging I am clear about who I am, even where I work; I blog about my scientific work, and, as a reader, putting one and one\ntogether would lead to my real name soon enough anyway. I did not discuss the blogging with the employer I had in 2005, but the\nblogging is mostly done outside office hours anyway, certainly in that period. My current employer is a\n<a href=\"https://web.archive.org/web/20070120084645/http://wiki.cubic.uni-koeln.de/blog/index.php\">scientific blogger himself <i class=\"fa-solid fa-archive fa-xs\"></i></a>.\nEven my nickname, or pseudonym, is not that obfuscated.</p>\n\n<p>Moreover, I do make a statement in my blog (which sort of summarizes to: “you cannot do science if you cannot reproduce experimental results”),\nand I think it is only fair to identify myself. 
I’m not like Ender’s brother <a href=\"http://en.wikipedia.org/wiki/Peter_Wiggin\">Peter</a>.</p>\n\n<h2 id=\"why-do-i-answer-bethanys-questions\">Why do I answer Bethany’s questions?</h2>\n\n<p>I try to convince myself that I do not answer these questions out of <a href=\"https://cen.acs.org/articles/84/i22/Power-Procrastination.html\">procrastination <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nsomething Bethany is wondering about. Instead, I like blogging as a new way to communicate with fellow scientists on a scientific level (Bethany,\nplease <em>do</em> explore <a href=\"https://web.archive.org/web/20070607221957/http://wiki.cubic.uni-koeln.de/cb/blogs.php\">the full chemical blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and be amazed by the high scientific\ncontent gems around!), though this might qualify as catching up with current literature. Moreover, answering these questions allows\nme to advertise my blog, and some websites I like. I feel that blogging might fill a niche in scientific communication.</p>\n\n<p>Bethany, please feel free to leave additional questions as a comment.</p>",
      "summary": "Mitch blogged about a comment Bethany Halford, Associate Editor of C&amp;EN, left in The Chem Blog. She is writing an opinion piece on chemistry blogs, and is wondering why I blog, whether I use a nickname, and if my employer knows I blog. So, here goes.",
      
      "date_published": "2007-01-11T00:00:00+00:00",
      "date_modified": "2025-04-27T00:00:00+00:00",
      "tags": ["blog","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m734k-e1938",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/09/delicious-tagometer-on-www2bloggercom.html",
      "title": "The del.icio.us tagometer on www2.blogger.com",
      "content_html": "<p>Yesterday I blogged about <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/08/delicious-tagometer-on-blogspotcom.html\">how to include the new del.icio.us tagometer on a\nwww.blogger.com blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\njust like <a href=\"http://consumingexperience.blogspot.com/2006/12/delicious-tagometer-howto-manual-mode.html\">Improbulus did last December</a>\nas I discovered later. <a href=\"http://chemical-quantum-images.blogspot.com/\">Felix</a>\nasked me how it could be done on the new www2.blogger.com template system. Well,\nhere it is.</p>\n\n<p>Like with the old blogger.com template system, you need to add this to the header,\njust before the <code class=\"language-plaintext highlighter-rouge\">&lt;/head&gt;</code> end tag:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c\">&lt;!-- del.icio.us badge stuff --&gt;</span>\n<span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"k\">typeof</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">Delicious</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">undefined</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">Delicious</span> <span class=\"o\">=</span> <span class=\"p\">{};</span>\n  <span class=\"nx\">Delicious</span><span class=\"p\">.</span><span class=\"nx\">BLOGBADGE_MANUAL_MODE</span> <span class=\"o\">=</span> <span class=\"kc\">true</span><span class=\"p\">;</span>\n<span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;link</span> <span class=\"na\">id=</span><span class=\"s\">\"delicious-blogbadge-css\"</span> \n      <span 
class=\"na\">href=</span><span class=\"s\">\"http://images.del.icio.us/static/css/blogbadge.css\"</span>\n      <span class=\"na\">rel=</span><span class=\"s\">\"stylesheet\"</span> <span class=\"na\">type=</span><span class=\"s\">\"text/css\"</span> <span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;script </span><span class=\"na\">src=</span><span class=\"s\">\"http://images.del.icio.us/static/js/blogbadge.js\"</span> <span class=\"nt\">/&gt;</span>\n</code></pre></div></div>\n\n<p>And, for the blog entry template bit, look for this the <code class=\"language-plaintext highlighter-rouge\">&lt;p&gt;</code> element of class\n‘post-footer-line post-footer-line-3’, which was empty for me. Add this <code class=\"language-plaintext highlighter-rouge\">&lt;div&gt;</code>\nto that:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;p</span> <span class=\"na\">class=</span><span class=\"s\">'post-footer-line post-footer-line-3'</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;div</span> <span class=\"na\">class=</span><span class=\"s\">\"delicious-blogbadge-line\"</span> <span class=\"na\">expr:id=</span><span class=\"s\">\"data:post.id\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n      <span class=\"nx\">Delicious</span><span class=\"p\">.</span><span class=\"nx\">BlogBadge</span><span class=\"p\">.</span><span class=\"nx\">register</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">&lt;data:post.id/&gt;</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">&lt;data:post.url/&gt;</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">&lt;data:post.title/&gt;</span><span class=\"dl\">'</span><span 
class=\"p\">);</span>\n    <span class=\"nt\">&lt;/script&gt;</span>\n  <span class=\"nt\">&lt;/div&gt;</span>\n<span class=\"nt\">&lt;/p&gt;</span>\n</code></pre></div></div>\n\n<p>To get at the right place, with the full template XHTML content, go to your\n<a href=\"http://www2.blogger.com/home\">www2.blogger.com/home</a> homepage, click the\nTemplate tab, then pick the <em>Edit HTML</em> option, and make sure to enable the\n<strong>Expand Widget Templates</strong> option.</p>",
      "summary": "Yesterday I blogged about how to include the new del.icio.us tagometer on a www.blogger.com blog , just like Improbulus did last December as I discovered later. Felix asked me how it could be done on the new www2.blogger.com template system. Well, here it is.",
      
      "date_published": "2007-01-09T00:00:00+00:00",
      "date_modified": "2025-04-27T00:00:00+00:00",
      "tags": ["blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ysd5e-04645",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/08/delicious-tagometer-on-blogspotcom.html",
      "title": "The del.icio.us tagometer on Blogspot.com",
      "content_html": "<p>Some days ago I read about the <a href=\"http://del.icio.us/\">del.icio.us</a>\n<a href=\"http://blog.del.icio.us/blog/2006/12/the_new_and_tag.html#more\">tagometer</a>, which\nis basically sort of save as I had before on this blog. The tagometer, however,\nshows some interesting properties of the blog items, like the number of people who\nbookmarked the item, and what tags they used. The\n<a href=\"http://del.icio.us/help/tagometer\">tagometer help</a> does not show how it can be\nintegrated with <a href=\"http://www.blogspot.com/\">blogspot.com</a> (where this blog is hosted),\nbut with the source from <a href=\"http://decafbad.com/blog/\">0xDECAFBAD</a> I got it working.\nThese blogs are not yet moved to the new blogger.com system (so, www.blogger.com,\nnot www2.blogger.com), so the below principally applies to the older system.</p>\n\n<p>First you need to adapt this blob to the <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code> of the template:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;</span><span class=\"err\">$</span><span class=\"na\">BlogMetaData</span><span class=\"err\">$</span><span class=\"nt\">&gt;</span>\n\n<span class=\"c\">&lt;!-- del.icio.us badge stuff --&gt;</span>\n<span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"k\">typeof</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">Delicious</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">undefined</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">Delicious</span> <span class=\"o\">=</span> <span class=\"p\">{};</span>\n  <span 
class=\"nx\">Delicious</span><span class=\"p\">.</span><span class=\"nx\">BLOGBADGE_MANUAL_MODE</span> <span class=\"o\">=</span> <span class=\"kc\">true</span><span class=\"p\">;</span>\n<span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;link</span> <span class=\"na\">id=</span><span class=\"s\">\"delicious-blogbadge-css\"</span> \n      <span class=\"na\">href=</span><span class=\"s\">\"http://images.del.icio.us/static/css/blogbadge.css\"</span>\n      <span class=\"na\">rel=</span><span class=\"s\">\"stylesheet\"</span> <span class=\"na\">type=</span><span class=\"s\">\"text/css\"</span> <span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;script </span><span class=\"na\">src=</span><span class=\"s\">\"http://images.del.icio.us/static/js/blogbadge.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n\n<span class=\"nt\">&lt;/head&gt;</span>\n</code></pre></div></div>\n\n<p>where <code class=\"language-plaintext highlighter-rouge\">&lt;$BlogMetaData$&gt;</code> and <code class=\"language-plaintext highlighter-rouge\">&lt;/head&gt;</code> should already be present in the template.</p>\n\n<p>Further down the template, you need to add a bit in the <code class=\"language-plaintext highlighter-rouge\">&lt;div class=\"blogPost\"&gt;</code>\nsection, just after the last <code class=\"language-plaintext highlighter-rouge\">&lt;div class=\"byline\"&gt;</code> element in your template.\nThe bits you add use blogger variables, so make sure to get it right:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;div</span> <span class=\"na\">class=</span><span class=\"s\">\"delicious-blogbadge-line\"</span> <span class=\"na\">id=</span><span class=\"s\">\"badge-&lt;$BlogItemNumber$&gt;\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n    
<span class=\"nx\">Delicious</span><span class=\"p\">.</span><span class=\"nx\">BlogBadge</span><span class=\"p\">.</span><span class=\"nx\">register</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">badge-&lt;$BlogItemNumber$&gt;</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">&lt;$BlogItemPermalinkURL$&gt;</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;$BlogItemTitle$&gt;</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n  <span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;/div&gt;</span>\n</code></pre></div></div>\n\n<p>Note the quotes of the third argument. Do this properly, the quotes in the output\nof <code class=\"language-plaintext highlighter-rouge\">&lt;$BlogItemTitle$&gt;</code> should be escaped, so that it does not interfere with the\nquotes of the <code class=\"language-plaintext highlighter-rouge\">register()</code> JavaScript call. Can anyone tell me how to do that\nin JavaScript?</p>",
      "summary": "Some days ago I read about the del.icio.us tagometer, which is basically sort of save as I had before on this blog. The tagometer, however, shows some interesting properties of the blog items, like the number of people who bookmarked the item, and what tags they used. The tagometer help does not show how it can be integrated with blogspot.com (where this blog is hosted), but with the source from 0xDECAFBAD I got it working. These blogs are not yet moved to the new blogger.com system (so, www.blogger.com, not www2.blogger.com), so the below principally applies to the older system.",
      
      "date_published": "2007-01-08T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["blog"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/td7y5-2gb60",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/04/chemical-blogspace-is-getting-more.html",
      "title": "Chemical blogspace is getting more chemical",
      "content_html": "<p>The best remedy for being depressed is the rush after hacking some nice new feature (unfortunately, it is addictive). After\n<a href=\"http://chemicalblogspace.blogspot.com/2006/12/hacking-inchi-support-into-cb.html\">hacking InChI support into Chemical blogspace</a>\na couple of days back, adding some more visual feedback on <a href=\"http://web.archive.org/web/20070611160715/http://wiki.cubic.uni-koeln.de/cb/inchis.php\">those molecules <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nis not that hard, with <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> around that is:</p>\n\n<p><img src=\"/assets/images/inchisCbPage.png\" alt=\"\" /></p>\n\n<p>Beware! Every <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">marked up molecule <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> in your\nblog is being picked up! So should the compound with the SMILES N(=NC1=CC=C(C=C1)N(CCO)CCO)C3=CC=C(C=CC2=C(C(=C(C#N)C#N)OC2(C)C)C#N)S3,\nwhich is <a href=\"http://web.archive.org/web/20240915152205/https://www.sciencelink.net/verdieping/organische-chemie-versnelt-internet/9035.article\">reported to be the most light sensitive molecule every synthesized so far\n<i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.</p>",
      "summary": "The best remedy for being depressed is the rush after hacking some nice new feature (unfortunately, it is addictive). After hacking InChI support into Chemical blogspace a couple of days back, adding some more visual feedback on those molecules is not that hard, with PubChem around that is:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/inchisCbPage.png",
      "date_published": "2007-01-04T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cb","inchi","pubchem"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dmfek-pdt97",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/02/chemistry-in-html-javascript-from.html",
      "title": "Chemistry in HTML: JavaScript from the server",
      "content_html": "<p>Recently I blogged about <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html\">a Greasemonkey script <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nto take advantage of <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">semantic markup of chemistry in blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(and HTML in general), and later made <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/19/chemistry-in-html-greasemonkey-again.html\">some plans how this can be\nextended <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nOne of the ideas was to make this userscript available from the server, instead\nof having people need to install <a href=\"http://greasemonkey.mozdev.org/\">Greasemonkey</a>\nand the script separately. So, here it is.</p>\n\n<h2 id=\"sechemticjs\">sechemtic.js</h2>\n\n<p>Consider this (X)HTML:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"http://www.w3.org/1999/xhtml\"</span>\n      <span class=\"na\">xmlns:chem=</span><span class=\"s\">\"http://www.blueobelisk.org/chemistryblogs/\"</span><span class=\"nt\">&gt;</span>\n\n<span class=\"nt\">&lt;head&gt;</span>\n <span class=\"nt\">&lt;title&gt;</span>m1<span class=\"nt\">&lt;/title&gt;</span>\n <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"sechemtic.js\"</span> <span class=\"nt\">/&gt;</span>\n<span class=\"o\">&lt;</span><span class=\"sr\">/head</span><span class=\"err\">&gt;\n</span>\n<span class=\"o\">&lt;</span><span class=\"nx\">body</span> <span class=\"nx\">onload</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">addGoogleAndPubChemLinks(1,1)</span><span 
class=\"dl\">\"</span><span class=\"o\">&gt;</span>\n  <span class=\"o\">&lt;</span><span class=\"nx\">h1</span><span class=\"o\">&gt;</span><span class=\"nx\">The</span> <span class=\"nx\">Output</span><span class=\"o\">&lt;</span><span class=\"sr\">/h1</span><span class=\"err\">&gt;\n</span>  <span class=\"o\">&lt;</span><span class=\"nx\">p</span><span class=\"o\">&gt;</span><span class=\"nx\">This</span> <span class=\"nx\">article</span> <span class=\"nx\">is</span> <span class=\"nx\">about</span> <span class=\"o\">&lt;</span><span class=\"nx\">span</span> <span class=\"kd\">class</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">chem:compound</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span><span class=\"nx\">m1</span><span class=\"o\">&lt;</span><span class=\"sr\">/span&gt;</span><span class=\"err\"> \n</span>  <span class=\"p\">(</span><span class=\"nx\">SMILES</span><span class=\"p\">:</span><span class=\"o\">&lt;</span><span class=\"nx\">span</span> <span class=\"kd\">class</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">chem:smiles</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span><span class=\"nx\">CCCOC</span><span class=\"o\">&lt;</span><span class=\"sr\">/span&gt;</span><span class=\"se\">)</span><span class=\"sr\">.&lt;/</span><span class=\"nx\">p</span><span class=\"o\">&gt;</span>\n\n<span class=\"o\">&lt;</span><span class=\"sr\">/body</span><span class=\"err\">&gt;\n</span>\n<span class=\"o\">&lt;</span><span class=\"sr\">/html</span><span class=\"err\">&gt;\n</span></code></pre></div></div>\n\n<p><img src=\"/assets/images/sechemticJSOutput.png\" alt=\"\" /></p>\n\n<p>I think the above example shows the simple setup of the Sechemtic Web script (please\nforgive me my habit to use bad linguistic mashups ;). 
Just load the script in the\nHTML <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code>, and add the <code class=\"language-plaintext highlighter-rouge\">onload=\"addGoogleAndPubChemLinks(1,1)\"</code> attribute to\nthe <code class=\"language-plaintext highlighter-rouge\">&lt;body&gt;</code> element. With blogs these bits would be part of the template, and,\ntherefore, need to be installed only once. From then on, just use the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">semantic markup as\nexplained earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBoth the microformat and the RDFa method are supported. In\ncase of the latter, I recommend defining the chem namespace in the template of\nwebpages too, instead of in the <code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code> elements.</p>\n\n<p>Currently, the Sechemtic Web script has only one function: to add links to\n<a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> and <a href=\"http://www.google.com/\">Google</a>,\nwith the <code class=\"language-plaintext highlighter-rouge\">addGoogleAndPubChemLinks(int, int)</code> method. 
The\nfirst parameter determines (0 or 1) if links to Google should be made, and the\nsecond parameter does the same for links to PubChem.</p>\n\n<h2 id=\"download\">Download</h2>\n\n<p>For now, the script can be downloaded <a href=\"http://wiki.cubic.uni-koeln.de/cb/sechemtic.js\">here</a>.\nIt is licensed with the GPL version 2.0.</p>\n\n<h2 id=\"microformats\">Microformats</h2>\n\n<p>Here’s the same example using <a href=\"http://microformats.org/\">microformats</a>\ninstead of RDFa:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html&gt;</span>\n\n<span class=\"nt\">&lt;head&gt;</span>\n <span class=\"nt\">&lt;title&gt;</span>m1<span class=\"nt\">&lt;/title&gt;</span>\n <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"sechemtic.js\"</span> <span class=\"nt\">/&gt;</span>\n<span class=\"o\">&lt;</span><span class=\"sr\">/head</span><span class=\"err\">&gt;\n</span>\n<span class=\"o\">&lt;</span><span class=\"nx\">body</span> <span class=\"nx\">onload</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">addGoogleAndPubChemLinks(1,1)</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span>\n  <span class=\"o\">&lt;</span><span class=\"nx\">h1</span><span class=\"o\">&gt;</span><span class=\"nx\">The</span> <span class=\"nx\">Output</span><span class=\"o\">&lt;</span><span class=\"sr\">/h1</span><span class=\"err\">&gt;\n</span>  <span class=\"o\">&lt;</span><span class=\"nx\">p</span><span class=\"o\">&gt;</span><span class=\"nx\">This</span> <span class=\"nx\">article</span> <span class=\"nx\">is</span> <span class=\"nx\">about</span> <span class=\"o\">&lt;</span><span class=\"nx\">span</span> <span class=\"kd\">class</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">compound</span><span 
class=\"dl\">\"</span><span class=\"o\">&gt;</span><span class=\"nx\">m1</span><span class=\"o\">&lt;</span><span class=\"sr\">/span&gt;</span><span class=\"err\"> \n</span>  <span class=\"p\">(</span><span class=\"nx\">SMILES</span><span class=\"p\">:</span><span class=\"o\">&lt;</span><span class=\"nx\">span</span> <span class=\"kd\">class</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">smiles</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span><span class=\"nx\">CCCOC</span><span class=\"o\">&lt;</span><span class=\"sr\">/span&gt;</span><span class=\"se\">)</span><span class=\"sr\">.&lt;/</span><span class=\"nx\">p</span><span class=\"o\">&gt;</span>\n\n<span class=\"o\">&lt;</span><span class=\"sr\">/body</span><span class=\"err\">&gt;\n</span>\n<span class=\"o\">&lt;</span><span class=\"sr\">/html</span><span class=\"err\">&gt;\n</span></code></pre></div></div>",
      "summary": "Recently I blogged about a Greasemonkey script to take advantage of semantic markup of chemistry in blogs (and HTML in general), and later made some plans how this can be extended . One of the ideas was to make this userscript available from the server, instead of having people need to install Greasemonkey and the script separately. So, here it is.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sechemticJSOutput.png",
      "date_published": "2007-01-02T00:00:00+00:00",
      "date_modified": "2025-04-27T00:00:00+00:00",
      "tags": ["html","javascript","userscript"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/h2ytc-6pt51",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/30/modern-chemistry-in-cdk-beyond-two.html",
      "title": "Modern chemistry in the CDK: beyond the two-atom bond",
      "content_html": "<p><a href=\"http://depth-first.com/\">Rich</a> <a href=\"https://doi.org/10.59350/pz3p6-fv247\">recently blogged <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nabout the limitations of the two-atom bond representation often used in chemoinformatics,\ntriggered by <a href=\"https://doi.org/10.59350/xnt9b-80962\">the four ferrocene entries in PubChem <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nIn reply to himself, <a href=\"https://doi.org/10.59350/s1vtx-e9q82\">Rich described FlexMol <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nan XML language that can describe bond systems that involve more than two atoms.</p>\n\n<p>Obviously, the problems originates from the lack of mathematical knowledge of chemists: the\ncurrent chemoinformatics heavily depends on graph theory, where each atom is a vertex and each\nbond an edge. This has the advantage that we can borrow all algorithms that work with graph\nrepresentations, such as <a href=\"http://en.wikipedia.org/wiki/Dijkstra's_algorithm\">Dijkstra’s algorithm</a>\nto find the shortest path between two vertices. Or, in chemical language, an algorithm to\ncalculate how many bonds two atoms are apart in a molecule.</p>\n\n<p>When discussing FlexMol, Rich mentions the work by Dietz (DOI:<a href=\"https://doi.org/10.1021/ci00027a001\">10.1021/ci00027a001</a>),\nbut I would like to mention the PhD thesis of S. Bauerschmidt to this (see\nDOI:<a href=\"https://doi.org/10.1021/ci9704423\">10.1021/ci9704423</a>) done in Gasteiger’s group.\nDropping this ‘two-atom bond’ representation in favor of something that better describes compounds\nlike ferrocene, like the Dietz and Bauerschmidt approaches, has the unfortunate disadvantage of\nloosing compatibility with graph theory algorithms. Nevertheless, in order to take\nchemoinformatics to the next level, we have to address these issues. 
But hope is not lost, and\npeople are working on rewriting our toolkit of chemoinformatics algorithms to match such new\nrepresentations.</p>\n\n<h2 id=\"cdk\">CDK</h2>\n\n<p>I will postpone analyzing the <a href=\"http://cdk.sf.net/\">CDK</a> for compatibility with such more modern\nrepresentations (look out for a <a href=\"http://cdknews.org/\">CDK News</a> article), and now just describe\nhow the CDK can be used for FlexMol/Dietz/Bauerschmidt representations. Consider\n<a href=\"https://doi.org/10.59350/s1vtx-e9q82\">the four examples Rich gives <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin his blog. Here are the CDK ways of doing the same.</p>\n\n<p>For example, 1,3,5-cyclohexatriene:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"nc\">IMolecule</span> <span class=\"nf\">makeCycloHexaTriene</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"nc\">IMolecule</span> <span class=\"n\">cyclohexatriene</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C0\"</span><span class=\"o\">);</span> <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC1</span> <span class=\"o\">=</span> 
<span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C1\"</span><span class=\"o\">);</span> <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C2\"</span><span class=\"o\">);</span> <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C3\"</span><span class=\"o\">);</span> <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span 
class=\"nc\">IAtom</span> <span class=\"n\">atomC4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C4\"</span><span class=\"o\">);</span> <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C5\"</span><span class=\"o\">);</span> <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"mf\">1.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB0</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span 
class=\"n\">bondB1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"mf\">2.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">4</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"mf\">1.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"mf\">2.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">4</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">,</span> <span 
class=\"mf\">1.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB4</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"mf\">2.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB5</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">4</span><span class=\"o\">);</span>\n\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">);</span>\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">);</span>\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC5</span><span class=\"o\">);</span>\n\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span 
class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB0</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB1</span><span class=\"o\">);</span>\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB2</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB3</span><span class=\"o\">);</span>\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB4</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB5</span><span class=\"o\">);</span>\n\n  <span class=\"k\">return</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">;</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>Summarizing, the key thing is to use the <code class=\"language-plaintext highlighter-rouge\">IBond.setElectronCount()</code> method.\nThe call is sort of  redundant, as the CDK defaults to two electrons if not\nexplicitly given. 
This compound is, of course, benzene which we can represent\nlike this too:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"nc\">IMolecule</span> <span class=\"nf\">makeBenzene</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"nc\">IMolecule</span> <span class=\"n\">benzene</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C0\"</span><span class=\"o\">);</span> <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C1\"</span><span class=\"o\">);</span> <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span 
class=\"n\">atomC2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C2\"</span><span class=\"o\">);</span> <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C3\"</span><span class=\"o\">);</span> <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span> \n    <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C4\"</span><span class=\"o\">);</span> <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span 
class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span> \n    <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C5\"</span><span class=\"o\">);</span> <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB0</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span 
class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB4</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB5</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span 
class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span>\n    <span class=\"n\">bondingSystem</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">);</span>\n    <span class=\"n\">bondingSystem</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n      <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]</span> <span class=\"o\">{</span> <span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> \n                    <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">}</span>\n    <span class=\"o\">);</span>\n\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC5</span><span class=\"o\">);</span>\n\n 
 <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB0</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB1</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB2</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB3</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB4</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB5</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem</span><span class=\"o\">);</span>\n\n  <span class=\"k\">return</span> <span class=\"n\">benzene</span><span class=\"o\">;</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>This version represents the delocalized aromatic pi-system as one IBond:\none with 6 electrons, and 6 associated atoms.</p>\n\n<p>The cyclopentadienyl anion is represented similarly:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"nc\">IMolecule</span> <span class=\"nf\">makeCycloPentadienylAnion</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"nc\">IMolecule</span> <span class=\"n\">cp</span> <span class=\"o\">=</span> <span 
class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C0\"</span><span class=\"o\">);</span> <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C1\"</span><span class=\"o\">);</span> <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C2\"</span><span class=\"o\">);</span> 
<span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C3\"</span><span class=\"o\">);</span> <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C4\"</span><span class=\"o\">);</span> <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB0</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span 
class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB4</span><span 
class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span>\n    <span class=\"n\">bondingSystem</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">);</span>\n  <span class=\"n\">bondingSystem</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n    <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]{</span> <span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">}</span>\n  <span class=\"o\">);</span>\n\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">);</span>\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">);</span>\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">);</span>\n\n  <span 
class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB0</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB1</span><span class=\"o\">);</span>\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB2</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB3</span><span class=\"o\">);</span>\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB4</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem</span><span class=\"o\">);</span>\n\n  <span class=\"k\">return</span> <span class=\"n\">cp</span><span class=\"o\">;</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>And the final step in this series is ferrocene:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"nc\">IMolecule</span> <span class=\"nf\">makeFerrocene</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"nc\">IMolecule</span> <span class=\"n\">ferrocene</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span 
class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C0\"</span><span class=\"o\">);</span> <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C1\"</span><span class=\"o\">);</span> <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C2\"</span><span class=\"o\">);</span> <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span 
class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C3\"</span><span class=\"o\">);</span> <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C4\"</span><span class=\"o\">);</span> <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C5\"</span><span class=\"o\">);</span> <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC6</span> <span 
class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC6</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C6\"</span><span class=\"o\">);</span> <span class=\"n\">atomC6</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC7</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC7</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C7\"</span><span class=\"o\">);</span> <span class=\"n\">atomC7</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC8</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC8</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C8\"</span><span class=\"o\">);</span> <span class=\"n\">atomC8</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  
<span class=\"nc\">IAtom</span> <span class=\"n\">atomC9</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC9</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C9\"</span><span class=\"o\">);</span> <span class=\"n\">atomC9</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">iron</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">IRON</span><span class=\"o\">);</span>\n    <span class=\"n\">iron</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"Fe10\"</span><span class=\"o\">);</span> <span class=\"n\">iron</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">0</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB0</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB1</span> <span class=\"o\">=</span> <span 
class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB4</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span 
class=\"n\">bondB5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"n\">atomC6</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB5</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB6</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC6</span><span class=\"o\">,</span> <span class=\"n\">atomC7</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB6</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB7</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC7</span><span class=\"o\">,</span> <span class=\"n\">atomC8</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB7</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB8</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC8</span><span class=\"o\">,</span> <span class=\"n\">atomC9</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB8</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span 
class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB9</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC9</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB9</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span>\n    <span class=\"n\">bondingSystem1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">);</span>\n    <span class=\"n\">bondingSystem1</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n      <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]</span> <span class=\"o\">{</span>\n       <span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">iron</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span> \n    <span class=\"n\">bondingSystem2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span 
class=\"o\">);</span>\n    <span class=\"n\">bondingSystem2</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n      <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]</span> <span class=\"o\">{</span>\n        <span class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"n\">atomC6</span><span class=\"o\">,</span> <span class=\"n\">atomC7</span><span class=\"o\">,</span> <span class=\"n\">atomC8</span><span class=\"o\">,</span> <span class=\"n\">atomC9</span><span class=\"o\">,</span> <span class=\"n\">iron</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span>\n    <span class=\"n\">bondingSystem3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">);</span>\n    <span class=\"n\">bondingSystem3</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n      <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]{</span>\n        <span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">,</span>\n        <span class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"n\">atomC6</span><span class=\"o\">,</span> <span class=\"n\">atomC7</span><span class=\"o\">,</span> <span class=\"n\">atomC8</span><span class=\"o\">,</span> <span class=\"n\">atomC9</span><span class=\"o\">,</span>\n        <span class=\"n\">iron</span>\n      <span class=\"o\">}</span>\n    <span 
class=\"o\">);</span>\n\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC5</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC6</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC7</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC8</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC9</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">iron</span><span 
class=\"o\">);</span>\n\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB0</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB1</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB2</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB3</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB4</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB5</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB6</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB7</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB8</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB9</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem1</span><span 
class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem2</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem3</span><span class=\"o\">);</span>\n\n  <span class=\"k\">return</span> <span class=\"n\">ferrocene</span><span class=\"o\">;</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>Now, you will note that this approach does not exactly follow Rich’s FlexMol examples: it skips\nthe atom pair concepts used in the FlexMol version of ferrocene. His example more closely follows\nwhat we are likely to draw, while the CDK code above more closely follows the molecular orbital\nconcept. (I have to check to see how Dietz and Bauerschmidt did this.)</p>\n\n<p>As said, the real trick is to have a chemoinformatics toolkit that can work with this\nrepresentation, but I will save that for later. At least our algorithms to calculate the\nmolecular mass should work ;)</p>",
      "summary": "Rich recently blogged about the limitations of the two-atom bond representation often used in chemoinformatics, triggered by the four ferrocene entries in PubChem . In reply to himself, Rich described FlexMol , an XML language that can describe bond systems that involve more than two atoms.",
      
      "date_published": "2006-12-30T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cheminf","cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci00027a001", "doi": "10.1021/ci00027a001"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/ci9704423", "doi": "10.1021/ci9704423"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/pz3p6-fv247", "doi": "10.59350/pz3p6-fv247"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/xnt9b-80962", "doi": "10.59350/xnt9b-80962"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.59350/s1vtx-e9q82", "doi": "10.59350/s1vtx-e9q82"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9pkxf-3ns82",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/21/updated-chemical-blogspace-layout-and.html",
      "title": "Updated Chemical Blogspace Layout and Software",
      "content_html": "<p>Last night I upgraded the software behind <a href=\"https://web.archive.org/web/20061223075417/http://wiki.cubic.uni-koeln.de/cb/\">Chemical <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/03/chemical-blogspace-updates.html\">blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, to the version\n<a href=\"http://code.google.com/p/openreview/\">online</a> on <a href=\"http://code.google.com/\">Google Code</a>, though I needed the help from\n<a href=\"https://web.archive.org/web/20051104010705/http://www.ghastlyfop.com/blog/\">Eaun <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> to get paper titles correctly picked up for <a href=\"http://pubs.acs.org/\">ACS journals</a>.\nThe number of working blogs is a bit down and now at <a href=\"https://web.archive.org/web/20070102170205/http://wiki.cubic.uni-koeln.de/cb/blogs.php\">68 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nwith an average number of 30 active blogs posting more than 100 blog items each day (see <a href=\"https://web.archive.org/web/20061223075048/http://wiki.cubic.uni-koeln.de/cb/stats.php\">Zeitgeist <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>).\nThe new design looks like quite nice compared to the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">old one</a>:</p>\n\n<p><img src=\"/assets/images/chemicalBlogspaceScreeny.png\" alt=\"\" /></p>",
      "summary": "Last night I upgraded the software behind Chemical blogspace , to the version online on Google Code, though I needed the help from Eaun to get paper titles correctly picked up for ACS journals. The number of working blogs is a bit down and now at 68 , with an average number of 30 active blogs posting more than 100 blog items each day (see Zeitgeist ). The new design looks like quite nice compared to the old one:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemicalBlogspaceScreeny.png",
      "date_published": "2006-12-21T00:00:00+00:00",
      "date_modified": "2024-08-24T00:00:00+00:00",
      "tags": ["cb","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qwtr4-v2q49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/19/chemistry-in-html-greasemonkey-again.html",
      "title": "Chemistry in HTML: Greasemonkey again",
      "content_html": "<p>Here’s a quick update on my blog about <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html\">SMILES, CAS and InChI in blogs: Greasemonkey <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nlast sunday. The original download was messed up :( You can download a new version at <a href=\"http://userscripts.org/scripts/show/6807\">userscripts.org</a>.</p>\n\n<p>This new version also supports <code class=\"language-plaintext highlighter-rouge\">chem:compound</code>, for any chemical. For example:</p>\n\n<ul>\n  <li><span class=\"chem:compound\">isopropyl alcohol</span></li>\n</ul>\n\n<p>Remember that it only works for properly marked up content, as described in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">Including SMILES, CML and InChI in blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThe HTML source code of the above example looks like (in RDFa):</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;ul&gt;&lt;li&gt;</span>\n<span class=\"nt\">&lt;span</span> <span class=\"na\">xmlns:chem=</span><span class=\"s\">\"http://www.blueobelisk.org/chemistryblogs/\"</span>\n      <span class=\"na\">class=</span><span class=\"s\">\"chem:compound\"</span><span class=\"nt\">&gt;</span>isopropyl alcohol<span class=\"nt\">&lt;/span&gt;</span>\n<span class=\"nt\">&lt;/li&gt;&lt;/ul&gt;</span>\n</code></pre></div></div>\n\n<p>The current script only adds search links to <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> and\n<a href=\"http://google.com/\">Google</a>, but the possibilities are endless, and potentially very powerfull.\nHere are some future ideas.</p>\n\n<h2 id=\"a-link-to-predict-nmr-spectra-using-nmrshiftdborg\">A link to predict NMR spectra using NMRShiftDB.org:</h2>\n\n<p>Making a link to the <a 
href=\"http://www.nmrshiftdb.org/\">NMRShiftDB.org</a> website to predict <sup>13</sup>C or\n<sup>1</sup>H NMR from a SMILES, and InChI likely too, is easy, if the website provides a URL to do this.\n(I will discuss this with Stefan.)</p>\n\n<h2 id=\"a-popup-window-with-the-3d-structure-in-jmol\">A popup window with the 3D structure in Jmol:</h2>\n\n<p>This would involve some more work, but this is most certainly possible too, given that we actually have\na website around which allows downloading 3D coordinates given a SMILES or InChI. While a simple approach\nwould be to make a popup with <a href=\"http://www.jmol.org/\">Jmol</a> that takes the URL to that 3D coordinate website,\nit could be extended using Ajax to query the 3D structure first, and depending on success, show\nJmol or a message “Could not find 3D coordinates”.</p>\n\n<h2 id=\"summarize-molecular-details-hidden-in-cml\">Summarize molecular details hidden in CML:</h2>\n\n<p>This is likely the most exciting possibility. I blogged about CMLRSS <a href=\"http://search.blogger.com/?as_q=CMLRSS&amp;ie=UTF-8&amp;ui=blg&amp;bl_url=chem-bla-ics.blogspot.com&amp;x=0&amp;y=0\">many times</a> <!-- keep link -->\nnow (check the AVI, the <a href=\"https://doi.org/10.1021/ci034244p\">article</a>, etc), and combining these two\ntechnologies will take the semantic, chemistry internet to the next level. CMLRSS describes how CML\ncan be embedded in blog items (e.g. <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/18/blogging-chemistry-on-blogspotcom.html\">Blogging chemistry on blogspot.com <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nbut really works for any <a href=\"http://www.w3.org/TR/xhtml1/\">XHTML</a>.</p>\n\n<p>Consider this mockup: add CML content to your blog item, containing molecular properties, such as its\nNMR peaks, elemental analysis, etc. This will not show up in your blog item, so that the user is not\nbothered with implementation details. 
Now, a userscript will know about the CML content, as it has access\nto the whole content of the page. The visible text will mention the molecule for which the CML contains\nexperimental or other details. Using the <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chem:compound\"/&gt;</code> technology shown above, it is\npossible to link that compound to this CML bit (details to follow in this blog in January 2007). The\nuserscript will then on the fly create a popup for the compound name in the visible text to show those\nexperimental details.</p>\n\n<p>How about that? Comments and other ideas are more than welcome!</p>\n\n<h2 id=\"server-side-scripts\">Server side scripts:</h2>\n\n<p>Greasemonkey allows users to decide which scripts to run on a website, and which not. If you, as a blogger\nor XHTML editor, want to force a script like the above to be run, that should be possible too.\nGreasemonkey scripts are written in JavaScript, so including them on the server side should be\npossible as well. I might explore this option soon.</p>",
      "summary": "Here’s a quick update on my blog about SMILES, CAS and InChI in blogs: Greasemonkey last sunday. The original download was messed up :( You can download a new version at userscripts.org.",
      
      "date_published": "2006-12-19T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["userscript","html","rdf","nmrshiftdb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ygxfh-xfs36",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html",
      "title": "SMILES, CAS and InChI in blogs: Greasemonkey",
      "content_html": "<p>As follow up on my <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">Including SMILES, CML and InChI in blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nblog last week, I had a go at <a href=\"http://en.wikipedia.org/wiki/Greasemonkey\">Greasemonkey</a>. Some time ago already,\n<a href=\"http://www.ghastlyfop.com/blog/2006/09/postgenomic-pubmed-mashup.html\">Flags and Lollipops</a> and\n<a href=\"http://www.nodalpoint.org/2006/05/16/postgenomic_greasemonkey_script\">Nodalpoint</a> showed with two cool mashups (one Connotea/Postgenomic\nand one Pubmed/Postgenomic) that userscripts are rather useful in science too. I can very much recommend the PubMed/Postgenomic mashup,\nas PubMed has several organic chemistry journals indexed too!</p>\n\n<p>So, how does this relate to my blog of last week? Well, would it not be nice that if your blog uses the markup as suggested in that\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, that you automatically get links to\n<a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> and <a href=\"http://google.com/\">Google</a>? That is now possible with a small GPL-ed Greasemonkey script\ncalled <a href=\"http://www.woc.science.ru.nl/devel/egonw/blogchemistry.user.js\">blogchemistry.user.js</a>.</p>\n\n<p>The <a href=\"http://greasemonkey.mozdev.org/\">Greasemonkey plugin</a> requires <a href=\"http://getfirefox.com/\">Firefox</a> to be installed. If ready, install\nthe script by cli·cking this link earlier, and the Greasemonkey will ask you if you want to install the script. 
Afterwards, check the output\nfor this RDFa markup content:</p>\n\n<ul>\n  <li>a SMILES: <span xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\" class=\"chem:smiles\">CCO</span></li>\n  <li>a CAS registry number: <span xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\" class=\"chem:casnumber\">50-00-0</span></li>\n  <li>and an InChI: <span xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\" class=\"chem:inchi\">InChI=1/CH4/h1H4</span></li>\n</ul>\n\n<p>It should look like the output for this blog item:</p>\n\n<p><img src=\"/assets/images/sechemticWebScript.png\" alt=\"\" /></p>\n\n<p>Note the superscript PubChem and Google links.</p>",
      "summary": "As follow up on my Including SMILES, CML and InChI in blogs blog last week, I had a go at Greasemonkey. Some time ago already, Flags and Lollipops and Nodalpoint showed with two cool mashups (one Connotea/Postgenomic and one Pubmed/Postgenomic) that userscripts are rather useful in science too. I can very much recommend the PubMed/Postgenomic mashup, as PubMed has several organic chemistry journals indexed too!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sechemticWebScript.png",
      "date_published": "2006-12-17T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["chemistry","userscript","smiles","pubchem","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r9gwr-k2s81",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/17/counting-stereoisomers-from-molecular_17.html",
      "title": "Counting constitutional isomers from the molecular formula",
      "content_html": "<p><strong>Update</strong>: check <a href=\"https://doi.org/10.1186/s13321-022-00604-9\">these</a> <a href=\"https://doi.org/10.1186/s13321-021-00529-9\">two</a> papers.</p>\n\n<p>We all know the combinatorial explosion when calculating the number of possible constitutional\nisomers (see <a href=\"http://en.wikipedia.org/wiki/Structural_isomerism\">wp:structural isomorphism</a>) of\na certain molecular formula. For example, C2H6 has only one constitutional isomer (ethane,\n<span class=\"chem:inchi\" xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\">InChI=1/C2H6/c1-2/h1-2H3</span>),\nand C4H10 has only two. Especially, breaking symmetry by replacing one\ncarbon by another element, or replacing a single by a double bond, increases the number sharply.\nFor example, C7H16 has only nine constitutional isomers, while replacing two single bonds by two\ndouble bonds, creating C7H10, increases this number to 499! Then, replacing in the last formula,\none carbon by an oxygen adds another few, totaling 747 isomers.</p>\n\n<p>Now, C8H8NBr has at least <strong>649 thousand</strong> constitutional isomers, and I am quite interested in\nbeing able to know the number of isomers beforehand, without having to generate the structures\nitself (for example, using <a href=\"http://cdk.sf.net/\">CDK</a>’s <code class=\"language-plaintext highlighter-rouge\">GENMDeterministicGenerator</code>).\n<span class=\"chem:inchi\" xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\">InChI=1/C8H8BrN/c9-7-1-2-8-6(5-7)3-4-10-8/h1-2,5,10H,3-4H2</span>\nis one of the isomers.</p>\n\n<p>So, my question: is anyone aware of free code (in order of preference: 1. LGPL, 2. BSD/MIT,</p>\n<ol>\n  <li>opensource, 4. free) to calculate or estimate the number of constitutional isomers for a\ncertain molecular formula. An estimate would already be nice. 
Ideally, I would implement this bit\nof code into the CDK, but otherwise, just knowing the number of isomers for C8H8NBr would be\nnice :)</li>\n</ol>\n\n<p>Additionally, any relevant, recent literature recommendations are most welcomed. I am aware of the\nuse of polynomials, but literature I have seen so far just focuses on molecules of a certain\narchitecture, and it not able to come up with a guess based on the molecular formula alone.</p>",
      "summary": "Update: check these two papers.",
      
      "date_published": "2006-12-17T00:00:00+00:00",
      "date_modified": "2025-02-23T00:00:00+00:00",
      "tags": ["cheminf","cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/s13321-021-00529-9", "doi": "10.1186/s13321-021-00529-9"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/s13321-022-00604-9", "doi": "10.1186/s13321-022-00604-9"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/frmbe-grb50",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/12/molecular-chemometrics.html",
      "title": "Molecular Chemometrics",
      "content_html": "<p>I just found out that a review article that I wrote earlier this year got printed: <em>Molecular Chemometrics</em>\n(DOI:<a href=\"https://doi.org/10.1080/10408340600969601\">10.1080/10408340600969601</a>), with my personal view on the interplay between\nchemoinformatics and chemometrics. The review discusses interesting developments in the last five years, and was fun writing\n(reading too, I think :). It has four major topics:</p>\n\n<ul>\n  <li><em>molecular representation</em> (with ‘molecular descriptors’ and ‘beyond the molecule’)</li>\n  <li><em>chemical space, similarity and diversity</em></li>\n  <li><em>activity and property modeling</em> (with ‘dimension reduction’ and ‘model validation’)</li>\n  <li><em>library searching</em>, which mostly focuses on semantic web developments</li>\n</ul>\n\n<p>Comments most welcome; just leave them below this blog item, or blog about the article yourself :)</p>",
      "summary": "I just found out that a review article that I wrote earlier this year got printed: Molecular Chemometrics (DOI:10.1080/10408340600969601), with my personal view on the interplay between chemoinformatics and chemometrics. The review discusses interesting developments in the last five years, and was fun writing (reading too, I think :). It has four major topics:",
      
      "date_published": "2006-12-12T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["chemometrics","cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1080/10408340600969601", "doi": "10.1080/10408340600969601"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wpaxr-93167",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html",
      "title": "Including SMILES, CML and InChI in blogs",
      "content_html": "<p>The blogs <a href=\"http://blog.chembark.com/\">ChemBark</a> and <a href=\"http://kinasepro.wordpress.com/\">KinasePro</a> have been discussing\nthe use of SMILES, CML and InChI in <a href=\"http://wiki.cubic.uni-koeln.de/pg/\">Chemical Blogspace</a> (with 70 chemistry blogs now!).\nChemists seem to <a href=\"http://kinasepro.wordpress.com/2006/12/05/monday-night-ot-2/\">prefer SMILES over InChI</a>, while there is\n<a href=\"http://blog.chembark.com/2006/11/25/help-needed-how-do-we-use-cml-properly/\">interest in moving towards CML too</a>.\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter commented</a>.</p>\n\n<p>Any incorporation of content other than images and free text requires some HTML knowledge, but this can be rather limited.\nIt is up to us chemoinformaticians to write good documentation on how to do things; so here is a first go.</p>\n\n<h2 id=\"including-cml-in-blogs-and-other-rss-feeds\">Including CML in blogs and other RSS feeds</h2>\n\n<p>I blogged about including <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/18/blogging-chemistry-on-blogspotcom.html\">CML in blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nlast February, and can generally refer to this article published last year: <em>Chemical markup, XML, and the World Wide Web. 5.\nApplications of chemical metadata in RSS aggregators</em> (PMID:<a href=\"https://pubmed.ncbi.nlm.nih.gov/15032525\">15032525</a>,\nDOI:<a href=\"https://doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>). 
Basically, it just comes down to putting the CML code into\nthe HTML version of your blog content, though I appreciate the need for plugins.</p>\n\n<h2 id=\"including-smiles-cas-and-inchi-in-blogs\">Including SMILES, CAS and InChI in blogs</h2>\n\n<p>Including SMILES is much easier as it is plain text, and has the advantage over InChI that it is much more readable.\n<a href=\"http://www.cambridgemedchemconsulting.com/\">Chris</a> wondered on the KinasePro blog how to tag SMILES, while Paul\ndid the same on ChemBark about CAS numbers.</p>\n\n<p>Now, users of <a href=\"http://postgenomic.com/\">PostGenomic.com</a> know how to <a href=\"http://postgenomic.com/wiki/doku.php?id=markup\">add markup to their blogs</a>\nto get PostGenomic to index the literature, websites and conferences they discuss. Something similar is easily done for chemistry\nthings too, as I showed in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">Hacking InChI support into postgenomic.com <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(which was put on lower priority because of finishing my PhD). PostGenomic.com basically uses microformats, which I\nblogged about just a few days ago in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/06/chemoblogs-2.html\">Chemo::Blogs #2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nwhere I suggested the use of <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chemicalcompound\"&gt;aspirin&lt;/span&gt;</code>.</p>\n\n<p>And this is the way SMILES, CAS and InChIs can be tagged on blogs. The <code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code> element is HTML code to indicate\na bit of similar content in HTML, and can, among many other things, be formatted differently than other text. However,\nthis can also be used to add semantics in a relatively cheap, but accepted, way. 
<a href=\"http://microformats.org/\">Microformats</a>\nare formalized just by use, so whatever we, as chemistry bloggers, use will become the de facto standard. Here are my suggestions:</p>\n\n<ul>\n  <li>for SMILES: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"smiles\"&gt;CCO&lt;/span&gt;</code></li>\n  <li>for CAS registry numbers: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"casnumber\"&gt;50-00-0&lt;/span&gt;</code></li>\n  <li>for InChI: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"inchi\"&gt;InChI=1/CH4/h1H4&lt;/span&gt;</code></li>\n</ul>\n\n<h2 id=\"the-rdfa-alternative\">The RDFa alternative</h2>\n\n<p>The future, however, might use RDFa over microformats, so here are the RDFa equivalents:</p>\n\n<ul>\n  <li>for SMILES: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chem:smiles\"&gt;CCO&lt;/span&gt;</code></li>\n  <li>for CAS registry numbers: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chem:casnumber\"&gt;50-00-0&lt;/span&gt;</code></li>\n  <li>for InChI: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chem:inchi\"&gt;InChI=1/CH4/h1H4&lt;/span&gt;</code></li>\n</ul>\n\n<p>which requires you to register the namespace <code class=\"language-plaintext highlighter-rouge\">xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\"</code> somewhere though.\nFormally, the URN for this namespace needs to be formalized; Peter, would the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a>\nbe the platform to do this? BTW, this is more advanced, and currently does not have practical advantages over the use of\nmicroformats.</p>",
      "summary": "The blogs ChemBark and KinasePro have been discussing the use of SMILES, CML and InChI in Chemical Blogspace (with 70 chemistry blogs now!). Chemists seem to prefer SMILES over InChI, while there is interest in moving towards CML too. Peter commented.",
      
      "date_published": "2006-12-10T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cml","inchi","blog","cb","microformat","rdf","html"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/me9g4-sa136",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/09/h-index-in-chemoinformatics.html",
      "title": "H-index in chemoinformatics",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> <a href=\"https://blogs.ch.cam.ac.uk/pmr/2006/12/08/impact-factors-hirsch-erdos-and-pauling/\">blogged <i class=\"fa-solid fa-recycle fa-xs\"></i></a> about the\n<a href=\"http://en.wikipedia.org/wiki/H-index\">h-index</a>, which is a measure for ones scientific impact. He used\n<a href=\"http://scholar.google.com/\">Google Scholar</a>, but I do not feel that that database is clean enough. I believe a better\nsource would be the <a href=\"http://portal.isiknowledge.com/portal.cgi?DestApp=WOS&amp;Func=Frame\">ISI Web-of-Science</a>.</p>\n\n<p>Therefore, I composed a list of h-indices of my own, ordered by value. The choice of authors is biased to the\n<a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> and the <a href=\"http://cdk.sf.net/\">CDK</a>, has some personal touches\n(<a href=\"http://www.cac.science.ru.nl/people/lbuydens/\">Buydens</a> are <a href=\"http://www.cac.science.ru.nl/people/rwehrens/\">Wehrens</a>\nare my PhD supervisors) and some names that put the rest into perspective:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>query\t\th-index\t#pubs\nBENDER A\t41\t222\nWILLETT P\t37\t302\nGASTEIGER J\t33\t212\nRZEPA HS\t25\t236\nBUYDENS LMC\t18\t108\nGLEN RC\t\t18\t78\nWEHRENS R\t11\t47\nMURRAY-RUST P*\t9\t41\nSTEINBECK C\t9\t29\nFECHNER U\t6\t12\nGUHA R\t\t4\t24\nWILLIGHAGEN E*\t4\t9\nWEGNER JK\t3\t9\nLUTTMANN E\t2\t4\n</code></pre></div></div>\n\n<p>Of course, there are many comments on this. Like any measurement, take into account the error. Sources of error\ninclude, but are not limited to, ambiguity in the query. 
The most notable example of this, I think, is\n<a href=\"http://andygoesus.blogspot.com/\">Andreas Bender</a>; I don’t think he has been <em>that</em> successful :) Also,\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/\">Rajarshi Guha</a>’s h-index was reported as 6, but the list included\ntwo articles from the 1970s and 1980s, which I do not think are actually his.</p>\n\n<p>Feel free to suggest other names, query corrections, tips, and I will add or work on those too.</p>",
      "summary": "Peter blogged about the h-index, which is a measure for ones scientific impact. He used Google Scholar, but I do not feel that that database is clean enough. I believe a better source would be the ISI Web-of-Science.",
      
      "date_published": "2006-12-09T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9dsmh-pc150",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/06/power-of-big-numbers.html",
      "title": "The power of big numbers",
      "content_html": "<p>Contributions to open data do not have to be large, as long as many people are doing it. The\n<a href=\"http://wikipedia.org/\">Wikipedia</a> is a good example, and <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>\naccepts contributions of small databases too (I think). The result can still be large and rather useful, even scientifically.</p>\n\n<p>The latter was recently written down in the paper <em>Internet-based monitoring of influenza-like illness (ILI) in the general\npopulation of the Netherlands during the 2003–2004 influenza season</em> by Marquet et al. (DOI:<a href=\"https://doi.org/10.1186/1471-2458/6/242\">10.1186/1471-2458/6/242</a>).\nThe data was provided by Internet users via <a href=\"http://www.degrotegriepmeting.nl/\">The Great Influenza Survey</a> website. The article states that\nthe sum of all those small contributions (anonymous website users are asked to fill out a weekly form), yields reliable data. The user is\nrewarded by colorful pictures, such as:</p>\n\n<p><img src=\"/assets/images/alles_2006-12-06.png\" alt=\"\" /></p>\n\n<p>If all chemists and biochemists would add information about or properties of one molecule or metabolite to the Wikipedia each month,\none or more commercial database companies will have to change their business model soon. Oh, you already can start doing this\n<a href=\"http://en.wikipedia.org/wiki/Portal:Chemistry\">here</a>.</p>",
      "summary": "Contributions to open data do not have to be large, as long as many people are doing it. The Wikipedia is a good example, and PubChem accepts contributions of small databases too (I think). The result can still be large and rather useful, even scientifically.",
      
      "date_published": "2006-12-06T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["virus","chemometrics"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1186/1471-2458/6/242", "doi": "10.1186/1471-2458/6/242"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gwy1n-1sc04",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/06/chemoblogs-2.html",
      "title": "Chemo::Blogs #2",
      "content_html": "<p>Because no one picked up my <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/15/chemoblogs-1.html\">Chemo::Blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a> suggestion, I will now\nofficially claim the blog series title. However, unlike the original <a href=\"http://bioblogs.wordpress.com/\">Bio::Blogs</a> series,\nI will not summarize interesting blogs, but just spam you with websites I recently marked as\n<a href=\"http://del.icio.us/egonw/toblog\">toblog on del.icio.us</a>.</p>\n\n<h2 id=\"semantics-and-text-mining\">Semantics and Text Mining</h2>\n\n<p><a href=\"http://evan.prodromou.name/\">Evan Prodromou</a> wrote about <a href=\"http://evan.prodromou.name/RDFa_vs_microformats\">RDFa vs microformats</a>.\nThe latter are commonly used in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/06/tagging-blog-items.html\">enhancing blog semantics <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and\nfor example used by <a href=\"http://postgenomic.com/wiki/doku.php?id=markup\">PostGenomic.com</a>. While RDFa is more explicit, e.g. by using\nnamespaced markup, we have to wait until XHTML2 to see it working. I do not think chemists are using tags a log yet, but let me\npropose the following microformats: <span class=\"inchi\"><a href=\"http://google.com/search?q=1/CH4/h1H4\">1/CH4/h1H4</a></span> and\n<span class=\"chemicalcompound\">methane<span>. Standard JavaScripts and CSS scripts will then do the rest. (Think: addressing newlines,\nauto <a href=\"http://wwmm-svc.ch.cam.ac.uk/wwmm/html/googleinchiserver.html\">googling-for-inchi</a>, etc).</span></span></p>\n\n<p>The reason why using microformats is interesting, is text mining, of various kinds. 
Whether it is setting up a molecule-article\nlink database, or <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">finding hot molecules in blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nadding semantics will help tools like <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html\">OSCAR3 to mine chemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nSome time ago <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/07/open-text-mining-interface-and.html\">OTMI was proposed by Nature <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand they have now set up a <a href=\"http://www.opentextmining.org/wiki/Main_Page\">dedicated web site</a> to explain their view on text mining.\n<a href=\"http://www.zacker.com/\">Zack Rosen</a> has a good idea of why <a href=\"http://www.zacker.org/semantic-web-research-isnt-working\">RDF Semantic web research isn’t working</a>.</p>\n\n<h2 id=\"blogspace\">Blogspace</h2>\n\n<p>There are a few new chemistry blogs I want to mention (and already added to <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">Chemical blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>):\n<a href=\"http://blog.chembark.com/\">ChemBark</a>, <a href=\"http://www.lirico.co.uk/wp/\">lirico</a> which has an interesting\n<a href=\"http://www.lirico.co.uk/wp/?cat=8\">chemoinformatics section</a>, and <a href=\"http://ashutoshchemist.blogspot.com/\">The Curious Wavefunction</a>.\nWorth reading indeed.</p>\n\n<p><a href=\"http://plindenbaum.blogspot.com/\">Pierre’s YOKOFAKUN</a> deserves a paragraph of his own. 
He recently blogged about\n<a href=\"http://plindenbaum.blogspot.com/2006/11/bio2rdf.html\">bio2rdf</a> which provides an <a href=\"http://bio2rdf.org/\">RDF interface to biochemical knowledge</a>\nvia <a href=\"http://lsid.sourceforge.net/\">Life Science Identifiers</a> (LSID), <a href=\"http://plindenbaum.blogspot.com/2006/11/wwwoboeditorg.html\">OBOEdit</a>\nwhich is a Java-based ontology editor, and <a href=\"http://plindenbaum.blogspot.com/2006/12/visual-unix-pipeline.html\">Amadea</a>\nwhich is a <a href=\"http://taverna.sf.net/\">Taverna</a>- and <a href=\"http://www.knime.org/\">KNIME</a>-like tool for setting up UNIX pipes.</p>\n\n<h2 id=\"online-embl-symposium\">Online EMBL Symposium</h2>\n\n<p>A few EMBL PhD students are having the <a href=\"http://virtualsymposium.predocs.org/\">First Online EMBL PhD Symposium</a> (catchy name, or … ;)\nAnyway, discussions are held on IRC, and it has a rather interesting Web2.0 session. All\n<a href=\"http://virtualsymposium.predocs.org/media\">media is available on the website</a> but requires registration right now.\nAfter the conference it will become open access to all. <a href=\"http://www.blogger.com/profile/6833158\">Jean-Claude</a> contributed\n<em>The UsefulChem Project: Open Source Chemistry Research using Blogs and Wikis</em> to the\n<a href=\"http://virtualsymposium.predocs.org/media/participants-contributions/\">Participants’ Contributions section</a>, and I had\na poster on <em>Distributing molecular information over the Internet</em>, discussing CMLRSS, blog aggregators, CML and other things.\nThe IRC session was logged and is <a href=\"http://virtualsymposium.predocs.org/chat/discussion-about-the-influence-of-web-2-0-on-science-tuesday-december-6-2006-16-00-cet/\">available here</a>.</p>\n\n<h2 id=\"literature\">Literature</h2>\n\n<p>Finally, I want to mention three recent articles. 
First one is a recent write up by Bourne and Friedberg about\n<em>Ten Simple Rules for Selecting a Postdoctoral Position</em> (DOI: <a href=\"https://doi.org/10.1371/journal.pcbi.0020121\">10.1371/journal.pcbi.0020121</a>).\nWith the end of my current postdoc position nearing, rather useful reading. Some time ago I blogged about a\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/11/new-open-access-journal-source-code.html\">New open access journal Source Code for Biology and Medicine <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand the journal is now up and running. Details can be read in the first editorial (DOI: <a href=\"https://doi.org/10.1186/1751-0473-1-1\">10.1186/1751-0473-1-1</a>).\nThe third article I would like to mention is <em>Scientific Software Development Is Not an Oxymoron</em> by Baxter\n(DOI: <a href=\"https://doi.org/10.1371/journal.pcbi.0020087\">10.1371/journal.pcbi.0020087</a>), though I do not think it has new insights.</p>\n\n<p>OK, this was a rather lengthy write up, but really needed to clean up my toblog section :)</p>",
      "summary": "Because no one picked up my Chemo::Blogs suggestion, I will now officially claim the blog series title. However, unlike the original Bio::Blogs series, I will not summarize interesting blogs, but just spam you with websites I recently marked as toblog on del.icio.us.",
      
      "date_published": "2006-12-06T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["blog","rdf","textmining","cb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.0020121", "doi": "10.1371/journal.pcbi.0020121"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1186/1751-0473-1-1", "doi": "10.1186/1751-0473-1-1"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1371/journal.pcbi.0020087", "doi": "10.1371/journal.pcbi.0020087"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gtt31-tee03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/28/code-coverage-making-sure-your-code-is.html",
      "title": "Code coverage: making sure your code is tested",
      "content_html": "<p>Recently I <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/10/26/running-single-junit-tests-in-eclipse.html\">discussed JUnit testing from within Eclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand blogged at <a href=\"http://search.blogger.com/?as_q=JUnit&amp;ie=UTF-8&amp;x=0&amp;y=0&amp;q=JUnit+blogurl:chem-bla-ics.blogspot.com&amp;ui=blg&amp;start=0\">several occasions</a> <!-- keep link -->\nabout it in other situations. I cannot stress enough how useful unit testing is: it adds this extra set of\n<a href=\"http://en.wikipedia.org/wiki/Given_enough_eyeballs,_all_bugs_are_shallow\">eyeballs to make bugs shallow</a>.\nAnd it does that, indeed.</p>\n\n<p>Ensuring that you actually test all the code you write, however, is not easy. A couple of years back I read an article about\n<a href=\"http://hansel.sf.net/\">Hansel</a>, which does code coverage checking, but never got it nicely working for the\n<a href=\"http://cdk.sf.net/\">CDK project</a>. Never looked at that lately, so no idea how the current release would work out.\nHansel is an extension of <a href=\"http://www.junit.org/\">JUnit</a>, and requires hard coding class names, which conflicts with\nCDK’s module setup.</p>\n\n<p>Thomas Kuhn pointed me last week to <a href=\"http://emma.sf.net/\">Emma</a>, which seems a nice tool. It does not require hacking\nour source, and generates cool HTML:</p>\n\n<p><img src=\"/assets/images/emmaCoverage.png\" alt=\"\" /></p>\n\n<p>And even highlights the source code:</p>\n\n<p><img src=\"/assets/images/emmaCoverage1.png\" alt=\"\" /></p>\n\n<p>BTW, I seem to be in good company: <a href=\"http://www.gnu.org/software/classpath/\">Classpath</a> is\n<a href=\"http://builder.classpath.org/~cpdev/coverage/\">using it too</a>.</p>\n\n<p>Below is the command I issued to generate the HTML output. Rajarshi, maybe this can be integrated into\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">Nightly</a>? 
Note that it only runs the tests\nfor the data module:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>ant dist-large dist-test-large\njava <span class=\"nt\">-cp</span> ~/tmp/emma-2.0.5312/lib/emma.jar emmarun <span class=\"se\">\\</span>\n  <span class=\"nt\">-cp</span> develjar/junit.jar:dist/jar/cdk-svn-20061128.jar:dist/jar/cdk-test-svn-20061128.jar <span class=\"se\">\\</span>\n  <span class=\"nt\">-r</span> html <span class=\"nt\">-sp</span> src junit.textui.TestRunner org.openscience.cdk.test.MdataTest\n</code></pre></div></div>",
      "summary": "Recently I discussed JUnit testing from within Eclipse , and blogged at several occasions about it in other situations. I cannot stress enough how useful unit testing is: it adds this extra set of eyeballs to make bugs shallow. And it does that, indeed.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/emmaCoverage1.png",
      "date_published": "2006-11-28T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["opensource","cdk","junit"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c3j5m-z1d76",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/14/german-conference-on-chemoinformatics_14.html",
      "title": "German Conference on Chemoinformatics 2006: Day 3",
      "content_html": "<p>Just some short quites note about the third day (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/13/german-conference-on-chemoinformatics.html\">day 1 and 2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nToday’s program of the <a href=\"http://scholle.oc.uni-kiel.de/users/cic/tagungen/workshop06/index.html\">German Conference on Chemoinformatics</a>\nstarted with a presentation by Rzepa about his work on a semantic wiki (DOI:<a href=\"https://doi.org/10.1021/ci060139e\">10.1021/ci060139e</a>),\nwhich might be <a href=\"http://www.ch.ic.ac.uk/wiki/\">online here</a>. (He recorded a podcast, but I have not seen it online yet.) I wish I could\nsee the sources of those wiki pages, to see how that system integrates RDF, but at least <a href=\"http://www.jmol.org/\">Jmol</a> is running fine.\nThe presentation by Couch showed the status of the <a href=\"http://www.materialsgrid.org/\">Materials Grid project</a>, and how a guy called AgentX\ndoes all the hard work. Ihlenfeldt updated us about the status of <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>, and mostly on what they\nhad to do to keep the system from dying from its own success, for example using something called minimol. Googling does not seem to\nhelp, as that points to a number of things, but not any PubChem webpage. I am still waiting for a European organization to set up a mirror.</p>\n\n<p>After the coffee break, Kuhn showed a coarse grained force field, approximating molecules by hacking them up in fragment of 3-10 heavy atoms.\nI guess, a bit like some small molecules force fields do for methyls. Fragments within a molecule are tied together by springs, and intra-\nand intermolecular force field parameters by running MD runs on fragment pairs. 
Varnek argued that QSPR for melting point prediction has\nreached a fundamental limit, with an RMSE of around 30 to 40 degrees Celsius, which makes it quite unreasonable to decide whether a\ncompound with a predicted melting point of 40 degrees is solid or fluid at room temperature.</p>\n\n<p>You have to forgive me for not reporting on the afternoon session; I was tied up talking with people at our booth, talking about the CDK,\nTaverna, Bioclipse, Jmol, other opensource chemoinformatics tools, and chemoinformatics in general. Very nice, but exhausting. I might\nadvise the organization to set up a blog aggregator next year, though I am not sure whether there are others blogging about this conference.</p>",
      "summary": "Just some short quites note about the third day (see day 1 and 2 ). Today’s program of the German Conference on Chemoinformatics started with a presentation by Rzepa about his work on a semantic wiki (DOI:10.1021/ci060139e), which might be online here. (He recorded a podcast, but I have not seen it online yet.) I wish I could see the sources of those wiki pages, to see how that system integrates RDF, but at least Jmol is running fine. The presentation by Couch showed the status of the Materials Grid project, and how a guy called AgentX does all the hard work. Ihlenfeldt updated us about the status of PubChem, and mostly on what they had to do to keep the system from dying from its own success, for example using something called minimol. Googling does not seem to help, as that points to a number of things, but not any PubChem webpage. I am still waiting for a European organization to set up a mirror.",
      
      "date_published": "2006-11-14T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cheminf","conference","semweb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci060139e", "doi": "10.1021/ci060139e"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/txr0z-6j242",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/13/german-conference-on-chemoinformatics.html",
      "title": "German Conference on Chemoinformatics 2006: Day 1 and 2",
      "content_html": "<p>The <a href=\"http://scholle.oc.uni-kiel.de/users/cic/tagungen/workshop06/index.html\">2nd German Conference on Chemoinformatics</a>\nstarted yesterday, with two chemoinformatics tutorials: one on industrial chemoinformatics (I saw this presentation\nbefore… not sure when), with a good overview on integrating different information sources; the second one was about\nopensource chemoinformatics by <a href=\"http://wiki.cubic.uni-koeln.de/blog/index.php\">Christoph Steinbeck</a> (being involved\nin opensource chemoinformatics for almost 10 years now!), which included a <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\ndemo (by me) and a demo by Thomas Kuhn on the <a href=\"http://cdk.sf.net/\">CDK</a> based chemoinformatics plugin to\n<a href=\"http://taverna.sf.net/\">Taverna</a>. Other opensource projects of the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a>\nmovement were mentioned and a few outside it too.</p>\n\n<p>The conference is in honor of the life work by <a href=\"http://www2.chemie.uni-erlangen.de/\">Prof. Gasteiger</a>, who gave an\noverview of chemoinformatics in his group, Germany and Europe. He stressed the need of education in chemoinformatics,\nlike in <a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=12\">Obernai</a>. He also highlighted that we, today,\nare still solving the same problem as 30 years ago. Which is true, which is why this channel is called\n<a href=\"https://chem-bla-ics.linkedchemistry.info/\">Chem-bla-ics <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, trying to solve that problem. 
When asked if opensource chemoinformatics\nfrom the start would have addressed this, he replied that he requires people to cooperatively do research with his\ngroup; opensource clearly cannot enforce that.</p>\n\n<h1 id=\"day-2\">Day 2</h1>\n\n<p>Today’s program had a number of interesting presentations (I, unfortunately, missed the first presentation, so\nhave to visit that group soon now, to make up for that.) <a href=\"http://www.dq.fct.unl.pt/staff/jas/introduction.htm\">Prof. Aires-de-Sousa</a>\nshowed his work on MOLMAP for mapping metabolic networks (<a href=\"http://www.genome.jp/kegg/\">KEGG</a> really, see my\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html\">earlier blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), and showed,\njust as proof of principle, classification of organisms based on this.</p>\n\n<p>J. Weisser talked about docking, still an obligatory topic. This work really showed two new approaches: the use\nof QM partial charges (the example showed an improvement in RMSD of a factor 10, not very statistical, but\npromising indeed); the second was the fact that water does not like to be in tight spots, because of reduced\npossibilities for hydrogen bonding. A concept common in understanding supramolecular phenomena, but I have not\nseen this applied to docking before. But I am no expert in that field. M. Wagner showed work on using KEGG\ndata to estimate likely metabolites, and the use in reducing effects of metabolic degradation. T. Schroeter\nintroduced me to <a href=\"http://www.gaussianprocess.org/\">gaussian processes</a>, a new data modeling method. Quite\nembarrassing to get introduced to such, as being specialized in modeling methods for chemical problems.</p>\n\n<p>The poster session was, as usual, really exhausting, talking to a lot of people. Having a booth at the exhibition\non opensource chemoinformatics added a nice twist to this. 
I therefore skipped the FIZ-award winner lectures, so I\nhope someone else will blog about those.</p>\n\n<p>One last note: <a href=\"http://www.sun.com/software/opensource/java/\">Sun started releasing their Java platform under the GPL license</a>.\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/\">Jim</a>, seems that they <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/10/25/being-good-opensource-user.html\">proved me wrong <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThe class library is still not GPL, but is expected to become licensed such somewhere in the first half of next year.</p>",
      "summary": "The 2nd German Conference on Chemoinformatics started yesterday, with two chemoinformatics tutorials: one on industrial chemoinformatics (I saw this presentation before… not sure when), with a good overview on integrating different information sources; the second one was about opensource chemoinformatics by Christoph Steinbeck (being involved in opensource chemoinformatics for almost 10 years now!), which included a Bioclipse demo (by me) and a demo by Thomas Kuhn on the CDK based chemoinformatics plugin to Taverna. Other opensource projects of the Blue Obelisk movement were mentioned and a few outside it too.",
      
      "date_published": "2006-11-13T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cheminf","conference","openscience","bioclipse","cdk","taverna","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qgf4b-2p384",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/12/organic-chemists-can-now-tune.html",
      "title": "Organic chemists can now tune properties without changing the molecular structure??",
      "content_html": "<p><a href=\"http://www.paulbracher.com/blog/?p=217\">Paul Bracher</a> and <a href=\"http://blogs.nature.com/thescepticalchymist/2006/08/the_big_picture.html\">Joshua Finkelstein</a>\npointed my attention to a nice discussion in <a href=\"http://www.nature.com/\">Nature</a> on the future of chemistry, in\n<a href=\"http://www.nature.com/nature/journal/v442/n7102/full/442500a.html\">What Chemists Want to Know</a>, by Philip Ball.\nPaul and Joshua already reviewed it thoroughly, but I could not resist commenting in it too. Having chosen chemistry\nas specialization when I went to <a href=\"http://www.ru.nl/\">university</a>, and with a minor in supramolecular chemistry,\nthis is a something I do relate to.</p>\n\n<p>A main theme is whether chemistry is unexplored enough to justify further academic research and education. Ball’s answer is\nyes, and came up with a six questions, of which I found this one most intriguing: <em>what is the chemical basis of thought and memory</em>.\nBut the article interestingly also discusses if chemistry has not become a tool for more interesting fields of research.\nThe Nobel prize winners Ball interviewed do not think so.</p>\n\n<p>One quote took my surprise: <em>Where is synthetic astronomy - changing the gravitational constant to see what effect that\nhas on the properties of the Universe, and thus perhaps improving it?</em> Well, I might be out of the synthetic organic\nchemistry for too long now, but this is not a quote I would like to be in Nature with; is synthetic chemistry now\nable, then, to modify the nature, strengths of bonds now?? can they actually change molecular properties without\nchanging the connectivity?? 
Moreover, astronomers have changed the properties of objects in our universe: for\nyears they have been reducing the mass of the earth by sending off probes to other objects (satellites etc).\nLikewise, chemistry is <strong>not</strong> changing nature, it is just exploring all compounds we never had purified in our\nglassware yet. Synthesis is nothing like changing nature.</p>\n\n<p>There is one other comment I would like to post here. I strongly agree that chemistry in itself is important to have\nas separate educational and research topic at universities. Simply because too many databases are, from a chemical point\nof view, messed up. For example, <a href=\"http://www.genome.jp/kegg/\">KEGG</a> and the <a href=\"http://www.pdb.org/\">PDB</a> are known to\nhave many chemical errors, though these databases are rather important indeed. We need people around to educate\npeople and point out those errors, if life sciences itself is to have a future.</p>",
      "summary": "Paul Bracher and Joshua Finkelstein pointed my attention to a nice discussion in Nature on the future of chemistry, in What Chemists Want to Know, by Philip Ball. Paul and Joshua already reviewed it thoroughly, but I could not resist commenting in it too. Having chosen chemistry as specialization when I went to university, and with a minor in supramolecular chemistry, this is a something I do relate to.",
      
      "date_published": "2006-11-12T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["chemistry","nature"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/442500a", "doi": "10.1038/442500a"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1ygnr-aa173",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/07/when-is-open-source-chemoinformatics.html",
      "title": "When is open source chemoinformatics successful?",
      "content_html": "<p>Open source chemoinformatics has become a common phenomenon, though many projects are small in nature:\nsource code is developed by only few developers, or even in a closed manner and released when considered\ndone. Within open source software there is room for distinguishing a subset of open development\nchemoinformatics, that is, Bazar-like, instead of Cathedral-like (see\n<a href=\"http://catb.org/esr/writings/cathedral-bazaar/cathedral-bazaar/\">ESR’s famous writing</a>).</p>\n\n<p>Measuring the importance of an open source project can be done by many measures, such as the number of people\non the user and developers mailing lists, number of downloads, number of source lines of code\n[<a href=\"http://en.wikipedia.org/wiki/Source_lines_of_code\">wp:SLOC</a>], number of independent development locations,\nand rankings on, for example, <a href=\"http://www.sourceforge.net/\">SourceForge</a> or <a href=\"http://www.google.com/\">Google</a>.\nJust to name a few.</p>\n\n<p>Scientific importance of an open source project can sometimes be measured by a citation index; that is, only\nwhen there is a landmark article for the project. 
<a href=\"http://www.umass.edu/microbio/rasmol/index2.htm\">Rasmol</a>\nis such a project: a first article was published in 1995 (DOI:<a href=\"https://doi.org/10.1016/S0968-0004(00)89080-5\">10.1016/S0968-0004(00)89080-5</a>),\nand a follow up in 2000 (DOI:<a href=\"https://doi.org/10.1016/S0968-0004(00)01606-6\">10.1016/S0968-0004(00)01606-6</a>).\nThe first was cited <strong>1190</strong> times, and the second 65 times (as stated on <a href=\"http://www.isiknowledge.com/wos/\">Web-of-Science</a>).\nQuite successful indeed.</p>\n\n<p>OK, it is not even 100+, but I am quite happy with the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=literature\">scientific impact of the CDK</a>\nso far: the 2003 CDK article (DOI:<a href=\"https://doi.org/10.1021/ci025584y\">10.1021/ci025584y</a>) was cited 24 times\nnow, and the just published 2006 article (DOI:<a href=\"https://doi.org/10.2174/138161206777585274\">10.2174/138161206777585274</a>)\nonce:</p>\n\n<p><img src=\"/assets/images/cdkCitationCounts.png\" alt=\"\" /></p>",
      "summary": "Open source chemoinformatics has become a common phenomenon, though many projects are small in nature: source code is developed by only few developers, or even in a closed manner and released when considered done. Within open source software there is room for distinguishing a subset of open development chemoinformatics, that is, Bazar-like, instead of Cathedral-like (see ESR’s famous writing).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkCitationCounts.png",
      "date_published": "2006-11-07T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cdk","rasmol","cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/S0968-0004(00)89080-5", "doi": "10.1016/S0968-0004(00)89080-5"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/S0968-0004(00)01606-6", "doi": "10.1016/S0968-0004(00)01606-6"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI025584Y", "doi": "10.1021/CI025584Y"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.2174/138161206777585274", "doi": "10.2174/138161206777585274"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/myymc-sr268",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/03/chemical-blogspace-updates.html",
      "title": "Chemical Blogspace updates",
      "content_html": "<p><a href=\"http://wiki.cubic.uni-koeln.de/pg/\">Chemical Blogspace</a> is up and running fine for some time now. Since the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">start <i class=\"fa-solid fa-recycle fa-xs\"></i></a> the number of aggregated blogs increased from 19 to\n<a href=\"http://wiki.cubic.uni-koeln.de/pg/all_blogs.php\">64</a> now, of which a number are situated at\n<a href=\"http://chemblogs.org/\">ChemBlogs</a> which is a site where you can run a blog. Meanwhile, the number of\n<a href=\"http://wiki.cubic.uni-koeln.de/pg/all_papers.php\">cited papers</a> went up to 186! The\n<a href=\"http://pubs.acs.org/journals/jacsat/\">JACS</a> is most popular so far, followed by the\n<a href=\"http://www3.interscience.wiley.com/cgi-bin/jhome/26737\">Angewandte Chemie Int. Ed.</a></p>\n\n<p>As mentioned <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">before <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, the software was taken\n<a href=\"http://postgenomic.com/\">Postgenomic.com</a>, which has upgraded considerably and released new software since the author\n<a href=\"http://www.ghastlyfop.com/blog/2006/09/changes.html\">moved to Nature</a>, but I have not found time to follow that upgrade\nyet :( The <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">promised InChI support <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nis still pending too.</p>",
      "summary": "Chemical Blogspace is up and running fine for some time now. Since the start the number of aggregated blogs increased from 19 to 64 now, of which a number are situated at ChemBlogs which is a site where you can run a blog. Meanwhile, the number of cited papers went up to 186! The JACS is most popular so far, followed by the Angewandte Chemie Int. Ed.",
      
      "date_published": "2006-11-03T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cb","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xf0q9-v4n97",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/03/bioclipse-workshop-short-but.html",
      "title": "Bioclipse Workshop: short but productive",
      "content_html": "<p>The <a href=\"http://www.bioclipse.net/\">Bioclipse</a> <a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse_Workshop_Oct/Nov_2006\">Workshop</a>\nhas ended and, for just three days, turned out <a href=\"http://wiki.bioclipse.net/index.php?title=Outcome_of_the_Bioclipse_autumn_workshop_2006\">quite productive</a>.\nWe have first bits of scripting support for JavaScript using <a href=\"http://www.mozilla.org/rhino/\">Rhino</a>. At this moment the\nscripting plugin needs to explicit depend on plugins to be able to access their classpath, but we plan to solve that.\nAn example script:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">// to have short identifiers</span>\n<span class=\"nb\">Array</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">java</span><span class=\"p\">.</span><span class=\"nx\">lang</span><span class=\"p\">.</span><span class=\"nx\">reflect</span><span class=\"p\">.</span><span class=\"nb\">Array</span><span class=\"p\">;</span>\n<span class=\"nb\">String</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">java</span><span class=\"p\">.</span><span class=\"nx\">lang</span><span class=\"p\">.</span><span class=\"nb\">String</span><span class=\"p\">;</span>\n<span class=\"nx\">msgBox</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">net</span><span class=\"p\">.</span><span class=\"nx\">bioclipse</span><span class=\"p\">.</span><span class=\"nx\">plugins</span><span class=\"p\">.</span><span class=\"nx\">bc_rhino</span><span class=\"p\">.</span><span class=\"nx\">ShowBcMsgBox</span><span class=\"p\">;</span>\n<span class=\"nx\">DbfetchServiceServiceLocator</span> <span class=\"o\">=</span>\n  <span class=\"nb\">Packages</span><span 
class=\"p\">.</span><span class=\"nx\">uk</span><span class=\"p\">.</span><span class=\"nx\">ac</span><span class=\"p\">.</span><span class=\"nx\">ebi</span><span class=\"p\">.</span><span class=\"nx\">www</span><span class=\"p\">.</span><span class=\"nx\">ws</span><span class=\"p\">.</span><span class=\"nx\">services</span><span class=\"p\">.</span><span class=\"nx\">urn</span><span class=\"p\">.</span><span class=\"nx\">Dbfetch</span><span class=\"p\">.</span><span class=\"nx\">DbfetchServiceServiceLocator</span><span class=\"p\">;</span>\n\n<span class=\"c1\">// get data</span>\n<span class=\"nx\">service</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">DbfetchServiceServiceLocator</span><span class=\"p\">();</span>\n<span class=\"nx\">strarray</span> <span class=\"o\">=</span> <span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getUrnDbfetch</span><span class=\"p\">().</span><span class=\"nf\">fetchData</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">refseq:NM_210721</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">refseq</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">raw</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n\n<span class=\"c1\">// make readable</span>\n<span class=\"nx\">str</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">String</span><span class=\"p\">();</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"nx\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span> <span class=\"o\">&lt;</span> <span class=\"nb\">Array</span><span class=\"p\">.</span><span class=\"nf\">getLength</span><span class=\"p\">(</span><span class=\"nx\">strarray</span><span class=\"p\">);</span> <span class=\"nx\">i</span><span 
class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">i</span> <span class=\"o\">!=</span> <span class=\"mi\">0</span><span class=\"p\">)</span>\n  <span class=\"nx\">str</span> <span class=\"o\">=</span> <span class=\"nx\">str</span> <span class=\"o\">+</span> <span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"se\">\\n</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n  <span class=\"nx\">str</span> <span class=\"o\">=</span> <span class=\"nx\">str</span> <span class=\"o\">+</span> <span class=\"nx\">strarray</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">];</span>\n<span class=\"p\">}</span>\n\n<span class=\"c1\">// show</span>\n<span class=\"nx\">msgBox</span><span class=\"p\">.</span><span class=\"nc\">ShowStatic</span><span class=\"p\">(</span><span class=\"nx\">str</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>It’s just a short example that uses webservice technology in Bioclipse to fetch a sequence.</p>\n\n<h1 id=\"qsar-support\">QSAR support</h1>\n\n<p>QSAR support is getting along too, with a new DescriptorProvider extension point in <a href=\"http://svn.sourceforge.net/viewvc/bioclipse/trunk/\">trunk/</a>\nand work is progressing on a wizard that allows selecting descriptors and a CDK backend. The output of the wizard is a matrix resource, for\nwhich we already have a rich editor. A <a href=\"http://www-ra.informatik.uni-tuebingen.de/software/joelib/\">JOELib</a> plugin has been suggested,\nas it has a good deal of QSAR descriptors too; <a href=\"http://miningdrugs.blogspot.com/\">Jörg</a>, interested in doing a tiny bit of Bioclipse hacking?</p>\n\n<p>A full proceedings is available <a href=\"http://wiki.bioclipse.net/index.php?title=Outcome_of_the_Bioclipse_autumn_workshop_2006\">online</a>.</p>",
      "summary": "The Bioclipse Workshop has ended and, for just three days, turned out quite productive. We have first bits of scripting support for JavaScript using Rhino. At this moment the scripting plugin needs to explicit depend on plugins to be able to access their classpath, but we plan to solve that. An example script:",
      
      "date_published": "2006-11-03T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["bioclipse","qsar","javascript","conference"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hhtcn-zah03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/01/bioclipse-workshop-is-in-progress.html",
      "title": "The Bioclipse Workshop is in progress",
      "content_html": "<p>The <a href=\"http://www.bioclipse.net/\">Bioclipse</a> <a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse_Workshop_Oct/Nov_2006\">Workshop</a>\nis in progress, and <a href=\"http://bioclipse.blogspot.com/\">Ola</a> is now leading a discussion about future releases and functionality.\nProceedings are <a href=\"http://wiki.bioclipse.net/index.php?title=Outcome_of_the_Bioclipse_autumn_workshop_2006\">live updated</a>,\nand presentation sheets will be available shortly.</p>",
      "summary": "The Bioclipse Workshop is in progress, and Ola is now leading a discussion about future releases and functionality. Proceedings are live updated, and presentation sheets will be available shortly.",
      
      "date_published": "2006-11-01T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["bioclipse","conference"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vmj1w-f9732",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/28/opensource-chemistry-and-opensource.html",
      "title": "Opensource Chemistry and Opensource Chemoinformatics",
      "content_html": "<p>The <a href=\"http://hardly.cubic.uni-koeln.de/mailman/listinfo/blue-obelisk\">Blue Obelisk mailing list</a> has seen an\n<a href=\"http://hardly.cubic.uni-koeln.de/pipermail/blue-obelisk/2006-September/thread.html\">interesting discussion</a> on ambiguity in the term ‘open source’,\ntriggered by a study by <a href=\"http://www.blogger.com/profile/19401667\">Beth Ritter Guth</a>. For example, <a href=\"http://www.blogger.com/profile/6833158\">Jean-Claude Bradley</a>\nperforms ‘open source’ science (see his <a href=\"http://usefulchem.blogspot.com/\">Useful Chemistry blog</a>) who is not opposed to using\nclosed source software, while the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> is about ‘open source’ software. It seemed that\nthis was contradicting, and <a href=\"http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Peter_Murray_Rust\">Peter Murray-Rust</a>\n[<a href=\"http://en.wikipedia.org/\">wp</a>:<a href=\"http://en.wikipedia.org/wiki/Peter_Murray-Rust\">en</a>] wrote up a lengthy\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2006/09/26/open/\">overview of the use of the term ‘open’ <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Now, I have been giving the ‘open source’ ambiguity some thinking (well, about a month or so…), and came to the following conclusions:</p>\n\n<ol>\n  <li>open source has the exact same meaning in both Bradley-like open source chemistry, and BO-like open source chemoinformatics</li>\n  <li>both have the same goal</li>\n  <li>it’s just the research topic that is different</li>\n</ol>\n\n<h1 id=\"ad-1-same-meaning-of-open-source\">Ad 1: same meaning of ‘open source’</h1>\n\n<p>I think ‘open source’ just means that every has the right to reproduce (and distribute and the same or modified shape)\nproducts created from the source.</p>\n\n<p>In ‘open source chemistry’ (Bradley-like, sorry for the term :) the source is are the details about the chemical reactions\nto perform, the product being being 
able to run the whole reaction pathway.</p>\n\n<p>In ‘open source chemoinformatics’ (Blue Obelisk-like) the source is the procedure that describes how to get from one set\nof bits to another, really quite like getting from one molecule to another. Chemoinformatics, being IT science, just\nmakes it a lot easier to distribute the algorithm to do that. (Sure, <a href=\"https://doi.org/10.1021/ci0502698\">CMLReact</a>\nis getting along quite nicely.)</p>\n\n<p>The analogy even goes further, both sciences do not only depend on open source. Like Bradley-like open source science allows\nembedding proprietary stuff (glass-ware, closed-source software, chemicals bought from <a href=\"http://www.fisherscientific.com/\">Acros (now Fisher)</a>,\n…), so does BO-like open source science, which uses tons of proprietary stuff too (computers, Sun’s JVM, MS-Windows).</p>\n\n<h1 id=\"ad-2-same-goal\">Ad 2: same goal</h1>\n\n<p>I can be short on this one. For both ‘open source’ initiatives the goal is to share knowledge and make science reproducible.</p>\n\n<h1 id=\"ad-3-different-topic\">Ad 3: different topic</h1>\n\n<p>So, the confusion was just coming from the fact to what extent ‘open source’ tools are being used. Can you do open source\nscience without using open source chemoinformatics? Sure. In a utopian situation, all tools and small bits are ‘open source’\n(though <a href=\"http://wwmm.ch.cam.ac.uk/blogs/corbett/?p=7\">some are agnostic to this</a>). But the fact is that many Blue Obelisk members use ‘closed source’ tools all the time,\neven if they do not have to. At least everyone is doing ‘open source’ on their specialisms, both in open source chemistry\nand in open source chemoinformatics.</p>\n\n<p>I guess we should just stop abbreviating ‘open source software’ to remove any ambiguity of the term ‘open source’.\nAs a spin-off, this would make Bradley’s work fit in nicely with ODOSOS: open data, open source, open standards.</p>",
      "summary": "The Blue Obelisk mailing list has seen an interesting discussion on ambiguity in the term ‘open source’, triggered by a study by Beth Ritter Guth. For example, Jean-Claude Bradley performs ‘open source’ science (see his Useful Chemistry blog) who is not opposed to using closed source software, while the Blue Obelisk is about ‘open source’ software. It seemed that this was contradicting, and Peter Murray-Rust [wp:en] wrote up a lengthy overview of the use of the term ‘open’ .",
      
      "date_published": "2006-10-28T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["openscience","opensource","blue-obelisk","chemistry","cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci0502698", "doi": "10.1021/ci0502698"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3htbd-qma24",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/26/running-single-junit-tests-in-eclipse.html",
      "title": "Running single JUnit tests in Eclipse",
      "content_html": "<p>Unit testing is important when developing source code. <a href=\"http://www.junit.org/\">JUnit</a> provides a library to facilitate this in Java,\nand <a href=\"http://www.eclipse.org/te\">Eclipse</a> had the functionality to run JUnit tests. Even better, it allows you to run single JUnit\ntests, even in debug mode:</p>\n\n<p><img src=\"/assets/images/JUnitTestInDebugMode.png\" alt=\"\" /></p>\n\n<p>Just open the java class in your Package Explorer, right click on the JUnit method you want to run, then pick <code class=\"language-plaintext highlighter-rouge\">Run As</code> or <code class=\"language-plaintext highlighter-rouge\">Debug As</code>,\nand then <code class=\"language-plaintext highlighter-rouge\">JUnit test</code>.</p>",
      "summary": "Unit testing is important when developing source code. JUnit provides a library to facilitate this in Java, and Eclipse had the functionality to run JUnit tests. Even better, it allows you to run single JUnit tests, even in debug mode:",
      
      "date_published": "2006-10-26T00:00:00+00:00",
      "date_modified": "2024-08-12T00:00:00+00:00",
      "tags": ["junit","eclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ta3ms-e8480",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/25/being-good-opensource-user.html",
      "title": "Being a good opensource user",
      "content_html": "<p>There are many ways to contribute to opensource software (OSS), programming only being one of them. I develop OSS, but use OSS too.\nFor example, I am a big user of the <a href=\"http://www.kernel.org/\">Linux</a> kernel, the <a href=\"http://www.kde.org/\">KDE desktop</a>, <a href=\"http://www.kubuntu.org/\">Kubuntu</a>,\n<a href=\"http://www.debian.org/\">Debian</a> (I have unstable in a <a href=\"http://www.ubuntuforums.org/showthread.php?t=24575\">chroot</a>),\n<a href=\"http://www.getfirefox.com/\">Firefox</a>, <a href=\"http://www.eclipse.org/\">Eclipse</a>, <a href=\"http://www.gnu.org/software/classpath/\">Classpath</a>, and many,\nmany others. What these have in common, is that I generally have no time to look into the source code of these projects. A small patch excluded,\nI am really a regular user of these projects.</p>\n\n<p>However, I try not to <a href=\"http://en.wikipedia.org/wiki/Leech_(computing)\">leech</a> (see also <a href=\"https://blogs.ch.cam.ac.uk/pmr/2006/10/01/open-source-and-the-tragedy-of-the-lurkers/\">Peter’s related comment on that <i class=\"fa-solid fa-recycle fa-xs\"></i></a>):\nI care about these projects and, therefore, I file bug reports. Sometimes, I even join the developers and talk to them via commonly used IRC and\nmailing lists. Even, every now and then I get this itch and then I do look up source code and contribute a patch. But filing bug reports is the\nleast one can do, the least everyone should do.</p>\n\n<h1 id=\"classpath\">Classpath</h1>\n\n<p><a href=\"http://www.gnu.org/software/classpath/\">Classpath</a> is the GNU project to provide a free Java library, i.e. the set of <code class=\"language-plaintext highlighter-rouge\">java.*</code> classes\nthat come with the Sun JVM. It is not a virtual machine, though, for which several opensource implementations are available, many of\nwhich use Classpath as library provider. 
They have a very nice chat channel at irc.freenode.net, called <code class=\"language-plaintext highlighter-rouge\">#classpath</code>.\nTheir wiki provides a <a href=\"http://developer.classpath.org/mediation/FreeSwingTestApps\">platform for giving feedback</a> on how well software\nruns. A bug tracking system (BTS) is <a href=\"http://www.gnu.org/software/classpath/bugs.html\">available too</a>. An overview of the bugs that I filed\ncan be found at <a href=\"http://del.icio.us/egonw\">my del.icio.us account</a>: <a href=\"http://del.icio.us/egonw/bugreports%2BClasspath\">bugreports+Classpath</a>.</p>\n\n<p>Needless to say, Classpath is important in making our Java based chemoinformatics truly opensource.</p>\n\n<h1 id=\"debiankubuntu\">Debian/Kubuntu</h1>\n\n<p>Things are different for <a href=\"http://www.debian.org/\">Debian</a> and <a href=\"http://www.kubuntu.org/\">Kubuntu</a>: these are distributions and, except for\nsome patching, are generally not involved in software development as done by upstream. However, they generally do appreciate knowing about\nbugs too, so there is some duplication of bug reports here.</p>\n\n<p>That said, they do provide nice tools for bug reporting which work for all packages that they distribute. Debian has\n<a href=\"http://packages.debian.org/reportbug\">reportbug</a> and Kubuntu has <a href=\"http://launchpad.net/\">Launchpad</a>. An overview of bugs I reported with\nDebian can be found at del.icio.us <a href=\"http://del.icio.us/egonw/bugreports%2Bdebian\">bugreports+debian</a>. 
I do not have bug reports in Launchpad\nyet, but two can be found in mailing list archives, see del.icio.us <a href=\"http://del.icio.us/egonw/bugreports%2Bubuntu\">bugreports+ubuntu</a>.</p>\n\n<h1 id=\"kde\">KDE</h1>\n\n<p>I also tracked back two bugs I reported with KDE, see del.icio.us <a href=\"http://del.icio.us/egonw/bugreports%2BKDE\">bugreports+KDE</a>.</p>\n\n<h1 id=\"sourceforge\">SourceForge</h1>\n\n<p>Surely, I filed many more bugs to many other projects. A long list of bug reports can be found on SourceForge. However, it seems not\npossible to make an easy list of that :(</p>",
      "summary": "There are many ways to contribute to opensource software (OSS), programming only being one of them. I develop OSS, but use OSS too. For example, I am a big user of the Linux kernel, the KDE desktop, Kubuntu, Debian (I have unstable in a chroot), Firefox, Eclipse, Classpath, and many, many others. What these have in common, is that I generally have no time to look into the source code of these projects. A small patch excluded, I am really a regular user of these projects.",
      
      "date_published": "2006-10-25T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a587x-jvp62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/11/are-chemogenomics-and.html",
      "title": "Are chemogenomics and proteochemometrics the same?",
      "content_html": "<p><a href=\"http://www.blogger.com/profile/2366764\">Joerg Wegner</a> <a href=\"http://miningdrugs.blogspot.com/2006/09/chemogenomics-structuring-drug.html\">recently blogged</a>\nabout <em>Chemogenomics: structuring the drug discovery process to gene families</em> by C.J. Harris and A. P. Stevens in Drug Discov Today\n(DOI: <a href=\"https://doi.org/10.1016/j.drudis.2006.08.013\">10.1016/j.drudis.2006.08.013</a>). This review article provides a nice overview of a trend in\nmathematical modelling of the interaction of small organic molecules with proteins, often referred to as <a href=\"http://en.wikipedia.org/wiki/QSAR\">QSAR</a>.\nWhat the article does not discuss, is the <a href=\"http://www.proteochemometrics.org/index.php?option=com_content&amp;task=view&amp;id=20&amp;Itemid=22\">work by the group of Jarl Wikberg</a>\nwho coined the term proteochemometrics (see PubMed: <a href=\"https://pubmed.ncbi.nlm.nih.gov/11342268/\">11342268</a>).</p>",
      "summary": "Joerg Wegner recently blogged about Chemogenomics: structuring the drug discovery process to gene families by C.J. Harris and A. P. Stevens in Drug Discov Today (DOI: 10.1016/j.drudis.2006.08.013). This review article provides a nice overview of a trend in mathematical modelling of the interaction of small organic molecules with proteins, often referred to as QSAR. What the article does not discuss, is the work by the group of Jarl Wikberg who coined the term proteochemometrics (see PubMed: 11342268).",
      
      "date_published": "2006-10-11T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cheminf","bioinfo"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/j.drudis.2006.08.013", "doi": "10.1016/j.drudis.2006.08.013"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1016/s0304-4165(00)00187-2", "doi": "10.1016/s0304-4165(00)00187-2"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1sk32-0jb54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/06/googles-new-search-engine-code-search.html",
      "title": "Google&apos;s new search engine: /* Code Search */",
      "content_html": "<p><a href=\"http://www.google.com/\">Google</a> has set up a new search enginge specifically for source code:\n<a href=\"http://www.google.com/codesearch\">/* Code Search */</a>. Important difference with their normal search engine is that it\nallows restricting your search by programming language, license and filename and package. I have not been able to figure\nout how to use ‘package’ yet, but the others are pretty clear. For example: <code class=\"language-plaintext highlighter-rouge\">AtomContainer license:LGPL lang:java</code>\nshould do it. The search results show filenames, licenses and programming languages:</p>\n\n<p><img src=\"/assets/images/google_code.png\" alt=\"\" /></p>\n\n<p>Alternatively, you can use <a href=\"http://www.koders.com/\">Koders</a>, which is a source code search engine too. It has been around\nfor quite some time now, and shows the copyright notice too. Additionally, Koders offers a\n<a href=\"http://www.koders.com/info.aspx?c=tools\">plugin for Eclipse</a> which adds a search ‘view’ which will show the HTML from the\nwebsite in an editor window inside Eclipse.</p>",
      "summary": "Google has set up a new search enginge specifically for source code: /* Code Search */. Important difference with their normal search engine is that it allows restricting your search by programming language, license and filename and package. I have not been able to figure out how to use ‘package’ yet, but the others are pretty clear. For example: AtomContainer license:LGPL lang:java should do it. The search results show filenames, licenses and programming languages:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/google_code.png",
      "date_published": "2006-10-06T00:00:00+00:00",
      "date_modified": "2006-10-06T00:00:00+00:00",
      "tags": ["google","opensource"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5wjg9-2e512",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/04/bioinformatics-open-source-or-open.html",
      "title": "Bioinformatics: Open Source or Open Access??",
      "content_html": "<p>I have heard that bioinformatics is ahead of chemoinformatics. However, I discoverd that this is not necessarily the case,\nwhile preparing for a homology modeling course I gave this week at the <a href=\"http://www.cubic.uni-koeln.de/\">CUBIC</a>. Open Access\nis really no issue there, with open access journals and many open access databases. But it is different when it comes down\nto open source software.</p>\n\n<p>Below is a list of bioinformatics programs which are free for academic use, but not open:</p>\n\n<ul>\n  <li><a href=\"http://www-cryst.bioc.cam.ac.uk/~joy/\">JOY</a> (free after getting license)</li>\n  <li><a href=\"http://www.cryst.chem.uu.nl/platon/\">PLATON</a> (free download)</li>\n  <li><a href=\"http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html\">PROCHECK</a> (free after getting license)</li>\n  <li><a href=\"http://www.predictprotein.org/\">ProteinPredict</a> (free download)</li>\n  <li><a href=\"http://dunbrack.fccc.edu/SCWRL3.php\">SCWRL</a> (free after getting license)</li>\n  <li><a href=\"http://bioinf.cs.ucl.ac.uk/threader/\">THREADER</a> (free after getting license)</li>\n  <li><a href=\"http://swift.cmbi.ru.nl/gv/whatcheck/\">WHAT_CHECK</a> (free download)</li>\n  <li><a href=\"http://swift.cmbi.ru.nl/whatif/\">WHAT_IF</a> (free after getting license)</li>\n</ul>\n\n<p>And this not even includes the many websites which do not offer the software behind them. And these programs cover several\nsteps in the whole homology modeling process. 
Open source homology modeling is not possible at this moment :(</p>\n\n<p>But, on the bright side, there are already some open source programs involved too:</p>\n\n<ul>\n  <li><a href=\"http://www.ncbi.nlm.nih.gov/blast/\">BLAST</a> (public domain)</li>\n  <li><a href=\"http://www.gromacs.org/\">GROMACS</a> (GPL)</li>\n</ul>\n\n<p>And protein structure viewers is hardly a problem at all; several open source viewers are available, among which\n<a href=\"http://pymol.sourceforge.net/\">Rasmol</a>, <a href=\"http://pymol.sourceforge.net/\">PyMOL</a> and\n<a href=\"http://www.jmol.org/\">Jmol</a>.</p>\n\n<p>In other words: we might not want to look at bioinformatics too much.</p>",
      "summary": "I have heard that bioinformatics is ahead of chemoinformatics. However, I discoverd that this is not necessarily the case, while preparing for a homology modeling course I gave this week at the CUBIC. Open Access is really no issue there, with open access journals and many open access databases. But it is different when it comes down to open source software.",
      
      "date_published": "2006-10-04T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["opensource","bioinfo"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c1cqc-vvg92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/28/complife06-day-1.html",
      "title": "CompLife&apos;06 - Day 1",
      "content_html": "<p><a href=\"http://www.inf.uni-konstanz.de/complife06/\">CompLife’06</a> started today in Cambridge, UK. About 80 people are attending the meeting,\nand topics range from systems biology to QSAR. This evening there was a free software session mostly focussing on opensource software.\nTwelve projects were presented, among which the <a href=\"http://cdk.sf.net/\">CDK</a> (by me) and <a href=\"http://www.bioclipse.net/\">Bioclipse</a> (by Ola),\nin five minute presentations, and a two hour demo period during a reception (free speech and free beer :). We had our brand new fliers\nwith us, as well as a large poster for some additional branding.</p>\n\n<p>One research presentation compared a number of fingerprint implementations in a QSAR study, and CDK came out very well, beating a few\ncommercial programs. The free software session was full of CDK, however, with <a href=\"http://ambit.acad.bg/\">AMBIT</a>,\n<a href=\"http://openbabel.sourceforge.net/wiki/IBabel\">iBabel</a>, Bioclipse and <a href=\"http://knime.org/\">KNIME</a> mentioning the CDK.</p>\n\n<p>The latter is really interesting: it’s a workflow program just like <a href=\"http://taverna.sourceforge.net/\">Taverna</a> or\n<a href=\"http://www.scitegic.com/products/overview/index.html\">PipeLine Pilot</a>, which is using the Eclipse RCP as starting point, just like\nBioclipse. And like the other two, KNIME has CDK integration, at least for displaying structures.</p>",
      "summary": "CompLife’06 started today in Cambridge, UK. About 80 people are attending the meeting, and topics range from systems biology to QSAR. This evening there was a free software session mostly focussing on opensource software. Twelve projects were presented, among which the CDK (by me) and Bioclipse (by Ola), in five minute presentations, and a two hour demo period during a reception (free speech and free beer :). We had our brand new fliers with us, as well as a large poster for some additional branding.",
      
      "date_published": "2006-09-28T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cdk","bioclipse","knime"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/paxbm-rac78",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/24/cdk-bug-squash-party-day-5.html",
      "title": "CDK Bug Squash Party - Day 5",
      "content_html": "<p>Day 5 was formally the last day (see also the summaries of <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html\">day 1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/20/cdk-bug-squash-party-day-2.html\">day 2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/22/cdk-bug-squash-party-day-3-and-4.html\">day 3/4 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>) of the\n<a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">Bug Squash Party</a> (BSP).\nMiguel uploaded the last bits of his CDK <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/protein/data/PDBPolymer.html\">PDBPolymer</a>\nto CML to CDK PDBPolymer roundtripping functionality (closing a bug and a feature request in one go). Have not tested this first hand yet,\nbut looking forward to playing with this bit of code. Kai continued to work on the more difficult bits of the\n<a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=refactoringkernelclasses\">code refactoring</a>, resulting in fewer though more\ncomprehensive commits. Stefan fixed another bug in JChemPaint; the rendering of implicit hydrogens.</p>\n\n<p>About the last, the <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/renderer/Renderer2D.html\">Renderer2D</a>\nneeds a serious overhaul. That is, a complete rewrite in proper Java2D, which can use affine transformations for zooming, scaling and fixing the\ncoordinate system. The current code is ancient and predates Java2D. <a href=\"https://doi.org/10.59350/qxc8d-c1w35\">Rich’ code</a>\nmight be a good starting point. 
I would love to do this rewrite, but lack the resources… anyone in need of some open source fame?</p>\n\n<p>I worked on atom typing, which is as yet largely untested, and often integrated with other bits of code. Yesterday I uploaded\n<a href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/atomtype/\">some first patches</a> which I wrote on the train ride\nback to the Netherlands.</p>\n\n<p>Now, what can be concluded from this BSP? The participant count was below what I had hoped for, but those who participated worked hard (and\nwith pleasure I hope :) The total number of JUnit tests has increased:</p>\n\n<p><img src=\"/assets/images/junit_tests.png\" alt=\"\" /></p>\n\n<p>And so has the number of failing tests:</p>\n\n<p><img src=\"/assets/images/fails_tests.png\" alt=\"\" /></p>\n\n<p>These plots were made with <a href=\"http://www.r-project.org/\">R</a> from data created with two custom scripts both found in\n<a href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/tools/\">cdk/tools</a>: makeBugCountPlot.pl and extractBugCountPlotData.bsh.\nNote that <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/junitsummary.html\">96.86% of the tests do not fail</a>!</p>\n\n<p>The bump in failing tests seems to be due to <a href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/smiles/SmilesParser.java?r1=7009&amp;r2=7011\">commit 7010-7011</a>,\nwhich has to do with SMILES parsing. Yes, the bond order resolving is still not solved. I don’t seem to get Todd’s patch for this working,\nbut not giving up either. The bump is so large, because quite a few JUnit tests use the SmilesParser as a quick tool to get a configured\nconnection table. 
However, these tests should be replaced by explicit CDK models, which is easily done with the\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/io/CDKSourceCodeWriter.html\">CDKSourceCodeWriter</a>.\nI’ll blog about how to use that soon.</p>",
      "summary": "Day 5 was formally the last day (see also the summaries of day 1 , day 2 and day 3/4 ) of the Chemistry Development Kit Bug Squash Party (BSP). Miguel uploaded the last bits of his CDK PDBPolymer to CML to CDK PDBPolymer roundtripping functionality (closing a bug and a feature request in one go). Have not tested this first hand yet, but looking forward to playing with this bit of code. Kai continued to work on the more difficult bits of the code refactoring, resulting in fewer though more comprehensive commits. Stefan fixed another bug in JChemPaint; the rendering of implicit hydrogens.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/junit_tests.png",
      "date_published": "2006-09-24T00:00:00+00:00",
      "date_modified": "2025-02-22T00:00:00+00:00",
      "tags": ["cdk","bsp","junit","conference"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.59350/qxc8d-c1w35", "doi": "10.59350/qxc8d-c1w35"
            , "cito":
              
              
                [ 
                  "citesAsPotentialSolution"
                  
                 ]
              
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zwkym-aty79",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/22/cdk-bug-squash-party-day-3-and-4.html",
      "title": "CDK Bug Squash Party - Day 3 and 4",
      "content_html": "<p>Because I was struggling hard with <a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=30594266&amp;forum_id=2178\">default values for cdk.interfaces fields</a>,\nI did not have time to write up the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">Bug Squash Party</a> report for day 3 (see also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html\">day 1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/20/cdk-bug-squash-party-day-2.html\">day 2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nBut here it is.</p>\n\n<h1 id=\"day-3\">Day 3</h1>\n\n<p>Kai worked hard on getting the <code class=\"language-plaintext highlighter-rouge\">cdk.interfaces</code> API cleaned up, as <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=refactoringkernelclasses\">agreed upon earlier</a>.\nChristian added a test for the <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/geometry/GeometryTools.html\">RMSD calculator</a>\n(see <code class=\"language-plaintext highlighter-rouge\">getAllAtomRMSD()</code>), and cleaned up his code a bit. Stefan continued his bug-squashing on JChemPaint and fixed another one or two bugs.</p>\n\n<p>Rajarshi uploaded a patch to set undefined atomic properties, like partial and formal charges and the implicit hydrogen count, to <code class=\"language-plaintext highlighter-rouge\">UNSET</code> by default.\nHowever, this broke the CDK at many places, as apparently many class methods assume the default to be zero. 
After discussing the issue at the CUBIC,\nit turned out that this was sort of the intended, though undocumented, behavior: use the <a href=\"http://java.sun.com/docs/books/tutorial/java/nutsandbolts/datatypes.html\">default Java values</a>.</p>\n\n<p>And I added missing <code class=\"language-plaintext highlighter-rouge\">clone()</code> methods, closing one bug on SourceForge, added files for Eclipse to know how to build the CDK with Ant (thanks\nto Nico for similar files for <a href=\"http://www.jmol.org/\">Jmol</a>), and got CDK compiled again against <a href=\"http://www.classpath.org/\">Classpath</a>.</p>\n\n<h1 id=\"day-4\">Day 4</h1>\n\n<p>Miguel uploaded his first patches to support saving <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/protein/data/PDBPolymer.html\">PDBPolymer</a>\ndata structures into and restoring them again from CML, addressing an <a href=\"https://sourceforge.net/tracker/index.php?func=detail&amp;aid=1085912&amp;group_id=20024&amp;atid=120024\">almost two-year-old bug</a>.\nHe created new cdk.interfaces for them, to address module dependencies, but a large set of JUnit tests is <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/test/result-data.html\">still missing</a>.</p>\n\n<p>Kai continued his cdk.interfaces refactoring, working on the more involved changes. Stefan, Tobias, and I worked on a poster and three three-fold\nflyers for our CDK booth at <a href=\"http://www.inf.uni-konstanz.de/complife06/\">CompLife2006</a>, so we have not been very productive in bug squashing.\nBut we are happy with the result. 
Below is a screenshot of one side of the main CDK flyer:</p>\n\n<p><img src=\"/assets/images/flyerScreeny.png\" alt=\"\" /></p>\n\n<p>With <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/junitsummary.html\">77 failing JUnit tests</a>, and still a too large number of\n<a href=\"http://sourceforge.net/tracker/?atid=120024&amp;group_id=20024&amp;func=browse\">open bugs on SourceForge</a>, there are plenty of things to do today.</p>",
      "summary": "Because I was struggling hard with default values for cdk.interfaces fields, I did not have time to write up the Bug Squash Party report for day 3 (see also day 1 and day 2 ). But here it is.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/flyerScreeny.png",
      "date_published": "2006-09-22T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","bsp","java","pdb","conference"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wzs4m-9ky43",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/20/cdk-bug-squash-party-day-2.html",
      "title": "CDK Bug Squash Party - Day 2",
      "content_html": "<p>Like <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html\">yesterday <i class=\"fa-solid fa-recycle fa-xs\"></i></a> I will give short overview of things done at the\n<a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">Bug Squash Party</a> (BSP).\nI think Stefan was the only to fix and close a bug report yesterday. Rajarshi added the\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/qsar/descriptors/molecular/MDEDescriptor.html\">MDE descriptor</a>\n(yes, during a BSP new code might be commited too ;)</p>\n\n<p>More interestingly, discussion on the <a href=\"http://sourceforge.net/mailarchive/forum.php?forum_id=2178\">developers mailing list</a> on the\npatch by Todd Martin of the <a href=\"http://www.epa.gov/\">EPA</a> to address deducing bond orders in\nSMILES parsing (the major source of current open bugs!). A problem seems to be when his tool should be called in the SmilesParser class.</p>\n\n<p>More details on the proceedings can be found on the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">BSP wiki page</a>.</p>",
      "summary": "Like yesterday I will give short overview of things done at the Chemistry Development Kit Bug Squash Party (BSP). I think Stefan was the only to fix and close a bug report yesterday. Rajarshi added the MDE descriptor (yes, during a BSP new code might be commited too ;)",
      
      "date_published": "2006-09-20T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","bsp","conference"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y5srm-hzx87",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html",
      "title": "CDK Bug Squash Party - Day 1",
      "content_html": "<p>I plan to do a daily coverage of the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">Bug Squash Party</a>\n(BSP). While Stefan was working hard to get the <a href=\"http://wiki.cubic.uni-koeln.de/\">wiki machine</a> back online after a hard-disc crash, Rajarshi,\nMiguel and me have been working hard. Miguel started to work on missing JUnit tests for <a href=\"http://sourceforge.net/tracker/?group_id=20024&amp;atid=120024\">bugs reported on SourceForge</a>\nand Rajarshi <a href=\"http://cia.navi.cx/stats/author/rajarshi\">fixed PMD, JavaDoc and other problems</a>. I wrote 19 new JUnit tests and fixed two bugs,\nbut with 44 bugs still open at SourceForge, there is quite some work to do. Luckily, several others will join in later this week.</p>\n\n<p>As can be read on the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">BSP wiki page</a>, there is work for everyone, on every level,\nand even for non-programmers. Or just stop by on <a href=\"irc://irc.freenode.net/#jmol\">CDK’s IRC channel</a> (link works with Konqueror,\nmaybe other browsers too) to see what a BSP looks like from the inside.</p>",
      "summary": "I plan to do a daily coverage of the Chemistry Development Kit Bug Squash Party (BSP). While Stefan was working hard to get the wiki machine back online after a hard-disc crash, Rajarshi, Miguel and me have been working hard. Miguel started to work on missing JUnit tests for bugs reported on SourceForge and Rajarshi fixed PMD, JavaDoc and other problems. I wrote 19 new JUnit tests and fixed two bugs, but with 44 bugs still open at SourceForge, there is quite some work to do. Luckily, several others will join in later this week.",
      
      "date_published": "2006-09-18T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","bsp","conference"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zagc3-qnj56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/15/chemoblogs-1.html",
      "title": "Chemo::Blogs #1",
      "content_html": "<p>There are a number of links I wanted to blog about, but never really had time for yet. Here’s a short review of a them.\n<a href=\"http://bioblogs.wordpress.com/\">Bio::Blogs</a> is a series of summary/review articles of bio related blogs, and definately\nworth putting in your aggregator. Maybe someone is interested in setting up a Chemo::Blogs for\n<a href=\"http://blueobelisk.org/pg/all_blogs.php\">chemistry blogs</a>?</p>\n\n<p>My <a href=\"http://del.icio.us/\">del.icio.us</a> (social bookmarking) <a href=\"http://del.icio.us/network/egonw\">network</a> informed me about\n<a href=\"http://www.w3.org/Talks/Tools/Slidy/\">HTML Slidy</a>, an XHTML based PowerPoint replacement. Being true XHTML, it allows\nembedding <a href=\"http://www.jmol.org/\">Jmol</a>, <a href=\"http://jchempaint.sf.net/\">JChemPaint</a> and any other applet. Embed your pieces\nof CML, MathML and SVG (or any other <a href=\"http://en.wikipedia.org/wiki/XML_namespace\">namespace</a>) and you no longer\n<a href=\"https://blogs.ch.cam.ac.uk/pmr/2006/09/10/hamburgers-and-cows-the-cognitive-style-of-pdf/\">have data loss <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p><a href=\"http://nar.oxfordjournals.org/\">Nucleic Acids Research</a> recently had a special issue on webservers\n(DOI:<a href=\"http://dx.doi.org/10.1093/nar/gkl385\">10.1093/nar/gkl385</a>), in which <a href=\"https://incubator.apache.org/projects/taverna.html\">Taverna <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwas featured (DOI:<a href=\"https://doi.org/10.1093/nar/gkl320\">10.1093/nar/gkl320</a>). Just want to mention once more that Taverna has\na chemoinformatics module: <a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=166755\">CDK-Taverna</a>.</p>\n\n<p>Day and Motherwell published the paper <em>An Experiment in Crystal Structure Prediction by Popular Vote</em>\n(DOI:<a href=\"https://doi.org/10.1021/cg060313r\">10.1021/cg060313r</a>). 
It links to an <a href=\"http://pubs.acs.org/isubscribe/journals/cgdefu/asap/objects/cg060313r/CSP_popular_vote.html\">open access website</a>\nto participate yourself. This is one way in which one can have tighter integration of the internet with old-fashioned publishing.</p>\n\n<p>And some minor notes: a video tutorial was put online in <a href=\"http://phobos.xtec.net/fmas/modules.php?name=News&amp;file=article&amp;sid=27\">this blog</a>\nthat shows how Jmol is inserted on a Moodle page. And, as <a href=\"http://plindenbaum.blogspot.com/2006/08/life-sciences-semantic-web-is-full-of.html\">Pierre reminded me</a>,\n<em>The Life Sciences Semantic Web is Full of Creeps!</em> (DOI:<a href=\"https://doi.org/10.1093/bib/bbl025\">10.1093/bib/bbl025</a>),\nwhich puts me in an identity crisis: hacker, chemist or creep. Mmmm…</p>",
      "summary": "There are a number of links I wanted to blog about, but never really had time for yet. Here’s a short review of a them. Bio::Blogs is a series of summary/review articles of bio related blogs, and definately worth putting in your aggregator. Maybe someone is interested in setting up a Chemo::Blogs for chemistry blogs?",
      
      "date_published": "2006-09-15T00:00:00+00:00",
      "date_modified": "2025-04-20T00:00:00+00:00",
      "tags": ["taverna","cdk"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKL385", "doi": "10.1093/NAR/GKL385"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/NAR/GKL320", "doi": "10.1093/NAR/GKL320"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/BIB/BBL025", "doi": "10.1093/BIB/BBL025"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CG060313R", "doi": "10.1021/CG060313R"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b3zbn-9w223",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/14/complex-pdb-documents-using-bioclipse.html",
      "title": "Complex PDB documents using the Bioclipse ChildResourceCreator",
      "content_html": "<p>Some time ago I blogged about the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/22/bioclipse-gets-new-extension-point.html\">ChildResourceCreator extension point in Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand hinted as using that for <a href=\"http://www.rcsb.org/pdb/\">PDB files</a>. which contain 3D molecular models, sequences and bibliographic information. Using the new extension point,\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> now treats PDB files as complex documents, creating child resources for the 3D molecular model (using the\n<a href=\"http://cdk.sf.net/\">CDK</a> plugin), and a sequence resource (using the <a href=\"http://www.biojava.org/\">BioJava</a> plugin).</p>\n\n<p><img src=\"/assets/images/bioclipseBioJavaSupport.png\" alt=\"\" /></p>",
      "summary": "Some time ago I blogged about the ChildResourceCreator extension point in Bioclipse and hinted as using that for PDB files. which contain 3D molecular models, sequences and bibliographic information. Using the new extension point, Bioclipse now treats PDB files as complex documents, creating child resources for the 3D molecular model (using the CDK plugin), and a sequence resource (using the BioJava plugin).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseBioJavaSupport.png",
      "date_published": "2006-09-14T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["bioclipse","biojava","cdk","pdb","jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gh2mq-4qc75",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/13/jmol-and-cdk-add-powerful-chemical.html",
      "title": "&quot;Jmol and the CDK add powerful chemical capabilities&quot;, says Munos in Nature Reviews Drug Discovery",
      "content_html": "<p><a href=\"http://www.nature.com/nrd/journal/vaop/ncurrent/authors/nrd2131.html\">Bernard Munos</a> at <a href=\"http://www.lilly.com/\">Eli Lilly &amp; Co.</a>\nwrote up a lengthy analysis on open source in drug discovery in <a href=\"http://www.nature.com/nrd/index.html\">Nature Reviews Drug Discovery</a>:\nCan open-source R&amp;D reinvigorate drug research? (DOI:<a href=\"https://doi.org/10.1038/nrd2131\">10.1038/nrd2131</a>). When scanning the article\nI saw this quote:</p>\n\n<p><em>Other tools such as eMolecules, Jmol or the Chemistry Development Kit are adding powerful chemical search and visualization\ncapabilities to the open-source scientist’s toolbox.</em></p>\n\n<p>Unfortunately, the paper does not point to the correct <a href=\"http://cdk.sf.net/\">CDK website</a>, but to the CUBIC backend at\n<a href=\"http://almost.cubic.uni-koeln.de/cdk\">http://almost.cubic.uni-koeln.de/cdk</a>. Moreover, I don’t think the quote does full justice to\nwhat the CDK has achieved in the past six years; I’m sure we have achieved more than a fingerprinter and some 2D and 3D rendering!</p>",
      "summary": "Bernard Munos at Eli Lilly &amp; Co. wrote up a lengthy analysis on open source in drug discovery in Nature Reviews Drug Discovery: Can open-source R&amp;D reinvigorate drug research? (DOI:10.1038/nrd2131). When scanning the article I saw this quote:",
      
      "date_published": "2006-09-13T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["drugdiscovery","jmol","cdk","opensource"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/nrd2131", "doi": "10.1038/nrd2131"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/am2k8-ygc58",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html",
      "title": "Chemical Archeology: OSCAR3 to NMRShiftDB.org",
      "content_html": "<p>Chemical Archeology (see <a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=7#body\">Christoph’s comment</a>) is the\nprocess of extracting chemical information from old journal articles. Some time ago,\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/corbett/\">Peter Corbett</a> from the group of <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter Murray-Rust</a>\nvisited the <a href=\"http://almost.cubic.uni-koeln.de/jrg/\">CUBIC</a> to talk to us about\n<a href=\"http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3\">Oscar3</a> which can do just that. That day, we already\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/22/text-mining-for-chemistry-using-oscar3.html\">hooked OPSIN into Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Oscar3, however, is capable of more than the name2structure of OPSIN (see also\n<a href=\"httpa://doi.org/10.1039/b411033a\">10.1039/b411033a</a>; it can take a plain text file with an experimental section\nwith details on the synthesis of small organic compounds, and analyze the chemistry in that. This functionality has been\navailable as <a href=\"http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/index.asp\">an RSC authoring tool</a>\nfor some time now (see also <a href=\"https://doi.org/10.1039/b411699m\">10.1039/b411699m</a>). Unfortunately, what publisher put\nonline (PDF and HTML) is much more difficult to process with Oscar3: those formats are often optimized for display,\nnot for machine processing. 
The HTML can be cleaned up, but there is no general approach.</p>\n\n<p><a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph Steinbeck</a> is going to present at the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings%5Csanfrancisco2006%5Chome.html\">upcoming ACS meeting</a>\nthe use of Oscar3 for extraction of NMR spectra from old journal articles, in preparation for submission to the\n<a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB.org</a> (see the <a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=4#body\">abstract</a>\nof <a href=\"http://oasys2.confex.com/acs/232nm/techprogram/P981204.HTM\">CINF 101</a>).</p>\n\n<p>Since the full Oscar3 was not hooked into <a href=\"http://www.bioclipse.net/\">Bioclipse</a> yet, I had some work to do. It took me\nsome time to figure out how to properly configure Oscar3, and what additional things I had to do to clean up the HTML\nused by publishers to get Oscar3 to extract NMR spectra (thanks to PeterC for hints!). I also had to tweak the Oscar3\ncode itself here and there, but that’s what opensource is about :) (Peter, if you are reading this: I have a number\nof patches for the Oscar3 code in <a href=\"http://svn.sourceforge.net/viewvc/bioclipse/trunk/bc_oscar/\">bc_oscar</a>;\nlet me know if you’re interested in them.)</p>\n\n<p>This is the end result:</p>\n\n<p><img src=\"/assets/images/oscar1.png\" alt=\"\" /></p>\n\n<p>Note especially the hierarchy in the resource navigator on the left. The misc folder contains all the chemistry found in the article. More importantly, for six molecules it fully detected the experimental section! 
For 3-(2-Oxocyclooctanyl)-3-phenylpropan-1-al (InChI=1/C17H22O2/c18-13-12-15(14-8-4-3-5-9-14)16-10-6-1-2-7-11-17(16)19/h3-5,8-9,13,15-16H,1-2,6-7,10-12H2) it derived the molecular structure (with OPSIN), and a few spectra: H-NMR, high-resolution MS and IR.</p>\n\n<p>So, if you attend the ACS meeting: make sure to visit Christoph’s CINF 101 presentation!</p>",
      "summary": "Chemical Archeology (see Christoph’s comment) is the process of extracting chemical information from old journal articles. Some time ago, Peter Corbett from the group of Peter Murray-Rust visited the CUBIC to talk to us about Oscar3 which can do just that. That day, we already hooked OPSIN into Bioclipse .",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/oscar1.png",
      "date_published": "2006-09-08T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["oscar","bioclipse","acs","chemistry","textmining","nmrshiftdb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1039/b411033a", "doi": "10.1039/b411033a"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1039/b411699m", "doi": "10.1039/b411699m"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b3ggs-7vt20",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/08/biojava-15-beta-released.html",
      "title": "BioJava 1.5 beta released",
      "content_html": "<p><a href=\"http://www.bioservices.net/2006/09/biojava-15-beta-released.html\">Martin Szugat reported</a> that a beta for <a href=\"http://biojava.org/wiki/BioJava:Download\">BioJava 1.5</a>\nhas been released. New features include: a new <a href=\"http://www.biojava.org/docs/api15b/index.html\">biojavax</a> package with extension on the basic functionlity, such as\nthe <code class=\"language-plaintext highlighter-rouge\">RichSequence.IOTools</code> and the <code class=\"language-plaintext highlighter-rouge\">RichSequence</code> object; a <a href=\"http://biojava.org/wiki/BioJava:BioJavaXDocs#Genetic_Algorithms\">genetic algorithm library</a>; features\nthat allow manipulation of 3D structure files and objects; and non-HMM implementations of the NW and SW alignment algorithms. The announcement also mentions a new\npackage for handling external processes (org.biojava.utils.process); I am wondering what that is about. I will upload this beta to Bioclipse\n<a href=\"http://svn.sourceforge.net/viewvc/bioclipse/trunk/bc_biojava/\">trunk/bc_biojava/</a> shortly, so that we can play with it.</p>",
      "summary": "Martin Szugat reported that a beta for BioJava 1.5 has been released. New features include: a new biojavax package with extension on the basic functionlity, such as the RichSequence.IOTools and the RichSequence object; a genetic algorithm library; features that allow manipulation of 3D structure files and objects; and non-HMM implementations of the NW and SW alignment algorithms. The announcement also mentions a new package for handling external processes (org.biojava.utils.process); I am wondering what that is about. I will upload this beta to Bioclipse trunk/bc_biojava/ shortly, so that we can play with it.",
      
      "date_published": "2006-09-08T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["biology","java","biojava"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sy6r5-pzw09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/02/calculating-geometrical-properties.html",
      "title": "Calculating geometrical properties with the CDK",
      "content_html": "<p><a href=\"http://cheminformatics.seesaa.net/\">ケムインフォマティクスに虚空投げ</a> runs <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/02/calculating-geometrical-properties.html\">a story on how to calculate geometrical\nproperties of a 3D structure <i class=\"fa-solid fa-recycle fa-xs\"></i></a> using\nCDK’s <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/modeling/forcefield/ForceFieldTools.html\">ForceFieldTools</a>.\nThis class contains a few methods to calculate distances between atoms and angles between bonds.</p>\n\n<p>This tools class is special as it uses vecmath GVector objects, which just contain atomic coordinates, likely suitable\nfor extensive computation, as expected in <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/modeling/forcefield/package-frame.html\">CDK’s force field implementation</a>.\nHowever, for just calculating the distance and angles, there are simpler alternatives.</p>\n\n<p>The distance between two atoms can be calculated with:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">atom1</span> <span class=\"o\">=</span> <span class=\"n\">molecule</span><span class=\"o\">.</span><span class=\"na\">getAtom</span><span class=\"o\">(</span><span class=\"mi\">0</span><span class=\"o\">);</span>\n<span class=\"n\">atom2</span> <span class=\"o\">=</span> <span class=\"n\">molecule</span><span class=\"o\">.</span><span class=\"na\">getAtom</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n<span class=\"kt\">double</span> <span class=\"n\">dist</span> <span class=\"o\">=</span> <span class=\"n\">atom1</span><span class=\"o\">.</span><span class=\"na\">getPoint3d</span><span class=\"o\">().</span><span class=\"na\">distance</span><span class=\"o\">(</span><span class=\"n\">atom2</span><span class=\"o\">.</span><span class=\"na\">getPoint3d</span><span 
class=\"o\">());</span>\n</code></pre></div></div>\n\n<p>or, by constructing a vector for the bond first:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Vector3d</span> <span class=\"n\">bond1to2</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Vector3d</span><span class=\"o\">(</span><span class=\"n\">atom2</span><span class=\"o\">.</span><span class=\"na\">getPoint3d</span><span class=\"o\">());</span>\n<span class=\"n\">bond1to2</span><span class=\"o\">.</span><span class=\"na\">sub</span><span class=\"o\">(</span><span class=\"n\">atom1</span><span class=\"o\">.</span><span class=\"na\">getPoint3d</span><span class=\"o\">());</span>\n<span class=\"kt\">double</span> <span class=\"n\">dist</span> <span class=\"o\">=</span> <span class=\"n\">bond1to2</span><span class=\"o\">.</span><span class=\"na\">length</span><span class=\"o\">();</span>\n</code></pre></div></div>\n\n<p>Using vectors to represent bonds (each with two atoms!) allows easily calculating angles too (assuming the bonds share atom1):</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">double</span> <span class=\"n\">angle</span> <span class=\"o\">=</span> <span class=\"n\">bond1to2</span><span class=\"o\">.</span><span class=\"na\">angle</span><span class=\"o\">(</span><span class=\"n\">bond1to3</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Vecmath does not seem to contain a convenience method for calculating torsion angles :(</p>",
      "summary": "ケムインフォマティクスに虚空投げ runs a story on how to calculate geometrical properties of a 3D structure using CDK’s ForceFieldTools. This class contains a few methods to calculate distances between atoms and angles between bonds.",
      
      "date_published": "2006-09-02T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e8r2m-ja015",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/25/r-news-special-issue-on-chemistry.html",
      "title": "R News special issue on chemistry",
      "content_html": "<p><a href=\"http://cran.r-project.org/doc/Rnews/\">R News</a> just released a <a href=\"http://cran.r-project.org/doc/Rnews/Rnews_2006-3.pdf\">special issue</a> on\nthe use of the versatile statistics program <a href=\"http://www.r-project.org/\">R</a> in chemistry. It features six articles amongst which one by\nRajarshi Guha on the <a href=\"http://cdk.sf.net/\">CDK</a>-R bridge, and one by my supervisor and me on the use of self-organizing maps to\ncluster crystal structures.</p>",
      "summary": "R News just released a special issue on the use of the versatile statistics program R in chemistry. It features six articles amongst which one by Rajarshi Guha on the CDK-R bridge, and one by my supervisor and me on the use of self-organizing maps to cluster crystal structures.",
      
      "date_published": "2006-08-25T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["rstats","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xge7p-17184",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html",
      "title": "Chemical blogspace",
      "content_html": "<p>We all know <a href=\"http://en.wikipedia.org/wiki/Chemical_space\">chemical space</a>; <a href=\"http://wiki.cubic.uni-koeln.de/pg/\">Chemical blogspace</a> (Cb) is different:\nit is the chemistry discussed in <a href=\"http://en.wikipedia.org/wiki/Blogspace\">blogspace</a>. Cb is build on the\n<a href=\"http://postgenomic.org/\">opensource software</a> of <a href=\"http://postgenomic.com/\">Postgenomic.com</a> which I bloged on\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/15/hot-articles-mining-semantic-web.html\">before <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. The now running Cb aggregates\n<a href=\"http://wiki.cubic.uni-koeln.de/pg/all_blogs.php\">19 blogs</a> and, like the original, extracts linked (cited or reviewed) articles from literature.</p>\n\n<p><img src=\"/assets/images/chemblogspace.png\" alt=\"\" /></p>\n\n<p>The system is beta, but I am happy about it already that I mention it now. For example, some article titles are not properly recognized,\nand some journals are known in the statistics in several formats. And, more importantly, I have not yet hooked in the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">InChI <i class=\"fa-solid fa-recycle fa-xs\"></i></a> support I developed earlier.</p>\n\n<p>So, if you like the idea, or know other interesting scientifically interesting chemistry blogs, leave a comment, or send me email.</p>",
      "summary": "We all know chemical space; Chemical blogspace (Cb) is different: it is the chemistry discussed in blogspace. Cb is build on the opensource software of Postgenomic.com which I bloged on before . The now running Cb aggregates 19 blogs and, like the original, extracts linked (cited or reviewed) articles from literature.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemblogspace.png",
      "date_published": "2006-08-25T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cb","feeds","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/x2d47-s7776",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/22/bioclipse-gets-new-extension-point.html",
      "title": "Bioclipse gets a new extension point",
      "content_html": "<p>I hacked in a new extension point for <a href=\"http://www.bioclipse.net/\">Bioclipse</a> yesterday, based on a <a href=\"http://wiki.bioclipse.net/index.php?title=ChildCreator_extension_point\">proposal</a>\nI made earlier. The new extension point (EP) is called <code class=\"language-plaintext highlighter-rouge\">ChildResourceCreator</code> and allows creating child resources for a given IBioResource. One application where this is very useful is the\n<a href=\"http://dx.doi.org/10.1021/ci034244p\">CMLRSS application</a> (<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/07/03/avi-movies-of-cmlrss-howto-in.html\">earlier blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), or any\n<a href=\"http://en.wikipedia.org/wiki/RSS_(file_format)\">RSS</a> or <a href=\"http://www.atomenabled.org/\">Atom</a> enriched with any other XML language. Here, child resources are\ncreated for each feed entry resource with as content the foreign XML, e.g. the CML bits in the blog.</p>\n\n<p>Other applications involve complex documents, which is basically most existing documents. Take, for example, the\n<a href=\"http://www.rcsb.org/pdb/static.do?p=file_formats/pdb/index.html\">PDB format</a> from the <a href=\"http://www.rcsb.org/pdb/\">PDB database</a>. These PDB files contain a pletory\nof information including one or more protein structures, sequences and bibliographic information. Bioclipse supports each of those using the\n<a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://biojava.org/\">BioJava</a> and <a href=\"http://jabref.sf.net/\">JabRef</a> libraries.</p>\n\n<p>By making extension for the <code class=\"language-plaintext highlighter-rouge\">ChildResourceCreator</code> EP, I am able to setup a general PDBResource (with Bioclipse’s syntax highlighted PDB editor),\nand child resources for the different bits of information. 
<a href=\"http://sourceforge.net/project/showfiles.php?group_id=150681\">Bioclipse 1.0</a>, however,\nonly allows looking at the molecular structure(s) in the file, not at the sequence, nor the references. Will post the obligatory screenshot asap.</p>",
      "summary": "I hacked in a new extension point for Bioclipse yesterday, based on a proposal I made earlier. The new extension point (EP) is called ChildResourceCreator and allows creating child resources for a given IBioResource. One application where this is very useful is the CMLRSS application (earlier blog ), or any RSS or Atom enriched with any other XML language. Here, child resources are created for each feed entry resource with as content the foreign XML, e.g. the CML bits in the blog.",
      
      "date_published": "2006-08-22T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["bioclipse","feeds","cml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k7h76-cy148",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/21/cml-explained.html",
      "title": "CML Explained",
      "content_html": "<p>Recently, a new generation of <a href=\"http://www.xml-cml.org/\">Chemical Markup Language</a> CML users seem to hit the\nlearning-curve-wall; there seems to be a niche in explaining the use of CML,\n<a href=\"http://cmlexplained.blogspot.com/2006/08/cml-explained.html\">so here goes</a>. My new (third) blog will discuss\nfrequently and less frequently asked questions about the use of CML.</p>",
      "summary": "Recently, a new generation of Chemical Markup Language CML users seem to hit the learning-curve-wall; there seems to be a niche in explaining the use of CML, so here goes. My new (third) blog will discuss frequently and less frequently asked questions about the use of CML.",
      
      "date_published": "2006-08-21T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/269sm-8sn42",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/18/small-java-applet-for-2d-structure.html",
      "title": "Small java applet for 2D structure drawing",
      "content_html": "<p>Trepalin et al. published in <a href=\"http://mdpi.org/molecules/\">Molecules</a> the article <em>A Java Chemical Structure Editor Supporting the Modular Chemical Descriptor\nLanguage (MCDL)</em> (open access <a href=\"http://mdpi.org/molecules/papers/11040219.pdf\">PDF</a>). The applet is about 250kB (though the article mentions 200kB) in size and\ndownloadable from the <a href=\"http://sourceforge.net/projects/mcdl/\">MCDL</a> project on SourceForge (license: Public Domain). The article compares the applet with the\n<a href=\"http://jchempaint.sf.net/\">JChemPaint</a> applet and notes that their applet is much smaller. Both allow a template database for automated structure diagram\ngeneration, and the database that comes with the MCDL applet contains 105 fragments, whereas the JChemPaint applet contains a few.</p>\n\n<p>The article also discusses the algorithm they use to deduce bond orders, starting from the MCDL, a problem <a href=\"http://cdk.sf.net/\">CDK</a> is struggling with when\ndealing with SMILES strings.</p>",
      "summary": "Trepalin et al. published in Molecules the article A Java Chemical Structure Editor Supporting the Modular Chemical Descriptor Language (MCDL) (open access PDF). The applet is about 250kB (though the article mentions 200kB) in size and downloadable from the MCDL project on SourceForge (license: Public Domain). The article compares the applet with the JChemPaint applet and notes that their applet is much smaller. Both allow a template database for automated structure diagram generation, and the database that comes with the MCDL applet contains 105 fragments, whereas the JChemPaint applet contains a few.",
      
      "date_published": "2006-08-18T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["java"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.3390/11040219", "doi": "10.3390/11040219"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z4kfz-xcy58",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/14/classpath-092-has-been-released.html",
      "title": "Classpath 0.92 has been released",
      "content_html": "<p><a href=\"http://gnu.wildebeest.org/diary/index.php?p=163\">Bling! Bling!</a>. Mark Wielaard announced the <a href=\"http://savannah.gnu.org/forum/forum.php?forum_id=4573\">GNU Classpath 0.92</a>\nrelease, with the following changes: <em>an alternative awt peer implementation based on Escher that uses the X protocol directly. Various ImageIO providers for png,\ngif and bmp images. Support for reading and writing midi files and reading .au and .wav files have been added. Various tools and support classes have been added\nfor jar, native2ascii, serialver, keytool, jarsigner. A GConf based util.peers backend has been added. Support for using alternative root certificate authorities\nwith the security and crypto packages. Start of javax.management and runtime lang.managment runtime support. NIO channels now support scatter-gather operations.</em></p>\n\n<p>IMAGE LOST</p>\n\n<p>This means new items on my TODO list: remove the dust from the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/06/test-suite-for-free-open-source-jvms.html\">CDK based test suite\n<i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\ntest if <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/03/11/classpath-090-makes-jmol-application.html\">Jmol <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/20/open-source-swing-jchempaint-runs.html\">JChemPaint <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/18/taverna-runs-with-classpath-091.html\">Taverna <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nstill work, and report the outcome on the <a href=\"http://developer.classpath.org/mediation/FreeSwingTestApps\">Classpath website</a>. I wonder how the Cairo\nand Escher patches for AWT and Swing affect my favorite chemblaics tools.</p>\n\n<p>BTW, that the Classpath team appreciates such testing efforts is clear from the foto in the ‘Bling! 
Bling!’ blog by Mark mentioned above.</p>",
      "summary": "Bling! Bling!. Mark Wielaard announced the GNU Classpath 0.92 release, with the following changes: an alternative awt peer implementation based on Escher that uses the X protocol directly. Various ImageIO providers for png, gif and bmp images. Support for reading and writing midi files and reading .au and .wav files have been added. Various tools and support classes have been added for jar, native2ascii, serialver, keytool, jarsigner. A GConf based util.peers backend has been added. Support for using alternative root certificate authorities with the security and crypto packages. Start of javax.management and runtime lang.managment runtime support. NIO channels now support scatter-gather operations.",
      
      "date_published": "2006-08-14T00:00:00+00:00",
      "date_modified": "2006-08-14T00:00:00+00:00",
      "tags": ["java","cdk","jchempaint","taverna"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m201y-8xs64",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/10/fortran-and-xml-fox-reads-and-writes.html",
      "title": "Fortran and XML: FoX reads and writes CML",
      "content_html": "<p>Mix one of the oldest and one of the latest computer technologies, and you get <a href=\"http://www.uszla.me.uk/software/FoX.html\">FoX</a>\n(BSD license), a <a href=\"http://en.wikipedia.org/wiki/Fortran\">Fortran</a> library for reading and writing <a href=\"http://www.xml-cml.org/\">Chemical Markup Language</a>,\nand thus <a href=\"http://www.w3.org/XML/\">XML</a>. Amazing, what Toby White achieved, though he did not start from scratch:\n<em>“FoX evolved from the initial codebase of <a href=\"http://lcdx00.wm.lc.ehu.es/ag/xml/\">xmlf90</a>, which was written largely by Alberto\nGarcia and Jon Wakelin.”</em> (source: <a href=\"http://sourceforge.net/mailarchive/forum.php?forum=cml-discuss\">cml-discuss mailing list</a>).</p>",
      "summary": "Mix one of the oldest and one of the latest computer technologies, and you get FoX (BSD license), a Fortran library for reading and writing Chemical Markup Language, and thus XML. Amazing, what Toby White achieved, though he did not start from scratch: “FoX evolved from the initial codebase of xmlf90, which was written largely by Alberto Garcia and Jon Wakelin.” (source: cml-discuss mailing list).",
      
      "date_published": "2006-08-10T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m0wwb-ty759",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/06/new-atomelementscarbon.html",
      "title": "new Atom(Elements.CARBON);",
      "content_html": "<p>Something I have not completely comfortable with about the <a href=\"http://cdk.sf.net/\">CDK</a> in the past, is the way\n<a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/Atom.html\">Atom</a>’s are constructed:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  <span class=\"nc\">IAtom</span> <span class=\"n\">carbon</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Not that it is horrible code, but the CDK has an <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/Element.html\">Element</a>\ntoo. Why not reuse that? However, until revision 6755 there were not constructors that allowed something like the following:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  <span class=\"nc\">IAtom</span> <span class=\"n\">carbon</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Element</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">));</span>\n</code></pre></div></div>\n\n<p>This afternoon I have hacked in constructors for <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/ChemObject.html\">ChemObject</a>,\nElement, <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/Isotope.html\">Isotope</a>, <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/AtomType.html\">AtomType</a>,\nAtom and <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/PseudoAtom.html\">PseudoAtom</a> that allow to be constructed from its\ninterface, or the interface of one of its superclasses.</p>\n\n<p>Additionally, in revision 6753, I added <a 
href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/config/Elements.java\">cdk.config.Elements</a> with\nstatic IElements for all elements up to atomic number 116, taken from the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk Data Repository</a>.\nTherefore, I can now also write:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  <span class=\"nc\">IAtom</span> <span class=\"n\">carbon</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n</code></pre></div></div>",
      "summary": "Something I have not completely comfortable with about the CDK in the past, is the way Atom’s are constructed:",
      
      "date_published": "2006-08-06T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k9w2j-xb784",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/03/blueobelisk-components-in-japanese.html",
      "title": "BlueObelisk components in Japanese",
      "content_html": "<p><a href=\"http://technorati.com/\">Technorati</a> is nice in several ways, one being the feature to set up a <a href=\"http://technorati.com/watchlist/\">watchlist</a>.\nI have set watches on <em>chemoinformatics, Jmol, Bioclipse</em> and a few more. This allows me see the latest blog items on these topics. Often,\nthe point to Asian blogs, mostly Chinese and Japanese, which I mostly find hard to read. Funny characters with <em>Jmol</em> somewhere in the sentence :)</p>\n\n<p>Yesterday, I found this way a rather interesting Japanese blog, called <a href=\"http://cheminformatics.seesaa.net/\">ケムインフォマティクスに虚空投げ</a>,\nwhich I still can’t read, but which has a lot of small code fragments. (Can someone please translate the title for me??) The last 10-ish items\ndiscuss fingerprints calculation with the <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://joelib.sf.net/\">JOELib</a>, some SMARTS work with JOELib, and some\ndiscussion on neural network tools.</p>",
      "summary": "Technorati is nice in several ways, one being the feature to set up a watchlist. I have set watches on chemoinformatics, Jmol, Bioclipse and a few more. This allows me see the latest blog items on these topics. Often, the point to Asian blogs, mostly Chinese and Japanese, which I mostly find hard to read. Funny characters with Jmol somewhere in the sentence :)",
      
      "date_published": "2006-08-03T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","joelib","technorati"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6az21-1dt76",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/01/cdk-and-java-6-beta.html",
      "title": "CDK and the Java 6 beta",
      "content_html": "<p>Recently, a second beta of Java 6 was <a href=\"http://java.sun.com/javase/downloads/ea.jsp\">released</a>, which triggered a\n<a href=\"http://lists.alioth.debian.org/pipermail/pkg-java-maintainers/2006-June/008385.html\">patch</a> for the\n<a href=\"http://www.debian.org/\">Debian</a> <a href=\"http://packages.debian.org/java-package\">java-package</a> package. It was a Bioclipse\n<a href=\"http://sourceforge.net/tracker/index.php?func=detail&amp;aid=1532612&amp;group_id=150681&amp;atid=778609\">bug report</a> today,\nhowever, which made me patch my java-package setup and install the beta.</p>\n\n<p>So, next thing was to try to get the <a href=\"http://cdk.sf.net/\">CDK</a> compile with the Java 6 beta. Because our build system uses\nJavaDoc (anyone with a pointer with a easy to use Java parser, which parses JavaDoc too?), and because this setup is\ndifferent for literally every platform and Java version, the <a href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/build.xml?view=log\">build.xml</a>\nneeded some tweaking (patch 6719 and 6721). Additionally, a number of source files were marked as needing Java 1.5, while they actually\ndepend on features introduced in Java 5 (aka 1.5) and which are present in Java 6 (aka 1.6) too, so that needed some tweaking\ntoo (patch 6720).</p>\n\n<p>I have no idea what Java 6 will change and/or introduce, but I did note some comments on it being faster, which is always a good thing.\nThe <a href=\"http://www.junit.org/\">JUnit</a> test timings seems to agree with this. While my Java 1.5.0_06 installation needed 204 seconds\n(no duplicates), Java 1.6.0_beta2 needed only 168 seconds (no duplicates), and improvement of 18%.</p>",
      "summary": "Recently, a second beta of Java 6 was released, which triggered a patch for the Debian java-package package. It was a Bioclipse bug report today, however, which made me patch my java-package setup and install the beta.",
      
      "date_published": "2006-08-01T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","java","debian"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/94e4t-2q855",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/07/13/context-help-in-bioclipse.html",
      "title": "Context help in Bioclipse",
      "content_html": "<p>The <a href=\"http://www.eclipse.org/\">Eclipse</a> <a href=\"http://wiki.eclipse.org/index.php/Rich_Client_Platform\">Rich Client Platform (RCP)</a> is very powerfull,\nand takes a lot of architectural things of your hand when developing a bio- and chemoinformatics GUIs. <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\nis based on it. One thing the RCP offers is a Help View which works with plain (X)HTML files, and one neat feature is the context help. It is\nhelp shown in the Help View when one focused on a specific GUI element.</p>\n\n<p>As an example, the below figure gives the context help for the JmolView in the <a href=\"http://www.jmol.org/\">Jmol</a> plugin\n(<a href=\"http://wiki.bioclipse.net/index.php?title=Jmol_plugin\">bc_jmol</a>) plugin for Bioclipse:</p>\n\n<p><img src=\"/assets/images/contextHelp.png\" alt=\"\" /></p>\n\n<p>On the right side of the Jmol view (showing <a href=\"http://www.pdb.org/pdb/navbarsearch.do?newSearch=yes&amp;isAuthorSearch=no&amp;radioset=All&amp;inputQuickSearch=1SPX\">1SPX</a>)\nis the Help View, showing the context help for the Jmol View pointing to the ‘Jmol Script Commands Reference’.</p>",
      "summary": "The Eclipse Rich Client Platform (RCP) is very powerfull, and takes a lot of architectural things of your hand when developing a bio- and chemoinformatics GUIs. Bioclipse is based on it. One thing the RCP offers is a Help View which works with plain (X)HTML files, and one neat feature is the context help. It is help shown in the Help View when one focused on a specific GUI element.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/contextHelp.png",
      "date_published": "2006-07-13T00:00:00+00:00",
      "date_modified": "2006-07-13T00:00:00+00:00",
      "tags": ["jmol","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6924n-01r62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/07/11/matrix-support-in-bioclipse.html",
      "title": "Matrix support in Bioclipse",
      "content_html": "<p>With <a href=\"http://en.wikipedia.org/wiki/Chemometrics\">chemometrics</a> in mind (QSAR, data mining, …), I have started working on matrix support in\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>, because the matrix is the important step between (bio-)molecular content and statistical analysis.\nI implemented this such that the actual matrix implementation can be freely chosen, that is,\n<a href=\"http://svn.sourceforge.net/viewcvs.cgi/bioclipse/trunk/bc_statistical/\">bc_statistical</a> provides a <code class=\"language-plaintext highlighter-rouge\">IMatrixImplementation</code> extension point.\nThe plugin <a href=\"http://svn.sourceforge.net/viewcvs.cgi/bioclipse/trunk/bc_jama/\">bc_jama</a> provides a <a href=\"http://math.nist.gov/javanumerics/jama/\">JAMA</a>\nbased extension for this, but other implementations are possible, and possibly useful.</p>\n\n<p>The second component provided by the new statistics plugin, is the MatrixResource, a <a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse_object_model\">BioResource</a>\nfor documents (e.g. files on the harddisk) that represent a matrix. However, Bioclipse can create such matrices on the fly too, and these do not necessarily have\nto be stored on disk, as is general for BioResource’s. This makes it possible for other plugins to create matrices from other resources: for example, the\n<a href=\"http://cdk.sf.net/\">CDK</a> plugin can now have an action that converts a SDF file into a QSAR data matrix.</p>\n\n<p>The MatrixResource can be edited using a plain text editor, and a more visually attractive graphical editor based on the\n<a href=\"http://sourceforge.net/projects/ktable\">KTable</a> SWT widget:</p>\n\n<p><img src=\"/assets/images/bioclipseMatrixSupport.png\" alt=\"\" /></p>\n\n<p>The next step is to work on column and row names, and replace those uninformative X’s. 
As you can see in the Properties View, I also need to tweak adding and\nremoving advanced properties a bit. And then it is time to have the CDK plugin create a QSAR data matrix.</p>",
      "summary": "With chemometrics in mind (QSAR, data mining, …), I have started working on matrix support in Bioclipse, because the matrix is the important step between (bio-)molecular content and statistical analysis. I implemented this such that the actual matrix implementation can be freely chosen, that is, bc_statistical provides a IMatrixImplementation extension point. The plugin bc_jama provides a JAMA based extension for this, but other implementations are possible, and possibly useful.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseMatrixSupport.png",
      "date_published": "2006-07-11T00:00:00+00:00",
      "date_modified": "2006-07-11T00:00:00+00:00",
      "tags": ["bioclipse","chemometrics","qsar","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5rq0q-4ht07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/07/03/avi-movies-of-cmlrss-howto-in.html",
      "title": "AVI movies of CMLRSS howto in Bioclipse",
      "content_html": "<p>David Strumfels posted news <a href=\"https://web.archive.org/web/20061011100407/http://usefulchem.blogspot.com/2006/07/cml-in-rss-feeds.html\">about the Useful Chemistry CMLRSS feed <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nHe explains how this feed can be accessed using <a href=\"http://www.jmol.org/\">Jmol</a> and <a href=\"http://www.bioclipse.net/\">Bioclipse</a>. The latter are accompanied by two AVI\nmovies: <a href=\"http://showme.physics.drexel.edu/usefulchem/Software/bioclipse/CreatingAnOPML.avi\">one about creating a new OPML file</a>, and\n<a href=\"http://showme.physics.drexel.edu/usefulchem/Software/bioclipse/UsingAnOPML.avi\">one about accessing the CMLRSS file from the OPML</a>.</p>",
      "summary": "David Strumfels posted news about the Useful Chemistry CMLRSS feed . He explains how this feed can be accessed using Jmol and Bioclipse. The latter are accompanied by two AVI movies: one about creating a new OPML file, and one about accessing the CMLRSS file from the OPML.",
      
      "date_published": "2006-07-03T00:00:00+00:00",
      "date_modified": "2006-07-03T00:00:00+00:00",
      "tags": ["cml","rss","bioclipse","jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/c8ttv-08d66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/07/01/new-chemistry-on-desktop-blog.html",
      "title": "New chemistry-on-the-desktop blog",
      "content_html": "<p>I started a spin-off blog earlier this week: <a href=\"http://kemistry-desktop.blogspot.com/\">kemistry desktop environment</a>. It will deal with\nintegration of chemistry on opensource desktops, with <a href=\"http://www.kde.org/\">KDE</a> as one of them. Today, it features an\n<a href=\"http://kemistry-desktop.blogspot.com/2006/07/overview-of-earlier-blogs.html\">overview of earlier blogs</a> on the subject in this new blog.</p>",
      "summary": "I started a spin-off blog earlier this week: kemistry desktop environment. It will deal with integration of chemistry on opensource desktops, with KDE as one of them. Today, it features an overview of earlier blogs on the subject in this new blog.",
      
      "date_published": "2006-07-01T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["kde"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/62e2c-ycj21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/25/kde4-keyword-support-mockups.html",
      "title": "KDE4 keyword support mockups",
      "content_html": "<p>In reply to interesting comments to <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support.html\">my previous blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\non <a href=\"http://www.vandenoever.info/software/strigi/\">Strigi</a> and xAttr support in <a href=\"http://www.kde.org/\">KDE</a>4, I would like to suggest\nthe following mockups, which I would find very useful. The deal with the ability to store keywords, for example, not but necessarily\nusing xAttr. I have no idea on how to implement these mockups, so any help or pointers are appreciated.</p>\n\n<p>The first plot is an example of how these keyword markup could be used in KDE, other than searching itself. When showing the properties\nof a directory in KDE, it would show an overview of hottest keywords for that directory, such as used on social bookmark website like\n<a href=\"http://technorati.com/\">Technorati</a> too:</p>\n\n<p><img src=\"/assets/images/kfileXAttrSupport.png\" alt=\"\" /></p>\n\n<p>This example shows that the keyword ‘Strigi’ was used much inside the index_files directory (they are not just the keywords given for\nthat directory, but a summary of the directory content!). Now, these keywords could be stored as xAttr, but in a database too. The\nfirst requires a filesystem that supports xAttr, while the second requires a database daemon to be running. However, for speed\nperformance reasons this would be required anyway. Strigi indexes xAttr now (post 0.3.0 release), and basically allows both.</p>\n\n<p>Independent of the chosen/prefered way to store keywords, these keywords can be edited from the Properties dialog:</p>\n\n<p><img src=\"/assets/images/kfileXAttrSupport2.png\" alt=\"\" /></p>\n\n<p>Now comes the tricky part: though I would like to add this to KDE, I do not have the C++/KDE experience to actually do this.\nI’m already happy that I was able to extend the Strigi with support for KDE’s kfile architecture. 
Yes, the Strigi version in\nSVN will index all metadata extractable with kfile plugins installed on the KDE installation.</p>",
      "summary": "In reply to interesting comments to my previous blog on Strigi and xAttr support in KDE4, I would like to suggest the following mockups, which I would find very useful. The deal with the ability to store keywords, for example, not but necessarily using xAttr. I have no idea on how to implement these mockups, so any help or pointers are appreciated.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/kfileXAttrSupport2.png",
      "date_published": "2006-06-25T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["kde","strigi","technorati"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wpk6m-d9y71",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/22/text-mining-for-chemistry-using-oscar3.html",
      "title": "Text mining for chemistry using OSCAR3",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/User:ptc24\">Peter Corbett</a> from <a href=\"http://wwmm.ch.cam.ac.uk/\">Peter Murray-Rust’s group</a>\nat the <a href=\"http://www-ucc.ch.cam.ac.uk/\">Unilever Cambridge Centre for Molecular Informatics</a> visited\n<a href=\"http://almost.cubic.uni-koeln.de/jrg/\">Christoph Steinbeck’s junior Research Group on Molecular Informatics</a> at the\n<a href=\"http://www.cubic.uni-koeln.de/\">CUBIC</a> today, and spoke about the status of <a href=\"http://sourceforge.net/projects/oscar3-chem\">Oscar3</a>,\na chemistry text mining program with the <a href=\"http://www.opensource.org/licenses/artistic-license.php\">Artistic License</a>.\nOscar3, the successor of version 1 and 2, can detect and extract molecular structures and experimental details from plain text articles,\nusing a variety of text mining techniques.</p>\n\n<p>The afternoon was spend on hacking Oscar3 into <a href=\"http://www.bioclipse.net/\">Bioclipse</a>, with good success. It involved updating Oscar3\nfor the latest <a href=\"http://cdk.sf.net/\">CDK</a> and setting up a plugin infrastructure for Bioclipse. This plugin will allow mining\n(scientific) articles for chemical compounds and there properties from within Bioclipse. The outcome of today’s hacking session was\nsomewhat less ambitious and focused on the general infrastructure, and getting the OPSIN functionality in Oscar3 available as a wizard.\nOPSIN is a IUPAC name 2 structure tool and, amongst many other names, is able to recognize <a href=\"http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2519\">caffeine</a>\n(<code class=\"language-plaintext highlighter-rouge\">InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3</code>):</p>\n\n<p><img src=\"/assets/images/opsin.png\" alt=\"\" /></p>",
      "summary": "Peter Corbett from Peter Murray-Rust’s group at the Unilever Cambridge Centre for Molecular Informatics visited Christoph Steinbeck’s junior Research Group on Molecular Informatics at the CUBIC today, and spoke about the status of Oscar3, a chemistry text mining program with the Artistic License. Oscar3, the successor of version 1 and 2, can detect and extract molecular structures and experimental details from plain text articles, using a variety of text mining techniques.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/opsin.png",
      "date_published": "2006-06-22T00:00:00+00:00",
      "date_modified": "2006-06-22T00:00:00+00:00",
      "tags": ["oscar","bioclipse","textmining"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6gyk7-hsk38",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support.html",
      "title": "Strigi gets kfile plugin support",
      "content_html": "<p>With some help, I got the <a href=\"http://developer.kde.org/documentation/tutorials/kfile-plugin/t1.html\">kfile</a> stream analyzer\nfor <a href=\"http://www.vandenoever.info/software/strigi/\">Strigi</a> working. This means that Strigi will now index the meta data\nfields defined by the <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile-chemical</a> plugins.</p>\n\n<p>The problem why it was not working earlier, was that it segfaulted on every creation of KDE classes. That’s something I\nreally hate about C/C++: the lack of stack traces, though <a href=\"http://valgrind.org/\">valgrind</a> was helpful. It turned out\nthat adding the below line fixed all. A <a href=\"http://developer.kde.org/documentation/library/3.0-api/classref/kdecore/KInstance.html\">KInstance</a>\nis needed when using KDE technology outside a KDE program:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>KInstance instance( \"strigita_kfile\" );\n</code></pre></div></div>\n\n<p>Combine this with the <a href=\"http://wiki.linuxquestions.org/wiki/Extended_attributes\">xattr</a> support added by Jos earlier today, I hope to\nsee an interesting new Strigi release soon! Now we only need to get <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/17/kde-desktop-search-kat-strigi-and.html\">editing of keywords <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ninto KDE4.</p>",
      "summary": "With some help, I got the kfile stream analyzer for Strigi working. This means that Strigi will now index the meta data fields defined by the kfile-chemical plugins.",
      
      "date_published": "2006-06-20T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["strigi","kde"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/d22pr-jvr69",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/20/dutch-summer-of-code-sponsors.html",
      "title": "Dutch Summer of Code sponsors a Bioclipse project",
      "content_html": "<p>The Dutch version of the <a href=\"http://code.google.com/soc\">Google Summer of Code</a>, <a href=\"http://www.programmeerzomer.nl/\">Programmeerzomer.nl</a>,\nannounced today the five students participating. I was happy to see that Rob Schellhorn was selected with his project proposal for a\n<a href=\"http://bioinformatics.org/ghemical/ghemical/\">Ghemical</a> plugin for <a href=\"http://www.bioclipse.net/\">Bioclipse</a>. Like in the Google\noriginal, both the student and the mentoring organization are funded, 3600 and 400 euro respectively.</p>",
      "summary": "The Dutch version of the Google Summer of Code, Programmeerzomer.nl, announced today the five students participating. I was happy to see that Rob Schellhorn was selected with his project proposal for a Ghemical plugin for Bioclipse. Like in the Google original, both the student and the mentoring organization are funded, 3600 and 400 euro respectively.",
      
      "date_published": "2006-06-20T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["programmeerzomer","bioclipse","ghemical"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9n9m7-y4v29",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/17/kde-desktop-search-kat-strigi-and.html",
      "title": "KDE desktop search: Kat, Strigi and Tenor",
      "content_html": "<p>Desktop searching has become a hot topic (some <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html\">earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/07/ubuntu-dapper-will-include-chemistry.html\">blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), now that years of data accumulated on ones\nhard disk: PDFs, OpenOffice.org documents, Latex manuscripts, old Java source code, digitized music, and a lot of chemical files. Well,\non my hard disk that is. Unlike piles of paper, a computer could search this data, but due to the size an index is required. What’s KDE4\ngoing to offer?</p>\n\n<p>For the <a href=\"http://www.kde.org/\">KDE</a> desktop <a href=\"http://kat.mandriva.com/\">Kat</a> has for more than a year offered this, and latter\n<a href=\"http://www.kde-apps.org/content/show.php?content=36832\">Kerry</a> came along as frontend to [Beagle(http://beaglewiki.org/Main_Page)],\nthough this does not have the nice integration with KDE <a href=\"http://developer.kde.org/documentation/tutorials/kfile-plugin/t1.html\">kfile plugins</a>.\nSince then, Kat developed has come to a stop (unfortunately), and attempts to reach the main author\n(<a href=\"mailto:roberto.cappuccio@gmail.com\">Roberto</a>) have been unsuccesfull. Last thing happening was a rewrite of the database backend.</p>\n\n<p>Additionally, <a href=\"http://dot.kde.org/1109163846/\">Scott Wheeler proposed Tenor</a> on <a href=\"http://www.fosdem.org/\">FOSDEM</a> 2005:\n<em>“KDE 4: Beyond Hierarchical Data, The Desktop as a Searchable Web of Context”</em>. 
A semantic desktop; potentially cool, but I have heard\n<a href=\"http://www.kdedevelopers.org/blog/72?from=10\">little from it lately</a>, except for some rumours that\n<a href=\"http://mail.kde.org/pipermail/klink/2006-April/000133.html\">Scott has some actual code at home</a>.</p>\n\n<p>Now, <a href=\"http://www.vandenoever.info/software/strigi/\">Strigi</a> (<a href=\"http://www.kde-look.org/content/show.php?content=40889\">download</a>) has come along,\nwith a fast indexing engine, just the thing where the Kat developed seemed to have stopped. The design is different from that of Kat, but it\ndoes not seem unlikely that Kat code can be ported. No support for PDF or OpenOffice.org documents yet, but that’s really the easy part, and\nkfile is on its way.</p>\n\n<p>Getting back to Tenor, one might wonder how Strigi could implement Tenor concepts. A simple approach is at least to allow users to tag files,\njust like we have become used to with blogs (e.g. <a href=\"http://www.technorati.com/\">Technorati.com</a>) and websites (e.g.\n<a href=\"http://www.connotea.org/\">Connotea</a>). This could be easily implemented using <a href=\"http://wiki.linuxquestions.org/wiki/Extended_attributes\">extended attributes</a>\n(xattr), <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html\">already used by Beagle <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code># file: home/egonw/1CRN.jpg\nuser.Tenor.Keywords=\"crambin\"\nuser.Tenor.Comment=\"Used in my ontologies presentation.\"\n</code></pre></div></div>\n\n<p>Obviously, this example shows not just these tags, but a user comment too. The idea, here, is that Strigi mines these attributes in\naddition to the file itself, so that search on tags can be done too. 
BTW, my argument for using this, instead of putting these things\nin the Strigi database itself, is persistence: data and metadata are kept together. KDE’s file properties dialog would be extended\nwith an extra tab that allows editing these fields.</p>\n\n<p>Strigi itself can be embedded in KDE applications to search specific information (e.g. search molecular data within\n<a href=\"http://cniehaus.livejournal.com/23010.html\">Kalzium</a> using the <a href=\"http://www.iupac.org/inchi/\">InChI</a>), and even in the FileOpen dialog.\nWe need patches for KDE4 that allow this, soon.</p>",
      "summary": "Desktop searching has become a hot topic (some earlier blogs ), now that years of data accumulated on ones hard disk: PDFs, OpenOffice.org documents, Latex manuscripts, old Java source code, digitized music, and a lot of chemical files. Well, on my hard disk that is. Unlike piles of paper, a computer could search this data, but due to the size an index is required. What’s KDE4 going to offer?",
      
      "date_published": "2006-06-17T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["kde","strigi","kalzium","linux","technorati","connotea"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kc7ax-n3f66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/12/chemistry-extension-for-spreadsheets.html",
      "title": "A chemistry extension for spreadsheet(s)",
      "content_html": "<p>Just wanted to make sure this news made it to the <a href=\"http://www.blueobelisk.org/planetbo\">Blue Obelisk Planet</a> too:\n<a href=\"http://www.blogger.com/profile/21711372\">David Strumfels</a> reported that\n<a href=\"https://web.archive.org/web/20060614224108/http://usefulchem.blogspot.com/2006/06/processing-usefulchem-molecules-with.html\">he extended MS-Excel with CDK functionality <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nI wonder how difficult it would be to do this with <a href=\"http://www.koffice.org/kspread/\">Kspread</a> or\n<a href=\"http://www.gnome.org/projects/gnumeric/\">Gnumeric</a>?</p>",
      "summary": "Just wanted to make sure this news made it to the Blue Obelisk Planet too: David Strumfels reported that he extended MS-Excel with CDK functionality . I wonder how difficult it would be to do this with Kspread or Gnumeric?",
      
      "date_published": "2006-06-12T00:00:00+00:00",
      "date_modified": "2006-06-12T00:00:00+00:00",
      "tags": ["cdk","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wtyb0-h2328",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/05/recent-developments-of-chemistry.html",
      "title": "Recent Developments of the Chemistry Development Kit",
      "content_html": "<p><em><a href=\"https://doi.org/10.2174/138161206777585274\">Recent Developments of the Chemistry Development Kit (CDK) <i class=\"fa-solid fa-recycle fa-xs\"></i></a> -\nAn Open-Source Java Library for Chemo- and Bioinformatics</em> (<a href=\"https://repository.ubn.ru.nl/bitstream/handle/2066/35445/35445_aut.pdf\">green OA</a>) discusses (reasonably) recent additions to the\n<a href=\"http://cdk.sf.net/\">CDK</a>. It appeared in issue 17 of this years <a href=\"http://www.bentham.org/cpd/\">Current Pharmaceutical Design</a>\nvolume, after being too long in the queue after being accepted; but I am happy that it is out now.</p>\n\n<p>The article discusses CDK’s QSAR capabilities (the class designs and an overview of provided descriptors), the 3D model builder\n(see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/05/recent-developments-of-chemistry.html\">C. Hoppe, CDK News, 1(2):4-5 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>)\nand and the interface to the statistical software <a href=\"http://www.r-project.org/\">R</a> (see also\n<a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=124796&amp;release_id=310462\">CDK News, vol.2, issue 1</a>).\nThe article is part of a small special issue on Computational Applications in Medicinal Chemistry.</p>\n\n<p>CDK’s QSAR package comes with one main requirement: <strong>the outcome of QSAR descriptor calculations must be reproducable</strong>.\n<em>“Science must be reproducable”</em>; I’m sure someone once said this :) Therefore, each QSAR descriptor has a specification\npointing the a unique algorithm found in an ontology (see diagram below). 
This QSAR descriptor ontology is maintained by\nthe <a href=\"http://qsar.sf.net/\">qsar.sf.net</a> project, which is project independent, and even welcomes proprietary programs to\ndiscuss interoperability.</p>\n\n<p><img src=\"/assets/images/DescriptorOverview.png\" alt=\"\" /></p>\n\n<p>And calculated descriptors are explicitly linked to this specification again, though it is up to the user what to do with\nthis:</p>\n\n<p><img src=\"/assets/images/DescriptorResultOverview.png\" alt=\"\" /></p>\n\n<p>Note that code has evolved since this publication, so class, interface and method names may have changed a bit.</p>",
      "summary": "Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics (green OA) discusses (reasonably) recent additions to the CDK. It appeared in issue 17 of this years Current Pharmaceutical Design volume, after being too long in the queue after being accepted; but I am happy that it is out now.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/DescriptorOverview.png",
      "date_published": "2006-06-05T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cdk","cheminf","qsar"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.2174/138161206777585274", "doi": "10.2174/138161206777585274"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y3kng-8yq60",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/28/blue-obelisk-in-obernai-at.html",
      "title": "Blue Obelisk in Obernai at Chemoinformatics in Europe",
      "content_html": "<p>Together with <a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=7\">Christoph</a>, Christian and Jerome, I will be\nrepresenting the Blue Obelisk movement on the first <a href=\"http://infochim.u-strasbg.fr/recherche/europeen_chemistry/index.php\">First Workshop on Chemoinformatics in\nEurope</a> with the topic <em>Research and Teaching</em>.\nThough I wonder what this theme excludes? Development? Can’t imagine that commercials companies will not be\nrepresented as usual. Moreover, it will likely include some bioinformatics too, unless you consider that to\ndeal with sequences only.</p>\n\n<p>I have my laptop with me, and, of course, the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/22/live-life-sciences-cd.html\">Blue Obelisk Live CD 2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\non which the mouse now actually works. <a href=\"http://bioclipse.blogspot.com/2006/05/bioclipse-091-released.html\">Bioclipse 0.9.1</a>\ndoes not work, though; will report that bug later.</p>\n\n<p>My work schedule for the train ride:</p>\n\n<ul>\n  <li>Work on my manuscript</li>\n  <li>Integrate Todd Martin’s SMILES and QSAR work</li>\n  <li>Work on the next CDK News</li>\n  <li>Think about InChI creation in Bioclipse, using OpenBabel</li>\n</ul>",
      "summary": "Together with Christoph, Christian and Jerome, I will be representing the Blue Obelisk movement on the first First Workshop on Chemoinformatics in Europe with the topic Research and Teaching. Though I wonder what this theme excludes? Development? Can’t imagine that commercials companies will not be represented as usual. Moreover, it will likely include some bioinformatics too, unless you consider that to deal with sequences only.",
      
      "date_published": "2006-05-28T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","bioclipse","cheminf","bioinfo"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/51khs-pyh66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html",
      "title": "Molecular indexing on the KDE and OS/X desktops",
      "content_html": "<p><a href=\"http://geoffhutchison.net/about/\">Geoff Hutchinson</a> <a href=\"http://geoffhutchison.net/blog/archives/2006/05/25/chemspotlight-indexing-chemistry-on-your-mac/\">blogged</a>\nabout his <a href=\"http://geoffhutchison.net/projects/chem/\">OS/X ChemSpotLight</a>, an indexing tool for chemistry documents. It’s like,\nbut more advanced than, the <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a> and\n<a href=\"http://kat.mandriva.com/\">Kat</a> I have been working on (with others) for the\n<a href=\"https://kde.org/\">KDE <i class=\"fa-solid fa-recycle fa-xs\"></i></a> desktop (see earlier blog items).</p>\n\n<p>ChemSpotLight currently does more than the KDE tools: it adds Spotlight comments. I assume these are like the Linux\n<a href=\"http://wiki.linuxquestions.org/wiki/Extended_attributes\">extended attributes</a>, used for example by\n<a href=\"http://beaglewiki.org/Main_Page\">Beagle</a>. For example, a file indexed by Beagle will have extended attributes like:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code># file: home/egonw/m43.jpg\nuser.Beagle.AttrTime=\"20060509071950\"\nuser.Beagle.Filter=\"003 Beagle.Filters.FilterJpeg\"\nuser.Beagle.Fingerprint=\"02 xHn5Yi58x0eoI8ityBYkUw\"\nuser.Beagle.MTime=\"20031225151016\"\nuser.Beagle.Uid=\"YcIW72RWyk+K5FbGnpv4iA\"\n</code></pre></div></div>\n\n<p>This is very suitable for adding metadata, like comments as in ChemSpotLight. Geoff’s program adds metadata like number of\natoms and bond, but it calculates the <a href=\"http://www.daylight.com/smiles/\">SMILES</a> and <a href=\"http://www.iupac.org/inchi/\">InChI</a>\non the fly too. 
Especially the last is very good for indexing purposes, as it is a really unique identifier for molecular\nstructures, and even works for <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/03/31/inchis-in-latex-and-cdk-news.html\">proteins <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Now, kfile_chemical is a kfile plugin. These kfile plugins only extract metadata from files, and have little to do with\ncalculated metadata. Kat, on the other hand, is an indexing application and might be expected to add additional, derived\nor calculated, metadata as extended attributes, just like Beagle does. And then InChI and SMILES are good candidates.</p>",
      "summary": "Geoff Hutchinson blogged about his OS/X ChemSpotLight, an indexing tool for chemistry documents. It’s like, but more advanced than, the kfile_chemical and Kat I have been working on (with others) for the KDE desktop (see earlier blog items).",
      
      "date_published": "2006-05-26T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["kde","cheminf","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/59vwx-e6a02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/24/xml-validation-on-eclipse-with-web.html",
      "title": "XML validation on Eclipse with Web Tools Platform",
      "content_html": "<p>Yesterday I installed the <a href=\"http://www.eclipse.org/webtools/\">Eclipse Web Tools Platform</a> again, and now\nsuccesfully, using the Eclipse update mechanism, on my <a href=\"http://www.kubuntu.org/\">Kubuntu dapper</a> eclipse\ninstall. Because it has a validating XML editor, the one last thing I still needed\n<a href=\"http://www.jedit.org/\">jEdit</a> for. (I do miss the vertical selection feature of jEdit, though.) It\nsignals me of errors, and allows autocompletion.</p>\n\n<p>Now I can validate all <a href=\"http://www.xml-cml.org/\">Chemical Markup Langauge</a> files I have around, which is\nvery useful for those I use to make sure <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\nis working properly. I just need to make sure I use the <code class=\"language-plaintext highlighter-rouge\">http://www.w3.org/2001/XMLSchema-instance namespace</code>,\nfor example as in this example from CDK SVN:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;cml</span> <span class=\"na\">title=</span><span class=\"s\">\"Regression tests for valid XML Schema documents for CML 2.3\"</span>\n\n  <span class=\"na\">xmlns=</span><span class=\"s\">\"http://www.xml-cml.org/schema\"</span>\n  <span class=\"na\">xmlns:xsi=</span><span class=\"s\">\"http://www.w3.org/2001/XMLSchema-instance\"</span>\n  <span class=\"na\">xsi:schemaLocation=</span><span class=\"s\">\"http://www.xml-cml.org/schema ../../../io/cml/data/cml23.xsd\"</span><span class=\"nt\">&gt;</span>\n</code></pre></div></div>\n\n<p>Now, I do have some questions. Firstly, does WTP allow recycling of the XML editor? That is, can I use their validating XML editor in, for example, Bioclipse? Would I just depend on the right plugin jars from WTP, or is it more complicated? 
Alternatively, since in RCP everything is a plugin, can WTP be installed as a plugin in Bioclipse directly?</p>\n\n<p>Secondly, does Kubuntu or Debian sid have binary packages for WTP? I seem to remember having read something about this, in relation to splitting up the WTP into smaller, more specific plugins. Anyone?</p>",
      "summary": "Yesterday I installed the Eclipse Web Tools Platform again, and now succesfully, using the Eclipse update mechanism, on my Kubuntu dapper eclipse install. Because it has a validating XML editor, the one last thing I still needed jEdit for. (I do miss the vertical selection feature of jEdit, though.) It signals me of errors, and allows autocompletion.",
      
      "date_published": "2006-05-24T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["xml","bioclipse","cml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vwtxz-8dh40",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/22/live-life-sciences-cd.html",
      "title": "A live life-sciences CD",
      "content_html": "<p>November last year, I <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd.html\">reported my plans <i class=\"fa-solid fa-recycle fa-xs\"></i></a> to develop\na live CD with all our favorite chemo- and bioinformatics software. <a href=\"http://www.bioclipse.net/\">Bioclipse</a> requires Java5\nand sort of still depends on the Sun JVM (I will experiment with classpath-generics later), but is now distributable with\noperating systems. So, I made a <a href=\"http://www.kubuntu.org/\">Kubuntu</a> derived operating system with\n<a href=\"http://openbabel.sourceforge.net/\">OpenBabel</a>, <a href=\"http://www.jmol.org/\">Jmol</a>, <a href=\"http://pymol.sourceforge.net/\">PyMOL</a>,\nBioclipse, and, on systems level, the chemical MIMEs and <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a>,\nwich extends the desktop with chemistry awareness. In addition, I added the\n<a href=\"http://www.blueobelisk.org/\">Blue Obelisk Data Repository</a>, all <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/\">CDK News</a>\nissues, and the full <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> data in CML format.</p>\n\n<p>The <a href=\"http://wiki.cubic.uni-koeln.de/iso/cdname.iso\">iso image</a> can be downloaded, and is really a first set up. Bioclipse does not\nwork, but much of the rest does. Please download it (about 625MB) and experiment with it, and leave your comments with this blog item.</p>",
      "summary": "November last year, I reported my plans to develop a live CD with all our favorite chemo- and bioinformatics software. Bioclipse requires Java5 and sort of still depends on the Sun JVM (I will experiment with classpath-generics later), but is now distributable with operating systems. So, I made a Kubuntu derived operating system with OpenBabel, Jmol, PyMOL, Bioclipse, and, on systems level, the chemical MIMEs and kfile_chemical, wich extends the desktop with chemistry awareness. In addition, I added the Blue Obelisk Data Repository, all CDK News issues, and the full NMRShiftDB data in CML format.",
      
      "date_published": "2006-05-22T00:00:00+00:00",
      "date_modified": "2006-05-22T00:00:00+00:00",
      "tags": ["linux","jmol","bioclipse","nmrshiftdb"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4f36v-1ze23",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/18/taverna-runs-with-classpath-091.html",
      "title": "Taverna runs with Classpath 0.91",
      "content_html": "<p>Classpath 0.91 <a href=\"http://www.gnu.org/software/classpath/announce/20060515.html\">is released</a> with\n<a href=\"http://jroller.com/page/dgilbert?entry=1_45_million_lines_of\">1.45 million</a> lines of code and with\n<a href=\"http://www.kaffe.org/~stuart/japi/htmlout/h-jdk14-classpath.html\">98.96%</a> coverage of Java 1.4.2,\nand 99.82% of java.swing. Or, as <a href=\"http://jroller.com/page/dgilbert?entry=gnu_classpath_0_91\">Dave calls it</a>:\n0.91 rocks! <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/20/open-source-swing-jchempaint-runs.html\">JChemPaint runs again <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(they fixed the XML parsing problem), and <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/27/open-source-swing-jmol-renderer-runs.html\">Jmol still runs &lt;i class=”fa-solid fa-recycle fa-xs”</a>,\n<a href=\"http://developer.classpath.org/mediation/FreeSwingTestApps\">but slow</a>. I also tested\n<a href=\"http://taverna.sourceforge.net/\">Taverna</a> which now also starts up, but has an XML parsing error too:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Exception occured whilst loading RDFS! 
Error on line 2: required string: \"?&gt;\"\norg.jdom.input.JDOMParseException: Error on line 2: required string: \"?&gt;\"\n   at org.jdom.input.SAXBuilder.build(SAXBuilder.java:468)\n   at org.jdom.input.SAXBuilder.build(SAXBuilder.java:851)\n   at org.embl.ebi.escience.scufl.semantics.RDFSParser.loadRDFSDocument(RDFSParser.java:70)\n   at org.embl.ebi.escience.scuflui.workbench.Workbench.main(Workbench.java:128)\n   at java.lang.reflect.Method.invokeNative(Native Method)\n   at java.lang.reflect.Method.invoke(Method.java:355)\n   at org.embl.ebi.escience.scuflui.workbench.WorkbenchLauncher.main(WorkbenchLauncher.java:40)\n</code></pre></div></div>\n\n<p>Oh, and rumours go that <a href=\"http://www.nongnu.org/gcjwebplugin/\">gcjwebplugin</a> can run the Jmol applet now,\nexcept for the JavaScript interaction, that is.</p>",
      "summary": "Classpath 0.91 is released with 1.45 million lines of code and with 98.96% coverage of Java 1.4.2, and 99.82% of java.swing. Or, as Dave calls it: 0.91 rocks! JChemPaint runs again (they fixed the XML parsing problem), and Jmol still runs &lt;i class=”fa-solid fa-recycle fa-xs”, but slow. I also tested Taverna which now also starts up, but has an XML parsing error too:",
      
      "date_published": "2006-05-18T00:00:00+00:00",
      "date_modified": "2006-05-18T00:00:00+00:00",
      "tags": ["java","workflow","jchempaint","taverna"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4hf7p-hxt14",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/11/new-open-access-journal-source-code.html",
      "title": "New open access journal Source Code for Biology and Medicine",
      "content_html": "<p><a href=\"http://www.biomedcentral.com/\">BioMed Central</a> is setting up a new peer-reviewed, open access journal\n<a href=\"http://www.scfbm.org/\">Source Code for Biology and Medicine</a>. It will <em>“encompass all aspects of workflow for\ninformation systems, decision support systems, client user networks, database management, and data mining”</em>.\nBasically, anything that fits into chem-bla-ics. (Thanx to Werner, for pointing me to the website!)</p>\n\n<p>The ‘source code’ aspect is the interesting thing of this new journal. The editorial board set the aim to <em>publish\nsource code for distribution and use in the public domain in order to advance biological and medical research</em>.\nAnd, in a bit more detail, they list the following goals:</p>\n\n<ul>\n  <li>increase productivity</li>\n  <li>reduce discovery times</li>\n  <li>reduce search times for source code</li>\n  <li>Provide a historical reflection of source code applied</li>\n  <li>serve as a repository</li>\n</ul>\n\n<p>This comes close to what open source is trying to achieve too, but I do not differences. For example, the announcement\nmentions the public domain (see the <a href=\"http://en.wikipedia.org/wiki/Public_domain\">Wikipedia entry</a>). I tend to be a\nbit confused by the use of this term: to me the public domain is where things end up after copyright claims have\nended, and everyone is free to do with it whatever he wants, and, very important in this case, that open source\nsoftware is not in the public domain. Do they mean that they will not allow open source in the new journal?</p>\n\n<p>I also wonder wether we need a journal like this? Open source projects often have other resources available that\nserve as repository (e.g. 
<a href=\"https://sourceForge.net\">SourceForge <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), and the use of\nversion control systems as repositories (like <a href=\"http://www.nongnu.org/cvs/\">CVS</a>, <a href=\"http://subversion.tigris.org/\">Subversion</a>)\nis widespread too, which takes care of the historical reflection. Indeed, much open source software is already\npublished in other journals.</p>\n\n<p>The process of picking the journal to submit to often involves looking up the journal’s impact factor. Is this new\njournal expected to get a high impact factor? How many people will regularly read the journal? Will it be read by\nthe right audience, or just by fellow bioinformaticians?</p>\n\n<p>Though I have my doubts about the success of this journal, I am looking forward to the first issue!</p>\n\n<p><strong>Update</strong>: <a href=\"http://www.nodalpoint.org/user/pedrobeltrao\">Pedro</a> <a href=\"https://web.archive.org/web/20060615123103/http://www.nodalpoint.org/2006/05/12/source_code_for_biology_and_medicine\">pointed <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nme to the <a href=\"https://web.archive.org/web/20060620202859/http://www.scfbm.org/info/about/\">About page <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> of\nthe SCFBM, giving details on the types of articles taken into consideration.</p>",
      "summary": "BioMed Central is setting up a new peer-reviewed, open access journal Source Code for Biology and Medicine. It will “encompass all aspects of workflow for information systems, decision support systems, client user networks, database management, and data mining”. Basically, anything that fits into chem-bla-ics. (Thanx to Werner, for pointing me to the website!)",
      
      "date_published": "2006-05-11T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["openscience","bioinfo"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wyet7-r6r37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/07/open-text-mining-interface-and.html",
      "title": "Open Text Mining Interface and Bioclipse",
      "content_html": "<p>Timo Hannay <a href=\"https://web.archive.org/web/20060620194249/http://blogs.nature.com/wp/nascent/2006/04/open_text_mining_interface.html\">blogged <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nin <a href=\"http://www.nature.com/\">Nature</a>’s <a href=\"https://web.archive.org/web/20060504035155/http://blogs.nature.com/wp/nascent/\">Nascent blog <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nabout the Open Text Mining Interface (OTMI), which is “a suggestion from Nature about how we might achieve text-mining\nand indexing purposes”. The idea is that each article has a link pointing to a machine readable file\ncontaining raw data about (and from?) the article. The standing example uses\n<a href=\"http://atompub.org/2005/07/11/draft-ietf-atompub-format-10.html\">Atom 1.0</a> as a container, allowing raw\ndata to be included using foreign namespaces, such as <a href=\"http://prismstandard.org/\">Dublic Core</a>\n(for metadata) and <a href=\"http://prismstandard.org/\">Prism</a> (for bibliographic data), and the OTMI text\nmining statistics uses a namespace too.</p>\n\n<p>In a comment, <a href=\"http://www.ch.ic.ac.uk/rzepa/\">Henry Rzepa</a> proposed inclusion of CML, and refers to earlier\nwork on CMLRSS where <a href=\"http://www.xml-cml.org/\">Chemical Markup Language</a> is embedded in RSS news feeds\nfor which I wrote readers for <a href=\"http://www.jmol.org/\">Jmol</a> and\n<a href=\"http://jchempaint.sf.net/\">JChemPaint</a> (DOI:<a href=\"https://doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>).</p>\n\n<p>As readers of my blog know, the <a href=\"http://www.bioclipse.net/\">Bioclipse</a> project has been working hard\non an integrated (bio)chemistry workbench, and the <a href=\"http://bioclipse.blogspot.com/2006/05/bioclipse-090-released.html\">latest release</a>\nincludes a <a href=\"http://wiki.bioclipse.net/index.php?title=CMLRSS_plugin\">CMLRSS reader plugin</a> too, which\nsupports CML embedded in Atom 0.3/1.0 
and RSS 1.0/2.0 feeds. Now, adding support for other embedded\nnamespaces is trivial, and this morning I hacked in support for OTMI:</p>\n\n<p><img src=\"/assets/images/otmiSupport.png\" alt=\"\" /></p>\n\n<p>This screenshot shows the original OTMI example\nwith the Atom 1.0 entry now wrapped in an Atom 1.0 <code class=\"language-plaintext highlighter-rouge\">&lt;feed&gt;</code> element. There is no nice OTMI icon for the OTMI content in the\nAtom 1.0 entry, nor did I make a ‘view’ yet showing the actual vectors or the snippets, but that’s a piece of cake too.</p>\n\n<p>Now, the nice thing about this is that the Bioclipse code for the Atom and RSS feeds just greps through the feed entry\nand shows whatever CML or OTMI content is present. When Nature decides to include CML in these OTMI files too,\nI will not have to update the current code.</p>",
      "summary": "Timo Hannay blogged in Nature’s Nascent blog about the Open Text Mining Interface (OTMI), which is “a suggestion from Nature about how we might achieve text-mining and indexing purposes”. The idea is that each article has a link pointing to a machine readable file containing raw data about (and from?) the article. The standing example uses Atom 1.0 as a container, allowing raw data to be included using foreign namespaces, such as Dublic Core (for metadata) and Prism (for bibliographic data), and the OTMI text mining statistics uses a namespace too.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/otmiSupport.png",
      "date_published": "2006-05-07T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cml","bioclipse","xml","textmining","rss"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r4sw4-ehh35",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/03/four-graph-mining-methods-integrated.html",
      "title": "Four graph mining methods integrated in ParMol",
      "content_html": "<p><a href=\"https://www.blogger.com/profile/09112376168632883058\">Joerg Wegner <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"http://miningdrugs.blogspot.com/2006/05/molecule-mining-field-is-rapidly.html\">mentioned in his blog</a>\nthe graph mining program <a href=\"https://web.archive.org/web/20070609221004/http://www2.informatik.uni-erlangen.de/Forschung/Projekte/ParMol/?language=en\">ParMol <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nwhich integrates four mining algorithms:\n<a href=\"http://fuzzy.cs.uni-magdeburg.de/~borgelt/moss.html\">MoSS</a> (aka MoFa) and <a href=\"http://www.liacs.nl/~snijssen/gaston/\">Gaston</a>, which I mentioned\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/02/open-source-data-mining-in.html\">in November last year <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand FFSM and gSpan, which I did not know about\nyet. ParMol provides a common interface to the four different algorithms and is, like the four mining modules, licensed GPL. An interesting aspect\nis that Gaston was originally written in C++.</p>",
      "summary": "Joerg Wegner mentioned in his blog the graph mining program ParMol which integrates four mining algorithms: MoSS (aka MoFa) and Gaston, which I mentioned in November last year , and FFSM and gSpan, which I did not know about yet. ParMol provides a common interface to the four different algorithms and is, like the four mining modules, licensed GPL. An interesting aspect is that Gaston was originally written in C++.",
      
      "date_published": "2006-05-03T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cheminf","opensource"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s63wt-hbx56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/01/nightly-cdk-builds-now-available.html",
      "title": "Nightly CDK builds now available",
      "content_html": "<p><a href=\"http://web.archive.org/web/20060815001811/http://cheminfo.informatics.indiana.edu/~rguha/\">Rajarshi Guha <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nhas set a <a href=\"http://blue.chem.psu.edu/~rajarshi/code/java/nightly/\">nightly build service <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>\nfor the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> (CDK). The output is pretty, but information rich: it includes results for the\n<a href=\"http://www.junit.org/\">JUnit test</a>, <a href=\"http://java.sun.com/j2se/javadoc/doccheck/\">DocCheck</a>, and <a href=\"http://pmd.sourceforge.net/\">PMD</a>.\nThe compiled jar and the corresponding JavaDoc can be downloaded, offering a cutting edge distribution for users.</p>",
      "summary": "Rajarshi Guha has set a nightly build service for the Chemistry Development Kit (CDK). The output is pretty, but information rich: it includes results for the JUnit test, DocCheck, and PMD. The compiled jar and the corresponding JavaDoc can be downloaded, offering a cutting edge distribution for users.",
      
      "date_published": "2006-05-01T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cdk","junit","pmd","opensource"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/23wn4-1nt07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/23/protein-support-in-bioclipse-using.html",
      "title": "Protein support in Bioclipse using Jmol and the CDK",
      "content_html": "<p>I have not blogged for about a week now, and been too busy with other things, like finishing my PhD articles/manuscript,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/03/25/cologne-university-bioinformatics.html\">my new job at the CUBIC <i class=\"fa-solid fa-recycle fa-xs\"></i></a> where I\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/10/getting-jmols-cartoon-on-to-work-in.html\">continued the work <i class=\"fa-solid fa-recycle fa-xs\"></i></a> on proper protein support in\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> using the <a href=\"http://cdk.sf.net/\">CDK</a> and\n<a href=\"http://www.jmol.org/\">Jmol</a>:</p>\n\n<p><img src=\"/assets/images/cdkpdbsupport800.png\" alt=\"Screenshot of Bioclipse with a protein visualized with Jmol in the middle.\" /></p>\n\n<p>The latter involves getting the <a href=\"https://sourceforge.net/p/bioclipse/code/11760/log/?path=/bioclipse/trunk/plugins/net.bioclipse.jmol/src/net/bioclipse/plugins/adapter/cdk/CdkJmolAdapter.java\">CdkJmolAdapter <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe interface between the CDK and Jmol, <a href=\"https://web.archive.org/web/20060508024648/http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=cdknewsartjmolandcdk\">updated for changes <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nsince the <a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/2_1/\">Jmol as 3D viewer for CDK <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\narticle in <a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/\">CDK News <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, the open access journal for CDK related projects.</p>\n\n<p>The screenshot is not showing the actual status: the <code class=\"language-plaintext highlighter-rouge\">CdkJmolAdapter</code> does not propagate all information to Jmol correctly; as you\ncan see in the screenshot in the <code class=\"language-plaintext highlighter-rouge\">BioPolymerTree</code> 
and <code class=\"language-plaintext highlighter-rouge\">Property</code> views, the CDK now reads the structure information from the PDB file,\nand I verified that Jmol really extracts this using the <code class=\"language-plaintext highlighter-rouge\">StructureIterator</code>, but the secondary structure does not show up yet.\nI believe the problem is in the <code class=\"language-plaintext highlighter-rouge\">AtomIterator</code>: issuing the <code class=\"language-plaintext highlighter-rouge\">select protein</code> script selects zero atoms.</p>\n\n<p>The above screenshot uses a workaround: it was made using Jmol’s own IO instead of the <code class=\"language-plaintext highlighter-rouge\">CdkJmolAdapter</code>. But\nI’m very close and think I will be able to fix this soon.</p>",
      "summary": "I have not blogged for about a week now, and been too busy with other things, like finishing my PhD articles/manuscript, my new job at the CUBIC where I continued the work on proper protein support in Bioclipse using the CDK and Jmol:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkpdbsupport800.png",
      "date_published": "2006-04-23T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["bioclipse","jmol","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mgcny-v7r82",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/23/download-statistics-for-chemblaics.html",
      "title": "Download statistics for chemblaics components",
      "content_html": "<p>Here are some quick download statistics for some of the chemblaics components. First\n<a href=\"http://www.jmol.org/\">Jmol</a>. The new stable Jmol 10.2 was release just over a week ago, and this obviously boosted downloads,\nbreaking the monthly download total of two earlier this year (<a href=\"http://sourceforge.net/project/stats/?group_id=23629&amp;ugn=jmol&amp;type=&amp;mode=alltime\">source</a>):</p>\n\n<p><img src=\"/assets/images/jmolDownloadStats.April2006.png\" alt=\"\" /></p>\n\n<p>Statistics for the CDK include download numbers for the <a href=\"http://cdk.sf.net/\">CDK</a> library itself, but for <a href=\"http://jchempaint.sf.net/\">JChemPaint</a>,\nthe CDK News, and several other packages too. Totals are at about 1/3rd of Jmol. Another new record, breaking an earlier record set in February 2003\n(<a href=\"http://sourceforge.net/project/stats/?group_id=20024&amp;ugn=cdk&amp;type=&amp;mode=alltime\">source</a>):</p>\n\n<p><img src=\"/assets/images/cdkDownloadStats.April2006.png\" alt=\"\" /></p>\n\n<p>Finally, I want to mention the overall download count for <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a>\nwas is much higher than I ever would have hoped for: 1125 in 7 months! Maybe I should ask to get this in the\n<a href=\"http://www.kde.org/\">KDE</a> extragear.</p>",
      "summary": "Here are some quick download statistics for some of the chemblaics components. First Jmol. The new stable Jmol 10.2 was release just over a week ago, and this obviously boosted downloads, breaking the monthly download total of two earlier this year (source):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jmolDownloadStats.April2006.png",
      "date_published": "2006-04-23T00:00:00+00:00",
      "date_modified": "2006-04-23T00:00:00+00:00",
      "tags": ["jmol","cdk","jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7d8hq-pp704",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/14/postgenomiccom-maps-upcoming.html",
      "title": "Postgenomic.com maps upcoming conferences",
      "content_html": "<p>Conference season is nearing. And just in time, <a href=\"http://web.archive.org/web/20240601063018mp_/http://postgenomic.com/\">Postgenomic.com <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> added\na <a href=\"https://web.archive.org/web/20060513202812/http://postgenomic.com/meetings.php\">conferences map <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> showing locations of upcoming and\nrecently finished conferences. Oh boy, do I want to set this up for chemoinformatics too!</p>\n\n<p>Postgenomic.com makes use of the <a href=\"https://web.archive.org/web/20060813150816/http://postgenomic.com/about_reviews.php\">rel=”conference” attribute <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> for the\n<code class=\"language-plaintext highlighter-rouge\">&lt;a&gt;</code> element. I’m not sure how they distinguish between upcoming and finished conferences (will need to check the\n<a href=\"http://web.archive.org/web/20060519215119/http://www.postgenomic.org/\">source code <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>). But I think some manual\nprocessing is done, for example, to extract conference details, like title, location and dates. I assume the URL is used as unique identifier. Additionally,\nthe conferences are not ‘tagged’ yet, which should be possible too, as Postgenomic.com already associates tags from blog items with articles mentioned in\nthat item. But this is likely a temporary ommision.</p>\n\n<p>I already saw <a href=\"https://web.archive.org/web/20060621192118/http://www.ched-ccce.org/confchem/\">ChemConf2006 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"https://web.archive.org/web/20060513202812/http://postgenomic.com/meetings.php#conference_id_6\">picked up <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> from an\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/02/free-online-chemconf-2006-conference.html\">earlier post <i class=\"fa-solid fa-recycle fa-xs\"></i></a> by me. 
Unfortunately, because it is an online conference, it does not show up on the map :( The following two conferences do have a physical location, and I hope they will appear on the map. If you wonder why I mention only these two, they are the two I will attend in the next 8 weeks, and both will have open source bio- and chemoinformatics software developers present (at least one, me).</p>\n\n<ul>\n  <li><a href=\"https://gw1-prod.nbic.nl/http://cms1-prod-inside.nbic.nl/home/events/20060424_NBICevent\">Netherlands Bioinformatics Conference <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>, April 24, Ede, The Netherlands</li>\n  <li><a href=\"http://web.archive.org/web/20060612215907/http://infochim.u-strasbg.fr/recherche/europeen_chemistry/index.php\">Workshop Chemoinformatics in Europe: Research and Teaching <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, 29 May - 1 June, Obernai, France</li>\n</ul>",
      "summary": "Conference season is nearing. And just in time, Postgenomic.com added a conferences map showing locations of upcoming and recently finished conferences. Oh boy, do I want to set this up for chemoinformatics too!",
      
      "date_published": "2006-04-14T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["postgenomic","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3xsm0-bt478",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/12/cdk-data-classes-and-change.html",
      "title": "The CDK data classes and change notifications",
      "content_html": "<p>The data classes of the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> are mutable, unlike those of\n<a href=\"http://sourceforge.net/projects/octet\">Octet</a>. This means that other classes may need to respond when\nthe content updates. For example, a render class. CDK’s <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/ChemObject.html\">ChemObject</a>\nprovides a <code class=\"language-plaintext highlighter-rouge\">notifyChanged()</code> and <code class=\"language-plaintext highlighter-rouge\">addListener()</code> methods for this. However, as was\n<a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=10001141&amp;forum_id=2178\">recently</a> pointed out,\nwhile this is useful in editors, such as <a href=\"http://jchempaint.sf.net/\">JChemPaint</a>, this is a performance killer in high-throughput\nsitations, such as descriptor calculation, or structure diagram generation runs.</p>\n\n<p>To address this, the <a href=\"http://svn.sourceforge.net/viewcvs.cgi/cdk/trunk/cdk/src/org/openscience/cdk/interfaces/IChemObject.java?view=log\">IChemObject</a>\ninterface has been extended with the methods <code class=\"language-plaintext highlighter-rouge\">setNotification(boolean)</code> and <code class=\"language-plaintext highlighter-rouge\">getNotification()</code>, which allow to temporarily\ndisable change notifications. There are no helper methods yet to disable it for a complete data structure, like\n<code class=\"language-plaintext highlighter-rouge\">ChemModelManipulator.setNotification(ChemModel, boolean)</code>, but I expect these to be written soon.</p>\n\n<p>Alternatively, special data classes may be used if notification is never needed for a special setup, for example, in case the QSAR descriptor calculation. 
In such cases, the new <a href=\"http://svn.sourceforge.net/viewcvs.cgi/cdk/trunk/cdk/src/org/openscience/cdk/nonotify/NoNotificationChemObjectBuilder.java?view=log\">NoNotificationChemObjectBuilder</a>\ncan be used:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">IChemObjectReader</span> <span class=\"n\">reader</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">MDLReader</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">FileInputStream</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">File</span><span class=\"o\">(</span><span class=\"s\">\"some.mol\"</span><span class=\"o\">)));</span>\n<span class=\"nc\">IChemObjectBuilder</span> <span class=\"n\">builder</span> <span class=\"o\">=</span> <span class=\"nc\">NoNotificationChemObjectBuilder</span><span class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">();</span>\n<span class=\"nc\">IMolecule</span> <span class=\"n\">molecule</span> <span class=\"o\">=</span> <span class=\"n\">reader</span><span class=\"o\">.</span><span class=\"na\">read</span><span class=\"o\">(</span><span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">());</span>\n<span class=\"c1\">// then perform some operation in which the molecule changes a lot</span>\n</code></pre></div></div>\n\n<p>The advantage is that you do not have to manually disable notification for each class you instantiate. This should give a considerable speed up, and I hope soon to give some statistics.</p>",
      "summary": "The data classes of the Chemistry Development Kit are mutable, unlike those of Octet. This means that other classes may need to respond when the content updates. For example, a render class. CDK’s ChemObject provides a notifyChanged() and addListener() methods for this. However, as was recently pointed out, while this is useful in editors, such as JChemPaint, this is a performance killer in high-throughput sitations, such as descriptor calculation, or structure diagram generation runs.",
      
      "date_published": "2006-04-12T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["cdk","cheminf","jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7nz8x-a7q09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/10/getting-jmols-cartoon-on-to-work-in.html",
      "title": "Getting Jmol&apos;s &apos;cartoon on&apos; to work in Bioclipse",
      "content_html": "<p><a href=\"https://web.archive.org/web/20060420034219/http://www.bioclipse.net/\">Bioclipse</a> 1.0 is to be released in May, and the cartoon on script command is\nstill not working in the <a href=\"http://www.jmol.org/\">Jmol</a> viewer. For those who do not know yet, <a href=\"http://www.eclipse.org/\">Bioclipse</a> is a cool Eclipse\nRCP based Java chemo-and bioinformatics workbench. To have a better idea what goes on inside Bioclipse, I wrote a new BioPolymer tree to show me the\nstrands in the protein. After <a href=\"http://bioclipse.blogspot.com/\">Ola</a> wrote code to show properties for IChemObject’s, I extended it with PDB properties\nfor the atoms, strands and monomers.</p>\n\n<p>The contents of the ChemTree view in the middle and the Properties view below that look fine:</p>\n\n<p><img src=\"https://media.springernature.com/full/springer-static/image/art%3A10.1186%2F1471-2105-8-59/MediaObjects/12859_2006_Article_1431_Fig4_HTML.jpg?as=webp\" alt=\"\" /></p>\n\n<p>So I’ll have to dig a bit further.</p>",
      "summary": "Bioclipse 1.0 is to be released in May, and the cartoon on script command is still not working in the Jmol viewer. For those who do not know yet, Bioclipse is a cool Eclipse RCP based Java chemo-and bioinformatics workbench. To have a better idea what goes on inside Bioclipse, I wrote a new BioPolymer tree to show me the strands in the protein. After Ola wrote code to show properties for IChemObject’s, I extended it with PDB properties for the atoms, strands and monomers.",
      
      "date_published": "2006-04-10T00:00:00+00:00",
      "date_modified": "2024-05-26T00:00:00+00:00",
      "tags": ["bioclipse","jmol","protein"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tzwfe-ww931",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html",
      "title": "Mining the KEGG pathway database with self-organizing maps",
      "content_html": "<p>The <a href=\"https://en.wikipedia.org/wiki/Self_organizing_map\">Self-organizing map</a> (SOM) is a popular (again) and intuitive non-linear mapping\nmethod: it transforms a multidimensional space into two dimensions (normally: they are so easy to visualize). Latino and\n<a href=\"http://www.dq.fct.unl.pt/staff/jas/\">Aires-de-Sousa</a> published a paper that uses this method to analyze the whole\n<a href=\"http://www.genome.jp/kegg/pathway.html\">KEGG pathway database</a>: <em>Genome-Scale Classification of Metabolic Reactions: A Chemoinformatics\nApproach</em> (DOI: <a href=\"https://doi.org/10.1002/anie.200503833\">anie.200503833</a>).</p>\n\n<p>The method is based on earlier work by Zhang and Aires-de-Sousa: <em>Structure-Based Classification of Chemical Reactions without Assignment\nof Reaction Centers</em> (DOI: <a href=\"https://doi.org/10.1021/ci0502707\">10.1021/ci0502707</a>). A non-trivial feature of the suggested method is the\nuse of two SOMs. The first maps the reaction onto a fixed-length vector (coined MOLMAP), which is used as input vector for the second map.\nThis later map is used to cluster the KEGG reactions on a purely chemical basis. The resemblence with the\n<a href=\"https://en.wikipedia.org/wiki/EC_number\">EC numbering system</a> is striking.</p>",
      "summary": "The Self-organizing map (SOM) is a popular (again) and intuitive non-linear mapping method: it transforms a multidimensional space into two dimensions (normally: they are so easy to visualize). Latino and Aires-de-Sousa published a paper that uses this method to analyze the whole KEGG pathway database: Genome-Scale Classification of Metabolic Reactions: A Chemoinformatics Approach (DOI: anie.200503833).",
      
      "date_published": "2006-04-04T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["kegg","chemometrics"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1002/ANIE.200503833", "doi": "10.1002/ANIE.200503833"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI0502707", "doi": "10.1021/CI0502707"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/we1qn-tq260",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/02/uncertainty-in-nmr-based-3d-protein.html",
      "title": "Uncertainty in NMR based 3D protein models",
      "content_html": "<p>While I was working on implementing proper author-given chain IDs in <a href=\"http://www.pdb.org/\">PDB</a> structures for\n<a href=\"http://www.jmol.org/\">Jmol</a>’s mmCIF reader today, I thought it was interesting to mention the recent article\n<em>Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors by Nabuurs</em>\n(DOI:<a href=\"http://dx.doi.org/10.1371/journal.pcbi.0020009\">10.1371/journal.pcbi.0020009</a>, open access), working at the\n<a href=\"http://www.cmbi.ru.nl/\">CMBI</a>, two floors away from my former working location at the\n<a href=\"https://www.ru.nl/\">Radboud University Nijmegen</a>.</p>\n\n<p>Nabuurs discusses in this article the uncertainties that come with NMR derived 3D molecular structures of proteins.\nThese studies do not give factual data on atomic coordinates, but generally give facts about interatomic distances.\nSolving the 3D geometry is then an optimization problem where the task is to find the 3D geometry that best\nreproduces the factual interatomic distances.</p>\n\n<p>Now, this optimization has many closeby, i.e. 
in terms of matching the experimental data, minima, corresponding,\npossibly, to quite different structures.</p>\n\n<p>This is nicely demonstrated in the article, by comparing the folds of <a href=\"http://www.pdb.org/pdb/explore.do?structureId=1Y4O\">1Y4O</a>\nand <a href=\"http://www.pdb.org/pdb/explore.do?structureId=1TGQ\">1TGQ</a>, as shown in the figure below\n(<a href=\"http://www.plos.org/oa/index.html\">CCAL</a> license):</p>\n\n<p><img src=\"/assets/images/pcbi.0020009.g001.png\" alt=\"Figure 1 from the article: Sequence and Structure Ensembles of Two DLC2A Structures.\" /></p>\n\n<p>It is interesting to note that 1TGQ got replaced by <a href=\"http://www.pdb.org/pdb/explore.do?structureId=2B95\">2B95</a> about the same\ntime the article by Nabuurs was published, which shows a 3D model that is homologous with that of 1Y4O, and different from\nthat in the Nabuurs article.</p>",
      "summary": "While I was working on implementing proper author-given chain IDs in PDB structures for Jmol’s mmCIF reader today, I thought it was interesting to mention the recent article Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors by Nabuurs (DOI:10.1371/journal.pcbi.0020009, open access), working at the CMBI, two floors away from my former working location at the Radboud University Nijmegen.",
      
      "date_published": "2006-04-02T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["pdb","crystal","pdb","cif"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1371/JOURNAL.PCBI.0020009", "doi": "10.1371/JOURNAL.PCBI.0020009"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3en08-3zc34",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/02/free-online-chemconf-2006-conference.html",
      "title": "Free online ChemConf 2006 conference",
      "content_html": "<p>Internet has the nice feature of bringing together people. This has helped many open source projects in the past. But it is also a\nconvenient and cheap way to have conferences. Next month, the\n<a href=\"http://web.archive.org/web/20060213124001/http://www.ched-ccce.org/confchem/\">ChemConf 2006 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nconference will be held, and interested people only need to subscribe to a mailing list to participate.</p>\n\n<p>The topic of this years ChemConf is Web-Based Applications for Chemical Education. At least three posters will show the use of\nJava applets in chemistry education, using <a href=\"https://jmol.org/\">Jmol</a>, <a href=\"http://jchempaint.sourceforge.net/\">JChemPaint</a> and\n<a href=\"http://jspecview.sourceforge.net/\">JSpecView</a>. I am (co-)author of two of them.</p>\n\n<p>Again, participation is free. So join in!</p>",
      "summary": "Internet has the nice feature of bringing together people. This has helped many open source projects in the past. But it is also a convenient and cheap way to have conferences. Next month, the ChemConf 2006 conference will be held, and interested people only need to subscribe to a mailing list to participate.",
      
      "date_published": "2006-04-02T00:00:00+00:00",
      "date_modified": "2024-02-19T00:00:00+00:00",
      "tags": ["conference","jmol","jchempaint","education"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hysd0-wvc09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/31/inchis-in-latex-and-cdk-news.html",
      "title": "InChI&apos;s in LaTex and CDK News",
      "content_html": "<p>An <a href=\"http://www.iupac.org/inchi/\">InChI</a> (or see the <a href=\"http://www.iupac.org/inchi/\">FAQ</a>) is a line notation\nfor a molecular structure that was recently developed by the <a href=\"http://www.nist.gov/\">NIST</a> and the\n<a href=\"http://www.iupac.org/\">IUPAC</a>. Principally they can be applied to protein too (see below), but because\nproteins would give lenghty InChI’s and are quite well defined in terms of connectivity anyway, those can\nbetter be described by their amino acid sequence.</p>\n\n<p>The March 2006 issue of <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/\">CDK News</a>, the\n<a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> project newsletter, will be\n<a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=124796\">released</a> later today,\nand had, for the second time, the requirment that authors provide InChI’s for molecular structures mentioned in the articles.\nDifferent from the previous issue is how InChI’s are marked up in LaTeX. 
I’ve set up an <code class=\"language-plaintext highlighter-rouge\">\\inchi{}</code>\ncommand for this that automatically creates a <a href=\"http://www.google.com/\">Google</a> search query as a link behind the InChI:</p>\n\n<div class=\"language-latex highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">\\newcommand</span><span class=\"p\">{</span>\n  <span class=\"k\">\\inchi</span><span class=\"p\">}</span>[1]<span class=\"p\">{</span><span class=\"k\">\\href</span><span class=\"p\">{</span>http://www.google.com/search?q=#1<span class=\"p\">}</span>\n                  <span class=\"p\">{</span><span class=\"k\">\\normalfont\\texttt</span><span class=\"p\">{</span>InChI=#1<span class=\"p\">}</span>\n            <span class=\"p\">}</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>Now, googling for InChI’s only works if one removes the <code class=\"language-plaintext highlighter-rouge\">InChI=</code> part of the InChI. As an example I will show how it works\nfor methane. 
The InChI for this compound is <code class=\"language-plaintext highlighter-rouge\">InChI=1/CH4/h1H4</code>, so in LaTex one enters <code class=\"language-plaintext highlighter-rouge\">\\inchi{1/CH4/h1H4}</code>.\nThis will create a link like: <a href=\"http://www.google.com/search?q=1/CH4/h1H4\">InChI=1/CH4/h1H4</a>.</p>\n\n<p>BTW, if you are interested in InChI’s for proteins, here is the InChI for <a href=\"http://www.pdb.org/pdb/explore.do?structureId=1CRN\">1CRN</a>,\ncreated with <a href=\"http://openbabel.sourceforge.net/\">OpenBabel</a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>InChI=1/C202H439N55O64S6/c1-28-92(12)149-188(308)237-127-84-323-324-\n85-128(176(296)225-114(46-37-63-212-202(209)210)165(285)232-122(69-89(6)7)195(315)253-64-38-\n47-132(253)179(299)215-80-143(274)241-158(107(27)265)199(319)257-68-42-51-136(257)182(302)226-\n115(60-61-144(275)276)164(284)218-100(20)162(282)244-149)236-187(307)148(91(10)11)242-172(292)\n120(74-138(204)269)229-168(288)117(70-108-43-34-33-35-44-108)228-169(289)119(73-137(203)268)\n230-173(293)124(81-258)234-166(286)113(45-36-62-211-201(207)208)224-159(279)99(19)221-186(306)\n147(90(8)9)243-189(309)150(93(13)29-2)245-174(294)125(82-259)235-183(303)135-50-41-66-255(135)\n196(316)130-87-326-322-83-126(223-142(273)79-216-185(305)154(103(23)261)251-171(291)118(72-\n110-54-58-112(267)59-55-110)231-192(312)155(104(24)262)250-163(283)101(21)220-175(127)295)178\n(298)246-151(94(14)30-3)190(310)247-152(95(15)31-4)191(311)248-153(96(16)32-5)198(318)256-67-\n40-49-134(256)181(301)213-77-140(271)217-97(17)161(281)249-156(105(25)263)194(314)240-131\n(88-327-325-86-129(177(297)239-130)238-193(313)157(106(26)264)252-184(304)146(206)102(22)260)197\n(317)254-65-39-48-133(254)180(300)214-78-141(272)222-121(76-145(277)278)170(290)227-116(71-\n109-52-56-111(266)57-53-109)167(287)219-98(18)160(280)233-123(200(320)321)75-139(205)270/h89-\n202,211-252,258-321H,28-88,203-
210H2,1-27H3/t92-,93-,94-,95-,96-,97-,98-,99-,100-,101-,102+,\n103+,104+,105+,106+,107+,109-,110-,111+,112+,113-,114-,115-,116-,117-,118-,119-,120-,121-,122-,\n123-,124-,125-,126-,127-,128-,129-,130-,131-,132-,133-,134-,135-,136-,137?,138-,139-,140-,141+,\n142-,143+,146-,147-,148-,149-,150-,151-,152-,153-,154-,155-,156-,157-,158-,159+,160?,161-,162?,\n163-,164-,165?,166+,167?,168+,169+,170+,171-,172+,173+,174+,175?,176-,177?,178+,179+,180-,\n181?,182-,183+,184?,185+,186+,187-,188-,189-,190+,191?,192-,193?,194-,195-,196-,197-,198-,199-/m0/s1\n</code></pre></div></div>",
      "summary": "An InChI (or see the FAQ) is a line notation for a molecular structure that was recently developed by the NIST and the IUPAC. Principally they can be applied to protein too (see below), but because proteins would give lenghty InChI’s and are quite well defined in terms of connectivity anyway, those can better be described by their amino acid sequence.",
      
      "date_published": "2006-03-31T00:00:00+00:00",
      "date_modified": "2024-03-10T00:00:00+00:00",
      "tags": ["inchi","cdk","cdknews","iupac","nist","google","protein","openbabel"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nn7ag-7fp72",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/25/cologne-university-bioinformatics.html",
      "title": "The Cologne University BioInformatics Center (CUBIC)",
      "content_html": "<p>As of April 3, I will be working as postdoc in the group of <a href=\"http://almost.cubic.uni-koeln.de/jrg/\">Christoph Steinbeck</a>\nat the <a href=\"http://www.cubic.uni-koeln.de/\">Cologne University BioInformatics Center</a>, or simply CUBIC, for a year. Though\nno exact plans have been decided upon, the work will include <a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://www.xml-cml.org/\">CML</a>,\nontologies, <a href=\"http://www.bioclipse.net/\">Bioclipse</a>, semantic web technologies, <a href=\"http://www.jmol.org/\">Jmol</a>, and other\ninteresting things. Research areas will at least include <a href=\"http://qsar.sf.net/\">QSAR</a>, but I hope to touch bits of\nbioinformatics too.</p>",
      "summary": "As of April 3, I will be working as postdoc in the group of Christoph Steinbeck at the Cologne University BioInformatics Center, or simply CUBIC, for a year. Though no exact plans have been decided upon, the work will include CDK, CML, ontologies, Bioclipse, semantic web technologies, Jmol, and other interesting things. Research areas will at least include QSAR, but I hope to touch bits of bioinformatics too.",
      
      "date_published": "2006-03-25T00:00:00+00:00",
      "date_modified": "2024-03-10T00:00:00+00:00",
      "tags": ["qsar","ontology","cml","jmol","bioclipse","semweb","career"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2f962-tpe07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/18/how-to-make-money-from-open-source.html",
      "title": "How to make money from Open Source scientific software",
      "content_html": "<p><a href=\"http://www.openscience.org/blog/\">Dan</a> (the original <a href=\"http://www.jmol.org/\">Jmol</a> author) has an interesting blog series:\nHow to make money from Open Source scientific software <a href=\"http://www.openscience.org/blog/?p=164\">I</a>,\n<a href=\"http://www.openscience.org/blog/?p=165\">II</a> and <a href=\"http://www.openscience.org/blog/?p=166\">III</a>. Three more blog items are\nin the planning. The deal with how to make money from open source scientific software. He wants to be able to\nskeptically review the software in his field, hence open source. But open source software development, at least in\nchemistry, needs funding, because there are too few people working on such software on a voluntary basis.</p>\n\n<p>The articles discuss possible scenarios. Article I discusses ‘Sell hardware’ that comes with open source software, and\narticle II discusses the ‘Sell services’ scenario, which still works in the GNU/Linux OS world. He argues that\nselling support does not fit the chem-bla-ics world: <em>“First, scientific software targets a relatively small group of\nusers, and at the same time, the development and support costs are often quite large.”</em> and <em>“Why would a researcher spend\n$10000 on a support contract if the problem could be solved by throwing a graduate student at the open source version\nof the code for a few months?”</em> Interesting arguments indeed.</p>\n\n<p>Instead, he suggests, the service sold should be knowledge. The open source based company should sell knowledge,\nshould solve customer problems using open source software. Each problem will come with specific needs, allowing indirect\nfunding of open source development. And, yes, this is indeed how open source chemo-/bioinformatics software is\ncurrently development: as a mean to solve scientific challenging problems.</p>\n\n<p>I’m looking forward to his next articles in this series.</p>",
      "summary": "Dan (the original Jmol author) has an interesting blog series: How to make money from Open Source scientific software I, II and III. Three more blog items are in the planning. The deal with how to make money from open source scientific software. He wants to be able to skeptically review the software in his field, hence open source. But open source software development, at least in chemistry, needs funding, because there are too few people working on such software on a voluntary basis.",
      
      "date_published": "2006-03-18T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["jmol","openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/364zg-cwy39",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/16/pdb-protein-database-uses-jmol.html",
      "title": "The PDB protein database uses Jmol",
      "content_html": "<p>The beta has been using <a href=\"http://www.jmol.org/\">Jmol</a> as one of the viewers for ages already, but this beta\nis no longer: it’s the new interface for the <a href=\"http://www.pdb.org/\">PDB database</a>.</p>",
      "summary": "The beta has been using Jmol as one of the viewers for ages already, but this beta is no longer: it’s the new interface for the PDB database.",
      
      "date_published": "2006-03-16T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/29xwa-ehq10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/12/open-source-in-drug-discovery.html",
      "title": "Open source in drug discovery",
      "content_html": "<p>Geldenhuys et al. published an article in <a href=\"http://www.sciencedirect.com/science/journal/13596446\">Drug Discovery Today</a> titled\n<em>Optimizing the use of open-source software applications in drug discovery</em> (DOI:<a href=\"https://doi.org/10.1016/S1359-6446(05)03692-5\">10.1016/S1359-6446(05)03692-5</a>),\nand approached the review from a bench chemist point of view. Unfortunately, he discusses free, but closed source, program in one go.</p>\n\n<p>He discusses the advantages and problems with opensource, and mentions the often lacking user-friendly GUI (true),\nand the the lack of literature to validate the program. It was unclear to me wether the last argument applied to the free tools,\nor to the open source programs; I thought the open-source projects like the <a href=\"http://cdk.sf.net/\">CDK</a>,\n<a href=\"http://joelib.sf.net/\">JOELib</a>, <a href=\"http://www.jmol.org/\">Jmol</a> and <a href=\"http://pymol.sf.net/\">PyMol</a> were quite strong in this area,\nat least compared to the commercial software I have seen.</p>",
      "summary": "Geldenhuys et al. published an article in Drug Discovery Today titled Optimizing the use of open-source software applications in drug discovery (DOI:10.1016/S1359-6446(05)03692-5), and approached the review from a bench chemist point of view. Unfortunately, he discusses free, but closed source, program in one go.",
      
      "date_published": "2006-03-12T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["drugdiscovery","openscience"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1016/S1359-6446(05)03692-5", "doi": "10.1016/S1359-6446(05)03692-5"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/83mgj-93w85",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/11/more-chemistry-in-kde.html",
      "title": "More chemistry in KDE",
      "content_html": "<p>After <a href=\"http://edu.kde.org/kalzium/\">Kalzium</a> and\n<a href=\"https://web.archive.org/web/20150930165836/http://kde-apps.org/content/show.php?content=28995\">kfile_chemical <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nKDE has now be extended with kparts for 3D structure and spectrum display:\n<a href=\"https://web.archive.org/web/20130721124532/http://www.kde-apps.org/content/show.php?content=36260\">Kryomol <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nIt is written in C++ and licensed GPL. It supports several chemistry formats, among which quantum chemical formats like Gaussian03,\nNwChem and ACES, and 3D structures as MDL molefile and XYZ.</p>",
      "summary": "After Kalzium and kfile_chemical , KDE has now be extended with kparts for 3D structure and spectrum display: Kryomol . It is written in C++ and licensed GPL. It supports several chemistry formats, among which quantum chemical formats like Gaussian03, NwChem and ACES, and 3D structures as MDL molefile and XYZ.",
      
      "date_published": "2006-03-11T00:00:00+00:00",
      "date_modified": "2023-09-16T00:00:00+00:00",
      "tags": ["kde","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ef1pm-6g994",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/11/classpath-090-makes-jmol-application.html",
      "title": "Classpath 0.90 makes the Jmol application run",
      "content_html": "<p>A few days back, <a href=\"http://www.gnu.org/software/classpath/announce/20060306.html\">Classpath 0.90</a> was released, the first release after the 0.20 release. Earlier Classpath releases\n<a href=\"/blog/2005/11/27/open-source-swing-jmol-renderer-runs.html\">could run the rendering engine <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nbut <a href=\"/blog/2005/11/18/goal-live-chemblaics-cd.html\">running the application failed so far <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Today it hit Debian unstable, so upgrade my sid32 chroot and had <a href=\"http://www.cacaojvm.org/\">Cacao</a> run Jmol.\nI had some memory issues opening a small molecule (4-methyl-2-pentyne),\nand the rendering speed was a factor 100 or so slower than Sun’s JVM, but it runs!</p>\n\n<p>Using the command <code class=\"language-plaintext highlighter-rouge\">cacao -Xmx512M -jar Jmol.jar triplebond.mol</code> I got results.</p>\n\n<p>Note the exceptions copied to the console. 
Many thanx to the Classpath team!</p>\n\n<p>The stacktrace:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>The full stack trace:\n\njava.lang.IllegalArgumentException: width&lt;<span class=\"o\">=</span>0 height&lt;<span class=\"o\">=</span>0\nat java.awt.image.SampleModel.&lt;init&gt; <span class=\"o\">(</span>SampleModel.java:63<span class=\"o\">)</span>\nat java.awt.image.SinglePixelPackedSampleModel.&lt;init&gt; <span class=\"o\">(</span>SinglePixelPackedSampleModel.java:61<span class=\"o\">)</span>\nat java.awt.image.SinglePixelPackedSampleModel.&lt;init&gt; <span class=\"o\">(</span>SinglePixelPackedSampleModel.java:55<span class=\"o\">)</span>\nat org.jmol.g3d.Swing3D.allocateImage <span class=\"o\">(</span>Swing3D.java:65<span class=\"o\">)</span>\nat org.jmol.g3d.Platform3D.allocateBuffers <span class=\"o\">(</span>Platform3D.java:102<span class=\"o\">)</span>\nat org.jmol.g3d.Graphics3D.beginRendering <span class=\"o\">(</span>Graphics3D.java:697<span class=\"o\">)</span>\nat org.jmol.viewer.Viewer.render1 <span class=\"o\">(</span>Viewer.java:1840<span class=\"o\">)</span>\nat org.jmol.viewer.Viewer.renderScreenImage <span class=\"o\">(</span>Viewer.java:1798<span class=\"o\">)</span>\nat org.openscience.jmol.app.DisplayPanel.paint <span class=\"o\">(</span>DisplayPanel.java:100<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span 
class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JLayeredPane.paint <span class=\"o\">(</span>JLayeredPane.java:647<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintDoubleBuffered <span class=\"o\">(</span>JComponent.java:1782<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1555<span class=\"o\">)</span>\nat java.awt.Container<span class=\"nv\">$GfxPaintVisitor</span>.visit <span class=\"o\">(</span>Container.java:1888<span class=\"o\">)</span>\nat java.awt.Container.visitChild <span class=\"o\">(</span>Container.java:1703<span class=\"o\">)</span>\nat java.awt.Container.visitChildren <span class=\"o\">(</span>Container.java:1674<span class=\"o\">)</span>\nat java.awt.Container.paint <span class=\"o\">(</span>Container.java:770<span class=\"o\">)</span>\nat gnu.java.awt.peer.gtk.GtkWindowPeer.handleEvent <span class=\"o\">(</span>GtkWindowPeer.java:268<span class=\"o\">)</span>\nat java.awt.Component.dispatchEventImpl <span class=\"o\">(</span>Component.java:4968<span class=\"o\">)</span>\nat java.awt.Container.dispatchEventImpl <span class=\"o\">(</span>Container.java:1723<span class=\"o\">)</span>\nat java.awt.Window.dispatchEventImpl <span class=\"o\">(</span>Window.java:626<span class=\"o\">)</span>\nat 
java.awt.Component.dispatchEvent <span class=\"o\">(</span>Component.java:2320<span class=\"o\">)</span>\nat java.awt.EventQueue.dispatchEvent <span class=\"o\">(</span>EventQueue.java:474<span class=\"o\">)</span>\nat java.awt.EventDispatchThread.run <span class=\"o\">(</span>EventDispatchThread.java:60<span class=\"o\">)</span>\nat java.lang.VMThread.run <span class=\"o\">(</span>VMThread.java:121<span class=\"o\">)</span>\n</code></pre></div></div>",
      "summary": "A few days back, Classpath 0.90 was released, the first release after the 0.20 release. Earlier Classpath releases could run the rendering engine , but running the application failed so far .",
      
      "date_published": "2006-03-11T00:00:00+00:00",
      "date_modified": "2023-09-24T00:00:00+00:00",
      "tags": ["jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xvhk1-q8r72",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/06/progress-with-cmlrss-plugin-for.html",
      "title": "Progress with CMLRSS plugin for Bioclipse",
      "content_html": "<p>With quite some help from <a href=\"http://bioclipse.blogspot.com/\">Ola</a> (thanx!), I made good progress with the\n<a href=\"https://web.archive.org/web/20160413181618/http://wiki.bioclipse.net/index.php?title=CMLRSS_plugin\">CMLRSS plugin <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nThe current result looks like:</p>\n\n<p><img src=\"/assets/images/Cmlrss_bioclipse.png\" alt=\"Screenshot of Bioclipse with an OPML file in the navigator on the left and some first extracted info.\" /></p>\n\n<p>A problem in the transition from Jumbo 5.0 to 5.1 is causing a problem so that it does not show a 3D model or 2D diagram, but that will follow soon.</p>",
      "summary": "With quite some help from Ola (thanx!), I made good progress with the CMLRSS plugin . The current result looks like:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Cmlrss_bioclipse.png",
      "date_published": "2006-03-06T00:00:00+00:00",
      "date_modified": "2023-09-16T00:00:00+00:00",
      "tags": ["bioclipse","cml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qft6n-ctj44",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/25/open-source-jmol-taking-over-world.html",
      "title": "Open source Jmol taking over the world",
      "content_html": "<p>Earlier I already <a href=\"2006-02-01-open-source-jmol-hits-student-text.markdown\">reported</a> that student text books were picking up\n<a href=\"http://www.jmol.org/\">Jmol</a> as 3D viewer. Now, <a href=\"http://www.nature.com/nsmb/index.html\">Nature Structural &amp; Molecular Biology</a> reports\n(DOI:<a href=\"https://doi.org/10.1038/nsmb0206-93\">10.1038/nsmb0206-93</a>) that they picked it up too, using\n<a href=\"http://firstglance.jmol.org/\">FirstGlance in Jmol</a> (thanx Peter, for reporting this on the\n<a href=\"http://blueobelisk.org/\">Blue Obelisk</a> <a href=\"http://hardly.cubic.uni-koeln.de/mailman/listinfo/blue-obelisk\">mailing list</a>!).\nAnd, thanx Eric, for acknowledging the hard work of the <a href=\"http://sourceforge.net/project/memberlist.php?group_id=23629\">Jmol developers</a>.</p>\n\n<p>An example article in this Nature publication is Crystal structure of the essential N-terminal domain of telomerase reverse transcriptase\nby Jacobs et al. (DOI:<a href=\"http://dx.doi.org/10.1038/nsmb1054\">10.1038/nsmb1054</a>) about the structure of a part of the telomerase reverse\ntranscriptase (FirstGlance: <a href=\"http://molvis.sdsc.edu/fgij/fg.htm?mol=2B2A\">2B2A</a>). You can easily <a href=\"http://www.google.com/search?q=FirstGlance+site%3Anature.com\">google</a>\nfor more articles as they get indexed.</p>\n\n<p>Note that FirstGlance is certainly not the only webinterface using Jmol! An overview of <a href=\"http://wiki.jmol.org/WebsitesUsingJmol\">websites using Jmol</a>\nis found in the <a href=\"http://wiki.jmol.org/\">Jmol wiki</a>. Those who are not convinced yet, please check out <a href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&amp;DB=pubmed\">PubMed</a>\nand search for Jmol there.</p>\n\n<p>And, yes, this makes me a proud Jmol developer!</p>",
      "summary": "Earlier I already reported that student text books were picking up Jmol as 3D viewer. Now, Nature Structural &amp; Molecular Biology reports (DOI:10.1038/nsmb0206-93) that they picked it up too, using FirstGlance in Jmol (thanx Peter, for reporting this on the Blue Obelisk mailing list!). And, thanx Eric, for acknowledging the hard work of the Jmol developers.",
      
      "date_published": "2006-02-25T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["jmol"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1038/nsmb0206-93", "doi": "10.1038/nsmb0206-93"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/nsmb1054", "doi": "10.1038/nsmb1054"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nth2m-yyk05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html",
      "title": "Hacking InChI support into postgenomic.com",
      "content_html": "<p>Earlier I <a href=\"2006-02-15-hot-articles-mining-semantic-web.markdown\">reported</a> about\n<a href=\"https://web.archive.org/web/20060303081952/https://postgenomic.com/\">postgenomic.com <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nand needed some diversion from my manuscript work (could no longer think straight about the article I’m working on). So time for\nsome reading up on new technologies. Timing was perfect, because the source code of postgenomic.com got just uploaded to\n<a href=\"http://sourceforge.net/projects/postgenomic\">SourceForge SVN</a>.</p>\n\n<p>Though the author marks it as not-well-documented and alpha, I was quite happy to see a clear modularisation, and good enough\ndocs to get me started with <a href=\"http://www.iupac.org/inchi/\">InChI</a> support: if it can do mining for papers on\n<a href=\"http://www.doi.org/\">DOIs</a>, then it can do mining for InChI’s too.</p>\n\n<p>It does not show which blog items cite this compound, not does it extract some molecular info from PubChem, but\nI’m happy with the result of four hours of hacking. BTW, the first two InChI’s are left overs from bad\nregular expressions :)</p>",
      "summary": "Earlier I reported about postgenomic.com , and needed some diversion from my manuscript work (could no longer think straight about the article I’m working on). So time for some reading up on new technologies. Timing was perfect, because the source code of postgenomic.com got just uploaded to SourceForge SVN.",
      
      "date_published": "2006-02-25T00:00:00+00:00",
      "date_modified": "2023-09-16T00:00:00+00:00",
      "tags": ["cb","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/tsanv-nyz36",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/24/novel-qsar-and-qspr-descriptors_24.html",
      "title": "Novel QSAR and QSPR descriptors?",
      "content_html": "<p>For the past few weeks I have been working on a review article, which will contain a section with new QSAR/QSPR descriptors\npublished in the period 2000-now. Here are a few:</p>\n\n<ul>\n  <li>2001: oxygen paths of length 3 <a href=\"https://doi.org/10.1021/ci000116e\">10.1021/ci000116e</a></li>\n  <li>2002: a molecular shape descriptor <a href=\"https://doi.org/10.1021/ci000100o\">10.1021/ci000100o</a></li>\n  <li>2003: molecular signature <a href=\"https://doi.org/10.1021/ci020345w\">10.1021/ci020345w</a></li>\n  <li>2004: 4D-fingerprint <a href=\"https://doi.org/10.1021/ci049898s\">10.1021/ci049898s</a></li>\n  <li>2005: summed NMR shift difference <a href=\"https://doi.org/10.1021/ci049643e\">10.1021/ci049643e</a></li>\n</ul>\n\n<p>If you know additional new descriptors, or feel like discussion one or more of the above, please leave a comment.</p>",
      "summary": "For the past few weeks I have been working on a review article, which will contain a section with new QSAR/QSPR descriptors published in the period 2000-now. Here are a few:",
      
      "date_published": "2006-02-24T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["qsar","cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI000116E", "doi": "10.1021/CI000116E"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI000100O", "doi": "10.1021/CI000100O"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI020345W", "doi": "10.1021/CI020345W"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI049898S", "doi": "10.1021/CI049898S"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1021/CI049643E", "doi": "10.1021/CI049643E"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nt2z0-80x71",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/22/blueobelisk-opensource-opendata-and.html",
      "title": "BlueObelisk: OpenSource, OpenData and OpenStandards",
      "content_html": "<p>OpenSource, OpenData and OpenStandards are not as strong in chemoinformatics as they are in bioinformatcs, where it is common knowledge\nthat sharing is a good. Today, the <a href=\"http://pubs3.acs.org/acs/journals/toc.page?incoden=jcisd8\">JCIM</a> published on the web an\n<a href=\"http://dx.doi.org/10.1021/ci050400b\">article</a> about the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> movement, which promotes these\nthree idealogies.</p>\n\n<p>Several open source projects participate, amongst which the <a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://www.jmol.org/\">Jmol</a>,\n<a href=\"http://joelib.sf.net/\">JOELib</a>, <a href=\"http://openbabel.sf.net/\">OpenBabel</a>, <a href=\"http://cml.sf.net/\">Chemical Markup Language</a>,\n<a href=\"http://bioclipse.net/\">Bioclipse</a> and <a href=\"http://cdk.sf.net/\">Kalzium</a>.</p>",
      "summary": "OpenSource, OpenData and OpenStandards are not as strong in chemoinformatics as they are in bioinformatcs, where it is common knowledge that sharing is a good. Today, the JCIM published on the web an article about the Blue Obelisk movement, which promotes these three idealogies.",
      
      "date_published": "2006-02-22T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["blue-obelisk","openscience","cdk","cml","bioclipse","kde","jmol"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI050400B", "doi": "10.1021/CI050400B"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/p37t7-7mz48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/18/blogging-chemistry-on-blogspotcom.html",
      "title": "Blogging chemistry on blogspot.com",
      "content_html": "<p>You might have read earlier posts in this blog on <a href=\"https://doi.org/10.1021/ci034244p\">CMLRSS</a>, and received a question today on how to integrate\nCMLRSS with blogs on blogspot.com. Now, <a href=\"http://www.ch.ic.ac.uk/rzepa/cmlrss_distrib/\">current CMLRSS feeds</a> are normally generated with customized\nscripts, often directly from a database.</p>\n\n<p>So, here’s my attempt to include CML in a blogspot.com blog. <a href=\"http://openbabel.sf.net/\">OpenBabel 2.0</a> can create good CML, for example for acetic acid:</p>\n\n<cml:molecule xmlns:cml=\"http://www.xml-cml.org/schema/cml2/core\">\n<cml:atomArray atomID=\"a1 a2 a3 a4\" elementType=\"C C O O\" formalCharge=\"0 0 0 0\" />\n<cml:bondArray atomRef1=\"a1 a2 a2\" atomRef2=\"a2 a3 a4\" order=\"1 2 1\" />\n</cml:molecule>\n\n<p>Nothing much to see, right? Well, that’s good, because it’s inserted as CML, not as anything readable, like this equivalent:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;cml:molecule</span> <span class=\"na\">xmlns:cml=</span><span class=\"s\">\"http://www.xml-cml.org/schema/cml2/core\"</span><span class=\"nt\">&gt;</span>\n<span class=\"nt\">&lt;cml:atomArray</span> <span class=\"na\">atomID=</span><span class=\"s\">\"a1 a2 a3 a4\"</span> <span class=\"na\">elementType=</span><span class=\"s\">\"C C O O\"</span> <span class=\"na\">formalCharge=</span><span class=\"s\">\"0 0 0 0\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;cml:bondArray</span> <span class=\"na\">atomRef1=</span><span class=\"s\">\"a1 a2 a2\"</span> <span class=\"na\">atomRef2=</span><span class=\"s\">\"a2 a3 a4\"</span> <span class=\"na\">order=</span><span class=\"s\">\"1 2 1\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;/cml:molecule&gt;</span>\n</code></pre></div></div>\n\n<p>I am curious how this will come out in the RSS feed. 
Maybe it is useful; please read the comments for additional notes.</p>",
      "summary": "You might have read earlier posts in this blog on CMLRSS, and I received a question today on how to integrate CMLRSS with blogs on blogspot.com. Now, current CMLRSS feeds are normally generated with customized scripts, often directly from a database.",
      
      "date_published": "2006-02-18T00:00:00+00:00",
      "date_modified": "2006-02-18T00:00:00+00:00",
      "tags": ["cml","semweb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zm41a-h0037",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/17/chemical-reactions-in-cml.html",
      "title": "Chemical reactions in CML",
      "content_html": "<p>Gemma Holiday’s article on CMLReact was published in the january issue of the <a href=\"http://pubs3.acs.org/acs/journals/toc.page?incoden=jcisd8\">JCIM</a>\n(doi:<a href=\"https://doi.org/10.1021/ci0502698\">10.1021/ci0502698</a>), which seems to be marked as sample issue right now. She used CMLReact as data format for\n<a href=\"http://www-mitchell.ch.cam.ac.uk/macie/\">MACiE <i class=\"fa-solid fa-link-slash fa-xs\"></i></a> (see doi:<a href=\"https://doi.org/10.1093/bioinformatics/bti693\">10.1093/bioinformatics/bti693</a>), a\ndatabase of 100 enzyme reactions, with fully annotated reaction mechanisms, making this an remarkable and insightfull database.</p>\n\n<p>Now, the nice thing is that this CML should be readable and renderable by the <a href=\"http://cdk.sf.net/\">CDK</a>, though the webinterface uses\nSVG and can be used using <a href=\"http://www.mozilla.com/firefox/\">FireFox</a> too.</p>",
      "summary": "Gemma Holiday’s article on CMLReact was published in the january issue of the JCIM (doi:10.1021/ci0502698), which seems to be marked as sample issue right now. She used CMLReact as data format for MACiE (see doi:10.1093/bioinformatics/bti693), a database of 100 enzyme reactions, with fully annotated reaction mechanisms, making this an remarkable and insightfull database.",
      
      "date_published": "2006-02-17T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cdk","cml","bioinfo"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci0502698", "doi": "10.1021/ci0502698"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1093/bioinformatics/bti693", "doi": "10.1093/bioinformatics/bti693"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5z2k2-86w60",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/15/hot-articles-mining-semantic-web.html",
      "title": "Hot articles; mining the semantic web",
      "content_html": "<p><a href=\"https://web.archive.org/web/20100730101359/http://www.molgen.mpg.de/~krause//\">Roland Krause <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"http://binf.twoday.net/stories/1572879/\">discussed today</a> in his blog <a href=\"http://binf.twoday.net/\">Notes from the Biomass</a> an interesting\nwebsite: <a href=\"https://web.archive.org/web/20060409032031/http://postgenomic.com/\">postgenomic.com <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nThis website, still marked BETA, mines blogs in the field of genomics and extract noteworthy statistics from it: which articles are cited in those blogs.</p>\n\n<p>For example, the most discussed article is Kai Wang’s <a href=\"https://doi.org/10.1038/439534a\">Gene-function wiki would let biologists pool worldwide resources <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin <a href=\"http://www.nature.com/\">Nature</a>. Additionally, postgenomic.com links to the DOI, PubMed and shows which blogs discuss the article.</p>\n\n<p>Wow. This really shows what happens when you start doing things in a semantic way!</p>\n\n<p>Now, what does this mean to the <em>molecular web</em>? We already have chemistry enriched blogs, i.e.\n<a href=\"https://doi.org/10.1021/ci034244p\">CMLRSS <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Now, let’s make a website\nthat mines chemoinformatics blogs in the same way that postgenomic.com does, and not stick with statistics for article citations,\nbut add statistics for citing molecules too! Start discussing the molecules we find in our CMLRSS feeds!</p>",
      "summary": "Roland Krause discussed today in his blog Notes from the Biomass an interesting website: postgenomic.com . This website, still marked BETA, mines blogs in the field of genomics and extract noteworthy statistics from it: which articles are cited in those blogs.",
      
      "date_published": "2006-02-15T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["bioinfo","cb","semweb"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI034244P", "doi": "10.1021/CI034244P"
             }
            ,
          
        
          
          
            { "url": "https://doi.org/10.1038/439534a", "doi": "10.1038/439534a"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b9e4k-8j009",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/13/kalzium-wins-award-carsten-niehaus.html",
      "title": "Kalzium Wins Award; Carsten Niehaus Interviewed",
      "content_html": "<p>I was very pleased to read today that <a href=\"http://edu.kde.org/kalzium/\">Kalzium</a>, one of the projects that participate in the\n<a href=\"http://blueobelisk.org/\">Blue Obelisk</a>, <a href=\"http://dot.kde.org/1139779450/\">got awarded</a>! Cheers, Carsten!</p>",
      "summary": "I was very pleased to read today that Kalzium, one of the projects that participate in the Blue Obelisk, got awarded! Cheers, Carsten!",
      
      "date_published": "2006-02-13T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["kde","blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2ky91-4yz69",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/06/test-suite-for-free-open-source-jvms.html",
      "title": "A test suite for free, open source JVMs",
      "content_html": "<p>This weekend I continued my work on getting the <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://www.jmol.org/\">Jmol</a> run with free, open source JVMs.\nReally, a lot works fine, as reported earlier in this blog: JChemPaint works and Jmol almost works (see the\n<a href=\"http://developer.classpath.org/mediation/FreeSwingTestApps\">Classpath’s FreeSwingTestApps wiki page</a>), and well over 95% of the CDK JUnit\ntests run without trouble too. So it comes down to identifying what does not run properly, and file bugs for this. For example,\n<a href=\"http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26101\">26101</a> and <a href=\"http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26108\">26108</a>.</p>\n\n<p>To make this finding bugs in Classpath and the free virtual machines easier, I have setup a CDK based test suite: the CDK\n<a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024\">OpenSource JVM Test Suite</a>. The idea is it can be used for regression testing,\nand identification of bugs in the virtual machines. It can also be used to do timing benchmarks, and I will report on both of these soon.</p>\n\n<p>But I first need to write some scripts to make nice XHTML pages. And, I have tweaked the CDK tests to skip known bugs, so that all reported\nbugs are actually caused by the virtual machine and the Java library that it uses, and not by a bug in the CDK itself.</p>",
      "summary": "This weekend I continued my work on getting the CDK and Jmol run with free, open source JVMs. Really, a lot works fine, as reported earlier in this blog: JChemPaint works and Jmol almost works (see the Classpath’s FreeSwingTestApps wiki page), and well over 95% of the CDK JUnit tests run without trouble too. So it comes down to identifying what does not run properly, and file bugs for this. For example, 26101 and 26108.",
      
      "date_published": "2006-02-06T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["linux","java","cdk","jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3vppj-ez166",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/06/tagging-blog-items.html",
      "title": "Tagging blog items",
      "content_html": "<p>If you have read <a href=\"/blog/2006/02/06/blog-about-bioinformatics-semantic-web.html\">my previous post</a>\nand visited that other blog, you might have noted the\n<a href=\"http://web.archive.org/web/20060207020403/http://www.technorati.com/tags/\">Technorati keywords <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nOr tags, really, as explained in this <a href=\"http://microformats.org/wiki/reltag\">rel=”tag”</a> microformat. Adding them\nto blog items, will enable indexing by Technorati, one of the bigger blog search engines. So, from now on,\nyou’ll see these tags in my items too, hoping they don’t get annoying. No idea, btw, how blog planets respond to them…\nFor the record, the tags I list below are general for my blog, and not for this blog item specifically.</p>\n\n<p>Update: The idea was discontinued at some point. The tags are now local to this blog.</p>",
      "summary": "If you have read my previous post and visited that other blog, you might have noted the Technorati keywords . Or tags, really, as explained in this rel=”tag” microformat. Adding them to blog items, will enable indexing by Technorati, one of the bigger blog search engines. So, from now on, you’ll see these tags in my items too, hoping they don’t get annoying. No idea, btw, how blog planets respond to them… For the record, the tags I list below are general for my blog, and not for this blog item specifically.",
      
      "date_published": "2006-02-06T00:00:00+00:00",
      "date_modified": "2023-08-19T00:00:00+00:00",
      "tags": ["cheminf","chemometrics","bioinfo","technorati"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2rv6n-k4s95",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/06/blog-about-bioinformatics-semantic-web.html",
      "title": "A blog about bioinformatics, semantic web, comics and social networks.",
      "content_html": "<p>I never got around to mentioning this blog, but <a href=\"http://plindenbaum.blogspot.com/\">YAKAFOKON</a> is a nice blog about, as the\ntitel already says, bioinformatics, the semantic web and social networks. Nice to read, and interesting comments on the\nfunction and features of the internet and how they relate to bioinformatics, and science in general. Recommended!</p>",
      "summary": "I never got around to mentioning this blog, but YAKAFOKON is a nice blog about, as the titel already says, bioinformatics, the semantic web and social networks. Nice to read, and interesting comments on the function and features of the internet and how they relate to bioinformatics, and science in general. Recommended!",
      
      "date_published": "2006-02-06T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/61fm6-pey06",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/04/skype-on-kubuntu-using-tiptel-usb.html",
      "title": "Skype on Kubuntu using a Tiptel USB telephone",
      "content_html": "<p>Because I wanted to test internet telephony I downloaded <a href=\"http://www.skype.com/\">Skype</a> and tried to get it to work on my\n<a href=\"http://www.kubuntu.org/\">Kubuntu</a> system. Unfortunately, the Skype version is only 1.2.0.18, and it does not work well with\n<code class=\"language-plaintext highlighter-rouge\">arts</code> :( That is, using <code class=\"language-plaintext highlighter-rouge\">artsdsp</code> it crashes with segfaults whenever I start even a chat, let alone a phone call. This\ncould be worked around by disabling sound in my KDE session, and then the <code class=\"language-plaintext highlighter-rouge\">/dev/dsp</code> is open again.</p>\n\n<p>Better even, I bought a USB telephone yesterday: a reasonably cheap <a href=\"http://www.tiptel.nl/\">Tiptel 115</a>, with\n<a href=\"http://www.skypefoon.nl/skype_telefoon_info.php/products_id/126\">Skype support</a>. Kubunty breezy recognized the USB device,\nadded a <code class=\"language-plaintext highlighter-rouge\">/dev/dsp1</code> and after running <code class=\"language-plaintext highlighter-rouge\">alsamixer</code> to raise the sound levels, it seems to work fine, though did not have an\nactual phone call yet :) I enabled KDE sound again, which is in the first device, and Skype runs on the second.\nNo more segfaults it seems.</p>",
      "summary": "Because I wanted to test internet telephony I downloaded Skype and tried to get it to work on my Kubuntu system. Unfortunately, the Skype version is only 1.2.0.18, and it does not work well with arts :( That is, using artsdsp it crashes with segfaults whenever I start even a chat, let alone a phone call. This could be worked around by disabling sound in my KDE session, and then the /dev/dsp is open again.",
      
      "date_published": "2006-02-04T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["linux"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3j6pf-yw823",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/02/dutch-google-news-themes-messed-up.html",
      "title": "Dutch Google News themes messed up",
      "content_html": "<p>Recently, a <a href=\"http://news.google.nl/\">Dutch version of Google News</a> was started, and might mean a replacement for\n<a href=\"http://nu.nl/\">nu.nl</a>. I do not like the verbose layout much, because it makes it more difficult to scan headlines.\nI do like the themes. Except for one.</p>\n\n<p>The English theme ‘Sci/Tech’ is Wetenschap in the Dutch version, or plain Science. And it annoys me to read IT headlines\nwhen looking up scientific news. Is a IE 7 beta really science, or did the translators mess up? (If any Google employee\nis reading this: please split up those two themes.)</p>",
      "summary": "Recently, a Dutch version of Google News was started, and might mean a replacement for nu.nl. I do not like the verbose layout much, because it makes it more difficult to scan headlines. I do like the themes. Except for one.",
      
      "date_published": "2006-02-02T00:00:00+00:00",
      "date_modified": "2023-08-16T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1amt8-5me42",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/01/open-source-jmol-hits-student-text.html",
      "title": "Open source Jmol hits student text book Biochemistry",
      "content_html": "<p>Today I received news on the <a href=\"http://sourceforge.net/mail/?group_id=23629\">Jmol user list</a> that Lubert Stryer’s\n<a href=\"https://www.macmillanlearning.com/college/us/product/Biochemistry/p/1319333621\">Biochemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a> replaced the\n<a href=\"https://en.wikipedia.org/wiki/MDL_Chime\">proprietary Chime <i class=\"fa-solid fa-recycle fa-xs\"></i></a> with the open source\n<a href=\"http://www.jmol.org/\">Jmol</a>. The third edition from which I learned biochemistry in my first year at the university did not feature a CD with live\nfigures, but I am very thrilled to see a program on which I have actively programmed hit a text book I used myself in the past.</p>",
      "summary": "Today I received news on the Jmol user list that Lubert Stryer’s Biochemistry replaced the proprietary Chime with the open source Jmol. The third edition from which I learned biochemistry in my first year at the university did not feature a CD with live figures, but I am very thrilled to see a program on which I have actively programmed hit a text book I used myself in the past.",
      
      "date_published": "2006-02-01T00:00:00+00:00",
      "date_modified": "2023-08-16T00:00:00+00:00",
      "tags": ["jmol","publishing"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/h25t3-mpk14",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/27/1d-nmr-spectra-do-not-work-in-qspr.html",
      "title": "1D NMR Spectra do not work in QSPR",
      "content_html": "<p>About two years ago a student started with me to work on the use of 1D NMR and IR spectra in quantitative structure-activity relationship\n(QSAR) work, with the goal to show that these spectra contain 3D information relevent to QSAR models. It is known that these spectra\ndepend on the 3D conformation of the molecule.</p>\n\n<p>Half a year later we concluded that from the data which we started with (48 compounds with binding affinity), no conclusions could be drawn\nwhat so ever: no statistically sound models could be build at all. So, we composed three larger data sets. These sets, all QSPR data sets,\ndid give us models, but all the spectra based models were worse than a <a href=\"http://web.archive.org/web/20080113162439/http://www.talete.mi.it/dragon_net.htm\">Dragon <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\ndescriptor based model using the same number of variables, without doing any variable selection.</p>\n\n<p>I presented this work at the 7th <a href=\"https://iccs-nl.org/\">ICCS <i class=\"fa-solid fa-recycle fa-xs\"></i></a> in Noordwijkerhout half a year ago, and now got published in the JCIM: DOI\n<a href=\"https://doi.org/10.1021/ci050282s\">10.1021/ci050282s</a>. Comments on this article are <strong><em>most</em></strong> welcome!</p>",
      "summary": "About two years ago a student started with me to work on the use of 1D NMR and IR spectra in quantitative structure-activity relationship (QSAR) work, with the goal to show that these spectra contain 3D information relevent to QSAR models. It is known that these spectra depend on the 3D conformation of the molecule.",
      
      "date_published": "2006-01-27T00:00:00+00:00",
      "date_modified": "2025-02-15T00:00:00+00:00",
      "tags": ["cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/CI050282S", "doi": "10.1021/CI050282S"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rn78z-r7j37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/22/trouble-running-cdk-junit-tests-with.html",
      "title": "Trouble running the CDK JUnit tests with Cacao and Kaffe",
      "content_html": "<p>Because I am still looking forward to testing CDK against the latest <a href=\"http://gnu.wildebeest.org/diary/index.php?p=147\">Classpath 0.20</a>,\nI downloaded cacao 0.94-1 for Debian sid, then tried to compile CDK with it:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">JAVA_HOME</span><span class=\"o\">=</span>/usr/lib/jvm/cacao ant <span class=\"nt\">-Dbuild</span>.compiler<span class=\"o\">=</span>gcj clean test-all\n</code></pre></div></div>\n\n<p>But that hangs at some point with zero load. I have no idea what is going on there. I’ve spoken with twisti on the\n#classpath IRC channel, and he helped me run the compile with gdb, which indicated that at some point all threads were waiting.</p>\n\n<p>I also tried it with kaffe 1.1.6.91-2 in sid, but now with a XML parser in the CLASSPATH, as Dalibor in\n<a href=\"/blog/2006/01/06/open-source-java-tool-chain-cdk.html\">a previous blog item suggested <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">export </span><span class=\"nv\">CLASSPATH</span><span class=\"o\">=</span>/usr/share/java/xercesImpl.jar:xmlParserAPIs.jar\n<span class=\"nv\">JAVA_HOME</span><span class=\"o\">=</span>/usr/lib/kaffe ant <span class=\"nt\">-Dbuild</span>.compiler<span class=\"o\">=</span>gcj clean test-all\n</code></pre></div></div>\n\n<p>But that failed too with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">test</span>:\n    <span class=\"o\">[</span>junit] Running org.openscience.cdk.test.CDKTests\n    <span class=\"o\">[</span>junit] kaffe-bin: /home/mkoch/debian/kaffe/kaffe-1.1.6.91/build-tree/kaffe-1.1.6.91/kaffe/kaffevm/jit3/machine.c:276: translate: Assertion <span class=\"sb\">`</span>reinvoke <span 
class=\"o\">==</span> <span class=\"nb\">false</span><span class=\"s1\">' failed.\n    [junit] Test org.openscience.cdk.test.CDKTests FAILED\n</span></code></pre></div></div>\n\n<p>It did work previously :(</p>\n\n<p>OK, to reproduce this yourself, you need to check out CDK from CVS (hoping that anonymous CVS is reasonably in sync, and online) with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>cvs <span class=\"nt\">-d</span>:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cdk login\ncvs <span class=\"nt\">-z3</span> <span class=\"nt\">-d</span>:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cdk co <span class=\"nt\">-P</span> cdk\n</code></pre></div></div>",
      "summary": "Because I am still looking forward to testing CDK against the latest Classpath 0.20, I downloaded cacao 0.94-1 for Debian sid, then tried to compile CDK with it:",
      
      "date_published": "2006-01-22T00:00:00+00:00",
      "date_modified": "2023-08-11T00:00:00+00:00",
      "tags": ["cdk","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hgjn2-w5e63",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/19/free-at-last.html",
      "title": "Free at last!",
      "content_html": "<p>Free at last! Well, not quite yet, but close enough anyway: my PhD contract has ended; last friday was my last working day, which my\ncollegues and I celebrated with a visit to Nijmegen oldest bar, <a href=\"https://indeblaauwehand.nl/in-de-blaauwe-hand/\">In de Blauwe Hand <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBut I still have my manuscript to finish. This formally ends a period of almost 12.5 years at the <a href=\"http://ru.nl/\">Radboud University Nijmegen</a>.</p>\n\n<p>Starting last monday I’m at home, trying to get things finished as soon as possible. Mostly working on my laptop, remote logged in into\nour desktop machine downstairs. A good ADSL (170kB downstream) helps a lot too, and the proxy on my university machine allows me to\naccess the full access journals of my university.</p>\n\n<p>I’m trying to dome some open source chemoinformatics in between writing, and my current QSAR research actually allows me to do some\nfeature enhancement in CDK’s QSAR package too. Today, I hope to write and finish a <a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=9476956&amp;forum_id=2178\">config file architecture <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>\nthat allow fine tuning which QSAR descriptors should be calculated. I anticipate a default config files to be distributed.</p>\n\n<p>Additionally, I will try to finish running teh CDK JUnit test against <a href=\"http://gnu.wildebeest.org/diary/index.php?p=147\">Classpath 0.20</a>,\nwhich 98% of Java 1.4.2 covered, and the limited support for HTML rendering is most of this last 2%. The Classpath progress has\nreally amazed me over the last few weeks. I have not tested Jmol and JChemPaint against the latest open source java tools, but will\ntry to do that before I go on holiday next week. Results with 0.19 were very promising, as I reported in earlier blog entries.</p>",
      "summary": "Free at last! Well, not quite yet, but close enough anyway: my PhD contract has ended; last friday was my last working day, which my collegues and I celebrated with a visit to Nijmegen oldest bar, In de Blauwe Hand . But I still have my manuscript to finish. This formally ends a period of almost 12.5 years at the Radboud University Nijmegen.",
      
      "date_published": "2006-01-19T00:00:00+00:00",
      "date_modified": "2023-08-10T00:00:00+00:00",
      "tags": ["phd","cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/737fz-e9f47",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/11/uspto-considers-open-source-software.html",
      "title": "USPTO considers open source software prior art",
      "content_html": "<p>This is the best news I heard in weeks! The <a href=\"http://www.uspto.gov/\">US Patent and Trade Offfice</a> spoke with open source representatives\nabout ways to deal with open source software as prior art. Apparently, their problem was how to be sure about release dates of open source,\nand authoritative sites like <a href=\"http://www.sf.net/\">SourceForge.net</a>,\n<a href=\"http://freshmeat.net/\">FreshMeat.net</a> help a lot here, which extensive logging of releases.</p>\n\n<p>Quoting from there website:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>The Department of Commerce’s United States Patent and Trademark Office (USPTO)\nhas created a partnership with the open source community to ensure that patent\nexaminers have access to all available prior art relating to software code\nduring the patent examination process.\n</code></pre></div></div>\n\n<p>It also indicates that releasing open source software with, or announcing it on, such an authoritative website is important! Otherwise, patent offices will not be able to decide wether our open source art is really prior.</p>",
      "summary": "This is the best news I heard in weeks! The US Patent and Trade Offfice spoke with open source representatives about ways to deal with open source software as prior art. Apparently, their problem was how to be sure about release dates of open source, and authoritative sites like SourceForge.net, FreshMeat.net help a lot here, which extensive logging of releases.",
      
      "date_published": "2006-01-11T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["openscience"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/933ah-c7f36",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/06/open-source-java-tool-chain-cdk.html",
      "title": "Open Source Java tool chain: CDK compiles and JUnit tests run",
      "content_html": "<p>While waiting for a <a href=\"http://www.talete.mi.it/products/dragon_description.htm\">Dragon</a> calculation to finish (it does not work for molecules with more than\n300 atoms!), I updated <a href=\"http://cdk.sf.net/\">CDK</a>’s build.xml to support <a href=\"http://www.gnu.org/software/classpath/cp-tools/\">gjdoc</a>. The build script is now\nable to compile the custom doclets we use for creating the <code class=\"language-plaintext highlighter-rouge\">src/*.javafiles</code> and others from the Java source files. And using\n<a href=\"http://gcc.gnu.org/onlinedocs/gcc-3.0.4/gcj_8.html\">gij</a> I could also run\n<a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/test/\">CDK’s 1688 JUnit tests</a>!</p>\n\n<p>On my Debian GNU/Linux sid chroot, I have java-gcj-compat installed allowing me to do (thanx man-di!):</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>JAVA_HOME=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0 ant -Dbuild.compiler=gcj runDoclet\nJAVA_HOME=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0 ant -Dbuild.compiler=gcj test-all\n</code></pre></div></div>\n\n<p>The first command creates the custom doclets, while the second command compiles the CDK and runs the JUnit tests. For Classpath developers:\n<a href=\"http://sourceforge.net/cvs/?group_id=20024\">here</a>’s how to check out the cdk module from CVS.</p>\n\n<p>The results are interesting: while Sun’s JVM gives 11 problems, gij gives 399 problems. The test-all target creates a <code class=\"language-plaintext highlighter-rouge\">reports/result.txt</code>\ndocument listing all failing tests, and I’ve put the <a href=\"http://www.woc.science.ru.nl/devel/egonw/diff_cdk_junit_sun_vs_gij_debianSid_20060106.txt\">diff -u</a>\nfor the two JVMs online. 
I will make diffs for jamvm, kaffe and cacao too.</p>\n\n<p>I hope this gives the free Java community extra feedback on the excellent work they are doing.</p>",
      "summary": "While waiting for a Dragon calculation to finish (it does not work for molecules with more than 300 atoms!), I updated CDK’s build.xml to support gjdoc. The build script is now able to compile the custom doclets we use for creating the src/*.javafiles and others from the Java source files. And using gij I could also run CDK’s 1688 JUnit tests!",
      
      "date_published": "2006-01-06T00:00:00+00:00",
      "date_modified": "2025-02-13T00:00:00+00:00",
      "tags": ["cdk","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s0ftg-ppb65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/03/kubuntu-xrandr-and-tv-out.html",
      "title": "Kubuntu, XRandR and TV-OUT",
      "content_html": "<p>One of the things I had not fully figured out up to today, was how to configure my <a href=\"http://www.kubuntu.org/\">Kubuntu</a> system to easily view DVDs on our TV,\nusing my <a href=\"http://www.nvidia.com/\">NVIDIA</a>’s TV-OUT. I’ve seen xorg.conf files that define a X11 server for the monitor and a second for the TV, and files\nthat use TwinView. Now, I did not really like the way first option worked, so tried the second.</p>\n\n<p>Unfortunately, I had to reconfigure and restart my X11 each time my kids wanted to see <a href=\"https://nl.wikipedia.org/wiki/Bob_de_Bouwer\">Bob the Builder <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nI already knew about <a href=\"http://wiki.x.org/X11R6.8.1/doc/Xrandr.3.html\">XRandR</a>, and today finally had a look at it again, and got it to work without much\ntrouble this time. (Lesson: if something does not work, let it rest and try again half a year later.)</p>\n\n<p>For the googlers, this is what my <a href=\"http://wiki.x.org/X11R6.8.0/doc/xorg.conf.5.html\">xorg.conf</a> <code class=\"language-plaintext highlighter-rouge\">Screen</code> section now looks like:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Section \"Screen\"\n Identifier \"Default Screen\"\n Device  \"NVIDIA Corporation NV18 [GeForce4 MX 4000 AGP 8x]\"\n Monitor  \"Hansol H711\"\n DefaultDepth 24\n Option \"TwinView\" \"on\"\n Option \"TwinViewOrientation\" \"clone\"\n Option \"SecondMonitorHorizSync\"     \"30-50\"\n Option \"SecondMonitorVertRefresh\"   \"60\"\n Option  \"MetaModes\" \"1280x1024,1280x1024;1024x768,1024x768\"\n Option \"TVStandard\" \"PAL-B\"\n Option \"TVOutFormat\" \"SVIDEO\"\n Option \"ConnectedMonitor\" \"crt, tv\"\n SubSection \"Display\"\n  Depth  24\n  Modes  \"1280x1024\" \"1024x768\" \"832x624\" \"800x600\" \"720x400\" \"640x480\"\n EndSubSection\nEndSection\n</code></pre></div></div>\n\n<p>And now, to switch resolution, I can 
just do:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">sudo </span>xrandr <span class=\"nt\">-s</span> 1\n<span class=\"c\"># watch DVD</span>\n<span class=\"nb\">sudo </span>xrandr <span class=\"nt\">-s</span> 0\n</code></pre></div></div>\n\n<p>PS. Happy new year!</p>",
      "summary": "One of the things I had not fully figured out up to today, was how to configure my Kubuntu system to easily view DVDs on our TV, using my NVIDIA’s TV-OUT. I’ve seen xorg.conf files that define a X11 server for the monitor and a second for the TV, and files that use TwinView. Now, I did not really like the way first option worked, so tried the second.",
      
      "date_published": "2006-01-03T00:00:00+00:00",
      "date_modified": "2023-08-09T00:00:00+00:00",
      "tags": ["kde","linux"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2jyfn-d1910",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/28/good-bad-and-ugly-molecules.html",
      "title": "The good, the bad and the ugly molecules",
      "content_html": "<p>Derek Lowe is the author of the blog <a href=\"https://web.archive.org/web/20051229035537/http://corante.com/pipeline/\">In the Pipeline <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> which is really fun to read. Derek works in\npharmaceutical industry and gives a great insight in how things work in that field of molecular sciences. Yesterday he blogged about\n<a href=\"https://web.archive.org/web/20080611192217/http://www.corante.com/pipeline/archives/2005/12/27/what_makes_an_ugly_molecule.php\">What Makes an Ugly Molecule? <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, and touches the\nRule-of-Five, the hydrochloric acid bath (aka stomach), and other reasons that make molecules ugly.</p>\n\n<p>But there are many other interesting posts, and, something that my blog still lacks, comments by many users, discussing the ideas he\nposts, making his blog even nicer.</p>",
      "summary": "Derek Lowe is the author of the blog In the Pipeline which is really fun to read. Derek works in pharmaceutical industry and gives a great insight in how things work in that field of molecular sciences. Yesterday he blogged about What Makes an Ugly Molecule? , and touches the Rule-of-Five, the hydrochloric acid bath (aka stomach), and other reasons that make molecules ugly.",
      
      "date_published": "2005-12-28T00:00:00+00:00",
      "date_modified": "2023-08-08T00:00:00+00:00",
      "tags": ["chemistry","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1pgeq-yqn56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/27/knoppix-saves-day.html",
      "title": "Knoppix saves the day...",
      "content_html": "<p>After the three obligatory days of christmas holidays (fun, especially with two children, but very exhausting), it is time to get back to business again. I’m still\nat my father-in-laws place with only XP installed, so booted the <a href=\"http://www.knopper.net/knoppix/\">Knoppix 4.0.2 DVD</a> I burned last friday. Eclipse is not working,\nbut being able to use Kmail to read my email again is just what you need as in internet-junkie. A computer is just not complete without a nice KDE session hanging around.</p>\n\n<p>Anyway, booted eclipse on my computer at work, and tunneled the window over SSH. Not overly fast, but it seems to run fine. (If only I knew how to setup NX on\nthat Kubuntu breezy system!) Let’s see if I can get the <a href=\"http://sourceforge.net/tracker/?group_id=20024&amp;atid=120024\">CDK bug count</a> somewhat lower.</p>",
      "summary": "After the three obligatory days of christmas holidays (fun, especially with two children, but very exhausting), it is time to get back to business again. I’m still at my father-in-laws place with only XP installed, so booted the Knoppix 4.0.2 DVD I burned last friday. Eclipse is not working, but being able to use Kmail to read my email again is just what you need as in internet-junkie. A computer is just not complete without a nice KDE session hanging around.",
      
      "date_published": "2005-12-27T00:00:00+00:00",
      "date_modified": "2005-12-27T00:00:00+00:00",
      "tags": ["linux","kde"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/52me2-0wm09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/23/subset-selection-mind-complexity.html",
      "title": "Subset selection: mind the complexity",
      "content_html": "<p>In a recent <a href=\"http://pubs.acs.org/journals/jcisd8/\">JCIM</a> article, Schuffenhauer <a href=\"http://dx.doi.org/10.1021/ci0503558\">compares</a> a few subset selection\nmethods, and notes that some of them reduce the average complexity of the molecules. They put this in relation to other research that states that\nlead compounds with high complexity have higher activities. Recommended reading material for the holidays.</p>",
      "summary": "In a recent JCIM article, Schuffenhauer compares a few subset selection methods, and notes that some of them reduce the average complexity of the molecules. They put this in relation to other research that states that lead compounds with high complexity have higher activities. Recommended reading material for the holidays.",
      
      "date_published": "2005-12-23T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cheminf"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1021/ci0503558", "doi": "10.1021/ci0503558"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q7ehm-v1m81",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/18/statcvs-on-cdk.html",
      "title": "StatCVS on CDK",
      "content_html": "<p>One of the <a href=\"http://www.classpath.org/\">Classpath</a> developers pointed me to their\n<a href=\"http://object-refinery.com/classpath/statcvs/\">CVS statistics</a> when I asked them\nhow actively their project is currently developed, i.e. the number of active developers.</p>\n\n<p>The pages are generated with <a href=\"http://statcvs.sourceforge.net/\">StatCVS</a>, and I ran it one the CDK too.</p>\n\n<p>I knew I did a lot of work on the CDK, but never realized that <a href=\"http://www.woc.science.ru.nl/devel/egonw/log.html/authors.html\">62.7%</a>\nof the commits were mine! Keep in mind, though, that a lot of these commits are for code maintainance! Next in line are\n<a href=\"http://almost.cubic.uni-koeln.de/jrg/Members/steinbeck\">steinbeck</a> and <a href=\"http://blue.chem.psu.edu/~rajarshi/\">rajarshi</a>.\nIn total 28 people commited patches to CVS, though other people contributed patches too, which were commited by a developer with write\naccess. There is jump in the commit messages somewhere this summer, which I think is the move of the data directory from cdk/data to\ncdk/src/data.</p>\n\n<p>The full analysis results can be found <a href=\"http://www.woc.science.ru.nl/devel/egonw/log.html/\">here</a>. It was generated with the\n<a href=\"http://packages.debian.org/unstable/devel/statcvs\">StatCVS version in sid</a>, and will rerun it soon with a more recent StatCVS version.</p>",
      "summary": "One of the Classpath developers pointed me to their CVS statistics when I asked them how actively their project is currently developed, i.e. the number of active developers.",
      
      "date_published": "2005-12-18T00:00:00+00:00",
      "date_modified": "2005-12-18T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6p49t-sj396",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/16/cdk-debug-classes-and-fixing.html",
      "title": "CDK Debug classes and fixing the ModelBuilder3D bug",
      "content_html": "<p>For some weeks now I have been thinking about bug <a href=\"https://sourceforge.net/tracker/index.php?func=detail&amp;aid=1309731&amp;group_id=20024&amp;atid=120024\">1309731</a>:\n“ModelBuilder3D overwrites Atom IDs”. The <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/modeling/builder3d/ModelBuilder3D.java?rev=1.23&amp;view=markup\">ModelBuilder3D</a>\nis a complex piece of source code, reusing many other parts of the CDK, including\n<a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/atomtype/package-summary.html\">atom type perception</a>.</p>\n\n<p>Somewhere in October, however, I found that Taverna could not create 3D models and convert these into reasonable CML because the Atom ID’s were messed up. So the question is, where did the\nModelBuilder3D do this? Did it do this itself, or is it done by one of the other pieces of CDK that it uses? But due to the complex nature of this algorithm, it quickly became clear\nthat looking at the code was not going to solve it; there was too much code to look at.</p>\n\n<p>The solution was clear to me: use the [new data interfaces <i class=\"fa-solid fa-recycle fa-xs\">](https://chem-bla-ics.linkedchemistry.info/2005/10/25/more-cdkinterfaces-updates.html).\nTo identify where the IDs where messed up, I only needed to write a DebugAtom class with a method that looked like:</i></p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">setID</span><span class=\"o\">(</span><span class=\"nc\">String</span> <span class=\"n\">identifier</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"n\">logger</span><span class=\"o\">.</span><span class=\"na\">debug</span><span class=\"o\">(</span><span class=\"s\">\"Setting ID: \"</span><span class=\"o\">,</span> <span class=\"n\">identifier</span><span 
class=\"o\">);</span>\n  <span class=\"kd\">super</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"n\">identifier</span><span class=\"o\">);</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>And I would immediately see at what stage the ID was overwritten.</p>\n\n<p>So I started this week to implement the <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/debug/DebugAtom.java?rev=1.1&amp;view=markup\">DebugAtom</a> and related classes.\nBy extending <code class=\"language-plaintext highlighter-rouge\">Atom</code>, I could just add debugging stuff and reuse the code in that class. However, because a Java class can extend only one class, the <code class=\"language-plaintext highlighter-rouge\">DebugAtom</code> cannot extend <code class=\"language-plaintext highlighter-rouge\">DebugAtomType</code> too. And this is a pity,\nbecause all methods inherited by the <code class=\"language-plaintext highlighter-rouge\">Atom</code> interface from the <code class=\"language-plaintext highlighter-rouge\">AtomType</code>, <code class=\"language-plaintext highlighter-rouge\">Isotope</code>, <code class=\"language-plaintext highlighter-rouge\">Element</code> and <code class=\"language-plaintext highlighter-rouge\">ChemObject</code> interfaces could not be inherited from the <code class=\"language-plaintext highlighter-rouge\">DebugAtomType</code> class.\nInstead, the debug classes now have to duplicate those bits of code.</p>\n\n<p>This is not a clean solution, as duplicate code is a known cause of bugs. So, the next step was to write JUnit tests for the new debug classes. And for this\nI wanted to reuse, i.e. extend, the tests for the default data classes. This required, however, changes to those test classes.</p>\n\n<p>The first thing that needed to be changed was that instantiation of data classes in the tests would now have to depend on the data classes being tested. 
A simple</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Atom</span> <span class=\"n\">atom</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>only makes sense when a specific <code class=\"language-plaintext highlighter-rouge\">Atom</code> class is important. Fortunately, the new interfaces provide a solution for this: the <code class=\"language-plaintext highlighter-rouge\">ChemObjectBuilder</code> implementations.\nThese allow the hard-coded instantiation to be replaced with the following syntax:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Atom</span> <span class=\"n\">atom</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Therefore, I added a protected field to the <code class=\"language-plaintext highlighter-rouge\">AtomTest</code>, which is instantiated in <code class=\"language-plaintext highlighter-rouge\">setUp()</code>:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">protected</span> <span class=\"nc\">ChemObjectBuilder</span> <span class=\"n\">builder</span><span class=\"o\">;</span>\n<span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">setUp</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"n\">builder</span> <span class=\"o\">=</span> <span class=\"nc\">DefaultChemObjectBuilder</span><span class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">();</span>\n<span 
class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>and used this builder to instantiate all test objects, as shown for the atom above.</p>\n\n<p>And then I can simply reuse this JUnit test by defining the <code class=\"language-plaintext highlighter-rouge\">DebugAtomTest</code> like:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">DebugAtomTest</span> <span class=\"kd\">extends</span> <span class=\"nc\">AtomTest</span> <span class=\"o\">{</span>\n  <span class=\"kd\">public</span> <span class=\"nf\">DebugAtomTest</span><span class=\"o\">(</span><span class=\"nc\">String</span> <span class=\"n\">name</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"kd\">super</span><span class=\"o\">(</span><span class=\"n\">name</span><span class=\"o\">);</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">setUp</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"kd\">super</span><span class=\"o\">.</span><span class=\"na\">builder</span> <span class=\"o\">=</span> <span class=\"nc\">DebugChemObjectBuilder</span><span class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"nc\">Test</span> <span class=\"nf\">suite</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"k\">new</span> <span class=\"nf\">TestSuite</span><span class=\"o\">(</span><span class=\"nc\">DebugAtomTest</span><span class=\"o\">.</span><span class=\"na\">class</span><span class=\"o\">);</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The sources for these debug data class tests are found 
in the new <code class=\"language-plaintext highlighter-rouge\">cdk.test.debug</code> package.</p>\n\n<p>The number of JUnit tests for the CDK jumped from around 1250 to over 1500 tests right now. And if you think these new\ntests only test old code, because of all the <code class=\"language-plaintext highlighter-rouge\">super.bla()</code> calls in the debug classes, you’re way off. I found bugs in the\nnew debug classes, but <strong>also</strong> many class cast bugs and several other problems in the real data classes!</p>\n\n<p>Anyway. Does this help fix the <code class=\"language-plaintext highlighter-rouge\">ModelBuilder3D</code> bug? Yes, it does:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">grep</span> <span class=\"s2\">\"Setting ID\"</span> reports/result.modeling.builder3d.ModelBuilder3dTest.txt\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: carbon1\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: oxygen1\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: C\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: HC\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: HC\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: HC\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: O\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: HO\n</code></pre></div></div>\n\n<p>This shows me where the <code class=\"language-plaintext highlighter-rouge\">Atom</code> ID is overwritten to be something other than “carbon1”! 
I can now look at the rest of the\n<code class=\"language-plaintext highlighter-rouge\">result.modeling.builder3d.ModelBuilder3dTest.txt</code> file to see what the <code class=\"language-plaintext highlighter-rouge\">ModelBuilder3D</code> was doing at the time,\nand which CDK class made the <code class=\"language-plaintext highlighter-rouge\">setID()</code> call.</p>\n\n<p>I only needed to change this line in the JUnit test for the bug to generate the above debug lines:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Molecule</span> <span class=\"n\">methanol</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Molecule</span><span class=\"o\">();</span>\n</code></pre></div></div>\n\n<p>into</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Molecule</span> <span class=\"n\">methanol</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">DebugMolecule</span><span class=\"o\">();</span>\n</code></pre></div></div>",
      "summary": "For some weeks now I have been thinking about bug 1309731: “ModelBuilder3D overwrites Atom IDs”. The ModelBuilder3D is a complex piece of source code, reusing many other parts of the CDK, including atom type perception.",
      
      "date_published": "2005-12-16T00:00:00+00:00",
      "date_modified": "2024-03-23T00:00:00+00:00",
      "tags": ["cdk","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/er890-p9m81",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/13/math-libraries-for-java.html",
      "title": "Math libraries for Java?",
      "content_html": "<p>I drop in on the <code class=\"language-plaintext highlighter-rouge\">#classpath</code> channel of <a href=\"http://www.freenode.net/\">freenode.net</a> IRC network, where the <code class=\"language-plaintext highlighter-rouge\">#cdk</code> channel runs too.\nThe <code class=\"language-plaintext highlighter-rouge\">#classpath</code> channel is for the <a href=\"http://www.gnu.org/software/classpath/\">Classpath</a> project which is developing the free Java libraries used by most\nopen source virtual machines.</p>\n\n<p>A <a href=\"http://slashdot.org/\">Slashdot.org</a> item was mentioned <a href=\"http://developers.slashdot.org/developers/05/12/13/1824236.shtml?tid=108&amp;tid=156\">“Java Is So 90s”</a>.\nIt lead to a funny discussion about what that would make C/C++ and Fortran. A more serious question was brought up: where are the efficient and super fast\nJava linear algebra and complex number libraries?</p>\n\n<p>There is <a href=\"http://www.cs.waikato.ac.nz/ml/weka/\">Weka</a> but it is more aimed at data analysis. I believe it has support principle component analysis, so it\nmust have singular value decomposition. There is a book called <strong>Java Number Cruncher: The Java Programmer’s Guide to Numerical Computing</strong>\nby Ronald Mak, 2003, Prentice Hall.</p>\n\n<p>After some further asking about it on the channel, they mentioned the <a href=\"http://jakarta.apache.org/commons/math/\">Apache commons math</a> project,\nwhich seems promising. The website mentions complex numbers, linear algebra, statistics and numerical analysis, but have not looked at the full API,\nso not sure how well populated these areas are.</p>\n\n<p>Anyone, with experience in the area of numerical computing and Java?</p>",
      "summary": "I drop in on the #classpath channel of freenode.net IRC network, where the #cdk channel runs too. The #classpath channel is for the Classpath project which is developing the free Java libraries used by most open source virtual machines.",
      
      "date_published": "2005-12-13T00:00:00+00:00",
      "date_modified": "2005-12-13T00:00:00+00:00",
      "tags": ["math","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y0mte-4ns18",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/10/jumbo-50-and-cdk.html",
      "title": "Jumbo 5.0 and the CDK",
      "content_html": "<p>I <a href=\"https://egonw.github.io/blog/2005/12/08/jumbo-50-and-cml-support-in-cdk.html\">reported earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a> that the CDK has been updated in CVS to use\nCML from the new Jumbo 5.0. The transition actually involved a lot of changes in the CDK, some I would like to address in the following comments.\nOne thing is that CML write support (not reading!) uses the new Jumbo library which requires Java 1.5. Thus, if Java 1.5 is not available,\nthen CML writing should not be compiled. This is how this is done.</p>\n\n<h3 id=\"the-javadoc\">The JavaDoc</h3>\n\n<p>The CDK makes extensive use of <a href=\"http://java.sun.com/j2se/1.5.0/docs/guide/javadoc/taglet/spec/com/sun/tools/doclets/Taglet.html\">JavaDoc taglets</a>.\nCDK uses tags of type <code class=\"language-plaintext highlighter-rouge\">@cdk.SOMETAG</code>. And an important tag in this case, is the <code class=\"language-plaintext highlighter-rouge\">@cdk.require</code> tag, becuase it allows us to make the CDK build\nsystem aware that the class requires Java 5.0 to be compiled. 
Thus, we have for example\n<a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/io/CMLWriter.java?rev=1.90&amp;view=log\">this code in CVS</a>, of which the relevant bits are:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cm\">/**\n * Serializes a SetOfMolecules or a Molecule object to CML 2 code.\n * Chemical Markup Language is an XML based file format {@cdk.cite PMR99}.\n * Output can be redirected to other Writer objects like StringWriter\n * and FileWriter.\n *\n * @cdk.module       libio-cml\n * @cdk.builddepends xom-1.0.jar\n * @cdk.depends      jumbo50.jar\n * @cdk.require      java1.5\n */</span>\n<span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">CMLWriter</span> <span class=\"kd\">extends</span> <span class=\"nc\">DefaultChemObjectWriter</span> <span class=\"o\">{</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>As is probably clear, this class depends on two jars, of which the <code class=\"language-plaintext highlighter-rouge\">jumbo50.jar</code> itself is not required for compiling\nthe class source code. It also shows the use of the <code class=\"language-plaintext highlighter-rouge\">@cdk.require</code> tag.</p>\n\n<h3 id=\"the-buildxml\">The build.xml</h3>\n\n<p>Because the CDK still does not require Java 1.5, the CDK is supposed to be buildable with Java 1.4 (the oldest supported Java release). The\n<a href=\"http://ant.apache.org/\">Ant</a> <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/build.xml?rev=1.310&amp;view=markup\">build.xml</a> script is quite\nable to conditionally leave out compiling parts of the CDK, if configured correctly using proper JavaDoc tags, as explained earlier.</p>\n\n<p>First, the build.xml checks what libraries are available for compiling certain parts of the CDK. 
For example, the build.xml code to check for Java 1.5 looks like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;condition</span> <span class=\"na\">property=</span><span class=\"s\">\"isJava15\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;contains</span> <span class=\"na\">string=</span><span class=\"s\">\"${java.version}\"</span> <span class=\"na\">substring=</span><span class=\"s\">\"1.5\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;/condition&gt;</span>\n</code></pre></div></div>\n\n<p>Run <code class=\"language-plaintext highlighter-rouge\">ant info</code> to see what is being checked for, or look at the <code class=\"language-plaintext highlighter-rouge\">build.xml</code> source code for the check target.</p>\n\n<p>All compiling is done by the compile-module target, and there it in- and excludes bits of the CDK depending on the checked conditions:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;javac</span> <span class=\"na\">srcdir=</span><span class=\"s\">\"${build.src}\"</span> <span class=\"na\">destdir=</span><span class=\"s\">\"${build}\"</span> <span class=\"na\">optimize=</span><span class=\"s\">\"${optimization}\"</span> \n       <span class=\"na\">debug=</span><span class=\"s\">\"${debug}\"</span> <span class=\"na\">deprecation=</span><span class=\"s\">\"${deprecation}\"</span><span class=\"nt\">&gt;</span>\n\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/java1.4+.javafiles\"</span> <span class=\"na\">if=</span><span class=\"s\">\"isJava13\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/java1.4.javafiles\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"isJava14\"</span><span 
class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/java1.5.javafiles\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"isJava15\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/ant1.6.javafiles\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"hasAnt16\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/r-project.javafiles\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"rispresent\"</span><span class=\"nt\">/&gt;</span>\n\n  <span class=\"nt\">&lt;includesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/${module}.javafiles\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;/javac&gt;</span>\n</code></pre></div></div>\n\n<p>Keep in mind that the <code class=\"language-plaintext highlighter-rouge\">*.javafiles</code> are created with JavaDoc based on the CDK JavaDoc tags mentioned earlier.</p>\n\n<h3 id=\"the-buildxml-2\">The build.xml 2</h3>\n\n<p>While the above mechanism has been present for some time now, having jumbo50.jar in CVS made the situation a bit trickier:\nthe <code class=\"language-plaintext highlighter-rouge\">jumbo50.jar</code> uses the 49.0 class format used in Java 1.5, and cannot be processed by Java 1.4 systems. Since the classpath
Since the classpath\nused when compiling CDK source code, is defined in configuration files for those modules in\n<a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/META-INF/\">src/META-INF</a>, the problem did not occur when compiling the modules.\nHowever, it did show an error in the <code class=\"language-plaintext highlighter-rouge\">reallyRunDoclet</code> target today, when I was creating the <code class=\"language-plaintext highlighter-rouge\">*.javafiles</code> with JavaDoc.\nThe solution was trivial:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;target</span> <span class=\"na\">name=</span><span class=\"s\">\"reallyRunDoclet\"</span> <span class=\"na\">id=</span><span class=\"s\">\"reallyRunDoclet\"</span>\n  <span class=\"na\">depends=</span><span class=\"s\">\"compileDoclet\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"dotjavafiles.uptodate\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;javadoc</span> <span class=\"na\">private=</span><span class=\"s\">\"true\"</span>  <span class=\"na\">maxmemory=</span><span class=\"s\">\"128m\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;classpath&gt;</span>\n      <span class=\"nt\">&lt;fileset</span> <span class=\"na\">dir=</span><span class=\"s\">\"${lib}\"</span><span class=\"nt\">&gt;</span>\n        <span class=\"nt\">&lt;include</span> <span class=\"na\">name=</span><span class=\"s\">\"*.jar\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"c\">&lt;!-- some jars require some Java version --&gt;</span>\n        <span class=\"nt\">&lt;exclude</span> <span class=\"na\">name=</span><span class=\"s\">\"jumbo50.jar\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"isJava15\"</span><span class=\"nt\">/&gt;</span>\n      <span class=\"nt\">&lt;/fileset&gt;</span>\n      <span class=\"nt\">&lt;fileset</span> <span class=\"na\">dir=</span><span 
class=\"s\">\"${lib}/libio\"</span><span class=\"nt\">&gt;</span>\n        <span class=\"nt\">&lt;include</span> <span class=\"na\">name=</span><span class=\"s\">\"*.jar\"</span> <span class=\"nt\">/&gt;</span>\n      <span class=\"nt\">&lt;/fileset&gt;</span>\n      <span class=\"nt\">&lt;fileset</span> <span class=\"na\">dir=</span><span class=\"s\">\"${devellib}\"</span><span class=\"nt\">&gt;</span>\n        <span class=\"nt\">&lt;include</span> <span class=\"na\">name=</span><span class=\"s\">\"*.jar\"</span> <span class=\"nt\">/&gt;</span>\n      <span class=\"nt\">&lt;/fileset&gt;</span>\n    <span class=\"nt\">&lt;/classpath&gt;</span>\n\n    <span class=\"nt\">&lt;doclet</span> <span class=\"na\">name=</span><span class=\"s\">\"net.sf.cdk.tools.MakeJavaFilesFilesDoclet\"</span>\n      <span class=\"na\">path=</span><span class=\"s\">\"${doc}/javadoc\"</span><span class=\"nt\">/&gt;</span>\n\n    <span class=\"nt\">&lt;packageset</span> <span class=\"na\">dir=</span><span class=\"s\">\"${src}\"</span><span class=\"nt\">&gt;</span>\n      <span class=\"nt\">&lt;include</span> <span class=\"na\">name=</span><span class=\"s\">\"org/openscience/cdk/**\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;/packageset&gt;</span>\n\n  <span class=\"nt\">&lt;/javadoc&gt;</span>\n<span class=\"nt\">&lt;/target&gt;</span>\n</code></pre></div></div>\n\n<h3 id=\"cdkapplicationsfileconvertor\">cdk.applications.FileConvertor</h3>\n\n<p>There is another area of interest: the <code class=\"language-plaintext highlighter-rouge\">FileConvertor</code>, which is, sort of, CDK’s variant of\n<a href=\"http://openbabel.sf.net/\">OpenBabel</a>’s <code class=\"language-plaintext highlighter-rouge\">babel</code>. The FileConvertor must\nbe compiled in all cases, so we need to conditionally instantiate the <code class=\"language-plaintext highlighter-rouge\">CMLWriter</code>, which is not really a problem. 
However, compiling\nthe source code is more troublesome: the <code class=\"language-plaintext highlighter-rouge\">CMLWriter</code> class must be loaded at runtime, and must not be hard-coded in the source code.</p>\n\n<p>In the past I have solved this by using <code class=\"language-plaintext highlighter-rouge\">.getInstance()</code> constructs, but the\n<a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/io/ChemObjectWriter.java?rev=1.19&amp;view=log\">ChemObjectWriter interface</a> does not define this\nfunctionality, so I decided to use the <code class=\"language-plaintext highlighter-rouge\">java.lang.reflect</code> mechanism:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">format</span><span class=\"o\">.</span><span class=\"na\">equalsIgnoreCase</span><span class=\"o\">(</span><span class=\"s\">\"CML\"</span><span class=\"o\">))</span> <span class=\"o\">{</span>\n  <span class=\"nc\">Class</span> <span class=\"n\">cmlWriterClass</span> <span class=\"o\">=</span> <span class=\"k\">this</span><span class=\"o\">.</span><span class=\"na\">getClass</span><span class=\"o\">().</span><span class=\"na\">getClassLoader</span><span class=\"o\">().</span>\n    <span class=\"n\">loadClass</span><span class=\"o\">(</span><span class=\"s\">\"org.openscience.cdk.io.CMLWriter\"</span><span class=\"o\">);</span>\n  <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">cmlWriterClass</span> <span class=\"o\">!=</span> <span class=\"kc\">null</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"n\">writer</span> <span class=\"o\">=</span> <span class=\"o\">(</span><span class=\"nc\">ChemObjectWriter</span><span class=\"o\">)</span><span class=\"n\">cmlWriterClass</span><span class=\"o\">.</span><span 
class=\"na\">newInstance</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n  <span class=\"nc\">Constructor</span> <span class=\"n\">constructor</span> <span class=\"o\">=</span> <span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">getClass</span><span class=\"o\">().</span><span class=\"na\">getConstructor</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Class</span><span class=\"o\">[]{</span><span class=\"nc\">Writer</span><span class=\"o\">.</span><span class=\"na\">class</span><span class=\"o\">});</span>\n  <span class=\"n\">writer</span> <span class=\"o\">=</span> <span class=\"o\">(</span><span class=\"nc\">ChemObjectWriter</span><span class=\"o\">)</span><span class=\"n\">constructor</span><span class=\"o\">.</span><span class=\"na\">newInstance</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Object</span><span class=\"o\">[]{</span><span class=\"n\">fileWriter</span><span class=\"o\">});</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"o\">{</span>\n</code></pre></div></div>\n\n<p>Now, this has been, by far, the longest blog item I have written so far. I hope it gave you good insight into some techniques the CDK uses to deal with\nsituations where functionality might, or might not, be present at build and at run time.</p>",
      "summary": "I reported earlier that the CDK has been updated in CVS to use CML from the new Jumbo 5.0. The transition actually involved a lot of changes in the CDK, some I would like to address in the following comments. One thing is that CML write support (not reading!) uses the new Jumbo library which requires Java 1.5. Thus, if Java 1.5 is not available, then CML writing should not be compiled. This is how this is done.",
      
      "date_published": "2005-12-10T00:00:00+00:00",
      "date_modified": "2023-08-05T00:00:00+00:00",
      "tags": ["cdk","cml","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dzvnw-3b413",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/08/jumbo-50-and-cml-support-in-cdk.html",
      "title": "Jumbo 5.0 and CML support in CDK",
      "content_html": "<p>Tobias <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/jar/jumbo50.jar?rev=1.1&amp;view=log\">commited</a>\n<a href=\"http://sourceforge.net/forum/forum.php?forum_id=518283\">Jumbo 5.0</a> to CDK CVS, so that the CDK is now\nagain up to date with the latest <a href=\"http://www.xml-cml.org/\">CML</a> library. Note that Jumbo 5.0 requires Java 5.0.</p>\n\n<p>At first all JUnit tests seems to work, but apparently the <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/test/io/cml/CML2WriterTest.java?rev=1.13&amp;view=log\">CML2Writer</a>\ntests were skipped because they were only run when Java 1.4 was found. I updated the test for the a appropriate\nJava version, and then it turned out that most tests fail. So those running CDK from CVS and depent on CML\nwriting: hang on, it will be fixed very soon.</p>",
      "summary": "Tobias commited Jumbo 5.0 to CDK CVS, so that the CDK is now again up to date with the latest CML library. Note that Jumbo 5.0 requires Java 5.0.",
      
      "date_published": "2005-12-08T00:00:00+00:00",
      "date_modified": "2005-12-08T00:00:00+00:00",
      "tags": ["cdk","blue-obelisk","cml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a3r1n-72841",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/06/uml-diagram-of-cdk-module-dependencies.html",
      "title": "UML diagram of CDK module dependencies",
      "content_html": "<p>The code clean up after <a href=\"http://cdk.sf.net/\">CDK</a>’s interfaces transition is in progress, and two\n<a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/devel/modules/\">CDK modules</a> are now independent\nof the <em>data</em> module. After <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/25/more-cdkinterfaces-updates.html\">doing the <em>core</em> module <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe standard was next, and I finished this yesterday. The dependencies in CVS now look like (click it to get a larger view):</p>\n\n<p>IMAGE LOST</p>\n\n<p>This <a href=\"https://en.wikipedia.org/wiki/Unified_Modeling_Language\">UML</a> diagram was made with <a href=\"http://uml.sourceforge.net/\">Umbrello</a>, and the source is in\n<a href=\"http://www-128.ibm.com/developerworks/xml/library/x-xmi/\">XMI</a> in CVS.</p>\n\n<p>I cannot stress enough the advantages of these changes:</p>\n\n<ol>\n  <li>the code is cleaner</li>\n  <li>module dependencies are cleaner</li>\n  <li>impossible to use methods outside the interface</li>\n  <li>the algorithms are independent of the data classes</li>\n</ol>\n\n<p>The last advantage is really important: it allows alternative implementations of the data classes. For example, we could make debug\ndata classes, which, unlike the normal classes, do all sorts of checks when using methods of these classes. For example, they can\nexplicitely check that parameters are not null, of the right class, and generally make sense. 
This makes them, possibly, slower,\nbut also more type safe, and as such great for debugging and development sessions.</p>\n\n<p>Another important application of making the CDK library independent of the data classes (and only depending on the\n<a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/package-frame.html\">interfaces</a>), is that we can have data classes\nshared with other Java libraries, such as <a href=\"http://joelib.sf.net/\">JOElib</a>, <a href=\"http://octetsource.com/\">Octet</a>,\nCML (<a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=9146642&amp;forum_id=8774\">Jumbo 5.0 is out!</a>), and even proprietary libraries.\nThis approach is already used in the <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/18/cdk-taverna-fully-recognized.html\">CDK-Taverna <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nlibrary, and I anticipate much wider use with the arrival of <a href=\"http://www.bioclipse.net/\">Bioclipse</a>.</p>",
      "summary": "The code clean up after CDK’s interfaces transition is in progress, and two CDK modules are now independent of the data module. After doing the core module , the standard was next, and I finished this yesterday. The dependencies in CVS now look like (click it to get a larger view):",
      
      "date_published": "2005-12-06T00:00:00+00:00",
      "date_modified": "2024-03-11T00:00:00+00:00",
      "tags": ["cdk","uml"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s2cqd-wvh17",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/04/planet-blue-obelisk-website-updates.html",
      "title": "Planet Blue Obelisk website updates",
      "content_html": "<p>After requests I added yesterday more visible the RSS and Atom feeds for the\n<a href=\"http://www.woc.science.ru.nl/planetbo/\">Planet Blue Obelisk</a>. They are linked in the menu\non the right, and as alternative links to the document. These should show up in most recent webbrowsers as feed icon in the\nlower right corner of the browser window. It is often an orange icon. I also added a ‘Leave a comment’ link to encourage\npeople to leave comments on items. Please do!</p>",
      "summary": "After requests I added yesterday more visible the RSS and Atom feeds for the Planet Blue Obelisk. They are linked in the menu on the right, and as alternative links to the document. These should show up in most recent webbrowsers as feed icon in the lower right corner of the browser window. It is often an orange icon. I also added a ‘Leave a comment’ link to encourage people to leave comments on items. Please do!",
      
      "date_published": "2005-12-04T00:00:00+00:00",
      "date_modified": "2005-12-04T00:00:00+00:00",
      "tags": ["feeds","blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v0a2f-hfk94",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/03/about-jchempaints-future-and-todays.html",
      "title": "About JChemPaint&apos;s future and todays 2.1.5 release",
      "content_html": "<p>Stefan has done an excellent debugging week on <a href=\"http://jchempaint.sf.net/\">JChemPaint</a>, while I have been late with a\n2.1 release. Anyway, I’ve just uploaded a Java 1.4 compiled JChemPaint 2.1 series release. I was told the (reported) bug\ncount is down to one, so I expect to see the next stable branch to be released soon (2.2 series).</p>\n\n<p>But what after JChemPaint 2.2 gets released? Will a 2.3 developers branch be opened? Or will the JChemPaint application,\nas we know it, cease to exist, and make place for the <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\n<a href=\"http://www.bioclipse.net/index.php?option=com_content&amp;task=view&amp;id=6&amp;Itemid=7\">JChemPaint plugin</a>, that is being worked on?</p>\n\n<p>It is worth mentioning the pros and cons of JChemPaint. One big pro is the applet version of JChemPaint, though free but\nclosed source alternatives are available (e.g. <a href=\"http://www.chemaxon.com/marvin/chemaxon/marvin/help/common.html\">MarvinSketch</a>).\nAnother advantage is the great semantics of the chemistry being drawn. For example, when drawing reactions, reactants are\nreally marked as reactants, and are not just molecules left of an arrow. Moreover, JChemPaint is a great platform in which\nideas can be tested! One of the key virtues of opensourceness. Cons include the limited amount of templates, print quality\ngraphics, and others. (Comments on JChemPaint most welcomed.)</p>\n\n<p>So what about this Bioclipse then? It is inheritently SWT based, but currently the\n<a href=\"http://help.eclipse.org/help30/index.jsp?topic=/org.eclipse.platform.doc.isv/reference/api/org/eclipse/swt/awt/SWT_AWT.html\">SWT_AWT</a>\nbridge is used to embed to current JChemPaint and underlying CDK code as is. 
Unfortunately,\n<a href=\"http://lists.gnu.org/archive/html/classpath/2005-11/msg00162.html\">this bridge is using proprietary code from Sun</a>\n(<code class=\"language-plaintext highlighter-rouge\">sun.awt classes</code>), which makes it impossible to use with free virtual machines.</p>\n\n<p>But there is also the option of using the SWT drawing classes. This has the advantage that it can be run with free virtual\nmachines, and that it can even be compiled to native code. It requires serious rewriting of code in the JChemPaint and\nCDK code base. But, CDK’s <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/renderer/Renderer2D.html\">Renderer2D</a> needs a\nrewrite anyway: it does not even use Swing’s Java2D efficiently (try to figure out how it transforms atomic 2D coordinates into\nscreen coordinates!). Some efforts have been ongoing, but a rewrite from scratch, with a better, more modular, design cannot\nhurt at all.</p>",
      "summary": "Stefan has done an excellent debugging week on JChemPaint, while I have been late with a 2.1 release. Anyway, I’ve just uploaded a Java 1.4 compiled JChemPaint 2.1 series release. I was told the (reported) bug count is down to one, so I expect to see the next stable branch to be released soon (2.2 series).",
      
      "date_published": "2005-12-03T00:00:00+00:00",
      "date_modified": "2005-12-03T00:00:00+00:00",
      "tags": ["jchempaint","cdk","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/egxtq-kd254",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/30/kde-35-is-out.html",
      "title": "KDE 3.5 is out",
      "content_html": "<p><a href=\"http://www.kde.org/\">KDE</a> 3.5 was <a href=\"http://dot.kde.org/1133270759/\">released</a> with\n<a href=\"http://www.kde.org/announcements/visualguide-3.5.php\">lots of changes</a>. SuperKaramba is now a standard\nKDE application and is neatly integrated. It allows embedding themelets on your desktop background.</p>\n\n<p>It shows several themelets: the weather, a calender, a toolbar with applications, a\n<a href=\"https://web.archive.org/web/20060127053003/http://wiki.jmol.org/FoldingAtHomeCommunity\">FoldingAtHome monitor <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nthe contents of the clipboard, the music that is playing\n(<a href=\"http://en.wikipedia.org/wiki/Cake_(band)\">Cake</a>) and a simple todo list. All customizable up to the pixel.</p>\n\n<p>And before I forget: a nice new <a href=\"http://edu.kde.org/kalzium/\">Kalzium</a> release!</p>",
      "summary": "KDE 3.5 was released with lots of changes. SuperKaramba is now a standard KDE application and is neatly integrated. It allows embedding themelets on your desktop background.",
      
      "date_published": "2005-11-30T00:00:00+00:00",
      "date_modified": "2023-08-03T00:00:00+00:00",
      "tags": ["kde"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v9q9d-pbv52",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/30/getting-started-with-eclipse-and-swt.html",
      "title": "Getting Started with Eclipse and the SWT",
      "content_html": "<p><a href=\"http://www.cs.umanitoba.ca/~eclipse/\">Getting Started with Eclipse and the SWT</a> is a very nice set of introductory tutorial on working\nwith SWT and Eclipse in general. The tutorials cover the <a href=\"http://www.cs.umanitoba.ca/~eclipse/2-Basic.pdf\">basic</a>,\n<a href=\"http://www.cs.umanitoba.ca/~eclipse/3-Advanced.pdf\">advanced</a> SWT widgets,\n<a href=\"http://www.cs.umanitoba.ca/~eclipse/4-Layouts.pdf\">SWT layout</a>, and several other interesting topics.</p>\n\n<p>Now that <a href=\"http://www.bioclipse.net/\">Bioclipse</a> is gaining speed, it is a must-read.</p>",
      "summary": "Getting Started with Eclipse and the SWT is a very nice set of introductory tutorial on working with SWT and Eclipse in general. The tutorials cover the basic, advanced SWT widgets, SWT layout, and several other interesting topics.",
      
      "date_published": "2005-11-30T00:00:00+00:00",
      "date_modified": "2005-11-30T00:00:00+00:00",
      "tags": ["bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1vq27-8js77",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/28/blue-obelisk-blog-planet.html",
      "title": "A Blue Obelisk blog Planet",
      "content_html": "<p>Today I setup a blog planet for <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> members. First I tried\nChumpologica but it did not read Atom feeds.</p>\n\n<p>Next in line was <a href=\"https://web.archive.org/web/20171029175722/http://www.planetplanet.org/\">Planet <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nwhich turned out to be used by many big planet sites, like\n<a href=\"http://planet.debian.org/\">Planet Debian <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. It also works with Atom feeds in general, but not well with Atom 1.0 feeds, like that of\n<a href=\"http://www.livejournal.com/users/cniehaus/\">Carsten</a>. After some googling I found a\n<a href=\"http://lists.planetplanet.org/pipermail/devel/2005-November/000710.html\">patched version <i class=\"fa-solid fa-link-slash fa-xs\"></i></a> which did the job.</p>\n\n<p>The result is at <a href=\"http://www.woc.science.ru.nl/planetbo/\">http://www.woc.science.ru.nl/planetbo/ <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>,\nbut I hope that someone can arrange a http://planet.blueobelisk.org/.</p>",
      "summary": "Today I setup a blog planet for Blue Obelisk members. First I tried Chumpologica but it did not read Atom feeds.",
      
      "date_published": "2005-11-28T00:00:00+00:00",
      "date_modified": "2023-08-03T00:00:00+00:00",
      "tags": ["blue-obelisk","feeds"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s1sxs-8qb11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/27/open-source-swing-jmol-renderer-runs.html",
      "title": "Open Source Swing: Jmol renderer runs!",
      "content_html": "<p>Where I was able to mention <a href=\"/blog/2005/11/20/open-source-swing-jchempaint-runs.html\">earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a> that JChemPaint now runs with free\n(as in open source) Java virtual machines, I just tried to run the core Jmol renderer, using the\n<a href=\"https://sourceforge.net/p/jmol/code/4289/tree//trunk/Jmol/examples/Integration.java\">Integration.java <i class=\"fa-solid fa-recycle fa-xs\"></i></a> which comes as an example.</p>\n\n<p>Sadly, the original screenshots got lost that were made with <a href=\"http://jamvm.sourceforge.net/\">jamvm</a> 1.3.3 and <a href=\"http://developer.classpath.org/\">classpath</a> 0.19.</p>\n\n<p>It is very slow, however. I have not tried it with other free virtual machines, which are supposedly faster. It is a good start nevertheless: it means that a\nJmol based <a href=\"http://www.bioclipse.net/\">Bioclipse</a> plugin will work with free virtual machines too.</p>",
      "summary": "Where I was able to mention earlier that JChemPaint now runs with free (as in open source) Java virtual machines, I just tried to run the core Jmol renderer, using the Integration.java which comes as an example.",
      
      "date_published": "2005-11-27T00:00:00+00:00",
      "date_modified": "2024-08-09T00:00:00+00:00",
      "tags": ["jmol","java"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bzqem-cqy33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/23/machine-crash-svn-went-along.html",
      "title": "Machine crash; SVN went along",
      "content_html": "<p>Doesn’t happen often, but my machine crashed two hours ago. Not a big deal, because I have my important files in SVN. Oh wait, SVN had a commit\nin progress during the crash. So, <code class=\"language-plaintext highlighter-rouge\">svn recover</code>. Mmmm… doesn’t work either. OK, SVN FAQ: try <code class=\"language-plaintext highlighter-rouge\">db_recover</code>. That worked. No, it did not:\n<code class=\"language-plaintext highlighter-rouge\">svn commit</code> still not working for the files I was trying to commit. Fortunately, I make regular SVN db backups so I created a brand new\nSVN repository from scratch and recovered the back up. That worked. Really.</p>",
      "summary": "Doesn’t happen often, but my machine crashed two hours ago. Not a big deal, because I have my important files in SVN. Oh wait, SVN had a commit in progress during the crash. So, svn recover. Mmmm… doesn’t work either. OK, SVN FAQ: try db_recover. That worked. No, it did not: svn commit still not working for the files I was trying to commit. Fortunately, I make regular SVN db backups so I created a brand new SVN repository from scratch and recovered the back up. That worked. Really.",
      
      "date_published": "2005-11-23T00:00:00+00:00",
      "date_modified": "2023-08-02T00:00:00+00:00",
      "tags": ["svn"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sm10s-hjc49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/21/bioclipse-chemo-bioinformatics.html",
      "title": "Bioclipse: the chemo-/bioinformatics workbench",
      "content_html": "<p>Some weeks back there was the <a href=\"https://web.archive.org/web/20080208101002/http://almost.cubic.uni-koeln.de/cdk/cdk_top/events/cdk5yearworkshop/\">CDK5AW <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nthe CDK 5th anniversiry workshop. A small group of international open source chemo-, bioinformatics software developers met,\namong which two from Sweden. It was then decided to generalize their work resulting in Bioclipse:</p>\n\n<p><a href=\"https://www.bioclipse.net/\">https://www.bioclipse.net/</a></p>\n\n<p>It’s heavily using the <a href=\"https://wiki.eclipse.org/Rich_Client_Platform\">Eclipse Rich Client Platform <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, making additional plugins trivial. OK, if this does\nnot convinve you: check the screenshots on the Bioclipse website.</p>\n\n<p>It’s a killer, really! Ola, Martin: great work!</p>\n\n<p>PS. I am going to try to run it with free Java virtual machines this weekend, but if you have a working solution earlier than that, please leave a comment and screenshot in the comments.</p>",
      "summary": "Some weeks back there was the CDK5AW , the CDK 5th anniversiry workshop. A small group of international open source chemo-, bioinformatics software developers met, among which two from Sweden. It was then decided to generalize their work resulting in Bioclipse:",
      
      "date_published": "2005-11-21T00:00:00+00:00",
      "date_modified": "2023-08-02T00:00:00+00:00",
      "tags": ["cdk","bioclipse"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4dgp8-dtq30",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/20/open-source-swing-jchempaint-runs.html",
      "title": "Open Source Swing: JChemPaint runs!",
      "content_html": "<p>Thanx to <a href=\"https://chem-bla-ics.blogspot.com/2005/11/goal-live-chemblaics-cd.html?showComment=1132422120000\">Mark’s encouragements</a>, I tried to run <!-- keep link -->\n<a href=\"http://www.jmol.org/\">Jmol</a> and <a href=\"http://jchempaint.sf.net/\">JChemPaint</a> with\n<a href=\"http://jamvm.sourceforge.net/\">jamvm</a>.</p>\n\n<p>Jmol fails with an <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd.html\">NullPointerException <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, but JChemPaint runs! And note that\nthis was not even running with the latest of the latest; just recent packages from Kubuntu! Yes, there are some glitches, but I’m happy nevertheless!</p>",
      "summary": "Thanx to Mark’s encouragements, I tried to run Jmol and JChemPaint with jamvm.",
      
      "date_published": "2005-11-20T00:00:00+00:00",
      "date_modified": "2024-03-23T00:00:00+00:00",
      "tags": ["jchempaint","java","jmol","linux"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e2cdx-9q525",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd.html",
      "title": "The goal: a live chemblaics CD",
      "content_html": "<p>This evening I have been looking at with the <a href=\"http://www.knoppix.net/\">KNOPPIX</a> customization howto, and ran many of the interesting commands.\nI’ve setup a environment with Kalzium, OpenBabel, CDK, jython, <a href=\"http://pymol.sourceforge.net/\">PyMOL</a>, and for development I included gcj and\nEclipse. At some later point I will include kfile_chemical too, but I want to make a deb package first.</p>\n\n<p>Moreover, I also wanted it to include JChemPaint, Jmol and <a href=\"http://taverna.sourceforge.net/\">Taverna</a> (with the CDK extension). However, these\ndepend on Swing, which is not suffiently provided by open source java virtual machines. I attempted gij 4.0, <a href=\"http://www.kaffe.org/\">kaffe</a>\nand <a href=\"http://sablevm.org/\">sablevm</a>, all without success.</p>\n\n<p>A live CD with all the open source chemo- and bioinformatics tools would be a real killer. We could take a burned live CD with us to conferences\nand have others run our software on their laptop! But we need to stop use Swing. Fortunately, there seems to be a serious project going on to\nport JChemPaint and Jmol to a free Java GUI environment, so maybe we can have the live CD up and going before the 2006 conferences start.</p>",
      "summary": "This evening I have been looking at with the KNOPPIX customization howto, and ran many of the interesting commands. I’ve setup a environment with Kalzium, OpenBabel, CDK, jython, PyMOL, and for development I included gcj and Eclipse. At some later point I will include kfile_chemical too, but I want to make a deb package first.",
      
      "date_published": "2005-11-18T00:00:00+00:00",
      "date_modified": "2005-11-18T00:00:00+00:00",
      "tags": ["cheminf","linux","java","workflow"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sfzaf-73y03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/17/back-from-1st-gcc.html",
      "title": "Back from the 1st GCC",
      "content_html": "<p>OK, just back from the <a href=\"http://www.cic-workshop.de/\">first German Chemoinformatics Conference</a>, which I enjoyed very much. A rather interesting\nprogram, and lots of interesting posters too. You can read the programme online, and will not spend too many words on that (at least not now).\nBut what I will do is point out some interesting posters here.</p>\n\n<p>One poster was on the Molecular Query Language (MQL) by Ewgenij Proschak from <a href=\"http://gecco.org.chemie.uni-frankfurt.de/\">Frankfurt</a>. You can\nread more on this in the latest <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/\">CDK News</a> as it is implemented for the CDK too.\nThe opensource implementation is expected next year.</p>\n\n<p>Another interesting poster was on the use of <a href=\"http://www.biowisdom.com/ontology/faq_q3.htm\">ontologies to connect chemistry and biology</a>.\nThis poster was by Juergen Harter from <a href=\"http://www.biowisdom.com/\">BioWisdom</a>, a Cambridge, UK based company.</p>\n\n<p><a href=\"http://www.scai.fraunhofer.de/209.0.html?&amp;L=1\">Marc Zimmermann</a> had a poster on the chemical OCR variant, called chemical structure\nrecognition (CSR). This process converts images, for example scanned from literature, into a connectivity table. Difficult task, indeed.\n<a href=\"http://www.ercim.org/publication/Ercim_News/enw60/zimmermann.html\">This page</a> contains some information about this project.</p>\n\n<p>There were other interesting posters too, so will probably report on those later too. But do feel free to leave comments to this blog post,\ndiscussing other interesting posters.</p>",
      "summary": "OK, just back from the first German Chemoinformatics Conference, which I enjoyed very much. A rather interesting program, and lots of interesting posters too. You can read the programme online, and will not spend too many words on that (at least not now). But what I will do is point out some interesting posters here.",
      
      "date_published": "2005-11-17T00:00:00+00:00",
      "date_modified": "2005-11-17T00:00:00+00:00",
      "tags": ["cheminf","ontology"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/37nvb-e8970",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/11/going-to-german-chemoinformatics.html",
      "title": "Going to the German Chemoinformatics Conference",
      "content_html": "<p>This sunday starts the first <a href=\"https://web.archive.org/web/20051215010113/https://www.cic-workshop.de/\">German Chemoinformatics Conference <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> in\n<a href=\"http://www.goslar.de/\">Goslar</a>. It’s an interesting <a href=\"https://web.archive.org/web/20060206222231/http://scholle.oc.uni-kiel.de/users/cic/tagungen/workshop05/programm.html\">programme <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, with\npresentations on the InChI, PubChem, 25 years of chemoinformatics, the chemical semantic web, and much more.</p>\n\n<p>Among these presentations is mine, on comparing crystal structures\n(<a href=\"https://web.archive.org/web/20050410111504/http://www.cac.science.ru.nl/research/publications/PDFs/willighagen2005.pdf\">PDF <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)\nand deducing cell parameters. But I’m having a poster on QSAR too.</p>\n\n<p>I’ll arrive on saturday afternoon in Goslar, so leave a message at the conference hotel if you want to meet up, and talk about my work, or yours, or\nthe CDK, KDE, JChemPaint, Jmol, kfile_chemical, Kat/Chemistry, <a href=\"http://www.blueobelisk.org/\">BlueObelisk</a>, Eclipse, R, or whatever else…\nI plan to have a modest german meal and one or two beers in the evening.</p>\n\n<p>BTW, after Belém (Lissabon), Sintra, Boppard, Kinderdijk, Hoorn and Cologne, it’s the 7th\n<a href=\"http://whc.unesco.org/\">UNESCO world heritage</a> site I’m visiting in just 14 months! Can’t we just have conferences in Hawaii and sorts, like\nthey do in other fields?? Oh, wait, we do: EuroQSAR is on a cruise boat.</p>",
      "summary": "This sunday starts the first German Chemoinformatics Conference in Goslar. It’s an interesting programme , with presentations on the InChI, PubChem, 25 years of chemoinformatics, the chemical semantic web, and much more.",
      
      "date_published": "2005-11-11T00:00:00+00:00",
      "date_modified": "2025-02-16T00:00:00+00:00",
      "tags": ["cheminf","crystal","career"],
      "_references": [
        
          
          
            { "url": "https://doi.org/10.1107/S0108768104028344", "doi": "10.1107/S0108768104028344"
             }
            
          
        ],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6n4we-wam18",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/10/scons-and-bksys-for-kfilechemical.html",
      "title": "Scons and bksys for kfile_chemical",
      "content_html": "<p>Not so long ago, it was <a href=\"http://conference2005.kde.org/slides/software-construction-tools-talk--thomas-nagy.pdf\">decided</a> that KDE 4.0\nwill use <a href=\"http://www.scons.org/\">SCons</a> as a configuration and building tool, instead of the autotools and make: the common\n<code class=\"language-plaintext highlighter-rouge\">./configure &amp;&amp; make &amp;&amp; make install</code> which has served the open source community very well for so long.</p>\n\n<p>SCons is <a href=\"http://dot.kde.org/1126452494/\">different</a> in several ways. One of these is that the tar.gz packages it produces are some\n500kB smaller, which makes a huge difference for <a href=\"http://kde-apps.org/content/show.php?content=28995\">kfile_chemical</a> which is\nnow 121kB instead of 635kB.</p>\n\n<p>Now, the <a href=\"http://www.kde.org/\">KDE</a> community, or Thomas Nagy to be precise, developed a helper for KDE software, called\n<a href=\"http://www.kde-apps.org/content/show.php?content=19243\">bksys</a>. Version 1.5.1, however, did not contain an example directory for kfile\nplugins, but I managed to work something out starting from the configuring scripts from <a href=\"http://kde-apps.org/content/show.php?content=12725\">kdissert</a>,\nand ended up with these <a href=\"http://websvn.kde.org/trunk/playground/utils/kfile_chemical/SConstruct?rev=479410&amp;view=log\">SConstruct</a> and\n<a href=\"http://websvn.kde.org/trunk/playground/utils/kfile_chemical/config.bks?rev=479414&amp;view=log\">config.bks</a>.</p>\n\n<p>Now, I haven’t figured out how to include the translations, but will figure that out sooner or later… for now, I’m quite happy with the new build system.</p>",
      "summary": "Not so long ago, it was decided that KDE 4.0 will use SCons as a configuration and building tool, instead of the autotools and make: the common ./configure &amp;&amp; make &amp;&amp; make install which has served the open source community very well for so long.",
      
      "date_published": "2005-11-10T00:00:00+00:00",
      "date_modified": "2005-11-10T00:00:00+00:00",
      "tags": ["kde","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hxb0r-66s49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/08/when-to-stop-including-qsar-model.html",
      "title": "When to stop including QSAR model variables...",
      "content_html": "<p>Yesterday I reviewed an article which published a QSPR model which looked something like:</p>\n\n\\[y = 151 + 50p1 - 12p2 - 0.006p3\\]\n\n<p>with quite OK prediction results (R=0.9880). But I was not quite comfortable with the coefficient for the \\(p3\\) variable.\nThe article did not calculate significances for the coefficients, so it was not obvious from the article wether is was useful\nto include them. I then looked at the range for <code class=\"language-plaintext highlighter-rouge\">p3</code>, which was 110-150; so, the maximal influence this variable can have is\n\\(150*0.006 = 0.9\\). Now, the experimental values given in the article were rounded to integers, indicating that the maximal\neffect of the <code class=\"language-plaintext highlighter-rouge\">p3</code> variable is smaller than the experimental error! It’s even worse when you consider the difference between the\nmin and max value (40), then the influence would even be smaller (assuming that most model methods would put the mean temperature\neffect in the offset, 151 in this case).</p>\n\n<p>Today, I reread an article with a similar issue. The model was something like:</p>\n\n\\[y = -0.81 + 0.03*p1 + 0.009*p2\\]\n\n<p>Here, \\(max(p2)-min(p2)\\) is a smaller than 100, so the maximal effect of the variable would be in the order 0.9, which is of\nthe same order of the root mean square error of prediction (RMSEP) for this model. Indeed, the article already states that the\ncoefficient is only significant at the 95% level, and not at the 99% level. 
But, without having calculated the RMSEP for a model\nwithout the p2 variable, I would guess that leaving it out would give equally good prediction results.</p>\n\n<p>Concluding, I would say that the <code class=\"language-plaintext highlighter-rouge\">p2</code> variable does not include relevant information.</p>\n\n<p>Do you think it is reasonable to include the <code class=\"language-plaintext highlighter-rouge\">p2</code> variable in the second model?</p>",
      "summary": "Yesterday I reviewed an article which published a QSPR model which looked something like:",
      
      "date_published": "2005-11-08T00:00:00+00:00",
      "date_modified": "2005-11-08T00:00:00+00:00",
      "tags": ["cheminf","qsar"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r8zfg-3e891",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/08/r-gui-rkward.html",
      "title": "A R GUI: rkward",
      "content_html": "<p>The great thing about open source is that… it’s open.</p>\n\n<p>When I was browsing the internet just now, I dropped in on <a href=\"http://dot.kde.org/\">KDE Dot News</a>. In the rightside column, there is a feed of\nnew KDE software from <a href=\"http://www.kde-apps.org/\">KDE-apps.org</a>. A new version of my favoriate music player,\n<a href=\"http://amarok.kde.org/\">amarok</a>, lured me to the KDE-apps website, where I saw <a href=\"http://rkward.sf.net/\">rkward</a> is latest announcement. The funny\nname, and the categorization as scientific, triggered some interest on my side, and it turned out to be a graphical frontend to my favorite statistics program,\n<a href=\"http://www.r-project.org/\">R</a>.</p>\n\n<p>Ok, they had a <a href=\"http://www.debian.org/\">Debian</a> package, and the debian/ build dir in the tar.gz so I downloaded it and started making a\n<a href=\"http://www.woc.science.ru.nl/devel/egonw/rkward_0.3.4_i386.deb\">Kubuntu 5.10 package</a>. While doing this I saw some notice about the R syntax highlighting\nused, which conflicts with the older version in the Kate packages.</p>\n\n<p>Then I realized that a long time ago, I wrote such syntax highlighting for Kate, so my attention was lured again. And, indeed, they use my syntax highlighting,\nthough <a href=\"http://www.uni-kiel.de/agrarpol/ahenningsen/index-e.html\">extended later</a> (somewhere down the page).</p>\n\n<p>And this makes me happy. The syntax highlighting was useful to me in the past, but apparently to a lot of other people too. And because I released it\nas GPL, back then, it now appears in rkward! Yes, a really like open source :)</p>",
      "summary": "The great thing about open source is that… it’s open.",
      
      "date_published": "2005-11-08T00:00:00+00:00",
      "date_modified": "2005-11-08T00:00:00+00:00",
      "tags": ["rstats","kde"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v6wb4-fxp54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/07/ubuntu-dapper-will-include-chemistry.html",
      "title": "Ubuntu Dapper will include chemistry features",
      "content_html": "<p>I just <a href=\"https://launchpad.net/distros/ubuntu/+spec/kubuntu-file-search\">read</a> that the <a href=\"http://www.kubuntu.org/\">Kubuntu</a> team\n<a href=\"https://wiki.ubuntu.com/KubuntuFileSearchWithKat\">wants</a> to include <a href=\"http://kat.mandriva.com/\">Kat</a> in the\n<a href=\"http://packages.ubuntu.com/dapper/\">dapper</a> release (scheduled for April 2006). Kat is (to be) the KDE equivalent of Google’s desktop search bar.</p>\n\n<p>This is great news for us chem-bla-icians, as Kat has support for full text searching of chemistry files! Let’s see if I can get the Kubuntu team\nto package up <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a> too, which will extend Kat (and KDE in general), with\nextraction of meta data from chemical documents.</p>",
      "summary": "I just read that the Kubuntu team wants to include Kat in the dapper release (scheduled for April 2006). Kat is (to be) the KDE equivalent of Google’s desktop search bar.",
      
      "date_published": "2005-11-07T00:00:00+00:00",
      "date_modified": "2005-11-07T00:00:00+00:00",
      "tags": ["kde","chemistry"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b1vyj-0kd63",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/02/rcdk-install-fails-on-gcc-40-systems.html",
      "title": "R/CDK install fails on GCC 4.0 systems",
      "content_html": "<p>Some time ago <a href=\"http://blue.chem.psu.edu/~rajarshi/\">Rajarshi Guha</a> introduced <a href=\"http://www.r-project.org/\">R</a> bindings for the\n<a href=\"http://cdk.sf.net/\">CDK</a> (see his CDK News <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/\">articles</a>), and\ntoday I tried to install his rcdk package that makes it happen.</p>\n\n<p>However, it requires <a href=\"http://www.omegahat.org/RSJava/\">SJava</a> which compiled fine on other machines, but not on my AMD64\nmachine. The problem seems to be related to the GNU GCC 4.0 compiler I have installed. Compiling with 3.4 works fine,\nbut 4.0 complains with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>CtoJava.cweb:215: error: static declaration of <span class=\"s1\">'std_env'</span> follows non-static declaration\nCtoJava.cweb:195: error: previous declaration of <span class=\"s1\">'std_env'</span> was here\n</code></pre></div></div>\n\n<p>Googling, learned me that I am not the only one with this problem, but did not find any solution. If you know how to fix this problem, please leave a message in the comments.</p>",
      "summary": "Some time ago Rajarshi Guha introduced R bindings for the CDK (see his CDK News articles), and today I tried to install his rcdk package that makes it happen.",
      
      "date_published": "2005-11-02T00:00:00+00:00",
      "date_modified": "2005-11-02T00:00:00+00:00",
      "tags": ["cdk","rstats"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gc4hw-5k265",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/02/open-source-data-mining-in.html",
      "title": "Open Source data mining in chemoinformatics",
      "content_html": "<p>On the <a href=\"http://web.archive.org/web/20050829074352/http://www.int-conf-chem-structures.org/\">7th International Conference on Chemical Structures <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"http://web.archive.org/web/20061011043216/http://www.medchem.leidenuniv.nl/people/jeroen_kazius.htm\">Jeroen Kazius <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> has a\n<a href=\"http://web.archive.org/web/20070123184631/http://www.liacs.nl/~snijssen/gaston/iccs.html\">poster <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> on finding discriminative substructures, that is, molecular fragments\nwhich can be discriminate between two acitivity classes. The software is released as\n<a href=\"http://web.archive.org/web/20060829055804/http://www.liacs.nl/~snijssen/gaston/\">Gaston <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, is written in C++ and has the GPL license.</p>\n\n<p>Later I encountered <a href=\"http://web.archive.org/web/20051218120845/http://fuzzy.cs.uni-magdeburg.de/~borgelt/moss.html\">MoSS <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nwhich has the same goal, but uses a different algorithm.\nMoSS is written in Java and uses the LGPL license. MoSS reads STN and SMILES as input, which might not be optimal for all users,\nso a CDK port comes to mind.</p>",
      "summary": "On the 7th International Conference on Chemical Structures Jeroen Kazius has a poster on finding discriminative substructures, that is, molecular fragments which can be discriminate between two acitivity classes. The software is released as Gaston , is written in C++ and has the GPL license.",
      
      "date_published": "2005-11-02T00:00:00+00:00",
      "date_modified": "2025-06-08T00:00:00+00:00",
      "tags": ["iccs","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/p46tq-r7946",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/01/annual-lunteren-meeting.html",
      "title": "The annual Lunteren meeting",
      "content_html": "<p>Most Dutch chemists have their annual Lunteren meeting, so do I. Lunteren is a small village on the Veluwe where nothing much can be done,\nexcept for listening to the presentations. I participate in the Lunteren meeting for analytical chemists, i.e. HPLC, MS, GC and all their\ncombinations upto and including HPLC/MS/MS, and since a few years the Lab-on-a-Chip stuff. And, as such, in many cases a lot of details on\nhow to use and develop these methods.</p>\n\n<p>For a computational chemist, this often is too much practical detail on too little -ics. Fortunately, the proteomics, genomics, etc is a\nstrong upcoming funding subject, so data analysis is getting in their picture too. Which is good for someone with a chemometrics/chemoinformatics\nbackground as funding in that area is getting smaller every year.</p>\n\n<p>My presentation went reasonable well, as far as I can tell myself. I was very nervous with both my professor and some 150 other people in the\naudience, but managed to not wander off the main topic. However, I was told to be a bit too monotone, but that’s an unfortunate effect of\nbeing so nervous.</p>",
      "summary": "Most Dutch chemists have their annual Lunteren meeting, so do I. Lunteren is a small village on the Veluwe where nothing much can be done, except for listening to the presentations. I participate in the Lunteren meeting for analytical chemists, i.e. HPLC, MS, GC and all their combinations upto and including HPLC/MS/MS, and since a few years the Lab-on-a-Chip stuff. And, as such, in many cases a lot of details on how to use and develop these methods.",
      
      "date_published": "2005-11-01T00:00:00+00:00",
      "date_modified": "2005-11-01T00:00:00+00:00",
      "tags": ["phd"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xf9bq-44218",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/30/cdk-news.html",
      "title": "CDK News",
      "content_html": "<p>Just finished applying the latest spelling error fixes to <a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/2_3/\">CDK News 2.3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nTook me some three hours to finish it up the 12 pages, which has mostly to the need to recompile the PDF after each change to make sure that nothing in\nthe layout got broken.</p>\n\n<p>The content contains four <a href=\"https://web.archive.org/web/20070807110111/http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/submitting\">communications <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>:</p>\n\n<ul>\n  <li>An Open Framework for Online QSAR Modeling</li>\n  <li>Atom types in the CDK</li>\n  <li>MQL - Development of a novel substructure query language</li>\n  <li>Stereochemistry detection in the CDK</li>\n</ul>\n\n<p>And, of course, the recurrent Editorial, FAQ and ChangeLog.</p>",
      "summary": "Just finished applying the latest spelling error fixes to CDK News 2.3 . Took me some three hours to finish it up the 12 pages, which has mostly to the need to recompile the PDF after each change to make sure that nothing in the layout got broken.",
      
      "date_published": "2005-10-30T00:00:00+00:00",
      "date_modified": "2023-07-30T00:00:00+00:00",
      "tags": ["cdk","cheminf"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/frske-p0649",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/29/kfilechemical-gets-xyz-mol2-smiles-vmd.html",
      "title": "kfile_chemical gets XYZ, Mol2, SMILES, VMD and GenBank support",
      "content_html": "<p>Jerome Pansanel contributed new patches for <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a>; on\nMonday actually, but I have been busy with other things, among which a presentation I have to give next Monday for some 100+\nanalytical chemists. The patch adds support to <a href=\"http://www.kde.org/\">KDE</a> for five new chemical MIMEs: XYZ, Mol2, SMILES,\nVMD and GenBank. Therefore, I just released a new version (0.10), and added an announcement to\n<a href=\"http://freshmeat.net/projects/kfile_chemical/\">Freshmeat.net</a>.</p>\n\n<p>As a reminder, version 1.0 will have all chemical mime types supported, after which I will initiate a process to formalize\nthe meta data we want the kfile plugins to give, which will lead to the 2.0 release. So far, I had in mind that the next\nstep was to make the plugins ready for KDE 4.0, but I became aware of the <a href=\"http://developer.kde.org/documentation/library/kdeqt/kde3arch/mime.html\">mime magic</a>\nas implemented in <a href=\"http://developer.kde.org/documentation/library/3.1-api/classref/kio/KMimeMagic.html\">KMimeMagic</a>.</p>\n\n<p>So, concluding, I might squeeze in another beta release 3.0, where this magic gets addressed; knowing that it will definately\nnot work for all files, but hopefully it will for files with stupid file extensions like .log.</p>",
      "summary": "Jerome Pansanel contributed new patches for kfile_chemical; on Monday actually, but I have been busy with other things, among which a presentation I have to give next Monday for some 100+ analytical chemists. The patch adds support to KDE for five new chemical MIMEs: XYZ, Mol2, SMILES, VMD and GenBank. Therefore, I just released a new version (0.10), and added an announcement to Freshmeat.net.",
      
      "date_published": "2005-10-29T00:00:00+00:00",
      "date_modified": "2023-07-30T00:00:00+00:00",
      "tags": ["kde","chemistry","web"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a4vw2-r5y93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/27/my-birthday-31-and-adsense.html",
      "title": "My birthday (31) and the Adsense",
      "content_html": "<p>Today is my 31st birthday, nearing half-point now (statistically seen). Also, by now I should have had my scientific moment of glory, otherwise I can forget that Nobel prize. Oh well, forget it.</p>\n\n<p>Have you seen those small advertisements on this page (RSS users, please visit the website :)? Funny links they give. The system is very nice btw: it awaits google indexing of the blog and then decides which ads are relevant. Hence, the links to small chemoinformatics companies. Nice to browse.</p>\n\n<p>Disclaimer, when clicking any or all of the ads, I’ll get a bit of money. But don’t start clicking away, otherwise Adsense will get upset, and then I get nothing.</p>",
      "summary": "Today is my 31st birthday, nearing half-point now (statistically seen). Also, by now I should have had my scientific moment of glory, otherwise I can forget that Nobel prize. Oh well, forget it.",
      
      "date_published": "2005-10-27T00:00:00+00:00",
      "date_modified": "2005-10-27T00:00:00+00:00",
      "tags": ["google"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y9z8g-s6k09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/25/more-cdkinterfaces-updates.html",
      "title": "More cdk.interfaces updates",
      "content_html": "<p>Yesterday I had some spare time before going to a meeting about the <a href=\"http://www.woc.science.ru.nl/\">Woordenboek Organische Chemie</a>,\nso I was boldly going where no one has went before: getting the CDK module core independent of the data module. Why, you might wonder…</p>\n\n<p>Well, if the as many modules of CDK become independent of the classes implementing the data interfaces, i.e. those classes that\nimplement the <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/package-frame.html\">org.openscience.cdk.interfaces</a>\ninterfaces, then it becomes possible to make alternative implementations. For example, an implementation that also implement the\n<a href=\"http://octetsource.net/\">Octet</a> interfaces, or an implementation that extends the <a href=\"http://joelib.sf.net/\">JOELib</a> classes. In that\nway, combining these libraries becomes as easy as writing a blog :)</p>\n\n<p>Anyway, today I finished the <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/config/AtomTypeFactory.html\">AtomTypeFactory</a>, and\nonly the <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/config/IsotopeFactory.html\">IstopeFactory</a> remains to be updated.\nSince many classes in the CDK library use these two classes, patches had to be applied throughout the library. And code outside the\nCDK library might be broken now, so be aware…</p>",
      "summary": "Yesterday I had some spare time before going to a meeting about the Woordenboek Organische Chemie, so I was boldly going where no one has went before: getting the CDK module core independent of the data module. Why, you might wonder…",
      
      "date_published": "2005-10-25T00:00:00+00:00",
      "date_modified": "2005-10-25T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sqzez-f9r89",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/24/jchempaint-applet-download-size-538kb.html",
      "title": "JChemPaint applet download size: 538kB",
      "content_html": "<p>A good functional molecular editor is of much important to the chemical web. There are a few small download sized editors around.\n<a href=\"http://jchempaint.sf.net/\">JChemPaint</a> has been available as applet for some time now, but the download size has been large. The\nsituation has improved considerable over the past months, and the download size upon which the applet now shows up in your webbrowser\nis down to 538kB. A live demo is available from <a href=\"http://www.chemistry-development-kit.org/\">www.chemistry-development-kit.org</a>.</p>\n\n<p>The applet, however, does have the same functionality as the full application. When a feature is used that is not available from the\njars downloaded first (which make up the 538kB), additional jars are downloaded.</p>\n\n<p>The applet is not bugless yet. For example, drawing reactions does not seem to work :( But, it’s really getting somewhere.\nCongrats to the applet development team!</p>",
      "summary": "A good functional molecular editor is of much important to the chemical web. There are a few small download sized editors around. JChemPaint has been available as applet for some time now, but the download size has been large. The situation has improved considerable over the past months, and the download size upon which the applet now shows up in your webbrowser is down to 538kB. A live demo is available from www.chemistry-development-kit.org.",
      
      "date_published": "2005-10-24T00:00:00+00:00",
      "date_modified": "2005-10-24T00:00:00+00:00",
      "tags": ["jchempaint"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6tezh-5a955",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/23/wrapping-up.html",
      "title": "Wrapping up...",
      "content_html": "<p>Less then three months before the end of my contract of my PhD project. And not nearly done yet. Weekends are now spend on wrapping up\nbits of experimental research into something like a coherent article. And even lot’s of calculations to do to answer the open\nquestions. <a href=\"http://freemind.sourceforge.net/\">FreeMind</a> is helping me organize thoughts.</p>\n\n<p>Opensource chemoinformatics is a welcomed diversion now and then. Working on some easy-to-fix CDK bugs yesterday, like the\n<a href=\"https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/isomorphism/matchers/QueryAtomContainer.html\">QueryAtomContainer <i class=\"fa-solid fa-recycle fa-xs\"></i></a> now correctly\nupdated for the recent <a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=8016575&amp;forum_id=2178\">cdk.interfaces changes <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>. Fixed now.\nI also touched a lot of code when updating the FSF address in the LGPL license notice, and when I modified the construction of\n<a href=\"https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/exception/CDKException.html\">CDKException <i class=\"fa-solid fa-recycle fa-xs\"></i></a>’s to set the causing Throwable.\nAlso helped out <a href=\"http://www.livejournal.com/users/cniehaus/\">Carsten</a> a bit with adding his data from\n<a href=\"http://edu.kde.org/kalzium/\">Kalzium</a> to the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a>\n<a href=\"https://github.com/BlueObelisk/bodr\">data repository <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Another nice diversion is <a href=\"http://wesnoth.org/\">The Battle for Wesnoth</a>. Just got killed, though.</p>",
      "summary": "Less then three months before the end of my contract of my PhD project. And not nearly done yet. Weekends are now spend on wrapping up bits of experimental research into something like a coherent article. And even lot’s of calculations to do to answer the open questions. FreeMind is helping me organize thoughts.",
      
      "date_published": "2005-10-23T00:00:00+00:00",
      "date_modified": "2023-07-29T00:00:00+00:00",
      "tags": ["phd","cdk","career"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fbnx1-9r832",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/21/viagra-saves-environment.html",
      "title": "Viagra saves the environment",
      "content_html": "<p>This week there was an interesting article in the Dutch <a href=\"http://intermediair.nl/\">Intermediar</a> about viagra. They cite an article in\n<a href=\"http://www.swetswise.com/eAccess/viewTitleIssues.do?titleID=68609\">Environmental Conversation</a> and state that it saves the environment\nas it greatly reduced the market for animal parts from the traditional chinese medicine that address the same problem as viagra does.</p>\n\n<p><em>Viagra: good for the environment, good for you! ;)</em></p>\n\n<p>You don’t see this often, though. Public opinion, at least in my social environment, is that chemicals (in general) are bad for the environment,\nwhat so ever… Natural products are much better. Wait, those are chemical too… but that is to complicated for most :(</p>\n\n<p>BTW, viagra is <a href=\"http://www.google.com/search?client=safari&amp;rls=en-us&amp;q=InChI%3D1S%2FC22H30N6O4S.C6H8O7%2Fc1-5-7-17-19-20%2827%284%2925-17%2922%2829%2924-21%2823-19%2916-14-15%288-9-18%2816%2932-6-2%2933%2830%2C31%2928-12-10-26%283%2911-13-28%3B7-3%288%291-6%2813%2C5%2811%2912%292-4%289%2910%2Fh8-9%2C14H%2C5-7%2C10-13H2%2C1-4H3%2C%28H%2C23%2C24%2C29%29%3B13H%2C1-2H2%2C%28H%2C7%2C8%29%28H%2C9%2C10%29%28H%2C11%2C12%29\">InChI=1S/C22H30N6O4S.C6H8O7/c1-5-7-17-19-20(27(4)25-17)22(29)24-21(23-19)16-14-15(8-9-18(16)32-6-2)33(30,31)28-12-10-26(3)11-13-28;7-3(8)1-6(13,5(11)12)2-4(9)10/h8-9,14H,5-7,10-13H2,1-4H3,(H,23,24,29);13H,1-2H2,(H,7,8)(H,9,10)(H,11,12) <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "This week there was an interesting article in the Dutch Intermediar about viagra. They cite an article in Environmental Conversation and state that it saves the environment as it greatly reduced the market for animal parts from the traditional chinese medicine that address the same problem as viagra does.",
      
      "date_published": "2005-10-21T00:00:00+00:00",
      "date_modified": "2005-10-21T00:00:00+00:00",
      "tags": ["chemistry","environment"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z97vw-87009",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/20/cdk-news-23-and-inchis.html",
      "title": "CDK News 2.3 and InChI&apos;s",
      "content_html": "<p><a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/\">CDK News <i class=\"fa-solid fa-recycle fa-xs\"></i></a> 2.3 is scheduled for this month, and origanally\nplanned to be distributed on the CDK5AW event. So, it’s a bit late. But the editorial process is converging… I realized that\nI forgot to mention the requirement for <a href=\"http://www.iupac.org/inchi/\">InChI</a>’s whenever molecules are given. So,\nI’m now in the process of going through the issue and add the missing identifiers…</p>",
      "summary": "CDK News 2.3 is scheduled for this month, and origanally planned to be distributed on the CDK5AW event. So, it’s a bit late. But the editorial process is converging… I realized that I forgot to mention the requirement for InChI’s whenever molecules are given. So, I’m now in the process of going through the issue and add the missing identifiers…",
      
      "date_published": "2005-10-20T00:00:00+00:00",
      "date_modified": "2023-07-28T00:00:00+00:00",
      "tags": ["cdk","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/904sy-xc977",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/19/jmols-fah-team-in-top-800.html",
      "title": "Jmol&apos;s FAH team in Top 800",
      "content_html": "<p>The <a href=\"https://wiki.jmol.org/index.php/Folding_At_Home_Community\">Jmol FAH team <i class=\"fa-solid fa-recycle fa-xs\"></i></a> has just entered the Top 800 of most active\n<a href=\"https://foldingathome.org/\">Folding@Home <i class=\"fa-solid fa-recycle fa-xs\"></i></a> teams. And they started monitoring contributions on a user level. Thus, I can now see how active\nI am within the team. And so can you! Join the team, and let’s get into the Top 500!</p>",
      "summary": "The Jmol FAH team has just entered the Top 800 of most active Folding@Home teams. And they started monitoring contributions on a user level. Thus, I can now see how active I am within the team. And so can you! Join the team, and let’s get into the Top 500!",
      
      "date_published": "2005-10-19T00:00:00+00:00",
      "date_modified": "2005-10-19T00:00:00+00:00",
      "tags": ["jmol"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bs3x9-0em56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/19/inchi-meta-data-with-kfilechemical.html",
      "title": "InChI meta data with kfile_chemical",
      "content_html": "<p>I’ve just uploaded <a href=\"http://web.archive.org/web/20051120044043/http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical 0.9 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>. It has new translations for\nES and DA, and plugins for <a href=\"http://www.iupac.org/inchi/\">InChI</a> files. It will extract the InChI string as meta data (and will thus be used by the\n<a href=\"http://www.kde.org/\">KDE</a> desktop search <a href=\"http://web.archive.org/web/20230727174017/https://lwn.net/Articles/148822/\">Kat <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and the InChI version number.</p>\n\n<p>Thinking about this, it might be useful to extract all layers as meta data, so that one can search on chemical formula and even\nconnectivity, and find all matching structures. Not really close to substructure search, but we’ll tackle that later :)</p>",
      "summary": "I’ve just uploaded kfile_chemical 0.9 . It has new translations for ES and DA, and plugins for InChI files. It will extract the InChI string as meta data (and will thus be used by the KDE desktop search Kat , and the InChI version number.",
      
      "date_published": "2005-10-19T00:00:00+00:00",
      "date_modified": "2023-07-27T00:00:00+00:00",
      "tags": ["kde","inchi"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pk40z-7z702",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/18/cdk-taverna-fully-recognized.html",
      "title": "CDK-Taverna fully recognized",
      "content_html": "<p>After asking about it, Tom explained me how <a href=\"http://taverna.sf.net/\">Taverna</a> can pick\nup the <code class=\"language-plaintext highlighter-rouge\">apiconsumer.xml</code> file from jars: just copy it into the root directory of the jar package. Easy as that.</p>\n\n<p>So, users now only need to copy the <code class=\"language-plaintext highlighter-rouge\">cdk-taverna.jar</code> into the <code class=\"language-plaintext highlighter-rouge\">taverna-workbench-1.3/lib/</code> directory and have a nice chemoinformatics\nworkbench environment. I’ll upload the jar to <a href=\"http://sourceforge.net/projects/cdk\">CDK’s project page</a> right now.</p>",
      "summary": "After asking about it, Tom explained me how Taverna can pick up the apiconsumer.xml file from jars: just copy it into the root directory of the jar package. Easy as that.",
      
      "date_published": "2005-10-18T00:00:00+00:00",
      "date_modified": "2005-10-18T00:00:00+00:00",
      "tags": ["cdk","workflow"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/f4370-9cz05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/17/cia-statistics-for-blue-obelisk.html",
      "title": "CIA statistics for Blue Obelisk",
      "content_html": "<p>I have just enabled <a href=\"https://web.archive.org/web/20051024075530/http://cia.navi.cx/\">CIA <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> statistics for the\n<a href=\"https://web.archive.org/web/20060422193559/http://www.blueobelisk.org/repos/blueobelisk/\">Blue Obelisk SVN <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>:\n<a href=\"http://cia.navi.cx/stats/project/cdk/blueobelisk\">/stats/project/cdk/blueobelisk  <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>.</p>\n\n<p>It’s done by using the <a href=\"https://web.archive.org/web/20050924050012/http://cia.navi.cx/doc/clients\">ciabot_svn.py <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nclient script and hooked into the <code class=\"language-plaintext highlighter-rouge\">$REPOS/hooks/post-commit</code> hook on the SVN server. The client script is slightly hacked to hard code the module name, which\notherwise did not show up on the <a href=\"irc://irc.freenode.net/#cdk\">chat channel</a>.</p>",
      "summary": "I have just enabled CIA statistics for the Blue Obelisk SVN : /stats/project/cdk/blueobelisk .",
      
      "date_published": "2005-10-17T00:00:00+00:00",
      "date_modified": "2023-07-27T00:00:00+00:00",
      "tags": ["blue-obelisk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rgdzb-bfe36",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/15/single-pdfs-for-cdk-news-articles.html",
      "title": "Single PDFs for CDK News articles",
      "content_html": "<p>This week was the <a href=\"https://web.archive.org/web/20080208101002/http://almost.cubic.uni-koeln.de/cdk/cdk_top/events/cdk5yearworkshop/\">CDK5AW <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> event, a workshop for users and\ndevelopers of the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> (CDK). After talking with other developers we agreed on\ncreating PDF and HTML versions of single articles that appeared in the\n<a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/\">CDK News <i class=\"fa-solid fa-recycle fa-xs\"></i></a> newsletter. Well, I haven’t figured out how to create nice HTML\n(the latex2html does not give nice results, anyone ideas?), but for the PDF version I now have a pipeline.</p>\n\n<p>For each article, a split.config file determines which pages from the CDK News issue PDF should be extracted. To do this, I used the\n<a href=\"http://www.accesspdf.com/pdftk/\">PDF ToolKit</a>, or pdftk for short (comes with Debian/Unbuntu by default). And using a Perl script to read this config files,\nthe pipeline creates PDF files for each article. Currently, I’ll only have it do the features articles; that is, not the\nChangeLog, Editorial, Literature and FAQ. For those you’ll need to download the full issue. If you don’t like that, let me know :)</p>\n\n<p>Ok, you will probably have noticed that the almost server is down\n(<a href=\"http://www.google.com/search?q=CDK+News\">Googling for ‘CDK News’</a> allows you read the cache!), and\nI the PDF’s will be uploaded there asap. For those not familiar with CDK News, the articles are FDL, so feel free to\ncopy and distribute them. If you reuse the text and update it, which is allowed too, please let us know.</p>",
      "summary": "This week was the CDK5AW event, a workshop for users and developers of the Chemistry Development Kit (CDK). After talking with other developers we agreed on creating PDF and HTML versions of single articles that appeared in the CDK News newsletter. Well, I haven’t figured out how to create nice HTML (the latex2html does not give nice results, anyone ideas?), but for the PDF version I now have a pipeline.",
      
      "date_published": "2005-10-15T00:00:00+00:00",
      "date_modified": "2023-07-27T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/za0jj-7x159",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/15/chem-bla-ics.html",
      "title": "Chem-bla-ics",
      "content_html": "<p>This new blog will deal with chemblaics in the broader sense, and will not be restricted to research in this field\nin which I am involved personally.</p>\n\n<p>Chemblaics (pronounced chem-bla-ics) is the science that uses computers to address and possibly solve problems in\nthe area of chemistry, biochemistry and related fields. The general denomiter seems to be molecules, but I might\nbe wrong there.</p>\n\n<p>The <strong>big</strong> difference between chemblaics and areas as cheminformatics, chemoinformatics, chemometrics, proteochemometrics,\netc, is that chemblaic <em>only</em> uses open source software, making experimental results reproducable and validatable.\nAnd this is a <strong>big</strong> difference with how research in these areas is now often done.</p>\n\n<p>Egon</p>",
      "summary": "This new blog will deal with chemblaics in the broader sense, and will not be restricted to research in this field in which I am involved personally.",
      
      "date_published": "2005-10-15T00:00:00+00:00",
      "date_modified": "2005-10-15T00:00:00+00:00",
      
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    }
  ]
}
