{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "chem-bla-ics",
  "description": "Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.",
  "home_page_url": "https://chem-bla-ics.linkedchemistry.info/",
  "feed_url": "https://chem-bla-ics.linkedchemistry.info/feed.json",
  "language": "en",
  "authors": [
    {
      "name": "Egon Willighagen",
      "url": "https://orcid.org/0000-0001-7542-0286",
      "_orcid": "0000-0001-7542-0286"
    }
  ],
  "items": [
    {
      "id": "https://doi.org/10.59350/cf885-kee54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/01/19/blog-updates.html",
      "title": "Blog updates",
      "content_html": "<p><a href=\"https://doi.org/10.59350/nfqxs-qs982\">One-and-a-half years ago</a> I started migration my blog from blogger.com to a Markdown and Git-based blog.\nIt has been a fascinating journey that I do not regret. I love being back in control and not reliant on features of some\n<em>content management system</em>. I learned so much along the way, including <a href=\"https://jekyllrb.com/\">Jekyll</a> and <a href=\"https://jekyllrb.com/docs/liquid/\">Liquid</a>\nto start with, but also <a href=\"https://doi.org/10.59350/nfqxs-qs982\">Fontawesome</a> (for better or worse)m and <a href=\"https://doi.org/10.59350/8x2f1-h6d21\">Goatcounter</a>\nfor GDPR-compatible and privacy-first impact tracking.</p>\n\n<p>But I also greatly enjoy the interaction with the <a href=\"https://rogue-scholar.org/\">Rogue Scholar</a> team (particularly <a href=\"https://blog.front-matter.io/\">Martin Fenner</a>).\nFirst, it has great to be listed on (something like) a blog planet, and to read the collection of blog posts, of course! BTW, also thanks to\n<a href=\"https://larsgw.blogspot.com/\">Lars Willighagen</a> who joined Rogue Scholar earlier than I did. This interaction allowed me\nto take part in various innovations, like archiving and getting DOIs for blog posts, archiving entire blogs (see doi:<a href=\"https://doi.org/10.53731/3c6pm-xbp04\">10.53731/3c6pm-xbp04</a> and\ndoi:<a href=\"https://doi.org/10.59350/vjvdy-6p110\">10.59350/vjvdy-6p110</a>), <a href=\"https://doi.org/10.59350/er1mn-m5q69\">cite blog posts with DOIs</a>,\nreferences in blogs (e.g. 
see doi:<a href=\"https://doi.org/10.53731/m9d5v-xmr74\">10.53731/m9d5v-xmr74</a>),\n<a href=\"https://www.jsonfeed.org/\">JSON Feed</a> (see doi:<a href=\"https://doi.org/10.53731/d6vdvbt-tffmezj\">10.53731/d6vdvbt-tffmezj</a>;\n<a href=\"https://chem-bla-ics.linkedchemistry.info/feed.json\">last 10</a> or <a href=\"https://chem-bla-ics.linkedchemistry.info/archive.json\">full archive</a>),\n<a href=\"https://doi.org/10.59350/1cg8w-qth68\">ORCID support</a>,\nand if things goes well, <a href=\"https://doi.org/10.53731/m9d5v-xmr74\">preregistration of blogpost DOIs with commonmeta</a>.</p>\n\n<p>The JSON Feed is interesting. For example, it includes more specific support for references, something that any scholarly\nblogger should look at:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"err\">...</span><span class=\"w\">\n  </span><span class=\"nl\">\"_references\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\">\n    </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nl\">\"url\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"https://doi.org/10.7717/peerj-cs.214\"</span><span class=\"w\"> </span><span class=\"p\">},</span><span class=\"w\">\n    </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nl\">\"url\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"https://doi.org/10.5281/ZENODO.14562484\"</span><span class=\"w\"> </span><span class=\"p\">},</span><span class=\"w\">\n    </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nl\">\"url\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"https://doi.org/10.5281/ZENODO.14562504\"</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span 
class=\"p\">],</span><span class=\"w\">\n  </span><span class=\"err\">...</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>And the citations get propagated and show up like this in the Rogue Scholar archives:</p>\n\n<p><img src=\"/assets/images/rs_archives.png\" alt=\"\" /></p>\n\n<p>I think we also see the ongoing innovation in action. Previously, this is the first time I see the “Unknown title”,\nbut from the JSON it is obviously missing too. One thing to remember here, is that currently my blog does\nnot have this metadata, and when you read my blog, it is <a href=\"https://citation.js.org/\">citation.js</a>\n(doi:<a href=\"https://doi.org/10.7717/peerj-cs.214\">10.7717/peerj-cs.214</a>) that looks up the metadata using the DOI and adds\nthat to the blog post in your browser. Doing this when the HTML is being generated is something\nI still need to learn how to do that.</p>",
      "summary": "One-and-a-half years ago I started migration my blog from blogger.com to a Markdown and Git-based blog. It has been a fascinating journey that I do not regret. I love being back in control and not reliant on features of some content management system. I learned so much along the way, including Jekyll and Liquid to start with, but also Fontawesome (for better or worse)m and Goatcounter for GDPR-compatible and privacy-first impact tracking.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/rs_archives.png",
      "date_published": "2025-01-19T00:00:00+00:00",
      "date_modified": "2025-01-19T00:00:00+00:00",
      "tags": ["blog"],
      "_references": [{ "url": "https://doi.org/10.53731/m9d5v-xmr74" },{ "url": "https://doi.org/10.53731/d6vdvbt-tffmezj" },{ "url": "https://doi.org/10.53731/3c6pm-xbp04" },{ "url": "https://doi.org/10.7717/peerj-cs.214" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fjbv7-53d20",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/01/05/sr24-results.html",
      "title": "Serious Request: the results",
      "content_html": "<p>The last week before the winter break <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/09/sr24.html\">Serious Request took place</a>.\nWe started <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">an action around WikiPathways</a> and\nwe collected 877 euro for <a href=\"https://nl.wikipedia.org/wiki/Stichting_Metakids\">the MetaKids Foundation</a>. In total there were 2612\nactions, many of which brought in a lot more. We ended up in position 928.</p>\n\n<p>But the money was only one part of our “donation” of the MetaKids goal to make 35 percent point more inherited metabolic\ndisorders treatable (which they currently are not), and to address the number one cause of death among Dutch kids.\nBecause our action focussed on getting more biology relevant to metabolic diseases into WikiPathays. For this we set\nup a <a href=\"https://sr24.wikipathways.org/\">WikiPathways SR24 community</a> page, along with a <a href=\"https://www.wikipathways.org/sr24-curation/index2.html\">curation page</a>\nshowing the results of automated curation alerts. Actually, in preparation of the Action, I updated that code\nbase to no longer have two states (succeeded, failed), but four states, depending on the percentage of tests failing\nfor that pathway. This has also been roled out to the <a href=\"https://www.wikipathways.org/\">main WikiPathways website</a>.</p>\n\n<p>In the weekend before our action, I wanted to test my <a href=\"skills\">PathVisio</a> and had a go at a pathway drawing\nfrom a book of which most pathways had already been digitized (see doi:<a href=\"https://doi.org/10.1007/978-3-030-67727-5_73\">10.1007/978-3-030-67727-5_73</a>),\nbut not this one. 
This resulted in a first pathway (wikipathways:<a href=\"https://wikipathways.org/instance/WP5504\">WP5504</a>),\nwhich was later that week greatly extended by <a href=\"https://scholar.google.com/citations?hl=en&amp;user=Le-4tuQAAAAJ\">Denise</a>.\nI also ported the table of chapters from this book to <a href=\"https://blau.wikipathways.org/\">the new WikiPathways community page for the book</a>.</p>\n\n<h2 id=\"a-list-of-genes\">A list of genes</h2>\n\n<p>From <a href=\"https://scholar.google.com/citations?user=6yvglHYAAAAJ&amp;hl=en\">Marek Noga</a> from our university medical center\nI received a pointer to a nice paper with a long list of diseases and matching genes (doi:<a href=\"https://doi.org/10.1002/jimd.12348\">10.1002/jimd.12348</a>)\nwhich provided a great starting point. I started out by making the data from the supplementary files more FAIR\nby <a href=\"https://social.edu.nl/@egonw/113661472648129803\">converting the data into RDF</a>.</p>\n\n<p>With SPARQL I compared the genes (via their HGNC symbols) with the content of WikiPathways:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"w\">      </span><span class=\"nn\">&lt;http://vocabularies.wikipathways.org/wp#&gt;</span><span class=\"w\">\n</span><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\">      </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?gene</span><span class=\"w\"> </span><span class=\"nv\">?omim</span><span class=\"w\"> </span><span class=\"nv\">?geneLabel</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  
</span><span class=\"nv\">?gene</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">GeneProduct</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?geneLabel</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nv\">?gene</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">seeAlso</span><span class=\"w\"> </span><span class=\"nv\">?omimIRI</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?omimIRI</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"ss\">identifier</span><span class=\"w\"> </span><span class=\"nv\">?omim</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">contains</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?omimIRI</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"omim:\"</span><span class=\"p\">))</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"nv\">?gene</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">bdbHgncSymbol</span><span class=\"w\"> </span><span class=\"nv\">?hgnc</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">OPTIONAL</span><span class=\"w\"> </span><span 
class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"k\">SERVICE</span><span class=\"w\"> </span><span class=\"nn\">&lt;https://sparql.wikipathways.org/sparql&gt;</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n      </span><span class=\"nv\">?wpGene</span><span class=\"w\"> </span><span class=\"nn\">wp</span><span class=\"o\">:</span><span class=\"ss\">bdbHgncSymbol</span><span class=\"w\"> </span><span class=\"nv\">?hgnc</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"o\">!</span><span class=\"nb\">BOUND</span><span class=\"p\">(</span><span class=\"nv\">?wpGene</span><span class=\"p\">))</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">CONTAINS</span><span class=\"p\">(</span><span class=\"nv\">?geneLabel</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\" \"</span><span class=\"p\">))</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>This resulted in a <a href=\"https://docs.google.com/spreadsheets/d/1fWFKXVs9q172eHDpv4OLa0TcHuozTBweDe2_zOLJc-Q/edit?usp=sharing\">spreadsheet with more than 300 genes not in WikiPathways</a>.\nAn analysis by Karen Rothfels and Lisa Matthews showed that the number of genes not found in Reactome\nis only 129. 
Indeed, later analyses showed that Reactome has a few very relevant pathways missing in\nWikiPathways.</p>\n\n<h2 id=\"new-biological-pathways\">New biological pathways</h2>\n\n<p>To figure out which pathways are missing, it turns out that the <a href=\"https://pfocr.wikipathways.org/\">Pathway Figure OCR</a> (doi:<a href=\"https://doi.org/10.1186/s13059-020-02181-2\">10.1186/s13059-020-02181-2</a>)\nand <a href=\"https://www.ndexbio.org/\">NDEX</a> (doi:<a href=\"https://doi.org/10.1093/bioinformatics/btad118\">10.1093/bioinformatics/btad118</a>) tools\nare very useful. They both allow passing a list of genes and return results (sets, pathways, models) relevant to\nthat list. NDEX includes the sets from Pathway Figure OCR, and each of those sets is a set of genes linked to a single\njournal article which included a pathway diagram. I used this on the list of 371 genes not in WikiPathways and the list\nof 129 genes not in Reactome, and identified five articles. It actually turns out that two\nbasically described the same biology and both are captured in the same new pathway\n(wikipathways:<a href=\"https://wikipathways.org/instance/WP5505\">WP5505</a>). This pathway includes a good number\nof PIG genes, handling the very specific metabolic conversion of a metabolite.</p>\n\n<h2 id=\"complex-chemistry\">Complex chemistry</h2>\n\n<p>That <a href=\"https://social.edu.nl/@egonw/113678723229529283\">metabolite is complex</a> and databases do not seem to have the structure yet, so I set out\nto generate a SMILES:</p>\n\n<p><img src=\"/assets/images/b0fbb7ea135318b9.png\" alt=\"\" /></p>\n\n<p>I reported the final SMILES, but I am not happy with it yet, and actually spotted an error already:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>N[Prot]C(=O)NCCOP(=O)([O-])OC[C@@H]1[C@@H](O)[C@H]([R11])[C@H]([R10])[C@@H](O1)O[C@H]1[C@@H]([R8])[C@H](O)[C@@H](C[R9])O[C@H]1OC[C@@H]1[C@@H]([R7])[C@H]([R6])[C@H]([R5])[C@@H](O1)OC[C@@H]1[C@@H](O[M3])[C@H](O)[C@@H](N)[C@H](O1)O[C@@H]1[C@@H](O)[C@H](O)[C@@H](O)[C@@H]([R3])[C@H]1OP(=O)([O-])OC[C@H]([R1])C[R2]\n</code></pre></div></div>\n\n<p>So, for completeness and as backup, here are the fragment SMILES that you can copy/paste into <a href=\"https://www.simolecule.com/cdkdepict/depict.html\">CDK Depict</a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>N[Prot]C(=O)NCCOP(=O)([O-])OC[C@@H]1[C@@H](O)[C@H]([R11])[C@H]([R10])[C@@H](O1)O[C@H]1[C@@H]([R8])[C@H](O)[C@@H](C[R9])O[C@H]1OC[C@@H]1[C@@H]([R7])[C@H]([R6])[C@H]([R5])[C@@H](O1)OC[C@@H]1[C@@H](O[M3])[C@H](O)[C@@H](N)[C@H](O1)O[C@@H]1[C@@H](O)[C@H](O)[C@@H](O)[C@@H]([R3])[C@H]1OP(=O)([O-])OC[C@H]([R1])C[R2]\nN[Prot]C(=O)NCCOP(=O)([O-])O protein-linked ethanolamine phosphate (E0)\n[E0]OC[C@@H]1[C@@H](O)[C@H]([R11])[C@H]([R10])[C@@H](O1)O[M2] Manα1-2 (M1)\n[M1]O[C@H]1[C@@H]([R8])[C@H](O)[C@@H](C[R9])O[C@H]1O[M3] Manα1-6 (M2)\n[M2]OC[C@@H]1[C@@H]([R7])[C@H]([R6])[C@H]([R5])[C@@H](O1)O[G4] Manα1-4 (M3)\n[R4]C[C@@H]1[C@@H](O[M3])[C@H](O)[C@@H](N)[C@H](O1)O[S5] GlcNα1-6 (G4)\n[G4]O[C@@H]1[C@@H](O)[C@H](O)[C@@H](O)[C@@H]([R3])[C@H]1OP(=O)([O-])OC[C@H]([R1])C[R2] phosphatidylinositol (S5)\n</code></pre></div></div>\n\n<h2 id=\"the-hackathon-day\">The hackathon day</h2>\n\n<p>On Thursday we had a hackathon day at our <a href=\"https://www.maastrichtuniversity.nl/research/translational-genomics\">Translational Genomics department</a>\n(UNS60 building). One of the Action organizers was still travelling back from Germany, but otherwise Tina, Denise, Daan, Marek, and I worked\non Thursday on various things. Tina worked on WP5505, Daan created his first pathway (wikipathways:<a href=\"https://wikipathways.org/instance/WP5507\">WP5507</a>),\nand so did Marek (wikipathways:<a href=\"https://wikipathways.org/instance/WP5506\">WP5506</a>).</p>\n\n<p>We now have 36 pathways on <a href=\"https://sr24.wikipathways.org/\">the community page</a>:</p>\n\n<p><img src=\"/assets/images/sr24_community_pathways.png\" alt=\"\" /></p>\n\n<p>After that hackathon, and to wrap up things, I finalized the updates to the curation page, making the output\nlook better (more curation tests now output Markdown) and failing tests now almost all have an explanation page\nshowing how the affected pathway can be improved (to address the issue).</p>\n\n<p>Sometime next week, the results of the pathways will be available from the <a href=\"https://sparql.wikipathways.org/\">WikiPathways SPARQL endpoint</a>\nand I can then calculate new numbers. The number of genes not in WikiPathways should be lower.</p>\n\n<p>Finally, there are some very specific results, but we have also created a nice todo list:</p>\n\n<ul>\n  <li>plenty of curation on those 36 pathways remains to be done</li>\n  <li>we still have many genes of interest not in pathways, and we should start stubs in WikiPathways</li>\n  <li>we need a better overview of the mitochondrial biology</li>\n</ul>\n\n<p>And there are also still a few issues open:</p>\n\n<ul>\n  <li>I have a todo item to make a curation SPARQL query available via the automated testing (enhancement)</li>\n  <li>not all interactions end up in the RDF (bug)</li>\n</ul>\n\n<p>That bug actually has significant impact on downstream analyses, I guesstimate.</p>",
      "summary": "The last week before the winter break Serious Request took place. We started an action around WikiPathways and we collected 877 euro for the MetaKids Foundation. In total there were 2612 actions, many of which brought in a lot more. We ended up in position 928.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/b0fbb7ea135318b9.png",
      "date_published": "2025-01-05T00:00:00+00:00",
      "date_modified": "2025-01-17T00:00:00+00:00",
      "tags": ["sr24"],
      "_references": [{ "url": "https://doi.org/10.1002/jimd.12348" },{ "url": "https://doi.org/10.1007/978-3-030-67727-5_73" },{ "url": "https://doi.org/10.1186/s13059-020-02181-2" },{ "url": "https://doi.org/10.1093/bioinformatics/btad118" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1cg8w-qth68",
      "url": "https://chem-bla-ics.linkedchemistry.info/2025/01/04/isaac-browser-extension.html",
      "title": "ISAAC Chrome Extension",
      "content_html": "<p>In 2022 I had my first experience with the <a href=\"https://isaac.nwo.nl/\">ISAAC database</a>\nby the Dutch <a href=\"https://www.nwo.nl/\">NWO</a> research funding organization. ISAAC is\nwhere you apply for funding and where grants get tracked. As such, research output\nis recorded in this database.</p>\n\n<p>The list of supported research output types in ISAAC is a bit dated, but includes\nscientific articles, books and monographs, book chapters, PhD theses, conference\nproceedings papers, professional publications, publications aimed at a broad audience,\npatents, contracts, and other. With <a href=\"https://recognitionrewards.nl/\">Recognition &amp; Rewards</a>\nin mind,this list should\nbe more diverse. And clearly missing are software and data, because these are\nalready supported by global unique identifiers and dedicated efforts for\nsoftware citations and data citations. FAIR has progressed a lot for these two\ntypes.</p>\n\n<p>When entering new research output to a project in ISAAC, you get asked to fill\nout various HTML forms. For articles, each author is a separate HTML form. All\nin all, quite a bit of work. But with unique identifiers and open APIs, it is a\nwaste of research funding to have to enter this all by hand. Some time earlier\nI heard of browser plugins in the USA that automagically filled out those forms,\nand realized I wanted that too.</p>\n\n<p>Fortunately, Lars Willighagen had done much of the work already with\n<a href=\"https://citation.js.org/\">citation.js</a> (doi:<a href=\"https://doi.org/10.7717/peerj-cs.214\">10.7717/peerj-cs.214</a>),\na JavaScript library that can convert formats like BibTeX into formatted references\n(with <a href=\"https://citationstyles.org/\">Citation Style Language</a> and\n<a href=\"https://citeproc-js.readthedocs.io/\">citeproc-js</a>), but also can support various\nidentifiers to fetch bibliographic metadata. 
All we needed is to integrate that.\nAnd so the <a href=\"https://chromewebstore.google.com/detail/ISAAC%20Chrome%20extension/kiljfbiapahlahhilgcgfkfjnkgggode?hl=en-GB&amp;authuser=0\">ISAAC Chrome extension</a>\nwas born. But the history, technology, and use has not been written up, while we\nhave a solid base of some 50 users who regularly use it. And one user <a href=\"https://chromewebstore.google.com/detail/isaac-chrome-extension/kiljfbiapahlahhilgcgfkfjnkgggode/reviews\">wrote</a>:</p>\n\n<blockquote>\n  <p>Werkt geweldig. [..] dit de enige manier waarop publicaties redelijk ingevoerd kunnen worden.</p>\n</blockquote>\n\n<p>Actually, maybe we should rename the extension to <em>ISAAC Browser Extenaion</em>,\nbecause it also works in Brave and Edge.</p>\n\n<h2 id=\"2025-updates\">2025 updates</h2>\n\n<p>The last update had been a while, and we got reports of some changes on the ISAAC\ndatabase side, and we could confirm at least one of the HTML form identifiers had\nchanged, so we fixed filling out the Open Access status of output. This is released\nas <a href=\"https://github.com/citation-js/isaac-chrome-extension/releases/tag/v1.5.0\">version 1.5.0</a>\n(doi:<a href=\"https://doi.org/10.5281/zenodo.14562484\">10.5281/zenodo.14562484</a>).</p>\n\n<p>Another change is that the ISAAC database now supports listing the <a href=\"https://orcid.org/\">ORCID</a>\nidentifier of authors, and this metadata is increasingly available from research\noutput metadata, and <a href=\"https://github.com/citation-js/isaac-chrome-extension/commit/8306809803ef93f448645fc4ca8c55d4c9bb7c6b\">a single line change</a> was enough for Lars to update the extension\nto automatically fill that out too. This is FAIR in action. 
This version is released\nas <a href=\"https://github.com/citation-js/isaac-chrome-extension/releases/tag/v1.6.0\">version 1.6.0</a>\n(doi:<a href=\"https://doi.org/10.5281/zenodo.14562504\">10.5281/zenodo.14562504</a>) and should\nbe available from the webstore soon.</p>\n\n<h2 id=\"how-it-works\">How it works</h2>\n\n<p>While the ISAAC database does not have an API, at least we found sufficient hooks\nin the HTML to get a reproducible workflow. The foundation of the browser extension\nis global unique identifiers, and it supports DOIs, ISBNs, and PubMed identifiers\nfor research output. For authors, it supports the ORCID. To fetch the metadata,\nit uses online resources Crossref, DataCite, and mEDRA, Google Books and OpenLibrary,\nPubMed and Unpaywall. The first three to fetch metadata for DOIs, the next two for\nISBN numbers, and PubMed for, well, PubMed identifiers. Based on the retrieved\nmetadata it determines which type of research output it should fill out the HTML\nfor.</p>\n\n<p><a href=\"https://unpaywall.org/\">Unpaywall</a> is used to see if the output is, for example,\npublished in a purely open access venue (like a CC-BY-only journal like <a href=\"https://elifesciences.org/\">eLife</a>\nor Nature’s <a href=\"https://www.nature.com/sdata/\">Scientific Data</a>), or published in a\nhybrid journal. The ISAAC database does not have the option to drop a URL (which\ncould be automated with Unpaywall too), but does allow uploading documents into\ntheir database. This last is left to the user.</p>\n\n<h2 id=\"how-to-use-it\">How to use it</h2>\n\n<p>Users would install the browser extension and this would add an add-on icon to\nthe toolbar. The <img src=\"/assets/images/icon.svg\" width=\"16\" alt=\"Icon: Black serif font 'I' on a background of four colored squares: brown, gold, green and platinum\" /> icon shows the various colors of Open Access with an <code class=\"language-plaintext highlighter-rouge\">I</code>, for\nidentifier. 
The user would then login on the ISAAC database, open their project\ngrant page, and navigate to the Product tab:</p>\n\n<p><img src=\"/assets/images/isaac2025_1.png\" alt=\"\" /></p>\n\n<p>To use the extension, the user would take the following steps.\nFirst, click the “Toevoegen” button, green-blue with white letters in the above\nscreenshot. This would give a page like this:</p>\n\n<p><img src=\"/assets/images/isaac2025_2.png\" alt=\"\" /></p>\n\n<p>Second, and optionally, click one of the types. The metadata retrieved by the extension\ncontains sufficient information to make the right guess, so that this step is optional.\nIf you find that the metadata is wrong and the wrong guess was made, in this step\nyou can first manually indicate the research output type.</p>\n\n<p>Third, one the page with the above screenshot (or, optionally, after indicating\nthe output type), click the ISAAC Chrome Extension icon in the browser toolbar\nto give a popup dialog:</p>\n\n<p><img src=\"/assets/images/isaac2025_3.png\" alt=\"\" /></p>\n\n<p>Fourth, select the identifier type (DOI, ISBN, or PubMed) and give the identifier\nitself, and then click Search. For example, for a DOI, it would look like this:</p>\n\n<p><img src=\"/assets/images/isaac2025_4.png\" alt=\"\" /></p>\n\n<p>Fifth, the plugin will then guide you through the ISAAC HTML forms, just like you\nwould do manually, but with the important difference that some forms show in different\norder. 
But rest assured, it will not submit anything before your final approval.\nFor example, for a journal article I would immediate fo to the HTML form for the\nfirst author, which, for a random article, could look like this:</p>\n\n<p><img src=\"/assets/images/isaac2025_5.png\" alt=\"\" /></p>\n\n<p>By clicking “Verder”, the browser extension allows you to add missing metadata\n(for example, the ORCID is not listed for this author in the CrossRef metadata\nand the gender is not shared by the publisher), and sometimes you may find yourself\nneeding to correct metadata.</p>\n\n<p>Sixth, after going through all author pages, you will return to the main product\nform, which will look something like this (for a random paper):</p>\n\n<p><img src=\"/assets/images/isaac2025_6.png\" alt=\"\" /></p>\n\n<p>Here you can add the final missing information and upload additional files, like\na PDF of the article. In the above screenshot we find some required (red asterix)\nmissing. In this case, the DOI referred to an article published as “as soon as\npublishable” and the page numbers and issue is simply not known yet. You can also\nsee the Unpaywall metadat in action here.</p>\n\n<p>Seven, like before, the final submission of this new output is done manually.\nThe ISAAC Chrome Extension requires that manual step; on purpose: you are in control.</p>",
      "summary": "In 2022 I had my first experience with the ISAAC database by the Dutch NWO research funding organization. ISAAC is where you apply for funding and where grants get tracked. As such, research output is recorded in this database.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/isaac2025_1.png",
      "date_published": "2025-01-04T00:00:00+00:00",
      "date_modified": "2025-01-04T00:00:00+00:00",
      "tags": ["javascript"],
      "_references": [{ "url": "https://doi.org/10.7717/peerj-cs.214" },{ "url": "https://doi.org/10.5281/ZENODO.14562484" },{ "url": "https://doi.org/10.5281/ZENODO.14562504" }],
      
      
        "authors": [
        
          
            { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" },
          
        
          
            { "name": "Lars Willighagen", "url": "https://orcid.org/0000-0002-4751-4637" }
          
        
        ]
      
    },
    {
      "id": "https://doi.org/10.59350/er1mn-m5q69",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/12/30/fair-blog-to-blog-citations.html",
      "title": "FAIR blog-to-blog citations",
      "content_html": "<p><a href=\"https://chem-bla-ics.linkedchemistry.info/2021/08/28/scholarly-journals-should-use-archived.html\">Linkrot is real</a> and\n<a href=\"https://doi.org/10.59348/1z1p2-nn569\">digital preservation problematic</a>. One reason why I have\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html\">started migrating my blog</a>\nto a more robust platform. That first step gave me version control. This summer my blog was\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/07/21/rogue-scholar-and-more.html\">accepted to Rogue Scholar</a>.\nThat gave me DOIs. And an idea.</p>\n\n<p>Things are coming together, and while commercial publishers (SpringerNature, Elsevier, MDPI, Frontiers, etc)\nare focused on profit (“shareholder value”) instead of the community they serve, Open Science is providing\nworking, real-world, inexpensive, superior FAIR solutions for scientific dissemination. Maybe European\nuniversities are not convinced yet (see <a href=\"https://doi.org/10.59350/1nmwy-nhk20\">Björn’s post</a>), but it is\nhappening.</p>\n\n<p>Two things that are happening are <a href=\"https://openalex.org/\">OpenAlex</a> and <a href=\"https://opencitations.net/\">OpenCitations</a>.\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">CiTO adoption</a> not so much yet, but I am not giving up\nyet. Simply because Open Science doesn’t go away and everything can be picked up tomorrow. 
Each holiday\nI am picking up the Citation Typing Ontology and this holiday the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/04/02/open-science-retreat-2.html\">use of nanopublications for CiTO intent annotation</a>\nof April this year.</p>\n\n<p>Yesterday, I played with the nanopublication templates used by NanoDash, got to the triples of it, and\nended up using the web interface to create <a href=\"https://w3id.org/np/RA43F9EoOuzF0xoNUnCMNyFsfIqlsuWDdPHCnN0wCdCAw\">a derived template</a>\nfrom <a href=\"https://w3id.org/np/RAX_4tWTyjFpO6nz63s14ucuejd64t2mK3IBlkwZ7jjLo\">Tobias’ template from April</a>.\nWhat makes this nanopublication template special is that it uses <a href=\"https://github.com/SPAROntologies/cito\">the CiTO ontology</a>.</p>\n\n<p>The difference is that the original template used <code class=\"language-plaintext highlighter-rouge\">ScholarlyWork</code> as type for the citing resource,\nwhile the derivative uses <code class=\"language-plaintext highlighter-rouge\">CreativeWork</code> from the schema.org namespace, allowing things like this:</p>\n\n<ul>\n  <li>article to software release: <a href=\"https://w3id.org/np/RAzmTPPM7v5Ilgvo-3aFRRZgdD3ImaUB434NtGlfI0G90\">example nanopub</a></li>\n  <li>article to blog: <a href=\"https://w3id.org/np/RAaRH1WhRgirso3JiTUJJ0XcBaRyHI6G4OZPdWBoIf17U\">example nanopub</a></li>\n  <li>blog to article: <a href=\"https://w3id.org/np/RAXL9q3jakrpaDh8oyVaNS1Y7JowmZm4tx4WcdIFMmg8g\">example nanopub</a></li>\n  <li>blog to blog: <a href=\"https://w3id.org/np/RAJOwolZUwUxuvPEhMFiQYHywJdWMWTlt_gnXoUbUBaYY\">example nanopub</a></li>\n</ul>\n\n<p>The last three are possible because of the Rogue Scholar DOIs. Let’s continue with the fourth example,\nthe blog to blog citation. While an URL is a unique, global identifier, the digital preservation depends\non a lot of things. On the other hand, a DOI with the associated metadata is easier to preserve. 
For example,\nbecause it can be spread more easily than the digital object itself.</p>\n\n<p>So, when <a href=\"https://blog.front-matter.io/author/martin/\">Martin Fenner</a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html\">I</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/27/archiving_blogs.html\">started</a>\n<a href=\"https://doi.org/10.53731/3c6pm-xbp04\">archiving</a>\nthe <a href=\"https://depth-first.com/\">Depth-First blog of Rich Apodaca</a> to digitally preserve his blog,\nit also automatically gave the blog posts DOIs. This makes the blog more FAIR, just like it does\nfor my blog. And being more FAIR, we can use the DOIs for other things too, like blog to blog\ncitations with CiTO intent annotation, as nanopublications.\n(Technically, any Springer Nature journal can do this, but they found reasons to not do it.)</p>\n\n<p>So, let’s take <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html\">this blog post</a>.\nI have today updated this to not use <code class=\"language-plaintext highlighter-rouge\">depth-first.com</code> URLs but, following Martin’s example, use the DOIs\nfor those posts instead.</p>\n\n<p>And when I make a nanopublication out of this, I can add the citation intent, and then it looks like\n<a href=\"https://w3id.org/np/RAmETOQXyoS5dYeP8yhJscOrAIimf1RHFnzG2GtziqIQ8\">this</a>:</p>\n\n<p><img src=\"/assets/images/nanopub1.png\" alt=\"\" /></p>\n\n<p>For some reason, the DOIs do not show up as references as they do for this post for the DOIs of the\nposts of Martin Paul Eve, Björn Brembs, and Martin Fenner. 
It does for\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html\">this post citing Depth-First</a>.</p>\n\n<p>So, from now on, I will use DOIs when citing other blog posts, and I hope many other blogs will\nstart using Rogue Scholar or some other service to generate DOIs for single blog posts.\nI also have to figure out if I want to use DOIs to link to posts in my own blog.\nAnd hopefully, OpenCitations will soon accept citations provided by nanopublications.\nWith or without CiTO intent annotations, whichever comes first. Oh, and I cannot wait to see\nthe citations show up in <a href=\"https://www.altmetric.com/\">Altmetric.com</a> :)</p>\n\n<p>Let’s see where this is going.</p>",
      "summary": "Linkrot is real and digital preservation problematic. One reason why I have started migrating my blog to a more robust platform. That first step gave me version control. This summer my blog was accepted to Rogue Scholar. That gave me DOIs. And an idea.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nanopub1.png",
      "date_published": "2024-12-30T00:00:00+00:00",
      "date_modified": "2024-12-30T00:00:00+00:00",
      "tags": ["cito","blog","publishing"],
      "_references": [{ "url": "https://doi.org/10.59348/1z1p2-nn569" },{ "url": "https://doi.org/10.59350/1nmwy-nhk20" },{ "url": "https://doi.org/10.53731/3c6pm-xbp04" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vjvdy-6p110",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/12/27/archiving_blogs.html",
      "title": "Archiving blogs",
      "content_html": "<p>Blogs come and go. Sometimes they move from one location to another. However, blogs have not been systematically\narchived, except perhaps through efforts like the Open Laboratory. Bora Zivkovic gave in 2012\n<a href=\"https://web.archive.org/web/20120713032329/http://blogs.scientificamerican.com/a-blog-around-the-clock/2012/07/10/science-blogs-definition-and-a-history/\">a good overview <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nto which Paul Raeburn <a href=\"https://ksj.mit.edu/tracker-archive/what-was-first-science-blog/\">replied</a>: <em>“If you weren’t\nblogging in the mid-2000s, when all the science bloggers knew and blogrolled each other, you’ve already missed the golden\nage.”</em> I think blogging is as strong as ever, but a lot of blogs have become more like columns in bigger media.\nArchiving of blogs has not been done systematically, though some posts made it into print, for example in\n<a href=\"https://web.archive.org/web/20120114030926/http://blogs.scientificamerican.com/network-central/2011/07/18/open-laboratory-2011-submissions-so-far/\">the Open Laboratory <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nseries. Some copies made it into libraries, e.g. <a href=\"https://search.worldcat.org/en/title/225554926\">2006</a>,\n<a href=\"https://search.worldcat.org/en/title/727023103\">2010</a>, and <a href=\"https://search.worldcat.org/en/title/797975793\">2012</a>.</p>\n\n<p>The two <em>blogs.scientificamerican.com</em> posts from the first paragraph provide a good example of the problem:\nbitrot. 
The Internet Archive has always been useful for archiving webpages and has been useful for archiving blogs too.\nI do not believe it has been used systematically either, but at least it helped recover the above two pages.</p>\n\n<p>So, when I discussed <a href=\"https://depth-first.com/\">the blog of Rich Apodaca</a> <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html\">earlier this month</a>,\nthe question came up if we could archive his blog. Besides his <a href=\"https://depth-first.com/\">personal coverage</a> of\nhis cancer, his blog also covers a good bit of open science cheminformatics of the 2000s and 2010s.</p>\n\n<h2 id=\"rogue-scholar\">Rogue Scholar</h2>\n\n<p>This is where <a href=\"https://rogue-scholar.org/\">Rogue Scholar</a> comes in. <a href=\"https://blog.front-matter.io/\">Martin Fenner</a>\ntook up my question and started archiving Rich’s blog, resulting in <a href=\"https://rogue-scholar.org/communities/rapodaca/records?q=&amp;l=list&amp;p=1&amp;s=10&amp;sort=newest\">this ‘community’</a>\ncollecting the blog posts. This is what an archive page for a single blog post looks like:</p>\n\n<p><img src=\"/assets/images/depth-first-on-rogue-scholar.png\" alt=\"\" /></p>\n\n<p>What this archive now has is DOIs for each blog post, archived metadata that will also propagate via DataCite, etc.\nIt does not have PDFs or other copies of the full blog posts yet. There are more than 900 blog posts to create\nPDFs for. Does anyone <a href=\"https://mastodon.social/@egonw/113725573843479243\">have an idea?</a></p>\n\n<p>I will post later this year about formally/semantically linking blogs citing other blogs using DOIs for blog\nposts, for example from Rogue Scholar. 
And I will probably throw in <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/04/02/open-science-retreat-2.html\">some use of the Citation Typing Ontology</a>.</p>\n\n<p>Anyway, I can recommend everyone to get their blog listed on Rogue Scholar, for the DOIs and for the automatic\narchiving.</p>",
      "summary": "Blogs come and go. Sometimes they move from one location to another. However, blogs have not been systematically archived, except perhaps through efforts like the Open Laboratory. Bora Zivkovic gave in 2012 a good overview, to which Paul Raeburn replied: “If you weren’t blogging in the mid-2000s, when all the science bloggers knew and blogrolled each other, you’ve already missed the golden age.” I think blogging is as strong as ever, but a lot of blogs have become more like columns in bigger media. Archiving of blogs has not been done systematically, though some posts made it into print, for example in the Open Laboratory series. Some copies made it into libraries, e.g. 2006, 2010, and 2012.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/depth-first-on-rogue-scholar.png",
      "date_published": "2024-12-27T00:00:00+00:00",
      "date_modified": "2024-12-27T00:00:00+00:00",
      "tags": ["blog","openlab"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b76wv-bbn97",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/12/09/sr24.html",
      "title": "Serious Request: &quot;WikiPathways in actie voor MetaKids&quot;",
      "content_html": "<p><a href=\"https://sr24.wikipathways.org/\"><img src=\"/assets/images/sr24.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Screenshot of the 'WikiPathways in actie voor MetaKids' action page.\" /></a>\nEvery day a child is born with an <a href=\"https://imd.wikipathways.org/\">inherited metabolic disorder</a>, and many do not grow old.\n<a href=\"https://metakids.nl/\">MetaKids</a> is a Dutch foundation that collects money and raises awareness and the charity selected\nthis year for the <a href=\"https://npo.nl/\">NPO</a> (Dutch national radio/tv) <a href=\"https://en.wikipedia.org/wiki/NPO_3FM\">3FM</a>\n<a href=\"https://www.npo3fm.nl/kominactie\">Serious Request</a>. This has become <a href=\"https://en.wikipedia.org/wiki/Serious_Request\">a Dutch tradition</a>.\nSerious Request will play music on the radio, when people contributed to the fundraiser, and the more money, the\nmore often the music gets played.</p>\n\n<p>But besides this, Serious Request also encourages people to jump into action. And we have jumped into action.</p>\n\n<h2 id=\"what-we-will-do\">What we will do</h2>\n\n<p>In the week when the <a href=\"https://en.wikipedia.org/wiki/Disc_jockey\">DJ</a>s are locked up in their\n<a href=\"https://en.wikipedia.org/wiki/Serious_Request#/media/File:Serious_Request_2008_-_20.jpg\">glass house</a> in\n<a href=\"https://en.wikipedia.org/wiki/Zwolle\">Zwolle</a> just before christmas, <a href=\"https://scholia.toolforge.org/author/Q56868311\">Dr Laura Steinbusch</a>,\n<a href=\"https://scholia.toolforge.org/author/Q27987764\">Martina Kutmon</a>, <a href=\"https://www.maastrichtuniversity.nl/d-van-beek\">Daan van Beek</a>,\nand I will work on making our open science <a href=\"https://wikipathways.org/\">WikiPathways</a> knowledgebase even better to support\nresearch into these disorders. 
Like we did for COVID-19/SARS-CoV-2 before (see doi:<a href=\"https://doi.org/10.1038/s41597-020-0477-8\">10.1038/s41597-020-0477-8</a>).\nGuided by experts, we will update existing maps (leveraging the awesome work\n<a href=\"https://doi.org/10.26481/dis.20240624ds\">by Denise Slenter in her PhD</a>) with recent literature, supported by\n<a href=\"https://www.wikipathways.org/sr24-curation/\">computer-assisted data curation</a>, and draw new maps where there\nare knowledge gaps.</p>\n\n<p>Read our full statement here: <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids</a></p>\n\n<p>Part of this will be a workshop day on Thursday 19th of December in Maastricht. Details about that will follow.</p>\n\n<p>In this way, we collect not only money to donate, but we also donate research.</p>\n\n<h2 id=\"how-to-donate\">How to donate</h2>\n\n<p>Well, obviously, it is a fund-raiser. So, please <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">donate here</a>.\nWe have at least one donation with PayPal (not a fan) from outside The Netherlands.</p>\n\n<p>We are currently at 405 euro of our 2500 euro goal. Please <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">help us a bit closer to that goal</a>.</p>\n\n<h2 id=\"how-can-you-help\">How can you help</h2>\n\n<p>You can help us enormously by spreading the news of the “kom in actie” in your social network, and raising awareness\nfor the cause of MetaKids. 
For example, by sharing our action:</p>\n\n<ul>\n  <li>the action page: <a href=\"https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids\">https://www.npo3fm.nl/kominactie/acties/wikipathways-in-actie-voor-metakids</a></li>\n  <li>the “we are working on” and results page: <a href=\"https://www.wikipathways.org/communities/sr24.html\">https://www.wikipathways.org/communities/sr24.html</a></li>\n</ul>\n\n<p>Second, in good open science practice, we welcome you to join our “kom in actie”, and several others have\nalready indicated wanting to do so. There is plenty of work that can be done, and we are documenting\n<a href=\"https://github.com/orgs/wikipathways/projects/2/views/1\">our activity on a project board</a>. Any work that will make\nthe FAIR and open knowledge better or show its power will help. Some ideas of how the knowledge can be used\nare written up in <a href=\"https://link.springer.com/chapter/10.1007/978-3-030-67727-5_73\">this open access chapter</a> by\nDenise, Tina, and me.</p>",
      "summary": "Every day a child is born with an inherited metabolic disorder, and many do not grow old. MetaKids is a Dutch foundation that collects money and raises awareness and the charity selected this year for the NPO (Dutch national radio/tv) 3FM Serious Request. This has become a Dutch tradition. Serious Request will play music on the radio, when people contributed to the fundraiser, and the more money, the more often the music gets played.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sr24.png",
      "date_published": "2024-12-09T00:00:00+00:00",
      "date_modified": "2024-12-09T00:00:00+00:00",
      "tags": ["wikipathways","sr24"],
      "_references": [{ "url": "https://doi.org/10.1038/S41597-020-0477-8" },{ "url": "https://doi.org/10.26481/DIS.20240624DS" },{ "url": "https://doi.org/10.1007/978-3-030-67727-5_73" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/myaw4-dtg76",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html",
      "title": "Richard L. Apodaca",
      "content_html": "<p><img src=\"/assets/images/depth_first.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Screenshot of the first Depth-First blog post\" />\nIf you are into openscience chemistry or chemistry blogging, then you probably heard of\n<a href=\"https://orcid.org/0000-0003-3855-9427\">Rich Apodaca</a>’s <a href=\"https://depth-first.com/\">Depth-First blog</a>.\nRich <a href=\"https://doi.org/10.59350/xyp0f-9dt42\">started blogging in 2006 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> but this is not\nhow I discovered his work originally. I know that we at least already had contact in 2005,\nbecause that is when he wrote about an integration between his Octet library and the Chemistry Development Kit\nin the <a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/\">CDK News</a> (volume 2, issue 2),\n<em>CDKTools: The CDK-Octet Bridge</em>. In 2006 he <a href=\"https://doi.org/10.59350/esgte-mv539\">reviewed our use of the Open Journal System for CDK News <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>But I did find we have been blogging about our work a lot. <a href=\"https://www.google.com/search?q=site%3Achem-bla-ics.blogspot.com+rich\">Searching for Rich</a>\ngives false positives, but plenty of discussions of his work. At the same time, <a href=\"https://www.google.com/search?q=site:depth-first.com+egon\">my name shows up multiple times</a>\nin Depth-First too. Looking back at our shared history, we find, for example, Rich has blogged a lot about using the\n<a href=\"https://doi.org/10.59350/50ebs-4zq55\">Chemistry Development Kit in Ruby <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Rich <a href=\"https://depth-first.com/articles/\">blogged about a lot of cheminformatics innovation</a>. 
For example,\nin 2006 <a href=\"https://doi.org/10.59350/pz3p6-fv247\">he was working on multi-atom bonding <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nsuch as in ferrocene, something that is even today not routinely used in cheminformatics. I replied\nto that in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/30/modern-chemistry-in-cdk-beyond-two.html\">this post</a>.\nAnother thing he explored was embedding chemical graph notations in PNG images. In 2007 he\nwrote how to <a href=\"https://doi.org/10.59350/j026p-17z02\">Never Draw the Same Molecule Twice: Image Metadata for Cheminformatics <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThis was picked up by several others, including me with <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/24/jchempaint-too-png-embedded.html\">an implementation in JChemPaint</a>.</p>\n\n<p>Another tool that I really liked was <a href=\"https://web.archive.org/web/20101010030537/http://chempedia.com/\">his Chempedia <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nwhich collected “[f]ree chemical information resources created and reviewed by chemists”. One of the things it did\nwas link chemical names to chemical structures, e.g. for <a href=\"https://web.archive.org/web/20101031093610/http://chempedia.com/substances/0-4825-8876-0064\">this compound <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nAnd because of the open license I was able to generate <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html\">an RDF representation of Chempedia</a>.\nThis resulted perhaps in one of my first online SPARQL endpoints.</p>\n\n<p>One and a half year ago he was <a href=\"https://doi.org/10.59350/5ct28-aaj63\">confronted with health issues <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Rich\nblogged openly about the months after that. Rereading this post is still hard, having seen cancer in action\non my mother. 
It turned out to be cancer, <a href=\"https://doi.org/10.59350/g29jj-d3m35\">a brain tumor <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nJust this Thursday I attended a fascinating <sup>2</sup>H NMR presentation, showing how much better\nwe have become at recognizing tumors, but Rich’s MRI was obvious. He blogged for months on\n<a href=\"https://depth-first.com/articles/2023/05/18/everyone-has-a-plan/\">his plan</a>. Until <a href=\"https://doi.org/10.59350/6beed-gk067\">the end of May <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nthis year.</p>\n\n<p>Some weeks ago I received confirmation of our fear; he passed away. Richard L. Apodaca was\n<a href=\"https://search.lib.utexas.edu/discovery/fulldisplay?docid=alma991024143089706011&amp;context=L&amp;vid=01UTAU_INST:SEARCH&amp;lang=en&amp;search_scope=MyInst_and_CI&amp;adaptor=Local%20Search%20Engine&amp;tab=Everything&amp;query=any,contains,39207173&amp;sortby=rank\">born in 1968</a>,\ncompleted his PhD at the University of Texas at Austin in 1996 on <em>Studies in enantioselective catalysis:\n(1) a new class of chiral C₂-symmetric bisphenols; (2) Diorganotin dihalides</em> (wikidata:<a href=\"https://scholia.toolforge.org/work/Q131405461\">Q131405461</a>).\nRich published multiple papers in the field of medicinal chemistry (see <a href=\"https://scholia.toolforge.org/author/Q43837652\">his Scholia profile</a>),\nwas very active in open science and <a href=\"https://patents.google.com/?inventor=Richard+Apodaca\">held many patents</a>.\nHis latest work was about <em>Balsa: A Compact Line Notation Based on SMILES</em>\n(see doi:<a href=\"https://doi.org/10.26434/chemrxiv-2022-01ltp\">10.26434/chemrxiv-2022-01ltp</a>).</p>\n\n<p>The <a href=\"https://depth-first.com/\">Depth-First blog</a> has a CC-BY 2.0 license and perhaps <a href=\"https://rogue-scholar.org/\">Rogue Scholar</a>\ncan archive it? It would help us remember Rich and his contributions to open science cheminformatics.</p>",
      "summary": "If you are into openscience chemistry or chemistry blogging, then you probably heard of Rich Apodaca’s Depth-First blog. Rich started blogging in 2006 but this is not how I discovered his work originally. I know that we at least already had contact in 2005, because that is when he wrote about an integration between his Octet library and the Chemistry Development Kit in the CDK News (volume 2, issue 2), CDKTools: The CDK-Octet Bridge. In 2006 he reviewed our use of the Open Journal System for CDK News .",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/depth_first.png",
      "date_published": "2024-12-08T00:00:00+00:00",
      "date_modified": "2024-12-30T00:00:00+00:00",
      "tags": ["openscience","cheminf"],
      "_references": [{ "url": "https://doi.org/10.26434/chemrxiv-2022-01ltp" },{ "url": "https://doi.org/10.59350/xyp0f-9dt42" },{ "url": "https://doi.org/10.59350/esgte-mv539" },{ "url": "https://doi.org/10.59350/50ebs-4zq55" },{ "url": "https://doi.org/10.59350/pz3p6-fv247" },{ "url": "https://doi.org/10.59350/j026p-17z02" },{ "url": "https://doi.org/10.59350/5ct28-aaj63" },{ "url": "https://doi.org/10.59350/g29jj-d3m35" },{ "url": "https://doi.org/10.59350/6beed-gk067" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9mb5c-y3a10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/11/23/version-of-record.html",
      "title": "Version of record, and what Open Access must learn from Open Science",
      "content_html": "<p>Before we go into the learning bit, let’s just revisit what a <em>version of record</em> is. Wikipedia\n<a href=\"https://en.wikipedia.org/wiki/Version_of_record\">describes it</a> as\n“the fully copyedited, typeset and formatted copy of a manuscript as published” (with two references).\nBasically, in the whole scheme of research output, it is a <em>release</em>. It is a tagged version of the\noutput, allowing people to discuss that version specifically, so that we do not run into endless “oh, but\nI meant version <code class=\"language-plaintext highlighter-rouge\">manuscript_rewrite_V2_AE_MB_Fixed.docx</code>”. Really, publishing is not unique at all and\npublishers are doing it wrong.</p>\n\n<p>So, with “version of record” defined, why do we have only one in publishing?</p>\n\n<p>There is absolutely no reason not to have multiple versions of the same narrative, as long as they\nare clearly tagged. Open Science has been doing this for two decades, but publishers have been slacking.\nRetractions are updated versions of the same article, as are corrections, corrigenda, and errata.\nIt is not that publishers do not know how to do it. Hesitantly, they are accepting that preprints\nexist, but publishers tend to frame those as inferior versions. There was a paper earlier this month\nthat looked into how much the versions are really different, and when I find it again, I will add the link.</p>\n\n<p>Of course, money, control, and power likely have a role here. And historic reasons too, I guess. After all,\nwhen you have to print a journal issue and send it by horse carriage to universities around the\nworld, making updates is indeed not trivial.</p>\n\n<h2 id=\"twitter-or-x-or-mastodon-or-bluesky\">Twitter or X (or Mastodon or Bluesky)</h2>\n\n<p>But as an openscientist, I have the urge to keep research output relevant. We do this for data, we\ndo this for community standards, and we do this for research code. Routinely. 
Again, for decades.\nMust open access not learn from open science here?</p>\n\n<p>I <a href=\"https://mastodon.social/@egonw/113252951241453752\">asked that recently on Mastodon</a>.\nSpecifically, should the <em>Ten simple rules for getting started on Twitter as a scientist</em> article\n(doi:<a href=\"https://doi.org/10.1371/journal.pcbi.1007513\">10.1371/journal.pcbi.1007513</a>) not be updated?\nLooking at the number of scientific papers that discuss social media in scientific\ncommunication, ten tips sound to me to be timeless. And I was interested in why or why not the paper\nshould be updated. Content-wise, a trivial update would be to update the name to X, which is the\nname of what was formerly known as Twitter. But then, updating the paper to replace Twitter\nby Bluesky or Mastodon would not be that much work either.</p>\n\n<p>The discussion brought up various aspects of this question (and hereby thanks to all who joined the\ndiscussion!). Is it worth it? Is it legal? Should it be an update, or just a new paper? Who\nshould do it? Do scholars have some responsibility to keep their research relevant? If I string-replace\nTwitter with X, how do I make clear who the original authors are, and what my role is? How do we\nget PLOS to point to the new version (surely not as corrigendum)? I do not have the answers.</p>\n\n<p>But I do see differences between different types of research output and that\nmakes these questions an essential part of <a href=\"https://recognitionrewards.nl/\">Recognition and Rewards</a>.\nIf some types of output have different rules, then we do not give all scholars the same\nchance of recognition. Of course, this is the current situation, and just reflects that academia\nstill has much to do to adopt Open Science.</p>",
      "summary": "Before we go into the learning bit, let’s just revisit what a version of record is. Wikipedia describes it as “the fully copyedited, typeset and formatted copy of a manuscript as published” (with two references). Basically, in the whole scheme of research output, it is a release. It is a tagged version of the output, allowing people to discuss that version specifically, so that we do not run into endless “oh, but I meant version manuscript_rewrite_V2_AE_MB_Fixed.docx”. Really, publishing is not unique at all and publishers are doing it wrong.",
      
      "date_published": "2024-11-23T00:00:00+00:00",
      "date_modified": "2024-11-23T00:00:00+00:00",
      "tags": ["publishing","openaccess","openscience"],
      "_references": [{ "url": "https://doi.org/10.1371/JOURNAL.PCBI.1007513" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/djm89-5nb39",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/11/17/sparql-examples.html",
      "title": "SPARQL examples: SIB model, software, and patches",
      "content_html": "<p><a href=\"https://akademienl.social/@jerven\">Jerven Bolleman</a> <em>et al.</em> recently <a href=\"https://arxiv.org/abs/2410.06010\">published a great preprint</a>\nabout how to use RDF to give SPARQL queries context by linking them (semantically) with metadata. The context includes\nkeywords, the SPARQL endpoint the query can be run against, and a human-oriented description of the query. A few groups\nhave at recent hackathons been working on using the combination of a SPARQL query and a human-oriented description\nto train large language models, including the group behind this paper. Given that SPARQL is a very small language, I can see\nthis may work well, and that it may support our <a href=\"https://vhp4safety.nl/\">VHP4Safety</a> and\n<a href=\"https://scholia.toolforge.org/\">Scholia</a> projects.</p>\n\n<p>But in addition to the data model for SPARQL as research output (see doi:<a href=\"https://doi.org/10.32388/ZNWI7T.2\">10.32388/ZNWI7T.2</a>),\nthe paper also introduces the <a href=\"https://github.com/sib-swiss/sparql-examples-utils\">sparql-examples-utils</a> software that I was\nfirst introduced to at <a href=\"https://www.wikidata.org/wiki/Wikidata:Scholia/Events/Hackathon_October_2024\">the recent October Scholia hackathon</a>.</p>\n\n<p>But I have/had some features I would like to see added. The first is provenance. Who is the author/contributor of the SPARQL\nquery? Is there an open license for it, or perhaps public domain? How do I give attribution if I reuse the SPARQL query?\nThese things matter in a modern <a href=\"https://recognitionrewards.nl/\">recognition and rewards</a> world where there is room for\neveryone’s talent. A set of good SPARQL queries may be more valuable than a ten-page Jupyter notebook (and the other way\naround). 
So, I <a href=\"https://github.com/sib-swiss/sparql-examples-utils/pull/24\">started</a>\n<a href=\"https://github.com/sib-swiss/sparql-examples-utils/pull/25\">writing</a>\n<a href=\"https://github.com/sib-swiss/sparql-examples-utils/pull/26\">patches</a>. And I created\n<a href=\"https://github.com/BiGCAT-UM/sparql-examples-utils/releases/tag/v2.0.11-tgx-1\">a custom jar</a> so that I can see these\npatches in action in <a href=\"https://bigcat-um.github.io/sparql-examples/\">our growing list of SPARQL queries</a>\n(here <a href=\"https://bigcat-um.github.io/sparql-examples/examples/WikiPathways/002.html\">a WikiPathways query</a>):</p>\n\n<p><img src=\"/assets/images/sparql-examples-tgx.png\" alt=\"\" /></p>\n\n<p>I started collecting SPARQL queries for <a href=\"https://bigcat-um.github.io/sparql-examples/examples/ChEMBL/\">ChEMBL</a>,\n<a href=\"https://bigcat-um.github.io/sparql-examples/examples/WikiPathways/\">WikiPathways</a>, and\n<a href=\"https://bigcat-um.github.io/sparql-examples/examples/VHP4Safety/\">VHP4Safety</a>. These queries are often part\nof other interfaces but we can easily extract the original SPARQL from the Turtle files behind these pages.</p>",
      "summary": "Jerven Bolleman et al. recently published a great preprint about how to use RDF to give SPARQL queries context by linking them (semantically) with metadata. The context includes keywords, the SPARQL endpoint the query can be run against, and a human-oriented description of the query. A few groups have at recent hackathons been working on using the combination of a SPARQL query and a human-oriented description to train large language models, including the group behind this paper. Given that SPARQL is a very small language, I can see this may work well, and that it may support our VHP4Safety and Scholia projects.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sparql-examples-tgx.png",
      "date_published": "2024-11-17T00:00:00+00:00",
      "date_modified": "2024-11-17T00:00:00+00:00",
      "tags": ["sparql","wikipathways","vhp4safety","chembl","scholia"],
      "_references": [{ "url": "https://doi.org/10.32388/ZNWI7T" },{ "url": "https://doi.org/10.48550/arXiv.2410.06010" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/yxxp4-r5j24",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/11/10/mastodon-bridge-to-bluesky.html",
      "title": "Mastodon, RSS, BlueSky",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/bluesky.png\" width=\"200\" />\nThe x-odus continues and there is a wave of researchers moving from X to another walled-garden called <a href=\"https://en.wikipedia.org/wiki/Bluesky\">Bluesky</a>.\nThis is good and bad. First, it is good that people are leaving X (imho) and it is good that they move to a platform that supports\nopen standards, the <a href=\"https://en.wikipedia.org/wiki/AT_Protocol\">AT Protocol</a>. But I am less sure, about moving to another closed source\nplatform. I prefer <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/mastodon\">Mastodon</a>. You can follow Mastodon accounts with their\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/rss\">RSS</a> feeds and that gives BlueSky users the ability to follow me on social media.\nThis is important to me. I have a LinkedIn account too, but you can only follow me there if you have an account there too. To me,\nthat does not align with the Open Science ideals.</p>\n\n<p>But while you can follow my Mastodon accounts <a href=\"https://social.edu.nl/@egonw.rss\">with</a> <a href=\"https://mastodon.social/@egonw.rss\">RSS</a>\n(or just by checking the <a href=\"https://social.edu.nl/@egonw\">two</a> <a href=\"https://mastodon.social/@egonw\">webpages</a>), this is read-only access. That is,\nyou cannot reply. For that, you still need a Mastodon (or Fediverse) account too.</p>\n\n<p>But then there is <a href=\"https://fed.brid.gy/docs\">Bridgy Fed</a>. It <em>“is a decentralized social network bridge. It connects the fediverse,\nthe web, and Bluesky”</em>. I learned about this recently, and it seems to do what it promises. 
Using the AT Protocol, it allows me\nto follow and reply to BlueSky users (if they have enabled the bridge), and BlueSky users can interact with me.</p>\n\n<p>So, if you have BlueSky and want to follow one or both of my Mastodon accounts, check out:</p>\n\n<ul>\n  <li><a href=\"https://bsky.app/profile/egonw.social.edu.nl.ap.brid.gy\">@egonw.social.edu.nl.ap.brid.gy</a> (focused on my research)</li>\n  <li><a href=\"https://bsky.app/profile/egonw.mastodon.social.ap.brid.gy\">@egonw.mastodon.social.ap.brid.gy</a> (more general open science)</li>\n</ul>\n\n<p>But I can only follow them back if they have enabled the bridge too.</p>",
      "summary": "The x-odus continues and there is a wave of researchers moving from X to another walled-garden called Bluesky. This is good and bad. First, it is good that people are leaving X (imho) and it is good that they move to a platform that supports open standards, the AT Protocol. But I am less sure, about moving to another closed source platform. I prefer Mastodon. You can follow Mastodon accounts with their RSS feeds and that gives BlueSky users the ability to follow me on social media. This is important to me. I have a LinkedIn account too, but you can only follow me there if you have an account there too. To me, that does not align with the Open Science ideals.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bluesky.png",
      "date_published": "2024-11-10T00:00:00+00:00",
      "date_modified": "2024-11-10T00:00:00+00:00",
      "tags": ["mastodon"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/acrqt-9y217",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/10/29/suppdata-data-dataset-database.html",
      "title": "Additional files, data, datasets, databases, and published data",
      "content_html": "<p>Open Science doesn’t make publishing easier. That that’s all for the better: our research efforts are complex,\nso why should the publishing be. Sure, I am <strong>not</strong> talking about references formatting or moving the Methods\nsection to the right location, or some silly statement that all authors agree with the manuscript when you are\nthe only author.</p>\n\n<p>No, let’s talk about data. What should you publish? How, and when? And why would you do it in the first\nplace? This is not going to be a post about FAIR either, but instead about when to publish data as additional\nfiles (aka supplementary data), raw data, processed data, as a datasets, or even as a database. That’s a\nlot of types of data, and the differences matter at least for the effort you want to put in.</p>\n\n<p>First, things have changed. We produce a massive amount more data. In the past your data, or at least the\nprocessed data, would be part of your conference talk, your journal article, or your book (chapter).\nOpen Science has changed this: data should be easier to reuse. But that results in new questions; those\nas in the previous paragraph. So, let’s add some context.</p>\n\n<p>Data is very broad and includes digital knowledge. Data can be raw, and the exact numbers collected (e.g.\nby a apparatus) or created by researchers. Processed data is what you get when you process the raw data.\nFor example, raw data may be a FID graph in nucleic magnetic resonance, while processed data would be a\nplot showing intensities versus chemical shifts. Published data is then a list of peaks you put in your\nresults section to support your claim of chemical identity.</p>\n\n<p>A fourth type of data is metadata, and could here be the instrument on which the FID was measured, or\nthe solvent used, etc. This is where it gets complicated, because depending on the researcher who\nprocesses the data, metadata can actually be data itself. 
For example, when you study the chemical\nshift differences in different organic solvents.</p>\n\n<p>From a more social level, the <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/10/21/nasa-tops.html\">Open Science 101</a>\nuses the following categories: primary data as collected/recorded by the researcher, and\n“secondary data typically refers to data that is used by someone different from who collected or generated the data”.\nThis angle of data captures the collaboration aspects of open science, but says more about\nthe processors than the data, I think.</p>\n\n<h2 id=\"monitoring-open-data\">Monitoring Open Data</h2>\n\n<p>A central aspect of doing research is to disseminate the research. Traditionally, this has been\ndisseminating results, hoping they become facts. Increasingly, we realize that this process needs\nimprovement, particularly clearly studies, done, and communicated by the Open Science approaches.</p>\n\n<p>Complementary to this, there is recognition &amp; rewarding (R&amp;R) and the wish to use various kinds of monitoring to\nassess who should be rewarded (and who should be fired), and the monitor is the implementation\nof the recognition. So, how does this work for open data? We can count every open dataset, but\nif thrown on one big pile, that becomes a bad monitor for use in recognition and rewarding.</p>\n\n<p>One idea is to differentiate what data we monitor. Just raw data? Or processed data?\nHow much intellectual effort goes into collecting/recording the data? Should that\nbe part of the monitor and how do you even measure that? Lots of known unknowns here.</p>\n\n<p>But this should not inhibit us from telling the research narrative. 
And maybe we should\njust explore the possible narratives to learn how they may help us monitor work done,\nhow to recognize contributions to the scientific record, and how to use all that in R&amp;R.</p>\n\n<p>I here present some examples from my own research, just to start a narrative.</p>\n\n<h2 id=\"raw-data\">Raw data</h2>\n\n<p>Over the years I have collected and recorded quite a bit of raw data. First data collected in the lab\nand later mostly recorded. Even though I have been doing Open Science since the late nineties,\nI cannot say all my data has been archived well. Even less so, I do not have a “publication list”\nof all my raw data. As an academic community, we have been focusing too much on the scholarly\narticle as the center of the research system (more on that later, because there is awesome\nresearch presented at the Dutch National Open Science Festival).</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.org/03/27/migrating-pka-data-from-drugmet-to.html\">pKa values</a> (not archived, no DOI)</li>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.7075214.v1\">NanoWiki 5</a> (archived, with DOI)</li>\n</ul>\n\n<h2 id=\"processed-data\">Processed data</h2>\n\n<p>As is defined in the <a href=\"https://commission.europa.eu/law/law-topic/data-protection/reform/what-constitutes-data-processing_en\">European laws around GDPR</a>,\nprocessing “includes the collection, recording, organisation, structuring, storage,\nadaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available,\nalignment or combination, restriction, erasure or destruction of [..] data”. As you can see, this is slightly\ndifferent from the first, but in light of protecting citizens, this broader definition makes sense.\nMy point here is that processing should be taken broadly. And data curation, which researchers\nroutinely do, is processing too. 
For any data scientist, this easily takes up 25% of the\nfull time needed for any data analysis. One of the points of the FAIR principles is to keep\nthat number as low as possible, but that is not really the point here.</p>\n\n<p>When it comes to this kind of data, I like people to have ready access to the results\nof my curation. You will find a lot of processed data like this archived. Some examples of\ndata by me or to which I contributed:</p>\n\n<ul>\n  <li><a href=\"https://doi.org/10.5281/zenodo.13933046\">WikiPathways</a> (monthly archived, with DOIs)</li>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.681678\">ChemPedia RDF</a> (different format than original data, archived, with DOI)</li>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.26931712.v1\">BridgeDb Metabolite ID mapping database</a> (irregular releases, not every one is notable; archived, with DOI)</li>\n</ul>\n\n<p>The last one will look something like this:</p>\n\n<p><img src=\"/assets/images/figshare_bridgedb.png\" alt=\"\" /></p>\n\n<h2 id=\"published-data\">Published data</h2>\n\n<p>And then we have published data, which refers to data presented in a publication, like a journal\narticle. We know this as supplementary data or additional files. Several publishers, like\nBioMedCentral, submit these data automatically to a repository. For example, the\n<a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a> publishes all additional files under a CCZero license on Figshare.\nBut many of these support the narrative of the story, rather than the narrative of the\nresearch question. Of course, journals also have limited expectations of the format and\nmy personal impression is that these are not commonly FAIR. (Open Access is not Open Science.)</p>\n\n<p>Some examples of such datasets, which I do not see as notable and do not expect\nto be monitored. 
These datasets are part of the journal article, and that narrative is\nalready monitored.</p>\n\n<ul>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.c.3696370_D1.v1\">MOESM1 of PubChemRDF: towards the semantic annotation of PubChem compound and substance databases</a> (Word document with data, with DOI)</li>\n  <li><a href=\"https://doi.org/10.6084/m9.figshare.c.3698536_D1.v1\">MOESM1 of XMetDB: an open access database for xenobiotic metabolism</a> (archived Structured Data file with chemical structures, with DOI)</li>\n</ul>\n\n<h2 id=\"databases\">Databases</h2>\n\n<p>And then we have databases provided as interactive websites. This allows other researchers\nto explore the data, before they start processing the data. These typically do not have a DOI themselves,\nthough the data can be routinely archived as in the above WikiPathways example.</p>\n\n<p>Databases themselves, as research output, are much harder to archive. And to make them citable,\nresearchers publish journal articles with a narrative that describes the database. The following two\nare such database papers, where the article DOI is a proxy for the database:</p>\n\n<ul>\n  <li><a href=\"https://doi.org/10.1186/1758-2946-5-23\">The ChEMBL database as linked open data</a> (<a href=\"https://chemblmirror.rdf.bigcat-bioinformatics.org/\">online</a>, DOI via article)</li>\n  <li><a href=\"https://doi.org/10.1186/s13321-021-00573-5\">PSnpBind</a> (<a href=\"https://psnpbind.org/\">online</a>, DOI via article)</li>\n</ul>",
      "summary": "Open Science doesn’t make publishing easier. That that’s all for the better: our research efforts are complex, so why should the publishing be. Sure, I am not talking about references formatting or moving the Methods section to the right location, or some silly statement that all authors agree with the manuscript when you are the only author.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/figshare_bridgedb.png",
      "date_published": "2024-10-29T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["data"],
      "_references": [{ "url": "https://doi.org/10.6084/M9.FIGSHARE.7075214.V1" },{ "url": "https://doi.org/10.5281/ZENODO.13933046" },{ "url": "https://doi.org/10.6084/M9.FIGSHARE.681678" },{ "url": "https://doi.org/10.6084/M9.FIGSHARE.26931712.V1" },{ "url": "https://doi.org/10.6084/M9.FIGSHARE.C.3696370_D1.V1" },{ "url": "https://doi.org/10.6084/M9.FIGSHARE.C.3698536_D1.V1" },{ "url": "https://doi.org/10.1186/1758-2946-5-23" },{ "url": "https://doi.org/10.1186/S13321-021-00573-5" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mch14-dtx11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/10/24/vhp4safety.html",
      "title": "New paper: The Virtual Human Platform for Safety Assessment (VHP4Safety)",
      "content_html": "<p>I have <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/vhp4safety\">not posted a lot</a> about our <a href=\"https://vhp4safety.nl/\">Virtual Human Platform for Safety Assessment</a>\n(VHP4Safety) project yet. Actually, more generally I do not post frequently about the funded projects. This is likely that few of them are Open Science\nby contract and often they have some formal process in place to approve output. That makes open notebook science-style posting about these projects\nhard. One is restricted to previously cleared material.</p>\n\n<p>One such material is the new project paper about VHP4Safety, <em>The Virtual Human Platform for Safety Assessment (VHP4Safety) project: Next generation chemical\nsafety assessment based on human data</em> (doi:<a href=\"https://doi.org/10.14573/altex.2407211\">10.14573/altex.2407211</a>). It is a fun project to work in,\nambitious, and in a vibrant community making steps in open science. That means that a lot of what we is core science, but the science comes\nfrom many different disciplines, and it is as much natural sciences as it is humanities.</p>\n\n<p>So, we somewhere during the project we started organizing hackathons. Some of us had plenty of experience with that already, but these\nare hackathons from fields where this has not been as common, perhaps. But is has been fun, e.g. 
see\n<a href=\"https://www.sciencrew.com/c/9347/a/335221636?title=Advancing_AI_in_Toxicology_Insights_from_the_Third_VHP4Safety_H\">this write up of the third hackathon</a>.</p>\n\n<p>There is a lot more I should be writing about VHP4Safety, and I will try, but for now I will limit it to these pointers:</p>\n\n<ul>\n  <li>the main VHP4Safety website: <a href=\"https://vhp4safety.nl/\">https://vhp4safety.nl/</a></li>\n  <li>our documentation platform: <a href=\"https://docs.vhp4safety.nl/\">https://docs.vhp4safety.nl/</a></li>\n  <li>our catalogue of cloud services: <a href=\"https://cloud.vhp4safety.nl/\">https://cloud.vhp4safety.nl/</a></li>\n  <li>our common language: <a href=\"https://glossary.vhp4safety.nl/\">https://glossary.vhp4safety.nl/</a></li>\n</ul>\n\n<p>And we try to register our solutions as widely as possible, e.g. with national and ELIXIR indices:</p>\n\n<ul>\n  <li>our <a href=\"https://taxila.nl/content_providers/vhp4safety\">Taxila.nl section</a></li>\n  <li>our <a href=\"https://tess.elixir-europe.org/content_providers/vhp4safety\">ELIXIR TeSS section</a></li>\n</ul>",
      "summary": "I have not posted a lot about our Virtual Human Platform for Safety Assessment (VHP4Safety) project yet. Actually, more generally I do not post frequently about the funded projects. This is likely that few of them are Open Science by contract and often they have some formal process in place to approve output. That makes open notebook science-style posting about these projects hard. One is restricted to previously cleared material.",
      
      "date_published": "2024-10-24T00:00:00+00:00",
      "date_modified": "2024-10-24T00:00:00+00:00",
      "tags": ["vhp4safety"],
      "_references": [{ "url": "https://doi.org/10.14573/ALTEX.2407211" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/w7kzh-8y965",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/10/21/nasa-tops.html",
      "title": "NASA Transform to Open Science (TOPS) Open Science 101",
      "content_html": "<p>It was on my radar for some time already, but did not get around to finishing it. But I completed all\nfive modules of the <a href=\"https://openscience101.org/\">NASA Transform to Open Science (TOPS) Open Science 101</a>\n(doi:<a href=\"https://doi.org/10.5281/zenodo.10161527\">10.5281/zenodo.10161527</a>).\nThis Open Science 101 consists of several modules, starting with <em>The Ethos of Open Science</em>, via\n<em>Open Tools and Resources</em>, <em>Open Data</em>, and <em>Open Code</em>, to <em>Open Results</em>.</p>\n\n<p><img src=\"/assets/images/nasa_tops.png\" alt=\"\" /></p>\n\n<p>Now, since I have been practising aspects of science for almost 25 years, I have to admit I was nervous\ndoing this. That probably explains why it took me so long to do it. Just going through the material will\nprobably take 4-8 hours, but there was a lot to reflect on. They also link to many additional resources\nand cite a good bunch of scientific research.</p>\n\n<p>I also like to stress that I like the material very, very much. It is very well designed, covers a lot\nof aspects, and finds a great balance between depth and coverage. Sure, I had some comments here and\nthere, but it higjhlights very well what open science really is, what not, and how the open science\ncommunity is working on reaching the goals, which things work well, and which things need more work.</p>\n\n<p>The material itself is <a href=\"https://github.com/nasa/Transform-to-Open-Science\">open and available from GitHub</a>.</p>",
      "summary": "It was on my radar for some time already, but did not get around to finishing it. But I completed all five modules of the NASA Transform to Open Science (TOPS) Open Science 101 (doi:10.5281/zenodo.10161527). This Open Science 101 consists of several modules, starting with The Ethos of Open Science, via Open Tools and Resources, Open Data, and Open Code, to Open Results.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nasa_tops.png",
      "date_published": "2024-10-21T00:00:00+00:00",
      "date_modified": "2024-10-21T00:00:00+00:00",
      "tags": ["openscience"],
      "_references": [{ "url": "https://doi.org/10.5281/zenodo.10161527" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3abda-n1j28",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/09/23/patents-and-impact.html",
      "title": "Patents, societal impact, and sustainability",
      "content_html": "<p>Division 1 of our <a href=\"https://www.maastrichtuniversity.nl/research/school-nutrition-and-translational-research-metabolism\">Institute of Nutrition and Translational Research in Metabolism</a>\n(NUTRIM) held a meeting last week which had a panel discussion on the use of\npatents to bring research to the market, aimed at PhD candidates of the institute.\nPatents are one of the routes to make research output more sustainable. For\nexample, the research output into a new method to study something or make something\noften needs the development into a product. For example, a new multivariate\nstatistics method may need a graphical user interface.\nAs such, the “development” after the research (think, R&amp;D) is often part of the\n<em>sustainability</em> of some research.</p>\n\n<p>Patents, trade secrets, and precompetitive collaboration are three methods that\nhave been used to make research output sustainable. Of course, in addition to\nthe fourth, which is simply the published journal article or book chapter.</p>\n\n<p>This led to the notion that PhD research, if it is to benefit (the Dutch) society,\nthen if needs to get used. There needs to be a market of users. This could be\nother scholars that use the method, use the data (see also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">Citation Typing Ontology</a>\nthat captures such reuse), or could be a product sold to other businesses or\neven a consumer market product.</p>\n\n<p>Filing a patent is often seen as research having societal impact. It captures\nthe notion that one or more people trust the impact enough to invest a considerable\namount of money. BTW, patents allow others to reuse your knowledge, to extend\nit, and to modify it. It is just that the patent limits how you use the results\nof that reuse commercially.</p>\n\n<p>But patents are interesting in another way. 
A mention of your research means that\nthe people that cited your work in their patent found your research valuable\nenough to list it as support of their patent. This is similar to getting cited\nin another journal article (or book (chapter)), but much closer to society.</p>\n\n<p>Therefore, if you are interested to learn which of the research you do, and the output\nof that research, has an impact on society, scanning patent literature for citations to\nyour work or the work of the research group you work in, can give surprising\nresults. Worst case, it gives you ideas of how the research may benefit society.</p>\n\n<h2 id=\"google-patents\">Google Patents</h2>\n\n<p>Nowadays, there are multiple patent search engines and sometimes they do a lot\nof text mining, e.g. to find patents that mention certain chemical structures.\nBut a general search engine like <a href=\"https://patents.google.com/\">Google Patents</a>\nwill already do you a great service. If you search here on terms related to\nyour research, or your last name, you can find results. If your research project\nhas a unique name, this will, of course, greatly simplify the search.</p>\n\n<p>For example, when I search for <a href=\"https://www.wikipathways.org/\">WikiPathways</a> (our biological WikiPathways\nknowledge graph), it finds <a href=\"https://patents.google.com/?q=(wikipathways)&amp;oq=wikipathways\">over 200 patents that mention it</a>.\nWikiPathways is an Open Science project and there is no patent on our approach,\nbut what this project has done turns out to be important enough for SMEs that\nthey base a patent on it. Of course, the role is often just supportive, just\nlike a journal article citation. 
This is what a results page may look like:</p>\n\n<p><img src=\"/assets/images/google_patent_wikipathways.png\" alt=\"\" /></p>\n\n<h2 id=\"citations-to-specific-article\">Citations to specific article</h2>\n\n<p>There are also tools that make available text mining results that found which\narticles have been cited in which patent. <a href=\"https://altmetric.com/\">Altmetric.com</a>\nis one of them. For many articles (DOIs) they provide information on where that\narticle (DOI) is mentioned. And they provide a <a href=\"https://www.altmetric.com/about-us/our-data/donut-and-altmetric-attention-score/\">donut to visualize that\nattention</a>.\nOver time, the diversity of mentions they find has gone down: new\nmedia are not added frequently, and Mastodon is a big one missing. But patents\nare still one of the supported resources.</p>\n\n<p>For any DOI you can look up what data Altmetric.com has using this URL\npattern (the example is for the DOI <em>10.1039/D3DD00069A</em>):</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>https://altmetric.com/details/doi/10.1039/D3DD00069A\n</code></pre></div></div>\n\n<p>Maastricht University users can use our <a href=\"https://cris.maastrichtuniversity.nl/\">cris</a>\nwhich provides an HTML page listing all your articles (e.g.\n<a href=\"https://cris.maastrichtuniversity.nl/en/persons/egon-willighagen/publications/\">mine</a>)\nand each has an Altmetric.com donut, with an orange band for patents:</p>\n\n<p><img src=\"/assets/images/altmetrics_patents.png\" alt=\"\" /></p>\n\n<p>We can see here that this article is cited in three patents. You can click\nthe donut to find which patents those are. 
The <em>cris</em> overview page gives\na quick look at which articles (or research lines) are cited in patents.</p>\n\n<p>Also look out for the purple bands, which reflect citations in policy documents,\nand thus another kind of societal impact.</p>\n\n<h2 id=\"potential\">Potential</h2>\n\n<p>For early career researchers with few articles and not a lot of time to\nget cited in patents (or policies), it can also be useful to look at articles\nthat your work is based on, e.g. those of your supervisor.</p>",
      "summary": "Division 1 of our Institute of Nutrition and Translational Research in Metabolism (NUTRIM) held a meeting last week which had a panel discussion on the use of patents to bring research to the market, aimed at PhD candidates of the institute. Patents are one of the routes to make research output more sustainable. For example, the research output into a new method to study something or make something often needs the development into a product. For example, a new multivariate statistics method may need a graphical user interface. As such, the “development” after the research (think, R&amp;D) is often part of the sustainability of some research.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/google_patent_wikipathways.png",
      "date_published": "2024-09-23T00:00:00+00:00",
      "date_modified": "2024-09-23T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7qe60-evp05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/09/16/publishing.html",
      "title": "Better Publishing",
      "content_html": "<p>If you read my blog, it should not surprise you that I have long experimented with technologies\nto improve knowledge dissemination, for example <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">in HTML</a>. And I have <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/publishing\">blogged about publishing</a>\nfrom an author and researcher, and editor perspective, for many years (see <a href=\"https://chem-bla-ics.blogspot.com/search?q=publishing\">this longer list\non my old blog</a>).\nAlso, in the <a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a>\nwe pushed for innovation, including <a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0365-4\">ORCID and GitHub adoption</a> and <a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00448-1\">Citation Typing Ontology adoption</a>.</p>\n\n<p>All of these depend on the publisher to support these efforts. But the big publishers are not good\nat this (see also doi:<a href=\"\">10.5281/zenodo.4926031</a>https://doi.org/10.5281/zenodo.4926031)\nand/or prefer to make 20-30% profit first.</p>\n\n<p>This opens room for innovative publishers. 
We have <a href=\"https://f1000research.com/\">F1000Research</a> pushing open peer review,\nand <a href=\"https://pensoft.net/\">PenSoft</a> pushing a new editor,\n<a href=\"https://www.overleaf.com/\">Overleaf</a> bringing collaborative online editing of LaTeX,\nQeios experimenting with a <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/02/qeios-open-dissemination-platform-for.html\">wider range of output types</a>,\nand the <a href=\"https://joss.theoj.org/\">Journal of Open Source Software</a> (JOSS) pioneering\na more open platform for the whole editing process.</p>\n\n<p>And, of course, we have <a href=\"https://en.wikipedia.org/wiki/Diamond_open_access\">Diamond Open Access</a>\npublishers that do not get enough visibility, like <a href=\"https://scipost.org/\">SciPost</a>\nand <a href=\"https://www.beilstein-journals.org/\">Beilstein</a> for natural sciences and\nJOSS for open source.</p>\n\n<h2 id=\"open-journal-systems\">Open Journal Systems</h2>\n\n<p>And there is the <a href=\"https://pkp.sfu.ca/software/ojs/\">Open Journal Systems</a> (OJS), another\neditor manager, one that has been around for some time now. We use OJS for the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cdknews\">CDK News newsletter</a>.\nBig news this week was that <a href=\"https://pkp.sfu.ca/2024/09/12/ojs-infrastructure-for-open-research-europe/\">OJS has been selected</a>\nas infrastructure to underpin the <a href=\"https://open-research-europe.ec.europa.eu/\">Open Research Europe</a> publishing platform,\nsomething running on F1000Research, <a href=\"https://en.wikipedia.org/wiki/F1000_(publisher)\">bought up by Taylor &amp; Francis in 2020</a>.</p>\n\n<p>I need to catch up with where the OJS is technically. Do they support Markdown\nsubmissions? Do they export <a href=\"https://jats.nlm.nih.gov/\">JATS</a>? Do they support CiTO annotations? But this needs\neditors and journals to expect these things. 
Unfortunately, many journals have\na limited expectation of digital knowledge dissemination, and it’s still\nPDF galore.</p>\n\n<h2 id=\"better-publishing\">Better Publishing</h2>\n\n<p>This brings me to the following: should the Dutch universities continue to fund\nthe publisher business, stakeholder profit, or should we invest in open infrastructure\nto benefit our own core business: research and education? I think you understand\nwhat my position is on this. The current big deals we have with the big\npublishers are not really to our benefit and with the upcoming defunding\nwe have to use every euro carefully. And then I prefer to fund a young researcher\ninstead of publisher stakeholders.</p>\n\n<p>I hope you are willing to read the following petition to the Dutch negotiators\nto very carefully consider what their priorities are and who they represent.\nYou can sign anonymously (if you fear backlash) and you can just read the details\nbehind this well-written petition: there are many references at the bottom to\nsupport the statements I make here, and more.</p>\n\n<p>But I really, really hope you wish a better future for knowledge dissemination.\nJust think of your next Reviewer 2, that you pay the publisher to have Reviewer 2\nscold you, or the time spent on reference formatting, just because the publisher\nprefers profit over usability.</p>\n\n<p>Join and <a href=\"https://openscienceretreat.eu/call-to-commitment-future-proof-oa-publishing/\">sign</a>!</p>\n\n<p><a href=\"https://openscienceretreat.eu/call-to-commitment-future-proof-oa-publishing/\"><img src=\"/assets/images/0-768x768.jpg\" alt=\"\" /></a></p>",
      "summary": "If you read my blog, it should not surprise you that I have long experimented with technologies to improve knowledge dissemination, for example in HTML. And I have blogged about publishing from an author and researcher, and editor perspective, for many years (see this longer list on my old blog). Also, in the Journal of Cheminformatics we pushed for innovation, including ORCID and GitHub adoption and Citation Typing Ontology adoption.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/0-768x768.jpg",
      "date_published": "2024-09-16T00:00:00+00:00",
      "date_modified": "2024-09-16T00:00:00+00:00",
      "tags": ["publishing","openscience"],
      "_references": [{ "url": "https://doi.org/10.1186/S13321-019-0365-4" },{ "url": "https://doi.org/10.1186/S13321-020-00448-1" },{ "url": "https://doi.org/10.5281/ZENODO.4926030" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7hjzg-ngr66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/09/07/wikidata-citations.html",
      "title": "Adding citations between existing articles in Wikidata",
      "content_html": "<p>Scholarly articles provide context to the factualness of statements in <a href=\"https://wikidata.org/\">Wikidata</a>,\nsimilar to the <a href=\"https://en.wikipedia.org/wiki/Citation_needed\">[citation needed]</a> in <a href=\"https://en.wikipedia.org/wiki/\">Wikipedia</a>.\nAnd just like the cited references in each scholarly article itself. The citation network is general seen\nas an essential part of (doing) science, even without <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">citation intention annotation</a>.\nNowadays, citations are mostly open, but this took very serious lobbying by the <a href=\"https://i4oc.org/\">Initiative for Open Citations</a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2018/11/17/join-me-in-encouraging-acs-to-join.html\">not every publisher reacted immediately</a>.\nBut now that they are open, projects like <a href=\"https://opencitations.net/\">OpenCitations</a> are making this citation\nnetwork FAIR.</p>\n\n<p>Therefore, when an article is cited as reference in Wikidata, I think that the articles (and other research output)\ncited in that article is part of the reference. After all, it is really hard to understand any article without the details\nin the cited articles. So, getting these citations between article into Wikidata deepens the knowledge captured\nby Wikidata. Of course, Wikidata is also one of the few places where we can capture the citation intentions at all.</p>\n\n<p>Adding these citations manually is cumbersome but <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/08/08/history-provenance-detail.html\">sometimes needed</a>\nas these citations are not open or not FAIR yet. 
Fortunately, in many cases we can automate the process, for\nwhich I wrote a <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/bioclipse\">Bacting</a>-based\n<a href=\"https://github.com/egonw/ons-wikidata/blob/main/OpenCitations/quickstatements.groovy\">script</a>.\nUntil recently, the script took as input a single DOI or a list of DOIs, and for each DOI\nlooked up in OpenCitations if it cites other article DOIs and is cited by other DOIs. For the\ncited and citing DOIs it checks if those are in Wikidata and (only) if they are in Wikidata,\nthen it creates QuickStatements. The result can look like <a href=\"https://www.wikidata.org/wiki/Q91911528#P2860\">this</a>:</p>\n\n<p><img src=\"/assets/images/opencitationsImport.png\" alt=\"\" /></p>\n\n<p>The script also needs an OpenCitations token, which you can <a href=\"https://opencitations.net/querying\">get here</a>.\nThis is how I run the code from the command line (with the token in the <code class=\"language-plaintext highlighter-rouge\">TOKEN</code> environment variable),\nfor a single DOI:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>groovy quickstatements.groovy <span class=\"nt\">-t</span> <span class=\"k\">${</span><span class=\"nv\">TOKEN</span><span class=\"k\">}</span> <span class=\"nt\">-d</span> 10.1002/JLAC.18721620110 | <span class=\"nb\">tee </span>output.qs\n</code></pre></div></div>\n\n<p>A list of DOIs is provided as a text file, with one DOI per line. 
I then use the <code class=\"language-plaintext highlighter-rouge\">-l</code> parameter\n(here with the DOIs of works by <a href=\"https://en.wikipedia.org/wiki/Shyamala_Gopalan\">Shyamala Gopalan</a>, mother of\n<a href=\"https://en.wikipedia.org/wiki/Kamala_Harris\">Kamala Harris</a>):</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>groovy quickstatements.groovy <span class=\"nt\">-t</span> <span class=\"k\">${</span><span class=\"nv\">TOKEN</span><span class=\"k\">}</span> <span class=\"nt\">-l</span> harris_dois.txt | <span class=\"nb\">tee </span>output.qs\n</code></pre></div></div>\n\n<p>But last weekend I created a new feature. To enrich the profiles of authors, for example Nobel Prize\nwinners, mothers of, or <a href=\"https://scholia.toolforge.org/author/Q76784\">famous</a> <a href=\"https://scholia.toolforge.org/author/Q80956\">chemists</a>,\npreviously I would create a list of DOIs; now I have the script do that.</p>\n\n<p>So, today, to add the citation network for any arbitrary author, e.g. <a href=\"https://en.wikipedia.org/wiki/Carolyn_Bertozzi\">Carolyn Bertozzi</a>,\nI just pass the Wikidata QID:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>groovy quickstatements.groovy <span class=\"nt\">-t</span> <span class=\"k\">${</span><span class=\"nv\">TOKEN</span><span class=\"k\">}</span> <span class=\"nt\">-a</span> Q7442 | <span class=\"nb\">tee </span>output.qs\n</code></pre></div></div>\n\n<p>I can imagine that in the future the script will have more such options, to do the same\nfor many authors at some affiliation, or all DOIs for a certain journal.</p>",
      "summary": "Scholarly articles provide context to the factualness of statements in Wikidata, similar to the [citation needed] in Wikipedia. And just like the cited references in each scholarly article itself. The citation network is general seen as an essential part of (doing) science, even without citation intention annotation. Nowadays, citations are mostly open, but this took very serious lobbying by the Initiative for Open Citations and not every publisher reacted immediately. But now that they are open, projects like OpenCitations are making this citation network FAIR.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/opencitationsImport.png",
      "date_published": "2024-09-07T00:00:00+00:00",
      "date_modified": "2024-09-07T00:00:00+00:00",
      "tags": ["wikidata","bioclipse","opencitations"],
      "_references": [{ "url": "https://doi.org/10.1002/JLAC.18721620110" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/epanj-4t315",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/08/23/scholia.html",
      "title": "Scholia configurability",
      "content_html": "<p><a href=\"https://scholia.toolforge.org/\">Scholia</a> is a visual layer on top of <a href=\"https://wikidata.org/\">Wikidata</a> providing\na rich user experience for browing scholarly research related knowledge. I am using the combinatie\nfor various things, including exploring new research topics (a method, compound, or protein I do not know so much\nabout yet), indexing notable research output (including citations), <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/cito\">progress of Citation Typing Ontology\nuptake</a>, etc. This weekend I hope to send around the\nfinal draft for the <em>Scholia Chemistry</em> paper.</p>\n\n<p>Scholia has received a fair share of scholarly and social attention. The Scholia paper has been cited\n<a href=\"https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C5&amp;q=scholia+wikidata&amp;btnG=&amp;oq=scholia\">over 100 times</a> and\nthe websites received about 200 thousand page views each day (though we do not know how to get Toolforge\nto give us sufficient insight into the how and what of that count). There is a Wikipedia template to link\nto Scholia and some of projects I am involved in link Scholia for articles, such as\n<a href=\"https://wikipathways.org/\">WikiPathways</a>.</p>\n\n<p>With that, there is also interest in using it for other Wikibases and perhaps even random SPARQL endpoints.\nThese things are not trivial, as Scholia uses complementary APIs, various URL patterns for some of the\nfunctionality, and generally, all SPARQL queries are tweaked to the Wikidata Blazegraph SPARQL endpoint\nto ensure results are returned in reasonable time. But that last requires use of Blazegraph extensions\nto the SPARQL standard.</p>\n\n<p>All this requires Scholia to become more independent, in a better model-view-controller model. And that\nactually turns out very important at this moment. That is, Wikidata is not a RDF-first database, but\na Wikibase-based store. 
Whenever an edit is made, RDF is generated and the SPARQL endpoint is updated.\nNow, the number of edits in Wikidata is enormous and the fact that the SPARQL endpoint is often at most minutes\nbehind is a huge accomplishment. But the Blazegraph platform cannot keep up with Wikidata.\nBlazegraph is open source, but has been bought up and development stopped from one day to the next.</p>\n\n<p>Therefore, a split of the Wikidata SPARQL platform is <a href=\"https://phabricator.wikimedia.org/T337013\">planned</a>.\nThis split will put one part of\nthe knowledge in one endpoint and the other part in the other. Any query that needs information\nfrom both graphs will have to do a federated SPARQL query. Basically, there are very few Scholia\nqueries that do not need rewriting. My first rewrite actually failed, because the rewriting is not\nobvious and the query quickly times out. To some extent, this is because now lots of results of subqueries\nneed to be sent over the network from one endpoint to the other. When the combined query basically\ncovers half of each endpoint, that’s a lot of network traffic.</p>\n\n<p>An immediate use case of the configuration is therefore running Scholia against the current three\nendpoints: the current official endpoint, and the two split endpoints under development. 
With\n<a href=\"https://github.com/WDscholia/scholia/pull/2515\">a recent patch</a> <a href=\"@fnielsen@expressional.social\">Finn</a>\nand I worked on, this configuration looks like this (and saved as <code class=\"language-plaintext highlighter-rouge\">scholia.ini</code>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[query-server]\n# Wikidata:\n#sparql_endpoint = https://query.wikidata.org/sparql\n#sparql_editurl = https://query.wikidata.org/#\n#sparql_embedurl = https://query.wikidata.org/embed.html#\n\n# Wikidata Split Main\nsparql_endpoint = https://query-main.wikidata.org/sparql\nsparql_editurl = https://query-main.wikidata.org/#\nsparql_embedurl = https://query-main.wikidata.org/embed.html#\n\n# Wikidata Split Scholar\n#sparql_endpoint = https://query-scholarly.wikidata.org/sparql\n#sparql_editurl = https://query-scholarly.wikidata.org/#\n#sparql_embedurl = https://query-scholarly.wikidata.org/embed.html#\n</code></pre></div></div>\n\n<p>So, right now, we can test the impact of the split with Scholia and this patch.\nWe would fire up a local instances of Scholia, running against one of the\nsplit endpoints, and use the Toolforge instance as baseline.</p>\n\n<p>Now, on my system I need to use <a href=\"https://python.land/virtual-environments/virtualenv\">Python virtualenv</a>\nso, I first start a Scholia <code class=\"language-plaintext highlighter-rouge\">venv</code>:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">source</span> ~/.venvs/scholia/bin/activate\n</code></pre></div></div>\n\n<p>After that, I can select an other endpoint, e.g. 
the <code class=\"language-plaintext highlighter-rouge\">main</code> Wikidata split endpoint (<code class=\"language-plaintext highlighter-rouge\">query-main-experimental.wikidata.org</code>)\nwere it not they are <a href=\"https://phabricator.wikimedia.org/T371833\">currently offline</a> as part of the transition\nand run Scholia on a unique port:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>scholia run\n</code></pre></div></div>\n\n<p>Then I can have two browser windows along side and compare Scholia pages againt the current\nScholia instance and when running against another SPARQL endpoint. For now, I can test how well\nScholia runs on the <a href=\"qlever.cs.uni-freiburg.de/wikidata\">QLever instance of Wikidata</a> (superfast and\nupdated data once a week). Here the configuration I have is not entirely complete, and many\nSPARQL queries do not work against QLever, including anything with graphical depiction. But\nthat said, I can use this configuration:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>[query-server]\n# QLever\n#sparql_endpoint = https://qlever.cs.uni-freiburg.de/api/wikidata\n#sparql_editurl = https://qlever.cs.uni-freiburg.de/wikidata/?query=\n#sparql_embedurl = \n</code></pre></div></div>\n\n<p>Then, I can compare, for example, the chemicals statistics the main Scholia with one running\nagainst QLever:</p>\n\n<p><img src=\"/assets/images/scholia_comparison.png\" alt=\"\" /></p>\n\n<p>This query ran without modification. For other queries rewriting is needed, but with this\nsetup we can at least quickly see the differences in the results.</p>",
      "summary": "Scholia is a visual layer on top of Wikidata providing a rich user experience for browing scholarly research related knowledge. I am using the combinatie for various things, including exploring new research topics (a method, compound, or protein I do not know so much about yet), indexing notable research output (including citations), progress of Citation Typing Ontology uptake, etc. This weekend I hope to send around the final draft for the Scholia Chemistry paper.",
      
      "date_published": "2024-08-23T00:00:00+00:00",
      "date_modified": "2024-09-05T00:00:00+00:00",
      "tags": ["scholia","wikidata","sparql"],
      "_references": [{ "url": "https://doi.org/10.1007/978-3-319-70407-4_36" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7vhj4-ae665",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/08/15/kasabi-archives.html",
      "title": "Kasabi archive at the Internet Archive",
      "content_html": "<p><a href=\"https://www.wikidata.org/wiki/Q128214915\">Kasabi</a> was an innovative RDF publishing platform from around 2011.\n<a href=\"https://web.archive.org/web/20130907095112/http://blog.kasabi.com/about/\">Shortlived</a>, and maybe just too early.\nI published two open datasets there. One was ChEMBL-RDF (see these <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/chembl\">posts</a>).\nThe second was a small data sets called <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/07/06/chempedia-rdf-2-kasabi.html\">ChemPedia</a>,\na open science effort to crowdsource chemical names. This is still very much needed, and possibly Wikidata could fill that gap,\nbut it would first need to be able to handle all labels as statements itself.</p>\n\n<p>Anyway, just before they shutdown because of, I understood, lack of commercial interest, they\n<a href=\"https://archive.org/details/kasabi\">archived all data</a>, including the ChemPedia datasets. I was happy to be reminded about that,\nbecause I am not sure I had archived that data.</p>",
      "summary": "Kasabi was an innovative RDF publishing platform from around 2011. Shortlived, and maybe just too early. I published two open datasets there. One was ChEMBL-RDF (see these posts). The second was a small data sets called ChemPedia, a open science effort to crowdsource chemical names. This is still very much needed, and possibly Wikidata could fill that gap, but it would first need to be able to handle all labels as statements itself.",
      
      "date_published": "2024-08-15T00:00:00+00:00",
      "date_modified": "2024-08-15T00:00:00+00:00",
      "tags": ["semweb","chembl","kasabi","ia"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y9chc-zb166",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/08/11/scholarly-discussions.html",
      "title": "Scholarly discussions through the eyes of CiTO (and Wikidata)",
      "content_html": "<p>Diabetes was already discussed in literature back in 1838-1839 (doi:<a href=\"https://doi.org/10.1016/S0140-6736(02)96038-1\">10.1016/S0140-6736(02)96038-1</a>,\ndoi:<a href=\"10.1016/S0140-6736(02)96066-6\">10.1016/S0140-6736(02)96066-6</a>, and doi:<a href=\"https://doi.org/10.1016/S0140-6736(02)83966-6\">10.1016/S0140-6736(02)83966-6</a>).\nThese three papers show a short discussion. Papers were a lot shorter back in the days, and the discussion actually shows why papers are longer now\n(tho I am not sure they really got sufficiently more reproducible, but that’s another discussion).</p>\n\n<p>Traditional citation counts do not make this discussion obvious, but if we make our publishing sufficiently FAIR (it’s far from that, right now),\nthen we can get a step closer. For example, with the <a href=\"https://purl.org/spar/cito\">Citation Typing Ontology</a>\nwe can show how the papers relate to each other:</p>\n\n<p><img src=\"/assets/images/clannyNetwork.png\" alt=\"\" /></p>\n\n<p>This network is based on public knowledge in <a href=\"https://wikidata.org/\">Wikidata</a> and actually can be easily reproduced by anyone\nwith <a href=\"https://w.wiki/AtV9\">this query</a>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">#defaultView:Graph</span><span class=\"w\">\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?focus1</span><span class=\"w\"> </span><span class=\"nv\">?focus1Label</span><span class=\"w\"> </span><span class=\"nv\">?focus2</span><span class=\"w\"> </span><span class=\"nv\">?focus2Label</span><span class=\"w\"> </span><span class=\"nv\">?edgeLabel</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"k\">VALUES</span><span class=\"w\"> 
</span><span class=\"nv\">?focus1</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174475</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174776</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174815</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">VALUES</span><span class=\"w\"> </span><span class=\"nv\">?focus2</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174475</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174776</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q124174815</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"nv\">?focus1</span><span class=\"w\"> </span><span class=\"nn\">p</span><span class=\"o\">:</span><span class=\"ss\">P2860</span><span class=\"w\"> </span><span class=\"nv\">?citation</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?citation</span><span class=\"w\"> </span><span class=\"nn\">ps</span><span class=\"o\">:</span><span class=\"ss\">P2860</span><span class=\"w\"> </span><span class=\"nv\">?focus2</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"nn\">pq</span><span class=\"o\">:</span><span class=\"ss\">P3712</span><span class=\"w\"> </span><span class=\"nv\">?edge</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?edge</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span 
class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?edgeLabel</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"k\">FILTER</span><span class=\"p\">(</span><span class=\"nb\">LANG</span><span class=\"p\">(</span><span class=\"nv\">?edgeLabel</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"p\">=</span><span class=\"w\"> </span><span class=\"s2\">\"en\"</span><span class=\"p\">)</span><span class=\"w\">\n  </span><span class=\"k\">SERVICE</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nn\">bd</span><span class=\"o\">:</span><span class=\"ss\">serviceParam</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">language</span><span class=\"w\"> </span><span class=\"s2\">\"[AUTO_LANGUAGE],mul,en\"</span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>The two “focus” values are an identical list of the articles I want to see. To make sure to get citations between all of them,\nI have to give them twice.</p>\n\n<p>In the above example I have used <code class=\"language-plaintext highlighter-rouge\">VALUES</code> for this, but I can also generate the controlled list of items between the citations\nI want to visualize with any SPARQL fragment too. 
<a href=\"https://edu.nl/y38rg\">This query</a> does that (or here as\n<a href=\"https://gist.github.com/egonw/b5fb7ae550c1597ff247f70cee8063c8\">GitHub Gist</a>, but something else too: it uses a trick I learned\nfrom <a href=\"https://scholia.toolforge.org/author/Q20980928\">Finn Nielsen</a> from <a href=\"https://github.com/WDscholia/scholia/commit/d34dee85bc12575e0f1891c4e663ef8e2c450083\">this patch</a>\nfrom the <a href=\"https://scholia.toolforge.org/\">Scholia</a> project (doi:<a href=\"https://doi.org/10.1007/978-3-319-70407-4_36\">10.1007/978-3-319-70407-4_36</a>)).</p>\n\n<p>Here, I select the articles by replacing the above <code class=\"language-plaintext highlighter-rouge\">VALUES</code> lines with this fragment (<code class=\"language-plaintext highlighter-rouge\">P50</code> is ‘author’ and <code class=\"language-plaintext highlighter-rouge\">Q20895241</code> is me in Wikidata):</p>\n\n<pre><code class=\"language-SPARQL\">  ?focus1 wdt:P50 wd:Q20895241 .\n  ?focus2 wdt:P50 wd:Q20895241 .\n</code></pre>\n\n<p>And, to be honest, then I get this network which is much richer than I expected:</p>\n\n<p><img src=\"/assets/images/willighagen_cito.png\" alt=\"\" /></p>\n\n<p>I wonder how far we can push this. Can we also do this for the <a href=\"https://scholia.toolforge.org/venue/Q6294930\">Journal of Cheminformatics</a>?\nAfter all, this journal had a <a href=\"https://www.biomedcentral.com/collections/cito\">CiTO Pilot</a> and, indeed,\n<a href=\"https://edu.nl/hk8xy\">the results do not disappoint</a>! All I had to do was replace the focus section:</p>\n\n<pre><code class=\"language-SPARQL\">  ?focus1 wdt:P1433 wd:Q6294930 .\n  ?focus2 wdt:P1433 wd:Q6294930 .\n</code></pre>",
      "summary": "Diabetes was already discussed in literature back in 1838-1839 (doi:10.1016/S0140-6736(02)96038-1, doi:10.1016/S0140-6736(02)96066-6, and doi:10.1016/S0140-6736(02)83966-6). These three papers show a short discussion. Papers were a lot shorter back in the days, and the discussion actually shows why papers are longer now (tho I am not sure they really got sufficiently more reproducible, but that’s another discussion).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/clannyNetwork.png",
      "date_published": "2024-08-11T00:00:00+00:00",
      "date_modified": "2024-08-11T00:00:00+00:00",
      "tags": ["cito","wikidata"],
      "_references": [{ "url": "https://doi.org/10.1016/S0140-6736(02)96038-1" },{ "url": "https://doi.org/10.1016/S0140-6736(02)96066-6" },{ "url": "https://doi.org/10.1016/S0140-6736(02)83966-6" },{ "url": "https://doi.org/10.1007/978-3-319-70407-4_36" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8c1e7-8yp77",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/08/07/cito-updates.html",
      "title": "CiTO updates: Wakefield and WikiPathways",
      "content_html": "<p>This summer I am trying to finish up some smaller projects that I did not have time for to finish, with\nmixed successes. I am combing this with a nice Dutch staycation, and I already cycled in\n<a href=\"https://en.wikipedia.org/wiki/Overijssel\">Overijssel</a> and in south-west <a href=\"https://en.wikipedia.org/wiki/Friesland\">Friesland</a>\nand learning about their histories.\nBut this post is about an update on my Citation Typing Ontology use cases. And I have to say,\na <a href=\"https://www.youtube.com/watch?v=1kD7jkyDr3s\">mention by Silvio Peroni</a> is pretty awesome, thanks!</p>\n\n<p>First, the bad news. I still did not get around to the following to tasks I have. First, I need to write up a\nstep-by-step guide how to create <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/04/02/open-science-retreat-2.html\">CiTO nanopublications</a>\nand matching draft article. Second, I still need to work out how to update the JATS workflow for\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2021/11/15/biohackathon-europe-2021-1-cito.html\">CiTO annotation in BioHackrXiv</a>.</p>\n\n<h2 id=\"wakefield\">Wakefield</h2>\n\n<p>Let’s first start with a dataset. Peroni mentioned a study they did (<a href=\"https://doi.org/10.1007/S11192-021-04097-5\">10.1007/S11192-021-04097-5</a>)\ninto why the famous Wakefield paper\n(doi:<a href=\"https://doi.org/10.1016/S0140-6736(97)11096-0\">10.1016/S0140-6736(97)11096-0</a>) is cited. They published\ntheir data set on Zenodo (doi:<a href=\"https://doi.org/10.5281/zenodo.13166142\">10.5281/zenodo.13166142</a>) with CCZero,\nso I imported it into <a href=\"https://wikidata.org/\">Wikidata</a>. Well, at least the citations\nof articles already in Wikidata. 
I used a Bacting (doi:<a href=\"https://doi.org/10.21105/joss.02558\">10.21105/joss.02558</a>)\n<a href=\"https://gist.github.com/egonw/379c72a49517716712b70bdee0d845ce\">script</a> and it actually was quite short.\nIn the end, this added some 500 new citation intentions to Wikidata, now at almost <a href=\"https://scholia.toolforge.org/cito/\">2000</a>.\nThis is also the third dataset with explicit CiTO intention annotations (see also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html\">this post</a>).</p>\n\n<p>This is what the <a href=\"https://scholia.toolforge.org/work/Q28264479#cito-incoming\">CiTO section of the Wakefield paper</a>\nin <a href=\"https://scholia.toolforge.org/\">Scholia</a> (doi:<a href=\"https://doi.org/10.1007/978-3-319-70407-4_36\">10.1007/978-3-319-70407-4_36</a>)\nnow looks like:</p>\n\n<p><img src=\"/assets/images/wakefieldCitations.png\" alt=\"\" /></p>\n\n<h2 id=\"wikipathways\">WikiPathways</h2>\n\n<p>A second thing I want to show is a potential CiTO intention annotation dataset. Almost two years ago\n<a href=\"https://qoto.org/@xanderpico\">Alex Pico</a> started a new <a href=\"https://wikipathways.org/\">WikiPathways</a>\nfeature as part of the new website (doi:<a href=\"https://doi.org/10.1093/NAR/GKAD960\">10.1093/NAR/GKAD960</a>):\n<a href=\"https://github.com/wikipathways/wikipathways-database/commit/97f7df0057d312f0c332a9ff290c11684bf252d5\">a list of citations to specific pathways</a>\n(in WikiPathways). Alex’ setup is fully automated, using <a href=\"https://www.ncbi.nlm.nih.gov/pmc/\">PubMed Central</a>\nto find mentions in figure captions:</p>\n\n<p><em>Beyond citations to previous WikiPathways journal articles, we have identified 1228 mentions of a total of 582\nunique WikiPathways pathway model identifiers, e.g. 
WP4846, in PubMedCentral articles over the past 13 years.</em></p>\n\n<p>The file format is a pretty basic YAML file:</p>\n\n<p><img src=\"/assets/images/citedin_yaml.png\" alt=\"\" /></p>\n\n<p>Additional mentions are found in the main text and tables in the article. These are not always picked up.\nThese can be added manually. Over the past months and the past two weeks particularly, I have been adding\nadditional mentions, not listed yet. We have now passed 1500 mentions but I cannot easily give the other\nstatistics.</p>\n\n<p>BTW, anyone can add these citations with the ‘edit’ pencil and some Microsoft GitHub editing (but\nas far as I am concerned, please feel free to also just mention the paper on the\n<a href=\"https://github.com/wikipathways/wikipathways-help/discussions\">WikiPathways Community Forum</a>):</p>\n\n<p><img src=\"/assets/images/citedin_website.png\" alt=\"\" /></p>\n\n<p>So, in the next few days I plan to do three things: 1. generate RDF for the YAML file and make that part of the\n<a href=\"https://data.wikipathways.org/current/rdf/\">monthly WikiPathways RDF release</a>; 2. extract citations and\noffer this back to <a href=\"https://opencitations.net/\">the OpenCitations project</a>; and, 3. add the citations\ninto Wikidata. Of course, all with <code class=\"language-plaintext highlighter-rouge\">cito:usesDataFrom</code> :)</p>\n\n<p>There is a fourth thing that I am still thinking about. I can also use the above data to annotate\ncitations to the WikiPathways papers as <code class=\"language-plaintext highlighter-rouge\">cito:usesDataFrom</code> if they also mention a WikiPathways identifier,\nbut I cannot fully oversee the implications of that. What do you think?</p>",
      "summary": "This summer I am trying to finish up some smaller projects that I did not have time for to finish, with mixed successes. I am combing this with a nice Dutch staycation, and I already cycled in Overijssel and in south-west Friesland and learning about their histories. But this post is about an update on my Citation Typing Ontology use cases. And I have to say, a mention by Silvio Peroni is pretty awesome, thanks!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wakefieldCitations.png",
      "date_published": "2024-08-07T00:00:00+00:00",
      "date_modified": "2024-08-07T00:00:00+00:00",
      "tags": ["cito","wikipathways","wikidata"],
      "_references": [{ "url": "https://doi.org/10.1016/S0140-6736(97)11096-0" },{ "url": "https://doi.org/10.21105/JOSS.02558" },{ "url": "https://doi.org/10.1007/978-3-319-70407-4_36" },{ "url": "https://doi.org/10.5281/ZENODO.13166142" },{ "url": "https://doi.org/10.1093/NAR/GKAD960" },{ "url": "https://doi.org/10.1007/S11192-021-04097-5" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/32j3a-7ae65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/07/31/directed-metabolic-network.html",
      "title": "New paper: &quot;Discovering life&apos;s directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool&quot;",
      "content_html": "<p>I am still catching up with a lot of work, and found out I actually had forgotten to blog about this cool article\nby <a href=\"https://scholar.google.com/citations?user=Le-4tuQAAAAJ&amp;hl\">Denise Slenter</a>: “Discovering life’s directed metabolic (sub)paths to\ninterpret human biochemical markers using the DSMN tool” (doi:<a href=\"https://doi.org/10.1039/D3DD00069A\">10.1039/D3DD00069A</a>).\nThis paper explains how various open science resources (<a href=\"https://www.wikidata.org/\">Wikidata</a>,\n<a href=\"https://reactome.org/\">Reactome</a>, <a href=\"https://www.wikipathways.org/\">WikiPathways</a>) are used to visualize\nthe biological story of the data from two metabolomics experiments archived in MetaboLights.</p>\n\n<p>Using <a href=\"https://neo4j.com/\">Neo4J</a> and <a href=\"https://cytoscape.org/\">Cytoscape</a> she visualizes the data onto a network created with\nRDF, <a href=\"https://en.wikipedia.org/wiki/SPARQL\">SPARQL</a> from the above resources:</p>\n\n<p><img src=\"/assets/images/d3dd00069a-f12_hi-res.png\" alt=\"\" /></p>\n\n<p>The whole approach uses open science, making the work very reproducible. This is essential, as our knowledge\nabout metabolic processes continues to grow, if not only for the human lipids, but also from molecular\nimaging technologies. Moreover, a lot of biological detail is yet to be encoded on pathway databases,\nsuch as cellular location of proteins and metabolites, which proteins are expressed in which tissue, or\nthe kinetics of metabolic reactions. All knowledge that can be pulled it via knowledge graphs becomes\nimmediately available by using this <a href=\"https://en.wikipedia.org/wiki/FAIR_data\">FAIR</a> approach.</p>\n\n<p>One last note, the reader may notice a focus on the shortest path. Of course, the biological relevant\npath may not be the “shortest” path. But from a network analysis perspective that question is purely\nacademic. 
Neo4J, like other tools, supports finding all paths. But validating which path (the shortest\nor any of the longer ones) is biologically most relevant first depends on more biological\nknowledge becoming FAIR. After this, it is just a push of a button.</p>",
      "summary": "I am still catching up with a lot of work, and found out I actually had forgotten to blog about this cool article by Denise Slenter: “Discovering life’s directed metabolic (sub)paths to interpret human biochemical markers using the DSMN tool” (doi:10.1039/D3DD00069A). This paper explains how various open science resources (Wikidata, Reactome, WikiPathways) are used to visualize the biological story of the data from two metabolomics experiments archived in MetaboLights.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/d3dd00069a-f12_hi-res.png",
      "date_published": "2024-07-31T00:00:00+00:00",
      "date_modified": "2024-07-31T00:00:00+00:00",
      "tags": ["wikipathways","metabolomics"],
      "_references": [{ "url": "https://doi.org/10.1039/D3DD00069A" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8x2f1-h6d21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/07/21/rogue-scholar-and-more.html",
      "title": "GoatCounter, Rogue Scholar and more new things",
      "content_html": "<p>About <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html\">a year ago</a> I started migrating\nmy blogger.com blog to a git-version-controlled, Markdown-based blogging platform. I have to say, it has been a happy year.\nIt actually is awesome to port old blog posts (<a href=\"https://egonw.github.io/blog/\">follow that here</a>) and to see what I have been\nworking on some 17, 18 years ago.</p>\n\n<p>I do have a nasty bug to fix that causes the conversion of the Markdown to HTML is scaling badly. The system is doing some indexing at\nthe wrong time, and probably all indexing for each post again. Kudos if you spot it.</p>\n\n<p>But while still being on a Jekyll learning curve, some nice things have happened since I started. This blog started with\nInChIKeys, as demonstrated in <a href=\"https://doi.org/10.59350/fbnx1-9r832\">this post</a>,\nwhich adds <a href=\"https://chem-bla-ics.linkedchemistry.info/molecule/DEIYFTQMQPDXOT-UHFFFAOYSA-N\">this molecule page</a>. On my wishlist\nis still a <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/rss\">CMLRSS</a>-based feed.</p>\n\n<p>Newer is things I worked on since, this includes the following, and something that readers of my blog may be interested in\nlearning about. First, I started counting visitors again, but with the GDPR-compliant <a href=\"https://goatcounter.com/\">GoatCounter</a>.\nI have been using my social network as advisory board, and knowing what people find interested matters to me.</p>\n\n<p>The second thing is listing in <a href=\"https://rogue-scholar.org/\">The Rogue Scholar</a>. 
This is a new platform, like a blog planet, perhaps\na bit like (the late) <a href=\"https://chem-bla-ics.blogspot.com/search?q=%22chemical+blogspace%27\">Chemical blogspace</a> and (the late)\n<a href=\"https://chem-bla-ics.blogspot.com/search?q=%22postgenomic.com%22\">Postgenomic.com</a>, but so far without the extraction of\njournal articles (though it did start <a href=\"https://doi.org/10.53731/j77gv-54g66\">recognizing some references</a>),\nchemicals, and conferences. Instead, they offer <a href=\"https://doi.org/10.53731/br9f5xa-a556w2t\">archiving</a>\n<a href=\"https://doi.org/10.53731/g60vh-3ng48\">by the Internet Archive</a>, <a href=\"https://doi.org/10.53731/6mkrk-dzh02\">DOIs for your blog posts</a>,\n<a href=\"https://doi.org/10.53731/1dfxr-hs665\">ePub and PDF downloads</a>, and <a href=\"https://doi.org/10.53731/3w1ye-q6z42\">JATS</a>.\nThey just passed the milestone of <a href=\"https://doi.org/10.53731/xkfsa-xkk56\">100 participating blogs</a>!\nPlease do check it out, it’s an awesome service.</p>\n\n<p><img src=\"/assets/images/chemblaics-on-roguescholar.png\" alt=\"\" /></p>\n\n<p>A final thing I want to mention here is that my blog now has an <a href=\"https://chem-bla-ics.linkedchemistry.info/archive/\">archive page</a>,\nwhich sometimes can be useful.</p>\n\n<p>Let’s see what I can say next year, when my blog celebrates its 20th birthday :)</p>",
      "summary": "About a year ago I started migrating my blogger.com blog to a git-version-controlled, Markdown-based blogging platform. I have to say, it has been a happy year. It actually is awesome to port old blog posts (follow that here) and to see what I have been working on some 17, 18 years ago.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemblaics-on-roguescholar.png",
      "date_published": "2024-07-21T00:00:00+00:00",
      "date_modified": "2024-07-21T00:00:00+00:00",
      "tags": ["blog"],
      "_references": [{ "url": "https://doi.org/10.59350/fbnx1-9r832" },{ "url": "https://doi.org/10.53731/j77gv-54g66" },{ "url": "https://doi.org/10.53731/br9f5xa-a556w2t" },{ "url": "https://doi.org/10.53731/g60vh-3ng48" },{ "url": "https://doi.org/10.53731/6mkrk-dzh02" },{ "url": "https://doi.org/10.53731/1dfxr-hs665" },{ "url": "https://doi.org/10.53731/3w1ye-q6z42" },{ "url": "https://doi.org/10.53731/xkfsa-xkk56" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dtfq8-5x011",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/06/16/cdk2024-3.html",
      "title": "cdk2024 #3: an unexpected downstream project",
      "content_html": "<p>In <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/04/07/cdk2024.html\">the CDK2024</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2024/05/18/cdk2024-2.html\">grant</a> we wrote about\nupdating various software projects using the <a href=\"https://cdk.github.io/\">Chemistry Development Kit</a>.\nWe even wrote that “[r]equired API changes will be publicly shared and disseminated with the\nGroovy Cheminformatics with the Chemistry Development Kit book (egonw.github.io/cdkbook/)”.\nThe <em>Groovy Cheminformatics with the Chemistry Development Kit</em> book is a project that has\nrun since 2009.</p>\n\n<pre><code class=\"language-git\">commit c5cbf9b5dd49baf582afc595c9cbafc714c5199f\nAuthor: Egon Willighagen &lt;egon.willighagen@gmail.com&gt;\nDate:   Fri Apr 10 12:34:42 2009 +0200\n\n    Initial copy of the current draft; converted into separate project for easier branching\n    for tunes of the book for workshops and sorts\n</code></pre>\n\n<p>The original version was in LaTeX and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2011/02/06/groovy-cheminformatics.html\">sold online via Lulu.com <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBecause all code examples were run (the first public edition had 72 pages with 75 code examples),\nlike RMarkdown of Jupyter Notebooks by design, I was able to\nmake <a href=\"https://chem-bla-ics.blogspot.com/search?q=lulu\">many releases</a>.\nThe big advantage of this was that when <a href=\"https://en.wikipedia.org/wiki/API\">API</a> changes happened,\nthis would be visible by code not compiling or by output changing.</p>\n\n<p>At some point I open sourced the book (doi:<a href=\"https://doi.org/10.6084/M9.FIGSHARE.2057790.V1\">10.6084/M9.FIGSHARE.2057790.V1</a>)\nand then realized that I can <a href=\"https://github.com/egonw/cdkbook/commit/2630699aa280200188f2ae9ef3f0698964926752\">convert the book to Markdown</a>:</p>\n\n<pre><code class=\"language-git\">commit 
2630699aa280200188f2ae9ef3f0698964926752\nAuthor: Egon Willighagen &lt;egon.willighagen@gmail.com&gt;\nDate:   Mon Dec 24 16:59:14 2018 +0100\n\n    Create chapter3.md\n</code></pre>\n\n<p>This is the version available at <a href=\"https://egonw.github.io/cdkbook/\">egonw.github.io/cdkbook/</a>\nfor some time now. So, now that for SMARTCyp I need to update the visualization, I went back to my book of\ncode examples (I have a collection of more than 200 examples), but then found that\nthe chapter on <a href=\"https://egonw.github.io/cdkbook/depiction\">Depiction</a> was missing. I was not\nlooking forward to this, because I know that\nthe code examples predate a massive improvement by <a href=\"https://scholia.toolforge.org/author/Q28796322\">John Mayfield</a>\nof the rendering stack and I never got around to seeing if the examples from the book work well enough\nwith that new API (one is actually updated).</p>\n\n<p>That is when I realized that the <em>Groovy Cheminformatics</em> book actually also is a downstream\nproject that needs updating. I have been doing this already and it’s fairly smooth so that I did\nnot think of including it in the grant, other than updating the\n<a href=\"https://egonw.github.io/cdkbook/migration\">Migration</a> chapter. I now had enough time\nto dive into <a href=\"https://github.com/cdk/nwo-openscience-2024/issues/30\">this project</a>. I need that,\nbecause the goal of the project is also to learn about all the meta science aspects of\nproject maintenance, roles, communication, etc. 
Therefore also this blog post: we need a track\nrecord, to collect data.</p>\n\n<p>Anyway, porting <a href=\"https://egonw.github.io/cdkbook/code/RenderMolecule.code.html\">the first script</a> went fairly easily,\nbut I am now running into a stacktrace:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Processing  RenderSelection.groovyin\ndoing RenderSelection.out ...\norg.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:\n/home/egonw/var/Projects/hub/cdkbook-source/code/RenderSelection.groovy: 39: unable to resolve class ExternalHighlightGenerator\n @ line 39, column 16.\n   generators.add(new ExternalHighlightGenerator());\n                  ^\norg.codehaus.groovy.syntax.SyntaxException: unable to resolve class ExternalHighlightGenerator\n @ line 39, column 16.\n\n</code></pre></div></div>\n\n<p>That brings us to the task of how to find where that class is coming from, which happens\nto be something I already <a href=\"https://github.com/cdk/nwo-openscience-2024/issues/29\">had to write up</a>\nfor <code class=\"language-plaintext highlighter-rouge\">RingSearch</code>. Dependency galore.</p>",
      "summary": "In the CDK2024 grant we wrote about updating various software projects using the Chemistry Development Kit. We even wrote that “[r]equired API changes will be publicly shared and disseminated with the Groovy Cheminformatics with the Chemistry Development Kit book (egonw.github.io/cdkbook/)”. The Groovy Cheminformatics with the Chemistry Development Kit book is a project that has run since 2009.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkDepictChapter.png",
      "date_published": "2024-06-16T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cdk","grant","cdk2024"],
      "_references": [{ "url": "https://doi.org/10.6084/M9.FIGSHARE.2057790.V1" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m9g28-dne38",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/06/10/two-meetings.html",
      "title": "Two meetings: ELIXIR Toxicology and FAIR4ChemNL",
      "content_html": "<p>Noting that in the coming week I am not attending the <a href=\"https://elixir-europe.org/events/elixir-all-hands-2024\">ELIXIR All Hands in Uppsala</a>.\nHaving lived in (and around) Uppsala for more than three years, I am disappointed and with the first stories from colleagues coming\nin even more. But it has been a way too busy year, I have much to finish up, and I need to take care of myself too. I am not 32 anymore.</p>\n\n<p>But in the past two weeks I did attend two workshops. The first was a <a href=\"https://www.aanmelder.nl/intoxicom2024firstworkshop\">workshop</a> by the\n<a href=\"https://elixir-europe.org/communities/toxicology\">ELIXIR Toxicology Community</a>, which was held in Utrecht/NL. The programme was around\nFAIR and included two really nice hands-on sessions where we developed drafts for <a href=\"https://faircookbook.elixir-europe.org/\">FAIR Cookbook</a>\nrecipes (see also doi:<a href=\"https://doi.org/10.1038/s41597-023-02166-3\">10.1038/s41597-023-02166-3</a>) and for\n<a href=\"https://www.go-fair.org/how-to-go-fair/fair-implementation-profile/\">FAIR Implementation Profiles</a>\n(doi:<a href=\"https://doi.org/10.1007/978-3-030-65847-2_13\">10.1007/978-3-030-65847-2_13</a>). We will write up a\n<a href=\"https://biohackrxiv.org/discover\">BioHackrXiv</a> report.</p>\n\n<p>The second workshop was last week, the <a href=\"https://tdcc.nl/evenementen/fair4chemnl-workshop/\">FAIR4ChemNL workshop</a>, which was also held\nin Utrecht/NL. The topic was FAIR in chemistry, and we discussed various aspects. There was a significant participant group from the\nGerman NFDI4Cat project (“Cat” is short for (chemical) catalysis), which recently published a nice analysis of several ontologies\n(doi:<a href=\"https://doi.org/10.1186/s13321-024-00807-2\">10.1186/s13321-024-00807-2</a>). And there was also a lot of mention of RDF and SPARQL.</p>\n\n<p>I think it is time for a new special issue around semantic web technologies.</p>",
      "summary": "Noting that in the coming week I am not attending the ELIXIR All Hands in Uppsala. Having lived in (and around) Uppsala for more than three years, I am disappointed and with the first stories from colleagues coming in even more. But it has been a way too busy year, I have much to finish up, and I need to take care of myself too. I am not 32 anymore.",
      
      "date_published": "2024-06-10T00:00:00+00:00",
      "date_modified": "2024-06-10T00:00:00+00:00",
      "tags": ["elixir","fair","chemistry","rdf","sparql"],
      "_references": [{ "url": "https://doi.org/10.1038/S41597-023-02166-3" },{ "url": "https://doi.org/10.1007/978-3-030-65847-2_13" },{ "url": "https://doi.org/10.1186/S13321-024-00807-2" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b4tm0-s7c62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/06/10/linking-fair-to-reuse.html",
      "title": "New paper: FAIR assessment of nanosafety data reusability with community standards",
      "content_html": "<p><a href=\"FAIR assessment of nanosafety data reusability with community standards\">Ammar</a> is finishing up his PhD thesis with his\nresearch on the use of FAIR towards predictive toxicology. Or, “AI ready”, as the term FAIR is now sometimes explained.\nAny computational method needs good data, and just FAIR is not enough. It needs to meet community standards, as formalized\nin R1.3. To me, this includes meeting community standards like minimal reporting standards. Indeed, in the\n<a href=\"https://www.nanosafetycluster.eu/\">EU NanoSafety Cluster</a> the notion that FAIR data also needs be scientifically\ngood data is well noted.</p>\n\n<p>In this paper (doi:<a href=\"https://doi.org/10.1038/s41597-024-03324-x\">10.1038/s41597-024-03324-x</a>),\nAmmar explores this notion and compiled more than 200 maturity indicators in the category R1.3\nresulting from 12 different community standards. For example, this includes minimal reporting standards. There\nis overlap in needs, but they often also have a different focus. The conclusion here: different (re)use cases\nhave different needs, and data not usable to one use case can be sufficiently FAIR for another. Of course, ideally,\nit would be FAIR enough for all use cases.</p>\n\n<p>Ammar formalizes the maturity indicators and links the comming maturity indicators to various use cases.\nThat means that when you determine the indicator values for your data, people can immediately lookup how\nthis data can be reused. And, the generator of the data can immediately see how the data would need to be\nimproved to widen the reusability. 
How FAIR can we get?</p>\n\n<p>His proposal has already been further explored in two other papers, one around data sharing\n(doi:<a href=\"https://doi.org/10.1038/s41596-024-00993-1\">10.1038/s41596-024-00993-1</a>, see also\n<a href=\"https://doi.org/10.59350/vfvwq-s0v13\">this blog post</a>) and one around QSAR modelling\n(doi:<a href=\"https://doi.org/10.1016/j.impact.2023.100475\">10.1016/j.impact.2023.100475</a>,\nsee also <a href=\"https://doi.org/10.59350/7zf38-w9670\">this blog post</a>).</p>\n\n<p>The below screenshot shows what an analysis using this approach can look like:</p>\n\n<p><img src=\"/assets/images/41597_2024_3324_Fig3_HTML.png\" alt=\"\" /></p>",
      "summary": "Ammar is finishing up his PhD thesis with his research on the use of FAIR towards predictive toxicology. Or, “AI ready”, as the term FAIR is now sometimes explained. Any computational method needs good data, and just FAIR is not enough. It needs to meet community standards, as formalized in R1.3. To me, this includes meeting community standards like minimal reporting standards. Indeed, in the EU NanoSafety Cluster the notion that FAIR data also needs be scientifically good data is well noted.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/41597_2024_3324_Fig3_HTML.png",
      "date_published": "2024-06-10T00:00:00+00:00",
      "date_modified": "2024-06-10T00:00:00+00:00",
      "tags": ["fair","toxicology","qsar"],
      "_references": [{ "url": "https://doi.org/10.1038/S41597-024-03324-X" },{ "url": "https://doi.org/10.1016/J.IMPACT.2023.100475" },{ "url": "https://doi.org/10.1038/S41596-024-00993-1" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vfvwq-s0v13",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/05/27/from-spreadsheets-to-rdf.html",
      "title": "New paper: A template wizard for the cocreation of machine-readable data-reporting to harmonize the evaluation of (nano)materials",
      "content_html": "<p>I was about to call this blog post <em>From spreadsheets to RDF</em>, after <a href=\"https://chem-bla-ics.linkedchemistry.info/2024/05/20/from-papers-to-rdf.html\">the post last week</a>.\nBut then I decided to just use the pattern I typically use. Why I wanted to use that shorter term in the first\nplace was that one of the thing I like about the <a href=\"https://sourceforge.net/projects/ambit/\">AMBIT software</a>\n(of OpenTox and eNanoMapper fame) is its\nRDF support (see doi:<a href=\"https://doi.org/10.1186/1756-0500-4-487\">10.1186/1756-0500-4-487</a>). But\n<a href=\"https://chem-bla-ics.linkedchemistry.info/tag/rdf\">RDF</a>, ontologies,\nthose are hard things. And unlike mathematics, we do not have simple objects like integer numbers or simple\noperators. Well, I think we do, and we talk about them. But there is no obligatory education. Just like\nany biologist needs to know what <em>1 + 2</em> means, I think any biologist needs basic knowledge about how\nknowledge graphs work. But sometimes feels like a taboo, like cursing in the life sciences church.</p>\n\n<p>So, there we are. This is where spreadsheets come in. If done well, they combine aspects of knowledge graphs\nwith usability and can even cover a good bit of the learnability. This is what is described in this new\npaper about templates in the <a href=\"https://www.nanosafetycluster.eu/\">EU NanoSafety Cluster</a>: <em>A template wizard\nfor the cocreation of machine-readable data-reporting to harmonize the evaluation of (nano)materials</em>\n(doi:<a href=\"https://doi.org/10.1038/s41596-024-00993-1\">10.1038/s41596-024-00993-1</a>).</p>\n\n<p>The learnability comes in with the spreadsheet templates (“this is how we did it”) and a “wizard” around\nit guides the user with the selection of a template but also can provide feedback on the template. The\ntechnical term for that is “validator”, but it can be tought of as a spelling checker. 
Computers are good at\nfinding contradictions (the lack of a pattern), though less good at ranking the alternatives (which is\nthe cause of hallucinations in AI approaches).</p>\n\n<p>And to return to the RDF, software like AMBIT can read these templates, use the semantics linked to the\ntemplate, and make the FAIR static spreadsheets (good for archiving on Zenodo!) available as FAIR interactive\ndata (good for exploration and machine learning), and as RDF (good for data integration).</p>\n\n<p>Congrats to <a href=\"http://orcid.org/0000-0002-4322-6179\">Nina</a> and the various EU NanoSafety Cluster projects!</p>",
      "summary": "I was about to call this blog post From spreadsheets to RDF, after the post last week. But then I decided to just use the pattern I typically use. Why I wanted to use that shorter term in the first place was that one of the thing I like about the AMBIT software (of OpenTox and eNanoMapper fame) is its RDF support (see doi:10.1186/1756-0500-4-487). But RDF, ontologies, those are hard things. And unlike mathematics, we do not have simple objects like integer numbers or simple operators. Well, I think we do, and we talk about them. But there is no obligatory education. Just like any biologist needs to know what 1 + 2 means, I think any biologist needs basic knowledge about how knowledge graphs work. But sometimes feels like a taboo, like cursing in the life sciences church.",
      
      "date_published": "2024-05-27T00:00:00+00:00",
      "date_modified": "2024-05-27T00:00:00+00:00",
      "tags": ["rdf","opentox","fair"],
      "_references": [{ "url": "https://doi.org/10.1186/1756-0500-4-487" },{ "url": "https://doi.org/10.1038/S41596-024-00993-1" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/jdj8r-h6187",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/05/20/from-papers-to-rdf.html",
      "title": "New paper: From papers to RDF-based integration of physicochemical data and adverse outcome pathways for nanomaterials",
      "content_html": "<p>Making something FAIR is hard, particularly when you do more than making something findable. We’ve seen before that\nmaking something usefully findable <a href=\"https://chem-bla-ics.blogspot.com/2020/10/new-paper-semi-automated-workflow-for.html?q=serena\">requires deep indexing</a>,\nand already that continues to be difficult, because we are not seeing it enough.\nSo, when I thought convert a <a href=\"https://chem-bla-ics.blogspot.com/2021/05/new-strategy-towards-generation-of.html\">paper led by Hoet’s lab in Leuven</a>\ninto machine-actionable RDF to make it FAIR, I gravely underestimated the amount of work.\n<a href=\"https://scholia.toolforge.org/author/Q99306396\">Jeaphianne</a> et al. did an awesome job on this work\n(doi:<a href=\"https://doi.org/10.1186/s13321-024-00833-0\">10.1186/s13321-024-00833-0</a>).</p>\n\n<p>The idea was simple: write up which nanomaterial (type) activates which molecular initiating event.\nIt would simply annotate each material with a unique identifier to link it to databases like\n<a href=\"https://enanomapper.adma.ai/\">eNanoMapper</a> and <a href=\"https://doi.org/10.3389/fphy.2023.1271842\">NanoCommons</a>\nand it would use unique identifiers for the\n<a href=\"https://chem-bla-ics.blogspot.com/2022/05/new-providing-adverse-outcome-pathways.html\">Adverse Outcome Pathway</a>) (AOP) key events.\nAs such, it would make a direct link in the growing linked open data cloud between the AOPs\nand the nanomaterial databases.</p>\n\n<p>Unfortunately, it was quickly discovered that actually reusing this new datasets requires rich annotation (metadata!)\nof the materials and the materials from the source paper were not yet in material databases.\nAnd then the cumbersome start was started, resulting in a very rich data model describing the\nkey events, the materials, the assays used, and the original papers themselves:</p>\n\n<p><img src=\"/assets/images/13321_2024_833_Fig1_HTML.png\" alt=\"\" /></p>\n\n<p>But 
the work has not finished yet. The paper assigned <a href=\"https://chem-bla-ics.blogspot.com/2022/09/nanomaterial-identifiers-erm-identifier.html\">ERM identifiers</a>\nto all included materials, and now these need to be added to the new <a href=\"https://nanocommons.github.io/erm-database/\">ERM Identifier Database</a>\nunder development.</p>",
      "summary": "Making something FAIR is hard, particularly when you do more than making something findable. We’ve seen before that making something usefully findable requires deep indexing, and already that continues to be difficult, because we are not seeing it enough. So, when I thought convert a paper led by Hoet’s lab in Leuven into machine-actionable RDF to make it FAIR, I gravely underestimated the amount of work. Jeaphianne et al. did an awesome job on this work (doi:10.1186/s13321-024-00833-0).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/13321_2024_833_Fig1_HTML.png",
      "date_published": "2024-05-20T00:00:00+00:00",
      "date_modified": "2024-05-20T00:00:00+00:00",
      "tags": ["fair","rdf","erm"],
      "_references": [{ "url": "https://doi.org/10.1186/S13321-024-00833-0" },{ "url": "https://doi.org/10.14573/ALTEX.2102191" },{ "url": "https://doi.org/10.3390/NANO10102068" },{ "url": "https://doi.org/10.1186/S13321-022-00614-7" },{ "url": "https://doi.org/10.3389/FPHY.2023.1271842" },{ "url": "https://doi.org/10.3762/BJNANO.6.165" },{ "url": "https://doi.org/10.1089/AIVT.2021.0010" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s1hwk-vj154",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/05/18/cdk2024-2.html",
      "title": "cdk2024 #2: publishing grant proposals",
      "content_html": "<p>Publishing grant proposal is still not very common. The proposal published in Research Ideas and Outcomes)\n(doi:<a href=\"https://doi.org/10.3897/rio.10.e124884\">10.3897/rio.10.e124884</a>) for the\n<a href=\"/2024/04/07/cdk2024.html\">NWO Open Science grant for the CDK</a> is, however, not the first and hopefully not the last.\nInterestingly, it is already cited in (the German) Wikipedia. It is used <a href=\"https://de.wikipedia.org/wiki/Chemistry_Development_Kit\">there</a>\nto support a statement which tools use the Chemistry Development Kit.</p>",
      "summary": "Publishing grant proposal is still not very common. The proposal published in Research Ideas and Outcomes) (doi:10.3897/rio.10.e124884) for the NWO Open Science grant for the CDK is, however, not the first and hopefully not the last. Interestingly, it is already cited in (the German) Wikipedia. It is used there to support a statement which tools use the Chemistry Development Kit.",
      
      "date_published": "2024-05-18T00:00:00+00:00",
      "date_modified": "2024-05-18T00:00:00+00:00",
      "tags": ["cdk","grant","cdk2024"],
      "_references": [{ "url": "https://doi.org/10.3897/RIO.10.E124884" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ytkmr-0vv92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/04/07/cdk2024.html",
      "title": "cdk2024 #1: NWO Open Science grant for the Chemistry Development Kit",
      "content_html": "<p>We recently got awarded our <a href=\"https://chem-bla-ics.linkedchemistry.info/2022/03/05/bridgedb-nwo-grant-update-1-first-steps.html\">second <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nNWO Open Science grant (<a href=\"https://www.nwo.nl/en/projects/osf232097\">OSF23.2.097</a>),\nthis time for the <a href=\"https://cdk.github.io/\">Chemistry Development Kit</a> (CDK).\n“We” here is me and <a href=\"https://orcid.org/0000-0003-0896-0906\">Alyanne de Haan</a>, René van der Ploeg, and\n<a href=\"https://orcid.org/0000-0002-3496-6669\">Marc Teunis</a> from Hogeschool Utrecht.\nThe proposal has been submitted for public dissemination in <a href=\"https://riojournal.com/\">RIO Journal</a>, like\n<a href=\"http://localhost:4000/2022/04/17/bridgedb-nwo-grant-update-2-building-up.html\">we did <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwith the first NWO Open Science grant.</p>\n\n<p>The project formally started on April 1 but we had our kick-off meeting in Maastricht on April 4-5.\nWe were joined by Javier and on the second day by Marvin, and Ozan from our <a href=\"https://www.maastrichtuniversity.nl/research/bioinformatics\">BiGCaT research group</a>\nin Maastricht. 
During this hackathon, I gave a (repeat) <a href=\"https://zenodo.org/records/6414204\">presentation</a>\nabout the history of the CDK which also included the problem that software using the CDK does not\nalways use the most recent version.</p>\n\n<p>And that, upgrading tools using the CDK with the latest CDK version, is the main topic of this grant (work package 2, WP2).\nThe full proposal has the focus list of tools, but most of it is also listed in\n<a href=\"https://github.com/cdk/nwo-openscience-2024/issues\">the issue tracker</a> we have set up as project\nmanagement tool on GitHub.</p>\n\n<p>Second, we actually hacked together on two first tools, one on our focus list, and another that we were\n<a href=\"https://github.com/cdk/nwo-openscience-2024/issues/22\">requested to have a look at too</a>: SMARTCyp.\nThe latest version uses <a href=\"https://www.rdkit.org/\">RDKit</a> (doi:<a href=\"https://doi.org/10.1093/bioinformatics/btz037\">10.1093/bioinformatics/btz037</a>),\nbut the original version uses the CDK (doi:<a href=\"https://doi.org/10.1021/ml100016x\">10.1021/ml100016x</a>).</p>\n\n<p>We downloaded the source code of SMARTCyp 2.4.2, started taking <a href=\"https://github.com/cdk/nwo-openscience-2024/blob/main/monitoring/smartcyp.md\">notes</a>,\nJavier <a href=\"https://github.com/cdk/smartcyp\">started</a> a Maven build environment, and we updated a lot of code. We seem quite close to a version that can be tested by\npeople that have integrated SMARTCyp in other tools. This is based on <a href=\"https://github.com/cdk/cdk/releases/tag/cdk-2.9\">CDK 2.9</a>\nand if you ignore the 2D depiction glitch, it looks like it was a nice first choice:</p>\n\n<p><img src=\"/assets/images/smartcyp.png\" alt=\"\" /></p>\n\n<p>On a final note, we plan to record carefully our steps, in an open notebook science approach, with\nthe intention to extract general upgrade steps. 
For example, we will update the\n<a href=\"https://egonw.github.io/cdkbook/migration.html\">Migration</a> section of the\n<a href=\"https://egonw.github.io/cdkbook/\">Groovy Cheminformatics with the Chemistry Development Kit</a>.</p>",
      "summary": "We recently got awarded our second NWO Open Science grant (OSF23.2.097), this time for the Chemistry Development Kit (CDK). “We” here is me and Alyanne de Haan, René van der Ploeg, and Marc Teunis from Hogeschool Utrecht. The proposal has been submitted for public dissemination in RIO Journal, like we did with the first NWO Open Science grant.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/smartcyp.png",
      "date_published": "2024-04-07T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["grant","cdk","cdk2024"],
      "_references": [{ "url": "https://doi.org/10.1093/bioinformatics/btz037" },{ "url": "https://doi.org/10.1021/ml100016x" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/n39kz-48173",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/04/02/open-science-retreat-2.html",
      "title": "Open Science Retreat #2: CiTO Nanopublications",
      "content_html": "<p>During the <a href=\"http://chem-bla-ics.linkedchemistry.info/2024/03/31/open-science-retreat-1.html\">Open Science Retreat</a> I organized\na short session where we looking into typing citation intentions using a new nanopublication template. First, let’s describe\nnanopublications (originally used in doi:<a href=\"https://doi.org/10.3233/ISU-2010-0613\">10.3233/ISU-2010-0613</a>) a bit.\nScholia gives <a href=\"https://scholia.toolforge.org/topic/Q57814310\">a nice overview of (macro?)publications on the topic</a>.\nThe <a href=\"https://nanopub.net/\">nanopub.net</a>\nwebsite describes that <em>[a nanopublication is a small knowledge graph snippet with metadata that is treated as an\nindependent (scientific) publication.]</em>. The knowledge graph, it continues, can be anything from an opinion to the link\nbetween a disease and a gene (doi:<a href=\"https://doi.org/10.1109/ESCIENCE.2018.00024\">10.1109/ESCIENCE.2018.00024</a>).</p>\n\n<p>Now, in this post I will document an update of how we can use nanopublications for citation intention annotation, and\ncompare this to existing solutions. I have been collecting and indexing the CiTO intention annotations in Wikidata and\nvisualizing the corpus with Scholia at <a href=\"https://scholia.toolforge.org/cito/\">scholia.toolforge.org/cito/</a>. There are\ncurrently 22 journal articles with explicit CiTO annoation, largely thanks to a <a href=\"https://www.biomedcentral.com/collections/cito\">Journal of Cheminformatics pilot</a>\n(e.g. see doi:<a href=\"https://doi.org/10.1186/s13321-023-00683-2\">10.1186/s13321-023-00683-2</a>). Recently,\nthe preprint/report server <a href=\"https://biohackrxiv.org/discover\">BioHackrXiv</a> started\n<a href=\"https://github.com/biohackrxiv/publication-template\">CiTO support</a> too, also visible in the statistics\non Scholia with another 17 papers. 
A third source is data sets from bibliometric-like studies, as explained\nin <a href=\"https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html\">this post <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Nanopublications\nwould be a fourth solution.</p>\n\n<p>So, why another solution? Datasets, assuming DataCite approaches, have clear provenance, but the overhead\nof and time needed for creating a dataset with citation intent annotations can be limiting. And because nanopublications\ncan be linked to ORCID identifiers, we can even discover which citation intent annotations are created by the original\nauthors of articles. Another advantage is that nanopubs are basically RDF and we can query them easily, allowing\nthe citation intentions to migrate to Wikidata. Scholia already saw an update to recognize nanopublications as\na unique kind of reference (see the new Wikidata property <a href=\"https://www.wikidata.org/wiki/Property:P12545\">Nanopublication identifier (P12545)</a>).</p>\n\n<h1 id=\"nanodash-template\">NanoDash template</h1>\n\n<p>So, if we can make it easy for people to define nanopublications with CiTO citation intent annotations, then we can\nstart formalizing intent annotations from a much wider range of use cases. For example, we can annotate historically\nimportant discussions. Anyone can retrospectively annotate all their own articles, making them more FAIR. 
And if we\nuse DOI links, then it no longer is limited to journal articles, but we can use it for software and data citations too.\nThis is where <a href=\"https://w3id.org/np/RAX_4tWTyjFpO6nz63s14ucuejd64t2mK3IBlkwZ7jjLo\">a recent template</a> comes in, created by\n<a href=\"https://orcid.org/0000-0002-1267-0234\">Tobias Kuhn</a>, one of the main nanopub developers:</p>\n\n<p><img src=\"/assets/images/citoPub.png\" alt=\"\" /></p>\n\n<p>This nanopublication template defines the minimal needs of the assertion, along with useful provenance and nanopub\ninfo. Basically, the assertion defines that one DOI is a ScholarlyWork and, using CiTO, defines that it cites\none or more article works (with DOI). For each citation, one can select any of the known CiTO intent types,\ne.g. ‘extends’ or ‘uses method in’, as in <a href=\"https://w3id.org/np/RA6Rxk1sSOSWxM7A6gW4SjJZRVt4fbY6nShPTAbQ8kce8\">this nanopublication</a>\ncreated with this template:</p>\n\n<p><img src=\"/assets/images/citoPub2.png\" alt=\"\" /></p>\n\n<h2 id=\"sparql-ing-cito-annotations\">SPARQL-ing CiTO annotations</h2>\n\n<p>Besides the template, Tobias also started a SPARQL query to which I added restrictions that the citing and cited\nresources need to have a DOI, giving us <a 
href=\"https://query.np.trustyuri.net/tools/type/2c1cce3f3152738c1009d59251409392aaaa3b0324bcb5fdfb4b7b944b8f0c18/yasgui.html#query=prefix+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0Aprefix+np%3A+%3Chttp%3A%2F%2Fwww.nanopub.org%2Fnschema%23%3E%0Aprefix+npa%3A+%3Chttp%3A%2F%2Fpurl.org%2Fnanopub%2Fadmin%2F%3E%0Aprefix+npx%3A+%3Chttp%3A%2F%2Fpurl.org%2Fnanopub%2Fx%2F%3E%0Aprefix+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0Aprefix+dct%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0A%0Aselect+%3Fnp+%3Flabel+%3Fsubj+%3Fcitationrel+%3Fobj+%3Fdate+where+%7B%0A++graph+npa%3Agraph+%7B%0A++++%3Fnp+npa%3AhasValidSignatureForPublicKey+%3Fpubkey+.%0A++++%3Fnp+dct%3Acreated+%3Fdate+.%0A++++%3Fnp+np%3AhasAssertion+%3Fassertion+.%0A++++optional+%7B+%3Fnp+rdfs%3Alabel+%3Flabel+.+%7D%0A++++filter+not+exists+%7B+%3Fnpx+npx%3Ainvalidates+%3Fnp+%3B+npa%3AhasValidSignatureForPublicKey+%3Fpubkey+.+%7D%0A++++filter+not+exists+%7B+%3Fnp+npx%3AhasNanopubType+npx%3AExampleNanopub+.+%7D%0A++%7D%0A++graph+%3Fassertion+%7B%0A++++%3Fsubj+%3Fcitationrel+%3Fobj+.%0A++++filter(regex(str(%3Fcitationrel)%2C+%22%5Ehttp%3A%2F%2Fpurl.org%2Fspar%2Fcito%2F.*%24%22))%0A++++filter(regex(str(%3Fsubj)%2C+%22doi.org%2F10%22))%0A++++filter(regex(str(%3Fobj)%2C+%22doi.org%2F10%22))%0A++%7D%0A%7D%0A++&amp;contentTypeConstruct=text%2Fturtle&amp;contentTypeSelect=application%2Fsparql-results%2Bjson&amp;endpoint=%2Frepo%2Ftype%2F2c1cce3f3152738c1009d59251409392aaaa3b0324bcb5fdfb4b7b944b8f0c18&amp;requestMethod=POST&amp;tabTitle=Query&amp;headers=%7B%7D&amp;outputFormat=table\">this query</a>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.w3.org/2000/01/rdf-schema#&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span 
class=\"w\"> </span><span class=\"nn\">np</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.nanopub.org/nschema#&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">npa</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/nanopub/admin/&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">npx</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/nanopub/x/&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">xsd</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://www.w3.org/2001/XMLSchema#&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">dct</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/dc/terms/&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">select</span><span class=\"w\"> </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nv\">?label</span><span class=\"w\"> </span><span class=\"nv\">?subj</span><span class=\"w\"> </span><span class=\"nv\">?citationrel</span><span class=\"w\"> </span><span class=\"nv\">?obj</span><span class=\"w\"> </span><span class=\"nv\">?date</span><span class=\"w\"> </span><span class=\"k\">where</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"k\">graph</span><span class=\"w\"> </span><span class=\"nn\">npa</span><span class=\"o\">:</span><span class=\"ss\">graph</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">npa</span><span class=\"o\">:</span><span 
class=\"ss\">hasValidSignatureForPublicKey</span><span class=\"w\"> </span><span class=\"nv\">?pubkey</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">dct</span><span class=\"o\">:</span><span class=\"ss\">created</span><span class=\"w\"> </span><span class=\"nv\">?date</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">np</span><span class=\"o\">:</span><span class=\"ss\">hasAssertion</span><span class=\"w\"> </span><span class=\"nv\">?assertion</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"k\">optional</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?label</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"w\"> </span><span class=\"k\">not</span><span class=\"w\"> </span><span class=\"k\">exists</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?npx</span><span class=\"w\"> </span><span class=\"nn\">npx</span><span class=\"o\">:</span><span class=\"ss\">invalidates</span><span class=\"w\"> </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"nn\">npa</span><span class=\"o\">:</span><span class=\"ss\">hasValidSignatureForPublicKey</span><span class=\"w\"> </span><span class=\"nv\">?pubkey</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span 
class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"w\"> </span><span class=\"k\">not</span><span class=\"w\"> </span><span class=\"k\">exists</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?np</span><span class=\"w\"> </span><span class=\"nn\">npx</span><span class=\"o\">:</span><span class=\"ss\">hasNanopubType</span><span class=\"w\"> </span><span class=\"nn\">npx</span><span class=\"o\">:</span><span class=\"ss\">ExampleNanopub</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">graph</span><span class=\"w\"> </span><span class=\"nv\">?assertion</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nv\">?subj</span><span class=\"w\"> </span><span class=\"nv\">?citationrel</span><span class=\"w\"> </span><span class=\"nv\">?obj</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"p\">(</span><span class=\"nb\">regex</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?citationrel</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"^http://purl.org/spar/cito/.*$\"</span><span class=\"p\">))</span><span class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"p\">(</span><span class=\"nb\">regex</span><span class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?subj</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"doi.org/10\"</span><span class=\"p\">))</span><span class=\"w\">\n    </span><span class=\"k\">filter</span><span class=\"p\">(</span><span class=\"nb\">regex</span><span class=\"p\">(</span><span class=\"nb\">str</span><span 
class=\"p\">(</span><span class=\"nv\">?obj</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"doi.org/10\"</span><span class=\"p\">))</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>This includes 6 citation intentions defined by 4 nanopublications added during the Open Science Retreat:</p>\n\n<ul>\n  <li><a href=\"https://w3id.org/np/RAUjZE1JMu1GAvUQ_fZ4yc9-7sOSCT9xbeS0wYznkKtYk\">RAUjZE1JMu</a> by <a href=\"https://nanodash.knowledgepixels.com/explore?id=https%3A%2F%2Forcid.org%2F0000-0002-7192-1486\">me</a> for a paper by Marija Purgar</li>\n  <li><a href=\"https://nanodash.knowledgepixels.com/explore?id=RAXgI--5gcKskgrnOI1XZoA4b3hu9RbNj3bcc2Zxeos7c\">RAXgI–5gc</a> by <a href=\"https://nanodash.knowledgepixels.com/explore?id=https%3A%2F%2Forcid.org%2F0000-0003-2408-7588\">Christian Meesters</a></li>\n  <li><a href=\"https://nanodash.knowledgepixels.com/explore?id=RATZNhd3l_jN0y8GEi8mLIqy-uVV8tiUZIq2RJtkq6G8A\">RATZNhd3l_j</a> by <a href=\"https://nanodash.knowledgepixels.com/explore?id=https%3A%2F%2Forcid.org%2F0000-0003-4285-690X\">Taichi Oichi</a></li>\n  <li><a href=\"https://nanodash.knowledgepixels.com/explore?id=RA6Q6wxSYyWfA3XwpOBqSNFKgQpM7ZgdVBoU2kSD-CFjw\">RA6Q6wxSYy</a> by <a href=\"https://nanodash.knowledgepixels.com/explore?id=https%3A%2F%2Forcid.org%2F0000-0003-1559-1838\">Niklas Hohmann</a></li>\n</ul>\n\n<h1 id=\"from-nanopublications-to-wikidata\">From nanopublications to Wikidata</h1>\n\n<p>Now, this query also provides me with enough information to propagate the citation intent (a fact?) to Wikidata\nand cite the original nanopublication as reference. With a variation of the above SPARQL query, I can get the\nfive most recent new nanopublications, convert them to QuickStatements, and then enjoy them in Wikidata. 
This\nis written up in <a href=\"https://github.com/egonw/ons-wikidata/blob/main/Nanopubs/createQS.groovy\">this Bacting script</a>.</p>\n\n<p>The script needs to handle some situations. For example, it will not add items for DOIs not already in Wikidata.\nSo, if neither of the two DOIs is known in Wikidata, then nothing gets added. If they both are, then it will\nadd the citation intent. There are alternative solutions, but in practice that doesn’t matter: the QuickStatements\noutput is the same in all situations, and QuickStatements will only add the new information.</p>\n\n<p>This is what it will <a href=\"https://www.wikidata.org/wiki/Q113312162#P2860\">look like in Wikidata</a>:</p>\n\n<p><img src=\"/assets/images/citoPub3.png\" alt=\"\" /></p>\n\n<p>And this is <a href=\"https://scholia.toolforge.org/cito/#articles\">what it looks like</a> (yellow) when we compare the contributions\nfrom nanopublications now with the other sources:</p>\n\n<p><img src=\"/assets/images/citoPubs4.png\" alt=\"\" /></p>",
      "summary": "During the Open Science Retreat I organized a short session where we looked into typing citation intentions using a new nanopublication template. First, let’s describe nanopublications (originally used in doi:10.3233/ISU-2010-0613) a bit. Scholia gives a nice overview of (macro?)publications on the topic. The nanopub.net website describes that [a nanopublication is a small knowledge graph snippet with metadata that is treated as an independent (scientific) publication.]. The knowledge graph, it continues, can be anything from an opinion to the link between a disease and a gene (doi:10.1109/ESCIENCE.2018.00024).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/citoPub.png",
      "date_published": "2024-04-02T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["osr24nl","openscience","cito","nanopub","wikidata"],
      "_references": [{ "url": "https://doi.org/10.3233/ISU-2010-0613" },{ "url": "https://doi.org/10.1109/ESCIENCE.2018.00024" },{ "url": "https://doi.org/10.1186/S13321-023-00683-2" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/znw1y-zfg25",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/03/31/open-science-retreat-1.html",
      "title": "Open Science Retreat #1: impressions",
      "content_html": "<p>Last week I attended the <a href=\"https://openscienceretreat.eu/\">Open Science Retreat</a> (<a href=\"https://hashtags-hub.toolforge.org/osr24nl\">#osr24nl</a>)\nin a quite and relaxing region in North-Holland. The meeting was how I like all meetings to be (and I count myself lucky many of my meetings\nare like this): open, welcoming, constructive, diverse, and intellectually challenging. Not all scientific meetings are like this\nand it is easy to end up going to obligatory meetings where the discussions are of a different level. Therefore, great thanks to\nthe organizers, but also to all participants, that showed not just to have a hearth for open science (getting pretty common),\nbut also a drive to advocate for open science. Finally, I like to thank the people that joined me in creating nanopublications for\nCiTO annotations (will blog about that later), and <a href=\"https://twitter.com/marija_purgar/status/1773745895508451573\">to Sadik and Marija</a>\nwith whom we worked on exploring using Wikibase for capturing knowledge about research waste in ecology (more about that later too).</p>",
      "summary": "Last week I attended the Open Science Retreat (#osr24nl) in a quite and relaxing region in North-Holland. The meeting was how I like all meetings to be (and I count myself lucky many of my meetings are like this): open, welcoming, constructive, diverse, and intellectually challenging. Not all scientific meetings are like this and it is easy to end up going to obligatory meetings where the discussions are of a different level. Therefore, great thanks to the organizers, but also to all participants, that showed not just to have a hearth for open science (getting pretty common), but also a drive to advocate for open science. Finally, I like to thank the people that joined me in creating nanopublications for CiTO annotations (will blog about that later), and to Sadik and Marija with whom we worked on exploring using Wikibase for capturing knowledge about research waste in ecology (more about that later too).",
      
      "date_published": "2024-03-31T00:00:00+00:00",
      "date_modified": "2024-03-31T00:00:00+00:00",
      "tags": ["osr24nl","openscience","wikibase","cito","nanopub"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zds99-03s42",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/03/17/two-papers.html",
      "title": "Reusing data: two new papers",
      "content_html": "<p>My research is about the interaction of (machine) representation and the impact on the success of\ndata analysis (matchine learning, chemometrics, AI, etc). See the posts\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">about</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/12/molecular-chemometrics-principles-2-be.html\">molecular</a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/14/molecular-chemometrics-principles-3.html\">chemometrics</a>.\nThis got me into <a href=\"https://chem-bla-ics.linkedchemistry.info/tag/fair\">FAIR</a>: making data interoperable\nand being able to (really) reuse data is the starting point of doing research.</p>\n\n<p>So, when I get the chance to see something where I worked on to make more FAIR actually being used,\nI love to push the boundaries of FAIR a bit extra. The study of representation of molecules and molecular\nsystems is not quite a popular science, but I find it important. Two new papers got recently published\nto which I contributed from this perspective.</p>\n\n<p>The first paper by Anna Niarakis <i>et al.</i> is about using the SARS-CoV-2/COVID-19 knowledge base we\nhave collected of the past 4 years (doi:<a href=\"https://doi.org/10.3389/fimmu.2023.1282859\">10.3389/fimmu.2023.1282859</a>).\nFor me, this started with a WikiPathways with early knowledge about the virus proteins. I think\nin this and earlier papers, we improved our open science and bioinformatics and are actually\nmore ready for a next pandemic, which inevitably will come.</p>\n\n<p>The second paper by Alfaro Serrano <i>et al.</i> is about how access to data remains key to many\nthings, and this, obviously, includes the Sustainable Development Goals (SDGs)\n(doi:<a href=\"https://doi.org/10.1039/D3SU00148B\">10.1039/D3SU00148B</a>). 
When it comes down\nto the face/off of FAIR versus Open, I think Open has more impact, hands-down.</p>\n\n<p>About the latter, I recently wrote up ten simple actions you can take to make your\nnanosafety research output more FAIR (doi:<a href=\"https://doi.org/10.5281/zenodo.10533126\">10.5281/zenodo.10533126</a>).</p>",
      "summary": "My research is about the interaction of (machine) representation and the impact on the success of data analysis (matchine learning, chemometrics, AI, etc). See the posts about molecular chemometrics. This got me into FAIR: making data interoperable and being able to (really) reuse data is the starting point of doing research.",
      
      "date_published": "2024-03-17T00:00:00+00:00",
      "date_modified": "2024-03-17T00:00:00+00:00",
      "tags": ["covid19","fair","nanosafety","nanocommons"],
      "_references": [{ "url": "https://doi.org/10.3389/FIMMU.2023.1282859" },{ "url": "https://doi.org/10.1039/D3SU00148B" },{ "url": "https://doi.org/10.5281/ZENODO.10533126" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/57rv7-5m756",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/02/13/wikidata-subsetting.html",
      "title": "New paper: &quot;Wikidata subsetting: approaches, tools, and evaluation&quot;",
      "content_html": "<p>Just before the end of the year, the <em>Wikidata subsetting: approaches, tools, and evaluation</em> paper\nby Seyed Amir Hosseini Beghaeiraveri <em>et al.</em> got published (doi:<a href=\"https://doi.org/10.3233/SW-233491\">10.3233/SW-233491</a>).\nI am really excited our group (i.e.\n<a href=\"https://orcid.org/0000-0002-8399-8990\">Ammar</a> and <a href=\"https://orcid.org/0000-0001-8449-1318\">Denise</a>)\nhas been able to contribute to this. I think it also is a great example\nof the power of hackathons to bring together people.</p>\n\n<p>To me, subsetting of Wikidata (or any large knowledge graph) is important for a couple of reasons.\nFirst, there can be practical reasons. Scholia, for example, is computationally expensive, and the idea\nwe explore in the Alfred P. Sloan Foundation grant for Scholia (doi:<a href=\"https://doi.org/10.3897/rio.5.e35820\">10.3897/rio.5.e35820</a>)\nwas that a subset of Wikidata would make it more performant and potentially\nmore environmental-friendly.</p>\n\n<p>A second reason is more about the scientific process. When doing an analysis and when you want to make\nthe reasoning transparent, you want to share the analyzed data as part of the research output (basically, the “data”).\nFor example, the data may have undergone some curation, or you combined data from two or more different\nsources. And you will want to share this as part of the scientific process. Resharing a full dump\nof the larger knowledge base would not be practical for at least two reasons: duplication of huge data,\nand a lot of unrelated content makes it hard for peers to find the bits of interest to the study.</p>\n\n<p>Subsetting may be useful here. This paper evaluates a number of different subsetting approaches.\nMyself, I am particularly excited about the idea that we can take a shape expression (e.g. <a href=\"https://shex.io\">ShEx</a>)\nas input. 
I still love the idea that I take the SPARQL queries in my analyses, convert them into\nshapes automatically, and then get a subset that returns the exact same results as the query would\non the full dataset.</p>",
      "summary": "Just before the end of the year, the Wikidata subsetting: approaches, tools, and evaluation paper by Seyed Amir Hosseini Beghaeiraveri et al. got published (doi:10.3233/SW-233491). I am really excited our group (i.e. Ammar and Denise) has been able to contribute to this. I think it also is a great example of the power of hackathons to bring together people.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wikidata_subsetting_features.png",
      "date_published": "2024-02-13T00:00:00+00:00",
      "date_modified": "2024-02-13T00:00:00+00:00",
      "tags": ["wikidata","scholia"],
      "_references": [{ "url": "https://doi.org/10.3233/SW-233491" },{ "url": "https://doi.org/10.3897/RIO.5.E35820" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xcvg3-37491",
      "url": "https://chem-bla-ics.linkedchemistry.info/2024/01/07/phd-defences.html",
      "title": "PhD Defences: Andra Waagmeester and Marvin Martens",
      "content_html": "<p>2023 has been a long year in which a lot happens. Two EU projects ended (<a href=\"https://riskgone.eu/\">RiskGONE</a>\nand <a href=\"https://nanosolveit.eu/\">NanoSolveIT</a>; more about that in a\nlater post), our group leader <a href=\"https://scholia.toolforge.org/author/Q19845641\">Chris Evelo</a> will retire this year,\nthe <a href=\"https://elixir-europe.org/communities/toxicology\">ELIXIR Toxicology Community</a> started (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2023/06/11/community-activity-2-fairsharing.html\">this post</a>), the\n<a href=\"https://www.wikipathways.org/\">new WikiPathways website</a> launched (see <a href=\"/2023/11/11/wikipathways-nar.html\">this post</a>),\nand a lot, lot more.</p>\n\n<p>But this post is about the upcoming PhD defences of <a href=\"https://scholia.toolforge.org/author/Q19845625\">Andra Waagmeester</a>\nand <a href=\"https://scholia.toolforge.org/author/Q42369611\">Marvin Martens</a>:</p>\n\n<ul>\n  <li><a href=\"https://www.maastrichtuniversity.nl/events/phd-defence-andra-sachinder-waagmeester\">January 16, 16:00</a>: Andra Waagmeester\non “Biological Pathway Abstractions: From Two-Dimensional Drawings to Multidimensional Linked Data”</li>\n  <li><a href=\"https://www.maastrichtuniversity.nl/events/phd-defence-marvin-tlj-martens\">January 29, 16:00</a>: Marvin Martens\non “Adverse Outcome Pathways Coming to Life Exploring New Ways to Support Risk Assessments”</li>\n</ul>\n\n<p>Both meetings have a minisymposium in the morning, related to their thesis topics. I am very much looking forward\nto these meetings. It’s hard to summarize in a few words what they contributed to open science in general and to\nthe data sciences in biology. So, I rather invite you to join the afternoon PhD defences. 
I think the PhD theses\nwill become freely avialable after the defence, but you can always check the literature lists on their \nabove linked Scholia pages.</p>\n\n<p>Or ask them questions on Mastodon: <a href=\"https://social.edu.nl/@Marvin\">@Marvin</a> and <a href=\"https://genomic.social/@Andrawaag\">@Andrawaag</a>.</p>",
      "summary": "2023 has been a long year in which a lot happens. Two EU projects ended (RiskGONE and NanoSolveIT; more about that in a later post), our group leader Chris Evelo will retire this year, the ELIXIR Toxicology Community started (see this post), the new WikiPathways website launched (see this post), and a lot, lot more.",
      
      "date_published": "2024-01-07T00:00:00+00:00",
      "date_modified": "2024-01-07T00:00:00+00:00",
      "tags": ["bigcat"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/8pkga-q4n03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/11/11/wikipathways-nar.html",
      "title": "New paper: &quot;WikiPathways 2024: next generation pathway database&quot;",
      "content_html": "<p>This week the next <a href=\"https://wikipathways.org/\">WikiPathways</a> <a href=\"https://academic.oup.com/nar/search-results?f_TocHeadingTitle=Database+Issue\">NAR Database</a>\nissue paper was published (doi:<a href=\"https://doi.org/10.1093/nar/gkad960\">10.1093/nar/gkad960</a>). It is the next\npaper in a series of papers about the evolution of the Open Science project for\nmaking biological pathways available in a Open and FAIR way. This year, it described\nthat significant move away from <a href=\"https://en.wikipedia.org/wiki/MediaWiki\">MediaWiki</a>.\nIt simply was too costly to keep up with the upstream code base (think: more than 200\nthousand euro costly). This paper describes a transition to a modular system with\n<a href=\"https://en.wikipedia.org/wiki/Jekyll_(software)\">Jekyll</a> and Markdown as\nnew platform technologies. The full details are available as open notebook science:\neverything is basically a git repository.</p>\n\n<p>The is the workflow of what the new platform does when a new pathway (version) gets\nadded to WikiPathways:</p>\n\n<p><img src=\"/assets/images/wp-gpml-change-workflow.png\" alt=\"Workflow that is triggered by an added or changed GPML file, eventually triggering an update of the website.\" /></p>\n\n<p>The upgrade of the whole stack is, however, in full swing. Not everything has\nmigrated yet and the RDF generation is not for example.</p>",
      "summary": "This week the next WikiPathways NAR Database issue paper was published (doi:10.1093/nar/gkad960). It is the next paper in a series of papers about the evolution of the Open Science project for making biological pathways available in a Open and FAIR way. This year, it described that significant move away from MediaWiki. It simply was too costly to keep up with the upstream code base (think: more than 200 thousand euro costly). This paper describes a transition to a modular system with Jekyll and Markdown as new platform technologies. The full details are available as open notebook science: everything is basically a git repository.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/wp-gpml-change-workflow.png",
      "date_published": "2023-11-11T00:00:00+00:00",
      "date_modified": "2023-11-11T00:00:00+00:00",
      "tags": ["wikipathways","git"],
      "_references": [{ "url": "https://doi.org/10.1093/NAR/GKAD960" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dtyms-yt012",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/09/24/ai.html",
      "title": "Artificial intelligence for natural product drug discovery",
      "content_html": "<p>Two weeks ago the write up of a week-long scientific discussions around artificial intelligence for natural product drug discovery\nin Leiden at the <a href=\"https://www.lorentzcenter.nl/\">Lorentz Center</a> got published\n(doi:<a href=\"https://doi.org/10.1038/s41573-023-00774-7\">10.1038/s41573-023-00774-7</a>, <a href=\"https://cris.maastrichtuniversity.nl/en/publications/artificial-intelligence-for-natural-product-drug-discovery\">free PDF</a>).</p>\n\n<p><img src=\"/assets/images/ai.png\" alt=\"Part of the copyrighted Figure 1 from the article. I hope this counts as fair use.\" /></p>\n\n<p>Sadly, the meetings was still during the (partial) lockdown, and I think my contribution could have been\nmore extensive. But I am happy I got to pitch the idea of using Wikidata in this area too, taking advantage\nof the work done by the LOTUS (doi:<a href=\"https://doi.org/10.7554/eLife.70780\">10.7554/eLife.70780</a>) team earlier.</p>\n\n<p>And this is key to me: you cannot do statistics, chemometrics, machine learning, or artificial\nintelligence without good quality linked data. Happy reading!</p>",
      "summary": "Two weeks ago the write up of a week-long scientific discussions around artificial intelligence for natural product drug discovery in Leiden at the Lorentz Center got published (doi:10.1038/s41573-023-00774-7, free PDF).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/ai.png",
      "date_published": "2023-09-24T00:00:00+00:00",
      "date_modified": "2024-03-18T00:00:00+00:00",
      "tags": ["cheminf","natprod"],
      "_references": [{ "url": "https://doi.org/10.1038/s41573-023-00774-7" },{ "url": "https://doi.org/10.7554/eLife.70780" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7zf38-w9670",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/09/17/using-fair-for-reuse.html",
      "title": "Using FAIR to select data for reuse",
      "content_html": "<p>This paper got published in July already, but I had not had the time yet to blog about this exciting work by\n<a href=\"https://scholia.toolforge.org/author/Q92131000\">Irini Furxhi</a> and <a href=\"https://scholia.toolforge.org/author/Q86442640\">Ammar Ammar</a>:\n<em>A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory\nquantitative structure activity relationships (QSAR) modeling targeting cellular viability</em>\n(doi:<a href=\"https://doi.org/10.1016/j.impact.2023.100475\">10.1016/j.impact.2023.100475</a>)</p>\n\n<p>The study has two sides to it: first, it looks into how far we are with <a href=\"https://en.wikipedia.org/wiki/Quantitative_structure%E2%80%93activity_relationship\">QSAR</a>\nin the field of nanosafety. We have limited data, but this paper got together 34 data sets, and in the model building\nmany different possible factors are explored. Now, as a scholar, I would really want to know which factors are\nreally important. We have been studying this for some time, e.g. in the past RRegrs paper\n(doi:<a href=\"https://doi.org/10.1186/S13321-015-0094-2\">10.1186/S13321-015-0094-2</a>). Basically, I think we still\ndon’t really understand the relation between the data characteristics and the modelling options. When is\ndata rich enough to move from classification to regression? How much (many) exerimental data do we need,\nfor the model to capture a certain applicability domain sufficiently?</p>\n\n<p>Actually, I think the rise of deep learning approaches shows us a few things: more data actually does help.\nBut also, with enough data, the representation becomes less important for the overall pattern. There are\neven hints that deep learning needs a certain level of noise. Did anyone study that phenomenon yet?</p>\n\n<p>Now, the reader of this paper will not be disappointed. The design is complex and there are many small hints\nabout what worked and what did not. 
But this gets us to the other side of this story.</p>\n\n<p>The second side of this paper is the question whether the level of FAIR-ness helps this QSAR modelling.\nEarlier, Ammar studied the R1.3 aspects of nanosafety research. The R1.3 guiding principle expects that\n<a href=\"https://www.go-fair.org/fair-principles/r1-3-metadata-meet-domain-relevant-community-standards/\">(Meta)data meet domain-relevant community standards</a>.\nAmmar’s research (preprint doi:<a href=\"https://doi.org/10.26434/CHEMRXIV-2022-L8VK8-V2\">10.26434/CHEMRXIV-2022-L8VK8-V2</a>)\nshows we can link this to actual reuse, where QSAR is one of those use cases.\nIn their July paper, they show how we can integrate the use of the community standards\nin a reproducible way to support nanosafety research.</p>\n\n<p>The following screenshot from the article (Figure 2, CC-BY) shows the relation between R1.3 maturity\nindicators and QSAR variables:</p>\n\n<p><img src=\"/assets/images/qsar-maturity-indicators.jpg\" alt=\"\" /></p>\n\n<p>I think Furxhi and Ammar may actually have introduced a new community standard: this is how nanoQSAR\nresearch should be done from now on. Irini and Ammar, thanks for this great collaboration!</p>",
      "summary": "This paper got published in July already, but I had not had the time yet to blog about this exciting work by Irini Furxhi and Ammar Ammar: A data reusability assessment in the nanosafety domain based on the NSDRA framework followed by an exploratory quantitative structure activity relationships (QSAR) modeling targeting cellular viability (doi:10.1016/j.impact.2023.100475)",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/qsar-maturity-indicators.jpg",
      "date_published": "2023-09-17T00:00:00+00:00",
      "date_modified": "2023-09-17T00:00:00+00:00",
      "tags": ["fair","qsar"],
      "_references": [{ "url": "https://doi.org/10.1016/J.IMPACT.2023.100475" },{ "url": "https://doi.org/10.1186/S13321-015-0094-2" },{ "url": "https://doi.org/10.26434/CHEMRXIV-2022-L8VK8-V2" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pn744-knt64",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/09/09/making-bridgedb-derby-files-with-groovy.html",
      "title": "Making BridgeDb Derby files with Groovy",
      "content_html": "<p>I just want to drop this here. There are various ways to make <a href=\"https://www.bridgedb.org/\">BridgeDb</a> identifier mapping files. Some of the tools\npredate my joining the BiGCaT research group and the BridgeDb project, but this Groovy page is basically what we\nhave been using to create the metabolite identifier mapping databases:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nd\">@Grab</span><span class=\"o\">(</span><span class=\"n\">group</span><span class=\"o\">=</span><span class=\"s1\">'org.bridgedb'</span><span class=\"o\">,</span> <span class=\"n\">module</span><span class=\"o\">=</span><span class=\"s1\">'org.bridgedb.bio'</span><span class=\"o\">,</span> <span class=\"n\">version</span><span class=\"o\">=</span><span class=\"s1\">'3.0.23'</span><span class=\"o\">)</span>\n<span class=\"nd\">@Grab</span><span class=\"o\">(</span><span class=\"n\">group</span><span class=\"o\">=</span><span class=\"s1\">'org.bridgedb'</span><span class=\"o\">,</span> <span class=\"n\">module</span><span class=\"o\">=</span><span class=\"s1\">'org.bridgedb.rdb.construct'</span><span class=\"o\">,</span> <span class=\"n\">version</span><span class=\"o\">=</span><span class=\"s1\">'3.0.23'</span><span class=\"o\">)</span>\n\n<span class=\"kn\">import</span> <span class=\"nn\">java.text.SimpleDateFormat</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">java.util.Date</span><span class=\"o\">;</span>\n\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.IDMapperException</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.DataSource</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.Xref</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.bio.DataSourceTxt</span><span 
class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.rdb.construct.DBConnector</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.rdb.construct.DataDerby</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.rdb.construct.GdbConstruct</span><span class=\"o\">;</span>\n<span class=\"kn\">import</span> <span class=\"nn\">org.bridgedb.rdb.construct.GdbConstructImpl4</span><span class=\"o\">;</span>\n\n<span class=\"n\">DataSourceTxt</span><span class=\"o\">.</span><span class=\"na\">init</span><span class=\"o\">()</span>\n\n<span class=\"n\">GdbConstruct</span> <span class=\"n\">database</span> <span class=\"o\">=</span> <span class=\"n\">GdbConstructImpl4</span><span class=\"o\">.</span><span class=\"na\">createInstance</span><span class=\"o\">(</span>\n  <span class=\"s2\">\"test\"</span><span class=\"o\">,</span> <span class=\"k\">new</span> <span class=\"n\">DataDerby</span><span class=\"o\">(),</span> <span class=\"n\">DBConnector</span><span class=\"o\">.</span><span class=\"na\">PROP_RECREATE</span>\n<span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">createGdbTables</span><span class=\"o\">();</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">preInsert</span><span class=\"o\">();</span>\n\n<span class=\"n\">inchikeyDS</span> <span class=\"o\">=</span> <span class=\"n\">DataSource</span><span class=\"o\">.</span><span class=\"na\">getExistingBySystemCode</span><span class=\"o\">(</span><span class=\"s2\">\"Ik\"</span><span class=\"o\">)</span>\n<span class=\"n\">lmDS</span> <span class=\"o\">=</span> <span class=\"n\">DataSource</span><span class=\"o\">.</span><span class=\"na\">getExistingBySystemCode</span><span class=\"o\">(</span><span class=\"s2\">\"Lm\"</span><span class=\"o\">)</span>\n<span class=\"n\">swisslipidsDS</span> <span 
class=\"o\">=</span> <span class=\"n\">DataSource</span><span class=\"o\">.</span><span class=\"na\">getExistingBySystemCode</span><span class=\"o\">(</span><span class=\"s2\">\"Sl\"</span><span class=\"o\">)</span>\n\n<span class=\"n\">String</span> <span class=\"n\">dateStr</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">SimpleDateFormat</span><span class=\"o\">(</span><span class=\"s2\">\"yyyyMMdd\"</span><span class=\"o\">).</span><span class=\"na\">format</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"n\">Date</span><span class=\"o\">());</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"BUILDDATE\"</span><span class=\"o\">,</span> <span class=\"n\">dateStr</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"DATASOURCENAME\"</span><span class=\"o\">,</span> <span class=\"s2\">\"LIPIDMAPS_SWISSLIPIDS\"</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"DATASOURCEVERSION\"</span><span class=\"o\">,</span> <span class=\"s2\">\"LIPID_TEST\"</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"DATATYPE\"</span><span class=\"o\">,</span> <span class=\"s2\">\"Metabolite\"</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">setInfo</span><span class=\"o\">(</span><span class=\"s2\">\"SERIES\"</span><span class=\"o\">,</span> <span class=\"s2\">\"standard_metabolite\"</span><span class=\"o\">);</span>\n\n<span class=\"n\">ref1</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span 
class=\"n\">Xref</span><span class=\"o\">(</span><span class=\"s2\">\"YECLLIMZHNYFCK-RRNJGNTNSA-J\"</span><span class=\"o\">,</span> <span class=\"n\">inchikeyDS</span><span class=\"o\">,</span> <span class=\"kc\">true</span><span class=\"o\">);</span>\n<span class=\"n\">ref2</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">Xref</span><span class=\"o\">(</span><span class=\"s2\">\"LMFA07050035\"</span><span class=\"o\">,</span> <span class=\"n\">lmDS</span><span class=\"o\">,</span> <span class=\"kc\">false</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addGene</span><span class=\"o\">(</span><span class=\"n\">ref1</span><span class=\"o\">)</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addGene</span><span class=\"o\">(</span><span class=\"n\">ref2</span><span class=\"o\">)</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addLink</span><span class=\"o\">(</span><span class=\"n\">ref1</span><span class=\"o\">,</span> <span class=\"n\">ref1</span><span class=\"o\">)</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addLink</span><span class=\"o\">(</span><span class=\"n\">ref1</span><span class=\"o\">,</span> <span class=\"n\">ref2</span><span class=\"o\">)</span>\n\n<span class=\"n\">ref3</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">Xref</span><span class=\"o\">(</span><span class=\"s2\">\"SLM:000000493\"</span><span class=\"o\">,</span> <span class=\"n\">swisslipidsDS</span><span class=\"o\">,</span> <span class=\"kc\">true</span><span class=\"o\">);</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">addGene</span><span class=\"o\">(</span><span class=\"n\">ref3</span><span class=\"o\">)</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span 
class=\"na\">addLink</span><span class=\"o\">(</span><span class=\"n\">ref1</span><span class=\"o\">,</span> <span class=\"n\">ref3</span><span class=\"o\">)</span>\n\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">commit</span><span class=\"o\">();</span>\n<span class=\"n\">database</span><span class=\"o\">.</span><span class=\"na\">finalize</span><span class=\"o\">();</span>\n</code></pre></div></div>\n\n<p>For the people who have worked with BridgeDb Java in the past, note the new SQL schema 4, as used by the\n<code class=\"language-plaintext highlighter-rouge\">GdbConstructImpl4</code>. This schema allows indicating of an identifiers is outdated/retired/etc. This is\nactually the case for the <code class=\"language-plaintext highlighter-rouge\">LMFA07050035</code> identifiers, and hence the <code class=\"language-plaintext highlighter-rouge\">false</code> parameter in the <code class=\"language-plaintext highlighter-rouge\">new Xref()</code>\ncall.</p>",
      "summary": "I just want to drop this here. There are various ways to make BridgeDb identifier mapping files. Some of the tools predate my joining the BiGCaT research group and the BridgeDb project, but this Groovy page is basically what we have been using to create the metabolite identifier mapping databases:",
      
      "date_published": "2023-09-09T00:00:00+00:00",
      "date_modified": "2023-09-09T00:00:00+00:00",
      "tags": ["groovy","bridgedb"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zqtdm-66432",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/09/09/ACSFall2023.html",
      "title": "American Chemical Society Fall 2023 meeting",
      "content_html": "<p>About four weeks ago the <a href=\"https://www.acs.org/meetings/acs-meetings/fall-2023.html\">Fall 2023 American Chemical Society</a>\nmeeting (<a href=\"https://mastodon.social/tags/ACSFall2023\">#ACSFall2023</a>).\nI have attended a few ACS meetings in person and even organized a <a href=\"https://egonw.github.io/acsrdf2010/\">symposium at the 2010 ACS meeting</a>\nin Boston. This time too, I did not participate in person, tho visiting San Francisco again would have been nice. I gave\n<a href=\"https://mastodon.social/@egonw@social.edu.nl/110882509829434765\">two</a> <a href=\"https://mastodon.social/@egonw@social.edu.nl/110883271752255923\">presentations</a>\n(slides doi:<a href=\"https://doi.org/10.5281/zenodo.8255394\">10.5281/zenodo.8255394</a>), but have not uploaded my slides of the first presentation to Zenodo yet.</p>\n\n<p>The theme of the meeting was data, and this resulted in a wealth of presentations with cheminformatics. What is striking\nhere is that a lot of work has not changed so much in 20 years, except for the scale. What I missed here was the large open\ndata sets, but generally the level of open science was heartwarming! So many preprints mentions, GitHub repositories, and Zenodo\ndeposits. 
The Blue Obelisk was truly ahead of its time, but it is a delight to see the field of chemistry catch up.\nI could now say a lot about peer review, and why the field is not benefitting from all the experience that exists in the field\nbecause people publish in the wrong journals, but that is for another time.</p>\n\n<p>I attended multiple sessions, which is a bit of a challenge when doing this remotely from Central European Summer Time (CEST).\nOf course, the Sunday started with the <a href=\"https://acs.digitellinc.com/sessions/574129/view\">Chemical informatics (R)evolution: Towards Democratization and Open Science</a>\nsession, where I had my first talk, and later that day the <a href=\"https://acs.digitellinc.com/sessions/573932/view\">Enhance your Data - Smart Ways to Metadata and Knowledge Graphs</a> session,\nwhere I gave a second talk, about <a href=\"https://bioschemas.org/\">Bioschemas</a>’ <code class=\"language-plaintext highlighter-rouge\">ChemicalSubstance</code> and <code class=\"language-plaintext highlighter-rouge\">MolecularEntity</code>. Sadly, I\nhad to leave that session early because it was getting too late.</p>\n\n<p>There were so many interesting sessions, I could not attend everything. I also have to go back to all\n<a href=\"https://mastodon.social/tags/ACSFall2023\">my notes</a> and isolate things I want to follow up on, prominently open datasets.</p>\n\n<p>More later.</p>",
      "summary": "About four weeks ago the Fall 2023 American Chemical Society meeting (#ACSFall2023). I have attended a few ACS meetings in person and even organized a symposium at the 2010 ACS meeting in Boston. This time too, I did not participate in person, tho visiting San Francisco again would have been nice. I gave two presentations (slides doi:10.5281/zenodo.8255394), but have not uploaded my slides of the first presentation to Zenodo yet.",
      
      "date_published": "2023-09-09T00:00:00+00:00",
      "date_modified": "2023-09-09T00:00:00+00:00",
      "tags": ["acs","scholia","wikidata","acsfall2023","conference"],
      "_references": [{ "url": "https://doi.org/10.5281/zenodo.8255394" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/65nqr-3w351",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/08/18/last-post-here-freebie-model-online.html",
      "title": "Last post there / the Freebie model online",
      "content_html": "<p>This is <a href=\"https://chem-bla-ics.blogspot.com/2023/08/last-post-here-freebie-model-online.html\">my last post</a> on blogger.com. At least, that is the plan. It has been a great 18 years. I like to thank the owners of\nblogger.com and Google later for providing this service. I am continuing the chem-bla-ics on a new domain:\n<a href=\"https://chem-bla-ics.linkedchemistry.info/\">https://chem-bla-ics.linkedchemistry.info/</a></p>\n\n<p>I, like so many others, struggle with choosing open infrastructure versus the freebie model. Of course, we know these things come\nand go. Google Reader, FriendFeed, Twitter/X (see doi:<a href=\"https://doi.org/10.1038/d41586-023-02554-0\">10.1038/d41586-023-02554-0</a>).\nMy new blog is still using the freebie model: I am hosting it on GitHub. But following the advice from a fellow cheminformatician,\nI now front this with a owned domain name.</p>\n\n<p>See you at <code class=\"language-plaintext highlighter-rouge\">linkedchemistry.info</code>!</p>",
      "summary": "This is my last post on blogger.com. At least, that is the plan. It has been a great 18 years. I like to thank the owners of blogger.com and Google later for providing this service. I am continuing the chem-bla-ics on a new domain: https://chem-bla-ics.linkedchemistry.info/",
      
      "date_published": "2023-08-18T00:00:00+00:00",
      "date_modified": "2023-08-18T00:00:00+00:00",
      
      "_references": [{ "url": "https://doi.org/10.1038/d41586-023-02554-0" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xr6k8-z4480",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/08/12/boiling-points-in-wikidata.html",
      "title": "Boiling points in Wikidata",
      "content_html": "<p>Some days ago, I started added boiling points to <a href=\"https://wikidata.org/\">Wikidata</a>, referenced from\n<a href=\"https://scholia.toolforge.org/work/Q22236188\">Basic Laboratory and Industrial Chemicals</a> (wikidata:Q22236188),\n<a href=\"https://scholia.toolforge.org/author/Q18609741\">David R. Lide</a>’s\n‘a CRC quick reference handbook’ from 1993 (well, the edition I have). But Wikidata\n<a href=\"https://www.wikidata.org/wiki/User_talk:Egon_Willighagen#Basic_laboratory_and_industrial_chemicals:_a_CRC_quick_reference_handbook_(Q22236188)\">wants</a>\npressure (wikidata:P2077) info at which the boiling point (wikidata:P2102) was measured. Rightfully so. But I had not added those yet,\nbecause it slows me and can be automated with <a href=\"https://quickstatements.toolforge.org/\">QuickStatements</a>.</p>\n\n<p>I just need a few SPARQL queries to list to which statements the qualifiers needs to be added. Basically, all boiling points which has the\nbook as a reference and that do not have the pressure info. 
First, there are values with ‘unknown value’, which results in blank nodes\n(by the time you read this, they likely are already fixed):</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nv\">?bp</span><span class=\"w\"> </span><span class=\"nv\">?pressure</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nn\">p</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">prov</span><span class=\"o\">:</span><span class=\"ss\">wasDerivedFrom</span><span class=\"o\">/</span><span class=\"nn\">pr</span><span class=\"o\">:</span><span class=\"ss\">P248</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q22236188</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">ps</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?bp</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">pq</span><span class=\"o\">:</span><span class=\"ss\">P2077</span><span class=\"w\"> </span><span class=\"nv\">?pressure</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">FILTER</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">contains</span><span 
class=\"p\">(</span><span class=\"nb\">str</span><span class=\"p\">(</span><span class=\"nv\">?pressure</span><span class=\"p\">),</span><span class=\"w\"> </span><span class=\"s2\">\"http://\"</span><span class=\"p\">))</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>So, to get the list for which I want to write the QuickStatements which does not have any P2077 qualifier yet, I use\n<a href=\"https://query.wikidata.org/#SELECT%20%3Fcmp%20WHERE%20%7B%0A%20%20%3Fcmp%20p%3AP2102%20%3FbpStatement%20.%0A%20%20%3FbpStatement%20prov%3AwasDerivedFrom%2Fpr%3AP248%20wd%3AQ22236188%20%3B%0A%20%20%20%20ps%3AP2102%20%3Fbp%20.%0A%20%20MINUS%20%7B%20%3FbpStatement%20pq%3AP2077%20%3Fpressure%20%7D%0A%7D\">this query</a>:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nn\">p</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">prov</span><span class=\"o\">:</span><span class=\"ss\">wasDerivedFrom</span><span class=\"o\">/</span><span class=\"nn\">pr</span><span class=\"o\">:</span><span class=\"ss\">P248</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q22236188</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">ps</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span 
class=\"nv\">?bp</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">MINUS</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">pq</span><span class=\"o\">:</span><span class=\"ss\">P2077</span><span class=\"w\"> </span><span class=\"nv\">?pressure</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>At the time of writing, this lists 54 boiling points.</p>\n\n<p>I can the WDQS create CSV-styled QuickStatements with:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">SUBSTR</span><span class=\"p\">(</span><span class=\"nb\">STR</span><span class=\"p\">(</span><span class=\"nv\">?cmp</span><span class=\"p\">),</span><span class=\"mi\">32</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?qid</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"nv\">?P2102</span><span class=\"w\"> </span><span class=\"nv\">?qal2077</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?cmp</span><span class=\"w\"> </span><span class=\"nn\">p</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">prov</span><span class=\"o\">:</span><span class=\"ss\">wasDerivedFrom</span><span class=\"o\">/</span><span 
class=\"nn\">pr</span><span class=\"o\">:</span><span class=\"ss\">P248</span><span class=\"w\"> </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q22236188</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">ps</span><span class=\"o\">:</span><span class=\"ss\">P2102</span><span class=\"w\"> </span><span class=\"nv\">?P2102</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">MINUS</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\"> </span><span class=\"nv\">?bpStatement</span><span class=\"w\"> </span><span class=\"nn\">pq</span><span class=\"o\">:</span><span class=\"ss\">P2077</span><span class=\"w\"> </span><span class=\"nv\">?pressure</span><span class=\"w\"> </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"k\">BIND</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"s2\">\"101.325U21064807\"</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?qal2077</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Here, the SPARQL variables double as QuickStatement instructions. Finally, note to use of “U21064807” which is the Wikidata item for\nkilopascal (wikidata:Q21064807).</p>\n\n<p>I also need to “add” the boiling point again, to make sure QuickStatements knows which statement to add the qualifier to. I think this\ncan be done better, but not sure how to target statements directly. This is not fool proof: I noted that this approach ignores the\nsituation where there are two statements with the (exact) same boiling point, but different error margins. But that I will monitor\nand where needed correct manually.</p>",
      "summary": "Some days ago, I started added boiling points to Wikidata, referenced from Basic Laboratory and Industrial Chemicals (wikidata:Q22236188), David R. Lide’s ‘a CRC quick reference handbook’ from 1993 (well, the edition I have). But Wikidata wants pressure (wikidata:P2077) info at which the boiling point (wikidata:P2102) was measured. Rightfully so. But I had not added those yet, because it slows me and can be automated with QuickStatements.",
      
      "date_published": "2023-08-12T00:00:00+00:00",
      "date_modified": "2023-08-12T00:00:00+00:00",
      "tags": ["rdf","wikidata","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kxar2-7z367",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/08/08/history-provenance-detail.html",
      "title": "History, provenance, detail",
      "content_html": "<p>Just a quick note: I just love the level of detail <a href=\"https://www.wikidata.org/\">Wikidata</a> allows us to use. One of the marvels is the\npractices of <code class=\"language-plaintext highlighter-rouge\">named as</code>, which can be used in statements for subject and objects. The notion and importance here is that things are\nreferred to in different ways, and these properties allows us to link the interpretation with the source. For example,\n<a href=\"https://scholia.toolforge.org/author/Q58978\">Max Born</a>’s seminal work <em><a href=\"https://scholia.toolforge.org/work/Q55867811\">Zur Quantenmechanik</a></em>\n(doi:<a href=\"https://doi.org/10.1007/BF01328531\">10.1007/BF01328531</a>) uses a very short notation to cite other literature, as footnotes,\nand DOIs did not exist yet.</p>\n\n<p><img src=\"/assets/images/old_references.png\" alt=\"Screenshot of two references as footnotes on a page with a mathematical formula from the old Born paper from 1925.\" /></p>\n\n<p>So, in Wikidata, you can <a href=\"https://www.wikidata.org/wiki/Q55867811#P2860\">capture this like this</a>:</p>\n\n<p><img src=\"/assets/images/new_old_references.png\" alt=\"Screenshot of the FAIR references from the 1925 Born paper.\" /></p>",
      "summary": "Just a quick note: I just love the level of detail Wikidata allows us to use. One of the marvels is the practices of named as, which can be used in statements for subject and objects. The notion and importance here is that things are referred to in different ways, and these properties allows us to link the interpretation with the source. For example, Max Born’s seminal work Zur Quantenmechanik (doi:10.1007/BF01328531) uses a very short notation to cite other literature, as footnotes, and DOIs did not exist yet.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/old_references.png",
      "date_published": "2023-08-08T00:00:00+00:00",
      "date_modified": "2023-08-08T00:00:00+00:00",
      "tags": ["wikidata","publishing"],
      "_references": [{ "url": "https://doi.org/10.1007/BF01328531" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/08/04/blog-planets-blogging-about-debian.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/08/04/blog-planets-blogging-about-debian.html",
      "title": "Blog planets: blogging about Debian, GNOME, Wikimedia, FSFE, and many more",
      "content_html": "<p>I am still an avid user of <a href=\"https://en.wikipedia.org/wiki/Category:Web_syndication_formats\">RSS/Atom feeds</a>. I use\n<a href=\"https://feedly.com/\">Feedly</a> daily, partly because of their easy to use app. My blog is part of\n<a href=\"https://planetrdf.com/\">Planet RDF</a>, a <em>blog planet</em>. Blog planets aggregate blogs from many people around a certain topic.\nIt’s like a forum, but open, free, community driven. It’s exactly what the web should be.</p>\n\n<p>It turned out that planets do still exist, so I started a small corner on Wikidata: <a href=\"https://www.wikidata.org/wiki/Q121134938\">Q121134938</a>,\nand a number of <a href=\"https://www.wikidata.org/wiki/Special:WhatLinksHere/Q121134938\">existing blog planets</a>:</p>\n\n<p><img src=\"/assets/images/blog_planets.png\" alt=\"Screenshot of the 'What links here' page for the Wikidata item 'blog planet'.\" /></p>\n\n<p>The software used to run these planets is ancient, though. We need a new generation of software, replacing things like\n<a href=\"https://en.wikipedia.org/wiki/Planet_(software)\">Planet</a>. And I want something people can easily host on GitHub or GitLab Pages or the likes.</p>\n\n<p>I created a minimal shape expression but the Wikidata items for the planets still lack a lot of information that can be added. First,\nwe can think of them as venues, perhaps, where people “publish” their work. Second, we can annotate the blog planets with ‘main subject’\nfor the topics the cover. Or we can list the people that are “author” on the planet; most planets are very transparent about which\nblogs they aggregate.</p>\n\n<p>Love to see where this is going. Who knows? Maybe we will see Postgenomic (see doi:<a href=\"https://doi.org/10.1186/1471-2105-8-487\">10.1186/1471-2105-8-487</a>) and\n<a href=\"https://chem-bla-ics.blogspot.com/search?q=%22chemical+blogspace%22\">Chemical blogspace</a> resurface :)</p>",
      "summary": "I am still an avid user of RSS/Atom feeds. I use Feedly daily, partly because of their easy to use app. My blog is part of Planet RDF, a blog planet. Blog planets aggregate blogs from many people around a certain topic. It’s like a forum, but open, free, community driven. It’s exactly what the web should be.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/blog_planets.png",
      "date_published": "2023-08-04T00:00:00+00:00",
      "date_modified": "2023-08-04T00:00:00+00:00",
      "tags": ["rss","wikidata","cb"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-8-487" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nfqxs-qs982",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/27/archiving-and-updating-my-blog.html",
      "title": "Archiving and updating my blog",
      "content_html": "<p>This blog is <a href=\"https://chem-bla-ics.blogspot.com/2005/10/chem-bla-ics.html\">almost 18 years old</a> now. I have long wanted\nto migrate it to a version control system and at the same time have more control over things. Markdown would be awesome.\nIn the past year, I learned a lot about the power of <a href=\"https://github.com/jekyll/minima\">Jekyll</a> and needed to get more\nexperienced with it to use it for more databases, like we now do for <a href=\"https://wikipathways.org/\">WikiPathways</a>.</p>\n\n<p>So, time to <a href=\"https://egonw.github.io/blog/\">migrate</a> this blog :) This is probably a multiyear project, so feel free to continue\nreading it here. Why? Because I start with the old posts :) Along the way, I am fixing things, improving it. I still\nhave plenty on my todo list, but already happy with having learned <a href=\"https://fontawesome.com/\">Font Awesome</a>, which makes\nit easy to annotate with how I fixed broken links (or not). I now use three icons: a box for when I use the\nInternet Archive (they can use your donation); a ‘recycle’ icon when I found a new URL for the same page; and a\nbroken URL link for other situations.</p>\n\n<p>This is what it looks like:</p>\n\n<p><img src=\"/assets/images/new_blog.png\" alt=\"Screenshot of the landing page of the new blog platform.\" /></p>",
      "summary": "This blog is almost 18 years old now. I have long wanted to migrate it to a version control system and at the same time have more control over things. Markdown would be awesome. In the past year, I learned a lot about the power of Jekyll and needed to get more experienced with it to use it for more databases, like we now do for WikiPathways.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/new_blog.png",
      "date_published": "2023-07-27T00:00:00+00:00",
      "date_modified": "2023-07-27T00:00:00+00:00",
      "tags": ["markdown","wikipathways"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/07/07/universities-and-open-infrastructures.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/07/universities-and-open-infrastructures.html",
      "title": "Universities and open infrastructures",
      "content_html": "<p>The role of a university is manifold. Being a place where people can find knowledge and the track record of how that knowledge was reached is\noften seen as part of that. Over the past decades universities outsourced this role, for example to publishers. This is seeing a lot of\ndiscussion and I am happy to see that the <a href=\"https://www.universiteitenvannederland.nl/\">Dutch Universities</a> are\n<a href=\"/2023/07/06/journal-rankings.html\">taking back control</a> <a href=\"https://www.openaire.eu/next-narcis-dutch-research-portal-on-openaire\">fast now</a>.\nFor example, <a href=\"https://mastodon.social/@Radboud_uni\">Radboud University</a> (&gt;1k followers) already joined the Fediverse (Mastodon etc), making\nthem independent from non-EU law and commercial interests. Scientific journals, Nobel Prize winners, etc\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2022/11/21/finding-mastodon-accounts-with-wikidata.html\">already joined too  <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, btw.</p>\n\n<p><a href=\"https://netzpolitik.org/2023/a-call-to-action-universities-of-the-world-into-the-fediverse/\">This effort</a> is calling for more universities\nto go into the direction of open infrastructures. I am looking forward to seeing all Dutch Universities post news on Mastodon, post videos\non PeerTube, etc.</p>\n\n<p>Would it not be awesome if the Fediverse would become the new multidimensional knowledge dissemination and peer review system we have all\nbeen waiting for?</p>\n\n<p><strong>Update</strong>: universities with a Mastodon listed in Wikidata on the world map: <a href=\"https://w.wiki/6zR3\">https://w.wiki/6zR3</a></p>",
      "summary": "The role of a university is manifold. Being a place where people can find knowledge and the track record of how that knowledge was reached is often seen as part of that. Over the past decades universities outsourced this role, for example to publishers. This is seeing a lot of discussion and I am happy to see that the Dutch Universities are taking back control fast now. For example, Radboud University (&gt;1k followers) already joined the Fediverse (Mastodon etc), making them independent from non-EU law and commercial interests. Scientific journals, Nobel Prize winners, etc already joined too, btw.",
      
      "date_published": "2023-07-07T00:00:00+00:00",
      "date_modified": "2024-01-07T00:00:00+00:00",
      "tags": ["openscience","mastodon"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/07/06/journal-rankings.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/06/journal-rankings.html",
      "title": "Journal Rankings",
      "content_html": "<p>I am pleased to learn that the <a href=\"https://www.universiteitenvannederland.nl/nl_NL/nieuws-detail/nieuwsbericht/915-p-nederlandse-universiteiten-gaan-voortaan-anders-om-met-rankings-p.html\">Dutch Universities start looking at rankings in a more scientific way</a>.\nIt is long overdue that we take scientific peer review of the indicators used in those rankings seriously, instead of hiding behind\n<a href=\"https://en.wikipedia.org/wiki/Fear,_uncertainty,_and_doubt\">fud</a> around the decline of quality of research.</p>\n\n<p>So, what defines the quality of a journal? Or better, of any scholarly dissemination channel? After all, some databases do better peer review\nthan some journals. Sadly, I am not aware of literature that compares the quality of peer review in databases with that in scientific journals.\nAlso long overdue, in my opinion.</p>\n\n<p>I hope the <a href=\"https://osc-international.com/\">Open Science community</a> will help shape these scholarly dissemination channels, journals included.\nSome ideas, the outlet:</p>\n\n<ul>\n  <li>encourages post-publication peer review</li>\n  <li>communicates the post-publication peer review</li>\n  <li>allows easily updating small fixes and clarifications (no hiding behind the version-of-record)</li>\n  <li>ensures supp info / additional files undergo the same level of peer review</li>\n  <li>uses modern solutions for communication (like semantic web technologies)</li>\n  <li>has clear licenses for all aspects of the <a href=\"/2023/07/02/qeios-open-dissemination-platform-for.html\">research output</a></li>\n  <li>actively fights against visual representation only, but provides all data</li>\n  <li>guarantees that supp info / additional files are archived, as the output itself</li>\n  <li>adopts, promotes, requires community standards (including global, unique identifiers)</li>\n</ul>\n\n<p>Okay, these items are pretty broad. 
Many of them are part of FAIR, but that should not surprise you, because <a href=\"https://doi.org/10.1162/dint_r_00024\">FAIR</a>\nis just applying traditional scholarly approaches, like properly keeping notebooks. It’s just a bit more “digital” than we have been taught.</p>\n\n<p>Do we know how to do this? Yes, pretty much. This is not a technical exercise, but one of social change and particularly willingness. Basically, if you want\nto keep the current way of doing things, then declare you want unreproducible, low-quality research reporting. That’s your academic freedom, of course.\nIf I were a funder or a university, I would also expect a bit more in return for my money.</p>\n\n<p>Let me stress, glossy articles are fine! You do not have to stop that. Media appearances, keynotes, these are all also fine. They are, however,\ncomplementary. We should not continue the habit of fancy narratives as replacement for quality research dissemination. Do both, if you must.</p>",
      "summary": "I am pleased to learn that the Dutch Universities start looking at rankings in a more scientific way. It is long overdue that we take scientific peer review of the indicators used in those rankings seriously, instead of hiding behind fud around the decline of quality of research.",
      
      "date_published": "2023-07-06T00:00:00+00:00",
      "date_modified": "2023-07-06T00:00:00+00:00",
      "tags": ["publishing"],
      "_references": [{ "url": "https://doi.org/10.1162/dint_r_00024" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/07/02/qeios-open-dissemination-platform-for.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/02/qeios-open-dissemination-platform-for.html",
      "title": "Qeios, an open dissemination platform for research output",
      "content_html": "<p>A bit over a year ago I got introduced to <a href=\"https://www.qeios.com/\">Qeios</a> when I was asked to review an article by Michie, West, and Hasting:\n<em>“Creating ontological definitions for use in science”</em> (doi:<a href=\"https://doi.org/10.32388/YGIF9B.2\">10.32388/YGIF9B.2</a>). I wrote up my thoughts after\nreading the paper, and the review was posted openly online and got a <a href=\"https://doi.org/10.32388/7MQYM4\">DOI</a>. Not the first platform to do this\n(think F1000), but it is always nice to see some publishers taking publishing seriously. Since then, I reviewed\n<a href=\"https://www.qeios.com/read/ZJ4QDA\">two</a> <a href=\"https://www.qeios.com/read/YCHHA7\">more</a> papers.</p>\n\n<p>One of these latter two was not a more traditional paper, but a different kind of <strong>research output</strong>: a definition, about “<em>Drive-by Curation</em>”\n(doi:<a href=\"https://doi.org/10.32388/KBX9VO\">10.32388/KBX9VO</a>). Now about this output type, collaboratively working on definitions is something core to\nontology development (e.g. see doi:<a href=\"https://doi.org/10.1186/s13326-015-0005-5\">10.1186/s13326-015-0005-5</a>), but there is a clear need to discuss\nterminology. The <a href=\"https://www.h2020gracious.eu/\">GRACIOUS</a> project in the <a href=\"https://www.nanosafetycluster.eu/\">EU NanoSafety Cluster</a> also recognized\nthis and set up a tool for this, their <a href=\"https://terminology-harmonizer.greendecision.eu/\">Terminology Harmonizer</a>\n(doi:<a href=\"https://doi.org/10.1016/j.impact.2021.100366\">10.1016/j.impact.2021.100366</a>).</p>\n\n<p>This GRACIOUS tool, much more than what Qeios does, helps users. 
Unfortunately, and this is how these topics nicely come together, writing definitions,\nthinking about when some zeta potential is different from another zeta potential, and (drive-by) community curation all need transparency.\nI understand it, but landing on a login page is for me a recipe for a silent death, as it prevents people from learning without first making a (time)\ninvestment. That is what Qeios does differently: it is more FAIR.</p>\n\n<p>So, that brings me to my last point in this post. Jente Houweling and I wrote up a definition for “<em>Research Output Management</em>”\n(doi:<a href=\"https://doi.org/10.32388/ZNWI7T\">10.32388/ZNWI7T</a>), based on our discussions about her research insights. See the screenshot below.</p>\n\n<p>It has been reviewed internally, and by one independent peer (doi:<a href=\"https://doi.org/10.32388/C3SJTN\">10.32388/C3SJTN</a>). But we would love to hear\nyour review too. Just follow the instructions online. We are looking forward to reading your thoughts and to refining our definition.</p>\n\n<p><img src=\"/assets/images/qeios_romp.png\" alt=\"Screenshot of the Qeios page for the Research Output Management paper.\" /></p>",
      "summary": "A bit over a year ago I got introduced to Qeios when I was asked to review an article by Michie, West, and Hastings: “Creating ontological definitions for use in science” (doi:10.32388/YGIF9B.2). I wrote up my thoughts after reading the paper, and the review was posted openly online and got a DOI. Not the first platform to do this (think F1000), but it is always nice to see some publishers taking publishing seriously. Since then, I reviewed two more papers.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/qeios_romp.png",
      "date_published": "2023-07-02T00:00:00+00:00",
      "date_modified": "2023-07-02T00:00:00+00:00",
      "tags": ["rom","publishing"],
      "_references": [{ "url": "https://doi.org/10.32388/YGIF9B.2" },{ "url": "https://doi.org/10.32388/7MQYM4" },{ "url": "https://doi.org/10.32388/ZJ4QDA" },{ "url": "https://doi.org/10.32388/YCHHA7" },{ "url": "https://doi.org/10.32388/KBX9VO" },{ "url": "https://doi.org/10.1186/S13326-015-0005-5" },{ "url": "https://doi.org/10.1016/j.impact.2021.100366" },{ "url": "https://doi.org/10.32388/ZNWI7T" },{ "url": "https://doi.org/10.32388/C3SJTN" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/07/01/twitter-exits-fair-and-is-no-longer.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/07/01/twitter-exits-fair-and-is-no-longer.html",
      "title": "Twitter exits FAIR and is no longer a dissemination solution",
      "content_html": "<p>And just like that, without a warning, Twitter changed policies again, and you now need a Twitter account and be logged in to see public tweets:\n<a href=\"https://www.theverge.com/2023/6/30/23779764/twitter-blocks-unregistered-users-account-tweets\">Twitter has started blocking unregistered users</a>\n(The Verge). Though I learned it first via Mastodon, of course.</p>\n\n<p>For example, this is what happens when you go to <a href=\"http://twitter.com/wikipathways\">twitter.com/wikipathways</a>:</p>\n\n<p><img src=\"/assets/images/twitter_login.png\" alt=\"Screenshot of the Twitter login page.\" /></p>\n\n<p>Fortunately, <a href=\"https://wikipathways.org/\">WikiPathways</a> does have a <a href=\"https://fosstodon.org/@wikipathways\">Mastodon account</a>,\nthat anyone can see without having a Mastodon account. You can even follow WikiPathways’s account with\n<a href=\"https://fosstodon.org/@wikipathways.rss\">its RSS feed</a>. Dissemination should not be paywalled.</p>\n\n<p>Maybe Musk has been talking to Elsevier and Springer Nature.</p>\n\n<p>Tip: <a href=\"https://chem-bla-ics.linkedchemistry.info/2022/11/21/finding-mastodon-accounts-with-wikidata.html\">Finding Mastodon accounts with Wikidata (a few SPARQL queries) <i class=\"fa-solid fa-recycle fa-xs\"></i></a></p>\n\n<p><strong>Update</strong>: <a href=\"https://tweakers.net/nieuws/211364/musk-blokkeren-van-niet-ingelogde-gebruikers-op-twitter-is-tijdelijke-maatregel.html\">Musk</a> said this\nwas a temporary measure. The problem was scraping of content, you know, the content we openly share on Twitter. Maybe they could have done this\nwith APIs. 
Oh wait, they closed those behind a very expensive paywall.</p>\n\n<p><strong>Update 2</strong>: Another rumor is that they forgot to make a deal with a cloud provider and suddenly were left with a fraction of the computing power.</p>\n\n<p><strong>Update 3</strong>: The access has been restored, so you can start scraping/archiving all interesting tweets again.</p>",
      "summary": "And just like that, without a warning, Twitter changed policies again, and you now need a Twitter account and be logged in to see public tweets: Twitter has started blocking unregistered users (The Verge). Though I learned it first via Mastodon, of course.",
      
      "date_published": "2023-07-01T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["twitter","mastodon","wikipathways"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/06/11/community-activity-2-fairsharing.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/06/11/community-activity-2-fairsharing.html",
      "title": "Community activity #2: FAIRsharing",
      "content_html": "<p>Some years ago we started the <a href=\"https://elixir-europe.org/communities/toxicology\">ELIXIR Toxicology Community</a>. It has been an interesting journey,\npartly covered in <a href=\"https://f1000research.com/articles/10-1129/v1\">this whitepaper</a>). We started with interaction we had in several projects already,\nbut particularly the potential. I see this. This series of posts is a number of things toxicology projects can do to benefit from ELIXIR solutions\n(“<a href=\"https://elixir-europe.org/services\">services</a>”). The posts have been sent first to the ELIXIR Toxicology Community mailing list (please join!).</p>\n\n<h3 id=\"history\">History</h3>\n\n<p>In this post, let’s look at <a href=\"https://fairsharing.org/\">FAIRsharing</a>. It is “A curated, informative and educational resource on data and metadata standards,\ninter-related to databases and data policies” [0,1].</p>\n\n<p>The ELIXIR Toxicology Community (we) maintains the toxicology corner of this database and members of our community have been adding toxicology-related\ndatabases, relevant standards. On the side of the policies we are falling a bit short:\n<a href=\"https://fairsharing.org/Toxicology\">fairsharing.org/Toxicology</a>.</p>\n\n<h3 id=\"why-adopt-fairsharing\">Why adopt FAIRsharing</h3>\n\n<p>FAIRsharing is one place where metadata can be shared about your databases. It helps make your resources and research more FAIR and explains people\nhow your work relates to other work (<a href=\"https://fairsharing.org/graph/3496\">fairsharing.org/graph/3496</a>):</p>\n\n<p><img src=\"/assets/images/fairsharing_toxicology.png\" alt=\"Screenshot of the 'collects' graph of the FAIRsharing Toxicology Community.\" /></p>\n\n<h3 id=\"what-you-can-do\">What you can do</h3>\n\n<p>Get an account (with your ORCID or GitHub account) and add resources important to your research, your projects, your work generally. 
Particularly,\n(data) policies and standards you are expected to comply with are useful. Also, links between various resources. For example, if some (project)\ndatabase complies with an important policy or standards, this is worth seeing show up.</p>\n\n<p>Alternatively, join the ELIXIR Toxicology Community <a href=\"https://doi.org/10.1162/dint_r_00024\">mailing list</a> and post the missing resource there,\nor use our issue tracker at <a href=\"https://github.com/elixir-europe/toxicology-community/issues/\">github.com/elixir-europe/toxicology-community/issues/</a>.</p>\n\n<p>Let’s make toxicology more <a href=\"https://doi.org/10.1162/dint_r_00024\">FAIR</a>.</p>\n\n<p>0.<a href=\"https://www.nature.com/articles/s41587-019-0080-8\">https://www.nature.com/articles/s41587-019-0080-8</a>\n1.<a href=\"https://scholia.toolforge.org/work/Q64084285\">https://scholia.toolforge.org/work/Q64084285</a></p>",
      "summary": "Some years ago we started the ELIXIR Toxicology Community. It has been an interesting journey, partly covered in this whitepaper. We started with interaction we had in several projects already, but particularly the potential. I see this. This series of posts is a number of things toxicology projects can do to benefit from ELIXIR solutions (“services”). The posts have been sent first to the ELIXIR Toxicology Community mailing list (please join!).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/fairsharing_toxicology.png",
      "date_published": "2023-06-11T00:00:00+00:00",
      "date_modified": "2023-06-11T00:00:00+00:00",
      "tags": ["elixir","fair","toxicology"],
      "_references": [{ "url": "https://doi.org/10.12688/f1000research.74502.1" },{ "url": "https://doi.org/10.1038/s41587-019-0080-8" },{ "url": "https://doi.org/10.1162/dint_r_00024" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/05/31/information-retrieval-versus-chatgpt.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/05/31/information-retrieval-versus-chatgpt.html",
      "title": "Information Retrieval versus ChatGPT",
      "content_html": "<p>Last week, at a large (and relevant) Dutch research event, ChatGPT came up, along with the claim that it was going to change the world. Even the critiques came up,\nbut were effectively disregarded with “these methods get better very quickly”. This is not untrue, but not really true either. I murmur “not even wrong”.\nI know how hard it is to get computers to find meaningful patterns; I did a PhD in this in the early 21st century.</p>\n\n<p>What strikes me is that ChatGPT is now pitched as an information retrieval (IR) system. This is a system that tries to find information, that is,\nit “retrieves” information from a knowledge base. Like SQL or SPARQL. Or like Google Maps. IR is about reproducing existing knowledge.</p>\n\n<p>Now, deep learning starts with a different premise: we can find the patterns and in this way compress an unlimited number of facts into a mathematical\nequation, a physical law. That way, you do not have to record if the sun comes up every day. We predict it does. We do not have to record that rain drops\nwill fall (that they do; when they do, that actually is something to record). At best, we would record when rain drops start “falling” to the sky.\nThat is, we have the laws of gravitation.</p>\n\n<p>But here lies the problem with systems like ChatGPT: they are as good as the predictive patterns they learned. But they do not retrieve information.\nThey predict information. This is why it doesn’t know about references. It lost the link between predictions and on which shelf the book was stored.</p>\n\n<p>So, when last week the research event mentioned that lawyers were starting to use it, citing existing work, I was skeptical: that would actually mean\nthey moved ChatGPT into IR. And I already had learned (*) that ChatGPT would predict references, rather than look them up. It’s a prediction method,\nnot an IR method. So, how come it would accurately give citations to court cases?</p>\n\n<p>It didn’t. 
It’s all over the news now. It “hallucinated” legal citations.</p>\n\n<p>Does this matter? I think it does. This is why I moved my research focus after my PhD back to IR, away from the machine learning. Deep learning can only\ngeneralize the facts, so we better start accurately recording facts. This is why I study interoperable and reusable knowledge bases, like WikiPathways,\nWikidata, technologies like RDF in science, etc. Actually, this realization predates my machine learning. I guess I already had this notion when I\nstarted the <em>Woordenboek Organische Chemie</em> back in the nineties.</p>\n\n<p>Someone has to. I just hope the funding for this fundamental aspect of research doesn’t run out. Information Retrieval will remain essential to science\nfor a few decades more.</p>",
      "summary": "Last week, at a large (and relevant) Dutch research event, ChatGPT came up, along with the claim that it was going to change the world. Even the critiques came up, but were effectively disregarded with “these methods get better very quickly”. This is not untrue, but not really true either. I murmur “not even wrong”. I know how hard it is to get computers to find meaningful patterns; I did a PhD in this in the early 21st century.",
      
      "date_published": "2023-05-31T00:00:00+00:00",
      "date_modified": "2023-05-31T00:00:00+00:00",
      "tags": ["ml","rdf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/05/22/paper-fair-cookbook-essential-resource.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/05/22/paper-fair-cookbook-essential-resource.html",
      "title": "Paper: The FAIR Cookbook - the essential resource for and by FAIR doers",
      "content_html": "<p>I think that if you want to make your knowledge FAIR, you should use an open license and RDF. Simple. Now, not everything is knowledge.\nA lot of data is, but a lot more is not, think raw data. Using RDF to explain a protein sequence is still something that makes me feel uneasy.</p>\n\n<p>However, first, you need to make RDF, you need to make assumptions explicit, you need to decide on meaning. Making RDF is not easy.\nIt’s not hard, just a lot of administration and scientific thinking. What did I measure? What model do I use to describe the chemistry?\nYou know, my research job.</p>\n\n<p>Moreover, not only data should be FAIR. All research output (worth communicating) should be FAIR.</p>\n\n<p>In the past, Andra Waagmeester invited me to co-author a recipe that explains the\n<a href=\"http://www.openphacts.org/specs/2013/WD-rdfguide-20131007/\">general steps of creating RDF</a>. This was during the Open PHACTS project and with Carina Haupt.\nWriting recipes is something that is gaining traction. They are a bit like <a href=\"https://r-pkgs.org/vignettes.html\">vignettes from the R world</a>.</p>\n\n<p>In the past few years the <a href=\"https://cordis.europa.eu/project/id/802750\">FAIRplus project</a> created a\n<a href=\"https://faircookbook.elixir-europe.org/\">FAIR Cookbook</a> with recipes and I wrote a few. Actually, I still have a few to finish,\nfor which I cannot find the time. In retrospect, I spent too much time on perfecting the recipes to finish them earlier. The FAIR Cookbook\nis now a professional venue with an editorial board. It is fully open source and welcomes your recipes. Oh, and it is now hosted as an ELIXIR service,\nwhich is great to see!</p>\n\n<p>Finally, the paper <a href=\"https://doi.org/10.1038/s41597-023-02166-3\">The FAIR Cookbook - the essential resource for and by FAIR doers</a>\nis out. 
Go read it :)</p>\n\n<p><img src=\"https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41597-023-02166-3/MediaObjects/41597_2023_2166_Fig2_HTML.png?as=webp\" alt=\"Screenshot of a FAIR Cookbook recipe showing the infobox at the top (with reading time, difficulty indicator (4/5 flames), the audience (PIs, ontologists, data scholars), and the author list with ORCID, affiliation, and CReDIT annotation.)\" /></p>\n<center>\nFigure 2 from the article: 'Citability of recipes and identification of and credit for authors; an example is provided.'\n</center>",
      "summary": "I think that if you want to make your knowledge FAIR, you should use an open license and RDF. Simple. Now, not everything is knowledge. A lot of data is, but a lot more is not, think raw data. Using RDF to explain a protein sequence is still something that makes me feel uneasy.",
      "image": "https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41597-023-02166-3/MediaObjects/41597_2023_2166_Fig2_HTML.png?as=webp",
      "date_published": "2023-05-22T00:00:00+00:00",
      "date_modified": "2023-05-22T00:00:00+00:00",
      "tags": ["fair"],
      "_references": [{ "url": "https://doi.org/10.1038/s41597-023-02166-3" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html",
      "title": "CiTO updates #4: annotations in datasets",
      "content_html": "<p>Okay, <a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00683-2\">the Pilot</a>\n<a href=\"https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00684-1\">is over</a> ending with 17 papers, 16 of which have CiTO\nannotations (and so far 4 J.Cheminform. <a href=\"https://doi.org/10.1186/s13321-022-00656-x\">papers</a>\n<a href=\"https://doi.org/10.1186/s13321-022-00673-w\">after</a> <a href=\"https://doi.org/10.1186/s13321-022-00677-6\">the</a>\n<a href=\"https://doi.org/10.1186/s13321-023-00701-3\">pilot</a>), but my interest in the\n<a href=\"http://purl.org/spar/cito\">Citation Typing Ontology</a> continues and we just need\n<a href=\"https://chem-bla-ics.blogspot.com/2023/02/citation-typing-progress-but-we-need.html\">more adoption</a>.</p>\n\n<p><strong>Datasets as source of annotations</strong></p>\n\n<p>So, here’s a quick <a href=\"https://wikidata.org/\">Wikidata</a> update. I have been using Wikidata as infrastructure to collect and share CiTO\nannotations (see also the below “Scholia patch” posts). Some time ago I recovered my CiteULike CiTO annotations and made this\n<a href=\"https://scholia.toolforge.org/work/Q115470140\">available on Zenodo</a> (doi:<a href=\"https://doi.org/10.5281/ZENODO.7368209\">10.5281/zenodo.7368209</a>).</p>\n\n<p>And while thinking about datasets with CiTO annotations, I found two other datasets. One was from an article in Portuguese and one from an\n<a href=\"https://scholia.toolforge.org/work/Q117369886\">article by Peroni et al.</a> with\n<a href=\"https://zenodo.org/record/6885109\">this data file</a>. That data file is actually a zip, but inside the zip file is a CSV file with three\ninteresting columns: <code class=\"language-plaintext highlighter-rouge\">cited_doi</code>, <code class=\"language-plaintext highlighter-rouge\">citing_doi</code>, and <code class=\"language-plaintext highlighter-rouge\">intext_citation.intent</code>. 
There are many more columns and I can highly recommend browsing\nthem. But these are the three I need to add data to Wikidata. The third column is free text, but using the CiTO for labels, making it\nrelatively easy to convert to <a href=\"https://w.wiki/62sR\">citation intentions from Wikidata</a>\n(PS, thanks to <a href=\"https://www.wikidata.org/wiki/User:Fvtvr3r\">Fvtvr3r</a> for adding more!).</p>\n\n<p>So, I had a cleaned file and started writing a Groovy Bioclipse script using <a href=\"https://doi.org/10.21105/joss.02558\">Bacting</a>.\nIt basically does a few things: extract all DOIs, check which ones are in Wikidata, analyze the <code class=\"language-plaintext highlighter-rouge\">intext_citation.intent</code> column content,\nand then generate QuickStatements (see <a href=\"https://gist.github.com/egonw/f74fd3bc1f6361434b042a4cac2a8089\">this gist</a>). Out of the 600\nlines from the input, it creates some 200 new CiTO-annotated citations in Wikidata between\n<a href=\"https://scholia.toolforge.org/work/Q117357537#statements\">some 150 article pairs</a>:</p>\n\n<p><img src=\"/assets/images/Screenshot_20230402_084711.png\" alt=\"\" /></p>\n\n<p>The ability to include CiTO annotations from datasets is another welcome boost for the CiTO statistics in Wikidata.\n<a href=\"https://w.wiki/6XQf\">This SPARQL query</a> shows an overview of sources that support the CiTO intention annotation, but note that\na claim with a CiTO intention may also have CrossRef, PubMed, and COCI as reference. In those cases, they are primarily for\nthe citations and not the intention.</p>\n\n<p>There are <a href=\"https://scholar.social/@egonw/110124747053293502\">now</a> (the <a href=\"https://scholia.toolforge.org/cito/#statistics\">latest stats are here</a>)\n<strong>1202 citation intention</strong> annotations in Wikidata for 992 citations from <strong>405 articles in 199 venues</strong>. 
Of these, 27 articles have\nexplicit annotations in the article itself and are found in 4 venues (two journals and two preprint servers). These annotated citations\nare to 510 articles in 190 different venues. <a href=\"https://github.com/WDscholia/scholia/pull/2271\">This Scholia patch</a> will add a new\nstatistic, the number of datasets providing citation intentions, of which there are (as discussed)\n<a href=\"https://scholia.toolforge.org/topic/Q115470140\">currently</a> <a href=\"https://scholia.toolforge.org/work/Q117357537\">two</a> in Wikidata.\nThe latter two provide intentions for the majority of articles and are depicted in yellow in the below overview.</p>\n\n<p><img src=\"/assets/images/Screenshot_20230402_085317.png\" alt=\"\" /></p>\n\n<p>With an annotation in <a href=\"https://www.wikidata.org/wiki/Q27638524\">a 1938 article by Alan Turing</a>! I ran into this article in November 2011\nnoting an apparent duplicate title in his article list. It turned out that an earlier article had a correction with the same name.\nI added <a href=\"https://www.wikidata.org/w/index.php?title=Q27638524&amp;diff=1527020358&amp;oldid=984628387&amp;diffmode=source\">this clarification</a>:</p>\n\n<p><img src=\"/assets/images/Screenshot_20230402_090600.png\" alt=\"\" /></p>\n\n<p>This is very trivial citation intention data that publishers could provide as open data.</p>\n\n<p>Okay, that will do for today. There are actually some really interesting things in the pipeline, but I will have to write about that later. I have some deadlines I should start looking at. 
Below is some extra reading.\nSome more history</p>\n\n<ul>\n  <li>2021: <a href=\"https://chem-bla-ics.linkedchemistry.info/2021/11/15/biohackathon-europe-2021-1-cito.html\">BioHackathon Europe 2021 #1: CiTO annotations in BioHackrXiv <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>2021: <a href=\"https://chem-bla-ics.blogspot.com/2021/03/markdown-template-for-journal-of.html\">Markdown template for the Journal of Cheminformatics with CiTO support</a></li>\n  <li>2020: <a href=\"https://chem-bla-ics.blogspot.com/2020/11/cito-updates-3-third-paper-in.html\">CiTO updates #3: third paper in the collection and updated Scholia patch</a></li>\n  <li>2020: <a href=\"https://chem-bla-ics.blogspot.com/2020/11/cito-updates-2-annotation-migration-to.html\">CiTO updates #2: annotation migration to Wikidata and first Scholia patch</a></li>\n  <li>2020: <a href=\"https://chem-bla-ics.blogspot.com/2020/11/cito-updates-1-first-research-paper-in.html\">CiTO updates #1: first research paper in the Journal of Cheminformatics with CiTO annotation published</a></li>\n  <li>July 2020: <a href=\"https://chem-bla-ics.blogspot.com/2020/07/new-editorial-adoption-of-citation.html\">New Editorial: “Adoption of the Citation Typing Ontology by the Journal of Cheminformatics”</a></li>\n  <li>2015: <a href=\"https://chem-bla-ics.blogspot.com/2015/03/what-youre-doing-is-rather-desperate.html\">“What You’re Doing Is Rather Desperate”</a></li>\n  <li>2012: <a href=\"https://chem-bla-ics.linkedchemistry.info/2012/02/23/cito-citeulike-publishing-innovation.html\">CiTO / CiteULike: publishing innovation <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>2010: <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html\">CiteULike CiTO Use Case #1: Wordles <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>September 2010: <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html\">A list 
of things I miss in CiteULike <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n</ul>",
      "summary": "Okay, the Pilot is over ending with 17 papers, 16 of which have CiTO annotations (and so far 4 J.Cheminform. papers after the pilot), but my interest in the Citation Typing Ontology continues and we just need more adoption.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20230402_085317.png",
      "date_published": "2023-04-02T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["cito","data","scholia"],
      "_references": [{ "url": "https://doi.org/10.1186/s13321-023-00683-2" },{ "url": "https://doi.org/10.1186/s13321-023-00684-1" },{ "url": "https://doi.org/10.1186/s13321-022-00656-x" },{ "url": "https://doi.org/10.1186/s13321-022-00673-w" },{ "url": "https://doi.org/10.1186/s13321-022-00677-6" },{ "url": "https://doi.org/10.1186/s13321-023-00701-3" },{ "url": "https://doi.org/10.1162/QSS_A_00222" },{ "url": "https://doi.org/10.5281/zenodo.5155219" },{ "url": "https://doi.org/10.21105/joss.02558" },{ "url": "https://doi.org/10.5281/ZENODO.7368209" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2022/11/21/finding-mastodon-accounts-with-wikidata.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/11/21/finding-mastodon-accounts-with-wikidata.html",
      "title": "Finding Mastodon accounts with Wikidata (a few SPARQL queries)",
      "content_html": "<p>There are multiple initiatives to support the migration from Twitter to Mastodon (see also\n<a href=\"2022-11-12-stwittermastodong.markdown\">this blog post <i class=\"fa-solid fa-recycle fa-xs\"></i></a>). But\n<a href=\"https://wikidata.org/\">Wikidata</a>\nshould not be forgotten here which has been tracking Mastodon accounts of things in their database:</p>\n\n<p><img src=\"/assets/images/Screenshot_20221121_075015.png\" alt=\"Screenshot of a Wikidata query showing the growth in number of Mastodon accounts listed in Wikidata.\" /></p>\n\n<p>So, here are some <a href=\"https://query.wikidata.org/\">Wikidata SPARQL</a> queries to see the uptake:</p>\n\n<ul>\n  <li><a href=\"https://w.wiki/5$3w\">Universities with Mastodon</a></li>\n  <li><a href=\"https://w.wiki/5$42\">All Mastodon accounts in Wikidata</a> (or <a href=\"https://w.wiki/5$4S\">subset with also a Twitter account</a>)</li>\n  <li><a href=\"https://w.wiki/6zFm\">Nobel Prize winners with Mastodon</a></li>\n  <li><a href=\"https://w.wiki/5$4V\">Academic journals with Mastodon</a></li>\n  <li><a href=\"https://w.wiki/5$4a\">People with Mastodon that published in a PLOS journal</a> (you can pick another publisher)</li>\n  <li><a href=\"https://w.wiki/5$4e\">Find your co-authors with your ORCID</a> (just replace my ORCID with yours)</li>\n</ul>\n\n<p>If you find yourself missing, back in April I <a href=\"https://threadreaderapp.com/thread/1519193166188007424.html\">tweeted</a> (sorry)\nhow you can find yourself and others in Wikidata and how to add your or their Mastodon account.</p>",
      "summary": "There are multiple initiatives to support the migration from Twitter to Mastodon (see also this blog post ). But Wikidata should not be forgotten here which has been tracking Mastodon accounts of things in their database:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20221121_075015.png",
      "date_published": "2022-11-21T00:00:00+00:00",
      "date_modified": "2022-11-21T00:00:00+00:00",
      "tags": ["mastodon","sparql","wikidata","rdf","orcid"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2022/11/12/wikidata-script-for-smiles-smarts-and.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/11/12/wikidata-script-for-smiles-smarts-and.html",
      "title": "Wikidata script for SMILES, SMARTS, and CXSMILES depiction",
      "content_html": "<p>In August I reported about <a href=\"https://chem-bla-ics.blogspot.com/2022/08/wikidata-now-escapes-smiles-and-cxsmiles.html\">2D depiction of (CX)SMILES in Wikidata via linkouts</a>\n(<a href=\"https://chem-bla-ics.blogspot.com/2017/07/wikidata-visualizes-smiles-strings-with.html\">going back to 2017</a>). Based on a script by\n<a href=\"https://orcid.org/0000-0001-5916-0947\">Magnus Manske</a>, I wrote a <a href=\"https://www.wikidata.org/wiki/User:Egon_Willighagen/cdkdepict_gadget.js\">Wikidata gadget</a>\nthat uses the same <a href=\"https://www.simolecule.com/cdkdepict/depict.html\">CDK Depict</a>\n(<a href=\"https://cdkdepict.cloud.vhp4safety.nl/\">VHP4Safety mirror</a>) to depict the 2D structure in <a href=\"https://wikidata.org/\">Wikidata</a> itself:</p>\n\n<p><img src=\"/assets/images/Screenshot_20221112_130346.png\" alt=\"Depicting of part of a Wikidata page with 2D structures of a canonical SMILES and matching CXSMILES.\" /></p>\n\n<p>Note the depiction of the undefined (CIP) stereochemistry on two atoms. Thanks to\n<a href=\"https://orcid.org/0000-0003-0443-9902\">Adriano</a> and <a href=\"https://nextmovesoftware.com/blog/author/john/\">John</a> for working that out.</p>\n\n<p>More about CXSMILES in Wikidata in <a href=\"https://egonw.github.io/cdk-cxsmiles/\">this Dagstuhl meeting results write up</a>.</p>",
      "summary": "In August I reported about 2D depiction of (CX)SMILES in Wikidata via linkouts (going back to 2017). Based on a script by Magnus Manske, I wrote a Wikidata gadget that uses the same CDK Depict (VHP4Safety mirror) to depict the 2D structure in Wikidata itself:",
      
      "date_published": "2022-11-12T00:00:00+00:00",
      "date_modified": "2022-11-12T00:00:00+00:00",
      "tags": ["wikidata","cdk","cxsmiles","dagstuhl","smiles","vhp4safety"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2022/11/12/stwittermastodong.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/11/12/stwittermastodong.html",
      "title": "s/Twitter/Mastodon/g",
      "content_html": "<p><img src=\"/assets/images/Mastodon_logotype_(simple)_new_hue.svg.png\" style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Mastodon logo. AGPL source: WikiCommons\" />\nYeah, it has been hard to miss it (see e.g. <a href=\"https://www.nature.com/articles/d41586-022-03668-7\">Should I join Mastodon? A scientists’ guide to Twitter’s rival</a>).\nTwitter is experiencing some turbulence and <a href=\"https://joinmastodon.org/\">Mastodon</a> has become a very attractive, open source,\ncommunity-driven, inclusive alternative. It’s been <a href=\"https://scholia.toolforge.org/topic/Q27986619\">around since 2016</a> and there\nis some <a href=\"https://scholia.toolforge.org/topic/Q27986619\">research literature about it</a> already. I got\n<a href=\"https://chem-bla-ics.blogspot.com/2018/09/mastodon-somewhere-between-twitter-and.html?q=mastodon\">my account in 2018</a>, but did\nnot start actively using it until earlier this year.</p>\n\n<p>It’s a fascinating platform: federated, community driven, and open source. Oh, and it uses an open standard:\n<a href=\"https://en.wikipedia.org/wiki/ActivityPub\">ActivityPub</a>. I have still a lot to learn, but there are some reasons why Mastodon\nis better and some reasons why it is worse than Twitter.</p>\n\n<p>First, how can you follow me:</p>\n\n<ul>\n  <li>main scholarly account: <a href=\"https://social.edu.nl/@egonw\">https://social.edu.nl/@egonw</a></li>\n  <li>politics, foss, hobby account: <a href=\"https://mastodon.social/@egonw\">https://mastodon.social/@egonw</a></li>\n</ul>\n\n<p><strong>Better</strong></p>\n\n<p>Well, this is personal, of course, but the following points makes Mastodon for me a better platform:</p>\n\n<ul>\n  <li>distributed, open standard\n    <ul>\n      <li>e.g. 
no more tweeting of new Zotero entries (soon I hope), just follow my Zotero account</li>\n    </ul>\n  </li>\n  <li>community standards\n    <ul>\n      <li>you can pick; if you don’t like the terms of your current server (read: service provider), just move to another server</li>\n      <li>images must have alternate descriptions on many servers</li>\n    </ul>\n  </li>\n  <li>edit button with version control</li>\n  <li>content warnings</li>\n  <li>ability to hide anything with #caturday (or any other word)</li>\n  <li>detailed annotation of privacy (public, unlisted, etc; no encryption, tho)</li>\n</ul>\n\n<p><strong>Worse</strong></p>\n\n<p>Maybe this category can better be called opportunities. After all, it’s the community that defines how it will evolve, just like Twitter did (which did not originally have hashtags or retweets). One big elephant in the scientific social media world right now is the uncertainty about searching and indexing: will it be useful as a (post-publication) platform? will we be able to use it for conference tweeting?</p>\n\n<p>Another aspect is that in some countries mobile internet is deeply coupled with big companies. Think coupling of access with free whatsapp.</p>\n\n<p>Finally: growing pains. The platform is growing fast, and right now it can be hard to find a server that accepts new accounts.</p>\n\n<p><strong>Tips?</strong></p>\n\n<p>Sure. Start with <a href=\"https://fedi.tips/\">https://fedi.tips/</a>. Have fun! And I love to hear what your tips are :)</p>\n\n<p>Image from <a href=\"https://commons.wikimedia.org/wiki/File:Mastodon_logotype_(simple)_new_hue.svg\">WikiCommons</a>.</p>",
      "summary": "Yeah, it has been hard to miss it (see e.g. Should I join Mastodon? A scientists’ guide to Twitter’s rival). Twitter is experiencing some turbulence and Mastodon has become a very attractive, open source, community-driven, inclusive alternative. It’s been around since 2016 and there is some research literature about it already. I got my account in 2018, but did not start actively using it until earlier this year.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Mastodon_logotype_(simple)_new_hue.svg.png",
      "date_published": "2022-11-12T00:00:00+00:00",
      "date_modified": "2022-11-12T00:00:00+00:00",
      "tags": ["mastodon","twitter"],
      "_references": [{ "url": "https://doi.org/10.1038/d41586-022-03668-7" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2022/10/08/is-your-research-cited-by-nobel-prize.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/10/08/is-your-research-cited-by-nobel-prize.html",
      "title": "Is your research cited by a Nobel prize winner?",
      "content_html": "<p><span style=\"width: 20%; float: right\"><a href=\"https://en.wikipedia.org/wiki/File:Nobel_Prize.png\">\n  <img src=\"https://upload.wikimedia.org/wikipedia/en/e/ed/Nobel_Prize.png?20131011153104\" /></a></span>\nForget the journal impact factor and the H-index. You want your research being used. A first approximation of that is getting cited,\nsure. So, with the Nobel Prize week over (congrats to all winners! the <a href=\"https://www.sciencelink.net/news/nobel-prize-in-physiology-awarded-to-sequencing-of-ancient-genomes/20811.article\">Neanderthaler prize</a>\nactually helped my work in Maastricht this week), let’s figure out of you are cited by a Nobel Prize winner.\nWikidata allows us to figure this out with a SPARQL query\n(<a href=\"https://twitter.com/Adafede/status/1577642035011534850\">created together with Adriano</a>):</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">#title: Are you cited by Nobel Prize winners?</span><span class=\"w\">\n\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nb\">MIN</span><span class=\"p\">(</span><span class=\"nv\">?dates</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?date</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"nv\">?work</span><span class=\"w\"> </span><span class=\"nv\">?workLabel</span><span class=\"w\">\n  </span><span class=\"p\">(</span><span class=\"nb\">GROUP_CONCAT</span><span class=\"p\">(</span><span class=\"k\">DISTINCT</span><span class=\"w\"> </span><span class=\"nv\">?winnerLabel</span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"nb\">SEPARATOR</span><span class=\"w\"> </span><span class=\"p\">=</span><span class=\"w\"> </span><span class=\"s2\">\", \"</span><span class=\"p\">)</span><span class=\"w\"> 
</span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?winners</span><span class=\"p\">)</span><span class=\"w\">\n  </span><span class=\"p\">(</span><span class=\"nb\">COUNT</span><span class=\"p\">(</span><span class=\"k\">DISTINCT</span><span class=\"p\">(</span><span class=\"nv\">?winnerLabel</span><span class=\"p\">))</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?count</span><span class=\"p\">)</span><span class=\"w\">\n</span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"k\">VALUES</span><span class=\"w\"> </span><span class=\"nv\">?nobel</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q7191</span><span class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q80061</span><span class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q44585</span><span class=\"w\">\n    </span><span class=\"nn\">wd</span><span class=\"o\">:</span><span class=\"ss\">Q38104</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"nv\">?work</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P50</span><span class=\"o\">/</span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P496</span><span class=\"w\"> </span><span class=\"s2\">\"0000-0002-2627-833X\"</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\"> </span><span class=\"c1\"># REPLACE WITH YOUR ORCID id</span><span class=\"w\">\n    </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P577</span><span class=\"w\"> </span><span class=\"nv\">?datetimes</span><span class=\"p\">.</span><span 
class=\"w\">\n  </span><span class=\"p\">[]</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P2860</span><span class=\"w\"> </span><span class=\"nv\">?work</span><span class=\"p\">;</span><span class=\"w\">\n    </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P50</span><span class=\"w\"> </span><span class=\"nv\">?winner</span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"nv\">?winner</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P166</span><span class=\"w\"> </span><span class=\"nv\">?nobel</span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"k\">BIND</span><span class=\"p\">(</span><span class=\"nn\">xsd</span><span class=\"o\">:</span><span class=\"ss\">date</span><span class=\"p\">(</span><span class=\"nv\">?datetimes</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">AS</span><span class=\"w\"> </span><span class=\"nv\">?dates</span><span class=\"p\">)</span><span class=\"w\">\n  </span><span class=\"k\">SERVICE</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nn\">bd</span><span class=\"o\">:</span><span class=\"ss\">serviceParam</span><span class=\"w\"> </span><span class=\"nn\">wikibase</span><span class=\"o\">:</span><span class=\"ss\">language</span><span class=\"w\"> </span><span class=\"s2\">\"en\"</span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?winner</span><span class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?winnerLabel</span><span class=\"p\">.</span><span class=\"w\">\n    </span><span class=\"nv\">?work</span><span 
class=\"w\"> </span><span class=\"nn\">rdfs</span><span class=\"o\">:</span><span class=\"ss\">label</span><span class=\"w\"> </span><span class=\"nv\">?workLabel</span><span class=\"p\">.</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"k\">GROUP</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"nv\">?work</span><span class=\"w\"> </span><span class=\"nv\">?workLabel</span><span class=\"w\">\n</span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">DESC</span><span class=\"w\"> </span><span class=\"p\">(</span><span class=\"nv\">?count</span><span class=\"p\">)</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Run this query <a href=\"https://w.wiki/5nBX\">here</a>. Notice the ORCID given in the middle: change that to your own ORCID identifier.</p>\n\n<p>Please keep in mind that <a href=\"https://www.wikidata.org/\">Wikidata</a> does not contain all literature (neither do Google Scholar,\nWeb of Science, PubMed) and not all citations.</p>",
      "summary": "Forget the journal impact factor and the H-index. You want your research being used. A first approximation of that is getting cited, sure. So, with the Nobel Prize week over (congrats to all winners! the Neanderthaler prize actually helped my work in Maastricht this week), let’s figure out of you are cited by a Nobel Prize winner. Wikidata allows us to figure this out with a SPARQL query (created together with Adriano):",
      
      "date_published": "2022-10-08T00:00:00+00:00",
      "date_modified": "2022-10-08T00:00:00+00:00",
      "tags": ["wikidata","sparql"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2022/08/01/biology-acps-lipids-cheminformatics-and.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/08/01/biology-acps-lipids-cheminformatics-and.html",
      "title": "Biology, ACPs, lipids, cheminformatics, and Dagstuhl",
      "content_html": "<p>Already 3 months ago I visited <a href=\"https://www.dagstuhl.de/\">Dagstuhl</a> for the second time. The weather was much better than in the January right before\nthe start of the pandemic. The first I attended the Computational Metabolomics meeting, with the focus From Cheminformatics to Machine Learning, one\nof the things we concerned ourselves with was how to do computation with compound classes (see\n<a href=\"https://drops.dagstuhl.de/opus/volltexte/2020/12403/pdf/dagrep_v010_i001_p144_20051.pdf\">Section 3.6</a> and\n<a href=\"https://egonw.github.io/cdk-cxsmiles/\">this online book</a>). We know how to handle\nSMILES and we know how to the substructure searching with SMARTS, but what if you have compound classes or lipid classes? Biology is a greasy business.</p>\n\n<p>From a <a href=\"https://wikipathways.org/\">WikiPathways</a> there is additional complexity, with modified proteins involved in lipid metabolism, the acyl-carrier\nproteins. They look like this, and the R group is a protein:</p>\n\n<p><img src=\"/assets/images/Screenshot_20220801_180944.png\" alt=\"\" /></p>\n\n<p>We have quite a few of them in WikiPathway and they also show up in <a href=\"https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:5697\">ChEBI</a> (and likely\nReactome), <a href=\"https://www.lipidmaps.org/databases/lmsd/LMFA07060040?LMID=LMFA07060040\">LIPID MAPS</a>, and\n<a href=\"https://www.kegg.jp/entry/C05764\">KEGG</a>.</p>\n\n<p>During this years Dagstuhl we used up one session to continue working on it (report pending). 
Part of the results is that\n<a href=\"https://www.wikidata.org/\">Wikidata</a> (see doi:<a href=\"https://doi.org/10.7554/eLife.52614\">10.7554/eLife.52614</a> and\ndoi:<a href=\"https://doi.org/10.7554/eLife.70780\">10.7554/eLife.70780</a>) now has <a href=\"https://www.wikidata.org/wiki/Property:P10718\">a property for CXSMILES</a>.\nCDK 2.0 (doi:<a href=\"https://doi.org/10.1186/s13321-017-0220-4\">10.1186/s13321-017-0220-4</a>) already supported CXSMILES and the above image is actually created with\n<a href=\"https://github.com/cdk/depict\">CDK Depict</a> (thx to John!).</p>\n\n<p>So, that means I can now start adding all those ACPs to Wikidata :) Here’s <a href=\"https://www.wikidata.org/wiki/Q113377202\">hexadecanoyl-[acp]</a>\n(or this <a href=\"https://scholia.toolforge.org/chemical-class/Q113377202\">Scholia page</a>):</p>\n\n<p><img src=\"/assets/images/Screenshot_20220801_182345.png\" alt=\"\" /></p>",
      "summary": "Already 3 months ago I visited Dagstuhl for the second time. The weather was much better than in the January right before the start of the pandemic. The first I attended the Computational Metabolomics meeting, with the focus From Cheminformatics to Machine Learning, one of the things we concerned ourselves with was how to do computation with compound classes (see Section 3.6 and this online book). We know how to handle SMILES and we know how to the substructure searching with SMARTS, but what if you have compound classes or lipid classes? Biology is a greasy business.",
      
      "date_published": "2022-08-01T00:00:00+00:00",
      "date_modified": "2022-08-01T00:00:00+00:00",
      "tags": ["cdk","chebi","dagstuhl","epilipidnet","kegg","wikipathways","lipidmaps","metabolomics","smiles","wikidata"],
      "_references": [{ "url": "https://doi.org/10.7554/ELIFE.52614" },{ "url": "https://doi.org/10.7554/ELIFE.70780" },{ "url": "https://doi.org/10.1186/S13321-017-0220-4" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2022/04/17/bridgedb-nwo-grant-update-2-building-up.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/04/17/bridgedb-nwo-grant-update-2-building-up.html",
      "title": "BridgeDb NWO grant update #2: building up momentum",
      "content_html": "<p><a href=\"/assets/images/bridgedb_nwo_uml.png\"><img src=\"/assets/images/bridgedb_nwo_uml.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"UML diagram showing the steps in a BridgeDb webservice call.\" /></a>\nLast month I <a href=\"https://chem-bla-ics.linkedchemistry.info/2022/03/05/bridgedb-nwo-grant-update-1-first-steps.html\">reported <i class=\"fa-solid fa-recycle fa-xs\"></i></a> on the start of the\n<a href=\"https://www.nwo.nl/en/researchprogrammes/open-science/open-science-fund\">NWO Open Science grant</a> and it is time for an update. First,\nour grant now has a grant number, <a href=\"https://www.nwo.nl/en/projects/203001121\">203.001.121</a>. For a project that is about identifiers,\nhaving a project identifier is a big deal.</p>\n\n<p>Some updates by Denise, Martina, Tooba, Helena, and me:</p>\n\n<ul>\n  <li>the project proposal was accepted and published in RIO Journal (doi:<a href=\"https://doi.org/10.3897/rio.8.e83031\">10.3897/rio.8.e83031</a>)</li>\n  <li>we started drawing various <a href=\"https://github.com/bridgedb/stories\">BridgeDb stories as UML diagrams</a> using\n<a href=\"https://mermaid-js.github.io/\">Mermaid</a></li>\n  <li>updated the documentation in the <a href=\"https://github.com/bridgedb/bridgedb-webservice\">BridgeDb Webservice repository</a></li>\n  <li>an <a href=\"https://github.com/bridgedb/data/commit/172a9c69ef557e7cb065a138f0fc4f5243615188\">Ensembl 104-based gene/protein ID mapping database</a>\n(doi:<a href=\"10.5281/zenodo.6367091\">10.5281/zenodo.6367091</a>)</li>\n  <li>better unit test coverage of the BridgeDb Java library</li>\n  <li>various <a href=\"https://citation-file-format.github.io/\">CITATION.cff</a> updates</li>\n</ul>\n\n<p>There are some further things cooking, including an updated <a href=\"https://github.com/bridgedb/datasources\">datasources.tsv</a> and a few\n<a href=\"https://github.com/bridgedb/BridgeDb/pulls\">pull 
requests</a>. I expect a new release of the BridgeDb Java library before the end of the month.</p>\n\n<p>With these new results, we also updated <a href=\"https://www.isaac.nwo.nl/\">the ISAAC database</a> for the two new products\n(the published proposal and the gene/protein ID mapping database):</p>\n\n<p><img src=\"/assets/images/bridgedb_nwo_isaac.png\" alt=\"\" /></p>\n\n<p>Right now, the ISAAC database does not make it easy to add content. Instead, there is a series of forms that have to be\nmanually filled, including separate forms for authors. You cannot simply add a DOI. Well, until recently.\n<a href=\"https://orcid.org/0000-0002-4751-4637\">Lars Willighagen</a> and I developed <a href=\"https://chrome.google.com/webstore/detail/isaac-chrome-extension/kiljfbiapahlahhilgcgfkfjnkgggode\">a Chrome browser add-on</a>\nto help out (also works with Brave), using his awesome <a href=\"https://citation.js.org/\">citation-js</a>\n(doi:<a href=\"https://doi.org/10.7717/peerj-cs.214\">10.7717/peerj-cs.214</a>). The above two entries in the database have\nbeen added using this add-on.</p>\n\n<p>We hope it will help other NWO grant holders too and that the add-on becomes obsolete in the near future, because the ISAAC database needs some updates elsewhere too. For example, it does not seem to value open source and open data so much yet:</p>\n\n<p><img src=\"/assets/images/bridgedb_nwo_isaac_output_types.png\" alt=\"\" /></p>\n\n<p>That is a shame.</p>",
      "summary": "Last month I reported on the start of the NWO Open Science grant and it is time for an update. First, our grant now has a grant number, 203.001.121. For a project that is about identifiers, having a project identifier is a big deal.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bridgedb_nwo_isaac.png",
      "date_published": "2022-04-17T00:00:00+00:00",
      "date_modified": "2024-12-30T00:00:00+00:00",
      "tags": ["bridgedb","openscience","isaac"],
      "_references": [{ "url": "https://doi.org/10.3897/RIO.8.E83031" },{ "url": "https://doi.org/10.5281/ZENODO.6367091" },{ "url": "https://doi.org/10.7717/peerj-cs.214" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2022/03/05/bridgedb-nwo-grant-update-1-first-steps.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2022/03/05/bridgedb-nwo-grant-update-1-first-steps.html",
      "title": "BridgeDb NWO grant update #1: first steps",
      "content_html": "<p>Last year, Denise, Tina, Marvin, and I received an <a href=\"https://www.nwo.nl/en/researchprogrammes/open-science/open-science-fund\">NWO Open Science</a>\ngrant (<a href=\"https://www.nwo.nl/en/projects/203001121\">203.001.121</a>) to improve the long running BridgeDb project, originally developed by Martijn van Iersel\n(see doi:<a href=\"https://doi.org/10.1186/1471-2105-11-5\">10.1186/1471-2105-11-5</a>). Helena joined our group as research software engineer and will work\npart-time on this grant. We started two weeks ago, so time for an update of results:</p>\n\n<ul>\n  <li>the project started after writing our data management and software sustainability plans (mostly, GitHub+Zenodo)</li>\n  <li>the project proposal has been submitted to <a href=\"https://riojournal.com/\">RIO Journal</a></li>\n  <li>created a private project in the <a href=\"https://gitlab.maastrichtuniversity.nl/\">Maastricht University GitLab</a> instance (with all tasks as issues, so that we can monitor progress)</li>\n  <li>first patches by Helena to the <a href=\"https://github.com/bridgedb/bridgedb\">BridgeDb Java library</a></li>\n  <li>factored out the <a href=\"https://github.com/bridgedb/bridgedb-webservice\">BridgeDb Webservice</a> into a separate (unpretty, see topright screenshot) repository, so that the BridgeDb Java library compiles again</li>\n  <li>Marvin update the <a href=\"https://hub.docker.com/layers/bigcatum/bridgedb/3.0.13.20220304/images/sha256-ad373eae152806d0935b751bcd06216732c7e26d3c34efba5e6a388d48c37087?context=explore\">BridgeDb Docker</a> with the latest BridgeDb 3.0.13 and the latest mapping files</li>\n</ul>\n\n<p>It should be noted that FAIRplus has funded Chris’s team to work on identifier mapping too. 
Luc, Lucas, and now Tooba in our team have been working\non Ensembl-based gene/protein identifier mappings and <a href=\"https://fairplus.github.io/the-fair-cookbook/\">FAIRplus Cookbook</a> recipes.</p>\n\n<p>Not bad, this progress in the first two weeks. We are ready now to start writing unit tests for much of the BridgeDb code. There were some, but a lot of the code is used in production without being formally tested. So far, the number of regressions due to updated libraries (dependencies) has been quite manageable. But with the work planned in this grant, we need more sustainable software, and therefore more unit testing. With the BridgeDb Webservice factored out, the code compiles again and the code coverage testing runs again.</p>\n\n<p>The BridgeDb Webservice itself needs a rewrite from scratch. At least the mapping between underlying code (which we can reuse) and the REST calls. The library we used here has never been updated and I spent last weekend figuring out how to change the code, but gave up after two days. Rewriting is faster.</p>",
      "summary": "Last year, Denise, Tina, Marvin, and I received an NWO Open Science grant (203.001.121) to improve the long running BridgeDb project, originally developed by Martijn van Iersel (see doi:10.1186/1471-2105-11-5). Helena joined our group as research software engineer and will work part-time on this grant. We started two weeks ago, so time for an update of results:",
      
      "date_published": "2022-03-05T00:00:00+00:00",
      "date_modified": "2022-03-05T00:00:00+00:00",
      "tags": ["grant","bridgedb","openscience"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-11-5" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2021/11/15/biohackathon-europe-2021-1-cito.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2021/11/15/biohackathon-europe-2021-1-cito.html",
      "title": "BioHackathon Europe 2021 #1: CiTO annotations in BioHackrXiv",
      "content_html": "<p>Serendipity. I did not plan this hack at the <a href=\"https://biohackathon-europe.org/\">BioHackathon Europe 2021</a> but it happened anyway.\nBased on earlier work in the <a href=\"https://www.biomedcentral.com/collections/cito\">Journal of Cheminformatics</a>, extending on the\n<a href=\"https://doi.org/10.7717/peerj-cs.112\">work by Krewinkel et al.</a> I looked into the idea of using the Lua filter for\n<a href=\"https://biohackrxiv.org/\">BioHackrXiv</a>, a preprint server for BioHackathons. Actually, I started by looking at the\nCitation Styling Language file used by the BioHackrXiv tools. But that was just wrong.</p>\n\n<p>Long story short: <a href=\"https://github.com/biohackrxiv/bhxiv-gen-pdf/pull/10\">it worked</a>! Thanks to the encouragements from\n<a href=\"https://github.com/pjotrp\">Pjotr</a> and <a href=\"https://github.com/inutano\">Tazro</a> and suggestions from\n<a href=\"https://twitter.com/larswillighagen/status/1458059589925187585\">Lars</a> and some code on how to\n<a href=\"http://lua-users.org/wiki/TableUtils\">dump a Lua data structure to stdout</a>.</p>\n\n<p>In the Markdown/BibTeX combination you would normally write <code class=\"language-plaintext highlighter-rouge\">[@bibtexkey]</code> to add the reference to the article with the given key\nin the <code class=\"language-plaintext highlighter-rouge\">.bib</code> file. To type the citation (to state the intention why you cite that source), for example because you use a method\nin it, you write <code class=\"language-plaintext highlighter-rouge\">[@usesMethodIn:bibtexkey]</code>. This is different from\n<a href=\"https://github.com/jcheminform/markdown-jcheminf\">how it currently works for the Journal of Cheminformatics</a>,\nwhere the intention cannot be given at citation level yet. You can even use more than one intention, e.g. 
<code class=\"language-plaintext highlighter-rouge\">[@usesMethodIn:extends:bibtexkey]</code>.</p>\n\n<p>If you want to try it, just create a compatible Markdown file with BibTeX file in a new GitHub repository, and post the repository URL on\nthis <a href=\"http://preview.biohackrxiv.org/\">cool preview website</a>.</p>\n\n<p>Here’s what the created PDF could look like:</p>\n\n<p><img src=\"/assets/images/citoBioHackrXiv.png\" alt=\"\" /></p>",
      "summary": "Serendipity. I did not plan this hack at the BioHackathon Europe 2021 but it happened anyway. Based on earlier work in the Journal of Cheminformatics, extending on the work by Krewinkel et al. I looked into the idea of using the Lua filter for BioHackrXiv, a preprint server for BioHackathons. Actually, I started by looking at the Citation Styling Language file used by the BioHackrXiv tools. But that was just wrong.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/citoBioHackrXiv.png",
      "date_published": "2021-11-15T00:00:00+00:00",
      "date_modified": "2021-11-15T00:00:00+00:00",
      "tags": ["cito","biohackrxiv","markdown","pandoc","biohackeu12"],
      "_references": [{ "url": "https://doi.org/10.7717/peerj-cs.112" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2021/08/28/scholarly-journals-should-use-archived.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2021/08/28/scholarly-journals-should-use-archived.html",
      "title": "Scholarly journals should use &quot;Archived on&quot; instead of &quot;Accessed on&quot;",
      "content_html": "<p>Publishing habits changes very slowly, too slowly. The whole industry is incredibly inert, which can lead to severe frustration\nas <a href=\"https://chem-bla-ics.blogspot.com/2021/06/conflict-of-interest-or-why-i-am.html\">it did for me</a>. But sometimes small\nchanges can do so much.</p>\n\n<p>Linkrot, the phenomenon that URLs are not persistent, has been studied, including the in scholarly settings (see\n<a href=\"https://doi.org/10.3998/3336451.0004.210\">1998</a>,\n<a href=\"https://www.jstor.org/stable/20863780\">2000</a>,\n<a href=\"https://doi.org/10.1353/pla.2003.0098\">2003</a>,\n<a href=\"https://doi.org/10.1002/bmb.2003.494031010165\">2006</a>,\n<a href=\"https://doi.org/10.1300/J123v49n03_10\">2008</a>,\n<a href=\"https://doi.org/10.1371/journal.pone.0115253\">2014</a>,\n<a href=\"https://doi.org/10.18329/09757597/2015/8105\">2015</a>,\n<a href=\"https://doi.org/10.1108/GKMC-06-2019-0067\">2000</a>,\n<a href=\"https://journal.code4lib.org/articles/15509\">2021</a>,\nand probably many more). Indeed, scholarly publishers started introducing the following: URLs should be accompanied with an\n“accessed on” statement. Indeed, you can find this in many bibliographic formatting standards.</p>\n\n<p>Indeed, this must change, and we already have a solution <a href=\"https://en.wikipedia.org/wiki/Internet_Archive\">since 1996</a>:\nthe <a href=\"https://archive.org/web/\">Internet Archive</a> (tho the archive goes back much longer). I call all publishers to change\ntheir “Accessed on” to “Archived on”. Two simpel solutions that can compliment each other:</p>\n\n<h2 id=\"authors-archive-upon-submission\">Authors archive upon submission</h2>\n\n<p>This solution is simply introduced by updating author guidelines. 
Surely it will take a bit of time for bibliography software\nto be updated, and for the time being we still write “Accessed on” until there is proper support of “Archived on”.</p>\n\n<h2 id=\"journals-archive-upon-acceptance\">Journals archive upon acceptance</h2>\n\n<p>This solution looks for all URLs in journal articles and archives them. It doesn’t matter if the author already did this,\nbecause the Internet Archive has no trouble handling this:</p>\n\n<p><img src=\"/assets/images/Screenshot_20210828_102732.png\" alt=\"\" /></p>\n\n<center>Screenshot of the WaybackMachine showing <a href=\"https://web.archive.org/web/19990615000000*/sci.kun.nl\">many captures of the sci.kun.nl domain.</a></center>\n<p><br /></p>\n\n<p>BTW, projects like Wikipedia have <a href=\"https://meta.wikimedia.org/wiki/InternetArchiveBot\">automated the process</a> of\narchiving URLs and I see no reason why publishers could not do this.</p>",
      "summary": "Publishing habits changes very slowly, too slowly. The whole industry is incredibly inert, which can lead to severe frustration as it did for me. But sometimes small changes can do so much.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20210828_102732.png",
      "date_published": "2021-08-28T00:00:00+00:00",
      "date_modified": "2021-08-28T00:00:00+00:00",
      "tags": ["publishing"],
      "_references": [{ "url": "https://doi.org/10.3998/3336451.0004.210" },{ "url": "https://doi.org/10.1353/pla.2003.0098" },{ "url": "https://doi.org/10.1002/bmb.2003.494031010165" },{ "url": "https://doi.org/10.1300/J123v49n03_10" },{ "url": "https://doi.org/10.1371/journal.pone.0115253" },{ "url": "https://doi.org/10.18329/09757597/2015/8105" },{ "url": "https://doi.org/10.1108/GKMC-06-2019-0067" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2021/02/16/downloading-all-currently-released.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2021/02/16/downloading-all-currently-released.html",
      "title": "Downloading all currently released BridgeDb identifier mapping databases",
      "content_html": "<p>The <a href=\"https://bridgedb.github.io/\">BridgeDb</a> project (doi:<a href=\"https://doi.org/10.1186/1471-2105-11-5\">10.1186/1471-2105-11-5</a>)\n(and <a href=\"https://elixir-europe.org/platforms/interoperability/rirs\">ELIXIR recommended interoperability resource</a>) has several\naims, all around identifier mapping:</p>\n\n<ul>\n  <li>provide a Java API for identifier mapping</li>\n  <li>provide ID mappings (two flavors: with and without semantic meaning)</li>\n  <li>provide services (<a href=\"https://www.bioconductor.org/packages/release/bioc/html/BridgeDbR.html\">R package</a>,\n<a href=\"http://webservice.bridgedb.org/\">OpenAPI webservice</a>)</li>\n  <li>track the history of identifiers</li>\n</ul>\n\n<p>The last one is more recent and two aspects are under development here: secondary identifiers and dead identifiers. More\nabout that in some future post. About the first and the third I am also not going to tell much in this post. Just follow the\nabove links.</p>\n\n<p>I do want to say something in this post about the actually identifier mapping databases, in particular those we distribute as\nApache Derby files, the storage format used by the Java libraries. 
These are the files you download if you want mapping databases\nfor <a href=\"https://pathvisio.github.io/\">PathVisio</a> (doi:<a href=\"https://doi.org/10.1371/journal.pcbi.1004085\">10.1371/journal.pcbi.1004085</a>).\nBridgeDb has mapping files for various things; here are some, with examples of the databases it maps between:</p>\n\n<ol>\n  <li>genes and proteins: Ensembl, UniProt, NCBI Gene</li>\n  <li>metabolites: HMDB, ChEBI, LIPID MAPS, Wikidata, CAS</li>\n  <li>publications: DOI, PubMed</li>\n  <li>macromolecular complexes: Complex Portal, Wikidata</li>\n</ol>\n\n<p>The BridgeDb API is agnostic to the things it can map identifiers for.</p>\n\n<p><strong>Downloading mapping files</strong>:\nBridgeDb has a <a href=\"https://bioschemas.org/\">BioSchemas</a>-powered\n<a href=\"https://bridgedb.github.io/data/gene_database/\">web page with an overview of the latest released mapping files</a>.\nIt looks like this:</p>\n\n<p><img src=\"/assets/images/bridgedbDownloadsImage.png\" alt=\"\" /></p>\n\n<p>This webpage is the result of the cyber attack in late 2019, which disrupted a good bit of the infrastructure. This is why we\nrenewed the website, including the download page. The new page is actually hosted <a href=\"https://github.com/bridgedb/data\">on GitHub as a Markdown file</a>,\nbut this is where things get interesting. The Markdown file is actually autogenerated from a JSON file with all the info. Everything,\nincluding the BioSchemas annotation, is created from that. Basically, JSON gets converted into Markdown (with a custom script), which\ngets converted into HTML by a GitHub Action/Pages. So, when someone releases a new mapping file on Zenodo or Figshare, they only have\nto send me a pull request with an updated JSON file.</p>\n\n<p>Now, previously, downloading all released mapping files, for example for the BridgeDb webservice, was a bit complicated. The\ninformation was an HTML file generated by the webserver for a folder. No metadata. 
Nuno wrote code to extract the relevant info\nand download all the files. However, since the information is now available in a public JSON file, it is a lot easier. The\nfollowing code uses wget and jq, two tools readily available on popular operating systems. Have fun!</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c\">#!/bin/bash</span>\n\nwget <span class=\"nt\">-nc</span> https://bridgedb.github.io/data/gene.json\nwget <span class=\"nt\">-nc</span> https://bridgedb.github.io/data/corona.json\nwget <span class=\"nt\">-nc</span> https://bridgedb.github.io/data/other.json\n\njq <span class=\"nt\">-r</span> <span class=\"s1\">'.mappingFiles | .[] | \"\\(.file)=\\(.downloadURL)\"'</span> gene.json <span class=\"o\">&gt;</span> files.txt\njq <span class=\"nt\">-r</span> <span class=\"s1\">'.mappingFiles | .[] | \"\\(.file)=\\(.downloadURL)\"'</span> corona.json <span class=\"o\">&gt;&gt;</span> files.txt\njq <span class=\"nt\">-r</span> <span class=\"s1\">'.mappingFiles | .[] | \"\\(.file)=\\(.downloadURL)\"'</span> other.json <span class=\"o\">&gt;&gt;</span> files.txt\n\n<span class=\"k\">for </span>FILE <span class=\"k\">in</span> <span class=\"si\">$(</span><span class=\"nb\">cat </span>files.txt<span class=\"si\">)</span>\n<span class=\"k\">do\n  </span>readarray <span class=\"nt\">-d</span> <span class=\"o\">=</span> <span class=\"nt\">-t</span> splitFILE<span class=\"o\">&lt;&lt;&lt;</span> <span class=\"s2\">\"</span><span class=\"nv\">$FILE</span><span class=\"s2\">\"</span>\n  <span class=\"nb\">echo</span> <span class=\"k\">${</span><span class=\"nv\">splitFILE</span><span class=\"p\">[0]</span><span class=\"k\">}</span>\n  wget <span class=\"nt\">-nc</span> <span class=\"nt\">-O</span> <span class=\"k\">${</span><span class=\"nv\">splitFILE</span><span class=\"p\">[0]</span><span class=\"k\">}</span> <span class=\"k\">${</span><span class=\"nv\">splitFILE</span><span 
class=\"p\">[1]</span><span class=\"k\">}</span>\n<span class=\"k\">done</span>\n</code></pre></div></div>\n\n<p>Actually, while writing this blog post, I notice the code can be further simplified.</p>",
      "summary": "The BridgeDb project (doi:10.1186/1471-2105-11-5) (and ELIXIR recommended interoperability resource) has several aims, all around identifier mapping:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bridgedbDownloadsImage.png",
      "date_published": "2021-02-16T00:00:00+00:00",
      "date_modified": "2021-02-16T00:00:00+00:00",
      "tags": ["bridgedb","json"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-11-5" },{ "url": "https://doi.org/10.1371/journal.pcbi.1004085" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2020/07/03/bioclipse-git-experiences-2-create.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/07/03/bioclipse-git-experiences-2-create.html",
      "title": "Bioclipse git experiences #2: Create patches for individual plugins/features",
      "content_html": "<p>This is a series of two posts repeating some content I <a href=\"https://web.archive.org/web/20180821111520/http://wiki.bioclipse.net/index.php?title=Git_Development\">wrote up back in the Bioclipse days</a>\n(see also <a href=\"https://scholia.toolforge.org/topic/Q1769726\">this Scholia page</a>). They both deal with something\nwe were facing: restructuring of version control repositories, while actually keeping the history. For\nexample, you may want to copy or move code from one repository to another. A second use case can be a file\nthat must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work,\nthere will be some specific terminology, but the approach I regularly apply in other situations.</p>\n\n<p>This second post talks about how to migrate code from one repository to another.</p>\n\n<h2 id=\"create-patches-for-individual-pluginsfeatures\">Create patches for individual plugins/features</h2>\n\n<p>While the above works pretty well, a good alternative in situations where you only need to get a\nrepository-with-history for a few plugins, is to use patch sets.</p>\n\n<p>First, initialize a new git repository, e.g. 
bioclipse.rdf:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">mkdir </span>bioclipse.rdf\n<span class=\"nb\">cd </span>bioclipse.rdf\ngit init\nnano README\ngit add README\ngit commit <span class=\"nt\">-m</span> <span class=\"s2\">\"Added README with some basic info about the new repository\"</span> README\n</code></pre></div></div>\n\n<p>Then, for each plugin you need to discover the commit in which the plugin was first committed, using the git-svn repository created earlier:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">cd </span>your.gitsvn.checkout\ngit log <span class=\"nt\">--pretty</span><span class=\"o\">=</span>oneline externals/com.hp.hpl.jena/ | <span class=\"nb\">tail</span> <span class=\"nt\">-1</span>\n</code></pre></div></div>\n\n<p>Then create patches starting from the tree just before that first commit by appending <code class=\"language-plaintext highlighter-rouge\">^1</code> to the commit hash. 
For example, the first patch of the Jena libraries was 06d0eb0542377f958d06892860ea3363e3316389, so I type:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">rm </span>00<span class=\"k\">*</span>.patch\ngit format-patch 06d0eb0542377f958d06892860ea3363e3316389^1 <span class=\"nt\">--</span> externals/com.hp.hpl.jena\n</code></pre></div></div>\n\n<p>(tune the filter when removing old patches if there are more than 99!)</p>\n\n<p>The previous two steps can be combined into a Perl script:</p>\n\n<div class=\"language-perl highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">#!/usr/bin/perl</span>\n<span class=\"k\">use</span> <span class=\"nv\">diagnostics</span><span class=\"p\">;</span>\n<span class=\"k\">use</span> <span class=\"nv\">strict</span><span class=\"p\">;</span>\n\n<span class=\"k\">my</span> <span class=\"nv\">$plugin</span> <span class=\"o\">=</span> <span class=\"nv\">$ARGV</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">];</span>\n\n<span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"o\">!</span><span class=\"nv\">$plugin</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">Syntax: gfp &lt;plugin|feature&gt;</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n  <span class=\"nb\">exit</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">);</span>\n<span class=\"p\">}</span>\n\n<span class=\"nb\">die</span> <span class=\"p\">\"</span><span class=\"s2\">Cannot find plugin or feature </span><span class=\"si\">$plugin</span><span class=\"s2\"> !</span><span class=\"p\">\"</span> <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"o\">!</span><span class=\"p\">(</span><span class=\"o\">-</span><span class=\"nv\">e</span> <span 
class=\"nv\">$plugin</span><span class=\"p\">));</span>\n\n<span class=\"p\">`</span><span class=\"sb\">rm -f *.patch</span><span class=\"p\">`;</span>\n<span class=\"k\">my</span> <span class=\"nv\">$hash</span> <span class=\"o\">=</span> <span class=\"p\">`</span><span class=\"sb\">git log --follow --pretty=oneline </span><span class=\"si\">$plugin</span><span class=\"sb\"> | tail -1 | cut -d' ' -f1</span><span class=\"p\">`;</span>\n<span class=\"nv\">$hash</span> <span class=\"o\">=~</span> <span class=\"sr\">s/\\n|\\r//g</span><span class=\"p\">;</span>\n\n<span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">Plugin: </span><span class=\"si\">$plugin</span><span class=\"s2\"> </span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n<span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">Hash: </span><span class=\"si\">$hash</span><span class=\"s2\"> </span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n<span class=\"p\">`</span><span class=\"sb\">git format-patch </span><span class=\"si\">$hash</span><span class=\"sb\">^1 -- </span><span class=\"si\">$plugin</span><span class=\"p\">`;</span>\n</code></pre></div></div>\n\n<p>Move these patches into your new repository:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">mv </span>00<span class=\"k\">*</span>.patch ../bioclipse.rdf\n</code></pre></div></div>\n\n<p>(tune the filter when moving the patches if there are more than 99! 
Also customize the target folder name to match your situation)</p>\n\n<p>Apply the new patches in your new git repository:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">cd</span> ../bioclipse.rdf\ngit am 00<span class=\"k\">*</span>.patch\n</code></pre></div></div>\n\n<p>(You’re on your own if that fails… and you may have to fall back to the other alternative then)</p>\n\n<p>Repeat those two steps for all plugins you want in your new repository.</p>",
      "summary": "This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring of version control repositories, while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case can be a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but the approach I regularly apply in other situations.",
      
      "date_published": "2020-07-03T00:00:00+00:00",
      "date_modified": "2020-07-03T00:00:00+00:00",
      "tags": ["bioclipse","git"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2020/07/02/bioclipse-git-experiences-1-strip-away.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2020/07/02/bioclipse-git-experiences-1-strip-away.html",
      "title": "Bioclipse git experiences #1: Strip away unwanted plugins",
      "content_html": "<p>This is a series of two posts repeating some content I <a href=\"https://web.archive.org/web/20180821111520/http://wiki.bioclipse.net/index.php?title=Git_Development\">wrote up back in the Bioclipse days</a>\n(see also <a href=\"https://scholia.toolforge.org/topic/Q1769726\">this Scholia page</a>). They both deal with something\nwe were facing: restructuring of version control repositories, while actually keeping the history. For\nexample, you may want to copy or move code from one repository to another. A second use case can be a file\nthat must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work,\nthere will be some specific terminology, but the approach I regularly apply in other situations.</p>\n\n<p>For this first post, think of a <em>plugin</em> as a subfolder, tho it even applies to files.</p>\n\n<h2 id=\"strip-away-unwanted-plugins\">Strip away unwanted plugins</h2>\n\n<p>In this case, you remove everything you do not want in your new git repository. 
Do:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>git clone <span class=\"nt\">--bare</span> <span class=\"nt\">--no-hardlinks</span> old.local.clone/ new.local.clone/\n</code></pre></div></div>\n\n<p>Then use:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>git filter-branch <span class=\"nt\">--index-filter</span> <span class=\"s1\">'git rm -r -q --cached --ignore-unmatch plugins/net.bioclipse.actionHistory plugins/net.bioclipse.analysis'</span> HEAD\n</code></pre></div></div>\n\n<p>It often happens that you need to run the above command several times, in cases when there are many subdirectories to be removed.\nOnce you have removed all the bits you want removed, you can clean up the repository and reduce the size considerably with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>git repack <span class=\"nt\">-ad</span><span class=\"p\">;</span> git prune\n</code></pre></div></div>",
      "summary": "This is a series of two posts repeating some content I wrote up back in the Bioclipse days (see also this Scholia page). They both deal with something we were facing: restructuring of version control repositories, while actually keeping the history. For example, you may want to copy or move code from one repository to another. A second use case can be a file that must be removed (there are valid reasons for that). Because these posts are based on Bioclipse work, there will be some specific terminology, but the approach I regularly apply in other situations.",
      
      "date_published": "2020-07-02T00:00:00+00:00",
      "date_modified": "2020-07-02T00:00:00+00:00",
      "tags": ["bioclipse","git"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2019/03/30/what-metabolites-are-found-in-which.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2019/03/30/what-metabolites-are-found-in-which.html",
      "title": "What metabolites are found in which species? Nanopublications from Wikidata",
      "content_html": "<p>In December I reported about Groovy <a href=\"https://chem-bla-ics.linkedchemistry.info/2018/12/27/creating-nanopublications-with-groovy.html\">code to create nanopublications <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThis has been running for some time now, extracting nanopubs that assert that some\nmetabolite is found in some species. I send the resulting nanopubs to\n<a href=\"https://scholia.toolforge.org/author/Q42027946\">Tobias Kuhn <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, to populate his\n<em>Growing Resource of Provenance-Centric Scientific Linked Data</em>\n(doi:<a href=\"https://doi.org/10.1109/eScience.2018.00024\">10.1109/eScience.2018.00024</a>,\n<a href=\"https://arxiv.org/pdf/1809.06532.pdf\">PDF</a>).</p>\n\n<p>Each data set comes with <a href=\"http://np.inn.ac/RA6KPZ2qS8joGDOA9EvfcNHeNsg6nI2_T1YePsYMjL9io\">an index pointing to the individual nanopubs</a>,\nand that looks like this:</p>\n\n<p><img src=\"/assets/images/nanopubs.png\" alt=\"\" /></p>\n\n<p>I wonder what options I have to to archive the full set up nanopublications on\nFigshare or Zenodo, and see that DOI show up here…</p>",
      "summary": "In December I reported about Groovy code to create nanopublications . This has been running for some time now, extracting nanopubs that assert that some metabolite is found in some species. I send the resulting nanopubs to Tobias Kuhn , to populate his Growing Resource of Provenance-Centric Scientific Linked Data (doi:10.1109/eScience.2018.00024, PDF).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/nanopubs.png",
      "date_published": "2019-03-30T00:00:00+00:00",
      "date_modified": "2024-11-03T00:00:00+00:00",
      "tags": ["nanopub","cheminf","wikidata"],
      "_references": [{ "url": "https://doi.org/10.1109/ESCIENCE.2018.00024" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2018/12/27/creating-nanopublications-with-groovy.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/12/27/creating-nanopublications-with-groovy.html",
      "title": "Creating nanopublications with Groovy",
      "content_html": "<p><img style=\"float: right\" width=\"200\" src=\"/assets/images/Screenshot_20181227_075006.png\" />\nYesterday, I struggled some with creating <a href=\"http://nanopub.org/\">nanopublications</a> with <a href=\"https://en.wikipedia.org/wiki/Apache_Groovy\">Groovy</a>.\nMy first attempt was an utter failure, but then I discovered <a href=\"https://twitter.com/txkuhn\">Thomas Kuhn</a>’s\n<a href=\"https://github.com/Nanopublication/nanopub-java/blob/master/src/main/java/org/nanopub/NanopubCreator.java\">NanopubCreator</a>\nand it was downhill from there.</p>\n\n<p>On the right, a depiction is given of a compound found in Taphrorychus bicolor (doi:<a href=\"https://doi.org/10.1002/JLAC.199619961005\">10.1002/JLAC.199619961005</a>).\nPublished in <em>Liebigs Annalen</em>, see <a href=\"https://chem-bla-ics.blogspot.com/2018/12/from-annalen-der-pharmacie-to-european.html\">this post</a>\nabout the history of that journal.</p>\n\n<p>There are two good things about this. First, I now have a <a href=\"https://github.com/egonw/wikidataNanopublications\">code base</a>\nthat I can easily repurpose to make <em>trusty nanopublications</em> (doi:<a href=\"10.1007/978-3-319-07443-6_27\">10.1007/978-3-319-07443-6_27</a>)\nabout anything structured as a table (so can you).</p>\n\n<p>Second, I now about almost 1200 CCZero nanopublications that tell you in which species a certain metabolite\nhas been found. Sourced from <a href=\"https://wikidata.org/\">Wikidata</a>, using <a href=\"https://query.wikidata.org/\">their SPARQL end point</a>.\nThis collection is a bit boring that this moment, and most of them are human metabolites, where the source is either\n<a href=\"https://tools.wmflabs.org/scholia/work/Q28601559\">Recon 2.2</a> or <a href=\"https://wikipathways.org/\">WikiPathways</a>.\nBut I expect (hope) to see more DOIs to show up. 
Think\n<em><a href=\"https://blogs.biomedcentral.com/bmcblog/2018/11/01/challenge-reuse-additional-files-supplementary-information/\">We challenge you to reuse Additional Files</a></em>.</p>\n\n<p>Finally, you are probably interested in learning what one of the created nanopublications looks like, so I put\n<a href=\"https://gist.github.com/egonw/5fb0994cac6f9e851f3857cd306f0890\">a Gist online</a>:</p>\n\n<pre><code class=\"language-trig\">@prefix this: &lt;http://www.bigcat.unimaas.nl/nanopubs/wikidata/tmp/np742.RAwXcetTykN6UPVzBOyatKm30mbT6endXfDrxnarRysL0&gt; .\n@prefix sub: &lt;http://www.bigcat.unimaas.nl/nanopubs/wikidata/tmp/np742.RAwXcetTykN6UPVzBOyatKm30mbT6endXfDrxnarRysL0#&gt; .\n@prefix wd: &lt;http://www.wikidata.org/entity/&gt; .\n@prefix np: &lt;http://www.nanopub.org/nschema#&gt; .\n@prefix has-source: &lt;http://semanticscience.org/resource/SIO_000253&gt; .\n@prefix has-inchikey: &lt;http://semanticscience.org/resource/CHEMINF_000399&gt; .\n@prefix orcid: &lt;http://orcid.org/&gt; .\n@prefix wdt: &lt;http://www.wikidata.org/prop/direct/&gt; .\n@prefix owl: &lt;http://www.w3.org/2002/07/owl#&gt; .\n@prefix pav: &lt;http://purl.org/pav/&gt; .\n@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .\n@prefix skos: &lt;http://www.w3.org/2004/02/skos/core#&gt; .\n\nsub:Head {\n        this: np:hasAssertion sub:assertion ;\n                np:hasProvenance sub:provenance ;\n                np:hasPublicationInfo sub:pubinfo ;\n                a np:Nanopublication .\n}\n\nsub:assertion {\n        wd:Q15978631 rdfs:label \"Homo sapiens\"@en ;\n                skos:exactMatch &lt;http://purl.obolibrary.org/obo/NCBITaxon_9606&gt; .\n\n        wd:Q27125029 has-inchikey: \"APJYDQYYACXCRM-UHFFFAOYSA-O\" ;\n                rdfs:label \"tryptaminium\"@en ;\n                wdt:P703 wd:Q15978631 .\n}\n\nsub:provenance {\n        sub:assertion has-source: wd:Q2013 , wd:Q28601559 .\n\n        wd:Q28601559 rdfs:label \"Recon 2.2: from reconstruction to model of 
human metabolism\"@en ;\n                owl:sameAs &lt;https://doi.org/10.1007/S11306-016-1051-4&gt; .\n}\n\nsub:pubinfo {\n        this: pav:createdBy orcid:0000-0001-7542-0286 .\n}\n</code></pre>",
      "summary": "Yesterday, I struggled some with creating nanopublications with Groovy. My first attempt was an utter failure, but then I discovered Thomas Kuhn’s NanopubCreator and it was downhill from there.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20181227_075006.png",
      "date_published": "2018-12-27T00:00:00+00:00",
      "date_modified": "2018-12-27T00:00:00+00:00",
      "tags": ["nanopub","wikidata","groovy"],
      "_references": [{ "url": "https://doi.org/10.1007/978-3-319-07443-6_27" },{ "url": "https://doi.org/10.1002/JLAC.199619961005" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2018/11/17/join-me-in-encouraging-acs-to-join.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/11/17/join-me-in-encouraging-acs-to-join.html",
      "title": "Join me in encouraging the ACS to join the Initiative for Open Citations",
      "content_html": "<p>My research is into abstract representation of chemical information, important for other research to be performed. Indeed, my work\nis generally reused, but knowing which research fields my work is used in, or which societal problems it is helping solve, is not\neasily retrieved or determined. Efforts like <a href=\"https://meta.wikimedia.org/wiki/WikiCite\">WikiCite</a> and\n<a href=\"https://tools.wmflabs.org/scholia/topic/Q45340488\">Scholia</a> do allow me to navigate the citation network, so that I can determine\nwhich research fields my output influences and which diseases are studied with methods I proposed. Here’s a\n<a href=\"https://query.wikidata.org/embed.html#%23defaultView%3AGraph%0ASELECT%0A%20%20%3Ftopic1%20%3Ftopic1Label%20%3Ftopic2%20%3Ftopic2Label%20%3Fcount%0AWITH%20%7B%0A%20%20SELECT%0A%20%20%20%20(COUNT(%3Fwork)%20AS%20%3Fcount)%20%3Ftopic1%20%3Ftopic2%0A%20%20WHERE%20%7B%0A%20%20%20%20%23%20Find%20works%20that%20are%20marked%20with%20main%20subject%20of%20the%20topic.%0A%20%20%20%20%3Fwork%20wdt%3AP2860%2Fwdt%3AP50%20wd%3AQ20895241%20.%0A%20%20%20%20%0A%20%20%20%20%23%20Identify%20co-occuring%20topics.%20%0A%20%20%20%20%3Fwork%20wdt%3AP921%20%3Ftopic1%2C%20%3Ftopic2%20.%20%0A%0A%20%20%20%20%23%20article%20by%20author%0A%20%20%20%20MINUS%20%7B%20%3Fwork%20wdt%3AP50%20wd%3AQ20895241%20.%20%7D%0A%20%20%20%20FILTER%20(%20%3Ftopic1%20!%3D%20%3Ftopic2%20)%0A%20%20%7D%0A%20%20GROUP%20BY%20%3Ftopic1%20%3Ftopic2%0A%20%20ORDER%20BY%20DESC(%3Fcount)%0A%0A%20%20%23%20There%20a%20performance%20problems%20in%20the%20browser%3A%20We%20cannot%20show%20large%20graphs%2C%0A%20%20%23%20so%20we%20put%20a%20limit%20on%20the%20number%20of%20links%20displayed.%0A%20%20LIMIT%20400%0A%0A%7D%20AS%20%25results%0AWHERE%20%7B%0A%20%20INCLUDE%20%25results%0A%20%20%0A%20%20%23%20Label%20the%20results%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%0A%20%20%20%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%2Cda%2Cde%2Ces%2Cfr%2Cjp%2Cnl%2Cno%2Cru%2Csv%2C
zh%22.%0A%20%20%7D%0A%7D%0A%0A\">network of topics of articles citing my work</a>:</p>\n\n<p><img src=\"/assets/images/exploring.png\" alt=\"\" /></p>\n\n<p>Graphs like this show how people are using my work, which in turn allows me to support them further. But this relies on\nopen citations.</p>\n\n<p>In my opinion, citations are an essential part of our research process. They give us access to important prior work on which a study\nis based, and reflect how a work influences other research or even is essential to that other work. For example, they allow us\nto not repeat earlier published work, while preserving the ability to reproduce the full work. The\n<a href=\"https://i4oc.org/\">Initiative for Open Citations</a> encourages these citations to be publicly available to benefit research, by\nremoving barriers to accessing this critical part of scholarly communication. While many societies and publishers have joined this\ninitiative, the <a href=\"https://pubs.acs.org/\">American Chemical Society</a> (ACS) has not yet. By not joining, they limit the sharing of\nknowledge for unclear reasons.</p>\n\n<p>And I would really like to see the ACS join this initiative, and have proposed this a few times already. Because they still have\nnot joined the initiative, I have <a href=\"https://www.change.org/p/the-american-chemical-society-to-join-the-initiative-for-open-citations\">started this petition</a>.\nIf you agree, please sign and share it with others.</p>",
      "summary": "My research is into abstract representation of chemical information, important for other research to be performed. Indeed, my work is generally reused, but knowing which research fields my work is used in, or which societal problems it is helping solve, is not easily retrieved or determined. Efforts like WikiCite and Scholia do allow me to navigate the citation network, so that I can determine which research fields my output influences and which diseases are studied with methods I proposed. Here’s a network of topics of articles citing my work:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/exploring.png",
      "date_published": "2018-11-17T00:00:00+00:00",
      "date_modified": "2018-11-17T00:00:00+00:00",
      "tags": ["acs","i4oc","publishing"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2018/09/08/also-new-this-week-google-dataset-search.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2018/09/08/also-new-this-week-google-dataset-search.html",
      "title": "Also new this week: &quot;Google Dataset Search&quot;",
      "content_html": "<p>There was a lot of Open Science news this week. The <a href=\"https://www.blog.google/products/search/making-it-easier-discover-datasets/\">announcement</a>\nof the <a href=\"https://toolbox.google.com/datasetsearch\">Google Dataset Search</a> was one of them:</p>\n\n<p><img src=\"/assets/images/google_dataset_search.png\" alt=\"\" /></p>\n\n<p>Of course, I first tried searching for “<a href=\"https://toolbox.google.com/datasetsearch/search?query=RDF%20chemistry&amp;docid=hiQ14TdWzjx%2FQ37gAAAAAA%3D%3D\">RDF chemistry</a>”\nwhich shows some of my data sets (and a lot more):</p>\n\n<p><img src=\"/assets/images/google_dataset_search2.png\" alt=\"\" /></p>\n\n<p>It picks up data from many sources, such as <a href=\"https://figshare.com/\">Figshare</a> in this image. That means it also works\n(well, sort of, as <a href=\"https://twitter.com/baoilleach/status/1037986030266318848\">Noel O’Boyle noticed</a>) for\nsupplementary information from the <a href=\"https://jcheminf.biomedcentral.com/\">Journal of Cheminformatics</a>.</p>\n\n<p>It picks up metadata in several ways, among which <a href=\"http://schemas.org/\">schemas.org</a>. So, next week we’ll see if\nwe can get <a href=\"http://enanomapper.net/\">eNanoMapper</a> extended to spit compatible JSON-LD for its data sets, called “bundles”.</p>\n\n<h2 id=\"integrated-with-google-scholar\">Integrated with Google Scholar?</h2>\n\n<p>While the URL for the search engine does not suggest the service is more than a 20% project, we can\nhope it will stay around like Google Scholar has been. But I do hope they will further integrate it\nwith Scholar. For example, in the above figure, it did pick up that I am the author of that data set\n(well, repurposed from an effort of <a href=\"https://twitter.com/rapodaca\">Rich Apodaca</a>), it did not figure\nout that I am also on Scholar.</p>\n\n<p>So, these data sets do not show up in your Google Scholar profile yet, but they <strong><em>must</em></strong>. 
Time will\ntell where this data search engine is going. There are many interesting features, and given the amount\nof online attention, they won’t stop development just yet, and I expect to discover more and better\nfeatures in the next months. Give it a spin!</p>",
      "summary": "There was a lot of Open Science news this week. The announcement of the Google Dataset Search was one of them:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/google_dataset_search2.png",
      "date_published": "2018-09-08T00:00:00+00:00",
      "date_modified": "2018-09-08T00:00:00+00:00",
      "tags": ["data","google"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2017/12/15/new-paper-integration-among-databases.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2017/12/15/new-paper-integration-among-databases.html",
      "title": "New paper: &quot;Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations&quot;",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/1-s2.0-S2452074817301398-gr1.png\" width=\"200\" alt=\"Figure 1 from the NanoImpact article. CC-BY.\" />\nThe U.S.A and European nanosafety communities have a longstanding history of collaboration. On both sides there are working groups,\n<a href=\"https://nciphub.org/groups/nanowg\">NanoWG</a> and <a href=\"https://www.nanosafetycluster.eu/working-groups/wg-f-data-management.html\">WG-F</a> (previously called\nWG4) of the NanoSafety Cluster. I have been chair of WG4 for about three years and still active in the group, though in the past half year, without\ndedicated funding, less active. That is already changing again with the imminent start of the\n<a href=\"https://twitter.com/iseult5/status/836879814581698560\">NanoCommons</a> project.</p>\n\n<p>One of these collaborations resulted in a series of papers around data curation (see\ndoi:<a href=\"https://doi.org/10.1039/C5NR08944A\">10.1039/C5NR08944A</a> and\ndoi:<a href=\"https://doi.org/10.3762/bjnano.6.189\">10.3762/bjnano.6.189</a>). Part of this effort was also an survey about the state of databases. A good\nnumber of databases responded to the call. It turned out non-trivial to analyse the results and write up a report around it with recommendations.\nThe first version was submitted and rejected, and with fresh leadership, the paper underwent a significant restructuring by\n<a href=\"http://www.codata.org/events/codata-prize/2006-john-rumble-usa\">John Rumble</a> and resubmitted to Elsevier’s\n<a href=\"http://www.sciencedirect.com/science/journal/24520748\">NanoImpact</a> and now online\n(doi:<a href=\"http://dx.doi.org/10.1016/j.impact.2017.11.002\">10.1016/j.impact.2017.11.002</a>).</p>\n\n<p>The paper outlines an overview of challenges and a recommendation to the community on how to proceed. 
That is, basically, how should projects\nlike <a href=\"https://search.data.enanomapper.net/\">eNanoMapper</a>, <a href=\"https://cananolab.nci.nih.gov/caNanoLab/\">caNanoLab</a>, and\n<a href=\"https://www.nanomaterialregistry.org/\">Nanomaterial Registry</a> evolve, and what might the\n<a href=\"https://echa.europa.eu/-/eu-observatory-for-nanomaterials-launched\">European Union Observatory for Nanomaterials</a> (EUON) look like. BTW, a\nsimilar paper by Tropsha et al. was published the other week with a focus on the USA database ecosystem\n(doi:<a href=\"https://doi.org/10.1038/nnano.2017.233\">10.1038/nnano.2017.233</a>).</p>\n\n<p>Have fun reading <a href=\"https://doi.org/10.1016/j.impact.2017.11.002\">it</a>, and if you are working in a related field, please join\neither of the two aforementioned working groups! And a huge thanks to everyone involved, particularly Sandra, John, and Christine.</p>",
      "summary": "The U.S.A and European nanosafety communities have a longstanding history of collaboration. On both sides there are working groups, NanoWG and WG-F (previously called WG4) of the NanoSafety Cluster. I have been chair of WG4 for about three years and still active in the group, though in the past half year, without dedicated funding, less active. That is already changing again with the imminent start of the NanoCommons project.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/1-s2.0-S2452074817301398-gr1.png",
      "date_published": "2017-12-15T00:00:00+00:00",
      "date_modified": "2017-12-15T00:00:00+00:00",
      "tags": ["nanosafety","enanomapper","nanocommons","eunsc"],
      "_references": [{ "url": "https://doi.org/10.1039/C5NR08944A" },{ "url": "https://doi.org/10.3762/bjnano.6.189" },{ "url": "https://doi.org/10.1016/J.IMPACT.2017.11.002" },{ "url": "https://doi.org/10.1038/nnano.2017.233" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2017/11/26/winter-solstice-challenge-what-is-your.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2017/11/26/winter-solstice-challenge-what-is-your.html",
      "title": "Winter solstice challenge: what is your Open Knowledge score?",
      "content_html": "<p><img src=\"/assets/images/Robert_Snache_-_Spirithands.net_-_Winter_Solstice_Lunar_Eclipse_Startrails_(by).jpg\" style=\"width: 30%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Photo of a time laps of a starry night, making the stars show as lines in the sky. Source: Wikimedia, CC-BY 2.0, https://commons.wikimedia.org/wiki/File:Robert_Snache_-_Spirithands.net_-_Winter_Solstice_Lunar_Eclipse_Startrails_(by).jpg)\" />\nHi all, welcome to this winter solstice challenge! Umm, to not give our southern hemisphere colleagues\nnot a disadvantage, as their winter solstice has already passes, you’re up for a summer solstice challenge!</p>\n\n<h2 id=\"introduction\">Introduction</h2>\n\n<p>So, you know <a href=\"http://impactstory.org/\">ImpactStory</a> and <a href=\"http://altmetric.com/\">Altmetric.com</a> (if not,\n<a href=\"https://chem-bla-ics.blogspot.com/search?q=impactstory&amp;max-results=20&amp;by-date=true\">browse</a>\n<a href=\"https://chem-bla-ics.blogspot.com/search?q=altmetric&amp;max-results=20&amp;by-date=true\">my blog</a>);\nthese are wonderful tools to see what people are doing with your work. I hope you already know about\n<a href=\"http://opencitations.net/\">OpenCitations</a>, a collaboration of publishers, CrossRef, and many others, to\nmake all citation data available. They just passed the 50% milestone, congratulations on that amazing\nachievement! For the younger scientists it may be worth realizing that for the past 20 years, at least,\nthis data was copyrighted and not to be used unless you paid. Elsevier is, BTW,\n<a href=\"https://opencitations.wordpress.com/2017/11/24/elsevier-references-dominate-those-that-are-not-open-at-crossref/\">the major culprit</a>\nstill claiming IP on this, but RT this if you are surprised.</p>\n\n<p>So, the reason I introduce both ImpactStory and OpenCitations is the following. Scientific articles are\ndata and knowledge dense documents. 
They would be even denser if we did not redirect the reader to other literature, which may give\na more complete sketch of the context, describe a measurement protocol, describe how certain knowledge\nwas derived, etc. Therefore, just having your article Open Access is not enough: the articles you cite\nshould be Open Access too. That’s the next phase in really making an effort to have\n<a href=\"https://en.wikisource.org/wiki/Universal_Declaration_of_Human_Rights\">all of humanity benefit from the fruits of science</a>.</p>\n\n<p>I know it is hard already to calculate an “Open Access” score, though ImpactStory does a great job at\nthat! So, calculating this for your papers and the papers those papers cite is even harder. You may\nneed to brush up on your algorithm and programming skills.</p>\n\n<h2 id=\"eligibility\">Eligibility</h2>\n\n<p>Anyone is allowed to participate. Submission of your entry is done online, e.g. in your blog, in a public\nwrite up, or even an <a href=\"https://en.wikipedia.org/wiki/Open_notebook_science\">open notebook</a>!\nHowever, you need at least one citable research object. That is, it\nneeds a DOI. Otherwise, I cannot give you the prize (see below). The score should be based on all your\nproducts. Bonus points for those who include software and data citations. Excluding citable objects to\nboost your score (for example, I would have to exclude my book chapters) is seen as cheating the system.</p>\n\n<p><img src=\"/assets/images/800px-Global_key-route_main_paths_for_a_citation_network.svg.png\" style=\"width: 40%; display: block; margin-left: auto; margin-right: auto; float: right\" alt=\"Your article B may cite three articles (C, D, J) but article D also cited articles (F, I). So, your Open Knowledge score is recursive. Source: Wikipedia, CC-BY-SA 4.0, https://commons.wikimedia.org/wiki/File:Global_key-route_main_paths_for_a_citation_network.svg\" /></p>\n\n<h2 id=\"depth\">Depth</h2>\n\n<p>Calculating your Open Knowledge score can be done at multiple levels. 
After all, your article depends on\n(cites) articles, and your software depends on libraries, but those cited articles and software\ndependencies recursively also cite articles and/or software. The complexity is non-trivial, making it\na perfect solstice challenge indeed!</p>\n\n<h2 id=\"prizes\">Prizes</h2>\n\n<p>The prize I have to offer is my continued commitment to Open Science, but that you already get for\nfree and may not be enough of a boon. So, instead, soon after the winter/summer solstice at the end of this year,\nI will blog about your research, boosting your <a href=\"https://en.wikipedia.org/wiki/Altmetrics\">#altmetrics</a>\nscores. Yes, I will actually read and try to understand it!</p>\n\n<p>And because there are the results and the method, neither of which exists yet, there are two categories! I just\n<strong><em>doubled your chance</em></strong> of winning! That’s because humanity is worth it! One prize for the best tool to calculate\nyour Open Knowledge score, and one prize for the researcher with the highest score.</p>\n\n<h2 id=\"audience-prize\">Audience Prize</h2>\n\n<p>If someone feels a need to organize an audience prize, this is very much encouraged! (Assuming Open approaches, of course :)</p>",
      "summary": "Hi all, welcome to this winter solstice challenge! Umm, to not give our southern hemisphere colleagues not a disadvantage, as their winter solstice has already passes, you’re up for a summer solstice challenge!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Robert_Snache_-_Spirithands.net_-_Winter_Solstice_Lunar_Eclipse_Startrails_(by).jpg",
      "date_published": "2017-11-26T00:00:00+00:00",
      "date_modified": "2017-11-26T00:00:00+00:00",
      "tags": ["solstice","altmetrics","opencitations"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2017/10/15/two-conference-proceedings.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2017/10/15/two-conference-proceedings.html",
      "title": "Two conference proceedings: nanopublications and Scholia",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/Screenshot_20171015_131507.png\" width=\"300\" alt=\"The nanopublication conference article in Scholia.\" />\nIt takes effort to move scholarly publishing forward. And the traditional publishers have not all shown to\nbe good at that: we’re still basically stuck with machine-broken channels like PDFs and ReadCubes. They seem\nto all love text mining, but only if they can do it themselves.</p>\n\n<p>Fortunately, there are plenty of people who do like to make a difference and like to innovate. I find this\nimportant, because if we do not do it, who will. Two people who make an effort are two researchers who\nrecently published their work as conference proceedings: <a href=\"http://www.tkuhn.org/\">Tobias Kuhn</a> and\n<a href=\"https://github.com/fnielsen\">Finn Nielsen</a>. And I am happy to have been able to contribute to both efforts.</p>\n\n<h2 id=\"nanopublications\">Nanopublications</h2>\n\n<p>Tobias works on <a href=\"https://web.archive.org/web/20171004200524/http://nanopub.org/wordpress/\">nanopublications <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nwhich innovates how we make knowledge machine\nreadable. And I have stressed how important this is in my blog for years. Nanopublications describe how\nknowledge is captures, makes it FAIR, but importantly, it links the knowledge to the research that led to the\nknowledge. His <a href=\"https://doi.org/10.1007/978-3-319-68288-4_26\">recent conference proceedings</a>\ndetails how nanopublications can be used to establish incremental\nknowledge. That is, given two sets of nanopubblications, it determines which have been removed, added, and\nchanged. 
The paper continues by outlining how that can be used to reduce, for example, download sizes and how\nit can help establish an efficient change history.</p>\n\n<h2 id=\"scholia\">Scholia</h2>\n\n<p>And Finn developed <a href=\"https://scholia.toolforge.org/\">Scholia <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, an interface not unlike Web-of-Science, but\nthen based on <a href=\"http://wikidata.org/\">Wikidata</a> and therefore fully on CCZero data, and with a community\nactively adding the full history of scholarly literature and the citations between papers, courtesy of the\n<a href=\"https://i4oc.org/\">Initiative for Open Citations</a>. This is opening up a lot of possibilities: from keeping\ntrack of articles citing your work, to getting alerts about articles publishing new data on your favorite gene or\nmetabolite.</p>",
      "summary": "It takes effort to move scholarly publishing forward. And the traditional publishers have not all shown to be good at that: we’re still basically stuck with machine-broken channels like PDFs and ReadCubes. They seem to all love text mining, but only if they can do it themselves.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20171015_131507.png",
      "date_published": "2017-10-15T00:00:00+00:00",
      "date_modified": "2025-01-02T00:00:00+00:00",
      "tags": ["scholia","nanopub"],
      "_references": [{ "url": "https://doi.org/10.48550/ARXIV.1703.04222" },{ "url": "https://doi.org/10.1007/978-3-319-68288-4_26" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2016/03/27/migrating-pka-data-from-drugmet-to.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/03/27/migrating-pka-data-from-drugmet-to.html",
      "title": "Migrating pKa data from DrugMet to Wikidata",
      "content_html": "<p>In 2010 <a href=\"https://twitter.com/smllmp\">Samuel Lampa</a> and I started a pet project:\ncollecting pK<sub>a</sub> data: he was working on RDF extension of MediaWiki and I like consuming\nRDF data. We started <a href=\"http://drugmet.rilspace.org/wiki/Main_Page\">DrugMet</a>.\nWhen you read this post, this MediaWiki installation may already be down, which\nis why I am migrating the data to <a href=\"https://en.wikipedia.org/wiki/Wikidata\">Wikidata</a>.\nWhy? Because data curation takes effort, I like to play with Wikidata (see\n<a href=\"http://rio.pensoft.net/articles.php?id=7573\">this H2020 proposal</a> by \n<a href=\"https://twitter.com/EvoMRI\">Daniel Mietchen</a> <em>et al.</em>), I like Open Data, and it still\n<a href=\"http://proteinsandwavefunctions.blogspot.nl/2016/03/generating-protonation-states-and.html\">much needed</a>.</p>\n\n<p>We opted for a page with the minimal amount of information. To maximize the speed\nat which we could add information. However, when it came to semantics, we tried\nto be as explicit as possible, and, e.g. use <a href=\"https://doi.org/10.1371/journal.pone.0025513\">the CHEMINF ontology</a>.\nSo, it collected:</p>\n\n<ol>\n  <li>InChIKey (used to show images)</li>\n  <li>the paper it was collected from (identified by a DOI)</li>\n  <li>the value, and where possible, the experimental error</li>\n</ol>\n\n<p>A page typically looks something like this:</p>\n\n<p><img src=\"/assets/images/pKa.png\" alt=\"\" /></p>\n\n<p>While not used on all pages, at some point I even started using templates, and\nI used these two, for molecules and papers:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>{{Molecule\n  |Name=\n  |InChIKey=\n  |DOI=\n  |Wikidata=\n}}\n\n{{Paper\n  |DOI=\n  |Year=\n  |Wikidata=\n}}\n</code></pre></div></div>\n\n<p>These templates, as well as the above screenshot, already contain a spoiler, but\nmore about that later. 
Using MediaWiki functionality it was now easy to make lists,\ne.g. for all pK<sub>a</sub> data (more spoilers):</p>\n\n<p><img src=\"/assets/images/pKa1.png\" alt=\"\" /></p>\n\n<p>I find a database like this very important. It does not capture all the information\nit should be capturing, though, as is clear from <a href=\"https://www.overleaf.com/read/wqfsxrgrrbzx\">the proposal</a>\nsome of us worked on a while back. However, this project was put on hold; I don’t\nhave time for it anymore, and it is not core enough to our department to spend\ntime on writing grant proposals for it.</p>\n\n<p>But I still do not want this data to get lost. Wikidata is something I have\nstarted using, as it is a machine readable CCZero database with an increasing\namount of scientific knowledge. More and more people are working on it, and you\nmust absolutely <a href=\"http://dx.doi.org/10.1093/database/baw015\">read this paper</a>\nabout this very topic (by <a href=\"https://bitbucket.org/sulab/wikidatabots\">a great team</a>\nyou should track, anyway). I am using it myself as a source of identifier mappings\nand more. 
So, migrating the previously collected data to Wikidata makes perfect\nsense to me:</p>\n\n<ol>\n  <li>if a compound is missing, I can easily <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html\">create a new one using Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>if a paper is missing, I can easily <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html\">create a new one using Magnus Manske’s QuickStatements <i class=\"fa-solid fa-recycle fa-xs\"></i></a></li>\n  <li>Wikidata has a pretty decent provenance model</li>\n</ol>\n\n<p>I can annotate data with the data source (paper) it came from and also with experimental conditions:</p>\n\n<p><img src=\"/assets/images/pKa2.png\" alt=\"\" /></p>\n\n<p>In fact, you’ll note that the book is a separate Wikidata entry in itself.\nEven better, it’s an ‘edition’ of the book. This is the whole point we make in\nthe above linked H2020 proposal: Wikidata is not a database specific for one\ndomain, it works for any (scholarly) domain, and seamlessly links all those\ndomains.</p>\n\n<p>Now, to keep track of what data I have migrated, I am annotating DrugMet entries\nwith links to Wikidata: everything with a Wikidata Q-code is already migrated.\nThe above pK<sub>a</sub> table already shows Q-identifiers, but I also created them for all\ndata sources I have used (three of them are two books and\n<a href=\"https://twitter.com/JBiolChem/status/713779938969698305\">one old paper without a DOI</a>):</p>\n\n<p><img src=\"/assets/images/pKa3.png\" alt=\"\" /></p>\n\n<p>I still have quite a number of entries to do, but all the protocols are set up now.</p>\n\n<p>On the downstream side, Wikidata is also great because of\n<a href=\"https://query.wikidata.org/\">its SPARQL endpoint</a>. 
Something that I did not\nget worked out some weeks ago, I did manage yesterday (after\n<a href=\"https://twitter.com/arthursmith/status/713730159422095360\">some encouragement from @arthursmith</a>):\nlist all pK<sub>a</sub> statements, including literature source if available:</p>\n\n<p>If you <a href=\"https://query.wikidata.org/#SELECT%20%3Fwikidata%20%3Fcompound%20%3FpKa%20%3Fsource%20%3Ftitle%20%3Fdoi%20WHERE%20%7B%0A%20%20%3Fwikidata%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2FP1117%3E%20%3Ffoo%20%3B%0A%20%20%20%20rdfs%3Alabel%20%3Fcompound%20.%0A%20%20%3Ffoo%20a%20wikibase%3ABestRank%20%3B%0A%20%20%20%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fstatement%2FP1117%3E%20%3FpKa%20.%0A%20%20OPTIONAL%20%7B%0A%20%20%20%20%3Ffoo%20prov%3AwasDerivedFrom%2F%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Freference%2FP248%3E%20%3Fsource%20.%0A%20%20%20%20%3Fsource%20rdfs%3Alabel%20%3Ftitle%20.%0A%20%20%20%20OPTIONAL%20%7B%20%3Fsource%20wdt%3AP356%20%3Fdoi%20.%20%7D%0A%20%20%20%20FILTER(lang(%3Ftitle)%20%3D%20%22en%22)%0A%20%20%7D%0A%20%20FILTER(lang(%3Fcompound)%20%3D%20%22en%22)%0A%7D\">run that query on the Wikidata endpoint</a>,\nyou get a table like this:</p>\n\n<p><img src=\"/assets/images/pKa4.png\" alt=\"\" /></p>\n\n<p>We here see experimental data from two papers: <a href=\"https://doi.org/10.1021/ja01489a008\">10.1021/ja01489a008</a>\nand <a href=\"https://doi.org/10.1021/ed050p510\">10.1021/ed050p510</a>. This can all be\ndisplayed a lot fancier, like make histograms, tables with 2D drawings of the\nchemical structures, etc, but I leave that to the reader.</p>",
      "summary": "In 2010 Samuel Lampa and I started a pet project: collecting pKa data: he was working on RDF extension of MediaWiki and I like consuming RDF data. We started DrugMet. When you read this post, this MediaWiki installation may already be down, which is why I am migrating the data to Wikidata. Why? Because data curation takes effort, I like to play with Wikidata (see this H2020 proposal by Daniel Mietchen et al.), I like Open Data, and it still much needed.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pKa4.png",
      "date_published": "2016-03-27T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["wikidata","chemistry"],
      "_references": [{ "url": "https://doi.org/10.3897/RIO.1.E7573" },{ "url": "https://doi.org/10.1371/JOURNAL.PONE.0025513" },{ "url": "https://doi.org/10.1093/DATABASE/BAW015" },{ "url": "https://doi.org/10.1021/ED050P510" },{ "url": "https://doi.org/10.1021/JA01489A008" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html",
      "title": "Adding disclosures to Wikidata with Bioclipse",
      "content_html": "<p>Last week the huge, bi-annual ACS meeting took place (<a href=\"https://twitter.com/search?q=%23ACSSanDiego\">#ACSSanDiego</a>),\nduring which commonly new drug (leads) are disclosed. This time too, like this one tweeted by\n<a href=\"https://twitter.com/beth_halford\">Bethany Halford</a>:</p>\n\n<iframe id=\"twitter-widget-3\" scrolling=\"no\" frameborder=\"0\" allowtransparency=\"true\" allowfullscreen=\"true\" class=\"\" title=\"X Post\" src=\"https://platform.twitter.com/embed/Tweet.html?dnt=false&amp;embedId=twitter-widget-3&amp;features=eyJ0ZndfdGltZWxpbmVfbGlzdCI6eyJidWNrZXQiOltdLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X2ZvbGxvd2VyX2NvdW50X3N1bnNldCI6eyJidWNrZXQiOnRydWUsInZlcnNpb24iOm51bGx9LCJ0ZndfdHdlZXRfZWRpdF9iYWNrZW5kIjp7ImJ1Y2tldCI6Im9uIiwidmVyc2lvbiI6bnVsbH0sInRmd19yZWZzcmNfc2Vzc2lvbiI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfZm9zbnJfc29mdF9pbnRlcnZlbnRpb25zX2VuYWJsZWQiOnsiYnVja2V0Ijoib24iLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X21peGVkX21lZGlhXzE1ODk3Ijp7ImJ1Y2tldCI6InRyZWF0bWVudCIsInZlcnNpb24iOm51bGx9LCJ0ZndfZXhwZXJpbWVudHNfY29va2llX2V4cGlyYXRpb24iOnsiYnVja2V0IjoxMjA5NjAwLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X3Nob3dfYmlyZHdhdGNoX3Bpdm90c19lbmFibGVkIjp7ImJ1Y2tldCI6Im9uIiwidmVyc2lvbiI6bnVsbH0sInRmd19kdXBsaWNhdGVfc2NyaWJlc190b19zZXR0aW5ncyI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfdXNlX3Byb2ZpbGVfaW1hZ2Vfc2hhcGVfZW5hYmxlZCI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfdmlkZW9faGxzX2R5bmFtaWNfbWFuaWZlc3RzXzE1MDgyIjp7ImJ1Y2tldCI6InRydWVfYml0cmF0ZSIsInZlcnNpb24iOm51bGx9LCJ0ZndfbGVnYWN5X3RpbWVsaW5lX3N1bnNldCI6eyJidWNrZXQiOnRydWUsInZlcnNpb24iOm51bGx9LCJ0ZndfdHdlZXRfZWRpdF9mcm9udGVuZCI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9fQ%3D%3D&amp;frame=false&amp;hideCard=false&amp;hideThread=false&amp;id=710543705812426752&amp;lang=en&amp;origin=https%3A%2F%2Fchem-bla-ics.blogspot.com%2F2016%2F03%2Fadding-disclosures-to-wikidata-with.html&amp;sessionId=ba8a9ed10d55387ac0f656bfaf73f3a579e1e77a&amp;theme=light&amp;widgetsVersion=2615f7e52b7e0%3A1702314776716&amp;
width=550px\" style=\"position: static; visibility: visible; width: 550px; height: 1311px; display: block; flex-grow: 1;\" data-tweet-id=\"710543705812426752\"></iframe>\n<p><br /></p>\n\n<p>Because getting this information out in the open is important, I think it’s a good idea to add them to\n<a href=\"http://wikidata.org/\">Wikidata</a> (see doi:<a href=\"http://dx.doi.org/10.3897/rio.1.e7573\">10.3897/rio.1.e7573</a>).\nSo, with <a href=\"http://www.bioclipse.net/\">Bioclipse</a> (doi:<a href=\"http://dx.doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a>)\nI redrew the structure:</p>\n\n<p><img src=\"/assets/images/strucutre.png\" alt=\"\" /></p>\n\n<p>I previously blogged about how to <a href=\"https://chem-bla-ics.linkedchemistry.info/2016/01/27/adding-chemical-compound-to-wikidata.html\">add chemicals to Wikidata <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nbut I realized that I wanted to also use Bioclipse to automate this process a bit. So, I wrote this script to generate the SMILES, InChI,\nInChIKey, double check that the compound is not already in Wikidata (using the <a href=\"https://query.wikidata.org/\">Wikidata SPARQL endpoint</a>),\nand look up the <a href=\"https://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> compound identifier (example SMILES).</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">smiles</span> <span class=\"o\">=</span> <span class=\"s2\">\"CCCC\"</span>\n\n<span class=\"n\">mol</span> <span class=\"o\">=</span> <span class=\"n\">cdk</span><span class=\"o\">.</span><span class=\"na\">fromSMILES</span><span class=\"o\">(</span><span class=\"n\">smiles</span><span class=\"o\">)</span>\n<span class=\"n\">ui</span><span class=\"o\">.</span><span class=\"na\">open</span><span class=\"o\">(</span><span class=\"n\">mol</span><span class=\"o\">)</span>\n\n<span class=\"n\">inchiObj</span> <span class=\"o\">=</span> <span class=\"n\">inchi</span><span 
class=\"o\">.</span><span class=\"na\">generate</span><span class=\"o\">(</span><span class=\"n\">mol</span><span class=\"o\">)</span>\n<span class=\"n\">inchiShort</span> <span class=\"o\">=</span> <span class=\"n\">inchiObj</span><span class=\"o\">.</span><span class=\"na\">value</span><span class=\"o\">.</span><span class=\"na\">substring</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">)</span>\n<span class=\"n\">key</span> <span class=\"o\">=</span> <span class=\"n\">inchiObj</span><span class=\"o\">.</span><span class=\"na\">key</span> <span class=\"c1\">// key = \"GDGXJFJBRMKYDL-FYWRMAATSA-N\"</span>\n\n<span class=\"n\">sparql</span> <span class=\"o\">=</span> <span class=\"s2\">\"\"\"\nPREFIX wdt: &lt;http://www.wikidata.org/prop/direct/&gt;\nSELECT ?compound WHERE {\n  ?compound wdt:P235 \"$key\" .\n}\n\"\"\"</span>\n\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">bioclipse</span><span class=\"o\">.</span><span class=\"na\">isOnline</span><span class=\"o\">())</span> <span class=\"o\">{</span>\n  <span class=\"n\">results</span> <span class=\"o\">=</span> <span class=\"n\">rdf</span><span class=\"o\">.</span><span class=\"na\">sparqlRemote</span><span class=\"o\">(</span>\n    <span class=\"s2\">\"https://query.wikidata.org/sparql\"</span><span class=\"o\">,</span> <span class=\"n\">sparql</span>\n  <span class=\"o\">)</span>\n  <span class=\"n\">missing</span> <span class=\"o\">=</span> <span class=\"n\">results</span><span class=\"o\">.</span><span class=\"na\">rowCount</span> <span class=\"o\">==</span> <span class=\"mi\">0</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"o\">{</span>\n  <span class=\"n\">missing</span> <span class=\"o\">=</span> <span class=\"kc\">true</span>\n<span class=\"o\">}</span>\n\n<span class=\"n\">formula</span> <span class=\"o\">=</span> <span class=\"n\">cdk</span><span class=\"o\">.</span><span class=\"na\">molecularFormula</span><span 
class=\"o\">(</span><span class=\"n\">mol</span><span class=\"o\">)</span>\n\n<span class=\"c1\">// Create the Wikidata QuickStatement,</span>\n<span class=\"c1\">// see https://tools.wmflabs.org/wikidata-todo/quick_statements.php</span>\n\n<span class=\"n\">item</span> <span class=\"o\">=</span> <span class=\"s2\">\"LAST\"</span> <span class=\"c1\">// set to Qxxxx if you need to append info,</span>\n              <span class=\"c1\">// e.g. item = \"Q22579236\"</span>\n\n<span class=\"n\">pubchemLine</span> <span class=\"o\">=</span> <span class=\"s2\">\"\"</span>\n<span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">bioclipse</span><span class=\"o\">.</span><span class=\"na\">isOnline</span><span class=\"o\">())</span> <span class=\"o\">{</span>\n  <span class=\"n\">pcResults</span> <span class=\"o\">=</span> <span class=\"n\">pubchem</span><span class=\"o\">.</span><span class=\"na\">search</span><span class=\"o\">(</span><span class=\"n\">key</span><span class=\"o\">)</span>\n  <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">pcResults</span><span class=\"o\">.</span><span class=\"na\">size</span> <span class=\"o\">==</span> <span class=\"mi\">1</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"n\">cid</span> <span class=\"o\">=</span> <span class=\"n\">pcResults</span><span class=\"o\">[</span><span class=\"mi\">0</span><span class=\"o\">]</span>\n    <span class=\"n\">pubchemLine</span> <span class=\"o\">=</span> <span class=\"s2\">\"$item\\tP662\\t\\\"$cid\\\"\"</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n\n<span class=\"k\">if</span> <span class=\"o\">(!</span><span class=\"n\">missing</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"====================\"</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"Already in Wikidata as \"</span> <span class=\"o\">+</span> <span 
class=\"n\">results</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">,</span><span class=\"s2\">\"compound\"</span><span class=\"o\">)</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"====================\"</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"o\">{</span>\n  <span class=\"n\">statement</span> <span class=\"o\">=</span> <span class=\"s2\">\"\"\"\n    CREATE\n    \n    $item\\tDen\\t\\\"chemical compound\\\"\n    $item\\tP233\\t\\\"$smiles\\\"\n    $item\\tP274\\t\\\"$formula\\\"\n    $item\\tP234\\t\\\"$inchiShort\\\"\n    $item\\tP235\\t\\\"$key\\\"\n    $pubchemLine\n  \"\"\"</span>\n\n  <span class=\"n\">println</span> <span class=\"s2\">\"====================\"</span>\n  <span class=\"n\">println</span> <span class=\"n\">statement</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"====================\"</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The output of this script is a <a href=\"https://tools.wmflabs.org/wikidata-todo/quick_statements.php\">QuickStatement</a>,\na list of commands to create and edit entries in Wikidata, for <a href=\"https://twitter.com/MagnusManske\">Magnus Manske</a>’s tool.\nIMPORTANT: the script is not meant to automate editing Wikidata! I only automate\ncreating the input, which I carefully check (e.g. checking that all stereochemistry is defined). Note how Bioclipse opens up the\nstructure in a viewer with ui.open(). You need to enable\nthe tool first, but if you have an account, this is not too hard. Of course, the advantage is that it is a lot quicker. 
I have a similar\nscript to create QuickStatements starting with only a <a href=\"https://www.ebi.ac.uk/chembl/\">ChEMBL</a> identifier.</p>\n\n<p>The QuickStatement for GDC-0853 looks like:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>    CREATE\n    \n    LAST Den \"chemical compound\"\n    LAST P233 \"O=C1C(=CC(=CN1C)c2ccnc(c2CO)N4C(=O)c3cc5c(n3CC4)CC(C)(C)C5)Nc6ncc(cc6)N7CCN(C[C@@H]7C)C8COC8\"\n    LAST P274 \"C37H44N8O4\"\n    LAST P234 \"1S/C37H44N8O4/c1-23-18-42(27-21-49-22-27)9-10-43(23)26-5-6-33(39-17-26)40-30-13-25(19-41(4)35(30)47)28-7-8-38-34(29(28)20-46)45-12-11-44-31(36(45)48)14-24-15-37(2,3)16-32(24)44/h5-8,13-14,17,19,23,27,46H,9-12,15-16,18,20-22H2,1-4H3,(H,39,40)/t23-/m0/s1\"\n    LAST P235 \"WNEODWDFDXWOLU-QHCPKHFHSA-N\"\n    LAST P662 \"86567195\"\n</code></pre></div></div>\n\n<p>The first line creates a new Wikidata item, while the next ones add information about this compound. GDC-0853 is now also\n<a href=\"https://www.wikidata.org/wiki/Q23304817\">Q23304817</a>. The label I added manually afterwards. Note how the Bioclipse script found\nthe PubChem identifier, using the InChIKey. I also use this approach to add compounds to Wikidata that we have in\n<a href=\"http://wikipathways.org/\">WikiPathways</a>.</p>",
      "summary": "Last week the huge, bi-annual ACS meeting took place (#ACSSanDiego), during which commonly new drug (leads) are disclosed. This time too, like this one tweeted by Bethany Halford:",
      
      "date_published": "2016-03-20T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["acs","bioclipse","chembl","inchi","pubchem","wikidata","acssandiego"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-8-59" },{ "url": "https://doi.org/10.3897/RIO.1.E7573" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2016/01/27/adding-chemical-compound-to-wikidata.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2016/01/27/adding-chemical-compound-to-wikidata.html",
      "title": "Adding chemical compounds to Wikidata",
      "content_html": "<p>Adding chemical compounds to <a href=\"https://www.wikidata.org/\">Wikidata</a> is not difficult. You can store the chemical formula\n(<a href=\"https://www.wikidata.org/wiki/Property:P274\">P274</a>), (canonical) <a href=\"http://chem-bla-ics.blogspot.nl/2015/12/the-quality-of-smiles-strings-in.html\">SMILES</a>\n(<a href=\"https://www.wikidata.org/wiki/Property:P233\">P233</a>), InChIKey (<a href=\"https://www.wikidata.org/wiki/Property:P235\">P235</a>) (and InChI\n(<a href=\"https://www.wikidata.org/wiki/Property:P233\">P234</a>), of course), as well various database identifiers (see what I wrote about that\n[here(http://chem-bla-ics.blogspot.nl/2015/12/new-edition-getting-cas-registry.html)]). It also allows storing of the provenance, and has predicates\nfor that too.</p>\n\n<p>So, to enter a new structure for a compound, you should enter the compound information to Wikidata. Of course, make sure to create the needed accounts,\nparticularly one for Wikidata (<a href=\"https://www.wikidata.org/w/index.php?title=Special:UserLogin&amp;returnto=Wikidata%3AMain+Page&amp;type=signup\">create account</a>)\n(not sure if the next steps needs a more general Wikimedia account too).</p>\n\n<p><strong>Entering the research paper</strong>: <br />\n<a href=\"https://twitter.com/MagnusManske\">Magnus Manske</a> <a href=\"https://twitter.com/MagnusManske/status/691664308523130880\">pointed</a> me to\n<a href=\"http://tools.wmflabs.org/sourcemd/\">this helper tool</a>. If you have the DOI of the paper, it is easy to add a new paper. This is what the tool shows\nfor doi:<a href=\"http://dx.doi.org/10.1128/AAC.01148-08\">10.1128/AAC.01148-08</a> (but no longer when you try!):</p>\n\n<p><img src=\"/assets/images/smd.png\" alt=\"\" /></p>\n\n<p>You need permission to run this script and the tool will alert you about that, and give the instructions how to get permission. 
After\nI clicked the Open in QuickStatements I get this output, showing me an entry in Wikidata was created for this paper:</p>\n\n<p><img src=\"/assets/images/smd1.png\" alt=\"\" /></p>\n\n<p>Later, I can use the new Q-code (<a href=\"https://www.wikidata.org/wiki/Q22309806\">Q22309806</a>) to use as source for statements about the compound (formula, etc).</p>\n\n<p><strong>Draw your compound and get an InChIKey</strong>: <br />\nThe next step is to draw a compound and get an InChIKey. This can be done with many tools, including\n<a href=\"http://bioclipse.net/\">Bioclipse</a>. Rajarshi opted for alternatives:</p>\n\n<ul>\n<a href=\"https://twitter.com/collabchem\">@collabchem</a> <a href=\"https://twitter.com/egonwillighagen\">@egonwillighagen</a> OSRA or <a href=\"https://t.co/ZIQdgrYsmr\">https://t.co/ZIQdgrYsmr</a>? <br />\n— Rajarshi Guha (@rguha) <a href=\"https://twitter.com/rguha/status/692377715735949313\">January 27, 2016</a>\n</ul>\n\n<p>Then check if the compound is not already in Wikidata. 
You can use this SPARQL query for that using the InChIKey of the compound (it’s for acetic acid, so it will be found):</p>\n\n<p><img src=\"/assets/images/smd3.png\" alt=\"\" /></p>\n\n<p>For convenience, here is the copy/pastable SPARQL:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">PREFIX</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"w\"> </span>&lt;http://www.wikidata.org/prop/direct/&gt;<span class=\"w\">\n</span><span class=\"k\">SELECT</span><span class=\"w\"> </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"k\">WHERE</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?compound</span><span class=\"w\"> </span><span class=\"nn\">wdt</span><span class=\"o\">:</span><span class=\"ss\">P235</span><span class=\"w\"> </span><span class=\"s2\">\"QTBSBXVTEAMEQO-UHFFFAOYSA-N\"</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p><strong>Entering the compound</strong>: <br />\nIf the compound is not already in Wikidata, it is time to add it. The minimal information you should provide is the following:</p>\n\n<ul>\n  <li>mark the new entry as ‘instance of’ (P) ‘chemical compound’ (Q)</li>\n  <li>the chemical formula and SMILES (use as reference the paper)\n    <ul>\n      <li>add the reference to the paper you entered above</li>\n    </ul>\n  </li>\n  <li>add the InChIKey and/or InChI</li>\n</ul>\n\n<p>The first step is to create a new Wikidata entry. The Create new item menu in the left side panel can be used, showing a page like this:</p>\n\n<p><img src=\"/assets/images/smd2.png\" alt=\"\" /></p>\n\n<p>As a label you can use the name used in the paper for the compound, even if it is a code, and as description ‘chemical compound’ will do for now; it can be changed later.</p>\n\n<p>Feel free to add as much information about the compound as you can find. There are some chemically rich entries in Wikidata, such as that for acetic acid\n(<a href=\"https://www.wikidata.org/wiki/Q47512\">Q47512</a>).</p>",
      "summary": "Adding chemical compounds to Wikidata is not difficult. You can store the chemical formula (P274), (canonical) SMILES (P233), InChIKey (P235) (and InChI (P234), of course), as well various database identifiers (see what I wrote about that [here(http://chem-bla-ics.blogspot.nl/2015/12/new-edition-getting-cas-registry.html)]). It also allows storing of the provenance, and has predicates for that too.",
      
      "date_published": "2016-01-27T00:00:00+00:00",
      "date_modified": "2016-01-27T00:00:00+00:00",
      "tags": ["wikidata","chemistry","bioclipse"],
      "_references": [{ "url": "https://doi.org/10.1128/AAC.01148-08" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2015/04/18/chemistry-central-and-orcid-identifier.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2015/04/18/chemistry-central-and-orcid-identifier.html",
      "title": "Chemistry Central and the ORCID identifier",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/orcidTshirt.png\" width=\"200\" />\nIf you are a scientist you have heard about the <a href=\"http://orcid.org/\">ORCID</a> identifier by now. If not, you have\nbeen focusing on groundbreaking research and isolated yourself from the rest of the world, just to make it perfect\nand get that Nobel prize next year. If you have been working on impactful research, Nobel prize-worthy, and have\nbeen blogging and tweeting about your progress, as a good Open Scholar, you know ORCID is the DOI for\n“research contributors” and you already have one yourself, and probably also that T-shirt with your own identifier.\nMine is <a href=\"http://orcid.org/0000-0001-7542-0286\">0000-0001-7542-0286</a>, and\n<a href=\"https://orcid.org/statistics\">almost 1.3M other authors</a> got one too. The list of\n<a href=\"https://en.wikipedia.org/wiki/Wikipedia:ORCID\">ORCIDs on Wikipedia</a> is growing\n(<a href=\"https://www.wikidata.org/wiki/Property:P496\">and Wikidata</a>), thanks to\n<a href=\"https://twitter.com/pigsonthewing\">Andy Mabbett</a>, whom also made it possible to add\n<a href=\"http://wikipathways.org/index.php/Template:User_ORCID\">your ORCID on WikiPathways</a>.</p>\n\n<p>Anyway, what I was pleased to see today that you can now log in with your ORCID identifier with the\n<a href=\"https://www.editorialmanager.com/CHIN/default.aspx\">Chemistry Central article submission system</a> (notice\nthe green icon):</p>\n\n<p><img src=\"/assets/images/orcidChemistryCentral.png\" style=\"width: 90%; display: block; margin-left: auto; margin-right: auto;\" alt=\"Screenshot of the Chemistry Central system login page with the normal username/password text boxes, but also a green ORCID logo to login via ORCID.\" /></p>\n\n<p>Many other publishers allow logging in with your ORCID too, which benefits many:</p>\n\n<ol>\n  <li>authors who just enter a list of ORCID identifiers, instead of a long list of author names 
and affiliations</li>\n  <li>publishers, which have a simpler submission system and get more accurate information about submitters</li>\n  <li>funding agencies which can more easily track what is done with the research funding</li>\n  <li>research institutes which can more easily track what their employees are studying</li>\n</ol>\n\n<p>Don’t have one yet? <a href=\"https://orcid.org/register\">Get your very own ORCID here</a>.</p>",
      "summary": "If you are a scientist you have heard about the ORCID identifier by now. If not, you have been focusing on groundbreaking research and isolated yourself from the rest of the world, just to make it perfect and get that Nobel prize next year. If you have been working on impactful research, Nobel prize-worthy, and have been blogging and tweeting about your progress, as a good Open Scholar, you know ORCID is the DOI for “research contributors” and you already have one yourself, and probably also that T-shirt with your own identifier. Mine is 0000-0001-7542-0286, and almost 1.3M other authors got one too. The list of ORCIDs on Wikipedia is growing (and Wikidata), thanks to Andy Mabbett, whom also made it possible to add your ORCID on WikiPathways.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/orcidTshirt.png",
      "date_published": "2015-04-18T00:00:00+00:00",
      "date_modified": "2015-04-18T00:00:00+00:00",
      "tags": ["orcid","wikidata","wikipedia"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2014/11/16/programming-in-life-sciences-20.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/11/16/programming-in-life-sciences-20.html",
      "title": "Programming in the Life Sciences #20: extracting data from JSON",
      "content_html": "<p>I previously wrote about the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-10.html\">JavaScript Object Notation</a>\n(JSON) which has become a de facto standard for sharing data by web services. I personally\nstill prefer something using the <a href=\"https://en.wikipedia.org/wiki/Resource_Description_Framework\">Resource Description Framework</a>\n(RDF) because of its clear link to ontologies, but perhaps\n<a href=\"https://en.wikipedia.org/wiki/JSON-LD\">JSON-LD</a> combines the best of both worlds.</p>\n\n<p>The <a href=\"https://dev.openphacts.org/\">Open PHACTS API</a> support various formats and this\nJSON is the default format used by the <a href=\"https://github.com/openphacts/ops.js\">ops.js</a>\nlibrary. However, the amount of information returned by the Open PHACTS cache is complex,\nand generally includes more than you want to use in the next step. Therefore, it is\nneeded to extract data from the JSON document, which was not covered in the\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-10.html\">post #10</a>\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-11-html.html\">or #11</a>.</p>\n\n<p>Let’s start with the example JSON given in that post, and let’s consider this is the\nvalue of a variable with the name jsonData:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"id\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">1</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"name\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"Foo\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"price\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span 
class=\"mi\">123</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"tags\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\"> </span><span class=\"s2\">\"Bar\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\"Eek\"</span><span class=\"w\"> </span><span class=\"p\">],</span><span class=\"w\">\n    </span><span class=\"nl\">\"stock\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n        </span><span class=\"nl\">\"warehouse\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">300</span><span class=\"p\">,</span><span class=\"w\">\n        </span><span class=\"nl\">\"retail\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">20</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>We can see that this JSON value starts with a map-like structure. We can also see that\nthere is a list embedded, and another map. I guess that one of the reasons why JSON\nhas taken such a flight is how well it integrates with the JavaScript language: selecting\ncontent can be done in terms of core language features, different from, for example,\n<a href=\"https://en.wikipedia.org/wiki/XPath\">XPath</a> statements needed for\n<a href=\"https://en.wikipedia.org/wiki/XML\">XML</a> or <a href=\"https://en.wikipedia.org/wiki/SPARQL\">SPARQL</a>\nfor RDF content. 
This is because the notation just follows core data types of JavaScript\nand data is stored as native data types and objects.</p>\n\n<p>For example, to get the price value from the above JSON code, we use:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">price</span> <span class=\"o\">=</span> <span class=\"nx\">jsonData</span><span class=\"p\">.</span><span class=\"nx\">price</span><span class=\"p\">;</span>\n</code></pre></div></div>\n\n<p>Or, if we want to get the first value in the Bar-Eek list, we use:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">tag</span> <span class=\"o\">=</span> <span class=\"nx\">jsonData</span><span class=\"p\">.</span><span class=\"nx\">tags</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">];</span>\n</code></pre></div></div>\n\n<p>Or, if we want to inspect the warehouse stock:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">inStock</span> <span class=\"o\">=</span> <span class=\"nx\">jsonData</span><span class=\"p\">.</span><span class=\"nx\">stock</span><span class=\"p\">.</span><span class=\"nx\">warehouse</span><span class=\"p\">;</span>\n</code></pre></div></div>\n\n<p>Now, the JSON returned by the Open PHACTS API has a lot more information. This is why the\nonline, interactive documentation is so helpful: it shows the JSON. 
In fact, given that\nJSON is so widely used, there are many tools online that help you, such as\n<a href=\"http://jsoneditoronline.org/\">jsoneditoronline.org</a> (yes, it will show error messages\nif the syntax is wrong):</p>\n\n<p><img src=\"/assets/images/debug3.png\" alt=\"\" /></p>\n\n<p>BTW, I also recommend installing a JSON viewer extension for\n<a href=\"https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc?hl=en#sthash.vsIhyalK.dpuf\">Chrome</a>\nor for <a href=\"https://addons.mozilla.org/en-US/firefox/addon/jsonview/\">Firefox</a>. Once you\nhave installed this plugin, you can not only read the JSON on Open PHACTS’\ninteractive documentation page, but also open the Request URL in a separate browser\nwindow. Just copy/paste the URL from this output:</p>\n\n<p><img src=\"/assets/images/json.png\" alt=\"\" /></p>\n\n<p>And with a JSON viewing extension, opening this <em>https://beta.openphacts.org/1.3/pathways/…</em>\nURL in your browser window will look something like:</p>\n\n<p><img src=\"/assets/images/json1.png\" alt=\"\" /></p>\n\n<p>And because these extensions typically use syntax highlighting, it is easier to understand\nhow to access information from within your JavaScript code. For example, if we want the\nnumber of pathways in which the compound <a href=\"http://www.conceptwiki.org/concept/index/7e0a4dd4-8160-4906-9db1-fdb300e888ea\">testosterone</a>\n(the link is the <a href=\"http://scholar.google.com/scholar?hl=nl&amp;q=ConceptWiki\">ConceptWiki</a>\nURL in the above example) is found, we can use this code:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">pathwayCount</span> <span class=\"o\">=</span> <span class=\"nx\">jsonData</span><span class=\"p\">.</span><span class=\"nx\">result</span><span class=\"p\">.</span><span class=\"nx\">primaryTopic</span><span class=\"p\">.</span><span class=\"nx\">pathway_count</span><span class=\"p\">;</span>\n</code></pre></div></div>",
      "summary": "I previously wrote about the JavaScript Object Notation (JSON) which has become a de facto standard for sharing data by web services. I personally still prefer something using the Resource Description Framework (RDF) because of its clear link to ontologies, but perhaps JSON-LD combines the best of both worlds.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/debug3.png",
      "date_published": "2014-11-16T00:00:00+00:00",
      "date_modified": "2014-11-16T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2014/11/16/programming-in-life-sciences-19.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/11/16/programming-in-life-sciences-19.html",
      "title": "Programming in the Life Sciences #19: debugging",
      "content_html": "<p>Debugging is the process find removing a fault in your code\n(<a href=\"https://en.wikipedia.org/wiki/Software_bug#Etymology\">the etymology</a> goes further back\nthan the moth story, I learned today). Being able to debug is an essential programming skill,\nand being able to program flawlessly is not enough; the bug can be outside your own code.\n(… there is much that can be written up about module interactions, APIs, documentation, etc,\nthat lead to <em>malfunctioning</em> code …)</p>\n\n<p>While there are full debugging tools, achieving the task of finding where the bug is can\noften be reached with simpler means:</p>\n\n<ol>\n  <li>take notice of error messages</li>\n  <li>add debug statements in your code</li>\n</ol>\n\n<h2 id=\"error-messages\">Error messages</h2>\n\n<p>Keeping track of error messages is first starting point. This skill is almost an art:\nit requires having seen enough for them to understand how to interpret them. I guess\nerror messages are the worst developed aspects of programming language, and I do not\nfrequently see programming language tutorial that discuss error messages. The field can\ncertainly improve here.</p>\n\n<p>However, at least error messages in general give an indication where the problem occurs.\nOften by a line number, though this number is not always accurate. Underlying causes of\nthat are the problem that if there is a problem in the code, it is not always clear what\nthe problem is. For example, if there is a closing (or opening) bracket missing somewhere,\nhow can the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/exercise-what-variable-type-would-you.html\">compiler</a>\ndecide what the author of the code meant? Web browsers like Firefox/Iceweasel and\nChrome (Ctrl-Shift-C) have a console that displays compiler errors and warnings:</p>\n\n<p><img src=\"/assets/images/debug1.png\" alt=\"\" /></p>\n\n<p>Another issue is that error messages can be cryptic and misleading. 
For example, the\nabove error message <em>“TypeError: searcher.bytag is not a function example1.html:73”</em>\nis confusing for a starting programmer. Surely, the source code calls <code class=\"language-plaintext highlighter-rouge\">searcher.bytag()</code>\nwhich definitely is a function. So, why does the compiler say it is not? The bug here,\nof course, is that the function called in the source code is not found: it should be\n<a href=\"https://github.com/openphacts/ops.js/blob/master/src/ConceptWikiSearch.js#L9\">byTag()</a>.</p>\n\n<p>But this bug at least can be detected during interpretation and execution of the code.\nThat is, it is clear to the compiler that it doesn’t know how to handle the code.\nAnother common problem is the situation where the code looks fine (to the compiler),\nbut the data it handles makes the code break down. For example, a variable doesn’t\nhave the expected value, leading to errors (e.g. null pointer-style). Therefore,\nunderstanding the variable values at a particular point in your code can be of\ngreat use.</p>\n\n<h2 id=\"console-output\">Console output</h2>\n\n<p>A simple way to inspect the content of a variable is to use the console visible in\nthe above screenshot. Many programming languages have their own call to send output\nthere. 
Java has <code class=\"language-plaintext highlighter-rouge\">System.out.println()</code> and JavaScript has <code class=\"language-plaintext highlighter-rouge\">console.log()</code>:</p>\n\n<p><img src=\"/assets/images/debug2.png\" alt=\"\" /></p>\n\n<p>Thus, if you have some complex bit of code with multiple for-loops, if-else statements,\netc, this can be used to see if some part of your code that you expect to be called\nreally is:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nf\">log</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">He, I'm here!</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>This can be very useful when using asynchronous web service calls! Similarly, you can see\nwhat the value of some variable is:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">label</span> <span class=\"o\">=</span> <span class=\"nx\">jsonResponse</span><span class=\"p\">.</span><span class=\"nx\">items</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span><span class=\"p\">;</span>\n<span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nf\">log</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">label: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">label</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>Also, because JavaScript is not a <a href=\"https://en.wikipedia.org/wiki/Strong_and_weak_typing\">strongly typed programming language</a>,\nI frequently find myself inspecting the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/exercise-what-variable-type-would-you.html\">data type</a>\nof 
a variable:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">label</span> <span class=\"o\">=</span> <span class=\"nx\">jsonResponse</span><span class=\"p\">.</span><span class=\"nx\">items</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span><span class=\"p\">;</span>\n\n<span class=\"nx\">console</span><span class=\"p\">.</span><span class=\"nf\">log</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">typeof label: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"k\">typeof</span><span class=\"p\">(</span><span class=\"nx\">label</span><span class=\"p\">));</span>\n</code></pre></div></div>\n\n<h2 id=\"conclusion\">Conclusion</h2>\n\n<p>These tools are very useful to find the location of a bug. And this matters. Yesterday,\nI was trying to use the <a href=\"http://chem-bla-ics.blogspot.nl/2014/11/programming-in-life-sciences-18.html\">histogram code in example6.html</a>\nto visualize a set of values with negative numbers (<a href=\"https://en.wikipedia.org/wiki/Zeta_potential\">zeta potentials</a>\nof nanomaterials, to be precise) and I was debugging the issue, trying to find where\nmy code went wrong. I used the above approaches, and the array of values looked in\norder, but different from the original example. But still the histogram was not\nshowing up. 
Well, after hours, and having asked someone else to look at the code\ntoo, and having ruled out many alternatives, she pointed out that the problem was\nnot in the JavaScript part of the code, but in the HTML: I was mixing up how\ndefault JavaScript and the d3.js library add SVG content to the HTML data model.\nThat is, I was using <code class=\"language-plaintext highlighter-rouge\">&lt;div id=\"chart\"&gt;</code>, which works with <code class=\"language-plaintext highlighter-rouge\">document.getElementById(\"chart\").innerHTML</code>,\nbut needed to use <code class=\"language-plaintext highlighter-rouge\">&lt;div class=\"chart\"&gt;</code> with the <code class=\"language-plaintext highlighter-rouge\">d3.select(\".chart\").innerHTML</code>\ncode I was using later.</p>\n\n<p>OK, that bug was on my account. However, it still was not working: I did see a\nhistogram, but it didn’t look good. Again debugging, and after again much too long,\nI found out that this was a bug in the d3.js code that makes it impossible to use\ntheir histogram example code for negative values. Again, once I knew where the bug\nwas, I could Google and quickly found\n<a href=\"http://stackoverflow.com/questions/15388481/d3-js-histogram-with-positive-and-negative-values\">the solution for it on StackOverflow</a>.</p>\n\n<p>So, the workflow of debugging at a top level, looks like:</p>\n\n<ol>\n  <li>find where the problem is</li>\n  <li>try to solve the problem</li>\n</ol>\n\n<p>Happy debugging!</p>",
      "summary": "Debugging is the process find removing a fault in your code (the etymology goes further back than the moth story, I learned today). Being able to debug is an essential programming skill, and being able to program flawlessly is not enough; the bug can be outside your own code. (… there is much that can be written up about module interactions, APIs, documentation, etc, that lead to malfunctioning code …)",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/debug2.png",
      "date_published": "2014-11-16T00:00:00+00:00",
      "date_modified": "2014-11-16T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2014/11/06/programming-in-life-sciences-17-open.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/11/06/programming-in-life-sciences-17-open.html",
      "title": "Programming in the Life Sciences #17: The Open PHACTS scientific questions",
      "content_html": "<p>I think the authors of the <a href=\"http://www.openphacts.org/\">Open PHACTS</a> proposal made a right choice\nin defining a small set of questions that the solution to be developed could be tested against.\nThe questions being specific, it is much easier to understand the needs. In fact, I suspect it may\neven be a very useful form of requirement analysis, and makes it hard to keep using vague terms.</p>\n\n<p><img src=\"/assets/images/opsSciencyQs.jpg\" alt=\"\" /></p>\n\n<p>Open PHACTS has come up with 20 questions (doi:<a href=\"https://doi.org/10.1016/j.drudis.2013.05.008\">10.1016/j.drudis.2013.05.008</a>;\nOpen Access):</p>\n\n<ol>\n  <li><em>Give me all oxidoreductase inhibitors active &lt;100 nM in human and mouse</em></li>\n  <li><em>Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?</em></li>\n  <li><em>Given a target, find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives</em></li>\n  <li><em>For a given interaction profile – give me similar compounds</em></li>\n  <li><em>The current Factor Xa lead series is characterized by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X</em></li>\n  <li><em>A project is considering protein kinase C alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that could modulate the target directly? I.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. 
PKC) from structured assay databases and the literature</em></li>\n  <li><em>Give me all active compounds on a given target with the relevant assay data</em></li>\n  <li><em>Identify all known protein–protein interaction inhibitors</em></li>\n  <li><em>For a given compound, give me the interaction profile with targets</em></li>\n  <li><em>For a given compound, summarize all ‘similar compounds’ and their activities</em></li>\n  <li><em>Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not)</em></li>\n  <li><em>For my given compound, which targets have been patented in the context of Alzheimer’s disease?</em></li>\n  <li><em>Which ligands have been described for a particular target associated with transthyretin-related amyloidosis, what is their affinity for that target and how far are they advanced into preclinical/clinical phases, with links to publications/patents describing these interactions?</em></li>\n  <li><em>Target druggability: compounds directed against target X have been tested in which indications? Which new targets have appeared recently in the patent literature for a disease? Has the target been screened against in AZ before? What information on in vitro or in vivo screens has already been performed on a compound?</em></li>\n  <li><em>Which chemical series have been shown to be active against target X? Which new targets have been associated with disease Y? Which companies are working on target X or disease Y?</em></li>\n  <li><em>Which compounds are known to be activators of targets that relate to Parkinson’s disease or Alzheimer’s disease</em></li>\n  <li><em>For my specific target, which active compounds have been reported in the literature? 
What is also known about upstream and downstream targets?</em></li>\n  <li><em>Compounds that agonize targets in pathway X assayed in only functional assays with a potency &lt;1 μM</em></li>\n  <li><em>Give me the compound(s) that hit most specifically the multiple targets in a given pathway (disease)</em></li>\n  <li><em>For a given disease/indication, give me all targets in the pathway and all active compounds hitting them</em></li>\n</ol>\n\n<p>Students in the <a href=\"http://chem-bla-ics.blogspot.nl/search/label/%23mscpils\">Programming in the Life Sciences course</a>\nwill this year pick one of these questions as a starting point in the project. The goal is to develop\nan HTML+JavaScript solution that will answer the selected question. There is freedom to tweak the\nquestion to personal interests, of course. By selecting a simpler pharmacological question than last\nyear, more time and effort can be put into visualization and interpretation of the found data.</p>",
      "summary": "I think the authors of the Open PHACTS proposal made a right choice in defining a small set of questions that the solution to be developed could be tested against. The questions being specific, it is much easier to understand the needs. In fact, I suspect it may even be a very useful form of requirement analysis, and makes it hard to keep using vague terms.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/opsSciencyQs.jpg",
      "date_published": "2014-11-06T00:00:00+00:00",
      "date_modified": "2014-11-06T00:00:00+00:00",
      "tags": ["pra3006"],
      "_references": [{ "url": "https://doi.org/10.1016/j.drudis.2013.05.008" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2014/08/30/on-open-access.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2014/08/30/on-open-access.html",
      "title": "On Open Access in The Netherlands",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/vsnu.png\" width=\"200\" />\nYesterday, I received a letter from the <a href=\"http://vsnu.nl/\">Association of Universities The Netherlands</a> (VSNU, <a href=\"https://twitter.com/deVSNU\">@deVSNU</a>)\nabout Open Access. The Netherlands is for research a very interesting country: it’s small, meaning we have few resources to establish and maintain\nhigh profile centers, we also believe strong education benefits from distribution, so we we have many good universities, rather than a few excelling\nuniversities. Mind you, this clouds that we absolutely do have excelling research institutes and research groups; they just are not concentrated in\none university.</p>\n\n<p>Another important aspect is that all those Dutch universities are expected to compete which each other for funding. As a result I have experience\nrather interesting collaborations between universities. That’s a downside of a small country: everyone knows each other, often in way to much\ndetail. But my point is that the Dutch can be rather conservative. That kills innovation, and is in my opinion a key reason why\n<a href=\"http://www.rathenau.nl/actueel/nieuwsberichten/2014/08/universiteiten-blijven-hangen-in-de-subtop.html\">we are not breaking into the top 50 of rankings</a>,\nmore than concentration. Concentration of funding in Top research institutes has not been extensively evaluated, but I think the efficiency is\nnot proven higher than previous funding approaches.</p>\n\n<p>Anyway, this letter I received is part of <a href=\"http://vsnu.nl/openaccess/\">their Open Access program</a>. Here too, the Dutch universities are conservative\n(well, relatively from my views, at least). 
Now, the Open Access debate is not so interesting, because it primarily ends up about who pays who\n(boring) and whether we should go gold or green (beside the point, see below), and, sadly, here too many people think about who pays who again\n(still boring).</p>\n\n<p>Therefore, given the outlined importance and impact of Dutch research, I found it relevant to post about the progress of Open Access in my small\ncountry. The letter is <a href=\"http://www.vsnu.nl/files/documenten/Domeinen/Onderzoek/Open%20access/14267%20Open%20Access%20to%20publications%20(ENG).pdf\">available in English</a>.</p>\n\n<p>Basically, the letter is an answer to an earlier letter from our government about Open Access, and it warns about actions that will soon be\nundertaken (so, not really pro-active). However,</p>\n\n<ul><i>\n\"[they] are also appealing to you to continue to advocate free access to your own scientific publications.\"\n</i></ul>\n\n<p>Well, I have, not so actively, and maybe this post can be the start of a change. Because what basically bothers me is that the Open Access\ndiscussion, also in The Netherlands, is biased. And indeed, the letter continues with a section about gold and green access. If the VSNU\nreally wants to promote free access to <strong><em>research</em></strong>, it should not even accept green. We all know that it is not about being able to look at (free),\nbut to be able to mix and improve. Reuse. Continue. Stand on shoulders. The fact that this letter focuses on publications only, does not spend a\nword on reuse, is rather depressing and not giving me even the slightest hint that The Netherlands will break into that Top 50 any time soon.</p>\n\n<p>Overall, the letter is relatively positive for the Open Access movement, though reactive. <a href=\"https://twitter.com/egonwillighagen/status/504973493742891008\">They still have some explanation to do</a>:</p>\n\n<ul><i>\n\"The golden route is more complex. 
However, many believe that in the end it is a\nmore sustainable route to Open Access.\"\n</i></ul>\n\n<p>(Or maybe readers can explain to me what is complex about the golden route?)</p>\n\n<p>The following is a rather interesting section, but really only when they had focused on Open Access in its pure form that allows research\nreuse. I think it now leaves you with a low starting point bargaining with resistant publisher lawyers and managers that have long\n<a href=\"http://alexholcombe.wordpress.com/2013/01/09/scholarly-publishers-and-their-high-profits/\">lost the interest of the academics in favor of that of the share holders</a>:</p>\n\n<ul><i>\nFor the past ten years, publishers have been offering journals in package deals referred to as Big Deals. Shortly negotiations with\nthe major publishers about these Big Deals will take place, including Elsevier, Springer and Wiley. The Dutch universities have expressed\ntheir wish to make agreements with these publishers about the transition to Open Access as part of those Big Deals. Universities expect\npublishers to take serious steps to facilitate that transition.\n</i></ul>\n\n<p>I hope the VSNU will clarify what they mean by “serious”. Because they all came up with “me too” solutions (setting up new OA\njournals) without seriously changing their model. No large publisher dared making the flagship journals full gold Open Access. That is\nserious business; all we see now is scribbling in the margin.</p>\n\n<p>Perhaps that is the reason of the wish to be in the top 50. Maybe the VSNU just wants a better bargaining position.</p>\n\n<p>The letter ends with what researchers can do. And with that, they are spot on:</p>\n\n<ul><i>\nAs a researcher, you can play a vital role in the transition to Open Access. We have\nmentioned the possibility of depositing articles in the repository of your own\nuniversity. But there is more. 
It’s important to consider that researchers play a key\nrole in the publishing process: as providers of the scientific content, as reviewers\nand as members of editorial and advisory boards. We hope that where ever possible,\nyou will ask publishers to convert to an Open Access model.\n</i></ul>\n\n<p>What any researcher can already do to promote (proper) Open Access:</p>\n\n<ol>\n  <li>stop reviewing publishing closed-access papers (you have way too much review requests already, and some filtering will not hurt you)</li>\n  <li>stop reviewing publishing for non-gold Open Access journals (step further than the first item)</li>\n  <li>submit only to full-gold Open Access journals (plenty of options; importantly, the quality and impact of your paper is not dependent on the journal, but on you. if not, you’re just a bad author and researcher and should go back to school or start learning from feed back on your Open Notebook Science, so that you improve your act before you submit; really, it happens to the best of us: multidisciplinary research is hard: you cannot excel in biology and chemistry and statistics and informatics and computer science and data analysis and materials science and as perfect and creative linguistic (well, not all of us, anyway))</li>\n  <li>put your previous mistakenly closed-access papers in university repositories (most Dutch universities have solutions; not all yet)</li>\n  <li>make previously published closed-access papers gold Open Access (yes, you can! 
I am in the process of doing this for the CDK I paper, and other ACS papers will follow)</li>\n  <li><a href=\"https://orcid.org/register\">get an ORCID</a></li>\n  <li>use <a href=\"https://en.wikipedia.org/wiki/Altmetrics\">#altmetrics</a> to see that gold Open Access gives you more impact for your papers too (service providers include <a href=\"https://impactstory.org/\">ImpactStory</a>, <a href=\"http://altmetric.com/\">Altmetric.com</a>, <a href=\"http://www.plumanalytics.com/\">Plum Analytics</a>, etc)</li>\n</ol>\n\n<p>Of course, it is not only about publications. Again, the VSNU would do well to learn that research is not the same as publications.\nBesides sending letters, I think the VSNU can do this to promote Open Science, which is what I hope they are after:</p>\n\n<ol>\n  <li>negotiate with the government and major science and funding agencies (KNAW, NWO) to stop focusing on publications as primary output</li>\n  <li>start focusing on output other than publications (e.g. data sets, software) even if you have not ended negotiations with others, just to set a proper example</li>\n  <li>make research outcomes machine readable (read <a href=\"https://researchkb.wordpress.com/2014/08/26/linked-open-data-at-the-national-library-of-the-netherlands/\">this interesting post from our national library</a>)</li>\n  <li>actively explore business models around Open Science (and not have your universities’ spin-off departments only know about patent law, ignore the rest of the world)</li>\n  <li>adopt the ORCID nation wide, starting Jan 2015</li>\n  <li>start using #altmetrics to get a better perspective of the performance of your members</li>\n</ol>\n\n<p>Of course, I am more than willing to help the VSNU with this transition. 
I can be reached at the\n<a href=\"http://www.bigcat.unimaas.nl/\">Department of Bioinformatics - BiGCaT</a>, <a href=\"http://www.maastrichtuniversity.nl/web/show/id=6265112/langid=42\">NUTRIM</a>,\n<a href=\"http://www.maastrichtuniversity.nl/web/show/id=74338/langid=42\">FHML</a>, <a href=\"http://www.maastrichtuniversity.nl/\">Maastricht University</a>.\nThere are many options I have missed here (like data repositories, data citing, DOIs, and whatever).</p>\n\n<p>PS. <a href=\"https://impactstory.org/EgonWillighagen\">my ImpactStory profile</a> will tell you that more than\n80% of my publications are Open Access. Not all gold yet, but I am working on changing that for some old papers.</p>",
      "summary": "Yesterday, I received a letter from the Association of Universities The Netherlands (VSNU, @deVSNU) about Open Access. The Netherlands is for research a very interesting country: it’s small, meaning we have few resources to establish and maintain high profile centers, we also believe strong education benefits from distribution, so we we have many good universities, rather than a few excelling universities. Mind you, this clouds that we absolutely do have excelling research institutes and research groups; they just are not concentrated in one university.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/vsnu.png",
      "date_published": "2014-08-30T00:00:00+00:00",
      "date_modified": "2014-08-30T00:00:00+00:00",
      "tags": ["openaccess","openscience"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/11/08/looking-for-phd-and-postdoc-to-work-on.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/11/08/looking-for-phd-and-postdoc-to-work-on.html",
      "title": "Looking for a PhD and a Postdoc to work on Open Science Nanosafety",
      "content_html": "<p>I am happy that I got my first research grant awarded (EU FP7), which should start after all the contracts are signed,\netc, somewhere early 2014. The project is about setting up data needs for the analysis of nanosafety studies. And for this,\nI have the below two position vacancies available now. If you are keen on doing Open Science (CDK, Bioclipse, OpenTox, WikiPathways, …, …),\nworking within the European <a href=\"http://www.nanosafetycluster.eu/\">NanoSafety Cluster</a>, and have an affinity with understanding the\nsystems biology of nanomaterials, then you may be interested in applying.</p>\n\n<p><strong>PhD position</strong></p>\n\n<p><img src=\"/assets/images/vac1.png\" alt=\"\" /></p>\n\n<p><strong>Postdoc position</strong></p>\n\n<p><img src=\"/assets/images/vac2.png\" alt=\"\" /></p>",
      "summary": "I am happy that I got my first research grant awarded (EU FP7), which should start after all the contracts are signed, etc, somewhere early 2014. The project is about setting up data needs for the analysis of nanosafety studies. And for this, I have the below two position vacancies available now. If you are keen on doing Open Science (CDK, Bioclipse, OpenTox, WikiPathways, …, …), working within the European NanoSafety Cluster, and have an affinity with understanding the systems biology of nanomaterials, then you may be interested in applying.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/vac1.png",
      "date_published": "2013-11-08T00:00:00+00:00",
      "date_modified": "2024-06-01T00:00:00+00:00",
      "tags": ["nanosafety","enanomapper","opentox","ontology"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-9-apis-and.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-9-apis-and.html",
      "title": "Programming in the Life Sciences #9: APIs and Web Services",
      "content_html": "<p>Continuing on the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/exercise-what-variable-type-would-you.html\">theory</a>\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-8-coding.html\">covered</a> in this course,\nthis part will talk about <a href=\"https://en.wikipedia.org/wiki/Application_programming_interface\">application programming interfaces</a>\n(APIs) and <a href=\"https://en.wikipedia.org/wiki/Web_service\">web services</a>.</p>\n\n<h2 id=\"application-programming-interfaces\">Application Programming Interfaces</h2>\n\n<p>APIs define how programs can be used by other programs. An API defines how methods are called and what feedback\nyou can expect. It basically is the combination of documentation and the program itself. But, unlike any piece\nof software, an API is aimed at users, rather than use in the same program. The API is how you communicate\nbetween programs.</p>\n\n<p>Now, in this course we will see two key types of APIs. The first are the APIs provided by the libraries that we\nuse. For example, we already indicated that we will be using at least the following two libraries,\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-8-coding.html\">ops.js and d3.js</a>.\nThese libraries are a collection of functional bits (e.g. classes and methods). For example, ops.js\ndefines an API which wraps closely the <a href=\"https://dev.openphacts.org/docs\">Open PHACTS Linked Data API</a>\n(LDA) itself. The API requires as to do a few things: 1. create a wrapper for the LDA; 2. define a\ncall back function; 3. 
invoke the actual call.</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">searcher</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nx\">Openphacts</span><span class=\"p\">.</span><span class=\"nc\">ConceptWikiSearch</span><span class=\"p\">(</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">https://beta.openphacts.org</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"nx\">appID</span><span class=\"p\">,</span> <span class=\"nx\">appKey</span>\n<span class=\"p\">);</span>  \n<span class=\"kd\">var</span> <span class=\"nx\">callback</span><span class=\"o\">=</span><span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">success</span><span class=\"p\">,</span> <span class=\"nx\">status</span><span class=\"p\">,</span> <span class=\"nx\">response</span><span class=\"p\">){</span>  \n    <span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">parseResponse</span><span class=\"p\">(</span><span class=\"nx\">response</span><span class=\"p\">);</span>\n<span class=\"p\">};</span>  \n<span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">findCompounds</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">Aspirin</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">20</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">4</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"nx\">callback</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<h2 id=\"web-services\">Web Services</h2>\n\n<p>Web services are a special kind of API: they expose an API over the web. 
That imposes some features of\nthese APIs: first, they are based on a web transport layer, commonly\n<a href=\"https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol\">HTTP</a>, but\n<a href=\"https://en.wikipedia.org/wiki/Xmpp\">XMPP</a> is possible too. HTTP is used by your web browser too. Secondly,\nthe web server needs a common communication language to serialize the method call. Here, two key standards\nare used, <a href=\"https://en.wikipedia.org/wiki/XML\">XML</a> and <a href=\"https://en.wikipedia.org/wiki/JSON\">JSON</a>.\nWe will cover these in more detail later. For now, it suffices to think of these as\nenvelopes in which our message is sent. Now, another aspect standardized is how to call the web services.\nFor that, <a href=\"https://en.wikipedia.org/wiki/SOAP\">SOAP</a> and <a href=\"https://en.wikipedia.org/wiki/REST\">REST</a> are\nthe most important standards for the life sciences (though I think\n<a href=\"http://www.biomedcentral.com/1471-2105/10/279\">Wagener’s XMPP approach</a> is still\nworthwhile checking out!). SOAP and REST use XML and JSON as underlying serialization formats.</p>\n\n<p>So, web services are theoretically complex. For this course, most of it is hidden by the client library that will take care of the HTTP and SOAP/REST layers. The students who wish to use Java instead of JavaScript, will face the problem that you first need to find a Java client library for the LDA. There is this library, but that needs exploring for use with the latest Open PHACTS LDA. Higher stakes, higher rewards.</p>\n\n<h2 id=\"take-home-message\">Take home message</h2>\n\n<p>Practically, you do not need to know much of the technologies behind web services, just like you do not need to know machine instructions CPUs follow to run your program. But, it is important to have seen these terms. 
You will run into them, and need enough context to know where and how to find answers to the questions that you will have.</p>\n\n<p>There is one exception: JavaScript Object Notation, JSON. That is the format in which the data is returned by the service, and you will have to handle that. JSON will be the topic of the next post.</p>",
      "summary": "Continuing on the theory covered in this course, this part will talk about application programming interfaces (APIs) and web services.",
      
      "date_published": "2013-10-30T00:00:00+00:00",
      "date_modified": "2013-10-30T00:00:00+00:00",
      "tags": ["pra3006"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-10-279" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-11-html.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-11-html.html",
      "title": "Programming in the Life Sciences #11: HTML",
      "content_html": "<p><a href=\"https://en.wikipedia.org/wiki/HTML\">HTML</a> (HyperText Markup Language), the language of the web,\nis no longer the only language of the web. But it still is the primary language in which source\ncode of webpages is shared. Originally, HTML pages were always static: the only HTML source of a\nweb page was that was downloaded from a website. Nowadays, much HTML the is visualized in your\nweb browser, is generated on the fly with JavaScript. In fact, that is exactly what you will\nlearn to do in this course.</p>\n\n<p>HTML has many dialects, and HTML5 is the upcoming next version. The features have become so\nextensive that we will not have capture half of them; instead, we will stick to the bare\nminimum needed. But even at an minimum, writing a web page with HTML code is basically writing\nsource code. The compiled version is the view of the webpage your web browser shows you. One\nimportant difference is that HTML is much more like a data model representation than it is like\ncomputational instructions. That is, rather than saying things like <code class=\"language-plaintext highlighter-rouge\">put(\"String\", xCoord, yCoord)</code>,\nwe define what is to be shown in in what order with general instructions. Well, in pure HTML\nthat is. <a href=\"https://en.wikipedia.org/wiki/CSS\">Cascading Style Sheets</a> (CSS) is quite outside the\nscope of this course.</p>\n\n<p>A minimal HTML page looks like:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html&gt;</span>\n  <span class=\"nt\">&lt;head&gt;</span>\n  <span class=\"nt\">&lt;/head&gt;</span>\n  <span class=\"nt\">&lt;body&gt;</span>\n  Hello world!\n  <span class=\"nt\">&lt;/body&gt;</span>\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>\n\n<p>When we think about this structure, we notice that it is not unlike the key-value maps we\ncovered earlier. 
For example, compare it to this\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-10.html\">JSON</a>:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nl\">\"html\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"head\"</span><span class=\"p\">:{},</span><span class=\"w\">\n    </span><span class=\"nl\">\"body\"</span><span class=\"p\">:{</span><span class=\"err\">value:</span><span class=\"s2\">\"Hello world!\"</span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>Even if we introduce HTML attributes:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html&gt;</span>\n  <span class=\"nt\">&lt;head&gt;</span>\n  <span class=\"nt\">&lt;/head&gt;</span>\n  <span class=\"nt\">&lt;body&gt;</span>\n  <span class=\"nt\">&lt;h1&gt;&lt;a</span> <span class=\"na\">name=</span><span class=\"s\">\"hello\"</span><span class=\"nt\">&gt;</span>Hello world!<span class=\"nt\">&lt;/a&gt;&lt;/h1&gt;</span>\n  <span class=\"nt\">&lt;/body&gt;</span>\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>\n\n<p>The JSON equivalent would be:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nl\">\"html\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"head\"</span><span class=\"p\">:{},</span><span class=\"w\">\n    </span><span class=\"nl\">\"body\"</span><span class=\"p\">:{</span><span 
class=\"w\">\n      </span><span class=\"nl\">\"h1\"</span><span class=\"p\">:{</span><span class=\"w\">\n        </span><span class=\"nl\">\"a\"</span><span class=\"p\">:{</span><span class=\"w\">\n          </span><span class=\"nl\">\"attributes\"</span><span class=\"p\">:</span><span class=\"p\">{</span><span class=\"nl\">\"name\"</span><span class=\"p\">:</span><span class=\"s2\">\"hello\"</span><span class=\"p\">},</span><span class=\"w\">\n          </span><span class=\"nl\">\"value\"</span><span class=\"p\">:</span><span class=\"s2\">\"Hello world!\"</span><span class=\"w\">\n        </span><span class=\"p\">}</span><span class=\"w\">\n      </span><span class=\"p\">}</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n  </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>So, while these languages are quite different from programming languages, we can clearly\nsee they have been made up by the same (computer science) people. But in my opinion, this\nis an advantage: we only need to learn the underlying patterns and can then much\nmore easily switch between different languages.</p>\n\n<p>Now, returning to the HTML example, we introduce a bit of terminology. Let’s start with\nthe last example:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;h1&gt;&lt;a</span> <span class=\"na\">name=</span><span class=\"s\">\"hello\"</span><span class=\"nt\">&gt;</span>Hello world!<span class=\"nt\">&lt;/a&gt;&lt;/h1&gt;</span>\n</code></pre></div></div>\n\n<p>This HTML code example shows the <code class=\"language-plaintext highlighter-rouge\">&lt;h1&gt;</code> <strong>element</strong> which has one <strong>child element</strong>\n<code class=\"language-plaintext highlighter-rouge\">&lt;a&gt;</code>. This child element has an <strong>attribute</strong> <code class=\"language-plaintext highlighter-rouge\">@name</code>.
Elements can contain string content,\nsuch as the <code class=\"language-plaintext highlighter-rouge\">&lt;a&gt;</code> element has, and one or more child elements (or any combination of these).\nAttributes can only have string content. The HTML specification defines in detail which\nelements can be child elements of other elements. For example, the <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code> element can\nonly be a child element of <code class=\"language-plaintext highlighter-rouge\">&lt;html&gt;</code>. Similarly, each HTML element can only have specific\nattributes, though some attributes can be attached to any element.</p>\n\n<p>There is plenty of documentation on the web, but there are also tools that can help us write\nHTML. For example, the W3C validator at <a href=\"http://validator.w3.org/\">http://validator.w3.org/</a>. This website\ndetects errors in your HTML code, and is quite helpful if you are new to editing HTML, as\nwell as useful if you have a lot of HTML experience.</p>\n\n<p>HTML elements you may find useful include the following:</p>\n\n<ul>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;h1&gt;</code>, <code class=\"language-plaintext highlighter-rouge\">&lt;h2&gt;</code>, …, <code class=\"language-plaintext highlighter-rouge\">&lt;h5&gt;</code>: these are headers and can be used to make sections</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;p&gt;</code>: indicates a paragraph</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;div id=\"someID\"&gt;</code>: indicates a section of text.
The content of any element with an id attribute can be replaced by any appropriate HTML content with JavaScript</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;a href=\"http://...\"&gt;some link&lt;/a&gt;</code>: this is used to make hyperlinks, href means hyperlink reference</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;a name=\"mark1\"&gt;some text&lt;/a&gt;</code>: this is used to create bookmarks, which you can jump to with <code class=\"language-plaintext highlighter-rouge\">&lt;a href=\"#mark1\"&gt;jump to section Mark 1&lt;/a&gt;</code></li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;script&gt;</code>: used to include JavaScript code in your HTML page</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code>: this HTML blob contains metadata, a list of libraries to be loaded, but also JavaScript which is executed before the HTML <code class=\"language-plaintext highlighter-rouge\">&lt;body&gt;</code> is processed</li>\n  <li><code class=\"language-plaintext highlighter-rouge\">&lt;body&gt;</code>: this contains the HTML that is depicted in your browser window</li>\n</ul>\n\n<p>Keep the HTML simple; the programming is more important.</p>\n\n<p><strong>Exercise</strong>: below is part of the HTML/JavaScript <a href=\"https://github.com/egonw/mscpils/blob/master/example1.html\">source code</a>\nbehind <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-5.html\">this app</a>.\nPlease indicate which lines are HTML source code, and which are JavaScript.</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">&lt;!DOCTYPE HTML PUBLIC\n  \"-//W3C//DTD HTML 4.01 Transitional//EN\"\n  \"http://www.w3.org/TR/html4/loose.dtd\"&gt;</span>\n<span class=\"nt\">&lt;html&gt;</span>\n<span class=\"c\">&lt;!--\n\nCopyright (c) 2013  Egon Willighagen &lt;egon.willighagen@maastrichtuniversity.nl&gt;\n\n Permission
is hereby granted, free of charge, to any person\n obtaining a copy of this software and associated documentation\n files (the \"Software\"), to deal in the Software without\n restriction, including without limitation the rights to use,\n copy, modify, merge, publish, distribute, sublicense, and/or sell\n copies of the Software, and to permit persons to whom the\n Software is furnished to do so, subject to the following\n conditions:\n\n The above copyright notice and this permission notice shall be\n included in all copies or substantial portions of the Software.\n\n THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES\n OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT\n HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,\n WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\n OTHER DEALINGS IN THE SOFTWARE.\n\n--&gt;</span>\n<span class=\"nt\">&lt;head&gt;</span>\n  <span class=\"nt\">&lt;title&gt;</span>OpenPHACTS Jasmine Spec Runner<span class=\"nt\">&lt;/title&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">src=</span><span class=\"s\">\"lib/jquery-1.9.1.min.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"lib/purl.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n\n  <span class=\"c\">&lt;!-- include source files here... 
--&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"src/OPS.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"src/ConceptWikiSearch.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n\n  <span class=\"c\">&lt;!-- setup --&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"c1\">// get the app_key and app_id from the webpage call --&gt;</span>\n<span class=\"kd\">var</span> <span class=\"nx\">prmstr</span> <span class=\"o\">=</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">location</span><span class=\"p\">.</span><span class=\"nx\">search</span><span class=\"p\">.</span><span class=\"nf\">substr</span><span class=\"p\">(</span><span class=\"mi\">1</span><span class=\"p\">);</span>\n<span class=\"kd\">var</span> <span class=\"nx\">prmarr</span> <span class=\"o\">=</span> <span class=\"nx\">prmstr</span><span class=\"p\">.</span><span class=\"nf\">split </span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">&amp;</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n<span class=\"kd\">var</span> <span class=\"nx\">params</span> <span class=\"o\">=</span> <span class=\"p\">{};</span>\n<span class=\"k\">for</span> <span class=\"p\">(</span> <span class=\"kd\">var</span> <span class=\"nx\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span> <span class=\"o\">&lt;</span> <span class=\"nx\">prmarr</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span 
class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"kd\">var</span> <span class=\"nx\">tmparr</span> <span class=\"o\">=</span> <span class=\"nx\">prmarr</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nf\">split</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">=</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n    <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"nx\">tmparr</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">]]</span> <span class=\"o\">=</span> <span class=\"nx\">tmparr</span><span class=\"p\">[</span><span class=\"mi\">1</span><span class=\"p\">];</span>\n<span class=\"p\">}</span>\n  <span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;/head&gt;</span>\n\n<span class=\"nt\">&lt;body&gt;</span>\n  <span class=\"nt\">&lt;h3&gt;</span>Output<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;h3&gt;</span>Search Results<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;p&gt;&lt;div</span> <span class=\"na\">id=</span><span class=\"s\">\"table\"</span><span class=\"nt\">&gt;&lt;/div&gt;&lt;/p&gt;</span>\n  <span class=\"nt\">&lt;h3&gt;</span>Compound Details<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;p&gt;&lt;div</span> <span class=\"na\">id=</span><span class=\"s\">\"details\"</span><span class=\"nt\">&gt;&lt;/div&gt;&lt;/p&gt;</span>\n  <span class=\"nt\">&lt;h3&gt;</span>JSON reply<span class=\"nt\">&lt;/h3&gt;</span>\n  <span class=\"nt\">&lt;p&gt;&lt;div</span> <span class=\"na\">id=</span><span class=\"s\">\"json\"</span><span class=\"nt\">&gt;</span>Nothing yet<span class=\"nt\">&lt;/div&gt;&lt;/p&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n<span 
class=\"kd\">var</span> <span class=\"nx\">searcher</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nx\">Openphacts</span><span class=\"p\">.</span><span class=\"nc\">ConceptWikiSearch</span><span class=\"p\">(</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">https://beta.openphacts.org</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">app_id</span><span class=\"dl\">\"</span><span class=\"p\">],</span> <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">app_key</span><span class=\"dl\">\"</span><span class=\"p\">]</span>\n<span class=\"p\">);</span>\n<span class=\"kd\">var</span> <span class=\"nx\">callback</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">success</span><span class=\"p\">,</span> <span class=\"nx\">status</span><span class=\"p\">,</span> <span class=\"nx\">response</span><span class=\"p\">){</span>\n  <span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">json</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">stringify</span><span class=\"p\">(</span><span class=\"nx\">response</span><span class=\"p\">);</span>\n  <span class=\"nx\">html</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;table&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span 
class=\"o\">&lt;</span><span class=\"nx\">response</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;tr&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;td&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">Name: &lt;span&gt;</span><span class=\"dl\">\"</span> <span class=\"o\">+</span>\n      <span class=\"nx\">response</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span> <span class=\"o\">+</span>\n      <span class=\"dl\">\"</span><span class=\"s2\">&lt;/span&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/td&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/tr&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/table&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">table</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span 
class=\"o\">=</span> <span class=\"nx\">html</span><span class=\"p\">;</span>\n<span class=\"p\">};</span>\n<span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">byTag</span><span class=\"p\">(</span>\n  <span class=\"dl\">'</span><span class=\"s1\">Aspirin</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">5</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">4</span><span class=\"dl\">'</span><span class=\"p\">,</span>\n  <span class=\"dl\">'</span><span class=\"s1\">07a84994-e464-4bbf-812a-a4b96fa3d197</span><span class=\"dl\">'</span><span class=\"p\">,</span>\n  <span class=\"nx\">callback</span>\n<span class=\"p\">);</span>\n  <span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;/body&gt;</span>\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>",
      "summary": "HTML (HyperText Markup Language), the language of the web, is no longer the only language of the web. But it still is the primary language in which source code of webpages is shared. Originally, HTML pages were always static: the only HTML source of a web page was that was downloaded from a website. Nowadays, much HTML the is visualized in your web browser, is generated on the fly with JavaScript. In fact, that is exactly what you will learn to do in this course.",
      
      "date_published": "2013-10-30T00:00:00+00:00",
      "date_modified": "2013-10-30T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-10.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/30/programming-in-life-sciences-10.html",
      "title": "Programming in the Life Sciences #10: JavaScript Object Notation (JSON)",
      "content_html": "<p>As said, <a href=\"https://en.wikipedia.org/wiki/JSON\">JSON</a> is the format we will use as serialization format\nfor answers given by the <a href=\"https://dev.openphacts.org/docs\">Open PHACTS LDA</a>. The API actually supports\nXML, RDF, HTML, and TSV too, but I think JSON is a good balance between expressiveness and compactness.\nMoreover, and perhaps a much better argument, JSON works very well in a JavaScript environment: it is\nvery easy to convert the serialization into a data model:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">jsonData</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">parse</span><span class=\"p\">(</span><span class=\"nx\">jsonString</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>Now, we previously covered maps. Maps have keys and values: the keys unlock a particular value.\nFor example, take this JavaScript:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">map</span> <span class=\"o\">=</span> <span class=\"p\">{</span> <span class=\"dl\">\"</span><span class=\"s2\">key</span><span class=\"dl\">\"</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">value</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">key2</span><span class=\"dl\">\"</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">value2</span><span class=\"dl\">\"</span> <span class=\"p\">};</span>\n</code></pre></div></div>\n\n<p>We define here a key-value object, and we can access the two values with the two keys:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre 
class=\"highlight\"><code><span class=\"nx\">map</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">key2</span><span class=\"dl\">\"</span><span class=\"p\">];</span> <span class=\"c1\">// == value2</span>\n</code></pre></div></div>\n\n<p>These examples are JavaScript source code, not strings. The content of the map variable is a data\nstructure. But when we communicate with a web service, we need a (string) serialization of the data\nmodel: we cannot send around memory pointers (which a variable is), because they are only\nvalid on a single machine.</p>\n\n<p>This is where the JSON format comes in. We can convert the content of the above map variable into a\nstring representation with this code:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">mapStringified</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">stringify</span><span class=\"p\">(</span><span class=\"nx\">map</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>which gives us the following output:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"nl\">\"key\"</span><span class=\"p\">:</span><span class=\"s2\">\"value\"</span><span class=\"p\">,</span><span class=\"nl\">\"key2\"</span><span class=\"p\">:</span><span class=\"s2\">\"value2\"</span><span class=\"p\">}</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>This string looks an awful lot like the JavaScript code we wrote earlier.</p>\n\n<p>And likewise, we can convert the JSON string back into a JavaScript data model again, with:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">mapAgain</span> <span
class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">parse</span><span class=\"p\">(</span><span class=\"nx\">mapStringified</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>Now, I did warn you earlier that values can themselves be lists and maps again, so consider this\nJSON example from Wikipedia:</p>\n\n<div class=\"language-json highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"p\">{</span><span class=\"w\">\n    </span><span class=\"nl\">\"id\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">1</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"name\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"s2\">\"Foo\"</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"price\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">123</span><span class=\"p\">,</span><span class=\"w\">\n    </span><span class=\"nl\">\"tags\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">[</span><span class=\"w\"> </span><span class=\"s2\">\"Bar\"</span><span class=\"p\">,</span><span class=\"w\"> </span><span class=\"s2\">\"Eek\"</span><span class=\"w\"> </span><span class=\"p\">],</span><span class=\"w\">\n    </span><span class=\"nl\">\"stock\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n        </span><span class=\"nl\">\"warehouse\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">300</span><span class=\"p\">,</span><span class=\"w\">\n        </span><span class=\"nl\">\"retail\"</span><span class=\"p\">:</span><span class=\"w\"> </span><span class=\"mi\">20</span><span class=\"w\">\n    </span><span class=\"p\">}</span><span class=\"w\">\n</span><span class=\"p\">}</span><span
class=\"w\">\n</span></code></pre></div></div>\n\n<p>Here we see that the value behind the stock key is another map, and the value behind the tags\nkey is a list. This creates a quite flexible serialization format, which is happily used by\nOpen PHACTS. (And for the semantic web readers, yes, we can make JSON more semantic. The Open\nPHACTS LDA supports an “rdfjson” format.)</p>",
      "summary": "As said, JSON is the format we will use as serialization format for answers given by the Open PHACTS LDA. The API actually supports XML, RDF, HTML, and TSV too, but I think JSON is a good balance between expressiveness and compactness. Moreover, and perhaps a much better argument, JSON works very well in a JavaScript environment: it is very easy to convert the serialization into a data model:",
      
      "date_published": "2013-10-30T00:00:00+00:00",
      "date_modified": "2013-10-30T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/29/programming-in-life-sciences-8-coding.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/29/programming-in-life-sciences-8-coding.html",
      "title": "Programming in the Life Sciences #8: coding standards",
      "content_html": "<p>Never underestimate the power of lack of coding standards in code obfuscation. Just try randomly to\nread code you wrote a year ago or four years ago. You’ll be surprised with what you find. Coding\nstandards are like the grammar in writing: they ensure that our message gets understood. Of course,\nthe primary goal is that the CPU understands what you mean, but because programming languages are\nnot your native language, you may not always say what you think you are saying.</p>\n\n<h2 id=\"copyright-and-licensing\">Copyright and Licensing</h2>\n\n<p>First standard is attribution: if you use the solution of someone else, you write in your source\ncode whom wrote the solution. Secondly, you must allow others to do the same. Therefore, you always\nadd your name (and normally email address) to your source code, and under what conditions people\nmay use your code. This is commonly done by assigning a license. Open Source licenses promote\n(scientific) collaboration, and give others the rights to use your solution, redistribution\nmodifications, etc. They may explicitly require attributions, but often not. In a scholarly setting,\nyou always give attribution, even if not required by the license. Remember, that software falls\nunder copyright but algorithms typically not. Copyright/author and license information is typically\nadded to source code using a <a href=\"http://chem-bla-ics.blogspot.nl/2009/06/making-patches-attribution-copyright.html\">header</a>.</p>\n\n<h2 id=\"documentation\">Documentation</h2>\n\n<p>The second thing is to document what your code is supposed to do, what assumptions are made,\nhow people should use it, and preferably under what conditions it will fail. Comments in your\nsource are just as much documentation as a tutorial in Word format. They are complementary, and\ndocumentation must not only be targeted at users, but also at yourself so that you understand\nwhy you added that weird check. 
You will not (have to) remember in two years.</p>\n\n<h2 id=\"coding-standards\">Coding standards</h2>\n\n<p>Just like English has coding standards, programming languages have them too. Both also have styles,\nand the selection of a style is up to the author, but consistency is important. Coding\nstandards you should be thinking about include consistent use of variable and method names,\nkeeping code blocks small, etc. For example, compare the following two code examples which do\nthe same thing:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">method</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">string</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"kd\">var</span> <span class=\"nx\">number</span> <span class=\"o\">=</span> <span class=\"mi\">0</span>\n  <span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"nx\">string</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">string</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">]</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">A</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"nx\">number</span> <span class=\"o\">=</span> <span class=\"nx\">number</span> <span class=\"o\">+</span><span class=\"mi\">1</span> \n  <span class=\"p\">}</span>\n  <span class=\"k\">return</span> <span class=\"nx\">number</span>\n<span
class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>And this version:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">countTheANucleotides</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">dnaSequence</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"kd\">var</span> <span class=\"nx\">count</span> <span class=\"o\">=</span> <span class=\"mi\">0</span>\n  <span class=\"c1\">// iterate over all nucleotides in the DNA string</span>\n  <span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"nx\">dnaSequence</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">dnaSequence</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">]</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">A</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"nx\">count</span> <span class=\"o\">=</span> <span class=\"nx\">count</span> <span class=\"o\">+</span><span class=\"mi\">1</span> \n  <span class=\"p\">}</span>\n  <span class=\"k\">return</span> <span class=\"nx\">count</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>Which one do you find easier to understand the function of?</p>\n\n<ol>\n  <li>use clear, descriptive variable and method names</li>\n  <li>use source code comments to describe the intention of source code</li>\n  <li>keep source code lines short enough that you can read the full line
without (horizontal) scrolling</li>\n  <li>keep code blocks short enough that they fit on a single screen (say, 25 lines max)</li>\n</ol>\n\n<h2 id=\"unit-testing\">Unit testing</h2>\n\n<p>It is important to realize that what you intend to have the computer calculate is\nsomething different than what your source code actually tells the computer to do. Even\nmore important is to realize that it is not always your fault if the calculation goes\nwrong; in particular, the input you pass to some program can always be crafted such\nthat it will fool your code into doing unintended things.</p>\n\n<p>But a common cause of misbehaving code is the author. At first (and many, many\ntimes after that) it’s just getting the code to compile: missing semi-colons, typos in\nvariable names, etc, etc. After a bit, and hunting you down to your grave, come bugs\ncaused by unintuitive features of programming languages, libraries you’re using, etc.\nCommon (and often expensive) mistakes include for-loops missing the first or the last\nelement, incorrect conversion of units (<a href=\"https://en.wikipedia.org/wiki/Mars_Climate_Orbiter\">125 M$ expensive!</a>),\netc.</p>\n\n<p>Fortunately, we can call in the help of computers for this too. We have code checking\ntools, and importantly, libraries to help us define (unit) tests. These tests call\nyour code, and check whether the calculated results match our expectation. For\nexample, for JavaScript we could use the <a href=\"https://github.com/jquery/qunit/blob/master/MIT-LICENSE.txt\">MIT-licensed</a>\nqunit.
For example, we could write the following tests (in qunit):</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nf\">test</span><span class=\"p\">(</span> <span class=\"dl\">\"</span><span class=\"s2\">counting tests</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"kd\">function</span><span class=\"p\">()</span> <span class=\"p\">{</span>\n  <span class=\"nf\">equal</span><span class=\"p\">(</span><span class=\"mi\">1</span><span class=\"p\">,</span> <span class=\"nf\">countTheANucleotides</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">AGCT</span><span class=\"dl\">\"</span><span class=\"p\">));</span>\n  <span class=\"nf\">equal</span><span class=\"p\">(</span><span class=\"mi\">4</span><span class=\"p\">,</span> <span class=\"nf\">countTheANucleotides</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">AAAA</span><span class=\"dl\">\"</span><span class=\"p\">));</span>\n  <span class=\"nf\">equal</span><span class=\"p\">(</span><span class=\"mi\">0</span><span class=\"p\">,</span> <span class=\"nf\">countTheANucleotides</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">GCGC</span><span class=\"dl\">\"</span><span class=\"p\">));</span>\n<span class=\"p\">});</span>\n</code></pre></div></div>\n\n<p>OK, you get the idea. That other scientists really start to care about these things,\nis shown by these two recent papers:</p>\n\n<ul>\n  <li><a href=\"http://dx.doi.org/10.1371/journal.pcbi.1002802\">Ten simple rules for the open development of scientific software</a></li>\n  <li><a href=\"http://dx.doi.org/10.1371/journal.pcbi.1003285\">Ten simple rules for reproducible computational research</a></li>\n</ul>",
      "summary": "Never underestimate the power of lack of coding standards in code obfuscation. Just try randomly to read code you wrote a year ago or four years ago. You’ll be surprised with what you find. Coding standards are like the grammar in writing: they ensure that our message gets understood. Of course, the primary goal is that the CPU understands what you mean, but because programming languages are not your native language, you may not always say what you think you are saying.",
      
      "date_published": "2013-10-29T00:00:00+00:00",
      "date_modified": "2013-10-29T00:00:00+00:00",
      "tags": ["pra3006"],
      "_references": [{ "url": "https://doi.org/10.1371/journal.pcbi.1002802" },{ "url": "https://doi.org/10.1371/journal.pcbi.1003285" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/23/exercise-what-variable-type-would-you.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/23/exercise-what-variable-type-would-you.html",
      "title": "Programming in the Life Sciences #7: theory",
      "content_html": "<p>No course, with some good theory. In <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-1-six-day.html\">this six-day course</a>,\nI plan to cover this computing theory. It’s very practice oriented:</p>\n\n<p><img src=\"/assets/images/theorySlide.png\" alt=\"\" /></p>\n\n<p>That should give them enough head start to work on something <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-5.html\">like this</a>.\nThe material will be more extensive, but I’ll give myself a head start, with some initial content.</p>\n\n<h2 id=\"introduction\">Introduction</h2>\n<p>Programming in the Life Sciences is done to solve problems in the life sciences, but only\nproblems that can be solved with pen and paper too. Programming cannot measure metabolites\nin a cell. For that, you need equipment that gives the things it measured as data as input to\nthe computer.</p>\n\n<p>Instead, the program defines some computation that is done on the computer. For example, noise\nreduction, DNA/RNA/protein sequence alignment, metabolite identification, etc. But all\ncomputation start with input data.</p>\n\n<p>The program tells the computer what it should do, step by step. Get the data from the LC/MS; find\npeaks; group peaks at the same retention time; match that against a metabolite spectral database;\ndetermine the best match; report the best three matches to the user via the screen. Step by step.</p>\n\n<p>The computer consists of input/output devices (to get data; to present results), various kinds of\nmemory (to remember things), and a central processing unit (CPU) that performs the computation\nsteps.</p>\n\n<p>Considering all this, programming is to define what the computer should do, in a (programming)\nlanguage that the computer understands. Note that I say “the computer understands” rather than\n“the CPU understands”. The CPU only speaks one language (machine instructions). 
But we use a\nhigher level programming language, which is much more compact and easier to read/understand. A\ncompiler translates this higher level language into machine instructions (sometimes more than\none compiler is involved).</p>\n\n<h2 id=\"data-types\">Data Types</h2>\n<p>The programming language says do this, do that. It does not know about data. Fortunately, it\nknows about bits, and bits we can use to store data. That way, we can instruct the CPU to do\nthings like: OK, take the measured LC/MS data, take the MS at retention time 5, then start with\nthe first m/z value, and if that is larger than 10, then… etc. We do not want to hard code the\ndata in our program, so we instruct the CPU to remember it. The computer has various levels of\nmemory that are relevant (ignoring those at a CPU level!): variables stored in the working\nmemory, and data stored on external memory (hard disk, USB disk, LC/MS machine).</p>\n\n<p><em>Exercise: write a program that counts the sum of all numbers starting with 1 up to 50 without\nusing variables.</em></p>\n\n<p>Some programming languages have variable types. This variable is a non-integer number, this\nvariable is a text string. This ensures that you cannot sum up “cat” with 5.3. This is called\nvariable typing. Some programming languages have static typing with types declared in the source\ncode, while others infer the types (the compiler figures them out when it is compiling), and\nsome have dynamic typing (the computer will only complain when the program runs).</p>\n\n<p>Example basic variable types include: strings, integers, floats, and booleans. Strings can be used\nto remember names; integers are needed for counts and iterations (how many m/z values did I\nalready look at again??), and floats are needed for pretty much all scientific data. A boolean is\na yes/no type, or true/false.</p>\n\n<p>Also, variables do not have units. Remember those high school days? <em>“John, six WHAT??”</em>, <em>“Umm,\nsix mole, sir.”</em> Variables do not have units. 
Thus while you cannot calculate the sum of “cat”\nand 5.3, a computer has no problem calculating the sum of six mole and three days.</p>\n\n<h2 id=\"complex-types\">Complex Types</h2>\n\n<p><em>Exercise: What variable type would you use for that photo you took last week of that western blot?</em></p>\n\n<p>It is clear that these basic types don’t suffice. This touches on the topic of computer\nrepresentation. How does a computer keep a western blot in memory? That photo you took with\nyour Android digitized the western blot into a matrix of numbers: if it was a greyscale photo,\nthen a single integer per position.</p>\n\n<p>Programming languages have various complex types, and most even support the definition of\nmore complicated data structures. But the most basic complex type first: the list. A list, vector,\nor array all refer to the same concept: a list of variables, typically of the same type. For\nexample, a mathematical vector is a list of floats (e.g. <code class=\"language-plaintext highlighter-rouge\">float[]</code> in a language like Java, where the\n<code class=\"language-plaintext highlighter-rouge\">[]</code> refers to the list or array nature). A string, actually, which we marked as a “basic”\nvariable type, is really a complex one too: it is a list of characters. That is, the string “cat”\nis a list of three characters. Importantly, each item in the list has an index, and the full list\nhas a length. Depending on the programming language, the first item in the list has index 0 or 1.</p>\n\n<p>As said, a list typically contains variables of the same type, just because it is easier to work\nwith. But the list can contain complex types too. For example, we can create a list of lists (we\nwould write <code class=\"language-plaintext highlighter-rouge\">float[][]</code>). Each element in the top list is a list again; that is, the first\nelement of the outer list is again a list. 
This matches very closely the mathematical matrix.</p>\n\n<p>A second complex type important in this course is the map. A map is basically a list of key-value\npairs, where the keys take the role of the index in lists. Instead of asking for the list item\nwith index 7, we ask for the value behind a certain key. And, like we could make a list of lists,\nwe can also make a map of maps, etc. Keep this in mind! We will use this extensively in this\ncourse.</p>\n\n<h2 id=\"automation\">Automation</h2>\n<p>Now that we know how the CPU uses memory, we turn back to what the processor must do, according\nto our program. First, I mentioned the step by step at the start. This is critical: the processor\nhas a linear progression through the steps it must do. It can only go forward, and only step by\nstep. It cannot go back. Yet, that is exactly what we write in a for-loop, like in this four line\nJavaScript example:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"mi\">50</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">)</span> <span class=\"p\">{</span> \n  <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>This code defines the variable sum in the first line, and then starts counting from 1 up to (but not including) 50, one\nby 
one, and adding that number to the sum. This loop is only for our convenience. This is how the\ncomputer will run this program (and at a CPU machine instruction level it’s even longer):</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n<span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span 
class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n<span class=\"c1\">// ...</span>\n</code></pre></div></div>\n\n<p>OK, I won’t give the full sequence of steps the computer takes. I guess you can see the virtues\nof higher level programming languages :) Importantly, it is a linear list of steps it takes.</p>\n\n<p>Another important control structure in programming languages is the if-statement. This gives us\nthe power of making decisions. 
For example, we can skip the 7 in the above summation:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">1</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"mi\">50</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"nx\">i</span><span class=\"o\">+</span><span class=\"mi\">1</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">i</span> <span class=\"o\">==</span> <span class=\"mi\">7</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n     <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">sum</span> <span class=\"o\">+</span> <span class=\"nx\">i</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>But I have not yet discussed another important concept: the operator. The operator tells the\ncomputer what operation to perform, and how. This last source code example uses various operators: <code class=\"language-plaintext highlighter-rouge\">=</code>, <code class=\"language-plaintext highlighter-rouge\">&lt;</code>, <code class=\"language-plaintext highlighter-rouge\">+</code>, and <code class=\"language-plaintext highlighter-rouge\">==</code>. The first is an assignment operator: it assigns the value ‘0’\nto the variable sum. This operation is used for its effect of storing the value; any value it returns is ignored here. 
The <code class=\"language-plaintext highlighter-rouge\">&lt;</code> operator compares two\nvariable values, or a variable value with a specific value. For example, the above code compares\nthe value behind the ‘i’ variable with 50; indeed, it does not compare 50 with “i”, which is the\nvariable name. The + operator follows the mathematical + operator for floats and integers; for\nstrings the + operator performs a concatenation: <code class=\"language-plaintext highlighter-rouge\">\"cat\" + \"fish\"</code> is not one less fish, but a\n<code class=\"language-plaintext highlighter-rouge\">\"catfish\"</code>. Note that these two operators, &lt; and +, return a new value. The <code class=\"language-plaintext highlighter-rouge\">&lt;</code> returns a\nboolean (yes, it’s smaller; no, it’s not smaller); the <code class=\"language-plaintext highlighter-rouge\">+</code> returns an integer if it was summing\nintegers, or a string when it concatenated two strings. The <code class=\"language-plaintext highlighter-rouge\">==</code> operator also returns a boolean:\ntrue if the two values are the same (in general). During the course, we will see several more\noperators. Look out for them!</p>\n\n<p>In some way, this brings us to the next topic: functions and parameters. 
An operator is a special\nkind of function, and that will become more clear if I give an example function:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">function</span> <span class=\"nf\">add</span><span class=\"p\">(</span><span class=\"nx\">first</span><span class=\"p\">,</span> <span class=\"nx\">second</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"kd\">var</span> <span class=\"nx\">sum</span> <span class=\"o\">=</span> <span class=\"nx\">first</span> <span class=\"o\">+</span> <span class=\"nx\">second</span><span class=\"p\">;</span>\n  <span class=\"k\">return</span> <span class=\"nx\">sum</span><span class=\"p\">;</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>Effectively, we just made an alias function “add” which internally just uses the + operator, with\nthe exact same outcome.</p>\n\n<p><em>Exercise: what would be returned by these two function calls? 1. add(1,2); 2. add(“cat”,\n“fish”);</em></p>\n\n<p>This function example is not so interesting, and only makes the code harder to read. However,\nwhen the “body” of the function becomes larger, it allows you to easily replace a complex list\nof steps with one function call. Consider: <code class=\"language-plaintext highlighter-rouge\">sumAllNumbers(1,50)</code>.</p>\n\n<p>Now, if we collect many such functions, pretty much like books, we get a library. So, that one\nwas easy.</p>\n\n<p>That concludes this episode of the <a href=\"http://chem-bla-ics.blogspot.nl/search/label/%23mscpils\">Programming in the Life Sciences</a>\nseries. I will continue later with the theory about Web Services and Clients, Serialization\nformats, and Other.</p>",
      "summary": "No course, with some good theory. In this six-day course, I plan to cover this computing theory. It’s very practice oriented:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/theorySlide.png",
      "date_published": "2013-10-23T00:00:00+00:00",
      "date_modified": "2013-10-23T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/12/programming-in-life-sciences-6-functions.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/12/programming-in-life-sciences-6-functions.html",
      "title": "Programming in the Life Sciences #6: functions",
      "content_html": "<p>One key feature of programming languages is the following: first, there is linearity. This is an important point\nthat is not always clear to students who just start to program. In fact, ask yourself what the algorithm is for\ncounting the chairs in the room where you are now sitting. Could a computer do that in the same way? How should\nyour algorithm change? A key point is, is that the program is run step by step, in a linear way.</p>\n\n<p>However, we very easily jump to functions. In fact, we use so many libraries nowadays, this linearity is not so\nclear anymore. Things just happen with magic library calls. But at the same time, the library calls make our life\na lot easier: by using functions, we group functionality in easy to read and easier to understand blobs.</p>\n\n<p>OK, the previous example showed that we could use the HTML <code class=\"language-plaintext highlighter-rouge\">@onClick</code> attribute to provide further detail.\nBut I did not show how. 
This is how:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">Name: &lt;span onClick=</span><span class=\"se\">\\\"</span><span class=\"s2\">showDetails('</span><span class=\"dl\">\"</span> <span class=\"o\">+</span>\n  <span class=\"nf\">escape</span><span class=\"p\">(</span><span class=\"nx\">dataJSON</span><span class=\"p\">)</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"se\">\\</span><span class=\"s2\">')</span><span class=\"se\">\\\"</span><span class=\"s2\">&gt;</span><span class=\"dl\">\"</span> <span class=\"o\">+</span> \n  <span class=\"nx\">response</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/span&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n</code></pre></div></div>\n\n<p>This code adds the <code class=\"language-plaintext highlighter-rouge\">@onClick</code> attribute and a function call to the <code class=\"language-plaintext highlighter-rouge\">showDetails()</code> method which takes one parameter,\nwhere we pass escaped JSON. That is non-trivial, I understand, and may be due to my limited knowledge of JavaScript.\nThe escaping of the JSON is needed to make quotes match in the generated HTML. In the function later, we can unescape\nit and get the original JSON again. Importantly, the dataJSON data contains all the details I like to show.</p>\n\n<p>Now, these functions need to be defined. Yes, plural, because two functions are used in this code snippet: <code class=\"language-plaintext highlighter-rouge\">showDetails()</code>\nand <code class=\"language-plaintext highlighter-rouge\">escape()</code>. The latter is built into JavaScript itself. 
The <code class=\"language-plaintext highlighter-rouge\">showDetails()</code> function, however, I made up.\nSo, I had to define it elsewhere in the HTML document, and it looks like:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">showDetails</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">dataJSON</span><span class=\"p\">){</span>\n  <span class=\"nx\">data</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">parse</span><span class=\"p\">(</span><span class=\"nf\">unescape</span><span class=\"p\">(</span><span class=\"nx\">dataJSON</span><span class=\"p\">));</span>\n  <span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">details</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span>\n    <span class=\"nx\">data</span><span class=\"p\">.</span><span class=\"nx\">_about</span><span class=\"p\">;</span>\n<span class=\"p\">};</span>\n</code></pre></div></div>\n\n<p>This example actually gives the exact same output as the code in the previous post, but with one major difference.\nWe now can extend the function as much as we like, but the code to output the list of found compounds does not have\nto get more complex than it already is.</p>",
      "summary": "One key feature of programming languages is the following: first, there is linearity. This is an important point that is not always clear to students who just start to program. In fact, ask yourself what the algorithm is for counting the chairs in the room where you are now sitting. Could a computer do that in the same way? How should your algorithm change? A key point is, is that the program is run step by step, in a linear way.",
      
      "date_published": "2013-10-12T00:00:00+00:00",
      "date_modified": "2013-10-12T00:00:00+00:00",
      "tags": ["pra3006","javascript","html"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-5.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-5.html",
      "title": "Programming in the Life Sciences #5: converting the results into HTML",
      "content_html": "<p>Now that we have <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-4.html\">the communication working</a>\nwith the Open PHACTS LDA, it is time to make a nice GUI. I will not go into details, but we can use basic JavaScript to\niterate over the JSON results, and, for example, create a HTML table:</p>\n\n<p><img src=\"/assets/images/mscpils1_output.png\" alt=\"\" /></p>\n\n<p>In fact, I hooked in some HTML <code class=\"language-plaintext highlighter-rouge\">onClick()</code> functionality so that when you click one of the compound names, you get further\ndetails (under <em>Compound Details</em>), though that only outputs the ConceptWiki URI at this moment. A simple for-loop does\nthe heavy work:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nx\">html</span> <span class=\"o\">=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;table&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"kd\">var</span> <span class=\"nx\">i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">&lt;</span><span class=\"nx\">response</span><span class=\"p\">.</span><span class=\"nx\">length</span><span class=\"p\">;</span> <span class=\"nx\">i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;tr&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;td&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nx\">dataJSON</span> <span class=\"o\">=</span> <span class=\"nx\">JSON</span><span 
class=\"p\">.</span><span class=\"nf\">stringify</span><span class=\"p\">(</span><span class=\"nx\">response</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">]);</span>\n  <span class=\"c1\">//   dataJSON.replace(/\"/g, \"'\");</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">Name: &lt;span&gt;</span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">response</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">].</span><span class=\"nx\">prefLabel</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/span&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/td&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n  <span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/tr&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"p\">}</span>\n<span class=\"nx\">html</span> <span class=\"o\">+=</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;/table&gt;</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n<span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">table</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span> <span class=\"nx\">html</span><span class=\"p\">;</span>\n</code></pre></div></div>\n\n<p>So, we’re set to teach the students all the basics of programming: loops, variables, functions, etc.</p>",
      "summary": "Now that we have the communication working with the Open PHACTS LDA, it is time to make a nice GUI. I will not go into details, but we can use basic JavaScript to iterate over the JSON results, and, for example, create a HTML table:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/mscpils1_output.png",
      "date_published": "2013-10-09T00:00:00+00:00",
      "date_modified": "2013-10-09T00:00:00+00:00",
      "tags": ["pra3006","html","javascript"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-4.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-4.html",
      "title": "Programming in the Life Sciences #4: communication from within HTML",
      "content_html": "<p>The purpose of a web service is that you give it a question or task, and that it returns an answer. For example, we can ask the\n<a href=\"http://www.openphacts.org/\">Open PHACTS</a> platform what compounds it knows with aspirin in the name. We pass the question (with the\n<a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-2-accounts.html\">API key</a>) and get a list of matching compounds.\nNow, this communication is complex: it happens at many levels, which are spelled out in the\n<a href=\"https://en.wikipedia.org/wiki/Internet_model\">Internet Model</a>. There are various variants of the stack of communication layers,\nbut we are interested mostly in the top layers, at the <em>application layer</em>. In fact, for this course this model only serves as\nsupporting information for those who want to learn more.</p>\n\n<p>Practically, what matters here is how to ask the question and how to understand the answer.</p>\n\n<p>We are supported in these practicalities with JavaScript libraries, in particular the <a href=\"https://github.com/openphacts/ops.js\">ops.js</a>\nlibrary and general <a href=\"https://en.wikipedia.org/wiki/JSON\">JSON</a> functionality provided by most browsers (unless the student decided to use\na <em>different</em> programming language, in which there are different libraries). Personally, I have only very limited JavaScript experience,\nand this mostly goes back to the good old <a href=\"http://www.biomedcentral.com/1471-2105/8/487\">Userscript and Greasemonkey days</a> (wow! the\npaper is actually the <a href=\"http://www.altmetric.com/details.php?citation_id=103983\">4th highest scoring BMC Bioinformatics article!</a>).\nBut because my JavaScript knowledge is limited and rusty, I spent a good part of today, to get a basic example running. Very basic,\nand barely exceeding the communication details. 
That is, this is the output in the browser:</p>\n\n<p><img src=\"/assets/images/mcspils_jsonOutput.png\" alt=\"\" /></p>\n\n<p>So, what does the question look like? The question is actually hardcoded in the HTML source, but the page does take two parameters:\nthe <code class=\"language-plaintext highlighter-rouge\">app_key</code> and <code class=\"language-plaintext highlighter-rouge\">app_id</code> that come <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-2-accounts.html\">with your Open PHACTS account</a>.</p>\n\n<p>The ops.js library helps us, and wraps the Open PHACTS LDA methods in JavaScript methods. Thus, rather than crafting special HTTP calls,\nwe use two JavaScript calls:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">searcher</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nx\">Openphacts</span><span class=\"p\">.</span><span class=\"nc\">ConceptWikiSearch</span><span class=\"p\">(</span>\n  <span class=\"dl\">\"</span><span class=\"s2\">https://beta.openphacts.org</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">app_id</span><span class=\"dl\">\"</span><span class=\"p\">],</span> <span class=\"nx\">params</span><span class=\"p\">[</span><span class=\"dl\">\"</span><span class=\"s2\">app_key</span><span class=\"dl\">\"</span><span class=\"p\">]</span>\n<span class=\"p\">);</span>\n<span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">byTag</span><span class=\"p\">(</span>\n  <span class=\"dl\">'</span><span class=\"s1\">Aspirin</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">20</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span 
class=\"s1\">4</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">07a84994-e464-4bbf-812a-a4b96fa3d197</span><span class=\"dl\">'</span><span class=\"p\">,</span>\n  <span class=\"nx\">callback</span>\n<span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>The first statement creates an LDA method object, while the second makes an actual question. I have not defined the callback variable,\nwhich actually is a JavaScript function that looks like:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">callback</span> <span class=\"o\">=</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">success</span><span class=\"p\">,</span> <span class=\"nx\">status</span><span class=\"p\">,</span> <span class=\"nx\">response</span><span class=\"p\">){</span>\n  <span class=\"kd\">var</span> <span class=\"nx\">result</span> <span class=\"o\">=</span> <span class=\"nx\">searcher</span><span class=\"p\">.</span><span class=\"nf\">parseResponse</span><span class=\"p\">(</span><span class=\"nx\">response</span><span class=\"p\">);</span>\n  <span class=\"nb\">document</span><span class=\"p\">.</span><span class=\"nf\">getElementById</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">output</span><span class=\"dl\">\"</span><span class=\"p\">).</span><span class=\"nx\">innerHTML</span> <span class=\"o\">=</span>\n    <span class=\"dl\">\"</span><span class=\"s2\">Results: </span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">JSON</span><span class=\"p\">.</span><span class=\"nf\">stringify</span><span class=\"p\">(</span><span class=\"nx\">result</span><span class=\"p\">);</span>\n<span class=\"p\">};</span>\n</code></pre></div></div>\n\n<p>When the LDA web service returns data, this method gets called, providing asynchronous 
functionality to keep the web page responsive.\nWhen called, it first parses the returned data, and then puts the JSON output as text in the HTML. That is the output shown in\nthe earlier screenshot.</p>\n\n<p>So, hurdle taken. From here on it’s easier. Regular looping over the results, creating some HTML output, etc. The\n<a href=\"https://gist.github.com/egonw/6902776\">full source code</a> of this example is available as a Gist.</p>",
      "summary": "The purpose of a web service is that you give it a question or task, and that it returns an answer. For example, we can ask the Open PHACTS platform what compounds it knows with aspirin in the name. We pass the question (with the API key) and get a list of matching compounds. Now, this communication is complex: it happens at many levels, which are spelled out in the Internet Model. There are various variants of the stack of communication layers, but we are interested mostly in the top layers, at the application layer. In fact, for this course this model only serves as supporting information for those who want to learn more.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/mcspils_jsonOutput.png",
      "date_published": "2013-10-09T00:00:00+00:00",
      "date_modified": "2013-10-09T00:00:00+00:00",
      "tags": ["pra3006","javascript","html","openphacts"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-8-487" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-3.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/09/programming-in-life-sciences-3.html",
      "title": "Programming in the Life Sciences #3: the assessment",
      "content_html": "<p>Now that I have wrote out <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-1-six-day.html\">the goals</a>,\nwhat they students will practically do, and how to <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-2-accounts.html\">get started</a>\nwith the <a href=\"http://openphacts.org/\">Open PHACTS</a> platform, I will list how we will assess the students:</p>\n\n<ol>\n  <li>a presentation on the second day, outlining the project and work plan,</li>\n  <li>working source code at the end of the cour\nse,</li>\n  <li>a final presentation, showing the results and conclusions.</li>\n</ol>\n\n<p>Primarily, they will be judged on their acquired programming skills. Working code is the minimum; but code quality will be taken\ninto account too. I will show them how blogging works as a pre-print server for presentations. I hope it will also learn them\nwhat role this has in scientific communication.</p>",
      "summary": "Now that I have wrote out the goals, what they students will practically do, and how to get started with the Open PHACTS platform, I will list how we will assess the students:",
      
      "date_published": "2013-10-09T00:00:00+00:00",
      "date_modified": "2013-10-09T00:00:00+00:00",
      "tags": ["pra3006"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/08/programming-in-life-sciences-2-accounts.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/08/programming-in-life-sciences-2-accounts.html",
      "title": "Programming in the Life Sciences #2: accounts and API keys",
      "content_html": "<p>I have outlined the scope of the <a href=\"http://chem-bla-ics.blogspot.nl/2013/10/programming-in-life-sciences-1-six-day.html\">six-day course</a>:\nthe students will learn to program while hacking on the <a href=\"https://dev.openphacts.org/docs\">Open PHACTS’ Linked Data API</a> (LDA). The first\nstep is to get an account for the LDA. I have already done that to save time. But these are the steps to take. You go to\n<a href=\"https://dev.openphacts.org/signup\">https://dev.openphacts.org/signup</a>:</p>\n\n<p><img src=\"/assets/images/gscholar1.png\" alt=\"\" /></p>\n\n<p>You then approve the account via your email account and you are set. The account is needed to get an API key. Using this key,\nOpen PHACTS developers can contact you if your scripts go berserk  So, you are kindly invited to make crazy hypotheses and hack the\nhell out of the platform. That’s what I hope my students will do.</p>\n\n<p>To try your new key, go to the documentation page, and open, for example, the <em>SMILES to URL</em> method:</p>\n\n<p><img src=\"/assets/images/mscpils.png\" alt=\"\" /></p>\n\n<p>Here you can see what parameters this LDA method has. We focus now on the <code class=\"language-plaintext highlighter-rouge\">app_id</code> and <code class=\"language-plaintext highlighter-rouge\">app_key</code> fields. Each account comes by default\nwith a, um, default <code class=\"language-plaintext highlighter-rouge\">app_id</code> and default <code class=\"language-plaintext highlighter-rouge\">app_key</code>. 
Just click on the field and select them:</p>\n\n<p><img src=\"/assets/images/mscpils1.png\" alt=\"\" /></p>\n\n<p>Select the defaults and enter a SMILES (try: <a href=\"https://apps.ideaconsult.net:8080/ambit2/depict?search=CC(=O)NC1=CC=C(C=C1)O\">CC(=O)NC1=CC=C(C=C1)O</a>).\nYou can select the format you like (I like Turtle) and you get Linked Data back on this <a href=\"https://rdf.chemspider.com/1906\">compound</a>.</p>\n\n<p>Now, go explore the LDA methods.</p>",
      "summary": "I have outlined the scope of the six-day course: the students will learn to program while hacking on the Open PHACTS’ Linked Data API (LDA). The first step is to get an account for the LDA. I have already done that to save time. But these are the steps to take. You go to https://dev.openphacts.org/signup:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/gscholar1.png",
      "date_published": "2013-10-08T00:00:00+00:00",
      "date_modified": "2013-10-08T00:00:00+00:00",
      "tags": ["pra3006","openphacts","javascript","rest"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2013/10/05/programming-in-life-sciences-1-six-day.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/10/05/programming-in-life-sciences-1-six-day.html",
      "title": "Programming in the Life Sciences #1: a six day course",
      "content_html": "<p>Our <a href=\"http://www.bigcat.unimaas.nl/\">department</a> will soon start the course Programming in the Life Sciences for a group of some\n10 students from the <a href=\"http://www.maastrichtuniversity.nl/web/Schools/MaastrichtScienceProgramme.htm\">Maastricht Science Programme</a>.\nThis is the first time we give this course, and over the next weeks I will be blogging about this course. First, some information.\nThese are the goals, to use programming to:</p>\n\n<ul>\n  <li>have the ability to recognize various classes of chemical entities in pharmacology and to understand the basic physical and chemical interactions.</li>\n  <li>be familiar with technologies for web services in the life sciences.</li>\n  <li>obtain experience in using such web services with a programming language.</li>\n  <li>be able to select web services for a particular pharmacological question.</li>\n  <li>have sufficient background for further, more advanced, bioinformatics data analyses.</li>\n</ul>\n\n<p>So, this course will be a mix of things. I will likely start with a lecture or too about scientific programming, such as the\nimportance of reproducibility, licensing, documentation, and (unit) testing. To achieve these learning goals we have set a\nproblem. The description is:</p>\n\n<ul><i>\nIn the life sciences the interactions between chemical entities is of key interest. Not only do these play an important role\nin the regulation of gene expression, and therefore all cellular processes, they are also one of the primary approaches in\ndrug discovery. Pharmacology is the science studies the action of drugs, and for many common drugs, this is studying the\ninteraction of small organic molecules and protein targets.\n\nAnd with the increasing information in the life sciences, automation becomes increasingly important. Big data and small data\nalike, provide challenges to integrate data from different experiments. 
The Open PHACTS platform provides web services to\nsupport pharmacological research and in this course you will learn how to use such web services from programming languages,\nallowing you to link data from such knowledge bases to other platforms, such as those for data analysis.\n</i></ul>\n\n<p>So, it becomes pretty clear what the students will be doing. They only have six days, so it won’t be much. It’s just to teach\nthem the basic skills. The students are in their 3rd year at the university, and because of the nature of the programme they\nfollow, have a mixed background in biology, mathematics, chemistry, and physics. So, I have good hope they will surprise me in\nwhat they will get done.</p>\n\n<p>Pharmacology is the basic topic: drug-protein interaction, but the students are free to select a research question. In fact,\nI will not care that much what they like to study, as long as they do it properly. They will start with\n<a href=\"https://dev.openphacts.org/docs\">Open PHACTS’ Linked Data API</a>, but here too, they are free to complement data from the\nOPS cache with additional information. I hope they do.</p>\n\n<p>Now, regarding the technology they will use. The default will be JavaScript, and in the next week I will hack up demo code\nshowing the integration of <a href=\"https://github.com/openphacts/ops.js\">ops.js</a> and <a href=\"http://d3js.org/\">d3.js</a>.\nLet’s see how hard it will be; it’s new to me too. But, if the students\nalready are familiar with another programming language and prefer to use that, I won’t stop them.</p>\n\n<p>(For the Dutch readers, would #mscpils be a good tag?)</p>",
      "summary": "Our department will soon start the course Programming in the Life Sciences for a group of some 10 students from the Maastricht Science Programme. This is the first time we give this course, and over the next weeks I will be blogging about this course. First, some information. These are the goals, to use programming to:",
      
      "date_published": "2013-10-05T00:00:00+00:00",
      "date_modified": "2013-10-05T00:00:00+00:00",
      "tags": ["pra3006","javascript","openphacts"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/h24n0-r8e92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked.html",
      "title": "New Paper: &quot;The ChEMBL database as linked open data&quot;",
      "content_html": "<script src=\"https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js\" type=\"text/javascript\"></script>\n\n<div class=\"altmetric-embed\" data-badge-details=\"right\" data-badge-type=\"donut\" data-doi=\"10.1186/1758-2946-5-23\" style=\"float: right;\"></div>\n\n<p><strong>Update</strong>: Mark wrote up a <a href=\"http://chembl.blogspot.co.uk/2013/05/chembl-chembl-rdf.html\">blog post</a> on the RDF that the ChEMBL team itself.</p>\n\n<p>Yesterday, the paper “The ChEMBL database as linked open data” (doi:<a href=\"https://doi.org/10.1186/1758-2946-5-23\">10.1186/1758-2946-5-23</a>) by\nAndra Waagmeester (<a href=\"https://twitter.com/andrawaag\">@andrawaag</a>), Ola Spjuth (<a href=\"https://twitter.com/ola_spjuth\">@ola_spjuth</a>), Peter Ansell\n(<a href=\"http://twitter.com/p_ansell\">@p_ansell</a>), Antony Williams (<a href=\"https://twitter.com/chemconnector\">@chemconnector</a>), Valery Tkachenko,\nJanna Hastings, Bin Chen (<a href=\"http://twitter.com/binchenindiana\">@binchenindiana</a>), David J Wild (<a href=\"http://twitter.com/davidjohnwild\">@davidjohnwild</a>),\nand me appeared in the OA <a href=\"http://en.wikipedia.org/wiki/Journal_of_Cheminformatics\">JChemInf</a> journal.</p>\n\n<p>I am also indebted to the <a href=\"https://www.ebi.ac.uk/chembl/\">ChEMBL</a> team (<a href=\"http://twitter.com/chembl\">@chembl</a>) for both providing such\nvaluable data under a liberal Open Access license and their critical reading of the manuscript! <strong>Additionally, I would like to stress\nthat the ChEMBL team will create their own RDF version of ChEMBL and that this paper is not describing the version they will release.</strong></p>\n\n<p>BTW, the <a href=\"https://github.com/egonw/chembl-rdf-paper/\">source of the paper</a> is available from GitHub. 
And the\n<a href=\"https://github.com/egonw/chembl.rdf\">(original) scripts to create RDF from the MySQL dump of ChEMBL</a> are also on GitHub.</p>\n\n<p><img src=\"https://media.springernature.com/lw685/springer-static/image/art%3A10.1186%2F1758-2946-5-23/MediaObjects/13321_2012_Article_469_Figa_HTML.gif\" alt=\"\" /></p>\n\n<p>This paper outlines the <a href=\"http://www.jcheminf.com/content/3/1/15\">RDF</a> as it has evolved from various earlier projects. The above\ndiagram visualizes the basic structure (red), various Linked Data resources linked to (blue) and illustrates how various ontologies are used,\nsuch as the <a href=\"http://www.plosone.org/article/info:doi/10.1371/journal.pone.0025513\">CHEMINF</a>, <a href=\"http://bibliontology.com/\">BIBO</a>,\nand <a href=\"http://www.jbiomedsem.com/content/1/S1/S6\">CiTO</a> ontologies.</p>\n\n<p>Additionally, various applications and links developed by the co-authors are described. For example, Peter worked on the use in\n<a href=\"http://bio2rdf.org/\">Bio2RDF</a> and Bin and David on <a href=\"http://cheminfov.informatics.indiana.edu:8080/\">Chem2Bio2RDF</a>. Andra developed\nan extension for his (#altmetric) <a href=\"http://citedin.org/\">CitedIn</a> resource, giving credit to a paper when data in it is extracted into\nChEMBL. Ola, Valery, and Antony developed a <a href=\"http://www.bioclipse.net/decision-support\">Bioclipse Decision Support</a> extension,\nwhich supports a nearest neighbor search in ChEMBL using <a href=\"http://chemspider.com/\">ChemSpider</a>. Of course, Ola also hosts\n<a href=\"http://rdf.farmbio.uu.se/chembl/snorql/\">the SPARQL end point</a> of which you can monitor the uptime at the also cool\n<a href=\"http://labs.mondeca.com/sparqlEndpointsStatus/details/farmbio-chembl.html\">mondeca.com service</a>:</p>\n\n<p><img src=\"/assets/images/mondecaUptime.png\" alt=\"\" /></p>\n\n<p>(Yes, I think I have all the cool buzzwords covered in this paper. 
Sadly, marketing is needed nowadays as a scientist. Where is the\ntime that you could rant on page after page in all your domain specific jargon, not having to worry if your reader would understand\nit immediately, or without a university degree…)</p>\n\n<p>What this paper does not describe, is all the things I did with ChEMBL-RDF in the <a href=\"http://www.openphacts.org/\">Open PHACTS</a> project\n(<a href=\"https://twitter.com/open_phacts\">@Open_PHACTS</a>), which includes the use of <a href=\"http://qudt.org/\">QUDT</a> and the\n<a href=\"https://github.com/egonw/jqudt\">jQUDT</a> library for unit normalization outlined in <a href=\"http://www.bigcat.unimaas.nl/~egonw/units/\">this document</a>\nand the use of VoID for link sets as described in <a href=\"http://www.openphacts.org/specs/2012/WD-datadesc-20121019/\">this document</a>.</p>",
      "summary": "",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/mondecaUptime.png",
      "date_published": "2013-05-09T00:00:00+00:00",
      "date_modified": "2024-08-08T00:00:00+00:00",
      "tags": ["chembl","rdf","cito","cheminf","ontology","chemspider","openphacts"],
      "_references": [{ "url": "https://doi.org/10.1186/1758-2946-5-23" },{ "url": "https://doi.org/10.1186/1758-2946-3-15" },{ "url": "https://doi.org/10.1371/JOURNAL.PONE.0025513" },{ "url": "https://doi.org/10.1186/2041-1480-1-S1-S6" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/97y2t-wnh91",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/04/10/emerging-practices-for-mapping-and.html",
      "title": "&quot;Emerging practices for mapping and linking life sciences data using RDF&quot;",
      "content_html": "<p>The “Emerging practices for mapping and linking life sciences data using RDF” (doi:<a href=\"https://doi.org/10.1016/j.websem.2012.02.003\">10.1016/j.websem.2012.02.003</a>)\nis now available online, where I contributed a section on the original workflow for creating <a href=\"https://www.ebi.ac.uk/chembldb/\">ChEMBL</a> triples,\nand contributed to the section about open licensing, referring to <a href=\"http://creativecommons.org/publicdomain/zero/1.0/\">CCZero</a> and the\n<a href=\"http://pantonprinciples.org/\">Panton Principles</a>. Happy reading!</p>\n\n<p>(Yes, it is indeed an Elsevier journal…)</p>",
      "summary": "The “Emerging practices for mapping and linking life sciences data using RDF” (doi:10.1016/j.websem.2012.02.003) is now available online, where I contributed a section on the original workflow for creating ChEMBL triples, and contributed to the section about open licensing, referring to CCZero and the Panton Principles. Happy reading!",
      
      "date_published": "2012-04-10T00:00:00+00:00",
      "date_modified": "2012-04-10T00:00:00+00:00",
      "tags": ["semweb","chembl"],
      "_references": [{ "url": "https://doi.org/10.1016/J.WEBSEM.2012.02.003" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/qtjby-n6m67",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/03/04/chembl-13-as-rdf.html",
      "title": "ChEMBL 13 as RDF",
      "content_html": "<p><strong>Update</strong>: this work is now described in <a href=\"https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked.html\">this paper <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Last week, ChEMBL 13 was <a href=\"http://chembl.blogspot.com/2012/02/chembl-13-released.html\">released</a>, with even more data, data fixes,\n<a href=\"ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_13/chembl_13_release_notes.txt\">etc</a>. Since my RDF for\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2011/04/21/chembl-09-as-rdf.html\">ChEMBL 09 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> my workflow has become\n<a href=\"https://github.com/egonw/chembl.rdf/commits/master\">more solid</a> and uses more common ontologies, started using more common ontologies\nand ontologies I just like, such as <a href=\"http://www.plosone.org/article/info:doi/10.1371/journal.pone.0025513\">CHEMINF</a> and\n<a href=\"http://www.jbiomedsem.com/content/1/S1/S6\">CiTO</a>. Below is an overview of the resource types present in the RDF:\nactivities (almost 7M now), chemical entities, assays, targets, and documents.</p>\n\n<p><img src=\"/assets/images/relations.png\" alt=\"\" /></p>\n\n<p>The <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/10/22/chembl-rdf-uploading-data-to-kasabi.html\">data on Kasabi <i class=\"fa-solid fa-recycle fa-xs\"></i></a> will be updated soon,\nand the <a href=\"http://rdf.farmbio.uu.se/chembl/sparql\">SPARQL end point</a> hosted by Uppsala University was updated yesterday, including the\n<a href=\"http://rdf.farmbio.uu.se/chembl/snorql/\">SNORQL frontend</a>:</p>\n\n<p><img src=\"/assets/images/chemblRDF13.png\" alt=\"\" /></p>\n\n<p>The new data is not fully backwards compatible. The changes to the RDF include the use of <code class=\"language-plaintext highlighter-rouge\">cito:citesAsDataSource</code>, more typing\nusing existing ontologies, e.g. 
with <code class=\"language-plaintext highlighter-rouge\">cheminf:CHEMINF_000000</code> and <code class=\"language-plaintext highlighter-rouge\">pro:PR_000000001</code> from the\n<a href=\"http://pir.georgetown.edu/pro/\">PRotein Ontology</a>.</p>\n\n<p>A paper dedicated to the ChEMBL-RDF is in preparation. Existing use cases can be found\n<a href=\"http://www.jbiomedsem.com/content/2/S1/S6\">here</a>.</p>",
      "summary": "Update: this work is now described in this paper .",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/relations.png",
      "date_published": "2012-03-04T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["chembl","rdf","semweb","ontology","cheminf","cito"],
      "_references": [{ "url": "https://doi.org/10.1371/JOURNAL.PONE.0025513" },{ "url": "https://doi.org/10.1186/2041-1480-1-S1-S6" },{ "url": "https://doi.org/10.1186/2041-1480-2-S1-S6" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/25dgb-j2y93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/02/23/cito-citeulike-publishing-innovation.html",
      "title": "CiTO / CiteULike: publishing innovation",
      "content_html": "<p>Readers of my blog know I have been using the Citation Typing Ontology, CiTO (doi:<a href=\"http://dx.doi.org/10.1186/2041-1480-1-S1-S6\">10.1186/2041-1480-1-S1-S6</a>).\nI allows me to see <a href=\"http://chem-bla-ics.blogspot.com/2010/02/citing-chemistry-development-kit.html\">how the CDK</a> is\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html\">cited and used <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. CiteULike is currently adding more CiTO more functionality,\nwhich they <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html\">started <i class=\"fa-solid fa-recycle fa-xs\"></i></a> doing almost one and a half years ago.</p>\n\n<p>One of the things, is that the CiTO data added via a certain account, can be downloaded as triples:</p>\n\n<p><img src=\"/assets/images/culcito2.png\" alt=\"\" /></p>\n\n<p>The second is that they are improving the graphics of how it is visualized. E.g. they added an ‘Expand’ link, which I found when they\n<a href=\"https://twitter.com/#!/citeulike/status/172446830666321921\">tweeted</a> they had hidden drag-n-drop, which I haven’t found yet, though.\nClicking that action, will show you the following:</p>\n\n<p><img src=\"/assets/images/culcito.png\" alt=\"\" /></p>\n\n<p>Because CiteULike takes advantage of the <a href=\"http://www.w3.org/TR/owl-ref/#InverseFunctionalProperty-def\">inverse function</a> of the CiTO predictates,\nthey show up with the cited paper too, which is less suitable for the top-down flow graphics:</p>\n\n<p><img src=\"/assets/images/culcito1.png\" alt=\"\" /></p>\n\n<p>To make this advertorial a bit balanced, not all <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html\">my wishes <i class=\"fa-solid fa-recycle fa-xs\"></i></a> have been\nimplemented yet, and the next up from my perspective should be Linked Data. 
There is some Linked Data embedded as RDFa, but the latter is not turning out\nto be the killer I had hoped, and regular RDF entry points should be used.</p>\n\n<p>Each CiteULike entry (post) should get a unique <a href=\"http://en.wikipedia.org/wiki/Internationalized_Resource_Identifier\">IRI</a> (or\n<a href=\"http://en.wikipedia.org/wiki/Uniform_resource_identifier\">URI</a>) and opening that link should give RDF about that post\n(<a href=\"http://www.citeulike.org/groupforum/2191\">wish #10</a>). That is <a href=\"http://en.wikipedia.org/wiki/Dereferenceable_Uniform_Resource_Identifier\">dereferenceability</a>.\nThe RDF can be, for example, in <a href=\"http://bibliontology.com/\">BIBO</a> but there are many alternatives, and I have not been keeping up with which is the best\n(please leave a comment, if you have an opinion on that).</p>\n\n<p>But I like where this is going! Thanx, CiteIReallyLikeThis!</p>",
      "summary": "Readers of my blog know I have been using the Citation Typing Ontology, CiTO (doi:10.1186/2041-1480-1-S1-S6). I allows me to see how the CDK is cited and used . CiteULike is currently adding more CiTO more functionality, which they started doing almost one and a half years ago.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/culcito1.png",
      "date_published": "2012-02-23T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["citeulike","cito","rdf"],
      "_references": [{ "url": "https://doi.org/10.1186/2041-1480-1-S1-S6" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html",
      "title": "Groovy Cheminformatics 4th edition",
      "content_html": "<p>Six month was not quite the amount of time I anticipated between the third and fourth edition, but I finally managed\nto upload edition 1.4.7-0 of my <a href=\"http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/18825420\">Groovy Cheminformatics</a>\nbook. The first three editions sold 37 copies, including two for myself. Enough to feel supported and to continue working on it.</p>\n\n<p>So, this new edition is again thicker, summing up to 152 pages now, which is 28 pages more than\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition.html\">the 3rd edition <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Indeed, the table of contents\nis more than half a page longer in itself, though, just barely, still fitting on four pages. In fact, I had to remove one (new)\nsubsection title, because it would take otherwise two further pages.</p>\n\n<p>The new content is again a mix of sections and chapters. While writing new chapters, I find myself realizing I need to cover\nmore basics. Those get typically added as new sections. I did not get many feature requests, except for one email pointing me\nthe text promised how to interpret and handle failing atom type perception, which explains one of the new sections.\nThe full list of new content is:</p>\n\n<ul>\n  <li>Section 2.1.4: explaining the three flavors of atomic coordinates</li>\n  <li>Extended Section 2.2: added detail about electron counts of bonds (partly in reply to this post by Rich)</li>\n  <li>Chapter 5 “Protein and DNA”: four pages, mostly about PDB files, and the matching CDK data structure</li>\n  <li>Chapter 6 “IChemObjectBuilders”: four pages explaining the four alternative builders CDK 1.4.7 has</li>\n  <li>Section 7.8: a new section with recipes on how to post-process read input, discussing MDL molfiles only now. 
It talks about what information is present in the file format, and what steps must be undertaken to add missing information</li>\n  <li>Section 8.2.4 “No atom type perceived?!”</li>\n  <li>Section 11.4: describes how to depict aromatic rings</li>\n  <li>Section 11.5: describes how to change the background color of depictions</li>\n  <li>Section 13.4: explains how to calculate the Van der Waals volume of molecules</li>\n  <li>Section 18.1.3: discussing the API improvement in the iterating readers</li>\n  <li>Appendix C: a list of all descriptors provided by the CDK</li>\n  <li>Appendix D: a list of file formats known by the CDK, indicating which have readers and writers</li>\n</ul>\n\n<p>On top of that, I improved other bits of the book too, such as the resolution of the depictions of molecules,\nas well as those of various diagrams. Also the number of scripts has seriously gone up, from 94 to 134!</p>\n\n<p>Appendix C is a prelude to a chapter I am already writing, but have not finished yet: a chapter about\ndescriptor calculation. But since I just started a new post-doc position, it may take another six months\nfor that chapter to make it into print.</p>\n\n<p>The paperback is <a href=\"http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/18825420\">available from Lulu.com</a>,\nan on-demand publisher, as well as <a href=\"http://www.lulu.com/product/ebook/groovy-cheminformatics-with-the-chemistry-development-kit/18825437\">this ebook version</a>.</p>",
      "summary": "Six months was not quite the amount of time I anticipated between the third and fourth edition, but I finally managed to upload edition 1.4.7-0 of my Groovy Cheminformatics book. The first three editions sold 37 copies, including two for myself. Enough to feel supported and to continue working on it.",
      
      "date_published": "2012-01-15T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cdk","cdkbook","java","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/k1860-kks41",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/10/22/chembl-rdf-uploading-data-to-kasabi.html",
      "title": "ChEMBL-RDF: Uploading data to Kasabi with pytassium",
      "content_html": "<p>I reported earlier how I <a href=\"http://chem-bla-ics.blogspot.com/2011/07/chempedia-rdf-3-uploading-data-to.html\">uploaded the ChemPedia (RIP) data onto Kasabi</a>.\nBut for ChEMBL-RDF I have used the <a href=\"https://github.com/iand/pytassium\">pytassium</a> tool, not just because it has a cool name :) I discovered yesterday,\nhowever, that I did not write down in this lab notebook what steps I needed to take to reproduce it. And I just wanted to upload new triples to the\n<a href=\"http://kasabi.com/dataset/chembl-rdf\">ChEMBL-RDF data set on Kasabi</a>.</p>\n\n<p>The new triples I wanted to upload link the <a href=\"http://chembl.blogspot.com/2011/08/chembl-11-released.html\">new public CHEMBL identifiers</a>\n(like <a href=\"https://www.ebi.ac.uk/chembldb/index.php/compound/inspect/CHEMBL25\">CHEMBL25 for aspirin</a>) to the internal ChEMBL database identifiers I used for\nChEMBL 09 for the URIs. So, I am adding a lot of triples like:</p>\n\n<div class=\"language-turtle highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nl\">&lt;http://data.kasabi.com/dataset/chembl-rdf/09/molecule/m517180&gt;</span><span class=\"w\"> </span><span class=\"nl\">&lt;http://www.w3.org/2002/07/owl#sameAs&gt;</span><span class=\"w\">\n</span><span class=\"nl\">&lt;http://data.kasabi.com/dataset/chembl-rdf/09/chemblid/CHEMBL1&gt;</span><span class=\"w\">\n</span></code></pre></div></div>\n\n<p>And the pytassium code I use to upload this to Kasabi looks like:</p>\n\n<div class=\"language-python highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"n\">pytassium</span>\n<span class=\"kn\">import</span> <span class=\"n\">time</span>\n\n<span class=\"n\">dataset</span> <span class=\"o\">=</span> <span class=\"n\">pytassium</span><span class=\"p\">.</span><span class=\"nc\">Dataset</span><span class=\"p\">(</span><span class=\"sh\">'</span><span class=\"s\">chembl-rdf</span><span class=\"sh\">'</span><span class=\"p\">,</span><span class=\"sh\">'</span><span class=\"s\">XXX</span><span class=\"sh\">'</span><span class=\"p\">)</span>\n\n<span class=\"c1\"># Store the contents of an N-Triples file\n</span><span class=\"n\">dataset</span><span class=\"p\">.</span><span class=\"nf\">store_file</span><span class=\"p\">(</span><span class=\"sh\">'</span><span class=\"s\">chemblids.nt</span><span class=\"sh\">'</span><span class=\"p\">,</span> <span class=\"n\">media_type</span><span class=\"o\">=</span><span class=\"sh\">'</span><span class=\"s\">text/plain</span><span class=\"sh\">'</span><span class=\"p\">)</span>\n</code></pre></div></div>\n\n<p>So, that omission in my log book has been corrected now.</p>",
      "summary": "I reported earlier how I uploaded the ChemPedia (RIP) data onto Kasabi. But for ChEMBL-RDF I have used the pytassium tool, not just because it has a cool name :) I discovered yesterday, however, that I did not write down in this lab notebook what steps I needed to take to reproduce it. And I just wanted to upload new triples to the ChEMBL-RDF data set on Kasabi.",
      
      "date_published": "2011-10-22T00:00:00+00:00",
      "date_modified": "2011-10-22T00:00:00+00:00",
      "tags": ["kasabi","chembl","semweb"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/eg94z-9dg88",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/09/17/inchikey-collision-diy-copypastables.html",
      "title": "InChIKey collision: the DIY copy/pastables",
      "content_html": "<p>About two weeks ago, the ChemConnector blog <a href=\"http://www.chemconnector.com/2011/09/01/an-inchikey-collision-is-discovered-and-not-based-on-stereochemistry/\">reported an InChIKey collision</a>\ndetected by <a href=\"http://www-ucc.ch.cam.ac.uk/researchgroups/goodman_group\">Prof. Goodman</a>. Unlike the previous collision, this one was based solely on the graph and not on stereochemistry.\nThe two molecules both have the InChIKey <code class=\"language-plaintext highlighter-rouge\">OCPAUTFLLNMYSX-UHFFFAOYSA-N</code>:</p>\n\n<p><img src=\"/assets/images/inchikey1.png\" style=\"height:400px\" />\n<img src=\"/assets/images/inchikey2.png\" style=\"height:400px\" /></p>\n\n<p>The compounds are really different, the molecular formulas are C<sub>50</sub>H<sub>102</sub>O and C<sub>57</sub>H<sub>114</sub>O respectively.\nThe SMILESes are <code class=\"language-plaintext highlighter-rouge\">OC(C)C(C)CC(C)C(C)CCC(C)C(C)CCCC(C)C(C)CC(C)C(C)CCCC(C)C(C)CCC(C)C(C)CC(C)CCCCCCC</code> and\n<code class=\"language-plaintext highlighter-rouge\">O=C(C)CC(C)C(C)CCC(C)CCC(C)C(C)C(C)C(C)C(C)C(C)C(C)C(C)CC(C)C(C)C(C)CC(C)C(C)C(C)CCCCC(C)C(C)CC(C)C(C)C</code>.\nThe IUPAC names are useful to have as copy/pastables too (e.g. with the\n<a href=\"http://opsin.ch.cam.ac.uk/\">OPSIN</a>-based ‘<a href=\"http://chem-bla-ics.blogspot.com/2011/02/opsin-used-for-bioclipse-wizard.html\">Molecule from IUPAC name</a>’-wizard\nin <a href=\"http://bioclipse.net/\">Bioclipse</a> 2.5, which has been updated to the latest OPSIN version this week):\n3,5,6,9,10,14,15,17,18,22,23,26,27,29-tetradecamethylhexatriacontan-2-ol and\n4,5,8,11,12,13,14,15,16,17,18,20,21,22,24,25,26,31,32,34,35-henicosamethylhexatriacontan-2-one.</p>\n\n<p>I am adding these structures to the <a href=\"http://chem-bla-ics.blogspot.com/2011/03/pharmaceutical-bioinformatics.html\">pharmbio.org course book</a>\nand the matching Bioclipse plugin this weekend.</p>",
      "summary": "About two weeks ago, the ChemConnector blog reported an InChIKey collision detected by Prof. Goodman. Unlike the previous collision, this one was based solely on the graph and not on stereochemistry. The two molecules both have the InChIKey OCPAUTFLLNMYSX-UHFFFAOYSA-N:",
      
      "date_published": "2011-09-17T00:00:00+00:00",
      "date_modified": "2011-09-17T00:00:00+00:00",
      "tags": ["inchi","opsin","smiles","bioclipse","iupac"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/eg94z-9dg88",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/08/02/my-google-scholar-citations-profile.html",
      "title": "My Google Scholar Citations profile arrived",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Web_of_Science\">Web of Science</a> is my de facto standard for citation statistics (I need these for\n<a href=\"http://vr.se/\">VR</a> grant applications), and defines the lower limit of citations (it is pretty clean, but I do have to ping them now\nand then to fix something). The public front-end of it is <a href=\"http://www.researcherid.com/rid/C-6136-2008\">Researcher ID</a>. There is a\n<a href=\"http://academic.research.microsoft.com/Author/2893110/egon-l-willighagen\">Microsoft initiative</a>, which looks clean but doesn’t work\non Linux for the nicer things, but the coverage of journals is pretty bad in my field, giving a biased (downwards)\n<a href=\"http://en.wikipedia.org/wiki/H-index\">H-index</a>. And <a href=\"http://www.citeulike.org/user/egonw\">CiteULike</a> and\n<a href=\"http://www.mendeley.com/profiles/egon-willighagen/\">Mendeley</a> focus more on your publications than on citations (though the former\nhas <a href=\"http://opencitations.wordpress.com/2010/10/21/use-of-cito-in-citeulike/\">great CiTO support</a>!).</p>\n\n<p>Then <a href=\"http://googlescholar.blogspot.com/2011/07/google-scholar-citations.html\">Google Scholar Citations</a> (GSC) shows up. While it\ndoes not look as pretty as competing products, it compensates for that with a wide coverage of literature (for example, it supports the\n<a href=\"http://jcheminf.com/\">JChemInf</a>, which Web-of-Science currently does not; and I happen to publish a lot in that journal recently),\nbooks, and reports, while keeping false positives fairly low. Thus, it provides an upper limit of my citation statistics, but one\nI am pretty confident about. And my H-index is quite comparable anyway. This is what\n<a href=\"http://scholar.google.com/citations?user=u8SjMZ0AAAAJ\">my profile</a> looks like:</p>\n\n<p><img src=\"/assets/images/gsc.png\" alt=\"\" /></p>\n\n<p>So, these statistics have two purposes to me: 1. grant applications, and 2. I like to know what people based on my research. (Well,\nOK, 3. it helps me understand why I work so hard on too many things.)</p>\n\n<p>Now the question is, will GSC take off? Will it replace <a href=\"http://orcid.org/\">ORCID</a>? Will they join ORCID? Will GSC get a good API?\nWho will write the first <a href=\"http://www.biomedcentral.com/1471-2105/8/487\">userscript</a> to make the GUI fancier? Will GSC support CiTO?\nWill GSC start using microformats or RDFa? What mashups can we expect between bibliographic databases? Will new entries automatically\nbe posted to Google+? Will it have a button to autocreate a blog post when a paper gets cited 100, 500, or 1000 times? Will GSC\nsupport <a href=\"http://friendfeed.com/search?q=%23altmetrics\">#altmetrics</a>?</p>",
      "summary": "Web of Science is my de facto standard for citation statistics (I need these for VR grant applications), and defines the lower limit of citations (it is pretty clean, but I do have to ping them now and then to fix something). The public front-end of it is Researcher ID. There is a Microsoft initiative, which looks clean but doesn’t work on Linux for the nicer things, but the coverage of journals is pretty bad in my field, giving a biased (downwards) H-index. And CiteULike and Mendeley focus more on your publications than on citations (though the former has great CiTO support!).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/gsc.png",
      "date_published": "2011-08-02T00:00:00+00:00",
      "date_modified": "2011-08-02T00:00:00+00:00",
      "tags": ["google","citeulike"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-8-487" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition.html",
      "title": "Groovy Cheminformatics 3rd edition",
      "content_html": "<p><strong>Update</strong>: the <a href=\"https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html\">fourth edition <i class=\"fa-solid fa-recycle fa-xs\"></i></a> is out.</p>\n\n<p>I am starting to get the hang of this publishing soon, publishing often thing, and\n<a href=\"http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/16378378\">just uploaded</a>\nedition 1.4.1-0 of the <a href=\"https://chem-bla-ics.linkedchemistry.info/2011/02/06/groovy-cheminformatics.html\">Groovy Cheminformatics <i class=\"fa-solid fa-recycle fa-xs\"></i></a> book.\nThe cover is the same (with one typo fix), and the content is 20 pages thicker. True, six of those pages are isotope\nmasses of all natural isotopes. That leaves 14 pages with this new content:</p>\n\n<ul>\n  <li>Section 2.7 on line notations with 2.7.1 about reading and writing SMILES</li>\n  <li>Section 6.3 about Sybyl (mol2) atom types</li>\n  <li>Section 7.4 on atom numbering with 7.4.1 on Morgan atom numbers, and 7.4.2 on InChI atom numbers</li>\n  <li>Chapter 9 on molecule depiction with the new rendering code, with\n    <ul>\n      <li>Section 9.1 on drawing molecules,</li>\n      <li>Section 9.2 on rendering parameters, and</li>\n      <li>Section 9.3 on the generator API and how to add custom content</li>\n    </ul>\n  </li>\n  <li>Section 11.4 on calculating aromaticity</li>\n  <li>Appendix A.2 listing all Sybyl atom types</li>\n  <li>Appendix B listing all naturally occurring isotopes</li>\n</ul>\n\n<p>Feature requests are most welcome.</p>",
      "summary": "Update: the fourth edition is out.",
      
      "date_published": "2011-07-31T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cdk","cdkbook","java","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zq2m3-dxp07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/07/13/data-nonotify-or-silent.html",
      "title": "Data, Nonotify, or Silent?",
      "content_html": "<p>I cannot find the bug report just now, but the <a href=\"http://cdk.sf.net/\">CDK</a> has an open problem with change event notification,\nwhere the nonotify classes still caused change events to be sent around.</p>\n\n<p>This was because the nonotify classes extended the data classes in the wrong way. So, I worked today on copying the data class\nimplementations into a new implementation, not extending the data classes, while removing the listener code: the <em>silent</em>\nmodule. I’m not entirely done yet, but close enough to blog about it. While checking things, I ran the\n<a href=\"https://github.com/egonw/cheminfbenchmark\">cheminfbench</a> code on it, with these results:</p>\n\n<p><img src=\"/assets/images/silent.png\" alt=\"\" /></p>\n\n<p>So, removal of the notification listening improves the performance, when reading a 416 entry SD file. I think the difference\nwill be more significant for other tasks, like ring finding.</p>\n\n<p>But, but…?!?! Yeah, this is a rather weird plot indeed… the blue bar should also be lower than the red one! And it used\nto be too… :( Bad regression… hard to unit test too :(</p>\n\n<p>OK, back to some final clean up.</p>\n\n<p><strong>Update</strong>: the clean up is done, and I have now run the fingerprint benchmark from cheminfbench using the new module and\nnonotify. In a situation when change events are much more used (as is with fingerprint calculation), we see that nonotify\nstill improves speed, and that the new silent module shows about the same speed up. We also see that the 1.4.x classes\nare a bit slower than the classes of some 20 months ago. That probably reflects\n<a href=\"https://sourceforge.net/tracker/?func=detail&amp;aid=2992921&amp;group_id=20024&amp;atid=120024\">bug 2992921</a> that was recently fixed.\nThe full bar plot:</p>\n\n<p><img src=\"/assets/images/silent1.png\" alt=\"\" /></p>\n\n<p>Red and blue are CDK 1.2.x (as the plot legend says), green and yellow the same for CDK 1.3.x (and both clearly faster than\nthe 1.2 series), and purple and light blue the same for CDK 1.4.0. The last bar is the new silent module, a tad bit slower\nthan nonotify.</p>\n\n<p><strong>Update 2</strong>: OK, one last update. The performance difference can actually be larger than this. The below screen shot shows\nthe effect of the silent module (blue, yellow) on SMILES generation (without and with lower case formalism, red and green\nrespectively):</p>\n\n<p><img src=\"/assets/images/silent2.png\" alt=\"\" /></p>\n\n<p>If you did not get it yet, if you bring your system to production level, do not use the default implementation,\n<strong><em>unless</em></strong> you really need the change notifications.</p>",
      "summary": "I cannot find the bug report just now, but the CDK has an open problem with change event notification, where the nonotify classes still caused change events to be sent around.",
      
      "date_published": "2011-07-13T00:00:00+00:00",
      "date_modified": "2011-07-13T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/f95v6-r1630",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/07/06/chempedia-rdf-2-kasabi.html",
      "title": "ChemPedia-RDF #2: Kasabi",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/kasabi.png\" width=\"200\" />\n<a href=\"http://beta.kasabi.com/\">Kasabi</a> is a new RDF hosting service by <a href=\"http://www.talis.com/\">Talis</a>. It’s still in beta, and I have been testing\ntheir beta service with the <a href=\"https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html\">RDF version <i class=\"fa-solid fa-recycle fa-xs\"></i></a> I created of\n<a href=\"http://metamolecular.com/chempedia/\">ChemPedia Substances</a> (the now no longer existing cool web service from\n<a href=\"http://metamolecular.com/\">MetaMolecular</a> to draw and name organic molecules).</p>\n\n<p>Kasabi makes the RDF data available via a few APIs, depending on the APIs selected by the uploader. I picked all five of them, just to see how\nthings work. Of direct interest is the SPARQL end point, but also the option to host the data as dereferencable resources. Cool! That was just\nwhat was missing for me.</p>\n\n<p>Now, using the API requires you to get an account. This will allow Kasabi to control the traffic, and as such creates a business model around\nproviding services around Open Data. I think this approach will work. But just to make clear, this does mean you need to get an account first,\nif you would like to play with this data. Once you have an account, you get an API key, and you can append that to any URI with <code class=\"language-plaintext highlighter-rouge\">?apikey=XXXX</code> to\nauthenticate yourself. I think this does mean Kasabi will have to go to an https connection, which is not yet the case. Moreover, you will need\nto subscribe to the data set too. That, in fact, with #altmetrics in mind, sounds really interesting :)</p>\n\n<p>The ChemPedia RDF data is available at: <a href=\"http://beta.kasabi.com/dataset/chempedia-rdf\">http://beta.kasabi.com/dataset/chempedia-rdf</a></p>\n\n<p>This web page will give the five APIs, of which the augmentation one is really interesting, but I have not played with that yet to say much\nabout it. The idea of that API is to augment RDF you post with data from the data set. Like in <a href=\"http://en.wikipedia.org/wiki/Augmented_reality\">augmented reality</a>.\nThat should be cool for mashups.</p>\n\n<p>Now, the APIs I do understand include this SPARQL end point (remember to add your API key!):</p>\n\n<p><a href=\"http://labs.kasabi.com/explorer/sparql/sparql-endpoint-chempedia-rdf\">http://labs.kasabi.com/explorer/sparql/sparql-endpoint-chempedia-rdf</a></p>\n\n<p>And the Linked Data feature. In the <a href=\"http://chem-bla-ics.blogspot.com/2011/07/chempedia-rdf-3-uploading-data-to.html\">next post</a>, I will\nexplain how I tweaked the original data, how I uploaded it, and how this resulted in the dereferencable resources, like:</p>\n\n<p><a href=\"http://data.kasabi.com/dataset/chempedia-rdf/substances/2-2595-7562-8125.html\">http://data.kasabi.com/dataset/chempedia-rdf/substances/2-2595-7562-8125.html</a></p>\n\n<p>Note the links for RDF/XML, RDF/JSON, and Turtle, directly accessible by replacing the .html extension with .rdf, .json, and .ttl respectively.\nAn API key does not seem required for this, which makes perfect sense.</p>\n\n<p>It took me some chatting with the people from Talis, who have been very helpful, as the whole platform was a bit overwhelming. But, for the first\ntime ever, I actually got Linked Open Data online, in a Linked Data manner.</p>",
      "summary": "Kasabi is a new RDF hosting service by Talis. It’s still in beta, and I have been testing their beta service with the RDF version I created of ChemPedia Substances (the now no longer existing cool web service from MetaMolecular to draw and name organic molecules).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/kasabi.png",
      "date_published": "2011-07-06T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["semweb","kasabi","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9w84r-evn93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/04/21/chembl-09-as-rdf.html",
      "title": "ChEMBL 09 as RDF",
      "content_html": "<p><em>Update 2021-02</em>: this post is still the second-most read post in my blog. Welcome! Some updates:</p>\n\n<ul>\n  <li>Ammar Ammar in our BiGCaT group has set up a <a href=\"https://chemblmirror.rdf.bigcat-bioinformatics.org/\">new SPARQL endpoint</a>. Please use it and tweet, blog, or otherwise let others know how you use the ChEMBL RDF.</li>\n  <li>Since this post I have <a href=\"https://chem-bla-ics.blogspot.com/search/label/chembl\">blogged a lot more about ChEMBL</a>.</li>\n</ul>\n\n<p><em>Update</em>: this work is now written down in <a href=\"http://chem-bla-ics.blogspot.nl/2013/05/new-paper-chembl-database-as-linked.html\">this paper</a>.</p>\n\n<p>I’m having a really bad month, as you can see from the number of posts. Too much to do, too little time. One of the things\nI have been doing in the past weeks is updating the RDF for <a href=\"https://www.ebi.ac.uk/chembldb/\">ChEMBL</a>, now up to\nversion 09. The <a href=\"https://web.archive.org/web/20121123055403/http://rdf.farmbio.uu.se/chembl/sparql\">SPARQL end point <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> has not been updated yet (which is\nstill at ChEMBL 04), but you can now download the triples for self-hosting here. Like the database itself, the RDF is\navailable under the <a href=\"http://creativecommons.org/licenses/by-sa/3.0/\">CC-BY-SA license</a>, requiring attribution to both\nthe ChEMBL team as well as our efforts to create the RDF (see this\n<a href=\"https://github.com/egonw/chembl.rdf/blob/master/README.markdown\">README</a>).</p>",
      "summary": "Update 2021-02: this post is still the second-most read post in my blog. Welcome! Some updates:",
      
      "date_published": "2011-04-21T00:00:00+00:00",
      "date_modified": "2023-12-30T00:00:00+00:00",
      "tags": ["chembl","rdf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2011/02/06/groovy-cheminformatics.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/02/06/groovy-cheminformatics.html",
      "title": "Groovy Cheminformatics...",
      "content_html": "<p><strong>Update</strong>: the <a href=\"https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html\">fourth edition <i class=\"fa-solid fa-recycle fa-xs\"></i></a> is out.</p>\n\n<p>Some projects are never finished. Neither is this one, but it is never too late to change how things work, so, taking advantage of\npublishing-on-demand, here I introduce the release-soon, release-often equivalent of cheminformatics books, my\n<a href=\"http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/14745007\">Groovy Cheminformatics with the Chemistry Development Kit</a>\nbook:</p>\n\n<p><img src=\"/assets/images/cdkBook.png\" alt=\"\" /></p>\n\n<p>With a serious discount for just being the first edition (1.3.8-0), but still counting at 72 pages with 75 code examples, this edition\nmarks a personal milestone (and probably not much more than that). There remains much to do, but I promised a release by tomorrow, so\nhere it is. Next releases will contain more code examples, more functionality descriptions, and more literature reviewing where such\ncode is used in science. The plan is to make new editions with each new <a href=\"http://cdk.sf.net/\">CDK</a> release, as well as new editions\nwhen I add a new chapter, section, or just paragraph. But, there will not be a Nightly build service anytime soon.</p>\n\n<p>The current table of contents is as follows:</p>\n\n<p><img src=\"/assets/images/cdkBookToc1.png\" alt=\"\" /></p>\n\n<p>Now, the book content is <strong><em>not</em></strong> open content. However, it contains nothing that is not available by other means. It’s just the\ncompilation that makes this book interesting, as well as that I put effort into ensuring the code examples remain working.\nFor that, I ask a minor financial contribution.</p>",
      "summary": "Update: the fourth edition is out.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkBook.png",
      "date_published": "2011-02-06T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cdk","java","cheminf","cdkbook"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7dnxr-jv029",
      "url": "https://chem-bla-ics.linkedchemistry.info/2011/01/30/github-tip-download-commits-as-patches.html",
      "title": "GitHub Tip: download commits as patches",
      "content_html": "<p><img style=\"float: right;\" src=\"/assets/images/1000px-GitHub.svg.png\" width=\"200\" />\nSome time ago, the brilliant <a href=\"http://github.com/\">GitHub</a> people gave me the following tip. Rajarshi is\n<a href=\"https://sourceforge.net/tracker/index.php?func=detail&amp;aid=3160093&amp;group_id=20024&amp;atid=120024#\">lazy</a>, and might\nfind it interesting. By appending <code class=\"language-plaintext highlighter-rouge\">.patch</code> to the commit URL, a commit can easily be downloaded as a patch. That way,\ndevelopers can easily download it with <code class=\"language-plaintext highlighter-rouge\">wget</code> or <code class=\"language-plaintext highlighter-rouge\">curl</code> and apply it locally with <code class=\"language-plaintext highlighter-rouge\">git am</code>,\nwithout having to fetch the full repository.</p>\n\n<p>For example, Dmitry made this commit in his branch, having the URL\n<a href=\"https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625\">https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625</a>.\nThe patch for this commit can then be downloaded at this URL\n<a href=\"https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625.patch\">https://github.com/dmak/cdk/commit/9b0478d50c7b5ca10f77fb01d89329db5fe80625.patch</a>.</p>",
      "summary": "Some time ago, the brilliant GitHub people gave me the following tip. Rajarshi is lazy, and might find it interesting. By appending .patch to the commit URL, a commit can easily be downloaded as a patch. That way, developers can easily download it with wget or curl and apply it locally with git am, without having to fetch the full repository.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/1000px-GitHub.svg.png",
      "date_published": "2011-01-30T00:00:00+00:00",
      "date_modified": "2011-01-30T00:00:00+00:00",
      "tags": ["github"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2010/12/29/converting-json-to-rdfxml-with-groovy.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/12/29/converting-json-to-rdfxml-with-groovy.html",
      "title": "Converting JSON to RDF/XML with Groovy",
      "content_html": "<p>Mark’s new <a href=\"http://www.science3point0.com/blog/2010/12/29/cc0-rdf-hosting-for-scientists/\">CC0/RDF hosting functionality</a>\n(see also <a href=\"http://chem-bla-ics.blogspot.com/2010/12/what-should-free-cc0-rdf-hosting-for.html\">my post two days ago</a>)\nrequires <a href=\"http://www.w3.org/TR/REC-rdf-syntax/\">RDF/XML format</a>, so I updated my code to convert the\n<a href=\"http://chempedia.com/substances\">Chempedia Substances</a> data into RDF/XML instead of N3 (I have asked\n<a href=\"http://depth-first.com/\">Rich</a> to put a new download link online). This is the\n<a href=\"http://groovy.codehaus.org/\">Groovy</a> code I used:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"nn\">groovy.json.JsonSlurper</span>\n<span class=\"kn\">import</span> <span class=\"nn\">groovy.xml.MarkupBuilder</span>\n<span class=\"kn\">import</span> <span class=\"nn\">groovy.util.IndentPrinter</span>\n\n<span class=\"n\">input</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">File</span><span class=\"o\">(</span><span class=\"s2\">\"substances.json\"</span><span class=\"o\">)</span>\n<span class=\"n\">json</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">JsonSlurper</span><span class=\"o\">().</span><span class=\"na\">parse</span><span class=\"o\">(</span><span class=\"n\">input</span><span class=\"o\">);</span>\n\n<span class=\"kt\">def</span> <span class=\"n\">writer</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">StringWriter</span><span class=\"o\">()</span>\n<span class=\"kt\">def</span> <span class=\"n\">xml</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">MarkupBuilder</span><span class=\"o\">(</span>\n  <span class=\"k\">new</span> <span class=\"nf\">IndentPrinter</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"n\">PrintWriter</span><span class=\"o\">(</span><span class=\"n\">writer</span><span class=\"o\">))</span>\n<span class=\"o\">)</span>\n<span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'rdf:RDF'</span><span class=\"o\">(</span>\n  <span class=\"s1\">'xmlns:rdf'</span><span class=\"o\">:</span>\n    <span class=\"s1\">'http://www.w3.org/1999/02/22-rdf-syntax-ns#'</span><span class=\"o\">,</span>\n  <span class=\"s1\">'xmlns:dc'</span> <span class=\"o\">:</span>\n    <span class=\"s1\">'http://purl.org/dc/elements/1.1/'</span><span class=\"o\">,</span>\n  <span class=\"s1\">'xmlns:iupac'</span> <span class=\"o\">:</span>\n    <span class=\"s1\">'http://www.iupac.org/'</span><span class=\"o\">,</span>\n  <span class=\"s1\">'xmlns:cp'</span> <span class=\"o\">:</span>\n    <span class=\"s1\">'http://rdf.openmolecules.net/chempedia/onto#'</span><span class=\"o\">,</span>\n  <span class=\"s1\">'xmlns:owl'</span> <span class=\"o\">:</span>\n    <span class=\"s1\">'http://www.w3.org/2002/07/owl#'</span>\n<span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"n\">json</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span class=\"n\">substance</span> <span class=\"o\">-&gt;</span>\n    <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'rdf:Description'</span><span class=\"o\">(</span>\n      <span class=\"s1\">'rdf:about'</span><span class=\"o\">:</span> <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">uri</span>\n    <span class=\"o\">)</span> <span class=\"o\">{</span>\n      <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'dc:identifier'</span><span class=\"o\">(</span><span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">gsid</span><span class=\"o\">)</span>\n      <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'owl:sameAs'</span><span class=\"o\">(</span>\n        <span class=\"s1\">'rdf:resource'</span> <span class=\"o\">:</span>\n        <span class=\"s1\">'http://rdf.openmolecules.net/?'</span> <span class=\"o\">+</span>\n        <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">inchi</span>\n      <span class=\"o\">)</span>\n      <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'iupac:inchi'</span><span class=\"o\">(</span>\n        <span class=\"s1\">'http://rdf.openmolecules.net/?'</span> <span class=\"o\">+</span>\n        <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">inchi</span>\n      <span class=\"o\">)</span>\n      <span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"kt\">int</span> <span class=\"n\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"o\">;</span> <span class=\"n\">i</span><span class=\"o\">&lt;</span><span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">size</span><span class=\"o\">();</span> <span class=\"n\">i</span><span class=\"o\">++)</span>\n      <span class=\"o\">{</span>\n        <span class=\"n\">naming</span> <span class=\"o\">=</span> <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">);</span>\n        <span class=\"n\">namingURI</span> <span class=\"o\">=</span> <span class=\"n\">substance</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">+</span> <span class=\"s2\">\"/naming\"</span> <span class=\"o\">+</span> <span class=\"n\">i</span><span class=\"o\">;</span>\n        <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'cp:hasNaming'</span> <span class=\"o\">{</span>\n          <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'rdf:Description'</span> <span class=\"o\">{</span>\n            <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'cp:hasName'</span><span class=\"o\">(</span><span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">name</span><span class=\"o\">)</span>\n            <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'cp:hasStatus'</span><span class=\"o\">(</span><span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">status</span><span class=\"o\">)</span>\n            <span class=\"n\">xml</span><span class=\"o\">.</span><span class=\"s1\">'cp:hasScore'</span><span class=\"o\">(</span><span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">score</span><span class=\"o\">)</span>\n          <span class=\"o\">}</span>\n        <span class=\"o\">}</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">}</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n<span class=\"n\">println</span> <span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">toString</span><span class=\"o\">();</span>\n</code></pre></div></div>",
      "summary": "Mark’s new CCO/RDF hosting functionality (see also my post two days ago) requires RDF/XML format, so I updated my code to convert the Chempedia Substances data into RDF/XML instead of N3 (I have asked Rich to put a new download link online). This is the Groovy code I used:",
      
      "date_published": "2010-12-29T00:00:00+00:00",
      "date_modified": "2010-12-29T00:00:00+00:00",
      "tags": ["groovy","chemistry","rdf","json"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2010/12/03/chemwriter-google-chrome-and-many-eyes.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/12/03/chemwriter-google-chrome-and-many-eyes.html",
      "title": "ChemWriter, Google Chrome, and Many Eyes in Open Source",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Linus'_Law\">Linus’ law</a>:</p>\n\n<ul><i>given enough eyeballs, all bugs are shallow.</i></ul>\n\n<p><a href=\"http://depth-first.com/\">Rich</a> of <a href=\"http://metamolecular.com/\">MetaMolecular</a> works on Open Source and closed source cheminformatics\nsolutions. <a href=\"http://chemwriter.com/\">ChemWriter</a> is one product he is working on which uses JavaScript and <a href=\"http://en.wikipedia.org/wiki/SVG\">SVG</a>\n(two Open Standards), and recently asked feedback on the new version. Test users found a problem on Google’s\n<a href=\"http://www.google.com/chrome\">Chrome</a> browser, and Rich then <a href=\"http://depth-first.com/articles/2010/12/03/the-mysterious-google-chrome-svg-bug/\">did something</a>\nthat is only possible in an Open Source environment: he downloaded the buggy product (Chrome), started looking for the cause, found it, and\nfiled a <a href=\"http://code.google.com/p/chromium/issues/detail?id=65238\">detailed bug report</a>. Just think that would have happened\nif this problem was in MS Internet Explorer…</p>\n\n<p>Well done!</p>",
      "summary": "Linus’ law:",
      
      "date_published": "2010-12-03T00:00:00+00:00",
      "date_modified": "2010-12-03T00:00:00+00:00",
      "tags": ["opensource"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/npbqm-gfa49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html",
      "title": "CiteULike CiTO Use Case #1: Wordles",
      "content_html": "<p>Last month I reported a <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html\">few things I missed <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin <a href=\"http://www.citeulike.org/\">CiteULike</a>. One of them was support for CiTO (see\ndoi:<a href=\"10.1186/2041-1480-1-S1-S6\">https://doi.org/10.1186/2041-1480-1-S1-S6</a>), a great Citation Typing Ontology.</p>\n\n<p>I promised the CiTO author, <a href=\"http://www.zoo.ox.ac.uk/staff/academics/shotton_dm.htm\">David</a>, my use cases, but have been horribly\nbusy in the past few weeks with my new position, wrapping up my past position, and thinking on my position after Cambridge. But finally, here it is. Based on source code I\n<a href=\"http://github.com/egonw/groovy-citeulike\">wrote and released earlier</a>, the first use case I represent is the\n<a href=\"http://www.wordle.net/\">Wordle</a> one, which I <a href=\"http://chem-bla-ics.blogspot.com/2010/02/wordle-of-titles-of-20-most-recent.html\">showed with manual work in February</a>.</p>\n\n<p>Now that all the data is semantically marked up in CiteULike, I can easily extract all paper titles (or whatever is available in CiteULike) for all papers that cite the first\n<a href=\"http://cdk.sf.net/\">CDK</a> paper (doi:<a href=\"http://dx.doi.org/10.1021/ci025584y\">10.1021/ci025584y</a>). 
Using the JSON interface, I have\n<a href=\"http://github.com/egonw/groovy-citeulike/blob/master/cul2wordleInput.groovy\">this Groovy script</a> to extract all titles:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"nn\">groovyx.net.http.HTTPBuilder</span>\n<span class=\"kn\">import</span> <span class=\"nn\">groovyx.net.http.Method</span>\n<span class=\"kn\">import</span> <span class=\"nn\">static</span> <span class=\"n\">groovyx</span><span class=\"o\">.</span><span class=\"na\">net</span><span class=\"o\">.</span><span class=\"na\">http</span><span class=\"o\">.</span><span class=\"na\">ContentType</span><span class=\"o\">.</span><span class=\"na\">JSON</span>\n\n<span class=\"n\">culUrl</span> <span class=\"o\">=</span> <span class=\"s2\">\"http://www.citeulike.org/\"</span><span class=\"o\">;</span>\n\n<span class=\"n\">citotags</span> <span class=\"o\">=</span> <span class=\"o\">[</span>\n  <span class=\"s2\">\"cito--cites\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"cito--usesMethodIn\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"cito--discusses\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"cito--extends\"</span>\n<span class=\"c1\">// there are more, but these are all</span>\n<span class=\"c1\">// I use right now</span>\n<span class=\"o\">]</span>\n\n<span class=\"n\">papers</span> <span class=\"o\">=</span> <span class=\"o\">[</span>\n  <span class=\"s2\">\"1073448\"</span><span class=\"o\">,</span>\n  <span class=\"s2\">\"423382\"</span>\n<span class=\"o\">]</span>\n\n<span class=\"n\">http</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">HTTPBuilder</span><span class=\"o\">(</span><span class=\"n\">culUrl</span><span class=\"o\">)</span>\n\n<span class=\"n\">papers</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span 
class=\"n\">paper</span> <span class=\"o\">-&gt;</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"# Processing $paper...\"</span>\n  <span class=\"n\">citotags</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span class=\"n\">tag</span> <span class=\"o\">-&gt;</span>\n    <span class=\"n\">citation</span> <span class=\"o\">=</span> <span class=\"s2\">\"$tag--$paper\"</span><span class=\"o\">.</span><span class=\"na\">toLowerCase</span><span class=\"o\">()</span>\n    <span class=\"n\">http</span><span class=\"o\">.</span><span class=\"na\">request</span><span class=\"o\">(</span><span class=\"n\">Method</span><span class=\"o\">.</span><span class=\"na\">valueOf</span><span class=\"o\">(</span><span class=\"s2\">\"GET\"</span><span class=\"o\">),</span> <span class=\"n\">JSON</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n      <span class=\"n\">uri</span><span class=\"o\">.</span><span class=\"na\">path</span> <span class=\"o\">=</span> <span class=\"s2\">\"/json/user/egonw/tag/$citation\"</span>\n\n      <span class=\"n\">response</span><span class=\"o\">.</span><span class=\"na\">success</span> <span class=\"o\">=</span> <span class=\"o\">{</span> <span class=\"n\">resp</span><span class=\"o\">,</span><span class=\"n\">json</span> <span class=\"o\">-&gt;</span>\n        <span class=\"n\">json</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span class=\"n\">article</span> <span class=\"o\">-&gt;</span>\n          <span class=\"n\">tripleCount</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"o\">;</span>\n          <span class=\"n\">article</span><span class=\"o\">.</span><span class=\"na\">tags</span><span class=\"o\">.</span><span class=\"na\">each</span> <span class=\"o\">{</span> <span class=\"n\">artTag</span> <span class=\"o\">-&gt;</span>\n            <span class=\"k\">if</span> <span class=\"o\">(</span><span 
class=\"n\">artTag</span><span class=\"o\">.</span><span class=\"na\">startsWith</span><span class=\"o\">(</span><span class=\"n\">tag</span><span class=\"o\">))</span> <span class=\"n\">tripleCount</span><span class=\"o\">++</span>\n          <span class=\"o\">}</span>\n          <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">tripleCount</span> <span class=\"o\">&gt;</span> <span class=\"mi\">0</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n            <span class=\"n\">title</span> <span class=\"o\">=</span> <span class=\"n\">article</span><span class=\"o\">.</span><span class=\"na\">title</span>\n            <span class=\"n\">title</span> <span class=\"o\">=</span> <span class=\"n\">title</span><span class=\"o\">.</span><span class=\"na\">replaceAll</span><span class=\"o\">(</span><span class=\"s2\">\"\\\\{\"</span><span class=\"o\">,</span><span class=\"s2\">\"\"</span><span class=\"o\">)</span>\n            <span class=\"n\">title</span> <span class=\"o\">=</span> <span class=\"n\">title</span><span class=\"o\">.</span><span class=\"na\">replaceAll</span><span class=\"o\">(</span><span class=\"s2\">\"\\\\}\"</span><span class=\"o\">,</span><span class=\"s2\">\"\"</span><span class=\"o\">)</span>\n            <span class=\"n\">println</span> <span class=\"s2\">\"$title\"</span>\n          <span class=\"o\">}</span>\n        <span class=\"o\">}</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">}</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The output is two blocks which I can easily copy/paste into Wordle. 
Now, I think I heard one can actually download the Java code, so I am tempted to integrate it later,\nbut for now copy/paste will do fine, now that the data handling is mostly automated: with a few lines extra I can make such visualizations for any paper\nI annotated in CiteULike with CiTO.</p>\n\n<p><strong>The CDK I paper</strong></p>\n\n<p><img src=\"/assets/images/wordleCDK1.png\" alt=\"\" /></p>\n\n<p><strong>The CDK II paper</strong></p>\n\n<p><img src=\"/assets/images/wordleCDK2.png\" alt=\"\" /></p>\n\n<p>Interesting differences… more statistics will soon follow. See <a href=\"http://chem-bla-ics.blogspot.com/2010/02/further-statistics-on-papers-citing-cdk.html\">Further statistics on the papers citing the CDK</a>\nfor the kind of analyses I have in mind.</p>",
      "summary": "Last month I reported a few things I missed in CiteULike. One of them was support for CiTO (see doi:https://doi.org/10.1186/2041-1480-1-S1-S6), a great Citation Typing Ontology.",
      
      "date_published": "2010-10-31T00:00:00+00:00",
      "date_modified": "2024-11-02T00:00:00+00:00",
      "tags": ["cito","citeulike","cdk","wordle"],
      "_references": [{ "url": "https://doi.org/10.1186/2041-1480-1-S1-S6" },{ "url": "https://doi.org/10.1021/CI025584Y" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/g2ds0-81a33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html",
      "title": "A list of things I miss in CiteULike",
      "content_html": "<p>AJCann posted a blog today about what <a href=\"http://scienceoftheinvisible.blogspot.com/2010/09/long-list-of-things-i-dont-like-about.html\">he doesn’t like about Mendeley</a>.\nAbhishek replied that he does not like people complain about one tool, instead of pointing out a good alternative.\n<a href=\"http://www.mendeley.com/\">Mendeley</a> has two alternatives, <a href=\"http://www.zotero.org/\">Zotero</a> and <a href=\"http://www.citeulike.org/\">CiteULike</a> (there is also\n<a href=\"http://connotea.org/\">Connotea</a>, but got behind in evolution).</p>\n\n<p>Agreeing with <a href=\"http://twitter.com/citeulike\">@citeulike</a> and <a href=\"http://twitter.com/abhishektiwari\">@abhishektiwari</a>, as a service provider\nany bad news is good news too: they provide opportunities to improve. So, as encouraged to do so, I reported my long list of things I miss in CiteULike:</p>\n\n<ul>\n  <li>@citeulike ok, one more. wish #18: get readermeter.org to also support citeulike</li>\n  <li>@citeulike wish #17: allow people linking between papers in their libs using CiTO to annotate how they cite papers, see http://ur.ly/lBUO</li>\n  <li>@citeulike wish #16: I think I saw images from some papers, right? 
how about doing that for #biomedcentral journals too?</li>\n  <li>@citeulike wish #15: at the same http://ur.ly/lIGn page, the tag cloud should reflect tag use with font sizing</li>\n  <li>@citeulike wish #14: upon ‘post url’, the first page with extraced information should allow marking as ‘I am author’ (cannot find that)</li>\n  <li>@citeulike (new) wish #12: clicking an account name should get me to a public portal, rather than just his paper list</li>\n  <li>@citeulike good point, wish #13: be more strong on requiring people to tag papers… and use article keywords as default tags</li>\n  <li>@citeulike wish #11: remove ‘no-tag’ from tag clouds</li>\n  <li>@citeulike wish #10: support #RDF export with BIBO and/or PRISM</li>\n  <li>@citeulike wish #9: use #foaf for the RDFa for account pages, and to mark up friends</li>\n  <li>@citeulike wish #8: and more generally, make #citeulike part of the #linkeddata network (provide an #rdf API)</li>\n  <li>@citeulike wish #7: start using RDFa, e.g. with the PRISM ontology</li>\n  <li>@citeulike wish #6: on an article page (like http://ur.ly/lvWk) summarize the network that bookmarked that article, not just the acc names</li>\n  <li>@citeulike wish #5: don’t show the ‘copy’ button for papers that are already in my archive (really a bug)</li>\n  <li>@citeulike indeed, but don’t or do it right… wish #4: allow people to have that link automatically point to an external blog</li>\n  <li>@citeulike wish #3: provide summaries of lists, like article count per journal and article count per year</li>\n  <li>@citeulike well, I’ll use the blog functoinality to summarize… wish #2: do not try to be a blogging platform</li>\n  <li>@citeulike (new) wish #1: put automatically focus on text field after clicking search and select all text for easy deletion</li>\n</ul>\n\n<p>The reports are now also available in the <a href=\"http://www.citeulike.org/groupfunc/3124/forums\">fora of CiteULike</a>.</p>",
      "summary": "AJCann posted a blog today about what he doesn’t like about Mendeley. Abhishek replied that he does not like people complain about one tool, instead of pointing out a good alternative. Mendeley has two alternatives, Zotero and CiteULike (there is also Connotea, but got behind in evolution).",
      
      "date_published": "2010-09-17T00:00:00+00:00",
      "date_modified": "2010-09-17T00:00:00+00:00",
      "tags": ["cito","citeulike"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/832vn-qwh10",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/14/molecular-chemometrics-principles-3.html",
      "title": "The Molecular Chemometrics Principles #3: stand on shoulders",
      "content_html": "<p>I have blogged about two Molecular Chemometrics principles so far:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">McPrinciple #1: access to data</a></li>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/12/molecular-chemometrics-principles-2-be.html\">McPrinciple #2: be clear in what you mean</a></li>\n</ul>\n\n<p>Peter’s post <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2539\">#solo10: Green Chain Reaction; where to store the data? DSR? IR? BioTorrent, OKF or ???</a>\ngives me enough basis to write up a third principle:</p>\n\n<p><strong>Molecular Chemometrics Principles #3</strong>: We make scientific progress if we build on past achievements.</p>\n\n<p>Sounds logical, right? Practically, the way we share our cheminformatics knowledge makes this standing on shoulders pretty difficult.\nBut there is one particular aspect I would like to ask your attention for: you can contribute by making clear what shoulders\nyou would like to stand on. 
That is, where do you prefer to put your effort, and what message would you like to give to your user community?</p>\n\n<p>In the aforelinked post, Peter asks where he should upload his data, and he suggests <a href=\"http://www.biotorrents.net/\">BioTorrent</a> (see my review\n<a href=\"http://chem-bla-ics.blogspot.com/2010/04/bittorrents-for-science.html\">BitTorrents for Science</a>), DSpace, and <a href=\"http://www.ckan.net/\">CKAN</a>.\nNow, his <a href=\"http://www.google.se/search?sourceid=chrome&amp;client=ubuntu&amp;channel=cs&amp;ie=UTF-8&amp;q=%22Green+Chain+Reaction%22\">Green Chain Reaction</a>\nis picked up (see <a href=\"http://researchremix.wordpress.com/2010/08/11/green-chain-reaction-project-putting-my-minutes-where-my-mouth-is/\">these</a>\n<a href=\"http://scienceonlinelondon.wikidot.com/topics:green-chain-reaction\">few</a> <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2538\">blog</a> posts),\nand the resulting data should be distributed as much as possible. The exact location does not really matter…</p>\n\n<p>But…</p>\n\n<p>By picking where you upload, you make a statement to your community: “<em>Look guys, we are distributing our data via Foo, because we believe those guys are doing good work! Perhaps you can support them too.</em>”.</p>\n\n<p>This principle does not only apply to data, it applies to things too. For example, when\n<a href=\"http://www.chemspider.com/blog/ichemlabs-and-rsc-chemspider-announce-partnership.html\">iChemLabs and RSC ChemSpider Announce Partnership</a>\nthey do not just improve the user experience of ChemSpider (which I certainly won’t object to), but they also imply\n“<em>Look dudes, your product is just not good enough and we do not want to help you improve it either</em>”.\nOf course, ChemSpider has every right, and for them to succeed it is crucial to make decisions like this. 
Fortunately,\n<a href=\"http://web.chemdoodle.com/installation.php\">ChemDoodle is GPL</a>.</p>\n\n<p>Every project with a user base has the opportunity to support shoulders, if they only visibly stand on them. By merely discussing the\n<em>Green Chain Reaction</em>, I show my support for this social web experiment. You can too. Use these powers wisely. May the McPrinciples be with you.</p>",
      "summary": "I have blogged about two Molecular Chemometrics principles so far:",
      
      "date_published": "2010-08-14T00:00:00+00:00",
      "date_modified": "2010-08-14T00:00:00+00:00",
      "tags": ["mcprinciples","solo10","chemdoodle","chemspider","javascript"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dzqvt-ynv20",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/12/molecular-chemometrics-principles-2-be.html",
      "title": "The Molecular Chemometrics Principles #2: be clear in what you mean",
      "content_html": "<p>I noted <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">earlier this week</a>\nthat <em>[d]uring the week [in <a href=\"/2010/08/06/oxford-2.html\">Oxford <i class=\"fa-solid fa-recycle fa-xs\"></i></a>], someone (name and address is know at the\neditorial office) commented on the fact that my blog posts are somewhat difficult to follow; that is, it’s\noften not clear why I am posting what I am posting</em>. This triggered the start of a series of principles in\nthe field I coined <a href=\"https://doi.org/10.1080/10408340600969601\">Molecular Chemometrics</a>, and the promise\nthat I will try to indicate in each blog post to which of these principles it relates. Just to put things in a bit more\nperspective; to make a bit more clear why I am blogging about that bit; just to be clear in what I mean.</p>\n\n<p>Now, the first principle was about the need for access to data (<a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">McPrinciple #1</a>).\nThis principle goes without saying, one would think, but is not widely accepted yet. This is why Open Data promotion is still needed. For example, data in papers\nstill is not freely redistributable, as <a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">Peter points out once again</a>.</p>\n\n<p>Anyway, this post is not about McPrinciple #1, but about the second principle.</p>\n\n<p><strong>Molecular Chemometrics Principles #2</strong>: In order to reproduce cheminformatics studies you need to be able to understand the input data.</p>\n\n<p>Readers of my blog will surely recognize this theme. 
Clearly this theme explains my past fetish for the\n<a href=\"http://chem-bla-ics.blogspot.com/search?q=CML\">Chemical Markup Language</a>, and my more recent work on the\n<a href=\"http://chem-bla-ics.blogspot.com/search?q=RDF\">Resource Description Framework</a>.</p>\n\n<p>And it is so easy to jump to conclusions. Easy to make mistakes. And this is not just on the receiving side; the sending\nperson may have accidentally made a mistake, or left something accidentally unclear, causing incorrect assumptions, and\ntherefore errors in the cheminformatics computation. Now, if the data were semantically (clearly) annotated, and the\nmeaning were clear, it would also be trivial to see when a mistake had sneaked in. Think of it as a check bit.</p>\n\n<p>“Well, isn’t this a bit exaggerated,” you might say. Perhaps, perhaps not. A simple, recent example. We all know\n<a href=\"http://www.opensmiles.org/\">SMILES</a>, right? And we all know that lower case element symbols indicate aromaticity, right?\nThat is, c1ccccc1 is aromatic, right? So, what’s the problem then?</p>\n\n<p>Now, consider the SMILES string c1ccc1. Lower case carbon element symbols, so aromatic, right? Oh, wait…</p>\n\n<p>Therefore, be clear in what you mean. It saves us from a lot of trouble.</p>\n\n<p>Further reading:</p>\n\n<ul>\n  <li><a href=\"https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html\">The Molecular Chemometrics Principles #1: access to data</a></li>\n  <li>Molecular Chemometrics, 2006 (doi:<a href=\"https://doi.org/10.1080/10408340600969601\">10.1080/10408340600969601</a>)</li>\n</ul>",
      "summary": "I noted earlier this week that [d]uring the week [in Oxford ], someone (name and address is know at the editorial office) commented on the fact that my blog posts are somewhat difficult to follow; that is, it’s often not clear why I am posting what I am posting. This triggered the start of a series of principles in the field I coined Molecular Chemometrics, and the promise that I will try to indicate in each blog post to which of these principles it relates. Just to put things in a bit more perspective; to make a bit more clear why I am blogging about that bit; just to be clear in what I mean.",
      
      "date_published": "2010-08-12T00:00:00+00:00",
      "date_modified": "2024-05-18T00:00:00+00:00",
      "tags": ["mcprinciples","chemometrics","rdf","cml","semweb"],
      "_references": [{ "url": "https://doi.org/10.1080/10408340600969601" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/srwf0-4gf52",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/09/molecular-chemometrics-principles-1.html",
      "title": "The Molecular Chemometrics Principles #1: access to data",
      "content_html": "<p>The meetings in and around Oxford were great! I already wrote that the Predictive Toxicology workshop was brilliant\n(see <a href=\"/2010/08/01/oxford.html\">Oxford… #1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>) and\n<a href=\"/2010/08/06/oxford-2.html\">Oxford… #2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), but I also very, very much enjoyed meeting up\nwith <a href=\"http://www.danhagon.me.uk/blog/\">Dan</a> and <a href=\"http://semanticscience.wordpress.com/\">Nico</a>! During the week, someone\n(name and address is know at the editorial office) commented on the fact that my blog posts are somewhat difficult\nto follow; that is, it’s often not clear why I am posting what I am posting.</p>\n\n<p>Indeed, I am not particularly one of those bloggers who spends trees after trees, in great detail explaining what is going on.\nI do make a lot of use of <a href=\"http://en.wikipedia.org/wiki/Hyperlink\">hyperlinking</a>; much more than the average blogger. I\nactually assume that readers follow links, to read about the perspective of a blog post. But we all know that scientists\ndo not read the cited papers in a paper they are reading, so who am I to assume blog readers would start doing that with blogs :)</p>\n\n<p>Well, since <a href=\"/2010/02/19/open-data-panton-principles.html\">principles seems popular <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, it might be\na good start of my grand scheme that is behind this blog: the Molecular Chemometrics Principles. Hence, this first post about\nthe why. The why is simply to provide a reference frame to what I am blogging about. In the next few posts on these\nMcPrinciples (is that a catchy name, or what?) that will appear over the next two weeks, I will outline the code of\nchem-bla-ics. And, moreover, from now on, I will tag all my posts with the reaons why I make that post. 
I am sure that will\nnot be too helpful for the occasional reader, but for anyone who is serious about chem-bla-ics, this will be a genuine gold\nmine of data for pattern recognition and data mining otherwise.</p>\n\n<p>So, here goes.</p>\n\n<p><strong>Molecular Chemometrics Principles #1</strong>: In order to reproduce cheminformatics studies you need access to the input data.</p>\n\n<p>The reason for this is that statistical modeling very much depends on the data on which modeling was done, patterns\nwere recognized, etc. Therefore, without the input data, it is practically impossible to accurately reproduce results.\nFortunately, the acceptance of the importance of access to data (e.g. as Open Data) is slowly gaining momentum in\nscience.</p>\n\n<p>Further reading: Molecular Chemometrics, 2006 (doi:<a href=\"https://doi.org/10.1080/10408340600969601\">10.1080/10408340600969601</a>)</p>",
      "summary": "The meetings in and around Oxford were great! I already wrote that the Predictive Toxicology workshop was brilliant (see Oxford… #1 ) and Oxford… #2 ), but I also very, very much enjoyed meeting up with Dan and Nico! During the week, someone (name and address is know at the editorial office) commented on the fact that my blog posts are somewhat difficult to follow; that is, it’s often not clear why I am posting what I am posting.",
      
      "date_published": "2010-08-09T00:00:00+00:00",
      "date_modified": "2024-05-18T00:00:00+00:00",
      "tags": ["chemometrics","mcprinciples"],
      "_references": [{ "url": "https://doi.org/10.1080/10408340600969601" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/31apn-15c92",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/06/oxford-2.html",
      "title": "Oxford... #2",
      "content_html": "<p>The <a href=\"/2010/08/01/oxford.html\">Predictive Toxicology <i class=\"fa-solid fa-recycle fa-xs\"></i></a> meeting is over. It was a great meeting, by any standard.\nVery much recommended, and many thanx to Barry for the organization! The meeting was a true workshop, with a mix of presentations and getting\nwork done. I participated in a group that looked at mutagenicity of potential anti-malaria drugs from the datasets of GSK and Novartis recently\nrelease as Open Data. We used various tools to predict properties, and plan to make all our results freely available soon. Otherwise, it was\nalso great to meet Nina again (with whom I <a href=\"https://chem-bla-ics.blogspot.com/2010/08/using-bioclipse-to-upload-data-to.html\">talked about OpenTox</a>),\nand to meet other CDK users, including Patrik (<a href=\"https://web.archive.org/web/20100918124243/https://www.farma.ku.dk/smartcyp/\">SMARTCyp <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\ndoi:<a href=\"https://doi.org/10.1021/ml100016x\">10.1021/ml100016x</a>) and David (<a href=\"http://inkspotscience.com/\">Inkspot</a>).</p>\n\n<p>In the afternoon I walked around a bit more in Oxford, did some more shopping… and visited the Apple shop and played with an iPad. It’s\nindeed a great piece of hardware. Looking forward to the first Android versions :)</p>\n\n<p><img src=\"/assets/images/DSCI0107.JPG\" alt=\"\" /></p>",
      "summary": "The Predictive Toxicology meeting is over. It was a great meeting, by any standard. Very much recommended, and many thanx to Barry for the organization! The meeting was a true workshop, with a mix of presentations and getting work done. I participated in a group that looked at mutagenicity of potential anti-malaria drugs from the datasets of GSK and Novartis recently released as Open Data. We used various tools to predict properties, and plan to make all our results freely available soon. Otherwise, it was also great to meet Nina again (with whom I talked about OpenTox), and to meet other CDK users, including Patrik (SMARTCyp, doi:10.1021/ml100016x) and David (Inkspot).",
      
      "date_published": "2010-08-06T00:00:00+00:00",
      "date_modified": "2024-05-18T00:00:00+00:00",
      "tags": ["cdk","oxford","oxfordadmet2010","conference"],
      "_references": [{ "url": "https://doi.org/10.1021/ml100016x" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ap7n7-58v06",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/08/01/oxford.html",
      "title": "Oxford...",
      "content_html": "<p>Yesterday I arrived in <a href=\"http://en.wikipedia.org/wiki/Oxford\">Oxford</a>, after a 3.5 hour bus transfer from\n<a href=\"http://en.wikipedia.org/wiki/London_Stansted_Airport\">London Stansted</a>. Long, boring ride (though I might have seen a few\n<a href=\"https://web.archive.org/web/20100728051221/http://www.rspb.org.uk/wildlife/birdguide/name/r/redkite/index.aspx\">red kites <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, but seeing that they were near extinct, I am\nwondering what other large bird of prey has strong split tail like a swallow). Showed once more that the UK infrastructure has\nhardly changed since the 19th century. Enjoying an undergraduate room at one of the colleges. Pretty basic, but makes me feel\nmore like a human than a tourist. Yes!, undergraduate students are human too! One of the advantages is you get an excellent\ninternet connection :)</p>\n\n<p>Anyways, going to the <a href=\"https://web.archive.org/web/20111001000000*/http://echeminfo.com/comty_oxfordadmet10\">Predictive Toxicology <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> workshop, thanx to the bursary award I received from\n<a href=\"https://web.archive.org/web/20110207193345/http://echeminfo.com/\">echeminfo <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n(see <a href=\"http://chem-bla-ics.blogspot.com/2010/03/oxford-august-2010-echeminfo-predictive.html\">Oxford, August 2010: eCheminfo Predictive ADME &amp; Toxicology 2010 Workshop</a>).</p>\n\n<p>This afternoon I walked around a bit, watching all the old buildings. But I guess being here without anyone to share it with,\nand that it looks just like <a href=\"http://en.wikipedia.org/wiki/Cambridge\">Cambridge</a>, makes me not-so-much impressed. Moreover, it’s too\nbusy with tourists and people randomly wearing Oxford University sweatshirts. 
Small and nice was the\n<a href=\"http://www.mhs.ox.ac.uk/\">Museum of the History of Science</a>, with some nice chemical pieces, like this one:</p>\n\n<p><img src=\"/assets/images/DSCI0089.JPG\" alt=\"\" /></p>\n\n<p>Buildings like the <a href=\"http://en.wikipedia.org/wiki/Radcliffe_Camera\">Radcliffe Camera</a> are nice on the outside, but closed.\nSeems I have to become a fellow first. This is what it looked like today:</p>\n\n<p><img src=\"/assets/images/DSCI0094.JPG\" alt=\"\" /></p>\n\n<p>Quite interesting too was the Oxford University Press shop. I’m a sucker for books. Apparently, you can just write a book\nand publish it. For example, an extensive list of <a href=\"http://ukcatalogue.oup.com/category/academic/series/general/opr.do\">dictionaries on about anything</a>…\nand since I have been writing several book chapters right now, perhaps this is actually an interesting route…</p>\n\n<p>But the question is, of course, how long will we keep reading books… they’re the\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1064\">hamburgers</a> of educational material… Kindle and the like will soon drop in\nprice, and cost some €30. But e-book prices will have to drop too, and I still do not get why an e-book is more expensive than a paperback…\n(see <a href=\"http://chem-bla-ics.blogspot.com/2010/07/amazon-kindle-edition-is-more-expensive.html\">Amazon, the Kindle edition is more expensive than the paperback??</a>).\nBut then again… they are rich, and I am not.</p>\n\n<p>There was some recent talk about the fact that no one can be Open to the full. You either do Open Data or Open Source, and\nmake a living from the rest. That’s where I nicely show I know bollocks about economics. I do\n<a href=\"http://bodr.sf.net/\">BODR</a>, <a href=\"http://cdk.sf.net/\">CDK</a>, … all Open, all for free.</p>\n\n<p>OK. That’s a plus for Oxford… it makes you think about things. 
Perhaps there is something to\n<a href=\"http://en.wikipedia.org/wiki/Morphic_field#Morphogenetic_field\">morphogenetic</a> fields…</p>",
      "summary": "Yesterday I arrived in Oxford, after a 3.5 hour bus transfer from London Stansted. Long, boring ride (though I might have seen a few red kites, but seeing that they were near extinct, I am wondering what other large bird of prey has a strong split tail like a swallow). Showed once more that the UK infrastructure has hardly changed since the 19th century. Enjoying an undergraduate room at one of the colleges. Pretty basic, but makes me feel more like a human than a tourist. Yes!, undergraduate students are human too! One of the advantages is you get an excellent internet connection :)",
      
      "date_published": "2010-08-01T00:00:00+00:00",
      "date_modified": "2024-05-18T00:00:00+00:00",
      "tags": ["oxford","oxfordadmet2010","publishing","science","toxicology","conference"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q5sed-jea02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2010/02/19/open-data-panton-principles.html",
      "title": "Open Data: the Panton Principles",
      "content_html": "<p>The <a href=\"http://blog.okfn.org/2010/02/19/launch-of-the-panton-principles-for-open-data-in-science/\">announcement</a> of the\n<a href=\"http://web.archive.org/web/20100222213041/http://pantonprinciples.org/\">Panton Principles <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"http://opendotdotdot.blogspot.com/2010/02/open-data-question-of-panton-principles.html\">is</a>\n<a href=\"http://web.archive.org/web/20100223064514/http://scienceblogs.com/commonknowledge/2010/02/reaching_agreement_on_the_publ.php\">the <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"http://usefulchem.blogspot.com/2010/02/support-open-data-by-endorsing-panton.html\">big</a>\n<a href=\"http://www.sennoma.net/main/archives/2010/02/panton_principles_for_open_dat.php\">news</a>\n<a href=\"http://www.nextgenerationscience.com/open-access/the-panton-principles-for-open-data-in-science/\">today</a>,\nthough Peter already spoke about them\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1939\">in May last year</a> (see coverage on\n<a href=\"http://friendfeed.com/search?q=panton+principles\">FriendFeed</a> and\n<a href=\"http://search.twitter.com/search?q=panton+principles\">Twitter</a>). 
The four principles, listed in their short versions:</p>\n\n<ul>\n  <li>When publishing data make an explicit and robust statement of your wishes.</li>\n  <li>Use a recognized waiver or license that is appropriate for data.</li>\n  <li>If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.</li>\n  <li>Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.</li>\n</ul>\n\n<p>I think these are very workable next steps in Open Data, perhaps even worthy end goals.\n<a href=\"http://web.archive.org/web/20100222084119/http://pantonprinciples.org/endorse\">I endorse them <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.</p>\n\n<p><img src=\"/assets/images/panton.png\" alt=\"Sort of logo for the Panton Principles, showing this name and the text &quot;Principles for Open Data in Science&quot;.\" /></p>\n\n<p><strong>Principle 1: an explicit and robust statement</strong> <br />\nThis is in my opinion the most important principle. Too often you find a database with really useful data, but without\nany clue about what you are allowed to do with this data. Of course, I can contact the authors, get their permission, etc.\nThey probably like it that way, and I can even understand that. However, it does not scale, and it is slow. Even worse is\nthe situation when the original composer goes missing in action. Both are equally valid, but explicit statements just make\nthings easier.</p>\n\n<p><strong>Principle 2: use a waiver or license appropriate for data</strong> <br />\nThis principle is debatable. Very much like the BSD-vs-GPL flamewars, some like copylefting, others do not. There is an\nimportant difference though. 
Software has the concept of interfaces, making it easier to combine code under incompatible licenses,\ncleanly separated by these interfaces. This, for example, allows you to run proprietary software on a Linux kernel.\nHowever, data sets do not have such a concept. There is no such thing as an interface between two numbers.</p>\n\n<p>This makes the concept of mixing data sets different: because there is no such interface, any mixing can only happen\nbetween compatible licenses. This is one reason behind the choice of very liberal licenses like\n<a href=\"http://creativecommons.org/license/zero\">CC0</a>. This license, or waiver really, allows you to do anything, and most\ncertainly, mix data sets.</p>\n\n<p>And that makes things a lot easier. But then again, while these are noble goals, I would rather see people use a copyleft\nlicense than no license at all.</p>\n\n<p><strong>Principle 3: non-commercial and other restrictive clauses should not be used</strong> <br />\nI think again making things easier is the goal. The non-commercial clause is interesting, and actually likely an important\none. Consider course material, a course book. Those are commercial. Some even argued that many universities themselves are\nactually commercial entities.</p>\n\n<p><strong>Principle 4: the public domain via PDDL or CCZero is strongly recommended</strong> <br />\nI second these choices over a mere claim that the data is public domain. The PD concept has many meanings and is not\nthe same in every jurisdiction; in particular, there are differences between USA and EU law. Waiving these rights, which is just\nthe same as claiming public domain, works in any jurisdiction, again, making things a lot easier.</p>\n\n<p><strong>Open Data, Open Source, Open Standards are not goals</strong> <br />\nThe underlying pattern of my comments must be clear: the principles make life easier. 
This is what Open Source and Open Standards\n(<a href=\"http://blueobelisk.stackexchange.com/questions/231/what-formats-fall-into-open-specification\">whatever</a>\n<a href=\"http://blueobelisk.stackexchange.com/questions/106/which-formats-fall-into-open-data-open-source-and-open-standards\">those</a>\n<a href=\"http://sourceforge.net/mailarchive/forum.php?thread_name=6aeb064b1002162228qcc0603eo8f363a13f7d46805@mail.gmail.com&amp;forum_name=blueobelisk-discuss\">are</a>) are all about.</p>\n\n<ul>\n<i><b>The three pillars of the ODOSOS mantra are not goals, but merely the means of making life easier.</b></i>\n</ul>\n\n<p>The Panton Principles certainly make life easier in Open Data, and initiatives like the\n<a href=\"http://esw.w3.org/topic/HCLSIG/LODD/\">Linking Open Drug Data</a> in which I participate will greatly benefit\nfrom people adopting them.</p>\n\n<p>The Principles do not solve all problems. There is still a lot of ‘Open Data’ licensed with unrecommended licenses.\nFor example, the <a href=\"http://chem-bla-ics.blogspot.com/2009/09/open-chemical-data-1-nmrshiftdb.html\">NMRShiftDB</a> uses a\nGNU FDL license, and data from supplementary material of Open Access journal articles is likely Creative Commons-licensed.</p>\n\n<p><img src=\"/assets/images/panton_is_it_open_data.png\" alt=\"Screenshot of the &quot;Is it Open Data?&quot; website, showing starting points like the &quot;How Does It Work?&quot; button.\" /></p>\n\n<p>Another related initiative should certainly not go unnoticed either: <a href=\"http://www.isitopendata.org/\">Is it Open Data?</a>\nis a service where you can try to resolve what the license is for one of those databases which are not quite\nPanton Principles compatible yet.</p>\n\n<p>OK, one last thing. The <a href=\"http://www.volkskrant.nl/binnenland/article1351058.ece/Krachtmeting_in_kabinet_om_Uruzgan\">Dutch government is bursting</a>,\nand I want to listen to the music. 
With permission, I have been hacking the Panton Principles endorsement page,\nand injected some extra span elements, to make it easier to machine process (again, to make things easier), so\nyou can use the following one-liner to count the people endorsing the principles, per country:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span>wget <span class=\"nt\">-O</span> endorsed.html http://pantonprinciples.org/endorsed.html <span class=\"p\">;</span> xpath <span class=\"nt\">-q</span> <span class=\"nt\">-e</span> <span class=\"s2\">\"//span[@class='signature']/span[@class='Country']/text()\"</span> endorsed.html | <span class=\"nb\">sort</span> | <span class=\"nb\">uniq</span> <span class=\"nt\">-c</span>\n</code></pre></div></div>\n\n<p>The current count is <a href=\"http://pantonprinciples.org/endorse/\">hitting 44 now</a>, and has not quite reached the\n<a href=\"http://friendfeed.com/openchemicaldata/e6236e5a/panton-principles-endorse-open-data-go-visit\">500 I had hoped for</a> yet:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>1 Australia\n1 Canada\n1 Catalonia\n2 Espana\n2 France\n6 Germany\n1 Greece\n1 Italy\n1 Netherlands\n1 New Zealand\n1 Norway\n1 Poland\n1 Slovenia\n1 Sweden\n1 Switzerland\n1 The Netherlands\n9 UK\n1 U.K.\n1 United Kingdom\n1 United States of America\n9 USA\n</code></pre></div></div>\n\n<p>Does anyone know how we can convert this into some nice world map graphics with a few lines of code?</p>\n\n<p>Now, I am looking for a bar in Uppsala to write up some ideas about what specifications are :)</p>",
      "summary": "The announcement of the Panton Principles is the big news today, though Peter already spoke about them in May last year (see coverage on FriendFeed and Twitter). The four principles, listed in their short versions:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/panton_is_it_open_data.png",
      "date_published": "2010-02-19T00:00:00+00:00",
      "date_modified": "2024-03-23T00:00:00+00:00",
      "tags": ["opendata"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html",
      "title": "ChemPedia RDF #1: the SPARQL end point",
      "content_html": "<p>Well, you might spot a pattern here; yes, another chemical <a href=\"http://pele.farmbio.uu.se/cc0/sparql\">SPARQL end point</a>\n(actually, it shares the end point with the <a href=\"http://chem-bla-ics.blogspot.com/2009/11/open-notebook-science-solubility-sparql.html\">Solubility data</a>).\nThis time around <a href=\"http://depth-first.com/\">Rich</a>’s <a href=\"http://chempedia.com/substances\">ChemPedia</a>. Taking advantage of the\n<a href=\"https://doi.org/10.59350/kprj3-gyg97\">CC0-licensed downloads <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nI have created a small <a href=\"http://groovy.codehaus.org/\">Groovy</a> script (using this <a href=\"http://json-lib.sourceforge.net/\">JSON library</a>)\nto convert the ChemPedia <a href=\"http://en.wikipedia.org/wiki/Json\">JSON</a> into\n<a href=\"http://en.wikipedia.org/wiki/Notation3\">Notation3</a>:</p>\n\n<div class=\"language-groovy highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kn\">import</span> <span class=\"nn\">net.sf.json.groovy.JsonSlurper</span><span class=\"o\">;</span>\n\n<span class=\"n\">input</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">File</span><span class=\"o\">(</span><span class=\"s2\">\"substances.json\"</span><span class=\"o\">)</span>\n<span class=\"n\">json</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"n\">JsonSlurper</span><span class=\"o\">().</span><span class=\"na\">parse</span><span class=\"o\">(</span><span class=\"n\">input</span><span class=\"o\">);</span>\n\n<span class=\"n\">println</span> <span class=\"s2\">\"@prefix dc: &lt;http://purl.org/dc/elements/1.1/&gt; .\"</span><span class=\"o\">;</span>\n<span class=\"n\">println</span> <span class=\"s2\">\"@prefix cp: &lt;http://rdf.openmolecules.net/chempedia/onto#&gt; .\"</span><span class=\"o\">;</span>\n<span class=\"n\">json</span><span class=\"o\">.</span><span class=\"na\">each</span> 
<span class=\"o\">{</span> <span class=\"n\">it</span> <span class=\"o\">-&gt;</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"&lt;\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt; dc:identifier \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">gsid</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\";\"</span><span class=\"o\">;</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\" &lt;http://www.w3.org/2002/07/owl#sameAs&gt; &lt;http://rdf.openmolecules.net/?\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">inchi</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt;;\"</span><span class=\"o\">;</span>\n  <span class=\"n\">println</span> <span class=\"s2\">\"  &lt;http://www.iupac.org/inchi&gt; \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">inchi</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\".\"</span><span class=\"o\">;</span>\n  <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">size</span><span class=\"o\">()</span> <span class=\"o\">&gt;</span> <span class=\"mi\">0</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"k\">for</span> <span class=\"o\">(</span><span class=\"kt\">int</span> <span class=\"n\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"o\">;</span> <span class=\"n\">i</span><span class=\"o\">&lt;</span><span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">size</span><span class=\"o\">();</span> <span class=\"n\">i</span><span 
class=\"o\">++)</span> <span class=\"o\">{</span>\n      <span class=\"n\">naming</span> <span class=\"o\">=</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">namings</span><span class=\"o\">.</span><span class=\"na\">get</span><span class=\"o\">(</span><span class=\"n\">i</span><span class=\"o\">);</span>\n      <span class=\"n\">namingURI</span> <span class=\"o\">=</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">+</span> <span class=\"s2\">\"/naming\"</span> <span class=\"o\">+</span> <span class=\"n\">i</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span class=\"s2\">\"&lt;\"</span> <span class=\"o\">+</span> <span class=\"n\">it</span><span class=\"o\">.</span><span class=\"na\">uri</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt; cp:hasNaming \"</span> <span class=\"o\">+</span>\n        <span class=\"s2\">\"&lt;\"</span> <span class=\"o\">+</span> <span class=\"n\">namingURI</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt;.\"</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span class=\"s2\">\"&lt;\"</span> <span class=\"o\">+</span> <span class=\"n\">namingURI</span> <span class=\"o\">+</span> <span class=\"s2\">\"&gt; a cp:Naming;\"</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span class=\"s2\">\"  cp:hasName \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">name</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\";\"</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span class=\"s2\">\"  cp:hasStatus \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">status</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\";\"</span><span class=\"o\">;</span>\n      <span class=\"n\">println</span> <span 
class=\"s2\">\"  cp:hasScore \\\"\"</span> <span class=\"o\">+</span> <span class=\"n\">naming</span><span class=\"o\">.</span><span class=\"na\">score</span> <span class=\"o\">+</span> <span class=\"s2\">\"\\\".\"</span><span class=\"o\">;</span>\n    <span class=\"o\">}</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>After uploading it into <a href=\"http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VOSIndex\">Virtuoso</a> (now using <code class=\"language-plaintext highlighter-rouge\">DB.DBA.TTLP</code> instead of\n<a href=\"http://chem-bla-ics.blogspot.com/2009/09/nmrshiftdb-enters-rdfopenmoleculesnet-2.html\">DB.DBA.RDF_LOAD_RDFXML_MT</a>), we can now have our\nregular SPARQL fun with the data from ChemPedia. For example, list the 10 names with the most votes:</p>\n\n<div class=\"language-sparql highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">dc</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class=\"w\">\n</span><span class=\"k\">prefix</span><span class=\"w\"> </span><span class=\"nn\">cp</span><span class=\"o\">:</span><span class=\"w\"> </span><span class=\"nn\">&lt;http://rdf.openmolecules.net/chempedia/onto#&gt;</span><span class=\"w\">\n\n</span><span class=\"k\">select</span><span class=\"w\"> </span><span class=\"k\">distinct</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span class=\"nv\">?score</span><span class=\"w\"> </span><span class=\"k\">where</span><span class=\"w\"> </span><span class=\"p\">{</span><span class=\"w\">\n  </span><span class=\"nv\">?s</span><span class=\"w\"> </span><span class=\"k\">a</span><span class=\"w\"> </span><span class=\"nn\">cp</span><span class=\"o\">:</span><span class=\"ss\">Naming</span><span class=\"w\"> </span><span 
class=\"p\">;</span><span class=\"w\">\n     </span><span class=\"nn\">cp</span><span class=\"o\">:</span><span class=\"ss\">hasName</span><span class=\"w\"> </span><span class=\"nv\">?name</span><span class=\"w\"> </span><span class=\"p\">;</span><span class=\"w\">\n     </span><span class=\"nn\">cp</span><span class=\"o\">:</span><span class=\"ss\">hasScore</span><span class=\"w\"> </span><span class=\"nv\">?score</span><span class=\"w\"> </span><span class=\"p\">.</span><span class=\"w\">\n</span><span class=\"p\">}</span><span class=\"w\"> </span><span class=\"k\">ORDER</span><span class=\"w\"> </span><span class=\"k\">BY</span><span class=\"w\"> </span><span class=\"k\">DESC</span><span class=\"p\">(</span><span class=\"nv\">?score</span><span class=\"p\">)</span><span class=\"w\"> </span><span class=\"k\">LIMIT</span><span class=\"w\"> </span><span class=\"mi\">10</span><span class=\"w\">\n</span></code></pre></div></div>",
      "summary": "Well, you might spot a pattern here; yes, another chemical SPARQL end point (actually, it shares the end point with the Solubility data). This time around Rich’s ChemPedia. Taking advantage of the CC0-licensed downloads , I have created a small Groovy script (using this JSON library) to convert the ChemPedia JSON into Notation3:",
      
      "date_published": "2009-11-19T00:00:00+00:00",
      "date_modified": "2024-12-30T00:00:00+00:00",
      "tags": ["rdf","sparql","chempedia"],
      "_references": [{ "url": "https://doi.org/10.59350/kprj3-gyg97" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2009/08/21/bioclipse-and-sparql-end-points-2.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2009/08/21/bioclipse-and-sparql-end-points-2.html",
      "title": "Bioclipse and SPARQL end points #2: MyExperiment",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a> and <a href=\"http://en.wikipedia.org/wiki/SPARQL\">SPARQL</a>\nare two really useful Open Standards. <a href=\"http://github.com/egonw/bioclipse.rdf/tree/master\">Bioclipse-RDF</a> is a\nplugin for <a href=\"http://www.bioclipse.net/\">Bioclipse</a> that provides RDF functionality, including querying remote SPARQL end points.</p>\n\n<p>The <a href=\"http://www.myexperiment.org/\">MyExperiment</a> team has set up an excellent <a href=\"http://rdf.myexperiment.org/\">RDF front end</a>.\nFor example, this is <a href=\"http://rdf.myexperiment.org/User/286\">my MyExperiment account in RDF</a>. The storage gets updated\nonce every day (at this moment), but I’m sure that will become more frequent in the future. The SPARQL end point\nallows us to make any query against the database that <a href=\"http://rdf.myexperiment.org/ontologies/\">their ontologies</a>\nsupport. The above query showed up 132 workflows when I ran it today.</p>\n\n<h2 id=\"gists\">Gists</h2>\n\n<p>Now, so far I have been using <a href=\"http://gist.github.com/\">Gist</a> to share Bioclipse scripts and I wrote\nsome <a href=\"http://chem-bla-ics.blogspot.com/2009/01/bioclipse-and-gist-integration.html\">Bioclipse GUI elements for downloading such gists</a>.\nTo annotate these gists, <a href=\"http://delicious.com/\">Delicious</a> has been used, and a listing of Bioclipse scripts can be found under the\ntags <a href=\"http://delicious.com/tag/bioclipse+gist\">bioclipse and gist</a>.</p>\n\n<p>MyExperiment also allows sharing workflows, but originally only for <a href=\"http://taverna.sf.net/\">Taverna</a>.\nA recent change, however, made it possible to share other <em>types</em> of workflows too. 
And, MyExperiment\nitself also allows all the annotation which we may want to do.</p>\n\n<p>Now, using the Bioclipse-RDF functionality, I can query the MyExperiment database and use that information\nto do stuff. If this stuff is a Bioclipse script, then I can just download it, as the download link of a\nworkflow is part of the RDF too, as we will see.</p>\n\n<h2 id=\"querying-a-sparql-end-point\">Querying a SPARQL end point</h2>\n\n<p>As we have seen in the <a href=\"http://chem-bla-ics.blogspot.com/2009/08/bioclipse-and-sparql-end-points.html\">first article of this series</a>,\nthe RDF manager has a method to query a remote SPARQL end point. The complexity is mostly in formulating the SPARQL (and this one\nhappens to be available as a <a href=\"http://www.myexperiment.org/workflows/890\">workflow on MyExperiment too</a>):</p>\n\n<p><img src=\"/assets/images/myExp890.png\" alt=\"\" /></p>\n\n<p>This is worsened by the fact that JavaScript does not have multiline strings, so the backslashes at\nthe end of the lines are JavaScript syntax and not part of the SPARQL. 
To simplify the SPARQL, I will show\nbelow the SPARQL only, and not the Bioclipse script wrapping as is done in the above code snippet.</p>\n\n<h2 id=\"list-all-taverna-2-workflows\">List all Taverna 2 workflows</h2>\n\n<p>Listing all Taverna 2 workflows, as shown in that earlier snippet, is done with the SPARQL:</p>\n\n<script src=\"https://gist.github.com/egonw/172138.js\"></script>\n\n<p>This query asks for a <code class=\"language-plaintext highlighter-rouge\">?workflow</code> and its <code class=\"language-plaintext highlighter-rouge\">?title</code>, and the workflow <code class=\"language-plaintext highlighter-rouge\">?type</code> must be of Class <code class=\"language-plaintext highlighter-rouge\">ContentType</code> as defined in the\n<code class=\"language-plaintext highlighter-rouge\">mebase</code> namespace, and we want to know the <code class=\"language-plaintext highlighter-rouge\">?typetitle</code> of that content type, because we are filtering that using a\n<a href=\"http://en.wikipedia.org/wiki/Regular_expression\">regular expression</a> to contain “Taverna 2”. Well, if you cannot\nfollow this, just <a href=\"http://www.bing.com/search?q=sparql+tutorial&amp;go=&amp;form=QBLH&amp;filt=all\">google for SPARQL</a>,\nand run one of those tutorials which are abundantly present on the web.</p>\n\n<h2 id=\"finding-tags-used-to-annotate-workflows\">Finding tags used to annotate workflows</h2>\n\n<p>To list all tags which likely have to do with metabolomics, I can do:</p>\n\n<script src=\"https://gist.github.com/egonw/172277.js\"></script>\n\n<p>And I can also list all workflows that are tagged like this. 
Because I could not get string matching to work, I used the tag’s URI instead:</p>\n\n<script src=\"https://gist.github.com/egonw/172685.js\"></script>\n\n<h2 id=\"all-myexperiments-users-in-sweden\">All MyExperiments Users in Sweden</h2>\n\n<p>I was also interested in all MyExperiment Users in Sweden, and again, a simple SPARQL tells me where they live:</p>\n\n<script src=\"https://gist.github.com/egonw/172129.js\"></script>\n\n<h2 id=\"finding-duncan-and-pierre\">Finding Duncan and Pierre</h2>\n\n<p>Very easy to find users, such as <a href=\"http://duncan.hull.name/\">Duncan</a>:</p>\n\n<script src=\"https://gist.github.com/egonw/172686.js\"></script>\n\n<p>Or <a href=\"http://plindenbaum.blogspot.com/\">Pierre</a>, who has not listed where he lives:</p>\n\n<script src=\"https://gist.github.com/egonw/172687.js\"></script>\n\n<h2 id=\"my-workflows\">My workflows</h2>\n\n<p>Given a user, it is also easy to get the workflows he <em>owns</em>. Again, I am using my URI instead of combining with a search\nfor my account, because the MyExperiment SPARQL end point is not particularly fast:</p>\n\n<script src=\"https://gist.github.com/egonw/172691.js\"></script>\n\n<p>Earlier in this series:</p>\n\n<ol>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2009/08/bioclipse-and-sparql-end-points.html\">Bioclipse and SPARQL end points #1: DBPedia</a></li>\n</ol>",
      "summary": "RDF and SPARQL are two really useful Open Standards. Bioclipse-RDF is a plugin for Bioclipse that provides RDF functionality, including querying remote SPARQL end points.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/myExp890.png",
      "date_published": "2009-08-21T00:00:00+00:00",
      "date_modified": "2009-08-21T00:00:00+00:00",
      "tags": ["bioclipse","rdf","foaf","myexperiment","rdf","sparql"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r8d8f-55c02",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/10/02/jchempaint-history-cml-patches-in-1999.html",
      "title": "JChemPaint history: CML patches in 1999",
      "content_html": "<p>There was some talk about the history of chemoinformatics toolkits by\n<a href=\"http://baoilleach.blogspot.com/2008/09/overview-of-cheminformatics-toolkits.html\">Noel</a> and\n<a href=\"http://www.dalkescientific.com/writings/diary/archive/2008/09/20/euroqsar.html\">Andrew</a>, which made\nme wonder on the exact history of <a href=\"http://www.jmol.org/\">Jmol</a> and\n<a href=\"http://sf.net/project/jchempaint\">JChemPaint</a>. Below is the email\n<a href=\"http://www.steinbeck-molecular.de/steinblog/\">Christoph</a> dug up from his archives:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>X-Mozilla-Status: 1011\nX-Mozilla-Status2: 00000000\nMessage-ID: &lt;372ECD5E.53A49584@ice.mpg.de&gt;\nDate: Tue, 04 May 1999 12:35:10 +0200\nFrom: Christoph Steinbeck\nReply-To: steinbeck@ice.mpg.de\nOrganization: Max-Planck-Institute of Chemical Ecology\nX-Mailer: Mozilla 4.51 [en] (WinNT; I)\nX-Accept-Language: en\nMIME-Version: 1.0\nTo: Egon Willighagen\nSubject: Re: Participating in JChemPaint\nReferences: &lt;000701be9613$34cf52e0$8e74ae83@catv6142.extern.kun.nl&gt;\nContent-Type: text/plain; charset=us-ascii\nContent-Transfer-Encoding: 7bit\n\n&gt; Egon Willighagen wrote:\n&gt;\n&gt; Dear Christoph Steinbeck,\n&gt;\n&gt; Yesterday I visited your site on JChemPaint. I like to contribute some\n&gt; of my expertise on\n&gt; Java and CML (1).\n&gt;\n&gt; CML is a markup language that is able to contain chemical information.\n&gt; It can contain for example physical properties, for which I use CML in\n&gt; my Dictionary on Organic Chemistry (2).\n&gt; But is also might contain spectra, bibliographic references etc. 
And\n&gt; of course 2D and 3D\n&gt; structural information.\n&gt;\n&gt; Therefore I propose to write both CML-input and -output procedures for\n&gt; the JChemPaint project.\n&gt;\n&gt; I hope to hear from you soon.\n&gt;\n&gt; Yours sincerely,\n&gt;\n&gt; Egon Willighagen\n&gt;\n&gt; 1. http://www.xml-cml.org/\n&gt; 2. http://www.sci.kun.nl/sigma/Chemisch/Woordenboek/\n\nDear Egon,\n\nthanks very much for your mail and your offer to write CML-input and\noutput routines for JChemPaint.\nThat really sounds great to me and I will give you access to our CVS\ntree as soon as we have discussed the details.\n\nCheers,\n\nChris\n\n--C. S.\nDr. Christoph Steinbeck (http://www.ice.mpg.de/~stein)\nMPI of Chemical Ecology, Tatzendpromenade 1a, 07745 Jena, Germany\nTel: +49(0)3641 643644 - MoPho: +49(0)177 8236510 - Fax: +49(0)3641\n643665\n\nWhat is man but that lofty spirit - that sense of enterprise.\n.. Kirk, \"I, Mudd,\" stardate 4513.3..\n</code></pre></div></div>\n\n<p>Now, my email must have been triggered by the <a href=\"http://freshmeat.net/projects/jchempaint/\">announcement of JChemPaint on FreshMeat.net</a>,\nwhich is the oldest public record of JChemPaint I have found so far:</p>\n\n<p><img src=\"/assets/images/fmJChemPaint.png\" alt=\"\" /></p>",
      "summary": "There was some talk about the history of chemoinformatics toolkits by Noel and Andrew, which made me wonder on the exact history of Jmol and JChemPaint. Below is the email Christoph dug up from his archives:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/fmJChemPaint.png",
      "date_published": "2008-10-02T00:00:00+00:00",
      "date_modified": "2008-10-02T00:00:00+00:00",
      "tags": ["jmol","jchempaint","cml"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2008/04/21/open-access-open-data-leads-to-added.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/04/21/open-access-open-data-leads-to-added.html",
      "title": "Open Access / Open Data leads to added value",
      "content_html": "<p>Two companies recently showed two things:</p>\n\n<ul>\n  <li>open access and open data allow adding value</li>\n  <li>adding value is easier by forking</li>\n</ul>\n\n<p><a href=\"http://depth-first.com/\">Rich</a>’ <a href=\"http://metamolecular.com/\">MetaMolecular</a> set up <a href=\"http://depth-first.com/articles/2008/04/17/user-created-compound-monographs-on-chempedia-net-open-sourcing-the-collation-and-indexing-of-chemical-information\">Chempedia</a>\nwhich combines a substructure-searchable chemical <a href=\"http://wikipedia.org/\">Wikipedia</a>. There is also a\n<a href=\"http://chempedia.net/articles/new\">page to make links</a> to new Wikipedia monographs. Not sure why Rich chose CAS instead of the InChI,\ngiven the recent <a href=\"http://chem-bla-ics.blogspot.com/2008/03/chemical-object-identifier-or-freedom.html\">controversy on validity of CAS numbers in Wikipedia</a>…\nrealize that this page is for new monograph, of which the CAS number is likely not verified yet, or? On the other hand, the InChI or InChIKey is\n<a href=\"http://chem-bla-ics.blogspot.com/2007/11/molecules-in-wikipedia-without-inchis-3.html\">not so abundant in Wikipedia yet</a> (I really must make an updated list).</p>\n\n<p><a href=\"http://www.chemspider.com/\">ChemSpider</a> has been using a similar approach to add value to existing resources. The interesting thing in\nthis case, is that these substructure searchable versions, have an interesting spin off: it allows ChemSpider to build a valuable\nDOI-InChI table. 
So far, I spotted:</p>\n\n<ul>\n  <li><a href=\"http://iucr.chemspider.com/\">iucr.chemspider.com</a> (<a href=\"http://www.chemspider.com/blog/chemspider-rolls-out-website-connected-to-international-union-of-crystallography.html\">Antony’s story</a>)</li>\n  <li><a href=\"http://molbank.chemspider.com/\">molbank.chemspider.com</a> (<a href=\"http://www.chemspider.com/blog/one-more-dedicated-chemspider-website-molbank.html\">Antony’s story</a>)</li>\n  <li><a href=\"https://chem-bla-ics.blogspot.com/2008/04/motd.chemspider.com\">motd.chemspider.com</a> (<a href=\"http://www.chemspider.com/blog/dedicated-search-pages-for-subsets-of-data.html\">Antony’s story</a>)</li>\n</ul>\n\n<p>If you wonder how to integrate all data again when things are so distributed, just consider\n<a href=\"http://chem-bla-ics.blogspot.com/2007/12/christmas-presents.html\">userscripts</a>.</p>",
      "summary": "Two companies recently showed two things:",
      
      "date_published": "2008-04-21T00:00:00+00:00",
      "date_modified": "2008-04-21T00:00:00+00:00",
      "tags": ["chempedia","openscience","chemspider","rdf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2008/01/02/open-lab-2007-results.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2008/01/02/open-lab-2007-results.html",
      "title": "Open Lab 2007 results",
      "content_html": "<p>The results for the <a href=\"http://scienceblogs.com/clock/2008/01/open_lab_2007_the_winning_entr.php\">Open Lab 2007 are out <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nI participated in this endeavor as judge, and read 75 of the 486 blog items, focusing on the sections <em>chemistry,\nblogging, publishing, politics of science</em>, and a number of blog items with few reviews when I passed them.</p>\n\n<p>I am happy to see that one of the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/12/04/my-open-laboratory-2007-submissions.html\">chemistry submission I made myself <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nmade it into the anthology: the <a href=\"http://depth-first.com/\">Depth-First</a> item on\n<a href=\"https://doi.org/10.59350/rpn9h-qay37\">SMILES and Aromaticity: Broken? <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nCongratulations, Rich!</p>",
      "summary": "The results for the Open Lab 2007 are out . I participated in this endeavor as judge, and read 75 of the 486 blog items, focusing on the sections chemistry, blogging, publishing, politics of science, and a number of blog items with few reviews when I passed them.",
      
      "date_published": "2008-01-02T00:00:00+00:00",
      "date_modified": "2025-01-05T00:00:00+00:00",
      "tags": ["openlab"],
      "_references": [{ "url": "https://doi.org/10.59350/rpn9h-qay37" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/12/04/my-open-laboratory-2007-submissions.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/12/04/my-open-laboratory-2007-submissions.html",
      "title": "My Open Laboratory 2007 submissions",
      "content_html": "<p><a href=\"https://chem-bla-ics.linkedchemistry.info/2007/11/14/last-call-for-open-laboratory-2007.html\">As promised <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, here is my\nlist of submission for the <a href=\"http://scienceblogs.com/clock/2007/11/open_laboratory_2008_last_call.php\">Open Laboratory 2007</a>:</p>\n\n<ul>\n  <li><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=412\">Open Data is critical for Reproducible Research</a></li>\n  <li><a href=\"http://www.thechemblog.com/?p=678\">If you ever made something fluoresce after you did a reaction with a transition metal…</a></li>\n  <li><a href=\"http://pipeline.corante.com/archives/2007/11/02/one_for_the_brave.php\">One For the Brave</a></li>\n  <li><a href=\"http://curlyarrow.blogspot.com/2007/04/fun-with-singlet-oxygen.html\">Fun with singlet oxygen</a></li>\n  <li><a href=\"http://depth-first.com/articles/2007/11/28/smiles-and-aromaticity-broken\">SMILES and Aromaticity: Broken?</a></li>\n  <li><a href=\"http://totallysynthetic.com/blog/?p=785\">Resveratrol-Based Natural Products</a></li>\n  <li><a href=\"http://usefulchem.blogspot.com/2007/02/making-anti-malarials-feb-2007-update.html\">Making Anti-Malarials: Feb 2007 Update</a></li>\n</ul>\n\n<p>BTW, even though <a href=\"http://www.scienceblogs.com/strangerfruit/2007/11/open_lab_2007.php\">the judges have started</a>\ntheir way through the submissions, you can still <a href=\"http://openlab.wufoo.com/forms/submission-form/\">submit entries</a>.</p>",
      "summary": "As promised , here is my list of submission for the Open Laboratory 2007:",
      
      "date_published": "2007-12-04T00:00:00+00:00",
      "date_modified": "2025-01-05T00:00:00+00:00",
      "tags": ["openlab"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/11/14/last-call-for-open-laboratory-2007.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/11/14/last-call-for-open-laboratory-2007.html",
      "title": "Last Call for Open Laboratory 2007",
      "content_html": "<p><a href=\"http://pbeltrao.blogspot.com/\">Pedro</a> <a href=\"http://pbeltrao.blogspot.com/2007/11/last-call-for-open-laboratory-2007.html\">reminded me</a>\nof the last call for <a href=\"http://scienceblogs.com/clock/2007/11/open_laboratory_2008_last_call.php\">Open Laboratory 2007</a>,\nwhich prints the best blog items of 2007 in book form. The list of chemistry contributions is not so large yet, so\n<a href=\"http://openlab.wufoo.com/forms/submission-form/\">go ahead and nominate</a> some of cool chemical blog items of the last year.</p>\n\n<p>I will post my shortlist later this week.</p>",
      "summary": "Pedro reminded me of the last call for Open Laboratory 2007, which prints the best blog items of 2007 in book form. The list of chemistry contributions is not so large yet, so go ahead and nominate some of cool chemical blog items of the last year.",
      
      "date_published": "2007-11-14T00:00:00+00:00",
      "date_modified": "2007-11-14T00:00:00+00:00",
      "tags": ["openlab"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/10/24/one-billion-biochemical-rdf-triples.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/10/24/one-billion-biochemical-rdf-triples.html",
      "title": "One Billion Biochemical RDF Triples!",
      "content_html": "<p>That must be a record! Eric Jain wrote on <a href=\"http://lists.w3.org/Archives/Public/public-semweb-lifesci/\">public-semweb-lifesci</a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>The latest release of the UniProt protein database contains just over a\nbillion triples*! PRESS RELEASE :-)\n\nThe data is all available via the (Semantic or otherwise) Web:\n\n  http://beta.uniprot.org/\n\n...or can be bulk-downloaded from:\n\n  ftp://ftp.uniprot.org/\n\n* Counting some reification statements, and assuming no overlap between\n\"named graphs\".\n\nP.S. This should be the last you'll hear from me on this topic -- I'm off\nto new adventures...\n</code></pre></div></div>\n\n<p>I surely hope this is not the last we hear of this huge RDF collection.</p>",
      "summary": "That must be a record! Eric Jain wrote on public-semweb-lifesci:",
      
      "date_published": "2007-10-24T00:00:00+00:00",
      "date_modified": "2007-10-24T00:00:00+00:00",
      "tags": ["uniprot","rdf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/24/jchempaint-too-png-embedded.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/24/jchempaint-too-png-embedded.html",
      "title": "JChemPaint too: PNG embedded connectivity tables",
      "content_html": "<p>Rich blogged about Firefly <a href=\"http://depth-first.com/articles/2007/08/01/never-draw-the-same-molecule-twice-image-metadata-for-cheminformatics\">embedding MDL molfiles in PNG images</a>,\nwhich I found <a href=\"http://depth-first.com/articles/2007/08/08/never-draw-the-same-molecule-twice-viewing-image-metadata\">really</a> cool.\nRich and Noel later showed how that metadata <a href=\"http://depth-first.com/articles/2007/08/08/never-draw-the-same-molecule-twice-viewing-image-metadata\">can be retrieved again</a>,\npossibly <a href=\"http://baoilleach.blogspot.com/2007/08/access-embedded-molecular-information.html\">with Python</a>.</p>\n\n<p>But I did not like that <a href=\"http://depth-first.com/articles/tag/firefly\">Firefly</a> could do this, and <a href=\"http://www.mdpi.org/molecules/html/50100093.htm\">JChemPaint</a> not.\nSo, I started hacking. First I discovered I had to get rid of the use of <a href=\"http://java.sun.com/javase/technologies/desktop/media/jai/\">JAI</a>; then I had to adapt the\nJChemPaintPanel <code class=\"language-plaintext highlighter-rouge\">takeSnaphot()</code> API to return a <code class=\"language-plaintext highlighter-rouge\">RendererImage</code>; and finally, I had to figure out how to write the extra metadata. 
Now, Firefly is not opensource\n(yet), so it took me some time to figure out how that was done, and this is how:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">ImageWriter</span> <span class=\"n\">writer</span> <span class=\"o\">=</span> <span class=\"nc\">ImageIO</span><span class=\"o\">.</span><span class=\"na\">getImageWriters</span><span class=\"o\">(</span>\n  <span class=\"k\">new</span> <span class=\"nf\">ImageTypeSpecifier</span><span class=\"o\">(</span><span class=\"n\">awtImage</span><span class=\"o\">),</span> <span class=\"s\">\"png\"</span>\n<span class=\"o\">).</span><span class=\"na\">next</span><span class=\"o\">();</span>\n<span class=\"nc\">ImageTypeSpecifier</span> <span class=\"n\">specifier</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">ImageTypeSpecifier</span><span class=\"o\">(</span><span class=\"n\">awtImage</span><span class=\"o\">);</span>\n<span class=\"nc\">IIOMetadata</span> <span class=\"n\">meta</span> <span class=\"o\">=</span> <span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">getDefaultImageMetadata</span><span class=\"o\">(</span> <span class=\"n\">specifier</span><span class=\"o\">,</span> <span class=\"kc\">null</span> <span class=\"o\">);</span>\n\n<span class=\"nc\">Node</span> <span class=\"n\">node</span> <span class=\"o\">=</span> <span class=\"n\">meta</span><span class=\"o\">.</span><span class=\"na\">getAsTree</span><span class=\"o\">(</span> <span class=\"s\">\"javax_imageio_png_1.0\"</span> <span class=\"o\">);</span>\n<span class=\"nc\">IIOMetadataNode</span> <span class=\"n\">tExtNode</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">IIOMetadataNode</span><span class=\"o\">(</span><span class=\"s\">\"tEXt\"</span><span class=\"o\">);</span>\n<span class=\"nc\">IIOMetadataNode</span> <span class=\"n\">tExtEntryNode</span> <span 
class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">IIOMetadataNode</span><span class=\"o\">(</span><span class=\"s\">\"tEXtEntry\"</span><span class=\"o\">);</span>\n<span class=\"n\">tExtEntryNode</span><span class=\"o\">.</span><span class=\"na\">setAttribute</span><span class=\"o\">(</span> <span class=\"s\">\"keyword\"</span><span class=\"o\">,</span> <span class=\"s\">\"molfile\"</span> <span class=\"o\">);</span>\n<span class=\"n\">tExtEntryNode</span><span class=\"o\">.</span><span class=\"na\">setAttribute</span><span class=\"o\">(</span> <span class=\"s\">\"value\"</span><span class=\"o\">,</span> <span class=\"n\">mdlMolfile</span><span class=\"o\">);</span>\n<span class=\"n\">tExtNode</span><span class=\"o\">.</span><span class=\"na\">appendChild</span><span class=\"o\">(</span><span class=\"n\">tExtEntryNode</span><span class=\"o\">);</span>\n<span class=\"n\">node</span><span class=\"o\">.</span><span class=\"na\">appendChild</span><span class=\"o\">(</span><span class=\"n\">tExtNode</span><span class=\"o\">);</span>\n<span class=\"n\">meta</span><span class=\"o\">.</span><span class=\"na\">mergeTree</span><span class=\"o\">(</span><span class=\"s\">\"javax_imageio_png_1.0\"</span><span class=\"o\">,</span> <span class=\"n\">node</span><span class=\"o\">);</span>\n<span class=\"nc\">ImageOutputStream</span> <span class=\"n\">ios</span> <span class=\"o\">=</span> <span class=\"nc\">ImageIO</span><span class=\"o\">.</span><span class=\"na\">createImageOutputStream</span><span class=\"o\">(</span>\n  <span class=\"k\">new</span> <span class=\"nf\">FileOutputStream</span><span class=\"o\">(</span><span class=\"n\">filename</span><span class=\"o\">)</span>\n<span class=\"o\">);</span>\n<span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">setOutput</span><span class=\"o\">(</span><span class=\"n\">ios</span><span class=\"o\">);</span>\n<span class=\"n\">writer</span><span class=\"o\">.</span><span 
class=\"na\">write</span><span class=\"o\">(</span> <span class=\"n\">meta</span><span class=\"o\">,</span> <span class=\"k\">new</span> <span class=\"nc\">IIOImage</span><span class=\"o\">(</span><span class=\"n\">awtImage</span><span class=\"o\">,</span> <span class=\"kc\">null</span><span class=\"o\">,</span> <span class=\"n\">meta</span><span class=\"o\">),</span> <span class=\"kc\">null</span> <span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Now I can create my own test files for <a href=\"http://neksa.blogspot.com/2007/08/strigi-now-extracts-chemical.html\">Strigi’s ability to extract chemical metadata from PNG images</a>.\nHere is the JChemPaint-generated PNG image for <a href=\"http://en.wikipedia.org/wiki/Benzophenone\">benzophenone</a>:</p>\n\n<p><img src=\"/assets/images/mdlTest.png\" alt=\"\" /></p>\n\n<p>Another issue, unrelated to this patch, is that writing PNG images changes the location of the structure in the JChemPaint editor,\nand that the placing of the element symbols in image writing is seriously broken. But that will soon be solved with\n<a href=\"https://progz-jchem.blogspot.com/\">Niels’ new renderer</a>.</p>\n\n<p>The metadata looks like:</p>\n\n<p><img src=\"/assets/images/jcpPNGmolfile.png\" alt=\"\" /></p>\n\n<p>(Newlines are lost in the XML display.)</p>\n\n<p>JChemPaint does not yet write InChIs, and it also does not open PNG images for input yet (as Firefly does).</p>",
      "summary": "Rich blogged about Firefly embedding MDL molfiles in PNG images, which I found really cool. Rich and Noel later showed how that metadata can be retrieved again, possibly with Python.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcpPNGmolfile.png",
      "date_published": "2007-08-24T00:00:00+00:00",
      "date_modified": "2007-08-24T00:00:00+00:00",
      "tags": ["jchempaint","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/24/automatic-classification-of-thousands.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/24/automatic-classification-of-thousands.html",
      "title": "Automatic Classification of thousands of Crystal Structures",
      "content_html": "<p>Clustering and classification of crystal structures is hot. Parkin hit the <a href=\"http://www.rsc.org/Publishing/Journals/CE/article.asp?doi=b710869a\">front cover</a>\nof <a href=\"http://www.rsc.org/Publishing/Journals/ce/\">CrystEngComm</a> with a story on <em>Comparing entire crystal structures: structural genetic fingerprinting</em>\n(DOI:<a href=\"https://doi.org/10.1039/b704177b\">10.1039/b704177b</a>). Now, the story itself, while rather interesting and well written, has three major flaws:</p>\n\n<ol>\n  <li>the data set it way too small</li>\n  <li>the proposed proof-of-concept is not novel at all</li>\n  <li>they do not cite me</li>\n</ol>\n\n<p>Well, the latter sounds a bit boohoo, and it is :) (BTW, I do like this paper.)</p>\n\n<p>They propose the work as proof-of-concept, but use a very artificial data set of only 12 crystal structures (<a href=\"http://en.wikipedia.org/wiki/Benzene\">benzene</a>\nand eleven <a href=\"http://en.wikipedia.org/wiki/Polycyclic_aromatic_hydrocarbon\">polycyclic aromatic hydrocarbons</a>, like\n<a href=\"http://en.wikipedia.org/wiki/Naphthalene\">naphtalene</a>, <a href=\"http://en.wikipedia.org/wiki/Anthracene\">anthracene</a>,\n<a href=\"http://en.wikipedia.org/wiki/Phenanthrene\">phenanthrene</a>, <a href=\"http://en.wikipedia.org/wiki/Triphenylene\">triphenylene</a>,\n<a href=\"https://en.wikipedia.org/wiki/Pyrene\">pyrene</a>, <a href=\"https://en.wikipedia.org/wiki/Perylene\">perylene</a>, and <a href=\"https://en.wikipedia.org/wiki/Coronene\">coronene</a>).\nWhile such a small set does make a nice example where you can still list all similarities (<code class=\"language-plaintext highlighter-rouge\">0.5*N*(N-1)</code>), it is really too artificial.</p>\n\n<p>Now, you may wonder if I am in the position to criticize this shortcoming, but I think I am. 
As part of my PhD\nwork, I analyzed this problem myself, and published two years ago the paper <em>Method for the computational comparison\nof crystal structures</em> (DOI:<a href=\"https://doi.org/10.1107/S0108768104028344\">10.1107/S0108768104028344</a>). Apparently,\nParkin was not aware of this publication and did not cite it. I should have went to a crystallography conference\nwith a poster, and advertise my work more. In this paper, I analyzed a data set with 48 crystal structures, manually\nvalidated by visual inspection, resulting in having to compare 1128! crystal structure pairs. Took me two full weeks\nbehind a Silicon Graphics. Yes, I really understand why they took only 12 structures :)</p>\n\n<p>However, there is more prior art. While my approach was based on a new radial distibution function-based whole\ncrystal structure descriptor, my supervisor (<a href=\"http://www.cac.science.ru.nl/people/rwehrens/index.html\">Ron</a>) used\nthe more common powder diffraction pattern and showed in <em>Representing Structural Databases in a Self-Organising Map</em>\n(DOI:<a href=\"https://doi.org/10.1107/S0108768105020331\">10.1107/S0108768105020331</a>) it to be a good enough descriptor for\nclustering of thousands of crystal structures using a <a href=\"http://en.wikipedia.org/wiki/Self-organizing_map\">self-organizing map</a>\n(SOM).</p>\n\n<p>Last week, my second paper in crystallography appeared: <em>Supervised Self-Organizing Maps in Crystal Property and\nStructure Prediction</em> (DOI:<a href=\"https://doi.org/10.1021/cg060872y\">10.1021/cg060872y</a>). In this paper, we show how\nsupervised SOMs (see DOI:<a href=\"https://doi.org/10.1016/j.chemolab.2006.02.003\">10.1016/j.chemolab.2006.02.003</a>) can be\nused for supervised classification and even for property prediction. 
Note that these supervised SOMs are <em>truly</em>\nsupervised, unlike many earlier modifications of the unsupervised SOMs: the training is supervised.</p>\n\n<p>Finally, another advantage of this last work: the code is open source. The code for the unsupervised SOMs is available as\n<a href=\"http://r-project.org/\">R</a> package: <a href=\"http://cran.r-project.org/src/contrib/Descriptions/kohonen.html\">kohonen</a>; and for\npowder diffraction patterns: <a href=\"http://cran.r-project.org/src/contrib/Descriptions/wccsom.html\">wccsom</a>. Details can be found in\n<a href=\"http://cran.r-project.org/doc/Rnews/Rnews_2006-3.pdf\">this R News issue</a>. The first package is not actually limited to\ncrystal structures, and can be used for any clustering problem. However, the articles mentioned here make use of simulated\ndiffraction patters, and I am not sure there are open source tools to generate those.</p>\n\n<p>BTW, I would still be interested in teaming up with <a href=\"http://wwmm.ch.cam.ac.uk/crystaleye/index.html\">CrystalEye</a> in\none way or another, and couple these data analysis methods to live streams of new crystal structures. Nick, let me\nknow if you are interesting in idea exchange.</p>\n\n<p>Getting back to Parkin’s paper, I do like the work. Hirshfield surfaces are an interesting tool to visualize packing\ncharacteristics, and using them to describe a crystal structure sounds like an interesting idea indeed. I just hope\nthat the method properly scales.</p>",
      "summary": "Clustering and classification of crystal structures is hot. Parkin hit the front cover of CrystEngComm with a story on Comparing entire crystal structures: structural genetic fingerprinting (DOI:10.1039/b704177b). Now, the story itself, while rather interesting and well written, has three major flaws:",
      
      "date_published": "2007-08-24T00:00:00+00:00",
      "date_modified": "2007-08-24T00:00:00+00:00",
      "tags": ["crystal"],
      "_references": [{ "url": "https://doi.org/10.1039/b704177b" },{ "url": "https://doi.org/10.1107/S0108768104028344" },{ "url": "https://doi.org/10.1107/S0108768105020331" },{ "url": "https://doi.org/10.1021/CG060872Y" },{ "url": "https://doi.org/10.1016/j.chemolab.2006.02.003" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/22/dapagliflozin-molecular-structure.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/22/dapagliflozin-molecular-structure.html",
      "title": "Dapagliflozin: the molecular structure",
      "content_html": "<p>An anonymous reader <a href=\"http://chem-bla-ics.blogspot.com/2007/03/what-is-dapagliflozin.html\">reported</a> that the\n<a href=\"http://www.ama-assn.org/\">American Medical Association</a> <a href=\"http://www.ama-assn.org/ama1/pub/upload/mm/365/dapagliflozin.pdf\">published</a>\nthe structure of dapagliflozin. Here are the details.</p>\n\n<p><img src=\"/blog/assets/images/dapagliflozin.png\" alt=\"\" /></p>\n\n<p>The full name is <em>(2S,3R,4R,5S,6R)-2- [4-chloro-3-(4-ethoxybenzyl)phenyl]-6-(hydroxymethyl)tetrahydro-2H-pyran-3,4,5-triol</em>\nand the PDF report the CAS number <code class=\"language-plaintext highlighter-rouge\">461432-26-8</code>, and\nInChI=1S/C21H25ClO6/c1-2-27-15-6-3-12(4-7-15)9-14-10-13(5-8-16(14)22)21-20(26)19(25)18(24)17(11-23)28-21/h3-8,10,17-21,23-26H,2,9,11H2,1H3/t17-,18-,19+,20-,21+/m1/s1.</p>\n\n<p>I have added this information to Wikipedia, see the <a href=\"http://en.wikipedia.org/wiki/Dapagliflozin\">Dapagliflozin</a> entry.</p>",
      "summary": "An anonymous reader reported that the American Medical Association published the structure of dapagliflozin. Here are the details.",
      "image": "https://chem-bla-ics.linkedchemistry.info/blog/assets/images/dapagliflozin.png",
      "date_published": "2007-08-22T00:10:00+00:00",
      "date_modified": "2025-01-11T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/22/operator-08-released-new-sechemtic-user.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/22/operator-08-released-new-sechemtic-user.html",
      "title": "Operator 0.8 released: a new Sechemtic user script",
      "content_html": "<p><a href=\"http://www.kaply.com/weblog/\">Mike</a> released <a href=\"http://www.kaply.com/weblog/2007/08/21/operator-08-is-available/\">Operator 0.8</a>,\nwhich picks up RDF (RDFa en eRDF) from HTML pages, and adds actions to it. I <a href=\"http://chem-bla-ics.blogspot.com/2007/06/chemical-rdfa-with-operator-in-firefox.html\">blogged earlier about the beta</a>\nand wrote a script for it for <a href=\"http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html\">chemical RDFa</a>.\nAt this moment, <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> and <a href=\"http://rdf.openmolecules.net/?InChI=1/CH4/h1H4\">RDF for Molecular Space</a>\n(see <a href=\"http://chem-bla-ics.blogspot.com/2007/07/rdf-ing-molecular-space.html\">this blog</a>) are using chemical RDFa to semantically markup molecular information.</p>\n\n<p>The new Operator release (<a href=\"https://addons.mozilla.org/en-US/firefox/addon/4106\">download</a>) has one notable API change:\nit now uses “RDF” as key for semantic information; the add-on now supports eRDF too. So, when installing or updating\nto version 0.8, you also need to update the Sechemtic user script to <a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/operator/tags/1.1/sechemtic_rdfa_operator.js\">version 1.1</a>\n<a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/operator/tags/\">or better</a>.</p>\n\n<p>Installing Operator scripts is a bit more work than Greasemonkey userscripts. Save the script to your home directory,\nor any other place you can easily find on the hard disk. 
After installing the Operator add-on, click the <em>Options</em> button:</p>\n\n<p><img src=\"/blog/assets/images/options.png\" alt=\"\" /></p>\n\n<p>For the RDFa script to work, you need to make sure that the <em>Display style</em> is set to <em>Data formats</em>:</p>\n\n<p><img src=\"/blog/assets/images/options1.png\" alt=\"\" /></p>\n\n<p>Then you can go to the <em>User Scripts</em> tab, and use the <em>New</em> button to add the script you downloaded and saved to your hard disk earlier:</p>\n\n<p><img src=\"/blog/assets/images/options2.png\" alt=\"\" /></p>\n\n<p>Then, after rebooting Firefox (looks like MS-Windows :(), you can go to Chemical blogspace and\n<a href=\"http://cb.openmolecules.net/inchis.php\">look up molecules</a>, and see output like that described in\n<a href=\"http://chemicalblogspace.blogspot.com/2007/06/rdfa-operator-in-action-on-cb.html\">RDFa Operator in action on Cb</a>.</p>",
      "summary": "Mike released Operator 0.8, which picks up RDF (RDFa en eRDF) from HTML pages, and adds actions to it. I blogged earlier about the beta and wrote a script for it for chemical RDFa. At this moment, Chemical blogspace and RDF for Molecular Space (see this blog) are using chemical RDFa to semantically markup molecular information.",
      "image": "https://chem-bla-ics.linkedchemistry.info/blog/assets/images/options1.png",
      "date_published": "2007-08-22T00:00:00+00:00",
      "date_modified": "2007-08-22T00:00:00+00:00",
      "tags": ["semweb","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/13/touchgraphing-my-blog.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/13/touchgraphing-my-blog.html",
      "title": "Touchgraphing my blog",
      "content_html": "<p>Via <a href=\"https://web.archive.org/web/20071101070909/http://www.lexical.org.uk/planetscifoo/\">SciFoo Planet <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n(from <a href=\"https://pimm.wordpress.com/2007/08/11/scifoo-links-visualized-by-touchgraph-google-browser/\">Partial immortalization <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)\nI learned about <a href=\"http://www.touchgraph.com/TGGoogleBrowser.html\">TouchGraph Google</a> (Peter\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=496\">brought it into Chemical blogspace</a>).\nIt’s cool, though not open source. Here’s the touch graph for my blog:</p>\n\n<p><img src=\"/blog/assets/images/touchGraph.png\" alt=\"\" /></p>\n\n<p>As you can see, plenty of <a href=\"https://www.blogspot.com\">blogspot</a> bloggers around me, among which,\nin purple, <a href=\"http://usefulchem.blogspot.com/\">Useful Chemistry</a>. Funny thing is, each time I\nrepeat the Google search, the output is different. Oh, and make sure to drag one of the halos\naround; that will keep you procrastinating for the whole afternoon :)</p>",
      "summary": "Via SciFoo Planet (from Partial immortalization ) I learned about TouchGraph Google (Peter brought it into Chemical blogspace). It’s cool, though not open source. Here’s the touch graph for my blog:",
      "image": "https://chem-bla-ics.linkedchemistry.info/blog/assets/images/touchGraph.png",
      "date_published": "2007-08-13T00:10:00+00:00",
      "date_modified": "2025-01-05T00:00:00+00:00",
      "tags": ["blogging"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/13/centralized-or-decentralized.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/13/centralized-or-decentralized.html",
      "title": "Centralized or decentralized?",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> wondered if data should be stored <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=497\">centralized or decentralized</a>,\nwhen <a href=\"http://mndoci.com/blog/\">Deepak</a> <a href=\"http://mndoci.com/blog/2007/08/12/freebase-at-scifoo/\">blogged</a> about\n<a href=\"http://freebase.com/\">Freebase</a> and <a href=\"http://www.metaweb.com/\">Metaweb</a>. Now, I haven’t really looked into these\ntwo projects, but the question of centralized versus decentralized is interesting. It’s MySQL versus the world\nwide web; it’s the PubChem compound ID versus the InChI; it’s <a href=\"http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4\">http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4</a>\nversus <code class=\"language-plaintext highlighter-rouge\">info:inchi/InChI=1/CH4/h1H4</code> (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html\">RDF-ing molecular space <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).</p>\n\n<p>Both have advantages and disadvantages (everything does). Google has a huge experience with massive data, and\nis the centralized version of the distributed world wide web. Personally, I tend towards the decentralized\nversion of things. Scales better. The chemical RDF community showed some concerns about scalability of triple\nstores (see e.g. Taylor et al. <em>Bringing Chemical Data onto the Semantic Web</em>, <strong>2006</strong>, DOI <a href=\"https://doi.org/10.1021/ci050378m\">10.1021/ci050378m</a>).\nNow, their tests went up to some 30M triples, which is barely enough to store the InChI, PubChem compound ID, and one chemical name.</p>\n\n<p>So, how would this work for molecules then? I am leaning towards a system where one can query resources about\none molecule, and work ones way through molecular space. 
Using KEGG, reaction databases, similarity stores,\none could move from molecule to molecule, and add bits of RDF along the way, filling a local RDF store around\nthe actual query I have in mind. For example, if I want to verify that the mass spectrum I found really belongs\nto the molecular structure I have in mind, I would look up in the resources I know about all triples that\nrelate to the putative structure, and do my queries from there. That’s what I would do… (and will do, but\nmore on that later…)</p>",
      "summary": "Peter wondered if data should be stored centralized or decentralized, when Deepak blogged about Freebase and Metaweb. Now, I haven’t really looked into these two projects, but the question of centralized versus decentralized is interesting. It’s MySQL versus the world wide web; it’s the PubChem compound ID versus the InChI; it’s http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4 versus info:inchi/InChI=1/CH4/h1H4 (see RDF-ing molecular space ).",
      
      "date_published": "2007-08-13T00:00:00+00:00",
      "date_modified": "2025-01-05T00:00:00+00:00",
      "tags": ["inchi","semweb"],
      "_references": [{ "url": "https://doi.org/10.1021/ci050378m" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/11/molecular-connectivity-tables-in-images.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/11/molecular-connectivity-tables-in-images.html",
      "title": "Molecular Connectivity Tables in Images",
      "content_html": "<p>Rich blogged about to <a href=\"https://doi.org/10.59350/wgy8j-brx45\">Never Draw the Same Molecule Twice: Viewing Image Metadata <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin which he shows his molecular editor outputting images of molecular structure where the connectivity table\nof structure is embedded in the image. His molecular editor can read the image again, and will automatically\npick up the embedded connection table. Noel showed that such can not only be done in Java, but\n<a href=\"http://baoilleach.blogspot.com/2007/08/access-embedded-molecular-information.html\">in Python too</a>.</p>\n\n<p>This is important progress, though I would still like to see <a href=\"http://iupac.org/inchi/\">InChI</a>s in the\ndocuments, and/or the data files as supplementary information. Actually, I would even more like to\nsee that all experimental sections not just list the structure name, but give the InChI. An important\nspin-off is that when giving spectral information, the atom numbering given by InChI can be used to\nassociate NMR shifts, and IR wavenumbers to atoms and atom groups, removing the ambiguity in those\nassociations as we are used to find in literature.</p>\n\n<p>Chemistry Central is <a href=\"http://blogs.openaccesscentral.com/blogs/ccblog/entry/symyx_technologies_to_acquire_mdl\">looking into improving the submission process</a>\nfor molecular data, and hereby request the commenting on, taking into account in ongoing internal\ndiscussings, and incorporation of these approaches in the editorial requirements for CC publications:</p>\n\n<ul>\n  <li>including the connection table as metadata in images</li>\n  <li>including the InChI in experimental sections for newly synthesized molecules</li>\n  <li>use InChI atom numbering to associate NMR shifts with atoms in these experimental sections</li>\n</ul>\n\n<p>I will shortly blog an example experimental section incorporating the InChI.</p>",
      "summary": "Rich blogged about to Never Draw the Same Molecule Twice: Viewing Image Metadata in which he shows his molecular editor outputting images of molecular structure where the connectivity table of structure is embedded in the image. His molecular editor can read the image again, and will automatically pick up the embedded connection table. Noel showed that such can not only be done in Java, but in Python too.",
      
      "date_published": "2007-08-11T00:10:00+00:00",
      "date_modified": "2025-01-04T00:00:00+00:00",
      "tags": ["publishing","chemistry","inchi"],
      "_references": [{ "url": "https://doi.org/10.59350/wgy8j-brx45" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/11/molecules-in-wikipedia-without-inchis.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/11/molecules-in-wikipedia-without-inchis.html",
      "title": "Molecules in Wikipedia without InChIs",
      "content_html": "<p>I reported last week about the <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/08/02/molecules-in-wikipedia.html\">Molecules in Wikipedia <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand the plethora of templates used. <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> has also been using\n<a href=\"http://en.wikipedia.org/\">Wikipedia</a> URLs as molecular identifier and extracting InChIs from the wiki pages (see\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html\">Using Wikipedia to recognize Molecules in Blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nSeveral people have shown interest in adding InChIs for molecules in Wikipedia, so here’s a new version of a\nlist it molecules without InChIs:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>http://www.en.wikipedia.org/wiki/Hydrogen_cyanide#Hydrogen_cyanide_as_a_chemical_weapon -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/P-Phenylenediamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Valence_%28chemistry%29 -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Nitrous_oxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cytisine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Disulfur_decafluoride -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Mescaline -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Lewisite -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Sulfur_mustard -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tryptamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Interferon_beta-1a -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Methyl_isocyanate -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Anthraquinone -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tocopherol -&gt; but no 
InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cinnamic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tryptamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Psilocybin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Alphamethyltryptamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Alpha-ethyltryptamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Allylamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Ergosterol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Squalene -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Sulfur_hexafluoride -&gt; but no InChI/CID\n</code></pre></div></div>\n\n<p>Strictly speaking, the list should be longer, as the code that produced this list actually is also happy\nwhen a PubChem compound identifier (CID) is given. The previous list is also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html\">still online <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "I reported last week about the Molecules in Wikipedia and the plethora of templates used. Chemical blogspace has also been using Wikipedia URLs as molecular identifier and extracting InChIs from the wiki pages (see Using Wikipedia to recognize Molecules in Blogspace ). Several people have shown interest in adding InChIs for molecules in Wikipedia, so here’s a new version of a list it molecules without InChIs:",
      
      "date_published": "2007-08-11T00:00:00+00:00",
      "date_modified": "2025-01-04T00:00:00+00:00",
      "tags": ["wikipedia","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/02/molecules-in-wikipedia.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/02/molecules-in-wikipedia.html",
      "title": "Molecules in Wikipedia",
      "content_html": "<p>I do not care about physical and chemical properties in <a href=\"http://wikipedia.org/\">Wikipedia</a>, as I can easily extract them from other sources.\nThe main value of Wikipedia for molecules is, I think, that it describes the history of a molecule. Additionally, the Wikipedia URL is a\nnice unique molecular identifier (for example <em><a href=\"http://en.wikipedia.org/wiki/Lactose\">http://en.wikipedia.org/wiki/Lactose</a></em>) given certain\nconditions, and many <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html\">bloggers are using it as such <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBut, it only is a useful identifier if one (and only one) InChI is stated on the wiki page.</p>\n\n<p>Now that I am <a href=\"http://chem-bla-ics.blogspot.com/2007/07/rdf-ing-molecular-space.html\">RDF-ing molecular space</a>, I was\n<a href=\"http://del.icio.us/url/e24b896a3398220b76d47f59dbdc2634\">again</a> interested in <a href=\"http://dbpedia.org/docs/\">dbpedia</a>, a RDF version of Wikipedia.\nSee these two <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/19/quality-of-chemical-database.html\">blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"http://radar.oreilly.com/archives/2007/03/different_appro_1.html\">items</a> and Peter’s very nice\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=333\">dbpedia, RDF and SPARQL - for chemistry</a> item.\n<a href=\"http://www.scs.carleton.ca/~cleger\">Christian</a> is picking this up, and extending dbpedia for support for the various chemical boxes.</p>\n\n<h2 id=\"wikipedia-templates\">Wikipedia Templates</h2>\n\n<p>I have spotted a couple of templates: <a href=\"http://en.wikipedia.org/w/index.php?title=Template:Drugbox\">Drugbox</a>,\n<a href=\"http://en.wikipedia.org/w/index.php?title=Template:Chembox\">Chembox</a>, <a href=\"http://en.wikipedia.org/w/index.php?title=Template:Chembox_new\">Chembox 
new</a>,\nof which the last one seems to be the most recent, and has extensions for explosives and drugs. The\n<a href=\"http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Chemicals\">WikiProject Chemicals</a> does not mention it though. Does anyone know the status?\nIs <em>chembox new</em> the way forward and going to replace the older <em>chembox</em>? I hope so, because only the newer one has InChI in\nthe list of official fields. Or is <em>chembox new</em> simply an extension of <em>chembox</em> itself?</p>\n\n<p>Somewhere between 1000 and 1500 entries use the <em>chembox new</em> and another 1000 to 1500 use <em>chembox</em> but I assume there is\nconsiderable overlap. Additionally, Christian noted that there still seem to be molecules in Wikipedia which do not use a\ntemplate at all, and counted some 1900 molecules using various lists. If you want to keep a closer eye on chemistry in\ndbpedia, you should subscribe to the <a href=\"http://sourceforge.net/mailarchive/forum.php?forum_name=dbpedia-discussion\">dbpedia-discussion</a>\nmailing list.</p>",
      "summary": "I do not care about physical and chemical properties in Wikipedia, as I can easily extract them from other sources. The main value of Wikipedia for molecules is, I think, that it describes the history of a molecule. Additionally, the Wikipedia URL is a nice unique molecular identifier (for example http://en.wikipedia.org/wiki/Lactose) given certain conditions, and many bloggers are using it as such . But, it only is a useful identifier if one (and only one) InChI is stated on the wiki page.",
      
      "date_published": "2007-08-02T00:00:00+00:00",
      "date_modified": "2025-01-03T00:00:00+00:00",
      "tags": ["chemistry","wikipedia","rdf","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/08/01/excel-messes-up-your-data-analysis.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/08/01/excel-messes-up-your-data-analysis.html",
      "title": "Excel messes up your data analysis :)",
      "content_html": "<p>Well, no wonder: Excel is meant to be used to process money flows. Anyway, <a href=\"http://del.icio.us/greyarea\">greyarea</a> pointed me to\n<a href=\"http://itre.cis.upenn.edu/~myl/languagelog/archives/002912.html\">this nice blog item</a> from March 2006. It discusses a 2004 article in\n<a href=\"http://www.biomedcentral.com/bmcbioinformatics\">BMC Bioinformatics</a> <em>Mistaken Identifiers: Gene name errors can be introduced\ninadvertently when using Excel in bioinformatics</em> by Barry Zeeberg et al. (DOI:<a href=\"https://doi.org/10.1186/1471-2105-5-80\">10.1186/1471-2105-5-80</a>).\nHence, the importance of semantics and proper markup languages. The quotes are illustrative:</p>\n\n<ul><i>\nWhen we were beta-testing [two new bioinformatics programs] on microarray data, a frustrating problem occurred repeatedly: Some\ngene names kept bouncing back as \"unknown.\" A little detective work revealed the reason: ... A default date conversion feature in\nExcel ... was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in\nEsophageal Cancer 1] was being converted to '1-DEC.' Figure 1 lists 30 gene names that suffer an analogous fate.<br /><br />\n\n...<br /><br />\n\nThere is another default conversion problem for RIKEN clone identifiers identifiers of the form nnnnnnnEnn, where n denotes a\ndigit. These identifiers are comprised of the serial number of the plate that contains the library, information on plate status,\nand the address of the clone. A search ... identified more than 2,000 such identifiers out of a total set of 60,770. 
For example,\nthe RIKEN identifier \"2310009E13\" was converted irreversibly to the floating-point number \"2.31E+13.\" A non-expert user might\nwell fail to notice that approximately 3% of the identifiers on a microarray with tens of thousands of genes had been converted\nto an incorrect form, yet the potential for 2,000 identifiers to be transmogrified without notice is a considerable concern. Most\nimportant, these conversions to an internal date representation or floating-point number format are irreversible; the original\ngene name cannot be recovered.\n</i></ul>\n\n<p>Is this the article that made all bioinformaticians turn to R?</p>",
      "summary": "Well, no wonder: Excel is meant to be used to process money flows. Anyway, greyarea pointed me to this nice blog item from March 2006. It discusses a 2004 article in BMC Bioinformatics Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics by Barry Zeeberg et al. (DOI:10.1186/1471-2105-5-80). Hence, the importance of semantics and proper markup languages. The quotes are illustrative:",
      
      "date_published": "2007-08-01T00:00:00+00:00",
      "date_modified": "2007-08-01T00:00:00+00:00",
      "tags": ["bioinfo","excel"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-5-80" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/31/rdf-ing-molecular-space.html",
      "title": "RDF-ing molecular space",
      "content_html": "<p><a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a> might be the solution we are looking for to get a grip\non the huge amount of information we are facing. <a href=\"http://chem-bla-ics.blogspot.com/2007/05/microformats-in-chemistry.html\">microformats</a>,\nand <a href=\"http://chem-bla-ics.blogspot.com/2007/06/chemical-rdfa-with-operator-in-firefox.html\">RDFa</a>, are just solutions along the way,\nand Gleaning Resource Descriptions from Dialects of Languages (<a href=\"http://www.w3.org/2004/01/rdxh/spec\">GRDDL</a>) might be\nan important tool to get the web RDF-ied.</p>\n\n<p>One important aspect of RDF is that <a href=\"http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-URIref\">any resource has a unique URI</a>.\nThese make look like a URL or even like <code class=\"language-plaintext highlighter-rouge\">urn:doi:10.1186/1471-2105-8-59</code>. The recent blogs by Pierre\n(<em><a href=\"http://plindenbaum.blogspot.com/2007/07/url-1-lsid-1.html\">URL +1, LSID -1</a></em>) and Roderic\n(<em><a href=\"http://iphylo.blogspot.com/2007/06/rethinking-lsids-versus-http-uri.html\">Rethinking LSIDs versus HTTP URI</a></em>)\nillustrate the pro and cons of the different alternatives.</p>\n\n<h2 id=\"bioguid\">bioGUID</h2>\n\n<p>As usual, the bioinformaticians are less conservative and ahead of chemists in trying new options, and several interesting\nwebsite have emerged. For example, <a href=\"http://bioguid.info/\">bioGUID</a> makes the bridge between a simple URI and a resolvable URL.\nAnd, importantly, it spit RDF. 
This is the output for <a href=\"http://bioguid.info/doi:10.1109/MIS.2006.62\">http://bioguid.info/doi:10.1109/MIS.2006.62</a>:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cp\">&lt;?xml version=\"1.0\" encoding=\"utf-8\"?&gt;</span>\n<span class=\"cp\">&lt;?xml-stylesheet type=\"text/xsl\" href=\"http://bioguid.info/xsl/html.xsl\"?&gt;</span>\n<span class=\"nt\">&lt;rdf:RDF</span> <span class=\"na\">xmlns:bioguid=</span><span class=\"s\">\"http://bioguid.info/schema/0.1/\"</span> \n  <span class=\"na\">xmlns:rdfs=</span><span class=\"s\">\"http://www.w3.org/2000/01/rdf-schema#\"</span>\n  <span class=\"na\">xmlns:rss=</span><span class=\"s\">\"http://purl.org/rss/1.0/\"</span> \n  <span class=\"na\">xmlns:prism=</span><span class=\"s\">\"http://prismstandard.org/namespaces/1.2/basic/\"</span>\n  <span class=\"na\">xmlns:dcterms=</span><span class=\"s\">\"http://purl.org/dc/terms/\"</span> \n  <span class=\"na\">xmlns:dc=</span><span class=\"s\">\"http://purl.org/dc/elements/1.1/\"</span>\n  <span class=\"na\">xmlns:rdf=</span><span class=\"s\">\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;rdf:Description</span> <span class=\"na\">rdf:about=</span><span class=\"s\">\"http://bioguid.info/doi:10.1109/MIS.2006.62\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;rdf:type</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"http://bioguid.info/schema/0.1/Publication\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;rdfs:comment&gt;</span>Generated by transforming XML returned by CrossRef's\n      OpenURL service.<span class=\"nt\">&lt;/rdfs:comment&gt;</span>\n    <span class=\"nt\">&lt;dc:creator&gt;</span>Shadbolt<span class=\"nt\">&lt;/dc:creator&gt;</span>\n    <span class=\"nt\">&lt;dc:title&gt;</span>The Semantic Web Revisited<span class=\"nt\">&lt;/dc:title&gt;</span>\n    
<span class=\"nt\">&lt;dcterms:issued&gt;</span>2006<span class=\"nt\">&lt;/dcterms:issued&gt;</span>\n\n    <span class=\"nt\">&lt;prism:publicationDate&gt;</span>2006<span class=\"nt\">&lt;/prism:publicationDate&gt;</span>\n    <span class=\"nt\">&lt;dc:identifier</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"doi:10.1109/MIS.2006.62\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;rdfs:comment&gt;</span>info URI scheme<span class=\"nt\">&lt;/rdfs:comment&gt;</span>\n    <span class=\"nt\">&lt;dc:identifier</span> <span class=\"na\">rdf:resource=</span><span class=\"s\">\"info:doi/10.1109/MIS.2006.62\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;rdfs:comment&gt;</span>CrossRef resolver<span class=\"nt\">&lt;/rdfs:comment&gt;</span>\n    <span class=\"nt\">&lt;rss:link&gt;</span>http://dx.doi.org/10.1109/MIS.2006.62<span class=\"nt\">&lt;/rss:link&gt;</span>\n    <span class=\"nt\">&lt;prism:publicationName&gt;</span>IEEE Intelligent Systems<span class=\"nt\">&lt;/prism:publicationName&gt;</span>\n\n    <span class=\"nt\">&lt;prism:volume&gt;</span>21<span class=\"nt\">&lt;/prism:volume&gt;</span>\n    <span class=\"nt\">&lt;prism:number&gt;</span>3<span class=\"nt\">&lt;/prism:number&gt;</span>\n    <span class=\"nt\">&lt;prism:startingPage&gt;</span>96<span class=\"nt\">&lt;/prism:startingPage&gt;</span>\n    <span class=\"nt\">&lt;prism:issn&gt;</span>10947167<span class=\"nt\">&lt;/prism:issn&gt;</span>\n  <span class=\"nt\">&lt;/rdf:Description&gt;</span>\n<span class=\"nt\">&lt;/rdf:RDF&gt;</span>\n</code></pre></div></div>\n\n<p>(BTW, interesting is the use of XSLT to create HTML; it’s doing the opposite of GRDDL! And this is probably the right way. Cheers Roderic!)</p>\n\n<h2 id=\"inchi\">InChI</h2>\n\n<p>I wanted something similar for molecules. The unique identifier is the <a href=\"http://iupac.org/inchi/\">InChI</a>, of course. 
The InChI itself is\nnot a proper URI, so I set up a webpage to work around that (if only I had realized this some time ago, I would have urged IUPAC to use\nthe prefix ‘inchi:’ instead of ‘InChI=’). The result is, currently, looking like\n<a href=\"http://cb.openmolecules.net/rdf/rdf.php?InChI=1/CH4/h1H4\">http://cb.openmolecules.net/rdf/rdf.php?InChI=1/CH4/h1H4</a>.\nI do not use a XSLT yet, but will do so shortly. The RDF looks like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;rdf:RDF</span>\n<span class=\"na\">xmlns:rdf=</span><span class=\"s\">\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"</span>\n<span class=\"na\">xmlns:iupac=</span><span class=\"s\">\"http://www.iupac.org/\"</span><span class=\"nt\">&gt;</span>\n\n<span class=\"nt\">&lt;rdf:Description</span>\n <span class=\"na\">rdf:about=</span><span class=\"s\">\"http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4\"</span><span class=\"nt\">&gt;</span>\n\n <span class=\"nt\">&lt;iupac:inchi&gt;</span>InChI=1/CH4/h1H4<span class=\"nt\">&lt;/iupac:inchi&gt;</span>\n\n <span class=\"nt\">&lt;pubchem:cid</span> <span class=\"na\">xmlns:pubchem=</span><span class=\"s\">\"http://pubchem.ncbi.nlm.nih.gov/#\"</span><span class=\"nt\">&gt;</span>297<span class=\"nt\">&lt;/pubchem:cid&gt;</span>\n <span class=\"nt\">&lt;pubchem:name</span> <span class=\"na\">xmlns:pubchem=</span><span class=\"s\">\"http://pubchem.ncbi.nlm.nih.gov/#\"</span><span class=\"nt\">&gt;</span>methane<span class=\"nt\">&lt;/pubchem:name&gt;</span>\n <span class=\"nt\">&lt;cb:discussedBy</span> <span class=\"na\">xmlns:cb=</span><span class=\"s\">\"http://cb.openmolecules.net/#\"</span><span class=\"nt\">&gt;</span>http://chemistrylabnotebook.blogspot.com/2007/04/space-final-frontier.html<span class=\"nt\">&lt;/cb:discussedBy&gt;</span>\n <span class=\"nt\">&lt;cb:discussedBy</span> <span class=\"na\">xmlns:cb=</span><span 
class=\"s\">\"http://cb.openmolecules.net/#\"</span><span class=\"nt\">&gt;</span>http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=299<span class=\"nt\">&lt;/cb:discussedBy&gt;</span>\n <span class=\"nt\">&lt;cb:discussedBy</span> <span class=\"na\">xmlns:cb=</span><span class=\"s\">\"http://cb.openmolecules.net/#\"</span><span class=\"nt\">&gt;</span>http://chem-bla-ics.blogspot.com/2006/12/smiles-cas-and-inchi-in-blogs.html<span class=\"nt\">&lt;/cb:discussedBy&gt;</span>\n <span class=\"nt\">&lt;cb:discussedBy</span> <span class=\"na\">xmlns:cb=</span><span class=\"s\">\"http://cb.openmolecules.net/#\"</span><span class=\"nt\">&gt;</span>http://chem-bla-ics.blogspot.com/2007/02/invisible-inchis.html<span class=\"nt\">&lt;/cb:discussedBy&gt;</span>\n\n<span class=\"nt\">&lt;/rdf:Description&gt;</span>\n\n<span class=\"nt\">&lt;/rdf:RDF&gt;</span>\n</code></pre></div></div>\n\n<p>The system uses PHP to create the output, and has a basic plugin system: a plugin basically emits an RDF fragment for\nthe given InChI, and at this moment it only has a plugin for <a href=\"http://cb.openmolecules.net/\">Cb</a>, but I plan a few more.\nIt needs some tuning and any and all feedback is most welcome. Note that the actual URI might change a bit.</p>",
      "summary": "RDF might be the solution we are looking for to get a grip on the huge amount of information we are facing. microformats, and RDFa, are just solutions along the way, and Gleaning Resource Descriptions from Dialects of Languages (GRDDL) might be an important tool to get the web RDF-ied.",
      
      "date_published": "2007-07-31T00:00:00+00:00",
      "date_modified": "2007-07-31T00:00:00+00:00",
      "tags": ["chemistry","rdf","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/26/further-bioclipse-qsar-functionality.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/26/further-bioclipse-qsar-functionality.html",
      "title": "Further Bioclipse QSAR functionality development",
      "content_html": "<p>I had some time to <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/06/27/qsar-plugin-for-bioclipse-getting-in.html\">work some more on the QSAR functionality <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin <a href=\"http://www.bioclipse.net/\">Bioclipse</a>. There is still much to do, but it is getting there. The calculation of a QSAR descriptor data matrix</p>\n\n<p><img src=\"/assets/images/qsarJob.png\" alt=\"\" /></p>\n\n<p>This screenshot shows that multi-resource selection is now working, and that the calculation is now a Job. The resulting matrix looks like:</p>\n\n<p><img src=\"/assets/images/qsarJob1.png\" alt=\"\" /></p>\n\n<p>Things that remain to be done:</p>\n\n<ul>\n  <li>work on a SDF resource</li>\n  <li>a graph view for the matrix</li>\n  <li><a href=\"http://www.r-project.org/\">R</a> functionality for the matrices</li>\n  <li><a href=\"http://joelib.sf.net/\">JOELib</a> support</li>\n</ul>",
      "summary": "I had some time to work some more on the QSAR functionality in Bioclipse. There is still much to do, but it is getting there. The calculation of a QSAR descriptor data matrix",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/qsarJob.png",
      "date_published": "2007-07-26T00:00:00+00:00",
      "date_modified": "2025-01-02T00:00:00+00:00",
      "tags": ["cdk","qsar","bioclipse","joelib"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/20/osra-gpl-ed-molecule-drawing-to-smiles.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/20/osra-gpl-ed-molecule-drawing-to-smiles.html",
      "title": "OSRA: GPL-ed molecule drawing to SMILES convertor",
      "content_html": "<p>Igor wrote a message to the <a href=\"http://www.ccl.net/chemistry/sub_unsub.shtml\">CCL mailing list</a> about\n<a href=\"http://cactus.nci.nih.gov/osra/\">OSRA</a>:</p>\n\n<ul><i>\nWe would like to announce a new addition to the set of chemoinformatics tools available from the Computer-Aided Drug Design Group\nat the NCI-Frederick. OSRA is a utility designed to convert graphical representations of chemical structures, such as they appear\nin journal articles, patent documents, textbooks, trade magazines etc., into SMILES.<br /><br />\n\nOSRA can read a document in any of the over 90 graphical formats parseable by ImageMagick (GIF, JPEG, PNG, TIFF, PDF, PS etc.) and\ngenerate the SMILES representation of the molecular structure images encountered within that document.\n</i></ul>\n\n<p>The email does not give any information on the fail rate, but the demo they provide via the\n<a href=\"http://cactus.nci.nih.gov/cgi-bin/osra/index.cgi\">webinterface</a> does show some minor glitches (the bromine is not recognized):</p>\n\n<p><img src=\"/assets/images/osra.png\" alt=\"\" /></p>\n\n<p>The source reuses <a href=\"http://openbabel.sf.net/\">OpenBabel</a> and uses the GPL license. The value equal to that of text mining tools like\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/22/text-mining-for-chemistry-using-oscar3.html\">OSCAR3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand together they sounds like the Jordan and Pippen of mining chemical literature.</p>",
      "summary": "Igor wrote a message to the CCL mailing list about OSRA:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/osra.png",
      "date_published": "2007-07-20T00:10:00+00:00",
      "date_modified": "2025-01-02T00:00:00+00:00",
      "tags": ["cheminf","openbabel"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/20/screencasts-for-life-science.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/20/screencasts-for-life-science.html",
      "title": "Screencasts for life science informatics",
      "content_html": "<p><a href=\"http://mndoci.com/blog/\">Deepak</a> blogged about <a href=\"http://mndoci.com/blog/2007/07/18/bioscreencastcom-02/\">screencasting for bio topics</a>,\nconcentrated at <a href=\"https://web.archive.org/web/20070701050807/http://bioscreencast.com/\">bioscreencast.com <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nof which he is co-owner. I guess it is like a YouTube for\nbioinformatics thingies. <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a> picked this up very quickly (seen on\n<a href=\"http://cb.openmolecules.net/\">Cb</a>? At least I did.), and already uploaded a screencast,\n<a href=\"https://web.archive.org/web/*/http://bioscreencast.com/bsc_movwin.html*\">demoing JSpecView <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>\nwritten by <a href=\"http://wwwchem.uwimona.edu.jm:1104/chrl.html\">Robert</a>. I wonder if he will upload the\n<a href=\"http://usefulchem.blogspot.com/2006/07/cml-in-rss-feeds.html\">screencasts he made for</a>\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> too? (hint, hint … :)</p>\n\n<p>I have no idea if this site will be a success, but at least it has the right ingredients: tags, flash movies, clean UI, a\n<a href=\"http://bioscreencast.wordpress.com/\">blog to monitor technological changes and improvements</a>, and a page to\nrequest screencasts (with voting). What I only miss is a one summary page for each screencast to which I can\neasily link, for example for my <a href=\"http://del.icio.us/egonw\">del.icio.us</a> account.</p>",
      "summary": "Deepak blogged about screencasting for bio topics, concentrated at bioscreencast.com of which he is co-owner. I guess it is like a YouTube for bioinformatics thingies. Jean-Claude picked this up very quickly (seen on Cb? At least I did.), and already uploaded a screencast, demoing JSpecView written by Robert. I wonder if he will upload the screencasts he made for Bioclipse too? (hint, hint … :)",
      
      "date_published": "2007-07-20T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["bioinfo"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/16/cdk-data-model-1.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/16/cdk-data-model-1.html",
      "title": "The CDK data model #1",
      "content_html": "<p>The <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> has a rich set of data classes, each of which is\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/interfaces/IChemObject.java\">defined by an interface</a>.\nWhile the classes for atoms, bonds and a connectivity table are fairly straightforward, but beyond that it is sometimes\nnot entirely clear. I will now discuss all interfaces in a series of blog items. I’ll start with the IChemFile.\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a>, please correct me if I move to far away from our Notre Dame board sketch.</p>\n\n<h2 id=\"ichemfile\">IChemFile</h2>\n\n<p>The <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/IChemFile.html\">IChemFile</a> is the class to\nhold a chemical document, e.g. a MDL molfile or a PDB file. The idea of this class is that it can hold anything we\ncan expect from a chemical document. But nothing beyond that either; a XHTML document with embedded CML is outside\nthe scope of a IChemFile. You might wonder why the <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/io/IChemObjectReader.html\">IChemObjectReaders</a>\nnot always just return a IChemFile. That would be a fair point, any many actually do, but somethings it is handier\nto return an IMolecule. A reader for MDL molfiles would be expected to return a IMolecule.</p>\n\n<p>However, a document may contain much more, and the approach taken by the CDK is that a file contains one or more\nmodels. A MDL molfile is an example document with one model, while a MDL SD file would be a document with more than\none model.</p>\n\n<h2 id=\"ichemsequence\">IChemSequence</h2>\n\n<p>However, the IChemFile can hold more than one <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/IChemSequence.html\">IChemSequence</a>.\nNow, I honestly cannot remember why that is; a single IChemSequence should be enough. 
And, I actually do not remember\nmore than one IChemSequence being used. (Anyone?) As said, the IChemSequence contains IChemModels, and nothing more\nreally. The interface therefore just contains the basic logic of a list. Let’s move on.</p>\n\n<h2 id=\"ichemmodel\">IChemModel</h2>\n\n<p>The <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/IChemModel.html\">IChemModel</a> is much more interesting.\nIn the CDK a model is defined as anything that occurs in one actual volume of 3D (or 2D) space. A CIF file with a\ncrystal structure is, therefore, one IChemModel. A supramolecular aggregation of lipids, e.g. a mono- or bilayer,\nwould be an IChemModel too. This could be a time step in a molecular dynamics run. Additionally, the IChemModel may\nalso be a chemical reaction, possibly a multistep reaction. It could be, for example, an enzyme reaction mechanism\n<a href=\"http://chem-bla-ics.blogspot.com/2006/02/chemical-reactions-in-cml.html\">entry from the MACiE database</a>.\nThese three types of content are captured in the ICrystal, IMoleculeSet, and IReactionSet.</p>\n\n<h2 id=\"some-examples\">Some Examples</h2>\n\n<p>A CIF file would be read as an IChemFile containing an IChemSequence with one IChemModel containing an ICrystal.\nAn MDL molfile would be read as an IChemFile containing an IChemSequence with one IChemModel containing an\nIMoleculeSet with one IMolecule. An MDL SD file, however, would be read as an IChemFile with an\nIChemSequence with as many IChemModels as there are molecules in the SD file; and, each IChemModel would\ncontain an IMoleculeSet with only one IMolecule. This is counter-intuitive, because one may expect the SD file,\nwhich is a set of molecules, to be stored in an IMoleculeSet.</p>\n\n<p>Enough for tonight. More later. For the impatient, previously I wrote up a short blog about\n<a href=\"http://chem-bla-ics.blogspot.com/2006/04/cdk-data-classes-and-change.html\">the update notification scheme in the CDK interfaces</a>.</p>",
      "summary": "The Chemistry Development Kit has a rich set of data classes, each of which is defined by an interface. While the classes for atoms, bonds and a connectivity table are fairly straightforward, but beyond that it is sometimes not entirely clear. I will now discuss all interfaces in a series of blog items. I’ll start with the IChemFile. Christoph, please correct me if I move to far away from our Notre Dame board sketch.",
      
      "date_published": "2007-07-16T00:10:00+00:00",
      "date_modified": "2007-07-16T00:10:00+00:00",
      "tags": ["cdk","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/16/open-science-notebook-10-years-ago.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/16/open-science-notebook-10-years-ago.html",
      "title": "The Open Science Notebook 10 years ago",
      "content_html": "<p>So, with <a href=\"http://mndoci.com/blog/2007/07/14/does-the-open-research-world-need-a-single-access-point/\">all</a>\n<a href=\"http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html\">these</a>\n<a href=\"http://3quarksdaily.blogs.com/3quarksdaily/2006/11/the_future_of_s.html\">people</a>\n<a href=\"http://depth-first.com/articles/2007/06/21/open-notebook-science-using-inchimatic\">blogging</a>\n<a href=\"http://www.sennoma.net/main/archives/2007/07/giving_open_notebook_science_a.php\">about</a>\n<a href=\"http://scilib.typepad.com/science_library_pad/2007/06/thinking-about-.html\">the</a>\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=353\">Open</a>\n<a href=\"http://lccccollegeenglish.blogspot.com/2007/05/gary-hermans-open-notebook-science.html\">Science</a>\n<a href=\"http://www.earlham.edu/~peters/fos/2007_04_08_fosblogarchive.html#117639147972554644\">Notebook</a>\n(yes, each word is one distinct blog) it is worth looking back in time. To make clear what I put\nunder the OSN: a notebook in which experimental details and outcome are written down.\nSo, what did the OSN look like almost ten years ago?</p>\n\n<p>It looked like the early open source chemoinformatics projects, such as\n<a href=\"http://sourceforge.net/users/steinbeck/\">CompChem and JMDraw</a> set up by\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a> (the SourceForge projects have, unfortunately,\nbeen deleted; so I cannot link to the original project pages). JChemPaint and Jmol also originate from\nthose years.</p>\n\n<p>These projects were OSNs <em>avant le lettre</em>: an experiment in chemoinformatics is the definition of a\nnew (or reformulation of an old) algorithm, writing down the experiment (source code in this code),\nuploaded into a repository (Open Science!) 
for everyone to comment on, possibly sending around an\nannouncement for discussion to a mailing list, and reporting the outcome (preferably in a peer-reviewed\njournal). While I am ranting^Wtalking about the issues, chemoinformatics is in the luxurious situation\nthat reproducibility of a procedure is <strong>much</strong> easier,\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=413\">except for the missing data part</a>.</p>\n\n<p>Just wanted to say that OSN is really nothing new, not to chemistry anyway. Maybe for lab chemists.\n<a href=\"http://drexel-coas-talks-mp3-podcast.blogspot.com/\">Jean-Claude</a> has proven to be very successful in\n<a href=\"https://doi.org/10.1038/npre.2007.39.1\">promoting these open science ideas</a> among lab chemists,\nand I congratulate him on the exposure in all those magazine interviews lately. Cheers!</p>\n\n<h2 id=\"open-science-versus-open-source\">Open Science versus Open Source</h2>\n\n<p>Oh, and let me make the distinction between open source in general and open science. Much of the\ncurrent open source software in chemistry(/chemoinformatics) is <strong>not</strong> open science. Open science\nmeans that every step in the development process is open, whereas many chemoinformatics programs\nare <em>dumped</em> into the open source sphere at the end. That is not the way it should be.</p>\n\n<p>For the lab chemists: <em><a href=\"http://en.wikipedia.org/wiki/%5EW\">^W is a shortcut for ‘delete the previous word’</a></em>.</p>",
      "summary": "So, with all these people blogging about the Open Science Notebook (yes, each word is one distinct blog) it is worth looking back in time. To make clear what I put under the OSN: a notebook in which experimental details and outcome are written down. So, what did the OSN look like almost ten years ago?",
      
      "date_published": "2007-07-16T00:00:00+00:00",
      "date_modified": "2007-07-16T00:00:00+00:00",
      "tags": ["cdk","openscience"],
      "_references": [{ "url": "https://doi.org/10.1038/npre.2007.39.1" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/14/cdk-literature-2.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/14/cdk-literature-2.html",
      "title": "CDK Literature #2",
      "content_html": "<p>Second in a series of articles summarizing articles that cite one of the main CDK articles for\n<a href=\"http://www.cdknews.org/\">CDK News</a>. The <a href=\"https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html\">first CDK Literature <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwas already half a year ago, so it was about time.</p>\n\n<h2 id=\"bioclipse\">Bioclipse</h2>\n\n<p>Nothing much I have to say about that. Just <a href=\"http://chem-bla-ics.blogspot.com/search?q=Bioclipse\">browse my blog</a> and\nyou’ll see that it heavily uses CDK, JChemPaint and Jmol. See also the <a href=\"http://bioclipse.blogspot.com/\">Bioclipse blog</a>. <br />\n<em>Ola Spjuth, Tobias Helmus, Egon Willighagen, Stefan Kuhn, Martin Eklund, Johannes Wagener, Peter Murray-Rust,\nChristoph Steinbeck, Jarl Wikberg, Bioclipse: an open source workbench for chemo- and bioinformatics, BMC Bioinformatics,\n2007, 8(59), doi:<a href=\"https://doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a></em></p>\n\n<h2 id=\"proteomics-in-20052006\">Proteomics in 2005/2006</h2>\n\n<p>Review article on proteomics which mentions the CDK and JChemPaint in the data analysis section, but it does not cite them.\nIt does cite the Bioclipse article though. <br />\n<em>Jeffrey Smith, Jean-Philippe Lambert, Fred Elisma, Daniel Figeys, Proteomics in 2005/2006: Developments, applications\nand challenges, Analytical Chemistry, 2007, 79(12):4325-4343, doi:<a href=\"https://doi.org/10.1021/ac070741j\">10.1021/ac070741j</a></em></p>\n\n<h2 id=\"combinatorial-enumeration\">Combinatorial Enumeration</h2>\n\n<p>Article by Andreas on <a href=\"http://gecco.org.chemie.uni-frankfurt.de/smilib/index.html\">SmiLib</a> (BSD-like license) which\nis library for combinatorial enumeration using building blocks. The CDK is used for the addition of explicit\nhydrogens and the creation of MDL SD files. 
Andreas mentions in the article that the CDK’s SMILES parser ignores\nstereochemistry. <br />\n<em>Andreas Schüller, Volker Hänke, Gisbert Schneider, SmiLib v2.0: A Java-Based Tool for Rapid Combinatorial Library\nEnumeration, QSAR &amp; Combinatorial Science, 2007, 26(3):407-410, doi:<a href=\"https://doi.org/10.1002/qsar.200630101\">10.1002/qsar.200630101</a></em></p>\n\n<h2 id=\"molecular-query-language\">Molecular Query Language</h2>\n\n<p>This article is also from the group of Gisbert. Ewgenij introduces an open standard SMARTS replacement, covered in\n<a href=\"http://chem-bla-ics.blogspot.com/2005/10/cdk-news.html\">CDK News in 2005</a>. There is an interface to the CDK, but the\nlicense of the reference implementation makes it impossible to distribute it with the CDK itself. This is rather\nunfortunate, because if it had been possible, a number of implementations in the CDK, such as atom type\nperception, could be based on MQL. See also <a href=\"http://miningdrugs.blogspot.com/2007/01/molecular-query-languages-flexmol-mql.html\">Jörg’s blog on MQL</a>. <br />\n<em>Ewgenij Proschak, Jörg Wegner, Andreas Schüller, Gisbert Schneider, Uli Fechner, J. Chem. Inf. Model., 2007, 47(2):295-301,\ndoi:<a href=\"https://doi.org/10.1021/ci600305h\">10.1021/ci600305h</a></em></p>\n\n<h2 id=\"golden-rules-in-mass-spectroscopy\">Golden Rules in Mass Spectroscopy</h2>\n\n<p>Tobias Kind wrote about structure elucidation using mass spectra, and discusses MolGen and CDK’s <code class=\"language-plaintext highlighter-rouge\">DeterministicStructureGenerator</code>,\nand mentions problems with both generators. He has been in contact with the CDK and recently did\n<a href=\"http://sourceforge.net/tracker/index.php?func=detail&amp;aid=1743861&amp;group_id=20024&amp;atid=120024\">extensive tests</a>. 
<br />\n<em>Tobias Kind and Oliver Fiehn, Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass\nspectrometry, BMC Bioinformatics, 2007, 8:105, doi:<a href=\"https://doi.org/10.1186/1471-2105-8-105\">10.1186/1471-2105-8-105</a></em></p>",
      "summary": "Second in a series of articles summarizing articles that cite one of the main CDK articles for CDK News. The first CDK Literature was already half a year ago, so it was about time.",
      
      "date_published": "2007-07-14T00:00:00+00:00",
      "date_modified": "2024-12-27T00:00:00+00:00",
      "tags": ["cdk"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-8-59" },{ "url": "https://doi.org/10.1021/ac070741j" },{ "url": "https://doi.org/10.1186/1471-2105-8-105" },{ "url": "https://doi.org/10.1002/qsar.200630101" },{ "url": "https://doi.org/10.1021/ci600305h" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/13/inter-and-extrapolation-nmr-shift.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/13/inter-and-extrapolation-nmr-shift.html",
      "title": "Inter- and Extrapolation: the NMR shift prediction debate",
      "content_html": "<p>Chemical blogspace has seen a lengthy discussion on <a href=\"http://chem-bla-ics.blogspot.com/2007/06/quality-of-chemical-database.html\">the quality of a few NMR shift prediction programs</a>,\nand Ryan wanted to make <a href=\"http://acdlabs.typepad.com/my_weblog/2007/07/final-note-on-t.html\">a final statement</a>. Down his blog item\nhe had this quote from Jeff, discussing the use of the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> as external test set:</p>\n\n<ul><i>\n“Of course customers are really interested in how accurately a prediction program can predict THEIR molecules - not a collection of external data such as NMRShiftDB.”\n</i></ul>\n\n<p>I’m sure none of us knows what weird chemistry people are doing; we will never know what the overlap of the NMRShiftDB test\nset with the customer data set is. The quote suggests it is low, but we simply do not know.</p>\n\n<h2 id=\"interpolation-and-extrapolation\">Interpolation and Extrapolation</h2>\n\n<p>The accuracy of prediction models is very difficult to grasp, and one can only estimate it; using a test set.\nIf few data is available, one may opt for using the training set as test set too, and gives an estimate if the\nmodeling method is able to predict at all. However, the outcome of this exercise is the worst possible estimate\nyou can make. So, when possible you use an independent test set, which does not contain any molecules that were\npresent in the training set. (Actually, one could even suggest that this must happen on a shift level, but that\ngives problems with HOSE-code based prediction.)</p>\n\n<p>Now, what Ryan stresses in his <a href=\"http://acdlabs.typepad.com/my_weblog/2007/07/final-note-on-t.html\">latest blog item</a>\nis that prediction test results for the various available methods does not explicitly state the amount of overlap\nbetween the training and test set, one cannot draw any conclusions. Agreed. 
I would, however, like to tune this\neven a bit further, after reading the stupid quote (of course, taken out of context). What Jeff probably aimed\nat is that the prediction accuracy is only meaningful to a customer if there is considerable overlap between the customer’s\ndata set and the test set, which is what the model makers do not know.</p>\n\n<p>And the overlap actually goes beyond the overlap in terms of molecular identity. It is really the overlap in terms\nof molecular substructures that matters: a database with alkanes but no phenyl rings will more accurately predict\nother alkanes not present in the training set (interpolation), but will not accurately predict compounds with\nphenyl rings (extrapolation). What the customer needs is that his personal data set does not require extrapolation.\nThat is what matters.</p>\n\n<p>It is interesting to realize, however, that the NMRShiftDB allows you to upload your molecules, or alternatively,\nyou download the software (it’s open source) and the data (it’s open data) if you don’t want to send your molecules\nover the internet, and the NMRShiftDB software will automatically take into account your own data set.</p>\n\n<p>Thus, if you are working on a series of related molecules, you can extend the NMRShiftDB data set with already\nelucidated structures, reducing the prediction error for your related, yet unknown derivatives. It is that easy\nto include prior/expert knowledge in the NMRShiftDB. I believe the ACD/Labs software allows this too, so the\nquote is really meaningless. Not correct, not wrong; it simply says nothing.</p>\n\n<h2 id=\"open-data-open-source-open-standards\">Open Data, Open Source, Open Standards</h2>\n\n<p>Now, the various releases of the ACD/Labs software show a simple, understandable trend: increasing the amount\nof data you use for the training set reduces the prediction error. That’s because of various reasons I will not\ngo into in this item. 
The ACD/Labs NMR databases are expensive, because they have to manually extract and validate\nthe data from literature (see <a href=\"http://acdlabs.typepad.com/my_weblog/2007/06/the_purgatory_d.html\">The Purgatory Database</a>);\nso, during my PhD I only bought the CNMR and HNMR prediction packages. (Off topic: two weeks after I received my\ncopies of the software, ACD/Labs released a new version, which they kindly sent me a copy of too. Common in\nopensource, but much appreciated at that time. Cheers, <a href=\"http://www.acdlabs.com/\">ACD/Labs</a>!)</p>\n\n<p>The ACD/Labs databases are likely expensive because of various reasons. And this is where the ODOSOS concept of the\n<a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> comes in. <strong>Open Data</strong>: if publishers would not copyright their data,\nNMR databases would be much cheaper to set up (see <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=405\">this thread in Peter’s blog</a>);\nassuming ACD/Labs has to pay publishers for actually setting up their database. <strong>Open Source</strong>: the various Blue\nObelisk projects provide the <a href=\"http://chem-bla-ics.blogspot.com/2006/09/chemical-archeology-oscar3-to.html\">tools to automatically create a purgatory NMR database</a>;\nno humans needed for that any more. <strong>Open Standards</strong>: the data from the NMRShiftDB can be downloaded in various\nformats, among which CMLSpect. Being able to easily read the data, made it possible that we actually have this\ndiscussion. Sure, the open data part of the NMRShiftDB is crucial too! 
But the database could have used an obscure,\nbinary, undocumented, with many software tweaks and special cases, <code class=\"language-plaintext highlighter-rouge\">.doc</code>-like format, which no one could support.</p>\n\n<p>Clearly, ODOSOS gives all, even proprietary, NMR prediction tools a boost, and I am very happy to see that happen.\nIt is the point that we, the Blue Obelisk Movement, are trying to make for some time now.</p>",
      "summary": "Chemical blogspace has seen a lengthy discussion on the quality of a few NMR shift prediction programs, and Ryan wanted to make a final statement. Down his blog item he had this quote from Jeff, discussing the use of the NMRShiftDB as external test set:",
      
      "date_published": "2007-07-13T00:00:00+00:00",
      "date_modified": "2007-07-13T00:00:00+00:00",
      "tags": ["nmr"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/08/that-big-pile-of-paper.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/08/that-big-pile-of-paper.html",
      "title": "That big pile of paper...",
      "content_html": "<p>Everyone of use knows that big pile of paper on your desk that contains the things we want to read, scan or just\nbrowse. I even have <a href=\"http://del.icio.us/egonw/toread\">an electronic equivalent</a>. Another pile contains leaflets\nand glossy folders from conferences, like the <a href=\"http://chem-bla-ics.blogspot.com/search?q=ACS+Chicago\">ACS meeting in Chicago</a>.\nOK, going to get rid of those last ones, and will shortly put the links here.</p>\n\n<p>The first leaflet is from <a href=\"http://www.chemistrycentral.com/\">Chemistry Central</a>, one of the open access publishers.\nActually, not just open access as in free access, but open access as in freedom to reuse it. One things I noticed is this text:\n<em>Our submission system also allows authors to upload figures and reactions schemes in ChemDraw or ISIS/Draw file formats</em>.\nWhat about CMLReact and CML itself? Those are formats I can author with my <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a>\ntools.</p>\n\n<p>Then there is the proprietary <a href=\"http://www.strandls.com/sarchitect/\">Sarchitect</a> in the area of QSAR/QSPR/ADMET.\nNo idea about the scope or whatever. Oh, make sure to check out <a href=\"http://www.qsarworld.com/\">QSAR world</a>,\nwhere <a href=\"http://andygoesus.blogspot.com/\">Andreas</a> has a column too. I also have some information on the\n<a href=\"http://www.rsc.org/virtuallibrary\">RSC Virtual Library</a> which provides free access to the RSC journals for\nRSC member. But I am not. <a href=\"http://www.epa.gov/greenchemistry\">Green Chemistry</a> is nice for the environment,\nof course, but according to the <a href=\"http://www.epa.gov/\">EPA</a>, it’s about more: <em>Cleaner, cheaper, smarter chemistry</em>.\nWhy, oh why, does this financial incentive have to be present all the time? Are we, humans, really that stupid?</p>\n\n<p>I’m sure I had more advertorials, but these must have been the highlights.</p>",
      "summary": "Everyone of use knows that big pile of paper on your desk that contains the things we want to read, scan or just browse. I even have an electronic equivalent. Another pile contains leaflets and glossy folders from conferences, like the ACS meeting in Chicago. OK, going to get rid of those last ones, and will shortly put the links here.",
      
      "date_published": "2007-07-08T00:00:00+00:00",
      "date_modified": "2007-07-08T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/06/standing-on-shoulders-of-blue-obelisk.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/06/standing-on-shoulders-of-blue-obelisk.html",
      "title": "Standing on the shoulders of ... the Blue Obelisk",
      "content_html": "<p><a href=\"http://blog-msb.embo.org/blog/\">The Seven stones</a> wondered <a href=\"http://blog-msb.embo.org/blog/2007/07/what_would_you_do_with_a_petaf_1.html\">what to do with a petaflop in science</a>,\nin response to <a href=\"http://www.declanbutler.info/blog/\">Declan</a>’s <a href=\"http://dx.doi.org/10.1038/448006a\">The petaflop challenge</a> in Nature.\nDeclan discusses in this commentary the increase in computing power and the necessity of parallel programming to make use of it.\nNow, I do have some ideas (e.g. enumerating metabolomic space, mining the RDF graph of our collective biological and chemical\nknowledge base for the one hundred most supported contradictions), but that is not what I want to talk about. It is this fragment\nfrom Declan’s piece:</p>\n\n<ul><i>\n\"I'm amazed at what he can do just using open-source libraries,\" [Horst Simon] says. Although there are exceptions, such as\nhigh-energy physics and bioinformatics, many labs keep their software development close to their chests, for fear that their\ncompetitors will put it to better use and get the credit for the academic application of the program. There is little\nincentive to get the software out there, says Simon, and such attitudes plague development.\n</i></ul>\n\n<p>This is something that is very familiar to many of us: developing algorithms for scientific problems is not appreciated.\nIt worries me very much the way the scientific community currently deals with algorithms and data; it seems the community\ndoes not care about correctness or improvement at all, as long as the result illustrates what they think the (bio)chemical\nreality has to offer. 
At least, that is what effectively happens if they do not give proper credit to the scientific\nimportance of software development.</p>\n\n<p>Of course, scientific credibility of software depends on the open source nature of the software:\n“Given enough eyeballs, all bugs are shallow”, <a href=\"http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral-bazaar/\">The Cathedral and the Bazaar</a>,\nE.S. Raymond. Or, in more traditional wording: science, and scientific software, must be reproducible and/or\nfalsifiable. The <a href=\"http://www.blueobelisk.org/\">Blue Obelisk Movement</a> is trying to achieve this\n(DOI:<a href=\"https://doi.org/10.1021/ci050400b\">10.1021/ci050400b</a>).</p>\n\n<h2 id=\"the-open-source-challenge\">The open source challenge</h2>\n\n<p>Therefore, I hereby challenge all experimental chemists and biologists to acknowledge the amount of scientific software\nthey already use, and give credit where credit is due. I challenge them to stand up and say that chemo- and\nbioinformaticians provide the methods they rely on daily to achieve their goals. I challenge them to say that\nthey stand on the shoulders of scientific software developers.</p>\n\n<p>The article should not have been called <em>The petaflop challenge</em>, but <em>The open source challenge</em>.</p>",
      "summary": "The Seven stones wondered what to do with a petaflop in science, in response to Declan’s The petaflop challenge in Nature. Declan discusses in this commentary the increase in computing power and the necessity of parallel programming to make use of it. Now, I do have some ideas (e.g. enumerating metabolomic space, mining the RDF graph of our collective biological and chemical knowledge base for the one hundred most supported contradictions), but that is not what I want to talk about. It is this fragment from Declan’s piece:",
      
      "date_published": "2007-07-06T00:00:00+00:00",
      "date_modified": "2007-07-06T00:00:00+00:00",
      "tags": ["blue-obelisk"],
      "_references": [{ "url": "https://doi.org/10.1038/448006a" },{ "url": "https://doi.org/10.1021/ci050400b" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/07/01/atom-typing-in-cdk.html",
      "title": "Atom typing in the CDK",
      "content_html": "<p>Atom typing is one of principal activities in chemoinformatics. Atom types provide additional information that cannot be derived\nfrom the connection table that is being used, or may define what force fields terms should be used. This makes perception of\natom types very important.</p>\n\n<p>The <a href=\"http://cdk.sf.net/\">CDK</a> has a few places where atom types are perceived. The <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/tools/HydrogenAdder.java\">HydrogenAdder</a>\nand <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/tools/ValencyChecker.java\">ValencyChecker</a> are two examples.\nGetting the perception wrong, makes it impossible to correctly add hydrogens (of course, hydrogen should always be explicit!) For a\nlong time, these perception algorithms have been embedded in the classes that used them, but efforts have been undertaken to refactor\nthe algorithms into separate classes. These can be found in the package <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/atomype/\">cdk/atomtype/</a>.</p>\n\n<h2 id=\"different-applications-different-scheme\">Different applications, different scheme</h2>\n\n<p>Now, the CDK can be a bit confusing with respect to the HydrogenAdder and <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/tools/IValencyChecker.java\">IValencyChecker</a>.\nOriginally, the CDK had only one atom type list, the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/valency_atomtypes.xml\">StructGen Atom Types</a>.\nThis list was used by the deterministic structure generator (and still is), and only defined atom types for neutral atoms, and does not know anything about hybridization states.</p>\n\n<p>The first bug reports dropped in when people applied the HydrogenAdder to charged molecules. 
However, as said, charged atoms were not defined and the algorithm failed,\nnot silently, just gave the wrong answer. Therefore, the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/valency_atomtypes.xml\">Valency Atom Types</a>\nlist was setup, which does include charged atoms. Everyone happy again.</p>\n\n<p>Later, bugs were reported about the SMILES parser, which comes with additional problems: bond orders are not explicit, and have to be\ndeduced from the connectivity; atom type perception is the only way to decide how many bonds an atom should have, and with what bond\norder. However, SMILES defines hybridization states, and the CDK did not have an atom type list with hybridization information. So,\nwhile the Valency Atom Types list was extended from the StructGen Atom Type List, a new list was created extending from the Valency\nAtom Type list: the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/hybridization_atomtypes.xml\">Hybridization Atom Types</a>\nlist.</p>\n\n<p>Since then, applications asked for other atom type lists, such as the <a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/mm2_atomtypes.xml\">MM2</a>,\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/mmff94_atomtypes.xml\">MMFF94</a>,\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/pdb_atomtypes.xml\">PDB</a>, and\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/config/data/mol2_atomtypes.xml\">Sybyl</a> atom\ntypes. The first two are used for the force field code in the CDK, while the latter two are used for the respective\nIChemObjectReaders.</p>\n\n<h2 id=\"junit-testing-the-perceivers\">JUnit testing the perceivers</h2>\n\n<p>Not all applications actually already make use of the new atom type perception classes in cdk.atomtype. 
We want these classes to be well tested\nbefore they replace the code in the classes that use those atom types. Therefore, Rajarshi and I have been working on JUnit test suites. The\nlatest step in this process was that I transformed the test classes to extend a new JUnit4-based\n<a href=\"http://cdk.svn.sf.net/svnroot/cdk/trunk/cdk/src/org/openscience/cdk/test/atomtype/AbstractAtomTypeTest.java\">AbstractAtomTypeTest</a> class.\nNew in this class is that it reports which atom types in the atom type list have been tested, and the test will fail if not all atom types\nare tested. The StructGen Atom Types list is mostly covered now, but for all other lists tests still have to be written (monitor the progress\non <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/test/result-core.html\">CDK Nightly</a>).</p>\n\n<p>For the MOL2 atom type list, there is no Java implementation of the IAtomTypeMatcher, but we have Fortran code that can be ported (provided\nby Martin Ott). Anyone interested?</p>",
      "summary": "Atom typing is one of the principal activities in chemoinformatics. Atom types provide additional information that cannot be derived from the connection table that is being used, or may define what force field terms should be used. This makes perception of atom types very important.",
      
      "date_published": "2007-07-01T00:00:00+00:00",
      "date_modified": "2007-07-01T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/27/chemical-rdfa-with-operator-in-firefox.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/27/chemical-rdfa-with-operator-in-firefox.html",
      "title": "Chemical RDFa with Operator in the Firefox toolbar",
      "content_html": "<p>December last year <a href=\"http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html\">I proposed the use of microformats and RDFa</a>\nfor simple semantic markup of molecular information. I linked that with the <a href=\"http://chem-bla-ics.blogspot.com/2006/02/hacking-inchi-support-into.html\">InChI extension for the Postgenomic.com software</a>\nfor <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> and wrote these tools to work with the markup:</p>\n\n<ul>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2006/12/smiles-cas-and-inchi-in-blogs.html\">wrote a Greasemonkey script to automatically link to webservices</a>,</li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2007/01/chemistry-in-html-javascript-from.html\">explained how that script can be used on the server</a>, and</li>\n  <li><a href=\"http://chem-bla-ics.blogspot.com/2007/05/cb-comments-for-inchis.html\">adapted a Greasemonkey script to show blog items related to molecules</a>.</li>\n</ul>\n\n<p>All using the new semantic markup.</p>\n\n<p>Of the two, I think RDFa has the best future. Then I <a href=\"http://chem-bla-ics.blogspot.com/2007/05/added-my-hcard-to-my-blog.html\">discovered Operator</a>,\nwritten by <a href=\"http://www.kaply.com/weblog/\">Mike</a>. While the Greasemonkey scripts already allow me to link to, for example, PubChem and eMolecules,\nthe <a href=\"https://addons.mozilla.org/en-US/firefox/addon/4106\">Operator Firefox Addon</a> allowed me to open vCards incorporated in HTML pages directly\nto my address book client. Thus, I could open chemistry directly in <a href=\"http://bioclipse.net/\">Bioclipse</a> too!</p>\n\n<p>That was the idea, at least. 
I contacted Mike, and he asked me to wait until the first 0.8 releases, which he\n<a href=\"http://www.kaply.com/weblog/2007/06/04/operator-08a-is-available/\">announced earlier this month</a>.\nThis version allows user scripts to be written, which define how RDFa should be handled. And with his patience and help, this was the result:</p>\n\n<p><img src=\"/assets/images/pubchemRDFa.png\" alt=\"\" /></p>\n\n<p>The HTML is almost <a href=\"http://chem-bla-ics.blogspot.com/2006/12/smiles-cas-and-inchi-in-blogs.html\">as explained before</a>, and looks like:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"http://www.w3.org/2002/06/xhtml2/\"</span><span class=\"nt\">&gt;</span>\n\n<span class=\"nt\">&lt;h1&gt;</span>Chemical RDFa with Operator<span class=\"nt\">&lt;/h1&gt;</span>\n\n<span class=\"nt\">&lt;div</span> <span class=\"na\">about=</span><span class=\"s\">\"#chem_123\"</span> <span class=\"na\">xmlns:chem=</span><span class=\"s\">\"http://www.blueobelisk.org/chemistryblogs/\"</span><span class=\"nt\">&gt;</span>\n  Methane has the following identifier: <span class=\"nt\">&lt;span</span> <span class=\"na\">property=</span><span class=\"s\">\"chem:inchi\"</span><span class=\"nt\">&gt;</span>InChI=1/CH4/h1H4<span class=\"nt\">&lt;/span&gt;</span>\n<span class=\"nt\">&lt;/div&gt;</span>\n\n<span class=\"nt\">&lt;/html&gt;</span>\n</code></pre></div></div>\n\n<p>It is important here to wrap the statement in a <code class=\"language-plaintext highlighter-rouge\">&lt;div&gt;</code> element and to add the <code class=\"language-plaintext highlighter-rouge\">@about</code> attribute to it, defining the Subject. Moreover,\nyou need to use the <code class=\"language-plaintext highlighter-rouge\">@property</code> attributes instead of <code class=\"language-plaintext highlighter-rouge\">@class</code>. 
The content of this attribute defined the Predicate, and the content of the\n<code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code> element is the Object, completing the RDF triple.</p>\n\n<p>Operator detects these RDFa statements from the HTML, and creates a new menu item <em>Search in Pubchem</em> using this piece of code:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">var</span> <span class=\"nx\">pubchem_inchi</span> <span class=\"o\">=</span> <span class=\"p\">{</span>\n  <span class=\"na\">description</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">Search in PubChem</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">short</span><span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">PubChem</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n  <span class=\"na\">scope</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n    <span class=\"na\">semantic</span><span class=\"p\">:</span> <span class=\"p\">{</span>\n      <span class=\"dl\">\"</span><span class=\"s2\">RDFa</span><span class=\"dl\">\"</span> <span class=\"p\">:</span>  <span class=\"p\">{</span>\n        <span class=\"na\">property</span> <span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">http://www.blueobelisk.org/chemistryblogs/inchi</span><span class=\"dl\">\"</span><span class=\"p\">,</span>\n        <span class=\"na\">defaultNS</span> <span class=\"p\">:</span> <span class=\"dl\">\"</span><span class=\"s2\">http://www.blueobelisk.org/chemistryblogs/</span><span class=\"dl\">\"</span>\n      <span class=\"p\">}</span>\n    <span class=\"p\">}</span>\n  <span class=\"p\">},</span>\n  <span class=\"na\">doAction</span><span class=\"p\">:</span> <span class=\"kd\">function</span><span class=\"p\">(</span><span class=\"nx\">semanticObject</span><span class=\"p\">,</span> 
<span class=\"nx\">semanticObjectType</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">semanticObjectType</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">RDFa</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n      <span class=\"k\">return</span> <span class=\"dl\">\"</span><span class=\"s2\">http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&amp;DB=pccompound&amp;term=%22</span><span class=\"dl\">\"</span> <span class=\"o\">+</span> <span class=\"nx\">semanticObject</span><span class=\"p\">.</span><span class=\"nx\">inchi</span> <span class=\"o\">+</span> <span class=\"dl\">\"</span><span class=\"s2\">%22[InChI]</span><span class=\"dl\">\"</span><span class=\"p\">;</span>\n    <span class=\"p\">}</span>\n  <span class=\"p\">}</span>\n<span class=\"p\">};</span>\n\n<span class=\"nx\">SemanticActions</span><span class=\"p\">.</span><span class=\"nf\">add</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">pubchem_inchi</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"nx\">pubchem_inchi</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>You can reproduce this by installing Operator 0.8a in Firefox, saving the script to a file in your home directory, and\nreading it via the Operator “Options” dialog. Make sure to also set the <em>Display Style</em> in the <em>General</em> tab of the dialog to\n<em>Data formats</em>. Only then will the RDFa magic kick in.</p>\n\n<p>Adding support for eMolecules, ChemSpider and whatever else we like is easy now. What I still need to explore (or ask Mike),\nis how I can trigger the <em>Open With/Save As</em> dialog of Firefox.</p>",
      "summary": "December last year I proposed the use of microformats and RDFa for simple semantic markup of molecular information. I linked that with the InChI extension for the Postgenomic.com software for Chemical blogspace and wrote these tools to work with the markup:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pubchemRDFa.png",
      "date_published": "2007-06-27T00:10:00+00:00",
      "date_modified": "2007-06-27T00:10:00+00:00",
      "tags": ["pubchem","rdf","userscript","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/27/qsar-plugin-for-bioclipse-getting-in.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/27/qsar-plugin-for-bioclipse-getting-in.html",
      "title": "QSAR plugin for Bioclipse getting in shape",
      "content_html": "<p>Over the last few weeks I continued the work on getting (descriptor-based) <a href=\"http://en.wikipedia.org/wiki/QSAR\">QSAR</a>/QSPR implemented in\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>. <a href=\"http://joelib.sf.net/\">JOELib</a> (GPL) and the <a href=\"http://cdk.sf.net/\">CDK</a> (LGPL) are two prominent\nopen source engines that can calculate molecular descriptors, and <a href=\"http://ambit.acad.bg/\">AMBIT</a> is a front-end.</p>\n\n<p>To be able to do QSAR/QSPR model building from start to end in Bioclipse, I worked in April\n<a href=\"http://chem-bla-ics.blogspot.com/2007/04/bioclipse-now-allows-qsar-descriptor.html\">on an architecture for selecting descriptors</a>.\nBeing busy with so many things, it took me some time to get around to completing that, but here are the screenshots:</p>\n\n<p><img src=\"/assets/images/bioQSAR1.png\" alt=\"\" /></p>\n\n<p>The funny characters and the whitespace are gone. Right now, it still only lists one provider, but I plan to add a JOELib plugin soon.\nThe list of actual descriptors is provided by the extension.</p>\n\n<p>What Bioclipse then does is have the extension calculate the descriptor values for the selected <code class=\"language-plaintext highlighter-rouge\">CDKResource</code> in the BioNavigator\nusing the selected descriptors. This will then create a new <code class=\"language-plaintext highlighter-rouge\">MatrixResource</code> in the Bioclipse workspace (currently called\nqsarResult.jam), which is opened in the Matrix editor:</p>\n\n<p><img src=\"/assets/images/bioQSAR1.png\" alt=\"\" /></p>\n\n<p>There is still enough work left to do. For example, the columns are not yet labeled according to the descriptor name, and\nselecting more than one <code class=\"language-plaintext highlighter-rouge\">CDKResource</code> in the navigator does not give a multirow matrix yet.</p>",
      "summary": "Over the last few weeks I continued the work on getting (descriptor-based) QSAR/QSPR implemented in Bioclipse. JOELib (GPL) and the CDK (LGPL) are two prominent open source engines that can calculate molecular descriptors, and AMBIT is a front-end.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioQSAR.png",
      "date_published": "2007-06-27T00:00:00+00:00",
      "date_modified": "2025-01-17T00:00:00+00:00",
      "tags": ["bioclipse","qsar","cdk","ambit"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/25/test-file-repository-and-relaxng.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/25/test-file-repository-and-relaxng.html",
      "title": "Test File Repository and RelaxNG",
      "content_html": "<p>Last week I started the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> <a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/ctfr/trunk/\">Chemical Test File Repository</a>,\na repository of <a href=\"http://www.opensource.org/licenses\">OSI-approved-licence</a>d test files (from various sources) to improve interoperability between\nchemoinformatics software.</p>\n\n<p>Following a discussion on the mailing list earlier, a directory hierarchy has been set up, and each files contains an index.xml to describe\nthe content. In case of a directory with actual test files, it may look like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;dir</span> <span class=\"na\">name=</span><span class=\"s\">\"asn/pubchem/valid\"</span> <span class=\"na\">xmlns:dc=</span><span class=\"s\">\"http://purl.org/dc/elements/1.1/\"</span><span class=\"nt\">&gt;</span>\n\n  <span class=\"nt\">&lt;chemfiles&gt;</span>\n\n    <span class=\"nt\">&lt;file</span> <span class=\"na\">name=</span><span class=\"s\">\"cid1.asn\"</span> <span class=\"na\">valid=</span><span class=\"s\">\"yes\"</span><span class=\"nt\">&gt;</span>\n       <span class=\"nt\">&lt;dc:format&gt;</span>chemical/x-asn-pubchem<span class=\"nt\">&lt;/dc:format&gt;</span>\n       <span class=\"nt\">&lt;dc:source&gt;</span>PubChem<span class=\"nt\">&lt;/dc:source&gt;</span>\n       <span class=\"nt\">&lt;dc:creator&gt;</span>Unknown<span class=\"nt\">&lt;/dc:creator&gt;</span>\n       <span class=\"nt\">&lt;dc:rights&gt;</span>PublicDomain<span class=\"nt\">&lt;/dc:rights&gt;</span>\n       <span class=\"nt\">&lt;test</span> <span class=\"na\">by=</span><span class=\"s\">\"CDK\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;/file&gt;</span>\n\n  <span class=\"nt\">&lt;/chemfiles&gt;</span>\n\n<span class=\"nt\">&lt;/dir&gt;</span>\n</code></pre></div></div>\n\n<p>As is clear, <a 
href=\"http://en.wikipedia.org/wiki/Dublin_core\">Dublin Core</a> is reused for much of the metadata.</p>\n\n<p>To improve and ensure some quality, the XML must be valid in addition to just well-formed, so that I can set up XSLT stylesheets to create XHTML indices and\nsummaries. Therefore, I wanted to set up a schema for the index.xml files. My first thought was to use <a href=\"http://en.wikipedia.org/wiki/Xml_schema\">XML Schema</a>\nwhich has XML Namespaces support and has well defined (and extensible) data types. I have hacked with it in the past, but the details have slipped my mind.\nAlready in 1998 I worked with DTDs, around the time that the XML specification was declared a recommendation. Originating from the SGML years,\nDTD is not XML based, has no knowledge of namespaces, and has only a limited number of data types.</p>\n\n<p>Then there is <a href=\"http://en.wikipedia.org/wiki/RELAX_NG\">RELAX NG</a>. It is XML based, uses the same data types as XML Schema, and has support for namespaces.\nSince I had to look up the specs for either DTD or XML Schema for the details anyway (e.g. on how to allow the DC namespace in the main namespace),\nwhy not try something new? Well, I was amazed. RELAX NG has a syntax simplicity like that of DTD, but the functionality of XML Schema. So,\nI hacked up in 30 minutes an XML spec for the test file repository, including a (too short) list of recognized MIME types. Just a combination of some\n<code class=\"language-plaintext highlighter-rouge\">&lt;element&gt;</code>, <code class=\"language-plaintext highlighter-rouge\">&lt;attribute&gt;</code>, <code class=\"language-plaintext highlighter-rouge\">&lt;oneOrMore&gt;</code>, etc. elements. The result is available as <a href=\"http://blueobelisk.svn.sf.net/svnroot/blueobelisk/ctfr/trunk/schema.relaxng\">schema.relaxng</a>\nin SVN.</p>",
      "summary": "Last week I started the Blue Obelisk Chemical Test File Repository, a repository of OSI-approved-licenced test files (from various sources) to improve interoperability between chemoinformatics software.",
      
      "date_published": "2007-06-25T00:00:00+00:00",
      "date_modified": "2007-06-25T00:00:00+00:00",
      "tags": ["blue-obelisk","openscience"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/25/nature-should-host-our-electronic-lab.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/25/nature-should-host-our-electronic-lab.html",
      "title": "Nature should host our Electronic Lab Notebooks",
      "content_html": "<p><a href=\"http://pbeltrao.blogspot.com/\">Pedro</a> suggested in <a href=\"http://network.nature.com/\">Nature Network</a>s <a href=\"http://network.nature.com/forum/whats-next\">What’s Next</a>\nforum that Nature should add a new service for scientists: hosting electronic lab notebooks. And I think this will be a killer application.\nI am rather excited about the idea, and feel ashamed not putting one-and-one together myself. We have our\n<a href=\"http://www.blueobelisk.org/\">chemoinformatics tools</a> and <a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a>\nis just around the corner, that combined with <a href=\"http://hdl.handle.net/10042/23\">semantic wikis</a>, and we have <em>science of the 21st century</em>.\nThis is <a href=\"http://network.nature.com/forums/whats-next/5?page=6#reply-508\">my reply</a> posted on Nature Network:</p>\n\n<ul><i>\n<p>Pedro, that might be an interesting idea: Nature hosting ELN. with much content, I have been maintaining a wiki in my previous postdoc,\nas replacement for the old paper notebook. Allows me to make links etc. I plan to do this in my new postdoc too, maybe even with a\nRDF-enabled wiki, to have agents automatically verify what I enter for inconsistencies. These things are already possible; just a\nmatter of doing it.\n\n<p>If Nature would host such a service (RDF-enabled, and integrated with their other pages), they have a true killer for me: I write\nmy ELN items, and for each page I decide if I want to make it public; since it is a wiki, I can keep it private until happy about\nthe results, or, simply, until the experiment has finished. Then, by clicking a button it would become CC+attribution and\nautomatically end up in Nature Preceedings. 
The full integration of Scintilla/Postgenomic/Connotea comes in when making links to\nbackground material.\n\n<p>The RDF is important for validating what I write, and I can imagine that Nature has an extensive set of default agents (of course,\nin addition to spell checking etc :). These agents check if the chemical reaction equations make sense (conservation of mass,\natom count, etc), that NMR/MS spectra and other experimental properties are consistent with that equation, and whatever else\nwe can come up with. The tools for this validation are available, and basically only the glue is missing.\n</p></p></p></i></ul>",
      "summary": "Pedro suggested in Nature Networks What’s Next forum that Nature should add a new service for scientists: hosting electronic lab notebooks. And I think this will be a killer application. I am rather excited about the idea, and feel ashamed not putting one-and-one together myself. We have our chemoinformatics tools and RDF is just around the corner, that combined with semantic wikis, and we have science of the 21st century. This is my reply posted on Nature Network:",
      
      "date_published": "2007-06-25T00:00:00+00:00",
      "date_modified": "2007-06-25T00:00:00+00:00",
      "tags": ["nature","eln","openscience"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/22/archiving-spectra-use-inchi-and-cml.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/22/archiving-spectra-use-inchi-and-cml.html",
      "title": "Archiving spectra: use InChI and CML",
      "content_html": "<p><a href=\"http://acdlabs.typepad.com/my_weblog/\">Ryan</a> blogged in <a href=\"http://acdlabs.typepad.com/my_weblog/2007/06/archive_this.html\">Archive This</a>\nabout some advices from ACD on how to store spectra in your electronic lab notebook.</p>\n\n<h2 id=\"use-inchi\">Use InChI</h2>\n\n<p>This reminded me of a <a href=\"http://chem-bla-ics.blogspot.com/2007/02/rsc-first-publisher-to-go-semantic.html\">discussion I had with with Colin</a>\nwhen he was at the CUBIC, which was about experimental sections. I proposed that the <a href=\"http://www.iupac.org/inchi/\">InChI</a> should have a\nprominent place in the experimental section. An important argument for this is that it allows well-defined atom numbering to be used when\nwriting down the NMR bits in that section: the InChI gives a unique numbering, so that the numbering used in the experimental section\nbecomes author neutral. Because the InChI puts the carbons up front, the <sup>13</sup>C NMR details get numbers from 1-13, or whatever\nthe carbon count is. For proton NMR it is not difficult either, they are simply numbered according to the heavy atom to which they are\nattached. For situations where two hydrogens attached to the same heavy atom have different shifts, then a and b can still be used.\nThe numbers are easily added to 2D diagrams anyway.</p>\n\n<p>If software vendors (e.g. <a href=\"http://www.acdlabs.com/\">ACD</a> and <a href=\"http://bioclipse.net/\">Bioclipse</a>) and publishers (e.g. 
ACS,\n<a href=\"http://www.rsc.org/Publishing/Journals/ProjectProspect/\">RSC</a>, <a href=\"http://www.chemistrycentral.com/\">Chemistry Central</a>) could adopt this\nproposal, then experimental sections immediately are better machine parsable and ready for automatic processing, such as discussed in\nmy blog item <a href=\"http://chem-bla-ics.blogspot.com/2006/09/chemical-archeology-oscar3-to.html\">Chemical Archeology: OSCAR3 to NMRShiftDB.org</a>\nand by <a href=\"http://www.acscinf.org/dbx/mtgs/232nm/232cinfprogram.asp\">Christoph at the ACS meeting</a>, available as\n<a href=\"http://acscinf.org/docs/meetings/232nm/presentations/232nm101.pdf\">PDF</a> and this 18MB\n<a href=\"http://acscinf.org/docs/meetings/232nm/presentations/232nm101.mp3\">MP3</a>.</p>\n\n<h2 id=\"use-cml\">Use CML</h2>\n\n<p>Even better is to use <a href=\"http://en.wikipedia.org/wiki/Chemical_Markup_Language\">CML</a> for this, or CMLSpect to be precise (paper is accepted,\nand should appear soon). This XML-based language allows the full semantic markup of all the experimental details and all the interesting\nassignments you want to archive. I would like to <strong>challenge ACD</strong> to follow Bioclipse’s lead and provide export as CMLSpect for spectral\nassignments and markup of experimental details, in addition to the PDF in whatever format they prefer. Cheers for the work by Tobias\nand Stefan on spectrum support in Bioclipse!</p>",
      "summary": "Ryan blogged in Archive This about some advice from ACD on how to store spectra in your electronic lab notebook.",
      
      "date_published": "2007-06-22T00:00:00+00:00",
      "date_modified": "2007-06-22T00:00:00+00:00",
      "tags": ["cml","inchi","nmr"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/new-job-post-doc-at-wur-on-ms-based.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/new-job-post-doc-at-wur-on-ms-based.html",
      "title": "A new job: post-doc at the WUR on MS based structure elucidation",
      "content_html": "<p>On July 1st I will start a post-doc in <a href=\"http://en.wikipedia.org/wiki/Wageningen\">Wageningen</a>, The Netherlands at the\n<a href=\"http://www.wur.nl/\">WUR</a>. More precisely, with a post-doc in the group of Prof. Van Eeuwijk at <a href=\"http://www.biometris.wur.nl/UK/\">Biometris</a>,\ncooperating with the group of Prof. Hall at <a href=\"http://www.pri.wur.nl/UK/\">Plant Research International</a> (PRI), within the framework of\nthe new <a href=\"http://www.metabolomicscentre.nl/\">Netherlands Metabolomics Center</a>. The topic will be structure elucidation using mass\nspectral data originating from the experimental department of PRI, and will be a nice follow up on the work on SENECA\n<a href=\"http://chem-bla-ics.blogspot.com/2007/04/cubic-period-is-over.html\">I have been doing last year</a> in the group of\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Dr. Christoph Steinbeck</a> at the <a href=\"https://chem-bla-ics.blogspot.com/2007/06/www.cubic.uni-koeln.de/\">CUBIC</a>.</p>",
      "summary": "On July 1st I will start a post-doc in Wageningen, The Netherlands at the WUR. More precisely, with a post-doc in the group of Prof. Van Eeuwijk at Biometris, cooperating with the group of Prof. Hall at Plant Research International (PRI), within the framework of the new Netherlands Metabolomics Center. The topic will be structure elucidation using mass spectral data originating from the experimental department of PRI, and will be a nice follow up on the work on SENECA I have been doing last year in the group of Dr. Christoph Steinbeck at the CUBIC.",
      
      "date_published": "2007-06-19T00:20:00+00:00",
      "date_modified": "2007-06-19T00:20:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/quality-of-chemical-database.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/quality-of-chemical-database.html",
      "title": "Quality of Chemical Database",
      "content_html": "<p>Lately, <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> has seen an interesting discussion on the quality of opendata and free chemical database (over\n<a href=\"http://depth-first.com/articles/2007/01/24/thirty-two-free-chemistry-databases\">32 free resources now</a>), such as the\n<a href=\"http://nmrshiftdb.org/\">NMRShiftDB.org</a>. For example, see <a href=\"http://www.chemspider.com/blog/?p=44\">Antony’s view on the NMRShiftDB</a>\nand <a href=\"http://nmrpredict.orc.univie.ac.at/csearch_summary/more_or_less_than_250_errors.html\">Robien’s analysis</a>.</p>\n\n<p><a href=\"http://en.wikipedia.org/wiki/Open_data\">Opendata</a> makes such quality assurance possible, and I am happy that the NMRShiftDB was\nexplored like this; the found problems can be reported and corrected. If correcting them upstream is difficult, opendata allows\none to make a better derivative; that’s what opendata is about. For example, <a href=\"http://biometa.cmbi.ru.nl/\">BioMeta</a>\n(DOI:<a href=\"https://doi.org/10.1186/1471-2105-7-517\">10.1186/1471-2105-7-517</a>) took data from KEGG and corrected a lot of molecular\nproblems (like reaction balancing, stereo chemistry, etc).</p>\n\n<p>I have contributed almost 900 spectra to the NMRShiftDB, and I am sure I may have made a mistake here and there. But my submission is verified\nby a reviewer, and furthermore, users of the database can report inconsistencies via the NMRShiftDB.org website. Now, I have focused on uncommon\nNMR nuclei, like <sup>11</sup>B, <sup>195</sup>Pt and <sup>29</sup>Si (see the <a href=\"http://nmrshiftdb.ice.mpg.de/nmrshiftdbhtml/statistics.html\">stats</a>),\nwhich tend to have only one peak. Nothing much that can go wrong; still, one or two errors were catched by the reviewer.</p>\n\n<h2 id=\"ensuring-data-quality\">Ensuring data quality</h2>\n\n<p>Humans make errors, but not even only when data is entered; they make mistakes checking data too. 
Nothing much that can\nbe done about that, other than using computers to find patterns. This is exactly what Robien did: he used his software,\nwhich implements common patterns, to find entries in the database that did not comply with those patterns.</p>\n\n<p>Automated quality assurance requires an easy-to-use, machine-readable interface. For example, CMLRSS\n(DOI:<a href=\"https://doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>) can be used for running new entries in databases\nagainst known patterns. But other interfaces are most welcome too. Rich recently\n<a href=\"http://depth-first.com/articles/2007/06/11/hacking-pubchem-learning-to-speak-pug\">discussed the new PUG interface</a>,\nwhich offers an interface to <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>.</p>\n\n<p>German scientists offer an RDF interface to <a href=\"http://wikipedia.org/\">Wikipedia</a>: <a href=\"http://dbpedia.org/\">DBPedia</a>.\nInformal semantic markup in Wikipedia, such as the <a href=\"http://en.wikipedia.org/wiki/Wikipedia:Infobox_templates\">Infobox template</a>,\n<a href=\"http://dbpedia.org/docs/\">is used to create triples</a>. It’s a shame that the <a href=\"http://en.wikipedia.org/wiki/Template:Chembox\">ChemBox</a>\nis not used yet, which would make <a href=\"http://chem-bla-ics.blogspot.com/2007/06/using-wikipedia-to-recognize-molecules.html\">detecting molecules in blogs</a>\neven easier.</p>",
      "summary": "Lately, Chemical blogspace has seen an interesting discussion on the quality of opendata and free chemical database (over 32 free resources now), such as the NMRShiftDB.org. For example, see Antony’s view on the NMRShiftDB and Robien’s analysis.",
      
      "date_published": "2007-06-19T00:10:00+00:00",
      "date_modified": "2007-06-19T00:10:00+00:00",
      "tags": ["opendata","chemistry","pubchem","rdf"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-7-517" },{ "url": "https://doi.org/10.1021/CI034244P" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/19/using-wikipedia-to-recognize-molecules.html",
      "title": "Using Wikipedia to recognize Molecules in Blogspace",
      "content_html": "<p>Only few people are <a href=\"http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html\">using InChI’s to indicate the molecules the blog about</a>\n(prominent exceptions are <a href=\"http://usefulchem.blogspot.com/\">Useful Chemistry</a> and <a href=\"http://www.scienceblogs.com/moleculeoftheday/\">Molecule of the Day</a>).\nConsequently, the number of detected molecules (without using OSCAR3) in <a href=\"http://cb.openmolecules.net/\">Chemical blogspace</a> has been low.</p>\n\n<p>Fortunately, many more people use links to <a href=\"http://wikipedia.org/\">Wikipedia</a> to identify the molecules that talk about. And some of these pages\nuse the <a href=\"http://en.wikipedia.org/wiki/Template:Chembox\">ChemBox template</a> which actually might contain a\n<a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> CID or even an <a href=\"http://www.iupac.org/inchi/\">InChI</a>. This has increased the\n<a href=\"http://cb.openmolecules.net/inchis.php\">molecular content of Chemical blogspace</a> considerably.</p>\n\n<p>There is also, however, a good list of molecules in Wikipedia for which no CID or InChI is given:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>http://www.en.wikipedia.org/wiki/Hafnium(IV)_oxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cubane -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/water -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/oxidane -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Carminic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Alizarin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/AIBN -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/piperidine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/hydroxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/tetrahydrocannabinol -&gt; but no 
InChI/CID\nhttp://www.en.wikipedia.org/wiki/Epibatidine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/cortisone -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Eschenmoser%27s_salt -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/pyrrole -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/anthracene -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/benzylbromide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Skatole -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Teicoplanin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Methyl_violet -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Penicillin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Aspartame -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Splenda -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Sucrose -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Rhodamine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Ascorbic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tabun_(nerve_agent) -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Soman -&gt; but no InChI/CID\nhttp://www.wikipedia.org/wiki/Phosgene -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/AZD2171 -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Heavy_water -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/MTBE -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Biotin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Spermine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Silicon_carbide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/stilbene -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Methyl_salicylate -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Dmso -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/DMF -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Acetonitrile -&gt; but no 
InChI/CID\nhttp://www.en.wikipedia.org/wiki/HMPA -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Phenol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/TBHQ -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/MTBE -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Salvia_divinorum -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/salvinorin -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tetrahydrocannabinol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Selenium_dioxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Piperidine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Resveratrol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/P4O10 -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Dimethyl_sulfide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Folate -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Hydroxybenzotriazole -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Hydrogen_cyanide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Peroxyacetic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/epothilone -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/paraquat -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/N-butyllithium -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Nafion -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Boron_nitride -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Triclosan -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Hydrogen_peroxide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cholesterol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/DMAP -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/aniline -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Phenol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Ascorbic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Nicotine -&gt; but no 
InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tetra-ethyl_lead -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Acetophenone -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Ethanol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Acetaldehyde -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/EDTA -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Menthol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Formic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Octanitrocubane -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/VX_%28nerve_agent%29 -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Tetraazidomethane -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Lawesson%27s_reagent -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Hexafluoroisopropanol -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cellulose -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Bremelanotide -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Cellulose -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Dimethicone#Applications -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Shikimic_acid -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Methyl_amine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/Dimethyl_amine -&gt; but no InChI/CID\nhttp://www.en.wikipedia.org/wiki/DDT -&gt; but no InChI/CID\n</code></pre></div></div>\n\n<p>I really would like to start adding InChI’s for these molecules to Wikipedia, but someone needs to enlighten me about\nthe state of ChemBox? Can the InChI be added to the template, or should the InChI be given elsewhere on the page?\nAdding such small bits is easier than <a href=\"http://mndoci.com/blog/2007/06/17/writing-something-on-wikipedia/\">writing a full entry</a>.</p>",
      "summary": "Only few people are using InChI’s to indicate the molecules the blog about (prominent exceptions are Useful Chemistry and Molecule of the Day). Consequently, the number of detected molecules (without using OSCAR3) in Chemical blogspace has been low.",
      
      "date_published": "2007-06-19T00:00:00+00:00",
      "date_modified": "2007-06-19T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/16/payed-summer-jobs-in-chemoinformatics.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/16/payed-summer-jobs-in-chemoinformatics.html",
      "title": "Payed summer jobs in chemoinformatics",
      "content_html": "<p>Last year the <a href=\"http://www.programmeerzomer.nl/\">Programmeerzomer.nl</a> sponsored one summer student to work on <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\n(see the <a href=\"http://chem-bla-ics.blogspot.com/2006/06/dutch-summer-of-code-sponsors.html\">announcement</a>). The Programmeerzomer is much like the\n<a href=\"http://kemistry-desktop.blogspot.com/2007/04/chemical-semantic-desktop.html\">Google Summer of Code where I mentor Alexandr</a>. However, it is much\nsmaller and oriented at just the <a href=\"http://en.wikipedia.org/wiki/Netherlands\">NL area</a>: both the student and the mentor needs to be Dutch,\nbut the opensource project does not.</p>\n\n<p>Rob worked last year on a <a href=\"http://wiki.bioclipse.net/index.php?title=Ghemical_plugin\">Ghemical plugin for Bioclipse</a> (see\n<a href=\"http://www.programmeerzomer.nl/interviews/rob_schellhorn/_rp_links1_elementId/1_1257\">this interview in Dutch</a>). The architecture for doing\ncalculations (the Compute plugin) is still being used within several other plugins. This year I got assigned two students: one for\nBioclipse and one for <a href=\"http://www.jmol.org/\">Jmol</a>.</p>\n\n<p>I have no idea at this moment what ideas the students picked from the lists in the wikis (see the\n<a href=\"http://wiki.jmol.org:81/index.php/ProgrammeerZomer\">Jmol project idea</a> and <a href=\"http://wiki.bioclipse.net/index.php?title=SummerOfCode\">Bioclipse idea</a>\nlists). 
There is a meeting scheduled on the 25th.</p>\n\n<p>The ideas include:</p>\n<ul>\n  <li>Jmol\n    <ul>\n      <li>an SWT widget for Eclipse RCP-based applications</li>\n      <li><a href=\"http://wiki.jmol.org:81/index.php/SoCPharmacophores\">Pharmacophore rendering</a></li>\n      <li>Support for PDB in XML</li>\n      <li><a href=\"http://wiki.jmol.org:81/index.php/SoCAjaxJS\">Ajax-enabled JavaScript library</a> (for the applet)</li>\n    </ul>\n  </li>\n  <li>Bioclipse\n    <ul>\n      <li><a href=\"http://wiki.bioclipse.net/index.php?title=Validating_CML_editor\">a validating CML editor</a></li>\n      <li><a href=\"http://wiki.bioclipse.net/index.php?title=Gromacs_plugin\">a GROMACS plugin</a></li>\n      <li><a href=\"http://wiki.bioclipse.net/index.php?title=Sequence_editor\">a sequence editor</a></li>\n      <li><a href=\"http://wiki.bioclipse.net/index.php?title=SummerOfCode\">webservices over Jabber’s XMPP</a></li>\n      <li>Beanshell and Jython scripting plugins</li>\n    </ul>\n  </li>\n</ul>\n\n<p>If you have a suggestion, it would be much appreciated if you could add it to the wiki pages linked above. Make sure to leave a comment on this blog item too, announcing the new idea!</p>",
      "summary": "Last year the Programmeerzomer.nl sponsored one summer student to work on Bioclipse (see the announcement). The Programmeerzomer is much like the Google Summer of Code where I mentor Alexandr. However, it is much smaller and oriented at just the NL area: both the student and the mentor needs to be Dutch, but the opensource project does not.",
      
      "date_published": "2007-06-16T00:00:00+00:00",
      "date_modified": "2007-06-16T00:00:00+00:00",
      "tags": ["jmol","bioclipse"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/10/janocchio-jmol-and-cdk-based-1h.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/10/janocchio-jmol-and-cdk-based-1h.html",
      "title": "Janocchio: Jmol and CDK based 1H coupling constant prediction",
      "content_html": "<p>While looking up a reference for <a href=\"http://firstglance.jmol.org/\">FirstGlance in Jmol</a>, I found <a href=\"https://sourceforge.net/projects/janocchio/\">Janocchio</a>,\na <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://www.jmol.org/\">Jmol</a> based tool for prediction of coupling constants,\n<a href=\"https://doi.org/10.1002/mrc.2016\">recently published</a> in <a href=\"http://www3.interscience.wiley.com/cgi-bin/jhome/3767\">Magnetic Resonance in Chemistry</a>.\nIt’s written by Evans, Bodkin, Baker and Sharman (from <a href=\"http://lilly.com/\">Eli Lilly</a>) and licensed LGPL. It is one of those rare contributions of\npharmaceutical industry, and I can only deeply appreciate this contribution.</p>\n\n<p>A quote from the article:</p>\n\n<ul><i>\nIt was therefore decided to create a Java application and applet,\n‘JAva NOe and Coupling Calculator with Handy Interactive Operation’\n(Janocchio), using the open source libraries of the molecular viewer Jmol\nand the Chemical Development Kit (CDK). It aims to provide a simple and\nintuitive way to calculate both the NOEs and couplings.\n</i></ul>\n\n<p>Release 1.0.1 of last May uses an old Jmol, and the CDK release from 26 August 2005. A bit outdated, and I am wondering if it would\nbe a lot of work to integrate this into Bioclipse. <a href=\"http://wiki.bioclipse.net/index.php?title=SummerOfCode\">Maybe a summer job</a>?</p>",
      "summary": "While looking up a reference for FirstGlance in Jmol, I found Janocchio, a CDK and Jmol based tool for prediction of coupling constants, recently published in Magnetic Resonance in Chemistry. It’s written by Evans, Bodkin, Baker and Sharman (from Eli Lilly) and licensed LGPL. It is one of those rare contributions of pharmaceutical industry, and I can only deeply appreciate this contribution.",
      
      "date_published": "2007-06-10T00:00:00+00:00",
      "date_modified": "2007-06-10T00:00:00+00:00",
      "tags": ["jmol","cdk","nmr"],
      "_references": [{ "url": "https://doi.org/10.1002/mrc.2016" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/09/preprint-servers-cps-failed-how-will.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/09/preprint-servers-cps-failed-how-will.html",
      "title": "Preprint servers: the CPS failed, how will Nature Precedings do?",
      "content_html": "<p>Some 7 years ago, following successes in physics, <a href=\"http://chemweb.com/\">ChemWeb.com</a>\n<a href=\"http://www.prnewswire.co.uk/cgi/news/release?id=10870\">launched the Chemistry Preprint Server (CPS)</a>,\nand <a href=\"https://doi.org/10.1021/ci025627a\">Warr evaluated</a> it in a JCIM article three years later.\nShe wrote about ‘lessons learned’, but the only one seemed to have been that chemistry was not\nready for it, as <a href=\"http://www.iucr.org/iucr-top/lists/epc-l/msg00790.html\">the project shutdown in 2004</a>.\nThe <a href=\"http://www.sciencedirect.com/preprintarchive?url=/CPS\">archives are still available</a>,\nfortunately, and you may find it amusing to look up my or some other submission.</p>\n\n<p>Now, <a href=\"http://blogs.nature.com/wp/nascent/2007/06/coming_soon_nature_precedings.html\">Nascent wrote that Nature is setting up</a>\n<a href=\"http://precedings.nature.com/\">Nature Precedings</a>, which was earlier\n<a href=\"http://pbeltrao.blogspot.com/2007/06/nature-preceedings-pre-print-server-for.html\">noted by Pedro</a>.\nThe <a href=\"https://doi.org/10.1038/447614a\">official announcement</a> was published as an editorial in\nNature. This being a Nature initiative, and not focused on just chemistry, I am sure it will do\nbetter than CPS. BTW, media coverage is <a href=\"http://www.connotea.org/user/timo/tag/Precedings\">tracked in a social way</a>.</p>\n\n<p>I might <a href=\"http://network.nature.com/groups/bioinformatics/notice/2007/06/08/nature-precedings-contributors-wanted\">request an test account</a>;\nI do have an old half-finished manuscript that I never got around to finishing. While still relevant,\nit could use some community input; this preprint server would be the perfect tool. That’s how my first\nmanuscript ended up on CPS too :)</p>",
      "summary": "Some 7 years ago, following successes in physics, ChemWeb.com launched the Chemistry Preprint Server (CPS), and Warr evaluated it in a JCIM article three years later. She wrote about ‘lessons learned’, but the only one seemed to have been that chemistry was not ready for it, as the project shutdown in 2004. The archives are still available, fortunately, and you may find it amusing to look up my or some other submission.",
      
      "date_published": "2007-06-09T00:00:00+00:00",
      "date_modified": "2007-06-09T00:00:00+00:00",
      "tags": ["publishing","nature"],
      "_references": [{ "url": "https://doi.org/10.1021/ci025627a" },{ "url": "https://doi.org/10.1038/447614a" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/08/scientific-literature-searching-ranking.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/08/scientific-literature-searching-ranking.html",
      "title": "Scientific Literature: searching, ranking, storage",
      "content_html": "<p>Dealing with scientific literature has been one important theme in <a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>.\nFor example, ranking articles and how to store your personal PDF archive has been topics of discussion. In this blog I will\nsummarize bits of the discussion, and my personal view on things.</p>\n\n<h2 id=\"searching\">Searching</h2>\n\n<p>Searching literature is traditionally done in systems like Chemical Abstracts and Web-of-Science. The open nature of a\ngrowing number of repositories (e.g. the Dutch <a href=\"http://www.darenet.nl/en/page/language.view/search.page\">DARE</a>) and\nindexing facilities like <a href=\"http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed\">PubMed</a> make these proprietary tools\nobsolete.</p>\n\n<p>It is incorrect to assume that these payed services are the only trustworthy sources. Even WoS fails to make the all\nlinks between entries in the database. For example, I am aware of two missing citations to articles I have written,\neven though both the cited and the citing article is available in the system. One of the citing articles was in the\n<a href=\"http://www3.interscience.wiley.com/cgi-bin/jhome/26737?CRETRY=1&amp;SRETRY=0\">Angewandte Chemie</a>!</p>\n\n<p>Additionally, some search services, like <a href=\"http://scholar.google.com/\">Google Scholar</a>, have the advantage that they\nfind copies and close variants of articles in proprietary articles on home pages and in open repositories. Today,\nI learned about <a href=\"http://en.scientificcommons.org/\">Scientific Commons</a> which indexes and links to a staggering\n1.5M publications, using, among others, PubMed and university repositories. 
Where possible it makes direct links\nto PDF versions of the article.</p>\n\n<h2 id=\"ranking\">Ranking</h2>\n\n<p><a href=\"http://www.chemicalforums.com/index.php?topic=17653.msg67580#msg67580\">Mitch set up</a> <a href=\"http://chemrank.com/\">ChemRank</a>,\nto which <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=342\">Peter</a>, the <a href=\"http://www.thechemblog.com/?p=552\">ChemBlog</a>\nand <a href=\"http://chem-bla-ics.blogspot.com/2007/05/chemrank-ranking-scientific-literature.html\">I replied</a>. Afterwards,\nI learned that other services are available too that allow, in addition to setting up an online personal literature\ndatabase, voting and commenting on articles.</p>\n\n<p>Apparently, <a href=\"http://www.citeulike.org/\">CiteULike</a> (CUL) supports this too. In contrast to ChemRank, CUL requires\na login, which I personally see as an advantage, because I can browse literature bookmarked by other accounts I trust.\nThere is also <a href=\"http://www.connotea.org/\">Connotea</a> but I never liked that site that much (e.g. it allows bookmarking\nany web page); <a href=\"http://depth-first.com/articles/2007/03/22/why-i-still-dont-use-connotea\">Rich has his comments too</a>.\nI would also like to mention <a href=\"http://www.biowizard.com/\">BioWizard</a> which is based on the PubMed content, which actually\ncovers a good deal of chemistry literature nowadays too.</p>\n\n<h2 id=\"local-storage\">Local Storage</h2>\n\n<p>The above-mentioned systems can be used as an alternative to offline bibliographic database systems, like EndNote and\n<a href=\"http://jabref.sf.net/\">JabRef</a>. The latter is my favorite, being based on BibTeX which I use for my LaTeX based\npublications, and is opensource and contains <a href=\"http://www.ohloh.net/accounts/2934/contributions/557\">a few patches</a>\nfrom yours truly. 
Jungfreudlich wondered <a href=\"http://www.jungfreudlich.de/2007/05/20/how-are-your-paper-files-organized/\">how people organized their PDF archive</a>\nand <a href=\"http://www.jungfreudlich.de/2007/05/20/how-are-your-paper-files-organized/#comment-3199\">I commented how I do it</a>:</p>\n\n<ul>\n  <li>a directory hierarchy based on journal name and year</li>\n  <li>file names that include the last name of the first author and the year</li>\n  <li>JabRef for the bibliographic database</li>\n  <li><a href=\"http://strigi.sf.net/\">Strigi</a> for full text search</li>\n</ul>\n\n<p><a href=\"http://miningdrugs.blogspot.com/2007/05/literature-management.html\">Jörg</a> and\n<a href=\"http://www.thepowerofgoo.net/2007/05/20/organizing-pdfs-papers/\">the power of goo</a> replied too.</p>\n\n<h2 id=\"mashups\">Mashups</h2>\n\n<p>I have accounts on several online tools now (with some duplication which I don’t like), and I have no idea which of\nthe options will stay around. Time will tell. The good news is that the open character of many of these allows making\nmashups, and generally integrating tools. For example, JabRef allows downloading citations from PubMed, and Noel\n<a href=\"http://baoilleach.blogspot.com/2007/05/supporting-information-available-as.html\">suggested using Greasemonkey scripts to link to the supplementary information for his articles</a>,\ninstead of using the mechanisms journals have. I can see the advantage of this, as, for example,\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=306\">Wiley takes full copyright of the data in SI material</a>,\nwhile Noel’s mechanism would keep the data open.</p>\n\n<p>For now, however, I would very much like to see a meta service where I can query rankings and comments for\narticles using any or all of the above tools.</p>",
      "summary": "Dealing with scientific literature has been one important theme in Chemical blogspace. For example, ranking articles and how to store your personal PDF archive has been topics of discussion. In this blog I will summarize bits of the discussion, and my personal view on things.",
      
      "date_published": "2007-06-08T00:00:00+00:00",
      "date_modified": "2007-06-08T00:00:00+00:00",
      "tags": ["citeulike","publishing","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/05/blue-obelisk-corner-in-chemical.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/05/blue-obelisk-corner-in-chemical.html",
      "title": "A Blue Obelisk corner in Chemical Blogspace",
      "content_html": "<p>I just finished setting up a <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> section for <a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>,\nas future replacement for the current <a href=\"http://www.blueobelisk.org/planetbo/\">Planet Blue Obelisk</a> (unless someone wants to take over that webpage).\nThe only thing really missing is a RSS feed for <a href=\"http://wiki.cubic.uni-koeln.de/cb/posts.php?category=Blue%20Obelisk\">recent posts</a> for just\nthe <a href=\"http://wiki.cubic.uni-koeln.de/cb/blogs.php?category=Blue%20Obelisk\">Blue Obelisk member blogs</a> (BTW, just email me if you want to be\nlisted as BO member with your blog too; the BO community is very open!).</p>\n\n<p>For now, you will have to do with <a href=\"http://wiki.cubic.uni-koeln.de/cb/index.php?category=Blue%20Obelisk\">this page</a>:</p>\n\n<p><img src=\"/assets/images/cbbo.png\" alt=\"\" /></p>\n\n<p>An additional flaw is that it also shows molecules for other blogs.</p>\n\n<p><strong><em>Update</em></strong>: the RSS feed for a specific category was already available, but just not from the FireFox URL bar. Instead, it is\ngiven on the right side of the posts page when you selected a category. Here a shortcut for the RSS for\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/atom.php?category=Blue%20Obelisk&amp;type=latest_posts\">posts from the Blue Obelisk category</a>.</p>",
      "summary": "I just finished setting up a Blue Obelisk section for Chemical blogspace, as future replacement for the current Planet Blue Obelisk (unless someone wants to take over that webpage). The only thing really missing is a RSS feed for recent posts for just the Blue Obelisk member blogs (BTW, just email me if you want to be listed as BO member with your blog too; the BO community is very open!).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cbbo.png",
      "date_published": "2007-06-05T00:00:00+00:00",
      "date_modified": "2007-06-06T00:00:00+00:00",
      "tags": ["cb","blue-obelisk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/06/03/finding-email-with-strigi-in-tar.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/06/03/finding-email-with-strigi-in-tar.html",
      "title": "Finding email with Strigi in .tar backups",
      "content_html": "<p>Now that <a href=\"http://chemicalblogspace.blogspot.com/2007/05/uploaded-source-code-to-sf-svn.html\">my CUBIC desktop machine is shutting down</a>,\nI made the necessary backups, among a mail.tar for my mail correspondence of about a year. About 500MB in size for almost 8700 files.\n<a href=\"http://strigi.sf.net/\">Strigi</a> is a perfect tool to help me find messages in this archive, as it will recurse into the .tar archive,\nand even into email attachements. I created an index just for the archive with:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>strigicmd create -t clucene -d index/ mail.tar\n</code></pre></div></div>\n\n<p>It took Strigi about 30 seconds to index the whole archive. That’s good performance!</p>\n\n<p>Now, Strigi indexes content full text, but also uses a controlled vocabulary (among which\n<a href=\"http://kemistry-desktop.blogspot.com/2007/04/chemical-semantic-desktop.html\">one specifically for chemistry</a>).\nSo I can search for email messages which have article in the subject with:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>strigicmd query -t clucene -d index/ email.subject:article\n</code></pre></div></div>\n\n<p>However, <code class=\"language-plaintext highlighter-rouge\">From:</code> and <code class=\"language-plaintext highlighter-rouge\">To:</code> content was not yet extracted. That was easily patched. This allows me to find correspondence between me and, for example, Christoph:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>strigicmd query -t clucene -d index/ email.to:Christoph AND email.from:Egon\n</code></pre></div></div>",
      "summary": "Now that my CUBIC desktop machine is shutting down, I made the necessary backups, among a mail.tar for my mail correspondence of about a year. About 500MB in size for almost 8700 files. Strigi is a perfect tool to help me find messages in this archive, as it will recurse into the .tar archive, and even into email attachements. I created an index just for the archive with:",
      
      "date_published": "2007-06-03T00:00:00+00:00",
      "date_modified": "2007-06-03T00:00:00+00:00",
      "tags": ["kde"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/05/30/chemrank-ranking-scientific-literature.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/30/chemrank-ranking-scientific-literature.html",
      "title": "ChemRank: ranking scientific literature",
      "content_html": "<p><a href=\"http://blog.chemicalforums.com/\">Mitch</a> <a href=\"http://www.chemicalforums.com/index.php?topic=17653\">just launched</a>\n<a href=\"http://www.chemrank.com/\">ChemRank</a>, a website where we can comment on and vote thumbs up or down for scientific articles.\nGood initiative I think. Some thoughts:</p>\n\n<ul>\n  <li>please include the DOI for each article overview on the front page (see <a href=\"http://baoilleach.blogspot.com/2007/04/add-quotes-from-postgenomic-and.html\">why</a>)</li>\n  <li>make the content <a href=\"http://en.wikipedia.org/wiki/Open_data\">opendata</a>, e.g. using the <a href=\"http://en.wikipedia.org/wiki/Creative_Commons\">CC license</a></li>\n  <li>provide a means to refer to other literature to back up comments and ranking</li>\n  <li>provide an API to make mashups (like that of <a href=\"http://blueobelisk.svn.sourceforge.net/viewvc/blueobelisk/cb/trunk/interface/api.php?revision=11&amp;view=markup\">Chemical blogspace for use in Greasemonkey scripts</a>)</li>\n  <li>make the website source code opensource (JSON, RDF come to mind)</li>\n  <li>use microformats where possible (for <a href=\"https://addons.mozilla.org/nl/firefox/addon/4106\">Operator</a> and FF3)</li>\n  <li>at least provide means for tagging articles</li>\n  <li>provide browsing by journal</li>\n  <li>import articles from Connotea/NatureNetwork/etc</li>\n</ul>\n\n<p>Please consider there as feature requests, and not as critique. Two of these are already listed in the\n<a href=\"http://www.chemicalforums.com/index.php?topic=17653\">developers wishlist</a>. I will likely come up with more later :)</p>",
      "summary": "Mitch just launched ChemRank, a website where we can comment on and vote thumbs up or down for scientific articles. Good initiative I think. Some thoughts:",
      
      "date_published": "2007-05-30T00:20:00+00:00",
      "date_modified": "2007-05-30T00:20:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/05/30/weka-decision-trees-to-java-conversion.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/30/weka-decision-trees-to-java-conversion.html",
      "title": "Weka Decision Trees to Java Conversion",
      "content_html": "<p>Some time ago I wrote a small Perl script to convert a decision tree created with <a href=\"http://www.cs.waikato.ac.nz/~ml/weka/\">Weka</a> in the\n<a href=\"http://www.cs.waikato.ac.nz/~ml/weka/arff.html\">ARFF format</a> to Java source code, for use in the\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/qsar/descriptors/molecular/IPMolecularDescriptor.html\">ionization potential prediction</a>\nin <a href=\"http://cdk.sf.net/\">CDK</a>. The advantage is that Weka is no longer used are runtime, and that there is no model that needs to be loaded and interpreted. Instead, it is simple Java code that does the work, much faster.</p>\n\n<p>This is the code:</p>\n\n<div class=\"language-perl highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">#!/usr/bin/perl</span>\n<span class=\"c1\">#</span>\n<span class=\"c1\"># Copyright 2007 (C) Egon Willighagen</span>\n<span class=\"c1\"># License: GPL</span>\n\n<span class=\"k\">use</span> <span class=\"nv\">diagnostics</span><span class=\"p\">;</span>\n<span class=\"k\">use</span> <span class=\"nv\">strict</span><span class=\"p\">;</span>\n\n<span class=\"k\">my</span> <span class=\"nv\">$filename</span> <span class=\"o\">=</span> <span class=\"nv\">$ARGV</span><span class=\"p\">[</span><span class=\"mi\">0</span><span class=\"p\">];</span>\n\n<span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">double result = 0.0;</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n<span class=\"nb\">open</span><span class=\"p\">(</span><span class=\"nv\">INPUT</span><span class=\"p\">,</span> <span class=\"p\">\"</span><span class=\"s2\">&lt;</span><span class=\"si\">$filename</span><span class=\"p\">\");</span>\n<span class=\"k\">my</span> <span class=\"nv\">$level</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n<span class=\"k\">my</span> <span 
class=\"nv\">$prevLevel</span> <span class=\"o\">=</span> <span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n<span class=\"k\">while</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$line</span> <span class=\"o\">=</span> <span class=\"o\">&lt;</span><span class=\"nv\">INPUT</span><span class=\"o\">&gt;</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"nv\">$line</span> <span class=\"o\">=~</span> <span class=\"sr\">s/\\n//g</span><span class=\"p\">;</span>\n  <span class=\"nv\">$level</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span>\n  <span class=\"k\">while</span> <span class=\"p\">(</span><span class=\"nv\">$line</span> <span class=\"o\">=~</span> <span class=\"sr\">/^\\|\\s*(.*)/</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"nv\">$level</span><span class=\"o\">++</span><span class=\"p\">;</span>\n    <span class=\"nv\">$line</span> <span class=\"o\">=</span> <span class=\"err\">$</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span>\n  <span class=\"k\">my</span> <span class=\"nv\">$else</span> <span class=\"o\">=</span> <span class=\"p\">\"\";</span>\n  <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nv\">$prevLevel</span> <span class=\"o\">==</span> <span class=\"nv\">$level</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"nv\">$else</span> <span class=\"o\">=</span> <span class=\"p\">\"</span><span class=\"s2\">else </span><span class=\"p\">\";</span>\n  <span class=\"p\">}</span> <span class=\"k\">elsif</span> <span class=\"p\">(</span><span class=\"nv\">$prevLevel</span> <span class=\"o\">&lt;</span> <span class=\"nv\">$level</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"c1\"># we increase one level at a time</span>\n    <span class=\"k\">for</span> <span class=\"p\">(</span><span 
class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span class=\"p\">(</span><span class=\"nv\">$level</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">);</span> <span class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n    <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">{</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n    <span class=\"nv\">$prevLevel</span> <span class=\"o\">=</span> <span class=\"nv\">$level</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n    <span class=\"c1\"># this is a bit more tricky: we possibly need more than</span>\n    <span class=\"c1\"># one end bracket</span>\n    <span class=\"k\">my</span> <span class=\"nv\">$diff</span> <span class=\"o\">=</span> <span class=\"nv\">$prevLevel</span> <span class=\"o\">-</span> <span class=\"nv\">$level</span><span class=\"p\">;</span>\n    <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$closes</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$closes</span><span class=\"o\">&lt;</span><span class=\"nv\">$diff</span><span class=\"p\">;</span> <span class=\"nv\">$closes</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n      <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span 
class=\"p\">(</span><span class=\"nv\">$prevLevel</span><span class=\"o\">-</span><span class=\"nv\">$closes</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">);</span> <span class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n      <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">}</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n    <span class=\"p\">}</span>\n    <span class=\"nv\">$prevLevel</span> <span class=\"o\">=</span> <span class=\"nv\">$level</span><span class=\"p\">;</span>\n  <span class=\"p\">}</span>\n  <span class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nv\">$line</span> <span class=\"o\">=~</span> <span class=\"sr\">/:/</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n    <span class=\"k\">my</span> <span class=\"p\">(</span><span class=\"nv\">$if</span><span class=\"p\">,</span> <span class=\"nv\">$then</span><span class=\"p\">)</span> <span class=\"o\">=</span> <span class=\"nb\">split</span><span class=\"p\">(\"</span><span class=\"s2\">:</span><span class=\"p\">\",</span><span class=\"nv\">$line</span><span class=\"p\">);</span>\n    <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span class=\"nv\">$level</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n    <span class=\"c1\"># FIXME: java-fy $then</span>\n    <span 
class=\"k\">if</span> <span class=\"p\">(</span><span class=\"nv\">$then</span> <span class=\"o\">=~</span> <span class=\"sr\">/([\\d|_]*)\\s*\\(([^\\)]*)\\)/</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n      <span class=\"k\">my</span> <span class=\"nv\">$result</span> <span class=\"o\">=</span> <span class=\"err\">$</span><span class=\"mi\">1</span><span class=\"p\">;</span>\n      <span class=\"k\">my</span> <span class=\"nv\">$stats</span> <span class=\"o\">=</span> <span class=\"err\">$</span><span class=\"mi\">2</span><span class=\"p\">;</span>\n      <span class=\"nv\">$result</span> <span class=\"o\">=~</span> <span class=\"sr\">s/_/\\./g</span><span class=\"p\">;</span>\n      <span class=\"k\">print</span> <span class=\"nv\">$else</span> <span class=\"o\">.</span> <span class=\"p\">\"</span><span class=\"s2\">if (</span><span class=\"si\">$if</span><span class=\"s2\">) { result = </span><span class=\"si\">$result</span><span class=\"s2\">; // </span><span class=\"si\">$stats</span><span class=\"s2\"> }</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n    <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n      <span class=\"k\">print</span> <span class=\"nv\">$else</span> <span class=\"o\">.</span> <span class=\"p\">\"</span><span class=\"s2\">if (</span><span class=\"si\">$if</span><span class=\"s2\">) { result = </span><span class=\"si\">$then</span><span class=\"s2\">; }</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n    <span class=\"p\">}</span>\n  <span class=\"p\">}</span> <span class=\"k\">else</span> <span class=\"p\">{</span>\n    <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span class=\"nv\">$level</span><span class=\"p\">;</span> <span 
class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n    <span class=\"k\">print</span> <span class=\"nv\">$else</span> <span class=\"o\">.</span> <span class=\"p\">\"</span><span class=\"s2\">if (</span><span class=\"si\">$line</span><span class=\"s2\">)</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n  <span class=\"p\">}</span>\n<span class=\"p\">}</span>\n\n<span class=\"c1\"># OK, now add the rest of the closing brackets</span>\n<span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$closes</span><span class=\"o\">=</span><span class=\"nv\">$prevLevel</span><span class=\"p\">;</span> <span class=\"nv\">$closes</span><span class=\"o\">&gt;</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$closes</span><span class=\"o\">--</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">for</span> <span class=\"p\">(</span><span class=\"k\">my</span> <span class=\"nv\">$i</span><span class=\"o\">=</span><span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nv\">$i</span><span class=\"o\">&lt;</span><span class=\"p\">(</span><span class=\"nv\">$closes</span><span class=\"o\">-</span><span class=\"mi\">1</span><span class=\"p\">);</span> <span class=\"nv\">$i</span><span class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span> <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">  </span><span class=\"p\">\";</span> <span class=\"p\">};</span>\n  <span class=\"k\">print</span> <span class=\"p\">\"</span><span class=\"s2\">}</span><span class=\"se\">\\n</span><span class=\"p\">\";</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>",
      "summary": "Some time ago I wrote a small Perl script to convert a decision tree created with Weka in the ARFF format to Java source code, for use in the ionization potential prediction in CDK. The advantage is that Weka is no longer used are runtime, and that there is no model that needs to be loaded and interpreted. Instead, it is simple Java code that does the work, much faster.",
      
      "date_published": "2007-05-30T00:10:00+00:00",
      "date_modified": "2007-05-30T00:10:00+00:00",
      "tags": ["java","cheminf","cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/05/29/jcim-is-linking-to-planet-blue-obelisk.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/29/jcim-is-linking-to-planet-blue-obelisk.html",
      "title": "The JCIM is linking to Planet Blue Obelisk??",
      "content_html": "<p>I use <a href=\"http://www.google.com/analytics/\">Google Analytics</a> to analyze the visitors of my blogs and of\n<a href=\"http://blueobelisk.org/planetbo/\">Planet Blue Obelisk</a> too. Now, for the past couple of weeks, the webpage of the\n<a href=\"http://pubs.acs.org/journals/jcisd8/index.html\">Journal of Chemical Information and Modeling</a> is\nshowing up as refering site:</p>\n\n<p><img src=\"/assets/images/jcim-bo-link.png\" alt=\"\" /></p>\n\n<p>What is going on here ?!?! This is really no fake, but cannot find an actual link when I visit the journal\nwebpage either…</p>\n\n<p><strong>Update</strong>: When looking at the logs, it becomes even weirder. Nothing shows up, so I can only assume this is a\nglitch in the Google Analytics system :( What I did see in the log, was referrals pointing to Chemical\nBlogspace :) That must be the user script in action.</p>",
      "summary": "I use Google Analytics to analyze the visitors of my blogs and of Planet Blue Obelisk too. Now, for the past couple of weeks, the webpage of the Journal of Chemical Information and Modeling is showing up as refering site:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jcim-bo-link.png",
      "date_published": "2007-05-29T00:00:00+00:00",
      "date_modified": "2007-06-08T00:00:00+00:00",
      "tags": ["blue-obelisk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/05/25/numbers-are-copyrighted.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/25/numbers-are-copyrighted.html",
      "title": "Numbers are copyrighted?",
      "content_html": "<p>I just read on <a href=\"http://www.blueobelisk.org/planetbo/\">Planet Blue Obelisk</a> <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a>’s\ndisturbing news (via <a href=\"http://www.earlham.edu/~peters/fos/2007_05_20_fosblogarchive.html#6528603867120185583\">Suber</a>) that\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=338\">Wiley thinks it can copyright a set of numbers</a> (also known as data).\nThat is a sad milestone in scientific publishing. It reminds me of the recent internet hype about a long number recently\nflooding the internet (and notably <a href=\"http://www.del.icio.us/\">del.icio.us</a>) related to watching DVDs you legally bought.\nSome details can be found in this <a href=\"http://www.lwn.net/\">Linux Weekly News</a> article on\n<a href=\"http://lwn.net/Articles/233660/\">How Debian packages a number</a>.</p>\n\n<p>Interestingly, this is really not problems just regarding commercial publishers, or closed access publishing or so. Yesterday,\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a> and I working on getting <a href=\"http://chem-bla-ics.blogspot.com/2006/09/chemical-archeology-oscar3-to.html\">the NMR spectrum text mining</a>\ngoing in <a href=\"http://www.bioclipse.net/\">Bioclipse</a> again for the <a href=\"http://teacher.bmc.uu.se/BioclipseWS07/\">workshop</a>,\nwe noticed that the open access <a href=\"http://bjoc.beilstein-journals.org/\">Beilstein Journal of Organic Chemistry</a>,\ndoes not make <a href=\"http://en.wikipedia.org/wiki/Open_Data\">Open Data</a> reality either: the experimental sections are\ngenerally (all?) excluded from the main text in HTML and obscured in .doc files in the supplementary information.</p>\n\n<p>BTW, this makes me wonder if organic chemists still consider the experimental properties of molecules novel science.</p>",
      "summary": "I just read on Planet Blue Obelisk Peter’s disturbing news (via Suber) that Wiley thinks it can copyright a set of numbers (also known as data). That is a sad milestone in scientific publishing. It reminds me of the recent internet hype about a long number recently flooding the internet (and notably del.icio.us) related to watching DVDs you legally bought. Some details can be found in this Linux Weekly News article on How Debian packages a number.",
      
      "date_published": "2007-05-25T00:00:00+00:00",
      "date_modified": "2007-05-25T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/05/11/added-my-hcard-to-my-blog.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/11/added-my-hcard-to-my-blog.html",
      "title": "Added my hCard to my blog",
      "content_html": "<p>Getting back on microformats (see <a href=\"http://chem-bla-ics.blogspot.com/2007/05/microformats-in-chemistry.html\">yesterday</a>),\nI added my <a href=\"http://microformats.org/wiki/hcard\">hCard</a> to the bottom of my blog:</p>\n\n<p><img src=\"/blog//assets/images/hCard2.png\" alt=\"\" /></p>\n\n<p>I will likely populate it a bit more soon (after holiday in Sweden).</p>\n\n<p>Now, if you had the Firefox plugin <a href=\"https://addons.mozilla.org/en-US/firefox/addon/4106\">Operator</a> installed, you would\nhave my contact information show up in your FF toolbar, like this:</p>\n\n<p><img src=\"/blog//assets/images/hCard1.png\" alt=\"\" /></p>\n\n<p>Note the ‘Export Contact’ button in the toolbar. This will automatically create a vCard which I can directly open in\nmy address book (I use the KDE addressbook). Very nice integration!</p>\n\n<p>Now, I already asked the author how the plugin could be extended to support chemical microformats. Just think of the\nfeature “Export Molecule (137)” (e.g. to <a href=\"http://www.bioclipse.net/\">Bioclipse</a>), when reading a HTML version of paper\nin one of the <a href=\"http://www.rsc.org/Publishing/Journals/ProjectProspect/\">Project Prospect</a> enabled journals :)</p>",
      "summary": "Getting back on microformats (see yesterday), I added my hCard to the bottom of my blog:",
      "image": "https://chem-bla-ics.linkedchemistry.info/blog//assets/images/hCard1.png",
      "date_published": "2007-05-11T00:10:00+00:00",
      "date_modified": "2007-05-11T00:10:00+00:00",
      "tags": ["microformat","blog"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/05/11/microformats-in-chemistry.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/11/microformats-in-chemistry.html",
      "title": "Microformats in chemistry...",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> blogged some days ago about <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=295\">microformats and how they could be used in chemistry</a>.\nBeing late and a bit absent minded, I added a short comment that <a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>\n<a href=\"http://chemicalblogspace.blogspot.com/2006/12/hacking-inchi-support-into-cb.html\">supports</a>\n<a href=\"http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html\">microformats for chemistry</a>, and that\n<a href=\"http://chemicalblogspace.blogspot.com/2007/02/latest-blogged-molecules-on-front-page.html\">chemistry is harvested from that</a>,\nand actually <a href=\"http://chemicalblogspace.blogspot.com/2007/01/cb-gets-cmlrss-feed.html\">semantically distributed again using CMLRSS</a>.</p>\n\n<p>In reply to my comment, he wrote <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=299\">a follow up</a> highlighting one of blog items linked\nabove (thanx for that!). Accidentally, he also published my Gmail account and IP address, which was really just for the blog owner to\nsee who did the comment, and not for the world to harvest. This is a moment I am not so happy that Peter’s blog is so popular ;) Peter,\nmaybe be a bit more careful with copy/pasting next time.</p>\n\n<p>Peter and Henry (still not in blogspace?) have been <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=301\">doing things along these lines for years now</a>,\noften in different contexts. But getting these things going is a bit trickier. 
Actually, the uptake of the chemical microformats\nhas been limited, and at least one alternative mechanism is being used: put the InChI in the <code class=\"language-plaintext highlighter-rouge\">@alt</code> attribute on the <code class=\"language-plaintext highlighter-rouge\">&lt;img&gt;</code> element.\nOther alternatives are possible too, such as recognizing molecules (or whatever else) based on a link to Wikipedia; linking to\nentries in <a href=\"http://www.wikipedia.org/\">Wikipedia</a> is popular in Chemical blogspace.</p>\n\n<p>One problem in getting microformats accepted, especially among chemists, is having tools available, meaning dedicated plugins\nfor blogging software that ease adding microformats to a blog item. You’d be surprised how uncommon raw HTML editing has become in the\nlast 10 years. <a href=\"http://structuredblogging.org/\">::: Structured Blogging :::</a> is a provider of such tools. On the user side,\nthere is <a href=\"https://addons.mozilla.org/en-US/firefox/addon/4106\">this nice Firefox plugin</a> that can extract information available in\nmicroformats, though Firefox3 is supposed to support some microformats natively.</p>\n\n<p>Just today, <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=309\">Peter also blogged about a Berners-Lee presentation</a> with the nice\ncircular phenomena in all these web technologies. The diagrams nicely visualize the complex social aspects of these new technologies.\n(I’m sure they apply to chemoinformatics too… who makes a chemoinfo variant?) RDF is the way to go; it’s the machine interpretable\n(well, more accurate) <em>microformat</em>. All sorts of information is becoming available as RDF. For example, check out\n<a href=\"http://www.l3s.de/~siberski/bibtex2rdf/\">bibtex2rdf</a>, <a href=\"http://dbpedia.org/docs/#intro\">Wikipedia as RDF</a>,\n<a href=\"http://dev.isb-sib.ch/projects/uniprot-rdf/\">uniprotRDF</a>, and <a href=\"http://bioguid.info/\">BioGUID</a>. 
Moreover,\n<a href=\"http://www.w3.org/TR/2007/CR-grddl-20070502/\">GRDDL</a> might make this even more common.\nI have been maintaining a <a href=\"http://del.icio.us/egonw/rdf\">bookmark list of RDF things happening</a>; check it out,\nthe list is <em>social</em> <strong><em>and</em></strong> <em>using microformats</em>.</p>",
      "summary": "Peter blogged some days ago about microformats and how they could be used in chemistry. Being late and a bit absent minded, I added a short comment that Chemical blogspace supports microformats for chemistry, and that chemistry is harvested from that, and actually semantically distributed again using CMLRSS.",
      
      "date_published": "2007-05-11T00:00:00+00:00",
      "date_modified": "2007-05-11T00:00:00+00:00",
      "tags": ["rdf","microformat","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/05/06/preparing-chemoinformatics-workshop.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/06/preparing-chemoinformatics-workshop.html",
      "title": "Preparing a Chemoinformatics workshop",
      "content_html": "<p>After handing in a new draft of my PhD manuscript with my co-promotors last friday, and a week before we leave for Sweden, it is\ntime to start finishing up the material for my one hour workshop on chemoinformatics in general and QSAR/QSPR in particular for the\n<a href=\"http://teacher.bmc.uu.se/BioclipseWS07\">Bioclipse Workshop</a>.</p>\n\n<p><a href=\"http://plindenbaum.blogspot.com/2007/05/does-this-remind-you-of-anything.html\">Pierre blogged</a> about this movie. It looks relevant:</p>\n\n<iframe width=\"560\" height=\"315\" src=\"https://www.youtube.com/embed/xFAWR6hzZek\" title=\"YouTube video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen=\"\">\n</iframe>",
      "summary": "After handing in a new draft of my PhD manuscript with my co-promotors last friday, and a week before we leave for Sweden, it is time to start finishing up the material for my one hour workshop on chemoinformatics in general and QSAR/QSPR in particular for the Bioclipse Workshop.",
      
      "date_published": "2007-05-06T00:00:00+00:00",
      "date_modified": "2007-05-06T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/05/05/cb-comments-for-inchis.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/05/05/cb-comments-for-inchis.html",
      "title": "Cb comments for InChI&apos;s",
      "content_html": "<p>About a year ago <a href=\"http://pbeltrao.blogspot.com/2006/05/postgenomics-script-for-firefox-i-am.html\">Pedro wrote a Greasemonkey script</a>\nto add comments from <a href=\"http://www.postgenomic.com/\">PostGenomic.com</a> to table of contents of scientific journals.\n<a href=\"http://baoilleach.blogspot.com/2007/04/add-quotes-from-postgenomic-and.html\">Noel extended</a> it with support for\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a> (see also <a href=\"http://chemicalblogspace.blogspot.com/2007/03/jacs-toc-featuring-your-review.html\">this earlier item</a>).\nNow, the later website is maintained by me, and I\n<a href=\"http://chemicalblogspace.blogspot.com/2006/12/hacking-inchi-support-into-cb.html\">extended the aggregator software with molecule support</a>,\nfor example to show <em>hot</em> <a href=\"http://chemicalblogspace.blogspot.com/2007/02/latest-blogged-molecules-on-front-page.html\">molecules on the frontpage</a>\n(at some point <a href=\"http://www.ghastlyfop.com/blog/2007/05/quick-notices.html\">my patches will be backported into mainstream</a>.\nEuan, why not invite me to London HQ in, say, June?).</p>\n\n<p>So, when we can show comments from blogosphere for journal articles, why can’t we do that for molecules too? Sure we can.\nJust needs some hacking. Right, and done that today. 
The script works for <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>:</p>\n\n<p><img src=\"/assets/images/cb_inchi_greasemonkey1.png\" alt=\"\" /></p>\n\n<p>It works for any <code class=\"language-plaintext highlighter-rouge\">&lt;a href&gt;</code> element with a URL to PubChem like <em>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&amp;DB=pccompound&amp;term=%22InChI=1/CH4/h1H4%22[InChI]</em>.\nBTW, while the URL is not very readable, this might actually be a good way to <a href=\"http://chem-bla-ics.blogspot.com/2007/02/invisible-inchis.html\">hide InChIs</a>,\nthough I am sure Google will not index this InChI either.</p>\n\n<p>And it also works for <a href=\"http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html\">semantically marked up InChI’s (using either microformats or RDFa)</a>:</p>\n\n<p><img src=\"/assets/images/cb_inchi_greasemonkey.png\" alt=\"\" /></p>\n\n<p>You’ll notice here that it is friendly with my\n<a href=\"http://chem-bla-ics.blogspot.com/2006/12/smiles-cas-and-inchi-in-blogs.html\">Sechemtic script to make links to Google and PubChem</a>.</p>\n\n<p>Making this happen involves a new Greasemonkey script (based on Noel’s code) and a few patches to the Postgenomic.com software.\nThe user script can be downloaded <a href=\"http://userscripts.org/scripts/show/9002\">here</a>. An entry on the\n<a href=\"http://wiki.cubic.uni-koeln.de/bowiki/index.php/Using_Javascript_and_Greasemonkey_for_Chemistry\">Blue Obelisk userscript page</a>\nwill follow; check that page for more goodies.</p>",
      "summary": "About a year ago Pedro wrote a Greasemonkey script to add comments from PostGenomic.com to the tables of contents of scientific journals. Noel extended it with support for Chemical blogspace (see also this earlier item). Now, the latter website is maintained by me, and I extended the aggregator software with molecule support, for example to show hot molecules on the frontpage (at some point my patches will be backported into mainstream. Euan, why not invite me to London HQ in, say, June?).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cb_inchi_greasemonkey1.png",
      "date_published": "2007-05-05T00:00:00+00:00",
      "date_modified": "2007-05-05T00:00:00+00:00",
      "tags": ["cb","inchi","userscript","rdf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/04/27/ex-cubic-get-together.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/27/ex-cubic-get-together.html",
      "title": "Ex-CUBIC get-together",
      "content_html": "<p>Yesterday and today I was in Cologne to meet with other ex-CUBIC researchers from <a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a>’s\n<a href=\"http://almost.cubic.uni-koeln.de/jrg\">research group on chemoinformatics</a> (and <a href=\"http://kemistry-desktop.blogspot.com/2007/04/gsoc-meeting-with-alexandr.html\">with Alexandr</a>).\nNot all former group members where there, but on the other hand we were complemented with Pascal:</p>\n\n<p><img src=\"/assets/images/DSCI0173.JPG\" alt=\"\" /></p>\n\n<p>(Yes, the sun was <strong>very</strong> bright :)</p>\n\n<p>The program was consisted of a couple of group things, like making a short list of articles to write up in the next\nfew months. Yesterday evening ended in a very nice Biergarten called the <a href=\"http://www.kneipen-suche.com/koeln-altenberger_hof-2541.html\">Altenberger Hof</a>.</p>",
      "summary": "Yesterday and today I was in Cologne to meet with other ex-CUBIC researchers from Christoph’s research group on chemoinformatics (and with Alexandr). Not all former group members where there, but on the other hand we were complemented with Pascal:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/DSCI0173.JPG",
      "date_published": "2007-04-27T00:00:00+00:00",
      "date_modified": "2007-04-27T00:00:00+00:00",
      "tags": ["cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/04/24/bioclipse-now-allows-qsar-descriptor.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/24/bioclipse-now-allows-qsar-descriptor.html",
      "title": "Bioclipse now allows QSAR descriptor selection",
      "content_html": "<p>In preparation for the <a href=\"http://teacher.bmc.uu.se/BioclipseWS07/Welcome.html\">Embrace Workshop for Bioclipse</a> in May, I am working on the QSAR functionality of\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>. A nice extension point got set up some time ago, called <a href=\"http://chem-bla-ics.blogspot.com/2006/11/bioclipse-workshop-short-but.html\">DescriptorProvider</a>,\nand implemented by plugins to allow calculation of one or more descriptors for the selected molecules. Now, the\n<a href=\"http://chem-bla-ics.blogspot.com/2006/07/matrix-support-in-bioclipse.html\">functionality for the resulting matrix</a> has been around for some time too.</p>\n\n<p>What had not been available yet, was some GUI stuff to select descriptors to calculate, and the actual calculation. While the latter is yet to be\nhooked up, the selection of descriptors is now available:</p>\n\n<p><img src=\"/assets/images/bioclipseDescriptorSelection.png\" alt=\"\" /></p>\n\n<p>Interesting here is the use of OWL. CDK’s <code class=\"language-plaintext highlighter-rouge\">DescriptorEngine</code> provides a simple API written by Rajarshi that interfaces to the dictionary support\nfor OWL (which CDK offers in addition to CML based dictionaries). All CDK descriptors are written up in OWL (the\n<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/dict/data/descriptor-algorithms.owl?view=markup\">source file</a>\nand the <a href=\"http://qsar.sourceforge.net/dicts/qsar-descriptors/index.xhtml\">HTML version</a>).\nYou’ll notice the weird characters in the screenshot; there something goes wrong with the encoding when reading the OWL.</p>",
      "summary": "In preparation for the Embrace Workshop for Bioclipse in May, I am working on the QSAR functionality of Bioclipse. A nice extension point got set up some time ago, called DescriptorProvider, and implemented by plugins to allow calculation of one or more descriptors for the selected molecules. Now, the functionality for the resulting matrix has been around for some time too.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseDescriptorSelection.png",
      "date_published": "2007-04-24T00:00:00+00:00",
      "date_modified": "2007-04-24T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/04/23/cdk-10-milestone-after-7-year-of.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/23/cdk-10-milestone-after-7-year-of.html",
      "title": "CDK 1.0: a milestone after 7 year of development",
      "content_html": "<p>Last night, I <a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024\">released CDK 1.0</a> as the previous release candidate\ndid not show up new major problems. It is far from a perfect release (see these still <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=cdk1.0\">TODO</a>’s\nand <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">Nightly</a>, run by\n<a href=\"http://cheminfoclub.blogspot.com/\">Rajarshi</a>), but the core is pretty solid.</p>\n\n<p>I would warmly thank everyone who has contributed to the project in one way or another (I worked more on maintainance than\nimplementing functionality), as it has been a great pleasure to make CDK releases. <a href=\"http://www.ohloh.net/\">OHLOH</a> runs a rather nice\ndeveloper <a href=\"http://www.ohloh.net/projects/380/analyses/latest/contributors\">hall of fame for the CDK</a>. You’ll see that\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a>’s research group is the major contributor. User contributions, however,\nare equally important and played a bug role in the quite <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/junitsummary.html\">large set of JUnit tests</a>\nwe have now (3300+).</p>\n\n<p>Another reason why this is an important milestone, is that it is the last release I am creating. I wrote on the user list:</p>\n\n<ul><i>\nIn advance of the actual CDK 1.0 release, thanx very much to all that contributed big *and* small ! It was a great 7 years of open source\nchemoinformatics development!\n\nHey, that actually sounds like I am stepping down... Well, it *is* time for a new generation to step up indeed. I won't leave the project,\nbut being CDK News editor, CDK release manager, CDK code developer is a bit much for doing outside office hours. I feel that I have clearly\nenough made my point for open source chemoinformatics, and it is time for something else... 
which will very likely involve the CDK, but\nlikely more as user only... I was hoping in the past few years, that the transition would go smoothly, and have been trying to get people\ninterested in various emails, including this one; however, being humans, we wait for the catastrophe and only after that we're shocked and\nstart doing something about it. So, yeah, I'm forced to make this drastic announcement: CDK 1.0 will be the last CDK release *I* will make.\n</i></ul>\n\n<p>So, who wants to take over? Someone will have to. I, however, will put my focus on other things. Very likely involving the CDK, as there\nare still many things I want to do. Some things I have on my list:</p>\n\n<ul>\n  <li>the Java2D based 2D renderer/editor</li>\n  <li>more accurate atom type perception</li>\n  <li>more articles for CDK News</li>\n  <li>the book “CDK for Dummies”</li>\n  <li>improved structure generator</li>\n  <li>validation</li>\n  <li>…</li>\n</ul>",
      "summary": "Last night, I released CDK 1.0 as the previous release candidate did not show up new major problems. It is far from a perfect release (see these still TODO’s and Nightly, run by Rajarshi), but the core is pretty solid.",
      
      "date_published": "2007-04-23T00:00:00+00:00",
      "date_modified": "2007-04-23T00:00:00+00:00",
      "tags": ["cdk","cheminf","junit"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/04/21/clustering-web-search-results.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/21/clustering-web-search-results.html",
      "title": "Clustering web search results",
      "content_html": "<p>The Dutch <a href=\"http://www.intermediair.nl/\">Intermediair</a> magazine of this week had a letter sent by a reader introducing\n<a href=\"http://clusty.com/\">Clusty</a>, a web search engine that clusters the results. It does a pretty good job for\n‘<a href=\"http://clusty.com/search?input-form=clusty-simple&amp;v%3Asources=webplus&amp;query=egon+willighagen\">egon willighagen</a>’:</p>\n\n<p><img src=\"/assets/images/clusty1.png\" alt=\"\" /></p>\n\n<p>It seems to use other engine to do the searching and focus on the clustering. Source engine exclude Google, and include\n<a href=\"http://gigablast.com/\">Gigablast</a>, <a href=\"http://www.msn.com/\">MSN</a> and <a href=\"http://wikipedia.org/\">Wikipedia</a>.</p>\n\n<p>For <em>chemoinformatics</em> it comes up with the following top 10 clusters: ‘Drug Discovery’, ‘Structure’, ‘Cheminformatics’,\n‘Research’, ‘Books’, ‘Conference, German’, ‘Textbook, Gasteiger’, ‘Laboratory’, ‘Handbook of Chemoinformatics’, and\n‘School’. Quite acceptable and useful clustering.</p>\n\n<p>This might be the next step in googling. Rich, it also might solve <a href=\"http://depth-first.com/articles/2007/04/20/self-referential\">your problem</a>:\nsearching for ‘ruby chemoinformatics’ does <strong>not</strong> give a ‘Depth First’ or ‘Rich Apodaca’ cluster :)</p>",
      "summary": "The Dutch Intermediair magazine of this week had a letter sent by a reader introducing Clusty, a web search engine that clusters the results. It does a pretty good job for ‘egon willighagen’:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/clusty1.png",
      "date_published": "2007-04-21T00:00:00+00:00",
      "date_modified": "2007-04-21T00:00:00+00:00",
      "tags": ["google","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/04/06/cubic-period-is-over.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/04/06/cubic-period-is-over.html",
      "title": "CUBIC period is over",
      "content_html": "<p>The end of the CUBIC has come, and so did the end of my 1-year postdoc in the group of <a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph Steinbeck</a>.\nIt would have been much better if the group could have continued for one or two more years, so that we could harvest the fruit of the work done in\nthe past years. Only having been group member since April 1 2006, I mostly contributed work to <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\n(doi:<a href=\"https://doi.org/10.1186/1471-2105-8-59\">10.1186/1471-2105-8-59</a>), CMLSpect (submitted), and integrating Miguel’s mass spectrum prediction\ntoolkit into SENECA (doi:<a href=\"https://doi.org/10.1021/ci000407n\">10.1021/ci000407n</a>) for structure elucidation. The latter topic is rather exciting\nand when the method shows powerful enough, this will have a major impact on the field of <a href=\"http://en.wikipedia.org/wiki/Metabolite\">metabolomics</a>.</p>\n\n<p>BTW, importantly, my CUBIC email address is no longer valid, so please use one of my many other email addresses, e.g. my SourceForge one, or\nmy Gmail account.</p>",
      "summary": "The end of the CUBIC has come, and so did the end of my 1-year postdoc in the group of Christoph Steinbeck. It would have been much better if the group could have continued for one or two more years, so that we could harvest the fruit of the work done in the past years. Only having been group member since April 1 2006, I mostly contributed work to Bioclipse (doi:10.1186/1471-2105-8-59), CMLSpect (submitted), and integrating Miguel’s mass spectrum prediction toolkit into SENECA (doi:10.1021/ci000407n) for structure elucidation. The latter topic is rather exciting and when the method shows powerful enough, this will have a major impact on the field of metabolomics.",
      
      "date_published": "2007-04-06T00:00:00+00:00",
      "date_modified": "2007-04-06T00:00:00+00:00",
      "tags": ["bioclipse","cml"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-8-59" },{ "url": "https://doi.org/10.1021/ci000407n" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/03/29/acs-chicago-day-3.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/29/acs-chicago-day-3.html",
      "title": "ACS Chicago - Day #3",
      "content_html": "<p>Tuesday promised to be an interesting day: an interesting ‘Scientific Communication’ CINF session in the morning and early\nafternoon. And, rather important to me, the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> dinner that night, just after another\nCINF party, where I chatted with a few others about options of a chemistry equivalent of the <a href=\"http://code.google.com/soc/\">Google Summer of Code</a>;\nwho knows what happens this summer, but start thinking about ideas on how to increase the web experience of chemistry journal web pages.</p>\n\n<h2 id=\"the-gsoc\">The GSoC</h2>\n\n<p>Now that I am talking about the GSoC, you might have realized that the <a href=\"http://cdk.sf.net/\">CDK</a> and\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> did not make it as mentoring organization. While I had not seriously expected it,\nall the enthusiasm from within both projects including several interested students, I was a bit hoping for getting\naccepted with at least on of them. Meanwhile, <a href=\"http://www.kde.org/\">KDE</a>, as expected, is approved, and actually contains\ntwo interesting chemistry project ideas too. One is about a 3D viewer/editor for which 7 students send Google a proposal,\nand the other about text mining of chemical content on the desktop, using <a href=\"http://www.vandenoever.info/software/strigi/\">Strigi</a>\n(two students). Both topics have one excellent proposal, who do good in the ranking process. So, we might have some\nchemistry in the GSoC after all.</p>\n\n<h2 id=\"cinf\">CINF</h2>\n\n<p>OK, back to the ACS meeting. 
Fahrenbach had a presentation on blogging too, but I don’t remember anything special about it.\nThe <a href=\"http://chem-bla-ics.blogspot.com/2007/03/acs-chicago-day-1.html\">CHED</a> session was more elaborate on the whole topic,\nand since you are a reader of chemical blogs, you all know about this anyway :) Loney introduced\n<a href=\"http://biotechexchange.org/\">biotechexchange.org</a> which is building a social network around biotechnology. There are other\ncommunity sites like this, and my major <em>problem</em> with these community building efforts is that they are too well defined.\nI much prefer to work in a more open environment where I can get in contact just as easily with people outside some specific\ntopic. For the rest, the set of technologies is rather comprehensive.</p>\n\n<p>Frenkel spoke about the imminent success of <a href=\"http://trc.nist.gov/ThermoML.html\">ThermoML</a>, which is now being supported by\nvendors and publishers, smoothing the whole dissemination of data supported by this format. It is basically what\n<a href=\"http://www.xml-cml.org/\">CML</a> is attempting to achieve in molecular structure data. Day is having a good go at this with\ncrystal data, and <a href=\"http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/CMLCrystBase\">his CrystalEye project</a> is supposed to be\nlaunched next month.</p>\n\n<p>Hey, at Microsoft.com, had a rather manager level presentation, with very little value for someone into the field of\n‘data lifecycle and curation’. Rather disappointing at a scientific conference, or am I judging the ACS conference here?\nIf Microsoft is getting interested in chemoinformatics that might be a good thing, as long as they are OK with open\nsource, open data and open standards. 
Who knows…</p>\n\n<p>Rzepa had his presentation on the semantic wiki, which he, in similar form, <a href=\"http://chem-bla-ics.blogspot.com/2006/11/german-conference-on-chemoinformatics_14.html\">held at the German Chemoinformatics Conference\ntoo</a>. New, I think, were the sheets\non reasoning based on the content of the wiki. That was rather interesting. If we all would make our chemical knowledge\navailable as <a href=\"http://en.wikipedia.org/wiki/Resource_Description_Framework\">RDF</a>, then this can become a big thing very\nsoon. I skipped the presentation of Renear on ontologies, though it was actually one that I had hand picked; but I was\nsimply too tired. Will watch the podcast when available. (BTW, are they making podcasts for the CINF session only, or\nfor the whole ACS meeting?)</p>\n\n<p>In the afternoon, I also followed just a subset of the presentations. The last one was by Scott <a href=\"http://chem-bla-ics.blogspot.com/2007/03/acs-chicago-day-1.html\">who spoke earlier on\nSecond Life</a>. I’m really interested in seeing where this\nis going, though I have my reservations if this is the right medium for mining chemical knowledge. Today she spoke on the\nsocial bookmarking and podcasting initiatives at Nature and <a href=\"http://network.nature.com/\">Nature Network</a>. The latter is\na social site, like BioTechExchange, but not limited to one specific topic, and more interdisciplinary (my account\nat <a href=\"http://network.nature.com/profile/U6151BCD6\">NN</a>). I blogged <a href=\"http://chem-bla-ics.blogspot.com/2007/02/nature-network-v2-cannot-create-new.html\">about some early issues</a>\nsome time ago.</p>\n\n<h2 id=\"jmol\">Jmol</h2>\n\n<p>Jabri showed us how <a href=\"http://www.jmol.org/\">Jmol</a> is adding value to the <a href=\"http://pubs.acs.org/journals/acbcct/index.html\">ACS Chemical Biology</a>\njournal. Yes, that’s what she said. 
An opensource tool, developed by people in their free time, is making an ACS journal\nmore valuable. I am very happy to hear that, and it strongly supports our view that opensource chemoinformatics is very\nimportant. Some more support from established organizations might be in order indeed!</p>\n\n<h2 id=\"blue-obelisk\">Blue Obelisk</h2>\n\n<p>It has become a habit to organize a <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> dinner to talk about opensource, opendata,\nopenscience, and the future. Actually, we secretly talk about taking over the world, but I can’t say that. (Neither can\nI tell anything about our secret rituals. The protocol for becoming a member of our society is quite simple though: be\nclear about your opinion that ODOSOS is the future.) Dinner was great, and it was great talking to several older and\nnewer members of the movement. Cheers all! Oh, there were also some awards involved, but I hope Peter and Christoph will\nblog about that and post the pictures.</p>",
      "summary": "Tuesday promised to be an interesting day: an interesting ‘Scientific Communication’ CINF session in the morning and early afternoon. And, rather important to me, the Blue Obelisk dinner that night, just after another CINF party, where I chatted with a few others about options of a chemistry equivalent of the Google Summer of Code; who knows what happens this summer, but start thinking about ideas on how to increase the web experience of chemistry journal web pages.",
      
      "date_published": "2007-03-29T00:20:00+00:00",
      "date_modified": "2007-03-29T00:20:00+00:00",
      "tags": ["acs"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/03/29/acs-chicago-day-2.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/29/acs-chicago-day-2.html",
      "title": "ACS Chicago - Day #2",
      "content_html": "<p>The wetter was much better today. This is a view on downtown from the walking bridge between Lake Side and McCormick\nbuildings of the conference site:</p>\n\n<p><img src=\"/assets/images/dsci0028.jpg\" alt=\"\" /></p>\n\n<h2 id=\"cinf-morning\">CINF morning</h2>\n\n<p>Yeah, more CINF session reports; I’m a chemoinformatician, remember. Chen showed us around in the latest changes in\n<a href=\"http://cdb.ics.uci.edu/CHEM/Web/\">ChemDB</a>, such as retrosynthesis planning. Banik shows a patented method for\nshowing differences in a set of spectra, though his examples were not really impressive; if the method is really\npowerful, the examples might have been picked a bit more careful. And I have to say, in retrospect, I found the\npresentations in the CINF sessions typically of lower quality than I had expected for the big ACS meeting. Fortunately,\nmeeting all the people here makes more than up for that. Guha presented a potentially powerful method to cluster\nlarge and huge data sets with a method that approximated SVD by splitting up the full matrices into many smaller ones.</p>\n\n<h2 id=\"laptop-lane\">Laptop Lane</h2>\n\n<p>The idea was that us bloggers met up, but that did not quite work out. No problem though. Spoke about many things\nwith several people. <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> pointed me to an email from Noel on getting\n<a href=\"http://chemicalblogspace.blogspot.com/2007/03/jacs-toc-featuring-your-review.html\">blog comments on JACS/ChemComm/JCIM papers on the table of contents</a>.\nAnd somewhere that day, <a href=\"http://usefulchem.blogspot.com/\">Jean-Claude</a> pointed me to\n<a href=\"http://usefulchem.blogspot.com/2007/03/communicating-chemistry-at-acs.html\">more chemistry in Second Life</a>\n(set up by <a href=\"http://bethssecondlife.blogspot.com/\">Beth</a>). 
Both are great follow ups on\n<a href=\"http://chem-bla-ics.blogspot.com/2007/03/acs-chicago-day-1.html\">the blog/wiki session by CHED yesterday</a>!\nAround 17:00 we left for one of the receptions in the W hotel in one of the WOW rooms, though the view was not that spectacular:</p>\n\n<p><img src=\"/assets/images/dsci0037.jpg\" alt=\"\" /></p>\n\n<h2 id=\"bulls\">Bulls</h2>\n\n<p>I did not stay very long at the party, as I had to leave for the Bulls game against Portland. That was fun indeed!\nIf I knew my first breakfast was going to cost 22 dollars, I would have bought a better ticket, but now I had a\n<em>Stand 1</em> ticket which is so high up in the stadium that even the security people had no idea where to send\nme :) The view was still more than good enough to feel the pain/joy of the blocks and dunks-in-your-face:</p>\n\n<p><img src=\"/assets/images/dsci0042.jpg\" alt=\"\" /></p>",
      "summary": "The wetter was much better today. This is a view on downtown from the walking bridge between Lake Side and McCormick buildings of the conference site:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/dsci0028.jpg",
      "date_published": "2007-03-29T00:10:00+00:00",
      "date_modified": "2007-03-29T00:10:00+00:00",
      "tags": ["acs"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/03/26/acs-chicago-day-1.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/26/acs-chicago-day-1.html",
      "title": "ACS Chicago - Day #1",
      "content_html": "<p>I was happy to notice just a minute ago that the first blog items covering the\n<a href=\"http://acswebcontent.acs.org/nationalmeeting/chicago2007/home.html\">ACS meeting</a> are popping up: C&amp;EN has set up a\n<a href=\"http://cen07.wordpress.com/\">dedicated blog about the meeting</a>, Nature’s Sceptical Caterine\n<a href=\"http://blogs.nature.com/thescepticalchymist/\">wrote she has reached the meeting too</a>, Richard wrote about the\n<a href=\"http://www.rscweb.org/blogs/cw/\">scent of bugs in wine</a> (or so), and\n<a href=\"http://www.rscweb.org/blogs/cw/\">Kyle won’t make it other than tomorrow</a>. Additionally, Nature is running a\n<a href=\"http://blogs.nature.com/news/blog/conference_reports/american_chemical_society/\">coverage of the ACS meeting</a>.\nOn the reader side, Paul is <a href=\"http://blog.chembark.com/2007/03/25/acs-07-chicago/\">hoping that Whitesides</a>\nwill be blogged about.</p>\n\n<p>My first day at the conference was interesting. The huge facility makes navigation a bit problematic, and we\nseemed to make it a habit to explore the wrong end of the building before heading in the right direction.\nThere are a lot of maps in the ACS On-Site Meeting Program, but a nice overview map is lacking. Anyway, I\nspent the morning session in the ‘blog, wiki, and podcast session’, and the afternoon in CINF session honoring\n<a href=\"http://www.informatics.indiana.edu/people/profiles.asp?u=wiggins\">Prof. Wiggins</a>.</p>\n\n<h2 id=\"ched\">CHED</h2>\n\n<p>Vogel was the first speaker in the CHED C Section morning session, and spoke about blogs and RSS feeds in general.\n<a href=\"http://www.chemicalforums.com/index.php?topic=13540.msg62586#msg62586\">Mitch’ Yahoo Pipes hackup</a> was mentioned\nin one of the talks in this morning session. Currano followed with a discussion on social bookmarking, and so did\nPence who focussed on the function in education. 
Francl put chemical blogging in some perspective which led to a\nshort discussion on the difference in idea between blogs and wikis. Gelder and Picione spoke about podcasting as\nmultimedia blogs. Scott presented recent work by Nature in exploring Second Life technologies, and mentioned the\nchemistry on their island, which happened to host <a href=\"http://chem-bla-ics.blogspot.com/2006/12/chemoblogs-2.html\">a session of the First Online EMBL PhD Symposium last year</a>.\nBradley spoke about how he integrated blogs and wikis into their practicals. The atmosphere of the session was\nrelaxed and the discussion lively.</p>\n\n<h2 id=\"cinf\">CINF</h2>\n\n<p>The downside of all these parallel sessions is that it is bound to give clashes. It’s apparently even supposed to,\nbecause the ACS website private schedule assistant is made to make you aware and resolve such clashes. So, while\nI had to skip the CINF morning session honoring Wiggins, I had to skip the CHED session on social networking continuing\non the CHED morning session. For example, I had to miss the presentation by <a href=\"http://www.ch.ic.ac.uk/rzepa/confchem06/\">Rzepa on the semantic wiki</a>\n(Henry, I hope to have made up for it, by plugging your work here :)</p>\n\n<p>Murray-Rust was the first speaker of the CINF Section A afternoon session, and talked about mashups, text mining\nand other things done in Cambridge. He also mentioned recent Greasemonkey scripts using comments from and\nenhancing our chemical blogs, now <a href=\"http://wiki.cubic.uni-koeln.de/bowiki/index.php/Using_Javascript_and_Greasemonkey_for_Chemistry\">described at the Blue Obelisk website</a>.\n(Especially the Chemical blogspace enhanced TOC of chemistry journals is nice.) 
Wild spoke about\n<a href=\"http://djwild.info/acs07/\">integrating text mining and chemoinformatics tools</a>, and showed a mockup of a\n‘by the way’ system for PubChem, where a PubChem entry would be enhanced with ‘BTW, did you know that these 7\narticles mention this molecule, and that … etc’. These things are going to happen this year. Heller held his\nusual talk on InChI and PubChem, though the content has slightly changed since the last two versions I’ve\nseen (not the message, though). Doman gave a practical example backing up earlier statements that\ntoo much information is lost in the publication process. Heritage showed Elsevier/MDL’s view on the future of\nchemoinformatics, and accurately touched where it is currently failing. Amusingly, he pointed out that Elsevier,\nthe publisher, would love to see more accurate QSAR/QSPR/VS/etc models; ironically, it is, actually, for a\nlarge part caused by data not ending up in publications that predictive models are not as accurate as they\ncould be. So, while looking at the chemoinformaticians/metricians, they should really be looking at themselves.</p>\n\n<p>Some of these presentations mentioned directly or indirectly things I worked on. Thanx for doing that! Because I\nonly knew that there was funding for going to this meeting after the poster submission deadline had closed, I\ndo not have the opportunity to present my work myself.</p>\n\n<p>The evening is for the traditional parties, time to eat, drink, network, make deals and try to convince others\nabout the virtues of <a href=\"http://chem-bla-ics.blogspot.com/2006/10/opensource-chemistry-and-opensource.html\">ODOSOS</a>.</p>\n\n<p>A last reminder: tomorrow afternoon at 13:00 at Laptop Lane in the exposition area is a meeting of chemical\nbloggers. Please join and chat IRL for once! :)</p>",
      "summary": "I was happy to notice just a minute ago that the first blog items covering the ACS meeting are popping up: C&amp;EN has set up a dedicated blog about the meeting, Nature’s Sceptical Caterine wrote she has reached the meeting too, Richard wrote about the scent of bugs in wine (or so), and Kyle won’t make it other than tomorrow. Additionally, Nature is running a coverage of the ACS meeting. On the reader side, Paul is hoping that Whitesides will be blogged about.",
      
      "date_published": "2007-03-26T00:00:00+00:00",
      "date_modified": "2007-03-26T00:00:00+00:00",
      "tags": ["acs"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/03/25/arrived-in-chicago.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/25/arrived-in-chicago.html",
      "title": "Arrived in Chicago...",
      "content_html": "<p>I arrived in Chicago yesterday afternoon. Much warmed than the cold Chicago the ACS promised me,\nso my winter coat was really not necessary. Is this global warming? Or was the ACS simply wrong?\nAnyway, very foggy indeed, just like the <a href=\"http://wiki.cubic.uni-koeln.de/cb/blog_search.php?timeframe=10y&amp;blog_id=44\">Chemistry World blog wrote</a>:</p>\n\n<p><img src=\"/assets/images/dsci0027.jpg\" alt=\"\" /></p>\n\n<p>There were several other Dutch chemists on the plane, among which a few formed postdocs from Nijmegen,\nwho I knew from the time I was still a M.Sc. student in organic chemistry. The plane was nice too, a\nBoeing 747, the first time I flew with one. OK, there now is the new Airbus, so the Boeing has lost\nsome of its prestige, but here’s the image anyway:</p>\n\n<p><img src=\"/blog//assets/images/dsci0024.jpg\" alt=\"\" /></p>\n\n<p>The <a href=\"http://www.youtube.com/watch?v=6iTqwPj5ChE\">380 is actually supposed to be in Chicago</a>,\nbut I did not see it. OK, going for breakfast now, and to the ACS conference site afterwards.</p>",
      "summary": "I arrived in Chicago yesterday afternoon. Much warmed than the cold Chicago the ACS promised me, so my winter coat was really not necessary. Is this global warming? Or was the ACS simply wrong? Anyway, very foggy indeed, just like the Chemistry World blog wrote:",
      "image": "https://chem-bla-ics.linkedchemistry.info/blog//assets/images/dsci0024.jpg",
      "date_published": "2007-03-25T00:00:00+00:00",
      "date_modified": "2007-03-25T00:00:00+00:00",
      "tags": ["acs"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/03/22/chicago-bulls-here-i-come.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/22/chicago-bulls-here-i-come.html",
      "title": "Chicago (Bulls), here I come!",
      "content_html": "<p>I had some fun today with making prints of reservations etcetera for my trip to the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings%5Cchicago2007%5Chome.html\">ACS conference in Chicago</a>.\nWent over to the website to make a print of the location of the hotel I am in.\n(<a href=\"http://chicago.intercontinental.com/\">Intercontinental Chicago</a>: in case you want to leave me a message to\nmeet up over breakfast or so.) Anyway, so at the ACS website I found a notice that the ACS Housing people\nclosed down and that I should contact the hotel directly. Fine, no problem. Oh wait, my hotel is not in the\nlist. No worries, I just enter my last name and acknowledgment number. Huh, they don’t know me?? Already\nworried about which bridge to use as backup alternative, I emailed the organization which now takes care\nof it, being answered some 15 minutes later that they no longer do the hotel administration for that ACS\nconference anymore. That indeed rang some bell; I went over back to the ACS webpage, and this time found\nthe correct ACS housing webpage. I had been using one from a previous ACS conference. Yeah, one of my\nfinest hours :) Things are sorted out now, as I had already email the hotel too. Things are fine, and so\nmy nerviness activity is back to normal. 
(If you care to reproduce, just go to the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings\\national\\international.html\">page for International Visitors</a>\nlinked from the Chicago conference homepage, scroll down to “Preparing for Your ACS Meeting Experience” and click the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings\\national\\housing.html\">Hotel Information link</a>.\nMakes sense, because the international guests already know how things work :) And, yes, I could have seen it mention\nSA in the subtitle, I know.)</p>\n\n<h2 id=\"my-acs-schedule\">My ACS Schedule</h2>\n\n<p>My schedule is pretty regular, filled mostly with CINF and COMP presentations. The Monday and Wednesday\nafternoons are empty, though there were some <a href=\"http://chemicalblogspace.blogspot.com/2007/03/chemical-blogspace-getting-physical-at.html\">plans to meet up with bloggers</a>\n(<a href=\"http://gaussling.wordpress.com/2007/03/07/bloggenvolk/\">and here</a>) on Monday afternoon, which I still\nthink we should do, even though <a href=\"http://gaussling.wordpress.com/2007/03/19/bloggenvolk-acs-chicago-meeting-minus-gaussling/\">Gaussling had to back out</a>.\nI hereby suggest we meet at 13:00 at <a href=\"http://map.mapnetwork.com/tradeshow/chicago/acs/\">Laptop Lane on the ACS Show Floor</a>.\nSunday evening there are all sorts of parties, and some colleagues and I are going to the party for\ninternational guests at the Sheraton Chicago. Monday evening is reserved for the <a href=\"http://www.nba.com/bulls/\">Bulls</a>.\nI am indeed some 10 years late, but happy to finally be able to visit an NBA stadium during the season. They play\nagainst Portland on Monday, and against Detroit on Tuesday. 
I figure that game would have been nicer, but\nTuesday evening is reserved for the <a href=\"http://blueobelisk.org/\">Blue Obelisk</a> social event at the\n<a href=\"http://hardly.cubic.uni-koeln.de/pipermail/blue-obelisk/2007-March/001125.html\">South Water Kitchen</a>.</p>\n\n<p>Suggestions, like <a href=\"http://blind-science.blogspot.com/2007/03/if-i-were-going-to-chicago.html\">these from Carmen</a>\nare most welcome!</p>",
      "summary": "I had some fun today with making prints of reservations etcetera for my trip to the ACS conference in Chicago. Went over to the website to make a print of the location of the hotel I am in. (Intercontinental Chicago: in case you want to leave me a message to meet up over breakfast or so.) Anyway, so at the ACS website I found a notice that the ACS Housing people closed down and that I should contact the hotel directly. Fine, no problem. Oh wait, my hotel is not in the list. No worries, I just enter my last name and acknowledgment number. Huh, they don’t know me?? Already worried about which bridge to use as backup alternative, I emailed the organization which now takes care of it, being answered some 15 minutes later that they no longer do the hotel administration for that ACS conference anymore. That indeed rang some bell; I went over back to the ACS webpage, and this time found the correct ACS housing webpage. I had been using one from a previous ACS conference. Yeah, one of my finest hours :) Things are sorted out now, as I had already email the hotel too. Things are fine, and so my nerviness activity is back to normal. (If you care to reproduce, just go to the page for International Visitors linked from the Chicago conference homepage, scroll down to “Preparing for Your ACS Meeting Experience” and click the Hotel Information link. Makes sense, because the international guests already know how things work :) And, yes, I could have seen it mention SA in the subtitle, I know.)",
      
      "date_published": "2007-03-22T00:00:00+00:00",
      "date_modified": "2007-03-22T00:00:00+00:00",
      "tags": ["acs","chemistry","blue-obelisk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/03/16/pipelining-chemical-information-with.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/16/pipelining-chemical-information-with.html",
      "title": "Pipelining chemical information with Yahoo Pipes",
      "content_html": "<p>Chemists are picking up <a href=\"http://pipes.yahoo.com/\">Yahoo Pipes</a>, or, as Noel calls them,\n<a href=\"http://www.mail-archive.com/blue-obelisk@hardly.cubic.uni-koeln.de/msg00120.html\">Pipeline Pilot for RSS feeds</a>.\nI tend to agree, as the source of the workflows are closed, that is, at least require registering to the Yahoo webpage.</p>\n\n<p>Several chemical applications have been developed since. One was developed by <a href=\"http://msblog.kermitmurray.com/\">Kermit</a>\nwho wrote an <a href=\"http://msblog.kermitmurray.com/2007/02/yahoo-pipes-mass-spectrometry.html\">aggregator for mass spectrometry journal articles</a>.\nAnd <a href=\"http://www.chemicalforums.com/index.php?action=profile;u=2\">Mitch</a> has set up a\n<a href=\"http://www.chemicalforums.com/index.php?topic=13458.msg62253#msg62253\">similar feature for ACS journals</a>.</p>\n\n<p>Now, what I am really waiting for, are the first applications that deal with molecular structures, and a\npipe that alerts me about publications in which molecules are discussed matching a certain\n<a href=\"https://doi.org/10.1021/ci600305h\">MQL molecular query for an interesting substructure</a>.</p>",
      "summary": "Chemists are picking up Yahoo Pipes, or, as Noel calls them, Pipeline Pilot for RSS feeds. I tend to agree, as the source of the workflows are closed, that is, at least require registering to the Yahoo webpage.",
      
      "date_published": "2007-03-16T00:00:00+00:00",
      "date_modified": "2007-03-16T00:00:00+00:00",
      "tags": ["rss","chemistry","publishing"],
      "_references": [{ "url": "https://doi.org/10.1021/ci600305h" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/03/14/what-is-dapagliflozin.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/14/what-is-dapagliflozin.html",
      "title": "What is dapagliflozin?",
      "content_html": "<p><a href=\"http://www.qdinformation.com/qdisblog\">QDIS</a> blogged about <a href=\"http://www.qdinformation.com/qdisblog/2007/01/11/bristol-myers-and-astrazeneca-in-1-billion-drug-pact/\">Bristol-Myers and AstraZeneca teaming up for a new drug called\ndapagliflozin</a>. Now,\ndapagliflozin is, this week, the most used search keyword in <a href=\"http://www.google.com/\">Google</a>, leading to\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>.</p>\n\n<p>I wondered what the chemical structure of this compound is. The <a href=\"http://www.astrazeneca.com/\">AstraZeneca</a> and\n<a href=\"http://www.bms.com/\">Bristol-Myers Squibb</a> websites don’t say. Since everything in pharma is patented I went to the US\npatent database and a search for <em>DPP-4 AND inhibitor</em> found the patents <a href=\"http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=DPP-4&amp;s2=inhibitor&amp;OS=DPP-4+AND+inhibitor&amp;RS=DPP-4+AND+inhibitor\">6,995,183</a>\nand <a href=\"http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=2&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=DPP-4&amp;s2=inhibitor&amp;OS=DPP-4+AND+inhibitor&amp;RS=DPP-4+AND+inhibitor\">6,995,180</a>.\nBut that does not help me either.</p>\n\n<p>Does anyone know the chemical structure of this compound? Just the InChI would be fine…</p>",
      "summary": "QDIS blogged about Bristol-Myers and AstraZeneca teaming up for a new drug called dapagliflozin. Now, dapagliflozin is, this week, the most used search keyword in Google, leading to Chemical blogspace.",
      
      "date_published": "2007-03-14T00:00:00+00:00",
      "date_modified": "2007-03-14T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/03/08/fast-molecular-similarity-with-new-3d.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/03/08/fast-molecular-similarity-with-new-3d.html",
      "title": "Fast molecular similarity with a new 3D shape descriptor",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing\">Jim</a> reported about <a href=\"http://www.dspace.cam.ac.uk/handle/1810/183858\">SPECTRa</a>\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/?p=79\">being in the news</a> and <a href=\"http://www.slashdot.org/\">./</a> about\n<a href=\"http://developers.slashdot.org/developers/07/03/08/1638241.shtml\">Toward a 3D Search Engine</a>. These two items have in\ncoming that they deal with the article <em>Ultrafast shape recognition for similarity search in molecular databases</em> by\nBallester and Richards (DOI:<a href=\"https://doi.org/10.1098/rspa.2007.1823\">10.1098/rspa.2007.1823</a>). The NewScientist wrote\nup <a href=\"http://www.newscientisttech.com/article/dn11283-novel-search-engine-matches-molecules-in-a-flash.html\">their angle on it</a>,\nwith a quote from <a href=\"http://www.ch.ic.ac.uk/local/organic/mod/\">Henry Rzepa</a>.</p>\n\n<p>The article proposes a new shape descriptor which is requires little computational resources to be calculated. It consists\nof 12 numbers describing the shape, and a simple similarity measure converts it into similarities. The results shown in\nthe article, and replicated in the NewScientist article linked above, are interesting enough for me to wonder if I could\n<a href=\"http://cia.navi.cx/stats/author/f_marighetti\">Federico</a>, one of our <a href=\"http://almost.cubic.uni-koeln.de/jrg/\">CUBIC</a>\nstudents, to work on this in the last two weeks of his practical.</p>\n\n<p>BTW, <a href=\"http://andygoesus.blogspot.com/\">Andreas</a>, don’t those review articles (viz.\nDOI:<a href=\"https://doi.org/10.1039/b409813g\">10.1039/b409813g</a>) work out good for your citation count ;)</p>",
      "summary": "Jim reported about SPECTRa being in the news and ./ about Toward a 3D Search Engine. These two items have in coming that they deal with the article Ultrafast shape recognition for similarity search in molecular databases by Ballester and Richards (DOI:10.1098/rspa.2007.1823). The NewScientist wrote up their angle on it, with a quote from Henry Rzepa.",
      
      "date_published": "2007-03-08T00:00:00+00:00",
      "date_modified": "2007-03-08T00:00:00+00:00",
      
      "_references": [{ "url": "https://doi.org/10.1098/rspa.2007.1823" },{ "url": "https://doi.org/10.1039/b409813g" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/02/23/nature-network-v2-cannot-create-new.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/23/nature-network-v2-cannot-create-new.html",
      "title": "Nature Network v2: cannot create a new group",
      "content_html": "<p><a href=\"http://blogs.nature.com/wp/nascent/2007/02/nature_network_v2_is_live.html\">Nascent</a> reported that\n<a href=\"http://network.nature.com/\">Nature Network v2</a> has gone life. Never too anxious to try something new,\nI created an account and signed in. I even joined two groups: <em>Bioinformatics</em> and <em>Semantic Web for the Life Sciences</em>.</p>\n\n<p>But, when I tried to create a new group, the system fails. I promised me to send me email for confirmation.\nTried it twice via my <a href=\"http://www.sf.net/\">Sourceforge</a> email account. No email. I then changed my email\nfor my Nature account to my Gmail address. Still no email…</p>\n\n<p>I am not located in Boston or London, is that the problem? Is being ‘global’ not good enough? Is the requirement\nto have two ‘o’s in the name? Cologne then, maybe?</p>\n\n<h2 id=\"missing-features\">(Missing) Features</h2>\n\n<p>For the rest, the system seems interesting. I am not too fond of having to create accounts all over the place\n(<em>what was the password again???</em>), but looks promising. The thing I missed most when filling out my profile\nwas a feature to import the list of my publications from <a href=\"http://www.connotea.org/\">Connotea</a>.</p>\n\n<p>Another thing I missed, was the ability to mention my blog(s) in my profile. May I put this in as request too?\nBTW, is there a group or forum on Nature Network where I can file these things?</p>",
      "summary": "Nascent reported that Nature Network v2 has gone life. Never too anxious to try something new, I created an account and signed in. I even joined two groups: Bioinformatics and Semantic Web for the Life Sciences.",
      
      "date_published": "2007-02-23T00:00:00+00:00",
      "date_modified": "2007-02-23T00:00:00+00:00",
      "tags": ["nature"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/02/20/invisible-inchis.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/20/invisible-inchis.html",
      "title": "Invisible InChI&apos;s",
      "content_html": "<p>Some <a href=\"http://www.iupac.org/inchi/\">InChI</a>’s are short, such as that for methane: <span class=\"chem:inchi\">InChI=1/CH4/h1H4</span>.\nOthers are long (think <a href=\"http://chem-bla-ics.blogspot.com/2006/03/inchis-in-latex-and-cdk-news.html\">crambin</a>), and you don’t\nwant to show them inline. Or you just want to show them anyway, but still want the chemistry to be understood. Here come the\ninvisible InChI’s.</p>\n\n<h2 id=\"alt-text-for-images\">Alt text for images</h2>\n\n<p>One solution is to put the InChI as content of the @alt attribute of the HTML <code class=\"language-plaintext highlighter-rouge\">&lt;img&gt;</code> element. This has the downside that it\nhas no explicit semantic meaning. For example, the <a href=\"http://scienceblogs.com/moleculeoftheday/\">Molecule Of The Day</a> blog is using\nthis approach. It’s an excellent start, but not the solution.</p>\n\n<h2 id=\"as-keyword\">As Keyword</h2>\n\n<p>Another option is to put it in as keyword, in the HTML <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code> element: <code class=\"language-plaintext highlighter-rouge\">&lt;meta name=\"keywords\" content=\"InChI=1/CH4/h1H4\"&gt;</code>.\nBut Google does not index this, so the use is restricted.</p>\n\n<h2 id=\"invisible-text\">Invisible text</h2>\n\n<p>The most promosing alternative, however, is to put it in using the <code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code> element, in combination with microformats or RDFa,\nLike this: <span class=\"chem:inchi\" style=\"font-size: 0%; visibility: hidden;\">InChI=1/CH4/h1H4</span>.\nIt does not show up, does it? 
But it is really there, as you would see, if you have\n<a href=\"http://chem-bla-ics.blogspot.com/2006/12/chemistry-in-html-greasemonkey-again.html\">the special Greasemonkey</a> installed.</p>\n\n<p>This is the HTML code for this example:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;span</span> <span class=\"na\">class=</span><span class=\"s\">\"chem:inchi\"</span> <span class=\"na\">style=</span><span class=\"s\">\"font-size: 0%; visibility: hidden;\"</span><span class=\"nt\">&gt;</span>InChI=1/CH4/h1H4<span class=\"nt\">&lt;/span&gt;</span>\n</code></pre></div></div>\n\n<p>The <code class=\"language-plaintext highlighter-rouge\">@style</code> attribute marks the text’s visibility as hidden, and the font-size is set to 0%. It is important not to set it\nto zero itself, because many web browsers do not interpret zero font size correctly, and take the default font size instead.</p>\n\n<p>This should solve the standing problem that we would like to include the InChI’s in our blogs, if it would just not be so\nlong and unreadable. Just hide it.</p>\n\n<p><strong>Update</strong>: Daniel <a href=\"https://web.archive.org/web/20070514085137/https://chem-bla-ics.blogspot.com/2007/02/invisible-inchis.html#comment-6321491648638004528\">informed</a>\nme that Google won’t index text marked ‘visibility: hidden’ and may even mark your webpage as spam :( Not the solution either.\nRead the comments for more thoughts.</p>",
      "summary": "Some InChI’s are short, such as that for methane: InChI=1/CH4/h1H4. Others are long (think crambin), and you don’t want to show them inline. Or you just want to show them anyway, but still want the chemistry to be understood. Here come the invisible InChI’s.",
      
      "date_published": "2007-02-20T00:00:00+00:00",
      "date_modified": "2007-02-20T00:00:00+00:00",
      "tags": ["inchi","html"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/02/19/pimp-my-javadoc.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/19/pimp-my-javadoc.html",
      "title": "Pimp my JavaDoc",
      "content_html": "<p><a href=\"http://miningdrugs.blogspot.com/\">Jörg</a>’s PhD book <em>Data Mining und Graph Mining auf molekularen Graphen - Chemoinformatik und\nmolekulare Kodierungen für ADME/Tox-QSAR-Analysen</em> has a dump of the JavaDoc of the <code class=\"language-plaintext highlighter-rouge\">GroupContributionPredictor</code> in\n<a href=\"http://joelib.sf.net/\">JOELib</a> (Figure 3.2, page 43). There are two nice things to the shown JavaDoc: 1. it has links to\n<a href=\"http://www.wikipedia.org/\">Wikipedia</a>; 2. it has a Further Reading section.</p>\n\n<p>Now, the <a href=\"http://cdk.sf.net/\">CDK</a> already links to a bibliography for some time now. However, it would just give a BibTex\nkey, and link to a webpage created from a <a href=\"http://bibtexml.sf.net/\">BibTeXML</a> file in which we store all references\n(<a href=\"http://cdk.svn.sourceforge.net/viewvc/cdk/trunk/cdk/doc/refs/cheminf.bibx?view=log\">cdk/doc/refs/cheminf.bibx</a>).\nPutting the full citation inline makes the JavaDoc more informative, but I wanted to preserve the <code class=\"language-plaintext highlighter-rouge\">@cdk.cite</code>\nmechanism we were using.</p>\n\n<p>This weekend I hacked up a nice CDKCiteDoclet that would read the BibTeXML file with <a href=\"http://www.xom.nu/\">XOM</a>,\nand convert items to HTML to put into the pimped JavaDoc:</p>\n\n<p><img src=\"/assets/images/pimpedJavaDoc.png\" alt=\"\" /></p>",
      "summary": "Jörg’s PhD book Data Mining und Graph Mining auf molekularen Graphen - Chemoinformatik und molekulare Kodierungen für ADME/Tox-QSAR-Analysen has a dump of the JavaDoc of the GroupContributionPredictor in JOELib (Figure 3.2, page 43). There are two nice things to the shown JavaDoc: 1. it has links to Wikipedia; 2. it has a Further Reading section.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/pimpedJavaDoc.png",
      "date_published": "2007-02-19T00:00:00+00:00",
      "date_modified": "2007-02-19T00:00:00+00:00",
      "tags": ["cdk","javadoc","literature"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/02/17/is-that-jmol-in-that-d-wave-demo.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/17/is-that-jmol-in-that-d-wave-demo.html",
      "title": "Is that Jmol in that D-Wave demo?",
      "content_html": "<p><a href=\"http://science.slashdot.org/article.pl?sid=07/02/15/1417236&amp;from=rss\">Slashdot reported</a> on\n<a href=\"http://www.dwavesys.com/\">D-Wave</a>’s recent demo of their 16-<a href=\"http://en.wikipedia.org/wiki/Qubit\">qubit</a>\nquantum computing system. <a href=\"http://kwc.org/blog/archives/2007/2007-02-14.dwave_demo.html\">Video’s of the demo</a>\ncan be watched on <a href=\"http://video.google.com/\">Google Video</a>. The <a href=\"http://video.google.com/videoplay?docid=-291541120357804188&amp;hl=en\">second video</a>\ndemonstrates the use of the machine in similarity searching:</p>\n\n<p><img src=\"/assets/images/dwaveDemo.png\" alt=\"\" /></p>\n\n<p>Now, that screenshot does look like <a href=\"http://jmol.sf.net/\">Jmol</a>.\nThe companies website does not give the answer, <a href=\"http://scottaaronson.com/blog/?p=198\">though Scott mentions C and Java front end software</a>.</p>\n\n<p>So, let’s ask the source: Dear dr. <a href=\"http://dwave.wordpress.com/\">Rose</a>, is it Jmol what we see in that demo?</p>",
      "summary": "Slashdot reported on D-Wave’s recent demo of their 16-qubit quantum computing system. Video’s of the demo can be watched on Google Video. The second video demonstrates the use of the machine in similarity searching:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/dwaveDemo.png",
      "date_published": "2007-02-17T00:00:00+00:00",
      "date_modified": "2007-02-17T00:00:00+00:00",
      "tags": ["jmol","quantum"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/02/04/writing-up-my-phd-introduction-chapter.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/04/writing-up-my-phd-introduction-chapter.html",
      "title": "Writing up my PhD introduction chapter...",
      "content_html": "<p>The last twelve months or so, I have been doing two jobs (excluding hobbies of mine, such as\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a>): my postdoc in <a href=\"http://almost.cubic.uni-koeln.de/jrg\">the group of Christoph Steinbeck</a>\non computer aided structure elucidation, and finishing my PhD. The topic of my PhD is about the interplay between chemoinformatics\nand chemometrics: the first being strong in dealing with molecular structures, the latter strong in data analysis and mining,\noriginally on experimental data. Really, I focused on a few existing problems, such as how to represent and analyze large\nlibraries of crystal structures, the use of NMR spectra in QSAR studies, and two more practical problems regarding reproducibility\nof scientific results, which includes communication of data, and transferability of algorithms. Actually, I also studied fragment\nmining in QSAR for a set of transfactants, but that has not lead to firm results yet.</p>\n\n<p>The below diagram shows how I see the interplay between both fields:</p>\n\n<p><img src=\"/assets/images/rodeDraad.png\" alt=\"\" /></p>",
      "summary": "The last twelve months or so, I have been doing two jobs (excluding hobbies of mine, such as Chemical blogspace): my postdoc in the group of Christoph Steinbeck on computer aided structure elucidation, and finishing my PhD. The topic of my PhD is about the interplay between chemoinformatics and chemometrics: the first being strong in dealing with molecular structures, the latter strong in data analysis and mining, originally on experimental data. Really, I focused on a few existing problems, such as how to represent and analyze large libraries of crystal structures, the use of NMR spectra in QSAR studies, and two more practical problems regarding reproducibility of scientific results, which includes communication of data, and transferability of algorithms. Actually, I also studied fragment mining in QSAR for a set of transfactants, but that has not lead to firm results yet.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/rodeDraad.png",
      "date_published": "2007-02-04T00:00:00+00:00",
      "date_modified": "2007-02-04T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/02/03/cdk-workshop-days-3-and-4.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/03/cdk-workshop-days-3-and-4.html",
      "title": "CDK Workshop - Days #3 and #4",
      "content_html": "<p>Days #3 and #4 of the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=spring2007workshop\">CDK Workshop</a> have been\nquite busy indeed, and I have not been able to summarize them so far. After <a href=\"http://chem-bla-ics.blogspot.com/2007/01/cdk-workshop-day-2.html\">a rather interesting day #2</a>,\nthe third day was the last one with scheduled presentations. Kai Hartmann showed how he used the CDK in his systems\nbiology research, and contributed the code he wrote to predict Gibbs energies based on fragment contributions.\nMiguel Rojas showed his MS prediction work, which is based on the CDK too.</p>\n\n<p>Much of the rest of day and Thursday continued on the work started yesterday: making the 3D structure builder a\nsingleton class, and applying and testing an optimization for the AllRingsFinder to address\n<a href=\"http://chem-bla-ics.blogspot.com/2007/01/cdk-workshop-day-2.html\">molecules like Choloyl-CoA</a>. The trick\nbasically consists of applying the all rings finding algorithm to isolated systems only. The effect is\nconsiderable: the total computation time for Choloyl-CoA decreases by a 93 fold! We found that the\nfingerprints used in the template library for the 3D structure builder are outdated, and Christoph worked\non updating that, which required searching into old archives to find the tool to do just this.</p>\n\n<p>Because the above performance fix did not fix the current slow SMILES parsing, Kai looked at the\n<code class=\"language-plaintext highlighter-rouge\">DeduceBondOrderTool</code> which is the slow component, and optimized the used algorithm by reusing determined\nmolecular ring systems. Nevertheless, on users requests, a time out mechanism is now available for SMILES\nparsing. Additionally, several of the bugs found on the second workshop day have been fixed. Meanwhile,\nI was distracted by other things. 
For example, fixing <a href=\"http://www.bioclipse.net/\">Bioclipse</a> bugs for\n<a href=\"http://bioclipse.blogspot.com/2007/02/bioclipse-101-released.html\">the version 1.0.1 released yesterday</a>.\nThe SENECA tool has not been forgotten either, and last weekend I made some good progress with it,\n<a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=15\">which Christoph blogged about</a>.</p>",
      "summary": "Days #3 and #4 of the CDK Workshop have been quite busy indeed, and I have not been able to summarize them so far. After a rather interesting day #2, the third day was the last one with scheduled presentations. Kai Hartmann showed how he used the CDK in his systems biology research, and contributed the code he wrote to predict Gibbs energies based on fragment contributions. Miguel Rojas showed his MS prediction work, which is based on the CDK too.",
      
      "date_published": "2007-02-03T00:00:00+00:00",
      "date_modified": "2007-02-03T00:00:00+00:00",
      "tags": ["cdk","smiles"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/02/01/rsc-first-publisher-to-go-semantic.html",
      "title": "RSC: the first publisher to go semantic!",
      "content_html": "<p>Just announced: <a href=\"http://web.archive.org/web/20070211195109/http://www.rsc.org/Publishing/Journals/ProjectProspect/index.asp\">the RSC goes semantic <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>!\nColin Batchelor was here at the CUBIC last autumn, where we discussed issues involved, mostly relating to\nexperimental section of organic chemistry syntheses, and NMR and MS spectra in particular, so I knew that\nthis was coming our way. The announcement writes:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>RSC Publishing, the publishing arm of the Royal Society of Chemistry, is\npleased to announce a new initiative for its journals. From February\n2007 electronic RSC journal papers will be enhanced so that their data\ncan be read, indexed and intelligently searched by machine, a first step\ntowards the \"semantic web\".\n\nReaders will be able to click on named compounds and scientific concepts\nin an electronic journal article to download structures, understand\ntopics, or link through to electronic databases; compounds and ontology\nterms will be published as RSS feeds enabling automated discovery of\nrelevant research.\n\nThe initiative, coined 'Project Prospect', is the first of its scope\nfrom a primary research publisher. Developed together with UK academics\nbased at the Unilever Centre of Molecular Informatics and the Computing\nLaboratory at Cambridge University, the Project uses InChIs (IUPAC's\nInternational Chemical Identifier for compounds); OBO ontology terms\n(Open Biomedical Ontologies: a hierarchical classification of biomedical\nterms) such as the Gene Ontology (GO) and the related Sequence Ontology\n(SO); terms from the IUPAC Gold Book; and CML (Chemical Markup Language:\na means to describe molecular information in a structured form).\n\nThis is a completely free service for authors and readers of RSC\njournals. 
The enhanced articles have an at a glance HTML view with\nadditional features accessed by a tool box. Downloadable compound\nstructures and printer friendly versions will be available via this new\nservice.\n</code></pre></div></div>\n\n<p>Colin, cheers!</p>",
      "summary": "Just announced: the RSC goes semantic ! Colin Batchelor was here at the CUBIC last autumn, where we discussed issues involved, mostly relating to experimental section of organic chemistry syntheses, and NMR and MS spectra in particular, so I knew that this was coming our way. The announcement writes:",
      
      "date_published": "2007-02-01T00:00:00+00:00",
      "date_modified": "2024-10-14T00:00:00+00:00",
      "tags": ["semweb","chemistry","publishing"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/30/cdk-workshop-day-2.html",
      "title": "CDK Workshop - Day #2",
      "content_html": "<p>Because of other obligations, I was unable to attend the first day of the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=spring2007workshop\">CDK Workshop</a>,\nthough Christoph had set up Skype so that at least I could hear the talks from <a href=\"http://www.inf.uni-konstanz.de/bioml/staff/berthold/\">Prof. Berthold</a>\n(Konstanz, Germany) about <a href=\"http://www.knime.org/\">KNIME</a> and <a href=\"http://almost.cubic.uni-koeln.de/cosi/curriculumVitae_zielesny.htm\">Prof. Zielesny</a>\nabout <a href=\"http://cdk-taverna.de/\">CDK-Taverna</a>.</p>\n\n<p>Today, Miguel Rojas and Stefan Kuhn discussed their research. Miguel showed the state of mass spectrum prediction using the <a href=\"http://cdk.sf.net/\">CDK</a>\nand the MEDEA plugin for <a href=\"http://www.bioclipse.net/\">Bioclipse</a>. Stefan demonstrated the <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a>\nand a new lab systems for NMR experiment scheduling and management system based on that. <a href=\"http://www2.cmbi.ru.nl/who-and-where/staff/27/\">Dr. Ott</a>\n(Nijmegen, Netherlands) showed the <a href=\"http://biometa.cmbi.ru.nl/\">BioMeta Database</a> which contains metabolite and reaction information derived from the\n<a href=\"http://www.genome.jp/kegg/ligand.html\">KEGG</a>, but which fixes a set of chemical problems in the latter (see also the article,\nDOI:<a href=\"https://doi.org/10.1186/1471-2105-7-517\">10.1186/1471-2105-7-517</a>).</p>\n\n<p>The afternoons of CDK workshops traditionally have discussion sessions and hackathons. 
Two groups were formed: one consisted of the KNIME guys who,\ntogether with Miguel and Federico, focused on QSAR descriptor calculations in KNIME, while Stefan, Martin and I looked at the fingerprinter\npeculiarities that Martin found (see also this <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/archive/cdknews2.2.article22.pdf\">CDK News article</a>),\nand came up with a possible further performance improvement of the AllRingsFinder. Because one class of molecules that is causing trouble consists of two\nring systems connected by a long linker, like Choloyl-CoA (below), we anticipate that splitting the molecule up into ring systems prior to using the\nSSSR algorithm should speed up the complete all-ring finding process.</p>\n\n<p><img src=\"/assets/images/choloyl-coa.png\" alt=\"\" /></p>\n\n<p>Currently, the spanning tree is calculated before deciding whether to use the SSSR finder; we think this spanning tree can be used to partition the molecule\ninto separate ring systems. On each of them, then, the further steps of the ring search can be applied.</p>\n\n<p>After dinner (pasta/pizza), during the Spanish-German handball game, we continued the hacking and discussions, now focusing as a whole group\non QSAR descriptors in KNIME. 
We looked at each descriptor and decided whether it should go into a QSAR calculator node, or even into a node of its own.</p>\n\n<h2 id=\"bugs-found\">Bugs found</h2>\n<p>I won’t close this blog entry without giving a list of problems we found in the current CDK; some minor and small, some more troublesome.\nHere goes: typos all over the place; the OrderQueryBond lacks a return statement in an else clause; the Mol2Reader does not mark atom and\nbond aromaticity properly and reads a single bond as aromatic, and an aromatic bond as single; the Renderer2D does not always highlight\nboth atoms when hovering over a bond; SmilesGenerator.parseBond() should output bond orders correctly; the SSSR finder seems to have a\nmessed-up if-else statement for the ringBondCount limit of 37; the BondCount descriptor should count all bonds by default, not just the\nsingle bonds; <code class=\"language-plaintext highlighter-rouge\">IDescriptor.getParameters()</code> should return null instead of <code class=\"language-plaintext highlighter-rouge\">Object[0]</code>; several programs use the SYBYL atomtype S.o2, while\nthe specification and the CDK config define S.O2; the IP descriptor now returns a variable-length descriptor.</p>",
      "summary": "Because of other obligations, I was unable to attend the first day of the CDK Workshop, though Christoph had set up Skype so that at least I could hear the talks from Prof. Berthold (Konstanz, Germany) about KNIME and Prof. Zielesny about CDK-Taverna.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/choloyl-coa.png",
      "date_published": "2007-01-30T00:00:00+00:00",
      "date_modified": "2007-01-30T00:00:00+00:00",
      "tags": ["cdk","kegg","knime","smiles","taverna"],
      "_references": [{ "url": "https://doi.org/10.1186/1471-2105-7-517" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/24/osmb2007-day-1-venture-capital.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/24/osmb2007-day-1-venture-capital.html",
      "title": "OSMB2007 Day #1: venture capital, scientific blogger and Kepler",
      "content_html": "<p>The second day just started of the <a href=\"http://www.heise.de/veranstaltungen/2007/ho_osb/en/\">Open Source Meets Business</a>,\nand now actually listening to the PHP talk, but here is a short update on day 1, which was the investment summit. It\nwas not so crowded, but especially the talks from the venture capitalists were interesting. During lunch we actually\ntalked to one in person, which was insightful. I will be putting up links to interesting sites mentioned during this\nconference on my <a href=\"http://del.icio.us/egonw/OSMB2007\">delicious account</a>.</p>\n\n<p>Nothing much more I can tell about this, except for a few general quotes:</p>\n\n<ul>\n  <li>2% of the downloaders become paying customers</li>\n  <li>an active community is important, cherish it</li>\n  <li>support as business model is not interesting for venture capatilists</li>\n  <li>don’t think you understand the legal implications</li>\n</ul>\n\n<p>Noteworthy is that we have free wireless at the conference site :) So I downloaded a recent\n<a href=\"http://drexel-coas-talks-mp3-podcast.blogspot.com/2007/01/nc-science-blogging-conference.html\">presentation by Jean-Claude about his open science work and blogging efforts</a>,\nwhich I enjoyed watching very much. I skyped with my wife and children, and I booked a hotel for the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings%5Cchicago2007%5Chome.html\">ACS meeting in March in Chicago</a>,\nas chances are high that I will attend that meeting.</p>\n\n<p>Last night it started snowing, and it is completely white outside right now. The temperature has dropped to normal\nwinter season, which made the burritos in downtown Nuernberg extra nice. 
Later today, Christoph’s\n<a href=\"http://www.chemoinformatics.org/\">COSI</a> talk is scheduled, and I was delighted to learn via\n<a href=\"http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace</a> that\n<a href=\"http://cszamudio.spaces.live.com/Blog/cns!9BCF6F9D6772B8F5!1461.entry\">Carlos blogged about it yesterday</a>!\nCheers Carlos! In the same blog he also mentions that he is integrating the <a href=\"http://cdk.sf.net/\">CDK</a>\nwith something called Kepler. Carlos, if you read this: what is the URL for Kepler?</p>",
      "summary": "The second day just started of the Open Source Meets Business, and now actually listening to the PHP talk, but here is a short update on day 1, which was the investment summit. It was not so crowded, but especially the talks from the venture capitalists were interesting. During lunch we actually talked to one in person, which was insightful. I will be putting up links to interesting sites mentioned during this conference on my delicious account.",
      
      "date_published": "2007-01-24T00:00:00+00:00",
      "date_modified": "2007-01-24T00:00:00+00:00",
      "tags": ["cdk","acs","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/24/blogging-and-press.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/24/blogging-and-press.html",
      "title": "Blogging and the Press",
      "content_html": "<p>Today at the <a href=\"http://chem-bla-ics.blogspot.com/2007/01/osmb2007-day-1-venture-capital.html\">OSMB</a> we had again a good\nlunch again, and Rachel Sterne joined our table. She works at a New York based start up\n<a href=\"http://groundreport.com/articles.php?id=274\">Ground Report</a>, which is a news website where anyone, including bloggers,\ncan post news stories. Not links to news stories, as on <a href=\"http://slashdot.org/\">Slashdot</a>, but actual news stories.\nStories that can be committed are not restricted to any topic, or country, or whatever. The good news is that the\nrevenues out of advertisement is shared with the people that submit the stories, 50/50 even, if I understood correctly.\nThe more visitor hits your story gets, the bigger your part of the revenue is.</p>\n\n<p>Now, the reason why I advertise this, is that <a href=\"http://blog.chembark.com/2007/01/22/blogging-creds/\">Paul recently blogged about the status of bloggers as members of the\npress</a>. ACS does not seem to think so, though even\n<a href=\"http://www.pulitzer.org/resources/onlinerel.html\">the Pulizer organization disagrees</a>. The ACS requires that\nfreelancers are connected to an news organization, and I am wondering wether they would accept Ground Report as such…</p>",
      "summary": "Today at the OSMB we had again a good lunch again, and Rachel Sterne joined our table. She works at a New York based start up Ground Report, which is a news website where anyone, including bloggers, can post news stories. Not links to news stories, as on Slashdot, but actual news stories. Stories that can be committed are not restricted to any topic, or country, or whatever. The good news is that the revenues out of advertisement is shared with the people that submit the stories, 50/50 even, if I understood correctly. The more visitor hits your story gets, the bigger your part of the revenue is.",
      
      "date_published": "2007-01-24T00:00:00+00:00",
      "date_modified": "2007-01-24T00:00:00+00:00",
      "tags": ["acs","blog"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/22/open-source-meets-business-2007.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/22/open-source-meets-business-2007.html",
      "title": "Open Source Meets Business 2007",
      "content_html": "<p>Today I leave for a two day visit at the <a href=\"http://www.heise.de/veranstaltungen/2007/ho_osb/en/\">Open Source Meets Business</a>\nconference in Nürnberg, where <a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph</a> will speak about the\n<a href=\"http://chemoinformatics.org/\">Chemoinformatics OpenSource Initiative</a> (COSI). If you happen to go to that meeting too,\nlet’s try to meet!</p>",
      "summary": "Today I leave for a two day visit at the Open Source Meets Business conference in Nürnberg, where Christoph will speak about the Chemoinformatics OpenSource Initiative (COSI). If you happen to go to that meeting too, let’s try to meet!",
      
      "date_published": "2007-01-22T00:00:00+00:00",
      "date_modified": "2007-01-22T00:00:00+00:00",
      "tags": ["cheminf","opensource"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/14/cdk-literature-1.html",
      "title": "CDK Literature #1",
      "content_html": "<p>For each <a href=\"http://www.cdknews.org/\">CDK News</a> I try to write up what CDK related literature has been published\nrecently, but I failed to do so for the last two issues. In order to not postpone writing it up until close to\nthe deadline, I will write up things here, so that I can copy-paste it later for CDK News.</p>\n\n<h2 id=\"oxidoreductase-catalyzed-reactions\">Oxidoreductase-catalyzed reactions</h2>\n\n<p>Mu <em>et al.</em> analyzed about 2000 oxidation/reduction reactions from <a href=\"https://chem-bla-ics.blogspot.com/2007/01/cdk-literature-1.html\">KEGG</a>\nusing the <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://joelib.sf.net/\">JOELib</a> for the chemoinformatics bits. The reactions were grouped into\n12 subclasses, and SVM was used to train models to distinguish reactants from non-reactants. It seems that there were not independent\ntest sets used, but cross-validation indicates that there approach is possible. The works uses CDK’s HydrogenAdder,\nUniversalIsomorphismTester, and unnamed QSAR descriptors. It would be interesting to see how it compares to\n<a href=\"http://chem-bla-ics.blogspot.com/2006/04/mining-kegg-pathway-database-with-self.html\">the work of Aires-de-Sousa</a>.</p>\n\n<h2 id=\"cognate-ligands\">Cognate ligands</h2>\n\n<p>Bashton <em>et al.</em> took a different approach in analyzing the metabolome. They looked at the correlation of ligand structure with enzyme\ndomains, and propose a method to identify cognate ligands, that is, ligands that are present in vivo and are required for a functional\nmetobolome. The CDK is used for calculating fingerprints and used for calculating maximal common substructures (MCSS). The paper notes\nthat the MCSS is not necessarily of biochemical relevance, indicating that there is room for pharmacophore like concept in the CDK.</p>",
      "summary": "For each CDK News I try to write up what CDK related literature has been published recently, but I failed to do so for the last two issues. In order to not postpone writing it up until close to the deadline, I will write up things here, so that I can copy-paste it later for CDK News.",
      
      "date_published": "2007-01-14T00:00:00+00:00",
      "date_modified": "2007-01-14T00:00:00+00:00",
      "tags": ["cdk"],
      "_references": [{ "url": "https://doi.org/10.1093/bioinformatics/btl535" },{ "url": "https://doi.org/10.1016/j.jmb.2006.09.041" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/11/why-do-i-blog.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/11/why-do-i-blog.html",
      "title": "Why do I blog?",
      "content_html": "<p><a href=\"http://www.chemicalforums.com/index.php?topic=12307.msg57384#msg57384\">Mitch blogged</a> about a comment Bethany Halford,\nAssociate Editor of <a href=\"http://pubs.acs.org/cen/\">C&amp;EN</a>, <a href=\"http://www.thechemblog.com/?p=360#comment-1889\">left in The Chem Blog</a>.\nShe is writing an opinion piece on chemistry blogs, and is wondering why I blog, whether I use a nickname, and if my\nemployer knows I blog. So, here goes.</p>\n\n<h2 id=\"why-do-i-blog\">Why do I blog?</h2>\n\n<p>I started blogging <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/15/chem-bla-ics.html\">in October 2005 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> to reduce my workload:\ninvolved in open source chemoinformatics projects, I quite often emailed to mailing lists about interesting websites/projects/events\netc. Not uncommonly to multiple lists, which required me to tune the email to the list. I realized that blogging about it, would\nmake it possible to no longer post it to mailing lists, and, therefore, reduce my workload. A second reason is that I post\ntricks there, so that I have them available in a central place, and to post questions that, hopefully, others can answer.\nAs such, it is a way of communicating with fellow scientists, without the need the specifically address them. Open, free and fast.</p>\n\n<p>Deliberately, I did not start a personally diary blog, but a blog about my work as chemoinformatician. Nevertheless, the nature of\nblogging allows to give what you write a personal twist. 
What stresses the scientific nature of my blog, and many others, is that blogging\nscientists often cite and discuss literature, which nicely leads to scientific blog aggregators like <a href=\"http://postgenomic.com/\">Postgenomic.com</a>\nand <a href=\"http://web.archive.org/web/20070115091132/http://wiki.cubic.uni-koeln.de/cb/\">Chemical blogspace <i class=\"fa-solid fa-archive fa-xs\"></i></a>,\nwhich summarize the scientific literature being discussed in\nthe blogosphere. The latter even <a href=\"http://chem-bla-ics.blogspot.com/2007/01/chemical-blogspace-is-getting-more.html\">recently started to blog about molecules being discussed</a>.\nThere are even blogs which specialize in discussing literature, such as <a href=\"http://cheminfoclub.blogspot.com/\">the blog by Rajarshi, Gary and David</a>.</p>\n\n<h2 id=\"why-do-i-not-use-a-nickname\">Why do I not use a nickname?</h2>\n\n<p>In my blogging I am clear about who I am, even where I work; I blog about my scientific work, and, as a reader, putting one and one\ntogether would lead to my real name soon enough anyway. I did not discuss the blogging with the employer I had in 2005, but the\nblogging is mostly done outside office hours anyway, certainly in that period. My current employer is a\n<a href=\"https://web.archive.org/web/20070120084645/http://wiki.cubic.uni-koeln.de/blog/index.php\">scientific blogger himself <i class=\"fa-solid fa-archive fa-xs\"></i></a>.\nEven my nickname, or pseudonym, is not that obfuscated.</p>\n\n<p>Moreover, I do make a statement in my blog (which sort of summarizes to: “you cannot do science if you cannot reproduce experimental results”),\nand I think it is only fair to identify myself. 
I’m not like Ender’s brother <a href=\"http://en.wikipedia.org/wiki/Peter_Wiggin\">Peter</a>.</p>\n\n<h2 id=\"why-do-i-answer-bethanys-questions\">Why do I answer Bethany’s questions?</h2>\n\n<p>I try to convince myself that I do not answer these questions out of <a href=\"https://cen.acs.org/articles/84/i22/Power-Procrastination.html\">procrastination <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nsomething Bethany is wondering about. Instead, I like blogging as a new way to communicate with fellow scientists on a scientific level (Bethany,\nplease <em>do</em> explore <a href=\"https://web.archive.org/web/20070607221957/http://wiki.cubic.uni-koeln.de/cb/blogs.php\">the full chemical blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and be amazed by the gems of high scientific\ncontent around!), though this might qualify as catching up with current literature. Moreover, answering these questions allows\nme to advertise my blog, and some websites I like. I feel that blogging might fill a niche in scientific communication.</p>\n\n<p>Bethany, please feel free to leave additional questions as a comment.</p>",
      "summary": "Mitch blogged about a comment Bethany Halford, Associate Editor of C&amp;EN, left in The Chem Blog. She is writing an opinion piece on chemistry blogs, and is wondering why I blog, whether I use a nickname, and if my employer knows I blog. So, here goes.",
      
      "date_published": "2007-01-11T00:00:00+00:00",
      "date_modified": "2007-01-11T00:00:00+00:00",
      "tags": ["blog","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/09/delicious-tagometer-on-www2bloggercom.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/09/delicious-tagometer-on-www2bloggercom.html",
      "title": "The del.icio.us tagometer on www2.blogger.com",
      "content_html": "<p>Yesterday I blogged about <a href=\"http://chem-bla-ics.blogspot.com/2007/01/delicious-tagometer-on-blogspotcom.html\">how to include the new del.icio.us tagometer on a\nwww.blogger.com blog</a>,\njust like <a href=\"http://consumingexperience.blogspot.com/2006/12/delicious-tagometer-howto-manual-mode.html\">Improbulus did last December</a>\nas I discovered later. <a href=\"http://chemical-quantum-images.blogspot.com/\">Felix</a>\nasked me how it could be done on the new www2.blogger.com template system. Well,\nhere it is.</p>\n\n<p>Like with the old blogger.com template system, you need to add this to the header,\njust before the <code class=\"language-plaintext highlighter-rouge\">&lt;/head&gt;</code> end tag:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c\">&lt;!-- del.icio.us badge stuff --&gt;</span>\n<span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"k\">typeof</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">Delicious</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">undefined</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">Delicious</span> <span class=\"o\">=</span> <span class=\"p\">{};</span>\n  <span class=\"nx\">Delicious</span><span class=\"p\">.</span><span class=\"nx\">BLOGBADGE_MANUAL_MODE</span> <span class=\"o\">=</span> <span class=\"kc\">true</span><span class=\"p\">;</span>\n<span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;link</span> <span class=\"na\">id=</span><span class=\"s\">\"delicious-blogbadge-css\"</span> \n      <span class=\"na\">href=</span><span 
class=\"s\">\"http://images.del.icio.us/static/css/blogbadge.css\"</span>\n      <span class=\"na\">rel=</span><span class=\"s\">\"stylesheet\"</span> <span class=\"na\">type=</span><span class=\"s\">\"text/css\"</span> <span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;script </span><span class=\"na\">src=</span><span class=\"s\">\"http://images.del.icio.us/static/js/blogbadge.js\"</span> <span class=\"nt\">/&gt;</span>\n</code></pre></div></div>\n\n<p>And, for the blog entry template bit, look for the <code class=\"language-plaintext highlighter-rouge\">&lt;p&gt;</code> element of class\n‘post-footer-line post-footer-line-3’, which was empty for me. Add this <code class=\"language-plaintext highlighter-rouge\">&lt;div&gt;</code>\nto that:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;p</span> <span class=\"na\">class=</span><span class=\"s\">'post-footer-line post-footer-line-3'</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;div</span> <span class=\"na\">class=</span><span class=\"s\">\"delicious-blogbadge-line\"</span> <span class=\"na\">expr:id=</span><span class=\"s\">\"data:post.id\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n      <span class=\"nx\">Delicious</span><span class=\"p\">.</span><span class=\"nx\">BlogBadge</span><span class=\"p\">.</span><span class=\"nx\">register</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">&lt;data:post.id/&gt;</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">&lt;data:post.url/&gt;</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">&lt;data:post.title/&gt;</span><span class=\"dl\">'</span><span class=\"p\">);</span>\n    <span 
class=\"nt\">&lt;/script&gt;</span>\n  <span class=\"nt\">&lt;/div&gt;</span>\n<span class=\"nt\">&lt;/p&gt;</span>\n</code></pre></div></div>\n\n<p>To get at the right place, with the full template XHTML content, go to your\n<a href=\"http://www2.blogger.com/home\">www2.blogger.com/home</a> homepage, click the\nTemplate tab, then pick the <em>Edit HTML</em> option, and make sure to enable the\n<strong>Expand Widget Templates</strong> option.</p>",
      "summary": "Yesterday I blogged about how to include the new del.icio.us tagometer on a www.blogger.com blog, just like Improbulus did last December as I discovered later. Felix asked me how it could be done on the new www2.blogger.com template system. Well, here it is.",
      
      "date_published": "2007-01-09T00:00:00+00:00",
      "date_modified": "2007-01-09T00:00:00+00:00",
      "tags": ["blog"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/08/delicious-tagometer-on-blogspotcom.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/08/delicious-tagometer-on-blogspotcom.html",
      "title": "The del.icio.us tagometer on Blogspot.com",
      "content_html": "<p>Some days ago I read about the <a href=\"http://del.icio.us/\">del.icio.us</a>\n<a href=\"http://blog.del.icio.us/blog/2006/12/the_new_and_tag.html#more\">tagometer</a>, which\nis basically sort of save as I had before on this blog. The tagometer, however,\nshows some interesting properties of the blog items, like the number of people who\nbookmarked the item, and what tags they used. The\n<a href=\"http://del.icio.us/help/tagometer\">tagometer help</a> does not show how it can be\nintegrated with <a href=\"http://www.blogspot.com/\">blogspot.com</a> (where this blog is hosted),\nbut with the source from <a href=\"http://decafbad.com/blog/\">0xDECAFBAD</a> I got it working.\nThese blogs are not yet moved to the new blogger.com system (so, www.blogger.com,\nnot www2.blogger.com), so the below principally applies to the older system.</p>\n\n<p>First you need to adapt this blob to the <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code> of the template:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;</span><span class=\"err\">$</span><span class=\"na\">BlogMetaData</span><span class=\"err\">$</span><span class=\"nt\">&gt;</span>\n\n<span class=\"c\">&lt;!-- del.icio.us badge stuff --&gt;</span>\n<span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"k\">typeof</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">Delicious</span> <span class=\"o\">==</span> <span class=\"dl\">\"</span><span class=\"s2\">undefined</span><span class=\"dl\">\"</span><span class=\"p\">)</span> <span class=\"nb\">window</span><span class=\"p\">.</span><span class=\"nx\">Delicious</span> <span class=\"o\">=</span> <span class=\"p\">{};</span>\n  <span 
class=\"nx\">Delicious</span><span class=\"p\">.</span><span class=\"nx\">BLOGBADGE_MANUAL_MODE</span> <span class=\"o\">=</span> <span class=\"kc\">true</span><span class=\"p\">;</span>\n<span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;link</span> <span class=\"na\">id=</span><span class=\"s\">\"delicious-blogbadge-css\"</span> \n      <span class=\"na\">href=</span><span class=\"s\">\"http://images.del.icio.us/static/css/blogbadge.css\"</span>\n      <span class=\"na\">rel=</span><span class=\"s\">\"stylesheet\"</span> <span class=\"na\">type=</span><span class=\"s\">\"text/css\"</span> <span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;script </span><span class=\"na\">src=</span><span class=\"s\">\"http://images.del.icio.us/static/js/blogbadge.js\"</span><span class=\"nt\">&gt;&lt;/script&gt;</span>\n\n<span class=\"nt\">&lt;/head&gt;</span>\n</code></pre></div></div>\n\n<p>where <code class=\"language-plaintext highlighter-rouge\">&lt;$BlogMetaData$&gt;</code> and <code class=\"language-plaintext highlighter-rouge\">&lt;/head&gt;</code> should already be present in the template.</p>\n\n<p>Further down the template, you need to add a bit in the <code class=\"language-plaintext highlighter-rouge\">&lt;div class=\"blogPost\"&gt;</code>\nsection, just after the last <code class=\"language-plaintext highlighter-rouge\">&lt;div class=\"byline\"&gt;</code> element in your template.\nThe bits you add use blogger variables, so make sure to get it right:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;div</span> <span class=\"na\">class=</span><span class=\"s\">\"delicious-blogbadge-line\"</span> <span class=\"na\">id=</span><span class=\"s\">\"badge-&lt;$BlogItemNumber$&gt;\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span><span class=\"nt\">&gt;</span>\n    
<span class=\"nx\">Delicious</span><span class=\"p\">.</span><span class=\"nx\">BlogBadge</span><span class=\"p\">.</span><span class=\"nx\">register</span><span class=\"p\">(</span><span class=\"dl\">'</span><span class=\"s1\">badge-&lt;$BlogItemNumber$&gt;</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">'</span><span class=\"s1\">&lt;$BlogItemPermalinkURL$&gt;</span><span class=\"dl\">'</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">&lt;$BlogItemTitle$&gt;</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n  <span class=\"nt\">&lt;/script&gt;</span>\n<span class=\"nt\">&lt;/div&gt;</span>\n</code></pre></div></div>\n\n<p>Note the quotes of the third argument. To do this properly, the quotes in the output\nof <code class=\"language-plaintext highlighter-rouge\">&lt;$BlogItemTitle$&gt;</code> should be escaped, so that they do not interfere with the\nquotes of the <code class=\"language-plaintext highlighter-rouge\">register()</code> JavaScript call. Can anyone tell me how to do that\nin JavaScript?</p>",
      "summary": "Some days ago I read about the del.icio.us tagometer, which is basically sort of save as I had before on this blog. The tagometer, however, shows some interesting properties of the blog items, like the number of people who bookmarked the item, and what tags they used. The tagometer help does not show how it can be integrated with blogspot.com (where this blog is hosted), but with the source from 0xDECAFBAD I got it working. These blogs are not yet moved to the new blogger.com system (so, www.blogger.com, not www2.blogger.com), so the below principally applies to the older system.",
      
      "date_published": "2007-01-08T00:00:00+00:00",
      "date_modified": "2007-01-08T00:00:00+00:00",
      "tags": ["blog"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/04/chemical-blogspace-is-getting-more.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/04/chemical-blogspace-is-getting-more.html",
      "title": "Chemical blogspace is getting more chemical",
      "content_html": "<p>The best remedy for being depressed is the rush after hacking some nice new feature (unfortunately, it is addictive). After\n<a href=\"http://chemicalblogspace.blogspot.com/2006/12/hacking-inchi-support-into-cb.html\">hacking InChI support into Chemical blogspace</a>\na couple of days back, adding some more visual feedback on <a href=\"http://web.archive.org/web/20070611160715/http://wiki.cubic.uni-koeln.de/cb/inchis.php\">those molecules <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nis not that hard, with <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> around that is:</p>\n\n<p><img src=\"/assets/images/inchisCbPage.png\" alt=\"\" /></p>\n\n<p>Beware! Every <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">marked up molecule <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> in your\nblog is being picked up! So should the compound with the SMILES N(=NC1=CC=C(C=C1)N(CCO)CCO)C3=CC=C(C=CC2=C(C(=C(C#N)C#N)OC2(C)C)C#N)S3,\nwhich is <a href=\"http://web.archive.org/web/20240915152205/https://www.sciencelink.net/verdieping/organische-chemie-versnelt-internet/9035.article\">reported to be the most light-sensitive molecule ever synthesized so far\n<i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.</p>",
      "summary": "The best remedy for being depressed is the rush after hacking some nice new feature (unfortunately, it is addictive). After hacking InChI support into Chemical blogspace a couple of days back, adding some more visual feedback on those molecules is not that hard, with PubChem around that is:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/inchisCbPage.png",
      "date_published": "2007-01-04T00:00:00+00:00",
      "date_modified": "2024-09-15T00:00:00+00:00",
      "tags": ["cb","inchi","pubchem"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2007/01/02/chemistry-in-html-javascript-from.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2007/01/02/chemistry-in-html-javascript-from.html",
      "title": "Chemistry in HTML: JavaScript from the server",
      "content_html": "<p>Recently I blogged about <a href=\"http://chem-bla-ics.blogspot.com/2006/12/smiles-cas-and-inchi-in-blogs.html\">a Greasemonkey script</a>\nto take advantage of <a href=\"http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html\">semantic markup of chemistry in blogs</a>\n(and HTML in general), and later made <a href=\"http://chem-bla-ics.blogspot.com/2006/12/chemistry-in-html-greasemonkey-again.html\">some plans how this can be\nextended</a>.\nOne of the ideas was to make this userscript available from the server, instead\nof having people need to install <a href=\"http://greasemonkey.mozdev.org/\">Greasemonkey</a>\nand the script separately. So, here it is.</p>\n\n<h2 id=\"sechemticjs\">sechemtic.js</h2>\n\n<p>Consider this (X)HTML:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html</span> <span class=\"na\">xmlns=</span><span class=\"s\">\"http://www.w3.org/1999/xhtml\"</span>\n      <span class=\"na\">xmlns:chem=</span><span class=\"s\">\"http://www.blueobelisk.org/chemistryblogs/\"</span><span class=\"nt\">&gt;</span>\n\n<span class=\"nt\">&lt;head&gt;</span>\n <span class=\"nt\">&lt;title&gt;</span>m1<span class=\"nt\">&lt;/title&gt;</span>\n <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"sechemtic.js\"</span> <span class=\"nt\">/&gt;</span>\n<span class=\"o\">&lt;</span><span class=\"sr\">/head</span><span class=\"err\">&gt;\n</span>\n<span class=\"o\">&lt;</span><span class=\"nx\">body</span> <span class=\"nx\">onload</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">addGoogleAndPubChemLinks(1,1)</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span>\n  <span class=\"o\">&lt;</span><span class=\"nx\">h1</span><span class=\"o\">&gt;</span><span class=\"nx\">The</span> <span 
class=\"nx\">Output</span><span class=\"o\">&lt;</span><span class=\"sr\">/h1</span><span class=\"err\">&gt;\n</span>  <span class=\"o\">&lt;</span><span class=\"nx\">p</span><span class=\"o\">&gt;</span><span class=\"nx\">This</span> <span class=\"nx\">article</span> <span class=\"nx\">is</span> <span class=\"nx\">about</span> <span class=\"o\">&lt;</span><span class=\"nx\">span</span> <span class=\"kd\">class</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">chem:compound</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span><span class=\"nx\">m1</span><span class=\"o\">&lt;</span><span class=\"sr\">/span&gt;</span><span class=\"err\"> \n</span>  <span class=\"p\">(</span><span class=\"nx\">SMILES</span><span class=\"p\">:</span><span class=\"o\">&lt;</span><span class=\"nx\">span</span> <span class=\"kd\">class</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">chem:smiles</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span><span class=\"nx\">CCCOC</span><span class=\"o\">&lt;</span><span class=\"sr\">/span&gt;</span><span class=\"se\">)</span><span class=\"sr\">.&lt;/</span><span class=\"nx\">p</span><span class=\"o\">&gt;</span>\n\n<span class=\"o\">&lt;</span><span class=\"sr\">/body</span><span class=\"err\">&gt;\n</span>\n<span class=\"o\">&lt;</span><span class=\"sr\">/html</span><span class=\"err\">&gt;\n</span></code></pre></div></div>\n\n<p><img src=\"/assets/images/sechemticJSOutput.png\" alt=\"\" /></p>\n\n<p>I think the above example shows the simple setup of the Sechemtic Web script (please\nforgive me my habit to use bad linguistic mashups ;). Just load the script in the\nHTML <code class=\"language-plaintext highlighter-rouge\">&lt;head&gt;</code>, and add in the <code class=\"language-plaintext highlighter-rouge\">onload=\"addGoogleAndPubChemLinks(1,1)\"</code> attribute to\nthe <code class=\"language-plaintext highlighter-rouge\">&lt;body&gt;</code> element. 
With blogs these bits would be part of the template, and,\ntherefore, need to be installed only once. From then on, just use the <a href=\"http://chem-bla-ics.blogspot.com/2006/12/including-smiles-cml-and-inchi-in.html\">semantic markup as\nexplained earlier</a>.\nBoth the microformat and the RDFa method are supported. In\ncase of the latter, I recommend defining the chem namespace in the template of\nwebpages too, instead of in the <code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code> elements.</p>\n\n<p>Currently, the Sechemtic Web script has only one function: to add links to\n<a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> and <a href=\"http://www.google.com/\">Google</a>,\nwith the <code class=\"language-plaintext highlighter-rouge\">addGoogleAndPubChemLinks(int, int)</code> method. The\nfirst parameter determines (0 or 1) whether links to Google should be made, and the\nsecond parameter does the same for links to PubChem.</p>\n\n<h2 id=\"download\">Download</h2>\n\n<p>For now, the script can be downloaded <a href=\"http://wiki.cubic.uni-koeln.de/cb/sechemtic.js\">here</a>.\nIt is licensed under the GPL version 2.0.</p>\n\n<h2 id=\"microformats\">Microformats</h2>\n\n<p>Here’s the same example using <a href=\"http://microformats.org/\">microformats</a>\ninstead of RDFa:</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;html&gt;</span>\n\n<span class=\"nt\">&lt;head&gt;</span>\n <span class=\"nt\">&lt;title&gt;</span>m1<span class=\"nt\">&lt;/title&gt;</span>\n <span class=\"nt\">&lt;script </span><span class=\"na\">type=</span><span class=\"s\">\"text/javascript\"</span> <span class=\"na\">src=</span><span class=\"s\">\"sechemtic.js\"</span> <span class=\"nt\">/&gt;</span>\n<span class=\"o\">&lt;</span><span class=\"sr\">/head</span><span class=\"err\">&gt;\n</span>\n<span class=\"o\">&lt;</span><span class=\"nx\">body</span> <span 
class=\"nx\">onload</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">addGoogleAndPubChemLinks(1,1)</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span>\n  <span class=\"o\">&lt;</span><span class=\"nx\">h1</span><span class=\"o\">&gt;</span><span class=\"nx\">The</span> <span class=\"nx\">Output</span><span class=\"o\">&lt;</span><span class=\"sr\">/h1</span><span class=\"err\">&gt;\n</span>  <span class=\"o\">&lt;</span><span class=\"nx\">p</span><span class=\"o\">&gt;</span><span class=\"nx\">This</span> <span class=\"nx\">article</span> <span class=\"nx\">is</span> <span class=\"nx\">about</span> <span class=\"o\">&lt;</span><span class=\"nx\">span</span> <span class=\"kd\">class</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">compound</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span><span class=\"nx\">m1</span><span class=\"o\">&lt;</span><span class=\"sr\">/span&gt;</span><span class=\"err\"> \n</span>  <span class=\"p\">(</span><span class=\"nx\">SMILES</span><span class=\"p\">:</span><span class=\"o\">&lt;</span><span class=\"nx\">span</span> <span class=\"kd\">class</span><span class=\"o\">=</span><span class=\"dl\">\"</span><span class=\"s2\">smiles</span><span class=\"dl\">\"</span><span class=\"o\">&gt;</span><span class=\"nx\">CCCOC</span><span class=\"o\">&lt;</span><span class=\"sr\">/span&gt;</span><span class=\"se\">)</span><span class=\"sr\">.&lt;/</span><span class=\"nx\">p</span><span class=\"o\">&gt;</span>\n\n<span class=\"o\">&lt;</span><span class=\"sr\">/body</span><span class=\"err\">&gt;\n</span>\n<span class=\"o\">&lt;</span><span class=\"sr\">/html</span><span class=\"err\">&gt;\n</span></code></pre></div></div>",
      "summary": "Recently I blogged about a Greasemonkey script to take advantage of semantic markup of chemistry in blogs (and HTML in general), and later made some plans how this can be extended. One of the ideas was to make this userscript available from the server, instead of having people need to install Greasemonkey and the script separately. So, here it is.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sechemticJSOutput.png",
      "date_published": "2007-01-02T00:00:00+00:00",
      "date_modified": "2007-01-02T00:00:00+00:00",
      "tags": ["html","javascript","userscript"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/30/modern-chemistry-in-cdk-beyond-two.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/30/modern-chemistry-in-cdk-beyond-two.html",
      "title": "Modern chemistry in the CDK: beyond the two-atom bond",
      "content_html": "<p><a href=\"http://depth-first.com/\">Rich</a> <a href=\"http://depth-first.com/articles/2006/12/19/ferrocene-and-beyond-a-solution-to-the-molecular-representation-problem\">recently blogged</a>\nabout the limitations of the two-atom bond representation often used in chemoinformatics,\ntriggered by <a href=\"http://depth-first.com/articles/2006/12/12/the-problem-with-ferrocene\">the four ferrocene entries in PubChem</a>.\nIn reply to himself, <a href=\"http://depth-first.com/articles/2006/12/20/a-molecular-language-for-modern-chemistry-getting-started-with-flexmol\">Rich described FlexMol</a>,\nan XML language that can describe bond systems that involve more than two atoms.</p>\n\n<p>Obviously, the problem originates from the lack of mathematical knowledge of chemists:\ncurrent chemoinformatics depends heavily on graph theory, where each atom is a vertex and each\nbond an edge. This has the advantage that we can borrow all algorithms that work with graph\nrepresentations, such as <a href=\"http://en.wikipedia.org/wiki/Dijkstra's_algorithm\">Dijkstra’s algorithm</a>\nto find the shortest path between two vertices. Or, in chemical language, an algorithm to\ncalculate how many bonds two atoms are apart in a molecule.</p>\n\n<p>When discussing FlexMol, Rich mentions the work by Dietz (DOI:<a href=\"https://doi.org/10.1021/ci00027a001\">10.1021/ci00027a001</a>),\nbut I would like to add the PhD thesis of S. Bauerschmidt to this (see\nDOI:<a href=\"https://doi.org/10.1021/ci9704423\">10.1021/ci9704423</a>), done in Gasteiger’s group.\nDropping this ‘two-atom bond’ representation in favor of something that better describes compounds\nlike ferrocene, like the Dietz and Bauerschmidt approaches, has the unfortunate disadvantage of\nlosing compatibility with graph theory algorithms. Nevertheless, in order to take\nchemoinformatics to the next level, we have to address these issues. 
But hope is not lost, and\npeople are working on rewriting our toolkit of chemoinformatics algorithms to match such new\nrepresentations.</p>\n\n<h2 id=\"cdk\">CDK</h2>\n\n<p>I will postpone analyzing the <a href=\"http://cdk.sf.net/\">CDK</a> for compatibility with such more modern\nrepresentations (look out for a <a href=\"http://cdknews.org/\">CDK News</a> article), and now just describe\nhow the CDK can be used for FlexMol/Dietz/Bauerschmidt representations. Consider\n<a href=\"http://depth-first.com/articles/2006/12/20/a-molecular-language-for-modern-chemistry-getting-started-with-flexmol\">the four examples Rich gives</a>\nin his blog. Here are the CDK ways of doing the same.</p>\n\n<p>For example, 1,3,5-cyclohexatriene:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"nc\">IMolecule</span> <span class=\"nf\">makeCycloHexaTriene</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"nc\">IMolecule</span> <span class=\"n\">cyclohexatriene</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C0\"</span><span class=\"o\">);</span> <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span 
class=\"n\">atomC1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C1\"</span><span class=\"o\">);</span> <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C2\"</span><span class=\"o\">);</span> <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C3\"</span><span class=\"o\">);</span> <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span 
class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C4\"</span><span class=\"o\">);</span> <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C5\"</span><span class=\"o\">);</span> <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"mf\">1.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB0</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n 
 <span class=\"nc\">IBond</span> <span class=\"n\">bondB1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"mf\">2.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">4</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"mf\">1.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"mf\">2.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">4</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">,</span> <span 
class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"mf\">1.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB4</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"mf\">2.0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB5</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">4</span><span class=\"o\">);</span>\n\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">);</span>\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">);</span>\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC5</span><span class=\"o\">);</span>\n\n  <span 
class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB0</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB1</span><span class=\"o\">);</span>\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB2</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB3</span><span class=\"o\">);</span>\n  <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB4</span><span class=\"o\">);</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB5</span><span class=\"o\">);</span>\n\n  <span class=\"k\">return</span> <span class=\"n\">cyclohexatriene</span><span class=\"o\">;</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>Summarizing, the key thing is to use the <code class=\"language-plaintext highlighter-rouge\">IBond.setElectronCount()</code> method.\nThe call is sort of  redundant, as the CDK defaults to two electrons if not\nexplicitly given. 
This compound is, of course, benzene which we can represent\nlike this too:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"nc\">IMolecule</span> <span class=\"nf\">makeBenzene</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"nc\">IMolecule</span> <span class=\"n\">benzene</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C0\"</span><span class=\"o\">);</span> <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C1\"</span><span class=\"o\">);</span> <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span 
class=\"n\">atomC2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C2\"</span><span class=\"o\">);</span> <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C3\"</span><span class=\"o\">);</span> <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span> \n    <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C4\"</span><span class=\"o\">);</span> <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span 
class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span> \n    <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C5\"</span><span class=\"o\">);</span> <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB0</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span 
class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB4</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB5</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span 
class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span>\n    <span class=\"n\">bondingSystem</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">);</span>\n    <span class=\"n\">bondingSystem</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n      <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]</span> <span class=\"o\">{</span> <span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> \n                    <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">}</span>\n    <span class=\"o\">);</span>\n\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC5</span><span class=\"o\">);</span>\n\n 
 <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB0</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB1</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB2</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB3</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB4</span><span class=\"o\">);</span> <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB5</span><span class=\"o\">);</span>\n  <span class=\"n\">benzene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem</span><span class=\"o\">);</span>\n\n  <span class=\"k\">return</span> <span class=\"n\">benzene</span><span class=\"o\">;</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>This version represents the delocalized aromatic pi-system as one IBond:\none with 6 electrons, and 6 associated atoms.</p>\n\n<p>The cyclopentadienyl anion is represented similarly:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"nc\">IMolecule</span> <span class=\"nf\">makeCycloPentadienylAnion</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"nc\">IMolecule</span> <span class=\"n\">cp</span> <span class=\"o\">=</span> <span 
class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C0\"</span><span class=\"o\">);</span> <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C1\"</span><span class=\"o\">);</span> <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C2\"</span><span class=\"o\">);</span> 
<span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C3\"</span><span class=\"o\">);</span> <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C4\"</span><span class=\"o\">);</span> <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB0</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span 
class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB4</span><span 
class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span>\n    <span class=\"n\">bondingSystem</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">);</span>\n  <span class=\"n\">bondingSystem</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n    <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]{</span> <span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">}</span>\n  <span class=\"o\">);</span>\n\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">);</span>\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">);</span>\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">);</span>\n\n  <span 
class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB0</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB1</span><span class=\"o\">);</span>\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB2</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB3</span><span class=\"o\">);</span>\n  <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB4</span><span class=\"o\">);</span> <span class=\"n\">cp</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem</span><span class=\"o\">);</span>\n\n  <span class=\"k\">return</span> <span class=\"n\">cp</span><span class=\"o\">;</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The final step in this series is ferrocene:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"nc\">IMolecule</span> <span class=\"nf\">makeFerrocene</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"nc\">IMolecule</span> <span class=\"n\">ferrocene</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">();</span>\n\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span 
class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C0\"</span><span class=\"o\">);</span> <span class=\"n\">atomC0</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C1\"</span><span class=\"o\">);</span> <span class=\"n\">atomC1</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C2\"</span><span class=\"o\">);</span> <span class=\"n\">atomC2</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span 
class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C3\"</span><span class=\"o\">);</span> <span class=\"n\">atomC3</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C4\"</span><span class=\"o\">);</span> <span class=\"n\">atomC4</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C5\"</span><span class=\"o\">);</span> <span class=\"n\">atomC5</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC6</span> <span 
class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC6</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C6\"</span><span class=\"o\">);</span> <span class=\"n\">atomC6</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC7</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC7</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C7\"</span><span class=\"o\">);</span> <span class=\"n\">atomC7</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">atomC8</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC8</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C8\"</span><span class=\"o\">);</span> <span class=\"n\">atomC8</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  
<span class=\"nc\">IAtom</span> <span class=\"n\">atomC9</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n    <span class=\"n\">atomC9</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"C9\"</span><span class=\"o\">);</span> <span class=\"n\">atomC9</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n  <span class=\"nc\">IAtom</span> <span class=\"n\">iron</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">IRON</span><span class=\"o\">);</span>\n    <span class=\"n\">iron</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"s\">\"Fe10\"</span><span class=\"o\">);</span> <span class=\"n\">iron</span><span class=\"o\">.</span><span class=\"na\">setHydrogenCount</span><span class=\"o\">(</span><span class=\"mi\">0</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB0</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB0</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB1</span> <span class=\"o\">=</span> <span 
class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB4</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">atomC0</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB4</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span 
class=\"n\">bondB5</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"n\">atomC6</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB5</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB6</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC6</span><span class=\"o\">,</span> <span class=\"n\">atomC7</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB6</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB7</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC7</span><span class=\"o\">,</span> <span class=\"n\">atomC8</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB7</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB8</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC8</span><span class=\"o\">,</span> <span class=\"n\">atomC9</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB8</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span 
class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondB9</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">(</span><span class=\"n\">atomC9</span><span class=\"o\">,</span> <span class=\"n\">atomC5</span><span class=\"o\">);</span>\n    <span class=\"n\">bondB9</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">2</span><span class=\"o\">);</span>\n\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem1</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span>\n    <span class=\"n\">bondingSystem1</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">);</span>\n    <span class=\"n\">bondingSystem1</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n      <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]</span> <span class=\"o\">{</span>\n       <span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">,</span> <span class=\"n\">iron</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem2</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span> \n    <span class=\"n\">bondingSystem2</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span 
class=\"o\">);</span>\n    <span class=\"n\">bondingSystem2</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n      <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]</span> <span class=\"o\">{</span>\n        <span class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"n\">atomC6</span><span class=\"o\">,</span> <span class=\"n\">atomC7</span><span class=\"o\">,</span> <span class=\"n\">atomC8</span><span class=\"o\">,</span> <span class=\"n\">atomC9</span><span class=\"o\">,</span> <span class=\"n\">iron</span>\n      <span class=\"o\">}</span>\n    <span class=\"o\">);</span>\n  <span class=\"nc\">IBond</span> <span class=\"n\">bondingSystem3</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newBond</span><span class=\"o\">();</span>\n    <span class=\"n\">bondingSystem3</span><span class=\"o\">.</span><span class=\"na\">setElectronCount</span><span class=\"o\">(</span><span class=\"mi\">6</span><span class=\"o\">);</span>\n    <span class=\"n\">bondingSystem3</span><span class=\"o\">.</span><span class=\"na\">setAtoms</span><span class=\"o\">(</span>\n      <span class=\"k\">new</span> <span class=\"nc\">IAtom</span><span class=\"o\">[]{</span>\n        <span class=\"n\">atomC0</span><span class=\"o\">,</span> <span class=\"n\">atomC1</span><span class=\"o\">,</span> <span class=\"n\">atomC2</span><span class=\"o\">,</span> <span class=\"n\">atomC3</span><span class=\"o\">,</span> <span class=\"n\">atomC4</span><span class=\"o\">,</span>\n        <span class=\"n\">atomC5</span><span class=\"o\">,</span> <span class=\"n\">atomC6</span><span class=\"o\">,</span> <span class=\"n\">atomC7</span><span class=\"o\">,</span> <span class=\"n\">atomC8</span><span class=\"o\">,</span> <span class=\"n\">atomC9</span><span class=\"o\">,</span>\n        <span class=\"n\">iron</span>\n      <span class=\"o\">}</span>\n    <span 
class=\"o\">);</span>\n\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC0</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC1</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC2</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC3</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC4</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC5</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC6</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC7</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC8</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">atomC9</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addAtom</span><span class=\"o\">(</span><span class=\"n\">iron</span><span 
class=\"o\">);</span>\n\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB0</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB1</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB2</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB3</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB4</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB5</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB6</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB7</span><span class=\"o\">);</span> <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB8</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondB9</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem1</span><span 
class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem2</span><span class=\"o\">);</span>\n  <span class=\"n\">ferrocene</span><span class=\"o\">.</span><span class=\"na\">addBond</span><span class=\"o\">(</span><span class=\"n\">bondingSystem3</span><span class=\"o\">);</span>\n\n  <span class=\"k\">return</span> <span class=\"n\">ferrocene</span><span class=\"o\">;</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>Now, you will note that this approach does not exactly follow Rich’s FlexMol examples: it\nskips the atom pair concepts of the FlexMol version of ferrocene. His example more closely follows\nwhat we are likely to draw, while the CDK code above more closely follows the molecular orbital\nconcept. (I have to check to see how Dietz and Bauerschmidt did this.)</p>\n\n<p>As said, the real trick is to have a chemoinformatics toolkit that can work with this\nrepresentation, but I will save that for later. At least our algorithms to calculate the\nmolecular mass should work ;)</p>",
      "summary": "Rich recently blogged about the limitations of the two-atom bond representation often used in chemoinformatics, triggered by the four ferrocene entries in PubChem. In reply to himself, Rich described FlexMol, an XML language that can describe bond systems that involve more than two atoms.",
      
      "date_published": "2006-12-30T00:00:00+00:00",
      "date_modified": "2006-12-30T00:00:00+00:00",
      "tags": ["cheminf","cdk"],
      "_references": [{ "url": "https://doi.org/10.1021/ci00027a001" },{ "url": "https://doi.org/10.1021/ci9704423" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9pkxf-3ns82",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/21/updated-chemical-blogspace-layout-and.html",
      "title": "Updated Chemical Blogspace Layout and Software",
      "content_html": "<p>Last night I upgraded the software behind <a href=\"https://web.archive.org/web/20061223075417/http://wiki.cubic.uni-koeln.de/cb/\">Chemical <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/03/chemical-blogspace-updates.html\">blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, to the version\n<a href=\"http://code.google.com/p/openreview/\">online</a> on <a href=\"http://code.google.com/\">Google Code</a>, though I needed the help from\n<a href=\"https://web.archive.org/web/20051104010705/http://www.ghastlyfop.com/blog/\">Eaun <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> to get paper titles correctly picked up for <a href=\"http://pubs.acs.org/\">ACS journals</a>.\nThe number of working blogs is a bit down and now at <a href=\"https://web.archive.org/web/20070102170205/http://wiki.cubic.uni-koeln.de/cb/blogs.php\">68 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nwith an average number of 30 active blogs posting more than 100 blog items each day (see <a href=\"https://web.archive.org/web/20061223075048/http://wiki.cubic.uni-koeln.de/cb/stats.php\">Zeitgeist <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>).\nThe new design looks like quite nice compared to the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">old one</a>:</p>\n\n<p><img src=\"/assets/images/chemicalBlogspaceScreeny.png\" alt=\"\" /></p>",
      "summary": "Last night I upgraded the software behind Chemical blogspace , to the version online on Google Code, though I needed the help from Eaun to get paper titles correctly picked up for ACS journals. The number of working blogs is a bit down and now at 68 , with an average number of 30 active blogs posting more than 100 blog items each day (see Zeitgeist ). The new design looks like quite nice compared to the old one:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemicalBlogspaceScreeny.png",
      "date_published": "2006-12-21T00:00:00+00:00",
      "date_modified": "2024-08-24T00:00:00+00:00",
      "tags": ["cb","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/19/chemistry-in-html-greasemonkey-again.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/19/chemistry-in-html-greasemonkey-again.html",
      "title": "Chemistry in HTML: Greasemonkey again",
      "content_html": "<p>Here’s a quick update on my blog about <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html\">SMILES, CAS and InChI in blogs: Greasemonkey <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nlast sunday. The original download was messed up :( You can download a new version at <a href=\"http://userscripts.org/scripts/show/6807\">userscripts.org</a>.</p>\n\n<p>This new version also supports <code class=\"language-plaintext highlighter-rouge\">chem:compound</code>, for any chemical. For example:</p>\n\n<ul>\n  <li><span class=\"chem:compound\">isopropyl alcohol</span></li>\n</ul>\n\n<p>Remember that it only works for properly marked up content, as described in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">Including SMILES, CML and InChI in blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThe HTML source code of the above example looks like (in RDFa):</p>\n\n<div class=\"language-html highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;ul&gt;&lt;li&gt;</span>\n<span class=\"nt\">&lt;span</span> <span class=\"na\">xmlns:chem=</span><span class=\"s\">\"http://www.blueobelisk.org/chemistryblogs/\"</span>\n      <span class=\"na\">class=</span><span class=\"s\">\"chem:compound\"</span><span class=\"nt\">&gt;</span>isopropyl alcohol<span class=\"nt\">&lt;/span&gt;</span>\n<span class=\"nt\">&lt;/li&gt;&lt;/ul&gt;</span>\n</code></pre></div></div>\n\n<p>The current script only adds search links to <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> and\n<a href=\"http://google.com/\">Google</a>, but the possibilities are endless, and potentially very powerfull.\nHere are some future ideas.</p>\n\n<h2 id=\"a-link-to-predict-nmr-spectra-using-nmrshiftdborg\">A link to predict NMR spectra using NMRShiftDB.org:</h2>\n\n<p>Making a link to the <a 
href=\"http://www.nmrshiftdb.org/\">NMRShiftDB.org</a> website to predict <sup>13</sup>C or\n<sup>1</sup>H NMR from a SMILES, and InChI likely too, is easy, if the website provides a URL to do this.\n(I will discuss this with Stefan.)</p>\n\n<h2 id=\"a-popup-window-with-the-3d-structure-in-jmol\">A popup window with the 3D structure in Jmol:</h2>\n\n<p>This would involve some more work, but this is most certainly possible too, given that we actually have\na website around which allows downloading 3D coordinates given a SMILES or InChI. While a simple approach\nwould be to make a popup with <a href=\"http://www.jmol.org/\">Jmol</a> that takes the URL to that 3D coordinate website,\nit could be extended using Ajax to query the 3D structure first, and depending on success, show\nJmol or a message “Could not find 3D coordinates”.</p>\n\n<h2 id=\"summarize-molecular-details-hidden-in-cml\">Summarize molecular details hidden in CML:</h2>\n\n<p>This is likely the most exciting possibility. I blogged about CMLRSS <a href=\"http://search.blogger.com/?as_q=CMLRSS&amp;ie=UTF-8&amp;ui=blg&amp;bl_url=chem-bla-ics.blogspot.com&amp;x=0&amp;y=0\">many times</a>\nnow (check the AVI, the <a href=\"https://doi.org/10.1021/ci034244p\">article</a>, etc), and combining these two\ntechnologies will take the semantic, chemistry internet to the next level. CMLRSS describes how CML\ncan be embedded in blog items (e.g. <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/18/blogging-chemistry-on-blogspotcom.html\">Blogging chemistry on blogspot.com <i class=\"fa-solid fa-recycle fa-xs\"></i></a>),\nbut really works for any <a href=\"http://www.w3.org/TR/xhtml1/\">XHTML</a>.</p>\n\n<p>Consider this mockup: add CML content to your blog item, containing molecular properties, such as its\nNMR peaks, elemental analysis, etc. This will not show up in your blog item, so that the user is not\nbothered with implementation details. 
Now, a userscript will know about the CML content, as it has access\nto the whole content of the page. The visible text will mention the molecule for which CML contains\nexperimental or other details. Using the <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chem:compound\"/&gt;</code> technology shown above, it is\npossible to link that compound to this CML bit (details to follow in this blog in January 2007). The\nuserscript will then on the fly create a popup for the compound name in the visible text to show those\nexperimental details.</p>\n\n<p>How about that? Comments and other ideas are more than welcome!</p>\n\n<h2 id=\"server-side-scripts\">Server side scripts:</h2>\n\n<p>Greasemonkey allows users to decide which scripts to run on a website, and which not. If you, as blogger\nor XHTML editor, want to force a script like the above to be run, that should be possible too.\nGreasemonkey scripts are written in JavaScript, so including them on the server side should be\nfeasible. I might explore this option soon too.</p>",
      "summary": "Here’s a quick update on my blog about SMILES, CAS and InChI in blogs: Greasemonkey last sunday. The original download was messed up :( You can download a new version at userscripts.org.",
      
      "date_published": "2006-12-19T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["userscript","html","rdf"],
      "_references": [{ "url": "https://doi.org/10.1021/CI034244P" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/17/smiles-cas-and-inchi-in-blogs.html",
      "title": "SMILES, CAS and InChI in blogs: Greasemonkey",
      "content_html": "<p>As follow up on my <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">Including SMILES, CML and InChI in blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nblog last week, I had a go at <a href=\"http://en.wikipedia.org/wiki/Greasemonkey\">Greasemonkey</a>. Some time ago already,\n<a href=\"http://www.ghastlyfop.com/blog/2006/09/postgenomic-pubmed-mashup.html\">Flags and Lollipops</a> and\n<a href=\"http://www.nodalpoint.org/2006/05/16/postgenomic_greasemonkey_script\">Nodalpoint</a> showed with two cool mashups (one Connotea/Postgenomic\nand one Pubmed/Postgenomic) that userscripts are rather useful in science too. I can very much recommend the PubMed/Postgenomic mashup,\nas PubMed has several organic chemistry journals indexed too!</p>\n\n<p>So, how does this relate to my blog of last week? Well, would it not be nice that if your blog uses the markup as suggested in that\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html\">blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, that you automatically get links to\n<a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a> and <a href=\"http://google.com/\">Google</a>? That is now possible with a small GPL-ed Greasemonkey script\ncalled <a href=\"http://www.woc.science.ru.nl/devel/egonw/blogchemistry.user.js\">blogchemistry.user.js</a>.</p>\n\n<p>The <a href=\"http://greasemonkey.mozdev.org/\">Greasemonkey plugin</a> requires <a href=\"http://getfirefox.com/\">Firefox</a> to be installed. If ready, install\nthe script by cli·cking this link earlier, and the Greasemonkey will ask you if you want to install the script. 
Afterwards, check the output\nfor this RDFa markup content:</p>\n\n<ul>\n  <li>a SMILES: <span xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\" class=\"chem:smiles\">CCO</span></li>\n  <li>a CAS registry number: <span xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\" class=\"chem:casnumber\">50-00-0</span></li>\n  <li>and an InChI: <span xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\" class=\"chem:inchi\">InChI=1/CH4/h1H4</span></li>\n</ul>\n\n<p>It should look like the output for this blog item:</p>\n\n<p><img src=\"/assets/images/sechemticWebScript.png\" alt=\"\" /></p>\n\n<p>Note the superscript PubChem and Google links.</p>",
      "summary": "As follow up on my Including SMILES, CML and InChI in blogs blog last week, I had a go at Greasemonkey. Some time ago already, Flags and Lollipops and Nodalpoint showed with two cool mashups (one Connotea/Postgenomic and one Pubmed/Postgenomic) that userscripts are rather useful in science too. I can very much recommend the PubMed/Postgenomic mashup, as PubMed has several organic chemistry journals indexed too!",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/sechemticWebScript.png",
      "date_published": "2006-12-17T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["chemistry","userscript","smiles","pubchem","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/17/counting-stereoisomers-from-molecular_17.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/17/counting-stereoisomers-from-molecular_17.html",
      "title": "Counting constitutional isomers from the molecular formula",
      "content_html": "<p><strong>Update</strong>: check <a href=\"https://doi.org/10.1186/s13321-022-00604-9\">these</a> <a href=\"https://doi.org/10.1186/s13321-021-00529-9\">two</a> papers.</p>\n\n<p>We all know the combinatorial explosion when calculating the number of possible constitutional\nisomers (see <a href=\"http://en.wikipedia.org/wiki/Structural_isomerism\">wp:structural isomorphism</a>) of\na certain molecular formula. For example, C2H6 has only one constitutional isomer (ethane,\nInChI=1/C2H6/c1-2/h1-2H3), and C4H10 has only two. Especially, breaking symmetry by replacing one\ncarbon by another element, or replacing a single by a double bond, increases the number sharply.\nFor example, C7H16 has only nine constitutional isomers, while replacing two single bonds by two\ndouble bonds, creating C7H10, increases this number to 499! Then, replacing in the last formula,\none carbon by an oxygen adds another few, totaling 747 isomers.</p>\n\n<p>Now, C8H8NBr has at least <strong>649 thousand</strong> constitutional isomers, and I am quite interested in\nbeing able to know the number of isomers beforehand, without having to generate the structures\nitself (for example, using <a href=\"http://cdk.sf.net/\">CDK</a>’s <code class=\"language-plaintext highlighter-rouge\">GENMDeterministicGenerator</code>).\nInChI=1/C8H8BrN/c9-7-1-2-8-6(5-7)3-4-10-8/h1-2,5,10H,3-4H2 is one of the isomers.</p>\n\n<p>So, my question: is anyone aware of free code (in order of preference: 1. LGPL, 2. BSD/MIT,</p>\n<ol>\n  <li>opensource, 4. free) to calculate or estimate the number of constitutional isomers for a\ncertain molecular formula. An estimate would already be nice. Ideally, I would implement this bit\nof code into the CDK, but otherwise, just knowing the number of isomers for C8H8NBr would be\nnice :)</li>\n</ol>\n\n<p>Additionally, any relevant, recent literature recommendations are most welcomed. 
I am aware of the\nuse of polynomials, but literature I have seen so far just focuses on molecules of a certain\narchitecture, and is not able to come up with a guess based on the molecular formula alone.</p>",
      "summary": "Update: check these two papers.",
      
      "date_published": "2006-12-17T00:00:00+00:00",
      "date_modified": "2022-04-05T00:00:00+00:00",
      "tags": ["cheminf","cdk"],
      "_references": [{ "url": "https://doi.org/10.1186/s13321-021-00529-9" },{ "url": "https://doi.org/10.1186/s13321-022-00604-9" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/12/molecular-chemometrics.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/12/molecular-chemometrics.html",
      "title": "Molecular Chemometrics",
      "content_html": "<p>I just found out that a review article that I wrote earlier this year got printed: <em>Molecular Chemometrics</em>\n(DOI:<a href=\"https://doi.org/10.1080/10408340600969601\">10.1080/10408340600969601</a>), with my personal view on the interplay between\nchemoinformatics and chemometrics. The review discusses interesting developments in the last five years, and was fun writing\n(reading too, I think :). It has four major topics:</p>\n\n<ul>\n  <li><em>molecular representation</em> (with ‘molecular descriptors’ and ‘beyond the molecule’)</li>\n  <li><em>chemical space, similarity and diversity</em></li>\n  <li><em>activity and property modeling</em> (with ‘dimension reduction’ and ‘model validation’)</li>\n  <li><em>library searching</em>, which mostly focuses on semantic web developments</li>\n</ul>\n\n<p>Comments most welcome; just leave them below this blog item, or blog about the article yourself :)</p>",
      "summary": "I just found out that a review article that I wrote earlier this year got printed: Molecular Chemometrics (DOI:10.1080/10408340600969601), with my personal view on the interplay between chemoinformatics and chemometrics. The review discusses interesting developments in the last five years, and was fun writing (reading too, I think :). It has four major topics:",
      
      "date_published": "2006-12-12T00:00:00+00:00",
      "date_modified": "2006-12-12T00:00:00+00:00",
      "tags": ["chemometrics","cheminf"],
      "_references": [{ "url": "https://doi.org/10.1080/10408340600969601" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/10/including-smiles-cml-and-inchi-in.html",
      "title": "Including SMILES, CML and InChI in blogs",
      "content_html": "<p>The blogs <a href=\"http://blog.chembark.com/\">ChemBark</a> and <a href=\"http://kinasepro.wordpress.com/\">KinasePro</a> have been discussing\nthe use of SMILES, CML and InChI in <a href=\"http://wiki.cubic.uni-koeln.de/pg/\">Chemical Blogspace</a> (with 70 chemistry blogs now!).\nChemists seem to <a href=\"http://kinasepro.wordpress.com/2006/12/05/monday-night-ot-2/\">prefer SMILES over InChI</a>, while there is\n<a href=\"http://blog.chembark.com/2006/11/25/help-needed-how-do-we-use-cml-properly/\">interest in moving towards CML too</a>.\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter commented</a>.</p>\n\n<p>Any incorporation of content other than images and free text requires some HTML knowledge, but this can be rather limited.\nIt is up to us chemoinformaticians to write good documentation on how to do things; so here is a first go.</p>\n\n<h2 id=\"including-cml-in-blogs-and-other-rss-feeds\">Including CML in blogs and other RSS feeds</h2>\n\n<p>I blogged about including <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/18/blogging-chemistry-on-blogspotcom.html\">CML in blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nlast February, and can generally refer to this article published last year: <em>Chemical markup, XML, and the World Wide Web. 5.\nApplications of chemical metadata in RSS aggregators</em> (PMID:<a href=\"https://pubmed.ncbi.nlm.nih.gov/15032525\">15032525</a>,\nDOI:<a href=\"https://doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>). 
Basically, it just comes down to putting the CML code into\nthe HTML version of your blog content, though I appreciate the need for plugins.</p>\n\n<h2 id=\"including-smiles-cas-and-inchi-in-blogs\">Including SMILES, CAS and InChI in blogs</h2>\n\n<p>Including SMILES is much easier as it is plain text, and has the advantage over InChI that it is much more readable.\n<a href=\"http://www.cambridgemedchemconsulting.com/\">Chris</a> wondered on the KinasePro blog how to tag SMILES, while Paul\ndid the same on ChemBark about CAS numbers.</p>\n\n<p>Now, users of <a href=\"http://postgenomic.com/\">PostGenomic.com</a> know how to <a href=\"http://postgenomic.com/wiki/doku.php?id=markup\">add markup to their blogs</a>\nto get PostGenomic to index discussed literature, websites and conferences. Something similar is easily done for chemistry\nthings too, as I showed in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">Hacking InChI support into postgenomic.com <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(which was put on lower priority because of finishing my PhD). PostGenomic.com basically uses microformats, which I\nblogged about just a few days ago in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/12/06/chemoblogs-2.html\">Chemo::Blogs #2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nwhere I suggested the use of <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chemicalcompound\"&gt;aspirin&lt;/span&gt;</code>.</p>\n\n<p>And this is the way SMILES, CAS numbers and InChIs can be tagged on blogs. The <code class=\"language-plaintext highlighter-rouge\">&lt;span&gt;</code> element is HTML code to indicate\na bit of similar content in HTML, and can, among many other things, be formatted differently than other text. However,\nthis can also be used to add semantics in a relatively cheap, but accepted, way. 
<a href=\"http://microformats.org/\">Microformats</a>\nare formalized just by use, so whatever we, as chemistry bloggers, use will become the de facto standard. Here are my suggestions:</p>\n\n<ul>\n  <li>for SMILES: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"smiles\"&gt;CCO&lt;/span&gt;</code></li>\n  <li>for CAS registry numbers: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"casnumber\"&gt;50-00-0&lt;/span&gt;</code></li>\n  <li>for InChI: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"inchi\"&gt;InChI=1/CH4/h1H4&lt;/span&gt;</code></li>\n</ul>\n\n<h2 id=\"the-rdfa-alternative\">The RDFa alternative</h2>\n\n<p>The future, however, might use RDFa over microformats, so here are the RDFa equivalents:</p>\n\n<ul>\n  <li>for SMILES: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chem:smiles\"&gt;CCO&lt;/span&gt;</code></li>\n  <li>for CAS registry numbers: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chem:casnumber\"&gt;50-00-0&lt;/span&gt;</code></li>\n  <li>for InChI: <code class=\"language-plaintext highlighter-rouge\">&lt;span class=\"chem:inchi\"&gt;InChI=1/CH4/h1H4&lt;/span&gt;</code></li>\n</ul>\n\n<p>which requires you to register the namespace <code class=\"language-plaintext highlighter-rouge\">xmlns:chem=\"http://www.blueobelisk.org/chemistryblogs/\"</code> somewhere though.\nFormally, the URN for this namespace needs to be formalized; Peter, would the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a>\nbe the platform to do this? BTW, this is more advanced, and currently does not have practical advantages over the use of\nmicroformats.</p>",
      "summary": "The blogs ChemBark and KinasePro have been discussing the use of SMILES, CML and InChI in Chemical Blogspace (with 70 chemistry blogs now!). Chemists seem to prefer SMILES over InChI, while there is interest in moving towards CML too. Peter commented.",
      
      "date_published": "2006-12-10T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cml","inchi","blog","cb","microformat","rdf","html"],
      "_references": [{ "url": "https://doi.org/10.1021/CI034244P" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/09/h-index-in-chemoinformatics.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/09/h-index-in-chemoinformatics.html",
      "title": "H-index in chemoinformatics",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter</a> <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=209\">blogged</a> about the\n<a href=\"http://en.wikipedia.org/wiki/H-index\">h-index</a>, which is a measure for ones scientific impact. He used\n<a href=\"http://scholar.google.com/\">Google Scholar</a>, but I do not feel that that database is clean enough. I believe a better\nsource would be the <a href=\"http://portal.isiknowledge.com/portal.cgi?DestApp=WOS&amp;Func=Frame\">ISI Web-of-Science</a>.</p>\n\n<p>Therefore, I composed a list of h-indices of my own, ordered by value. The choice of authors is biased to the\n<a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> and the <a href=\"http://cdk.sf.net/\">CDK</a>, has some personal touches\n(<a href=\"http://www.cac.science.ru.nl/people/lbuydens/\">Buydens</a> are <a href=\"http://www.cac.science.ru.nl/people/rwehrens/\">Wehrens</a>\nare my PhD supervisors) and some names that put the rest into perspective:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>query\t\th-index\t#pubs\nBENDER A\t41\t222\nWILLETT P\t37\t302\nGASTEIGER J\t33\t212\nRZEPA HS\t25\t236\nBUYDENS LMC\t18\t108\nGLEN RC\t\t18\t78\nWEHRENS R\t11\t47\nMURRAY-RUST P*\t9\t41\nSTEINBECK C\t9\t29\nFECHNER U\t6\t12\nGUHA R\t\t4\t24\nWILLIGHAGEN E*\t4\t9\nWEGNER JK\t3\t9\nLUTTMANN E\t2\t4\n</code></pre></div></div>\n\n<p>Of course, there are many comments on this. Like any measurement, take into account the error. Sources of error\ninclude, but are not limited to, ambiguity in the query. 
The most notable example of this, I think, is\n<a href=\"http://andygoesus.blogspot.com/\">Andreas Bender</a>; I don’t think he has been <em>that</em> successful :) Also,\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/\">Rajarshi Guha</a>’s h-index was reported as 6, but the list included\ntwo articles from the 1970s and 1980s, which I do not think are actually his.</p>\n\n<p>Feel free to suggest other names, query corrections, tips, and I will add or work on those too.</p>",
      "summary": "Peter blogged about the h-index, which is a measure for ones scientific impact. He used Google Scholar, but I do not feel that that database is clean enough. I believe a better source would be the ISI Web-of-Science.",
      
      "date_published": "2006-12-09T00:00:00+00:00",
      "date_modified": "2006-12-09T00:00:00+00:00",
      "tags": ["cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/06/power-of-big-numbers.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/06/power-of-big-numbers.html",
      "title": "The power of big numbers",
      "content_html": "<p>Contributions to open data do not have to be large, as long as many people are doing it. The\n<a href=\"http://wikipedia.org/\">Wikipedia</a> is a good example, and <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>\naccepts contributions of small databases too (I think). The result can still be large and rather useful, even scientifically.</p>\n\n<p>The latter was recently written down in the paper <em>Internet-based monitoring of influenza-like illness (ILI) in the general\npopulation of the Netherlands during the 2003–2004 influenza season</em> by Marquet et al. (DOI:<a href=\"https://doi.org/1471-2458/6/242\">1471-2458/6/242</a>).\nThe data was provided by Internet users via <a href=\"http://www.degrotegriepmeting.nl/\">The Great Influenza Survey</a> website. The article states that\nthe sum of all those small contributions (anonymous website users are asked to fill out a weekly form), yields reliable data. The user is\nrewarded by colorful pictures, such as:</p>\n\n<p><img src=\"/assets/images/alles_2006-12-06.png\" alt=\"\" /></p>\n\n<p>If all chemists and biochemists would add information about or properties of one molecule or metabolite to the Wikipedia each month,\none or more commercial database companies will have to change their business model soon. Oh, you already can start doing this\n<a href=\"http://en.wikipedia.org/wiki/Portal:Chemistry\">here</a>.</p>",
      "summary": "Contributions to open data do not have to be large, as long as many people are doing it. The Wikipedia is a good example, and PubChem accepts contributions of small databases too (I think). The result can still be large and rather useful, even scientifically.",
      
      "date_published": "2006-12-06T00:00:00+00:00",
      "date_modified": "2006-12-06T00:00:00+00:00",
      "tags": ["virus","chemometrics"],
      "_references": [{ "url": "https://doi.org/1471-2458/6/242" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/12/06/chemoblogs-2.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/12/06/chemoblogs-2.html",
      "title": "Chemo::Blogs #2",
      "content_html": "<p>Because no one picked up my <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/15/chemoblogs-1.html\">Chemo::Blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a> suggestion, I will now\nofficially claim the blog series title. However, unlike the original <a href=\"http://bioblogs.wordpress.com/\">Bio::Blogs</a> series,\nI will not summarize interesting blogs, but just spam you with websites I recently marked as\n<a href=\"http://del.icio.us/egonw/toblog\">toblog on del.icio.us</a>.</p>\n\n<h2 id=\"semantics-and-text-mining\">Semantics and Text Mining</h2>\n\n<p><a href=\"http://evan.prodromou.name/\">Evan Prodromou</a> wrote about <a href=\"http://evan.prodromou.name/RDFa_vs_microformats\">RDFa vs microformats</a>.\nThe latter are commonly used in <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/06/tagging-blog-items.html\">enhancing blog semantics <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and\nfor example used by <a href=\"http://postgenomic.com/wiki/doku.php?id=markup\">PostGenomic.com</a>. While RDFa is more explicit, e.g. by using\nnamespaced markup, we have to wait until XHTML2 to see it working. I do not think chemists are using tags a log yet, but let me\npropose the following microformats: <span class=\"inchi\"><a href=\"http://google.com/search?q=1/CH4/h1H4\">1/CH4/h1H4</a></span> and\n<span class=\"chemicalcompound\">methane<span>. Standard JavaScripts and CSS scripts will then do the rest. (Think: addressing newlines,\nauto <a href=\"http://wwmm-svc.ch.cam.ac.uk/wwmm/html/googleinchiserver.html\">googling-for-inchi</a>, etc).</span></span></p>\n\n<p>The reason why using microformats is interesting, is text mining, of various kinds. 
Whether it is setting up a molecule-article\nlink database, or <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">finding hot molecules in blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nadding semantics will help tools like <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html\">OSCAR3 to mine chemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nSome time ago <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/07/open-text-mining-interface-and.html\">OTMI was proposed by Nature <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand they have now set up a <a href=\"http://www.opentextmining.org/wiki/Main_Page\">dedicated web site</a> to explain their view on text mining.\n<a href=\"http://www.zacker.com/\">Zack Rosen</a> has a good idea why <a href=\"http://www.zacker.org/semantic-web-research-isnt-working\">RDF Semantic web research isn’t working</a>.</p>\n\n<h2 id=\"blogspace\">Blogspace</h2>\n\n<p>There are a few new chemistry blogs I want to mention (and already added to <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">Chemical blogspace <i class=\"fa-solid fa-recycle fa-xs\"></i></a>):\n<a href=\"http://blog.chembark.com/\">ChemBark</a>, <a href=\"http://www.lirico.co.uk/wp/\">lirico</a> which has an interesting\n<a href=\"http://www.lirico.co.uk/wp/?cat=8\">chemoinformatics section</a>, and <a href=\"http://ashutoshchemist.blogspot.com/\">The Curious Wavefunction</a>.\nWorth reading indeed.</p>\n\n<p><a href=\"http://plindenbaum.blogspot.com/\">Pierre’s YOKOFAKUN</a> deserves a paragraph of his own. 
He recently blogged about\n<a href=\"http://plindenbaum.blogspot.com/2006/11/bio2rdf.html\">bio2rdf</a> which provides an <a href=\"http://bio2rdf.org/\">RDF interface to biochemical knowledge</a>\nvia <a href=\"http://lsid.sourceforge.net/\">Life Science Identifiers</a> (LSID), <a href=\"http://plindenbaum.blogspot.com/2006/11/wwwoboeditorg.html\">OBOEdit</a>\nwhich is a Java-based ontology editor, and <a href=\"http://plindenbaum.blogspot.com/2006/12/visual-unix-pipeline.html\">Amadea</a>\nwhich is a <a href=\"http://taverna.sf.net/\">Taverna</a>- and <a href=\"http://www.knime.org/\">KNIME</a>-like tool for setting up UNIX pipes.</p>\n\n<h2 id=\"online-embl-symposium\">Online EMBL Symposium</h2>\n\n<p>A few EMBL PhD students are having the <a href=\"http://virtualsymposium.predocs.org/\">First Online EMBL PhD Symposium</a> (catchy name, or … ;)\nAnyway, discussions are held on IRC, and it has a rather interesting Web2.0 session. All\n<a href=\"http://virtualsymposium.predocs.org/media\">media is available on the website</a> but requires registration right now.\nAfter the conference it will become open access to all. <a href=\"http://www.blogger.com/profile/6833158\">Jean-Claude</a> contributed\n<em>The UsefulChem Project: Open Source Chemistry Research using Blogs and Wikis</em> to the\n<a href=\"http://virtualsymposium.predocs.org/media/participants-contributions/\">Participants’ Contributions section</a>, and I had\na poster on <em>Distributing molecular information over the Internet</em>, discussing CMLRSS, blog aggregators, CML and other things.\nThe IRC session was logged and is <a href=\"http://virtualsymposium.predocs.org/chat/discussion-about-the-influence-of-web-2-0-on-science-tuesday-december-6-2006-16-00-cet/\">available here</a>.</p>\n\n<h2 id=\"literature\">Literature</h2>\n\n<p>Finally, I want to mention three recent articles. 
First one is a recent write up by Bourne and Friedberg about\n<em>Ten Simple Rules for Selecting a Postdoctoral Position</em> (DOI: <a href=\"https://doi.org/10.1371/journal.pcbi.0020121\">10.1371/journal.pcbi.0020121</a>).\nWith the end of my current postdoc position nearing, rather useful reading. Some time ago I blogged about a\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/11/new-open-access-journal-source-code.html\">New open access journal Source Code for Biology and Medicine <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand the journal is now up and running. Details can be read in the first editorial (DOI: <a href=\"https://doi.org/10.1186/1751-0473-1-1\">10.1186/1751-0473-1-1</a>).\nThe third article I would like to mention is <em>Scientific Software Development Is Not an Oxymoron</em> by Baxter\n(DOI: <a href=\"https://doi.org/10.1371/journal.pcbi.0020087\">10.1371/journal.pcbi.0020087</a>), though I do not think it has new insights.</p>\n\n<p>OK, this was a rather lengthy write up, but really needed to clean up my toblog section :)</p>",
      "summary": "Because no one picked up my Chemo::Blogs suggestion, I will now officially claim the blog series title. However, unlike the original Bio::Blogs series, I will not summarize interesting blogs, but just spam you with websites I recently marked as toblog on del.icio.us.",
      
      "date_published": "2006-12-06T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["blog","rdf","textmining","cb"],
      "_references": [{ "url": "https://doi.org/10.1371/journal.pcbi.0020121" },{ "url": "https://doi.org/10.1186/1751-0473-1-1" },{ "url": "https://doi.org/10.1371/journal.pcbi.0020087" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/11/28/code-coverage-making-sure-your-code-is.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/28/code-coverage-making-sure-your-code-is.html",
      "title": "Code coverage: making sure your code is tested",
      "content_html": "<p>Recently I <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/10/26/running-single-junit-tests-in-eclipse.html\">discussed JUnit testing from within Eclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand blogged at <a href=\"http://search.blogger.com/?as_q=JUnit&amp;ie=UTF-8&amp;x=0&amp;y=0&amp;q=JUnit+blogurl:chem-bla-ics.blogspot.com&amp;ui=blg&amp;start=0\">several occasions</a>\nabout it in other situations. I cannot stress enough how useful unit testing is: it adds this extra set of\n<a href=\"http://en.wikipedia.org/wiki/Given_enough_eyeballs,_all_bugs_are_shallow\">eyeballs to make bugs shallow</a>.\nAnd it does that, indeed.</p>\n\n<p>Ensuring that you actually test all the code you write, however, is not easy. A couple of years back I read an article about\n<a href=\"http://hansel.sf.net/\">Hansel</a>, which does code coverage checking, but never got it nicely working for the\n<a href=\"http://cdk.sf.net/\">CDK project</a>. Never looked at that lately, so no idea how the current release would work out.\nHansel is an extension of <a href=\"http://www.junit.org/\">JUnit</a>, and requires hard coding class names, which conflicts with\nCDK’s module setup.</p>\n\n<p>Thomas Kuhn pointed me last week to <a href=\"http://emma.sf.net/\">Emma</a>, which seems a nice tool. It does not require hacking\nour source, and generates cool HTML:</p>\n\n<p><img src=\"/assets/images/emmaCoverage.png\" alt=\"\" /></p>\n\n<p>And even highlights the source code:</p>\n\n<p><img src=\"/assets/images/emmaCoverage1.png\" alt=\"\" /></p>\n\n<p>BTW, I seem to be in good company: <a href=\"http://www.gnu.org/software/classpath/\">Classpath</a> is\n<a href=\"http://builder.classpath.org/~cpdev/coverage/\">using it too</a>.</p>\n\n<p>Below is the command I issued to generate the HTML output. Rajarshi, maybe this can be integrated into\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/\">Nightly</a>? 
Note that it only runs the tests\nfor the data module:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>ant dist-large dist-test-large\njava <span class=\"nt\">-cp</span> ~/tmp/emma-2.0.5312/lib/emma.jar emmarun <span class=\"se\">\\</span>\n  <span class=\"nt\">-cp</span> develjar/junit.jar:dist/jar/cdk-svn-20061128.jar:dist/jar/cdk-test-svn-20061128.jar <span class=\"se\">\\</span>\n  <span class=\"nt\">-r</span> html <span class=\"nt\">-sp</span> src junit.textui.TestRunner org.openscience.cdk.test.MdataTest\n</code></pre></div></div>",
      "summary": "Recently I discussed JUnit testing from within Eclipse , and blogged at several occasions about it in other situations. I cannot stress enough how useful unit testing is: it adds this extra set of eyeballs to make bugs shallow. And it does that, indeed.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/emmaCoverage1.png",
      "date_published": "2006-11-28T00:00:00+00:00",
      "date_modified": "2024-08-24T00:00:00+00:00",
      "tags": ["opensource","cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/11/14/german-conference-on-chemoinformatics_14.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/14/german-conference-on-chemoinformatics_14.html",
      "title": "German Conference on Chemoinformatics 2006: Day 3",
      "content_html": "<p>Just some short quites note about the third day (see <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/11/13/german-conference-on-chemoinformatics.html\">day 1 and 2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nToday’s program of the <a href=\"http://scholle.oc.uni-kiel.de/users/cic/tagungen/workshop06/index.html\">German Conference on Chemoinformatics</a>\nstarted with a presentation by Rzepa about his work on a semantic wiki (DOI:<a href=\"https://doi.org/10.1021/ci060139e\">10.1021/ci060139e</a>),\nwhich might be <a href=\"http://www.ch.ic.ac.uk/wiki/\">online here</a>. (He recorded a podcast, but I have not seen it online yet.) I wish I could\nsee the sources of those wiki pages, to see how that system integrates RDF, but at least <a href=\"http://www.jmol.org/\">Jmol</a> is running fine.\nThe presentation by Couch showed the status of the <a href=\"http://www.materialsgrid.org/\">Materials Grid project</a>, and how a guy called AgentX\ndoes all the hard work. Ihlenfeldt updated us about the status of <a href=\"http://pubchem.ncbi.nlm.nih.gov/\">PubChem</a>, and mostly on what they\nhad to do to keep the system from dying from its own success, for example using something called minimol. Googling does not seem to\nhelp, as that points to a number of things, but not any PubChem webpage. I am still waiting for a European organization to set up a mirror.</p>\n\n<p>After the coffee break, Kuhn showed a coarse grained force field, approximating molecules by hacking them up in fragment of 3-10 heavy atoms.\nI guess, a bit like some small molecules force fields do for methyls. Fragments within a molecule are tied together by springs, and intra-\nand intermolecular force field parameters by running MD runs on fragment pairs. 
Varnek argued that QSPR for melting point prediction has\nreached a fundamental limit, with an RMSE of around 30 to 40 degrees Celsius, which makes it quite unreasonable to decide whether a\ncompound with a predicted melting point of 40 degrees is solid or liquid at room temperature.</p>\n\n<p>You have to forgive me for not reporting on the afternoon session; I was tied up at our booth, talking with people about the CDK,\nTaverna, Bioclipse, Jmol, other opensource chemoinformatics tools, and chemoinformatics in general. Very nice, but exhausting. I might\nadvise the organization to set up a blog aggregator next year, though I am not sure whether there are others blogging about this conference.</p>",
      "summary": "Just some short quites note about the third day (see day 1 and 2 ). Today’s program of the German Conference on Chemoinformatics started with a presentation by Rzepa about his work on a semantic wiki (DOI:10.1021/ci060139e), which might be online here. (He recorded a podcast, but I have not seen it online yet.) I wish I could see the sources of those wiki pages, to see how that system integrates RDF, but at least Jmol is running fine. The presentation by Couch showed the status of the Materials Grid project, and how a guy called AgentX does all the hard work. Ihlenfeldt updated us about the status of PubChem, and mostly on what they had to do to keep the system from dying from its own success, for example using something called minimol. Googling does not seem to help, as that points to a number of things, but not any PubChem webpage. I am still waiting for a European organization to set up a mirror.",
      
      "date_published": "2006-11-14T00:00:00+00:00",
      "date_modified": "2024-08-24T00:00:00+00:00",
      "tags": ["cheminf","conference","semweb"],
      "_references": [{ "url": "https://doi.org/10.1021/ci060139e" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/11/13/german-conference-on-chemoinformatics.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/13/german-conference-on-chemoinformatics.html",
      "title": "German Conference on Chemoinformatics 2006: Day 1 and 2",
      "content_html": "<p>The <a href=\"http://scholle.oc.uni-kiel.de/users/cic/tagungen/workshop06/index.html\">2nd German Conference on Chemoinformatics</a>\nstarted yesterday, with two chemoinformatics tutorials: one on industrial chemoinformatics (I saw this presentation\nbefore… not sure when), with a good overview on integrating different information sources; the second one was about\nopensource chemoinformatics by <a href=\"http://wiki.cubic.uni-koeln.de/blog/index.php\">Christoph Steinbeck</a> (being involved\nin opensource chemoinformatics for almost 10 years now!), which included a <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\ndemo (by me) and a demo by Thomas Kuhn on the <a href=\"http://cdk.sf.net/\">CDK</a> based chemoinformatics plugin to\n<a href=\"http://taverna.sf.net/\">Taverna</a>. Other opensource projects of the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a>\nmovement were mentioned and a few outside it too.</p>\n\n<p>The conference is in honor of the life work by <a href=\"http://www2.chemie.uni-erlangen.de/\">Prof. Gasteiger</a>, who gave an\noverview of chemoinformatics in his group, Germany and Europe. He stressed the need of education in chemoinformatics,\nlike in <a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=12\">Obernai</a>. He also highlighted that we, today,\nare still solving the same problem as 30 years ago. Which is true, which is why this channel is called\n<a href=\"https://chem-bla-ics.linkedchemistry.info/\">Chem-bla-ics <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, trying to solve that problem. 
When asked if opensource chemoinformatics\nfrom the start would have addressed this, he replied that he requires people to cooperatively do research with his\ngroup; opensource clearly cannot enforce that.</p>\n\n<h1 id=\"day-2\">Day 2</h1>\n\n<p>Today’s program had a number of interesting presentations (I, unfortunately, missed the first presentation, so\nhave to visit that group soon now, to make up for that.) <a href=\"http://www.dq.fct.unl.pt/staff/jas/introduction.htm\">Prof. Aires-de-Sousa</a>\nshowed his work on MOLMAP for mapping metabolic networks (<a href=\"http://www.genome.jp/kegg/\">KEGG</a> really, see my\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html\">earlier blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), and showed,\njust as proof of principle, classification of organisms based on this.</p>\n\n<p>J. Weisser talked about docking, still an obligatory topic. This work really showed two new approaches: the use\nof QM partial charges (the example showed an improvement in RMSD of a factor of 10, not very statistical, but\npromising indeed); the second was the fact that water does not like to be in tight spots, because of reduced\npossibilities for hydrogen bonding. A concept common in understanding supramolecular phenomena, but I have not\nseen this applied to docking before. But I am no expert in that field. M. Wagner showed work on using KEGG\ndata to estimate likely metabolites, and the use in reducing effects of metabolic degradation. T. Schroeter\nintroduced me to <a href=\"http://www.gaussianprocess.org/\">Gaussian processes</a>, a new data modeling method. Quite\nembarrassing to get introduced to these, being specialized in modeling methods for chemical problems.</p>\n\n<p>The poster session was, as usual, really exhausting, talking to a lot of people. Having a booth at the exhibition\non opensource chemoinformatics added a nice twist to this. 
I therefore skipped the FIZ-award winner lectures, so I\nhope someone else will blog about those.</p>\n\n<p>One last note: <a href=\"http://www.sun.com/software/opensource/java/\">Sun started releasing their Java platform under the GPL license</a>.\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/downing/\">Jim</a>, seems that they <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/10/25/being-good-opensource-user.html\">proved me wrong <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nThe class library is still not GPL, but is expected to become licensed such somewhere in the first half of next year.</p>",
      "summary": "The 2nd German Conference on Chemoinformatics started yesterday, with two chemoinformatics tutorials: one on industrial chemoinformatics (I saw this presentation before… not sure when), with a good overview on integrating different information sources; the second one was about opensource chemoinformatics by Christoph Steinbeck (being involved in opensource chemoinformatics for almost 10 years now!), which included a Bioclipse demo (by me) and a demo by Thomas Kuhn on the CDK based chemoinformatics plugin to Taverna. Other opensource projects of the Blue Obelisk movement were mentioned and a few outside it too.",
      
      "date_published": "2006-11-13T00:00:00+00:00",
      "date_modified": "2024-08-24T00:00:00+00:00",
      "tags": ["cheminf","conference","openscience","bioclipse","cdk","taverna","java"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/11/12/organic-chemists-can-now-tune.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/12/organic-chemists-can-now-tune.html",
      "title": "Organic chemists can now tune properties without changing the molecular structure??",
      "content_html": "<p><a href=\"http://www.paulbracher.com/blog/?p=217\">Paul Bracher</a> and <a href=\"http://blogs.nature.com/thescepticalchymist/2006/08/the_big_picture.html\">Joshua Finkelstein</a>\npointed my attention to a nice discussion in <a href=\"http://www.nature.com/\">Nature</a> on the future of chemistry, in\n<a href=\"http://www.nature.com/nature/journal/v442/n7102/full/442500a.html\">What Chemists Want to Know</a>, by Philip Ball.\nPaul and Joshua already reviewed it thoroughly, but I could not resist commenting in it too. Having chosen chemistry\nas specialization when I went to <a href=\"http://www.ru.nl/\">university</a>, and with a minor in supramolecular chemistry,\nthis is a something I do relate to.</p>\n\n<p>A main theme is whether chemistry is unexplored enough to justify further academic research and education. Ball’s answer is\nyes, and came up with a six questions, of which I found this one most intriguing: <em>what is the chemical basis of thought and memory</em>.\nBut the article interestingly also discusses if chemistry has not become a tool for more interesting fields of research.\nThe Nobel prize winners Ball interviewed do not think so.</p>\n\n<p>One quote took my surprise: <em>Where is synthetic astronomy - changing the gravitational constant to see what effect that\nhas on the properties of the Universe, and thus perhaps improving it?</em> Well, I might be out of the synthetic organic\nchemistry for too long now, but this is not a quote I would like to be in Nature with; is synthetic chemistry now\nable, then, to modify the nature, strengths of bonds now?? can they actually change molecular properties without\nchanging the connectivity?? 
Moreover, astronomers have changed the properties of objects in our universe: for\nyears they have been reducing the mass of the earth by sending off probes to other objects (satellites etc).\nLikewise, chemistry is <strong>not</strong> changing nature, it is just exploring all compounds we have never purified in our\nglassware yet. Synthesis is nothing like changing nature.</p>\n\n<p>There is one other comment I would like to post here. I strongly agree that chemistry in itself is important to have\nas a separate educational and research topic at universities. Simply because too many databases are, from a chemical point\nof view, messed up. For example, <a href=\"http://www.genome.jp/kegg/\">KEGG</a> and the <a href=\"http://www.pdb.org/\">PDB</a> are known to\nhave many chemical errors, though these databases are rather important indeed. We need people around to educate\npeople and point out those errors, if life sciences itself is to have a future.</p>",
      "summary": "Paul Bracher and Joshua Finkelstein pointed my attention to a nice discussion in Nature on the future of chemistry, in What Chemists Want to Know, by Philip Ball. Paul and Joshua already reviewed it thoroughly, but I could not resist commenting in it too. Having chosen chemistry as specialization when I went to university, and with a minor in supramolecular chemistry, this is a something I do relate to.",
      
      "date_published": "2006-11-12T00:00:00+00:00",
      "date_modified": "2006-11-12T00:00:00+00:00",
      "tags": ["chemistry","nature"],
      "_references": [{ "url": "https://doi.org/10.1038/442500a" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/11/07/when-is-open-source-chemoinformatics.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/07/when-is-open-source-chemoinformatics.html",
      "title": "When is open source chemoinformatics successful?",
      "content_html": "<p>Open source chemoinformatics has become a common phenomenon, though many projects are small in nature:\nsource code is developed by only few developers, or even in a closed manner and released when considered\ndone. Within open source software there is room for distinguishing a subset of open development\nchemoinformatics, that is, Bazar-like, instead of Cathedral-like (see\n<a href=\"http://catb.org/esr/writings/cathedral-bazaar/cathedral-bazaar/\">ESR famous writing</a>).</p>\n\n<p>Measuring the importance of an open source project can be done by many measures, such as the number of people\non the user and developers mailing lists, number of downloads, number of source lines of code\n[<a href=\"http://en.wikipedia.org/wiki/Source_lines_of_code\">wp:SLOC</a>], number of independent development locations,\nand rankings on, for example, <a href=\"http://www.sourceforge.net/\">SourceForge</a> or <a href=\"http://www.google.com/\">Google</a>.\nJust to name a few.</p>\n\n<p>Scientific importance of an open source project can sometimes be measured by a citation index; that is, only\nwhen there is a landmark article for the project. 
<a href=\"http://www.umass.edu/microbio/rasmol/index2.htm\">Rasmol</a>\nis such a project: a first article was published in 1995 (DOI:<a href=\"https://doi.org/10.1016/S0968-0004(00)89080-5\">10.1016/S0968-0004(00)89080-5</a>),\nand a follow up in 2000 (DOI:<a href=\"https://doi.org/10.1016/S0968-0004(00)01606-6\">10.1016/S0968-0004(00)01606-6</a>).\nThe first was cited <strong>1190</strong> times, and the second 65 times (as stated on <a href=\"http://www.isiknowledge.com/wos/\">Web-of-Science</a>).\nQuite successful indeed.</p>\n\n<p>OK, it is not even 100+, but I am quite happy with the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=literature\">scientific impact of the CDK</a>\nso far: the 2003 CDK article (DOI:<a href=\"https://doi.org/10.1021/ci025584y\">10.1021/ci025584y</a>) was cited 24 times\nnow, and the just published 2006 article (DOI:<a href=\"https://doi.org/10.2174/138161206777585274\">10.2174/138161206777585274</a>)\nonce:</p>\n\n<p><img src=\"/assets/images/cdkCitationCounts.png\" alt=\"\" /></p>",
      "summary": "Open source chemoinformatics has become a common phenomenon, though many projects are small in nature: source code is developed by only few developers, or even in a closed manner and released when considered done. Within open source software there is room for distinguishing a subset of open development chemoinformatics, that is, Bazar-like, instead of Cathedral-like (see ESR famous writing).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkCitationCounts.png",
      "date_published": "2006-11-07T00:00:00+00:00",
      "date_modified": "2024-08-16T00:00:00+00:00",
      "tags": ["cdk","rasmol","cheminf"],
      "_references": [{ "url": "https://doi.org/10.1016/S0968-0004(00)89080-5" },{ "url": "https://doi.org/10.1016/S0968-0004(00)01606-6" },{ "url": "https://doi.org/10.1021/CI025584Y" },{ "url": "https://doi.org/10.2174/138161206777585274" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/11/03/chemical-blogspace-updates.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/03/chemical-blogspace-updates.html",
      "title": "Chemical Blogspace updates",
      "content_html": "<p><a href=\"http://wiki.cubic.uni-koeln.de/pg/\">Chemical Blogspace</a> is up and running fine for some time now. Since the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">start <i class=\"fa-solid fa-recycle fa-xs\"></i></a> the number of aggregated blogs increased from 19 to\n<a href=\"http://wiki.cubic.uni-koeln.de/pg/all_blogs.php\">64</a> now, of which a number are situated at\n<a href=\"http://chemblogs.org/\">ChemBlogs</a> which is a site where you can run a blog. Meanwhile, the number of\n<a href=\"http://wiki.cubic.uni-koeln.de/pg/all_papers.php\">cited papers</a> went up to 186! The\n<a href=\"http://pubs.acs.org/journals/jacsat/\">JACS</a> is most popular so far, followed by the\n<a href=\"http://www3.interscience.wiley.com/cgi-bin/jhome/26737\">Angewandte Chemie Int. Ed.</a></p>\n\n<p>As mentioned <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html\">before <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, the software was taken\n<a href=\"http://postgenomic.com/\">Postgenomic.com</a>, which has upgraded considerably and released new software since the author\n<a href=\"http://www.ghastlyfop.com/blog/2006/09/changes.html\">moved to Nature</a>, but I have not found time to follow that upgrade\nyet :( The <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">promised InChI support <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nis still pending too.</p>",
      "summary": "Chemical Blogspace is up and running fine for some time now. Since the start the number of aggregated blogs increased from 19 to 64 now, of which a number are situated at ChemBlogs which is a site where you can run a blog. Meanwhile, the number of cited papers went up to 186! The JACS is most popular so far, followed by the Angewandte Chemie Int. Ed.",
      
      "date_published": "2006-11-03T00:00:00+00:00",
      "date_modified": "2024-08-24T00:00:00+00:00",
      "tags": ["cb","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/11/03/bioclipse-workshop-short-but.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/03/bioclipse-workshop-short-but.html",
      "title": "Bioclipse Workshop: short but productive",
      "content_html": "<p>The <a href=\"http://www.bioclipse.net/\">Bioclipse</a> <a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse_Workshop_Oct/Nov_2006\">Workshop</a>\nhas ended and, for just three days, turned out <a href=\"http://wiki.bioclipse.net/index.php?title=Outcome_of_the_Bioclipse_autumn_workshop_2006\">quite productive</a>.\nWe have first bits of scripting support for JavaScript using <a href=\"http://www.mozilla.org/rhino/\">Rhino</a>. At this moment the\nscripting plugin needs to explicit depend on plugins to be able to access their classpath, but we plan to solve that.\nAn example script:</p>\n\n<div class=\"language-javascript highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"c1\">// to have short identifiers</span>\n<span class=\"nb\">Array</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">java</span><span class=\"p\">.</span><span class=\"nx\">lang</span><span class=\"p\">.</span><span class=\"nx\">reflect</span><span class=\"p\">.</span><span class=\"nb\">Array</span><span class=\"p\">;</span>\n<span class=\"nb\">String</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">java</span><span class=\"p\">.</span><span class=\"nx\">lang</span><span class=\"p\">.</span><span class=\"nb\">String</span><span class=\"p\">;</span>\n<span class=\"nx\">msgBox</span> <span class=\"o\">=</span> <span class=\"nb\">Packages</span><span class=\"p\">.</span><span class=\"nx\">net</span><span class=\"p\">.</span><span class=\"nx\">bioclipse</span><span class=\"p\">.</span><span class=\"nx\">plugins</span><span class=\"p\">.</span><span class=\"nx\">bc_rhino</span><span class=\"p\">.</span><span class=\"nx\">ShowBcMsgBox</span><span class=\"p\">;</span>\n<span class=\"nx\">DbfetchServiceServiceLocator</span> <span class=\"o\">=</span>\n  <span class=\"nb\">Packages</span><span 
class=\"p\">.</span><span class=\"nx\">uk</span><span class=\"p\">.</span><span class=\"nx\">ac</span><span class=\"p\">.</span><span class=\"nx\">ebi</span><span class=\"p\">.</span><span class=\"nx\">www</span><span class=\"p\">.</span><span class=\"nx\">ws</span><span class=\"p\">.</span><span class=\"nx\">services</span><span class=\"p\">.</span><span class=\"nx\">urn</span><span class=\"p\">.</span><span class=\"nx\">Dbfetch</span><span class=\"p\">.</span><span class=\"nx\">DbfetchServiceServiceLocator</span><span class=\"p\">;</span>\n\n<span class=\"c1\">// get data</span>\n<span class=\"nx\">service</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">DbfetchServiceServiceLocator</span><span class=\"p\">();</span>\n<span class=\"nx\">strarray</span> <span class=\"o\">=</span> <span class=\"nx\">service</span><span class=\"p\">.</span><span class=\"nf\">getUrnDbfetch</span><span class=\"p\">().</span><span class=\"nf\">fetchData</span><span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"s2\">refseq:NM_210721</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">refseq</span><span class=\"dl\">\"</span><span class=\"p\">,</span> <span class=\"dl\">\"</span><span class=\"s2\">raw</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n\n<span class=\"c1\">// make readable</span>\n<span class=\"nx\">str</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">String</span><span class=\"p\">();</span>\n<span class=\"k\">for </span><span class=\"p\">(</span><span class=\"nx\">i</span> <span class=\"o\">=</span> <span class=\"mi\">0</span><span class=\"p\">;</span> <span class=\"nx\">i</span> <span class=\"o\">&lt;</span> <span class=\"nb\">Array</span><span class=\"p\">.</span><span class=\"nf\">getLength</span><span class=\"p\">(</span><span class=\"nx\">strarray</span><span class=\"p\">);</span> <span class=\"nx\">i</span><span 
class=\"o\">++</span><span class=\"p\">)</span> <span class=\"p\">{</span>\n  <span class=\"k\">if </span><span class=\"p\">(</span><span class=\"nx\">i</span> <span class=\"o\">!=</span> <span class=\"mi\">0</span><span class=\"p\">)</span>\n  <span class=\"nx\">str</span> <span class=\"o\">=</span> <span class=\"nx\">str</span> <span class=\"o\">+</span> <span class=\"p\">(</span><span class=\"dl\">\"</span><span class=\"se\">\\n</span><span class=\"dl\">\"</span><span class=\"p\">);</span>\n  <span class=\"nx\">str</span> <span class=\"o\">=</span> <span class=\"nx\">str</span> <span class=\"o\">+</span> <span class=\"nx\">strarray</span><span class=\"p\">[</span><span class=\"nx\">i</span><span class=\"p\">];</span>\n<span class=\"p\">}</span>\n\n<span class=\"c1\">// show</span>\n<span class=\"nx\">msgBox</span><span class=\"p\">.</span><span class=\"nc\">ShowStatic</span><span class=\"p\">(</span><span class=\"nx\">str</span><span class=\"p\">);</span>\n</code></pre></div></div>\n\n<p>It’s just a short example that uses webservice technology in Bioclipse to fetch a sequence.</p>\n\n<h1 id=\"qsar-support\">QSAR support</h1>\n\n<p>QSAR support is getting along too, with a new DescriptorProvider extension point in <a href=\"http://svn.sourceforge.net/viewvc/bioclipse/trunk/\">trunk/</a>\nand work is progressing on a wizard that allows selecting descriptors and a CDK backend. The output of the wizard is a matrix resource, for\nwhich we already have a rich editor. A <a href=\"http://www-ra.informatik.uni-tuebingen.de/software/joelib/\">JOELib</a> plugin has been suggested,\nas it has a good deal of QSAR descriptors too; <a href=\"http://miningdrugs.blogspot.com/\">Jörg</a>, interested in doing a tiny bit of Bioclipse hacking?</p>\n\n<p>A full proceedings is available <a href=\"http://wiki.bioclipse.net/index.php?title=Outcome_of_the_Bioclipse_autumn_workshop_2006\">online</a>.</p>",
      "summary": "The Bioclipse Workshop has ended and, for just three days, turned out quite productive. We have first bits of scripting support for JavaScript using Rhino. At this moment the scripting plugin needs to explicit depend on plugins to be able to access their classpath, but we plan to solve that. An example script:",
      
      "date_published": "2006-11-03T00:00:00+00:00",
      "date_modified": "2006-11-03T00:00:00+00:00",
      "tags": ["bioclipse","qsar","javascript","conference"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/11/01/bioclipse-workshop-is-in-progress.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/11/01/bioclipse-workshop-is-in-progress.html",
      "title": "The Bioclipse Workshop is in progress",
      "content_html": "<p>The <a href=\"http://www.bioclipse.net/\">Bioclipse</a> <a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse_Workshop_Oct/Nov_2006\">Workshop</a>\nis in progress, and <a href=\"http://bioclipse.blogspot.com/\">Ola</a> is now leading a discussion about future releases and functionality.\nProceedings are <a href=\"http://wiki.bioclipse.net/index.php?title=Outcome_of_the_Bioclipse_autumn_workshop_2006\">live updated</a>,\nand presentation sheets will be available shortly.</p>",
      "summary": "The Bioclipse Workshop is in progress, and Ola is now leading a discussion about future releases and functionality. Proceedings are live updated, and presentation sheets will be available shortly.",
      
      "date_published": "2006-11-01T00:00:00+00:00",
      "date_modified": "2006-11-01T00:00:00+00:00",
      "tags": ["bioclipse","conference"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/10/28/opensource-chemistry-and-opensource.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/28/opensource-chemistry-and-opensource.html",
      "title": "Opensource Chemistry and Opensource Chemoinformatics",
      "content_html": "<p>The <a href=\"http://hardly.cubic.uni-koeln.de/mailman/listinfo/blue-obelisk\">Blue Obelisk mailing list</a> has seen an\n<a href=\"http://hardly.cubic.uni-koeln.de/pipermail/blue-obelisk/2006-September/thread.html\">interesting discussion</a> on ambiguity in the term ‘open source’,\ntriggered by a study by <a href=\"http://www.blogger.com/profile/19401667\">Beth Ritter Guth</a>. For example, <a href=\"http://www.blogger.com/profile/6833158\">Jean-Claude Bradley</a>\nperforms ‘open source’ science (see his <a href=\"http://usefulchem.blogspot.com/\">Useful Chemistry blog</a>) who is not opposed to using\nclosed source software, while the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> is about ‘open source’ software. It seemed that\nthis was contradicting, and <a href=\"http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Peter_Murray_Rust\">Peter Murray-Rust</a>\n[<a href=\"http://en.wikipedia.org/\">wp</a>:<a href=\"http://en.wikipedia.org/wiki/Peter_Murray-Rust\">en</a>] wrote up a lengthy\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=69\">overview of the use of the term ‘open’</a>.</p>\n\n<p>Now, I have been giving the ‘open source’ ambiguity some thinking (well, about a month or so…), and came to the following conclusions:</p>\n\n<ol>\n  <li>open source has the exact same meaning in both Bradley-like open source chemistry, and BO-like open source chemoinformatics</li>\n  <li>both have the same goal</li>\n  <li>it’s just the research topic that is different</li>\n</ol>\n\n<h1 id=\"ad-1-same-meaning-of-open-source\">Ad 1: same meaning of ‘open source’</h1>\n\n<p>I think ‘open source’ just means that every has the right to reproduce (and distribute and the same or modified shape)\nproducts created from the source.</p>\n\n<p>In ‘open source chemistry’ (Bradley-like, sorry for the term :) the source is are the details about the chemical reactions\nto perform, the product being being able to run the whole reaction 
pathway.</p>\n\n<p>In ‘open source chemoinformatics’ (Blue Obelisk-like) the source is the procedure that described how to get from one set\nof bits to another, really quite like getting from one molecule to another. Chemoinformatics, being IT science, just\nmakes it a lot easier to distribute the algorithm to do that. (Sure, <a href=\"https://doi.org/10.1021/ci0502698\">CMLReact</a>\nis getting along quite nicely.)</p>\n\n<p>The analogy even goes further, both science do not only depend on open source. Like Bradley-like open source science allows\nembedding proprietary stuff (glass-ware, closed-source software, chemical both from <a href=\"http://www.fisherscientific.com/\">Acros (now Fisher)</a>,\n…), so does BO-like open source science, which uses tons of proprietary stuff too (computers, Sun’s JVM, MS-Windows).</p>\n\n<h1 id=\"ad-2-same-goal\">Ad 2: same goal</h1>\n\n<p>I can be short on this one. For both ‘open source’ initiatives the goal is to share knowledge and make science reproducible.</p>\n\n<h1 id=\"ad-3-different-topic\">Ad 3: different topic</h1>\n\n<p>So, the confusion was just coming from the fact to what extend ‘open source’ tools are being used. Can you do open source\nscience without using open source chemoinformatics? Sure. In a utopic situation, all tools and small bits are ‘open source’\n(though <a href=\"http://wwmm.ch.cam.ac.uk/blogs/corbett/?p=7\">some are agnostic to this</a>). But fact is, that many Blue Obelisk members use ‘closed source’ tools all the time,\neven if they do not have too. At least everyone is doing ‘open source’ on their specialisms, both in open source chemistry\nand in open source chemoinformatics.</p>\n\n<p>I guess we should just be stop being short on ‘open source software’ to remove any ambiguity of the term ‘open source’.\nAs a spin-off, this would make Bradley’s work fit in nicely with ODOSOS: open data, open source, open standards.</p>",
      "summary": "The Blue Obelisk mailing list has seen an interesting discussion on ambiguity in the term ‘open source’, triggered by a study by Beth Ritter Guth. For example, Jean-Claude Bradley performs ‘open source’ science (see his Useful Chemistry blog) who is not opposed to using closed source software, while the Blue Obelisk is about ‘open source’ software. It seemed that this was contradicting, and Peter Murray-Rust [wp:en] wrote up a lengthy overview of the use of the term ‘open’.",
      
      "date_published": "2006-10-28T00:00:00+00:00",
      "date_modified": "2006-10-28T00:00:00+00:00",
      "tags": ["openscience","opensource","blue-obelisk","chemistry","cheminf"],
      "_references": [{ "url": "https://doi.org/10.1021/ci0502698" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3htbd-qma24",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/26/running-single-junit-tests-in-eclipse.html",
      "title": "Running single JUnit tests in Eclipse",
      "content_html": "<p>Unit testing is important when developing source code. <a href=\"http://www.junit.org/\">JUnit</a> provides a library to facilitate this in Java,\nand <a href=\"http://www.eclipse.org/te\">Eclipse</a> had the functionality to run JUnit tests. Even better, it allows you to run single JUnit\ntests, even in debug mode:</p>\n\n<p><img src=\"/assets/images/JUnitTestInDebugMode.png\" alt=\"\" /></p>\n\n<p>Just open the java class in your Package Explorer, right click on the JUnit method you want to run, then pick <code class=\"language-plaintext highlighter-rouge\">Run As</code> or <code class=\"language-plaintext highlighter-rouge\">Debug As</code>,\nand then <code class=\"language-plaintext highlighter-rouge\">JUnit test</code>.</p>",
      "summary": "Unit testing is important when developing source code. JUnit provides a library to facilitate this in Java, and Eclipse had the functionality to run JUnit tests. Even better, it allows you to run single JUnit tests, even in debug mode:",
      
      "date_published": "2006-10-26T00:00:00+00:00",
      "date_modified": "2024-08-12T00:00:00+00:00",
      "tags": ["junit","eclipse"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/10/25/being-good-opensource-user.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/25/being-good-opensource-user.html",
      "title": "Being a good opensource user",
      "content_html": "<p>There are many ways to contribute to opensource software (OSS), programming only being one of them. I develop OSS, but use OSS too.\nFor example, I am a big user of the <a href=\"http://www.kernel.org/\">Linux</a> kernel, the <a href=\"http://www.kde.org/\">KDE desktop</a>, <a href=\"http://www.kubuntu.org/\">Kubuntu</a>,\n<a href=\"http://www.debian.org/\">Debian</a> (I have unstable in a <a href=\"http://www.ubuntuforums.org/showthread.php?t=24575\">chroot</a>),\n<a href=\"http://www.getfirefox.com/\">Firefox</a>, <a href=\"http://www.eclipse.org/\">Eclipse</a>, <a href=\"http://www.gnu.org/software/classpath/\">Classpath</a>, and many,\nmany others. What these have in common, is that I generally have no time to look into the source code of these projects. A small patch excluded,\nI am really a regular user of these projects.</p>\n\n<p>However, I try not to <a href=\"http://en.wikipedia.org/wiki/Leech_(computing)\">leech</a> (see also <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=78\">Peter’s related comment on that</a>):\nI care about these projects and, therefore, I file bug reports. Sometimes, I even join the developers and talk to them via commonly used IRC and\nmailing lists. Even, every now and then I get this itch and then I do look up source code and contribute a patch. But filing bug reports is the\nleast one can do, the least everyone should do.</p>\n\n<h1 id=\"classpath\">Classpath</h1>\n\n<p><a href=\"http://www.gnu.org/software/classpath/\">Classpath</a> is the GNU project to provide a free Java library, i.e. the set of <code class=\"language-plaintext highlighter-rouge\">java.*</code> classes\nthat come with the Sun JVM. It is not a virtual machine, though, for which several opensource implementations are available, many of\nwhich use Classpath as library provider. 
They have a very nice chat channel at irc.freenode.net, called <code class=\"language-plaintext highlighter-rouge\">#classpath</code>.\nTheir wiki provides a <a href=\"http://developer.classpath.org/mediation/FreeSwingTestApps\">platform for giving feedback</a> on how well software\nruns. A bug tracking system (BTS) is <a href=\"http://www.gnu.org/software/classpath/bugs.html\">available too</a>. An overview of the bugs that I filed\ncan be found at <a href=\"http://del.icio.us/egonw\">my del.icio.us account</a>: <a href=\"http://del.icio.us/egonw/bugreports%2BClasspath\">bugreports+Classpath</a>.</p>\n\n<p>Needless to say, Classpath is important in making our Java-based chemoinformatics truly opensource.</p>\n\n<h1 id=\"debiankubuntu\">Debian/Kubuntu</h1>\n\n<p>Things are different for <a href=\"http://www.debian.org/\">Debian</a> and <a href=\"http://www.kubuntu.org/\">Kubuntu</a>: these are distributions and, except for\nsome patching, are generally not involved in software development as done by upstream. However, they generally do appreciate knowing about\nbugs too, so there is some duplication of bug reports here.</p>\n\n<p>That said, they do provide nice tools for bug reporting which work for all packages that they distribute. Debian has\n<a href=\"http://packages.debian.org/reportbug\">reportbug</a> and Kubuntu has <a href=\"http://launchpad.net/\">Launchpad</a>. An overview of bugs I reported with\nDebian can be found at del.icio.us <a href=\"http://del.icio.us/egonw/bugreports%2Bdebian\">bugreports+debian</a>. 
I do not have bug reports in Launchpad\nyet, but two can be found in mailing list archives, see del.icio.us <a href=\"http://del.icio.us/egonw/bugreports%2Bubuntu\">bugreports+ubuntu</a>.</p>\n\n<h1 id=\"kde\">KDE</h1>\n\n<p>I also tracked down two bugs I reported with KDE, see del.icio.us <a href=\"http://del.icio.us/egonw/bugreports%2BKDE\">bugreports+KDE</a>.</p>\n\n<h1 id=\"sourceforge\">SourceForge</h1>\n\n<p>Surely, I filed many more bugs to many other projects. A long list of bug reports can be found on SourceForge. However, it does not seem\npossible to easily make a list of those :(</p>",
      "summary": "There are many ways to contribute to opensource software (OSS), programming only being one of them. I develop OSS, but use OSS too. For example, I am a big user of the Linux kernel, the KDE desktop, Kubuntu, Debian (I have unstable in a chroot), Firefox, Eclipse, Classpath, and many, many others. What these have in common, is that I generally have no time to look into the source code of these projects. A small patch excluded, I am really a regular user of these projects.",
      
      "date_published": "2006-10-25T00:00:00+00:00",
      "date_modified": "2006-10-25T00:00:00+00:00",
      "tags": ["openscience"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/10/11/are-chemogenomics-and.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/11/are-chemogenomics-and.html",
      "title": "Are chemogenomics and proteochemometrics the same?",
      "content_html": "<p><a href=\"http://www.blogger.com/profile/2366764\">Joerg Wegner</a> <a href=\"http://miningdrugs.blogspot.com/2006/09/chemogenomics-structuring-drug.html\">recently blogged</a>\nabout <em>Chemogenomics: structuring the drug discovery process to gene families</em> by C.J. Harris and A. P. Stevens in Drug Discov Today\n(DOI: <a href=\"https://doi.org/10.1016/j.drudis.2006.08.013\">10.1016/j.drudis.2006.08.013</a>). This review article provides a nice overview of a trend in\nmathematical modelling of the interaction of small organic molecules with proteins, often referred to as <a href=\"http://en.wikipedia.org/wiki/QSAR\">QSAR</a>.\nWhat the article does not discuss, is the <a href=\"http://www.proteochemometrics.org/index.php?option=com_content&amp;task=view&amp;id=20&amp;Itemid=22\">work by the group of Jarl Wikberg</a>\nwho coined the term proteochemometrics (see PubMed: <a href=\"https://pubmed.ncbi.nlm.nih.gov/11342268/\">11342268</a>).</p>",
      "summary": "Joerg Wegner recently blogged about Chemogenomics: structuring the drug discovery process to gene families by C.J. Harris and A. P. Stevens in Drug Discov Today (DOI: 10.1016/j.drudis.2006.08.013). This review article provides a nice overview of a trend in mathematical modelling of the interaction of small organic molecules with proteins, often referred to as QSAR. What the article does not discuss, is the work by the group of Jarl Wikberg who coined the term proteochemometrics (see PubMed: 11342268).",
      
      "date_published": "2006-10-11T00:00:00+00:00",
      "date_modified": "2006-10-11T00:00:00+00:00",
      "tags": ["cheminf","bioinfo"],
      "_references": [{ "url": "https://doi.org/10.1016/j.drudis.2006.08.013" },{ "url": "https://doi.org/10.1016/s0304-4165(00)00187-2" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1sk32-0jb54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/06/googles-new-search-engine-code-search.html",
      "title": "Google&apos;s new search engine: /* Code Search */",
      "content_html": "<p><a href=\"http://www.google.com/\">Google</a> has set up a new search enginge specifically for source code:\n<a href=\"http://www.google.com/codesearch\">/* Code Search */</a>. Important difference with their normal search engine is that it\nallows restricting your search by programming language, license and filename and package. I have not been able to figure\nout how to use ‘package’ yet, but the others are pretty clear. For example: <code class=\"language-plaintext highlighter-rouge\">AtomContainer license:LGPL lang:java</code>\nshould do it. The search results show filenames, licenses and programming languages:</p>\n\n<p><img src=\"/assets/images/google_code.png\" alt=\"\" /></p>\n\n<p>Alternatively, you can use <a href=\"http://www.koders.com/\">Koders</a>, which is a source code search engine too. It has been around\nfor quite some time now, and shows the copyright notice too. Additionally, Koders offers a\n<a href=\"http://www.koders.com/info.aspx?c=tools\">plugin for Eclipse</a> which adds a search ‘view’ which will show the HTML from the\nwebsite in an editor window inside Eclipse.</p>",
      "summary": "Google has set up a new search enginge specifically for source code: /* Code Search */. Important difference with their normal search engine is that it allows restricting your search by programming language, license and filename and package. I have not been able to figure out how to use ‘package’ yet, but the others are pretty clear. For example: AtomContainer license:LGPL lang:java should do it. The search results show filenames, licenses and programming languages:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/google_code.png",
      "date_published": "2006-10-06T00:00:00+00:00",
      "date_modified": "2006-10-06T00:00:00+00:00",
      "tags": ["google","opensource"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/10/04/bioinformatics-open-source-or-open.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/10/04/bioinformatics-open-source-or-open.html",
      "title": "Bioinformatics: Open Source or Open Access??",
      "content_html": "<p>I have heard that bioinformatics is ahead of chemoinformatics. However, I discoverd that this is not necessarily the case,\nwhile preparing for a homology modeling course I gave this week at the <a href=\"http://www.cubic.uni-koeln.de/\">CUBIC</a>. Open Access\nis really no issue there, with open access journals and many open access databases. But it is different when it comes down\nto open source software.</p>\n\n<p>Below is a list of bioinformatics programs which are free for academic use, but not open:</p>\n\n<ul>\n  <li><a href=\"http://www-cryst.bioc.cam.ac.uk/~joy/\">JOY</a> (free after getting license)</li>\n  <li><a href=\"http://www.cryst.chem.uu.nl/platon/\">PLATON</a> (free download)</li>\n  <li><a href=\"http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html\">PROCHECK</a> (free after getting license)</li>\n  <li><a href=\"http://www.predictprotein.org/\">ProteinPredict</a> (free download)</li>\n  <li><a href=\"http://dunbrack.fccc.edu/SCWRL3.php\">SCWRL</a> (free after getting license)</li>\n  <li><a href=\"http://bioinf.cs.ucl.ac.uk/threader/\">THREADER</a> (free after getting license)</li>\n  <li><a href=\"http://swift.cmbi.ru.nl/gv/whatcheck/\">WHAT_CHECK</a> (free download)</li>\n  <li><a href=\"http://swift.cmbi.ru.nl/whatif/\">WHAT_IF</a> (free after getting license)</li>\n</ul>\n\n<p>And this not even includes the many websites which do not offer the software behind them. And these programs cover several\nsteps in the whole homology modeling process. 
Open source homology modeling is not possible at this moment :(</p>\n\n<p>But, on the bright side, there are already some open source programs involved too:</p>\n\n<ul>\n  <li><a href=\"http://www.ncbi.nlm.nih.gov/blast/\">BLAST</a> (public domain)</li>\n  <li><a href=\"http://www.gromacs.org/\">GROMACS</a> (GPL)</li>\n</ul>\n\n<p>And protein structure viewers is hardly a problem at all; several open source viewers are available, among which\n<a href=\"http://pymol.sourceforge.net/\">Rasmol</a>, <a href=\"http://pymol.sourceforge.net/\">PyMOL</a> and\n<a href=\"http://www.jmol.org/\">Jmol</a>.</p>\n\n<p>In other words: we might not want to look at bioinformatics too much.</p>",
      "summary": "I have heard that bioinformatics is ahead of chemoinformatics. However, I discoverd that this is not necessarily the case, while preparing for a homology modeling course I gave this week at the CUBIC. Open Access is really no issue there, with open access journals and many open access databases. But it is different when it comes down to open source software.",
      
      "date_published": "2006-10-04T00:00:00+00:00",
      "date_modified": "2006-10-04T00:00:00+00:00",
      "tags": ["opensource","bioinfo"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/09/28/complife06-day-1.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/28/complife06-day-1.html",
      "title": "CompLife&apos;06 - Day 1",
      "content_html": "<p><a href=\"http://www.inf.uni-konstanz.de/complife06/\">CompLife’06</a> started today in Cambridge, UK. About 80 people are attending the meeting,\nand topics range from systems biology to QSAR. This evening there was a free software session mostly focussing on opensource software.\nTwelve projects were presented, among which the <a href=\"http://cdk.sf.net/\">CDK</a> (by me) and <a href=\"http://www.bioclipse.net/\">Bioclipse</a> (by Ola),\nin five minute presentations, and a two hour demo period during a reception (free speech and free beer :). We had our brand new fliers\nwith us, as well as a large poster for some additional branding.</p>\n\n<p>One research presentation compared a number of fingerprint implementations in a QSAR study, and CDK came out very well, beating a few\ncommercial programs. The free software session was full of CDK, however, with <a href=\"http://ambit.acad.bg/\">AMBIT</a>,\n<a href=\"http://openbabel.sourceforge.net/wiki/IBabel\">iBabel</a>, Bioclipse and <a href=\"http://knime.org/\">KNIME</a> mentioning the CDK.</p>\n\n<p>The latter is really interesting: it’s a workflow program just like <a href=\"http://taverna.sourceforge.net/\">Taverna</a> or\n<a href=\"http://www.scitegic.com/products/overview/index.html\">PipeLine Pilot</a>, which is using the Eclipse RCP as starting point, just like\nBioclipse. And like the other two, KNIME has CDK integration, at least for displaying structures.</p>",
      "summary": "CompLife’06 started today in Cambridge, UK. About 80 people are attending the meeting, and topics range from systems biology to QSAR. This evening there was a free software session mostly focussing on opensource software. Twelve projects were presented, among which the CDK (by me) and Bioclipse (by Ola), in five minute presentations, and a two hour demo period during a reception (free speech and free beer :). We had our brand new fliers with us, as well as a large poster for some additional branding.",
      
      "date_published": "2006-09-28T00:00:00+00:00",
      "date_modified": "2006-09-28T00:00:00+00:00",
      "tags": ["cdk","bioclipse","knime"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/paxbm-rac78",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/24/cdk-bug-squash-party-day-5.html",
      "title": "CDK Bug Squash Party - Day 5",
      "content_html": "<p>Day 5 was formally the last day (see also the summaries of <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html\">day 1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/20/cdk-bug-squash-party-day-2.html\">day 2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/22/cdk-bug-squash-party-day-3-and-4.html\">day 3/4 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>) of the\n<a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">Bug Squash Party</a> (BSP).\nMiguel uploaded the last bits of his CDK <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/protein/data/PDBPolymer.html\">PDBPolymer</a>\nto CML to CDK PDBPolymer roundtripping functionality (closing a bug and a feature request in one go). Have not tested this first hand yet,\nbut looking forward to playing with this bit of code. Kai continued to work on the more difficult bits of the\n<a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=refactoringkernelclasses\">code refactoring</a>, resulting in fewer though more\ncomprehensive commits. Stefan fixed another bug in JChemPaint; the rendering of implicit hydrogens.</p>\n\n<p>About the last, the <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/renderer/Renderer2D.html\">Renderer2D</a>\nneeds a serious overhaul. That is, a complete rewrite in proper Java2D, which can use affine transformations for zooming, scaling and fixing the\ncoordinate system. The current code is ancient and predates Java2D. <a href=\"http://depth-first.com/articles/2006/08/28/drawing-2-d-structures-with-structure-cdk\">Rich’ code</a>\nmight be a good starting point. 
I would love to do this rewrite, but lack the resources… anyone in need of some open source fame?</p>\n\n<p>I worked on atom typing, which is yet largely untested, and often integrated with other bits of code. Yesterday I uploaded\n<a href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/atomtype/\">some first patches</a> which I wrote on the train ride\nback to the Netherlands.</p>\n\n<p>Now, what can be concluded from this BSP? The participant count was below what I had hoped for, but those who did worked hard (and\nwith pleasure I hope :) The total number of JUnit test has increased:</p>\n\n<p><img src=\"/assets/images/junit_tests.png\" alt=\"\" /></p>\n\n<p>And so has the number of failing tests:</p>\n\n<p><img src=\"/assets/images/fails_tests.png\" alt=\"\" /></p>\n\n<p>These plots were made with <a href=\"http://www.r-project.org/\">R</a> from data created with two custom scripts both found in\n<a href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/tools/\">cdk/tools</a>: makeBugCountPlot.pl and extractBugCountPlotData.bsh.\nNote that <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/junitsummary.html\">96.86% of the tests do not fail</a>!</p>\n\n<p>The bump in failing tests seems to be due to <a href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/smiles/SmilesParser.java?r1=7009&amp;r2=7011\">commit 7010-7011</a>,\nwhich has to do with SMILES parsing. Yes, the bond order resolving is still not solved. I don’t seem to get Todd’s patch for this working,\nbut not giving up either. The bump is so large, because quite some JUnit tests use the SmilesParser as a quick tool to get a configured\nconnection table. 
However, these tests should be replaced by explicit CDK models, which is easy done with the\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/io/CDKSourceCodeWriter.html\">CDKSourceCodeWriter</a>.\nI’ll blog about how to use that soon.</p>",
      "summary": "Day 5 was formally the last day (see also the summaries of day 1 , day 2 and day 3/4 ) of the Chemistry Development Kit Bug Squash Party (BSP). Miguel uploaded the last bits of his CDK PDBPolymer to CML to CDK PDBPolymer roundtripping functionality (closing a bug and a feature request in one go). Have not tested this first hand yet, but looking forward to playing with this bit of code. Kai continued to work on the more difficult bits of the code refactoring, resulting in fewer though more comprehensive commits. Stefan fixed another bug in JChemPaint; the rendering of implicit hydrogens.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/junit_tests.png",
      "date_published": "2006-09-24T00:00:00+00:00",
      "date_modified": "2024-08-24T00:00:00+00:00",
      "tags": ["cdk","bsp","junit","conference"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/zwkym-aty79",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/22/cdk-bug-squash-party-day-3-and-4.html",
      "title": "CDK Bug Squash Party - Day 3 and 4",
      "content_html": "<p>Because I was struggling hard with <a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=30594266&amp;forum_id=2178\">default values for cdk.interfaces fields</a>,\nI did not have time to write up the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">Bug Squash Party</a> report for day 3 (see also\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html\">day 1 <i class=\"fa-solid fa-recycle fa-xs\"></i></a> and\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/20/cdk-bug-squash-party-day-2.html\">day 2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>).\nBut here it is.</p>\n\n<h1 id=\"day-3\">Day 3</h1>\n\n<p>Kai worked hard on getting the <code class=\"language-plaintext highlighter-rouge\">cdk.interfaces</code> API cleaned up, as <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=refactoringkernelclasses\">agreed upon earlier</a>.\nChristian added a test for the <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/geometry/GeometryTools.html\">RMSD calculator</a>\n(see <code class=\"language-plaintext highlighter-rouge\">getAllAtomRMSD()</code>), and cleaned up his code a bit. Stefan continued his bug-squashing on JChemPaint and fixed another one or two bugs.</p>\n\n<p>Rajarshi uploaded a patch to set undefined atomic properties, like partial and formal charges and the implicit hydrogen count, to <code class=\"language-plaintext highlighter-rouge\">UNSET</code> by default.\nHowever, this broke the CDK at many places, as apparently many class methods assume the default to be zero. 
After discussing the issue at the CUBIC,\nit turned out that this was sort of the intended, though undocumented, behavior: use the <a href=\"http://java.sun.com/docs/books/tutorial/java/nutsandbolts/datatypes.html\">default Java values</a>.</p>\n\n<p>And I added missing <code class=\"language-plaintext highlighter-rouge\">clone()</code> methods, closing one bug on SourceForge, added files for Eclipse to know how to build the CDK with Ant (thanx\nto Nico for similar files for <a href=\"http://www.jmol.org/\">Jmol</a>), and got CDK compiled again against <a href=\"http://www.classpath.org/\">Classpath</a>.</p>\n\n<h1 id=\"day-4\">Day 4</h1>\n\n<p>Miguel uploaded his first patched for support saving <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/protein/data/PDBPolymer.html\">PDBPolymer</a>\ndata structures into and restoring them again from CML, addressing an <a href=\"https://sourceforge.net/tracker/index.php?func=detail&amp;aid=1085912&amp;group_id=20024&amp;atid=120024\">almost two-year-old bug</a>.\nHe created new cdk.interfaces for them, to address module dependencies, but a large set of JUnit tests are <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/test/result-data.html\">yet missing</a>.</p>\n\n<p>Kai continued his cdk.interfaces refactoring, working on the more involved changes. Stefan, Tobias, and me worked on a poster and three three-fold\nflyers for our CDK booth at <a href=\"http://www.inf.uni-konstanz.de/complife06/\">CompLife2006</a>, so have not been very productive in bug squashing.\nBut we are happy with the result. 
Below is a screenshot of one side of the main CDK flyer:</p>\n\n<p><img src=\"/assets/images/flyerScreeny.png\" alt=\"\" /></p>\n\n<p>With <a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/junitsummary.html\">77 failing JUnit tests</a>, and still too large a number of\n<a href=\"http://sourceforge.net/tracker/?atid=120024&amp;group_id=20024&amp;func=browse\">open bugs on SourceForge</a>, there are plenty of things to do today.</p>",
      "summary": "Because I was struggling hard with default values for cdk.interfaces fields, I did not have time to write up the Bug Squash Party report for day 3 (see also day 1 and day 2 ). But here it is.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/flyerScreeny.png",
      "date_published": "2006-09-22T00:00:00+00:00",
      "date_modified": "2024-08-24T00:00:00+00:00",
      "tags": ["cdk","bsp","java","pdb","conference"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/09/20/cdk-bug-squash-party-day-2.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/20/cdk-bug-squash-party-day-2.html",
      "title": "CDK Bug Squash Party - Day 2",
      "content_html": "<p>Like <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html\">yesterday <i class=\"fa-solid fa-recycle fa-xs\"></i></a> I will give short overview of things done at the\n<a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">Bug Squash Party</a> (BSP).\nI think Stefan was the only to fix and close a bug report yesterday. Rajarshi added the\n<a href=\"http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/qsar/descriptors/molecular/MDEDescriptor.html\">MDE descriptor</a>\n(yes, during a BSP new code might be commited too ;)</p>\n\n<p>More interestingly, discussion on the <a href=\"http://sourceforge.net/mailarchive/forum.php?forum_id=2178\">developers mailing list</a> on the\npatch by Todd Martin of the <a href=\"http://www.epa.gov/\">EPA</a> to address deducing bond orders in\nSMILES parsing (the major source of current open bugs!). A problem seems to be when his tool should be called in the SmilesParser class.</p>\n\n<p>More details on the proceedings can be found on the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">BSP wiki page</a>.</p>",
      "summary": "Like yesterday I will give short overview of things done at the Chemistry Development Kit Bug Squash Party (BSP). I think Stefan was the only to fix and close a bug report yesterday. Rajarshi added the MDE descriptor (yes, during a BSP new code might be commited too ;)",
      
      "date_published": "2006-09-20T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cdk","bsp","conference"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/18/cdk-bug-squash-party-day-1.html",
      "title": "CDK Bug Squash Party - Day 1",
      "content_html": "<p>I plan to do a daily coverage of the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">Bug Squash Party</a>\n(BSP). While Stefan was working hard to get the <a href=\"http://wiki.cubic.uni-koeln.de/\">wiki machine</a> back online after a hard-disc crash, Rajarshi,\nMiguel and me have been working hard. Miguel started to work on missing JUnit tests for <a href=\"http://sourceforge.net/tracker/?group_id=20024&amp;atid=120024\">bugs reported on SourceForge</a>\nand Rajarshi <a href=\"http://cia.navi.cx/stats/author/rajarshi\">fixed PMD, JavaDoc and other problems</a>. I wrote 19 new JUnit tests and fixed two bugs,\nbut with 44 bugs still open at SourceForge, there is quite some work to do. Luckily, several others will join in later this week.</p>\n\n<p>As can be read on the <a href=\"http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=bsp200609\">BSP wiki page</a>, there is work for everyone, on every level,\nand even for non-programmers. Or just stop by on <a href=\"irc://irc.freenode.net/#jmol\">CDK’s IRC channel</a> (link works with Konqueror,\nmaybe other browsers too) to see what a BSP looks like from the inside.</p>",
      "summary": "I plan to do a daily coverage of the Chemistry Development Kit Bug Squash Party (BSP). While Stefan was working hard to get the wiki machine back online after a hard-disc crash, Rajarshi, Miguel and me have been working hard. Miguel started to work on missing JUnit tests for bugs reported on SourceForge and Rajarshi fixed PMD, JavaDoc and other problems. I wrote 19 new JUnit tests and fixed two bugs, but with 44 bugs still open at SourceForge, there is quite some work to do. Luckily, several others will join in later this week.",
      
      "date_published": "2006-09-18T00:00:00+00:00",
      "date_modified": "2006-09-18T00:00:00+00:00",
      "tags": ["cdk","bsp","conference"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/09/15/chemoblogs-1.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/15/chemoblogs-1.html",
      "title": "Chemo::Blogs #1",
      "content_html": "<p>There are a number of links I wanted to blog about, but never really had time for yet. Here’s a short review of a them.\n<a href=\"http://bioblogs.wordpress.com/\">Bio::Blogs</a> is a series of summary/review articles of bio related blogs, and definately\nworth putting in your aggregator. Maybe someone is interested in setting up a Chemo::Blogs for\n<a href=\"http://blueobelisk.org/pg/all_blogs.php\">chemistry blogs</a>?</p>\n\n<p>My <a href=\"http://del.icio.us/\">del.icio.us</a> (social bookmarking) <a href=\"http://del.icio.us/network/egonw\">network</a> informed me about\n<a href=\"http://www.w3.org/Talks/Tools/Slidy/\">HTML Slidy</a>, an XHTML based PowerPoint replacement. Being true XHTML, it allows\nembedding <a href=\"http://www.jmol.org/\">Jmol</a>, <a href=\"http://jchempaint.sf.net/\">JChemPaint</a> and any other applet. Embed your pieces\nof CML, MathML and SVG (or any other <a href=\"http://en.wikipedia.org/wiki/XML_namespace\">namespace</a>) and you no longer\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=28\">have data loss</a>.</p>\n\n<p><a href=\"http://nar.oxfordjournals.org/\">Nucleic Acids Research</a> recently had a special issue on webservers\n(DOI:<a href=\"http://dx.doi.org/10.1093/nar/gkl385\">10.1093/nar/gkl385</a>), in which <a href=\"https://incubator.apache.org/projects/taverna.html\">Taverna <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nwas featured (DOI:<a href=\"https://doi.org/10.1093/nar/gkl320\">10.1093/nar/gkl320</a>). Just want to mention once more that Taverna has\na chemoinformatics module: <a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=166755\">CDK-Taverna</a>.</p>\n\n<p>Day and Motherwell published the paper <em>An Experiment in Crystal Structure Prediction by Popular Vote</em>\n(DOI:<a href=\"https://doi.org/10.1021/cg060313r\">10.1021/cg060313r</a>). 
It links to an <a href=\"http://pubs.acs.org/isubscribe/journals/cgdefu/asap/objects/cg060313r/CSP_popular_vote.html\">open access website</a>\nto participate yourself. This is one way in which one can have tighter integration of the internet with old-fashioned publishing.</p>\n\n<p>And some minor notes: a video tutorial was put online in <a href=\"http://phobos.xtec.net/fmas/modules.php?name=News&amp;file=article&amp;sid=27\">this blog</a>\nthat shows how Jmol is inserted on a Moodle page. And, as <a href=\"http://plindenbaum.blogspot.com/2006/08/life-sciences-semantic-web-is-full-of.html\">Pierre reminded me</a>,\n<em>The Life Sciences Semantic Web is Full of Creeps!</em> (DOI:<a href=\"https://doi.org/10.1093/bib/bbl025\">10.1093/bib/bbl025</a>),\nwhich puts me in an identity crisis: hacker, chemist or creep. Mmmm…</p>",
      "summary": "There are a number of links I wanted to blog about, but never really had time for yet. Here’s a short review of a them. Bio::Blogs is a series of summary/review articles of bio related blogs, and definately worth putting in your aggregator. Maybe someone is interested in setting up a Chemo::Blogs for chemistry blogs?",
      
      "date_published": "2006-09-15T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["taverna","cdk"],
      "_references": [{ "url": "https://doi.org/10.1093/NAR/GKL385" },{ "url": "https://doi.org/10.1093/NAR/GKL320" },{ "url": "https://doi.org/10.1093/BIB/BBL025" },{ "url": "https://doi.org/10.1021/CG060313R" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b3zbn-9w223",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/14/complex-pdb-documents-using-bioclipse.html",
      "title": "Complex PDB documents using the Bioclipse ChildResourceCreator",
      "content_html": "<p>Some time ago I blogged about the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/08/22/bioclipse-gets-new-extension-point.html\">ChildResourceCreator extension point in Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nand hinted as using that for <a href=\"http://www.rcsb.org/pdb/\">PDB files</a>. which contain 3D molecular models, sequences and bibliographic information. Using the new extension point,\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> now treats PDB files as complex documents, creating child resources for the 3D molecular model (using the\n<a href=\"http://cdk.sf.net/\">CDK</a> plugin), and a sequence resource (using the <a href=\"http://www.biojava.org/\">BioJava</a> plugin).</p>\n\n<p><img src=\"/assets/images/bioclipseBioJavaSupport.png\" alt=\"\" /></p>",
      "summary": "Some time ago I blogged about the ChildResourceCreator extension point in Bioclipse and hinted as using that for PDB files. which contain 3D molecular models, sequences and bibliographic information. Using the new extension point, Bioclipse now treats PDB files as complex documents, creating child resources for the 3D molecular model (using the CDK plugin), and a sequence resource (using the BioJava plugin).",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseBioJavaSupport.png",
      "date_published": "2006-09-14T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["bioclipse","biojava","cdk","pdb","jmol"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/09/13/jmol-and-cdk-add-powerful-chemical.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/13/jmol-and-cdk-add-powerful-chemical.html",
      "title": "&quot;Jmol and the CDK add powerful chemical capabilities&quot;, says Munos in Nature Reviews Drug Discovery",
      "content_html": "<p><a href=\"http://www.nature.com/nrd/journal/vaop/ncurrent/authors/nrd2131.html\">Bernard Munos</a> at <a href=\"http://www.lilly.com/\">Eli Lilly &amp; Co.</a>\nwrote up a lengthy analysis on open source in drug discovery in <a href=\"http://www.nature.com/nrd/index.html\">Nature Reviews Drug Discovery</a>:\nCan open-source R&amp;D reinvigorate drug research? (DOI:<a href=\"https://doi.org/10.1038/nrd2131\">10.1038/nrd2131</a>). When scanning the article\nI saw this quote:</p>\n\n<p><em>Other tools such as eMolecules, Jmol or the Chemistry Development Kit are adding powerful chemical search and visualization\ncapabilities to the open-source scientist’s toolbox.</em></p>\n\n<p>Unfortunately, the paper does not point to the correct <a href=\"http://cdk.sf.net/\">CDK website</a>, but to the CUBIC backend at\n<a href=\"http://almost.cubic.uni-koeln.de/cdk\">http://almost.cubic.uni-koeln.de/cdk</a>. Moreover, I don’t think the quote does full justice to\nwhat the CDK has achieved in the past six years; I’m sure we have achieved more than a fingerprinter and some 2D and 3D rendering!</p>",
      "summary": "Bernard Munos at Eli Lilly &amp; Co. wrote up a lengthy analysis on open source in drug discovery in Nature Reviews Drug Discovery: Can open-source R&amp;D reinvigorate drug research? (DOI:10.1038/nrd2131). When scanning the article I saw this quote:",
      
      "date_published": "2006-09-13T00:00:00+00:00",
      "date_modified": "2006-09-13T00:00:00+00:00",
      "tags": ["drugdiscovery","jmol","cdk","opensource"],
      "_references": [{ "url": "https://doi.org/10.1038/nrd2131" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/08/chemical-archeology-oscar3-to.html",
      "title": "Chemical Archeology: OSCAR3 to NMRShiftDB.org",
      "content_html": "<p>Chemical Archeology (see <a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=7#body\">Christoph’s comment</a>) is the\nprocess of extracting chemical information from old journal articles. Some time ago,\n<a href=\"http://wwmm.ch.cam.ac.uk/blogs/corbett/\">Peter Corbett</a> from the group of <a href=\"http://wwmm.ch.cam.ac.uk/blogs/murrayrust/\">Peter Murray-Rust</a>\nvisited the <a href=\"http://almost.cubic.uni-koeln.de/jrg/\">CUBIC</a> to talk to us about\n<a href=\"http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3\">Oscar3</a> which can do just that. That day, we already\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/22/text-mining-for-chemistry-using-oscar3.html\">hooked OPSIN into Bioclipse <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Oscar3, however, is capable of more than the name2structure of OPSIN (see also\n<a href=\"httpa://doi.org/10.1039/b411033a\">10.1039/b411033a</a>; it can take a plain text file with an experimental section\nwith details on the synthesis of small organic compounds, and analyze the chemistry in that. This functionality has been\navailable as <a href=\"http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/index.asp\">an RSC authoring tool</a>\nfor some time now (see also <a href=\"https://doi.org/10.1039/b411699m\">10.1039/b411699m</a>). Unfortunately, what publisher put\nonline (PDF and HTML) is much more difficult to process with Oscar3: those formats are often optimized for display,\nnot for machine processing. 
The HTML can be cleaned up, but there is no general approach.</p>\n\n<p><a href=\"http://wiki.cubic.uni-koeln.de/blog/\">Christoph Steinbeck</a> is going to present at the\n<a href=\"http://www.chemistry.org/portal/a/c/s/1/acsdisplay.html?DOC=meetings%5Csanfrancisco2006%5Chome.html\">upcoming ACS meeting</a>\nthe use of Oscar3 for extraction of NMR spectra from old journal articles, in preparation for submission to the\n<a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB.org</a> (see the <a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=4#body\">abstract</a>\nof <a href=\"http://oasys2.confex.com/acs/232nm/techprogram/P981204.HTM\">CINF 101</a>).</p>\n\n<p>Since the full Oscar3 was not hooked into <a href=\"http://www.bioclipse.net/\">Bioclipse</a> yet, I had some work to do. It took me\nsome time to figure out how to properly configure Oscar3, and what additional things I had to do to clean up the HTML\nused by publishers to get Oscar3 to extract NMR spectra (thanx to PeterC for hints!). I also had to tweak the Oscar3\ncode itself here and there, but that’s what opensource is about :) (Peter, if you are reading this: I have a number\nof patches for the Oscar3 code in <a href=\"http://svn.sourceforge.net/viewvc/bioclipse/trunk/bc_oscar/\">bc_oscar</a>;\nlet me know if you’re interested in them.)</p>\n\n<p>This is the end result:</p>\n\n<p><img src=\"/assets/images/oscar1.png\" alt=\"\" /></p>\n\n<p>Note especially the hierarchy in the resource navigator on the left. The misc folder contains all the chemistry found in the article. But more important is that for six molecules it fully detected the experimental section! 
For 3-(2-Oxocyclooctanyl)-3-phenylpropan-1-al (InChI=1/C17H22O2/c18-13-12-15(14-8-4-3-5-9-14)16-10-6-1-2-7-11-17(16)19/h3-5,8-9,13,15-16H,1-2,6-7,10-12H2) it derived the molecular structure (with OPSIN), and a few spectra: H-NMR, high-resolution MS and IR.</p>\n\n<p>So, if you attend the ACS meeting: make sure to visit Christoph’s CINF 101 presentation!</p>",
      "summary": "Chemical Archeology (see Christoph’s comment) is the process of extracting chemical information from old journal articles. Some time ago, Peter Corbett from the group of Peter Murray-Rust visited the CUBIC to talk to us about Oscar3 which can do just that. That day, we already hooked OPSIN into Bioclipse .",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/oscar1.png",
      "date_published": "2006-09-08T00:00:00+00:00",
      "date_modified": "2025-01-17T00:00:00+00:00",
      "tags": ["oscar","bioclipse","acs","chemistry","textmining"],
      "_references": [{ "url": "https://doi.org/10.1039/b411033a" },{ "url": "https://doi.org/10.1039/b411699m" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/09/08/biojava-15-beta-released.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/08/biojava-15-beta-released.html",
      "title": "BioJava 1.5 beta released",
      "content_html": "<p><a href=\"http://www.bioservices.net/2006/09/biojava-15-beta-released.html\">Martin Szugat reported</a> that a beta for <a href=\"http://biojava.org/wiki/BioJava:Download\">BioJava 1.5</a>\nhas been released. New features include: a new <a href=\"http://www.biojava.org/docs/api15b/index.html\">biojavax</a> package with extension on the basic functionlity, such as\nthe <code class=\"language-plaintext highlighter-rouge\">RichSequence.IOTools</code> and the <code class=\"language-plaintext highlighter-rouge\">RichSequence</code> object; a <a href=\"http://biojava.org/wiki/BioJava:BioJavaXDocs#Genetic_Algorithms\">genetic algorithm library</a>; features\nthat allow manipulation of 3D structure files and objects; and non-HMM implementations of the NW and SW alignment algorithms. The announcement also mentions a new\npackage for handling external processes (org.biojava.utils.process); I am wondering what that is about. I will upload this beta to Bioclipse\n<a href=\"http://svn.sourceforge.net/viewvc/bioclipse/trunk/bc_biojava/\">trunk/bc_biojava/</a> shortly, so that we can play with it.</p>",
      "summary": "Martin Szugat reported that a beta for BioJava 1.5 has been released. New features include: a new biojavax package with extension on the basic functionlity, such as the RichSequence.IOTools and the RichSequence object; a genetic algorithm library; features that allow manipulation of 3D structure files and objects; and non-HMM implementations of the NW and SW alignment algorithms. The announcement also mentions a new package for handling external processes (org.biojava.utils.process); I am wondering what that is about. I will upload this beta to Bioclipse trunk/bc_biojava/ shortly, so that we can play with it.",
      
      "date_published": "2006-09-08T00:00:00+00:00",
      "date_modified": "2006-09-08T00:00:00+00:00",
      "tags": ["biology","java","biojava"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/09/02/calculating-geometrical-properties.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/09/02/calculating-geometrical-properties.html",
      "title": "Calculating geometrical properties with the CDK",
      "content_html": "<p><a href=\"http://cheminformatics.seesaa.net/\">ケムインフォマティクスに虚空投げ</a> runs <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/09/02/calculating-geometrical-properties.html\">a story on how to calculate geometrical\nproperties of a 3D structure <i class=\"fa-solid fa-recycle fa-xs\"></i></a> using\nCDK’s <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/modeling/forcefield/ForceFieldTools.html\">ForceFieldTools</a>.\nThis class contains a few methods to calculate distances between atoms and angles between bonds.</p>\n\n<p>This tools class is special as it uses vecmath GVector objects, which just contain atomic coordinates, likely suitable\nfor extensive computation, as expected in <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/modeling/forcefield/package-frame.html\">CDK’s force field implementation</a>.\nHowever, for just calculating the distance and angles, there are simpler alternatives.</p>\n\n<p>The distance between two atoms can be calculated with:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"n\">atom1</span> <span class=\"o\">=</span> <span class=\"n\">molecule</span><span class=\"o\">.</span><span class=\"na\">getAtom</span><span class=\"o\">(</span><span class=\"mi\">0</span><span class=\"o\">);</span>\n<span class=\"n\">atom2</span> <span class=\"o\">=</span> <span class=\"n\">molecule</span><span class=\"o\">.</span><span class=\"na\">getAtom</span><span class=\"o\">(</span><span class=\"mi\">1</span><span class=\"o\">);</span>\n<span class=\"kt\">double</span> <span class=\"n\">dist</span> <span class=\"o\">=</span> <span class=\"n\">atom1</span><span class=\"o\">.</span><span class=\"na\">getPoint3d</span><span class=\"o\">().</span><span class=\"na\">distance</span><span class=\"o\">(</span><span class=\"n\">atom2</span><span class=\"o\">.</span><span class=\"na\">getPoint3d</span><span 
class=\"o\">());</span>\n</code></pre></div></div>\n\n<p>or, by constructing a vector for the bond first:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Vector3d</span> <span class=\"n\">bond1to2</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Vector3d</span><span class=\"o\">(</span><span class=\"n\">atom2</span><span class=\"o\">.</span><span class=\"na\">getPoint3d</span><span class=\"o\">());</span>\n<span class=\"n\">bond1to2</span><span class=\"o\">.</span><span class=\"na\">sub</span><span class=\"o\">(</span><span class=\"n\">atom1</span><span class=\"o\">.</span><span class=\"na\">getPoint3d</span><span class=\"o\">());</span>\n<span class=\"kt\">double</span> <span class=\"n\">dist</span> <span class=\"o\">=</span> <span class=\"n\">bond1to2</span><span class=\"o\">.</span><span class=\"na\">length</span><span class=\"o\">();</span>\n</code></pre></div></div>\n\n<p>Using vectors to represent bonds (with two atoms!) makes it easy to calculate angles too (assuming the bonds share atom1 and <code class=\"language-plaintext highlighter-rouge\">bond1to3</code> is constructed in the same way as <code class=\"language-plaintext highlighter-rouge\">bond1to2</code>):</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kt\">double</span> <span class=\"n\">angle</span> <span class=\"o\">=</span> <span class=\"n\">bond1to2</span><span class=\"o\">.</span><span class=\"na\">angle</span><span class=\"o\">(</span><span class=\"n\">bond1to3</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Vecmath does not seem to contain a convenience method for calculating torsion angles :(</p>",
      "summary": "ケムインフォマティクスに虚空投げ runs a story on how to calculate geometrical properties of a 3D structure using CDK’s ForceFieldTools. This class contains a few methods to calculate distances between atoms and angles between bonds.",
      
      "date_published": "2006-09-02T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/08/25/r-news-special-issue-on-chemistry.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/25/r-news-special-issue-on-chemistry.html",
      "title": "R News special issue on chemistry",
      "content_html": "<p><a href=\"http://cran.r-project.org/doc/Rnews/\">R News</a> just released a <a href=\"http://cran.r-project.org/doc/Rnews/Rnews_2006-3.pdf\">special issue</a> on\nthe use of the versatile statistics program <a href=\"http://www.r-project.org/\">R</a> in chemistry. It features six articles amongst which one by\nRajarshi Guha on the <a href=\"http://cdk.sf.net/\">CDK</a>-R bridge, and one by my supervisor and me on the use of self-organizing maps to\ncluster crystal structures.</p>",
      "summary": "R News just released a special issue on the use of the versatile statistics program R in chemistry. It features six articles amongst which one by Rajarshi Guha on the CDK-R bridge, and one by my supervisor and me on the use of self-organizing maps to cluster crystal structures.",
      
      "date_published": "2006-08-25T00:00:00+00:00",
      "date_modified": "2006-08-25T00:00:00+00:00",
      "tags": ["rstats","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xge7p-17184",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html",
      "title": "Chemical blogspace",
      "content_html": "<p>We all know <a href=\"http://en.wikipedia.org/wiki/Chemical_space\">chemical space</a>; <a href=\"http://wiki.cubic.uni-koeln.de/pg/\">Chemical blogspace</a> (Cb) is different:\nit is the chemistry discussed in <a href=\"http://en.wikipedia.org/wiki/Blogspace\">blogspace</a>. Cb is build on the\n<a href=\"http://postgenomic.org/\">opensource software</a> of <a href=\"http://postgenomic.com/\">Postgenomic.com</a> which I bloged on\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/15/hot-articles-mining-semantic-web.html\">before <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. The now running Cb aggregates\n<a href=\"http://wiki.cubic.uni-koeln.de/pg/all_blogs.php\">19 blogs</a> and, like the original, extracts linked (cited or reviewed) articles from literature.</p>\n\n<p><img src=\"/assets/images/chemblogspace.png\" alt=\"\" /></p>\n\n<p>The system is beta, but I am happy about it already that I mention it now. For example, some article titles are not properly recognized,\nand some journals are known in the statistics in several formats. And, more importantly, I have not yet hooked in the\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html\">InChI <i class=\"fa-solid fa-recycle fa-xs\"></i></a> support I developed earlier.</p>\n\n<p>So, if you like the idea, or know other interesting scientifically interesting chemistry blogs, leave a comment, or send me email.</p>",
      "summary": "We all know chemical space; Chemical blogspace (Cb) is different: it is the chemistry discussed in blogspace. Cb is build on the opensource software of Postgenomic.com which I bloged on before . The now running Cb aggregates 19 blogs and, like the original, extracts linked (cited or reviewed) articles from literature.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/chemblogspace.png",
      "date_published": "2006-08-25T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cb","feeds","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/08/22/bioclipse-gets-new-extension-point.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/22/bioclipse-gets-new-extension-point.html",
      "title": "Bioclipse gets a new extension point",
      "content_html": "<p>I hacked in a new extension point for <a href=\"http://www.bioclipse.net/\">Bioclipse</a> yesterday, based on a <a href=\"http://wiki.bioclipse.net/index.php?title=ChildCreator_extension_point\">proposal</a>\nI made earlier. The new extension point (EP) is called <code class=\"language-plaintext highlighter-rouge\">ChildResourceCreator</code> and allows creating child resources for a given IBioResource. One application where this is very useful is the\n<a href=\"http://dx.doi.org/10.1021/ci034244p\">CMLRSS application</a> (<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/07/03/avi-movies-of-cmlrss-howto-in.html\">earlier blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), or any\n<a href=\"http://en.wikipedia.org/wiki/RSS_(file_format)\">RSS</a> or <a href=\"http://www.atomenabled.org/\">Atom</a> enriched with any other XML language. Here, child resources are\ncreated for each feed entry resource with as content the foreign XML, e.g. the CML bits in the blog.</p>\n\n<p>Other applications involve complex documents, which is basically most existing documents. Take, for example, the\n<a href=\"http://www.rcsb.org/pdb/static.do?p=file_formats/pdb/index.html\">PDB format</a> from the <a href=\"http://www.rcsb.org/pdb/\">PDB database</a>. These PDB files contain a pletory\nof information including one or more protein structures, sequences and bibliographic information. Bioclipse supports each of those using the\n<a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://biojava.org/\">BioJava</a> and <a href=\"http://jabref.sf.net/\">JabRef</a> libraries.</p>\n\n<p>By making extension for the <code class=\"language-plaintext highlighter-rouge\">ChildResourceCreator</code> EP, I am able to setup a general PDBResource (with Bioclipse’s syntax highlighted PDB editor),\nand child resources for the different bits of information. 
<a href=\"http://sourceforge.net/project/showfiles.php?group_id=150681\">Bioclipse 1.0</a>, however,\nonly allows looking at the molecular structure(s) in the file, not at the sequence, nor the references. Will post the obligatory screenshot asap.</p>",
      "summary": "I hacked in a new extension point for Bioclipse yesterday, based on a proposal I made earlier. The new extension point (EP) is called ChildResourceCreator and allows creating child resources for a given IBioResource. One application where this is very useful is the CMLRSS application (earlier blog ), or any RSS or Atom enriched with any other XML language. Here, child resources are created for each feed entry resource with as content the foreign XML, e.g. the CML bits in the blog.",
      
      "date_published": "2006-08-22T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["bioclipse","feeds","cml"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/08/21/cml-explained.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/21/cml-explained.html",
      "title": "CML Explained",
      "content_html": "<p>Recently, a new generation of <a href=\"http://www.xml-cml.org/\">Chemical Markup Language</a> CML users seem to hit the\nlearning-curve-wall; there seems to be a niche in explaining the use of CML,\n<a href=\"http://cmlexplained.blogspot.com/2006/08/cml-explained.html\">so here goes</a>. My new (third) blog will discuss\nfrequently and less frequently asked questions about the use of CML.</p>",
      "summary": "Recently, a new generation of Chemical Markup Language CML users seem to hit the learning-curve-wall; there seems to be a niche in explaining the use of CML, so here goes. My new (third) blog will discuss frequently and less frequently asked questions about the use of CML.",
      
      "date_published": "2006-08-21T00:00:00+00:00",
      "date_modified": "2006-08-21T00:00:00+00:00",
      "tags": ["cml"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/08/18/small-java-applet-for-2d-structure.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/18/small-java-applet-for-2d-structure.html",
      "title": "Small java applet for 2D structure drawing",
      "content_html": "<p>Trepalin et al. published in <a href=\"http://mdpi.org/molecules/\">Molecules</a> the article <em>A Java Chemical Structure Editor Supporting the Modular Chemical Descriptor\nLanguage (MCDL)</em> (open access <a href=\"http://mdpi.org/molecules/papers/11040219.pdf\">PDF</a>). The applet is about 250kB (though the article mentions 200kB) in size and\ndownloadable from the <a href=\"http://sourceforge.net/projects/mcdl/\">MCDL</a> project on SourceForge (license: Public Domain). The article compares the applet with the\n<a href=\"http://jchempaint.sf.net/\">JChemPaint</a> applet and notes that their applet is much smaller. Both allow a template database for automated structure diagram\ngeneration, and the database that comes with the MCDL applet contains 105 fragments, whereas the JChemPaint applet contains a few.</p>\n\n<p>The article also discusses the algorithm they use to deduce bond orders, starting from the MCDL, a problem <a href=\"http://cdk.sf.net/\">CDK</a> is struggling with when\ndealing with SMILES strings.</p>",
      "summary": "Trepalin et al. published in Molecules the article A Java Chemical Structure Editor Supporting the Modular Chemical Descriptor Language (MCDL) (open access PDF). The applet is about 250kB (though the article mentions 200kB) in size and downloadable from the MCDL project on SourceForge (license: Public Domain). The article compares the applet with the JChemPaint applet and notes that their applet is much smaller. Both allow a template database for automated structure diagram generation, and the database that comes with the MCDL applet contains 105 fragments, whereas the JChemPaint applet contains a few.",
      
      "date_published": "2006-08-18T00:00:00+00:00",
      "date_modified": "2006-08-18T00:00:00+00:00",
      "tags": ["java"],
      "_references": [{ "url": "https://doi.org/10.3390/11040219" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z4kfz-xcy58",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/14/classpath-092-has-been-released.html",
      "title": "Classpath 0.92 has been released",
      "content_html": "<p><a href=\"http://gnu.wildebeest.org/diary/index.php?p=163\">Bling! Bling!</a>. Mark Wielaard announced the <a href=\"http://savannah.gnu.org/forum/forum.php?forum_id=4573\">GNU Classpath 0.92</a>\nrelease, with the following changes: <em>an alternative awt peer implementation based on Escher that uses the X protocol directly. Various ImageIO providers for png,\ngif and bmp images. Support for reading and writing midi files and reading .au and .wav files have been added. Various tools and support classes have been added\nfor jar, native2ascii, serialver, keytool, jarsigner. A GConf based util.peers backend has been added. Support for using alternative root certificate authorities\nwith the security and crypto packages. Start of javax.management and runtime lang.managment runtime support. NIO channels now support scatter-gather operations.</em></p>\n\n<p>IMAGE LOST</p>\n\n<p>This means new items on my TODO list: remove the dust from the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/02/06/test-suite-for-free-open-source-jvms.html\">CDK based test suite\n<i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\ntest if <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/03/11/classpath-090-makes-jmol-application.html\">Jmol <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/20/open-source-swing-jchempaint-runs.html\">JChemPaint <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/18/taverna-runs-with-classpath-091.html\">Taverna <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nstill work, and report the outcome on the <a href=\"http://developer.classpath.org/mediation/FreeSwingTestApps\">Classpath website</a>. I wonder how the Cairo\nand Escher patches for AWT and Swing affect my favorite chemblaics tools.</p>\n\n<p>BTW, that the Classpath team appreciates such testing efforts is clear from the foto in the ‘Bling! 
Bling!’ blog by Mark mentioned above.</p>",
      "summary": "Bling! Bling!. Mark Wielaard announced the GNU Classpath 0.92 release, with the following changes: an alternative awt peer implementation based on Escher that uses the X protocol directly. Various ImageIO providers for png, gif and bmp images. Support for reading and writing midi files and reading .au and .wav files have been added. Various tools and support classes have been added for jar, native2ascii, serialver, keytool, jarsigner. A GConf based util.peers backend has been added. Support for using alternative root certificate authorities with the security and crypto packages. Start of javax.management and runtime lang.managment runtime support. NIO channels now support scatter-gather operations.",
      
      "date_published": "2006-08-14T00:00:00+00:00",
      "date_modified": "2006-08-14T00:00:00+00:00",
      "tags": ["java","cdk","jchempaint","taverna"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/08/10/fortran-and-xml-fox-reads-and-writes.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/10/fortran-and-xml-fox-reads-and-writes.html",
      "title": "Fortran and XML: FoX reads and writes CML",
      "content_html": "<p>Mix one of the oldest and one of the latest computer technologies, and you get <a href=\"http://www.uszla.me.uk/software/FoX.html\">FoX</a>\n(BSD license), a <a href=\"http://en.wikipedia.org/wiki/Fortran\">Fortran</a> library for reading and writing <a href=\"http://www.xml-cml.org/\">Chemical Markup Language</a>,\nand thus <a href=\"http://www.w3.org/XML/\">XML</a>. Amazing, what Toby White achieved, though he did not start from scratch:\n<em>“FoX evolved from the initial codebase of <a href=\"http://lcdx00.wm.lc.ehu.es/ag/xml/\">xmlf90</a>, which was written largely by Alberto\nGarcia and Jon Wakelin.”</em> (source: <a href=\"http://sourceforge.net/mailarchive/forum.php?forum=cml-discuss\">cml-discuss mailing list</a>).</p>",
      "summary": "Mix one of the oldest and one of the latest computer technologies, and you get FoX (BSD license), a Fortran library for reading and writing Chemical Markup Language, and thus XML. Amazing, what Toby White achieved, though he did not start from scratch: “FoX evolved from the initial codebase of xmlf90, which was written largely by Alberto Garcia and Jon Wakelin.” (source: cml-discuss mailing list).",
      
      "date_published": "2006-08-10T00:00:00+00:00",
      "date_modified": "2006-08-10T00:00:00+00:00",
      "tags": ["cml"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/m0wwb-ty759",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/06/new-atomelementscarbon.html",
      "title": "new Atom(Elements.CARBON);",
      "content_html": "<p>Something I have not completely comfortable with about the <a href=\"http://cdk.sf.net/\">CDK</a> in the past, is the way\n<a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/Atom.html\">Atom</a>’s are constructed:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  <span class=\"nc\">IAtom</span> <span class=\"n\">carbon</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Not that it is horrible code, but the CDK has an <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/Element.html\">Element</a>\ntoo. Why not reuse that? However, until revision 6755 there were not constructors that allowed something like the following:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  <span class=\"nc\">IAtom</span> <span class=\"n\">carbon</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Element</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">));</span>\n</code></pre></div></div>\n\n<p>This afternoon I have hacked in constructors for <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/ChemObject.html\">ChemObject</a>,\nElement, <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/Isotope.html\">Isotope</a>, <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/AtomType.html\">AtomType</a>,\nAtom and <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/PseudoAtom.html\">PseudoAtom</a> that allow to be constructed from its\ninterface, or the interface of one of its superclasses.</p>\n\n<p>Additionally, in revision 6753, I added <a 
href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/src/org/openscience/cdk/config/Elements.java\">cdk.config.Elements</a> with\nstatic IElements for all elements up to atomic number 116, taken from the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk Data Repository</a>.\nTherefore, I can now also write:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>  <span class=\"nc\">IAtom</span> <span class=\"n\">carbon</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"nc\">Elements</span><span class=\"o\">.</span><span class=\"na\">CARBON</span><span class=\"o\">);</span>\n</code></pre></div></div>",
      "summary": "Something I have not completely comfortable with about the CDK in the past, is the way Atom’s are constructed:",
      
      "date_published": "2006-08-06T00:00:00+00:00",
      "date_modified": "2006-08-06T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/08/03/blueobelisk-components-in-japanese.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/03/blueobelisk-components-in-japanese.html",
      "title": "BlueObelisk components in Japanese",
      "content_html": "<p><a href=\"http://technorati.com/\">Technorati</a> is nice in several ways, one being the feature to set up a <a href=\"http://technorati.com/watchlist/\">watchlist</a>.\nI have set watches on <em>chemoinformatics, Jmol, Bioclipse</em> and a few more. This allows me see the latest blog items on these topics. Often,\nthe point to Asian blogs, mostly Chinese and Japanese, which I mostly find hard to read. Funny characters with <em>Jmol</em> somewhere in the sentence :)</p>\n\n<p>Yesterday, I found this way a rather interesting Japanese blog, called <a href=\"http://cheminformatics.seesaa.net/\">ケムインフォマティクスに虚空投げ</a>,\nwhich I still can’t read, but which has a lot of small code fragments. (Can someone please translate the title for me??) The last 10-ish items\ndiscuss fingerprints calculation with the <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://joelib.sf.net/\">JOELib</a>, some SMARTS work with JOELib, and some\ndiscussion on neural network tools.</p>",
      "summary": "Technorati is nice in several ways, one being the feature to set up a watchlist. I have set watches on chemoinformatics, Jmol, Bioclipse and a few more. This allows me see the latest blog items on these topics. Often, the point to Asian blogs, mostly Chinese and Japanese, which I mostly find hard to read. Funny characters with Jmol somewhere in the sentence :)",
      
      "date_published": "2006-08-03T00:00:00+00:00",
      "date_modified": "2006-08-03T00:00:00+00:00",
      "tags": ["cdk","joelib","technorati"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/08/01/cdk-and-java-6-beta.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/08/01/cdk-and-java-6-beta.html",
      "title": "CDK and the Java 6 beta",
      "content_html": "<p>Recently, a second beta of Java 6 was <a href=\"http://java.sun.com/javase/downloads/ea.jsp\">released</a>, which triggered a\n<a href=\"http://lists.alioth.debian.org/pipermail/pkg-java-maintainers/2006-June/008385.html\">patch</a> for the\n<a href=\"http://www.debian.org/\">Debian</a> <a href=\"http://packages.debian.org/java-package\">java-package</a> package. It was a Bioclipse\n<a href=\"http://sourceforge.net/tracker/index.php?func=detail&amp;aid=1532612&amp;group_id=150681&amp;atid=778609\">bug report</a> today,\nhowever, which made me patch my java-package setup and install the beta.</p>\n\n<p>So, next thing was to try to get the <a href=\"http://cdk.sf.net/\">CDK</a> compile with the Java 6 beta. Because our build system uses\nJavaDoc (anyone with a pointer with a easy to use Java parser, which parses JavaDoc too?), and because this setup is\ndifferent for literally every platform and Java version, the <a href=\"http://svn.sourceforge.net/viewvc/cdk/trunk/cdk/build.xml?view=log\">build.xml</a>\nneeded some tweaking (patch 6719 and 6721). Additionally, a number of source files were marked as needing Java 1.5, while they actually\ndepend on features introduced in Java 5 (aka 1.5) and which are present in Java 6 (aka 1.6) too, so that needed some tweaking\ntoo (patch 6720).</p>\n\n<p>I have no idea what Java 6 will change and/or introduce, but I did note some comments on it being faster, which is always a good thing.\nThe <a href=\"http://www.junit.org/\">JUnit</a> test timings seems to agree with this. While my Java 1.5.0_06 installation needed 204 seconds\n(no duplicates), Java 1.6.0_beta2 needed only 168 seconds (no duplicates), and improvement of 18%.</p>",
      "summary": "Recently, a second beta of Java 6 was released, which triggered a patch for the Debian java-package package. It was a Bioclipse bug report today, however, which made me patch my java-package setup and install the beta.",
      
      "date_published": "2006-08-01T00:00:00+00:00",
      "date_modified": "2006-08-01T00:00:00+00:00",
      "tags": ["cdk","java","debian"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/94e4t-2q855",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/07/13/context-help-in-bioclipse.html",
      "title": "Context help in Bioclipse",
      "content_html": "<p>The <a href=\"http://www.eclipse.org/\">Eclipse</a> <a href=\"http://wiki.eclipse.org/index.php/Rich_Client_Platform\">Rich Client Platform (RCP)</a> is very powerfull,\nand takes a lot of architectural things of your hand when developing a bio- and chemoinformatics GUIs. <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\nis based on it. One thing the RCP offers is a Help View which works with plain (X)HTML files, and one neat feature is the context help. It is\nhelp shown in the Help View when one focused on a specific GUI element.</p>\n\n<p>As an example, the below figure gives the context help for the JmolView in the <a href=\"http://www.jmol.org/\">Jmol</a> plugin\n(<a href=\"http://wiki.bioclipse.net/index.php?title=Jmol_plugin\">bc_jmol</a>) plugin for Bioclipse:</p>\n\n<p><img src=\"/assets/images/contextHelp.png\" alt=\"\" /></p>\n\n<p>On the right side of the Jmol view (showing <a href=\"http://www.pdb.org/pdb/navbarsearch.do?newSearch=yes&amp;isAuthorSearch=no&amp;radioset=All&amp;inputQuickSearch=1SPX\">1SPX</a>)\nis the Help View, showing the context help for the Jmol View pointing to the ‘Jmol Script Commands Reference’.</p>",
      "summary": "The Eclipse Rich Client Platform (RCP) is very powerfull, and takes a lot of architectural things of your hand when developing a bio- and chemoinformatics GUIs. Bioclipse is based on it. One thing the RCP offers is a Help View which works with plain (X)HTML files, and one neat feature is the context help. It is help shown in the Help View when one focused on a specific GUI element.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/contextHelp.png",
      "date_published": "2006-07-13T00:00:00+00:00",
      "date_modified": "2006-07-13T00:00:00+00:00",
      "tags": ["jmol","bioclipse"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6924n-01r62",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/07/11/matrix-support-in-bioclipse.html",
      "title": "Matrix support in Bioclipse",
      "content_html": "<p>With <a href=\"http://en.wikipedia.org/wiki/Chemometrics\">chemometrics</a> in mind (QSAR, data mining, …), I have started working on matrix support in\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a>, because the matrix is the important step between (bio-)molecular content and statistical analysis.\nI implemented this such that the actual matrix implementation can be freely chosen, that is,\n<a href=\"http://svn.sourceforge.net/viewcvs.cgi/bioclipse/trunk/bc_statistical/\">bc_statistical</a> provides a <code class=\"language-plaintext highlighter-rouge\">IMatrixImplementation</code> extension point.\nThe plugin <a href=\"http://svn.sourceforge.net/viewcvs.cgi/bioclipse/trunk/bc_jama/\">bc_jama</a> provides a <a href=\"http://math.nist.gov/javanumerics/jama/\">JAMA</a>\nbased extension for this, but other implementations are possible, and possibly useful.</p>\n\n<p>The second component provided by the new statistics plugin, is the MatrixResource, a <a href=\"http://wiki.bioclipse.net/index.php?title=Bioclipse_object_model\">BioResource</a>\nfor documents (e.g. files on the harddisk) that represent a matrix. However, Bioclipse can create such matrices on the fly too, and these do not necessarily have\nto be stored on disk, as is general for BioResource’s. This makes it possible for other plugins to create matrices from other resources: for example, the\n<a href=\"http://cdk.sf.net/\">CDK</a> plugin can now have an action that converts a SDF file into a QSAR data matrix.</p>\n\n<p>The MatrixResource can be edited using a plain text editor, and a more visually attractive graphical editor based on the\n<a href=\"http://sourceforge.net/projects/ktable\">KTable</a> SWT widget:</p>\n\n<p><img src=\"/assets/images/bioclipseMatrixSupport.png\" alt=\"\" /></p>\n\n<p>The next step is to work on column and row names, and replace those uninformative X’s. 
As you can see in the Properties View, I also need to tweak adding and\nremoving advanced properties a bit. And then it is time to have the CDK plugin create a QSAR data matrix.</p>",
      "summary": "With chemometrics in mind (QSAR, data mining, …), I have started working on matrix support in Bioclipse, because the matrix is the important step between (bio-)molecular content and statistical analysis. I implemented this such that the actual matrix implementation can be freely chosen, that is, bc_statistical provides a IMatrixImplementation extension point. The plugin bc_jama provides a JAMA based extension for this, but other implementations are possible, and possibly useful.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/bioclipseMatrixSupport.png",
      "date_published": "2006-07-11T00:00:00+00:00",
      "date_modified": "2006-07-11T00:00:00+00:00",
      "tags": ["bioclipse","chemometrics","qsar","cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/5rq0q-4ht07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/07/03/avi-movies-of-cmlrss-howto-in.html",
      "title": "AVI movies of CMLRSS howto in Bioclipse",
      "content_html": "<p>David Strumfels posted news <a href=\"https://web.archive.org/web/20061011100407/http://usefulchem.blogspot.com/2006/07/cml-in-rss-feeds.html\">about the Useful Chemistry CMLRSS feed <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nHe explains how this feed can be accessed using <a href=\"http://www.jmol.org/\">Jmol</a> and <a href=\"http://www.bioclipse.net/\">Bioclipse</a>. The latter are accompanied by two AVI\nmovies: <a href=\"http://showme.physics.drexel.edu/usefulchem/Software/bioclipse/CreatingAnOPML.avi\">one about creating a new OPML file</a>, and\n<a href=\"http://showme.physics.drexel.edu/usefulchem/Software/bioclipse/UsingAnOPML.avi\">one about accessing the CMLRSS file from the OPML</a>.</p>",
      "summary": "David Strumfels posted news about the Useful Chemistry CMLRSS feed . He explains how this feed can be accessed using Jmol and Bioclipse. The latter are accompanied by two AVI movies: one about creating a new OPML file, and one about accessing the CMLRSS file from the OPML.",
      
      "date_published": "2006-07-03T00:00:00+00:00",
      "date_modified": "2006-07-03T00:00:00+00:00",
      "tags": ["cml","rss","bioclipse","jmol"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/07/01/new-chemistry-on-desktop-blog.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/07/01/new-chemistry-on-desktop-blog.html",
      "title": "New chemistry-on-the-desktop blog",
      "content_html": "<p>I started a spin-off blog earlier this week: <a href=\"http://kemistry-desktop.blogspot.com/\">kemistry desktop environment</a>. It will deal with\nintegration of chemistry on opensource desktops, with <a href=\"http://www.kde.org/\">KDE</a> as one of them. Today, it features an\n<a href=\"http://kemistry-desktop.blogspot.com/2006/07/overview-of-earlier-blogs.html\">overview of earlier blogs</a> on the subject in this new blog.</p>",
      "summary": "I started a spin-off blog earlier this week: kemistry desktop environment. It will deal with integration of chemistry on opensource desktops, with KDE as one of them. Today, it features an overview of earlier blogs on the subject in this new blog.",
      
      "date_published": "2006-07-01T00:00:00+00:00",
      "date_modified": "2006-07-01T00:00:00+00:00",
      "tags": ["kde"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/62e2c-ycj21",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/25/kde4-keyword-support-mockups.html",
      "title": "KDE4 keyword support mockups",
      "content_html": "<p>In reply to interesting comments to <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support.html\">my previous blog <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\non <a href=\"http://www.vandenoever.info/software/strigi/\">Strigi</a> and xAttr support in <a href=\"http://www.kde.org/\">KDE</a>4, I would like to suggest\nthe following mockups, which I would find very useful. The deal with the ability to store keywords, for example, not but necessarily\nusing xAttr. I have no idea on how to implement these mockups, so any help or pointers are appreciated.</p>\n\n<p>The first plot is an example of how these keyword markup could be used in KDE, other than searching itself. When showing the properties\nof a directory in KDE, it would show an overview of hottest keywords for that directory, such as used on social bookmark website like\n<a href=\"http://technorati.com/\">Technorati</a> too:</p>\n\n<p><img src=\"/assets/images/kfileXAttrSupport.png\" alt=\"\" /></p>\n\n<p>This example shows that the keyword ‘Strigi’ was used much inside the index_files directory (they are not just the keywords given for\nthat directory, but a summary of the directory content!). Now, these keywords could be stored as xAttr, but in a database too. The\nfirst requires a filesystem that supports xAttr, while the second requires a database daemon to be running. However, for speed\nperformance reasons this would be required anyway. Strigi indexes xAttr now (post 0.3.0 release), and basically allows both.</p>\n\n<p>Independent of the chosen/prefered way to store keywords, these keywords can be edited from the Properties dialog:</p>\n\n<p><img src=\"/assets/images/kfileXAttrSupport2.png\" alt=\"\" /></p>\n\n<p>Now comes the tricky part: though I would like to add this to KDE, I do not have the C++/KDE experience to actually do this.\nI’m already happy that I was able to extend the Strigi with support for KDE’s kfile architecture. 
Yes, the Strigi version in\nSVN will index all metadata extractable with kfile plugins installed on the KDE installation.</p>",
      "summary": "In reply to interesting comments to my previous blog on Strigi and xAttr support in KDE4, I would like to suggest the following mockups, which I would find very useful. The deal with the ability to store keywords, for example, not but necessarily using xAttr. I have no idea on how to implement these mockups, so any help or pointers are appreciated.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/kfileXAttrSupport2.png",
      "date_published": "2006-06-25T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["kde","strigi","technorati"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/wpk6m-d9y71",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/22/text-mining-for-chemistry-using-oscar3.html",
      "title": "Text mining for chemistry using OSCAR3",
      "content_html": "<p><a href=\"http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/User:ptc24\">Peter Corbett</a> from <a href=\"http://wwmm.ch.cam.ac.uk/\">Peter Murray-Rust’s group</a>\nat the <a href=\"http://www-ucc.ch.cam.ac.uk/\">Unilever Cambridge Centre for Molecular Informatics</a> visited\n<a href=\"http://almost.cubic.uni-koeln.de/jrg/\">Christoph Steinbeck’s junior Research Group on Molecular Informatics</a> at the\n<a href=\"http://www.cubic.uni-koeln.de/\">CUBIC</a> today, and spoke about the status of <a href=\"http://sourceforge.net/projects/oscar3-chem\">Oscar3</a>,\na chemistry text mining program with the <a href=\"http://www.opensource.org/licenses/artistic-license.php\">Artistic License</a>.\nOscar3, the successor of version 1 and 2, can detect and extract molecular structures and experimental details from plain text articles,\nusing a variety of text mining techniques.</p>\n\n<p>The afternoon was spend on hacking Oscar3 into <a href=\"http://www.bioclipse.net/\">Bioclipse</a>, with good success. It involved updating Oscar3\nfor the latest <a href=\"http://cdk.sf.net/\">CDK</a> and setting up a plugin infrastructure for Bioclipse. This plugin will allow mining\n(scientific) articles for chemical compounds and there properties from within Bioclipse. The outcome of today’s hacking session was\nsomewhat less ambitious and focused on the general infrastructure, and getting the OPSIN functionality in Oscar3 available as a wizard.\nOPSIN is a IUPAC name 2 structure tool and, amongst many other names, is able to recognize <a href=\"http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=2519\">caffeine</a>\n(<code class=\"language-plaintext highlighter-rouge\">InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3</code>):</p>\n\n<p><img src=\"/assets/images/opsin.png\" alt=\"\" /></p>",
      "summary": "Peter Corbett from Peter Murray-Rust’s group at the Unilever Cambridge Centre for Molecular Informatics visited Christoph Steinbeck’s junior Research Group on Molecular Informatics at the CUBIC today, and spoke about the status of Oscar3, a chemistry text mining program with the Artistic License. Oscar3, the successor of version 1 and 2, can detect and extract molecular structures and experimental details from plain text articles, using a variety of text mining techniques.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/opsin.png",
      "date_published": "2006-06-22T00:00:00+00:00",
      "date_modified": "2006-06-22T00:00:00+00:00",
      "tags": ["oscar","bioclipse","textmining"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support.html",
      "title": "Strigi gets kfile plugin support",
      "content_html": "<p>With some help, I got the <a href=\"http://developer.kde.org/documentation/tutorials/kfile-plugin/t1.html\">kfile</a> stream analyzer\nfor <a href=\"http://www.vandenoever.info/software/strigi/\">Strigi</a> working. This means that Strigi will now index the meta data\nfields defined by the <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile-chemical</a> plugins.</p>\n\n<p>The problem why it was not working earlier, was that it segfaulted on every creation of KDE classes. That’s something I\nreally hate about C/C++: the lack of stack traces, though <a href=\"http://valgrind.org/\">valgrind</a> was helpful. It turned out\nthat adding the below line fixed all. A <a href=\"http://developer.kde.org/documentation/library/3.0-api/classref/kdecore/KInstance.html\">KInstance</a>\nis needed when using KDE technology outside a KDE program:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>KInstance instance( \"strigita_kfile\" );\n</code></pre></div></div>\n\n<p>Combine this with the <a href=\"http://wiki.linuxquestions.org/wiki/Extended_attributes\">xattr</a> support added by Jos earlier today, I hope to\nsee an interesting new Strigi release soon! Now we only need to get <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/17/kde-desktop-search-kat-strigi-and.html\">editing of keywords <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\ninto KDE4.</p>",
      "summary": "With some help, I got the kfile stream analyzer for Strigi working. This means that Strigi will now index the meta data fields defined by the kfile-chemical plugins.",
      
      "date_published": "2006-06-20T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["strigi","kde"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/06/20/dutch-summer-of-code-sponsors.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/20/dutch-summer-of-code-sponsors.html",
      "title": "Dutch Summer of Code sponsors a Bioclipse project",
      "content_html": "<p>The Dutch version of the <a href=\"http://code.google.com/soc\">Google Summer of Code</a>, <a href=\"http://www.programmeerzomer.nl/\">Programmeerzomer.nl</a>,\nannounced today the five students participating. I was happy to see that Rob Schellhorn was selected with his project proposal for a\n<a href=\"http://bioinformatics.org/ghemical/ghemical/\">Ghemical</a> plugin for <a href=\"http://www.bioclipse.net/\">Bioclipse</a>. Like in the Google\noriginal, both the student and the mentoring organization are funded, 3600 and 400 euro respectively.</p>",
      "summary": "The Dutch version of the Google Summer of Code, Programmeerzomer.nl, announced today the five students participating. I was happy to see that Rob Schellhorn was selected with his project proposal for a Ghemical plugin for Bioclipse. Like in the Google original, both the student and the mentoring organization are funded, 3600 and 400 euro respectively.",
      
      "date_published": "2006-06-20T00:00:00+00:00",
      "date_modified": "2006-06-20T00:00:00+00:00",
      "tags": ["programmeerzomer","bioclipse","ghemical"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/9n9m7-y4v29",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/17/kde-desktop-search-kat-strigi-and.html",
      "title": "KDE desktop search: Kat, Strigi and Tenor",
      "content_html": "<p>Desktop searching has become a hot topic (some <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html\">earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/07/ubuntu-dapper-will-include-chemistry.html\">blogs <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), now that years of data accumulated on ones\nhard disk: PDFs, OpenOffice.org documents, Latex manuscripts, old Java source code, digitized music, and a lot of chemical files. Well,\non my hard disk that is. Unlike piles of paper, a computer could search this data, but due to the size an index is required. What’s KDE4\ngoing to offer?</p>\n\n<p>For the <a href=\"http://www.kde.org/\">KDE</a> desktop <a href=\"http://kat.mandriva.com/\">Kat</a> has for more than a year offered this, and latter\n<a href=\"http://www.kde-apps.org/content/show.php?content=36832\">Kerry</a> came along as frontend to [Beagle(http://beaglewiki.org/Main_Page)],\nthough this does not have the nice integration with KDE <a href=\"http://developer.kde.org/documentation/tutorials/kfile-plugin/t1.html\">kfile plugins</a>.\nSince then, Kat developed has come to a stop (unfortunately), and attempts to reach the main author\n(<a href=\"mailto:roberto.cappuccio@gmail.com\">Roberto</a>) have been unsuccesfull. Last thing happening was a rewrite of the database backend.</p>\n\n<p>Additionally, <a href=\"http://dot.kde.org/1109163846/\">Scott Wheeler proposed Tenor</a> on <a href=\"http://www.fosdem.org/\">FOSDEM</a> 2005:\n<em>“KDE 4: Beyond Hierarchical Data, The Desktop as a Searchable Web of Context”</em>. 
A semantic desktop; potentially cool, but I have heard\n<a href=\"http://www.kdedevelopers.org/blog/72?from=10\">little from it lately</a>, except for some rumours that\n<a href=\"http://mail.kde.org/pipermail/klink/2006-April/000133.html\">Scott has some actual code at home</a>.</p>\n\n<p>Now, <a href=\"http://www.vandenoever.info/software/strigi/\">Strigi</a> (<a href=\"http://www.kde-look.org/content/show.php?content=40889\">download</a>) has come along,\nwith a fast indexing engine, just the thing where Kat development seemed to have stopped. The design is different from that of Kat, but it\ndoes not seem unlikely that Kat code can be ported. No support for PDF or OpenOffice.org documents yet, but that’s really the easy part, and\nkfile is on its way.</p>\n\n<p>Getting back to Tenor, one might wonder how Strigi could implement Tenor concepts. A simple approach is at least to allow users to tag files,\njust like we have become used to with blogs (e.g. <a href=\"http://www.technorati.com/\">Technorati.com</a>) and websites (e.g.\n<a href=\"http://www.connotea.org/\">Connotea</a>). This could be easily implemented using <a href=\"http://wiki.linuxquestions.org/wiki/Extended_attributes\">extended attributes</a>\n(xattr), <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html\">already used by Beagle <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code># file: home/egonw/1CRN.jpg\nuser.Tenor.Keywords=\"crambin\"\nuser.Tenor.Comment=\"Used in my ontologies presentation.\"\n</code></pre></div></div>\n\n<p>Obviously, this example shows not just these tags, but a user comment too. The idea, here, is that Strigi mines these attributes in\naddition to the file itself, so that tags can be searched too. 
BTW, my argument for using this, instead of putting these things\nin the Strigi database itself, is persistence: data and metadata are kept together. KDE’s file properties dialog would be extended\nwith an extra tab that allows editing these fields.</p>\n\n<p>Strigi itself can be embedded in KDE applications to search specific information (e.g. search molecular data within\n<a href=\"http://cniehaus.livejournal.com/23010.html\">Kalzium</a> using the <a href=\"http://www.iupac.org/inchi/\">InChI</a>), and even in the FileOpen dialog.\nWe need patches for KDE4 that allow this, soon.</p>",
      "summary": "Desktop searching has become a hot topic (some earlier blogs ), now that years of data accumulated on ones hard disk: PDFs, OpenOffice.org documents, Latex manuscripts, old Java source code, digitized music, and a lot of chemical files. Well, on my hard disk that is. Unlike piles of paper, a computer could search this data, but due to the size an index is required. What’s KDE4 going to offer?",
      
      "date_published": "2006-06-17T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["kde","strigi","kalzium","linux","technorati"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/kc7ax-n3f66",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/12/chemistry-extension-for-spreadsheets.html",
      "title": "A chemistry extension for spreadsheet(s)",
      "content_html": "<p>Just wanted to make sure this news made it to the <a href=\"http://www.blueobelisk.org/planetbo\">Blue Obelisk Planet</a> too:\n<a href=\"http://www.blogger.com/profile/21711372\">David Strumfels</a> reported that\n<a href=\"https://web.archive.org/web/20060614224108/http://usefulchem.blogspot.com/2006/06/processing-usefulchem-molecules-with.html\">he extended MS-Excel with CDK functionality <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nI wonder how difficult it would be to do this with <a href=\"http://www.koffice.org/kspread/\">Kspread</a> or\n<a href=\"http://www.gnome.org/projects/gnumeric/\">Gnumeric</a>?</p>",
      "summary": "Just wanted to make sure this news made it to the Blue Obelisk Planet too: David Strumfels reported that he extended MS-Excel with CDK functionality . I wonder how difficult it would be to do this with Kspread or Gnumeric?",
      
      "date_published": "2006-06-12T00:00:00+00:00",
      "date_modified": "2006-06-12T00:00:00+00:00",
      "tags": ["cdk","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/06/05/recent-developments-of-chemistry.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/06/05/recent-developments-of-chemistry.html",
      "title": "Recent Developments of the Chemistry Development Kit",
      "content_html": "<p><em><a href=\"https://doi.org/10.2174/138161206777585274\">Recent Developments of the Chemistry Development Kit (CDK) <i class=\"fa-solid fa-recycle fa-xs\"></i></a> -\nAn Open-Source Java Library for Chemo- and Bioinformatics</em> (<a href=\"https://repository.ubn.ru.nl/bitstream/handle/2066/35445/35445_aut.pdf\">green OA</a>) discusses (reasonably) recent additions to the\n<a href=\"http://cdk.sf.net/\">CDK</a>. It appeared in issue 17 of this years <a href=\"http://www.bentham.org/cpd/\">Current Pharmaceutical Design</a>\nvolume, after being too long in the queue after being accepted; but I am happy that it is out now.</p>\n\n<p>The article discusses CDK’s QSAR capabilities (the class designs and an overview of provided descriptors), the 3D model builder\n(see also <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/06/05/recent-developments-of-chemistry.html\">C. Hoppe, CDK News, 1(2):4-5 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>)\nand and the interface to the statistical software <a href=\"http://www.r-project.org/\">R</a> (see also\n<a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=124796&amp;release_id=310462\">CDK News, vol.2, issue 1</a>).\nThe article is part of a small special issue on Computational Applications in Medicinal Chemistry.</p>\n\n<p>CDK’s QSAR package comes with one main requirement: <strong>the outcome of QSAR descriptor calculations must be reproducable</strong>.\n<em>“Science must be reproducable”</em>; I’m sure someone once said this :) Therefore, each QSAR descriptor has a specification\npointing the a unique algorithm found in an ontology (see diagram below). 
This QSAR descriptor ontology is maintained by\nthe <a href=\"http://qsar.sf.net/\">qsar.sf.net</a> project, which is project independent, and even welcomes proprietary programs to\ndiscuss interoperability.</p>\n\n<p><img src=\"/assets/images/DescriptorOverview.png\" alt=\"\" /></p>\n\n<p>And calculated descriptors are explicitly linked to this specification again, though it is up to the user what to do with\nthis:</p>\n\n<p><img src=\"/assets/images/DescriptorResultOverview.png\" alt=\"\" /></p>\n\n<p>Note that code has evolved since this publication, so class, interface and method names may have changed a bit.</p>",
      "summary": "Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics (green OA) discusses (reasonably) recent additions to the CDK. It appeared in issue 17 of this years Current Pharmaceutical Design volume, after being too long in the queue after being accepted; but I am happy that it is out now.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/DescriptorOverview.png",
      "date_published": "2006-06-05T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cdk","cheminf","qsar"],
      "_references": [{ "url": "https://doi.org/10.2174/138161206777585274" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/05/28/blue-obelisk-in-obernai-at.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/28/blue-obelisk-in-obernai-at.html",
      "title": "Blue Obelisk in Obernai at Chemoinformatics in Europe",
      "content_html": "<p>Together with <a href=\"http://wiki.cubic.uni-koeln.de/blog/pivot/entry.php?id=7\">Christoph</a>, Christian and Jerome, I will be\nrepresenting the Blue Obelisk movement on the first <a href=\"http://infochim.u-strasbg.fr/recherche/europeen_chemistry/index.php\">First Workshop on Chemoinformatics in\nEurope</a> with the topic <em>Research and Teaching</em>.\nThough I wonder what this theme excludes? Development? Can’t imagine that commercials companies will not be\nrepresented as usual. Moreover, it will likely include some bioinformatics too, unless you consider that to\ndeal with sequences only.</p>\n\n<p>I have my laptop with me, and, of course, the <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/05/22/live-life-sciences-cd.html\">Blue Obelisk Live CD 2 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\non which the mouse now actually works. <a href=\"http://bioclipse.blogspot.com/2006/05/bioclipse-091-released.html\">Bioclipse 0.9.1</a>\ndoes not work, though; will report that bug later.</p>\n\n<p>My work schedule for the train ride:</p>\n\n<ul>\n  <li>Work on my manuscript</li>\n  <li>Integrate Todd Martin’s SMILES and QSAR work</li>\n  <li>Work on the next CDK News</li>\n  <li>Think about InChI creation in Bioclipse, using OpenBabel</li>\n</ul>",
      "summary": "Together with Christoph, Christian and Jerome, I will be representing the Blue Obelisk movement on the first First Workshop on Chemoinformatics in Europe with the topic Research and Teaching. Though I wonder what this theme excludes? Development? Can’t imagine that commercials companies will not be represented as usual. Moreover, it will likely include some bioinformatics too, unless you consider that to deal with sequences only.",
      
      "date_published": "2006-05-28T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cdk","bioclipse","cheminf","bioinfo"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html",
      "title": "Molecular indexing on the KDE and OS/X desktops",
      "content_html": "<p><a href=\"http://geoffhutchison.net/about/\">Geoff Hutchinson</a> <a href=\"http://geoffhutchison.net/blog/archives/2006/05/25/chemspotlight-indexing-chemistry-on-your-mac/\">blogged</a>\nabout his <a href=\"http://geoffhutchison.net/projects/chem/\">OS/X ChemSpotLight</a>, an indexing tool for chemistry documents. It’s like,\nbut more advanced than, the <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a> and\n<a href=\"http://kat.mandriva.com/\">Kat</a> I have been working on (with others) for the\n<a href=\"https://kde.org/\">KDE <i class=\"fa-solid fa-recycle fa-xs\"></i></a> desktop (see earlier blog items).</p>\n\n<p>ChemSpotLight currently does more than the KDE tools: it adds Spotlight comments. I assume these are like the Linux\n<a href=\"http://wiki.linuxquestions.org/wiki/Extended_attributes\">extended attributes</a>, used for example by\n<a href=\"http://beaglewiki.org/Main_Page\">Beagle</a>. For example, a file indexed by Beagle will have extended attributes like:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code># file: home/egonw/m43.jpg\nuser.Beagle.AttrTime=\"20060509071950\"\nuser.Beagle.Filter=\"003 Beagle.Filters.FilterJpeg\"\nuser.Beagle.Fingerprint=\"02 xHn5Yi58x0eoI8ityBYkUw\"\nuser.Beagle.MTime=\"20031225151016\"\nuser.Beagle.Uid=\"YcIW72RWyk+K5FbGnpv4iA\"\n</code></pre></div></div>\n\n<p>This is very suitable for adding metadata, like comments as in ChemSpotLight. Geoff’s program adds metadata like number of\natoms and bond, but it calculates the <a href=\"http://www.daylight.com/smiles/\">SMILES</a> and <a href=\"http://www.iupac.org/inchi/\">InChI</a>\non the fly too. 
Especially the last is very good for indexing purposes, as it is a really unique identifier for molecular\nstructures, and even works for <a href=\"https://chem-bla-ics.linkedchemistry.info/2006/03/31/inchis-in-latex-and-cdk-news.html\">proteins <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Now, kfile_chemical is a kfile plugin. These kfile plugins only extract metadata from files, and have little to do with\ncalculated metadata. Kat, on the other hand, is an indexing application and might be expected to add additional, derived\nor calculated, metadata as extended attributes, just like Beagle does. And then InChI and SMILES are good candidates.</p>",
      "summary": "Geoff Hutchinson blogged about his OS/X ChemSpotLight, an indexing tool for chemistry documents. It’s like, but more advanced than, the kfile_chemical and Kat I have been working on (with others) for the KDE desktop (see earlier blog items).",
      
      "date_published": "2006-05-26T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["kde","cheminf","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/05/24/xml-validation-on-eclipse-with-web.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/24/xml-validation-on-eclipse-with-web.html",
      "title": "XML validation on Eclipse with Web Tools Platform",
      "content_html": "<p>Yesterday I installed the <a href=\"http://www.eclipse.org/webtools/\">Eclipse Web Tools Platform</a> again, and now\nsuccesfully, using the Eclipse update mechanism, on my <a href=\"http://www.kubuntu.org/\">Kubuntu dapper</a> eclipse\ninstall. Because it has a validating XML editor, the one last thing I still needed\n<a href=\"http://www.jedit.org/\">jEdit</a> for. (I do miss the vertical selection feature of jEdit, though.) It\nsignals me of errors, and allows autocompletion.</p>\n\n<p>Now I can validate all <a href=\"http://www.xml-cml.org/\">Chemical Markup Langauge</a> files I have around, which is\nvery useful for those I use to make sure <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\nis working properly. I just need to make sure I use the <code class=\"language-plaintext highlighter-rouge\">http://www.w3.org/2001/XMLSchema-instance namespace</code>,\nfor example as in this example from CDK SVN:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;cml</span> <span class=\"na\">title=</span><span class=\"s\">\"Regression tests for valid XML Schema documents for CML 2.3\"</span>\n\n  <span class=\"na\">xmlns=</span><span class=\"s\">\"http://www.xml-cml.org/schema\"</span>\n  <span class=\"na\">xmlns:xsi=</span><span class=\"s\">\"http://www.w3.org/2001/XMLSchema-instance\"</span>\n  <span class=\"na\">xsi:schemaLocation=</span><span class=\"s\">\"http://www.xml-cml.org/schema ../../../io/cml/data/cml23.xsd\"</span><span class=\"nt\">&gt;</span>\n</code></pre></div></div>\n\n<p>Now, I do have some questions. Firstly, does WTP allow recycling of the XML editor? That is, can I use their validating XML editor in, for example, Bioclipse? Would I just depend on the right plugin jars from WTP, or is it more complicated? 
Alternatively, since in RCP everything is a plugin, can WTP be installed as a plugin in Bioclipse directly?</p>\n\n<p>Secondly, does Kubuntu or Debian sid have binary packages for WTP? I seem to remember having read something about this, in relation to splitting up the WTP into smaller, more specific plugins. Anyone?</p>",
      "summary": "Yesterday I installed the Eclipse Web Tools Platform again, and now succesfully, using the Eclipse update mechanism, on my Kubuntu dapper eclipse install. Because it has a validating XML editor, the one last thing I still needed jEdit for. (I do miss the vertical selection feature of jEdit, though.) It signals me of errors, and allows autocompletion.",
      
      "date_published": "2006-05-24T00:00:00+00:00",
      "date_modified": "2006-05-24T00:00:00+00:00",
      "tags": ["xml","bioclipse","cml"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/vwtxz-8dh40",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/22/live-life-sciences-cd.html",
      "title": "A live life-sciences CD",
      "content_html": "<p>November last year, I <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd.html\">reported my plans <i class=\"fa-solid fa-recycle fa-xs\"></i></a> to develop\na live CD with all our favorite chemo- and bioinformatics software. <a href=\"http://www.bioclipse.net/\">Bioclipse</a> requires Java5\nand sort of still depends on the Sun JVM (I will experiment with classpath-generics later), but is now distributable with\noperating systems. So, I made a <a href=\"http://www.kubuntu.org/\">Kubuntu</a> derived operating system with\n<a href=\"http://openbabel.sourceforge.net/\">OpenBabel</a>, <a href=\"http://www.jmol.org/\">Jmol</a>, <a href=\"http://pymol.sourceforge.net/\">PyMOL</a>,\nBioclipse, and, on systems level, the chemical MIMEs and <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a>,\nwich extends the desktop with chemistry awareness. In addition, I added the\n<a href=\"http://www.blueobelisk.org/\">Blue Obelisk Data Repository</a>, all <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/\">CDK News</a>\nissues, and the full <a href=\"http://www.nmrshiftdb.org/\">NMRShiftDB</a> data in CML format.</p>\n\n<p>The <a href=\"http://wiki.cubic.uni-koeln.de/iso/cdname.iso\">iso image</a> can be downloaded, and is really a first set up. Bioclipse does not\nwork, but much of the rest does. Please download it (about 625MB) and experiment with it, and leave your comments with this blog item.</p>",
      "summary": "November last year, I reported my plans to develop a live CD with all our favorite chemo- and bioinformatics software. Bioclipse requires Java5 and sort of still depends on the Sun JVM (I will experiment with classpath-generics later), but is now distributable with operating systems. So, I made a Kubuntu derived operating system with OpenBabel, Jmol, PyMOL, Bioclipse, and, on systems level, the chemical MIMEs and kfile_chemical, wich extends the desktop with chemistry awareness. In addition, I added the Blue Obelisk Data Repository, all CDK News issues, and the full NMRShiftDB data in CML format.",
      
      "date_published": "2006-05-22T00:00:00+00:00",
      "date_modified": "2006-05-22T00:00:00+00:00",
      "tags": ["linux","jmol","bioclipse"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4f36v-1ze23",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/18/taverna-runs-with-classpath-091.html",
      "title": "Taverna runs with Classpath 0.91",
      "content_html": "<p>Classpath 0.91 <a href=\"http://www.gnu.org/software/classpath/announce/20060515.html\">is released</a> with\n<a href=\"http://jroller.com/page/dgilbert?entry=1_45_million_lines_of\">1.45 million</a> lines of code and with\n<a href=\"http://www.kaffe.org/~stuart/japi/htmlout/h-jdk14-classpath.html\">98.96%</a> coverage of Java 1.4.2,\nand 99.82% of java.swing. Or, as <a href=\"http://jroller.com/page/dgilbert?entry=gnu_classpath_0_91\">Dave calls it</a>:\n0.91 rocks! <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/20/open-source-swing-jchempaint-runs.html\">JChemPaint runs again <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n(they fixed the XML parsing problem), and <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/27/open-source-swing-jmol-renderer-runs.html\">Jmol still runs &lt;i class=”fa-solid fa-recycle fa-xs”</a>,\n<a href=\"http://developer.classpath.org/mediation/FreeSwingTestApps\">but slow</a>. I also tested\n<a href=\"http://taverna.sourceforge.net/\">Taverna</a> which now also starts up, but has an XML parsing error too:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Exception occured whilst loading RDFS! 
Error on line 2: required string: \"?&gt;\"\norg.jdom.input.JDOMParseException: Error on line 2: required string: \"?&gt;\"\n   at org.jdom.input.SAXBuilder.build(SAXBuilder.java:468)\n   at org.jdom.input.SAXBuilder.build(SAXBuilder.java:851)\n   at org.embl.ebi.escience.scufl.semantics.RDFSParser.loadRDFSDocument(RDFSParser.java:70)\n   at org.embl.ebi.escience.scuflui.workbench.Workbench.main(Workbench.java:128)\n   at java.lang.reflect.Method.invokeNative(Native Method)\n   at java.lang.reflect.Method.invoke(Method.java:355)\n   at org.embl.ebi.escience.scuflui.workbench.WorkbenchLauncher.main(WorkbenchLauncher.java:40)\n</code></pre></div></div>\n\n<p>Oh, and rumours go that <a href=\"http://www.nongnu.org/gcjwebplugin/\">gcjwebplugin</a> can run the Jmol applet now,\nexcept for the JavaScript interaction, that is.</p>",
      "summary": "Classpath 0.91 is released with 1.45 million lines of code and with 98.96% coverage of Java 1.4.2, and 99.82% of java.swing. Or, as Dave calls it: 0.91 rocks! JChemPaint runs again (they fixed the XML parsing problem), and Jmol still runs &lt;i class=”fa-solid fa-recycle fa-xs”, but slow. I also tested Taverna which now also starts up, but has an XML parsing error too:",
      
      "date_published": "2006-05-18T00:00:00+00:00",
      "date_modified": "2006-05-18T00:00:00+00:00",
      "tags": ["java","workflow","jchempaint","taverna"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/05/11/new-open-access-journal-source-code.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/11/new-open-access-journal-source-code.html",
      "title": "New open access journal Source Code for Biology and Medicine",
      "content_html": "<p><a href=\"http://www.biomedcentral.com/\">BioMed Central</a> is setting up a new peer-reviewed, open access journal\n<a href=\"http://www.scfbm.org/\">Source Code for Biology and Medicine</a>. It will <em>“encompass all aspects of workflow for\ninformation systems, decision support systems, client user networks, database management, and data mining”</em>.\nBasically, anything that fits into chem-bla-ics. (Thanx to Werner, for pointing me to the website!)</p>\n\n<p>The ‘source code’ aspect is the interesting thing of this new journal. The editorial board set the aim to <em>publish\nsource code for distribution and use in the public domain in order to advance biological and medical research</em>.\nAnd, in a bit more detail, they list the following goals:</p>\n\n<ul>\n  <li>increase productivity</li>\n  <li>reduce discovery times</li>\n  <li>reduce search times for source code</li>\n  <li>Provide a historical reflection of source code applied</li>\n  <li>serve as a repository</li>\n</ul>\n\n<p>This comes close to what open source is trying to achieve too, but I do not differences. For example, the announcement\nmentions the public domain (see the <a href=\"http://en.wikipedia.org/wiki/Public_domain\">Wikipedia entry</a>). I tend to be a\nbit confused by the use of this term: to me the public domain is where things end up after copyright claims have\nended, and everyone is free to do with it whatever he wants, and, very important in this case, that open source\nsoftware is not in the public domain. Do they mean that they will not allow open source in the new journal?</p>\n\n<p>I also wonder wether we need a journal like this? Open source projects often have other resources available that\nserve as repository (e.g. 
<a href=\"https://sourceForge.net\">SourceForge <i class=\"fa-solid fa-recycle fa-xs\"></i></a>), and the use\nof version control systems as repositories (like <a href=\"http://www.nongnu.org/cvs/\">CVS</a>, <a href=\"http://subversion.tigris.org/\">Subversion</a>)\nis widespread too, which takes care of the historical reflection. Indeed, much open source software is already\npublished in other journals.</p>\n\n<p>The process of picking the journal to submit to often involves looking up the journal’s impact factor. Is this new\njournal expected to get a high impact factor? How many people will regularly read the journal? Will it be read by\nthe right audience, or just by fellow bioinformaticians?</p>\n\n<p>Though I have my doubts about the success of this journal, I am looking forward to the first issue!</p>\n\n<p><strong>Update</strong>: <a href=\"http://www.nodalpoint.org/user/pedrobeltrao\">Pedro</a> <a href=\"https://web.archive.org/web/20060615123103/http://www.nodalpoint.org/2006/05/12/source_code_for_biology_and_medicine\">pointed <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nme to the <a href=\"https://web.archive.org/web/20060620202859/http://www.scfbm.org/info/about/\">About page <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> of\nthe SCFBM, giving details on the types of articles taken into consideration.</p>",
      "summary": "BioMed Central is setting up a new peer-reviewed, open access journal Source Code for Biology and Medicine. It will “encompass all aspects of workflow for information systems, decision support systems, client user networks, database management, and data mining”. Basically, anything that fits into chem-bla-ics. (Thanx to Werner, for pointing me to the website!)",
      
      "date_published": "2006-05-11T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["openscience","bioinfo"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/05/07/open-text-mining-interface-and.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/07/open-text-mining-interface-and.html",
      "title": "Open Text Mining Interface and Bioclipse",
      "content_html": "<p>Timo Hannay <a href=\"https://web.archive.org/web/20060620194249/http://blogs.nature.com/wp/nascent/2006/04/open_text_mining_interface.html\">blogged <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nin <a href=\"http://www.nature.com/\">Nature</a>’s <a href=\"https://web.archive.org/web/20060504035155/http://blogs.nature.com/wp/nascent/\">Nascent blog <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nabout the Open Text Mining Interface (OTMI), which is “a suggestion from Nature about how we might achieve text-mining\nand indexing purposes”. The idea is that each article has a link pointing to a machine readable file\ncontaining raw data about (and from?) the article. The standing example uses\n<a href=\"http://atompub.org/2005/07/11/draft-ietf-atompub-format-10.html\">Atom 1.0</a> as a container, allowing raw\ndata to be included using foreign namespaces, such as <a href=\"http://prismstandard.org/\">Dublic Core</a>\n(for metadata) and <a href=\"http://prismstandard.org/\">Prism</a> (for bibliographic data), and the OTMI text\nmining statistics uses a namespace too.</p>\n\n<p>In a comment, <a href=\"http://www.ch.ic.ac.uk/rzepa/\">Henry Rzepa</a> proposed inclusion of CML, and refers to earlier\nwork on CMLRSS where <a href=\"http://www.xml-cml.org/\">Chemical Markup Language</a> is embedded in RSS news feeds\nfor which I wrote readers for <a href=\"http://www.jmol.org/\">Jmol</a> and\n<a href=\"http://jchempaint.sf.net/\">JChemPaint</a> (DOI:<a href=\"https://doi.org/10.1021/ci034244p\">10.1021/ci034244p</a>).</p>\n\n<p>As readers of my blog know, the <a href=\"http://www.bioclipse.net/\">Bioclipse</a> project has been working hard\non an integrated (bio)chemistry workbench, and the <a href=\"http://bioclipse.blogspot.com/2006/05/bioclipse-090-released.html\">latest release</a>\nincludes a <a href=\"http://wiki.bioclipse.net/index.php?title=CMLRSS_plugin\">CMLRSS reader plugin</a> too, which\nsupports CML embedded in Atom 0.3/1.0 
and RSS 1.0/2.0 feeds. Now, adding support for other embedded\nnamespaces is trivial, and this morning I hacked in support for OTMI:</p>\n\n<p><img src=\"/assets/images/otmiSupport.png\" alt=\"\" /></p>\n\n<p>This screenshot shows the original OTMI example\nwith the Atom 1.0 entry now wrapped in an Atom 1.0 <code class=\"language-plaintext highlighter-rouge\">&lt;feed&gt;</code> element. There is no nice OTMI icon for the OTMI content in the\nAtom 1.0 entry, nor did I make a ‘view’ yet showing the actual vectors or the snippets, but that’s a piece of cake too.</p>\n\n<p>Now, the nice thing about this is that the Bioclipse code for the Atom and RSS feeds just greps through the feed entry\nand shows whatever CML or OTMI content is present. When Nature decides to include CML in these OTMI files too,\nI will not have to update the current code.</p>",
      "summary": "Timo Hannay blogged in Nature’s Nascent blog about the Open Text Mining Interface (OTMI), which is “a suggestion from Nature about how we might achieve text-mining and indexing purposes”. The idea is that each article has a link pointing to a machine-readable file containing raw data about (and from?) the article. The standing example uses Atom 1.0 as a container, allowing raw data to be included using foreign namespaces, such as Dublin Core (for metadata) and Prism (for bibliographic data), and the OTMI text mining statistics use a namespace too.",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/otmiSupport.png",
      "date_published": "2006-05-07T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["cml","bioclipse","xml","textmining","rss"],
      "_references": [{ "url": "https://doi.org/10.1021/CI034244P" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r4sw4-ehh35",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/03/four-graph-mining-methods-integrated.html",
      "title": "Four graph mining methods integrated in ParMol",
      "content_html": "<p><a href=\"https://www.blogger.com/profile/09112376168632883058\">Joerg Wegner <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\n<a href=\"http://miningdrugs.blogspot.com/2006/05/molecule-mining-field-is-rapidly.html\">mentioned in his blog</a>\nthe graph mining program <a href=\"https://web.archive.org/web/20070609221004/http://www2.informatik.uni-erlangen.de/Forschung/Projekte/ParMol/?language=en\">ParMol <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nwhich integrates four mining algorithms:\n<a href=\"http://fuzzy.cs.uni-magdeburg.de/~borgelt/moss.html\">MoSS</a> (aka MoFa) and <a href=\"http://www.liacs.nl/~snijssen/gaston/\">Gaston</a>, which I mentioned\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/02/open-source-data-mining-in.html\">in November last year <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nand FFSM and gSpan, which I did not know about\nyet. ParMol provides a common interface to the four different algorithms and is, like the four mining modules, licensed GPL. An interesting aspect\nis that Gaston was originally written in C++.</p>",
      "summary": "Joerg Wegner mentioned in his blog the graph mining program ParMol which integrates four mining algorithms: MoSS (aka MoFa) and Gaston, which I mentioned in November last year, and FFSM and gSpan, which I did not know about yet. ParMol provides a common interface to the four different algorithms and is, like the four mining modules, licensed GPL. An interesting aspect is that Gaston was originally written in C++.",
      
      "date_published": "2006-05-03T00:00:00+00:00",
      "date_modified": "2024-06-13T00:00:00+00:00",
      "tags": ["cheminf","opensource"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s63wt-hbx56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/05/01/nightly-cdk-builds-now-available.html",
      "title": "Nightly CDK builds now available",
      "content_html": "<p><a href=\"http://web.archive.org/web/20060815001811/http://cheminfo.informatics.indiana.edu/~rguha/\">Rajarshi Guha <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nhas set up a <a href=\"http://blue.chem.psu.edu/~rajarshi/code/java/nightly/\">nightly build service <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>\nfor the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> (CDK). The output is pretty, but information rich: it includes results for the\n<a href=\"http://www.junit.org/\">JUnit tests</a>, <a href=\"http://java.sun.com/j2se/javadoc/doccheck/\">DocCheck</a>, and <a href=\"http://pmd.sourceforge.net/\">PMD</a>.\nThe compiled jar and the corresponding JavaDoc can be downloaded, offering a cutting-edge distribution for users.</p>",
      "summary": "Rajarshi Guha has set up a nightly build service for the Chemistry Development Kit (CDK). The output is pretty, but information rich: it includes results for the JUnit tests, DocCheck, and PMD. The compiled jar and the corresponding JavaDoc can be downloaded, offering a cutting-edge distribution for users.",
      
      "date_published": "2006-05-01T00:00:00+00:00",
      "date_modified": "2024-06-12T00:00:00+00:00",
      "tags": ["cdk","junit","pmd","opensource"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/23wn4-1nt07",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/23/protein-support-in-bioclipse-using.html",
      "title": "Protein support in Bioclipse using Jmol and the CDK",
      "content_html": "<p>I have not blogged for about a week now, and have been too busy with other things, like finishing my PhD articles/manuscript,\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/03/25/cologne-university-bioinformatics.html\">my new job at the CUBIC <i class=\"fa-solid fa-recycle fa-xs\"></i></a> where I\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/10/getting-jmols-cartoon-on-to-work-in.html\">continued the work <i class=\"fa-solid fa-recycle fa-xs\"></i></a> on proper protein support in\n<a href=\"http://www.bioclipse.net/\">Bioclipse</a> using the <a href=\"http://cdk.sf.net/\">CDK</a> and\n<a href=\"http://www.jmol.org/\">Jmol</a>:</p>\n\n<p><img src=\"/assets/images/cdkpdbsupport800.png\" alt=\"Screenshot of Bioclipse with a protein visualized with Jmol in the middle.\" /></p>\n\n<p>The latter involves getting the <a href=\"https://sourceforge.net/p/bioclipse/code/11760/log/?path=/bioclipse/trunk/plugins/net.bioclipse.jmol/src/net/bioclipse/plugins/adapter/cdk/CdkJmolAdapter.java\">CdkJmolAdapter <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe interface between the CDK and Jmol, <a href=\"https://web.archive.org/web/20060508024648/http://wiki.cubic.uni-koeln.de/cdkwiki/doku.php?id=cdknewsartjmolandcdk\">updated for changes <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nsince the <a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/2_1/\">Jmol as 3D viewer for CDK <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\narticle in <a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/\">CDK News <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, the open access journal for CDK-related projects.</p>\n\n<p>The screenshot is not showing the actual status: the <code class=\"language-plaintext highlighter-rouge\">CdkJmolAdapter</code> does not propagate all information to Jmol correctly; as you\ncan see in the screenshot in the <code class=\"language-plaintext highlighter-rouge\">BioPolymerTree</code> 
and <code class=\"language-plaintext highlighter-rouge\">Property</code> views, the CDK now reads the structure information from the PDB file,\nand I verified that Jmol really extracts this using the <code class=\"language-plaintext highlighter-rouge\">StructureIterator</code>, but the secondary structure does not show up yet.\nI believe the problem is in the <code class=\"language-plaintext highlighter-rouge\">AtomIterator</code>: issuing the <code class=\"language-plaintext highlighter-rouge\">select protein</code> script selects zero atoms.</p>\n\n<p>The above screenshot is using a workaround, and was made by using Jmol’s own IO instead of the <code class=\"language-plaintext highlighter-rouge\">CdkJmolAdapter</code>. But\nI’m very close and think I will be able to fix this soon.</p>",
      "summary": "I have not blogged for about a week now, and have been too busy with other things, like finishing my PhD articles/manuscript, my new job at the CUBIC where I continued the work on proper protein support in Bioclipse using the CDK and Jmol:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/cdkpdbsupport800.png",
      "date_published": "2006-04-23T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["bioclipse","jmol","cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/mgcny-v7r82",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/23/download-statistics-for-chemblaics.html",
      "title": "Download statistics for chemblaics components",
      "content_html": "<p>Here are some quick download statistics for some of the chemblaics components. First\n<a href=\"http://www.jmol.org/\">Jmol</a>. The new stable Jmol 10.2 was released just over a week ago, and this obviously boosted downloads,\nbreaking the monthly download total of two earlier this year (<a href=\"http://sourceforge.net/project/stats/?group_id=23629&amp;ugn=jmol&amp;type=&amp;mode=alltime\">source</a>):</p>\n\n<p><img src=\"/assets/images/jmolDownloadStats.April2006.png\" alt=\"\" /></p>\n\n<p>Statistics for the CDK include download numbers for the <a href=\"http://cdk.sf.net/\">CDK</a> library itself, but for <a href=\"http://jchempaint.sf.net/\">JChemPaint</a>,\nthe CDK News, and several other packages too. Totals are at about 1/3rd of Jmol. Another new record, breaking an earlier record set in February 2003\n(<a href=\"http://sourceforge.net/project/stats/?group_id=20024&amp;ugn=cdk&amp;type=&amp;mode=alltime\">source</a>):</p>\n\n<p><img src=\"/assets/images/cdkDownloadStats.April2006.png\" alt=\"\" /></p>\n\n<p>Finally, I want to mention the overall download count for <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a>\nis much higher than I would ever have hoped for: 1125 in 7 months! Maybe I should ask to get this in the\n<a href=\"http://www.kde.org/\">KDE</a> extragear.</p>",
      "summary": "Here are some quick download statistics for some of the chemblaics components. First Jmol. The new stable Jmol 10.2 was released just over a week ago, and this obviously boosted downloads, breaking the monthly download total of two earlier this year (source):",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/jmolDownloadStats.April2006.png",
      "date_published": "2006-04-23T00:00:00+00:00",
      "date_modified": "2006-04-23T00:00:00+00:00",
      "tags": ["jmol","cdk","jchempaint"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7d8hq-pp704",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/14/postgenomiccom-maps-upcoming.html",
      "title": "Postgenomic.com maps upcoming conferences",
      "content_html": "<p>Conference season is nearing. And just in time, <a href=\"http://web.archive.org/web/20240601063018mp_/http://postgenomic.com/\">Postgenomic.com <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> added\na <a href=\"https://web.archive.org/web/20060513202812/http://postgenomic.com/meetings.php\">conferences map <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> showing locations of upcoming and\nrecently finished conferences. Oh boy, do I want to set this up for chemoinformatics too!</p>\n\n<p>Postgenomic.com makes use of the <a href=\"https://web.archive.org/web/20060813150816/http://postgenomic.com/about_reviews.php\">rel=”conference” attribute <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> for the\n<code class=\"language-plaintext highlighter-rouge\">&lt;a&gt;</code> element. I’m not sure how they distinguish between upcoming and finished conferences (will need to check the\n<a href=\"http://web.archive.org/web/20060519215119/http://www.postgenomic.org/\">source code <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>). But I think some manual\nprocessing is done, for example, to extract conference details, like title, location and dates. I assume the URL is used as unique identifier. Additionally,\nthe conferences are not ‘tagged’ yet, which should be possible too, as Postgenomic.com already associates tags from blog items with articles mentioned in\nthat item. But this is likely a temporary omission.</p>\n\n<p>I already saw <a href=\"https://web.archive.org/web/20060621192118/http://www.ched-ccce.org/confchem/\">ChemConf2006 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"https://web.archive.org/web/20060513202812/http://postgenomic.com/meetings.php#conference_id_6\">picked up <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> from an\n<a href=\"https://chem-bla-ics.linkedchemistry.info/2006/04/02/free-online-chemconf-2006-conference.html\">earlier post <i class=\"fa-solid fa-recycle fa-xs\"></i></a> by me. 
Unfortunately, because it is an online conference, it does not show up on the map :( The following two conferences do have a physical location, and I hope they will appear on the map. If you wonder why I mention only these two, they are the two I will attend in the next 8 weeks, and will have presence of open source bio- and chemoinformatics software developers (at least one, me).</p>\n\n<ul>\n  <li><a href=\"https://gw1-prod.nbic.nl/http://cms1-prod-inside.nbic.nl/home/events/20060424_NBICevent\">Netherlands Bioinformatics Conference <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>, April 24, Ede, The Netherlands</li>\n  <li><a href=\"http://web.archive.org/web/20060612215907/http://infochim.u-strasbg.fr/recherche/europeen_chemistry/index.php\">Workshop Chemoinformatics in Europe: Research and Teaching <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, 29 May - 1 June, Obernai, France</li>\n</ul>",
      "summary": "Conference season is nearing. And just in time, Postgenomic.com added a conferences map showing locations of upcoming and recently finished conferences. Oh boy, do I want to set this up for chemoinformatics too!",
      
      "date_published": "2006-04-14T00:00:00+00:00",
      "date_modified": "2024-12-29T00:00:00+00:00",
      "tags": ["postgenomic","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/04/12/cdk-data-classes-and-change.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/12/cdk-data-classes-and-change.html",
      "title": "The CDK data classes and change notifications",
      "content_html": "<p>The data classes of the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> are mutable, unlike those of\n<a href=\"http://sourceforge.net/projects/octet\">Octet</a>. This means that other classes may need to respond when\nthe content updates. For example, a render class. CDK’s <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/ChemObject.html\">ChemObject</a>\nprovides <code class=\"language-plaintext highlighter-rouge\">notifyChanged()</code> and <code class=\"language-plaintext highlighter-rouge\">addListener()</code> methods for this. However, as was\n<a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=10001141&amp;forum_id=2178\">recently</a> pointed out,\nwhile this is useful in editors, such as <a href=\"http://jchempaint.sf.net/\">JChemPaint</a>, this is a performance killer in high-throughput\nsituations, such as descriptor calculation, or structure diagram generation runs.</p>\n\n<p>To address this, the <a href=\"http://svn.sourceforge.net/viewcvs.cgi/cdk/trunk/cdk/src/org/openscience/cdk/interfaces/IChemObject.java?view=log\">IChemObject</a>\ninterface has been extended with the methods <code class=\"language-plaintext highlighter-rouge\">setNotification(boolean)</code> and <code class=\"language-plaintext highlighter-rouge\">getNotification()</code>, which allow change notifications to be\ntemporarily disabled. There are no helper methods yet to disable it for a complete data structure, like\n<code class=\"language-plaintext highlighter-rouge\">ChemModelManipulator.setNotification(ChemModel, boolean)</code>, but I expect these to be written soon.</p>\n\n<p>Alternatively, special data classes may be used if notification is never needed for a special setup, for example, in the case of QSAR descriptor calculation. 
In such cases, the new <a href=\"http://svn.sourceforge.net/viewcvs.cgi/cdk/trunk/cdk/src/org/openscience/cdk/nonotify/NoNotificationChemObjectBuilder.java?view=log\">NoNotificationChemObjectBuilder</a>\ncan be used:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">IChemObjectReader</span> <span class=\"n\">reader</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">MDLReader</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">FileInputStream</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">File</span><span class=\"o\">(</span><span class=\"s\">\"some.mol\"</span><span class=\"o\">)));</span>\n<span class=\"nc\">IChemObjectBuilder</span> <span class=\"n\">builder</span> <span class=\"o\">=</span> <span class=\"nc\">NoNotificationChemObjectBuilder</span><span class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">();</span>\n<span class=\"nc\">IMolecule</span> <span class=\"n\">molecule</span> <span class=\"o\">=</span> <span class=\"o\">(</span><span class=\"nc\">IMolecule</span><span class=\"o\">)</span> <span class=\"n\">reader</span><span class=\"o\">.</span><span class=\"na\">read</span><span class=\"o\">(</span><span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newMolecule</span><span class=\"o\">());</span>\n<span class=\"c1\">// then perform some operation in which the molecule changes a lot</span>\n</code></pre></div></div>\n\n<p>The advantage is that you do not have to manually disable notification for each class you instantiate. This should give a considerable speed-up, and I hope soon to give some statistics.</p>",
      "summary": "The data classes of the Chemistry Development Kit are mutable, unlike those of Octet. This means that other classes may need to respond when the content updates. For example, a render class. CDK’s ChemObject provides notifyChanged() and addListener() methods for this. However, as was recently pointed out, while this is useful in editors, such as JChemPaint, this is a performance killer in high-throughput situations, such as descriptor calculation, or structure diagram generation runs.",
      
      "date_published": "2006-04-12T00:00:00+00:00",
      "date_modified": "2006-04-12T00:00:00+00:00",
      "tags": ["cdk","cheminf","jchempaint"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/7nz8x-a7q09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/10/getting-jmols-cartoon-on-to-work-in.html",
      "title": "Getting Jmol&apos;s &apos;cartoon on&apos; to work in Bioclipse",
      "content_html": "<p><a href=\"https://web.archive.org/web/20060420034219/http://www.bioclipse.net/\">Bioclipse</a> 1.0 is to be released in May, and the cartoon on script command is\nstill not working in the <a href=\"http://www.jmol.org/\">Jmol</a> viewer. For those who do not know yet, <a href=\"http://www.eclipse.org/\">Bioclipse</a> is a cool Eclipse\nRCP-based Java chemo- and bioinformatics workbench. To have a better idea of what goes on inside Bioclipse, I wrote a new BioPolymer tree to show me the\nstrands in the protein. After <a href=\"http://bioclipse.blogspot.com/\">Ola</a> wrote code to show properties for IChemObjects, I extended it with PDB properties\nfor the atoms, strands and monomers.</p>\n\n<p>The contents of the ChemTree view in the middle and the Properties view below that look fine:</p>\n\n<p><img src=\"https://media.springernature.com/full/springer-static/image/art%3A10.1186%2F1471-2105-8-59/MediaObjects/12859_2006_Article_1431_Fig4_HTML.jpg?as=webp\" alt=\"\" /></p>\n\n<p>So I’ll have to dig a bit further.</p>",
      "summary": "Bioclipse 1.0 is to be released in May, and the cartoon on script command is still not working in the Jmol viewer. For those who do not know yet, Bioclipse is a cool Eclipse RCP-based Java chemo- and bioinformatics workbench. To have a better idea of what goes on inside Bioclipse, I wrote a new BioPolymer tree to show me the strands in the protein. After Ola wrote code to show properties for IChemObjects, I extended it with PDB properties for the atoms, strands and monomers.",
      
      "date_published": "2006-04-10T00:00:00+00:00",
      "date_modified": "2024-05-26T00:00:00+00:00",
      "tags": ["bioclipse","jmol","protein"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/04/mining-kegg-pathway-database-with-self.html",
      "title": "Mining the KEGG pathway database with self-organizing maps",
      "content_html": "<p>The <a href=\"https://en.wikipedia.org/wiki/Self_organizing_map\">Self-organizing map</a> (SOM) is a popular (again) and intuitive non-linear mapping\nmethod: it transforms a multidimensional space into two dimensions (normally: they are so easy to visualize). Latino and\n<a href=\"http://www.dq.fct.unl.pt/staff/jas/\">Aires-de-Sousa</a> published a paper that uses this method to analyze the whole\n<a href=\"http://www.genome.jp/kegg/pathway.html\">KEGG pathway database</a>: <em>Genome-Scale Classification of Metabolic Reactions: A Chemoinformatics\nApproach</em> (DOI: <a href=\"https://doi.org/10.1002/anie.200503833\">10.1002/anie.200503833</a>).</p>\n\n<p>The method is based on earlier work by Zhang and Aires-de-Sousa: <em>Structure-Based Classification of Chemical Reactions without Assignment\nof Reaction Centers</em> (DOI: <a href=\"https://doi.org/10.1021/ci0502707\">10.1021/ci0502707</a>). A non-trivial feature of the suggested method is the\nuse of two SOMs. The first maps the reaction onto a fixed-length vector (coined MOLMAP), which is used as input vector for the second map.\nThis latter map is used to cluster the KEGG reactions on a purely chemical basis. The resemblance with the\n<a href=\"https://en.wikipedia.org/wiki/EC_number\">EC numbering system</a> is striking.</p>",
      "summary": "The Self-organizing map (SOM) is a popular (again) and intuitive non-linear mapping method: it transforms a multidimensional space into two dimensions (normally: they are so easy to visualize). Latino and Aires-de-Sousa published a paper that uses this method to analyze the whole KEGG pathway database: Genome-Scale Classification of Metabolic Reactions: A Chemoinformatics Approach (DOI: 10.1002/anie.200503833).",
      
      "date_published": "2006-04-04T00:00:00+00:00",
      "date_modified": "2006-04-04T00:00:00+00:00",
      "tags": ["kegg","chemometrics"],
      "_references": [{ "url": "https://doi.org/10.1002/ANIE.200503833" },{ "url": "https://doi.org/10.1021/CI0502707" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/04/02/uncertainty-in-nmr-based-3d-protein.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/02/uncertainty-in-nmr-based-3d-protein.html",
      "title": "Uncertainty in NMR based 3D protein models",
      "content_html": "<p>While I was working on implementing proper author-given chain IDs in <a href=\"http://www.pdb.org/\">PDB</a> structures for\n<a href=\"http://www.jmol.org/\">Jmol</a>’s mmCIF reader today, I thought it was interesting to mention the recent article\n<em>Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors</em> by Nabuurs\n(DOI:<a href=\"http://dx.doi.org/10.1371/journal.pcbi.0020009\">10.1371/journal.pcbi.0020009</a>, open access), who works at the\n<a href=\"http://www.cmbi.ru.nl/\">CMBI</a>, two floors away from my former working location at the\n<a href=\"https://www.ru.nl/\">Radboud University Nijmegen</a>.</p>\n\n<p>Nabuurs discusses in this article the uncertainties that come with NMR-derived 3D molecular structures of proteins.\nThese studies do not give factual data on atomic coordinates, but generally give facts about interatomic distances.\nSolving the 3D geometry is then an optimization problem where the task is to find the 3D geometry that best\nreproduces the factual interatomic distances.</p>\n\n<p>Now, this optimization has many close-by, i.e. 
in terms of matching the experimental data, minima, corresponding,\npossibly, to quite different structures.</p>\n\n<p>This is nicely demonstrated in the article, by comparing the folds of <a href=\"http://www.pdb.org/pdb/explore.do?structureId=1Y4O\">1Y4O</a>\nand <a href=\"http://www.pdb.org/pdb/explore.do?structureId=1TGQ\">1TGQ</a>, as shown in the figure below\n(<a href=\"http://www.plos.org/oa/index.html\">CCAL</a> license):</p>\n\n<p><img src=\"/assets/images/pcbi.0020009.g001.png\" alt=\"Figure 1 from the article: Sequence and Structure Ensembles of Two DLC2A Structures.\" /></p>\n\n<p>It is interesting to note that 1TGQ got replaced by <a href=\"http://www.pdb.org/pdb/explore.do?structureId=2B95\">2B95</a> about the same\ntime the article by Nabuurs was published, which shows a 3D model that is homologous with that of 1Y4O, and different from\nthat in the Nabuurs article.</p>",
      "summary": "While I was working on implementing proper author-given chain IDs in PDB structures for Jmol’s mmCIF reader today, I thought it was interesting to mention the recent article Traditional Biomolecular Structure Determination by NMR Spectroscopy Allows for Major Errors by Nabuurs (DOI:10.1371/journal.pcbi.0020009, open access), working at the CMBI, two floors away from my former working location at the Radboud University Nijmegen.",
      
      "date_published": "2006-04-02T00:00:00+00:00",
      "date_modified": "2006-04-02T00:00:00+00:00",
      "tags": ["pdb","crystal","cif"],
      "_references": [{ "url": "https://doi.org/10.1371/JOURNAL.PCBI.0020009" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3en08-3zc34",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/04/02/free-online-chemconf-2006-conference.html",
      "title": "Free online ChemConf 2006 conference",
      "content_html": "<p>The Internet has the nice feature of bringing people together. This has helped many open source projects in the past. But it is also a\nconvenient and cheap way to have conferences. Next month, the\n<a href=\"http://web.archive.org/web/20060213124001/http://www.ched-ccce.org/confchem/\">ChemConf 2006 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nconference will be held, and interested people only need to subscribe to a mailing list to participate.</p>\n\n<p>The topic of this year’s ChemConf is Web-Based Applications for Chemical Education. At least three posters will show the use of\nJava applets in chemistry education, using <a href=\"https://jmol.org/\">Jmol</a>, <a href=\"http://jchempaint.sourceforge.net/\">JChemPaint</a> and\n<a href=\"http://jspecview.sourceforge.net/\">JSpecView</a>. I am (co-)author of two of them.</p>\n\n<p>Again, participation is free. So join in!</p>",
      "summary": "The Internet has the nice feature of bringing people together. This has helped many open source projects in the past. But it is also a convenient and cheap way to have conferences. Next month, the ChemConf 2006 conference will be held, and interested people only need to subscribe to a mailing list to participate.",
      
      "date_published": "2006-04-02T00:00:00+00:00",
      "date_modified": "2024-02-19T00:00:00+00:00",
      "tags": ["conference","jmol","jchempaint","education"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hysd0-wvc09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/31/inchis-in-latex-and-cdk-news.html",
      "title": "InChI&apos;s in LaTeX and CDK News",
      "content_html": "<p>An <a href=\"http://www.iupac.org/inchi/\">InChI</a> (or see the <a href=\"http://www.iupac.org/inchi/\">FAQ</a>) is a line notation\nfor a molecular structure that was recently developed by the <a href=\"http://www.nist.gov/\">NIST</a> and the\n<a href=\"http://www.iupac.org/\">IUPAC</a>. In principle, they can be applied to proteins too (see below), but because\nproteins would give lengthy InChI’s and are quite well defined in terms of connectivity anyway, those can\nbetter be described by their amino acid sequence.</p>\n\n<p>The March 2006 issue of <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/\">CDK News</a>, the\n<a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> project newsletter, will be\n<a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024&amp;package_id=124796\">released</a> later today,\nand has, for the second time, the requirement that authors provide InChI’s for molecular structures mentioned in the articles.\nDifferent from the previous issue is how InChI’s are marked up in LaTeX. 
I’ve set up an <code class=\"language-plaintext highlighter-rouge\">\\inchi{}</code>\ncommand for this that automatically creates a <a href=\"http://www.google.com/\">Google</a> search query as a link behind the InChI:</p>\n\n<div class=\"language-latex highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"k\">\\newcommand</span><span class=\"p\">{</span>\n  <span class=\"k\">\\inchi</span><span class=\"p\">}</span>[1]<span class=\"p\">{</span><span class=\"k\">\\href</span><span class=\"p\">{</span>http://www.google.com/search?q=#1<span class=\"p\">}</span>\n                  <span class=\"p\">{</span><span class=\"k\">\\normalfont\\texttt</span><span class=\"p\">{</span>InChI=#1<span class=\"p\">}</span>\n            <span class=\"p\">}</span>\n<span class=\"p\">}</span>\n</code></pre></div></div>\n\n<p>Now, googling for InChI’s only works if one removes the <code class=\"language-plaintext highlighter-rouge\">InChI=</code> part of the InChI. As an example I will show how it works\nfor methane. 
The InChI for this compound is <code class=\"language-plaintext highlighter-rouge\">InChI=1/CH4/h1H4</code>, so in LaTex one enters <code class=\"language-plaintext highlighter-rouge\">\\inchi{1/CH4/h1H4}</code>.\nThis will create a link like: <a href=\"http://www.google.com/search?q=1/CH4/h1H4\">InChI=1/CH4/h1H4</a>.</p>\n\n<p>BTW, if you are interested in InChI’s for proteins, here is the InChI for <a href=\"http://www.pdb.org/pdb/explore.do?structureId=1CRN\">1CRN</a>,\ncreated with <a href=\"http://openbabel.sourceforge.net/\">OpenBabel</a>:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>InChI=1/C202H439N55O64S6/c1-28-92(12)149-188(308)237-127-84-323-324-\n85-128(176(296)225-114(46-37-63-212-202(209)210)165(285)232-122(69-89(6)7)195(315)253-64-38-\n47-132(253)179(299)215-80-143(274)241-158(107(27)265)199(319)257-68-42-51-136(257)182(302)226-\n115(60-61-144(275)276)164(284)218-100(20)162(282)244-149)236-187(307)148(91(10)11)242-172(292)\n120(74-138(204)269)229-168(288)117(70-108-43-34-33-35-44-108)228-169(289)119(73-137(203)268)\n230-173(293)124(81-258)234-166(286)113(45-36-62-211-201(207)208)224-159(279)99(19)221-186(306)\n147(90(8)9)243-189(309)150(93(13)29-2)245-174(294)125(82-259)235-183(303)135-50-41-66-255(135)\n196(316)130-87-326-322-83-126(223-142(273)79-216-185(305)154(103(23)261)251-171(291)118(72-\n110-54-58-112(267)59-55-110)231-192(312)155(104(24)262)250-163(283)101(21)220-175(127)295)178\n(298)246-151(94(14)30-3)190(310)247-152(95(15)31-4)191(311)248-153(96(16)32-5)198(318)256-67-\n40-49-134(256)181(301)213-77-140(271)217-97(17)161(281)249-156(105(25)263)194(314)240-131\n(88-327-325-86-129(177(297)239-130)238-193(313)157(106(26)264)252-184(304)146(206)102(22)260)197\n(317)254-65-39-48-133(254)180(300)214-78-141(272)222-121(76-145(277)278)170(290)227-116(71-\n109-52-56-111(266)57-53-109)167(287)219-98(18)160(280)233-123(200(320)321)75-139(205)270/h89-\n202,211-252,258-321H,28-88,203-
210H2,1-27H3/t92-,93-,94-,95-,96-,97-,98-,99-,100-,101-,102+,\n103+,104+,105+,106+,107+,109-,110-,111+,112+,113-,114-,115-,116-,117-,118-,119-,120-,121-,122-,\n123-,124-,125-,126-,127-,128-,129-,130-,131-,132-,133-,134-,135-,136-,137?,138-,139-,140-,141+,\n142-,143+,146-,147-,148-,149-,150-,151-,152-,153-,154-,155-,156-,157-,158-,159+,160?,161-,162?,\n163-,164-,165?,166+,167?,168+,169+,170+,171-,172+,173+,174+,175?,176-,177?,178+,179+,180-,\n181?,182-,183+,184?,185+,186+,187-,188-,189-,190+,191?,192-,193?,194-,195-,196-,197-,198-,199-/m0/s1\n</code></pre></div></div>",
      "summary": "An InChI (or see the FAQ) is a line notation for a molecular structure that was recently developed by the NIST and the IUPAC. Principally they can be applied to protein too (see below), but because proteins would give lenghty InChI’s and are quite well defined in terms of connectivity anyway, those can better be described by their amino acid sequence.",
      
      "date_published": "2006-03-31T00:00:00+00:00",
      "date_modified": "2024-03-10T00:00:00+00:00",
      "tags": ["inchi","cdk","cdknews","iupac","nist","google","protein","openbabel"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nn7ag-7fp72",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/25/cologne-university-bioinformatics.html",
      "title": "The Cologne University BioInformatics Center (CUBIC)",
      "content_html": "<p>As of April 3, I will be working as postdoc in the group of <a href=\"http://almost.cubic.uni-koeln.de/jrg/\">Christoph Steinbeck</a>\nat the <a href=\"http://www.cubic.uni-koeln.de/\">Cologne University BioInformatics Center</a>, or simply CUBIC, for a year. Though\nno exact plans have been decided upon, the work will include <a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://www.xml-cml.org/\">CML</a>,\nontologies, <a href=\"http://www.bioclipse.net/\">Bioclipse</a>, semantic web technologies, <a href=\"http://www.jmol.org/\">Jmol</a>, and other\ninteresting things. Research areas will at least include <a href=\"http://qsar.sf.net/\">QSAR</a>, but I hope to touch bits of\nbioinformatics too.</p>",
      "summary": "As of April 3, I will be working as postdoc in the group of Christoph Steinbeck at the Cologne University BioInformatics Center, or simply CUBIC, for a year. Though no exact plans have been decided upon, the work will include CDK, CML, ontologies, Bioclipse, semantic web technologies, Jmol, and other interesting things. Research areas will at least include QSAR, but I hope to touch bits of bioinformatics too.",
      
      "date_published": "2006-03-25T00:00:00+00:00",
      "date_modified": "2024-03-10T00:00:00+00:00",
      "tags": ["qsar","ontology","cml","jmol","bioclipse","semweb"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/03/18/how-to-make-money-from-open-source.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/18/how-to-make-money-from-open-source.html",
      "title": "How to make money from Open Source scientific software",
      "content_html": "<p><a href=\"http://www.openscience.org/blog/\">Dan</a> (the original <a href=\"http://www.jmol.org/\">Jmol</a> author) has an interesting blog series:\nHow to make money from Open Source scientific software <a href=\"http://www.openscience.org/blog/?p=164\">I</a>,\n<a href=\"http://www.openscience.org/blog/?p=165\">II</a> and <a href=\"http://www.openscience.org/blog/?p=166\">III</a>. Three more blog items are\nin the planning. The deal with how to make money from open source scientific software. He wants to be able to\nskeptically review the software in his field, hence open source. But open source software development, at least in\nchemistry, needs funding, because there are too few people working on such software on a voluntary basis.</p>\n\n<p>The articles discuss possible scenarios. Article I discusses ‘Sell hardware’ that comes with open source software, and\narticle II discusses the ‘Sell services’ scenario, which still works in the GNU/Linux OS world. He argues that\nselling support does not fit the chem-bla-ics world: <em>“First, scientific software targets a relatively small group of\nusers, and at the same time, the development and support costs are often quite large.”</em> and <em>“Why would a researcher spend\n$10000 on a support contract if the problem could be solved by throwing a graduate student at the open source version\nof the code for a few months?”</em> Interesting arguments indeed.</p>\n\n<p>Instead, he suggests, the service sold should be knowledge. The open source based company should sell knowledge,\nshould solve customer problems using open source software. Each problem will come with specific needs, allowing indirect\nfunding of open source development. And, yes, this is indeed how open source chemo-/bioinformatics software is\ncurrently development: as a mean to solve scientific challenging problems.</p>\n\n<p>I’m looking forward to his next articles in this series.</p>",
      "summary": "Dan (the original Jmol author) has an interesting blog series: How to make money from Open Source scientific software I, II and III. Three more blog items are in the planning. The deal with how to make money from open source scientific software. He wants to be able to skeptically review the software in his field, hence open source. But open source software development, at least in chemistry, needs funding, because there are too few people working on such software on a voluntary basis.",
      
      "date_published": "2006-03-18T00:00:00+00:00",
      "date_modified": "2006-03-18T00:00:00+00:00",
      "tags": ["jmol","openscience"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/03/16/pdb-protein-database-uses-jmol.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/16/pdb-protein-database-uses-jmol.html",
      "title": "The PDB protein database uses Jmol",
      "content_html": "<p>The beta has been using <a href=\"http://www.jmol.org/\">Jmol</a> as one of the viewers for ages already, but this beta\nis no longer: it’s the new interface for the <a href=\"http://www.pdb.org/\">PDB database</a>.</p>",
      "summary": "The beta has been using Jmol as one of the viewers for ages already, but this beta is no longer: it’s the new interface for the PDB database.",
      
      "date_published": "2006-03-16T00:00:00+00:00",
      "date_modified": "2006-03-16T00:00:00+00:00",
      "tags": ["jmol"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/03/12/open-source-in-drug-discovery.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/12/open-source-in-drug-discovery.html",
      "title": "Open source in drug discovery",
      "content_html": "<p>Geldenhuys et al. published an article in <a href=\"http://www.sciencedirect.com/science/journal/13596446\">Drug Discovery Today</a> titled\n<em>Optimizing the use of open-source software applications in drug discovery</em> (DOI:<a href=\"https://doi.org/10.1016/S1359-6446(05)03692-5\">10.1016/S1359-6446(05)03692-5</a>),\nand approached the review from a bench chemist point of view. Unfortunately, he discusses free, but closed source, program in one go.</p>\n\n<p>He discusses the advantages and problems with opensource, and mentions the often lacking user-friendly GUI (true),\nand the the lack of literature to validate the program. It was unclear to me wether the last argument applied to the free tools,\nor to the open source programs; I thought the open-source projects like the <a href=\"http://cdk.sf.net/\">CDK</a>,\n<a href=\"http://joelib.sf.net/\">JOELib</a>, <a href=\"http://www.jmol.org/\">Jmol</a> and <a href=\"http://pymol.sf.net/\">PyMol</a> were quite strong in this area,\nat least compared to the commercial software I have seen.</p>",
      "summary": "Geldenhuys et al. published an article in Drug Discovery Today titled Optimizing the use of open-source software applications in drug discovery (DOI:10.1016/S1359-6446(05)03692-5), and approached the review from a bench chemist point of view. Unfortunately, he discusses free, but closed source, program in one go.",
      
      "date_published": "2006-03-12T00:00:00+00:00",
      "date_modified": "2006-03-12T00:00:00+00:00",
      "tags": ["drugdiscovery","openscience"],
      "_references": [{ "url": "https://doi.org/10.1016/S1359-6446(05)03692-5" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/83mgj-93w85",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/11/more-chemistry-in-kde.html",
      "title": "More chemistry in KDE",
      "content_html": "<p>After <a href=\"http://edu.kde.org/kalzium/\">Kalzium</a> and\n<a href=\"https://web.archive.org/web/20150930165836/http://kde-apps.org/content/show.php?content=28995\">kfile_chemical <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nKDE has now be extended with kparts for 3D structure and spectrum display:\n<a href=\"https://web.archive.org/web/20130721124532/http://www.kde-apps.org/content/show.php?content=36260\">Kryomol <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nIt is written in C++ and licensed GPL. It supports several chemistry formats, among which quantum chemical formats like Gaussian03,\nNwChem and ACES, and 3D structures as MDL molefile and XYZ.</p>",
      "summary": "After Kalzium and kfile_chemical , KDE has now be extended with kparts for 3D structure and spectrum display: Kryomol . It is written in C++ and licensed GPL. It supports several chemistry formats, among which quantum chemical formats like Gaussian03, NwChem and ACES, and 3D structures as MDL molefile and XYZ.",
      
      "date_published": "2006-03-11T00:00:00+00:00",
      "date_modified": "2023-09-16T00:00:00+00:00",
      "tags": ["kde","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/ef1pm-6g994",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/11/classpath-090-makes-jmol-application.html",
      "title": "Classpath 0.90 makes the Jmol application run",
      "content_html": "<p>A few days back, <a href=\"http://www.gnu.org/software/classpath/announce/20060306.html\">Classpath 0.90</a> was released, the first release after the 0.20 release. Earlier Classpath releases\n<a href=\"/blog/2005/11/27/open-source-swing-jmol-renderer-runs.html\">could run the rendering engine <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nbut <a href=\"/blog/2005/11/18/goal-live-chemblaics-cd.html\">running the application failed so far <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Today it hit Debian unstable, so upgrade my sid32 chroot and had <a href=\"http://www.cacaojvm.org/\">Cacao</a> run Jmol.\nI had some memory issues opening a small molecule (4-methyl-2-pentyne),\nand the rendering speed was a factor 100 or so slower than Sun’s JVM, but it runs!</p>\n\n<p>Using the command <code class=\"language-plaintext highlighter-rouge\">cacao -Xmx512M -jar Jmol.jar triplebond.mol</code> I got results.</p>\n\n<p>Note the exceptions copied to the console. 
Many thanx to the Classpath team!</p>\n\n<p>The stacktrace:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>The full stack trace:\n\njava.lang.IllegalArgumentException: width&lt;<span class=\"o\">=</span>0 height&lt;<span class=\"o\">=</span>0\nat java.awt.image.SampleModel.&lt;init&gt; <span class=\"o\">(</span>SampleModel.java:63<span class=\"o\">)</span>\nat java.awt.image.SinglePixelPackedSampleModel.&lt;init&gt; <span class=\"o\">(</span>SinglePixelPackedSampleModel.java:61<span class=\"o\">)</span>\nat java.awt.image.SinglePixelPackedSampleModel.&lt;init&gt; <span class=\"o\">(</span>SinglePixelPackedSampleModel.java:55<span class=\"o\">)</span>\nat org.jmol.g3d.Swing3D.allocateImage <span class=\"o\">(</span>Swing3D.java:65<span class=\"o\">)</span>\nat org.jmol.g3d.Platform3D.allocateBuffers <span class=\"o\">(</span>Platform3D.java:102<span class=\"o\">)</span>\nat org.jmol.g3d.Graphics3D.beginRendering <span class=\"o\">(</span>Graphics3D.java:697<span class=\"o\">)</span>\nat org.jmol.viewer.Viewer.render1 <span class=\"o\">(</span>Viewer.java:1840<span class=\"o\">)</span>\nat org.jmol.viewer.Viewer.renderScreenImage <span class=\"o\">(</span>Viewer.java:1798<span class=\"o\">)</span>\nat org.openscience.jmol.app.DisplayPanel.paint <span class=\"o\">(</span>DisplayPanel.java:100<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span 
class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JLayeredPane.paint <span class=\"o\">(</span>JLayeredPane.java:647<span class=\"o\">)</span>\nat javax.swing.JComponent.paintChildren <span class=\"o\">(</span>JComponent.java:1659<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1564<span class=\"o\">)</span>\nat javax.swing.JComponent.paintDoubleBuffered <span class=\"o\">(</span>JComponent.java:1782<span class=\"o\">)</span>\nat javax.swing.JComponent.paint <span class=\"o\">(</span>JComponent.java:1555<span class=\"o\">)</span>\nat java.awt.Container<span class=\"nv\">$GfxPaintVisitor</span>.visit <span class=\"o\">(</span>Container.java:1888<span class=\"o\">)</span>\nat java.awt.Container.visitChild <span class=\"o\">(</span>Container.java:1703<span class=\"o\">)</span>\nat java.awt.Container.visitChildren <span class=\"o\">(</span>Container.java:1674<span class=\"o\">)</span>\nat java.awt.Container.paint <span class=\"o\">(</span>Container.java:770<span class=\"o\">)</span>\nat gnu.java.awt.peer.gtk.GtkWindowPeer.handleEvent <span class=\"o\">(</span>GtkWindowPeer.java:268<span class=\"o\">)</span>\nat java.awt.Component.dispatchEventImpl <span class=\"o\">(</span>Component.java:4968<span class=\"o\">)</span>\nat java.awt.Container.dispatchEventImpl <span class=\"o\">(</span>Container.java:1723<span class=\"o\">)</span>\nat java.awt.Window.dispatchEventImpl <span class=\"o\">(</span>Window.java:626<span class=\"o\">)</span>\nat 
java.awt.Component.dispatchEvent <span class=\"o\">(</span>Component.java:2320<span class=\"o\">)</span>\nat java.awt.EventQueue.dispatchEvent <span class=\"o\">(</span>EventQueue.java:474<span class=\"o\">)</span>\nat java.awt.EventDispatchThread.run <span class=\"o\">(</span>EventDispatchThread.java:60<span class=\"o\">)</span>\nat java.lang.VMThread.run <span class=\"o\">(</span>VMThread.java:121<span class=\"o\">)</span>\n</code></pre></div></div>",
      "summary": "A few days back, Classpath 0.90 was released, the first release after the 0.20 release. Earlier Classpath releases could run the rendering engine , but running the application failed so far .",
      
      "date_published": "2006-03-11T00:00:00+00:00",
      "date_modified": "2023-09-24T00:00:00+00:00",
      "tags": ["jmol"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xvhk1-q8r72",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/03/06/progress-with-cmlrss-plugin-for.html",
      "title": "Progress with CMLRSS plugin for Bioclipse",
      "content_html": "<p>With quite some help from <a href=\"http://bioclipse.blogspot.com/\">Ola</a> (thanx!), I made good progress with the\n<a href=\"https://web.archive.org/web/20160413181618/http://wiki.bioclipse.net/index.php?title=CMLRSS_plugin\">CMLRSS plugin <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nThe current result looks like:</p>\n\n<p><img src=\"/assets/images/Cmlrss_bioclipse.png\" alt=\"Screenshot of Bioclipse with an OPML file in the navigator on the left and some first extracted info.\" /></p>\n\n<p>A problem in the transition from Jumbo 5.0 to 5.1 is causing a problem so that it does not show a 3D model or 2D diagram, but that will follow soon.</p>",
      "summary": "With quite some help from Ola (thanx!), I made good progress with the CMLRSS plugin . The current result looks like:",
      "image": "https://chem-bla-ics.linkedchemistry.info/assets/images/Cmlrss_bioclipse.png",
      "date_published": "2006-03-06T00:00:00+00:00",
      "date_modified": "2023-09-16T00:00:00+00:00",
      "tags": ["bioclipse","cml"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/25/open-source-jmol-taking-over-world.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/25/open-source-jmol-taking-over-world.html",
      "title": "Open source Jmol taking over the world",
      "content_html": "<p>Earlier I already <a href=\"2006-02-01-open-source-jmol-hits-student-text.markdown\">reported</a> that student text books were picking up\n<a href=\"http://www.jmol.org/\">Jmol</a> as 3D viewer. Now, <a href=\"http://www.nature.com/nsmb/index.html\">Nature Structural &amp; Molecular Biology</a> reports\n(DOI:<a href=\"https://doi.org/10.1038/nsmb0206-93\">10.1038/nsmb0206-93</a>) that they picked it up too, using\n<a href=\"http://firstglance.jmol.org/\">FirstGlance in Jmol</a> (thanx Peter, for reporting this on the\n<a href=\"http://blueobelisk.org/\">Blue Obelisk</a> <a href=\"http://hardly.cubic.uni-koeln.de/mailman/listinfo/blue-obelisk\">mailing list</a>!).\nAnd, thanx Eric, for acknowledging the hard work of the <a href=\"http://sourceforge.net/project/memberlist.php?group_id=23629\">Jmol developers</a>.</p>\n\n<p>An example article in this Nature publication is Crystal structure of the essential N-terminal domain of telomerase reverse transcriptase\nby Jacobs et al. (DOI:<a href=\"http://dx.doi.org/10.1038/nsmb1054\">10.1038/nsmb1054</a>) about the structure of a part of the telomerase reverse\ntranscriptase (FirstGlance: <a href=\"http://molvis.sdsc.edu/fgij/fg.htm?mol=2B2A\">2B2A</a>). You can easily <a href=\"http://www.google.com/search?q=FirstGlance+site%3Anature.com\">google</a>\nfor more articles as they get indexed.</p>\n\n<p>Note that FirstGlance is certainly not the only webinterface using Jmol! An overview of <a href=\"http://wiki.jmol.org/WebsitesUsingJmol\">websites using Jmol</a>\nis found in the <a href=\"http://wiki.jmol.org/\">Jmol wiki</a>. Those who are not convinced yet, please check out <a href=\"http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&amp;DB=pubmed\">PubMed</a>\nand search for Jmol there.</p>\n\n<p>And, yes, this makes me a proud Jmol developer!</p>",
      "summary": "Earlier I already reported that student text books were picking up Jmol as 3D viewer. Now, Nature Structural &amp; Molecular Biology reports (DOI:10.1038/nsmb0206-93) that they picked it up too, using FirstGlance in Jmol (thanx Peter, for reporting this on the Blue Obelisk mailing list!). And, thanx Eric, for acknowledging the hard work of the Jmol developers.",
      
      "date_published": "2006-02-25T00:00:00+00:00",
      "date_modified": "2006-02-25T00:00:00+00:00",
      "tags": ["jmol"],
      "_references": [{ "url": "https://doi.org/10.1038/nsmb0206-93" },{ "url": "https://doi.org/10.1038/nsmb1054" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/nth2m-yyk05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html",
      "title": "Hacking InChI support into postgenomic.com",
      "content_html": "<p>Earlier I <a href=\"2006-02-15-hot-articles-mining-semantic-web.markdown\">reported</a> about\n<a href=\"https://web.archive.org/web/20060303081952/https://postgenomic.com/\">postgenomic.com <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nand needed some diversion from my manuscript work (could no longer think straight about the article I’m working on). So time for\nsome reading up on new technologies. Timing was perfect, because the source code of postgenomic.com got just uploaded to\n<a href=\"http://sourceforge.net/projects/postgenomic\">SourceForge SVN</a>.</p>\n\n<p>Though the author marks it as not-well-documented and alpha, I was quite happy to see a clear modularisation, and good enough\ndocs to get me started with <a href=\"http://www.iupac.org/inchi/\">InChI</a> support: if it can do mining for papers on\n<a href=\"http://www.doi.org/\">DOIs</a>, then it can do mining for InChI’s too.</p>\n\n<p>It does not show which blog items cite this compound, not does it extract some molecular info from PubChem, but\nI’m happy with the result of four hours of hacking. BTW, the first two InChI’s are left overs from bad\nregular expressions :)</p>",
      "summary": "Earlier I reported about postgenomic.com , and needed some diversion from my manuscript work (could no longer think straight about the article I’m working on). So time for some reading up on new technologies. Timing was perfect, because the source code of postgenomic.com got just uploaded to SourceForge SVN.",
      
      "date_published": "2006-02-25T00:00:00+00:00",
      "date_modified": "2023-09-16T00:00:00+00:00",
      "tags": ["cb","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/24/novel-qsar-and-qspr-descriptors_24.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/24/novel-qsar-and-qspr-descriptors_24.html",
      "title": "Novel QSAR and QSPR descriptors?",
      "content_html": "<p>For the past few weeks I have been working on a review article, which will contain a section with new QSAR/QSPR descriptors\npublished in the period 2000-now. Here are a few:</p>\n\n<ul>\n  <li>2001: oxygen paths of length 3 <a href=\"https://doi.org/10.1021/ci000116e\">10.1021/ci000116e</a></li>\n  <li>2002: a molecular shape descriptor <a href=\"https://doi.org/10.1021/ci000100o\">10.1021/ci000100o</a></li>\n  <li>2003: molecular signature <a href=\"https://doi.org/10.1021/ci020345w\">10.1021/ci020345w</a></li>\n  <li>2004: 4D-fingerprint <a href=\"https://doi.org/10.1021/ci049898s\">10.1021/ci049898s</a></li>\n  <li>2005: summed NMR shift difference <a href=\"https://doi.org/10.1021/ci049643e\">10.1021/ci049643e</a></li>\n</ul>\n\n<p>If you know additional new descriptors, or feel like discussion one or more of the above, please leave a comment.</p>",
      "summary": "For the past few weeks I have been working on a review article, which will contain a section with new QSAR/QSPR descriptors published in the period 2000-now. Here are a few:",
      
      "date_published": "2006-02-24T00:00:00+00:00",
      "date_modified": "2006-02-24T00:00:00+00:00",
      "tags": ["qsar","cheminf"],
      "_references": [{ "url": "https://doi.org/10.1021/CI000116E" },{ "url": "https://doi.org/10.1021/CI000100O" },{ "url": "https://doi.org/10.1021/CI020345W" },{ "url": "https://doi.org/10.1021/CI049898S" },{ "url": "https://doi.org/10.1021/CI049643E" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/22/blueobelisk-opensource-opendata-and.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/22/blueobelisk-opensource-opendata-and.html",
      "title": "BlueObelisk: OpenSource, OpenData and OpenStandards",
      "content_html": "<p>OpenSource, OpenData and OpenStandards are not as strong in chemoinformatics as they are in bioinformatcs, where it is common knowledge\nthat sharing is a good. Today, the <a href=\"http://pubs3.acs.org/acs/journals/toc.page?incoden=jcisd8\">JCIM</a> published on the web an\n<a href=\"http://dx.doi.org/10.1021/ci050400b\">article</a> about the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> movement, which promotes these\nthree idealogies.</p>\n\n<p>Several open source projects participate, amongst which the <a href=\"http://cdk.sf.net/\">CDK</a>, <a href=\"http://www.jmol.org/\">Jmol</a>,\n<a href=\"http://joelib.sf.net/\">JOELib</a>, <a href=\"http://openbabel.sf.net/\">OpenBabel</a>, <a href=\"http://cml.sf.net/\">Chemical Markup Language</a>,\n<a href=\"http://bioclipse.net/\">Bioclipse</a> and <a href=\"http://cdk.sf.net/\">Kalzium</a>.</p>",
      "summary": "OpenSource, OpenData and OpenStandards are not as strong in chemoinformatics as they are in bioinformatcs, where it is common knowledge that sharing is a good. Today, the JCIM published on the web an article about the Blue Obelisk movement, which promotes these three idealogies.",
      
      "date_published": "2006-02-22T00:00:00+00:00",
      "date_modified": "2006-02-22T00:00:00+00:00",
      "tags": ["blue-obelisk","openscience","cdk","cml","bioclipse","kde","jmol"],
      "_references": [{ "url": "https://doi.org/10.1021/CI050400B" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/p37t7-7mz48",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/18/blogging-chemistry-on-blogspotcom.html",
      "title": "Blogging chemistry on blogspot.com",
      "content_html": "<p>You might have read earlier posts in this blog on <a href=\"https://doi.org/10.1021/ci034244p\">CMLRSS</a>, and received a question today on how to integrate\nCMLRSS with blogs on blogspot.com. Now, <a href=\"http://www.ch.ic.ac.uk/rzepa/cmlrss_distrib/\">current CMLRSS feeds</a> are normally generated with customized\nscripts, often directly from a database.</p>\n\n<p>So, here’s my attempt to include CML in a blogspot.com blog. <a href=\"http://openbabel.sf.net/\">OpenBabel 2.0</a> can create good CML, for example for acetic acid:</p>\n\n<cml:molecule xmlns:cml=\"http://www.xml-cml.org/schema/cml2/core\">\n<cml:atomArray atomID=\"a1 a2 a3 a4\" elementType=\"C C O O\" formalCharge=\"0 0 0 0\" />\n<cml:bondArray atomRef1=\"a1 a2 a2\" atomRef2=\"a2 a3 a4\" order=\"1 2 1\" />\n</cml:molecule>\n\n<p>Nothing much to see, right? Well, that’s good, because it’s inserted as CML, not as anything readable, like this equivalent:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;cml:molecule</span> <span class=\"na\">xmlns:cml=</span><span class=\"s\">\"http://www.xml-cml.org/schema/cml2/core\"</span><span class=\"nt\">&gt;</span>\n<span class=\"nt\">&lt;cml:atomArray</span> <span class=\"na\">atomID=</span><span class=\"s\">\"a1 a2 a3 a4\"</span> <span class=\"na\">elementType=</span><span class=\"s\">\"C C O O\"</span> <span class=\"na\">formalCharge=</span><span class=\"s\">\"0 0 0 0\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;cml:bondArray</span> <span class=\"na\">atomRef1=</span><span class=\"s\">\"a1 a2 a2\"</span> <span class=\"na\">atomRef2=</span><span class=\"s\">\"a2 a3 a4\"</span> <span class=\"na\">order=</span><span class=\"s\">\"1 2 1\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;/cml:molecule&gt;</span>\n</code></pre></div></div>\n\n<p>I am curious how this will come out in the RSS feed. 
Maybe it is useful; please read the comments for additional notes.</p>",
      "summary": "You might have read earlier posts in this blog on CMLRSS, and I received a question today on how to integrate CMLRSS with blogs on blogspot.com. Now, current CMLRSS feeds are normally generated with customized scripts, often directly from a database.",
      
      "date_published": "2006-02-18T00:00:00+00:00",
      "date_modified": "2006-02-18T00:00:00+00:00",
      "tags": ["cml","semweb"],
      "_references": [{ "url": "https://doi.org/10.1021/CI034244P" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/17/chemical-reactions-in-cml.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/17/chemical-reactions-in-cml.html",
      "title": "Chemical reactions in CML",
      "content_html": "<p>Gemma Holiday’s article on CMLReact was published in the january issue of the <a href=\"http://pubs3.acs.org/acs/journals/toc.page?incoden=jcisd8\">JCIM</a>\n(doi:<a href=\"https://doi.org/10.1021/ci0502698\">10.1021/ci0502698</a>), which seems to be marked as sample issue right now. She used CMLReact as data format for\n<a href=\"http://www-mitchell.ch.cam.ac.uk/macie/\">MACiE <i class=\"fa-solid fa-link-slash fa-xs\"></i></a> (see doi:<a href=\"https://doi.org/10.1093/bioinformatics/bti693\">10.1093/bioinformatics/bti693</a>), a\ndatabase of 100 enzyme reactions, with fully annotated reaction mechanisms, making this an remarkable and insightfull database.</p>\n\n<p>Now, the nice thing is that this CML should be readable and renderable by the <a href=\"http://cdk.sf.net/\">CDK</a>, though the webinterface uses\nSVG and can be used using <a href=\"http://www.mozilla.com/firefox/\">FireFox</a> too.</p>",
      "summary": "Gemma Holiday’s article on CMLReact was published in the january issue of the JCIM (doi:10.1021/ci0502698), which seems to be marked as sample issue right now. She used CMLReact as data format for MACiE (see doi:10.1093/bioinformatics/bti693), a database of 100 enzyme reactions, with fully annotated reaction mechanisms, making this an remarkable and insightfull database.",
      
      "date_published": "2006-02-17T00:00:00+00:00",
      "date_modified": "2006-02-17T00:00:00+00:00",
      "tags": ["cdk","cml","bioinfo"],
      "_references": [{ "url": "https://doi.org/10.1021/ci0502698" },{ "url": "https://doi.org/10.1093/bioinformatics/bti693" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/15/hot-articles-mining-semantic-web.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/15/hot-articles-mining-semantic-web.html",
      "title": "Hot articles; mining the semantic web",
      "content_html": "<p><a href=\"https://web.archive.org/web/20100730101359/http://www.molgen.mpg.de/~krause//\">Roland Krause <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\n<a href=\"http://binf.twoday.net/stories/1572879/\">discussed today</a> in his blog <a href=\"http://binf.twoday.net/\">Notes from the Biomass</a> an interesting\nwebsite: <a href=\"https://web.archive.org/web/20060409032031/http://postgenomic.com/\">postgenomic.com <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nThis website, still marked BETA, mines blogs in the field of genomics and extracts noteworthy statistics from them: which articles are cited in those blogs.</p>\n\n<p>For example, the most discussed article is Kai Wang’s <a href=\"https://doi.org/10.1038/439534a\">Gene-function wiki would let biologists pool worldwide resources <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nin <a href=\"http://www.nature.com/\">Nature</a>. Additionally, postgenomic.com links to the DOI and PubMed, and shows which blogs discuss the article.</p>\n\n<p>Wow. This really shows what happens when you start doing things in a semantic way!</p>\n\n<p>Now, what does this mean for the <em>molecular web</em>? We already have chemistry-enriched blogs, i.e.\n<a href=\"https://doi.org/10.1021/ci034244p\">CMLRSS <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. Now, let’s make a website\nthat mines chemoinformatics blogs in the same way that postgenomic.com does, and not stick with statistics for article citations,\nbut add statistics for citing molecules too! Start discussing the molecules we find in our CMLRSS feeds!</p>",
      "summary": "Roland Krause discussed today in his blog Notes from the Biomass an interesting website: postgenomic.com. This website, still marked BETA, mines blogs in the field of genomics and extracts noteworthy statistics from them: which articles are cited in those blogs.",
      
      "date_published": "2006-02-15T00:00:00+00:00",
      "date_modified": "2023-08-21T00:00:00+00:00",
      "tags": ["bioinfo","cb","semweb"],
      "_references": [{ "url": "https://doi.org/10.1021/CI034244P" },{ "url": "https://doi.org/10.1038/439534a" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/13/kalzium-wins-award-carsten-niehaus.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/13/kalzium-wins-award-carsten-niehaus.html",
      "title": "Kalzium Wins Award; Carsten Niehaus Interviewed",
      "content_html": "<p>I was very pleased to read today that <a href=\"http://edu.kde.org/kalzium/\">Kalzium</a>, one of the projects that participate in the\n<a href=\"http://blueobelisk.org/\">Blue Obelisk</a>, <a href=\"http://dot.kde.org/1139779450/\">got awarded</a>! Cheers, Carsten!</p>",
      "summary": "I was very pleased to read today that Kalzium, one of the projects that participate in the Blue Obelisk, got awarded! Cheers, Carsten!",
      
      "date_published": "2006-02-13T00:00:00+00:00",
      "date_modified": "2006-02-13T00:00:00+00:00",
      "tags": ["kde","blue-obelisk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/06/test-suite-for-free-open-source-jvms.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/06/test-suite-for-free-open-source-jvms.html",
      "title": "A test suite for free, open source JVMs",
      "content_html": "<p>This weekend I continued my work on getting the <a href=\"http://cdk.sf.net/\">CDK</a> and <a href=\"http://www.jmol.org/\">Jmol</a> to run with free, open source JVMs.\nReally, a lot works fine, as reported earlier in this blog: JChemPaint works and Jmol almost works (see the\n<a href=\"http://developer.classpath.org/mediation/FreeSwingTestApps\">Classpath’s FreeSwingTestApps wiki page</a>), and well over 95% of the CDK JUnit\ntests run without trouble too. So it comes down to identifying what does not run properly, and filing bugs for it. For example,\n<a href=\"http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26101\">26101</a> and <a href=\"http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26108\">26108</a>.</p>\n\n<p>To make finding bugs in Classpath and the free virtual machines easier, I have set up a CDK-based test suite: the CDK\n<a href=\"http://sourceforge.net/project/showfiles.php?group_id=20024\">OpenSource JVM Test Suite</a>. The idea is that it can be used for regression testing,\nand identification of bugs in the virtual machines. It can also be used to do timing benchmarks, and I will report on both of these soon.</p>\n\n<p>But I first need to write some scripts to make nice XHTML pages. And, I have tweaked the CDK tests to skip known bugs, so that all reported\nbugs are actually caused by the virtual machine and the Java library that it uses, and not by a bug in the CDK itself.</p>",
      "summary": "This weekend I continued my work on getting the CDK and Jmol to run with free, open source JVMs. Really, a lot works fine, as reported earlier in this blog: JChemPaint works and Jmol almost works (see the Classpath’s FreeSwingTestApps wiki page), and well over 95% of the CDK JUnit tests run without trouble too. So it comes down to identifying what does not run properly, and filing bugs for it. For example, 26101 and 26108.",
      
      "date_published": "2006-02-06T00:00:00+00:00",
      "date_modified": "2006-02-06T00:00:00+00:00",
      "tags": ["linux","java","cdk","jmol"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3vppj-ez166",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/06/tagging-blog-items.html",
      "title": "Tagging blog items",
      "content_html": "<p>If you have read <a href=\"/blog/2006/02/06/blog-about-bioinformatics-semantic-web.html\">my previous post</a>\nand visited that other blog, you might have noticed the\n<a href=\"http://web.archive.org/web/20060207020403/http://www.technorati.com/tags/\">Technorati keywords <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>.\nOr tags, really, as explained in this <a href=\"http://microformats.org/wiki/reltag\">rel=”tag”</a> microformat. Adding them\nto blog items will enable indexing by Technorati, one of the bigger blog search engines. So, from now on,\nyou’ll see these tags in my items too, hoping they don’t get annoying. No idea, btw, how blog planets respond to them…\nFor the record, the tags I list below are general for my blog, and not for this blog item specifically.</p>\n\n<p>Update: The idea was discontinued at some point. The tags are now local to this blog.</p>",
      "summary": "If you have read my previous post and visited that other blog, you might have noticed the Technorati keywords. Or tags, really, as explained in this rel=”tag” microformat. Adding them to blog items will enable indexing by Technorati, one of the bigger blog search engines. So, from now on, you’ll see these tags in my items too, hoping they don’t get annoying. No idea, btw, how blog planets respond to them… For the record, the tags I list below are general for my blog, and not for this blog item specifically.",
      
      "date_published": "2006-02-06T00:00:00+00:00",
      "date_modified": "2023-08-19T00:00:00+00:00",
      "tags": ["cheminf","chemometrics","bioinfo","technorati"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/06/blog-about-bioinformatics-semantic-web.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/06/blog-about-bioinformatics-semantic-web.html",
      "title": "A blog about bioinformatics, semantic web, comics and social networks.",
      "content_html": "<p>I never got around to mentioning this blog, but <a href=\"http://plindenbaum.blogspot.com/\">YAKAFOKON</a> is a nice blog about, as the\ntitle already says, bioinformatics, the semantic web and social networks. Nice to read, and interesting comments on the\nfunction and features of the internet and how they relate to bioinformatics, and science in general. Recommended!</p>",
      "summary": "I never got around to mentioning this blog, but YAKAFOKON is a nice blog about, as the title already says, bioinformatics, the semantic web and social networks. Nice to read, and interesting comments on the function and features of the internet and how they relate to bioinformatics, and science in general. Recommended!",
      
      "date_published": "2006-02-06T00:00:00+00:00",
      "date_modified": "2006-02-06T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/02/04/skype-on-kubuntu-using-tiptel-usb.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/04/skype-on-kubuntu-using-tiptel-usb.html",
      "title": "Skype on Kubuntu using a Tiptel USB telephone",
      "content_html": "<p>Because I wanted to test internet telephony I downloaded <a href=\"http://www.skype.com/\">Skype</a> and tried to get it to work on my\n<a href=\"http://www.kubuntu.org/\">Kubuntu</a> system. Unfortunately, the Skype version is only 1.2.0.18, and it does not work well with\n<code class=\"language-plaintext highlighter-rouge\">arts</code> :( That is, using <code class=\"language-plaintext highlighter-rouge\">artsdsp</code> it crashes with segfaults whenever I start even a chat, let alone a phone call. This\ncould be worked around by disabling sound in my KDE session, and then the <code class=\"language-plaintext highlighter-rouge\">/dev/dsp</code> is open again.</p>\n\n<p>Even better, I bought a USB telephone yesterday: a reasonably cheap <a href=\"http://www.tiptel.nl/\">Tiptel 115</a>, with\n<a href=\"http://www.skypefoon.nl/skype_telefoon_info.php/products_id/126\">Skype support</a>. Kubuntu Breezy recognized the USB device,\nadded a <code class=\"language-plaintext highlighter-rouge\">/dev/dsp1</code> and after running <code class=\"language-plaintext highlighter-rouge\">alsamixer</code> to raise the sound levels, it seems to work fine, though I have not made an\nactual phone call yet :) I enabled KDE sound again, which is on the first device, and Skype runs on the second.\nNo more segfaults it seems.</p>",
      "summary": "Because I wanted to test internet telephony I downloaded Skype and tried to get it to work on my Kubuntu system. Unfortunately, the Skype version is only 1.2.0.18, and it does not work well with arts :( That is, using artsdsp it crashes with segfaults whenever I start even a chat, let alone a phone call. This could be worked around by disabling sound in my KDE session, and then the /dev/dsp is open again.",
      
      "date_published": "2006-02-04T00:00:00+00:00",
      "date_modified": "2006-02-04T00:00:00+00:00",
      "tags": ["linux"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/3j6pf-yw823",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/02/dutch-google-news-themes-messed-up.html",
      "title": "Dutch Google News themes messed up",
      "content_html": "<p>Recently, a <a href=\"http://news.google.nl/\">Dutch version of Google News</a> was launched, and it might become a replacement for\n<a href=\"http://nu.nl/\">nu.nl</a>. I do not like the verbose layout much, because it makes it more difficult to scan headlines.\nI do like the themes. Except for one.</p>\n\n<p>The English theme ‘Sci/Tech’ is Wetenschap in the Dutch version, or plain Science. And it annoys me to read IT headlines\nwhen looking up scientific news. Is an IE 7 beta really science, or did the translators mess up? (If any Google employee\nis reading this: please split up those two themes.)</p>",
      "summary": "Recently, a Dutch version of Google News was launched, and it might become a replacement for nu.nl. I do not like the verbose layout much, because it makes it more difficult to scan headlines. I do like the themes. Except for one.",
      
      "date_published": "2006-02-02T00:00:00+00:00",
      "date_modified": "2023-08-16T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1amt8-5me42",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/02/01/open-source-jmol-hits-student-text.html",
      "title": "Open source Jmol hits student text book Biochemistry",
      "content_html": "<p>Today I received news on the <a href=\"http://sourceforge.net/mail/?group_id=23629\">Jmol user list</a> that Lubert Stryer’s\n<a href=\"https://www.macmillanlearning.com/college/us/product/Biochemistry/p/1319333621\">Biochemistry <i class=\"fa-solid fa-recycle fa-xs\"></i></a> replaced the\n<a href=\"https://en.wikipedia.org/wiki/MDL_Chime\">proprietary Chime <i class=\"fa-solid fa-recycle fa-xs\"></i></a> with the open source\n<a href=\"http://www.jmol.org/\">Jmol</a>. The third edition from which I learned biochemistry in my first year at the university did not feature a CD with live\nfigures, but I am very thrilled to see a program on which I have actively programmed hit a text book I used myself in the past.</p>",
      "summary": "Today I received news on the Jmol user list that Lubert Stryer’s Biochemistry replaced the proprietary Chime with the open source Jmol. The third edition from which I learned biochemistry in my first year at the university did not feature a CD with live figures, but I am very thrilled to see a program on which I have actively programmed hit a text book I used myself in the past.",
      
      "date_published": "2006-02-01T00:00:00+00:00",
      "date_modified": "2023-08-16T00:00:00+00:00",
      "tags": ["jmol","publishing"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/01/27/1d-nmr-spectra-do-not-work-in-qspr.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/27/1d-nmr-spectra-do-not-work-in-qspr.html",
      "title": "1D NMR Spectra do not work in QSPR",
      "content_html": "<p>About two years ago a student started with me to work on the use of 1D NMR and IR spectra in quantitative structure-activity relationship\n(QSAR) work, with the goal to show that these spectra contain 3D information relevant to QSAR models. It is known that these spectra\ndepend on the 3D conformation of the molecule.</p>\n\n<p>Half a year later we concluded that from the data which we started with (48 compounds with binding affinity), no conclusions could be drawn\nwhatsoever: no statistically sound models could be built at all. So, we composed three larger data sets. These sets, all QSPR data sets,\ndid give us models, but all the spectra-based models were worse than a <a href=\"http://web.archive.org/web/20080113162439/http://www.talete.mi.it/dragon_net.htm\">Dragon <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\ndescriptor-based model using the same number of variables, without doing any variable selection.</p>\n\n<p>I presented this work at the 7th <a href=\"https://iccs-nl.org/\">ICCS <i class=\"fa-solid fa-recycle fa-xs\"></i></a> in Noordwijkerhout half a year ago, and it has now been published in the JCIM: DOI\n<a href=\"https://doi.org/10.1021/ci050282s\">10.1021/ci050282s</a>. Comments on this article are <strong><em>most</em></strong> welcome!</p>",
      "summary": "About two years ago a student started with me to work on the use of 1D NMR and IR spectra in quantitative structure-activity relationship (QSAR) work, with the goal to show that these spectra contain 3D information relevant to QSAR models. It is known that these spectra depend on the 3D conformation of the molecule.",
      
      "date_published": "2006-01-27T00:00:00+00:00",
      "date_modified": "2023-08-14T00:00:00+00:00",
      "tags": ["cheminf"],
      "_references": [{ "url": "https://doi.org/10.1021/CI050282S" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rn78z-r7j37",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/22/trouble-running-cdk-junit-tests-with.html",
      "title": "Trouble running the CDK JUnit tests with Cacao and Kaffe",
      "content_html": "<p>Because I am still looking forward to testing CDK against the latest <a href=\"http://gnu.wildebeest.org/diary/index.php?p=147\">Classpath 0.20</a>,\nI downloaded cacao 0.94-1 for Debian sid, then tried to compile CDK with it:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">JAVA_HOME</span><span class=\"o\">=</span>/usr/lib/jvm/cacao ant <span class=\"nt\">-Dbuild</span>.compiler<span class=\"o\">=</span>gcj clean test-all\n</code></pre></div></div>\n\n<p>But that hangs at some point with zero load. I have no idea what is going on there. I’ve spoken with twisti on the\n#classpath IRC channel, and he helped me run the compile with gdb, which indicated that at some point all threads were waiting.</p>\n\n<p>I also tried it with kaffe 1.1.6.91-2 in sid, but now with a XML parser in the CLASSPATH, as Dalibor in\n<a href=\"/blog/2006/01/06/open-source-java-tool-chain-cdk.html\">a previous blog item suggested <i class=\"fa-solid fa-recycle fa-xs\"></i></a>:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">export </span><span class=\"nv\">CLASSPATH</span><span class=\"o\">=</span>/usr/share/java/xercesImpl.jar:xmlParserAPIs.jar\n<span class=\"nv\">JAVA_HOME</span><span class=\"o\">=</span>/usr/lib/kaffe ant <span class=\"nt\">-Dbuild</span>.compiler<span class=\"o\">=</span>gcj clean test-all\n</code></pre></div></div>\n\n<p>But that failed too with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">test</span>:\n    <span class=\"o\">[</span>junit] Running org.openscience.cdk.test.CDKTests\n    <span class=\"o\">[</span>junit] kaffe-bin: /home/mkoch/debian/kaffe/kaffe-1.1.6.91/build-tree/kaffe-1.1.6.91/kaffe/kaffevm/jit3/machine.c:276: translate: Assertion <span class=\"sb\">`</span>reinvoke <span 
class=\"o\">==</span> <span class=\"nb\">false</span><span class=\"s1\">' failed.\n    [junit] Test org.openscience.cdk.test.CDKTests FAILED\n</span></code></pre></div></div>\n\n<p>It did work previously :(</p>\n\n<p>OK, to reproduce this yourself, you need to check out CDK from CVS (hoping that anonymous CVS is reasonably in sync, and online) with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>cvs <span class=\"nt\">-d</span>:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cdk login\ncvs <span class=\"nt\">-z3</span> <span class=\"nt\">-d</span>:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cdk co <span class=\"nt\">-P</span> cdk\n</code></pre></div></div>",
      "summary": "Because I am still looking forward to testing CDK against the latest Classpath 0.20, I downloaded cacao 0.94-1 for Debian sid, then tried to compile CDK with it:",
      
      "date_published": "2006-01-22T00:00:00+00:00",
      "date_modified": "2023-08-11T00:00:00+00:00",
      "tags": ["cdk","java"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hgjn2-w5e63",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/19/free-at-last.html",
      "title": "Free at last!",
      "content_html": "<p>Free at last! Well, not quite yet, but close enough anyway: my PhD contract has ended; last Friday was my last working day, which my\ncolleagues and I celebrated with a visit to Nijmegen’s oldest bar, <a href=\"https://indeblaauwehand.nl/in-de-blaauwe-hand/\">In de Blauwe Hand <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nBut I still have my manuscript to finish. This formally ends a period of almost 12.5 years at the <a href=\"http://ru.nl/\">Radboud University Nijmegen</a>.</p>\n\n<p>Starting last Monday I’m at home, trying to get things finished as soon as possible. Mostly working on my laptop, remotely logged in to\nour desktop machine downstairs. A good ADSL connection (170kB downstream) helps a lot too, and the proxy on my university machine allows me to\naccess the full access journals of my university.</p>\n\n<p>I’m trying to do some open source chemoinformatics in between writing, and my current QSAR research actually allows me to do some\nfeature enhancement in CDK’s QSAR package too. Today, I hope to write and finish a <a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=9476956&amp;forum_id=2178\">config file architecture <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>\nthat allows fine tuning which QSAR descriptors should be calculated. I anticipate that default config files will be distributed.</p>\n\n<p>Additionally, I will try to finish running the CDK JUnit tests against <a href=\"http://gnu.wildebeest.org/diary/index.php?p=147\">Classpath 0.20</a>,\nwhich covers 98% of Java 1.4.2; the limited support for HTML rendering accounts for most of the remaining 2%. The Classpath progress has\nreally amazed me over the last few weeks. I have not tested Jmol and JChemPaint against the latest open source Java tools, but will\ntry to do that before I go on holiday next week. Results with 0.19 were very promising, as I reported in earlier blog entries.</p>",
      "summary": "Free at last! Well, not quite yet, but close enough anyway: my PhD contract has ended; last Friday was my last working day, which my colleagues and I celebrated with a visit to Nijmegen’s oldest bar, In de Blauwe Hand. But I still have my manuscript to finish. This formally ends a period of almost 12.5 years at the Radboud University Nijmegen.",
      
      "date_published": "2006-01-19T00:00:00+00:00",
      "date_modified": "2023-08-10T00:00:00+00:00",
      "tags": ["phd","cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/01/11/uspto-considers-open-source-software.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/11/uspto-considers-open-source-software.html",
      "title": "USPTO considers open source software prior art",
      "content_html": "<p>This is the best news I heard in weeks! The <a href=\"http://www.uspto.gov/\">US Patent and Trademark Office</a> spoke with open source representatives\nabout ways to deal with open source software as prior art. Apparently, their problem was how to be sure about release dates of open source,\nand authoritative sites like <a href=\"http://www.sf.net/\">SourceForge.net</a> and\n<a href=\"http://freshmeat.net/\">FreshMeat.net</a> help a lot here, with their extensive logging of releases.</p>\n\n<p>Quoting from their website:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>The Department of Commerce’s United States Patent and Trademark Office (USPTO)\nhas created a partnership with the open source community to ensure that patent\nexaminers have access to all available prior art relating to software code\nduring the patent examination process.\n</code></pre></div></div>\n\n<p>It also indicates that releasing open source software with, or announcing it on, such an authoritative website is important! Otherwise, patent offices will not be able to decide whether our open source art is really prior.</p>",
      "summary": "This is the best news I heard in weeks! The US Patent and Trademark Office spoke with open source representatives about ways to deal with open source software as prior art. Apparently, their problem was how to be sure about release dates of open source, and authoritative sites like SourceForge.net and FreshMeat.net help a lot here, with their extensive logging of releases.",
      
      "date_published": "2006-01-11T00:00:00+00:00",
      "date_modified": "2006-01-11T00:00:00+00:00",
      "tags": ["openscience"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2006/01/06/open-source-java-tool-chain-cdk.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/06/open-source-java-tool-chain-cdk.html",
      "title": "Open Source Java tool chain: CDK compiles and JUnit tests run",
      "content_html": "<p>While waiting for a <a href=\"http://www.talete.mi.it/products/dragon_description.htm\">Dragon</a> calculation to finish (it does not work for molecules with more than\n300 atoms!), I updated <a href=\"http://cdk.sf.net/\">CDK</a>’s build.xml to support <a href=\"http://www.gnu.org/software/classpath/cp-tools/\">gjdoc</a>. The build script is now\nable to compile the custom doclets we use for creating the <code class=\"language-plaintext highlighter-rouge\">src/*.javafiles</code> and others from the Java source files. And using\n<a href=\"http://gcc.gnu.org/onlinedocs/gcc-3.0.4/gcj_8.html\">gij</a> I could also run\n<a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/test/\">CDK’s 1688 JUnit tests</a>!</p>\n\n<p>On my Debian GNU/Linux sid chroot, I have java-gcj-compat installed allowing me to do (thanx man-di!):</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>JAVA_HOME=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0 ant -Dbuild.compiler=gcj runDoclet\nJAVA_HOME=/usr/lib/jvm/java-1.4.2-gcj-4.0-1.4.2.0 ant -Dbuild.compiler=gcj test-all\n</code></pre></div></div>\n\n<p>The first command creates the custom doclets, while the second command compiles the CDK and runs the JUnit tests. For Classpath developers:\n<a href=\"http://sourceforge.net/cvs/?group_id=20024\">here</a>’s how to check out the cdk module from CVS.</p>\n\n<p>The results are interesting: while Sun’s JVM gives 11 problems, gij gives 399 problems. The test-all target creates a <code class=\"language-plaintext highlighter-rouge\">reports/result.txt</code>\ndocument listing all failing tests, and I’ve put the <a href=\"http://www.woc.science.ru.nl/devel/egonw/diff_cdk_junit_sun_vs_gij_debianSid_20060106.txt\">diff -u</a>\nfor the two JVMs online. 
I will make diffs for jamvm, kaffe and cacao too.</p>\n\n<p>I hope this gives the free Java community extra feedback on the excellent work they are doing.</p>",
      "summary": "While waiting for a Dragon calculation to finish (it does not work for molecules with more than 300 atoms!), I updated CDK’s build.xml to support gjdoc. The build script is now able to compile the custom doclets we use for creating the src/*.javafiles and others from the Java source files. And using gij I could also run CDK’s 1688 JUnit tests!",
      
      "date_published": "2006-01-06T00:00:00+00:00",
      "date_modified": "2006-01-06T00:00:00+00:00",
      "tags": ["cdk","java"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s0ftg-ppb65",
      "url": "https://chem-bla-ics.linkedchemistry.info/2006/01/03/kubuntu-xrandr-and-tv-out.html",
      "title": "Kubuntu, XRandR and TV-OUT",
      "content_html": "<p>One of the things I had not fully figured out up to today, was how to configure my <a href=\"http://www.kubuntu.org/\">Kubuntu</a> system to easily view DVDs on our TV,\nusing my <a href=\"http://www.nvidia.com/\">NVIDIA</a>’s TV-OUT. I’ve seen xorg.conf files that define a X11 server for the monitor and a second for the TV, and files\nthat use TwinView. Now, I did not really like the way first option worked, so tried the second.</p>\n\n<p>Unfortunately, I had to reconfigure and restart my X11 each time my kids wanted to see <a href=\"https://nl.wikipedia.org/wiki/Bob_de_Bouwer\">Bob the Builder <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nI already knew about <a href=\"http://wiki.x.org/X11R6.8.1/doc/Xrandr.3.html\">XRandR</a>, and today finally had a look at it again, and got it to work without much\ntrouble this time. (Lesson: if something does not work, let it rest and try again half a year later.)</p>\n\n<p>For the googlers, this is what my <a href=\"http://wiki.x.org/X11R6.8.0/doc/xorg.conf.5.html\">xorg.conf</a> <code class=\"language-plaintext highlighter-rouge\">Screen</code> section now looks like:</p>\n\n<div class=\"language-plaintext highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>Section \"Screen\"\n Identifier \"Default Screen\"\n Device  \"NVIDIA Corporation NV18 [GeForce4 MX 4000 AGP 8x]\"\n Monitor  \"Hansol H711\"\n DefaultDepth 24\n Option \"TwinView\" \"on\"\n Option \"TwinViewOrientation\" \"clone\"\n Option \"SecondMonitorHorizSync\"     \"30-50\"\n Option \"SecondMonitorVertRefresh\"   \"60\"\n Option  \"MetaModes\" \"1280x1024,1280x1024;1024x768,1024x768\"\n Option \"TVStandard\" \"PAL-B\"\n Option \"TVOutFormat\" \"SVIDEO\"\n Option \"ConnectedMonitor\" \"crt, tv\"\n SubSection \"Display\"\n  Depth  24\n  Modes  \"1280x1024\" \"1024x768\" \"832x624\" \"800x600\" \"720x400\" \"640x480\"\n EndSubSection\nEndSection\n</code></pre></div></div>\n\n<p>And now, to switch resolution, I can 
just do:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nb\">sudo </span>xrandr <span class=\"nt\">-s</span> 1\n<span class=\"c\"># watch DVD</span>\n<span class=\"nb\">sudo </span>xrandr <span class=\"nt\">-s</span> 0\n</code></pre></div></div>\n\n<p>PS. Happy new year!</p>",
      "summary": "One of the things I had not fully figured out up to today was how to configure my Kubuntu system to easily view DVDs on our TV, using my NVIDIA’s TV-OUT. I’ve seen xorg.conf files that define an X11 server for the monitor and a second for the TV, and files that use TwinView. Now, I did not really like the way the first option worked, so tried the second.",
      
      "date_published": "2006-01-03T00:00:00+00:00",
      "date_modified": "2023-08-09T00:00:00+00:00",
      "tags": ["kde","linux"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/2jyfn-d1910",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/28/good-bad-and-ugly-molecules.html",
      "title": "The good, the bad and the ugly molecules",
      "content_html": "<p>Derek Lowe is the author of the blog <a href=\"https://web.archive.org/web/20051229035537/http://corante.com/pipeline/\">In the Pipeline <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> which is really fun to read. Derek works in\nthe pharmaceutical industry and gives a great insight into how things work in that field of molecular sciences. Yesterday he blogged about\n<a href=\"https://web.archive.org/web/20080611192217/http://www.corante.com/pipeline/archives/2005/12/27/what_makes_an_ugly_molecule.php\">What Makes an Ugly Molecule? <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, and touched on the\nRule-of-Five, the hydrochloric acid bath (aka stomach), and other reasons that make molecules ugly.</p>\n\n<p>But there are many other interesting posts, and, something that my blog still lacks, comments by many users, discussing the ideas he\nposts, making his blog even nicer.</p>",
      "summary": "Derek Lowe is the author of the blog In the Pipeline which is really fun to read. Derek works in the pharmaceutical industry and gives a great insight into how things work in that field of molecular sciences. Yesterday he blogged about What Makes an Ugly Molecule?, and touched on the Rule-of-Five, the hydrochloric acid bath (aka stomach), and other reasons that make molecules ugly.",
      
      "date_published": "2005-12-28T00:00:00+00:00",
      "date_modified": "2023-08-08T00:00:00+00:00",
      "tags": ["chemistry","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1pgeq-yqn56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/27/knoppix-saves-day.html",
      "title": "Knoppix saves the day...",
      "content_html": "<p>After the three obligatory days of christmas holidays (fun, especially with two children, but very exhausting), it is time to get back to business again. I’m still\nat my father-in-laws place with only XP installed, so booted the <a href=\"http://www.knopper.net/knoppix/\">Knoppix 4.0.2 DVD</a> I burned last friday. Eclipse is not working,\nbut being able to use Kmail to read my email again is just what you need as in internet-junkie. A computer is just not complete without a nice KDE session hanging around.</p>\n\n<p>Anyway, booted eclipse on my computer at work, and tunneled the window over SSH. Not overly fast, but it seems to run fine. (If only I knew how to setup NX on\nthat Kubuntu breezy system!) Let’s see if I can get the <a href=\"http://sourceforge.net/tracker/?group_id=20024&amp;atid=120024\">CDK bug count</a> somewhat lower.</p>",
      "summary": "After the three obligatory days of christmas holidays (fun, especially with two children, but very exhausting), it is time to get back to business again. I’m still at my father-in-laws place with only XP installed, so booted the Knoppix 4.0.2 DVD I burned last friday. Eclipse is not working, but being able to use Kmail to read my email again is just what you need as in internet-junkie. A computer is just not complete without a nice KDE session hanging around.",
      
      "date_published": "2005-12-27T00:00:00+00:00",
      "date_modified": "2005-12-27T00:00:00+00:00",
      "tags": ["linux","kde"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2005/12/23/subset-selection-mind-complexity.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/23/subset-selection-mind-complexity.html",
      "title": "Subset selection: mind the complexity",
      "content_html": "<p>In a recent <a href=\"http://pubs.acs.org/journals/jcisd8/\">JCIM</a> article, Schuffenhauer <a href=\"http://dx.doi.org/10.1021/ci0503558\">compares</a> a few subset selection\nmethods, and notes that some of them reduce the average complexity of the molecules. They put this in relation to other research that states that\nlead compounds with high complexity have higher activities. Recommended reading material for the holidays.</p>",
      "summary": "In a recent JCIM article, Schuffenhauer compares a few subset selection methods, and notes that some of them reduce the average complexity of the molecules. They put this in relation to other research that states that lead compounds with high complexity have higher activities. Recommended reading material for the holidays.",
      
      "date_published": "2005-12-23T00:00:00+00:00",
      "date_modified": "2023-08-14T00:00:00+00:00",
      "tags": ["cheminf"],
      "_references": [{ "url": "https://doi.org/10.1021/ci0503558" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/q7ehm-v1m81",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/18/statcvs-on-cdk.html",
      "title": "StatCVS on CDK",
      "content_html": "<p>One of the <a href=\"http://www.classpath.org/\">Classpath</a> developers pointed me to their\n<a href=\"http://object-refinery.com/classpath/statcvs/\">CVS statistics</a> when I asked them\nhow actively their project is currently developed, i.e. the number of active developers.</p>\n\n<p>The pages are generated with <a href=\"http://statcvs.sourceforge.net/\">StatCVS</a>, and I ran it one the CDK too.</p>\n\n<p>I knew I did a lot of work on the CDK, but never realized that <a href=\"http://www.woc.science.ru.nl/devel/egonw/log.html/authors.html\">62.7%</a>\nof the commits were mine! Keep in mind, though, that a lot of these commits are for code maintainance! Next in line are\n<a href=\"http://almost.cubic.uni-koeln.de/jrg/Members/steinbeck\">steinbeck</a> and <a href=\"http://blue.chem.psu.edu/~rajarshi/\">rajarshi</a>.\nIn total 28 people commited patches to CVS, though other people contributed patches too, which were commited by a developer with write\naccess. There is jump in the commit messages somewhere this summer, which I think is the move of the data directory from cdk/data to\ncdk/src/data.</p>\n\n<p>The full analysis results can be found <a href=\"http://www.woc.science.ru.nl/devel/egonw/log.html/\">here</a>. It was generated with the\n<a href=\"http://packages.debian.org/unstable/devel/statcvs\">StatCVS version in sid</a>, and will rerun it soon with a more recent StatCVS version.</p>",
      "summary": "One of the Classpath developers pointed me to their CVS statistics when I asked them how actively their project is currently developed, i.e. the number of active developers.",
      
      "date_published": "2005-12-18T00:00:00+00:00",
      "date_modified": "2005-12-18T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6p49t-sj396",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/16/cdk-debug-classes-and-fixing.html",
      "title": "CDK Debug classes and fixing the ModelBuilder3D bug",
      "content_html": "<p>For some weeks now I have been thinking about bug <a href=\"https://sourceforge.net/tracker/index.php?func=detail&amp;aid=1309731&amp;group_id=20024&amp;atid=120024\">1309731</a>:\n“ModelBuilder3D overwrites Atom IDs”. The <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/modeling/builder3d/ModelBuilder3D.java?rev=1.23&amp;view=markup\">ModelBuilder3D</a>\nis a complex piece of source code, reusing many other parts of the CDK, including\n<a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/atomtype/package-summary.html\">atom type perception</a>.</p>\n\n<p>Somewhere in October, however, I found that Taverna could not create 3D models and convert these into reasonable CML because the Atom ID’s were messed up. So the question is, where did the\nModelBuilder3D do this? Did it do this itself, or is it done by one of the other pieces of CDK that it uses? But due to the complex nature of this algorithm, it quickly became clear\nthat looking at the code was not going to solve it; there was too much code to look at.</p>\n\n<p>The solution was clear to me: use the [new data interfaces <i class=\"fa-solid fa-recycle fa-xs\">](https://chem-bla-ics.linkedchemistry.info/2005/10/25/more-cdkinterfaces-updates.html).\nTo identify where the IDs where messed up, I only needed to write a DebugAtom class with a method that looked like:</i></p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">setID</span><span class=\"o\">(</span><span class=\"nc\">String</span> <span class=\"n\">identifier</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n  <span class=\"n\">logger</span><span class=\"o\">.</span><span class=\"na\">debug</span><span class=\"o\">(</span><span class=\"s\">\"Setting ID: \"</span><span class=\"o\">,</span> <span class=\"n\">identifier</span><span 
class=\"o\">);</span>\n  <span class=\"kd\">super</span><span class=\"o\">.</span><span class=\"na\">setID</span><span class=\"o\">(</span><span class=\"n\">identifier</span><span class=\"o\">);</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>And I would immediately see at what stage the ID was overwritten.</p>\n\n<p>So I started this week to implement the <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/debug/DebugAtom.java?rev=1.1&amp;view=markup\">DebugAtom</a> and related classes.\nBy extending <code class=\"language-plaintext highlighter-rouge\">Atom</code>, I could just add debugging stuff and reuse the code in that class. However, the <code class=\"language-plaintext highlighter-rouge\">DebugAtom</code> then cannot also extend <code class=\"language-plaintext highlighter-rouge\">DebugAtomType</code>. And this is a pity,\nbecause all methods inherited by the <code class=\"language-plaintext highlighter-rouge\">Atom</code> interface from the <code class=\"language-plaintext highlighter-rouge\">AtomType</code>, <code class=\"language-plaintext highlighter-rouge\">Isotope</code>, <code class=\"language-plaintext highlighter-rouge\">Element</code> and <code class=\"language-plaintext highlighter-rouge\">ChemObject</code> interfaces cannot be inherited from the <code class=\"language-plaintext highlighter-rouge\">DebugAtomType</code> class.\nInstead, they now have to duplicate those bits of code.</p>\n\n<p>This is not a clean solution, as duplicate code is a known cause of bugs. So, the next step was to write JUnit tests for the new debug classes. And for this\nI wanted to reuse, i.e. extend, the tests for the default data classes. This required, however, changes to those test classes.</p>\n\n<p>The first thing that needed to be changed was that instantiation of data classes in the tests would now have to depend on the data classes being tested. 
A simple</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Atom</span> <span class=\"n\">atom</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Atom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>only makes sense when a specific <code class=\"language-plaintext highlighter-rouge\">Atom</code> class is important. Fortunately, the new interfaces provide a solution for this: the <code class=\"language-plaintext highlighter-rouge\">ChemObjectBuilder</code> implementations.\nThese allow the following syntax to be used to replace the hard coded instantiation:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Atom</span> <span class=\"n\">atom</span> <span class=\"o\">=</span> <span class=\"n\">builder</span><span class=\"o\">.</span><span class=\"na\">newAtom</span><span class=\"o\">(</span><span class=\"s\">\"C\"</span><span class=\"o\">);</span>\n</code></pre></div></div>\n\n<p>Therefore, I added a protected field to the <code class=\"language-plaintext highlighter-rouge\">AtomTest</code>, which was instantiated in the <code class=\"language-plaintext highlighter-rouge\">setUp()</code>:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">protected</span> <span class=\"nc\">ChemObjectBuilder</span> <span class=\"n\">builder</span><span class=\"o\">;</span>\n<span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">setUp</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n  <span class=\"n\">builder</span> <span class=\"o\">=</span> <span class=\"nc\">DefaultChemObjectBuilder</span><span class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">();</span>\n<span 
class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>and use this builder to instantiate all test objects, as shown for the atom above.</p>\n\n<p>And then I can simply reuse this JUnit test by defining the <code class=\"language-plaintext highlighter-rouge\">DebugAtomTest</code> like:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">DebugAtomTest</span> <span class=\"kd\">extends</span> <span class=\"nc\">AtomTest</span> <span class=\"o\">{</span>\n  <span class=\"kd\">public</span> <span class=\"nf\">DebugAtomTest</span><span class=\"o\">(</span><span class=\"nc\">String</span> <span class=\"n\">name</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"kd\">super</span><span class=\"o\">(</span><span class=\"n\">name</span><span class=\"o\">);</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"kd\">public</span> <span class=\"kt\">void</span> <span class=\"nf\">setUp</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"kd\">super</span><span class=\"o\">.</span><span class=\"na\">builder</span> <span class=\"o\">=</span> <span class=\"nc\">DebugChemObjectBuilder</span><span class=\"o\">.</span><span class=\"na\">getInstance</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n\n  <span class=\"kd\">public</span> <span class=\"kd\">static</span> <span class=\"nc\">Test</span> <span class=\"nf\">suite</span><span class=\"o\">()</span> <span class=\"o\">{</span>\n    <span class=\"k\">return</span> <span class=\"k\">new</span> <span class=\"nf\">TestSuite</span><span class=\"o\">(</span><span class=\"nc\">DebugAtomTest</span><span class=\"o\">.</span><span class=\"na\">class</span><span class=\"o\">);</span>\n  <span class=\"o\">}</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>The sources for these debug data class tests are found 
in the new <code class=\"language-plaintext highlighter-rouge\">cdk.test.debug</code> package.</p>\n\n<p>The number of JUnit tests for the CDK jumped from around 1250 to over 1500 tests right now. And if you think these new\ntests only test old code, because of all the <code class=\"language-plaintext highlighter-rouge\">super.bla()</code> calls in the debug classes, you’re way off. I found bugs in the\nnew debug classes, but <strong>also</strong> many class cast bugs and several other problems in the real data classes!</p>\n\n<p>Anyway. Does this help fix the <code class=\"language-plaintext highlighter-rouge\">ModelBuilder3D</code> bug? Yes, it does:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nv\">$ </span><span class=\"nb\">grep</span> <span class=\"s2\">\"Setting ID\"</span> reports/result.modeling.builder3d.ModelBuilder3dTest.txt\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: carbon1\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: oxygen1\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: C\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: HC\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: HC\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: HC\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: O\norg.openscience.cdk.debug.DebugAtom DEBUG: Setting ID: HO\n</code></pre></div></div>\n\n<p>This shows me where the <code class=\"language-plaintext highlighter-rouge\">Atom</code> ID is overwritten to be something other than “carbon1”! 
I can now look at the rest of the\n<code class=\"language-plaintext highlighter-rouge\">result.modeling.builder3d.ModelBuilder3dTest.txt</code> file to see what the <code class=\"language-plaintext highlighter-rouge\">ModelBuilder3D</code> was doing at the time,\nand which CDK class made the <code class=\"language-plaintext highlighter-rouge\">setID()</code> call.</p>\n\n<p>I only needed to change this line in the JUnit test for the bug to generate the above debug lines:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Molecule</span> <span class=\"n\">methanol</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">Molecule</span><span class=\"o\">();</span>\n</code></pre></div></div>\n\n<p>into</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nc\">Molecule</span> <span class=\"n\">methanol</span> <span class=\"o\">=</span> <span class=\"k\">new</span> <span class=\"nc\">DebugMolecule</span><span class=\"o\">();</span>\n</code></pre></div></div>",
      "summary": "For some weeks now I have been thinking about bug 1309731: “ModelBuilder3D overwrites Atom IDs”. The ModelBuilder3D is a complex piece of source code, reusing many other parts of the CDK, including atom type perception.",
      
      "date_published": "2005-12-16T00:00:00+00:00",
      "date_modified": "2024-03-23T00:00:00+00:00",
      "tags": ["cdk","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/er890-p9m81",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/13/math-libraries-for-java.html",
      "title": "Math libraries for Java?",
      "content_html": "<p>I drop in on the <code class=\"language-plaintext highlighter-rouge\">#classpath</code> channel of <a href=\"http://www.freenode.net/\">freenode.net</a> IRC network, where the <code class=\"language-plaintext highlighter-rouge\">#cdk</code> channel runs too.\nThe <code class=\"language-plaintext highlighter-rouge\">#classpath</code> channel is for the <a href=\"http://www.gnu.org/software/classpath/\">Classpath</a> project which is developing the free Java libraries used by most\nopen source virtual machines.</p>\n\n<p>A <a href=\"http://slashdot.org/\">Slashdot.org</a> item was mentioned <a href=\"http://developers.slashdot.org/developers/05/12/13/1824236.shtml?tid=108&amp;tid=156\">“Java Is So 90s”</a>.\nIt lead to a funny discussion about what that would make C/C++ and Fortran. A more serious question was brought up: where are the efficient and super fast\nJava linear algebra and complex number libraries?</p>\n\n<p>There is <a href=\"http://www.cs.waikato.ac.nz/ml/weka/\">Weka</a> but it is more aimed at data analysis. I believe it has support principle component analysis, so it\nmust have singular value decomposition. There is a book called <strong>Java Number Cruncher: The Java Programmer’s Guide to Numerical Computing</strong>\nby Ronald Mak, 2003, Prentice Hall.</p>\n\n<p>After some further asking about it on the channel, they mentioned the <a href=\"http://jakarta.apache.org/commons/math/\">Apache commons math</a> project,\nwhich seems promising. The website mentions complex numbers, linear algebra, statistics and numerical analysis, but have not looked at the full API,\nso not sure how well populated these areas are.</p>\n\n<p>Anyone, with experience in the area of numerical computing and Java?</p>",
      "summary": "I drop in on the #classpath channel of freenode.net IRC network, where the #cdk channel runs too. The #classpath channel is for the Classpath project which is developing the free Java libraries used by most open source virtual machines.",
      
      "date_published": "2005-12-13T00:00:00+00:00",
      "date_modified": "2005-12-13T00:00:00+00:00",
      "tags": ["math","java"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y0mte-4ns18",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/10/jumbo-50-and-cdk.html",
      "title": "Jumbo 5.0 and the CDK",
      "content_html": "<p>I <a href=\"https://egonw.github.io/blog/2005/12/08/jumbo-50-and-cml-support-in-cdk.html\">reported earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a> that the CDK has been updated in CVS to use\nCML from the new Jumbo 5.0. The transition actually involved a lot of changes in the CDK, some I would like to address in the following comments.\nOne thing is that CML write support (not reading!) uses the new Jumbo library which requires Java 1.5. Thus, if Java 1.5 is not available,\nthen CML writing should not be compiled. This is how this is done.</p>\n\n<h3 id=\"the-javadoc\">The JavaDoc</h3>\n\n<p>The CDK makes extensive use of <a href=\"http://java.sun.com/j2se/1.5.0/docs/guide/javadoc/taglet/spec/com/sun/tools/doclets/Taglet.html\">JavaDoc taglets</a>.\nCDK uses tags of type <code class=\"language-plaintext highlighter-rouge\">@cdk.SOMETAG</code>. And an important tag in this case, is the <code class=\"language-plaintext highlighter-rouge\">@cdk.require</code> tag, becuase it allows us to make the CDK build\nsystem aware that the class requires Java 5.0 to be compiled. 
Thus, we have for example\n<a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/io/CMLWriter.java?rev=1.90&amp;view=log\">this code in CVS</a>, of which bits are:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"cm\">/**\n * Serializes a SetOfMolecules or a Molecule object to CML 2 code.\n * Chemical Markup Language is an XML based file format {@cdk.cite PMR99}.\n * Output can be redirected to other Writer objects like StringWriter\n * and FileWriter.\n *\n * @cdk.module       libio-cml\n * @cdk.builddepends xom-1.0.jar\n * @cdk.depends      jumbo50.jar\n * @cdk.require      java1.5\n */</span>\n<span class=\"kd\">public</span> <span class=\"kd\">class</span> <span class=\"nc\">CMLWriter</span> <span class=\"kd\">extends</span> <span class=\"nc\">DefaultChemObjectWriter</span> <span class=\"o\">{</span>\n<span class=\"o\">}</span>\n</code></pre></div></div>\n\n<p>As is probably clear, this class depends on two jars, of which the <code class=\"language-plaintext highlighter-rouge\">jumbo50.jar</code> itself is not required for compiling\nthe class source code. It also shows the use of the <code class=\"language-plaintext highlighter-rouge\">@cdk.require</code> tag.</p>\n\n<h3 id=\"the-buildxml\">The build.xml</h3>\n\n<p>Because the CDK still does not require Java 1.5, the CDK is supposed to be buildable with Java 1.4 (the oldest supported Java release). The\n<a href=\"http://ant.apache.org/\">Ant</a> <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/build.xml?rev=1.310&amp;view=markup\">build.xml</a> script is quite\nable to conditionally leave out compiling parts of the CDK, if configured correctly using proper JavaDoc tags, as explained earlier.</p>\n\n<p>First, the build.xml checks what libraries are available for compiling certain parts of the CDK. 
For example, the build.xml code to check for Java 1.5 looks like:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;condition</span> <span class=\"na\">property=</span><span class=\"s\">\"isJava15\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;contains</span> <span class=\"na\">string=</span><span class=\"s\">\"${java.version}\"</span> <span class=\"na\">substring=</span><span class=\"s\">\"1.5\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;/condition&gt;</span>\n</code></pre></div></div>\n\n<p>Run <code class=\"language-plaintext highlighter-rouge\">ant info</code> to see what is being checked for, or look at the <code class=\"language-plaintext highlighter-rouge\">build.xml</code> source code for the check target.</p>\n\n<p>All compiling is done by the compile-module target, and there it in- and excludes bits of the CDK depending on the checked conditions:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;javac</span> <span class=\"na\">srcdir=</span><span class=\"s\">\"${build.src}\"</span> <span class=\"na\">destdir=</span><span class=\"s\">\"${build}\"</span> <span class=\"na\">optimize=</span><span class=\"s\">\"${optimization}\"</span> \n       <span class=\"na\">debug=</span><span class=\"s\">\"${debug}\"</span> <span class=\"na\">deprecation=</span><span class=\"s\">\"${deprecation}\"</span><span class=\"nt\">&gt;</span>\n\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/java1.4+.javafiles\"</span> <span class=\"na\">if=</span><span class=\"s\">\"isJava13\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/java1.4.javafiles\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"isJava14\"</span><span 
class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/java1.5.javafiles\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"isJava15\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/ant1.6.javafiles\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"hasAnt16\"</span><span class=\"nt\">/&gt;</span>\n  <span class=\"nt\">&lt;excludesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/r-project.javafiles\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"rispresent\"</span><span class=\"nt\">/&gt;</span>\n\n  <span class=\"nt\">&lt;includesfile</span> <span class=\"na\">name=</span><span class=\"s\">\"${src}/${module}.javafiles\"</span><span class=\"nt\">/&gt;</span>\n<span class=\"nt\">&lt;/javac&gt;</span>\n</code></pre></div></div>\n\n<p>Keep in mind that the <code class=\"language-plaintext highlighter-rouge\">*.javafiles</code> are created with JavaDoc based on the CDK JavaDoc tags mentioned earlier.</p>\n\n<h3 id=\"the-buildxml-2\">The build.xml 2</h3>\n\n<p>While the above mechanism has been present for some time now, having jumbo50.jar in CVS made the situation a bit trickier:\nthe <code class=\"language-plaintext highlighter-rouge\">jumbo50.jar</code> uses the 49.0 class format used in Java 1.5, and cannot be processed by Java 1.4 systems. 
Since the classpath\nused when compiling CDK source code, is defined in configuration files for those modules in\n<a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/META-INF/\">src/META-INF</a>, the problem did not occur when compiling the modules.\nHowever, it did show an error in the <code class=\"language-plaintext highlighter-rouge\">reallyRunDoclet</code> target today, when I was creating the <code class=\"language-plaintext highlighter-rouge\">*.javafiles</code> with JavaDoc.\nThe solution was trivial:</p>\n\n<div class=\"language-xml highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"nt\">&lt;target</span> <span class=\"na\">name=</span><span class=\"s\">\"reallyRunDoclet\"</span> <span class=\"na\">id=</span><span class=\"s\">\"reallyRunDoclet\"</span>\n  <span class=\"na\">depends=</span><span class=\"s\">\"compileDoclet\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"dotjavafiles.uptodate\"</span><span class=\"nt\">&gt;</span>\n  <span class=\"nt\">&lt;javadoc</span> <span class=\"na\">private=</span><span class=\"s\">\"true\"</span>  <span class=\"na\">maxmemory=</span><span class=\"s\">\"128m\"</span><span class=\"nt\">&gt;</span>\n    <span class=\"nt\">&lt;classpath&gt;</span>\n      <span class=\"nt\">&lt;fileset</span> <span class=\"na\">dir=</span><span class=\"s\">\"${lib}\"</span><span class=\"nt\">&gt;</span>\n        <span class=\"nt\">&lt;include</span> <span class=\"na\">name=</span><span class=\"s\">\"*.jar\"</span> <span class=\"nt\">/&gt;</span>\n        <span class=\"c\">&lt;!-- some jars require some Java version --&gt;</span>\n        <span class=\"nt\">&lt;exclude</span> <span class=\"na\">name=</span><span class=\"s\">\"jumbo50.jar\"</span> <span class=\"na\">unless=</span><span class=\"s\">\"isJava15\"</span><span class=\"nt\">/&gt;</span>\n      <span class=\"nt\">&lt;/fileset&gt;</span>\n      <span class=\"nt\">&lt;fileset</span> <span class=\"na\">dir=</span><span 
class=\"s\">\"${lib}/libio\"</span><span class=\"nt\">&gt;</span>\n        <span class=\"nt\">&lt;include</span> <span class=\"na\">name=</span><span class=\"s\">\"*.jar\"</span> <span class=\"nt\">/&gt;</span>\n      <span class=\"nt\">&lt;/fileset&gt;</span>\n      <span class=\"nt\">&lt;fileset</span> <span class=\"na\">dir=</span><span class=\"s\">\"${devellib}\"</span><span class=\"nt\">&gt;</span>\n        <span class=\"nt\">&lt;include</span> <span class=\"na\">name=</span><span class=\"s\">\"*.jar\"</span> <span class=\"nt\">/&gt;</span>\n      <span class=\"nt\">&lt;/fileset&gt;</span>\n    <span class=\"nt\">&lt;/classpath&gt;</span>\n\n    <span class=\"nt\">&lt;doclet</span> <span class=\"na\">name=</span><span class=\"s\">\"net.sf.cdk.tools.MakeJavaFilesFilesDoclet\"</span>\n      <span class=\"na\">path=</span><span class=\"s\">\"${doc}/javadoc\"</span><span class=\"nt\">/&gt;</span>\n\n    <span class=\"nt\">&lt;packageset</span> <span class=\"na\">dir=</span><span class=\"s\">\"${src}\"</span><span class=\"nt\">&gt;</span>\n      <span class=\"nt\">&lt;include</span> <span class=\"na\">name=</span><span class=\"s\">\"org/openscience/cdk/**\"</span><span class=\"nt\">/&gt;</span>\n    <span class=\"nt\">&lt;/packageset&gt;</span>\n\n<span class=\"nt\">&lt;/javadoc&gt;</span>\n</code></pre></div></div>\n\n<h3 id=\"cdkapplicationsfileconvertor\">cdk.applications.FileConvertor</h3>\n\n<p>There is another area of interest: the <code class=\"language-plaintext highlighter-rouge\">FileConvertor</code>, which is, sort of, CDK’s\n<a href=\"http://openbabel.sf.net/\">OpenBabel</a>’s <code class=\"language-plaintext highlighter-rouge\">babel</code> variant. The FileConvertor must\nbe compiled in all cases, so we need to conditionally instantiate the <code class=\"language-plaintext highlighter-rouge\">CMLWriter</code>, which is not really a problem. 
However, compiling\nthe source code is more troublesome: the <code class=\"language-plaintext highlighter-rouge\">CMLWriter</code> class must be loaded at runtime, and not occur hardcoded in the source code.</p>\n\n<p>In the past I have solved this by using <code class=\"language-plaintext highlighter-rouge\">.getInstance()</code> constructs, but the\n<a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/io/ChemObjectWriter.java?rev=1.19&amp;view=log\">ChemObjectWriter interface</a> does not define this\nfunctionality, so I decided to use the <code class=\"language-plaintext highlighter-rouge\">java.lang.reflect</code> mechanism:</p>\n\n<div class=\"language-java highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code><span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">format</span><span class=\"o\">.</span><span class=\"na\">equalsIgnoreCase</span><span class=\"o\">(</span><span class=\"s\">\"CML\"</span><span class=\"o\">))</span> <span class=\"o\">{</span>\n  <span class=\"nc\">Class</span> <span class=\"n\">cmlWriterClass</span> <span class=\"o\">=</span> <span class=\"k\">this</span><span class=\"o\">.</span><span class=\"na\">getClass</span><span class=\"o\">().</span><span class=\"na\">getClassLoader</span><span class=\"o\">().</span>\n    <span class=\"n\">loadClass</span><span class=\"o\">(</span><span class=\"s\">\"org.openscience.cdk.io.CMLWriter\"</span><span class=\"o\">);</span>\n  <span class=\"k\">if</span> <span class=\"o\">(</span><span class=\"n\">cmlWriterClass</span> <span class=\"o\">!=</span> <span class=\"kc\">null</span><span class=\"o\">)</span> <span class=\"o\">{</span>\n    <span class=\"n\">writer</span> <span class=\"o\">=</span> <span class=\"o\">(</span><span class=\"nc\">ChemObjectWriter</span><span class=\"o\">)</span><span class=\"n\">cmlWriterClass</span><span class=\"o\">.</span><span 
class=\"na\">newInstance</span><span class=\"o\">();</span>\n  <span class=\"o\">}</span>\n  <span class=\"nc\">Constructor</span> <span class=\"n\">constructor</span> <span class=\"o\">=</span> <span class=\"n\">writer</span><span class=\"o\">.</span><span class=\"na\">getClass</span><span class=\"o\">().</span><span class=\"na\">getConstructor</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Class</span><span class=\"o\">[]{</span><span class=\"nc\">Writer</span><span class=\"o\">.</span><span class=\"na\">class</span><span class=\"o\">});</span>\n  <span class=\"n\">writer</span> <span class=\"o\">=</span> <span class=\"o\">(</span><span class=\"nc\">ChemObjectWriter</span><span class=\"o\">)</span><span class=\"n\">constructor</span><span class=\"o\">.</span><span class=\"na\">newInstance</span><span class=\"o\">(</span><span class=\"k\">new</span> <span class=\"nc\">Object</span><span class=\"o\">[]{</span><span class=\"n\">fileWriter</span><span class=\"o\">});</span>\n<span class=\"o\">}</span> <span class=\"k\">else</span> <span class=\"o\">{</span>\n</code></pre></div></div>\n\n<p>Now, this has been, by far, the longest blog item I have written so far. I hope it gave you good insight in some techniques CDK uses to deal with\nsituations where functionality might, or might not, be present at build and at run time.</p>",
      "summary": "I reported earlier that the CDK has been updated in CVS to use CML from the new Jumbo 5.0. The transition actually involved a lot of changes in the CDK, some I would like to address in the following comments. One thing is that CML write support (not reading!) uses the new Jumbo library which requires Java 1.5. Thus, if Java 1.5 is not available, then CML writing should not be compiled. This is how this is done.",
      
      "date_published": "2005-12-10T00:00:00+00:00",
      "date_modified": "2023-08-05T00:00:00+00:00",
      "tags": ["cdk","cml","java"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/dzvnw-3b413",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/08/jumbo-50-and-cml-support-in-cdk.html",
      "title": "Jumbo 5.0 and CML support in CDK",
      "content_html": "<p>Tobias <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/jar/jumbo50.jar?rev=1.1&amp;view=log\">commited</a>\n<a href=\"http://sourceforge.net/forum/forum.php?forum_id=518283\">Jumbo 5.0</a> to CDK CVS, so that the CDK is now\nagain up to date with the latest <a href=\"http://www.xml-cml.org/\">CML</a> library. Note that Jumbo 5.0 requires Java 5.0.</p>\n\n<p>At first all JUnit tests seems to work, but apparently the <a href=\"http://cvs.sourceforge.net/viewcvs.py/cdk/cdk/src/org/openscience/cdk/test/io/cml/CML2WriterTest.java?rev=1.13&amp;view=log\">CML2Writer</a>\ntests were skipped because they were only run when Java 1.4 was found. I updated the test for the a appropriate\nJava version, and then it turned out that most tests fail. So those running CDK from CVS and depent on CML\nwriting: hang on, it will be fixed very soon.</p>",
      "summary": "Tobias commited Jumbo 5.0 to CDK CVS, so that the CDK is now again up to date with the latest CML library. Note that Jumbo 5.0 requires Java 5.0.",
      
      "date_published": "2005-12-08T00:00:00+00:00",
      "date_modified": "2005-12-08T00:00:00+00:00",
      "tags": ["cdk","blue-obelisk","cml"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a3r1n-72841",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/06/uml-diagram-of-cdk-module-dependencies.html",
      "title": "UML diagram of CDK module dependencies",
      "content_html": "<p>The code clean up after <a href=\"http://cdk.sf.net/\">CDK</a>’s interfaces transition is in progress, and two\n<a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/devel/modules/\">CDK modules</a> are now independent\nof the <em>data</em> module. After <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/25/more-cdkinterfaces-updates.html\">doing the <em>core</em> module <i class=\"fa-solid fa-recycle fa-xs\"></i></a>,\nthe standard was next, and I finished this yesterday. The dependencies in CVS now look like (click it to get a larger view):</p>\n\n<p>IMAGE LOST</p>\n\n<p>This <a href=\"https://en.wikipedia.org/wiki/Unified_Modeling_Language\">UML</a> diagram was made with <a href=\"http://uml.sourceforge.net/\">Umbrello</a>, and the source is in\n<a href=\"http://www-128.ibm.com/developerworks/xml/library/x-xmi/\">XMI</a> in CVS.</p>\n\n<p>I cannot stress enough the advantages of these changes:</p>\n\n<ol>\n  <li>the code is cleaner</li>\n  <li>module dependencies are cleaner</li>\n  <li>impossible to use methods outside the interface</li>\n  <li>the algorithms are independent of the data classes</li>\n</ol>\n\n<p>The last advantage is really important: it allows alternative implementations of the data classes. For example, we could make debug\ndata classes, which, unlike the normal classes, do all sorts of checks when using methods of these classes. For example, they can\nexplicitely check that parameters are not null, of the right class, and generally make sense. 
This makes them, possibly, slower,\nbut also more type safe, and as such great for debugging and development sessions.</p>\n\n<p>Another important application of making the CDK library independent of the data classes (and only depending on the\n<a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/package-frame.html\">interfaces</a>), is that we can have data classes\nshared with other Java libraries, such as <a href=\"http://joelib.sf.net/\">JOElib</a>, <a href=\"http://octetsource.com/\">Octet</a>,\nCML (<a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=9146642&amp;forum_id=8774\">Jumbo 5.0 is out!</a>), and even proprietary libraries.\nThis approach is already used in the <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/10/18/cdk-taverna-fully-recognized.html\">CDK-Taverna <i class=\"fa-solid fa-recycle fa-xs\"></i></a>\nlibrary, and I anticipate much wider use with the arrival of <a href=\"http://www.bioclipse.net/\">Bioclipse</a>.</p>",
      "summary": "The code clean up after CDK’s interfaces transition is in progress, and two CDK modules are now independent of the data module. After doing the core module , the standard was next, and I finished this yesterday. The dependencies in CVS now look like (click it to get a larger view):",
      
      "date_published": "2005-12-06T00:00:00+00:00",
      "date_modified": "2024-03-11T00:00:00+00:00",
      "tags": ["cdk","uml"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s2cqd-wvh17",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/04/planet-blue-obelisk-website-updates.html",
      "title": "Planet Blue Obelisk website updates",
      "content_html": "<p>After requests I added yesterday more visible the RSS and Atom feeds for the\n<a href=\"http://www.woc.science.ru.nl/planetbo/\">Planet Blue Obelisk</a>. They are linked in the menu\non the right, and as alternative links to the document. These should show up in most recent webbrowsers as feed icon in the\nlower right corner of the browser window. It is often an orange icon. I also added a ‘Leave a comment’ link to encourage\npeople to leave comments on items. Please do!</p>",
      "summary": "After requests I added yesterday more visible the RSS and Atom feeds for the Planet Blue Obelisk. They are linked in the menu on the right, and as alternative links to the document. These should show up in most recent webbrowsers as feed icon in the lower right corner of the browser window. It is often an orange icon. I also added a ‘Leave a comment’ link to encourage people to leave comments on items. Please do!",
      
      "date_published": "2005-12-04T00:00:00+00:00",
      "date_modified": "2005-12-04T00:00:00+00:00",
      "tags": ["feeds","blue-obelisk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v0a2f-hfk94",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/12/03/about-jchempaints-future-and-todays.html",
      "title": "About JChemPaint&apos;s future and todays 2.1.5 release",
      "content_html": "<p>Stefan has done an excellent debugging week on <a href=\"http://jchempaint.sf.net/\">JChemPaint</a>, while I have been late with a\n2.1 release. Anyway, I’ve just uploaded a Java 1.4 compiled JChemPaint 2.1 series release. I was told the (reported) bug\ncount is down to one, so I expect to see the next stable branch to be released soon (2.2 series).</p>\n\n<p>But what after JChemPaint 2.2 gets released? Will a 2.3 developers branch be opened? Or will the JChemPaint application,\nas we know it, cease to exist, and make place for the <a href=\"http://www.bioclipse.net/\">Bioclipse</a>\n<a href=\"http://www.bioclipse.net/index.php?option=com_content&amp;task=view&amp;id=6&amp;Itemid=7\">JChemPaint plugin</a>, that is being worked on?</p>\n\n<p>It is worth mentioning the pros and cons of JChemPaint. One big pro is the applet version of JChemPaint, though free but\nclosed source alternatives are available (e.g. <a href=\"http://www.chemaxon.com/marvin/chemaxon/marvin/help/common.html\">MarvinSketch</a>).\nAnother advantage is the great semantics of the chemistry being drawn. For example, when drawing reactions, reactants are\nreally marked as reactants, and are not just molecules left of an arrow. Moreover, JChemPaint is a great platform in which\nideas can be tested! One of the key virtues of opensourceness. Cons include the limited amount of templates, print quality\ngraphics, and others. (Comments on JChemPaint most welcomed.)</p>\n\n<p>So what about this Bioclipse then? It is inheritently SWT based, but currently the\n<a href=\"http://help.eclipse.org/help30/index.jsp?topic=/org.eclipse.platform.doc.isv/reference/api/org/eclipse/swt/awt/SWT_AWT.html\">SWT_AWT</a>\nbridge is used to embed to current JChemPaint and underlying CDK code as is. 
Unfortunately,\n<a href=\"http://lists.gnu.org/archive/html/classpath/2005-11/msg00162.html\">this bridge is using proprietary code from Sun</a>\n(<code class=\"language-plaintext highlighter-rouge\">sun.awt classes</code>), which makes it impossible to use with free virtual machines.</p>\n\n<p>But there is also the option of using the SWT drawing classes. This has the advantage that it can be run with free virtual\nmachines, and that it can even be compiled to native code. It requires serious rewriting of code in the JChemPaint and\nCDK code base. But, CDK’s <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/renderer/Renderer2D.html\">Renderer2D</a> needs a\nrewrite anyway: it does not even use Swing’s Java2D efficiently (try to figure out how it transforms atomic 2D coordinates into\nscreen coordinates!). Some efforts have been ongoing, but a rewrite from scratch, with a better, more modular, design cannot\nhurt at all.</p>",
      "summary": "Stefan has done an excellent debugging week on JChemPaint, while I have been late with a 2.1 release. Anyway, I’ve just uploaded a Java 1.4 compiled JChemPaint 2.1 series release. I was told the (reported) bug count is down to one, so I expect to see the next stable branch to be released soon (2.2 series).",
      
      "date_published": "2005-12-03T00:00:00+00:00",
      "date_modified": "2005-12-03T00:00:00+00:00",
      "tags": ["jchempaint","cdk","bioclipse"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/egxtq-kd254",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/30/kde-35-is-out.html",
      "title": "KDE 3.5 is out",
      "content_html": "<p><a href=\"http://www.kde.org/\">KDE</a> 3.5 was <a href=\"http://dot.kde.org/1133270759/\">released</a> with\n<a href=\"http://www.kde.org/announcements/visualguide-3.5.php\">lots of changes</a>. SuperKaramba is now a standard\nKDE application and is neatly integrated. It allows embedding themelets on your desktop background.</p>\n\n<p>It shows several themelets: the weather, a calender, a toolbar with applications, a\n<a href=\"https://web.archive.org/web/20060127053003/http://wiki.jmol.org/FoldingAtHomeCommunity\">FoldingAtHome monitor <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nthe contents of the clipboard, the music that is playing\n(<a href=\"http://en.wikipedia.org/wiki/Cake_(band)\">Cake</a>) and a simple todo list. All customizable up to the pixel.</p>\n\n<p>And before I forget: a nice new <a href=\"http://edu.kde.org/kalzium/\">Kalzium</a> release!</p>",
      "summary": "KDE 3.5 was released with lots of changes. SuperKaramba is now a standard KDE application and is neatly integrated. It allows embedding themelets on your desktop background.",
      
      "date_published": "2005-11-30T00:00:00+00:00",
      "date_modified": "2023-08-03T00:00:00+00:00",
      "tags": ["kde"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v9q9d-pbv52",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/30/getting-started-with-eclipse-and-swt.html",
      "title": "Getting Started with Eclipse and the SWT",
      "content_html": "<p><a href=\"http://www.cs.umanitoba.ca/~eclipse/\">Getting Started with Eclipse and the SWT</a> is a very nice set of introductory tutorial on working\nwith SWT and Eclipse in general. The tutorials cover the <a href=\"http://www.cs.umanitoba.ca/~eclipse/2-Basic.pdf\">basic</a>,\n<a href=\"http://www.cs.umanitoba.ca/~eclipse/3-Advanced.pdf\">advanced</a> SWT widgets,\n<a href=\"http://www.cs.umanitoba.ca/~eclipse/4-Layouts.pdf\">SWT layout</a>, and several other interesting topics.</p>\n\n<p>Now that <a href=\"http://www.bioclipse.net/\">Bioclipse</a> is gaining speed, it is a must-read.</p>",
      "summary": "Getting Started with Eclipse and the SWT is a very nice set of introductory tutorial on working with SWT and Eclipse in general. The tutorials cover the basic, advanced SWT widgets, SWT layout, and several other interesting topics.",
      
      "date_published": "2005-11-30T00:00:00+00:00",
      "date_modified": "2005-11-30T00:00:00+00:00",
      "tags": ["bioclipse"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/1vq27-8js77",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/28/blue-obelisk-blog-planet.html",
      "title": "A Blue Obelisk blog Planet",
      "content_html": "<p>Today I setup a blog planet for <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a> members. First I tried\nChumpologica but it did not read Atom feeds.</p>\n\n<p>Next in line was <a href=\"https://web.archive.org/web/20171029175722/http://www.planetplanet.org/\">Planet <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nwhich turned out to be used by many big planet sites, like\n<a href=\"http://planet.debian.org/\">Planet Debian <i class=\"fa-solid fa-recycle fa-xs\"></i></a>. It also works with Atom feeds in general, but not well with Atom 1.0 feeds, like that of\n<a href=\"http://www.livejournal.com/users/cniehaus/\">Carsten</a>. After some googling I found a\n<a href=\"http://lists.planetplanet.org/pipermail/devel/2005-November/000710.html\">patched version <i class=\"fa-solid fa-link-slash fa-xs\"></i></a> which did the job.</p>\n\n<p>The result is at <a href=\"http://www.woc.science.ru.nl/planetbo/\">http://www.woc.science.ru.nl/planetbo/ <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>,\nbut I hope that someone can arrange a http://planet.blueobelisk.org/.</p>",
      "summary": "Today I setup a blog planet for Blue Obelisk members. First I tried Chumpologica but it did not read Atom feeds.",
      
      "date_published": "2005-11-28T00:00:00+00:00",
      "date_modified": "2023-08-03T00:00:00+00:00",
      "tags": ["blue-obelisk","feeds"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/s1sxs-8qb11",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/27/open-source-swing-jmol-renderer-runs.html",
      "title": "Open Source Swing: Jmol renderer runs!",
      "content_html": "<p>Where I was able to mention <a href=\"/blog/2005/11/20/open-source-swing-jchempaint-runs.html\">earlier <i class=\"fa-solid fa-recycle fa-xs\"></i></a> that JChemPaint now runs with free\n(as in open source) Java virtual machines, I just tried to run the core Jmol renderer, using the\n<a href=\"https://sourceforge.net/p/jmol/code/4289/tree//trunk/Jmol/examples/Integration.java\">Integration.java <i class=\"fa-solid fa-recycle fa-xs\"></i></a> which comes as an example.</p>\n\n<p>Sadly, the original screenshots got lost that were made with <a href=\"http://jamvm.sourceforge.net/\">jamvm</a> 1.3.3 and <a href=\"http://developer.classpath.org/\">classpath</a> 0.19.</p>\n\n<p>It is very slow, however. I have not tried it with other free virtual machines, which are supposedly faster. It is a good start nevertheless: it means that a\nJmol based <a href=\"http://www.bioclipse.net/\">Bioclipse</a> plugin will work with free virtual machines too.</p>",
      "summary": "Where I was able to mention earlier that JChemPaint now runs with free (as in open source) Java virtual machines, I just tried to run the core Jmol renderer, using the Integration.java which comes as an example.",
      
      "date_published": "2005-11-27T00:00:00+00:00",
      "date_modified": "2024-08-09T00:00:00+00:00",
      "tags": ["jmol","java"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bzqem-cqy33",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/23/machine-crash-svn-went-along.html",
      "title": "Machine crash; SVN went along",
      "content_html": "<p>Doesn’t happen often, but my machine crashed two hours ago. Not a big deal, because I have my important files in SVN. Oh wait, SVN had a commit\nin progress during the crash. So, <code class=\"language-plaintext highlighter-rouge\">svn recover</code>. Mmmm… doesn’t work either. OK, SVN FAQ: try <code class=\"language-plaintext highlighter-rouge\">db_recover</code>. That worked. No, it did not:\n<code class=\"language-plaintext highlighter-rouge\">svn commit</code> still not working for the files I was trying to commit. Fortunately, I make regular SVN db backups so I created a brand new\nSVN repository from scratch and recovered the back up. That worked. Really.</p>",
      "summary": "Doesn’t happen often, but my machine crashed two hours ago. Not a big deal, because I have my important files in SVN. Oh wait, SVN had a commit in progress during the crash. So, svn recover. Mmmm… doesn’t work either. OK, SVN FAQ: try db_recover. That worked. No, it did not: svn commit still not working for the files I was trying to commit. Fortunately, I make regular SVN db backups so I created a brand new SVN repository from scratch and recovered the back up. That worked. Really.",
      
      "date_published": "2005-11-23T00:00:00+00:00",
      "date_modified": "2023-08-02T00:00:00+00:00",
      "tags": ["svn"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sm10s-hjc49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/21/bioclipse-chemo-bioinformatics.html",
      "title": "Bioclipse: the chemo-/bioinformatics workbench",
      "content_html": "<p>Some weeks back there was the <a href=\"https://web.archive.org/web/20080208101002/http://almost.cubic.uni-koeln.de/cdk/cdk_top/events/cdk5yearworkshop/\">CDK5AW <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>,\nthe CDK 5th anniversiry workshop. A small group of international open source chemo-, bioinformatics software developers met,\namong which two from Sweden. It was then decided to generalize their work resulting in Bioclipse:</p>\n\n<p><a href=\"https://www.bioclipse.net/\">https://www.bioclipse.net/</a></p>\n\n<p>It’s heavily using the <a href=\"https://wiki.eclipse.org/Rich_Client_Platform\">Eclipse Rich Client Platform <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, making additional plugins trivial. OK, if this does\nnot convinve you: check the screenshots on the Bioclipse website.</p>\n\n<p>It’s a killer, really! Ola, Martin: great work!</p>\n\n<p>PS. I am going to try to run it with free Java virtual machines this weekend, but if you have a working solution earlier than that, please leave a comment and screenshot in the comments.</p>",
      "summary": "Some weeks back there was the CDK5AW , the CDK 5th anniversiry workshop. A small group of international open source chemo-, bioinformatics software developers met, among which two from Sweden. It was then decided to generalize their work resulting in Bioclipse:",
      
      "date_published": "2005-11-21T00:00:00+00:00",
      "date_modified": "2023-08-02T00:00:00+00:00",
      "tags": ["cdk","bioclipse"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/4dgp8-dtq30",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/20/open-source-swing-jchempaint-runs.html",
      "title": "Open Source Swing: JChemPaint runs!",
      "content_html": "<p>Thanx to <a href=\"https://chem-bla-ics.blogspot.com/2005/11/goal-live-chemblaics-cd.html?showComment=1132422120000\">Mark’s encouragements</a>, I tried to run\n<a href=\"http://www.jmol.org/\">Jmol</a> and <a href=\"http://jchempaint.sf.net/\">JChemPaint</a> with\n<a href=\"http://jamvm.sourceforge.net/\">jamvm</a>.</p>\n\n<p>Jmol fails with an <a href=\"https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd.html\">NullPointerException <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, but JChemPaint runs! And note that\nthis was not even running with the latest of the latest; just recent packages from Kubuntu! Yes, there are some glitches, but I’m happy nevertheless!</p>",
      "summary": "Thanx to Mark’s encouragements, I tried to run Jmol and JChemPaint with jamvm.",
      
      "date_published": "2005-11-20T00:00:00+00:00",
      "date_modified": "2024-03-23T00:00:00+00:00",
      "tags": ["jchempaint","java","jmol","linux"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/e2cdx-9q525",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/18/goal-live-chemblaics-cd.html",
      "title": "The goal: a live chemblaics CD",
      "content_html": "<p>This evening I have been looking at with the <a href=\"http://www.knoppix.net/\">KNOPPIX</a> customization howto, and ran many of the interesting commands.\nI’ve setup a environment with Kalzium, OpenBabel, CDK, jython, <a href=\"http://pymol.sourceforge.net/\">PyMOL</a>, and for development I included gcj and\nEclipse. At some later point I will include kfile_chemical too, but I want to make a deb package first.</p>\n\n<p>Moreover, I also wanted it to include JChemPaint, Jmol and <a href=\"http://taverna.sourceforge.net/\">Taverna</a> (with the CDK extension). However, these\ndepend on Swing, which is not suffiently provided by open source java virtual machines. I attempted gij 4.0, <a href=\"http://www.kaffe.org/\">kaffe</a>\nand <a href=\"http://sablevm.org/\">sablevm</a>, all without success.</p>\n\n<p>A live CD with all the open source chemo- and bioinformatics tools would be a real killer. We could take a burned live CD with us to conferences\nand have others run our software on their laptop! But we need to stop use Swing. Fortunately, there seems to be a serious project going on to\nport JChemPaint and Jmol to a free Java GUI environment, so maybe we can have the live CD up and going before the 2006 conferences start.</p>",
      "summary": "This evening I have been looking at with the KNOPPIX customization howto, and ran many of the interesting commands. I’ve setup a environment with Kalzium, OpenBabel, CDK, jython, PyMOL, and for development I included gcj and Eclipse. At some later point I will include kfile_chemical too, but I want to make a deb package first.",
      
      "date_published": "2005-11-18T00:00:00+00:00",
      "date_modified": "2005-11-18T00:00:00+00:00",
      "tags": ["cheminf","linux","java","workflow"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sfzaf-73y03",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/17/back-from-1st-gcc.html",
      "title": "Back from the 1st GCC",
      "content_html": "<p>OK, just back from the <a href=\"http://www.cic-workshop.de/\">first German Chemoinformatics Conference</a>, which I enjoyed very much. A rather interesting\nprogram, and lots of interesting posters too. You can read the programme online, and will not spend too many words on that (at least not now).\nBut what I will do is point out some interesting posters here.</p>\n\n<p>One poster was on the Molecular Query Language (MQL) by Ewgenij Proschak from <a href=\"http://gecco.org.chemie.uni-frankfurt.de/\">Frankfurt</a>. You can\nread more on this in the latest <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/\">CDK News</a> as it is implemented for the CDK too.\nThe opensource implementation is expected next year.</p>\n\n<p>Another interesting poster was on the use of <a href=\"http://www.biowisdom.com/ontology/faq_q3.htm\">ontologies to connect chemistry and biology</a>.\nThis poster was by Juergen Harter from <a href=\"http://www.biowisdom.com/\">BioWisdom</a>, a Cambridge, UK based company.</p>\n\n<p><a href=\"http://www.scai.fraunhofer.de/209.0.html?&amp;L=1\">Marc Zimmermann</a> had a poster on the chemical OCR variant, called chemical structure\nrecognition (CSR). This process converts images, for example scanned from literature, into a connectivity table. Difficult task, indeed.\n<a href=\"http://www.ercim.org/publication/Ercim_News/enw60/zimmermann.html\">This page</a> contains some information about this project.</p>\n\n<p>There were other interesting posters too, so will probably report on those later too. But do feel free to leave comments to this blog post,\ndiscussing other interesting posters.</p>",
      "summary": "OK, just back from the first German Chemoinformatics Conference, which I enjoyed very much. A rather interesting program, and lots of interesting posters too. You can read the programme online, and will not spend too many words on that (at least not now). But what I will do is point out some interesting posters here.",
      
      "date_published": "2005-11-17T00:00:00+00:00",
      "date_modified": "2005-11-17T00:00:00+00:00",
      "tags": ["cheminf","ontology"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://chem-bla-ics.linkedchemistry.info/2005/11/11/going-to-german-chemoinformatics.html",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/11/going-to-german-chemoinformatics.html",
      "title": "Going to the German Chemoinformatics Conference",
      "content_html": "<p>This sunday starts the first <a href=\"https://web.archive.org/web/20051215010113/https://www.cic-workshop.de/\">German Chemoinformatics Conference <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> in\n<a href=\"http://www.goslar.de/\">Goslar</a>. It’s an interesting <a href=\"https://web.archive.org/web/20060206222231/http://scholle.oc.uni-kiel.de/users/cic/tagungen/workshop05/programm.html\">programme <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>, with\npresentations on the InChI, PubChem, 25 years of chemoinformatics, the chemical semantic web, and much more.</p>\n\n<p>Among these presentations is mine, on comparing crystal structures\n(<a href=\"https://web.archive.org/web/20050410111504/http://www.cac.science.ru.nl/research/publications/PDFs/willighagen2005.pdf\">PDF <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>)\nand deducing cell parameters. But I’m having a poster on QSAR too.</p>\n\n<p>I’ll arrive on saturday afternoon in Goslar, so leave a message at the conference hotel if you want to meet up, and talk about my work, or yours, or\nthe CDK, KDE, JChemPaint, Jmol, kfile_chemical, Kat/Chemistry, <a href=\"http://www.blueobelisk.org/\">BlueObelisk</a>, Eclipse, R, or whatever else…\nI plan to have a modest german meal and one or two beers in the evening.</p>\n\n<p>BTW, after Belém (Lissabon), Sintra, Boppard, Kinderdijk, Hoorn and Cologne, it’s the 7th\n<a href=\"http://whc.unesco.org/\">UNESCO world heritage</a> site I’m visiting in just 14 months! Can’t we just have conferences in Hawaii and sorts, like\nthey do in other fields?? Oh, wait, we do: EuroQSAR is on a cruise boat.</p>",
      "summary": "This sunday starts the first German Chemoinformatics Conference in Goslar. It’s an interesting programme , with presentations on the InChI, PubChem, 25 years of chemoinformatics, the chemical semantic web, and much more.",
      
      "date_published": "2005-11-11T00:00:00+00:00",
      "date_modified": "2023-07-31T00:00:00+00:00",
      "tags": ["cheminf","crystal"],
      "_references": [{ "url": "https://doi.org/10.1107/S0108768104028344" }],
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6n4we-wam18",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/10/scons-and-bksys-for-kfilechemical.html",
      "title": "Scons and bksys for kfile_chemical",
      "content_html": "<p>Not so long ago, it was <a href=\"http://conference2005.kde.org/slides/software-construction-tools-talk--thomas-nagy.pdf\">decided</a> that KDE 4.0\nwill use <a href=\"http://www.scons.org/\">SCons</a> as a configuration and building tool, instead of the autotools and make: the common\n<code class=\"language-plaintext highlighter-rouge\">./configure &amp;&amp; make &amp;&amp; make install</code> which has served the open source community very well for so long.</p>\n\n<p>SCons is <a href=\"http://dot.kde.org/1126452494/\">different</a> in several ways. One of these is that the tar.gz packages it produces are some\n500kB smaller, which makes a huge difference for <a href=\"http://kde-apps.org/content/show.php?content=28995\">kfile_chemical</a> which is\nnow 121kB instead of 635kB.</p>\n\n<p>Now, the <a href=\"http://www.kde.org/\">KDE</a> community, or Thomas Nagy to be precise, developed a helper for KDE software, called\n<a href=\"http://www.kde-apps.org/content/show.php?content=19243\">bksys</a>. Version 1.5.1, however, did not contain an example directory for kfile\nplugins, but I managed to work something out starting from the configuring scripts from <a href=\"http://kde-apps.org/content/show.php?content=12725\">kdissert</a>,\nand ended up with these <a href=\"http://websvn.kde.org/trunk/playground/utils/kfile_chemical/SConstruct?rev=479410&amp;view=log\">SConstruct</a> and\n<a href=\"http://websvn.kde.org/trunk/playground/utils/kfile_chemical/config.bks?rev=479414&amp;view=log\">config.bks</a>.</p>\n\n<p>Now, I haven’t figured out how to include the translations, but will figure that out sooner or later… for now, I’m quite happy with the new build system.</p>",
      "summary": "Not so long ago, it was decided that KDE 4.0 will use SCons as a configuration and building tool, instead of the autotools and make: the common ./configure &amp;&amp; make &amp;&amp; make install which has served the open source community very well for so long.",
      
      "date_published": "2005-11-10T00:00:00+00:00",
      "date_modified": "2005-11-10T00:00:00+00:00",
      "tags": ["kde","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/hxb0r-66s49",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/08/when-to-stop-including-qsar-model.html",
      "title": "When to stop including QSAR model variables...",
      "content_html": "<p>Yesterday I reviewed an article which published a QSPR model which looked something like:</p>\n\n\\[y = 151 + 50p1 - 12p2 - 0.006p3\\]\n\n<p>with quite OK prediction results (R=0.9880). But I was not quite comfortable with the coefficient for the \\(p3\\) variable.\nThe article did not calculate significances for the coefficients, so it was not obvious from the article wether is was useful\nto include them. I then looked at the range for <code class=\"language-plaintext highlighter-rouge\">p3</code>, which was 110-150; so, the maximal influence this variable can have is\n\\(150*0.006 = 0.9\\). Now, the experimental values given in the article were rounded to integers, indicating that the maximal\neffect of the <code class=\"language-plaintext highlighter-rouge\">p3</code> variable is smaller than the experimental error! It’s even worse when you consider the difference between the\nmin and max value (40), then the influence would even be smaller (assuming that most model methods would put the mean temperature\neffect in the offset, 151 in this case).</p>\n\n<p>Today, I reread an article with a similar issue. The model was something like:</p>\n\n\\[y = -0.81 + 0.03*p1 + 0.009*p2\\]\n\n<p>Here, \\(max(p2)-min(p2)\\) is a smaller than 100, so the maximal effect of the variable would be in the order 0.9, which is of\nthe same order of the root mean square error of prediction (RMSEP) for this model. Indeed, the article already states that the\ncoefficient is only significant at the 95% level, and not at the 99% level. 
But, without having calculated the RMSEP for a model\nwithout the p2 variable, I would guess that leaving it out would give equally good prediction results.</p>\n\n<p>Concluding, I would say that the <code class=\"language-plaintext highlighter-rouge\">p2</code> variable does not include relevant information.</p>\n\n<p>Do you think it is reasonable to include the <code class=\"language-plaintext highlighter-rouge\">p2</code> variable in the second model?</p>",
      "summary": "Yesterday I reviewed an article which published a QSPR model which looked something like:",
      
      "date_published": "2005-11-08T00:00:00+00:00",
      "date_modified": "2005-11-08T00:00:00+00:00",
      "tags": ["cheminf","qsar"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/r8zfg-3e891",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/08/r-gui-rkward.html",
      "title": "A R GUI: rkward",
      "content_html": "<p>The great thing about open source is that… it’s open.</p>\n\n<p>When I was browsing the internet just now, I dropped in on <a href=\"http://dot.kde.org/\">KDE Dot News</a>. In the rightside column, there is a feed of\nnew KDE software from <a href=\"http://www.kde-apps.org/\">KDE-apps.org</a>. A new version of my favoriate music player,\n<a href=\"http://amarok.kde.org/\">amarok</a>, lured me to the KDE-apps website, where I saw <a href=\"http://rkward.sf.net/\">rkward</a> is latest announcement. The funny\nname, and the categorization as scientific, triggered some interest on my side, and it turned out to be a graphical frontend to my favorite statistics program,\n<a href=\"http://www.r-project.org/\">R</a>.</p>\n\n<p>Ok, they had a <a href=\"http://www.debian.org/\">Debian</a> package, and the debian/ build dir in the tar.gz so I downloaded it and started making a\n<a href=\"http://www.woc.science.ru.nl/devel/egonw/rkward_0.3.4_i386.deb\">Kubuntu 5.10 package</a>. While doing this I saw some notice about the R syntax highlighting\nused, which conflicts with the older version in the Kate packages.</p>\n\n<p>Then I realized that a long time ago, I wrote such syntax highlighting for Kate, so my attention was lured again. And, indeed, they use my syntax highlighting,\nthough <a href=\"http://www.uni-kiel.de/agrarpol/ahenningsen/index-e.html\">extended later</a> (somewhere down the page).</p>\n\n<p>And this makes me happy. The syntax highlighting was useful to me in the past, but apparently to a lot of other people too. And because I released it\nas GPL, back then, it now appears in rkward! Yes, a really like open source :)</p>",
      "summary": "The great thing about open source is that… it’s open.",
      
      "date_published": "2005-11-08T00:00:00+00:00",
      "date_modified": "2005-11-08T00:00:00+00:00",
      "tags": ["rstats","kde"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/v6wb4-fxp54",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/07/ubuntu-dapper-will-include-chemistry.html",
      "title": "Ubuntu Dapper will include chemistry features",
      "content_html": "<p>I just <a href=\"https://launchpad.net/distros/ubuntu/+spec/kubuntu-file-search\">read</a> that the <a href=\"http://www.kubuntu.org/\">Kubuntu</a> team\n<a href=\"https://wiki.ubuntu.com/KubuntuFileSearchWithKat\">wants</a> to include <a href=\"http://kat.mandriva.com/\">Kat</a> in the\n<a href=\"http://packages.ubuntu.com/dapper/\">dapper</a> release (scheduled for April 2006). Kat is (to be) the KDE equivalent of Google’s desktop search bar.</p>\n\n<p>This is great news for us chem-bla-icians, as Kat has support for full text searching of chemistry files! Let’s see if I can get the Kubuntu team\nto package up <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a> too, which will extend Kat (and KDE in general), with\nextraction of meta data from chemical documents.</p>",
      "summary": "I just read that the Kubuntu team wants to include Kat in the dapper release (scheduled for April 2006). Kat is (to be) the KDE equivalent of Google’s desktop search bar.",
      
      "date_published": "2005-11-07T00:00:00+00:00",
      "date_modified": "2005-11-07T00:00:00+00:00",
      "tags": ["kde","chemistry"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/b1vyj-0kd63",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/02/rcdk-install-fails-on-gcc-40-systems.html",
      "title": "R/CDK install fails on GCC 4.0 systems",
      "content_html": "<p>Some time ago <a href=\"http://blue.chem.psu.edu/~rajarshi/\">Rajarshi Guha</a> introduced <a href=\"http://www.r-project.org/\">R</a> bindings for the\n<a href=\"http://cdk.sf.net/\">CDK</a> (see his CDK News <a href=\"http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/\">articles</a>), and\ntoday I tried to install his rcdk package that makes it happen.</p>\n\n<p>However, it requires <a href=\"http://www.omegahat.org/RSJava/\">SJava</a> which compiled fine on other machines, but not on my AMD64\nmachine. The problem seems to be related to the GNU GCC 4.0 compiler I have installed. Compiling with 3.4 works fine,\nbut 4.0 complains with:</p>\n\n<div class=\"language-shell highlighter-rouge\"><div class=\"highlight\"><pre class=\"highlight\"><code>CtoJava.cweb:215: error: static declaration of <span class=\"s1\">'std_env'</span> follows non-static declaration\nCtoJava.cweb:195: error: previous declaration of <span class=\"s1\">'std_env'</span> was here\n</code></pre></div></div>\n\n<p>Googling, learned me that I am not the only one with this problem, but did not find any solution. If you know how to fix this problem, please leave a message in the comments.</p>",
      "summary": "Some time ago Rajarshi Guha introduced R bindings for the CDK (see his CDK News articles), and today I tried to install his rcdk package that makes it happen.",
      
      "date_published": "2005-11-02T00:00:00+00:00",
      "date_modified": "2005-11-02T00:00:00+00:00",
      "tags": ["cdk","rstats"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/gc4hw-5k265",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/02/open-source-data-mining-in.html",
      "title": "Open Source data mining in chemoinformatics",
      "content_html": "<p>On the <a href=\"http://www.int-conf-chem-structures.org/\">7th International Conference on Chemical Structures</a>\n<a href=\"http://www.medchem.leidenuniv.nl/people/jeroen_kazius.htm\">Jeroen Kazius</a> has a\n<a href=\"http://www.liacs.nl/~snijssen/gaston/iccs.html\">poster</a> on finding discriminative substructures, that is, molecular fragments\nwhich can be discriminate between two acitivity classes. The software is released as\n<a href=\"http://www.liacs.nl/~snijssen/gaston/\">Gaston</a>, is written in C++ and has the GPL license.</p>\n\n<p>Later I encountered <a href=\"http://fuzzy.cs.uni-magdeburg.de/~borgelt/moss.html\">MoSS</a> which has the same goal, but uses a different algorithm.\nMoSS is written in Java and uses the LGPL license. MoSS reads STN and SMILES as input, which might not be optimal for all users,\nso a CDK port comes to mind.</p>",
      "summary": "On the 7th International Conference on Chemical Structures Jeroen Kazius has a poster on finding discriminative substructures, that is, molecular fragments which can be discriminate between two acitivity classes. The software is released as Gaston, is written in C++ and has the GPL license.",
      
      "date_published": "2005-11-02T00:00:00+00:00",
      "date_modified": "2005-11-02T00:00:00+00:00",
      "tags": ["iccs","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/p46tq-r7946",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/11/01/annual-lunteren-meeting.html",
      "title": "The annual Lunteren meeting",
      "content_html": "<p>Most Dutch chemists have their annual Lunteren meeting, so do I. Lunteren is a small village on the Veluwe where nothing much can be done,\nexcept for listening to the presentations. I participate in the Lunteren meeting for analytical chemists, i.e. HPLC, MS, GC and all their\ncombinations upto and including HPLC/MS/MS, and since a few years the Lab-on-a-Chip stuff. And, as such, in many cases a lot of details on\nhow to use and develop these methods.</p>\n\n<p>For a computational chemist, this often is too much practical detail on too little -ics. Fortunately, the proteomics, genomics, etc is a\nstrong upcoming funding subject, so data analysis is getting in their picture too. Which is good for someone with a chemometrics/chemoinformatics\nbackground as funding in that area is getting smaller every year.</p>\n\n<p>My presentation went reasonable well, as far as I can tell myself. I was very nervous with both my professor and some 150 other people in the\naudience, but managed to not wander off the main topic. However, I was told to be a bit too monotone, but that’s an unfortunate effect of\nbeing so nervous.</p>",
      "summary": "Most Dutch chemists have their annual Lunteren meeting, so do I. Lunteren is a small village on the Veluwe where nothing much can be done, except for listening to the presentations. I participate in the Lunteren meeting for analytical chemists, i.e. HPLC, MS, GC and all their combinations upto and including HPLC/MS/MS, and since a few years the Lab-on-a-Chip stuff. And, as such, in many cases a lot of details on how to use and develop these methods.",
      
      "date_published": "2005-11-01T00:00:00+00:00",
      "date_modified": "2005-11-01T00:00:00+00:00",
      "tags": ["phd"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/xf9bq-44218",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/30/cdk-news.html",
      "title": "CDK News",
      "content_html": "<p>Just finished applying the latest spelling error fixes to <a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/2_3/\">CDK News 2.3 <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.\nTook me some three hours to finish it up the 12 pages, which has mostly to the need to recompile the PDF after each change to make sure that nothing in\nthe layout got broken.</p>\n\n<p>The content contains four <a href=\"https://web.archive.org/web/20070807110111/http://almost.cubic.uni-koeln.de/cdk/cdk_top/cdk_news/submitting\">communications <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>:</p>\n\n<ul>\n  <li>An Open Framework for Online QSAR Modeling</li>\n  <li>Atom types in the CDK</li>\n  <li>MQL - Development of a novel substructure query language</li>\n  <li>Stereochemistry detection in the CDK</li>\n</ul>\n\n<p>And, of course, the recurrent Editorial, FAQ and ChangeLog.</p>",
      "summary": "Just finished applying the latest spelling error fixes to CDK News 2.3 . Took me some three hours to finish it up the 12 pages, which has mostly to the need to recompile the PDF after each change to make sure that nothing in the layout got broken.",
      
      "date_published": "2005-10-30T00:00:00+00:00",
      "date_modified": "2023-07-30T00:00:00+00:00",
      "tags": ["cdk","cheminf"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/frske-p0649",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/29/kfilechemical-gets-xyz-mol2-smiles-vmd.html",
      "title": "kfile_chemical gets XYZ, Mol2, SMILES, VMD and GenBank support",
      "content_html": "<p>Jerome Pansanel contributed new patches for <a href=\"http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical</a>; on\nMonday actually, but I have been busy with other things, among which a presentation I have to give next Monday for some 100+\nanalytical chemists. The patch adds support to <a href=\"http://www.kde.org/\">KDE</a> for five new chemical MIMEs: XYZ, Mol2, SMILES,\nVMD and GenBank. Therefore, I just released a new version (0.10), and added an announcement to\n<a href=\"http://freshmeat.net/projects/kfile_chemical/\">Freshmeat.net</a>.</p>\n\n<p>As a reminder, version 1.0 will have all chemical mime types supported, after which I will initiate a process to formalize\nthe meta data we want the kfile plugins to give, which will lead to the 2.0 release. So far, I had in mind that the next\nstep was to make the plugins ready for KDE 4.0, but I became aware of the <a href=\"http://developer.kde.org/documentation/library/kdeqt/kde3arch/mime.html\">mime magic</a>\nas implemented in <a href=\"http://developer.kde.org/documentation/library/3.1-api/classref/kio/KMimeMagic.html\">KMimeMagic</a>.</p>\n\n<p>So, concluding, I might squeeze in another beta release 3.0, where this magic gets addressed; knowing that it will definately\nnot work for all files, but hopefully it will for files with stupid file extensions like .log.</p>",
      "summary": "Jerome Pansanel contributed new patches for kfile_chemical; on Monday actually, but I have been busy with other things, among which a presentation I have to give next Monday for some 100+ analytical chemists. The patch adds support to KDE for five new chemical MIMEs: XYZ, Mol2, SMILES, VMD and GenBank. Therefore, I just released a new version (0.10), and added an announcement to Freshmeat.net.",
      
      "date_published": "2005-10-29T00:00:00+00:00",
      "date_modified": "2023-07-30T00:00:00+00:00",
      "tags": ["kde","chemistry","web"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/a4vw2-r5y93",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/27/my-birthday-31-and-adsense.html",
      "title": "My birthday (31) and the Adsense",
      "content_html": "<p>Today is my 31st birthday, nearing half-point now (statistically seen). Also, by now I should have had my scientific moment of glory, otherwise I can forget that Nobel prize. Oh well, forget it.</p>\n\n<p>Have you seen those small advertisements on this page (RSS users, please visit the website :)? Funny links they give. The system is very nice btw: it awaits google indexing of the blog and then decides which ads are relevant. Hence, the links to small chemoinformatics companies. Nice to browse.</p>\n\n<p>Disclaimer, when clicking any or all of the ads, I’ll get a bit of money. But don’t start clicking away, otherwise Adsense will get upset, and then I get nothing.</p>",
      "summary": "Today is my 31st birthday, nearing half-point now (statistically seen). Also, by now I should have had my scientific moment of glory, otherwise I can forget that Nobel prize. Oh well, forget it.",
      
      "date_published": "2005-10-27T00:00:00+00:00",
      "date_modified": "2005-10-27T00:00:00+00:00",
      "tags": ["google"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/y9z8g-s6k09",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/25/more-cdkinterfaces-updates.html",
      "title": "More cdk.interfaces updates",
      "content_html": "<p>Yesterday I had some spare time before going to a meeting about the <a href=\"http://www.woc.science.ru.nl/\">Woordenboek Organische Chemie</a>,\nso I was boldly going where no one has went before: getting the CDK module core independent of the data module. Why, you might wonder…</p>\n\n<p>Well, if the as many modules of CDK become independent of the classes implementing the data interfaces, i.e. those classes that\nimplement the <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/interfaces/package-frame.html\">org.openscience.cdk.interfaces</a>\ninterfaces, then it becomes possible to make alternative implementations. For example, an implementation that also implement the\n<a href=\"http://octetsource.net/\">Octet</a> interfaces, or an implementation that extends the <a href=\"http://joelib.sf.net/\">JOELib</a> classes. In that\nway, combining these libraries becomes as easy as writing a blog :)</p>\n\n<p>Anyway, today I finished the <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/config/AtomTypeFactory.html\">AtomTypeFactory</a>, and\nonly the <a href=\"http://cdk.sourceforge.net/api/org/openscience/cdk/config/IsotopeFactory.html\">IstopeFactory</a> remains to be updated.\nSince many classes in the CDK library use these two classes, patches had to be applied throughout the library. And code outside the\nCDK library might be broken now, so be aware…</p>",
      "summary": "Yesterday I had some spare time before going to a meeting about the Woordenboek Organische Chemie, so I was boldly going where no one has went before: getting the CDK module core independent of the data module. Why, you might wonder…",
      
      "date_published": "2005-10-25T00:00:00+00:00",
      "date_modified": "2005-10-25T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/sqzez-f9r89",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/24/jchempaint-applet-download-size-538kb.html",
      "title": "JChemPaint applet download size: 538kB",
      "content_html": "<p>A good functional molecular editor is of much important to the chemical web. There are a few small download sized editors around.\n<a href=\"http://jchempaint.sf.net/\">JChemPaint</a> has been available as applet for some time now, but the download size has been large. The\nsituation has improved considerable over the past months, and the download size upon which the applet now shows up in your webbrowser\nis down to 538kB. A live demo is available from <a href=\"http://www.chemistry-development-kit.org/\">www.chemistry-development-kit.org</a>.</p>\n\n<p>The applet, however, does have the same functionality as the full application. When a feature is used that is not available from the\njars downloaded first (which make up the 538kB), additional jars are downloaded.</p>\n\n<p>The applet is not bugless yet. For example, drawing reactions does not seem to work :( But, it’s really getting somewhere.\nCongrats to the applet development team!</p>",
      "summary": "A good functional molecular editor is of much important to the chemical web. There are a few small download sized editors around. JChemPaint has been available as applet for some time now, but the download size has been large. The situation has improved considerable over the past months, and the download size upon which the applet now shows up in your webbrowser is down to 538kB. A live demo is available from www.chemistry-development-kit.org.",
      
      "date_published": "2005-10-24T00:00:00+00:00",
      "date_modified": "2005-10-24T00:00:00+00:00",
      "tags": ["jchempaint"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/6tezh-5a955",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/23/wrapping-up.html",
      "title": "Wrapping up...",
      "content_html": "<p>Less then three months before the end of my contract of my PhD project. And not nearly done yet. Weekends are now spend on wrapping up\nbits of experimental research into something like a coherent article. And even lot’s of calculations to do to answer the open\nquestions. <a href=\"http://freemind.sourceforge.net/\">FreeMind</a> is helping me organize thoughts.</p>\n\n<p>Opensource chemoinformatics is a welcomed diversion now and then. Working on some easy-to-fix CDK bugs yesterday, like the\n<a href=\"https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/isomorphism/matchers/QueryAtomContainer.html\">QueryAtomContainer <i class=\"fa-solid fa-recycle fa-xs\"></i></a> now correctly\nupdated for the recent <a href=\"http://sourceforge.net/mailarchive/forum.php?thread_id=8016575&amp;forum_id=2178\">cdk.interfaces changes <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>. Fixed now.\nI also touched a lot of code when updating the FSF address in the LGPL license notice, and when I modified the construction of\n<a href=\"https://cdk.github.io/cdk/latest/docs/api/org/openscience/cdk/exception/CDKException.html\">CDKException <i class=\"fa-solid fa-recycle fa-xs\"></i></a>’s to set the causing Throwable.\nAlso helped out <a href=\"http://www.livejournal.com/users/cniehaus/\">Carsten</a> a bit with adding his data from\n<a href=\"http://edu.kde.org/kalzium/\">Kalzium</a> to the <a href=\"http://www.blueobelisk.org/\">Blue Obelisk</a>\n<a href=\"https://github.com/BlueObelisk/bodr\">data repository <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>\n\n<p>Another nice diversion is <a href=\"http://wesnoth.org/\">The Battle for Wesnoth</a>. Just got killed, though.</p>",
      "summary": "Less then three months before the end of my contract of my PhD project. And not nearly done yet. Weekends are now spend on wrapping up bits of experimental research into something like a coherent article. And even lot’s of calculations to do to answer the open questions. FreeMind is helping me organize thoughts.",
      
      "date_published": "2005-10-23T00:00:00+00:00",
      "date_modified": "2023-07-29T00:00:00+00:00",
      "tags": ["phd","cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/fbnx1-9r832",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/21/viagra-saves-environment.html",
      "title": "Viagra saves the environment",
      "content_html": "<p>This week there was an interesting article in the Dutch <a href=\"http://intermediair.nl/\">Intermediar</a> about viagra. They cite an article in\n<a href=\"http://www.swetswise.com/eAccess/viewTitleIssues.do?titleID=68609\">Environmental Conversation</a> and state that it saves the environment\nas it greatly reduced the market for animal parts from the traditional chinese medicine that address the same problem as viagra does.</p>\n\n<p><em>Viagra: good for the environment, good for you! ;)</em></p>\n\n<p>You don’t see this often, though. Public opinion, at least in my social environment, is that chemicals (in general) are bad for the environment,\nwhat so ever… Natural products are much better. Wait, those are chemical too… but that is to complicated for most :(</p>\n\n<p>BTW, viagra is <a href=\"http://www.google.com/search?client=safari&amp;rls=en-us&amp;q=InChI%3D1S%2FC22H30N6O4S.C6H8O7%2Fc1-5-7-17-19-20%2827%284%2925-17%2922%2829%2924-21%2823-19%2916-14-15%288-9-18%2816%2932-6-2%2933%2830%2C31%2928-12-10-26%283%2911-13-28%3B7-3%288%291-6%2813%2C5%2811%2912%292-4%289%2910%2Fh8-9%2C14H%2C5-7%2C10-13H2%2C1-4H3%2C%28H%2C23%2C24%2C29%29%3B13H%2C1-2H2%2C%28H%2C7%2C8%29%28H%2C9%2C10%29%28H%2C11%2C12%29\">InChI=1S/C22H30N6O4S.C6H8O7/c1-5-7-17-19-20(27(4)25-17)22(29)24-21(23-19)16-14-15(8-9-18(16)32-6-2)33(30,31)28-12-10-26(3)11-13-28;7-3(8)1-6(13,5(11)12)2-4(9)10/h8-9,14H,5-7,10-13H2,1-4H3,(H,23,24,29);13H,1-2H2,(H,7,8)(H,9,10)(H,11,12) <i class=\"fa-solid fa-recycle fa-xs\"></i></a>.</p>",
      "summary": "This week there was an interesting article in the Dutch Intermediar about viagra. They cite an article in Environmental Conversation and state that it saves the environment as it greatly reduced the market for animal parts from the traditional chinese medicine that address the same problem as viagra does.",
      
      "date_published": "2005-10-21T00:00:00+00:00",
      "date_modified": "2005-10-21T00:00:00+00:00",
      "tags": ["chemistry","environment"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/z97vw-87009",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/20/cdk-news-23-and-inchis.html",
      "title": "CDK News 2.3 and InChI&apos;s",
      "content_html": "<p><a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/\">CDK News <i class=\"fa-solid fa-recycle fa-xs\"></i></a> 2.3 is scheduled for this month, and origanally\nplanned to be distributed on the CDK5AW event. So, it’s a bit late. But the editorial process is converging… I realized that\nI forgot to mention the requirement for <a href=\"http://www.iupac.org/inchi/\">InChI</a>’s whenever molecules are given. So,\nI’m now in the process of going through the issue and add the missing identifiers…</p>",
      "summary": "CDK News 2.3 is scheduled for this month, and origanally planned to be distributed on the CDK5AW event. So, it’s a bit late. But the editorial process is converging… I realized that I forgot to mention the requirement for InChI’s whenever molecules are given. So, I’m now in the process of going through the issue and add the missing identifiers…",
      
      "date_published": "2005-10-20T00:00:00+00:00",
      "date_modified": "2023-07-28T00:00:00+00:00",
      "tags": ["cdk","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/904sy-xc977",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/19/jmols-fah-team-in-top-800.html",
      "title": "Jmol&apos;s FAH team in Top 800",
      "content_html": "<p>The <a href=\"https://wiki.jmol.org/index.php/Folding_At_Home_Community\">Jmol FAH team <i class=\"fa-solid fa-recycle fa-xs\"></i></a> has just entered the Top 800 of most active\n<a href=\"https://foldingathome.org/\">Folding@Home <i class=\"fa-solid fa-recycle fa-xs\"></i></a> teams. And they started monitoring contributions on a user level. Thus, I can now see how active\nI am within the team. And so can you! Join the team, and let’s get into the Top 500!</p>",
      "summary": "The Jmol FAH team has just entered the Top 800 of most active Folding@Home teams. And they started monitoring contributions on a user level. Thus, I can now see how active I am within the team. And so can you! Join the team, and let’s get into the Top 500!",
      
      "date_published": "2005-10-19T00:00:00+00:00",
      "date_modified": "2005-10-19T00:00:00+00:00",
      "tags": ["jmol"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/bs3x9-0em56",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/19/inchi-meta-data-with-kfilechemical.html",
      "title": "InChI meta data with kfile_chemical",
      "content_html": "<p>I’ve just uploaded <a href=\"http://web.archive.org/web/20051120044043/http://www.kde-apps.org/content/show.php?content=28995\">kfile_chemical 0.9 <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>. It has new translations for\nES and DA, and plugins for <a href=\"http://www.iupac.org/inchi/\">InChI</a> files. It will extract the InChI string as meta data (and will thus be used by the\n<a href=\"http://www.kde.org/\">KDE</a> desktop search <a href=\"http://web.archive.org/web/20230727174017/https://lwn.net/Articles/148822/\">Kat <i class=\"fa-solid fa-recycle fa-xs\"></i></a>, and the InChI version number.</p>\n\n<p>Thinking about this, it might be useful to extract all layers as meta data, so that one can search on chemical formula and even\nconnectivity, and find all matching structures. Not really close to substructure search, but we’ll tackle that later :)</p>",
      "summary": "I’ve just uploaded kfile_chemical 0.9 . It has new translations for ES and DA, and plugins for InChI files. It will extract the InChI string as meta data (and will thus be used by the KDE desktop search Kat , and the InChI version number.",
      
      "date_published": "2005-10-19T00:00:00+00:00",
      "date_modified": "2023-07-27T00:00:00+00:00",
      "tags": ["kde","inchi"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/pk40z-7z702",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/18/cdk-taverna-fully-recognized.html",
      "title": "CDK-Taverna fully recognized",
      "content_html": "<p>After asking about it, Tom explained me how <a href=\"http://taverna.sf.net/\">Taverna</a> can pick\nup the <code class=\"language-plaintext highlighter-rouge\">apiconsumer.xml</code> file from jars: just copy it into the root directory of the jar package. Easy as that.</p>\n\n<p>So, users now only need to copy the <code class=\"language-plaintext highlighter-rouge\">cdk-taverna.jar</code> into the <code class=\"language-plaintext highlighter-rouge\">taverna-workbench-1.3/lib/</code> directory and have a nice chemoinformatics\nworkbench environment. I’ll upload the jar to <a href=\"http://sourceforge.net/projects/cdk\">CDK’s project page</a> right now.</p>",
      "summary": "After asking about it, Tom explained me how Taverna can pick up the apiconsumer.xml file from jars: just copy it into the root directory of the jar package. Easy as that.",
      
      "date_published": "2005-10-18T00:00:00+00:00",
      "date_modified": "2005-10-18T00:00:00+00:00",
      "tags": ["cdk","workflow"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/f4370-9cz05",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/17/cia-statistics-for-blue-obelisk.html",
      "title": "CIA statistics for Blue Obelisk",
      "content_html": "<p>I have just enabled <a href=\"https://web.archive.org/web/20051024075530/http://cia.navi.cx/\">CIA <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> statistics for the\n<a href=\"https://web.archive.org/web/20060422193559/http://www.blueobelisk.org/repos/blueobelisk/\">Blue Obelisk SVN <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>:\n<a href=\"http://cia.navi.cx/stats/project/cdk/blueobelisk\">/stats/project/cdk/blueobelisk  <i class=\"fa-solid fa-link-slash fa-xs\"></i></a>.</p>\n\n<p>It’s done by using the <a href=\"https://web.archive.org/web/20050924050012/http://cia.navi.cx/doc/clients\">ciabot_svn.py <i class=\"fa-solid fa-box-archive fa-xs\"></i></a>\nclient script and hooked into the <code class=\"language-plaintext highlighter-rouge\">$REPOS/hooks/post-commit</code> hook on the SVN server. The client script is slightly hacked to hard code the module name, which\notherwise did not show up on the <a href=\"irc://irc.freenode.net/#cdk\">chat channel</a>.</p>",
      "summary": "I have just enabled CIA statistics for the Blue Obelisk SVN : /stats/project/cdk/blueobelisk .",
      
      "date_published": "2005-10-17T00:00:00+00:00",
      "date_modified": "2023-07-27T00:00:00+00:00",
      "tags": ["blue-obelisk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/rgdzb-bfe36",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/15/single-pdfs-for-cdk-news-articles.html",
      "title": "Single PDFs for CDK News articles",
      "content_html": "<p>This week was the <a href=\"https://web.archive.org/web/20080208101002/http://almost.cubic.uni-koeln.de/cdk/cdk_top/events/cdk5yearworkshop/\">CDK5AW <i class=\"fa-solid fa-box-archive fa-xs\"></i></a> event, a workshop for users and\ndevelopers of the <a href=\"http://cdk.sf.net/\">Chemistry Development Kit</a> (CDK). After talking with other developers we agreed on\ncreating PDF and HTML versions of single articles that appeared in the\n<a href=\"https://sourceforge.net/projects/cdk/files/CDK%20News/\">CDK News <i class=\"fa-solid fa-recycle fa-xs\"></i></a> newsletter. Well, I haven’t figured out how to create nice HTML\n(the latex2html does not give nice results, anyone ideas?), but for the PDF version I now have a pipeline.</p>\n\n<p>For each article, a split.config file determines which pages from the CDK News issue PDF should be extracted. To do this, I used the\n<a href=\"http://www.accesspdf.com/pdftk/\">PDF ToolKit</a>, or pdftk for short (comes with Debian/Unbuntu by default). And using a Perl script to read this config files,\nthe pipeline creates PDF files for each article. Currently, I’ll only have it do the features articles; that is, not the\nChangeLog, Editorial, Literature and FAQ. For those you’ll need to download the full issue. If you don’t like that, let me know :)</p>\n\n<p>Ok, you will probably have noticed that the almost server is down\n(<a href=\"http://www.google.com/search?q=CDK+News\">Googling for ‘CDK News’</a> allows you read the cache!), and\nI the PDF’s will be uploaded there asap. For those not familiar with CDK News, the articles are FDL, so feel free to\ncopy and distribute them. If you reuse the text and update it, which is allowed too, please let us know.</p>",
      "summary": "This week was the CDK5AW event, a workshop for users and developers of the Chemistry Development Kit (CDK). After talking with other developers we agreed on creating PDF and HTML versions of single articles that appeared in the CDK News newsletter. Well, I haven’t figured out how to create nice HTML (the latex2html does not give nice results, anyone ideas?), but for the PDF version I now have a pipeline.",
      
      "date_published": "2005-10-15T00:00:00+00:00",
      "date_modified": "2023-07-27T00:00:00+00:00",
      "tags": ["cdk"],
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    },
    {
      "id": "https://doi.org/10.59350/za0jj-7x159",
      "url": "https://chem-bla-ics.linkedchemistry.info/2005/10/15/chem-bla-ics.html",
      "title": "Chem-bla-ics",
      "content_html": "<p>This new blog will deal with chemblaics in the broader sense, and will not be restricted to research in this field\nin which I am involved personally.</p>\n\n<p>Chemblaics (pronounced chem-bla-ics) is the science that uses computers to address and possibly solve problems in\nthe area of chemistry, biochemistry and related fields. The general denomiter seems to be molecules, but I might\nbe wrong there.</p>\n\n<p>The <strong>big</strong> difference between chemblaics and areas as cheminformatics, chemoinformatics, chemometrics, proteochemometrics,\netc, is that chemblaic <em>only</em> uses open source software, making experimental results reproducable and validatable.\nAnd this is a <strong>big</strong> difference with how research in these areas is now often done.</p>\n\n<p>Egon</p>",
      "summary": "This new blog will deal with chemblaics in the broader sense, and will not be restricted to research in this field in which I am involved personally.",
      
      "date_published": "2005-10-15T00:00:00+00:00",
      "date_modified": "2005-10-15T00:00:00+00:00",
      
      
      
      
        "authors": [ { "name": "Egon Willighagen", "url": "https://orcid.org/0000-0001-7542-0286" } ]
      
    }
  ]
}