<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/cheminf.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-04-19T09:50:36+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/cheminf.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">AI Technologies in Academia</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/08/18/ai-technologies-in-academia.html" rel="alternate" type="text/html" title="AI Technologies in Academia" /><published>2025-08-18T00:00:00+00:00</published><updated>2025-08-18T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/08/18/ai-technologies-in-academia</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/08/18/ai-technologies-in-academia.html"><![CDATA[<p>I have had the <a href="https://openletter.earth/open-letter-stop-the-uncritical-adoption-of-ai-technologies-in-academia-b65bba1e">Open Letter: Stop the Uncritical Adoption of AI Technologies in Academia</a>
from June 27 open for some time now. I thought I wanted to sign it, but got stuck on the first paragraphs multiple times:</p>

<blockquote>
  <p>With this letter we take a principled stand against the proliferation of so-called ‘AI’ technologies in universities. As an educational institution,
we cannot condone the uncritical use of AI by students, faculty, or leadership. We also call for reconsidering any direct financial relationships
between Dutch universities and AI companies.The unfettered introduction of AI technology leads to contravention of the spirit of the EU Al act. It
undermines our basic pedagogical values and the principles of scientific integrity. It prevents us from maintaining our standards of independence
and transparency. And most concerning, AI use has been shown to hinder learning and deskill critical thought.</p>
</blockquote>

<p>These few lines contain for me more than 25 years of research and I know the complexities. Before I can co-sign this letter,
I need to understand the details. There is no definition of ‘AI’ here and it mentions the <a href="https://eur-lex.europa.eu/legal-content/NL/TXT/?uri=CELEX:32024R1689">EU AI Act</a>
(I guess; the letter actually writes “Al” act, with a lowercase <code class="language-plaintext highlighter-rouge">l</code>, which I only noticed after reading the text in another font),
but I have not read the EU AI Act yet (it is 144 pages of legal text).</p>

<h2 id="the-legal-context-of-the-open-letter">The legal context of the Open Letter</h2>

<p>Let me first say, I am not a lawyer (IANAL). I am not versed in the specific legal definitions of tightly defined and controlled
words.</p>

<p>Reading the <em>EU AI Act</em>, I read a reassuring opening statement (repeated later with more context, links to other laws, etc):</p>

<blockquote>
  <p>to promote the uptake of human centric and trustworthy artificial intelligence (AI) while ensuring a high level of protection
of health, safety, fundamental rights as enshrined in the Charter of Fundamental Rights of the European Union (the ‘Charter’),
including democracy, the rule of law and environmental protection, to protect against the harmful effects of AI systems in the Union</p>
</blockquote>

<p>We clearly see how these things are currently routinely violated.</p>

<blockquote>
  <p>This Regulation does not apply to AI systems or AI models, including their output, specifically developed and put into service
for the sole purpose of scientific research and development.</p>
</blockquote>

<p>In Dutch this is officially translated to “wetenschappelijk onderzoek”, so <em>scientific research</em> seems to legally
include the humanities, etc., and not to be limited to the natural sciences [citation needed].</p>

<p>The EU AI Act also outlines a definition of “AI”, leaning towards machine learning, but the border between deterministic,
rule-based algorithms and machine-learned patterns for predictions remains a bit vague to me. But I can live with it.</p>

<p>The Open Letter’s <em>contravention of the spirit of the EU Al act</em> gets context here too. It has to be the <em>spirit</em>,
because the law does not apply to academia. Good, clarified. The Letter continues with:</p>

<blockquote>
  <p>It undermines our basic pedagogical values and the principles of scientific integrity.
It prevents us from maintaining our standards of independence and transparency. And most concerning, AI use has been
shown to hinder learning and deskill critical thought.</p>
</blockquote>

<p>Yes, that clearly links to the EU AI Act’s protection of rights. Not explicitly listed, maybe on purpose and maybe
for legal reasons, are the international human rights, which include the right to benefit from science,
but I think this is still in the spirit of the EU AI Act. And if AI fetters our ability to learn (yes, there
is scientific evidence for that [citation needed]), then it violates the EU AI Act (IANAL).</p>

<h2 id="what-the-open-letter-expects">What the Open Letter expects</h2>

<p>The next part of the Open Letter lists what the signers expect from our universities. I will reflect on each of them.</p>

<blockquote>
  <p><strong>Resist the introduction of AI in our own software systems</strong>, from Microsoft to OpenAI to Apple. It is not in our interests
to let our processes be corrupted and give away our data to be used to train models that are not only useless to us, but
also harmful.</p>
</blockquote>

<p>The intrinsic problem, and why I think it is fair to call out these companies, is that, as the letter explains, there
is a clear conflict of interest. The goal of companies is to make a profit (and in the Western world, as much
as possible), and not to serve any human or scientific needs. In this respect, companies like Elsevier
could just as well have been mentioned too (see e.g. <a href="https://irisvanrooijcogsci.com/2025/08/12/ai-slop-and-the-destruction-of-knowledge/">this post by Prof. Van Rooij</a>,
actually the 2nd signature on the letter).</p>

<blockquote>
  <p><strong>Ban AI use in the classroom</strong> for student assignments, in the same way we ban essay mills and other forms
of plagiarism. Students must be protected from de-skilling and allowed space and time to perform their
assignments themselves.</p>
</blockquote>

<p>About a year ago, I was pleasantly surprised by the depth of discussion at Maastricht University on how and when
to use AI, and by default not. This one is really complicated and it matters when and how the AI is used.
After all, the spirit of the EU AI Act expects us to use AI in research (to trigger innovation). So,
I cannot agree with the literal statement, but I fully agree with the spirit. Particularly combined with
the clear “Stop the Uncritical Adoption of AI Technologies in Academia” of the title of the Open Letter.</p>

<p>I read this line like this: AI in the classroom must have a purpose that aligns with the EU AI Act.
That means it must not be used for writing essays and reports. I am old enough to remember the
academic discussions (at Radboud University) about writing and the clear hesitance among scholars
about the use of written assignments: “I want to test their scientific knowledge and reasoning skills,
not their ability to write narratives”. And LLMs, like ChatGPT but also the European, more open variants,
write narratives, so the written report or essay is no longer a valid way to assess a student’s
scientific learning progress.</p>

<p>So, alternatively, we should very carefully and scientifically evaluate which forms of assessment
we perform, and banning AI in the classroom may just be distracting from a more fundamental problem.
Anyways… if you continue using writing assignments to test progress in learning, you must ban
use of AI in that process. You must be testing the student, not some piece of software (as a teaching
institute).</p>

<blockquote>
  <p><strong>Cease normalising the AI hype</strong> and the lies which are prevalent in the technology industry’s framing of
these technologies. The technologies do not have the advertised capacities and their adoption puts students
and academics at risk of violating ethical, legal, scholarly, and scientific standards of reliability,
sustainability, and safety.</p>
</blockquote>

<p>Sounds like a no-brainer. But I too find my own university uncritically promoting AI. Maybe they tested
it well, and just forgot to share that. But hey, scientific quality goes both ways.</p>

<blockquote>
  <p><strong>Fortify our academic freedom</strong> as university staff to enforce these principles and standards in our
classrooms and our research as well as on the computer systems we are obliged to use as part of our
work. We as academics have the right to our own spaces.</p>
</blockquote>

<p>Again, a no-brainer. But important to add. It must be said as it is an intrinsic part of
<a href="https://recognitionrewards.nl/">Recognition &amp; Rewards</a>. If you cannot guarantee academic freedom,
then there is something seriously wrong with your R&amp;R.</p>

<blockquote>
  <p><strong>Sustain critical thinking on AI</strong> and promote critical engagement with technology on a firm
academic footing. Scholarly discussion must be free from the conflicts of interest caused by
industry funding, and reasoned resistance must always be an option.</p>
</blockquote>

<p>Yeah, this is something that is underestimated. Part of our academic teaching is this critical thinking.
It returns in academic reading (did you already read <em>“What Little Red Riding Hood Can Teach Us about Reading Science”</em>,
doi:<a href="https://uplopen.com/chapters/e/10.1515/9783110782844-010">10.1515/9783110782844-010</a>,
by <a href="https://scholar.google.com/citations?user=0KRmIbcAAAAJ&amp;hl=nl&amp;oi=ao">Monica Gonzalez-Marquez</a> <em>et al.</em>?),
scientific programming, data analysis, and our teaching has been
lacking here. Not just for new AI forms, but also for the old algorithms. I have seen this, and
scientific literature is riddled with mistakes, just because our peer reviewers are not sufficiently
skilled. This will take effort. I know, it was a major part of
<a href="https://chem-bla-ics.linkedchemistry.info/2008/03/01/todo-april-2nd-defend-my-phd-work.html">my PhD thesis</a>.</p>

<p>Of course, this is exactly why I have been so active in Open Science. Without Open Science,
we cannot work <em>in the spirit</em> of the EU AI Act. It’s nothing new. It’s just that the big money
has found in AI a way to profit at the expense of humans.</p>

<p>So, go read that <a href="https://openletter.earth/open-letter-stop-the-uncritical-adoption-of-ai-technologies-in-academia-b65bba1e">Open Letter</a>
and sign too!</p>]]></content><author><name>Egon Willighagen</name></author><category term="cheminf" /><category term="chemometrics" /><category term="justdoi:10.1515/9783110782844-010" /><category term="openscience" /><summary type="html"><![CDATA[I have had the Open Letter: Stop the Uncritical Adoption of AI Technologies in Academia from June 27 open for some time now. I thought I wanted to sign it, but got stuck on the first paragraphs multiple times:]]></summary></entry><entry><title type="html">One Million IUPAC names</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names.html" rel="alternate" type="text/html" title="One Million IUPAC names" /><published>2025-03-08T00:00:00+00:00</published><updated>2025-03-08T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/03/08/iupac-names.html"><![CDATA[<p>Names of chemicals are part of the human user experience when browsing a chemical database. And literature too,
of course. Chemical names are also not easy to use, and what a chemical name means is not always clear.
This is why the <a href="https://en.wikipedia.org/wiki/International_Union_of_Pure_and_Applied_Chemistry">IUPAC</a>
started standardizing nomenclature in chemistry, the <em>IUPAC names</em>. Each IUPAC name uniquely identifies
the chemical structure it describes. For example, <em>methane</em> is the IUPAC name for the chemical CH<sub>4</sub>.</p>

<p>So, when propagating chemical structures from the <a href="https://chem-bla-ics.linkedchemistry.info/2025/02/13/beiltein-journal-has-bioschemas.html">Beilstein Bioschemas feed</a>,
I was looking for names, IUPAC or not, ideally the name used in the article. When I asked about this,
the question came up if they could autogenerate IUPAC names, for which
<a href="https://doi.org/10.1038/s41598-021-94082-y">various</a>
<a href="https://doi.org/10.1186/s13321-021-00535-x">new</a>
<a href="https://doi.org/10.1186/s13321-021-00512-4">tools</a>
<a href="https://doi.org/10.1186/s13321-024-00941-x">exist</a>
(I think I am missing one from an American team, but cannot find the reference),
along with multiple established commercial tools.
Because the IUPAC nomenclature is a long list of naming rules, priorities, etc, a rule-based
algorithm is logical, but newer methods take a deep-learning approach.</p>

<p>Back to the chemical annotation of chemistry literature. This is of obvious interest: you want
to know where we can read more about a certain chemical. We need the chemical structures in
a database for that, linked to the articles. This is, of course, one of the original studies
of <em>cheminformatics</em>. But authors of the chemical literature do not provide this routinely
(<a href="https://chem-bla-ics.linkedchemistry.info/2025/02/13/beiltein-journal-has-bioschemas.html">this post</a>
shows a few exceptions, but it is still all too rare), and so manual and automated curation
is needed, e.g. as done by <a href="https://en.wikipedia.org/wiki/Chemical_Abstracts_Service">Chemical Abstracts</a>.</p>

<p>Third, <a href="https://wikidata.org/">Wikidata</a> has <a href="https://scholia.toolforge.org/chemical/">about 1.4 million</a>
chemical compounds and many names. A <a href="https://www.wikidata.org/wiki/Wikidata:Property_proposal/Pending#IUPAC_name">property proposal for IUPAC names</a>
has been long pending, but once accepted in one form or another, will require IUPAC names too.</p>

<h2 id="one-million-iupac-names">One million IUPAC names</h2>

<p>Thus, the idea came up, can we create a set of 1 million unique IUPAC names found in literature?
I asked on the <a href="https://elixir-europe.org/">ELIXIR Europe</a> slack channel if <a href="https://europepmc.org/">Europe PMC</a>
had such a dataset (doi:<a href="https://doi.org/10.1093/nar/gkad1085">10.1093/nar/gkad1085</a>). I knew they had been adding chemical
<a href="https://scholia.toolforge.org/topic/Q403574">named-entity recognition</a> (NER) results in
<a href="https://europepmc.org/Annotations">their annotation API</a>. I learned they used <a href="https://www.ebi.ac.uk/chebi/">ChEBI</a>.
Melanie Vollmar and Summer Rosonovski of Europe PMC gave useful information and support.
<a href="https://cpm.lumc.nl/research/bioinformatics-224/magnus-palmblad-5">Magnus Palmblad</a> also replied
and provided Python code to use the Europe PMC API to fetch names it returns and see if those
are IUPAC names. Well, that’s easy. We have <a href="https://opsin.ch.cam.ac.uk/">OPSIN</a> for that
(see doi:<a href="https://doi.org/10.1021/ci100384d">10.1021/ci100384d</a>).</p>

<p>Unfortunately, the Europe PMC NER results are not ideal for IUPAC names. Just scanning
some 5, 6 organic chemistry journals returned some 8 thousand IUPAC names in open access
articles. But it quickly started to be too limited: each set of articles returned
increasingly few new names. The reason is simple: the NER is too <em>greedy</em> and as a
result, does not easily recognize longer IUPAC names. It is too happy with a substring
of the IUPAC name. For example, when it encounters the IUPAC name <em>5-Bromo-1H-indole-3-carboxylic acid</em>,
it settles for <em>indole-3-carboxylic acid</em>:</p>

<p><img src="/assets/images/greedy.png" alt="" /></p>

<h2 id="open-source-chemistry-analysis-routines">Open-Source Chemistry Analysis Routines</h2>

<p>During my PhD, in 2003, when I worked a few months with Prof. <a href="https://scholia.toolforge.org/author/Q908710">Peter Murray-Rust</a> (University of Cambridge)
and Prof. Janet Thornton (EMBL-EBI), I learned about the research by <a href="https://scholia.toolforge.org/author/Q28946549">Sam Adams</a>
(doi:<a href="https://doi.org/10.1039/B411699M">10.1039/B411699M</a>), <a href="https://scholia.toolforge.org/author/Q133040220">Joe Townsend</a>
(doi:<a href="https://doi.org/10.1039/B411033A">10.1039/B411033A</a>), and <a href="https://scholia.toolforge.org/author/Q90318722">Peter Corbett</a>
(doi:<a href="https://doi.org/10.1007/11875741_11">10.1007/11875741_11</a>). One of the tools that used
this research was (is) <a href="https://scholia.toolforge.org/topic/Q133037490">OSCAR</a>,
short for <em>Open-Source Chemistry Analysis Routines</em> (see <a href="https://blogs.ch.cam.ac.uk/pmr/2009/05/16/opsin-and-oscar-chemical-language-processing/">this detailed write up by Peter MR</a>).
Later, in 2010, I visited Peter again, as a postdoc, in Cambridge, and then
<a href="https://chem-bla-ics.linkedchemistry.info/2010/10/15/working-on-oscar-for-three-months.html">worked on the OSCAR project</a> too.
And while OSCAR did a lot more, the integration of <a href="https://chem-bla-ics.linkedchemistry.info/2010/12/26/oscar-training-data-models-etc.html">Corbett’s NER research</a>
made OSCAR the obvious follow-up step in finding IUPAC names in literature.</p>

<p>And because <a href="https://chem-bla-ics.linkedchemistry.info/2011/09/27/almost-year-ago-i-started-position-with.html">OSCAR4 had been integrated into Bioclipse</a>
(doi:<a href="https://doi.org/10.1186/1758-2946-3-41">10.1186/1758-2946-3-41</a>) and I had this ported to Bacting already
(doi:<a href="https://doi.org/10.21105/joss.02558">10.21105/joss.02558</a>), using this was trivial.
The use of Europe PMC is different now, however, and we are no longer using the Annotations API,
but just using it to find open access articles, and to get the full text in XML format.
That allows a simple XPath search for <code class="language-plaintext highlighter-rouge">&lt;p&gt;</code> elements; the resulting strings are passed to OSCAR4,
and the recognized names are checked with OPSIN.
And with this approach, processing two of the five or six journals we earlier explored,
we find another 40+ thousand IUPAC names. Quite a success, I am tempted to say.</p>
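
<p>The shape of that workflow can be sketched in a few lines of Python. This is only an illustration, not the actual scripts: the real work is done by OSCAR4 and OPSIN, which are replaced here by two toy stand-in functions (<code class="language-plaintext highlighter-rouge">toy_recognize</code> and <code class="language-plaintext highlighter-rouge">toy_is_parsable</code>, both hypothetical), and the article XML is built in code rather than fetched from Europe PMC.</p>

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, List, Set


def paragraph_texts(root: ET.Element) -> List[str]:
    """Return the flattened text of every p element below root."""
    # ".//p" is the ElementTree form of an XPath search for paragraph elements.
    return ["".join(p.itertext()).strip() for p in root.findall(".//p")]


def collect_names(
    root: ET.Element,
    recognize: Callable[[str], Iterable[str]],
    is_parsable: Callable[[str], bool],
) -> Set[str]:
    """Run a recognizer on each paragraph and keep the names the checker accepts."""
    names: Set[str] = set()
    for text in paragraph_texts(root):
        for candidate in recognize(text):
            if is_parsable(candidate):
                names.add(candidate)
    return names


# Toy stand-ins (hypothetical, NOT OSCAR4 or OPSIN): a fixed lookup list.
KNOWN = ["5-bromo-1H-indole-3-carboxylic acid", "methane"]


def toy_recognize(text: str) -> List[str]:
    """Pretend name recognition: report known names that occur in the text."""
    return [name for name in KNOWN if name in text]


def toy_is_parsable(name: str) -> bool:
    """Pretend name check: accept only names from the lookup list."""
    return name in KNOWN


# Build a tiny article tree in code instead of downloading full text.
article = ET.Element("article")
body = ET.SubElement(article, "body")
ET.SubElement(body, "p").text = "We prepared 5-bromo-1H-indole-3-carboxylic acid."
ET.SubElement(body, "p").text = "No compound is named here."

found = collect_names(article, toy_recognize, toy_is_parsable)
print(sorted(found))
```

<p>Passing the recognizer and the checker in as functions keeps the paragraph extraction separate from the chemistry, so the toy functions could be swapped for real calls without touching the loop.</p>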

<h2 id="a-blue-obelisk-project">A Blue Obelisk project</h2>

<p>So, I started a new <a href="https://blueobelisk.github.io/">Blue Obelisk</a> project,
<a href="https://github.com/BlueObelisk/iupac-names">iupac-names</a>, to collect 1M IUPAC names. For researchers
to use, learn from, etc. Just IUPAC names. Not even the chemical structure, nor the link to the
articles. The first is trivial to do with OPSIN, so the matching SMILES do not need to be stored.
Linking to literature is tricky because of the aforementioned issues, and we only want to know
which (partial) IUPAC names occur in literature. If you really want to know in which articles
that IUPAC name is found, you can simply do a search in Europe PMC.</p>

<p>And because we only store IUPAC names, these are very basic facts (this string is an IUPAC name, as defined
by OPSIN being able to generate a SMILES for it, and that string occurs in
some article) and we can share them as CCZero. We <a href="is:issue" title="milestone release">defined various milestones</a>,
and I am happy that the first two have been reached within two weeks:</p>

<ul>
  <li><a href="https://github.com/BlueObelisk/iupac-names/releases/tag/milestone-10k">Milestone 10k</a> (doi:<a href="https://doi.org/10.5281/zenodo.14965762">10.5281/zenodo.14965762</a>)</li>
  <li><a href="https://github.com/BlueObelisk/iupac-names/releases/tag/milestone-50k">Milestone 50k</a> (doi:<a href="https://doi.org/10.5281/zenodo.14978557">10.5281/zenodo.14978557</a>)</li>
</ul>

<p>This second milestone has 53848 unique names, but as literature goes, there are interesting
variations, some likely because of typesetting leading to spaces added and missing. If
we ignore spaces and hyphens, we have 50534 names left (hence the milestone). But IUPAC
names are also not fully unique, partly because of Unicode character variations and Greek
letter alternatives, and you may wonder how many different chemical structures this set
reflects. While not perfect, the Standard InChI gives some lower limit, and we find 36528
InChIKeys in this second milestone.</p>
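
<p>To make the "ignore spaces and hyphens" step concrete, here is a minimal sketch in Python. It is not the code used for the milestone: the function names and the example names are made up for illustration, and it only strips whitespace and the plain ASCII hyphen, so Unicode hyphen variants and Greek letter alternatives would still count as different names.</p>

```python
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Drop whitespace and ASCII hyphens so typesetting variants share one key."""
    return "".join(ch for ch in name if not ch.isspace() and ch != "-")


def group_variants(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group the original names by their normalized key."""
    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(normalize(name), []).append(name)
    return groups


# Made-up examples: three spellings of one name, plus one other name.
names = [
    "indole-3-carboxylic acid",
    "indole-3- carboxylic acid",
    "indole-3-carboxylicacid",
    "methane",
]

groups = group_variants(names)
print(len(names), "names,", len(groups), "after ignoring spaces and hyphens")
```

<p>The normalized key is only used for counting and grouping; the names as found in the literature are kept unchanged in the lists.</p>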

<p>Now, we need twenty times as many to reach the 1M IUPAC names, but we have many, many
more open access articles to process. The bottleneck seems to be mostly our workflow.</p>

<h3 id="can-you-contribute">Can you contribute?</h3>

<p>Yes, of course! This is an open science project. But please keep in mind the narrow focus of this
project: only IUPAC names which can be found in (open access) literature. This project does not accept
autogenerated names (PubChem would have given us many millions already), nor IUPAC names from existing
databases. Ideally, you are able to show the code you use to extract/find those names in literature.</p>

<h3 id="can-i-use-these-names">Can I use these names?</h3>

<p>First of all, this is what the CCZero license and open science nature of this project is about: reuse.
We love to hear how you are using these names, tho, and we encourage you to write up how you
are using them. You can use <a href="https://datacite.org/">DataCite</a> to cite the release you used,
and citing this blog post by DOI is also possible.</p>

<h3 id="does-it-support-my-language-too">Does it support my language too?</h3>

<p>No, at this moment it only supports IUPAC names in English. Dutch, French, Spanish, or Chinese
IUPAC names are valid, but currently not supported. See also
<a href="https://chem-bla-ics.linkedchemistry.info/2010/12/30/text-mining-chemistry-from-dutch-or.html">this post</a>.</p>

<h3 id="will-there-be-a-publication">Will there be a publication?</h3>

<p>Magnus and I intend so. We already submitted an abstract to the <a href="https://iccs-nl.org/">International Conference on Chemical Structures</a>,
which has <a href="https://www.biomedcentral.com/collections/ICCS25">a Collection in the Journal of Cheminformatics</a>.
If the abstract gets accepted, of course, we can submit there. Otherwise, we will look for another venue,
likely <a href="https://en.wikipedia.org/wiki/Diamond_open_access">diamond open access</a>.</p>

<h3 id="where-is-your-script">Where is your script?</h3>

<p>Ah, fair point. We did not decide on the final license yet. I have used two scripts based on the template
by Magnus. As soon as we have finalized the license, we will make those available.</p>]]></content><author><name>Egon Willighagen</name></author><category term="iupac" /><category term="cheminf" /><category term="justdoi:10.1038/s41598-021-94082-y" /><category term="justdoi:10.1186/s13321-021-00512-4" /><category term="justdoi:10.1186/s13321-021-00535-x" /><category term="justdoi:10.1186/s13321-024-00941-x" /><category term="justdoi:10.1021/ci100384d" /><category term="oscar" /><category term="justdoi:10.1039/B411699M" /><category term="justdoi:10.1039/B411033A" /><category term="justdoi:10.1007/11875741_11" /><category term="textmining" /><category term="cito:usesMethodIn:10.1186/1758-2946-3-41" /><category term="cito:usesMethodIn:10.21105/JOSS.02558" /><category term="cito:usesMethodIn:10.1093/nar/gkad1085" /><category term="cito:citesAsEvidence:10.5281/zenodo.14965762" /><category term="cito:citesAsEvidence:10.5281/zenodo.14978557" /><category term="europepmc" /><summary type="html"><![CDATA[Names of chemicals are part of the human user experience when browsing a chemical database. And literature too, of course. Chemical names are also not easy to use, and what a chemical name means is not always clear. This is why the IUPAC started a standardizing nomenclature in chemistry, the IUPAC names. Each IUPAC name uniquely defines the chemical structure it defines. For example, methane is the IUPAC name for the chemical CH4.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/greedy.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/greedy.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Richard L. Apodaca</title><link href="https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html" rel="alternate" type="text/html" title="Richard L. 
Apodaca" /><published>2024-12-08T00:00:00+00:00</published><updated>2024-12-08T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2024/12/08/rich-l-apodaca.html"><![CDATA[<p><img src="/assets/images/depth_first.png" style="width: 40%; display: block; margin-left: auto; margin-right: auto; float: right" alt="Screenshot of the first Depth-First blog post" />
If you are into open science chemistry or chemistry blogging, then you have probably heard of
<a href="https://orcid.org/0000-0003-3855-9427">Rich Apodaca</a>’s <a href="https://depth-first.com/">Depth-First blog</a>. <!-- keep link -->
Rich <a href="https://doi.org/10.59350/xyp0f-9dt42">started blogging in 2006 <i class="fa-solid fa-recycle fa-xs"></i></a> but this is not
how I discovered his work originally. I know that we at least already had contact in 2005,
because that is when he wrote about an integration between his Octet library and the Chemistry Development Kit
in the <a href="https://sourceforge.net/projects/cdk/files/CDK%20News/">CDK News</a> (volume 2, issue 2),
<em>CDKTools: The CDK-Octet Bridge</em>. In 2006 he <a href="https://doi.org/10.59350/esgte-mv539">reviewed our use of the Open Journal System for CDK News <i class="fa-solid fa-recycle fa-xs"></i></a>.</p>

<p>But I did find we have been blogging about our work a lot. <a href="https://www.google.com/search?q=site%3Achem-bla-ics.blogspot.com+rich">Searching for Rich</a>
gives false positives, but plenty of discussions of his work. At the same time, <a href="https://www.google.com/search?q=site:depth-first.com+egon">my name shows up multiple times</a> <!-- keep link -->
in Depth-First too. Looking back at our shared history, we find, for example, Rich has blogged a lot about using the
<a href="https://doi.org/10.59350/50ebs-4zq55">Chemistry Development Kit in Ruby <i class="fa-solid fa-recycle fa-xs"></i></a>.</p>

<p>Rich <a href="https://depth-first.com/articles/">blogged about a lot of cheminformatics innovation</a>. For example, <!-- keep link -->
in 2006 <a href="https://doi.org/10.59350/pz3p6-fv247">he was working on multi-atom bonding <i class="fa-solid fa-recycle fa-xs"></i></a>,
such as in ferrocene, something that is even today not routinely used in cheminformatics. I replied
to that in <a href="https://chem-bla-ics.linkedchemistry.info/2006/12/30/modern-chemistry-in-cdk-beyond-two.html">this post</a>.
Another thing he explored was embedding chemical graph notations in PNG images. In 2007 he
wrote how to <a href="https://doi.org/10.59350/j026p-17z02">Never Draw the Same Molecule Twice: Image Metadata for Cheminformatics <i class="fa-solid fa-recycle fa-xs"></i></a>.
This was picked up by several others, including me with <a href="https://chem-bla-ics.linkedchemistry.info/2007/08/24/jchempaint-too-png-embedded.html">an implementation in JChemPaint</a>.</p>

<p>Another tool that I really liked was <a href="https://web.archive.org/web/20101010030537/http://chempedia.com/">his Chempedia <i class="fa-solid fa-box-archive fa-xs"></i></a>
which collected “[f]ree chemical information resources created and reviewed by chemists”. One of the things it did
was link chemical names to chemical structures, e.g. for <a href="https://web.archive.org/web/20101031093610/http://chempedia.com/substances/0-4825-8876-0064">this compound <i class="fa-solid fa-box-archive fa-xs"></i></a>.
And because of the open license I was able to generate <a href="https://chem-bla-ics.linkedchemistry.info/2009/11/19/chempedia-rdf-1-sparql-end-point.html">an RDF representation of Chempedia</a>.
This resulted perhaps in one of my first online SPARQL endpoints.</p>

<p>One and a half years ago he was <a href="https://doi.org/10.59350/5ct28-aaj63">confronted with health issues <i class="fa-solid fa-recycle fa-xs"></i></a>. Rich
blogged openly about the months after that. Rereading this post is still hard, having seen cancer in action
on my mother. It turned out to be cancer, <a href="https://doi.org/10.59350/g29jj-d3m35">a brain tumor <i class="fa-solid fa-recycle fa-xs"></i></a>.
Just this Thursday I attended a fascinating <sup>2</sup>H NMR presentation, showing how much better
we got at recognizing tumors, but Rich’s MRI was obvious. He blogged for months on
<a href="https://doi.org/10.59350/mxqbw-ek659">his plan</a>. Until <a href="https://doi.org/10.59350/6beed-gk067">the end of May <i class="fa-solid fa-recycle fa-xs"></i></a>
this year.</p>

<p>Some weeks ago I received confirmation of our fear; he passed away. Richard L. Apodaca was
<a href="https://search.lib.utexas.edu/discovery/fulldisplay?docid=alma991024143089706011&amp;context=L&amp;vid=01UTAU_INST:SEARCH&amp;lang=en&amp;search_scope=MyInst_and_CI&amp;adaptor=Local%20Search%20Engine&amp;tab=Everything&amp;query=any,contains,39207173&amp;sortby=rank">born in 1968</a>,
completed his PhD at the University of Texas at Austin in 1996 on <em>Studies in enantioselective catalysis:
(1) a new class of chiral C₂-symmetric bisphenols; (2) Diorganotin dihalides</em> (wikidata:<a href="https://scholia.toolforge.org/work/Q131405461">Q131405461</a>).
Rich published multiple papers in the field of medicinal chemistry (see <a href="https://scholia.toolforge.org/author/Q43837652">his Scholia profile</a>),
was very active in open science and <a href="https://patents.google.com/?inventor=Richard+Apodaca">held many patents</a>.
His latest work was about <em>Balsa: A Compact Line Notation Based on SMILES</em>
(see doi:<a href="https://doi.org/10.26434/chemrxiv-2022-01ltp">10.26434/chemrxiv-2022-01ltp</a>).</p>

<p>The <a href="https://depth-first.com/">Depth-First blog</a> has a CC-BY 2.0 license and perhaps <a href="https://rogue-scholar.org/">Rogue Scholar</a> <!-- keep link -->
can archive it? It helps us remember Rich and his contributions to open science cheminformatics.</p>]]></content><author><name>Egon Willighagen</name></author><category term="openscience" /><category term="cheminf" /><category term="justdoi:10.26434/chemrxiv-2022-01ltp" /><category term="justdoi:10.59350/xyp0f-9dt42" /><category term="justdoi:10.59350/esgte-mv539" /><category term="justdoi:10.59350/50ebs-4zq55" /><category term="justdoi:10.59350/pz3p6-fv247" /><category term="justdoi:10.59350/j026p-17z02" /><category term="justdoi:10.59350/5ct28-aaj63" /><category term="justdoi:10.59350/g29jj-d3m35" /><category term="justdoi:10.59350/6beed-gk067" /><category term="justdoi:10.59350/mxqbw-ek659" /><summary type="html"><![CDATA[If you are into openscience chemistry or chemistry blogging, then you probably heard of Rich Apodaca’s Depth-First blog. Rich started blogging in 2006 but this is not how I discovered his work originally. I know that we at least already had contact in 2005, because that is when he wrote about an integration between his Octet library and the Chemistry Development Kit in the CDK News (volume 2, issue 2), CDKTools: The CDK-Octet Bridge. 
In 2006 he reviewed our use of the Open Journal System for CDK News .]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/depth_first.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/depth_first.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Artificial intelligence for natural product drug discovery</title><link href="https://chem-bla-ics.linkedchemistry.info/2023/09/24/ai.html" rel="alternate" type="text/html" title="Artificial intelligence for natural product drug discovery" /><published>2023-09-24T00:00:00+00:00</published><updated>2023-09-24T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2023/09/24/ai</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2023/09/24/ai.html"><![CDATA[<p>Two weeks ago the write up of a week-long scientific discussions around artificial intelligence for natural product drug discovery
in Leiden at the <a href="https://www.lorentzcenter.nl/">Lorentz Center</a> got published
(doi:<a href="https://doi.org/10.1038/s41573-023-00774-7">10.1038/s41573-023-00774-7</a>, <a href="https://cris.maastrichtuniversity.nl/en/publications/artificial-intelligence-for-natural-product-drug-discovery">free PDF</a>).</p>

<p><img src="/assets/images/ai.png" alt="Part of the copyrighted Figure 1 from the article. I hope this counts as fair use." /></p>

<p>Sadly, the meeting was still during the (partial) lockdown, and I think my contribution could have been
more extensive. But I am happy I got to pitch the idea of using Wikidata in this area too, taking advantage
of the work done by the LOTUS (doi:<a href="https://doi.org/10.7554/eLife.70780">10.7554/eLife.70780</a>) team earlier.</p>

<p>And this is key to me: you cannot do statistics, chemometrics, machine learning, or artificial
intelligence without good quality linked data. Happy reading!</p>]]></content><author><name>Egon Willighagen</name></author><category term="cheminf" /><category term="natprod" /><category term="doi:10.1038/s41573-023-00774-7" /><category term="doi:10.7554/eLife.70780" /><summary type="html"><![CDATA[Two weeks ago the write up of a week-long scientific discussions around artificial intelligence for natural product drug discovery in Leiden at the Lorentz Center got published (doi:10.1038/s41573-023-00774-7, free PDF).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/ai.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/ai.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">What metabolites are found in which species? Nanopublications from Wikidata</title><link href="https://chem-bla-ics.linkedchemistry.info/2019/03/30/what-metabolites-are-found-in-which.html" rel="alternate" type="text/html" title="What metabolites are found in which species? Nanopublications from Wikidata" /><published>2019-03-30T00:00:00+00:00</published><updated>2019-03-30T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2019/03/30/what-metabolites-are-found-in-which</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2019/03/30/what-metabolites-are-found-in-which.html"><![CDATA[<p>In December I reported about Groovy <a href="https://chem-bla-ics.linkedchemistry.info/2018/12/27/creating-nanopublications-with-groovy.html">code to create nanopublications <i class="fa-solid fa-recycle fa-xs"></i></a>.
This has been running for some time now, extracting nanopubs that assert that some
metabolite is found in some species. I send the resulting nanopubs to
<a href="https://scholia.toolforge.org/author/Q42027946">Tobias Kuhn <i class="fa-solid fa-recycle fa-xs"></i></a>, to populate his
<em>Growing Resource of Provenance-Centric Scientific Linked Data</em>
(doi:<a href="https://doi.org/10.1109/eScience.2018.00024">10.1109/eScience.2018.00024</a>,
<a href="https://arxiv.org/pdf/1809.06532.pdf">PDF</a>).</p>

<p>Each data set comes with <a href="http://np.inn.ac/RA6KPZ2qS8joGDOA9EvfcNHeNsg6nI2_T1YePsYMjL9io">an index pointing to the individual nanopubs</a>,
and that looks like this:</p>

<p><img src="/assets/images/nanopubs.png" alt="" /></p>

<p>I wonder what options I have to archive the full set of nanopublications on
Figshare or Zenodo, and see that DOI show up here…</p>]]></content><author><name>Egon Willighagen</name></author><category term="nanopub" /><category term="cheminf" /><category term="wikidata" /><category term="doi:10.1109/ESCIENCE.2018.00024" /><summary type="html"><![CDATA[In December I reported about Groovy code to create nanopublications . This has been running for some time now, extracting nanopubs that assert that some metabolite is found in some species. I send the resulting nanopubs to Tobias Kuhn , to populate his Growing Resource of Provenance-Centric Scientific Linked Data (doi:10.1109/eScience.2018.00024, PDF).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/nanopubs.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/nanopubs.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">PubChemRDF: semantic web access to PubChem data</title><link href="https://chem-bla-ics.linkedchemistry.info/2015/07/15/pubchemrdf-semantic-web-access-to.html" rel="alternate" type="text/html" title="PubChemRDF: semantic web access to PubChem data" /><published>2015-07-15T00:00:00+00:00</published><updated>2015-07-15T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2015/07/15/pubchemrdf-semantic-web-access-to</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2015/07/15/pubchemrdf-semantic-web-access-to.html"><![CDATA[<p><img src="/assets/images/s13321-015-0084-4-graphical-abstract.gif" style="width: 30%; display: block; margin-left: auto; margin-right: auto; float: right" />
Gang Fu and Evan Bolton have <a href="https://pubchem.ncbi.nlm.nih.gov/rdf/">blogged</a> about it previously, but their PubChemRDF paper is out now
(doi:<a href="https://doi.org/10.1186/s13321-015-0084-4">10.1186/s13321-015-0084-4</a>). It very likely defines the largest collection of RDF triples
using the <a href="http://chem-bla-ics.blogspot.nl/search?q=CHEMINF&amp;max-results=20&amp;by-date=true">CHEMINF ontology</a> and I congratulate the
authors on an increasingly powerful <a href="http://pubchem.ncbi.nlm.nih.gov/">PubChem</a> database.</p>

<p>With this major provider of Linked Open Data for chemistry now published, I should soon see where
<a href="http://chem-bla-ics.blogspot.nl/2012/07/isbjrn-4-added-cheminf-support.html">my Isbjørn stands</a>. The release of this publication is
also very timely with respect to the CHEMINF ontology, as I last week finished a transition from Google to GitHub, by moving the important
wiki pages, including one about “<a href="https://github.com/semanticchemistry/semanticchemistry/wiki/Where-is-the-CHEMINF-ontology-used%3F">Where is the CHEMINF ontology used?</a>”.
I already added Gang’s paper. A big thanks and congratulations to the PubChem team and my sincere thanks to have been able to contribute to this paper.</p>]]></content><author><name>Egon Willighagen</name></author><category term="pubchem" /><category term="rdf" /><category term="cheminf" /><category term="ontology" /><category term="doi:10.1186/S13321-015-0084-4" /><summary type="html"><![CDATA[Gang Fu and Evan Bolton have blogged about it previously, but their PubChemRDF paper is out now (doi:10.1186/s13321-015-0084-4). It very likely defines the largest collection of RDF triples using the CHEMINF ontology and I congratulate the authors with a increasingly powerful PubChem database.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/s13321-015-0084-4-graphical-abstract.gif" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/s13321-015-0084-4-graphical-abstract.gif" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">New Paper: “The ChEMBL database as linked open data”</title><link href="https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked.html" rel="alternate" type="text/html" title="New Paper: “The ChEMBL database as linked open data”" /><published>2013-05-09T00:00:00+00:00</published><updated>2013-05-09T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked.html"><![CDATA[<script src="https://d1bxh8uas1mnw7.cloudfront.net/assets/embed.js" type="text/javascript"></script>

<div class="altmetric-embed" data-badge-details="right" data-badge-type="donut" data-doi="10.1186/1758-2946-5-23" style="float: right;"></div>

<p><strong>Update</strong>: Mark wrote up a <a href="http://chembl.blogspot.co.uk/2013/05/chembl-chembl-rdf.html">blog post</a> on the RDF from the ChEMBL team itself.</p>

<p>Yesterday, the paper “The ChEMBL database as linked open data” (doi:<a href="https://doi.org/10.1186/1758-2946-5-23">10.1186/1758-2946-5-23</a>) by
Andra Waagmeester (<a href="https://twitter.com/andrawaag">@andrawaag</a>), Ola Spjuth (<a href="https://twitter.com/ola_spjuth">@ola_spjuth</a>), Peter Ansell
(<a href="http://twitter.com/p_ansell">@p_ansell</a>), Antony Williams (<a href="https://twitter.com/chemconnector">@chemconnector</a>), Valery Tkachenko,
Janna Hastings, Bin Chen (<a href="http://twitter.com/binchenindiana">@binchenindiana</a>), David J Wild (<a href="http://twitter.com/davidjohnwild">@davidjohnwild</a>),
and me appeared in the OA <a href="http://en.wikipedia.org/wiki/Journal_of_Cheminformatics">JChemInf</a> journal.</p>

<p>I am also indebted to the <a href="https://www.ebi.ac.uk/chembl/">ChEMBL</a> team (<a href="http://twitter.com/chembl">@chembl</a>) for both providing such
valuable data under a liberal Open Access license and their critical reading of the manuscript! <strong>Additionally, I would like to stress
that the ChEMBL team will create their own RDF version of ChEMBL and that this paper is not describing the version they will release.</strong></p>

<p>BTW, the <a href="https://github.com/egonw/chembl-rdf-paper/">source of the paper</a> is available from GitHub. And the
<a href="https://github.com/egonw/chembl.rdf">(original) scripts to create RDF from the MySQL dump of ChEMBL</a> are also on GitHub.</p>

<p><img src="https://media.springernature.com/lw685/springer-static/image/art%3A10.1186%2F1758-2946-5-23/MediaObjects/13321_2012_Article_469_Figa_HTML.gif" alt="" /></p>

<p>This paper outlines the <a href="http://www.jcheminf.com/content/3/1/15">RDF</a> as it has evolved from various earlier projects. The above
diagram visualizes the basic structure (red), various Linked Data resources linked to (blue) and illustrates how various ontologies are used,
such as the <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0025513">CHEMINF</a>, <a href="http://bibliontology.com/">BIBO</a>,
and <a href="http://www.jbiomedsem.com/content/1/S1/S6">CiTO</a> ontologies.</p>

<p>Additionally, various applications and links developed by co-authors are described. For example, Peter worked on the use in
<a href="http://bio2rdf.org/">Bio2RDF</a> and Bin and David on <a href="http://cheminfov.informatics.indiana.edu:8080/">Chem2Bio2RDF</a>. Andra developed
an extension for his (#altmetric) <a href="http://citedin.org/">CitedIn</a> resource, giving credit to a paper when data in it is extracted into
ChEMBL. Ola, Valery, and Antony developed a <a href="http://www.bioclipse.net/decision-support">Bioclipse Decision Support</a> extension,
which supports a nearest neighbor search in ChEMBL using <a href="http://chemspider.com/">ChemSpider</a>. Of course, Ola also hosts
<a href="http://rdf.farmbio.uu.se/chembl/snorql/">the SPARQL end point</a> of which you can monitor the uptime at the also cool
<a href="http://labs.mondeca.com/sparqlEndpointsStatus/details/farmbio-chembl.html">mondeca.com service</a>:</p>

<p><img src="/assets/images/mondecaUptime.png" alt="" /></p>

<p>(Yes, I think I have all the cool buzzwords covered in this paper. Sadly, marketing is needed nowadays as a scientist. Where is the
time that you could rant on page after page in all your domain specific jargon, not having to worry if your reader would understand
it immediately, or without a university degree…)</p>

<p>What this paper does not describe, is all the things I did with ChEMBL-RDF in the <a href="http://www.openphacts.org/">Open PHACTS</a> project
(<a href="https://twitter.com/open_phacts">@Open_PHACTS</a>), which includes the use of <a href="http://qudt.org/">QUDT</a> and the
<a href="https://github.com/egonw/jqudt">jQUDT</a> library for unit normalization outlined in <a href="http://www.bigcat.unimaas.nl/~egonw/units/">this document</a>
and the use of VoID for link sets as described in <a href="http://www.openphacts.org/specs/2012/WD-datadesc-20121019/">this document</a>.</p>]]></content><author><name>Egon Willighagen</name></author><category term="chembl" /><category term="rdf" /><category term="cito" /><category term="cheminf" /><category term="doi:10.1186/1758-2946-5-23" /><category term="doi:10.1186/1758-2946-3-15" /><category term="ontology" /><category term="doi:10.1371/JOURNAL.PONE.0025513" /><category term="justdoi:10.1186/2041-1480-1-S1-S6" /><category term="chemspider" /><category term="openphacts" /><summary type="html"><![CDATA[]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/mondecaUptime.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/mondecaUptime.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">ChEMBL 13 as RDF</title><link href="https://chem-bla-ics.linkedchemistry.info/2012/03/04/chembl-13-as-rdf.html" rel="alternate" type="text/html" title="ChEMBL 13 as RDF" /><published>2012-03-04T00:00:00+00:00</published><updated>2012-03-04T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2012/03/04/chembl-13-as-rdf</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2012/03/04/chembl-13-as-rdf.html"><![CDATA[<p><strong>Update</strong>: this work is now described in <a href="https://chem-bla-ics.linkedchemistry.info/2013/05/09/new-paper-chembl-database-as-linked.html">this paper <i class="fa-solid fa-recycle fa-xs"></i></a>.</p>

<p>Last week, ChEMBL 13 was <a href="http://chembl.blogspot.com/2012/02/chembl-13-released.html">released</a>, with even more data, data fixes,
<a href="ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_13/chembl_13_release_notes.txt">etc</a>. Since my RDF for
<a href="https://chem-bla-ics.linkedchemistry.info/2011/04/21/chembl-09-as-rdf.html">ChEMBL 09 <i class="fa-solid fa-recycle fa-xs"></i></a> my workflow has become
<a href="https://github.com/egonw/chembl.rdf/commits/master">more solid</a> and started using more common ontologies
and ontologies I just like, such as <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0025513">CHEMINF</a> and
<a href="http://www.jbiomedsem.com/content/1/S1/S6">CiTO</a>. Below is an overview of the resource types present in the RDF:
activities (almost 7M now), chemical entities, assays, targets, and documents.</p>

<p><img src="/assets/images/relations.png" alt="" /></p>

<p>The <a href="https://chem-bla-ics.linkedchemistry.info/2011/10/22/chembl-rdf-uploading-data-to-kasabi.html">data on Kasabi <i class="fa-solid fa-recycle fa-xs"></i></a> will be updated soon,
and the <a href="http://rdf.farmbio.uu.se/chembl/sparql">SPARQL end point</a> hosted by Uppsala University was updated yesterday, including the
<a href="http://rdf.farmbio.uu.se/chembl/snorql/">SNORQL frontend</a>:</p>

<p><img src="/assets/images/chemblRDF13.png" alt="" /></p>
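
<p>As a rough illustration of how such an end point could be asked for an overview of resource types, here is a minimal Python sketch.
It only builds the request URL and does not contact the server. The end point address is the one linked above and may no longer be online;
the query is generic SPARQL 1.1 (aggregates may not be supported by every server), and the helper name
<code class="language-plaintext highlighter-rouge">build_query_url</code> is made up for this example.</p>

```python
from urllib.parse import urlencode

# End point address as linked in this post; it may no longer be online.
ENDPOINT = "http://rdf.farmbio.uu.se/chembl/sparql"

# Generic SPARQL 1.1 query: count how many resources carry each rdf:type.
# It assumes nothing about the ChEMBL-RDF vocabulary itself.
TYPE_COUNT_QUERY = """
SELECT ?type (COUNT(?s) AS ?count)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?count)
"""


def build_query_url(endpoint, query):
    """Return a GET URL that passes the query in the SPARQL protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


# Hypothetical usage: print the URL, which can then be opened in a browser or HTTP client.
url = build_query_url(ENDPOINT, TYPE_COUNT_QUERY)
print(url)
```

<p>The resulting URL can be pasted into a browser; the SNORQL frontend accepts the same query text directly.</p>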

<p>The new data is not fully backwards compatible. The changes to the RDF include the use of <code class="language-plaintext highlighter-rouge">cito:citesAsDataSource</code>, more typing
using existing ontologies, e.g. with <code class="language-plaintext highlighter-rouge">cheminf:CHEMINF_000000</code> and <code class="language-plaintext highlighter-rouge">pro:PR_000000001</code> from the
<a href="http://pir.georgetown.edu/pro/">PRotein Ontology</a>.</p>

<p>A paper dedicated to the ChEMBL-RDF is in preparation. Existing use cases can be found
<a href="http://www.jbiomedsem.com/content/2/S1/S6">here</a>.</p>]]></content><author><name>Egon Willighagen</name></author><category term="chembl" /><category term="rdf" /><category term="semweb" /><category term="ontology" /><category term="cheminf" /><category term="doi:10.1371/JOURNAL.PONE.0025513" /><category term="cito" /><category term="justdoi:10.1186/2041-1480-1-S1-S6" /><category term="doi:10.1186/2041-1480-2-S1-S6" /><summary type="html"><![CDATA[Update: this work is now described in this paper .]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/relations.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/relations.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Groovy Cheminformatics 4th edition</title><link href="https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html" rel="alternate" type="text/html" title="Groovy Cheminformatics 4th edition" /><published>2012-01-15T00:00:00+00:00</published><updated>2012-01-15T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html"><![CDATA[<p>Six month was not quite the amount of time I anticipated between the third and fourth edition, but I finally managed
to upload edition 1.4.7-0 of my <a href="http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/18825420">Groovy Cheminformatics</a>
book. The first three editions sold 37 copies, including two for myself. Enough to feel supported and to continue working on it.</p>

<p>So, this new edition is again thicker, summing up to 152 pages now, which is 28 pages more than
<a href="https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition.html">the 3rd edition <i class="fa-solid fa-recycle fa-xs"></i></a>. Indeed, the table of contents
is more than half a page longer in itself, though, just barely, still fitting on four pages. In fact, I had to remove one (new)
subsection title, because it would otherwise take two further pages.</p>

<p>The new content is again a mix of sections and chapters. While writing new chapters, I find myself realizing I need to cover
more basics. Those typically get added as new sections. I did not get many feature requests, except for one email pointing out
that the text promised to explain how to interpret and handle failing atom type perception, which explains one of the new sections.
The full list of new content is:</p>

<ul>
  <li>Section 2.1.4: explaining the three flavors of atomic coordinates</li>
  <li>Extended Section 2.2: added detail about electron counts of bonds (partly in reply to this post by Rich)</li>
  <li>Chapter 5 “Protein and DNA”: four pages, mostly about PDB files, and the matching CDK data structure</li>
  <li>Chapter 6 “IChemObjectBuilders”: four pages explaining the four alternative builders CDK 1.4.7 has</li>
  <li>Section 7.8: a new section with recipes on how to post-process read input, for now discussing MDL molfiles only. It talks about what information is present in the file format, and what steps must be undertaken to add missing information</li>
  <li>Section 8.2.4 “No atom type perceived?!”</li>
  <li>Section 11.4: describes how to depict aromatic rings</li>
  <li>Section 11.5: describes how to change the background color of depictions</li>
  <li>Section 13.4: explains how to calculate the Van der Waals volume of molecules</li>
  <li>Section 18.1.3: discussing the API improvement in the iterating readers</li>
  <li>Appendix C: a list of all descriptors provided by the CDK</li>
  <li>Appendix D: a list of file formats known by the CDK, indicating which have readers and writers</li>
</ul>

<p>On top of that, I improved other bits of the book too, such as the resolution of the depictions of molecules,
as well as those of various diagrams. Also the number of scripts has seriously gone up, from 94 to 134!</p>

<p>Appendix C is a prelude to a chapter I am already writing, but have not finished yet: a chapter about
descriptor calculation. But since I just started a new post-doc position, it may take another six months
for that chapter to make it into print.</p>

<p>The paperback is <a href="http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/18825420">available from Lulu.com</a>,
an on-demand publisher, as well as <a href="http://www.lulu.com/product/ebook/groovy-cheminformatics-with-the-chemistry-development-kit/18825437">this ebook version</a>.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="cdkbook" /><category term="java" /><category term="cheminf" /><summary type="html"><![CDATA[Six month was not quite the amount of time I anticipated between the third and fourth edition, but I finally managed to upload edition 1.4.7-0 of my Groovy Cheminformatics book. The first three editions sold 37 copies, including two for myself. Enough to feel supported and to continue working on it.]]></summary></entry><entry><title type="html">Groovy Cheminformatics 3rd edition</title><link href="https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition.html" rel="alternate" type="text/html" title="Groovy Cheminformatics 3rd edition" /><published>2011-07-31T00:00:00+00:00</published><updated>2011-07-31T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2011/07/31/groovy-cheminformatics-3rd-edition.html"><![CDATA[<p><strong>Update</strong>: the <a href="https://chem-bla-ics.linkedchemistry.info/2012/01/15/groovy-cheminformatics-4th-edition.html">fourth edition <i class="fa-solid fa-recycle fa-xs"></i></a> is out.</p>

<p>I am starting to get the hang of this publishing soon, publishing often thing, and
<a href="http://www.lulu.com/product/paperback/groovy-cheminformatics-with-the-chemistry-development-kit/16378378">just uploaded</a>
edition 1.4.1-0 of the <a href="https://chem-bla-ics.linkedchemistry.info/2011/02/06/groovy-cheminformatics.html">Groovy Cheminformatics <i class="fa-solid fa-recycle fa-xs"></i></a> book.
The cover is the same (with one typo fix), and the content is 20 pages thicker. True, six of those pages are isotope
masses of all natural isotopes. That leaves 14 pages with this new content:</p>

<ul>
  <li>Section 2.7 on line notations with 2.7.1 about reading and writing SMILES</li>
  <li>Section 6.3 about Sybyl (mol2) atom types</li>
  <li>Section 7.4 on atom numbering with 7.4.1 on Morgan atom numbers, and 7.4.2 on InChI atom numbers</li>
  <li>Chapter 9 on molecule depiction with the new rendering code, with
    <ul>
      <li>Section 9.1 on drawing molecules,</li>
      <li>Section 9.2 on rendering parameters, and</li>
      <li>Section 9.3 on the generator API and how to add custom content</li>
    </ul>
  </li>
  <li>Section 11.4 on calculating aromaticity</li>
  <li>Appendix A.2 listing all Sybyl atom types</li>
  <li>Appendix B listing all naturally occurring isotopes</li>
</ul>

<p>Feature requests are most welcome.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cdk" /><category term="cdkbook" /><category term="java" /><category term="cheminf" /><summary type="html"><![CDATA[Update: the fourth edition is out.]]></summary></entry></feed>