<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator>
  <link href="https://chem-bla-ics.linkedchemistry.info/feed.xml" rel="self" type="application/atom+xml"/>
  <link href="https://chem-bla-ics.linkedchemistry.info" rel="alternate" type="text/html"/>
  <updated>2026-04-11T11:30:50+00:00</updated>
  <id>https://chem-bla-ics.linkedchemistry.info/archive.xml</id>
  <title type="html">chem-bla-ics</title>
  <subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle>
  <author>
    <name>Egon Willighagen</name>
    <uri>https://orcid.org/0000-0001-7542-0286</uri>
  </author>

  
  <entry>
    <title type="html">SWAT4HCLS 2026</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/04/04/swat4hcls-2026.html" rel="alternate" type="text/html" title="SWAT4HCLS 2026"/>
    <published>2026-04-04T16:54:00+00:00</published>
    <updated>2026-04-04T16:54:00+00:00</updated>
    <id>https://doi.org/10.59350/bmxve-vry14</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/04/04/swat4hcls-2026.html">
      <![CDATA[ <p>A bit over a week ago, <a href="https://www.swat4ls.org/workshops/amsterdam2026/">SWAT4HCLS 2026</a> took place, with the matching
<a href="https://www.swat4ls.org/workshops/amsterdam2026/swat4hcls-biohackathon-2026/">biohackathon</a> on Thursday (see
<a href="https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week.html">this post</a>).
I attempted a bit of live coverage on mastodon: <a href="https://social.edu.nl/@egonw/116285060969709401">day 1</a> and
<a href="https://social.edu.nl/@egonw/116289579219485790">day 2</a>. But it seems the semantic web community interested
in SWAT4HCLS has not found the fediverse yet. So, make sure to check
<a href="https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/">this full list of abstracts</a>.</p>

<p>The meeting consisted of <a href="https://www.swat4ls.org/workshops/amsterdam2026/programme/keynotes/">four keynotes</a>, each
of which was quite interesting. Cornet gave a nice historic perspective of the venue and of the semantic web field,
which is a great way to welcome the participants to your institute. The talk also touched on the main theme
of the meeting: clinical data. It is a long-standing (and important) research field, but progress is slow.
Cornet <a href="https://social.edu.nl/@egonw/116283216644714695">commented</a> along the lines that <em>we have been talking
about reasoning over patient data for more than twenty years, but we still have not solved it</em>.</p>

<p>The problem is really not only privacy, but also simply the lack of a common language, as
<a href="https://qlever.scholia.wiki/orcid/0000-0003-3248-7899">Sabine Österle</a> explained
about sharing health/patient data in Switzerland, across 26 cantons and legislations and 4 national languages.
Another issue is more technical: running SPARQL across hospitals involves more than just aligning ontologies;
it also requires (too much) fiddling with SPARQL queries.</p>

<p>There was plenty of other content too, however. For example, I was pleasantly
<a href="https://social.edu.nl/@egonw/116284409447761902">surprised</a> by the
<a href="https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/#RDF4RiskAssessment_Toolkit_A_Toolkit_for_Converting_Tabular_Research_Data_to_FAIR_RDF_for_Risk_Assessment_and_Life_Sciences">RDF4RiskAssessment</a>
work, the <a href="https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/#RO-Crates_for_BioImaging">RO-Crates for BioImaging</a>,
and <a href="https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/#FDPcrawleR_A_Lightweight_R_Framework_for_Auditing_FAIR_Data_Points_and_FAIR_Virtual_Platforms">FDPcrawleR</a>.
All these projects have direct links to research ongoing in <a href="https://www.maastrichtuniversity.nl/research/translational-genomics">our TGX team</a>.</p>

<p><a href="https://qlever.scholia.wiki/orcid/0000-0003-1213-6776">Hannah Bast</a> gave the second keynote of the first day, about <a href="https://qlever.dev/">QLever</a>
(doi:<a href="https://doi.org/10.1145/3132847.3132921">10.1145/3132847.3132921</a>). She talked about some of the recent improvements,
something we really <a href="https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html">needed for Scholia</a>.
She showed a technical approach to make federated queries faster, though it currently only works between endpoints
that both run QLever. One thing I am looking forward to is playing with the notion of
<a href="https://docs.qlever.dev/materialized-views/?h=materialize">materialized views</a>, but the biohackathon
was too short to get around to that on the Thursday.</p>

<p>The second day kicked off with a keynote by <a href="https://qlever.scholia.wiki/orcid/0000-0002-3469-4923">Janna Hastings</a>,
whose work I greatly admire. I was not disappointed today, and she showed the
<a href="https://www.bciontology.org/">Behaviour Change Intervention Ontology</a> and <a href="https://chebifier.hastingslab.org/">Chebifier</a>
(doi:<a href="https://doi.org/10.1039/D3DD00238A">10.1039/D3DD00238A</a>).</p>

<p>The last talk I want to mention in the blog is by two researchers working with Michel Dumontier. They
<a href="https://www.swat4ls.org/workshops/amsterdam2026/programme/accepted-submissions/#Embedding-based_Deduplication_of_Knowledge_Graphs_using_Graph_Neural_Networks">presented</a>
a study about deduplication in/of knowledge graphs. This is something I want to read in more detail.</p>

      <h4>References</h4>
      <ul>
      
      
        <li><a href="https://doi.org/10.1039/D3DD00238A">10.1039/D3DD00238A</a></li>
      
        <li><a href="https://doi.org/10.1145/3132847.3132921">10.1145/3132847.3132921</a></li>
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="swat4ls"/><category term="mastodon"/><category term="justdoi:10.1039/D3DD00238A"/><category term="justdoi:10.1145/3132847.3132921"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/04/04/swat4hcls-2026.html">
      <![CDATA[ A bit over a week ago, SWAT4HCLS 2026 took place, with the matching biohackathon on Thursday (see this post). I attempted a bit of live coverage on mastodon: day 1 and day 2. But it seems the semantic web community interested in SWAT4HCLS has not found the fediverse yet. So, make sure to check this full list of abstracts. ]]>
    </summary></entry>
  
  <entry>
    <title type="html">Using compact identifiers in project reports</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/03/29/using-compact-identifiers-in-project-reports.html" rel="alternate" type="text/html" title="Using compact identifiers in project reports"/>
    <published>2026-03-29T00:00:00+00:00</published>
    <updated>2026-03-29T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/re9j2-hk972</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/03/29/using-compact-identifiers-in-project-reports.html">
      <![CDATA[ <p>This document describes how you can improve the FAIR-ness of your project report by using
compact identifiers. Of course, it can be applied to any other document too, and has been used
in, for example, journal articles and online documentation already.</p>

<p>Compact identifiers find a balance between compactness in writing and being a persistent, unique,
and global identifier. A compact identifier “is a string constructed by concatenating a namespace prefix, a separating colon,
and a locally unique identifier (LUI)” (doi:<a href="https://doi.org/10.1038/sdata.2018.29">10.1038/sdata.2018.29</a>).
For example, for proteins it can represent the PDB structure <a href="https://bioregistry.io/pdb:2gc4">2gc4</a> as
<em>pdb:2gc4</em>. There is a clear similarity with the SciCrunch <a href="https://rrid.site/">Research Resource Identifiers</a>
(RRIDs) as used by several journals, like
<a href="https://elifesciences.org/inside-elife/ff683ecc/rrids-how-did-we-get-here-and-where-are-we-going">eLife</a>
(doi:<a href="https://doi.org/10.1007/s12021-015-9284-3">10.1007/s12021-015-9284-3</a>).</p>

<p>When the prefixes are defined by community standards, then a compact identifier can be resolved.
There currently are multiple providers of prefix files (doi:<a href="https://doi.org/10.1038/sdata.2018.29">10.1038/sdata.2018.29</a>),
including Identifiers.org (doi:<a href="https://doi.org/10.1093/bioinformatics/btaa864">10.1093/bioinformatics/btaa864</a>)
and Bioregistry (doi:<a href="https://doi.org/10.1038/s41597-022-01807-3">10.1038/s41597-022-01807-3</a>).
The Bioregistry has an overview of more than twenty registries of prefixes and their metadata
(doi:<a href="https://doi.org/10.1038/s41597-022-01807-3">10.1038/s41597-022-01807-3</a>). The metadata commonly
includes information on the URL pattern for each identifier. Often this is more than one pattern, as
there may be several databases with information for the same identifier.</p>

<p>It is the URL pattern in the database that allows services to <em>resolve</em> the compact identifier
into a link to a database. The above registries correspond to three existing <em>resolvers</em> that will take a compact
identifier as part of a resolver URL and redirect to the database with the record matching
that identifier:</p>

<ul>
  <li>Name-to-Thing (N2T): <a href="https://n2t.net/">https://n2t.net/</a></li>
  <li>Identifiers.org: <a href="https://identifiers.org/">https://identifiers.org/</a></li>
  <li>The Bioregistry: <a href="https://bioregistry.io/">https://bioregistry.io/</a></li>
</ul>

<p>Each of these URLs can be extended with a compact identifier. For example, the PDB entry
mentioned earlier or a taxon record from the Catalogue of Life:</p>

<ul>
  <li><a href="https://bioregistry.io/pdb:2gc4">https://bioregistry.io/pdb:2gc4</a></li>
  <li><a href="https://identifiers.org/col:6MB3T">https://identifiers.org/col:6MB3T</a> (<code class="language-plaintext highlighter-rouge">col</code> is the prefix for the Catalogue of Life)</li>
</ul>
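
<p>To make the pattern concrete, here is a minimal Python sketch of how such a resolver URL is put together.
The function names and the <code class="language-plaintext highlighter-rouge">RESOLVERS</code> table are made up for this
illustration. The sketch only checks the <em>prefix:LUI</em> shape and does not verify that a resolver actually knows the prefix.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Resolver base URLs as listed above; the dictionary keys are hypothetical labels.
RESOLVERS = {
    "n2t": "https://n2t.net/",
    "identifiers.org": "https://identifiers.org/",
    "bioregistry": "https://bioregistry.io/",
}


def split_compact_identifier(curie):
    """Split 'prefix:LUI' at the first colon into (prefix, lui)."""
    prefix, sep, lui = curie.partition(":")
    if not sep or not prefix or not lui:
        raise ValueError("not a compact identifier: " + repr(curie))
    return prefix, lui


def resolver_url(curie, resolver="bioregistry"):
    """Append the compact identifier to the base URL of the chosen resolver."""
    split_compact_identifier(curie)  # validates the shape only
    return RESOLVERS[resolver] + curie


print(resolver_url("pdb:2gc4"))
print(resolver_url("col:6MB3T", resolver="identifiers.org"))
</code></pre></div></div>

<p>The two calls print the same two URLs as in the list above.</p>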

<h2 id="why-use-in-reports">Why use in reports?</h2>

<p>Using persistent identifiers is generally accepted as a good practice that benefits science
and has been part of the ideas of FAIR data (doi:<a href="https://doi.org/10.1038/sdata.2016.18">10.1038/sdata.2016.18</a>)
and of Open Science. Compact
identifiers make it easy to be precise in reports about what things the reports talk about: they
are relatively short but very precise at the same time. This also has the benefit that they
are much easier to reuse than labels of things and concepts, which intrinsically have a certain
level of uncertainty; a database entry commonly has a very specific meaning.</p>

<h2 id="examples-uses">Example uses</h2>

<p>Compact identifiers can be used in two ways. The simplest is to just put the
compact identifier as plain text in the document, possibly in parentheses
(with the compact identifier highlighted here in bold):</p>

<ul>
  <i>This report is only about the experimental data of the human (<b>NCBITaxon:9606</b>) cell lines.</i>
</ul>

<p>Or:</p>

<ul>
  <i>We found that BRCA1 (<b>ensembl:ENSG00000012048</b>) played an important role.</i>
</ul>

<p>Alternatively, you can add a hyperlink with one of the resolvers, for example, Identifiers.org:</p>

<ul>
  <i>We found that BRCA1 (<b><a href="https://identifiers.org/ensembl:ENSG00000012048">ensembl:ENSG00000012048</a></b>) played an important role.</i>
</ul>
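
<p>A side effect of the plain-text form is that a script can find the identifiers in a report again.
The following Python sketch shows one way this could be done. The prefix allow-list, the regular expression,
and the function names are assumptions for this illustration, and a real tool would take the prefixes from a registry.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Hypothetical allow-list of prefixes for this sketch.
KNOWN_PREFIXES = ("NCBITaxon", "ensembl", "pdb")

# A known prefix, a colon, and a run of letters, digits, or underscores.
PATTERN = re.compile(r"\b(" + "|".join(KNOWN_PREFIXES) + r"):([A-Za-z0-9_]+)")


def find_compact_identifiers(text):
    """Return the compact identifiers found in text, in order of appearance."""
    return [match.group(0) for match in PATTERN.finditer(text)]


def link_table(text, base="https://identifiers.org/"):
    """Map each compact identifier found in text to a resolver URL."""
    return {curie: base + curie for curie in find_compact_identifiers(text)}


sentence = "We found that BRCA1 (ensembl:ENSG00000012048) played an important role."
for curie, url in link_table(sentence).items():
    print(curie, url)
</code></pre></div></div>

<p>Whether the chosen resolver recognizes every prefix in the allow-list is not checked here.</p>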

<h3 id="compact-identifiers-for-material-identifiers">Compact identifiers for material identifiers</h3>

<p>The European Registry of Materials proposes to use the compact identifier for their
ERM identifiers (doi:<a href="https://doi.org/10.1186/s13321-022-00614-7">10.1186/s13321-022-00614-7</a>):</p>

<ul>
  <i>
    For example, the NanoSolveIT project registered a material with the ERM00000001 identifier.
    The full Uniform Resource Identifier (URI) for this compound is
    https://nanocommons.github.io/identifiers/registry#ERM00000001 which is too long to be used
    in documentation. The corresponding compact identifier <b>erm:ERM00000001</b> is easy to use in written
    material, analogous to the use of Protein Data Bank (PDB) identifiers for proteins in journals.
  </i>
</ul>

<h3 id="compact-identifiers-for-citation-intent-annotations">Compact identifiers for citation intent annotations</h3>

<p>The compact identifier has also been used as the method to include citation intentions in journal
articles (doi:<a href="https://doi.org/10.1186/s13321-020-00448-1">10.1186/s13321-020-00448-1</a>,
compact identifier here highlighted in bold):</p>

<ul>
  <i>
    We take advantage here of the ability to add notes to full form [..] references in bibliographies.
    These are referred to as bibnotes. The content of the note will be strictly formatted: it will use
    the syntax [<b>cito:usesMethodIn</b>] and formatted in bold. That is, the bibnote starts with the
    [ character, followed by one of the CiTO types, and ends with the ] character. If you wish to
    provide more than one annotation, you can repeat this syntax, separated by one or more spaces,
    for example: [<b>cito:usesMethodIn</b>] [<b>cito:citeAsAuthority</b>].
  </i>
</ul>

<p>Note that in this use, the square brackets and bold typeface are used to make the annotations easier
to recognize. Also, note that this document uses this approach to indicate the intention of
why the cited articles are cited.</p>
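
<p>Because the bibnote syntax is strict, it is easy to read back with a script. Here is a small Python sketch,
assuming the note is available as plain text (so without the bold markup). The function name and the regular expression
are made up for this illustration and are not taken from the cited article.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import re

# Matches one bracketed annotation such as [cito:usesMethodIn].
BIBNOTE = re.compile(r"\[cito:([A-Za-z]+)\]")


def citation_intents(note):
    """Return the CiTO types found in a bibnote, in order of appearance."""
    return BIBNOTE.findall(note)


print(citation_intents("[cito:usesMethodIn] [cito:citeAsAuthority]"))
</code></pre></div></div>

<p>For the example note this prints a list with the two types, <em>usesMethodIn</em> and <em>citeAsAuthority</em>.
The sketch does not check that a type actually exists in CiTO.</p>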

<h2 id="conclusion">Conclusion</h2>

<p>This document described what a compact identifier is, how it helps link to online
databases, and how it can be used in written reports as plain text, optionally
hyperlinked with one of the compact identifier resolvers.</p>

<h3 id="acknowledgments">Acknowledgments</h3>

<p>I thank <a href="https://n2t.net/github:tabbassidaloii">github:tabbassidaloii</a>,
<a href="https://n2t.net/github:cthoyt">github:cthoyt</a>, and
<a href="https://n2t.net/github:larsgw">github:larsgw</a> for their comments on
<a href="https://github.com/egonw/compact-ids-in-reports">this GitHub repo</a>.</p>

      <h4>References</h4>
      <ul>
      
      
      
        
      
        
      
        
      
        
      
        
      
        
      
        
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="identifier"/><category term="semweb"/><category term="cito"/><category term="cito:usesMethodIn,includesQuotationFrom:10.1038/sdata.2018.29"/><category term="cito:obtainsBackgroundFrom:10.1007/s12021-015-9284-3"/><category term="cito:usesMethodIn:10.1093/bioinformatics/btaa864"/><category term="mycito:usesMethodIn:10.1038/S41597-022-01807-3"/><category term="cito:obtainsBackgroundFrom:10.1038/sdata.2016.18"/><category term="mycito:includesQuotationFrom:10.1186/S13321-022-00614-7"/><category term="mycito:includesQuotationFrom:10.1186/S13321-020-00448-1"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/03/29/using-compact-identifiers-in-project-reports.html">
      <![CDATA[ This document describes how you can improve the FAIR-ness of your project report by using compact identifiers. Of course, it can be applied to any other document too, and has been used in, for example, journal articles and online documentation already. ]]>
    </summary></entry>
  
  <entry>
    <title type="html">SWAT4HCLS 2026 Amsterdam this week</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week.html" rel="alternate" type="text/html" title="SWAT4HCLS 2026 Amsterdam this week"/>
    <published>2026-03-22T00:00:00+00:00</published>
    <updated>2026-03-22T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/mztnx-y1770</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week.html">
      <![CDATA[ <p>Tomorrow, <a href="https://www.swat4ls.org/workshops/amsterdam2026/">SWAT4HCLS 2026</a> will start, again in Amsterdam.
The first SWAT4LS I attended <a href="https://chem-bla-ics.linkedchemistry.info/2009/11/21/swat4ls-linking-open-drug-data-to.html">was also in Amsterdam</a>, and I <a href="https://chem-bla-ics.linkedchemistry.info/2016/12/18/my-swat4ls-poster-about-enanomapper.html">also attended</a> the second meeting in Amsterdam. And I was in <a href="https://www.swat4ls.org/workshops/cambridge2015/index.php">Cambridge</a> (see
<a href="https://chem-bla-ics.blogspot.com/2015/12/swat4ls-in-cambridge.html">this post</a>),
<a href="https://www.swat4ls.org/workshops/antwerp2018/">Antwerp</a> (no post), and at least one of the two
<a href="https://www.swat4ls.org/workshops/leiden2024/">Leiden</a> meetings (also no posts, it seems).</p>

<p>I am looking forward to meeting old friends, new friends (some of whom I never met in person), and
recent collaborators (whom I never met in person).
For those who will not be in Amsterdam, you can follow the meeting on social media with
the <a href="https://hashtags-hub.toolforge.org/swat4hcls">hashtag #swat4hcls</a>. And there is also
<a href="https://fediwall.biohackrxiv.org/">this BioHackrXiv Fediwall</a>, for those in the
<a href="https://en.wikipedia.org/wiki/Fediverse">fediverse</a>.</p>

<h3 id="scholia-demo">Scholia demo</h3>

<p>I will give a demo to update people on the work in the <a href="https://github.com/wdscholia/scholia">Scholia</a> project with
Daniel Mietchen, Peter Patel-Schneider, Konrad Linden, Johannes Kalmbach,
Lars Willighagen, Wolfgang Fahl, and Hannah Bast (also keynote in Amsterdam)
to <a href="https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html">update the SPARQL queries</a>
we use to visualize data in <a href="https://www.wikidata.org/">Wikidata</a> to SPARQL 1.1 so that it can run on
<a href="https://qlever.dev/">QLever</a>.
The abstract can be <a href="https://commons.wikimedia.org/wiki/File:Scholia_2026_Compliance_with_SPARQL_1.1.pdf">found in Wikimedia Commons</a>.</p>

<p>This was the outcome of many years of figuring out how to ensure Scholia could keep working. The
<a href="https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2026-02-17/Technology_report">Wikidata RDF graph split</a>
has given us many headaches, so many that when it became clear just before Christmas that it could
be possible to survive the split, I was so happy that I realized I wanted to share this news. So, we teamed
up and wrote this demonstration contribution abstract. Thanks to everyone who made this happen!
Just to be clear, we are not done yet. The system is currently only running outside the Wikimedia Foundation
platforms.</p>

<p>One of the reviewer comments requested <a href="https://qlever.scholia.wiki/event/Q138033585">a Scholia page for the meeting</a>.
It has not been updated for the accepted speakers, but you can look at <a href="https://qlever.scholia.wiki/event-series/Q56846035">pages for past meetings</a>
to get an idea what you will find.</p>

<h3 id="swat4hcls-biohackathon-2026">SWAT4HCLS Biohackathon 2026</h3>

<p>There will also be <a href="https://www.swat4ls.org/workshops/amsterdam2026/swat4hcls-biohackathon-2026/">a biohackathon again</a>,
of course, with the <a href="https://index.biohackrxiv.org/tag/SWAT4HCLS26">option for BioHackRxiv reports</a>.
There are already <a href="https://www.swat4ls.org/workshops/amsterdam2026/swat4hcls-biohackathon-2026/">several pitches</a>,
including one that I submitted about Scholia.</p>

      <h4>References</h4>
      <ul>
      
      
      
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="rdf"/><category term="sparql"/><category term="swat4ls"/><category term="wikidata"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week.html">
      <![CDATA[ Tomorrow, SWAT4HCLS 2026 will start, again in Amsterdam. The first SWAT4LS I attended was also in Amsterdam, and I also attended the second meeting in Amsterdam. And I was in Cambridge (see this post), Antwerp (no post), and at least one of the two Leiden meetings (also no posts, it seems). ]]>
    </summary></entry>
  
  <entry>
    <title type="html">CDK 2.12</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/03/08/cdk-2.12.html" rel="alternate" type="text/html" title="CDK 2.12"/>
    <published>2026-03-08T00:00:00+00:00</published>
    <updated>2026-03-08T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/gw9at-srp84</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/03/08/cdk-2.12.html">
      <![CDATA[ <p><a href="https://github.com/cdk/cdk/releases/tag/cdk-2.12">Version 2.12</a> of the <a href="https://cdk.github.io/">Chemistry Development Kit</a> has been released.
It is the last release with contributions by <a href="https://www.nwo.nl/en/projects/osf232097">our NWO Open Science grant</a>.
This release adds some nice new APIs:</p>

<ul>
  <li>harmonize hydrogens to various states: depiction, stereo, minimal, and unsafe (useful for depictions)</li>
  <li>generate wedge bonds based on coordinates and stereochemistry</li>
  <li>more Markush / RGroup support</li>
  <li>atropisomers via CXSMILES</li>
  <li>sugar extraction</li>
</ul>

<p>I also updated the following libraries/tools to use CDK 2.12:</p>

<ul>
  <li><a href="https://github.com/enanomapper/nanojava/releases/tag/nanojava-2.0.6">NanoJava 2.0.6</a></li>
  <li><a href="https://github.com/egonw/bacting/releases/tag/bacting-1.0.10">Bacting 1.0.10</a> (and the Python pyBacting will follow asap)</li>
</ul>

      <h4>References</h4>
      <ul>
      
      
        <li><a href="https://doi.org/10.5281/zenodo.18850648">10.5281/zenodo.18850648</a></li>
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="cdk"/><category term="openscience"/><category term="justdoi:10.5281/zenodo.18850648"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/03/08/cdk-2.12.html">
      <![CDATA[ Version 2.12 of the Chemistry Development Kit has been released. It is the last release with contributions by our NWO Open Science grant. This release adds some nice new APIs: ]]>
    </summary></entry>
  
  <entry>
    <title type="html">Rescuing Scholia #3: We did it!</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html" rel="alternate" type="text/html" title="Rescuing Scholia #3: We did it!"/>
    <published>2026-02-28T00:00:00+00:00</published>
    <updated>2026-02-28T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/kd793-2fe02</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html">
      <![CDATA[ <p>It was not a setup when I openly <a href="https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html">wondered if we would be able to rescue Scholia in time</a>.
I honestly did not know. Three weeks and some serious hacking by an international team later, <a href="https://chem-bla-ics.linkedchemistry.info/2025/12/31/rescuing-scholia-2-getting-close.html">I was more optimistic</a>.
Actually, just before Christmas, we started writing a <a href="https://www.swat4ls.org/">SWAT4HCLS 2026</a> demonstration abstract. This was accepted and
you can read the <em>Scholia 2026: Compliance with SPARQL 1.1</em> preprint <a href="https://github.com/WolfgangFahl/ScholiaGraphSplitPaper">here</a> and
<a href="https://commons.wikimedia.org/wiki/File:Scholia_2026_Compliance_with_SPARQL_1.1.pdf">here</a>.
This paper describes the work that had to be done, and I am deeply grateful to everyone who contributed with smaller or
bigger contributions (Daniel, Peter, Konrad, Johannes, Lars, Wolfgang, Hannah).
I am merely first author for the demo, and just another contributor to the long series of patches, in a
<a href="https://github.com/WDscholia/scholia/pull/2715">branch started by Prof. Hannah Bast</a>.</p>

<p>The work actually started long before that, with the <em>Robustifying Scholia</em> grant (see doi:<a href="https://doi.org/10.3897/rio.5.e35820">10.3897/rio.5.e35820</a>),
where we explored alternatives. The Wikidata graph (RDF) split has been a long time coming, and I can recommend
<a href="https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2026-02-17/Technology_report">this recent The Signpost article</a>
by <a href="https://disobey.net/@Bluerasberry">Lane</a> for a good overview. So, this would not have been possible without
<a href="https://github.com/WDscholia/scholia/graphs/contributors">the many people who contributed over the years</a>.
But this last sprint really made a difference.</p>

<p>The developments of the QLever software in the past year are very important, and the SPARQL endpoint we run now is updated live,
just like we knew from the Wikidata Query Service (WDQS). Recent improvements allowed us to replace all the Wikidata- and Blazegraph-specific
aspects of the SPARQL queries, and good discussions led to pragmatic approaches to keep the localization features
Scholia had for displaying query results from Wikidata.</p>

<p>The work is not completed, however. All queries are SPARQL 1.1 now, but some can still be further optimized, and some still
need some fixing. For example, I still spot some QIDs here and there, instead of the localized labels that should be shown instead.
Also, we are actively looking into getting everything running again on WMF servers (see <a href="https://github.com/WDscholia/scholia/issues/2766">this overview issue</a>),
so that <em>scholia.toolforge.org</em> works again.</p>

<p>For now, however, please use <a href="https://qlever.scholia.wiki/">qlever.scholia.wiki</a>.</p>

      <h4>References</h4>
      <ul>
      
      
      
        <li><a href="https://doi.org/10.3897/RIO.5.E35820">10.3897/RIO.5.E35820</a></li>
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="scholia"/><category term="sparql"/><category term="swat4ls"/><category term="doi:10.3897/RIO.5.E35820"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html">
      <![CDATA[ It was not a setup when I openly wondered if we would be able to rescue Scholia in time. I honestly did not know. Three weeks and some serious hacking by an international team later, I was more optimistic. Actually, just before Christmas, we started writing a SWAT4HCLS 2026 demonstration abstract. This was accepted and you can read the Scholia 2026: Compliance with SPARQL 1.1 preprint here and here. This paper describes the work that had to be done, and I am deeply grateful to everyone who contributed with smaller or bigger contributions (Daniel, Peter, Konrad, Johannes, Lars, Wolfgang, Hannah). I am merely first author for the demo, and just another contributor to the long series of patches, in a branch started by Prof. Hannah Bast. ]]>
    </summary></entry>
  
  <entry>
    <title type="html">Where do the WikiPathways come from?</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/02/22/where-do-the-wikipathways-come-from.html" rel="alternate" type="text/html" title="Where do the WikiPathways come from?"/>
    <published>2026-02-22T00:00:00+00:00</published>
    <updated>2026-02-22T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/6smn2-ah530</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/02/22/where-do-the-wikipathways-come-from.html">
      <![CDATA[ <p><a href="https://en.wikipedia.org/wiki/WikiPathways">WikiPathways</a> was <a href="https://qlever.scholia.wiki/topic/Q7999828#earliest-published-works">founded in 2008</a>,
in the year I left Wageningen (and we Nijmegen) and moved to Uppsala, Sweden. When we decided to move back to The Netherlands in 2012, I got the opportunity
to join the Department of Bioinformatics (BiGCaT) and work on Open PHACTS. I had visited the group in March 2011 because I had a COST action
workshop near Maastricht (about nanoQSAR) and the bioinformatics group did <a href="https://wikipathways.org/">WikiPathways</a>.</p>

<p>When I joined, there were already hundreds of pathways, originating from various collaborations (see below).
Around the winter break, the question came up of who the people are who have drawn all these pathways. And on the new website
this is not actually that easy to see. You can <a href="https://www.wikipathways.org/browse/table.html">browse all pathways</a>, or look up
<a href="https://www.wikipathways.org/browse/authors.html">author profiles</a>, but not all authors have done the same amount of work.
Moreover, at various points in time, batches of pathways from those collaborators were added. Often, these were added
by the <code class="language-plaintext highlighter-rouge">MaintBot</code> account, which is routinely hidden, and then the author who shows up as first author is not even
the original author. And then we still have a lot of homology-converted pathways. These are pathways translated to
some species from a model species. You can find them in <a href="https://github.com/wikipathways/wikipathways-homology">this repository</a>.</p>

<p>But nowadays I do a lot in the WikiPathways project, among other things generating the RDF and maintaining the code that does so.
And I realized that we have author information in the RDF too (created by <a href="https://orcid.org/0000-0001-5706-2163">Alex Pico</a>).
So, the idea came up to see who the “first authors” are of the WikiPathways (mind the <em>MaintBot</em> issue), and what we know
about them. Many already had their ORCID profiles linked from their profile pages, making it easy to look up their
expertise.</p>

<p>Now, that was in January. But it turned out that the author information in the RDF worked fine in the <code class="language-plaintext highlighter-rouge">.ttl</code> file
of a single pathway, but that the <em>series ordinal</em> (e.g. 1 for being first author) was bound to the author, and
a SPARQL query would not be able to figure out on which pathways someone was first author. I fixed this somewhere
in January, so in the <a href="https://github.com/wikipathways/wikipathways-help/discussions/221">February 10 release</a> the
improved data model was available.</p>

<p>Allow me to show what is now possible, with a few SPARQL queries. First, to list the authors of a pathway, use
<a href="https://edu.nl/q9txc">this template</a> for <code class="language-plaintext highlighter-rouge">WP10</code>:</p>

<div class="language-sparql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">PREFIX</span><span class="w"> </span><span class="nn">dc</span><span class="o">:</span><span class="w">    </span><span class="nn">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class="w">
</span><span class="k">PREFIX</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="w">  </span><span class="nn">&lt;http://xmlns.com/foaf/0.1/&gt;</span><span class="w">
</span><span class="k">PREFIX</span><span class="w"> </span><span class="nn">wpq</span><span class="o">:</span><span class="w">   </span><span class="nn">&lt;http://www.wikidata.org/prop/qualifier/&gt;</span><span class="w">
</span><span class="k">PREFIX</span><span class="w"> </span><span class="nn">pav</span><span class="o">:</span><span class="w">   </span><span class="nn">&lt;http://purl.org/pav/&gt;</span><span class="w">

</span><span class="k">SELECT</span><span class="w"> </span><span class="nv">?pathway</span><span class="w"> </span><span class="nv">?version</span><span class="w"> </span><span class="nv">?ordinal</span><span class="w"> </span><span class="nv">?author_</span><span class="w"> </span><span class="nv">?name</span><span class="w"> </span><span class="nv">?orcid</span><span class="w"> </span><span class="nv">?page</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="k">VALUES</span><span class="w"> </span><span class="nv">?pathway</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nn">&lt;https://identifiers.org/wikipathways/WP10&gt;</span><span class="w"> </span><span class="p">}</span><span class="w">
  </span><span class="nv">?author_</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="ss">Person</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">wp</span><span class="o">:</span><span class="ss">hasAuthorship</span><span class="w"> </span><span class="nv">?authorship</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?authorship</span><span class="w"> </span><span class="err">^</span><span class="nn">wp</span><span class="o">:</span><span class="ss">hasAuthorship</span><span class="w"> </span><span class="nv">?pathway</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">wpq</span><span class="o">:</span><span class="ss">series_ordinal</span><span class="w"> </span><span class="nv">?ordinal</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?pathway</span><span class="w"> </span><span class="nn">pav</span><span class="o">:</span><span class="ss">hasVersion</span><span class="w"> </span><span class="nv">?pathway_</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?pathway_</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">wp</span><span class="o">:</span><span class="ss">Pathway</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nn">wp</span><span class="o">:</span><span class="ss">isAbout</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="nn">gpml</span><span class="o">:</span><span class="ss">version</span><span class="w"> </span><span class="nv">?version</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="k">OPTIONAL</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nv">?author_</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="ss">homepage</span><span class="w"> </span><span class="nv">?page</span><span class="w"> </span><span class="p">}</span><span class="w">
  </span><span class="k">OPTIONAL</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nv">?author_</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="ss">name</span><span class="w"> </span><span class="nv">?name</span><span class="w"> </span><span class="p">}</span><span class="w">
  </span><span class="k">OPTIONAL</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nv">?author_</span><span class="w"> </span><span class="nn">dc</span><span class="o">:</span><span class="ss">identifier</span><span class="w"> </span><span class="nv">?orcid</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ASC</span><span class="p">(</span><span class="nv">?pathway</span><span class="p">)</span><span class="w"> </span><span class="k">ASC</span><span class="p">(</span><span class="nv">?ordinal</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>We can see who the 8 people are who contributed to this pathway (we cannot actually see here what they contributed), and many
authors are members of the WikiPathways review team, who focus more on technical quality than the biology. The first author,
however, often is the person who contributed most of the biological knowledge in the pathway, in this case
<a href="https://www.wikipathways.org/authors/A.Pandey">Akhilesh Pandey</a> from the NetSlim collaboration
(see doi:<a href="https://doi.org/10.1093/database/bar032">10.1093/database/bar032</a>):</p>

<p><img src="/assets/images/wikipathways_authorList.png" alt="" /></p>

<h2 id="collaborations">Collaborations</h2>

<p>Over time, multiple collaborations have taken place, like the one with NetSlim from the above query. In these collaborations,
the knowledge may not be digitized in WikiPathways as GPML by the biological experts. That encoding is regularly done
by others, with those experts ensuring the quality. The following collaborations are examples, and
<a href="https://www.wikipathways.org/browse/communities.html">a fuller list is found online</a>:</p>

<ul>
  <li><a href="https://www.wikipathways.org/communities/wormbase_approved.html">WormBase</a> (doi:<a href="https://doi.org/10.1093/nar/gkt1063">10.1093/nar/gkt1063</a>)</li>
  <li><a href="https://www.wikipathways.org/communities/lipids.html">LIPID MAPS</a> (doi:<a href="https://doi.org/10.1093/nar/gkad896">10.1093/nar/gkad896</a>)</li>
  <li><a href="https://www.wikipathways.org/communities/imd.html">Inherited Metabolic Disorders</a> (doi:<a href="https://doi.org/10.1007/978-3-030-67727-5_73">10.1007/978-3-030-67727-5_73</a>)</li>
  <li><a href="https://www.wikipathways.org/communities/micronutrients.html">Micronutrients</a> (doi:<a href="https://doi.org/10.1007/s12263-010-0192-8">10.1007/s12263-010-0192-8</a>)</li>
</ul>

<p>We have collaborated with Reactome on various occasions (e.g. see doi:<a href="https://doi.org/10.1371/journal.pcbi.1004941">10.1371/journal.pcbi.1004941</a> and
doi:<a href="https://doi.org/10.1007/s12263-010-0192-8">10.1007/s12263-010-0192-8</a>), around plants (e.g. see doi:<a href="https://doi.org/10.1186/1939-8433-6-14">10.1186/1939-8433-6-14</a>),
around rare diseases in projects like <a href="https://www.ejprarediseases.org/">EJP-RD</a> and <a href="https://erdera.org/">ERDERA</a>, and around SARS-CoV-2.
For that, see these communities:</p>

<ul>
  <li><a href="https://www.wikipathways.org/communities/reactome.html">Reactome</a></li>
  <li><a href="https://www.wikipathways.org/communities/plants.html">Plants</a> (see also <a href="https://doi.org/10.37044/osf.io/m37f2_v1">this DBCLS BioHackathon 2025 paper</a>)</li>
  <li><a href="https://www.wikipathways.org/communities/rarediseases.html">Rare Diseases</a></li>
  <li><a href="https://www.wikipathways.org/communities/covid19.html">COVID-19</a></li>
</ul>

<p>And then there are pathways in WikiPathways supported by a full paper, but I will leave that for a later moment.</p>

<h2 id="author-statistics">Author statistics</h2>

<p>Back to the authors, because the new RDF model allows a few more nice queries. For example, we can check the number
of pathways with a certain number of authors, and then we find with the following query that there are two pathways
with up to 18 authors (<a href="https://edu.nl/mhjbw">try here</a>):</p>

<div class="language-sparql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">PREFIX</span><span class="w"> </span><span class="nn">dc</span><span class="o">:</span><span class="w">    </span><span class="nn">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class="w">
</span><span class="k">PREFIX</span><span class="w"> </span><span class="nn">wpq</span><span class="o">:</span><span class="w">   </span><span class="nn">&lt;http://www.wikidata.org/prop/qualifier/&gt;</span><span class="w">

</span><span class="k">SELECT</span><span class="w"> </span><span class="nv">?atLeast</span><span class="w"> </span><span class="p">(</span><span class="nb">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="nv">?pathway</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nv">?count</span><span class="p">)</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="nv">?author_</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="ss">Person</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">wp</span><span class="o">:</span><span class="ss">hasAuthorship</span><span class="w"> </span><span class="nv">?authorship</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?authorship</span><span class="w"> </span><span class="err">^</span><span class="nn">wp</span><span class="o">:</span><span class="ss">hasAuthorship</span><span class="w"> </span><span class="nv">?pathway</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">wpq</span><span class="o">:</span><span class="ss">series_ordinal</span><span class="w"> </span><span class="nv">?atLeast</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="nv">?atLeast</span><span class="w">
  </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">ASC</span><span class="p">(</span><span class="nn">xsd</span><span class="o">:</span><span class="ss">integer</span><span class="p">(</span><span class="nv">?atLeast</span><span class="p">))</span><span class="w">
</span></code></pre></div></div>
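
<p>Because the query groups by ordinal, each row counts the pathways that have <em>at least</em> that many authors. A minimal sketch in
Python of that arithmetic, with made-up pathway identifiers and author counts, and assuming the ordinals of each pathway run
from 1 to the number of authors without gaps:</p>

```python
from collections import Counter

# Hypothetical author counts per pathway; not real WikiPathways numbers.
authors_per_pathway = {"WP_A": 1, "WP_B": 3, "WP_C": 3, "WP_D": 2}


def pathways_with_at_least(counts):
    """Mimic the GROUP BY on the ordinal: one tally per ordinal that occurs."""
    at_least = Counter()
    for n_authors in counts.values():
        # Assumption: a pathway with n authors has ordinals 1..n, no gaps.
        for ordinal in range(1, n_authors + 1):
            at_least[ordinal] += 1
    return dict(sorted(at_least.items()))


def pathways_with_exactly(at_least):
    """Turn 'at least N authors' counts into 'exactly N authors' counts."""
    highest = max(at_least)
    return {
        n: at_least[n] - at_least.get(n + 1, 0)
        for n in range(1, highest + 1)
    }


at_least = pathways_with_at_least(authors_per_pathway)
print(at_least)                         # {1: 4, 2: 3, 3: 2}
print(pathways_with_exactly(at_least))  # {1: 1, 2: 1, 3: 2}
```

<p>Subtracting neighbouring rows gives the number of pathways with exactly that many authors, if that is the number you are after.</p>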

<p>We can also look at the <a href="https://edu.nl/fkwy9">list of authors</a>, sorted by the number of pathways they are noted as first author on,
along with their profile page or ORCID number:</p>

<div class="language-sparql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">PREFIX</span><span class="w"> </span><span class="nn">dc</span><span class="o">:</span><span class="w">    </span><span class="nn">&lt;http://purl.org/dc/elements/1.1/&gt;</span><span class="w">
</span><span class="k">PREFIX</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="w">  </span><span class="nn">&lt;http://xmlns.com/foaf/0.1/&gt;</span><span class="w">
</span><span class="k">PREFIX</span><span class="w"> </span><span class="nn">wpq</span><span class="o">:</span><span class="w">   </span><span class="nn">&lt;http://www.wikidata.org/prop/qualifier/&gt;</span><span class="w">
</span><span class="k">PREFIX</span><span class="w"> </span><span class="nn">pav</span><span class="o">:</span><span class="w">   </span><span class="nn">&lt;http://purl.org/pav/&gt;</span><span class="w">

</span><span class="k">SELECT</span><span class="w"> </span><span class="p">(</span><span class="nb">COUNT</span><span class="p">(</span><span class="k">DISTINCT</span><span class="w"> </span><span class="nv">?pathway</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nv">?count</span><span class="p">)</span><span class="w"> </span><span class="nv">?name</span><span class="w"> </span><span class="nv">?orcid</span><span class="w"> </span><span class="nv">?page</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="k">VALUES</span><span class="w"> </span><span class="nv">?ordinal</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s2">"1"</span><span class="w"> </span><span class="p">}</span><span class="w">
  </span><span class="nv">?author_</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="ss">Person</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">wp</span><span class="o">:</span><span class="ss">hasAuthorship</span><span class="w"> </span><span class="nv">?authorship</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?authorship</span><span class="w"> </span><span class="err">^</span><span class="nn">wp</span><span class="o">:</span><span class="ss">hasAuthorship</span><span class="w"> </span><span class="nv">?pathway</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">wpq</span><span class="o">:</span><span class="ss">series_ordinal</span><span class="w"> </span><span class="nv">?ordinal</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?pathway</span><span class="w"> </span><span class="nn">pav</span><span class="o">:</span><span class="ss">hasVersion</span><span class="w"> </span><span class="nv">?pathway_</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?pathway_</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">wp</span><span class="o">:</span><span class="ss">Pathway</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nn">dcterms</span><span class="o">:</span><span class="ss">identifier</span><span class="w"> </span><span class="nv">?version</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="k">OPTIONAL</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nv">?author_</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="ss">homepage</span><span class="w"> </span><span class="nv">?page</span><span class="w"> </span><span class="p">}</span><span class="w">
  </span><span class="k">OPTIONAL</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nv">?author_</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="ss">name</span><span class="w"> </span><span class="nv">?name</span><span class="w"> </span><span class="p">}</span><span class="w">
  </span><span class="k">OPTIONAL</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nv">?author_</span><span class="w"> </span><span class="nn">dc</span><span class="o">:</span><span class="ss">identifier</span><span class="w"> </span><span class="nv">?orcid</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="nv">?ordinal</span><span class="w"> </span><span class="nv">?name</span><span class="w"> </span><span class="nv">?orcid</span><span class="w"> </span><span class="nv">?page</span><span class="w">
  </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">DESC</span><span class="p">(</span><span class="nv">?count</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<p>Is this the full story? No, of course not. There are so many details yet to be uncovered, but it gives a bit more
insight into where the biological knowledge in WikiPathways is coming from.</p>

<p>Want more peer review of the content? Then why not help set up a new community? Just ping me or
<a href="https://www.wikipathways.org/authors/Mkutmon">Martina</a>.</p>

      <h4>References</h4>
      <ul>
      
      
        <li><a href="https://doi.org/10.1093/database/bar032">10.1093/database/bar032</a></li>
      
      
        <li><a href="https://doi.org/10.1093/nar/gkt1063">10.1093/nar/gkt1063</a></li>
      
        <li><a href="https://doi.org/10.1093/nar/gkad896">10.1093/nar/gkad896</a></li>
      
        <li><a href="https://doi.org/10.1007/978-3-030-67727-5_73">10.1007/978-3-030-67727-5_73</a></li>
      
        <li><a href="https://doi.org/10.1007/s12263-010-0192-8">10.1007/s12263-010-0192-8</a></li>
      
        <li><a href="https://doi.org/10.1371/journal.pcbi.1004941">10.1371/journal.pcbi.1004941</a></li>
      
        <li><a href="https://doi.org/10.37044/osf.io/m37f2_v1">10.37044/osf.io/m37f2_v1</a></li>
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="wikipathways"/><category term="rdf"/><category term="justdoi:10.1093/database/bar032"/><category term="sparql"/><category term="justdoi:10.1093/nar/gkt1063"/><category term="justdoi:10.1093/nar/gkad896"/><category term="doi:10.1007/978-3-030-67727-5_73"/><category term="justdoi:10.1007/s12263-010-0192-8"/><category term="justdoi:10.1371/journal.pcbi.1004941"/><category term="justdoi:10.1007/s12263-010-0192-8"/><category term="justdoi:10.37044/osf.io/m37f2_v1"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/02/22/where-do-the-wikipathways-come-from.html">
      <![CDATA[ WikiPathways was founded in 2008, in the year I left Wageningen (and we Nijmegen) and moved to Uppsala, Sweden. When we decided to move back to The Netherlands in 2012, I got the opportunity to join the Department of Bioinformatics (BiGCaT) and work on Open PHACTS. I had visited the group in March 2011 because I had a COST action workshop near Maastricht (about nanoQSAR) and the bioinformatics group did WikiPathways. ]]>
    </summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/wikipathways_authorList.png"/>
    <media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/wikipathways_authorList.png"/></entry>
  
  <entry>
    <title type="html">The TDCC NES Col-Lab Retreat</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/02/21/the-tdcc-nes-col-lab-retreat.html" rel="alternate" type="text/html" title="The TDCC NES Col-Lab Retreat"/>
    <published>2026-02-21T00:00:00+00:00</published>
    <updated>2026-02-21T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/pm3c5-89k94</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/02/21/the-tdcc-nes-col-lab-retreat.html">
      <![CDATA[ <p>Last autumn two TDCC projects started, <em>FAIR4ChemNL</em> (<a href="https://chem-bla-ics.linkedchemistry.info/2026/02/08/open-infrastructures.html">with the PeerTube channel</a>
and doi:<a href="https://doi.org/10.61686/XVYQV45374">10.61686/XVYQV45374</a>) and <em>FAIRify for metabolomics data</em>
(doi:<a href="https://doi.org/10.61686/CSGIP04334">10.61686/CSGIP04334</a>). But I haven’t written much on either yet, nor on what the role of our research group is in these projects.</p>

<p>Let’s start with what the TDCC actually are: they are <a href="https://tdcc.nl/">Thematic Digital Competence Centres</a>:</p>

<blockquote>
  <p>The Thematic Digital Competence Centres (TDCCs) are network-based initiatives set up by NWO and the Dutch academic
community to broker investments into research data management projects. The three TDCCs are national and discipline
based, with one pillar each for the Social Sciences &amp; Humanities (SSH), Natural and Engineering Sciences (NES) and
Life Sciences &amp; Health (LSH). The networks will help formulate and facilitate projects designed to promote the adoption
of open data, software and research practices, alongside the development of the necessary expertise.</p>
</blockquote>

<p>So, where initiatives like <a href="https://www.go-fair.org/">GO FAIR</a> had centers of competencies (the implementation networks),
they did not have funding for them. This was a main reason why the <em>Chemistry Implementation Network</em> (ChIN,
doi:<a href="https://doi.org/10.1162/dint_a_00035">10.1162/dint_a_00035</a>) did not take off.
The TDCCs do not provide a lot of money, but enough to support disseminating expertise and promote some key ideas.</p>

<p>The idea is that combined with other efforts, it strengthens the level of FAIR in the Dutch research community.
I have to say, this is much needed, as the level of FAIR data in journal publications leaves so much to be desired,
and is still mostly absent.</p>

<p>The FAIR4ChemNL project already had a networking activity during the writing of the proposal: the workshop
back in 2024 that I <a href="https://chem-bla-ics.linkedchemistry.info/2024/06/10/two-meetings.html">blogged about earlier</a>
(see also <a href="https://doi.org/10.5281/zenodo.15050550">this report</a>).
The FAIRify project is coordinated by the group that was key in the <em>Netherlands Metabolomics Center</em> (NMC), now the
<a href="https://metabolomicscentre.nl/">BeneLux Metabolomics Center</a>. During a postdoc at the NMC during my Wageningen
days, we already did a lot of FAIR competency building with <a href="https://chem-bla-ics.linkedchemistry.info/tag/metware">the MetWare project</a>.</p>

<h2 id="the-col-lab-retreat">The Col-Lab Retreat</h2>

<p>The <a href="https://tdcc.nl/about-tddc/nes/">TDCC-NES</a> organized a networking event in August last year,
the 2025 <a href="https://nescollab.nl/">TDCC-NES Col-Lab Retreat</a>. I am late with
reporting on it, but there simply was too much project management that took priority. The meeting was in the
wonderful Dutch town Schoorl, and the location is great for collaborative meetings. I had been there a year
earlier for an Open Science Retreat and was happy to go back.</p>

<p>During the unconference-style meeting <a href="https://tdcc.nl/creating-space-for-our-community-the-story-of-our-nes-col-lab-retreat/">various topics were discussed</a>
in breakout groups, and because of the two TDCC projects, I was particularly interested in the <em>Metadata and interoperability</em>
topic. Partly because this is how we can make electronic lab notebooks automatically push metadata to
registries (and <a href="https://www.linkedin.com/in/rory-macneil-68a80011/">Rory Macneil</a> was also in Schoorl,
of <a href="https://www.researchspace.com/">RSpace/ResearchSpace</a> which already integrated with various open
platforms), and partly because I wanted to continue exploring <a href="https://chem-bla-ics.linkedchemistry.info/tag/nanopub">nanopublications</a>
with <a href="https://fediscience.org/@rupdecat">Christian Meesters</a>, which could be the envelope to distribute
the metadata. For the last, I was looking at the Java library for nanopublications
(see <a href="https://github.com/Nanopublication/nanopub-java/pull/52">this PR</a>).</p>

<p>The idea that ELNs automatically share metadata about experiments is something that is attractive.
It would require no involvement from the researcher, would be fully automatic, and drive interest
(users, peer reviewers) to experiments and experimental data. Something that is still absurdly hard
is to do a search for experiments that measured the melting point of some chemical. How
awesome would it be if ELNs would automatically register chemicals from the experiment in,
for example, <a href="https://pubchem.ncbi.nlm.nih.gov/">PubChem</a>.</p>

<p>We had the idea of applying for a Lorentz Workshop, but the earliest deadline came too soon. Still,
maybe it is time to pick up that idea again. Interoperability standards already exist, like
the aforementioned nanopubs, but also <a href="https://www.researchobject.org/ro-crate/">RO-Crates</a> that are also studied by Jente Houweling
in the VHP4Safety project (see <a href="https://platform.vhp4safety.nl/data">this Data tab</a> for a preview).</p>

      <h4>References</h4>
      <ul>
      
        <li><a href="https://doi.org/10.1162/DINT_A_00035">10.1162/DINT_A_00035</a></li>
      
      
      
      
      
        
      
      
      
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="fair"/><category term="doi:10.1162/DINT_A_00035"/><category term="chemistry"/><category term="metabolomics"/><category term="fair4chemnl"/><category term="fairify"/><category term="cito:citesAsEvidence:10.5281/ZENODO.15050550"/><category term="nanopub"/><category term="crate"/><category term="pubchem"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/02/21/the-tdcc-nes-col-lab-retreat.html">
      <![CDATA[ Last autumn two TDCC projects started, FAIR4ChemNL (with the PeerTube channel and doi:10.61686/XVYQV45374) and FAIRify for metabolomics data (doi:10.61686/CSGIP04334). But I haven’t written much on either yet, nor on what the role of our research group is in these projects. ]]>
    </summary></entry>
  
  <entry>
    <title type="html">Open Infrastructures #2: the SURF Fediverse</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/02/08/open-infrastructures.html" rel="alternate" type="text/html" title="Open Infrastructures #2: the SURF Fediverse"/>
    <published>2026-02-08T00:00:00+00:00</published>
    <updated>2026-02-08T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/1ja9h-jem83</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/02/08/open-infrastructures.html">
      <![CDATA[ <p>When I first started writing this post, I started writing up why scientific communication is important, but because I started
explaining what needs improving, and what are underlying causes why change is not happening, it got dark pretty quickly. So,
I deleted that essay again. Instead, let’s just enjoy the awesome and long list of solutions we have for scientific discourse.
Readers of my blog can find many posts in the past 20 years about the diversification.
One thing I will say before I move on is a reply to the argument that journal-based peer review is essential to the
quality of research: if the quality of your research is dependent on your peers, then please rethink why you are doing research.</p>

<p>Now, about the <a href="https://en.wikipedia.org/wiki/Fediverse">fediverse</a>…</p>

<h2 id="mastodon-service-by-surf">Mastodon (service by SURF)</h2>

<p><a href="https://en.wikipedia.org/wiki/Mastodon_(social_network)">Mastodon</a> is one of the more well-known corners of the fediverse,
and I <a href="https://chem-bla-ics.linkedchemistry.info/tag/mastodon">blogged about it before</a>.
It is intrinsically open, while it has extensive options to make things more private. It is like Twitter but then without the
central control. It is unlike Slack, <a href="https://en.wikipedia.org/wiki/Zulip">Zulip</a>, and LinkedIn, which have clear walls around communities.
It also is unlike past efforts like Google Wave and <a href="https://chem-bla-ics.linkedchemistry.info/tag/friendfeed">FriendFeed</a>
which created much more structured discourse.</p>

<p>But I enjoy Mastodon. It has all the good science, the friendly, helpful people, and I have many options to block people,
fediverse servers, and even individual keywords (you can remove anything “PFAS”, for example, something hard in the real world).
But you also have a linear timeline, with just the content of the people you follow.</p>

<p>And, with the <a href="http://surf.nl/">SURF</a> <a href="https://social.edu.nl/">social.edu.nl</a> server, every researcher from a SURF-linked
research institute can get an account there via <a href="https://www.surf.nl/en/services/identity-access-management/surfconext">SURFconext</a>
(the Mastodon solution may need to be activated by your institute first; if so, ask your institute ICT to enable it).
The list of accounts on this SURF Mastodon server shows <a href="https://social.edu.nl/directory?order=active">a varied list of people and organisations</a>,
but you can also check this list of <a href="https://communities.surf.nl/publieke-waarden/artikel/80-ways-to-follow-research-science-and-education-on-mastodon">80 Ways to follow Research, Science and Education on Mastodon</a>.
Or <a href="https://chem-bla-ics.linkedchemistry.info/2022/11/21/finding-mastodon-accounts-with-wikidata.html">this list of Wikidata queries</a>.</p>

<p>I think every organization that communicates their research should have at least one open world communication channel,
and if they then like to keep their walled-garden LinkedIn account too, that is fine. But societal impact for just a select group
of people feels a bit awkward to me.</p>

<h2 id="peertube-service-by-surf">PeerTube (service by SURF)</h2>

<p>But SURF operates a second fediverse server, one using the <a href="https://en.wikipedia.org/wiki/PeerTube">PeerTube</a> software, also
extended with the SURFconext interoperability. PeerTube is a platform to share videos, like YouTube.
Just before the winter holiday, I got the opportunity to create two project accounts on SURF’s <a href="https://video.edu.nl/">video.edu.nl</a>,
one for the <a href="https://vhp4safety.nl/">VHP4Safety</a> project and one for the <a href="https://tdcc.nl/projects/tdcc-nes-projects/fair4chemnl-accelerating-the-adoption-of-universal-data-standards-in-chemistry/">FAIR4ChemNL</a>
project.</p>

<p>The cool thing actually is that SURFconext has group accounts via <a href="https://servicedesk.surf.nl/wiki/spaces/IAM/pages/92668196/SURFconext+Invite+EN">SURFconext Invite</a>
(it was earlier called <em>SURFconext Teams</em>), so these two PeerTube channels are operated by two or more
people from the project, and the two videos that are now available have not actually been uploaded by me.</p>

<p>But I am very excited we now have channels to share our video communication, <a href="https://video.edu.nl/a/vhp4safety/videos">here for VHP4Safety</a>:</p>

<p><img src="/assets/images/peertube_vhp4safety.png" alt="" /></p>

<p>And <a href="https://video.edu.nl/a/fair4chemnl/videos">here for FAIR4ChemNL</a>:</p>

<p><img src="/assets/images/peertube_fair4chemnl.png" alt="" /></p>

<!-- Communication infrastructure behind the world wide web has been open infrastructure for a long time, including email, the web itself,
and internet relay chat. Early commercial alternatives, like Compuserve and AOL, created walled gardens using unique information, quite like
Netflix, HBO, and AppleTV do now. While these disappeared, the commercial need for walls is deep rooted in the Western culture.
And the walled gardens won in the end. The do for streaming, for searching, and increasingly for communication. The latter, of course,
is causing a lot of social problems, by controlling who can say what to whom. And being operated by huge interantional companies, the
often operate outside law. Even the European Commissions cannot keep them within legal limits.

It is essential to realize this affects the research community hard. The publishing industry is largely a walled garden: it was
before open access and with APC-that-come-with-30-percent-profit as the norm the walls have not really dropped. If you prefer to
talk about the peer review walls, the walls exist just as well: who can do peer review (is allowed inside the wall), who decides
which peer reviewers are important (who gets thrown outside the wall), and why post-publication peer review is not a thing
(only things inside the wall matter). The walls, unfortunately, are often based on good looks (like journal impact factor,
the label "American" or "Society") and discussions about quality are mostly pushed outside the wall.

Yet, communication is a central activity in doing research, and open communication channels are to me an essential part
of that. If the discussion of good science is limited to those in power, this can only harm science. Of course, retractions are
rare, fraud even more, and any correlation with anything cannot happen inside the walls (until it does).
Unfortunately, until we can untangle the notion of peer review from prestige, power, and money, it will not easily change. -->

      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="mastodon"/><category term="peertube"/><category term="vhp4safety"/><category term="fair4chemnl"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/02/08/open-infrastructures.html">
      <![CDATA[ When I first started writing this post, I started writing up why scientific communication is important, but because I started explaining what needs improving, and what the underlying causes are of change not happening, it got dark pretty quickly. So, I deleted that essay again. Instead, let’s just enjoy the awesome and long list of solutions we have for scientific discourse. Readers of my blog can find many posts in the past 20 years about the diversification. One thing I will say before I move on is a reply to the argument that journal-based peer review is essential to the quality of research: if the quality of your research is dependent on your peers, then please rethink why you are doing research. ]]>
    </summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/peertube_fair4chemnl.png"/>
    <media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/peertube_fair4chemnl.png"/></entry>
  
  <entry>
    <title type="html">Chemical blogs history</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/01/17/chemical-blogs-history.html" rel="alternate" type="text/html" title="Chemical blogs history"/>
    <published>2026-01-17T00:00:00+00:00</published>
    <updated>2026-03-20T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/v13h7-7av66</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/01/17/chemical-blogs-history.html">
      <![CDATA[ <p>Like many awesome internet phenomena, <a href="https://en.wikipedia.org/wiki/Blog">blogging started in the late nineties</a>.
<a href="https://doi.org/10.1038/432933a">Nature</a> <a href="https://doi.org/10.1038/ngeo170">authors</a> <a href="https://doi.org/10.1038/ncb0905-845b">and</a>
<a href="https://doi.org/10.1038/4571058a">editors</a> recognized the effort early. In 2006 there were already
more than 45 million blogs, and at least 50 science blogs made it into the top 50,000 and
<a href="https://doi.org/10.1038/442009a">5 in the top 3,500</a>.</p>

<p>I started blogging in 2005, around the time many others did, among them many chemists.
In 2006 I started a website called <a href="https://chem-bla-ics.linkedchemistry.info/2006/08/25/chemical-blogspace.html">Chemical blogspace</a>
using the <em>Postgenomic.com</em> software. Chemical blogspace extracted which journal articles
were discussed (yeah, there is <a href="https://www.linkedin.com/pulse/how-did-altmetric-come-euan-adie/">a causal relationship with altmetrics</a>!),
and <a href="https://chem-bla-ics.linkedchemistry.info/2006/02/25/hacking-inchi-support-into.html">I added recognition of chemicals</a>,
so that you could <a href="https://chem-bla-ics.linkedchemistry.info/2007/01/04/chemical-blogspace-is-getting-more.html">follow blog posts talking about a specific chemical</a>.
I <a href="https://chem-bla-ics.linkedchemistry.info/2007/10/16/lunch-at-nature-hq-with-euan-joanna-ian.html">visited Euan Adie and others in 2007</a>.
I had to sunset Chemical blogspace several years later, at a time when blogging seemed to
be in decline, overtaken by microblogging platforms like Twitter (which died in 2022).</p>

<p>We know now that it didn’t really go away, however. If we look at <a href="https://chem-bla-ics.linkedchemistry.info/2024/07/21/rogue-scholar-and-more.html">Rogue Scholar</a>
we <a href="https://docs.rogue-scholar.org/dashboard">see there is plenty of activity</a>, indeed.
I am very interested in restarting something like Chemical blogspace, based on Rogue Scholar.
The nice thing about Chemical blogspace was that it created a virtual community, and in
the end it aggregated and indexed more than 250 chemistry blogs. I would love to see
many of them archived on Rogue Scholar, but the blog authors have to
<a href="https://tally.so/r/nPvNK0">recommend their blog personally here</a>.</p>

<p>You can also just visit many of these blogs to relive the dynamics at the time:</p>

<ul>
  <li><a href="https://chemicalblogspace.blogspot.com/2006/12/new-blogs-1.html">New Blogs #1</a> (2006)</li>
  <li><a href="https://chemicalblogspace.blogspot.com/2007/01/new-blogs-2.html">New Blogs #2</a> (2007)</li>
  <li><a href="https://chemicalblogspace.blogspot.com/2007/02/new-blogs-3.html">New Blogs #3</a></li>
  <li><a href="https://chemicalblogspace.blogspot.com/2007/03/new-blogs-4.html">New Blogs #4</a></li>
  <li><a href="https://chemicalblogspace.blogspot.com/2007/04/new-blogs-5.html">New Blogs #5</a></li>
  <li><a href="https://chemicalblogspace.blogspot.com/2007/05/new-blogs-6.html">New Blogs #6</a></li>
  <li><a href="https://chemicalblogspace.blogspot.com/2007/06/these-are-new-blogs-that-entered.html">New Blogs #7</a></li>
  <li><a href="https://chemicalblogspace.blogspot.com/2007/10/new-blogs-8.html">New Blogs #8</a></li>
  <li><a href="https://chemicalblogspace.blogspot.com/2008/04/new-blogs-9.html">New Blogs #9</a> (2008)</li>
  <li><a href="https://chem-bla-ics.linkedchemistry.info/2009/07/23/new-blogs-10.html">New Blogs #10 <i class="fa-solid fa-recycle fa-xs"></i></a> (2009)</li>
  <li><a href="https://chem-bla-ics.linkedchemistry.info/2009/07/31/new-blogs-11.html">New Blogs #11 <i class="fa-solid fa-recycle fa-xs"></i></a></li>
  <li><a href="https://chemicalblogspace.blogspot.com/2009/11/new-blogs-12.html">New Blogs #12</a></li>
  <li><a href="https://chem-bla-ics.linkedchemistry.info/2010/07/15/cb-new-blogs-13.html">Cb: New Blogs #13</a> (2010)</li>
  <li><a href="https://chem-bla-ics.linkedchemistry.info/2010/10/22/cb-new-blogs-14.html">Cb: New Blogs #14</a></li>
</ul>

<p>A lot has happened since then. There are <a href="https://docs.rogue-scholar.org/dashboard">new platforms</a>.
Blogger and WordPress are still the bigger platforms, but Hugo, Jekyll, and Quarto are modern, open source
alternatives. <a href="https://www.anildash.com/2026/01/09/how-markdown-took-over-the-world/">Markdown may have helped</a>
with the revival of blogging, making it easier than ever.</p>

<p>What is your current favorite chemistry blog? Love to hear from you!</p>

      <h4>References</h4>
      <ul>
      
      
        <li><a href="https://doi.org/10.1038/4571058a">10.1038/4571058a</a></li>
      
        <li><a href="https://doi.org/10.1038/ngeo170">10.1038/ngeo170</a></li>
      
        <li><a href="https://doi.org/10.1038/ncb0905-845b">10.1038/ncb0905-845b</a></li>
      
        <li><a href="https://doi.org/10.1038/442009a">10.1038/442009a</a></li>
      
        <li><a href="https://doi.org/10.1038/432933a">10.1038/432933a</a></li>
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="blog"/><category term="nature"/><category term="justdoi:10.1038/4571058a"/><category term="justdoi:10.1038/ngeo170"/><category term="justdoi:10.1038/ncb0905-845b"/><category term="justdoi:10.1038/442009a"/><category term="justdoi:10.1038/432933a"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/01/17/chemical-blogs-history.html">
      <![CDATA[ Like many awesome internet phenomena, blogging started in the late nineties. Nature authors and editors recognized the effort early. In 2006 there were already more than 45 million blogs, and at least 50 science blogs made it into the top 50,000 and 5 in the top 3,500. ]]>
    </summary></entry>
  
  <entry>
    <title type="html">Where does the WikiPathways Cited In information come from?</title>
    <link href="https://chem-bla-ics.linkedchemistry.info/2026/01/10/where-does-the-wikipathways-cited-in-information-come-from.html" rel="alternate" type="text/html" title="Where does the WikiPathways Cited In information come from?"/>
    <published>2026-01-10T00:00:00+00:00</published>
    <updated>2026-01-10T00:00:00+00:00</updated>
    <id>https://doi.org/10.59350/0xxqw-90533</id>
    <content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/01/10/where-does-the-wikipathways-cited-in-information-come-from.html">
      <![CDATA[ <p>I have been wanting to blog about this since this summer, but with everything going on, I never really got around to it.
What is this <em>Cited In</em> feature of <a href="https://wikipathways.org/">WikiPathways</a> and where does that information come from?
If you have not noticed this yet, this is what it looks like for <a href="https://www.wikipathways.org/instance/WP4846">WP4846</a>:</p>

<p><img src="/assets/images/wp_cited_in.png" alt="" /></p>

<p>Recently, I was close to writing up the context, because it is related to a new feature of the profile pages, where you
now can look up citations to pathways that you first authored (see
<a href="https://chem-bla-ics.linkedchemistry.info/2025/11/30/wikipathways-curation-reports-on-profile-pages.html">this post</a>).
And it also relates to the data I have been collecting around <a href="https://chem-bla-ics.linkedchemistry.info/tag/cito">citation intention annotations</a>:
articles that cite one of the WikiPathways papers and mention a specific pathway could be considered <em>cito:usesDataFrom</em>
(see doi:<a href="https://doi.org/10.1186/s13321-023-00683-2">10.1186/s13321-023-00683-2</a>).</p>

<p>A third angle on citations to specific WikiPathways pathways is the following. WikiPathways is used a lot in data analyses and
for putting experimental data in biological context. How researchers do this varies a lot, in multiple ways. But just
thinking about this factually, research outputs cite specific biological pathways. And there are some interesting
phenomena there. Back in 2015 at the Metabolomics Society meeting in San Francisco (apparently, I
blogged about the meeting only <a href="https://chem-bla-ics.blogspot.com/2015/06/metsoc2015-converting-smiles-annotation.html">once</a>?),
when I visited the 500+ posters looking for interesting biological pathways, there were a lot of studies
on different species, different diseases, different toxicities. The biological responses had one thing in common:
it always was the TCA cycle that was key (see doi:<a href="https://doi.org/10.1096/FJ.11-203091">10.1096/FJ.11-203091</a> for
a 2012 comparison of TCA models).</p>

<p>Thus, with so many articles mentioning specific pathways and deriving biological knowledge from them, what is
reasonable to expect? Do we expect <em>co-citation</em> effects? That is, if two articles found the same set of pathways
of interest to their data, is the data showing a similar biological response? Do we expect something
like the above TCA cycle in metabolomics, similar to the notion of <em>frequent hitters</em> (see
doi:<a href="https://doi.org/10.1021/jm010934d">10.1021/jm010934d</a>)?</p>

<p>Of course, to test this hypothesis we need data, and that is where the <em>Cited In</em> feature comes in. At the time of
writing of this blog post, we can see on <a href="https://www.wikipathways.org/browse/citedin.html">this page</a>
that 878 pathways have been cited a total of 2715 times. We are getting somewhere. This blog
post will not analyze this data, which is one reason why I had not blogged about it. But from the
above you can understand that I want to :)</p>

<h2 id="the-cited-in-feature">The Cited In feature</h2>

<p>This <em>Cited In</em> feature was introduced along with the new website (see doi:<a href="https://doi.org/10.1093/nar/gkad960">10.1093/nar/gkad960</a>),
where we changed how GPML files are stored and how web pages are created from them.
Because we are no longer confined to the MediaWiki platform (which has served the project for a very long time,
very effectively), it is easier to integrate information from other sources. For example,
from literature databases. This feature was developed by <a href="https://orcid.org/0000-0001-5706-2163">Alex Pico</a>
at the Gladstone Institutes (see <a href="https://github.com/wikipathways/wikipathways-database/commit/840234adfd581730d86553910c078401351606ce">this 2022 commit</a>),
where he uses the <a href="https://www.ncbi.nlm.nih.gov/books/NBK25497/">NCBI eUtils API</a> to access
<a href="https://pmc.ncbi.nlm.nih.gov/">PubMed Central</a>.
The data is then collected into <a href="https://github.com/wikipathways/wikipathways-database/blob/main/downstream/citedin_lookup.yml">this YAML file</a>
which then gets used to generate webpage content (like the section in the above screenshot
and the page mentioning the current statistics).</p>
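
<p>For readers who have not used the eUtils API before, here is a minimal Python sketch of what such a lookup can look like. It only builds an ESearch URL against the <code class="language-plaintext highlighter-rouge">pmc</code> database and does not send the request. Searching for the bare identifier is an assumption made for illustration; the query used by the actual R script may well differ, and <code class="language-plaintext highlighter-rouge">pmc_search_url</code> is a hypothetical helper name.</p>

```python
from urllib.parse import urlencode

# Public E-utilities ESearch endpoint documented by NCBI.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pmc_search_url(pathway_id, retmax=100):
    """Build an ESearch URL that looks for a pathway identifier in PubMed Central."""
    params = {
        "db": "pmc",          # search the PubMed Central database
        "term": pathway_id,   # assumed query: just the identifier itself
        "retmode": "json",    # ask for a JSON response
        "retmax": retmax,     # maximum number of article IDs to return
    }
    return ESEARCH + "?" + urlencode(params)


print(pmc_search_url("WP4846"))
```

<p>Fetching that URL would return a list of PubMed Central article IDs, which could then be written to a lookup file per pathway.</p>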

<h2 id="where-is-the-data-coming-from">Where is the data coming from?</h2>

<p>As just explained, originally the data was only coming from NCBI.
However, because I found many articles citing specific pathways that were not picked up by this
approach, and I wanted more data, I started searching <a href="https://europepmc.org/">Europe PMC</a>, the European
partner of PubMed Central. However, I am not automating this. I want to see the data, the articles, and
how people cite the pathways. I need to see that so that I can better understand how people are
using the data/knowledge from WikiPathways. I can no longer keep up with checking why people are citing
my own research, but <a href="https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html">I once did</a>.
I learn(-ed) a lot from that.</p>

<p>I normally use a search that requires the word “WikiPathways” to be
<a href="https://europepmc.org/search?query=wikipathways">mentioned in the article</a> (in most, but
not all of them; citing literature you extend sounds like a core scholarly value, but is factually
not systematically complied with), and then manually search for “WP”. With close to 1000
PubMed Central articles mentioning WikiPathways in 2025, most of them available as full text,
I can see if they cite specific pathways. A good number of articles mention the WikiPathways
identifier, e.g. the aforementioned <code class="language-plaintext highlighter-rouge">WP4846</code>. If the article only mentions a pathway title,
I cannot confidently identify which pathway is cited, so I exclude that.</p>
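
<p>The manual search for “WP” can be mimicked with a small regular expression. The following Python sketch is not part of the WikiPathways code base and is no replacement for reading the article; it just lists candidate identifiers in a piece of text. The sample sentence is made up, and <code class="language-plaintext highlighter-rouge">find_pathway_ids</code> is a hypothetical helper name.</p>

```python
import re

# WikiPathways identifiers look like "WP" followed by digits, e.g. WP4846.
# The word boundaries keep the pattern from matching inside longer tokens.
WP_ID = re.compile(r"\bWP\d+\b")


def find_pathway_ids(text):
    """Return the distinct WikiPathways identifiers in text, in order of first mention."""
    seen = []
    for match in WP_ID.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


# Made-up sentence, only here to show the behavior.
sample = "We mapped the data onto WP4846 and WP1; WP4846 showed the strongest signal."
print(find_pathway_ids(sample))
```

<p>An article that only gives a pathway title yields an empty list here, which matches the rule above of excluding such mentions.</p>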

<p>I originally started out manually editing the YAML file where the citations are collected,
but by now use <a href="https://github.com/wikipathways/wikipathways-database/blob/main/scripts/citedin_fromFile.R">a script similar to Alex’s R script</a>.
This makes it far easier to scale up, as I just have to populate a three-column TSV file,
which is used by my R script to update the YAML file. This manual approach ensures that
I am not looking at text mining results, but see the citation of the WikiPathways identifier
with my own eyes. That’s just how I like it.</p>
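
<p>To give an idea of the TSV-to-lookup step, here is a minimal Python sketch. It is not the R script mentioned above. The three columns (pathway identifier, article identifier, source) and the shape of the lookup (a mapping from pathway to a list of articles) are assumptions for illustration, and the article IDs are placeholders.</p>

```python
import csv
import io


def merge_citations(lookup, tsv_text):
    """Add (pathway, article) pairs from a three-column TSV to the lookup, skipping duplicates."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        # Assumed column order: pathway ID, article ID, where the citation came from.
        pathway_id, article_id, _source = row
        articles = lookup.setdefault(pathway_id, [])
        if article_id not in articles:
            articles.append(article_id)
    return lookup


# Placeholder data: one citation already known, two new ones, one repeat.
existing = {"WP4846": ["PMC0000001"]}
new_rows = "WP4846\tPMC0000001\tmanual\nWP4846\tPMC0000002\tmanual\nWP1\tPMC0000003\tmanual\n"
print(merge_citations(existing, new_rows))
```

<p>Running the merge twice on the same rows does not change the result, so a TSV file can be re-applied safely after adding new lines.</p>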

<p>The full history of the YAML file content can be found on <a href="https://github.com/wikipathways/wikipathways-database/commits/main/downstream/citedin_lookup.yml">this GitHub page</a>
and <a href="https://github.com/wikipathways/wikipathways-database/blame/main/downstream/citedin_lookup.yml">this <em>git blame</em></a>
tells you if the information came from PubMed Central via the API, or was added by me:</p>

<p><img src="/assets/images/wp_cited_in_git_blame.png" alt="" /></p>

<p>This is Open Science in action: added transparency and making it easier for anyone to verify,
so that no one needs to be stuck in (dis)trust.</p>

<p>Of course, as we know from the CiTO ontology and real-world data, there are so
many different reasons why journal articles are cited (just <a href="https://chem-bla-ics.linkedchemistry.info/2024/08/07/cito-updates.html">an example</a>)
that the data in the YAML file and on the WikiPathways website in the <em>Cited In</em> feature
does not have a direct meaning. Just like a high citation count for an article or
even a journal impact factor cannot be directly interpreted (despite so many researchers
blindly doing just that).</p>

<h2 id="whats-next">What’s next?</h2>

<p>Well, while I did not do any analysis yet, and do not even know yet how many citations we need to
reach some level of statistical significance, there are some observations I can mention:</p>

<ul>
  <li>if your analysis included anything like linking your data to pathways, citing those pathways is
a good way to give credit to the researchers that created that pathway</li>
  <li>if you cite data, please cite that as accurately as possible, see e.g. DataCite</li>
  <li>I wish all journal articles citing specific pathways from WikiPathways would include the pathway identifier</li>
  <li>I congratulate those authors that even mentioned the revision of the pathway! well done!</li>
</ul>

<p>And about biological interpretation, our group has long published that some genes with
differential data mapping to a pathway does not imply that that pathway is really affected.
Gene-set enrichment and over-representation analysis are a starting point; not a conclusion.
I wish more people were more aware of the work in our (now)
<a href="https://cris.maastrichtuniversity.nl/en/organisations/translational-genomics/">Translational Genomics research group</a>.
Like that of <a href="https://orcid.org/0000-0002-7699-8191">Martina Kutmon</a> (now as
<a href="https://www.maastrichtuniversity.nl/research/maastricht-centre-systems-biology-and-bioinformatics">MaCSBio<sup>2</sup></a>),
whom I have had the pleasure of collaborating with for quite some years now (and long-time
architect of WikiPathways).</p>

<p>There is so much more I want to write up about WikiPathways, but I will leave it at this
for now.</p>

      <h4>References</h4>
      <ul>
      
      
        <li><a href="https://doi.org/10.1186/S13321-023-00683-2">10.1186/S13321-023-00683-2</a></li>
      
        
      
        
      
        <li><a href="https://doi.org/10.1093/NAR/GKAD960">10.1093/NAR/GKAD960</a></li>
      </ul>
      ]]>
    </content>
    
    
      <author><name>Egon Willighagen</name><uri>https://orcid.org/0000-0001-7542-0286</uri></author>
    
    <category term="wikipathways"/><category term="europepmc"/><category term="doi:10.1186/S13321-023-00683-2"/><category term="cito:citesAsDataSource:10.1096/FJ.11-203091"/><category term="cito:obtainsBackgroundFrom:10.1021/jm010934d"/><category term="doi:10.1093/NAR/GKAD960"/>
    
    <summary type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/01/10/where-does-the-wikipathways-cited-in-information-come-from.html">
      <![CDATA[ I have been wanting to blog about this since this summer, but with everything going on, I never really got around to it. What is this Cited In feature of WikiPathways and where does that information come from? If you have not noticed this yet, this is what it looks like for WP4846: ]]>
    </summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/wp_cited_in.png"/>
    <media:content xmlns:media="http://search.yahoo.com/mrss/" medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/wp_cited_in.png"/></entry>
  
</feed>
