<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/wikidata.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-04-19T09:50:36+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/wikidata.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">SWAT4HCLS 2026 Amsterdam this week</title><link href="https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week.html" rel="alternate" type="text/html" title="SWAT4HCLS 2026 Amsterdam this week" /><published>2026-03-22T00:00:00+00:00</published><updated>2026-03-22T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/03/22/swat4hcls-2026-amsterdam-this-week.html"><![CDATA[<p>Tomorrow, <a href="https://www.swat4ls.org/workshops/amsterdam2026/">SWAT4HCLS 2026</a> will start, again in Amsterdam.
The first SWAT4LS I attended <a href="https://chem-bla-ics.linkedchemistry.info/2009/11/21/swat4ls-linking-open-drug-data-to.html">was also in Amsterdam</a>, and I was <a href="https://chem-bla-ics.linkedchemistry.info/2016/12/18/my-swat4ls-poster-about-enanomapper.html">also there</a> for the second meeting in Amsterdam. I was also in <a href="https://www.swat4ls.org/workshops/cambridge2015/index.php">Cambridge</a> (see
<a href="https://chem-bla-ics.blogspot.com/2015/12/swat4ls-in-cambridge.html">this post</a>),
<a href="https://www.swat4ls.org/workshops/antwerp2018/">Antwerp</a> (no post), and at least at one of the two
<a href="https://www.swat4ls.org/workshops/leiden2024/">Leiden</a> meetings (also no posts, it seems).</p>

<p>I am looking forward to meeting old friends, new friends (some of whom I have never met in person), and
recent collaborators (whom I have never met in person either).
For those who will not be in Amsterdam, you can follow the meeting on social media with
the <a href="https://hashtags-hub.toolforge.org/swat4hcls">hashtag #swat4hcls</a>. And there is also
<a href="https://fediwall.biohackrxiv.org/">this BioHackrXiv Fediwall</a>, for those in the
<a href="https://en.wikipedia.org/wiki/Fediverse">fediverse</a>.</p>

<h3 id="scholia-demo">Scholia demo</h3>

<p>I will give a demo to update people on the work in the <a href="https://github.com/wdscholia/scholia">Scholia</a> project with
Daniel Mietchen, Peter Patel-Schneider, Konrad Linden, Johannes Kalmbach,
Lars Willighagen, Wolfgang Fahl, and Hannah Bast (also keynote in Amsterdam)
to <a href="https://chem-bla-ics.linkedchemistry.info/2026/02/28/rescuing-scholia-3-we-did-it.html">update the SPARQL queries</a>
we use to visualize data in <a href="https://www.wikidata.org/">Wikidata</a> to SPARQL 1.1 so that they can run on
<a href="https://qlever.dev/">QLever</a>.
The abstract can be <a href="https://commons.wikimedia.org/wiki/File:Scholia_2026_Compliance_with_SPARQL_1.1.pdf">found in Wikimedia Commons</a>.</p>

<p>This was the outcome of many years of figuring out how to ensure Scholia could keep working. The
<a href="https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2026-02-17/Technology_report">Wikidata RDF graph split</a>
has given us many headaches. So many that when, just before Christmas, it became clear that it could
be possible to survive the split, I was so happy that I realized I wanted to share this news. So, we teamed
up and wrote this demonstration contribution abstract. Thanks to everyone who made this happen!
Just to be clear, we are not done yet. The system is not running outside the Wikimedia Foundation
platforms.</p>

<p>One of the reviewer comments requested <a href="https://qlever.scholia.wiki/event/Q138033585">a Scholia page for the meeting</a>.
It has not been updated for the accepted speakers, but you can look at <a href="https://qlever.scholia.wiki/event-series/Q56846035">pages for past meetings</a>
to get an idea of what you will find.</p>

<h3 id="swat4hcls-biohackathon-2026">SWAT4HCLS Biohackathon 2026</h3>

<p>There will also be <a href="https://www.swat4ls.org/workshops/amsterdam2026/swat4hcls-biohackathon-2026/">a biohackathon again</a>,
of course, with the <a href="https://index.biohackrxiv.org/tag/SWAT4HCLS26">option for BioHackRxiv reports</a>.
There are already <a href="https://www.swat4ls.org/workshops/amsterdam2026/swat4hcls-biohackathon-2026/">several pitches</a>,
including one that I submitted about Scholia.</p>]]></content><author><name>Egon Willighagen</name></author><category term="rdf" /><category term="sparql" /><category term="swat4ls" /><category term="wikidata" /><summary type="html"><![CDATA[Tomorrow, SWAT4HCLS 2026 will start, again in Amsterdam. The first SWAT4LS I attended was also in Amsterdam, and the second meeting in Amsterdam I was also there. And I was in Cambridge (see this post), Antwerp (no post), and at least to one of the two Leiden meetings (also no posts, it seems).]]></summary></entry><entry><title type="html">Rescuing Scholia #2: getting closer</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/12/31/rescuing-scholia-2-getting-close.html" rel="alternate" type="text/html" title="Rescuing Scholia #2: getting closer" /><published>2025-12-31T00:00:00+00:00</published><updated>2025-12-31T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/12/31/rescuing-scholia-2-getting-close</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/12/31/rescuing-scholia-2-getting-close.html"><![CDATA[<p>Three weeks ago, I wrote the post <a href="https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html">Rescuing Scholia: will we make it in time?</a>,
where I sketched a future without <a href="https://scholia.toolforge.org/">Scholia</a>. Scholia started
<a href="https://chem-bla-ics.linkedchemistry.info/2023/01/27/scholia-timeline.html">almost 10 years ago</a>,
and I think it is worth keeping around longer.</p>

<p>Fortunately, it looks like we will have a working replacement in time before the
<a href="https://www.mediawiki.org/wiki/Wikidata_Query_Service">WDQS</a> instance with all the
<a href="https://wikidata.org/">Wikidata</a> triples in a single SPARQL endpoint goes down,
likely in a week or so (even though we may be behind <a href="https://openalex.org/works?page=1&amp;filter=cites:w2767995756">the citation peak</a>).</p>

<p>The work of the past year helped, for example by making it easier to configure Scholia for a different
endpoint and by loading panels asynchronously (reducing the stress on the SPARQL endpoint).
Already in September, Prof. <a href="https://github.com/hannahbast">Hannah Bast</a> started
<a href="https://github.com/WDscholia/scholia/pull/2715">a branch</a> for the transition. Various
hackathons followed this autumn, as did the work by <a href="https://github.com/KonradLinden">Konrad Linden</a>,
who explored and addressed some of the hurdles to take. The tips and suggestions from
Hannah and <a href="https://github.com/RobinTF">RobinTF</a> really made a difference. And also a huge thanks
to <a href="https://orcid.org/0000-0001-9488-1870">Daniel</a> who kept relentlessly pushing this forward.</p>

<p>When I posted my <a href="https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html">will we make it</a> post,
there was a demo instance and a spreadsheet showing the state of each query. The instance
showed no human-readable labels. This was because the WDQS <code class="language-plaintext highlighter-rouge">wikibase:label</code> service 
was used a lot, and there is no replacement for that. Getting labels for all relevant
items is possible, but makes the queries a lot heavier and made even more queries
run out of memory. Various solutions were <a href="https://github.com/ad-freiburg/scholia/issues/17">discussed</a>.
Finn indicated he <a href="https://github.com/ad-freiburg/scholia/issues/17#issuecomment-3605952951">preferred a macro solution</a>,
which <a href="https://github.com/ad-freiburg/scholia/pull/20/changes">Lars implemented</a> and which
saw some tweaks after that. Then followed a long series of patches, particularly by
<a href="https://github.com/pfps">Peter</a>, to update all the SPARQL queries to have them use
the new labels macro. But plenty of other things were fixed or newly implemented,
such as <a href="https://github.com/WolfgangFahl">Wolfgang</a>’s <a href="https://qlever.scholia.wiki/backend">/backend</a>
page.</p>
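
<p>To illustrate why explicit labels make queries heavier, here is a minimal Python sketch of what replacing the
<code>wikibase:label</code> service with plain SPARQL 1.1 can look like: one <code>OPTIONAL</code> clause on
<code>rdfs:label</code> per variable. This is not the macro that Lars implemented; the <code>#LABELS#</code> marker
and both function names are made up for this illustration.</p>

```python
def label_patterns(variables, lang="en"):
    """Return one SPARQL 1.1 OPTIONAL clause per variable, binding ?xLabel for ?x."""
    clauses = []
    for var in variables:
        name = var.lstrip("?")
        clauses.append(
            f'OPTIONAL {{ ?{name} rdfs:label ?{name}Label . '
            f'FILTER(LANG(?{name}Label) = "{lang}") }}'
        )
    return "\n".join(clauses)


def expand_labels(query, variables, marker="#LABELS#", lang="en"):
    """Replace a hypothetical marker in a query template with the label clauses."""
    return query.replace(marker, label_patterns(variables, lang))


if __name__ == "__main__":
    template = "SELECT ?work ?workLabel WHERE { ?work wdt:P50 wd:Q20895241 . #LABELS# }"
    print(expand_labels(template, ["?work"]))
```

<p>Every variable that needs a label adds another join, which is where the extra memory use comes from.</p>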

<p>So, with one week to go, we need your help, as the weekly
<a href="https://www.wikidata.org/wiki/Wikidata:Status_updates/2025_12_29">Wikidata Status Update</a>
already indicated:</p>

<blockquote>
  <p>this month’s Scholia hackathon has moved Scholia closer to its planned switch to a
QLever backend. Beta testers can assist by exploring the
<a href="https://qlever.scholia.wiki/">interim QLever-backed Scholia instance</a>
and <a href="https://github.com/WDscholia/scholia/issues">reporting any issues</a>.</p>
</blockquote>

<p>And thanks to <a href="https://github.com/Adafede">Adriano</a> and others who already have!</p>

<p>Now, we are not done yet. The real instance at <a href="https://scholia.toolforge.org/">scholia.toolforge.org</a>
has seen ridiculous abuse by scrapers (and the main instance is regularly unusable, to be honest),
and we have no idea whether the new setup is powerful enough. And we need to point to the new servers anyway.
So, plenty of work is left to be done in the next few days.</p>

<p>But we are getting close. So, please give <a href="https://qlever.scholia.wiki/">qlever.scholia.wiki</a>
a go, and let us know your observations. As <a href="https://en.wikipedia.org/wiki/Linus%27s_law">Linus’s law</a> states:</p>

<blockquote>
  <p>Given enough eyeballs, all bugs are shallow.</p>
</blockquote>]]></content><author><name>Egon Willighagen</name></author><category term="wikidata" /><category term="scholia" /><category term="sparql" /><category term="rdf" /><summary type="html"><![CDATA[Three weeks ago, I wrote a the post Rescuing Scholia: will we make it in time?, where I sketched a future without Scholia. Scholia, started almost 10 years ago and I think it is worth keeping around longer.]]></summary></entry><entry><title type="html">Rescuing Scholia: will we make it in time?</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html" rel="alternate" type="text/html" title="Rescuing Scholia: will we make it in time?" /><published>2025-12-08T00:00:00+00:00</published><updated>2025-12-08T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/12/08/rescuing-scholia.html"><![CDATA[<p>What <a href="https://chem-bla-ics.linkedchemistry.info/2023/01/27/scholia-timeline.html">started out in 2016 on Twitter</a> became a
<a href="https://meta.wikimedia.org/wiki/Coolest_Tool_Award/Full_history">(small) award winning</a>
<a href="https://chem-bla-ics.linkedchemistry.info/tag/scholia">decade long collaborative project</a>.
Unfortunately, the future is not clear. It is uncertain whether it will survive the growth of Wikidata
and in particular the <a href="https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split">SPARQL graph split</a>.
To be clear, the choice for Blazegraph initially worked great, but after it was bought by a big
company, development halted. Very unfortunate for Wikidata. Unlike earlier, we no longer have funding, and rewriting Scholia
at this scale takes a good bit of effort. We already
<a href="https://chem-bla-ics.linkedchemistry.info/2025/04/20/the-april-2025-scholia-hackathon.html">held a few hackathons</a>.</p>

<p>So far, we have been able to continue to use a <em>legacy</em> SPARQL endpoint with all the data, but in exactly one month
that endpoint will be sunset. And we are <strong>not</strong> ready.</p>

<h2 id="rescuing-scholia">Rescuing Scholia</h2>
<p>Daniel and Lane have been leading an effort to rescue Scholia. The hackathons were part of this effort. It seems
that <a href="https://en.wikipedia.org/wiki/QLever">QLever</a> is the only route left. Earlier efforts to rewrite the more
than 350 Scholia SPARQL queries to support the graph split have basically failed. The complexity is far too high.
QLever, however, provides the full graph and, since recently, full SPARQL 1.1 support. That is also not enough to
reproduce the full Scholia functionality, but it seems to get us far.
Importantly, the data may not update as frequently as the <a href="https://www.mediawiki.org/wiki/Wikidata_Query_Service">WDQS</a>,
and that is another complexity to take into account, particularly all the 404 pages.</p>

<p>So, in the next weeks, we have to complete rewriting all those queries as queries that QLever can handle. A team
of people have done great work already, <a href="https://github.com/ad-freiburg/scholia/issues?q=is%3Aissue%20author%3AKonradLinden">including Konrad</a>.</p>

<p>I hope we make it in time.</p>]]></content><author><name>Egon Willighagen</name></author><category term="scholia" /><category term="wikidata" /><category term="rdf" /><category term="sparql" /><summary type="html"><![CDATA[What started out in 2016 on Twitter became a (small) award winning decade long collaborative project. Unfortunately, the future is not clear. We are at odds if it will survice the growth of Wikidata and in particularly the SPARQL graph split. To be clear, the choice for Blazegraph initially worked great, but after it was bought by a big company, developed halted. Very unfortunate for Wikidata. Unlike earlier, we no longer have funding, and rewriting Scholia at this scale takes a good bit of effort. We already held a few hackathons.]]></summary></entry><entry><title type="html">The Internet Journal of Chemistry</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/08/11/the-internet-journal-of-chemistry.html" rel="alternate" type="text/html" title="The Internet Journal of Chemistry" /><published>2025-08-11T00:00:00+00:00</published><updated>2025-08-11T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/08/11/the-internet-journal-of-chemistry</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/08/11/the-internet-journal-of-chemistry.html"><![CDATA[<p>The <a href="https://scholia.toolforge.org/topic/Q27211732">Internet Journal of Chemistry</a> (IJC, issn:1099-8292) was one of the first scientific journals to get
published on the world wide web (part of <em>the Internet</em>), see doi:<a href="https://doi.org/10.1080/00987913.2000.10764578">10.1080/00987913.2000.10764578</a>.
Issues were published from 1998 to 2004. But because it predates
systematic archiving of webpages by libraries, a lot is lost. The nature of the journal, however, makes it unique, and quite
a number of articles are cited a lot, and should be part of the <em>scientific record</em>.
But I soon realized it actually is quite hard to track down content of the journal. I knew some articles were available
online as <em>author accepted manuscripts</em>. One of those was my own first-author (and single-author) article, self-archived on
Zenodo (doi:<a href="https://doi.org/10.5281/zenodo.1495470">10.5281/zenodo.1495470</a>), green open access style.</p>

<p>I wanted to see what I could recover, and here I describe what I did and what could be done next.</p>

<h2 id="a-list-of-all-articles">A list of all articles</h2>

<p>The first step is actually to create a list of all articles published in the IJC and collect as much metadata about
them as possible. With just over 100 articles, I decided to use Wikidata as a machine-readable database that supports the curation and reporting. I wanted at least
two independent sources and, for Wikidata, to use only public resources. That means that, while Web of Science does have a list of
all articles, I only used this for validation, and <strong>not</strong> as an information source. Instead, I used citations to IJC
articles and, of course, the Internet Archive (IA). It turns out <a href="https://web.archive.org/web/*/http://www.ijc.com/abstracts/*">a query like this</a>
does wonders (well, for the abstracts; I did not find full-texts archived on IA):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://web.archive.org/web/*/http://www.ijc.com/abstracts/*
</code></pre></div></div>
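
<p>The same wildcard lookup can also be scripted. The sketch below only builds URLs and does not fetch anything. It
assumes the Internet Archive's CDX server endpoint and its <code>url</code>, <code>output</code>, <code>fl</code>,
<code>collapse</code>, and <code>limit</code> parameters; I did the original search by hand in the browser, so treat
this as one possible way to automate it, with function names chosen for this example.</p>

```python
from urllib.parse import urlencode

# Wayback Machine CDX server; a trailing * in the url parameter requests a prefix match.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url_pattern, limit=None):
    """Build a CDX request that lists one capture per archived URL under the pattern."""
    params = {
        "url": url_pattern,
        "output": "json",
        "fl": "timestamp,original",
        "collapse": "urlkey",
    }
    if limit is not None:
        params["limit"] = str(limit)
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_url(timestamp, original):
    """Turn a (timestamp, original URL) pair into a Wayback Machine snapshot link."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


if __name__ == "__main__":
    print(cdx_query_url("www.ijc.com/abstracts/*", limit=5))
    print(snapshot_url("20000925050415", "http://www.ijc.com/abstracts/abstract2n8.html"))
```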

<p>I found that all but one article had the abstract archived in the IA. Here’s <a href="https://web.archive.org/web/20000925050415/http://www.ijc.com/abstracts/abstract2n8.html">an example</a>:</p>

<p><img src="/assets/images/ia_ijc_abstract.png" alt="" /></p>

<p>This gave me a lot of information to add to Wikidata: title, publication date, volume, article number, keywords, an abstract,
and, of course, the list of authors. Some authors I know personally, many I do not. But it did allow me to enter all
articles into Wikidata along with their authors, using “author” (<a href="https://www.wikidata.org/wiki/Property:P50">P50</a>) or
“author name string” (<a href="https://www.wikidata.org/wiki/Property:P2093">P2093</a>).</p>

<h2 id="the-article-authors">The article authors</h2>

<p>It also turned out that multiple authors listed their IJC article on their public ORCID profile.
That greatly helped identification. I managed to <a href="https://w.wiki/Ezda">link many authors</a> to mostly existing Wikidata items:</p>

<p><img src="/assets/images/ijc_authors.png" alt="" /></p>

<p>I already mentioned that I used Wikidata to collect this information. Besides the <a href="https://scholia.toolforge.org/venue/Q27211732">interactive visualization with Scholia</a>,
it also gave me the option to track my progress with SPARQL queries. For example, <a href="https://w.wiki/Ezdf">this query</a> helped
me do that author FAIR-ification:</p>

<p><img src="/assets/images/ijc_sparql1.png" alt="" /></p>

<p>You can see here two columns with author information, one for P50 and the other for P2093. There is quite some
identification left to be done, and additional information is welcome:</p>

<p><img src="/assets/images/ijc_sparql2.png" alt="" /></p>
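
<p>For readers who want to track this kind of progress for another venue, here is a rough Python sketch that builds a
comparable SPARQL query. It is not the query behind the short links above: it assumes the articles are linked to the
journal with “published in” (P1433), and the function names are only illustrative.</p>

```python
def author_progress_query(venue_qid):
    """Count linked authors (P50) and author name strings (P2093) per article of a venue."""
    return f"""
SELECT ?article (COUNT(DISTINCT ?author) AS ?linked) (COUNT(DISTINCT ?name) AS ?strings) WHERE {{
  ?article wdt:P1433 wd:{venue_qid} .
  OPTIONAL {{ ?article wdt:P50 ?author }}
  OPTIONAL {{ ?article wdt:P2093 ?name }}
}} GROUP BY ?article
ORDER BY DESC(?strings)
""".strip()


def identified_fraction(linked, strings):
    """Fraction of author statements that already point to a Wikidata item."""
    total = linked + strings
    return linked / total if total else 0.0


if __name__ == "__main__":
    print(author_progress_query("Q27211732"))  # the IJC item used in the Scholia links
    print(identified_fraction(3, 1))
```

<p>Articles at the top of the result still have the most name strings left to resolve.</p>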

<h2 id="sources">Sources</h2>

<p>So, that brings us to this list of sources:</p>

<ul>
  <li>Internet Archive: abstracts and metadata</li>
  <li>ORCID profiles: ORCIDs of (some) authors</li>
  <li>Google Scholar: metadata and citations</li>
  <li>Web of Science: independent list for external validation</li>
</ul>

<p>Because there is plenty of work left to be done and I hope the collected information will further spread
in library collections, I added as many sources as possible. <a href="https://w.wiki/Em9i">This query</a> lists for all
articles the Web of Science identifier (recorded so that everyone can check the consistency), the link
to the Internet Archive-d abstract page, and a link to a known full text (five).</p>

<p>If you wonder, neither <a href="https://openalex.org/works?page=1&amp;filter=primary_location.source.id:s32147083">OpenAlex</a>
nor <a href="https://europepmc.org/search?query=JOURNAL%3A%28%22Internet%20Journal%20of%20Chemistry%22%29">Europe PMC</a> has a full list.</p>

<h2 id="whats-next">What’s next?</h2>

<p>I do not have formal training in archiving, but I am happy with the minimal viable metadata collection.
I know more can be done (and would love to hear your pointers and suggestions): more author identities,
better coverage of keyword annotation, etc. But I think an important addition is adding citations
to and from the IJC articles. The journal predates efforts like the <a href="https://i4oc.org/">I4OC</a> and
<a href="https://opencitations.net/">Open Citations</a>, so I may have to manually recover citations from Google Scholar.
I will have to report on that later. But you can enjoy the citations that are
<a href="https://scholia.toolforge.org/venue/Q27211732#Citations">already there</a>. And now that we have sufficient metadata,
I can use this to find more full texts.</p>

<p>Btw, I have made contact with Prof. <a href="https://scholia.toolforge.org/author/Q28420106">Steven Bachrach</a>,
who founded the journal and was the Editor-in-Chief.</p>]]></content><author><name>Egon Willighagen</name></author><category term="publishing" /><category term="wikidata" /><category term="scholia" /><category term="doi:10.5281/ZENODO.1495470" /><category term="cito:citesAsEvidence:10.1080/00987913.2000.10764578" /><category term="europepmc" /><summary type="html"><![CDATA[The Internet Journal of Chemistry (IJC, issn:1099-8292) was one of the first scientific journals to get published on the world wide web (part of the Internet), see doi:10.1080/00987913.2000.10764578. Issues were published from 1998 to 2004. But because it predates systematic archiving of webpages by libraries, a lot is lost. The nature of the journal, however, makes it unique, and quite a number of articles are cited a lot, and should be part of the scientific record. But I soon realized it actually is quite hard to track down content of the journal. I knew some articles have been author accepted manuscripts online. One of that was my own first (and single) author-article, self-archived on Zenodo (doi:10.5281/zenodo.1495470), green open access style.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/ia_ijc.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/ia_ijc.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">PFAS in the blood of the Dutch population</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/07/06/pfas-in-the-blood-of-the-dutch-population.html" rel="alternate" type="text/html" title="PFAS in the blood of the Dutch population" /><published>2025-07-06T00:00:00+00:00</published><updated>2025-07-06T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/07/06/pfas-in-the-blood-of-the-dutch-population</id><content type="html" 
xml:base="https://chem-bla-ics.linkedchemistry.info/2025/07/06/pfas-in-the-blood-of-the-dutch-population.html"><![CDATA[<p>A recent report by the Dutch <a href="https://www.rivm.nl/">RIVM</a>, <em>PFAS in the blood of the Dutch population</em>
(doi:<a href="https://www.rivm.nl/bibliotheek/rapporten/2025-0094.pdf">10.21945/RIVM-2025-0094</a>), reports
that seven <a href="https://scholia.toolforge.org/chemical-class/Q648037">PFAS</a> compounds are found in blood samples
of all tested people. Another nine compounds are found in at least 1-in-10 people.
Because there is relevant data in the report on the 28 studied PFAS compounds, I wanted to
make the report more FAIR than it is on the website. Why this report? Well, the chemistry and the
history are fascinating and brutal (I like <a href="https://www.youtube.com/watch?v=SC2eSujzrUY">this Veritasium video</a>).</p>

<p>The history tells me that, while our society may sound woke and leftish, in reality it is a continuous fight
for basic human rights. (Something that plenty have been saying for years.)
In this case, a healthy life is the human right.</p>

<p>So, what can I do to make this report more FAIR?</p>

<h2 id="findable-in-wikidata">Findable in Wikidata</h2>

<p>Since this report has been <a href="https://news.google.com/search?q=PFAS%20in%20the%20blood%20of%20the%20Dutch%20population&amp;hl=en-US&amp;gl=US&amp;ceid=US%3Aen">mentioned in the news</a>,
it clearly is notable. The simplest thing to do is thus to just add it to <a href="https://www.wikidata.org/wiki/Wikidata:Main_Page">Wikidata</a>.
Because the DOI of the report had not been recorded yet, I could not let <a href="https://scholia.toolforge.org/">Scholia</a>
do it for me. But doing it manually is only a bit more work: <a href="https://www.wikidata.org/wiki/Q135222054">Q135222054</a>.
The provided metadata <a href="https://www.rivm.nl/en/news/first-nationwide-study-into-pfas-in-blood">on the RIVM website</a>
is minimal.</p>

<p>But we can do more. Particularly, because I want people to find this report when they look for knowledge
about the 28 studied chemicals, I added <a href="https://www.wikidata.org/wiki/Q135222054#P921">main subject</a> annotation
using the information in <em>Table 1</em> in the report. Using Scholia and the CAS registry number in the table,
I cross-checked that the information in Wikidata is consistent with the report (and vice versa). It was.
I then added the Dutch name and acronym for most of them. Some already had the name as in the Table.
That gives us a nice “Topic scores” plot for <a href="https://scholia.toolforge.org/work/Q135222054">the Scholia page of the report</a>:</p>

<p><img src="/assets/images/pfas_report.png" alt="" /></p>
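
<p>The CAS cross-check mentioned above was done by hand in Scholia, but it could be scripted along these lines. This
Python sketch is an illustration under assumptions: it looks compounds up via the Wikidata “CAS Registry Number”
property (P231), and it adds a check-digit test (the standard CAS checksum) as an extra sanity check that was not
part of my workflow. The function names are invented for the example.</p>

```python
import re


def cas_check_digit_ok(cas):
    """Validate the CAS check digit: weighted digit sum (right to left) modulo 10."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def cas_lookup_query(cas):
    """Build a SPARQL query that finds Wikidata items with the given CAS number."""
    return f'SELECT ?compound WHERE {{ ?compound wdt:P231 "{cas}" }}'


if __name__ == "__main__":
    cas = "335-67-1"  # PFOA
    if cas_check_digit_ok(cas):
        print(cas_lookup_query(cas))
```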

<p>The central PFAS bubble is also only one <em>main subject</em> but larger because many of the specific PFAS compounds
are subclasses of PFAS. And you may also note many smaller bubbles. These actually come from <em>main subject</em>
annotations of articles cited from the report, because I added a few of them too. Not all, because many are
not in Wikidata (yet).</p>

<h2 id="findable-in-wikipathways">Findable in WikiPathways</h2>

<p>But since 16 of these compounds are readily found in human blood samples, that is handy knowledge when
doing metabolomics (on blood samples). Or (and I leave that to a later blog post), we can map the experimental
data for Dordrecht versus the rest of The Netherlands to the PFAS compounds. That is relevant to research
by <a href="https://vhp4safety.nl">VHP4Safety</a>. There are many ways to see if you have PFAS in your dataset,
but since we have many controlled lists of genes and metabolites, I added one for common PFAS in human
blood samples. Well, the 16 common in Dutch blood samples:</p>

<p><img src="/assets/images/pfas_wikipathways.png" alt="" /></p>

<p>Each <em>metabolite</em> here is annotated with its Wikidata identifier, allowing us to map experimental
data on top of it. And we get links out to other databases almost for free:</p>

<p><img src="/assets/images/pfas_wikipathways_outlinks.png" alt="" /></p>

<p>And the link to Wikidata actually links to Scholia, so for the PFOA in the above example,
we can quickly see the boiling point, decomposition point, and melting point of this PFAS.
And literature with undoubtedly even more knowledge about this PFAS:</p>

<p><img src="/assets/images/pfas_scholia.png" alt="" /></p>

<p>Now, these two steps were mostly manual: drawing <a href="https://classic.wikipathways.org/index.php/Pathway:WP5579">WP5579</a>
in WikiPathways and adding the report annotations (<em>main subject</em> and <em>cites</em>) in Wikidata.</p>

<h2 id="findable-in-the-vhp4safety-compound-wiki">Findable in the VHP4Safety Compound Wiki</h2>

<p>As part of the VHP4Safety project, I am collecting information on chemicals studied in the context
of toxicology, safety, and risk assessment. Often these are specific collections of compounds studied as a whole.
This report is such a collection and provides experimental data on these compounds. So, I want this
report to be findable for the <a href="https://compoundcloud.wikibase.cloud/">VHP4Safety Compound Wiki</a> too.
Creating the collection is a manual step: <a href="https://compoundcloud.wikibase.cloud/wiki/Item:Q5145">Q5145</a>.</p>

<p>Now, because both Wikidata and our VHP4Safety Compound Wiki (a Wikibase instance) are semantic and support SPARQL, I can use SPARQL
to create instructions to link the 28 compounds to the new collection. Now, arguably, that can be
done manually too, and maybe faster, but for larger collections this is harder. So, I dug up
<a href="https://compoundcloud.wikibase.cloud/wiki/User:Egonw">my earlier notes</a> and got some useful
things together.</p>

<p>This query lists all 28 PFAS linked to the report <a href="https://w.wiki/Eepm">in Wikidata</a>:</p>

<div class="language-sparql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span><span class="w"> </span><span class="nv">?pfas</span><span class="w"> </span><span class="nv">?pfasLabel</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="nn">wd</span><span class="o">:</span><span class="ss">Q135222054</span><span class="w"> </span><span class="nn">wdt</span><span class="o">:</span><span class="ss">P921</span><span class="w"> </span><span class="nv">?pfas</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?pfas</span><span class="w"> </span><span class="nn">wdt</span><span class="o">:</span><span class="ss">P31</span><span class="w"> </span><span class="nn">wd</span><span class="o">:</span><span class="ss">Q113145171</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="k">SERVICE</span><span class="w"> </span><span class="nn">wikibase</span><span class="o">:</span><span class="ss">label</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nn">bd</span><span class="o">:</span><span class="ss">serviceParam</span><span class="w"> </span><span class="nn">wikibase</span><span class="o">:</span><span class="ss">language</span><span class="w"> </span><span class="s2">"[AUTO_LANGUAGE],mul,en"</span><span class="p">.</span><span class="w"> </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>Using federation powers, I can use this for <a href="https://edu.nl/ar9wf">a SPARQL query</a> to match these up with our Wikibase,
and return the results in QuickStatements that say <em>this VHP compound is part of the VHP collection</em>:</p>

<div class="language-sparql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">PREFIX</span><span class="w"> </span><span class="nn">wb</span><span class="o">:</span><span class="w"> </span><span class="nn">&lt;https://compoundcloud.wikibase.cloud/entity/&gt;</span><span class="w">
</span><span class="k">PREFIX</span><span class="w"> </span><span class="nn">wbt</span><span class="o">:</span><span class="w"> </span><span class="nn">&lt;https://compoundcloud.wikibase.cloud/prop/direct/&gt;</span><span class="w">

</span><span class="k">SELECT</span><span class="w"> </span><span class="p">(</span><span class="nb">SUBSTR</span><span class="p">(</span><span class="nb">STR</span><span class="p">(</span><span class="nv">?cmp</span><span class="p">),</span><span class="mi">45</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nv">?qid</span><span class="p">)</span><span class="w"> </span><span class="nv">?P21</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="nv">?cmp</span><span class="w"> </span><span class="nn">wbt</span><span class="o">:</span><span class="ss">P5</span><span class="w"> </span><span class="nv">?wikidata</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="k">SERVICE</span><span class="w"> </span><span class="nn">&lt;https://query.wikidata.org/sparql&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nn">wd</span><span class="o">:</span><span class="ss">Q135222054</span><span class="w"> </span><span class="nn">wdt</span><span class="o">:</span><span class="ss">P921</span><span class="w"> </span><span class="nv">?pfas</span><span class="w"> </span><span class="p">.</span><span class="w">
    </span><span class="nv">?pfas</span><span class="w"> </span><span class="nn">wdt</span><span class="o">:</span><span class="ss">P31</span><span class="w"> </span><span class="nn">wd</span><span class="o">:</span><span class="ss">Q113145171</span><span class="w"> </span><span class="p">.</span><span class="w">
    </span><span class="k">BIND</span><span class="w"> </span><span class="p">(</span><span class="nb">substr</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="nv">?pfas</span><span class="p">),</span><span class="mi">32</span><span class="p">)</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nv">?wikidata</span><span class="p">)</span><span class="w">
  </span><span class="p">}</span><span class="w">
  </span><span class="k">BIND</span><span class="w"> </span><span class="p">(</span><span class="s2">"Q5145"</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="nv">?P21</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
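<p>The two <code class="language-plaintext highlighter-rouge">SUBSTR</code> offsets in the query are easy to get wrong, so here is a minimal Python sketch of the arithmetic. It assumes the entity IRIs start with the prefixes shown, and that the two result columns are used as a QuickStatements CSV with a <code class="language-plaintext highlighter-rouge">qid,P21</code> header. The helper names and the QID <code class="language-plaintext highlighter-rouge">Q123</code> are made up for illustration and are not part of any script mentioned here.</p>

```python
# Illustrative only: mirrors the SUBSTR offsets used in the SPARQL query.
WB_ENTITY = "https://compoundcloud.wikibase.cloud/entity/"  # 44 characters
WD_ENTITY = "http://www.wikidata.org/entity/"  # 31 characters


def sparql_substr(text, start):
    """Mimic SPARQL SUBSTR(text, start), which counts characters from 1."""
    return text[start - 1:]


def quickstatements_csv(compound_iris, collection_qid="Q5145"):
    """Hypothetical helper: build CSV rows saying 'compound P21 collection'."""
    rows = ["qid,P21"]
    for iri in compound_iris:
        # Offset 45 skips the 44-character Compound Wiki entity prefix.
        rows.append(sparql_substr(iri, 45) + "," + collection_qid)
    return "\n".join(rows)


if __name__ == "__main__":
    # Q123 is a placeholder QID, not a real Compound Wiki item.
    print(quickstatements_csv([WB_ENTITY + "Q123"]))
```

<p>The same reasoning gives the offset 32 for Wikidata IRIs: the prefix has 31 characters, so the QID starts at character 32.</p>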

<p>I actually had to add 5 PFAS compounds in the VHP4Safety Compound Wiki first. That follows the
<a href="https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html">same procedure for how I have been adding chemical compounds to Wikidata</a>
(see also <a href="https://doi.org/10.26434/chemrxiv-2025-53n0w">this preprint</a>).
The input <code class="language-plaintext highlighter-rouge">cas.smi</code> has the (missing) SMILES, Wikidata QID, and English label:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>C(CS(=O)(=O)O)C(C(C(C(C(C(F)(F)F)(F)F)(F)F)(F)F)(F)F)(F)F       Q27063662       6:2 Fluorotelomer sulfonate
CN(CC(=O)O)S(=O)(=O)C(C(C(C(C(C(F)(F)F)(F)F)(F)F)(F)F)(F)F)(F)F Q126605979      MeFHxSAA
CN(CC(=O)O)S(=O)(=O)C(C(C(C(F)(F)F)(F)F)(F)F)(F)F       Q126682412      MeFBSAA
C(=O)(C(C(F)(F)F)(F)OC(C(C(F)(F)F)(F)F)(F)F)O[H]        Q29387971       2,3,3,3-tetrafluoro-2-(heptafluoropropoxy)propanoic acid
C(C(C(=O)O)(F)F)(OC(C(C(OC(F)(F)F)(F)F)(F)F)(F)F)F      Q81981675       4,8-Dioxa-3H-perfluorononanoic acid
</code></pre></div></div>
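<p>To make the three-column layout explicit, here is a small Python sketch of how such a line could be split. It assumes the columns are separated by whitespace and that only the label (the last column) may contain spaces. The function name is hypothetical, and this is not how the Groovy script itself reads the file.</p>

```python
def parse_smi_line(line):
    """Split one input line into (SMILES, Wikidata QID, English label).

    Assumption: columns are whitespace-separated and only the label,
    being the last column, can contain spaces.
    """
    smiles, qid, label = line.strip().split(None, 2)
    return smiles, qid, label


if __name__ == "__main__":
    example = (
        "CN(CC(=O)O)S(=O)(=O)C(C(C(C(F)(F)F)(F)F)(F)F)(F)F"
        "       Q126682412      MeFBSAA"
    )
    print(parse_smi_line(example))
```

<p>Limiting the split to the first two gaps keeps a label like <em>6:2 Fluorotelomer sulfonate</em> in one piece.</p>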

<p>For reference, this is the command line I used to create QuickStatements instructions:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>groovy createWDitemsFromSMILES.groovy <span class="nt">-w</span> compoundcloud.wikibase.cloud <span class="nt">-c</span> Q2368 <span class="nt">-d</span> P5 <span class="nt">-l</span> <span class="nt">-i</span> wikidata <span class="nt">-a</span> P11
</code></pre></div></div>

<h2 id="final-remark">Final remark</h2>

<p>Are these 16 the only PFAS in our body? With 28 studied out of <a href="https://doi.org/10.1021/acs.est.3c04855">a potential seven million</a>,
I doubt it.</p>]]></content><author><name>Egon Willighagen</name></author><category term="pfas" /><category term="chemistry" /><category term="fair" /><category term="scholia" /><category term="wikidata" /><category term="vhp4safety" /><category term="doi:10.26434/CHEMRXIV-2025-53N0W" /><category term="cito:citesAsRecommendedReading:10.1021/acs.est.3c04855" /><summary type="html"><![CDATA[A recent report by the Dutch RIVM, PFAS in the blood of the Dutch population (doi:10.21945/RIVM-2025-0094), writes that seven PFAS compounds are found in blood samples of all tested people. Another nine compounds are found in at least 1-in-10 people. Because there is relevant data in the report on the 28 studied PFAS compounds, I wanted to have the report more FAIR than it is on the website. Why this report? Well, the chemistry and the history are fascinating and brutal (I like this Veritasium video).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/pfas_report.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/pfas_report.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Retracted articles cited in Wikipedia</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/06/08/retracted-articles-cited-in-wikipedia.html" rel="alternate" type="text/html" title="Retracted articles cited in Wikipedia" /><published>2025-06-08T00:10:00+00:00</published><updated>2025-06-08T00:10:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/06/08/retracted-articles-cited-in-wikipedia</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/06/08/retracted-articles-cited-in-wikipedia.html"><![CDATA[<p>Last week, the <a href="https://www.wikidata.org/wiki/Event:Wikidata_and_Sister_Projects">Wikidata and Sister Projects</a> event took place.
The presentations are recorded, and I strongly encourage you to check the schedule. One presentation I liked (there are more),
was the one by <a href="https://www.wikidata.org/wiki/User:Mike_Peel">Mike Peel</a> with the title
<em>“Best practices for reusing Wikidata’s data in the Wikimedia Projects”</em>. At some point he walks us through the
<a href="https://en.wikipedia.org/wiki/Template:Cite_Q">{{Cite Q}}</a> template, <a href="https://www.youtube.com/live/xanSjW30g2o?feature=shared&amp;t=1561">around 26:07</a>.</p>

<p>I learned that this template will highlight when an article cited in Wikipedia is actually retracted (withdrawn or replaced).
Now, for the past months, I have been using the Crossref API for the <a href="http://retractiondatabase.org">Retraction Watch Database</a>
and have annotated thousands of articles in Wikidata as retracted, using URLs from the database as reference. I use
<a href="https://github.com/egonw/ons-wikidata/blob/main/RetractionWatch/quickstatements.groovy">this script</a>.</p>

<p>So, that means that this work actually has had a massive impact. Perhaps thousands of (English) Wikipedia
readers have seen the results from running that script. That is pretty awesome! This is why we do open science.</p>

<p>But it made me also wonder something else. The Retraction Watch Database has over 60 thousand articles and
Wikidata only about 22 thousand (at the time of writing). What if Wikipedia has an article not in Wikidata?
Well, obviously, it cannot use <code class="language-plaintext highlighter-rouge">{{Cite Q}}</code>. But wouldn’t we want to have that article in Wikidata? Clearly,
the article is notable, at least in the Wikipedia sense of notability. So, I was wondering: of those roughly 40 thousand
retracted articles not in Wikidata, how many are cited in English Wikipedia (to start with).</p>

<p>So, I wrote <a href="https://github.com/egonw/ons-wikidata/blob/main/RetractionWatch/listRetractionsNotInWikidata.groovy">a first script</a>
that lists DOIs in the Retraction Watch Database (via the list downloaded from the CrossRef API) that are not found
in Wikidata. <a href="https://github.com/egonw/ons-wikidata/blob/main/RetractionWatch/searchMissingInWikipedia.groovy">A second script</a>
uses a Scholia (doi:<a href="https://doi.org/10.1007/978-3-319-70407-4_36">10.1007/978-3-319-70407-4_36</a>) SPARQL query developed by Finn Nielsen that
<a href="https://github.com/WDscholia/scholia/commit/caf2694a4">uses wikibase:mwapi to do an efficient DOI search</a>.</p>
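<p>The core of the first step is a set difference. A minimal Python sketch, assuming both DOI lists are already in memory and that DOIs should be compared case-insensitively; the function names and the example DOIs are made up, and the linked Groovy scripts may well do this differently:</p>

```python
def normalize_doi(doi):
    """Strip a resolver or 'doi:' prefix and upper-case the DOI."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.upper()


def missing_in_wikidata(retracted_dois, wikidata_dois):
    """Return the retracted DOIs that have no match among the Wikidata DOIs."""
    known = {normalize_doi(d) for d in wikidata_dois}
    return sorted({normalize_doi(d) for d in retracted_dois} - known)


if __name__ == "__main__":
    # Placeholder DOIs under the 10.1234 example prefix, not real articles.
    retracted = ["10.1234/abc.1", "doi:10.1234/abc.2"]
    in_wikidata = ["10.1234/ABC.2"]
    print(missing_in_wikidata(retracted, in_wikidata))
```

<p>Each DOI left over is then a candidate for the Wikipedia search in the second step.</p>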

<p>The results are fascinating and this is the list of DOIs found:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Searching 10.1016/j.engfailanal.2021.105457 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto
Searching 10.1002/14651858.CD002291 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Antipruritic
Searching 10.1007/s12115-020-00496-1 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Lawrence_Mead
Searching 10.1515/9783110619768 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Dead_Eagle_Owl
Searching 10.1016/j.sbi.2022.102426 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Deborah_F._Kelly
Searching 10.1371/journal.pone.0240851 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Industrialization_of_China
Searching 10.1080/00927670309601525 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Rouben_Azizian
Searching 10.1080/14693062.2016.1179616 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Benjamin_K._Sovacool
Searching 10.1038/s41390-022-02127-3 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Harald_Walach
Searching 10.1001/jamapediatrics.2021.2659 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Harald_Walach
Searching 10.1109/CCECE.2007.335 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/List_of_scientific_misconduct_incidents
Searching 10.1002/14651858.CD001834 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Meningococcal_vaccine
Searching 10.1007/s11356-021-16530-6 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Kamran_Bagheri_Lankarani
Searching 10.1016/j.biortech.2023.129044 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Ashok_Pandey
Searching 10.1002/14651858.CD003614 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Trimetazidine
Searching 10.1002/14651858.CD003808 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Hall_Technique
Searching 10.1002/14651858.CD003747 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Venous_thrombosis
Searching 10.1002/14651858.CD003711 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Myocarditis
Searching 10.1002/14651858.CD003225 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Prevention_of_migraine_attacks
Searching 10.1002/14651858.CD003226 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Prevention_of_migraine_attacks
Searching 10.1002/14651858.CD003498 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/A2_milk
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Autism
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Autism_therapies
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Casein
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Casomorphin
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Causes_of_autism
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Gluten-free%2C_casein-free_diet
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Gluten-free_diet
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Opioid_excess_theory
Searching 10.1093/restud/rdy054 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Managerial_economics
Searching 10.1039/d1nr00388g in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Deborah_F._Kelly
Searching 10.1139/p90-116 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Canada%27s_Stonehenge
Searching 10.3791/64256 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Thomas_J._Webster
Searching 10.1016/j.biortech.2022.127565 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Ashok_Pandey
Searching 10.1016/j.anbehav.2015.04.001 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Stegodyphus_dumicola
Searching 10.1016/j.tafmec.2022.103573 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto
Searching 10.1155/2022/7002630 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Dopamine_receptor_D3
Searching 10.1007/s11223-017-9884-2 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto
Searching 10.1016/j.forsciint.2024.112115 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Peter_A._McCullough
Searching 10.1002/14651858.CD002778 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Temporomandibular_joint_dysfunction
Searching 10.1016/j.jcv.2022.105248 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Kay_Davies
Searching 10.1002/14651858.CD002916 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Bleomycin
Searching 10.1007/s11756-021-00841-7 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/2021_in_archosaur_paleontology
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Argentinadraco
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Wellnhopterus
Searching 10.4132/KoreanJPathol.2009.43.4.306 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Cho_Kuk
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Cho_Min_academic_credentials_scandal
Searching 10.1503/cmaj.80742 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/LetUsTalk
Searching 10.1002/14651858.CD004125 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Granisetron
Searching 10.1111/ffe.12616 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto
Searching 10.1007/s00366-008-0118-x in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Kamran_Daneshjoo
Searching 10.1002/jmv.28097 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Noora_%28vaccine%29
Searching 10.1126/science.1070563 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Sch%C3%B6n_scandal
Searching 10.1246/cl.170853 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Enantioselective_Iridium-Catalyzed_C-H_Borylation
Searching 10.1016/j.marpol.2017.06.032 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Ray_Hilborn
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Tony_J._Pitcher
Searching 10.1016/j.ijfatigue.2021.106450 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Filippo_Berto
Searching 10.1016/j.crphar.2022 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Desidustat
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Emoxypine
Searching 10.1007/s10479-023-05261-1 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Artificial_intelligence_marketing
Searching 10.1257/aer.20210369 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Dividend_tax
Searching 10.1109/AIMSEC.2011.6010222 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/SAP_CRM
Searching 10.1002/14651858.CD001103 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Hydrocolloid_dressing
Searching 10.1007/978-3-642-27708-5 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/IP_Multimedia_Subsystem
Searching 10.1351/PAC-CON-08-12-06 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Parthenocissus_tricuspidata
Searching 10.1002/advs.202204315 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Tetrataenite
Searching 10.1353/sho.2011.0038 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Maus
Searching 10.1111/jpim.12058 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Ulrich_Lichtenthaler
Searching 10.1136/bcr-2021-241572 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/COVID-19_proxalutamide_trial_in_Brazil
Searching 10.1038/s44160-022-00068-7 in Wikipedia...
  ... found in this Wikipedia page: https://en.wikipedia.org/wiki/Single-layer_materials
</code></pre></div></div>]]></content><author><name>Egon Willighagen</name></author><category term="wikidata" /><category term="wikipedia" /><category term="doi" /><category term="doi:10.1007/978-3-319-70407-4_36" /><summary type="html"><![CDATA[Last week, the Wikidata and Sister Projects event took place. The presentations are recorded, and I strongly encourage you to check the schedule. One presentation I liked (there are more), was the one by Mike Peel with the title “Best practices for reusing Wikidata’s data in the Wikimedia Projects”. At some point he walks us through the {{Cite Q}} template, around 26:07.]]></summary></entry><entry><title type="html">New preprint: “Scholia Chemistry: access to chemistry in Wikidata”</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/05/25/new-preprint-scholia-chemistry-access-to-chemistry-in-wikidata.html" rel="alternate" type="text/html" title="New preprint: “Scholia Chemistry: access to chemistry in Wikidata”" /><published>2025-05-25T00:00:00+00:00</published><updated>2025-05-25T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/05/25/new-preprint-scholia-chemistry-access-to-chemistry-in-wikidata</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/05/25/new-preprint-scholia-chemistry-access-to-chemistry-in-wikidata.html"><![CDATA[<p>Two weeks ago I uploaded a paper that has been in the works for some time. In fact, I first mentioned it as a conference paper
for the special issue of the <a href="https://scholia.toolforge.org/event/Q47501229">11th International Conference on Chemical Structures</a>,
you know, the meeting held in 2018, of which <a href="https://iccs-nl.org/">the 13th edition</a> starts in 7 days. I had a
<a href="https://doi.org/10.6084/m9.figshare.6356027.v1">poster</a> at that conference which I described in
<a href="https://chem-bla-ics.linkedchemistry.info/2018/08/18/compound-class-identifiers-in-wikidata.html">this blog post</a>.</p>

<p>In turn, that poster described work of at least three years, going back to
<a href="https://chem-bla-ics.linkedchemistry.info/2015/12/22/new-edition-getting-cas-registry.html">adding identifiers in 2015</a>
and <a href="https://chem-bla-ics.linkedchemistry.info/2016/01/27/adding-chemical-compound-to-wikidata.html">chemical structures in early 2016</a>.
I started <a href="https://chem-bla-ics.linkedchemistry.info/2016/03/20/adding-disclosures-to-wikidata-with.html">using scripts two months later</a>.
This helped a lot with <a href="https://chem-bla-ics.linkedchemistry.info/2016/03/27/migrating-pka-data-from-drugmet-to.html">migrating pKa data</a>
from a custom Semantic MediaWiki installation to Wikidata and with adding thousands of EPA CompTox
<a href="https://chem-bla-ics.blogspot.com/2017/01/epa-comptox-dashboard-ids-in-wikidata.html">identifiers in 2017</a>.</p>

<p>But that 2018 conference paper never happened. Because <a href="https://chem-bla-ics.linkedchemistry.info/2017/10/15/two-conference-proceedings.html">Scholia did</a>.
And even on the ICCS poster, Scholia was used to visualize chemistry data in Wikidata. To be honest, not just that,
of course. About a year ago I had a serious go at finishing the paper, and it was sent around to co-authors.
But I realized at the time that the paper was lacking good suggestions on how to peer review our
actual contributions to Wikidata. I could hardly expect readers of the paper to browse the individual
histories of all, by then, 1.3 million chemical compounds. And during the holidays I collected a few
tools, which I had lined up to add to the manuscript.</p>

<p>However, another thing happened: the COVID-19 pandemic. While all the experience helped a lot with getting
knowledge together around SARS-CoV-2, it also made something else clear: the software behind Wikidata
does not scale well (enough). This led to plans to split the RDF graph representation into two
separate SPARQL endpoints. And that breaks many, if not most, of Scholia’s SPARQL queries, including
those for the chemistry aspects. The situation in Summer 2024 was that there was a significant
chance Scholia would not survive the split. And the <em>Scholia Chemistry</em> paper had to wait. You
cannot publish an article of which the website is gone before it is formally accepted.</p>

<p>Let me be clear: this graph split is not solved and the risk is not gone. But a series of unfunded
weekend hackathons allows us to refactor Scholia, which gives us a chance. It started with
<a href="https://chem-bla-ics.linkedchemistry.info/2024/08/23/scholia.html">making Scholia more configurable</a>.
We had the first hackathons in October and November, and I had
<a href="https://chem-bla-ics.linkedchemistry.info/2025/04/20/the-april-2025-scholia-hackathon.html">four more hackathon weekends</a>
this April.</p>

<p>The graph split into a main graph and a scholarly graph <a href="https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split">happened on May 9</a>.
Currently, we have been granted extra time and can use a legacy server with the full graph, but a lot
less hardware, so slower. A final patch, merged last week, allows us to define which SPARQL endpoint a query
should run against. So, each time we port a SPARQL query, we can directly update Scholia, making the migration
somewhat more manageable.</p>

<p>But, with those uncertainties out of the way, it was time to finish the Scholia Chemistry paper!</p>

<p>The preprint (doi:<a href="https://doi.org/10.26434/chemrxiv-2025-53n0w">10.26434/chemrxiv-2025-53n0w</a>) brings
10 years of research together, and describes details of the used methods not formally peer-reviewed before.
We describe in detail how chemical structures are added, the choices of Wikidata on how to
represent chemical structures, how we curate the quality, and how we visualize chemical structures
and data with Scholia. As you can expect, the Chemistry Development Kit has an important role,
along with the InChI.</p>

<p>The paper introduces three new Scholia <em>aspects</em> for chemicals, chemical classes, and elements.
Each aspect is a template for a page with information about molecular entities and chemical substances,
compound classes (like <em>fatty acids</em>), and elements (like carbon). Each template provides relevant
information. Of course, any compound, class, or element can also still be opened in the Scholia
“topic” aspect, listing relevant literature.</p>

<p>With this paper we aim to show that Wikidata is an innovative platform that meets the needs for
a chemical structure database, with detailed data provenance, and scalable community curation.</p>

<p>I welcome your strongest peer review on the preprint. I don’t like settling for anything less.
Here’s the abstract:</p>

<blockquote>
  <p>Sharing knowledge on chemicals in the digital age has been the playground of databases such
as the Chemical Abstract Services and PubChem. Wikipedia complements this field by providing
context to chemicals aimed at a broad audience, but is not easily read by machines. Wikidata
was started as a database service to improve the machine readability of the knowledge captured
in Wikipedia. Wikidata has an open license, application programming interfaces, and a strong
provenance model. Scholia uses the features to provide access to chemical knowledge. This
study reviews the chemistry in Wikidata, shows how thousands of new chemicals were added,
extends Wikidata with new properties for chemical representation and external links to
additional databases, and shows how we extended Scholia to represent the chemistry in Wikidata.</p>
</blockquote>

<p>Thanks to Finn, Denise, Daniel, and Adriano!</p>]]></content><author><name>Egon Willighagen</name></author><category term="wikidata" /><category term="scholia" /><category term="chemistry" /><category term="iccs" /><category term="cito:citesAsEvidence:10.6084/m9.figshare.6356027.v1" /><category term="doi:10.26434/CHEMRXIV-2025-53N0W" /><summary type="html"><![CDATA[Two weeks ago I uploaded a paper that has been in the works for some time. In fact, I first mentioned it as a conference paper for the special issue of the 11th International Conference on Chemical Structures, you know, the meeting held in 2018, of which the 13th edition starts in 7 days. I had a poster at that conference which I described in this blog post.]]></summary></entry><entry><title type="html">Retracted articles in Wikidata</title><link href="https://chem-bla-ics.linkedchemistry.info/2025/02/16/retraction-data-in-wikidata.html" rel="alternate" type="text/html" title="Retracted articles in Wikidata" /><published>2025-02-16T00:00:00+00:00</published><updated>2025-02-16T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2025/02/16/retraction-data-in-wikidata</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2025/02/16/retraction-data-in-wikidata.html"><![CDATA[<p>A good number of years ago, a colleague and I explored if we could get access to the <a href="http://retractiondatabase.org/">Retraction Watch Database</a>,
but we could not afford it. We have been using data on retractions to curate our databases, like
<a href="https://www.wikipathways.org/">WikiPathways</a>. A database should not contain knowledge based on (only) a retracted article.
Wikidata, btw, has a small number (499) of statements supported by retracted articles. Similarly, it turns out that I am
<a href="https://w.wiki/8pwe">citing retracted articles in two papers</a> (and a preprint of one of them).</p>

<p><a href="https://www.wikidata.org/">Wikidata</a> has a good number of retracted articles in their database
(<a href="https://scholia.toolforge.org/statistics">some 21 thousand at the time of writing</a>). A lot of this data
comes from CrossRef, which recently <a href="https://www.crossref.org/blog/news-crossref-and-retraction-watch/">acquired the Retraction Watch Database</a>
(doi:<a href="https://doi.org/10.13003/c23rw1d9">10.13003/c23rw1d9</a>) and started providing the content as FAIR and Open data.
With <a href="https://github.com/egonw/ons-wikidata/blob/main/RetractionWatch/quickstatements.groovy">a Bacting-based script</a>
I am regularly updating Wikidata with annotations from CrossRef, giving a rich dataset in Wikidata around
the queries. Over the past few years I have written various SPARQL queries to show the results which today
I <a href="https://bigcat-um.github.io/sparql-examples/examples/WikidataRetractions/">collected under a single home</a>:</p>

<p><img src="/assets/images/retraction_SPARQL.png" alt="" /></p>]]></content><author><name>Egon Willighagen</name></author><category term="wikidata" /><category term="wikipathways" /><category term="doi:10.1093/NAR/GKAD960" /><category term="cito:citesAsEvidence:10.13003/c23rw1d9" /><summary type="html"><![CDATA[A good number of years ago, a colleague and I explored if we could get access to the Retraction Watch Database, but we could not afford it. We have been using data on retractions to curate our databases, like WikiPathways. A database should not contain knowledge based on (only) a retracted article. Wikidata, btw, has a small number (499) of statements supported by retracted articles. Similarly, it turns out that I am citing retracted articles in two papers (and a preprint of one of them).]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/retraction_SPARQL.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/retraction_SPARQL.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Adding citations between existing articles in Wikidata</title><link href="https://chem-bla-ics.linkedchemistry.info/2024/09/07/wikidata-citations.html" rel="alternate" type="text/html" title="Adding citations between existing articles in Wikidata" /><published>2024-09-07T00:00:00+00:00</published><updated>2024-09-07T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2024/09/07/wikidata-citations</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2024/09/07/wikidata-citations.html"><![CDATA[<p>Scholarly articles provide context to the factualness of statements in <a href="https://wikidata.org/">Wikidata</a>,
similar to the <a href="https://en.wikipedia.org/wiki/Citation_needed">[citation needed]</a> in <a href="https://en.wikipedia.org/wiki/">Wikipedia</a>.
And just like the cited references in each scholarly article itself. The citation network is generally seen
as an essential part of (doing) science, even without <a href="https://chem-bla-ics.linkedchemistry.info/tag/cito">citation intention annotation</a>.
Nowadays, citations are mostly open, but this took very serious lobbying by the <a href="https://i4oc.org/">Initiative for Open Citations</a> and
<a href="https://chem-bla-ics.linkedchemistry.info/2018/11/17/join-me-in-encouraging-acs-to-join.html">not every publisher reacted immediately</a>.
But now that they are open, projects like <a href="https://opencitations.net/">OpenCitations</a> are making this citation
network FAIR.</p>

<p>Therefore, when an article is cited as a reference in Wikidata, I think that the articles (and other research output)
cited in that article are part of the reference. After all, it is really hard to understand any article without the details
in the cited articles. So, getting these citations between articles into Wikidata deepens the knowledge captured
by Wikidata. Of course, Wikidata is also one of the few places where we can capture the citation intentions at all.</p>

<p>Adding these citations manually is cumbersome but <a href="https://chem-bla-ics.linkedchemistry.info/2023/08/08/history-provenance-detail.html">sometimes needed</a>
as these citations are not open or not FAIR yet. Fortunately, in many cases we can automate the process, for
which I wrote a <a href="https://chem-bla-ics.linkedchemistry.info/tag/bioclipse">Bacting</a>-based
<a href="https://github.com/egonw/ons-wikidata/blob/main/OpenCitations/quickstatements.groovy">script</a>.
Until recently, the script took a single DOI or a list of DOIs as input, and for each DOI
looked up in OpenCitations which DOIs it cites and which DOIs cite it. For the
cited and citing DOIs it checks whether those are in Wikidata and, only if they are,
creates QuickStatements. The result can look like <a href="https://www.wikidata.org/wiki/Q91911528#P2860">this</a>:</p>

<p><img src="/assets/images/opencitationsImport.png" alt="" /></p>
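<p>The logic described above can be sketched in a few lines of Python. This is a simplified illustration, not the Groovy script: it assumes the OpenCitations lookups and the DOI-to-QID mapping are already available as plain lists and a dictionary, and it writes tab-separated QuickStatements with <code class="language-plaintext highlighter-rouge">P2860</code> (the property anchored in the link above) without adding references. All DOIs and QIDs in the usage example are placeholders.</p>

```python
def citation_statements(doi, cites, cited_by, doi_to_qid):
    """Return QuickStatements lines for citations with both ends in Wikidata.

    doi        -- the DOI being processed
    cites      -- DOIs cited by that article (e.g. from OpenCitations)
    cited_by   -- DOIs of articles citing it
    doi_to_qid -- mapping of DOIs known in Wikidata to their QIDs
    """
    lines = []
    subject = doi_to_qid.get(doi)
    if subject is None:
        # The article itself is not in Wikidata, so nothing can be linked.
        return lines
    for cited in cites:
        target = doi_to_qid.get(cited)
        if target is not None:
            lines.append(f"{subject}\tP2860\t{target}")
    for citing in cited_by:
        source = doi_to_qid.get(citing)
        if source is not None:
            lines.append(f"{source}\tP2860\t{subject}")
    return lines


if __name__ == "__main__":
    # Placeholder DOIs and QIDs, chosen only to show the filtering.
    known = {"10.1234/a": "Q1001", "10.1234/b": "Q1002", "10.1234/c": "Q1003"}
    for line in citation_statements(
        "10.1234/a", ["10.1234/b", "10.1234/x"], ["10.1234/c"], known
    ):
        print(line)
```

<p>The DOI <code class="language-plaintext highlighter-rouge">10.1234/x</code> is skipped because it has no QID, which matches the "only if they are in Wikidata" rule.</p>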

<p>The script also needs an OpenCitations token, which you can <a href="https://opencitations.net/querying">get here</a>.
This is how I run the code from the command line (with the token in the <code class="language-plaintext highlighter-rouge">TOKEN</code> environment variable),
for a single DOI:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>groovy quickstatements.groovy <span class="nt">-t</span> <span class="k">${</span><span class="nv">TOKEN</span><span class="k">}</span> <span class="nt">-d</span> 10.1002/JLAC.18721620110 | <span class="nb">tee </span>output.qs
</code></pre></div></div>

<p>A list of DOIs is provided as a text file, with one DOI per line. I then use the <code class="language-plaintext highlighter-rouge">-l</code> parameter
(oh, here DOIs of works by <a href="https://en.wikipedia.org/wiki/Shyamala_Gopalan">Shyamala Gopalan</a>, mother of
<a href="https://en.wikipedia.org/wiki/Kamala_Harris">Kamala Harris</a>):</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>groovy quickstatements.groovy <span class="nt">-t</span> <span class="k">${</span><span class="nv">TOKEN</span><span class="k">}</span> <span class="nt">-l</span> harris_dois.txt | <span class="nb">tee </span>output.qs
</code></pre></div></div>

<p>But last weekend I created a new feature. To enrich the profiles of authors, for example Nobel Prize
winners, mothers of, or <a href="https://scholia.toolforge.org/author/Q76784">famous</a> <a href="https://scholia.toolforge.org/author/Q80956">chemists</a>,
I previously had to create a list of DOIs myself; now I have the script do that.</p>

<p>So, today I can add the citation network for any arbitrary author, e.g. <a href="https://en.wikipedia.org/wiki/Carolyn_Bertozzi">Carolyn Bertozzi</a>,
by just passing the Wikidata QID:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>groovy quickstatements.groovy <span class="nt">-t</span> <span class="k">${</span><span class="nv">TOKEN</span><span class="k">}</span> <span class="nt">-a</span> Q7442 | <span class="nb">tee </span>output.qs
</code></pre></div></div>
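
<p>How the script collects the DOIs for an author is not shown here, but one way to do it would be a SPARQL query against
Wikidata. The sketch below is an assumption, not the script's actual code: the function name is hypothetical, and it relies
on the Wikidata properties P50 (author) and P356 (DOI) and on the <code class="language-plaintext highlighter-rouge">wd:</code> and
<code class="language-plaintext highlighter-rouge">wdt:</code> prefixes that the Wikidata Query Service predefines. It only builds the query text and
does not send it anywhere:</p>

```python
def author_dois_query(author_qid):
    """Build a SPARQL query listing the DOIs of works by one author."""
    # Guard against input that is not a QID, since it ends up in the query.
    if not (author_qid.startswith("Q") and author_qid[1:].isdigit()):
        raise ValueError("not a Wikidata QID: " + author_qid)
    # P50 is the author property, P356 the DOI property.
    return "\n".join([
        "SELECT DISTINCT ?doi WHERE {",
        "  ?work wdt:P50 wd:" + author_qid + " ;",
        "        wdt:P356 ?doi .",
        "}",
    ])


print(author_dois_query("Q7442"))
```

<p>Each DOI returned by such a query could then be processed exactly like the entries of a DOI list file.</p>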

<p>I can imagine that in the future the script will have more such options, to do the same
for many authors at some affiliation, or all DOIs for a certain journal.</p>]]></content><author><name>Egon Willighagen</name></author><category term="wikidata" /><category term="bioclipse" /><category term="opencitations" /><category term="justdoi:10.1002/JLAC.18721620110" /><summary type="html"><![CDATA[Scholarly articles provide context to the factualness of statements in Wikidata, similar to the [citation needed] in Wikipedia. And just like the cited references in each scholarly article itself. The citation network is general seen as an essential part of (doing) science, even without citation intention annotation. Nowadays, citations are mostly open, but this took very serious lobbying by the Initiative for Open Citations and not every publisher reacted immediately. But now that they are open, projects like OpenCitations are making this citation network FAIR.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/opencitationsImport.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/opencitationsImport.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Scholia configurability</title><link href="https://chem-bla-ics.linkedchemistry.info/2024/08/23/scholia.html" rel="alternate" type="text/html" title="Scholia configurability" /><published>2024-08-23T00:00:00+00:00</published><updated>2024-08-23T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2024/08/23/scholia</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2024/08/23/scholia.html"><![CDATA[<p><a href="https://scholia.toolforge.org/">Scholia</a> is a visual layer on top of <a href="https://wikidata.org/">Wikidata</a> providing
a rich user experience for browsing scholarly research related knowledge. I am using the combination
for various things, including exploring new research topics (a method, compound, or protein I do not know so much
about yet), indexing notable research output (including citations), <a href="https://chem-bla-ics.linkedchemistry.info/tag/cito">progress of Citation Typing Ontology
uptake</a>, etc. This weekend I hope to send around the
final draft for the <em>Scholia Chemistry</em> paper.</p>

<p>Scholia has received a fair share of scholarly and social attention. The Scholia paper has been cited
<a href="https://scholar.google.com/scholar?hl=en&amp;as_sdt=0%2C5&amp;q=scholia+wikidata&amp;btnG=&amp;oq=scholia">over 100 times</a> and
the website receives about 200 thousand page views each day (though we do not know how to get Toolforge
to give us sufficient insight into the how and what of that count). There is a Wikipedia template to link
to Scholia and some of the projects I am involved in link to Scholia for articles, such as
<a href="https://wikipathways.org/">WikiPathways</a>.</p>

<p>With that, there is also interest in using it for other Wikibases and perhaps even random SPARQL endpoints.
These things are not trivial, as Scholia uses complementary APIs, various URL patterns for some of the
functionality, and generally, all SPARQL queries are tweaked to the Wikidata Blazegraph SPARQL endpoint
to ensure results are returned in reasonable time. But that last part requires the use of Blazegraph extensions
to the SPARQL standard.</p>

<p>All this requires Scholia to become more independent, following a better model-view-controller design. And that
actually turns out to be very important at this moment. That is, Wikidata is not an RDF-first database, but
a Wikibase-based store. Whenever an edit is made, RDF is generated and the SPARQL endpoint is updated.
Now, the number of edits in Wikidata is enormous and the fact that the SPARQL endpoint is often at most
minutes behind is a huge accomplishment. But the Blazegraph platform cannot keep up with Wikidata.
Blazegraph is open source, but it was bought up and development stopped from one day to the next.</p>

<p>Therefore, a split of the Wikidata SPARQL platform is <a href="https://phabricator.wikimedia.org/T337013">planned</a>.
This split will put one part of
the knowledge in one endpoint and the other part in the other. Any query that needs information
from both graphs will have to do a federated SPARQL query. Basically, there are very few Scholia
queries that do not need rewriting. My first rewrite actually failed, because the rewriting is not
obvious and the query quickly times out. To some extent, this is because lots of subquery results
now need to be sent over the network from one endpoint to the other. When the combined query basically
covers half of each endpoint, that’s a lot of network traffic.</p>

<p>An immediate use case of the configuration is therefore running Scholia against the current three
endpoints: the current official endpoint, and the two split endpoints under development. With
<a href="https://github.com/WDscholia/scholia/pull/2515">a recent patch</a> <a href="@fnielsen@expressional.social">Finn</a>
and I worked on, this configuration looks like this (saved as <code class="language-plaintext highlighter-rouge">scholia.ini</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[query-server]
# Wikidata:
#sparql_endpoint = https://query.wikidata.org/sparql
#sparql_editurl = https://query.wikidata.org/#
#sparql_embedurl = https://query.wikidata.org/embed.html#

# Wikidata Split Main
sparql_endpoint = https://query-main.wikidata.org/sparql
sparql_editurl = https://query-main.wikidata.org/#
sparql_embedurl = https://query-main.wikidata.org/embed.html#

# Wikidata Split Scholar
#sparql_endpoint = https://query-scholarly.wikidata.org/sparql
#sparql_editurl = https://query-scholarly.wikidata.org/#
#sparql_embedurl = https://query-scholarly.wikidata.org/embed.html#
</code></pre></div></div>
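
<p>To check which endpoint is active in such a file, the settings can be read with Python's standard
<code class="language-plaintext highlighter-rouge">configparser</code> module. This is only a small sketch: the settings are inlined (and shortened) to keep
it self-contained, and whether Scholia itself loads the file in exactly this way is an assumption on my side. It does show
that the lines starting with <code class="language-plaintext highlighter-rouge">#</code> are ignored, so only the uncommented block counts:</p>

```python
import configparser

# A shortened copy of the settings above, inlined for a self-contained sketch.
SCHOLIA_INI = """
[query-server]
# Wikidata:
#sparql_endpoint = https://query.wikidata.org/sparql

# Wikidata Split Main
sparql_endpoint = https://query-main.wikidata.org/sparql
sparql_editurl = https://query-main.wikidata.org/#
sparql_embedurl = https://query-main.wikidata.org/embed.html#
"""

config = configparser.ConfigParser()
config.read_string(SCHOLIA_INI)

# Lines starting with '#' are comments, so only the active block is read.
server = config["query-server"]
print(server["sparql_endpoint"])
print(server["sparql_editurl"])
```

<p>Note that the trailing <code class="language-plaintext highlighter-rouge">#</code> in the URLs is kept, because <code class="language-plaintext highlighter-rouge">configparser</code>
does not treat inline comments specially by default.</p>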

<p>So, right now, we can test the impact of the split with Scholia and this patch.
We would fire up a local instance of Scholia, running against one of the
split endpoints, and use the Toolforge instance as baseline.</p>

<p>Now, on my system I need to use <a href="https://python.land/virtual-environments/virtualenv">Python virtualenv</a>,
so I first activate a Scholia <code class="language-plaintext highlighter-rouge">venv</code>:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">source</span> ~/.venvs/scholia/bin/activate
</code></pre></div></div>

<p>After that, I can select another endpoint, e.g. the <code class="language-plaintext highlighter-rouge">main</code> Wikidata split endpoint (<code class="language-plaintext highlighter-rouge">query-main-experimental.wikidata.org</code>),
and run Scholia on a unique port (were it not that the split endpoints are
<a href="https://phabricator.wikimedia.org/T371833">currently offline</a> as part of the transition):</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>scholia run
</code></pre></div></div>

<p>Then I can have two browser windows side by side and compare Scholia pages from the current
Scholia instance with those from an instance running against another SPARQL endpoint. For now, I can test how well
Scholia runs on the <a href="https://qlever.cs.uni-freiburg.de/wikidata">QLever instance of Wikidata</a> (superfast, with
data updated once a week). Here the configuration I have is not entirely complete, and many
SPARQL queries do not work against QLever, including anything with graphical depiction. But
that said, I can use this configuration:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[query-server]
# QLever
#sparql_endpoint = https://qlever.cs.uni-freiburg.de/api/wikidata
#sparql_editurl = https://qlever.cs.uni-freiburg.de/wikidata/?query=
#sparql_embedurl = 
</code></pre></div></div>

<p>Then, I can compare, for example, the chemicals statistics of the main Scholia with those of one running
against QLever:</p>

<p><img src="/assets/images/scholia_comparison.png" alt="" /></p>

<p>This query ran without modification. For other queries rewriting is needed, but with this
setup we can at least quickly see the differences in the results.</p>]]></content><author><name>Egon Willighagen</name></author><category term="scholia" /><category term="doi:10.1007/978-3-319-70407-4_36" /><category term="wikidata" /><category term="sparql" /><summary type="html"><![CDATA[Scholia is a visual layer on top of Wikidata providing a rich user experience for browing scholarly research related knowledge. I am using the combinatie for various things, including exploring new research topics (a method, compound, or protein I do not know so much about yet), indexing notable research output (including citations), progress of Citation Typing Ontology uptake, etc. This weekend I hope to send around the final draft for the Scholia Chemistry paper.]]></summary></entry></feed>