<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/hmdb.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-04-19T09:50:36+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/hmdb.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">Compound (class) identifiers in Wikidata</title><link href="https://chem-bla-ics.linkedchemistry.info/2018/08/18/compound-class-identifiers-in-wikidata.html" rel="alternate" type="text/html" title="Compound (class) identifiers in Wikidata" /><published>2018-08-18T00:00:00+00:00</published><updated>2018-08-18T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2018/08/18/compound-class-identifiers-in-wikidata</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2018/08/18/compound-class-identifiers-in-wikidata.html"><![CDATA[<p><span style="width: 40%; display: block; margin-left: auto; margin-right: auto; float: right">
<img src="/assets/images/extid-wikidata-histogram.png" /> <br />
<a href="https://edu.nl/h6kg3">Bar chart</a> showing the number of compounds with a particular chemical identifier.
</span>
I think <a href="http://wikidata.org/">Wikidata</a> is a groundbreaking project that will have a major impact on science. Reasons
include the open license (CC0), the very basic approach (<a href="http://wikiba.se/">Wikibase</a>), and the superb community around
it. For example, setting up your own Wikibase, including a cool SPARQL endpoint, is
<a href="https://github.com/wmde/wikibase-docker">easily done with Docker</a>.</p>

<p>Wikidata has many subprojects, such as <a href="http://wikicite.org/">WikiCite</a>, which captures the body of primary literature.
Another one is the <a href="https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry">WikiProject Chemistry</a>. The two match
up nicely, I think, making a public database that links chemicals to literature (though much remains to be done here); see my recent
ICCS 2018 poster (doi:<a href="https://doi.org/10.6084/m9.figshare.6356027.v1">10.6084/m9.figshare.6356027.v1</a>, paper pending).</p>

<p>But Wikidata is also a great resource for identifier mappings between chemical databases, something we need for
<a href="https://chem-bla-ics.blogspot.com/2017/11/new-paper-wikipathways-multifaceted.html">our metabolism pathway research</a>.
These mappings, as you may know, are <a href="https://chem-bla-ics.blogspot.com/2016/09/metabolite-identifier-mapping-databases.html">used in the latter</a>
via <a href="https://www.bridgedb.org/">BridgeDb</a> and we have been using Wikidata as one of three sources for some time now (the others being
<a href="http://www.hmdb.ca/">HMDB</a> and <a href="https://www.ebi.ac.uk/chebi/">ChEBI</a>). WikiProject Chemistry has a related
<a href="https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry/ChemID">ChemID</a> effort, and while the wiki page does not show
much recent activity, there is actually a lot of ongoing effort (see <a href="https://edu.nl/h6kg3">plot</a>).
And I’ve been <a href="https://chem-bla-ics.blogspot.com/2018/07/lipid-map-identifiers-and.html">adding my bits</a>.</p>
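<p>To make the BridgeDb use case a bit more concrete, here is a minimal Python sketch of how a ChEBI-to-HMDB mapping can be derived with Wikidata as the hub: any item carrying both a ChEBI ID (P683) and an HMDB ID (P2057) links the two. It assumes you have already run the (hypothetical) query below, for example on the Wikidata Query Service, and parsed the SPARQL 1.1 JSON results; the function name and structure are mine, not how BridgeDb actually builds its mapping files.</p>

```python
from collections import defaultdict

# Hypothetical query: all Wikidata items with both a ChEBI ID (P683)
# and an HMDB ID (P2057).
QUERY = """
SELECT ?item ?chebi ?hmdb WHERE {
  ?item wdt:P683 ?chebi ;
        wdt:P2057 ?hmdb .
}
"""


def chebi_to_hmdb(results: dict) -> dict:
    """Map each ChEBI ID to the set of HMDB IDs found on the same
    Wikidata item, given SPARQL 1.1 JSON results for QUERY."""
    mapping = defaultdict(set)
    for row in results["results"]["bindings"]:
        chebi = row["chebi"]["value"]
        hmdb = row["hmdb"]["value"]
        mapping[chebi].add(hmdb)
    return dict(mapping)
```

<p>Returning sets rather than single values is deliberate: one ChEBI ID can end up linked to several HMDB IDs, and those one-to-many cases are exactly where the meaning of the links starts to matter.</p>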

<h2 id="limitations-of-the-links">Limitations of the links</h2>
<p>But not every identifier in Wikidata has the same meaning. While they are all classified as ‘external-id’, the actual links may
have different meanings. This, of course, is the essence of scientific lenses; see <a href="https://chem-bla-ics.blogspot.com/2013/05/linking-wikipathways-to-binding.html">this post</a>
and the papers cited therein. One reason is that entries in the various databases represent different things.</p>
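<p>A minimal sketch of the lens idea, assuming each link carries a (hypothetical) relation label: a lens is then just the set of relation types you accept as "the same" for the task at hand. The labels and the <code>apply_lens</code> helper are illustrative only, not the implementation from the cited papers.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str    # e.g. a ChEBI ID
    target: str    # e.g. an HMDB ID
    relation: str  # hypothetical label describing what the link means


def apply_lens(links, accepted):
    """Keep only the links whose relation type this lens accepts."""
    return [link for link in links if link.relation in accepted]


links = [
    Link("chebi:A", "hmdb:X", "exact"),
    Link("chebi:B", "hmdb:X", "different-charge-state"),
    Link("chebi:C", "hmdb:Y", "class-member"),
]

strict = apply_lens(links, {"exact"})
lenient = apply_lens(links, {"exact", "different-charge-state"})
```

<p>The same set of links yields a strict or a more lenient mapping depending on the lens; nothing is thrown away at the data level.</p>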

<p>Wikidata has an extensive model, defined by the aforementioned WikiProject Chemistry. For example, it has different concepts
for chemical compounds (in fact, the hierarchy is pretty rich) and compound classes, and these are modeled differently. Furthermore,
its model formalizes that things with different InChIs are different, while still allowing things with the same InChI to be
different if the need arises. It tries to accurately and precisely capture the certainty and uncertainty of the chemistry. As such,
it is a powerful system for handling identifier mappings, because databases are often unclear about what a record represents, and
chemical and biological data even more so: experimentally, we measure a characterization of chemicals, but what we put in databases
and give names are specific models (often chemical graphs).</p>
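<p>The asymmetry in that model (a different InChI means different, but the same InChI does not necessarily mean the same) can be written down as a small check. This is a sketch under my own assumptions, with a hypothetical <code>CompoundRecord</code> type and <code>identity_check</code> helper; it is not how Wikidata implements anything.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CompoundRecord:
    record_id: str        # hypothetical database record identifier
    inchi: Optional[str]  # None when no single structure applies


def identity_check(a: CompoundRecord, b: CompoundRecord) -> str:
    """Different InChIs imply different things; an identical (or missing)
    InChI only leaves the question open for a curator to decide."""
    if a.record_id == b.record_id:
        return "same"
    if a.inchi is not None and b.inchi is not None and a.inchi != b.inchi:
        return "different"
    return "undetermined"
```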

<p>That model differs from what other (chemical) databases use, or seem to use, since databases do not always indicate what a
record actually contains. Still, I think the following is a fair guess.</p>

<h2 id="chebi">ChEBI</h2>
<p>ChEBI (and the matching <a href="https://www.wikidata.org/wiki/Property:P683">ChEBI ID</a>) has entries for chemical classes (e.g.
<a href="https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:35366">fatty acid</a>) and specific compounds (e.g.
<a href="https://www.ebi.ac.uk/chebi/searchId.do?chebiId=30089">acetate</a>).</p>

<h2 id="pubchem-chemspider-unichem">PubChem, ChemSpider, UniChem</h2>
<p>These three resources use the InChI as their central asset. While they do not really have a concept of compound classes
(though they increasingly offer classifications), they do have entries where the stereochemistry is undefined or unknown. Each
has its own way of linking to other databases, which usually involves extensive structure normalization (see
e.g. doi:<a href="https://doi.org/10.1186/s13321-018-0293-8">10.1186/s13321-018-0293-8</a> and
doi:<a href="https://doi.org/10.1186/s13321-015-0072-8">10.1186/s13321-015-0072-8</a>).</p>
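<p>One simple way to see how entries with undefined stereochemistry relate to fully specified ones is the standard InChIKey: its first 14-character block hashes the main layer (the constitution), while the stereo and isotopic layers go into the second block. Grouping keys by that first block is therefore a crude stand-in for "same compound, stereo aside". This is only an illustration with hypothetical helper names; the three resources each use their own, much more elaborate normalization, as described in the papers above.</p>

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-character block of a standard InChIKey,
    which hashes the main layer (constitution) but not stereo."""
    parts = inchikey.split("-")
    if len(parts) != 3 or len(parts[0]) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey}")
    return parts[0]


def group_by_skeleton(inchikeys):
    """Group InChIKeys that differ only in their second or third block,
    e.g. stereoisomers and entries with undefined stereochemistry."""
    groups = defaultdict(list)
    for key in inchikeys:
        groups[connectivity_block(key)].append(key)
    return dict(groups)
```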

<h2 id="hmdb">HMDB</h2>
<p>HMDB (and the matching <a href="https://www.wikidata.org/wiki/Property:P2057">P2057</a>) has a biological perspective; the entries
reflect the biology of a chemical. Therefore, for most compounds, HMDB focuses on the neutral form. This makes links to and from
other databases that record a non-neutral (charged) form of the compound chemically less precise.</p>
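<p>The neutral-form issue is also visible in the standard InChIKey: its final character is a protonation flag, where <code>N</code> means no (de)protonation. The sketch below reads that flag, assuming the usual encoding as an offset from <code>N</code> (<code>M</code> for one proton removed, <code>O</code> for one added), and flags links between keys that share a skeleton but differ in protonation. The helper names are hypothetical.</p>

```python
import string


def protonation_shift(inchikey: str) -> int:
    """Read the final protonation flag of a standard InChIKey as a change
    in proton count, assuming 'N' = 0, 'M' = -1, 'O' = +1, and so on."""
    flag = inchikey.rsplit("-", 1)[-1]
    # Flags outside B..Z are simply rejected in this sketch.
    if len(flag) != 1 or flag not in string.ascii_uppercase[1:]:
        raise ValueError(f"unexpected protonation flag in {inchikey}")
    return ord(flag) - ord("N")


def charge_state_mismatch(key_a: str, key_b: str) -> bool:
    """True when two keys share the first block (same skeleton) but differ
    in protonation, e.g. a neutral entry linked to a charged one."""
    same_skeleton = key_a.split("-")[0] == key_b.split("-")[0]
    return same_skeleton and protonation_shift(key_a) != protonation_shift(key_b)
```

<p>Such a link is not wrong, but it is less precise, and a check like this at least makes that explicit.</p>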

<h2 id="cas-registry-numbers">CAS registry numbers</h2>
<p>CAS (and the matching <a href="https://www.wikidata.org/wiki/Property:P231">P231</a>) is rather unique itself: it has identifiers
for substances (see <a href="https://www.wikidata.org/wiki/Q79529">Q79529</a>), a much broader concept than chemical compounds, and comes
with its own set of peculiarities. For example, by design, solutions of a compound can carry the same identifier as the compound itself. Previously,
formaldehyde and formalin had different Wikipedia/Wikidata pages, both with the same CAS registry number.</p>
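<p>A consequence is that a CAS registry number in Wikidata does not always point to a single item. Such shared identifiers are easy to detect once you have the (item, P231 value) pairs, for example from a SPARQL query; the sketch below uses a hypothetical <code>shared_identifiers</code> helper and works for any external identifier, not just CAS.</p>

```python
from collections import defaultdict


def shared_identifiers(statements):
    """Given (item, identifier) pairs, return every identifier attached to
    more than one item, together with the set of items sharing it."""
    items_by_identifier = defaultdict(set)
    for item, identifier in statements:
        items_by_identifier[identifier].add(item)
    return {
        identifier: items
        for identifier, items in items_by_identifier.items()
        if len(items) > 1
    }
```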

<h2 id="limitations-of-the-links-2">Limitations of the links #2</h2>
<p>Now, returning to our starting point: limitations in linking databases. If we want FAIR mappings, we need to be as precise
as possible. Of course, that may mean we need more mapping steps, but we can always simplify precise links at will, whereas we can
never have a computer add detail to coarse links (well, not without making assumptions).</p>

<p>And that is why Wikidata is so suitable for linking all these chemical databases: it can distinguish differences when needed,
and make them explicit. It makes mappings between the databases more <a href="https://www.nature.com/articles/sdata201618">FAIR</a>.</p>]]></content><author><name>Egon Willighagen</name></author><category term="wikidata" /><category term="scholia" /><category term="chemistry" /><category term="bridgedb" /><category term="cas" /><category term="chebi" /><category term="chemspider" /><category term="fair" /><category term="hmdb" /><category term="pubchem" /><category term="rdf" /><category term="wikicite" /><category term="justdoi:10.6084/m9.figshare.6356027.v1" /><category term="justdoi:10.1186/s13321-018-0293-8" /><category term="justdoi:10.1186/s13321-015-0072-8" /><category term="justdoi:10.1038/sdata.2016.18" /><summary type="html"><![CDATA[Bar chart showing the number of compounds with a particular chemical identifier. I think Wikidata is a groundbreaking project, which will have a major impact on science. One of the reasons is the open license (CCZero), the very basic approach (Wikibase), and the superb community around it. For example, setting up your own Wikibase including a cool SPARQL endpoint, is easily done with Docker.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/extid-wikidata-histogram.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/extid-wikidata-histogram.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>