<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/data.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-06-15T12:00:19+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/data.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">New paper: pyBiodatafuse: Extending interoperability of data using modular queries across biomedical resources</title><link href="https://chem-bla-ics.linkedchemistry.info/2026/05/30/new-paper-pybiodatafuse.html" rel="alternate" type="text/html" title="New paper: pyBiodatafuse: Extending interoperability of data using modular queries across biomedical resources" /><published>2026-05-30T00:00:00+00:00</published><updated>2026-05-30T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2026/05/30/new-paper-pybiodatafuse</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2026/05/30/new-paper-pybiodatafuse.html"><![CDATA[<p>The number of data and knowledge sources relevant to your biological or chemical question
increases every year. They all come with different APIs and different data models. These
need to be documented and mapped. What better way to do that than to actually do it, and
then use the result? I never asked, but I can imagine that was the original idea of Tooba
and Yojana. At the very least, it demonstrates the level of interoperability we need
in the life sciences.</p>

<p>In a recent paper, <a href="https://orcid.org/0000-0002-7683-0452">Yojana Gadiya</a>,
<a href="https://orcid.org/0000-0002-4166-7093">Javier Millán Acosta</a>, and
<a href="https://orcid.org/0000-0002-4904-3269">Tooba Abbassi-Daloii</a> describe BioDataFuse, a project they led
(worked on during the ELIXIR biohackathons in <a href="https://doi.org/10.37044/osf.io/mhsqp">2023</a>
and <a href="https://doi.org/10.37044/osf.io/ptmg5_v1">2024</a>
and the SWAT4HCLS biohackathons in <a href="https://ceur-ws.org/Vol-3890/paper-23.pdf">2024</a>
and <a href="https://ceur-ws.org/Vol-4196/paper_71.pdf">2025</a>), and its matching Python package,
<a href="https://github.com/BioDataFuse/pyBiodatafuse">pyBiodatafuse</a>
(doi:<a href="https://doi.org/10.1093/bioinformatics/btag064">10.1093/bioinformatics/btag064</a>).</p>

<p>Together with a group of researchers from The Netherlands, Switzerland, the Czech Republic, and
the USA, they wrapped multiple databases in a uniform data model. The package
can generate a graph across the imported databases, which can then
be further analyzed and visualized. This is an example (RDF) graph that was generated:</p>

<p><img src="/assets/images/pyBiodatafuseGraph.png" alt="" /></p>

<p>Seeing this kind of interoperability brings back <a href="https://chem-bla-ics.linkedchemistry.info/2010/03/04/rdf-jena-bioclipse-eclipse-zest-2-icons.html">good memories</a>.</p>

<p>Congrats to all authors!</p>]]></content><author><name>Egon Willighagen</name></author><category term="python" /><category term="data" /><category term="doi:10.1093/BIOINFORMATICS/BTAG064" /><category term="doi:10.37044/OSF.IO/MHSQP" /><category term="justdoi:10.37044/OSF.IO/PTMG5_V1" /><summary type="html"><![CDATA[The number of data and knowledge sources relevant to your biological or chemical question increases every year. They all come with different APIs and different data models. These need to be documented and mapped. What better way to do that than to actually do it, and then use the result? I never asked, but I can imagine that was the original idea of Tooba and Yojana. At the very least, it demonstrates the level of interoperability we need in the life sciences.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/pyBiodatafuseGraph.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/pyBiodatafuseGraph.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Additional files, data, datasets, databases, and published data</title><link href="https://chem-bla-ics.linkedchemistry.info/2024/10/29/suppdata-data-dataset-database.html" rel="alternate" type="text/html" title="Additional files, data, datasets, databases, and published data" /><published>2024-10-29T00:00:00+00:00</published><updated>2024-10-29T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2024/10/29/suppdata-data-dataset-database</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2024/10/29/suppdata-data-dataset-database.html"><![CDATA[<p>Open Science doesn’t make publishing easier. And that’s all for the better: our research efforts are complex,
so why should publishing be easy? Sure, I am <strong>not</strong> talking about reference formatting or moving the Methods
section to the right location, or some silly statement that all authors agree with the manuscript when you are
the only author.</p>

<p>No, let’s talk about data. What should you publish? How, and when? And why would you do it in the first
place? This is not going to be a post about FAIR either, but instead about when to publish data as additional
files (aka supplementary data), as raw data, as processed data, as a dataset, or even as a database. That’s a
lot of types of data, and the differences matter, at least for the effort you want to put in.</p>

<p>First, things have changed. We produce massively more data. In the past, your data, or at least the
processed data, would be part of your conference talk, your journal article, or your book (chapter).
Open Science has changed this: data should be easier to reuse. But that raises new questions, like
those in the previous paragraph. So, let’s add some context.</p>

<p>Data is a very broad term and includes digital knowledge. Data can be raw: the exact numbers collected (e.g.
by an apparatus) or created by researchers. Processed data is what you get when you process the raw data.
For example, raw data may be a free induction decay (FID) in nuclear magnetic resonance (NMR), while processed data would be a
plot showing intensities versus chemical shifts. Published data is then the list of peaks you put in your
results section to support your claim of chemical identity.</p>
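<p>To make the raw, processed, and published distinction a bit more concrete, here is a toy Python sketch, assuming NumPy is available. It simulates an FID with two resonances (raw data), Fourier transforms it into a magnitude spectrum (processed data), and reduces that spectrum to a peak list in ppm (published data). The 400 MHz spectrometer frequency, the resonance offsets, the threshold, and the function names are illustrative assumptions; real NMR processing also involves apodization, phase correction, and chemical shift referencing (e.g. to TMS), all of which are left out here.</p>

```python
import numpy as np


def simulate_fid(freqs_hz, n_points=4096, dwell_s=1e-4, t2_s=0.05):
    """Raw data (toy): a synthetic complex free induction decay.

    Each resonance is a complex exponential at its offset frequency (Hz),
    decaying with the transverse relaxation time t2_s.
    """
    t = np.arange(n_points) * dwell_s
    fid = np.zeros(n_points, dtype=complex)
    for f in freqs_hz:
        fid += np.exp(2j * np.pi * f * t) * np.exp(-t / t2_s)
    return fid


def fid_to_spectrum(fid, dwell_s=1e-4):
    """Processed data (toy): magnitude spectrum versus frequency offset in Hz."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft(fid)))
    freqs_hz = np.fft.fftshift(np.fft.fftfreq(len(fid), d=dwell_s))
    return freqs_hz, spectrum


def pick_peaks(freqs_hz, intensities, spectrometer_mhz=400.0, rel_threshold=0.2):
    """Published data (toy): a peak list in ppm.

    Assumes a zero frequency offset corresponds to 0 ppm (no real referencing).
    """
    threshold = rel_threshold * intensities.max()
    peaks_ppm = []
    for i in range(1, len(intensities) - 1):
        is_local_max = intensities[i] > intensities[i - 1] and intensities[i] >= intensities[i + 1]
        if is_local_max and intensities[i] >= threshold:
            # an offset in Hz divided by the spectrometer frequency in MHz gives ppm
            peaks_ppm.append(round(float(freqs_hz[i]) / spectrometer_mhz, 2))
    return peaks_ppm


raw = simulate_fid([800.0, 2000.0])
freqs, processed = fid_to_spectrum(raw)
published = pick_peaks(freqs, processed)
print(published)
```

<p>With these settings, the script should print [2.0, 5.0]: a two-entry peak list distilled from 4096 complex raw data points, which is exactly why the three kinds of data deserve different archiving decisions.</p>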

<p>A fourth type of data is metadata, which here could be the instrument on which the FID was measured,
the solvent used, etc. This is where it gets complicated, because depending on the researcher who
processes the data, metadata can actually be data itself, for example, when you study the chemical
shift differences between different organic solvents.</p>

<p>From a more social angle, the <a href="https://chem-bla-ics.linkedchemistry.info/2024/10/21/nasa-tops.html">Open Science 101</a>
uses the following categories: primary data, as collected/recorded by the researcher, and
“secondary data typically refers to data that is used by someone different from who collected or generated the data”.
This view of data captures the collaborative aspects of open science but, I think, says more about
the people processing the data than about the data itself.</p>

<h2 id="monitoring-open-data">Monitoring Open Data</h2>

<p>A central aspect of doing research is disseminating it. Traditionally, this has meant
disseminating results, hoping they become facts. Increasingly, we realize that this process needs
improvement, something Open Science approaches have studied, acted on, and communicated particularly clearly.</p>

<p>Complementary to this, there is recognition &amp; rewarding (R&amp;R) and the wish to use various kinds of monitoring to
assess who should be rewarded (and who should be fired), where the monitor is the implementation
of the recognition. So, how does this work for open data? We can count every open dataset, but
if they are all thrown on one big pile, that becomes a bad monitor for use in recognition and rewarding.</p>

<p>One idea is to differentiate in what data we monitor. Just raw data? Or processed data?
How much intellectual effort went into collecting/recording the data? Should that
be part of the monitor, and how do you even measure that? Lots of known unknowns here.</p>

<p>But this should not stop us from telling the research narrative. And maybe we should
just explore the possible narratives, to learn how they may help us monitor work done,
how to recognize contributions to the scientific record, and how to use all that in R&amp;R.</p>

<p>Here, I present some examples from my own research, just to start a narrative.</p>

<h2 id="raw-data">Raw data</h2>

<p>Over the years, I have collected and recorded quite a bit of raw data: first data collected in the lab,
and later mostly recorded data. Even though I have been doing Open Science since the late nineties,
I cannot say all my data has been archived well. Worse, I do not have a “publication list”
of all my raw data. As an academic community, we have been focusing too much on the scholarly
article as the center of the research system (more on that later, because there was awesome
research presented at the Dutch National Open Science Festival).</p>

<ul>
  <li><a href="https://chem-bla-ics.linkedchemistry.org/03/27/migrating-pka-data-from-drugmet-to.html">pKa values</a> (not archived, no DOI)</li>
  <li><a href="https://doi.org/10.6084/m9.figshare.7075214.v1">NanoWiki 5</a> (archived, with DOI)</li>
</ul>

<h2 id="processed-data">Processed data</h2>

<p>As defined in the <a href="https://commission.europa.eu/law/law-topic/data-protection/reform/what-constitutes-data-processing_en">European laws around GDPR</a>,
processing “includes the collection, recording, organisation, structuring, storage,
adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available,
alignment or combination, restriction, erasure or destruction of [..] data”. As you can see, this is somewhat
different from the first definition above, but in light of protecting citizens, this broader definition makes sense.
My point here is that processing should be taken broadly. Data curation, which researchers
routinely do, is processing too. For any data scientist, curation easily takes up 25% of the
full time needed for a data analysis. One of the goals of the FAIR principles is to keep
that number as low as possible, but that is not really the point here.</p>

<p>When it comes to this kind of data, I like people to have ready access to the results
of my curation, and you will find a lot of this kind of processed data archived. Some examples of
data by me or to which I contributed:</p>

<ul>
  <li><a href="https://doi.org/10.5281/zenodo.13933046">WikiPathways</a> (monthly archived, with DOIs)</li>
  <li><a href="https://doi.org/10.6084/m9.figshare.681678">ChemPedia RDF</a> (different format than original data, archived, with DOI)</li>
  <li><a href="https://doi.org/10.6084/m9.figshare.26931712.v1">BridgeDb Metabolite ID mapping database</a> (irregular releases, not every one is notable; archived, with DOI)</li>
</ul>

<p>The last one will look something like this:</p>

<p><img src="/assets/images/figshare_bridgedb.png" alt="" /></p>

<h2 id="published-data">Published data</h2>

<p>And then we have published data, which refers to data presented in a publication, like a journal
article. We know this as supplementary data or additional files. Several publishers, like
BioMed Central, automatically deposit these data in a repository. For example, the
<a href="https://jcheminf.biomedcentral.com/">Journal of Cheminformatics</a> publishes all additional files under a CC0 license on Figshare.
But many of these files support the narrative of the article, rather than the narrative of the
research question. Of course, journals also have limited expectations of the format, and
my personal impression is that these files are not commonly FAIR. (Open Access is not Open Science.)</p>

<p>Below are some examples of such datasets. I do not see them as notable and do not expect them
to be monitored: these datasets are part of the journal article, and that narrative is
already monitored.</p>

<ul>
  <li><a href="https://doi.org/10.6084/m9.figshare.c.3696370_D1.v1">MOESM1 of PubChemRDF: towards the semantic annotation of PubChem compound and substance databases</a> (Word document with data, with DOI)</li>
  <li><a href="https://doi.org/10.6084/m9.figshare.c.3698536_D1.v1">MOESM1 of XMetDB: an open access database for xenobiotic metabolism</a> (archived Structured Data file with chemical structures, with DOI)</li>
</ul>

<h2 id="databases">Databases</h2>

<p>And then we have databases provided as interactive websites. These allow other researchers
to explore the data before they start processing it. Such databases typically do not have a DOI themselves,
though the data can be routinely archived, as in the above WikiPathways example.</p>

<p>Databases themselves, as research output, are much harder to archive. To make them citable,
researchers publish journal articles with a narrative that describes the database. The following two
are such database papers, where the article DOI is a proxy for the database:</p>

<ul>
  <li><a href="https://doi.org/10.1186/1758-2946-5-23">The ChEMBL database as linked open data</a> (<a href="https://chemblmirror.rdf.bigcat-bioinformatics.org/">online</a>, DOI via article)</li>
  <li><a href="https://doi.org/10.1186/s13321-021-00573-5">PSnpBind</a> (<a href="https://psnpbind.org/">online</a>, DOI via article)</li>
</ul>]]></content><author><name>Egon Willighagen</name></author><category term="data" /><category term="doi:10.6084/M9.FIGSHARE.7075214.V1" /><category term="doi:10.5281/ZENODO.13933046" /><category term="doi:10.6084/M9.FIGSHARE.681678" /><category term="doi:10.6084/M9.FIGSHARE.26931712.V1" /><category term="doi:10.6084/M9.FIGSHARE.C.3696370_D1.V1" /><category term="doi:10.6084/M9.FIGSHARE.C.3698536_D1.V1" /><category term="doi:10.1186/1758-2946-5-23" /><category term="doi:10.1186/S13321-021-00573-5" /><summary type="html"><![CDATA[Open Science doesn’t make publishing easier. And that’s all for the better: our research efforts are complex, so why should publishing be easy? Sure, I am not talking about reference formatting or moving the Methods section to the right location, or some silly statement that all authors agree with the manuscript when you are the only author.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/figshare_bridgedb.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/figshare_bridgedb.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">CiTO updates #4: annotations in datasets</title><link href="https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html" rel="alternate" type="text/html" title="CiTO updates #4: annotations in datasets" /><published>2023-04-02T00:00:00+00:00</published><updated>2023-04-02T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2023/04/02/cito-updates-4-annotations-in-datasets.html"><![CDATA[<p>Okay, <a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00683-2">the Pilot</a>
<a href="https://jcheminf.biomedcentral.com/articles/10.1186/s13321-023-00684-1">is over</a>, ending with 17 papers, 16 of which have CiTO
annotations (and so far 4 J.Cheminform. <a href="https://doi.org/10.1186/s13321-022-00656-x">papers</a>
<a href="https://doi.org/10.1186/s13321-022-00673-w">after</a> <a href="https://doi.org/10.1186/s13321-022-00677-6">the</a>
<a href="https://doi.org/10.1186/s13321-023-00701-3">pilot</a>), but my interest in the
<a href="http://purl.org/spar/cito">Citation Typing Ontology</a> continues and we just need
<a href="https://chem-bla-ics.blogspot.com/2023/02/citation-typing-progress-but-we-need.html">more adoption</a>.</p>

<p><strong>Datasets as sources of annotations</strong></p>

<p>So, here’s a quick <a href="https://wikidata.org/">Wikidata</a> update. I have been using Wikidata as infrastructure to collect and share CiTO
annotations (see also the “Scholia patch” posts below). Some time ago, I recovered my CiteULike CiTO annotations and made them
<a href="https://scholia.toolforge.org/work/Q115470140">available on Zenodo</a> (doi:<a href="https://doi.org/10.5281/ZENODO.7368209">10.5281/zenodo.7368209</a>).</p>

<p>And while thinking about datasets with CiTO annotations, I found two other datasets: one from an article in Portuguese, and one from an
<a href="https://scholia.toolforge.org/work/Q117369886">article by Peroni et al.</a> with
<a href="https://zenodo.org/record/6885109">this data file</a>. That data file is actually a zip, but inside the zip file is a CSV file with three
interesting columns: <code class="language-plaintext highlighter-rouge">cited_doi</code>, <code class="language-plaintext highlighter-rouge">citing_doi</code>, and <code class="language-plaintext highlighter-rouge">intext_citation.intent</code>. There are many more columns, and I highly recommend browsing
them. But these three are the ones I need to add the data to Wikidata. The third column is free text, but it uses CiTO terms as labels, making it
relatively easy to convert to the <a href="https://w.wiki/62sR">citation intentions in Wikidata</a>
(PS, thanks to <a href="https://www.wikidata.org/wiki/User:Fvtvr3r">Fvtvr3r</a> for adding more!).</p>

<p>So, I had a cleaned file and started writing a Groovy Bioclipse script using <a href="https://doi.org/10.21105/joss.02558">Bacting</a>.
It basically does a few things: it extracts all DOIs, checks which ones are in Wikidata, analyzes the <code class="language-plaintext highlighter-rouge">intext_citation.intent</code> column content,
and then generates QuickStatements (see <a href="https://gist.github.com/egonw/f74fd3bc1f6361434b042a4cac2a8089">this gist</a>). Out of the 600
lines in the input, it creates some 200 new CiTO-annotated citations in Wikidata between
<a href="https://scholia.toolforge.org/work/Q117357537#statements">some 150 article pairs</a>:</p>

<p><img src="/assets/images/Screenshot_20230402_084711.png" alt="" /></p>
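<p>The steps of that script (extract DOIs, look them up, interpret the intent column, emit QuickStatements) can be sketched in a few lines of Python. This is not the Groovy/Bacting script from the gist, but a minimal re-sketch under several assumptions: the Wikidata DOI lookup is replaced by a pre-built dictionary, the intent labels are CiTO property names optionally prefixed with <code>cito:</code>, all Q-identifiers in the usage example are placeholders, and the statements model the citation with P2860 (cites work), the intention as a P3712 qualifier, and the source dataset as an S248 (stated in) reference. Check those properties against current Wikidata practice before using anything like this.</p>

```python
import csv
import io


def normalize_intent(label):
    """Turn a free-text intent such as ' cito:usesMethodIn ' into 'usesMethodIn'."""
    label = label.strip()
    if label.lower().startswith("cito:"):
        label = label[len("cito:"):]
    return label


def to_quickstatements(csv_text, doi_to_qid, intent_to_qid, source_qid,
                       cites_property="P2860", intent_qualifier="P3712"):
    """Convert citation rows into tab-separated QuickStatements (V1) lines.

    Assumptions: doi_to_qid uses upper-case DOIs as keys (the convention for
    DOIs in Wikidata), and intent_to_qid maps CiTO property names to item IDs.
    Returns the unique statement lines and the number of rows that could not
    be converted because a DOI or intent was unknown.
    """
    statements = []
    seen = set()
    skipped = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        citing = doi_to_qid.get(row["citing_doi"].strip().upper())
        cited = doi_to_qid.get(row["cited_doi"].strip().upper())
        intent = intent_to_qid.get(normalize_intent(row["intext_citation.intent"]))
        if not (citing and cited and intent):
            skipped += 1
            continue
        line = "\t".join([citing, cites_property, cited,
                          intent_qualifier, intent,   # qualifier: the citation intention
                          "S248", source_qid])        # reference: stated in the dataset
        if line not in seen:  # several in-text citations may carry the same intent
            seen.add(line)
            statements.append(line)
    return statements, skipped


sample_csv = """citing_doi,cited_doi,intext_citation.intent
10.1000/a,10.1000/b,cito:usesMethodIn
10.1000/a,10.1000/b,cito:usesMethodIn
10.1000/a,10.1000/c,cito:citesForInformation
10.1000/x,10.1000/b,cito:usesMethodIn
"""
# Placeholder identifiers only; these are not real Wikidata items.
doi_lookup = {"10.1000/A": "Q_A", "10.1000/B": "Q_B", "10.1000/C": "Q_C"}
intent_lookup = {"usesMethodIn": "Q_USES_METHOD_IN",
                 "citesForInformation": "Q_CITES_FOR_INFORMATION"}

statements, skipped = to_quickstatements(sample_csv, doi_lookup, intent_lookup, "Q_DATASET")
for statement in statements:
    print(statement)
print("skipped rows:", skipped)
```

<p>Deduplicating identical statements matters here, because one citing article may mention the same cited article several times with the same intent, and each mention is a separate row in the CSV.</p>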

<p>The ability to include CiTO annotations from datasets is another welcome boost for the CiTO statistics in Wikidata.
<a href="https://w.wiki/6XQf">This SPARQL query</a> shows an overview of sources that support the CiTO intention annotations, but note that
a claim with a CiTO intention may also have CrossRef, PubMed, and COCI as references. In those cases, these references primarily support
the citation itself, not the intention.</p>

<p>There are <a href="https://scholar.social/@egonw/110124747053293502">now</a> (the <a href="https://scholia.toolforge.org/cito/#statistics">latest stats are here</a>)
<strong>1202 citation intention</strong> annotations in Wikidata for 992 citations from <strong>405 articles in 199 venues</strong>. Of these, 27 articles have
explicit annotations in the article itself and are found in 4 venues (two journals and two preprint servers). These annotated citations
point to 510 articles in 190 different venues. <a href="https://github.com/WDscholia/scholia/pull/2271">This Scholia patch</a> will add a new
statistic, the number of datasets providing citation intentions, of which there are (as discussed)
<a href="https://scholia.toolforge.org/topic/Q115470140">currently</a> <a href="https://scholia.toolforge.org/work/Q117357537">two</a> in Wikidata.
These two datasets provide intentions for the majority of articles and are depicted in yellow in the overview below.</p>

<p><img src="/assets/images/Screenshot_20230402_085317.png" alt="" /></p>
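<p>Why are there more annotations (1202) than citations (992), and more citations than citing articles (405)? A small Python sketch makes the counting explicit: one citation, a citing/cited pair, can carry several intention annotations, and one article can make several citations. The <code>Annotation</code> type and the placeholder identifiers are hypothetical; Scholia itself computes its statistics with SPARQL queries against Wikidata, not with code like this.</p>

```python
from typing import Iterable, NamedTuple


class Annotation(NamedTuple):
    citing: str  # the citing article, e.g. a Wikidata item ID
    cited: str   # the cited article
    intent: str  # the CiTO citation intention


def cito_statistics(annotations: Iterable[Annotation]) -> dict:
    """Count distinct annotations, citations, citing articles, and cited articles."""
    unique = set(annotations)  # identical annotations are counted once
    return {
        "annotations": len(unique),
        "citations": len({(a.citing, a.cited) for a in unique}),
        "citing_articles": len({a.citing for a in unique}),
        "cited_articles": len({a.cited for a in unique}),
    }


example = [
    Annotation("Q_A", "Q_B", "usesMethodIn"),
    Annotation("Q_A", "Q_B", "citesAsDataSource"),  # same citation, second intention
    Annotation("Q_A", "Q_C", "citesForInformation"),
    Annotation("Q_D", "Q_B", "usesMethodIn"),
]
print(cito_statistics(example))
```

<p>For this toy input, that gives four annotations on three citations, made by two citing articles to two cited articles.</p>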

<p>With an annotation in <a href="https://www.wikidata.org/wiki/Q27638524">a 1938 article by Alan Turing</a>! I ran into this article in November 2011,
when I noticed an apparent duplicate title in his article list. It turned out an earlier article had a correction with the same title.
I added <a href="https://www.wikidata.org/w/index.php?title=Q27638524&amp;diff=1527020358&amp;oldid=984628387&amp;diffmode=source">this clarification</a>:</p>

<p><img src="/assets/images/Screenshot_20230402_090600.png" alt="" /></p>

<p>This is very trivial citation intention data that publishers could provide as open data.</p>

<p>Okay, that will do for today. There are actually some really interesting things in the pipeline, but I will have to write about those later. I have some deadlines I should start looking at. Below is some extra reading.</p>

<p><strong>Some more history</strong></p>

<ul>
  <li>2021: <a href="https://chem-bla-ics.linkedchemistry.info/2021/11/15/biohackathon-europe-2021-1-cito.html">BioHackathon Europe 2021 #1: CiTO annotations in BioHackrXiv <i class="fa-solid fa-recycle fa-xs"></i></a></li>
  <li>2021: <a href="https://chem-bla-ics.blogspot.com/2021/03/markdown-template-for-journal-of.html">Markdown template for the Journal of Cheminformatics with CiTO support</a></li>
  <li>2020: <a href="https://chem-bla-ics.linkedchemistry.info/2020/11/30/cito-updates-3-third-paper-in.html">CiTO updates #3: third paper in the collection and updated Scholia patch <i class="fa-solid fa-recycle fa-xs"></i></a></li>
  <li>2020: <a href="https://chem-bla-ics.linkedchemistry.info/2020/11/01/cito-updates-2-annotation-migration-to.html">CiTO updates #2: annotation migration to Wikidata and first Scholia patch <i class="fa-solid fa-recycle fa-xs"></i></a></li>
  <li>2020: <a href="https://chem-bla-ics.linkedchemistry.info/2020/11/01/cito-updates-1-first-research-paper-in.html">CiTO updates #1: first research paper in the Journal of Cheminformatics with CiTO annotation published <i class="fa-solid fa-recycle fa-xs"></i></a></li>
  <li>July 2020: <a href="https://chem-bla-ics.blogspot.com/2020/07/new-editorial-adoption-of-citation.html">New Editorial: “Adoption of the Citation Typing Ontology by the Journal of Cheminformatics”</a></li>
  <li>2015: <a href="https://chem-bla-ics.blogspot.com/2015/03/what-youre-doing-is-rather-desperate.html">“What You’re Doing Is Rather Desperate”</a></li>
  <li>2012: <a href="https://chem-bla-ics.linkedchemistry.info/2012/02/23/cito-citeulike-publishing-innovation.html">CiTO / CiteULike: publishing innovation <i class="fa-solid fa-recycle fa-xs"></i></a></li>
  <li>2010: <a href="https://chem-bla-ics.linkedchemistry.info/2010/10/31/citeulike-cito-use-case-1-wordles.html">CiteULike CiTO Use Case #1: Wordles <i class="fa-solid fa-recycle fa-xs"></i></a></li>
  <li>September 2010: <a href="https://chem-bla-ics.linkedchemistry.info/2010/09/17/list-of-things-i-miss-in-citeulike.html">A list of things I miss in CiteULike <i class="fa-solid fa-recycle fa-xs"></i></a></li>
</ul>]]></content><author><name>Egon Willighagen</name></author><category term="cito" /><category term="data" /><category term="scholia" /><category term="doi:10.1186/s13321-023-00683-2" /><category term="justdoi:10.1186/s13321-023-00684-1" /><category term="justdoi:10.1186/s13321-022-00656-x" /><category term="justdoi:10.1186/s13321-022-00673-w" /><category term="justdoi:10.1186/s13321-022-00677-6" /><category term="doi:10.1186/s13321-023-00701-3" /><category term="justdoi:10.1162/QSS_A_00222" /><category term="justdoi:10.5281/zenodo.5155219" /><category term="doi:10.21105/joss.02558" /><category term="doi:10.5281/ZENODO.7368209" /><summary type="html"><![CDATA[Okay, the Pilot is over ending with 17 papers, 16 of which have CiTO annotations (and so far 4 J.Cheminform. papers after the pilot), but my interest in the Citation Typing Ontology continues and we just need more adoption.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20230402_085317.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/Screenshot_20230402_085317.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Also new this week: “Google Dataset Search”</title><link href="https://chem-bla-ics.linkedchemistry.info/2018/09/08/also-new-this-week-google-dataset-search.html" rel="alternate" type="text/html" title="Also new this week: “Google Dataset Search”" /><published>2018-09-08T00:00:00+00:00</published><updated>2018-09-08T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2018/09/08/also-new-this-week-google-dataset-search</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2018/09/08/also-new-this-week-google-dataset-search.html"><![CDATA[<p>There was a lot of Open Science news this week. 
The <a href="https://www.blog.google/products/search/making-it-easier-discover-datasets/">announcement</a>
of the <a href="https://toolbox.google.com/datasetsearch">Google Dataset Search</a> was one of them:</p>

<p><img src="/assets/images/google_dataset_search.png" alt="" /></p>

<p>Of course, I first tried searching for “<a href="https://toolbox.google.com/datasetsearch/search?query=RDF%20chemistry&amp;docid=hiQ14TdWzjx%2FQ37gAAAAAA%3D%3D">RDF chemistry</a>”,
which shows some of my data sets (and a lot more):</p>

<p><img src="/assets/images/google_dataset_search2.png" alt="" /></p>

<p>It picks up data from many sources, such as <a href="https://figshare.com/">Figshare</a> in this image. That means it also works
(well, sort of, as <a href="https://twitter.com/baoilleach/status/1037986030266318848">Noel O’Boyle noticed</a>) for
supplementary information from the <a href="https://jcheminf.biomedcentral.com/">Journal of Cheminformatics</a>.</p>

<p>It picks up metadata in several ways, among which <a href="https://schema.org/">schema.org</a>. So, next week we’ll see if
we can get <a href="http://enanomapper.net/">eNanoMapper</a> extended to spit out compatible JSON-LD for its data sets, called “bundles”.</p>
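<p>For readers wondering what such metadata looks like, here is a minimal Python sketch that builds a schema.org <code>Dataset</code> description as JSON-LD. The selection of properties (name, description, url, license, creator, keywords) and all example values are assumptions for illustration, not what eNanoMapper actually emits. A web page would typically embed the resulting JSON in a script element of type application/ld+json, where crawlers such as Google Dataset Search can pick it up.</p>

```python
import json


def dataset_jsonld(name, description, url, license_url, creators, keywords=()):
    """Build a minimal schema.org Dataset description as a JSON-LD dictionary."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "creator": [{"@type": "Person", "name": person} for person in creators],
    }
    if keywords:  # optional property, only added when given
        doc["keywords"] = list(keywords)
    return doc


# Hypothetical example values; not an actual eNanoMapper bundle.
bundle = dataset_jsonld(
    name="Example nanomaterial bundle",
    description="A hypothetical collection of nanomaterial measurements.",
    url="https://example.org/bundle/1",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    creators=["Jane Doe"],
    keywords=["nanomaterials", "toxicity"],
)
print(json.dumps(bundle, indent=2))
```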

<h2 id="integrated-with-google-scholar">Integrated with Google Scholar?</h2>

<p>While the URL for the search engine does not suggest the service is more than a 20% project, we can
hope it will stay around, as Google Scholar has. But I do hope they will further integrate it
with Scholar. For example, in the above figure, while it did pick up that I am the author of that data set
(well, repurposed from an effort of <a href="https://twitter.com/rapodaca">Rich Apodaca</a>), it did not figure
out that I am also on Scholar.</p>

<p>So, these data sets do not show up in your Google Scholar profile yet, but they <strong><em>must</em></strong>. Time will
tell where this data search engine is going. There are many interesting features, and given the amount
of online attention, I expect they won’t stop development just yet and that I will discover more and better
features in the coming months. Give it a spin!</p>]]></content><author><name>Egon Willighagen</name></author><category term="data" /><category term="google" /><summary type="html"><![CDATA[There was a lot of Open Science news this week. The announcement of the Google Dataset Search was one of them:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/google_dataset_search2.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/google_dataset_search2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>