<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/dbpedia.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-06-07T16:43:55+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/dbpedia.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">Bioclipse and SPARQL end points</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points.html" rel="alternate" type="text/html" title="Bioclipse and SPARQL end points" /><published>2009-08-16T00:00:00+00:00</published><updated>2009-08-16T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points.html"><![CDATA[<p>Last week, there was a very interesting thread on the <a href="http://dbpedia.org/">DBPedia</a> mailing list, on using Java for doing remote
<a href="http://en.wikipedia.org/wiki/SPARQL">SPARQL</a> queries. This was one of the features still missing in <a href="http://github.com/egonw/bioclipse.rdf/tree/master">bioclipse.rdf</a>.
<a href="http://dowhatimean.net/">Richard Cyganiak</a> replied, pointing to the code in Jena which conveniently does this, and which bioclipse.rdf already uses anyway. Next,
<a href="http://iwis.cs.aau.dk/blog/4">Fred Durao</a> even gave a full code example, relieving me of any further research, which resulted in
<code class="language-plaintext highlighter-rouge">sparqlRemote()</code> now being implemented in the <code class="language-plaintext highlighter-rouge">rdf</code> manager:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&gt; rdf.sparqlRemote(
  "http://dbpedia.org/sparql",
  "select distinct ?Concept where{[] a ?Concept } LIMIT 10"
);
[[http://dbpedia.org/ontology/Place], [http://dbpedia.org/ontology/Area],
[http://dbpedia.org/ontology/City], [http://dbpedia.org/ontology/River],
[http://dbpedia.org/ontology/Road], [http://dbpedia.org/ontology/Lake],
[http://dbpedia.org/ontology/LunarCrater],
[http://dbpedia.org/ontology/ShoppingMall], [http://dbpedia.org/ontology/Park],
[http://dbpedia.org/ontology/SiteOfSpecialScientificInterest]]
</code></pre></div></div>
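<p>Here is what such a remote query involves at the protocol level. The W3C SPARQL protocol sends the query as a percent-encoded <code class="language-plaintext highlighter-rouge">query</code> parameter in an HTTP GET to the end point. A <code class="language-plaintext highlighter-rouge">SELECT</code> result can be requested in the SPARQL JSON results format. The JavaScript below is a minimal sketch under those assumptions, not the Jena-based code behind <code class="language-plaintext highlighter-rouge">rdf.sparqlRemote()</code>. The function names are hypothetical, and the use of <code class="language-plaintext highlighter-rouge">fetch</code> assumes a runtime such as Node 18 or later.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical sketch of a remote SPARQL SELECT over the W3C SPARQL protocol;
// this is not how Bioclipse/Jena implement rdf.sparqlRemote().

// The query travels percent-encoded in the "query" parameter of a GET request.
function buildQueryUrl(endpoint, query) {
  return endpoint + "?query=" + encodeURIComponent(query);
}

// Convert a SPARQL JSON results document into rows: one value per variable,
// in head.vars order, with null for unbound variables.
function toRows(results) {
  const vars = results.head.vars;
  return results.results.bindings.map(function (binding) {
    return vars.map(function (name) {
      return binding[name] ? binding[name].value : null;
    });
  });
}

// Send the query and resolve to the rows; requires a global fetch().
function sparqlSelect(endpoint, query) {
  return fetch(buildQueryUrl(endpoint, query), {
    headers: { Accept: "application/sparql-results+json" }
  }).then(function (response) {
    if (!response.ok) {
      throw new Error("SPARQL end point returned HTTP " + response.status);
    }
    return response.json();
  }).then(toRows);
}

// Usage (needs network access):
// sparqlSelect("http://dbpedia.org/sparql",
//   "select distinct ?Concept where{[] a ?Concept } LIMIT 10").then(console.log);
</code></pre></div></div>

<p>The row-of-values shape was chosen only to resemble the nested lists printed by the Bioclipse console above.</p>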

<p>Earlier, I reported <a href="https://chem-bla-ics.linkedchemistry.info/2009/02/dbpedia-lookup-and-autocomplete-of.html">two example SPARQL queries for chemistry <i class="fa-solid fa-recycle fa-xs"></i></a>,
which can now be rewritten as Bioclipse scripts:</p>

<script src="https://gist.github.com/168582.js"></script>

<p>and</p>

<script src="https://gist.github.com/168583.js"></script>]]></content><author><name>Egon Willighagen</name></author><category term="bioclipse" /><category term="sparql" /><category term="rdf" /><category term="dbpedia" /><summary type="html"><![CDATA[Last week, there was a very interesting thread on the DBPedia mailing list, on using Java for doing remote SPARQL queries. This was one of the features still missing in bioclipse.rdf. Richard Cyganiak replied, pointing to the code in Jena which conveniently does this, and which bioclipse.rdf already uses anyway. Next, Fred Durao even gave a full code example, relieving me of any further research, which resulted in sparqlRemote() now being implemented in the rdf manager:]]></summary></entry><entry><title type="html">Open Data: license, rights, aggregation, clean interfaces?</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation.html" rel="alternate" type="text/html" title="Open Data: license, rights, aggregation, clean interfaces?" /><published>2009-05-18T00:00:00+00:00</published><updated>2009-05-18T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation.html"><![CDATA[<p>A <a href="http://blog.openwetware.org/scienceintheopen/2009/05/15/a-breakthrough-on-data-licensing-for-public-science/">recent post</a> by
<a href="http://blog.openwetware.org/scienceintheopen/">Cameron</a> on his visit last week with <a href="http://wwmm.ch.cam.ac.uk/blogs/adams/">Nico</a>,
<a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/">Peter</a> and <a href="http://wwmm.ch.cam.ac.uk/blogs/downing/">Jim</a>, discussed
<a href="http://en.wikipedia.org/wiki/Open_data">Open Data</a> licensing. This led to an interesting discussion on these matters, and
to my question of why people care so much about public domain data only (or data licensed with
<a href="http://www.opendatacommons.org/licenses/pddl/1.0/">PDDL</a> or <a href="http://wiki.creativecommons.org/CC0">CC0</a>).</p>

<p>Open licensing for data has not matured as much as it has for software, and international law seems to be more confusing about the
issues. I guess that is because data aggregation has been around since long before the computer era. The PDDL and CC0 both try to
overcome this fuzziness. But there is another issue we need to keep in mind. A lot of useful Data was aggregated and made Open
<em>before</em> these licenses came about, and uses other licenses, for example the <a href="http://www.gnu.org/copyleft/fdl.html">GNU FDL</a>, as
the <a href="http://www.nmrshiftdb.org/">NMRShiftDB</a> does.</p>

<h2 id="rights">Rights</h2>

<p>Right now, there are two Open Data camps, much like the BSD-vs-GPL wars in Open Source: one that believes in waiving all rights
to the Data, since facts are free; and another that believes data must be protected from being eaten up by big companies
and lost to the community (e.g. <a href="http://friendfeed.com/onssolubility/cf6afd52/should-we-contribute-solubility-data-to">the WolframAlpha arrangements are suspect</a>).</p>

<p>Of course, both camps are not that far apart, and both believe Open is important. Interestingly, there are some noteworthy
differences with the Open Source wars. I see parallels between the two, which reveal an important difference: Open Source has
algorithms (uncopyrightable) and implementations (copyrightable); Open Data has Data (uncopyrightable) and aggregation
(copyrightable). Open Source talks mostly about the implementation, not the algorithm; it’s Open Source, not Open Algorithms,
after all. In cheminformatics, it is often the case that the algorithms are not even specified, and that there truly only
is source.</p>

<p>However, the term Open Data does not make this distinction. Data is fairly cheap, and its acquisition can be automated and computerized;
aggregation, on the other hand, requires human involvement: curation, thinking about data models, etc. This is where the added
value lies. Compare an assigned NMR spectrum with the raw data returned from the spectrometer.</p>

<p>It is this added value that people want to protect, not the data itself. I think.</p>

<h2 id="aggregation">Aggregation</h2>

<p>One important argument that tends to show up when people argue for PDDL and CC0 is that it makes data aggregation easier.
This is most certainly true: if you can do whatever you like with a blob of data, that includes aggregating it with any other
blob of data. However, copyleft licenses, like the GNU FDL, require the aggregation to have a compatible license too. It is
the license incompatibilities that make this impossible. Or … ?</p>

<p>Open Source has matured to such a point that it is fairly clear what the intended behaviour is regarding derivatives. An
aggregation of software (typically referred to as a distribution) is only a derivative under certain conditions. This makes
it possible to run proprietary software on top of GNU/Linux, which uses the GNU GPL but does not require software running on
top of it to be GPL too. Unless… unless no clean, well-defined interface has been used, indicating a strong dependency.
Now, admittedly, these things have not been confirmed to match actual law in court, but the intentions are clear.</p>

<h2 id="clean-data-interfaces">Clean Data Interfaces?</h2>

<p>Now, if we translate this to Open Data, is there an equivalent of a clean interface? Can we build a data
distribution with data under various licenses? I think we can! I am not a lawyer, so please consider this an invitation
to discuss these matters…</p>

<p>Let’s start simple… if I put a GNU FDL image in this blog, by linking to it with an open, free, clean HTML interface
(<code class="language-plaintext highlighter-rouge">&lt;img src=""/&gt;</code>), would that make my blog GNU FDL too? I don’t think so. Surely, I would need to list the copyright owner,
and would actually be required to put the GNU FDL in my blog too, though I hope linking to the license text would suffice.
(Let’s skip fair use for the moment, and assume the use goes beyond fair use.) Question: am I not using a clean interface,
and would that not keep the image’s license from infecting my blog?</p>

<p>A more difficult example: consider <a href="http://rdf.openmolecules.net/">rdf.openmolecules.net</a>, which surely aggregates facts,
including data from the NMRShiftDB and <a href="http://dbpedia.org/">DBPedia</a>. I am using unique identifiers here, the NMRShiftDB
compound ID and the DBPedia URL (which surely is GNU FDL), and use these to make an <code class="language-plaintext highlighter-rouge">&lt;owl:sameAs&gt;</code> statement. Again, please
leave fair use out of consideration, although this certainly is fair use. But let’s say I put some more DBPedia and NMRShiftDB data into this
aggregation. The GNU FDL data on rdf.openmolecules.net would be in separate RDF blocks, with proper dc:license and dc:author
annotation. But these blocks would be part of a larger aggregation. The clean interface here is the
<a href="http://en.wikipedia.org/wiki/Resource_description_framework">Resource Description Framework</a>.</p>
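<p>Here is one way to make separate, license-annotated RDF blocks concrete. The JavaScript below is a minimal sketch that writes such an aggregation as JSON-LD, with each source in its own named graph carrying its own license and creator. Everything in it is an assumption for illustration. JSON-LD and the Dublin Core terms <code class="language-plaintext highlighter-rouge">dcterms:license</code> and <code class="language-plaintext highlighter-rouge">dcterms:creator</code> stand in for the dc:license and dc:author annotation mentioned above. The example.org IRIs are made up. This is not how rdf.openmolecules.net is actually implemented.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical sketch: an aggregation in which every source keeps its own
// named graph, annotated with its own license and creator (JSON-LD).
// All example.org IRIs are invented for illustration.
const CONTEXT = {
  dcterms: "http://purl.org/dc/terms/",
  owl: "http://www.w3.org/2002/07/owl#"
};
const GNU_FDL = "http://www.gnu.org/copyleft/fdl.html";

// One license-annotated block: a named graph plus statements about it.
function licensedBlock(graphId, license, creator, nodes) {
  return {
    "@id": graphId,
    "dcterms:license": { "@id": license },
    "dcterms:creator": creator,
    "@graph": nodes
  };
}

// Aggregate blocks side by side; nothing is merged, so each block keeps
// its own license annotation.
function aggregate(blocks) {
  return { "@context": CONTEXT, "@graph": blocks };
}

// Which license applies to which block.
function licenseMap(doc) {
  return doc["@graph"].map(function (block) {
    return [block["@id"], block["dcterms:license"]["@id"]];
  });
}

// Example: an owl:sameAs link between an (invented) NMRShiftDB-derived
// molecule IRI and a DBPedia resource, kept inside the GNU FDL block.
const dbpediaBlock = licensedBlock(
  "http://example.org/graph/dbpedia", GNU_FDL, "DBPedia",
  [{
    "@id": "http://example.org/nmrshiftdb/molecule/1",
    "owl:sameAs": { "@id": "http://dbpedia.org/resource/Caffeine" }
  }]
);

const doc = aggregate([dbpediaBlock]);
console.log(JSON.stringify(doc, null, 2));
</code></pre></div></div>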

<p>This second case does not only affect my rdf.openmolecules.net website; <a href="http://bio2rdf.org/">bio2rdf.org</a>, for example,
is in the same situation, as it aggregates and distributes DBPedia’s GNU FDL data (e.g.
<a href="http://bio2rdf.org/searchns/dbpedia/hexokinase">hexokinase</a>). Does that make the
whole bio2rdf database GNU FDL? They too use RDF as a clean interface.</p>

<h2 id="call-for-discussion">Call for Discussion</h2>

<p>Whatever one of the two camps would like to see, the added value of making data aggregations alone will keep
copyleft licenses around. Instead of trying to convince everyone of the virtues of PDDL- and CC0-like licenses,
we should think about to what extent it really matters.</p>

<p>I can do my data analysis with data sources under various licenses. I can search and retrieve data from various sources
with various licenses. What obstacles are really there that prevent us from doing science? Do the data interfaces we have
now not provide enough technical means to address the license incompatibilities? They do in Open Source, so why would
that not apply to Open Data too?</p>]]></content><author><name>Egon Willighagen</name></author><category term="opendata" /><category term="nmrshiftdb" /><category term="rdf" /><category term="dbpedia" /><category term="bio2rdf" /><summary type="html"><![CDATA[A recent post by Cameron on his visit last week with Nico, Peter and Jim, discussed Open Data licensing. This led to an interesting discussion on these matters, and to my question of why people care so much about public domain data only (or data licensed with PDDL or CC0).]]></summary></entry><entry><title type="html">DBPedia enters rdf.openmolecules.net</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/02/17/dbpedia-enters-rdfopenmoleculesnet.html" rel="alternate" type="text/html" title="DBPedia enters rdf.openmolecules.net" /><published>2009-02-17T00:00:00+00:00</published><updated>2009-02-17T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/02/17/dbpedia-enters-rdfopenmoleculesnet</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/02/17/dbpedia-enters-rdfopenmoleculesnet.html"><![CDATA[<p>As of tonight, <a href="http://rdf.openmolecules.net/">rdf.openmolecules.net</a> links to the chemistry <a href="http://www.dbpedia.org/">DBPedia</a> (1816 chemical compounds),
for which I used the SPARQL query given in <a href="https://chem-bla-ics.linkedchemistry.info/2009/02/dbpedia-lookup-and-autocomplete-of.html">DBPedia: lookup and autocomplete of chemistry <i class="fa-solid fa-recycle fa-xs"></i></a>.
It’s the first of several steps to extend rdf.openmolecules.net to link up various chemistry databases. The figure below shows the current state, where the green nodes are fully RDF-ied:</p>

<p><img src="/assets/images/ons.png" alt="" /></p>
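<p>The linking to DBPedia described above can be pictured as a join on InChI. The JavaScript below is a minimal sketch of that idea, not the script actually used for rdf.openmolecules.net. It rests on three assumptions. First, DBPedia rows carry a resource URI and an optional InChI and SMILES. Second, a caller-supplied <code class="language-plaintext highlighter-rouge">smilesToInchi()</code> function stands in for the CDK conversion, which is not implemented here. Third, the rdf.openmolecules.net side uses a hypothetical base URI.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical sketch: link DBPedia compounds to InChI-based URIs with
// owl:sameAs, falling back to a SMILES-to-InChI conversion when a DBPedia
// entry has no InChI. smilesToInchi() is supplied by the caller (for example
// a wrapper around the CDK); baseUri is an assumption, not the real scheme,
// and no URI escaping of the InChI is done here.
const OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

function linkCompounds(rows, baseUri, smilesToInchi) {
  const links = [];
  rows.forEach(function (row) {
    let inchi = row.inchi;
    if (!inchi) {
      // No InChI in the infobox: try converting the SMILES instead.
      if (row.smiles) {
        inchi = smilesToInchi(row.smiles);
      }
    }
    if (!inchi) {
      return; // nothing to link on
    }
    // Each link is a [subject, predicate, object] triple.
    links.push([baseUri + inchi, OWL_SAME_AS, row.resource]);
  });
  return links;
}

// Usage with a toy converter that only knows methane:
const toyConverter = function (smiles) {
  return smiles === "C" ? "InChI=1S/CH4/h1H4" : null;
};
const links = linkCompounds([
  { resource: "http://dbpedia.org/resource/Methane", smiles: "C" },
  { resource: "http://dbpedia.org/resource/Water", inchi: "InChI=1S/H2O/h1H2" }
], "http://example.org/molecule/", toyConverter);
console.log(links);
</code></pre></div></div>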

<p>Drugs are still missing, but I will add those too. Since not all entries had InChIs, SMILES were converted using
<a href="https://chem-bla-ics.linkedchemistry.info/2009/02/10/cdk-12-release-candidate.html">CDK 1.1.5 <i class="fa-solid fa-recycle fa-xs"></i></a>.</p>]]></content><author><name>Egon Willighagen</name></author><category term="rdf" /><category term="dbpedia" /><category term="sparql" /><category term="inchi" /><category term="smiles" /><category term="chebi" /><category term="cb" /><summary type="html"><![CDATA[As of tonight, rdf.openmolecules.net links to the chemistry DBPedia (1816 chemical compounds), for which I used the SPARQL query given in DBPedia: lookup and autocomplete of chemistry. It’s the first of several steps to extend rdf.openmolecules.net to link up various chemistry databases. The figure below shows the current state, where the green nodes are fully RDF-ied:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/ons.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/ons.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DBPedia: lookup and autocomplete of chemistry</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/02/11/dbpedia-lookup-and-autocomplete-of.html" rel="alternate" type="text/html" title="DBPedia: lookup and autocomplete of chemistry" /><published>2009-02-11T00:00:00+00:00</published><updated>2009-02-11T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/02/11/dbpedia-lookup-and-autocomplete-of</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/02/11/dbpedia-lookup-and-autocomplete-of.html"><![CDATA[<p>On the <a href="http://dbpedia.org/">DBPedia</a> <a href="https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion">discussion mailing list</a> there was a post on a
nice web page which allows you to look up things, and which features an autocomplete edit field. The screenshot below shows a lookup of molecular structures:</p>

<p><img src="/assets/images/dbpediaAutocomplete.png" alt="" /></p>

<p>If you are not aware of this, adding content to DBPedia is as easy as adding something to <a href="http://www.wikipedia.org/">WikiPedia</a>. Literally: DBPedia is
the <a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a> flavour of WikiPedia. It extracts the information from the info boxes, as I
discussed before (see <a href="http://chem-bla-ics.blogspot.com/2007/08/molecules-in-wikipedia.html">Molecules in Wikipedia</a>).</p>

<p>BTW, one can take advantage of DBPedia to see what WikiPedia has to offer in terms of chemistry. For example, to list all molecules which have a SMILES, one can use this simple SPARQL query:</p>

<script src="https://gist.github.com/57559.js"></script>

<p>Or, to list those which have an InChI:</p>

<script src="https://gist.github.com/57571.js"></script>
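<p>Part of such quality control can also be scripted on the client side. The JavaScript below is a minimal sketch. It assumes the results of the two queries above have already been turned into rows with hypothetical <code class="language-plaintext highlighter-rouge">compound</code>, <code class="language-plaintext highlighter-rouge">smiles</code> and <code class="language-plaintext highlighter-rouge">inchi</code> fields. It flags InChIs that do not even have the basic shape of an InChI string; this check is purely syntactic and says nothing about whether the structure is right. It also lists compounds that have a SMILES but no InChI, which is one way to "invert" the queries without writing a new one.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical quality-control sketch over rows from the two queries above.
// The InChI check is only syntactic: "InChI=1/" or "InChI=1S/", followed by
// non-empty, slash-separated layers without whitespace.
const INCHI_SHAPE = /^InChI=1S?\/[^\s\/]+(\/[^\s\/]+)*$/;

function looksLikeInchi(value) {
  return typeof value === "string" ? INCHI_SHAPE.test(value) : false;
}

// Rows whose InChI is clearly malformed.
function suspiciousInchis(inchiRows) {
  return inchiRows.filter(function (row) {
    return !looksLikeInchi(row.inchi);
  });
}

// Compounds that have a SMILES but no InChI (the "inverted" query).
function missingInchi(smilesRows, inchiRows) {
  const withInchi = new Set(inchiRows.map(function (row) {
    return row.compound;
  }));
  return smilesRows
    .map(function (row) { return row.compound; })
    .filter(function (compound) { return !withInchi.has(compound); });
}

// Usage with made-up rows:
const inchiRows = [
  { compound: "Methane", inchi: "InChI=1S/CH4/h1H4" },
  { compound: "Water", inchi: "InChI=1S/H2O/" }
];
const smilesRows = [
  { compound: "Methane", smiles: "C" },
  { compound: "Ethanol", smiles: "CCO" }
];
console.log(suspiciousInchis(inchiRows).map(function (row) { return row.compound; }));
console.log(missingInchi(smilesRows, inchiRows));
</code></pre></div></div>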

<p>And this is actually quite useful, e.g. for quality control. Running the above queries will turn up several broken SMILES and InChIs. I have not had time to fix those yet, so please go ahead and beat me to those fixes, and get some WikiPedia Fame :) Alternatively, invert the queries and add missing InChIs, PubChem CIDs or SMILES. When I have a bit more free time again, after the new stable CDK and Bioclipse releases, I’ll run these analyses again and summarize them in a web page.</p>]]></content><author><name>Egon Willighagen</name></author><category term="rdf" /><category term="dbpedia" /><category term="wikipedia" /><category term="smiles" /><category term="inchi" /><summary type="html"><![CDATA[On the DBPedia discussion mailing list there was a post on a nice web page which allows you to look up things, and which features an autocomplete edit field. The screenshot below shows a lookup of molecular structures:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/dbpediaAutocomplete.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/dbpediaAutocomplete.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>