<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/bio2rdf.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-04-19T09:50:36+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/bio2rdf.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">SPARQL end points, Jena and bif:contains</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/10/15/sparql-end-points-jena-and-bifcontains.html" rel="alternate" type="text/html" title="SPARQL end points, Jena and bif:contains" /><published>2009-10-15T00:00:00+00:00</published><updated>2009-10-15T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/10/15/sparql-end-points-jena-and-bifcontains</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/10/15/sparql-end-points-jena-and-bifcontains.html"><![CDATA[<p>I have been having fun with SPARQL in <a href="http://www.bioclipse.net/">Bioclipse</a> for a while now, and blogged on several occasions:</p>

<ul>
  <li><a href="https://chem-bla-ics.linkedchemistry.info/2009/09/04/nmrshiftdb-enters-rdfopenmoleculesnet-2.html">NMRShiftDB enters rdf.openmolecules.net #2: SPARQL end point with Virtuoso</a></li>
  <li><a href="https://chem-bla-ics.linkedchemistry.info/2009/08/21/bioclipse-and-sparql-end-points-2.html">Bioclipse and SPARQL end points #2: MyExperiment <i class="fa-solid fa-recycle fa-xs"></i></a></li>
  <li><a href="https://chem-bla-ics.linkedchemistry.info/2009/08/16/bioclipse-and-sparql-end-points.html">Bioclipse and SPARQL end points <i class="fa-solid fa-recycle fa-xs"></i></a></li>
</ul>

<p>One thing I had not been able to work out is how to use the (rather nice) <em>bif:contains</em> extension in
<a href="http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/">Virtuoso</a>, which supports indexing, because <a href="http://jena.sourceforge.net/">Jena</a> would complain with:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>com.hp.hpl.jena.query.QueryParseException: Line 1, column 31: Unresolved
prefixed name: bif:contains
</code></pre></div></div>

<p>Defining the prefix did not solve the problem either, but <a href="http://www.linkedin.com/in/ivanmikhailov">Ivan Mikhailov</a> just
replied to my post on the <a href="https://sourceforge.net/mailarchive/forum.php?forum_name=virtuoso-users">virtuoso-users</a> mailing
list with the solution.</p>

<p>The solution lies in the fact that <code class="language-plaintext highlighter-rouge">bif:</code> is in its own namespace, which makes it possible to replace <code class="language-plaintext highlighter-rouge">bif:contains</code> by its
full reference <code class="language-plaintext highlighter-rouge">&lt;bif:contains&gt;</code>. I immediately gave that a try in Bioclipse, and just successfully ran this Bioclipse
script snippet:</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nx">rdf</span><span class="p">.</span><span class="nf">sparqlRemote</span><span class="p">(</span>
  <span class="dl">"</span><span class="s2">http://bio2rdf.org/sparql</span><span class="dl">"</span><span class="p">,</span>
  <span class="dl">"</span><span class="s2">SELECT * WHERE {?s ?p ?o . ?o &lt;bif:contains&gt; </span><span class="se">\"</span><span class="s2">aspirin</span><span class="se">\"</span><span class="s2"> .};</span><span class="dl">"</span>
<span class="p">);</span>
</code></pre></div></div>
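
<p>The same rewrite can be applied mechanically to a query before it reaches Jena. The following is only a minimal sketch and not part of Bioclipse, Jena, or Virtuoso: the helper name <code class="language-plaintext highlighter-rouge">useBifIris</code> is hypothetical, and it assumes that <code class="language-plaintext highlighter-rouge">bif:</code> names never occur inside string literals of the query, because the regular expression does not parse SPARQL.</p>

<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper (an assumption, not a Bioclipse or Jena API):
// rewrite Virtuoso built-in names such as bif:contains into the
// &lt;bif:contains&gt; form, so that a strict SPARQL parser does not try
// to resolve an undeclared "bif" prefix.
function useBifIris(query) {
  // Match "bif:name" unless it directly follows "&lt;" or a word character,
  // which leaves names that are already wrapped untouched.
  return query.replace(/(^|[^&lt;\w])bif:(\w+)/g, "$1&lt;bif:$2&gt;");
}

// The query from this post, written with the prefixed name.
var query = useBifIris(
  "SELECT * WHERE { ?s ?p ?o . ?o bif:contains \"aspirin\" . }"
);
</code></pre></div></div>

<p>In a Bioclipse script, the rewritten string could then be passed as the second argument to <code class="language-plaintext highlighter-rouge">rdf.sparqlRemote</code>, as in the snippet above.</p>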

<p>Thanx, Ivan!</p>]]></content><author><name>Egon Willighagen</name></author><category term="sparql" /><category term="rdf" /><category term="bioclipse" /><category term="bio2rdf" /><summary type="html"><![CDATA[I have been having fun with SPARQL in Bioclipse for a while now, and blogged on several occasions:]]></summary></entry><entry><title type="html">NMRShiftDB RDF #3: Bio2RDF</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/10/09/nmrshiftdb-rdf-3-bio2rdf.html" rel="alternate" type="text/html" title="NMRShiftDB RDF #3: Bio2RDF" /><published>2009-10-09T00:00:00+00:00</published><updated>2009-10-09T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/10/09/nmrshiftdb-rdf-3-bio2rdf</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/10/09/nmrshiftdb-rdf-3-bio2rdf.html"><![CDATA[<p>You might have seen my efforts to convert the <a href="http://www.nmrshiftdb.org/">NMRShiftDB</a> data into RDF:</p>

<ul>
  <li><a href="http://chem-bla-ics.blogspot.com/2009/09/nmrshiftdb-rdf-2-some-statistics.html">NMRShiftDB RDF #2: Some statistics</a></li>
  <li><a href="http://chem-bla-ics.blogspot.com/2009/09/nmrshiftdb-rdf-1-spectra-by-inchi.html">NMRShiftDB RDF #1: Spectra by InChI </a></li>
  <li><a href="https://chem-bla-ics.linkedchemistry.info/2009/09/04/nmrshiftdb-enters-rdfopenmoleculesnet-2.html">NMRShiftDB enters rdf.openmolecules.net #2: SPARQL end point with Virtuoso <i class="fa-solid fa-recycle fa-xs"></i></a></li>
</ul>

<p><a href="http://bio2rdf.blogspot.com/">Peter Ansell</a> copied the data into <a href="http://bio2rdf.org/">Bio2RDF</a> shortly after that,
but I had not blogged about that yet. So, here goes. If you have not looked at Bio2RDF yet, this is a good time to do so.
The structure of the exposed triples is not perfect, and I just realized I made a beginner’s mistake: using a domain name
I have no control over in a namespace (bad me). The Virtuoso6 faceted browser allows you to navigate the data in Bio2RDF
by molecule (e.g. <a href="http://cu.bio2rdf.org/page/nmrshiftdb_molecule:234">molecule 234</a>):</p>

<p><img src="/assets/images/nmrRDF1.png" alt="" /></p>

<p>And by spectrum too (e.g. <a href="http://cu.bio2rdf.org/page/nmrshiftdb_spectrum:4735">spectrum 4735</a>):</p>

<p><img src="/assets/images/nmrRDF2.png" alt="" /></p>]]></content><author><name>Egon Willighagen</name></author><category term="nmrshiftdb" /><category term="rdf" /><category term="bio2rdf" /><summary type="html"><![CDATA[My might have seen my efforts to convert the NMRShiftDB data into RDF:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/nmrRDF1.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/nmrRDF1.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Michel Dumontier at Uppsala University</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/06/26/michel-dumontier-at-uppsala-university.html" rel="alternate" type="text/html" title="Michel Dumontier at Uppsala University" /><published>2009-06-26T00:00:00+00:00</published><updated>2009-06-26T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/06/26/michel-dumontier-at-uppsala-university</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/06/26/michel-dumontier-at-uppsala-university.html"><![CDATA[<p><a href="http://dumontierlab.com/">Michel</a> visits <a href="http://www.farmbio.uu.se/researchgroup.php?fg=1">our group</a> this week and gave a
very exciting talk yesterday on the role of ontologies in drug discovery. This being ongoing research in our group too, the talk was
well received by the audience (which was not too large, because after mid-summer, Uppsala is on holiday). For the first time, I
microblogged a talk on <a href="http://twitter.com/egonwillighagen">my Twitter account</a> (using the
<a href="http://search.twitter.com/search?q=dumontieratuppsala">#dumontieratuppsala</a> tag). I have not got an XSLT ready to convert the
relevant items into a nice HTML snippet for embedding in this blog, but will try to do that later. Meanwhile, I also made a
few bookmarks here and there, which are available from <a href="http://delicious.com/egonw/%23dumontieratupssala">Delicious</a>.</p>

<p>The rest of the day, we talked about various ontology, bio- and cheminformatics related stuff. We looked at
<a href="http://sadiframework.org/">SADI</a>, <a href="http://www.bioclipse.net/">Bioclipse</a> (and my RDF extension, see
<a href="http://delicious.com/tag/bioclipse+gist+manager:rdf">these JavaScripts</a>),
<a href="http://www.bio2rdf.org/">Bio2RDF</a>, <a href="http://rdf.openmolecules.net/?InChI=1/CH4/h1H4">rdf.openmolecules.net</a>,
and <a href="http://www.openlinksw.com/dataspace/dav/wiki/Main/VOSRDF">Virtuoso</a>.</p>]]></content><author><name>Egon Willighagen</name></author><category term="bio2rdf" /><category term="rdf" /><category term="bioclipse" /><summary type="html"><![CDATA[Michel visits our group this week and gave a very exciting talk yesterday on the role of ontologies in drug discovery. This being ongoing research in our group too, the talk was well received by the audience (which was not too large, because after mid-summer, Uppsala is on holiday). For the first time, I microblogged a talk on my Twitter account (using the #dumontieratuppsala tag). I have not got an XSLT ready to convert the relevant items into a nice HTML snippet for embedding in this blog, but will try to do that later. Meanwhile, I also made a few bookmarks here and there, which are available from Delicious.]]></summary></entry><entry><title type="html">Open Data: license, rights, aggregation, clean interfaces?</title><link href="https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation.html" rel="alternate" type="text/html" title="Open Data: license, rights, aggregation, clean interfaces?" /><published>2009-05-18T00:00:00+00:00</published><updated>2009-05-18T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2009/05/18/open-data-license-rights-aggregation.html"><![CDATA[<p>A <a href="http://blog.openwetware.org/scienceintheopen/2009/05/15/a-breakthrough-on-data-licensing-for-public-science/">recent post</a> by
<a href="http://blog.openwetware.org/scienceintheopen/">Cameron</a> on his visit last week with <a href="http://wwmm.ch.cam.ac.uk/blogs/adams/">Nico</a>,
<a href="http://wwmm.ch.cam.ac.uk/blogs/murrayrust/">Peter</a> and <a href="http://wwmm.ch.cam.ac.uk/blogs/downing/">Jim</a>, discussed
<a href="http://en.wikipedia.org/wiki/Open_data">Open Data</a> licensing. This led to an interesting discussion on these matters, and
to questions from me on why people care so much about only public domain data (or data licensed with
<a href="http://www.opendatacommons.org/licenses/pddl/1.0/">PDDL</a> or <a href="http://wiki.creativecommons.org/CC0">CC0</a>).</p>

<p>Open licensing for data has not matured as much as it has for software, and international law seems to be more confusing about the
issues. I guess that is because data aggregation has been around since well before the computer era. The PDDL and CC0 both try to
overcome this fuzziness. But there is another issue we need to keep in mind. A lot of useful Data was aggregated and made Open
<em>before</em> these licenses came about, and uses, for example, the <a href="http://www.gnu.org/copyleft/fdl.html">GNU FDL</a> license. One such case is
the <a href="http://www.nmrshiftdb.org/">NMRShiftDB</a>.</p>

<h2 id="rights">Rights</h2>

<p>Right now, there are two Open Data camps, much like the BSD-vs-GPL wars in Open Source: one that believes in waiving any rights
on the Data, indicating that facts are free, and one that believes that data must be protected so that it is not eaten by big companies
and lost to the community (e.g. <a href="http://friendfeed.com/onssolubility/cf6afd52/should-we-contribute-solubility-data-to">the WolframAlpha arrangements are suspect</a>).</p>

<p>Of course, the two camps are not that far apart, and both believe Open is important. Interestingly, there are some noteworthy
differences with the Open Source wars. Drawing the parallel between the two exposes an important one: Open Source has
algorithms (uncopyrightable) and implementations (copyrightable); Open Data has Data (uncopyrightable) and aggregation
(copyrightable). Open Source talks mostly about the implementation, not the algorithm; it’s Open Source, not Open Algorithms
after all. In cheminformatics it is even often the case that the algorithms are not specified at all and that there truly
is only source.</p>

<p>However, the name Open Data does not make this distinction. Data is fairly cheap, and its acquisition can be automated and computerized;
Aggregation, on the other hand, requires human involvement: curation, thinking about data models, etc. This is where the added
value is. Compare an assigned NMR spectrum with the raw data returned from the spectrometer.</p>

<p>It is this added value that people want to protect, not the data itself. I think.</p>

<h2 id="aggregation">Aggregation</h2>

<p>One important argument that tends to show up when people argue for PDDL and CC0 is that it makes data aggregation easier.
This is most certainly true: if you can do whatever you like with a blob of data, that includes aggregating it with any other
blob of data. However, copyleft licenses, like the GNU FDL, require the aggregation to have a compatible license too. It is
the license incompatibilities that make this impossible. Or … ?</p>

<p>Open Source has matured to such a point that it is fairly clear what the intended behaviour is regarding derivatives. An
aggregation of software (typically referred to as a distribution) is only a derivative under certain conditions. This makes
it possible to run proprietary software on top of GNU/Linux, which uses the GNU GPL but does not require software running on
top of it to be GPL too. Unless… unless no clear, well-defined interface has been used, indicating a strong dependency.
Now, surely, these things have not been confirmed to match actual law in court, but the intentions are clear.</p>

<h2 id="clean-data-interfaces">Clean Data Interfaces?</h2>

<p>Now, if we translate this to Open Data, would there be the equivalent of a clean interface? Can we build a data
distribution with data of various licenses? I think we can! I am not a lawyer, so please consider this an invitation
to discuss these matters…</p>

<p>Let’s start simple… if I put a GNU FDL image in this blog, by linking to it with an open, free, clean HTML interface
(<code class="language-plaintext highlighter-rouge">&lt;img src=""/&gt;</code>), would that make my blog GNU FDL too? I don’t think so. Surely, I would need to list the copyright owner,
and would actually be required to put the GNU FDL in my blog too, but I hope linking to the license text would suffice.
(Let’s skip fair use at this moment, and assume the use goes beyond fair use). Question: am I not using a clean interface,
and would this not keep the image’s license from infecting my blog?</p>

<p>A more difficult example: consider <a href="http://rdf.openmolecules.net/">rdf.openmolecules.net</a>, which surely aggregates facts,
including data from the NMRShiftDB and <a href="http://dbpedia.org/">DBPedia</a>. I am using unique identifiers here, the NMRShiftDB
compound ID and the DBPedia URL, which surely is GNU FDL, and use these to make a <code class="language-plaintext highlighter-rouge">&lt;owl:sameAs&gt;</code> statement. Again, please do
not consider fair use, which this certainly is. But, let’s say I put some more DBPedia and NMRShiftDB data in this
aggregation. The GNU FDL data on rdf.openmolecules.net would be separate RDF blocks, with proper dc:license and dc:author
annotation. But each block would be part of a larger aggregation. The clean interface here is the
<a href="http://en.wikipedia.org/wiki/Resource_description_framework">Resource Description Framework</a>.</p>

<p>This second case does not only affect my rdf.openmolecules.net website; <a href="http://bio2rdf.org/">bio2rdf.org</a>, for example,
is in the same situation: it aggregates and distributes DBPedia’s GNU FDL data (e.g.
<a href="http://bio2rdf.org/searchns/dbpedia/hexokinase">hexokinase</a>). Does that make the
whole of the bio2rdf database GNU FDL? They too use RDF as a clean interface.</p>

<h2 id="call-for-discussion">Call for Discussion</h2>

<p>Despite what one of the two camps would like to see, the mere fact of added value when making data aggregations will keep
copyleft licenses around, and instead of trying to convince everyone of the virtues of PDDL- and CC0-like licenses,
we should think about to what extent it really matters.</p>

<p>I can do my data analysis with data sources of various licenses. I can search and retrieve data from various sources
with various licenses. What obstacles are really there that prevent us from doing science? Do the data interfaces we have
now not provide enough technical means to address the license incompatibilities? They have in Open Source, why would
that not apply to Open Data too?</p>]]></content><author><name>Egon Willighagen</name></author><category term="opendata" /><category term="nmrshiftdb" /><category term="rdf" /><category term="dbpedia" /><category term="bio2rdf" /><summary type="html"><![CDATA[A recent post by Cameron on his visit last week with Nico, Peter and Jim, discussed Open Data licensing. This led to an interesting discussion on these matters, and to questions from me on why people care so much about only public domain data (or data licensed with PDDL or CC0).]]></summary></entry></feed>