<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://chem-bla-ics.linkedchemistry.info/feed/by_tag/strigi.xml" rel="self" type="application/atom+xml" /><link href="https://chem-bla-ics.linkedchemistry.info/" rel="alternate" type="text/html" /><updated>2026-06-15T12:00:19+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/feed/by_tag/strigi.xml</id><title type="html">chem-bla-ics</title><subtitle>Chemblaics (pronounced chem-bla-ics) is the science that uses open science and computers to solve problems in chemistry, biochemistry and related fields.</subtitle><author><name>Egon Willighagen</name></author><entry><title type="html">Extracting RDF from Chem4Word documents</title><link href="https://chem-bla-ics.linkedchemistry.info/2010/01/21/extracting-rdf-from-chem4word-documents.html" rel="alternate" type="text/html" title="Extracting RDF from Chem4Word documents" /><published>2010-01-21T00:00:00+00:00</published><updated>2010-01-21T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2010/01/21/extracting-rdf-from-chem4word-documents</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2010/01/21/extracting-rdf-from-chem4word-documents.html"><![CDATA[<p><a href="http://jat45.wordpress.com/">Joe</a> has released the first <a href="http://research.microsoft.com/en-us/projects/chem4word/">Chem4Word</a>
<a href="http://jat45.files.wordpress.com/2010/01/example.docx">demo file</a>, and has written about how to
<a href="http://jat45.wordpress.com/2010/01/20/extracting-cml-from-a-chem4word-authored-document-java/">extract the CML with Java</a>
and <a href="http://jat45.wordpress.com/2010/01/21/extracting-cml-from-a-chem4word-authored-document-c/">with C#</a>.</p>

<p>I haven’t actually gotten around to fiddling with Java, but ran <a href="http://strigi.sf.net/">Strigi</a> against it to extract RDF,
while having the <a href="http://neksa.blogspot.com/2007/05/introduction.html">Strigi-Chemistry</a> plugins installed. This is part of the
<a href="http://en.wikipedia.org/wiki/Resource_Description_Framework">RDF</a> that came out:</p>

<div class="language-turtle highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">&lt;example-doc.docx&gt;</span><span class="w">
  </span><span class="nl">&lt;http://freedesktop.org/standards/xesam/1.0/core#title&gt;</span><span class="w">
    </span><span class="s">"acetic acid"</span><span class="p">,</span><span class="w">
    </span><span class="s">"(8R,9S,10R,13S,14S,17S)- 17-hydroxy-10,13-dimethyl- 1,2,6,7,8,9,11,12,14,15,16,17-dodecahydrocyclopenta[a] phenanthren-3-one"</span><span class="p">,</span><span class="w">
    </span><span class="s">"testosterone"</span><span class="p">;</span><span class="w">
  </span><span class="nl">&lt;http://freedesktop.org/standards/xesam/1.0/core#version&gt;</span><span class="w">
    </span><span class="s">"2"</span><span class="p">,</span><span class="w">
    </span><span class="s">"2"</span><span class="p">;</span><span class="w">
  </span><span class="nl">&lt;http://rdf.openmolecules.net/0.9#atomCount&gt;</span><span class="w">
    </span><span class="s">"8"</span><span class="p">,</span><span class="w">
    </span><span class="s">"49"</span><span class="p">;</span><span class="w">
  </span><span class="nl">&lt;http://rdf.openmolecules.net/0.9#bondCount&gt;</span><span class="w">
    </span><span class="s">"7"</span><span class="p">,</span><span class="w">
    </span><span class="s">"52"</span><span class="p">;</span><span class="w">
  </span><span class="nl">&lt;http://rdf.openmolecules.net/0.9#molecularFormula&gt;</span><span class="w">
    </span><span class="s">"C2H4O2"</span><span class="p">,</span><span class="w">
    </span><span class="s">"C19H28O2"</span><span class="p">;</span><span class="w">
</span></code></pre></div></div>

<p>I believe there is quite some room for improvement, but it’s a start :) Thanks to Joe for posting the public domain test file, so
that other projects can start playing with the exciting new technology. I should note, however, that I am not running a Microsoft OS
nor MS-Word, and the source of the saved documents is the only access I have to the
<a href="http://en.wikipedia.org/wiki/Chemical_Markup_Language">CML</a> right now.</p>]]></content><author><name>Egon Willighagen</name></author><category term="cml" /><category term="java" /><category term="rdf" /><category term="chem4word" /><category term="strigi" /><summary type="html"><![CDATA[Joe has released the first Chem4Word demo file, and has written about how to extract the CML with Java and with C#.]]></summary></entry><entry><title type="html">KDE4 keyword support mockups</title><link href="https://chem-bla-ics.linkedchemistry.info/2006/06/25/kde4-keyword-support-mockups.html" rel="alternate" type="text/html" title="KDE4 keyword support mockups" /><published>2006-06-25T00:00:00+00:00</published><updated>2006-06-25T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2006/06/25/kde4-keyword-support-mockups</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2006/06/25/kde4-keyword-support-mockups.html"><![CDATA[<p>In reply to interesting comments to <a href="https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support.html">my previous blog <i class="fa-solid fa-recycle fa-xs"></i></a>
on <a href="http://www.vandenoever.info/software/strigi/">Strigi</a> and xAttr support in <a href="http://www.kde.org/">KDE</a>4, I would like to suggest
the following mockups, which I would find very useful. They deal with the ability to store keywords, for example, but not necessarily
using xAttr. I have no idea how to implement these mockups, so any help or pointers are appreciated.</p>

<p>The first mockup is an example of how this keyword markup could be used in KDE, beyond searching itself. When showing the properties
of a directory, KDE would show an overview of the hottest keywords for that directory, as is done on social bookmarking websites like
<a href="http://technorati.com/">Technorati</a> too:</p>

<p><img src="/assets/images/kfileXAttrSupport.png" alt="" /></p>

<p>This example shows that the keyword ‘Strigi’ was used a lot inside the index_files directory (these are not just the keywords given to
that directory itself, but a summary of the directory content!). Now, these keywords could be stored as xAttr, but in a database too. The
first requires a filesystem that supports xAttr, while the second requires a database daemon to be running. However, for
performance reasons the latter would be required anyway. Strigi indexes xAttr now (post 0.3.0 release), and basically allows both.</p>
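
<p>The directory overview described above could be computed roughly as follows. This is only a minimal sketch, not Strigi or KDE code: the function name <code>hottest_keywords</code>, the sample file names, and the choice to compare keywords case-insensitively are all assumptions made for illustration.</p>

```python
from collections import Counter


def hottest_keywords(file_keywords, top=5):
    """Summarise the keywords of all files in one directory.

    file_keywords maps a file name to the list of keywords stored for it
    (hypothetical input; it could come from xAttr or from a database).
    """
    counts = Counter()
    for keywords in file_keywords.values():
        # Assumption: count each keyword once per file and ignore case.
        counts.update({k.strip().lower() for k in keywords if k.strip()})
    return counts.most_common(top)


# Hypothetical keywords for the files of one directory
directory = {
    "index.html": ["Strigi", "KDE"],
    "search.cpp": ["strigi", "xattr"],
    "notes.txt": ["Strigi"],
}
print(hottest_keywords(directory, top=1))
```

<p>Whether the per-file keywords are read from xAttr or from a database does not matter for the summary itself; only the lookup that fills the mapping would differ.</p>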

<p>Independent of the chosen/preferred way to store keywords, these keywords can be edited from the Properties dialog:</p>

<p><img src="/assets/images/kfileXAttrSupport2.png" alt="" /></p>

<p>Now comes the tricky part: though I would like to add this to KDE, I do not have the C++/KDE experience to actually do this.
I’m already happy that I was able to extend Strigi with support for KDE’s kfile architecture. Yes, the Strigi version in
SVN will index all metadata extractable with kfile plugins installed on the KDE installation.</p>]]></content><author><name>Egon Willighagen</name></author><category term="kde" /><category term="strigi" /><category term="technorati" /><summary type="html"><![CDATA[In reply to interesting comments to my previous blog on Strigi and xAttr support in KDE4, I would like to suggest the following mockups, which I would find very useful. They deal with the ability to store keywords, for example, but not necessarily using xAttr. I have no idea how to implement these mockups, so any help or pointers are appreciated.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://chem-bla-ics.linkedchemistry.info/assets/images/kfileXAttrSupport2.png" /><media:content medium="image" url="https://chem-bla-ics.linkedchemistry.info/assets/images/kfileXAttrSupport2.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Strigi gets kfile plugin support</title><link href="https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support.html" rel="alternate" type="text/html" title="Strigi gets kfile plugin support" /><published>2006-06-20T00:00:00+00:00</published><updated>2006-06-20T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2006/06/20/strigi-gets-kfile-plugin-support.html"><![CDATA[<p>With some help, I got the <a href="http://developer.kde.org/documentation/tutorials/kfile-plugin/t1.html">kfile</a> stream analyzer
for <a href="http://www.vandenoever.info/software/strigi/">Strigi</a> working. This means that Strigi will now index the meta data
fields defined by the <a href="http://www.kde-apps.org/content/show.php?content=28995">kfile-chemical</a> plugins.</p>

<p>The reason it was not working earlier was that it segfaulted whenever a KDE class was instantiated. That’s something I
really hate about C/C++: the lack of stack traces, though <a href="http://valgrind.org/">valgrind</a> was helpful. It turned out
that adding the line below fixed everything. A <a href="http://developer.kde.org/documentation/library/3.0-api/classref/kdecore/KInstance.html">KInstance</a>
is needed when using KDE technology outside a KDE program:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>KInstance instance( "strigita_kfile" );
</code></pre></div></div>

<p>Combine this with the <a href="http://wiki.linuxquestions.org/wiki/Extended_attributes">xattr</a> support added by Jos earlier today, and I hope to
see an interesting new Strigi release soon! Now we only need to get <a href="https://chem-bla-ics.linkedchemistry.info/2006/06/17/kde-desktop-search-kat-strigi-and.html">editing of keywords <i class="fa-solid fa-recycle fa-xs"></i></a>
into KDE4.</p>]]></content><author><name>Egon Willighagen</name></author><category term="strigi" /><category term="kde" /><summary type="html"><![CDATA[With some help, I got the kfile stream analyzer for Strigi working. This means that Strigi will now index the meta data fields defined by the kfile-chemical plugins.]]></summary></entry><entry><title type="html">KDE desktop search: Kat, Strigi and Tenor</title><link href="https://chem-bla-ics.linkedchemistry.info/2006/06/17/kde-desktop-search-kat-strigi-and.html" rel="alternate" type="text/html" title="KDE desktop search: Kat, Strigi and Tenor" /><published>2006-06-17T00:00:00+00:00</published><updated>2006-06-17T00:00:00+00:00</updated><id>https://chem-bla-ics.linkedchemistry.info/2006/06/17/kde-desktop-search-kat-strigi-and</id><content type="html" xml:base="https://chem-bla-ics.linkedchemistry.info/2006/06/17/kde-desktop-search-kat-strigi-and.html"><![CDATA[<p>Desktop searching has become a hot topic (some <a href="https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html">earlier <i class="fa-solid fa-recycle fa-xs"></i></a>
<a href="https://chem-bla-ics.linkedchemistry.info/2005/11/07/ubuntu-dapper-will-include-chemistry.html">blogs <i class="fa-solid fa-recycle fa-xs"></i></a>), now that years of data have accumulated on one’s
hard disk: PDFs, OpenOffice.org documents, LaTeX manuscripts, old Java source code, digitized music, and a lot of chemical files. Well,
on my hard disk that is. Unlike piles of paper, this data can be searched by a computer, but given its size an index is required. What’s KDE4
going to offer?</p>

<p>For the <a href="http://www.kde.org/">KDE</a> desktop, <a href="http://kat.mandriva.com/">Kat</a> has offered this for more than a year, and later
<a href="http://www.kde-apps.org/content/show.php?content=36832">Kerry</a> came along as a frontend to <a href="http://beaglewiki.org/Main_Page">Beagle</a>,
though this does not have the nice integration with KDE <a href="http://developer.kde.org/documentation/tutorials/kfile-plugin/t1.html">kfile plugins</a>.
Since then, Kat development has come to a stop (unfortunately), and attempts to reach the main author
(<a href="mailto:roberto.cappuccio@gmail.com">Roberto</a>) have been unsuccessful. The last thing to happen was a rewrite of the database backend.</p>

<p>Additionally, <a href="http://dot.kde.org/1109163846/">Scott Wheeler proposed Tenor</a> at <a href="http://www.fosdem.org/">FOSDEM</a> 2005:
<em>“KDE 4: Beyond Hierarchical Data, The Desktop as a Searchable Web of Context”</em>. A semantic desktop; potentially cool, but I have heard
<a href="http://www.kdedevelopers.org/blog/72?from=10">little from it lately</a>, except for some rumours that
<a href="http://mail.kde.org/pipermail/klink/2006-April/000133.html">Scott has some actual code at home</a>.</p>

<p>Now, <a href="http://www.vandenoever.info/software/strigi/">Strigi</a> (<a href="http://www.kde-look.org/content/show.php?content=40889">download</a>) has come along,
with a fast indexing engine, just the point where Kat development seemed to have stopped. The design is different from that of Kat, but it
does not seem unlikely that Kat code can be ported. No support for PDF or OpenOffice.org documents yet, but that’s really the easy part, and
kfile is on its way.</p>

<p>Getting back to Tenor, one might wonder how Strigi could implement Tenor concepts. A simple approach is at least to allow users to tag files,
just like we have become used to with blogs (e.g. <a href="http://www.technorati.com/">Technorati.com</a>) and websites (e.g.
<a href="http://www.connotea.org/">Connotea</a>). This could be easily implemented using <a href="http://wiki.linuxquestions.org/wiki/Extended_attributes">extended attributes</a>
(xattr), <a href="https://chem-bla-ics.linkedchemistry.info/2006/05/26/molecular-indexing-on-kde-and-osx.html">already used by Beagle <i class="fa-solid fa-recycle fa-xs"></i></a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># file: home/egonw/1CRN.jpg
user.Tenor.Keywords="crambin"
user.Tenor.Comment="Used in my ontologies presentation."
</code></pre></div></div>

<p>Obviously, this example shows not just these tags, but a user comment too. The idea here is that Strigi mines these attributes in
addition to the file itself, so that tags can be searched too. BTW, my argument for using this, instead of putting these things
in the Strigi database itself, is persistence: data and metadata are kept together. KDE’s file properties dialog would be extended
with an extra tab that allows editing these fields.</p>
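
<p>To make the idea of mining these attributes a bit more concrete, here is a minimal sketch that parses a dump in the format shown above and looks up files by tag. It is not how Strigi does it: the function names <code>parse_xattr_dump</code> and <code>files_tagged</code> are made up for this example, and it assumes that multiple keywords would be separated by commas, which the example above does not specify.</p>

```python
def parse_xattr_dump(dump):
    """Parse a text dump like the one above into {path: {attribute: value}}."""
    files = {}
    current = None
    for line in dump.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# file:"):
            # A header line starts the attributes of a new file.
            current = line[len("# file:"):].strip()
            files[current] = {}
        elif "=" in line and current is not None:
            name, _, raw = line.partition("=")
            # Values are double-quoted in the dump; drop the quotes.
            files[current][name] = raw.strip().strip('"')
    return files


def files_tagged(files, keyword, attribute="user.Tenor.Keywords"):
    """Return the paths whose keyword attribute contains the given tag."""
    hits = []
    for path, attrs in files.items():
        # Assumption: several keywords are stored comma-separated.
        tags = [t.strip() for t in attrs.get(attribute, "").split(",")]
        if keyword in tags:
            hits.append(path)
    return hits


dump = '''# file: home/egonw/1CRN.jpg
user.Tenor.Keywords="crambin"
user.Tenor.Comment="Used in my ontologies presentation."
'''
index = parse_xattr_dump(dump)
print(files_tagged(index, "crambin"))
```

<p>Because the tags travel with the file, such an index could always be rebuilt from the files alone, which is the persistence argument made above.</p>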

<p>Strigi itself can be embedded in KDE applications to search specific information (e.g. search molecular data within
<a href="http://cniehaus.livejournal.com/23010.html">Kalzium</a> using the <a href="http://www.iupac.org/inchi/">InChI</a>), and even in the FileOpen dialog.
We need patches for KDE4 that allow this, soon.</p>]]></content><author><name>Egon Willighagen</name></author><category term="kde" /><category term="strigi" /><category term="kalzium" /><category term="linux" /><category term="technorati" /><category term="connotea" /><summary type="html"><![CDATA[Desktop searching has become a hot topic (some earlier blogs), now that years of data have accumulated on one’s hard disk: PDFs, OpenOffice.org documents, LaTeX manuscripts, old Java source code, digitized music, and a lot of chemical files. Well, on my hard disk that is. Unlike piles of paper, this data can be searched by a computer, but given its size an index is required. What’s KDE4 going to offer?]]></summary></entry></feed>