Jerven Bolleman et al. recently published a great preprint about how to use RDF to give SPARQL queries context by linking them (semantically) with metadata. The context includes keywords, the SPARQL endpoint the query can be run against, and a human-oriented description of the query. At recent hackathons, a few groups, including the group behind this paper, have been working on using the combination of a SPARQL query and a human-oriented description to train large language models. Given that SPARQL is a very small language, I can see this may work well, and that it may support our VHP4Safety and Scholia projects.

In addition to the data model for SPARQL as research output (see doi:10.32388/ZNWI7T.2), the paper also introduces the sparql-example-utils software that I was first introduced to at the recent October Scholia hackathon.

But I have/had some features I would like to see added. The first is provenance. Who is the author/contributor of the SPARQL query? Is there an open license for it, or is it perhaps in the public domain? How do I give attribution if I reuse the SPARQL query? These things matter in a modern recognition and rewards world where there is room for everyone’s talent. A set of good SPARQL queries may be more valuable than a ten-page Jupyter notebook (and the other way around). So, I started writing patches. And I created a custom jar so that I can see these patches in action in our growing list of SPARQL queries (here a WikiPathways query):

I started collecting SPARQL queries for ChEMBL, WikiPathways, and VHP4Safety. These queries are often part of other interfaces but we can easily extract the original SPARQL from the Turtle files behind these pages.
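As a rough illustration of that extraction step, the Python sketch below pulls the query text out of a Turtle file with a regular expression. It assumes the queries are stored as triple-quoted literals on `sh:select`, `sh:ask`, or `sh:construct`; the `extract_queries` function and the sample data are hypothetical. This is not a Turtle parser (it ignores escape sequences, for example), so for anything serious a proper RDF library would be the safer choice.

```python
import re

# Matches sh:select / sh:ask / sh:construct followed by a """long literal""".
QUERY_PATTERN = re.compile(
    r'sh:(select|ask|construct)\s+"""(.*?)"""',
    re.DOTALL,  # let the query text span multiple lines
)


def extract_queries(turtle_text: str) -> list[tuple[str, str]]:
    """Return (query type, query text) pairs found in the Turtle text."""
    return [
        (match.group(1), match.group(2).strip())
        for match in QUERY_PATTERN.finditer(turtle_text)
    ]


# Made-up sample record, only to show the expected input shape.
SAMPLE = '''
@prefix sh: <http://www.w3.org/ns/shacl#> .

<https://example.org/queries/1> a sh:SPARQLExecutable ;
    sh:select """
SELECT ?s WHERE { ?s ?p ?o } LIMIT 1
""" .
'''

for query_type, query_text in extract_queries(SAMPLE):
    print(query_type, "->", query_text)
```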