chem-bla-ics

Adding disclosures to Wikidata with Bioclipse

2016-03-20T00:00:00+00:00

Last week the huge, bi-annual ACS meeting took place (#ACSSanDiego), during which new drugs (or drug leads) are commonly disclosed. This time was no exception, as with this one tweeted by Bethany Halford:

Because getting this information out in the open is important, I think it’s a good idea to add them to Wikidata (see doi:10.3897/rio.1.e7573). So, with Bioclipse (doi:10.1186/1471-2105-8-59) I redrew the structure:

I previously blogged about how to add chemicals to Wikidata, but I realized that I also wanted to use Bioclipse to automate this process a bit. So, I wrote this script to generate the SMILES, InChI, and InChIKey, double-check that the compound is not already in Wikidata (using the Wikidata SPARQL endpoint), and look up the PubChem compound identifier (the SMILES below is just an example):

smiles = "CCCC"

mol = cdk.fromSMILES(smiles)
ui.open(mol)

inchiObj = inchi.generate(mol)
inchiShort = inchiObj.value.substring(6)
key = inchiObj.key // key = "GDGXJFJBRMKYDL-FYWRMAATSA-N"

sparql = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?compound WHERE {
  ?compound wdt:P235 "$key" .
}
"""

if (bioclipse.isOnline()) {
  results = rdf.sparqlRemote(
    "https://query.wikidata.org/sparql", sparql
  )
  missing = results.rowCount == 0
} else {
  missing = true
}

formula = cdk.molecularFormula(mol)

// Create the Wikidata QuickStatement,
// see https://tools.wmflabs.org/wikidata-todo/quick_statements.php

item = "LAST" // set to Qxxxx if you need to append info,
              // e.g. item = "Q22579236"

pubchemLine = ""
if (bioclipse.isOnline()) {
  pcResults = pubchem.search(key)
  if (pcResults.size == 1) {
    cid = pcResults[0]
    pubchemLine = "$item\tP662\t\"$cid\""
  }
}

if (!missing) {
  println "===================="
  println "Already in Wikidata as " + results.get(1,"compound")
  println "===================="
} else {
  statement = """
    CREATE
    
    $item\tDen\t\"chemical compound\"
    $item\tP233\t\"$smiles\"
    $item\tP274\t\"$formula\"
    $item\tP234\t\"$inchiShort\"
    $item\tP235\t\"$key\"
    $pubchemLine
  """

  println "===================="
  println statement
  println "===================="
}

The output of this script is a QuickStatement for Magnus Manske’s tool, which is a list of commands to create and edit entries in Wikidata. IMPORTANT: it’s not meant to automate editing Wikidata! I only automate creating the input, which I carefully check (e.g. checking that all stereochemistry is defined). Note how Bioclipse opens up the structure in a viewer with ui.open(). You need to enable the tool first, but if you have an account, this is not too hard. Of course, the advantage is that it is a lot quicker. I have a similar script to create QuickStatements starting with only a ChEMBL identifier.
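A small part of that checking can be sketched in plain Groovy, outside Bioclipse. The helpers below are hypothetical (stripInchiPrefix and looksLikeInchiKey are names invented here, not Bioclipse managers): the first mirrors the substring(6) step that drops the leading "InChI=", and the second only tests the shape of an InChIKey (blocks of 14, 10 and 1 uppercase letters). Neither says anything about whether the chemistry is right.

```groovy
// Hypothetical sanity checks in plain Groovy; not part of Bioclipse.

// Drop the leading "InChI=" (6 characters), like substring(6) in the script.
def stripInchiPrefix(String inchi) {
    inchi.startsWith("InChI=") ? inchi.substring(6) : inchi
}

// An InChIKey consists of blocks of 14, 10 and 1 uppercase letters,
// joined by hyphens. This checks the shape only, not the content.
def looksLikeInchiKey(String key) {
    key ==~ /[A-Z]{14}-[A-Z]{10}-[A-Z]/
}

println stripInchiPrefix("InChI=1S/CH4/h1H4")
println looksLikeInchiKey("WNEODWDFDXWOLU-QHCPKHFHSA-N")
```

A key that fails this test is a sign that something went wrong before the SPARQL lookup, so the "not yet in Wikidata" answer should not be trusted either.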

The QuickStatement for GDC-0853 looks like:

    CREATE
    
    LAST Den "chemical compound"
    LAST P233 "O=C1C(=CC(=CN1C)c2ccnc(c2CO)N4C(=O)c3cc5c(n3CC4)CC(C)(C)C5)Nc6ncc(cc6)N7CCN(C[C@@H]7C)C8COC8"
    LAST P274 "C37H44N8O4"
    LAST P234 "1S/C37H44N8O4/c1-23-18-42(27-21-49-22-27)9-10-43(23)26-5-6-33(39-17-26)40-30-13-25(19-41(4)35(30)47)28-7-8-38-34(29(28)20-46)45-12-11-44-31(36(45)48)14-24-15-37(2,3)16-32(24)44/h5-8,13-14,17,19,23,27,46H,9-12,15-16,18,20-22H2,1-4H3,(H,39,40)/t23-/m0/s1"
    LAST P235 "WNEODWDFDXWOLU-QHCPKHFHSA-N"
    LAST P662 "86567195"

The first line creates a new Wikidata item, while the next ones add information about this compound. GDC-0853 is now also Q23304817. The label I added manually afterwards. Note how the Bioclipse script found the PubChem identifier, using the InChIKey. I also use this approach to add compounds to Wikidata that we have in WikiPathways.
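To make the line format explicit, here is a minimal sketch in plain Groovy of how such a command list can be assembled: item, property and quoted value, separated by tabs. The helper name quickStatements and the map layout are assumptions made for illustration; they are not part of Bioclipse or of the QuickStatements tool.

```groovy
// Hypothetical helper (not part of Bioclipse): builds tab-separated
// QuickStatements lines from a property -> value map.
def quickStatements(Map<String, String> props, String item = "LAST") {
    def lines = []
    // CREATE is only needed when a new item is made; LAST then refers to it.
    if (item == "LAST") lines << "CREATE"
    props.each { prop, value ->
        // String values are wrapped in double quotes.
        lines << "${item}\t${prop}\t\"${value}\""
    }
    return lines.join("\n")
}

def props = [
    "Den" : "chemical compound",
    "P274": "C37H44N8O4",
    "P235": "WNEODWDFDXWOLU-QHCPKHFHSA-N",
    "P662": "86567195"
]
println quickStatements(props)
```

Passing an existing identifier as the second argument, e.g. quickStatements(props, "Q23304817"), skips the CREATE line and appends the statements to that item instead.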

Chemical blogspace is getting more chemical

2007-01-04T00:00:00+00:00

The best remedy for being depressed is the rush after hacking some nice new feature (unfortunately, it is addictive). After hacking InChI support into Chemical blogspace a couple of days back, adding some more visual feedback on those molecules is not that hard, with PubChem around that is:

Beware! Every marked up molecule in your blog is being picked up! So should the compound with the SMILES N(=NC1=CC=C(C=C1)N(CCO)CCO)C3=CC=C(C=CC2=C(C(=C(C#N)C#N)OC2(C)C)C#N)S3, which is reported to be the most light sensitive molecule ever synthesized so far.

SMILES, CAS and InChI in blogs: Greasemonkey

2006-12-17T00:00:00+00:00

As a follow-up to my Including SMILES, CML and InChI in blogs post of last week, I had a go at Greasemonkey. Some time ago already, Flags and Lollipops and Nodalpoint showed with two cool mashups (one Connotea/Postgenomic and one Pubmed/Postgenomic) that userscripts are rather useful in science too. I can very much recommend the PubMed/Postgenomic mashup, as PubMed has several organic chemistry journals indexed too!

So, how does this relate to my blog of last week? Well, would it not be nice if, when your blog uses the markup suggested in that blog, you automatically got links to PubChem and Google? That is now possible with a small GPL-ed Greasemonkey script called blogchemistry.user.js.

The Greasemonkey plugin requires Firefox to be installed. If ready, install the script by clicking the link given earlier, and Greasemonkey will ask you if you want to install the script. Afterwards, check the output for this RDFa markup content:

a SMILES: CCO
a CAS registry number: 50-00-0
and an InChI: InChI=1/CH4/h1H4

It should look like the output for this blog item:

Note the superscript PubChem and Google links.
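The kind of links involved can be sketched as follows. This is written in plain Groovy to match the Bioclipse script elsewhere on this blog, not in the JavaScript of the userscript itself, and the URL patterns are assumptions: a plain Google search, and a PubChem PUG REST lookup by SMILES. They are not taken from blogchemistry.user.js, and googleLink and pubchemLink are hypothetical names.

```groovy
// Sketch only: URL patterns are assumptions, not copied from
// blogchemistry.user.js.

// Google search link for any identifier (SMILES, CAS number or InChI).
def googleLink(String term) {
    "https://www.google.com/search?q=" + URLEncoder.encode(term, "UTF-8")
}

// PubChem PUG REST lookup of compound identifiers (CIDs) by SMILES.
// Simple SMILES only; more complex strings may need other escaping.
def pubchemLink(String smiles) {
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/" +
        URLEncoder.encode(smiles, "UTF-8") + "/cids/TXT"
}

println googleLink("50-00-0")
println googleLink("InChI=1/CH4/h1H4")
println pubchemLink("CCO")
```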