  • Touchgraphing my blog

    Via SciFoo Planet (from Partial immortalization) I learned about TouchGraph Google (Peter brought it into Chemical blogspace). It’s cool, though not open source. Here’s the touch graph for my blog:
  • Centralized or decentralized?

    Peter wondered whether data should be stored in a centralized or a decentralized way, when Deepak blogged about Freebase and Metaweb. Now, I haven’t really looked into these two projects, but the question of centralized versus decentralized is interesting. It’s MySQL versus the world wide web; it’s the PubChem compound ID versus the InChI; it’s http://cb.openmolecules.net/rdf/?InChI=1/CH4/h1H4 versus info:inchi/InChI=1/CH4/h1H4 (see RDF-ing molecular space).
  • Molecular Connectivity Tables in Images

    Rich blogged about this in Never Draw the Same Molecule Twice: Viewing Image Metadata, in which he shows his molecular editor outputting images of molecular structures where the connectivity table of the structure is embedded in the image. His molecular editor can read the image again, and will automatically pick up the embedded connection table. Noel showed that this can be done not only in Java, but in Python too.
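
To make the idea of an image that carries its own connection table concrete, here is a minimal sketch using only the Python standard library. It is not how Rich's editor or Noel's Python version work; it assumes a PNG file, stores the table in a standard tEXt chunk, and uses a hypothetical keyword ("ctab") and a toy three-line connection table for ethanol invented for this illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Toy connection table for ethanol (hypothetical format, for illustration only):
# first line lists the atoms, following lines are "atom atom bond-order".
TABLE = "C C O\n1 2 1\n2 3 1"


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def png_with_text(keyword: str, text: str) -> bytes:
    """Return a 1x1 white grayscale PNG carrying `text` in a tEXt chunk."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\xff")  # filter byte 0 + one white pixel
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return (
        PNG_SIGNATURE
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"tEXt", payload)
        + _chunk(b"IDAT", idat)
        + _chunk(b"IEND", b"")
    )


def read_text_chunks(png: bytes) -> dict:
    """Collect all tEXt chunks of a PNG as a keyword -> text mapping."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    found = {}
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if kind == b"tEXt":
            key, _, value = data.partition(b"\x00")
            found[key.decode("latin-1")] = value.decode("latin-1")
    return found


image = png_with_text("ctab", TABLE)
recovered = read_text_chunks(image)["ctab"]
```

Image viewers ignore the extra chunk, so the picture still displays normally, while a program that knows the keyword can recover the structure without redrawing it.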
  • Molecules in Wikipedia without InChIs

    I reported last week about the Molecules in Wikipedia and the plethora of templates used. Chemical blogspace has also been using Wikipedia URLs as molecular identifiers and extracting InChIs from the wiki pages (see Using Wikipedia to recognize Molecules in Blogspace). Several people have shown interest in adding InChIs for molecules in Wikipedia, so here’s a new version of the list of molecules without InChIs:
  • Molecules in Wikipedia

    I do not care about physical and chemical properties in Wikipedia, as I can easily extract them from other sources. The main value of Wikipedia for molecules is, I think, that it describes the history of a molecule. Additionally, the Wikipedia URL is a nice unique molecular identifier (for example http://en.wikipedia.org/wiki/Lactose) given certain conditions, and many bloggers are using it as such. But it is only a useful identifier if one (and only one) InChI is stated on the wiki page.
  • Excel messes up your data analysis :)

    Well, no wonder: Excel is meant to be used to process money flows. Anyway, greyarea pointed me to this nice blog item from March 2006. It discusses a 2004 article in BMC Bioinformatics, Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics, by Barry Zeeberg et al. (DOI:10.1186/1471-2105-5-80). Hence, the importance of semantics and proper markup languages. The quotes are illustrative:
  • RDF-ing molecular space

    RDF might be the solution we are looking for to get a grip on the huge amount of information we are facing. Microformats and RDFa are just solutions along the way, and Gleaning Resource Descriptions from Dialects of Languages (GRDDL) might be an important tool to get the web RDF-ied.