-
Editing and Validation of PubChem XML documents
With the general framework set up for editing and validation of CML documents, it was fairly easy to support the PubChem XML file format schema too. -
Editing and Validation of CML documents in Bioclipse
One advantage of using XML is that one can rely on good library support. When parsing XML, one does not have to take care of the syntax and can focus on the data and its semantics. This comes at the expense of verbosity, but the ability to express semantics explicitly is a huge benefit for flexibility. -
11 Years of Debian
11 years ago, a day more or less, I bought the special issue of CHIP which shipped Debian 1.3.1. I think I’ve tried SuSe and RedHat earlier that year, but this Debian release made me switch 98% away from proprietary products (I still had to do my taxes with Windows98). Right now, I am mostly running Ubuntu, which leans heavily on the work of the Debian project. -
State of CDK 1.2.0...
The reason I have not blogged in more than two weeks is that I was hoping to blog about the CDK 1.2.0 release. This was originally aimed at September, then slipped into October, November and then December. There were only three show stoppers (see this wiki page), one of which was that the IChemObject interfaces were not properly tested. -
Peer reviewed Cheminformatics #2: Code review for the Chemistry Development Kit
Peer review is an important component of open source development, and recently the question was discussed the other way around: whether open source is required for peer review. That depends on your definition of peer review: no, if you restrict peer review to what it is in publishing (see Re: Open Source != peer review); yes, if we really want to speed up cheminformatics evolution and assume unrestricted, open peer review where reviewers can openly publish their review report with all the greasy details (see Peer reviewed Chemoinformatics: Why OpenSource Chemoinformatics should be the default). -
Cheminformatics Benchmark Project #1
Yesterday’s blog post, Who says Java is not fast?!?, drew quite some feedback (thanx to all commenters!) with several good points. Of course, a table like that in the cinfony paper (see also the comments in the blogs by Noel (the author) and Rich). Many things determine why the CDK might be fastest in that table for SDF iterating. Suggestions have been that OpenBabel and RDKit may be doing much more than simple reading, and that Java might actually take advantage of the second core for caching file content.