and the HOWTO, which provides the documentation as it stands at the moment, is here:
If you want to pull the source code down and start hacking on it, you can get it using Subversion (http://subversion.tigris.org/) by typing:
svn co http://svn.citeulike.org/svn/ citeulike
The relevant features are:
- Language neutral. You can write plugins in whatever programming language you like (assuming it can run on my server).
- Sample code. I've tidied up a few of the existing plugins and released them as part of the new system. Ultimately I'll convert them all, but it's now at a state where it makes sense to release it and get everyone hacking on it.
- Proper documentation. It walks you through all the steps required to produce a new plugin.
- Test suite. You can (and should) write test cases for your scraper, and we'll know when the site changes its format and breaks the scraper (such is the nature of writing these things).
- Test harness. You can actually run the tests against your code without having to guess whether they'd work or not (which was the case up until now if you wanted to write a plugin).
- Common utility functionality (author names, RIS, and BibTeX parsing) built into the "driver" part of the code, so you don't need to re-invent the wheel.
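To make the test suite and test harness idea concrete, here is a rough sketch of the general principle: a test case pairs a URL with the fields you expect back, and a harness runs the scraper and reports any mismatch. This is not the actual CiteULike harness or its file format (the HOWTO covers those). The names `run_test_case` and `fake_scraper`, the dict-of-fields shape, and the example URL are all made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical shapes, assumed for this sketch only: a scraper maps a URL
# to a dict of citation fields, and a test case pairs a URL with the
# fields we expect to get back.
Scraper = Callable[[str], Dict[str, str]]


def run_test_case(scraper: Scraper, url: str, expected: Dict[str, str]) -> List[str]:
    """Return a list of readable mismatches; an empty list means the test passed."""
    try:
        actual = scraper(url)
    except Exception as exc:
        # A scraper that crashes is reported as a failure, not propagated.
        return [f"scraper raised {type(exc).__name__}: {exc}"]

    problems = []
    for field, want in expected.items():
        got = actual.get(field)
        if got != want:
            problems.append(f"{field}: expected {want!r}, got {got!r}")
    return problems


def fake_scraper(url: str) -> Dict[str, str]:
    # Stand-in for a real plugin: returns canned data instead of fetching the page.
    return {"title": "An Example Paper", "year": "2006"}


if __name__ == "__main__":
    failures = run_test_case(
        fake_scraper,
        "http://example.org/paper",  # placeholder URL, not a real test target
        {"title": "An Example Paper", "year": "2006"},
    )
    print("PASS" if not failures else "\n".join(failures))
```

If the site changes its page format and the scraper starts returning different fields, a check like this turns the breakage into an explicit list of mismatches rather than silently bad data.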
It's live on the server now, as is the first user-submitted plugin (from Diwaker Gupta), which scrapes the proceedings from the computer science journals on the USENIX site.