A weekend project to help extract machine-readable data from websites. I’ve been using this for a couple of months now; the first version was just a Ruby script, and last weekend I decided to wrap it in a very simple website.
A GitHub blog post:
Extract it using wscrap:
The first version was based on the pismo gem. It worked well for a few websites, but then I decided to write my own extractor (the wrong decision, though it worked out in the end). Requests are throttled with Rack, and caching is handled with Redis.
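The post doesn’t say how the Rack throttling is implemented, but a minimal sketch could look like the middleware below. The class name, the sliding-window approach, and the in-memory store are all my assumptions for illustration; the real app would keep counts in Redis so they survive across requests.

```ruby
# Hypothetical sketch: a tiny Rack middleware that throttles by client IP.
# An in-memory store stands in for Redis here so the example is self-contained.
class Throttle
  LIMIT  = 30   # requests allowed per window (matches the 30/min limit)
  WINDOW = 60   # window length in seconds

  def initialize(app, store = Hash.new { |h, k| h[k] = [] })
    @app   = app
    @store = store
  end

  def call(env)
    ip  = env["REMOTE_ADDR"]
    now = Time.now.to_f
    # Drop timestamps that have fallen outside the sliding window.
    @store[ip].reject! { |t| t < now - WINDOW }
    if @store[ip].size >= LIMIT
      [429, { "Content-Type" => "text/plain" }, ["Rate limit exceeded\n"]]
    else
      @store[ip] << now
      @app.call(env)
    end
  end
end
```

In `config.ru` this would be mounted with `use Throttle` before the app itself.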
It’s a tiny project running on a single Heroku dyno, which imposes a few limitations. To keep things fast, every extracted URL is cached in Redis for 60 seconds, and clients are limited to 30 requests per minute.
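The 60-second cache could be sketched roughly as below. In production the store would be Redis itself (`redis.setex(url, 60, payload)` / `redis.get(url)`); the `TtlCache` class and the `cached_extract` helper are hypothetical names I’m using so the example runs without a Redis server.

```ruby
# Hypothetical in-memory stand-in for Redis, mimicking SETEX/GET semantics.
class TtlCache
  def initialize
    @data = {}
  end

  # Like Redis SETEX: store a value with an expiry given in seconds.
  def setex(key, ttl, value)
    @data[key] = [value, Time.now.to_f + ttl]
  end

  # Like Redis GET: return the value, or nil if missing or expired.
  def get(key)
    value, expires_at = @data[key]
    return nil if value.nil? || Time.now.to_f > expires_at
    value
  end
end

CACHE = TtlCache.new

# Return the cached extraction for a URL, running the (expensive)
# extraction block only on a cache miss.
def cached_extract(url)
  if (hit = CACHE.get(url))
    hit
  else
    result = yield url
    CACHE.setex(url, 60, result)  # keep the result for 60 seconds
    result
  end
end
```

With this shape, repeated requests for the same URL within a minute never hit the extractor a second time.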