noodle

Overview

noodle is a Node.js server and module for querying and scraping data from web documents. It features:

  • Cross domain document querying (html, json, xml, atom, rss feeds)
  • Server supports querying via JSONP and JSON POST
  • Multiple queries per request
  • Access to queried server headers
  • Allows for POSTing to web documents
  • In memory caching for query results and web documents

Download

Star the project on GitHub, or download it:

noodle on GitHub Edge (master)

Try it out

Install via NPM

$ npm install noodlejs

Install via Git

$ git clone https://github.com/dharmafly/noodle.git

Run the server and GET or POST queries on localhost:8888

$ cd noodle
# or `cd node_modules/noodlejs` if installed via npm
$ bin/noodle-server
  Server running on port 8888

Or use as a node module

$ var noodle = require('noodlejs');

Editor

Below is an editor where you can try writing a query yourself.

The query below tells noodle to go to the google search result for JavaScript and expect a html file. Then using the selector pick out all of the result anchors. Finally the query says to extract the text for each of those anchor elements.

Press run below to see the output:

var query = {
    url: 'http://google.com/search?q=javascript',
    type: 'html',
    selector: 'h3.r a',
    extract: 'text'
  },
  uriQuery = encodeURIComponent(JSON.stringify(query)),
  request  = 'http://example.noodlejs.com/?q=' +
             uriQuery + '&callback=?';

// Make Ajax request to Noodle server
jQuery.getJSON(request, function (data) {
  alert(data[0].results);
});

Noodle queries don’t just support html but also json, feeds and plain xml. They can be a lot more powerful too. Read the reference for more details.

Server quick setup

Setup

$ git clone https://github.com/dharmafly/noodle.git
$ cd noodle
$ npm install

Start the server by running the binary

$ bin/noodle-server
 Server running on port 8888

You may specify a port number as an argument

$ bin/noodle-server 9090
 Server running on port 9090