Full Text Indexing and Query

Cloudant Search 2.0 is a full text indexing (FTI) and query system based on embedded Lucene libraries. Search is ideal for arbitrary queries on high-dimensional data. Cloudant uses a system very similar to the incremental MapReduce engine to index your data in real time. The result is a simple, scalable, and fast search engine that can process arbitrarily large volumes of data or concurrent queries without requiring you to worry about scaling. Cloudant Search 2.0 is an ideal replacement for the limited in-database search capabilities of MySQL or SQL Server, "bolt-on" external search systems like Sphinx and Solr, or the multi-dimensional query language of MongoDB.

See the example

Lobbyist Disclosure Data

This example uses the public lobbyist disclosure dataset that can be downloaded from the US Senate. We have extracted the information from the individual XML documents and uploaded it to a public database hosted on Cloudant.com and accessible at http://examples.cloudant.com/lobby-search/. The dataset consists of 757,123 individual documents. The uncompressed XML documents are 2.5 GB on disk, and the corresponding Cloudant database is only 1.3 GB. You can view any of the documents in your browser, e.g. http://examples.cloudant.com/lobby-search/019b716168d45be2c2bd8371d400272a.

In Cloudant Search 2.0, we provide a simple interface to allow the user to choose what data should be indexed, as well as "power-user" features like language choice, specific Lucene Analyzers, etc. The interface is nearly identical to how one writes map and reduce functions, except you substitute index() in place of the regular emit() function for each property of a document that you want to be indexed. For example, if you want to store doc.user_name and search it with a query like ?q=name:bob you simply call:

index("name", doc.user_name, {"store":"yes"});

To index the full lobbyist dataset we use this JavaScript function, which is automatically bundled into the design document using the couchapp Python tool. In fact, this entire application is contained in a single design document!
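As a rough illustration of the pattern only, an index function covering several properties might look like the sketch below. The document property names (client_name, state, amount) and the index field names are assumptions made for this sketch, not the real lobbyist schema, and the index() defined here is a stand-in that just records calls so the code can run outside the database.

```javascript
// Stand-in for the index() function supplied at indexing time.
// It only records each call so this sketch can run under Node.
const indexed = [];
function index(field, value, options) {
  indexed.push({ field: field, value: value, options: options || {} });
}

// Hypothetical index function. The property names below are assumptions,
// not the actual fields of the lobbyist documents.
function indexDoc(doc) {
  if (doc.client_name) {
    // Store the value so it can be returned with search results
    index("client", doc.client_name, { "store": "yes" });
  }
  if (doc.state) {
    index("state", doc.state, { "store": "yes" });
  }
  if (typeof doc.amount === "number") {
    index("amount", doc.amount);
  }
}

// Try it on a made-up document
indexDoc({ client_name: "Example Health Corp", state: "CA", amount: 60000 });
console.log(indexed.map(function (entry) { return entry.field; }));
```

Guarding each call means a document that lacks a property is simply skipped for that field rather than indexed with an undefined value.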

Let's get searching

Using the above design document, we have prepared an index for all of the fields in the document corpus. The full list of fields is available here. Suppose we want to find all health-related filings, from the year 2009, in the state of California, with a filing amount of over 50000. We simply write:
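As a rough sketch of what such a query might look like, the snippet below combines field:value clauses in the same ?q= style as the name:bob example. The field names (issue, year, state, amount) are hypothetical, and the range clause uses standard Lucene range syntax; whether an open upper bound and numeric comparison behave as expected depends on the search version and on how the field was indexed.

```javascript
// Join [field, value] pairs into a Lucene-style query string.
function buildQuery(clauses) {
  return clauses
    .map(function (clause) { return clause[0] + ":" + clause[1]; })
    .join(" AND ");
}

// Hypothetical field names; the real index may use different ones.
const q = buildQuery([
  ["issue", "health"],
  ["year", "2009"],
  ["state", "CA"],
  ["amount", "{50000 TO *}"] // exclusive lower bound, open upper bound
]);

// Percent-encode the query before appending it as the q parameter
const queryString = "?q=" + encodeURIComponent(q);

console.log(q);
console.log(queryString);
```

Building the query from pairs keeps each condition separate, so adding or dropping a filter is a one-line change.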

This example has shown you the simplicity, power, and speed of Cloudant Search 2.0 for high-dimensional selections that don't require the complex computations and aggregations of incremental MapReduce. This HTML5 application is served directly from the Cloudant Data Layer using no middle tier. You can clone this application by running:

couchapp clone 'http://examples.cloudant.com/lobby-search/_design/lookup'