Data Rich Applications

Modern web browsers have enabled a new class of data-rich, visualisation-heavy web applications. Where visualisations were once rendered server side as images, you can now serve the raw data to the browser and let it render the data for the user. This can reduce server load and make for a more responsive, interactive end-user experience.

The Cloudant data layer is the perfect home for data-rich HTML5 applications. Building complex, responsive applications that process and present live data from large datasets is easy using the features of the data layer and modern JavaScript frameworks.

See the example

Context How do people use Twitter?

Context is how a user tweets: are they replying to another user, mentioning or retweeting them, or just putting something new out there? It is interesting to look at because it shows how people engage with their followers: are they just broadcasting, or are they having conversations?
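As a rough illustration of the four categories, the sketch below classifies a tweet and tallies counts per context and user. The field names (`retweeted_status`, `in_reply_to_status_id`, `entities.user_mentions`, `user.screen_name`) follow the Twitter REST API v1.1 tweet object; that this example's documents have exactly that shape, and the order in which the checks are applied, are assumptions. The function names are hypothetical.

```javascript
// Classify a tweet into one of four contexts.
// Assumption: tweets are stored in the Twitter REST API v1.1 shape.
function tweetContext(tweet) {
  if (tweet.retweeted_status) return "retweet";
  if (tweet.in_reply_to_status_id || tweet.in_reply_to_user_id) return "reply";
  const mentions = (tweet.entities && tweet.entities.user_mentions) || [];
  if (mentions.length > 0) return "mention";
  return "new";
}

// Tally tweets per "context/user" pair, the same grouping a view keyed on
// [context, user] with a summing reduce would give.
function countByContext(tweets) {
  const counts = {};
  for (const tweet of tweets) {
    const key = tweetContext(tweet) + "/" + tweet.user.screen_name;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

In a real view the same classification would live in the map function, with the counting left to the reduce step.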

The pie to the left shows the number of tweets per category. Use the buttons below to toggle different users.

The visualisation uses data from a view. This view could be queried at a different group level to pull out data at a coarser granularity (in this case counts per context, rather than counts per context per user), or queried to return the full tweet details for a given context/user.

The views are queried with stale=ok. This means that the view data is returned immediately but may not include the most recent changes to the dataset. You can choose to query without stale=ok to make sure that the user sees the latest data, but they may have to wait for the view to build. The Cloudant data layer takes care of maintaining your views, triggering indexing as data arrives. This means you can tailor your queries to what your application needs, raw speed or the freshest data, without worrying about when the views are built.
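The query options described above can be sketched as a small URL builder. The path layout (`/_design/.../_view/...`) and the `group_level`, `reduce`, `key` and `stale` parameters are standard CouchDB-style view options; the account URL, database, design document and view names below are made up for illustration, as is the helper itself.

```javascript
// Build a view query URL. Hypothetical helper, not part of any library.
function viewUrl(base, db, ddoc, view, options = {}) {
  // Key-like parameters must be sent as JSON; everything else is sent as-is.
  const jsonParams = ["key", "startkey", "endkey"];
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(options)) {
    params.set(name, jsonParams.includes(name) ? JSON.stringify(value) : String(value));
  }
  const query = params.toString();
  return `${base}/${db}/_design/${ddoc}/_view/${view}` + (query ? `?${query}` : "");
}

// Assumed names: account "example", database "tweets", design doc "stats", view "context".
const base = "https://example.cloudant.com";

// Counts per context only (coarser granularity), served from the existing index.
const perContext = viewUrl(base, "tweets", "stats", "context", { group_level: 1, stale: "ok" });

// Counts per context per user; no stale option, so the caller may wait for indexing.
const perContextAndUser = viewUrl(base, "tweets", "stats", "context", { group_level: 2 });

// Individual rows for one context/user pair, skipping the reduce step.
const details = viewUrl(base, "tweets", "stats", "context", {
  reduce: false,
  key: ["reply", "stevewoz"],
  stale: "ok"
});
```

Dropping `stale: "ok"` from any of these trades response time for freshness, which is the choice the paragraph above describes.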

Reach How far does what they say go?

Reach is a measure of how far a user's tweets travel. It increases with the number of followers the user has and the fraction of their tweets that get retweeted.

While not exactly a scientific measurement, it is useful for comparing different users and how far their influence goes through the Twittersphere.

The plot to the right uses data from two views to determine "reach": the number of retweets and the number of followers. The reach value is then calculated as:

log(number of followers * (number of retweets / number of tweets))
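A minimal sketch of that calculation follows. The formula does not state the base of the logarithm, so the natural log is assumed here; a different base only rescales the values and does not change how users compare. The function name is hypothetical.

```javascript
// reach = log(followers * (retweets / tweets))
// Assumption: natural logarithm.
function reach(followers, retweets, tweets) {
  if (tweets <= 0) {
    throw new RangeError("tweets must be positive");
  }
  return Math.log(followers * (retweets / tweets));
}

// A user with 1000 followers whose tweets are retweeted half the time.
const exampleReach = reach(1000, 50, 100); // Math.log(500)
```

Note that a user with no retweets or no followers gets a reach of `-Infinity`, so a plot would need to filter or clamp such users.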

What do they say? How do they say it?

Tweets are limited to 140 characters, but do people use the full amount or send shorter messages? The histogram to the left shows tweet lengths across our dataset. As you'd expect, there's a spike near 140 characters, but very few people actually use every character available to them.
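One way to bin tweet lengths for a histogram like this is sketched below. The bin width of 10 and the use of Unicode code points as "characters" are assumptions; Twitter's own counting rules are more involved, and the example may well compute its bins in a view rather than in the browser.

```javascript
// Count tweets per length bin. Bin 0 holds lengths 0..binWidth,
// bin 1 holds binWidth+1..2*binWidth, and so on up to maxLength.
function lengthHistogram(texts, binWidth = 10, maxLength = 140) {
  const bins = new Array(Math.ceil(maxLength / binWidth)).fill(0);
  for (const text of texts) {
    // Spread into code points so astral characters count once; cap at maxLength.
    const length = Math.min([...text].length, maxLength);
    const index = Math.max(0, Math.ceil(length / binWidth) - 1);
    bins[index] += 1;
  }
  return bins;
}
```

The resulting array can be handed directly to a charting library as one value per bar.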

Tweets can be geo-located, but do people actually use this feature? The pie chart to the left shows the ratio of located tweets (either at a specific place or just a lat-long point) to those without a location. As you might expect, celebrity tweeters tend not to broadcast their whereabouts.

Surprisingly, a large number of tweets come from the website rather than third-party tools (although this is dominated by @justinbieber). It seems @stevewoz pretty much only tweets via Foursquare.

Stats About the dataset

This application uses Backbone to connect the page to the data layer. The visualisations are rendered in the browser using d3.js.

See the raw data

This example has shown you how data-rich applications, with significant dynamic visual content, can be built upon the Cloudant data layer using modern JavaScript frameworks. The Cloudant data layer is the perfect place to build and host these applications; from the ground up it has been built to process big data and serve it over the web to users. Processing a huge dataset, sending the resulting data to the browser and having it render your reports live is a very powerful and enabling technique. If you find this interesting or amusing, or just really like Justin Bieber, let the world know.