Sumit Sharma

Archive for October 26th, 2009

The role of Data and Information Aggregators


“We will see two levels of data and information emerging: Knowledge in terms of facts and Knowledge in terms of insights.” – Eric Schmidt, CEO and Chairman, Google

We’ve organized the data, now what?

Whatever approach is taken to characterize the webs of underlying raw data, it will inevitably involve additional overlays of metadata: contextual filters that describe what the web of data consists of. The foundation has been laid, so the next question is how to build on it. The next step is to analyze this semantic web of data to find patterns in our characteristics and even predict behavior, ultimately to deliver the right information, at the right place, at the right time. This is predominantly the job of data and information aggregators.

What are aggregators?

Aggregators can take the form of engines that aggregate information based on keyword popularity, web cookies, search and browsing history and so on; Google and Bing are examples. Other aggregators rely on direct, subjective user-generated content such as ratings and rankings, and stream continuous feeds of information ordered by rating or ranking popularity; Digg, Delicious, Fark and StumbleUpon do this primarily for news, articles, media and some products and services. Service and product platforms, news sites, blogs and social networking sites also aggregate data. In practice, however, the industry understands aggregators to be those entities that operate at a more macro level, across larger sets of data.
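To make the rating-based case concrete, here is a minimal sketch of how a feed might be ordered by popularity. It is not how Digg or any other site named above actually ranks content; the `Item` type, the `popularity` score (up-votes minus down-votes) and `rank_feed` are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A submitted story or link (hypothetical structure)."""
    title: str
    up_votes: int
    down_votes: int


def popularity(item: Item) -> int:
    # Assumed scoring rule: net votes. Real aggregators typically also
    # weigh factors such as recency or who cast the vote.
    return item.up_votes - item.down_votes


def rank_feed(items, limit=10):
    """Return the `limit` most popular items, most popular first."""
    return sorted(items, key=popularity, reverse=True)[:limit]
```

Calling `rank_feed` on a list of items returns them with the highest net vote count first, which is the "continuous feed ordered by popularity" idea in its simplest form.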

All the examples just noted are deliberate aggregators, but there is a large set of entities that, through their functionality, funnel information to their users and so are, in a sense, indirect aggregators too. For example, a productivity widget such as a stock ticker should automatically populate itself with the data you'd be interested in, or a company's product page might highlight the details relevant to you. The key message is that aggregators, in theory, are any entities that have content available for consumption.

All these aggregators have the immense responsibility to consume, understand and then develop insights from their users’ data and behavior. Going a level deeper, we can define the nature of aggregators using the notion of pull and push aggregators.

Push aggregators

Push aggregators target a certain audience and are most common today within the news media, e.g. the Huffington Post and the Drudge Report. Search engines, commerce platforms, advertising platforms, social networking platforms, knowledge and news platforms and all other utilities’ and services’ platforms can act as push aggregators too: they all provide information to users at some level or another, and this targeting is based on understanding user requirements and hypothesizing which data and information to provide. For example, someone might prefer to read only Indian entertainment news, American basketball news and global technology news.
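As a rough illustration of this kind of targeting, the sketch below pushes only the stories whose topics overlap a reader's stated interests. The `Story` type, the topic labels and `push_feed` are hypothetical; a real push aggregator would infer interests from behavior rather than take them as a given set.

```python
from typing import NamedTuple


class Story(NamedTuple):
    """A news story tagged with topics (hypothetical structure)."""
    headline: str
    topics: frozenset


def push_feed(stories, interests):
    """Keep only the stories that share at least one topic with `interests`."""
    wanted = set(interests)
    # A non-empty intersection means the story matches the reader's profile.
    return [story for story in stories if story.topics & wanted]
```

For the reader described above, `interests` might be `{"indian-entertainment", "american-basketball", "global-technology"}`, and everything else is filtered out before it reaches them.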

Pull aggregators

Pull aggregators are those with very selective consumer bases who self-select. Examples include Aardvark, Hunch and other specific bulletin boards and siloed information services. These aggregators, which base their algorithms on voluntarily provided user data, gather data and information for users looking for very specific information such as “How can I fix my DVD player?”.

Unclassified (neither Push nor Pull) aggregators

There are also some aggregators that are neither Push nor Pull oriented but sit somewhere in the middle. They tell us not what we need to know, and not what we’re looking for, but rather what we would be attracted to. An example of such an aggregator is Demand Media, a storehouse of information on all sorts of topics, aggregated across a series of websites. Some topics are so random that we’d never specifically look for them, yet they are attractive enough to catch our interest.

Today, aggregators and their functions and operating models are extremely distributed, with little or no integration between them; Google and Facebook, for example, do not share data.

Aggregator algorithms today are still limited: they rely on very narrow sets of data that cannot accurately capture our real intent, especially since our activity touches data well beyond any single one of these disparate data sets. In other words, building solutions on siloed sets of data is a disservice to users. The more data inputs aggregators have, and the more they know about you, the richer the benefits to you will be. An aggregator should be able to treat each customer or user as a unique individual and, by analyzing contextual data, apply a filter that lets it shape serendipitous connections with knowledge and information, people and other entities, for example by recommending physical locations.

In an ideal world, aggregators will be omnipresent entities that integrate the physical and virtual worlds’ data and information to provide an experience in which relevance and return on attention are maximized. The aggregator would have access to an unlimited amount of data and information, across all boundaries within the virtual and physical planes. That is to say, data from Google, Digg, Facebook, Yahoo, YouTube, WebMD, Amazon.com and all other sites, as well as user information across all three user planes of data, will be available in a standardized and federated semantic format.

In essence, if this is done properly and conditions such as data federation and privacy are dealt with, push and pull aggregators’ data and information will converge to be the same. This will be highly disruptive to the world as we know it today: the media, services and products industries will all need to rethink their operations and strategies because, under their current models, they will provide increasingly irrelevant information to users.

Written by Sumit

October 26, 2009 at 6:54 am