CI-BER: CyberInfrastructure for Billions of Electronic Records: CI-BER at LDAV 2011

The CI-BER project presented a poster on its latest work at the Large Data Analysis and Visualization (LDAV) symposium, co-located with VisWeek 2011 in Providence, RI from September 23-25th. The conference was great, and I want to quickly share the highlights of the poster we presented.

The poster focused on communicating the unique problems faced when indexing and visualizing archives, namely dealing with data of highly variable structure and "quality." This posed a number of challenges over the last year, as older files found ways of breaking the indexer process, corrupting the index, and confusing the indexer to no end. We eventually overcame this by separating the indexer core (the queue of files and the scanning process) from the part that actually delves into files to collect metadata. The two run as separate UNIX processes, so that a core dump or other major failure on one file cannot bring down the entire indexing run or make it grind to a halt. The separation also lets us distribute indexing across multiple machines, making for better overall performance. The indexer architecture is shown below:

[Figure: indexer architecture]

This indexer allowed us to build prototypes that give a highly interactive, geography-based view of the collection, which holds over 60TB. The indexer core will be released as open source on GitHub by December 2011, and I will announce its availability here.
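Until the code is released, here is a rough sketch of the process-isolation idea described above. It is not the CI-BER indexer itself: the worker script, the function names `extract_metadata` and `index_files`, and the "crash" and "hang" file names are all made up for illustration. The core loop hands each file to a separate child process and simply records a failure if that child dies or stalls.

```python
import json
import subprocess
import sys

# Hypothetical worker, run in its own process for every file.
# A path containing "crash" simulates an abrupt failure and a path
# containing "hang" simulates an extractor that never returns.
WORKER_SOURCE = r"""
import json, os, sys, time
path = sys.argv[1]
if "crash" in path:
    os._exit(1)          # die without cleanup, standing in for a core dump
if "hang" in path:
    time.sleep(60)       # stall long enough for the parent to give up
print(json.dumps({"path": path, "name_length": len(path)}))
"""


def extract_metadata(path, timeout_s=5.0):
    """Run the extractor for one file in a child process.

    Returns (metadata, None) on success or (None, reason) on failure.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, path],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return None, "timeout"
    if proc.returncode != 0:
        return None, f"exit status {proc.returncode}"
    return json.loads(proc.stdout), None


def index_files(paths, timeout_s=5.0):
    """Core loop: walk the queue and survive failures on individual files."""
    index, failures = {}, {}
    for path in paths:
        metadata, error = extract_metadata(path, timeout_s)
        if error is None:
            index[path] = metadata
        else:
            failures[path] = error
    return index, failures


if __name__ == "__main__":
    index, failures = index_files(["a.txt", "crash.dat", "b.txt"])
    print("indexed:", sorted(index))
    print("failed:", failures)
```

Because the core only ever talks to the extractor through a process boundary, the same loop could in principle hand files to workers on other machines, although this sketch does not attempt that.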

In addition to the treemap visualization prototype covered in an earlier post, we created a visualization that lets a user search the collection geographically. The user draws a bounding box with a swipe of the finger on a tablet device (such as the iPad), which searches the index for matching metadata records and lists them as results in the side pane. The user can then drill down in the side pane to retrieve the actual metadata record itself.
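For readers who want a feel for the bounding-box search, here is a minimal sketch. It assumes each metadata record carries a single latitude/longitude point, which may not match how the real index stores geography, and the `Record`, `BoundingBox`, and `search` names are invented for this example. The two corners stand in for the start and end points of the swipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    lat: float
    lon: float


@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    @classmethod
    def from_corners(cls, lat1, lon1, lat2, lon2):
        # A swipe can start at any corner, so normalise the two points.
        return cls(min(lat1, lat2), min(lon1, lon2),
                   max(lat1, lat2), max(lon1, lon2))

    def contains(self, record):
        return (self.south <= record.lat <= self.north
                and self.west <= record.lon <= self.east)


def search(records, box):
    """Linear scan; a real index would use a spatial structure instead."""
    return [r for r in records if box.contains(r)]


if __name__ == "__main__":
    records = [
        Record("providence", 41.82, -71.41),
        Record("chapel-hill", 35.91, -79.06),
        Record("seattle", 47.61, -122.33),
    ]
    box = BoundingBox.from_corners(45.0, -70.0, 40.0, -75.0)
    print([r.record_id for r in search(records, box)])
```

This version ignores boxes that cross the 180th meridian, which a production search would need to handle.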

These tools will also be released as open source around the same time, and will be on GitHub alongside the indexer.

LDAV Poster on Slideshare -- Click *HERE*

Monday, November 14, 2011