Working with large sets

May 13, 2011, 8:36 am

We do a lot of work with unique user counting and we have developed some techniques for accurate counting in small bounded-size structures. Periodically I like to make sure that all of our assumptions...

My Love/Hate Relationship with Hadoop

June 5, 2011, 11:39 am

A few months ago, the need for some log file analysis popped up. As the junior Data Scientist, I had the genuine pleasure of waking up one morning to an e-mail from Matt and Rob letting me know that I...

Custom Input/Output Formats in Hadoop Streaming

August 30, 2011, 4:41 am

Like I’ve mentioned before, working with Hadoop’s documentation is not my favorite thing in the world, so I thought I’d provide a straightforward explanation of one of Hadoop’s coolest features –...

Big Memory, Part 1

October 18, 2011, 3:41 pm

Author’s note: This will be the first of a series of posts about my adventures in building a “large”, in-memory hash table. This first post will focus on a few philosophical notes that inspired this...

Never trust a profiler

November 14, 2011, 8:10 am

A week or so ago I mentioned to Timon that, for the first time, a profiler had actually pointed me in a direction that led directly to a positive increase in performance. Initially Timon just gave...

Big Memory, Part 3

November 15, 2011, 11:01 am

Author’s Note: This is part 3 of a series of posts about my adventures in building a “large”, in-memory hash table. Part 1 introduced our goals and our approach to the task at hand. This post is a...

Efficient Field-Striped, Nested, Disk-backed Record Storage

August 10, 2012, 1:30 pm

At AK we deal with a torrent of data every day. We can report on the lifetime of a campaign which may encompass more than a year’s worth of data. To be able to efficiently access our data we are...

Adventures in Concurrency

October 18, 2012, 11:07 am

The Past: The Summarizer, our main piece of aggregation infrastructure, used to have a very simple architecture: RSyslog handed Netty some bytes. A Netty worker turned those bytes into a String. The...

Open Source Release: java-hll

December 24, 2013, 10:45 am

We’re happy to announce our newest open-source project, java-hll, a HyperLogLog implementation in Java that is storage-compatible with the previously released postgresql-hll and js-hll implementations....

HLL talk at SFPUG

September 23, 2014, 11:46 pm

I had the pleasure of speaking at the SF PostgreSQL User Group’s meetup tonight about sketching, the history of HLL, and our implementation of HLL as a PG extension. My slides are embedded below and...
