Working with large sets
We do a lot of work with unique user counting and we have developed some techniques for accurate counting in small bounded-size structures. Periodically I like to make sure that all of our assumptions...
My Love/Hate Relationship with Hadoop
A few months ago, the need for some log file analysis popped up. As the junior Data Scientist, I had the genuine pleasure of waking up one morning to an e-mail from Matt and Rob letting me know that I...
Custom Input/Output Formats in Hadoop Streaming
Like I’ve mentioned before, working with Hadoop’s documentation is not my favorite thing in the world, so I thought I’d provide a straightforward explanation of one of Hadoop’s coolest features –...
Big Memory, Part 1
Author’s note: This will be the first of a series of posts about my adventures in building a “large”, in-memory hash table. This first post will focus on a few philosophical notes that inspired this...
Never trust a profiler
A week or so ago I had mentioned to Timon that for the first time a profiler had actually pointed me in a direction that directly led to a positive increase in performance. Initially Timon just gave...
Big Memory, Part 3
Author’s Note: This is part 3 of a series of posts about my adventures in building a “large”, in-memory hash table. Part 1 introduced our goals and our approach to the task at hand. This post is a...
Efficient Field-Striped, Nested, Disk-backed Record Storage
At AK we deal with a torrent of data every day. We can report on the lifetime of a campaign which may encompass more than a year’s worth of data. To be able to efficiently access our data we are...
Adventures in Concurrency
The Past: The Summarizer, our main piece of aggregation infrastructure, used to have a very simple architecture: RSyslog handed Netty some bytes. A Netty worker turned those bytes into a String. The...
Open Source Release: java-hll
We’re happy to announce our newest open-source project, java-hll, a HyperLogLog implementation in Java that is storage-compatible with the previously released postgresql-hll and js-hll implementations....
HLL talk at SFPUG
I had the pleasure of speaking at the SF PostgreSQL User Group’s meetup tonight about sketching, the history of HLL, and our implementation of HLL as a PG extension. My slides are embedded below and...