Channel: Research »» Java
Browsing latest articles
Browse All 12 View Live

Working with large sets

We do a lot of work with unique user counting, and we have developed some techniques for accurate counting in small, bounded-size structures. Periodically I like to make sure that all of our assumptions...

My Love/Hate Relationship with Hadoop

A few months ago, the need for some log file analysis popped up. As the junior Data Scientist, I had the genuine pleasure of waking up one morning to an e-mail from Matt and Rob letting me know that I...

Custom Input/Output Formats in Hadoop Streaming

Like I’ve mentioned before, working with Hadoop’s documentation is not my favorite thing in the world, so I thought I’d provide a straightforward explanation of one of Hadoop’s coolest features –...

Big Memory, Part 1

Author’s note: This will be the first of a series of posts about my adventures in building a “large”, in-memory hash table. This first post will focus on a few philosophical notes that inspired this...

Never trust a profiler

A week or so ago I had mentioned to Timon that for the first time a profiler had actually pointed me in a direction that directly led to an increase in performance. Initially Timon just gave...

Big Memory, Part 3

Author’s Note: This is part 3 of a series of posts about my adventures in building a “large”, in-memory hash table. Part 1 introduced our goals and our approach to the task at hand. This post is a...

Efficient Field-Striped, Nested, Disk-backed Record Storage

At AK we deal with a torrent of data every day. We can report on the lifetime of a campaign which may encompass more than a year’s worth of data. To be able to efficiently access our data we are...

Adventures in Concurrency

The Past: The Summarizer, our main piece of aggregation infrastructure, used to have a very simple architecture: RSyslog handed Netty some bytes. A Netty worker turned those bytes into a String. The...

Open Source Release: java-hll

We’re happy to announce our newest open-source project, java-hll, a HyperLogLog implementation in Java that is storage-compatible with the previously released postgresql-hll and js-hll implementations....

HLL talk at SFPUG

I had the pleasure of speaking at the SF PostgreSQL User Group’s meetup tonight about sketching, the history of HLL, and our implementation of HLL as a PG extension. My slides are embedded below and...
