This coming fall, Facebook is set to release Presto. For those not in the loop, Presto is not a consumer novelty or a mobile product such as Home; it is part of Facebook’s internal backend. The Presto engine is Facebook’s way of dealing with its massive scale: it can run queries over 250 petabytes of data, and it will soon be made available as open source.
Facebook, like many companies applying big data analytics, has maintained a Hadoop and Hive implementation, in fact the largest in the world. There has been one problem with this, however: Hadoop is geared towards batch processing and matrix-oriented computations such as PageRank, and it does not map well onto Facebook’s needs. Hence the need for Presto, which takes over query execution from Hadoop (though Hive is still the underlying data warehouse). Presto can handle all of the data under Facebook’s ownership and has been shown to execute queries eight to ten times faster than the comparable implementation using Hadoop.