Developer Aspirations

YAPB - Yet Another Programming Blog

Hadoop

Wednesday, 02 December 2009

Apache Pig Tips #1

by Colin Miller, on Hadoop, Pig, Scalability

Pig is a new and growing platform on top of Hadoop that makes writing jobs easier: you can avoid writing Map and Reduce functions in Java directly, though you can still do so if you choose. Instead, it provides basic operations such as COUNT, FILTER, and FOREACH that you would otherwise have to write yourself for each data manipulation you want to perform. Unfortunately,…

Monday, 30 November 2009

Pig Frustrations

by Colin Miller, on Hadoop, Pig

My desire to implement better scalability by pre-processing reports on the Grid has led me to Pig. Unfortunately, while Pig does remove some of the difficulties of writing for Hadoop (you no longer have to write all of the map-reduce jobs yourself in Java), it has many limitations. The biggest limitation I've found so far is simply the lack of documentation. There are a few tutorials, and a language reference, but…

Monday, 23 November 2009

Dynamic Offline Reports

by Colin Miller, on Clojure, Erlang, Hadoop, MapReduce, Scala, Scalability

Many applications have the primary concern of storing and retrieving data. The raw data by itself is often not very useful, so an additional process is put into place to turn that data into useful information. Many applications generate these reports from the data quickly at the user's request through a narrow SQL SELECT statement, in-application data processing, or both. However, in larger applications where the data is…