Big Data Industry Solutions

In the US, three of the top five organizations in banking, telecommunications, defense, media and retail run Hadoop-based solutions.

Indicative per sector:



Hadoop Distributions and Offerings

Hadoop is available either from the Apache Software Foundation or from companies that offer their own Hadoop distributions.

The Hadoop ecosystem has many components, all of which exist as their own Apache projects. Because Hadoop has grown considerably and still faces significant changes, a given version of one open source component might not be fully compatible with the versions of other components. This poses considerable difficulties for people looking to get an independent start with Hadoop by downloading and compiling projects directly from Apache.


The Apache Hadoop Ecosystem

Hadoop is the most common single platform for storing and analyzing big data.

Apache projects are created to develop open source software and are supported by the Apache Software Foundation, a nonprofit organization made up of a decentralized community of developers. Open source software, which is usually developed in a public and collaborative way, is software whose source code is freely available to anyone for study, modification and distribution.

Hadoop was originally intended to serve as the infrastructure for the Nutch project, which began in 2002. Nutch needed an architecture that could scale to billions of web pages. Its storage layer was inspired by the Google File System and would ultimately become HDFS. In 2004 Google published a paper introducing MapReduce, and by 2006 Nutch was using both MapReduce and HDFS.
