Time Series Analysis Part 1 (intro)

A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data.

Time Series Data obviously has to do with time and the first thing that comes to mind is finance. The world of finance is the world of time series, so stocks, and currency exchange rates and interest rates, they’re all Time series.

Continue reading

Graph Databases

According to Wikipedia, in computing, a graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.

Early in the 2000, engineers and scientists explored other models for working with data, further to relational databases, particularly models that were centered around graphs. They were blown away by the idea that it might be possible to replace the tabular SQL semantic with a graph-centric model that would be much easier for developers to work with when navigating connected data. They sensed that, armed with a graph data model, the development team might not waste half of its time working with relational databases.

Continue reading

What is NoSQL?

NoSQL means ‘Not only SQL’, aka ‘Non-relational’. These are databases specifically introduced to handle the rise in data types, data access and data availability needs.

Today’s needs require a database that is capable of providing a scalable, flexible solution to efficiently and safely manage the massive flow of data to and from a global user base.

Continue reading

Hadoop Distributions and Offerings

Hadoop is available from either the Apache Software Foundation or from companies that offer their own Hadoop distributions.

The Hadoop ecosystem has many component parts, all of which exist as their own Apache projects. Because Hadoop has grown considerably, and faces some significant further changes, different versions of these open source community components might not be fully compatible with other components. This poses considerable difficulties for people looking to get an independent start with Hadoop by downloading and compiling projects directly from Apache.

Continue reading

The Apache Hadoop Ecosystem

Hadoop is the most common single platform for storing and analyzing big data.

Apache projects are created to develop open source software and are supported by the Apache Software Foundation,  a nonprofit organization made up of a decentralized community of developers. Open source software, which is usually developed in a public and collaborative way, is software whose source code is freely available to anyone for study, modification and distribution.

Hadoop was originally intended to serve as the infrastructure for the Nutch project in 2002. Nutch needed an architecture that could scale to billions of web pages, and this needed architecture was inspired by the Google File System, that would ultimately become HDFS. In 2004 Google published a paper introducing MapReduce and by 2006 Nutch was using both MapReduce and HDFS.

Continue reading