According to Wikipedia, in computing, a graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.
In the early 2000s, engineers and scientists explored data models beyond the relational database, particularly models centered around graphs. They were captivated by the idea that the tabular SQL semantics could be replaced with a graph-centric model that would be much easier for developers to work with when navigating connected data. They sensed that, armed with a graph data model, a development team might stop wasting half of its time wrestling with relational databases.
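The appeal of navigating connected data can be sketched with a toy example. The snippet below is a hypothetical, in-memory property-graph in plain Python, not the API of any real graph database: nodes carry properties, edges are adjacency lists, and a two-hop traversal replaces what would be a self-join in a relational table.

```python
# Toy property-graph: nodes with properties, directed edges as adjacency lists.
# Illustrative sketch only -- not any real graph database's API.

nodes = {
    "alice": {"role": "engineer"},
    "bob":   {"role": "scientist"},
    "carol": {"role": "analyst"},
}

# Directed "knows" edges stored per node.
edges = {
    "alice": ["bob"],
    "bob":   ["carol"],
    "carol": [],
}

def friends_of_friends(start):
    """Walk two hops of 'knows' edges -- a natural graph traversal
    that a relational schema would express as a self-join."""
    result = set()
    for friend in edges.get(start, []):
        for fof in edges.get(friend, []):
            result.add(fof)
    return result

print(friends_of_friends("alice"))  # {'carol'}
```

The point of the sketch is that the traversal follows pointers directly from node to node; no join, index lookup, or query planner stands between the developer and the connected data.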
The various NoSQL databases available today differ quite a bit, but there are common threads uniting them: flexibility, scalability, availability, lower costs, and special capabilities.
NoSQL means ‘Not only SQL’, aka ‘non-relational’. These databases were introduced specifically to handle the growth in data types and in data-access and data-availability needs.
Today’s applications require a database capable of providing a scalable, flexible solution that efficiently and safely manages the massive flow of data to and from a global user base.
In the US, three of the top five organizations in banking, telecommunications, defense, media, and retail run Hadoop-based solutions.
Hadoop is available from either the Apache Software Foundation or from companies that offer their own Hadoop distributions.
The Hadoop ecosystem has many component parts, all of which exist as their own Apache projects. Because Hadoop has grown considerably, and faces some significant further changes, different versions of these open source community components might not be fully compatible with other components. This poses considerable difficulties for people looking to get an independent start with Hadoop by downloading and compiling projects directly from Apache.
Hadoop is the most common single platform for storing and analyzing big data.
Apache projects are created to develop open source software and are supported by the Apache Software Foundation, a nonprofit organization made up of a decentralized community of developers. Open source software, which is usually developed in a public and collaborative way, is software whose source code is freely available to anyone for study, modification and distribution.
Hadoop originated in 2002 as infrastructure for the Nutch project. Nutch needed an architecture that could scale to billions of web pages; that architecture was inspired by the Google File System and would ultimately become HDFS. In 2004 Google published a paper introducing MapReduce, and by 2006 Nutch was using both MapReduce and HDFS.
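The MapReduce idea from that paper can be illustrated with a minimal, single-process word count. This is a toy Python model of the map, shuffle, and reduce phases, not Hadoop's actual Java API; the function names here are just illustrative.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big ideas", "data flows"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'flows': 1}
```

In a real Hadoop job, the map and reduce functions run in parallel on many machines over HDFS blocks, and the framework performs the shuffle over the network; the logical structure, however, is exactly the one sketched above.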
The following are some reasons why Big Data solutions are required:
- 90% of the data in the world today has been created in the last two years alone.
- 80% of the data is unstructured or exists in widely varying structures which are difficult to analyze.
- Structured formats have some limitations with respect to handling large quantities of data.
- It is difficult to integrate information distributed across multiple systems.
- Most business users do not know what should be analyzed.
- Potentially valuable data is dormant or discarded.
- It is too expensive to integrate large volumes of unstructured data.
- A lot of information has a short, useful lifespan.
- Context adds meaning to the existing information.
This Blog is created and maintained by Iraklis Mardiris