Machine Learning (ML) is a computing technique that has its origins in artificial intelligence (AI) and statistics. Machine Learning solutions include:
- Classification – Predicting a category for an entity with a given set of features; in the simplest (binary) case, a Boolean true/false value.
- Regression – Predicting a real numeric value for an entity with a given set of features.
- Clustering – Grouping entities with similar features.
- Recommendation – Recommending an item to a user based on past behavior or preferences of similar users.
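To make the four solution types a little more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the function names (`classify`, `predict_score`, `cluster`, `recommend`) and the very simple techniques chosen (nearest centroid, a least-squares line, 1-D k-means with two groups, and Jaccard similarity between users). Real projects would normally reach for a dedicated library instead.

```python
from statistics import mean

# Toy data, made up for illustration: hours studied, pass/fail, exam score.
hours = [1.0, 2.0, 3.0, 6.0, 7.0]
passed = [False, False, False, True, True]
scores = [35.0, 45.0, 55.0, 85.0, 95.0]

# Toy "likes" per user, also made up.
likes = {
    "ann": {"book_a", "book_b", "book_c"},
    "bob": {"book_a", "book_b"},
    "cat": {"book_d"},
}


def classify(x):
    """Classification: predict True/False via the nearest class centroid."""
    centroid_true = mean(h for h, p in zip(hours, passed) if p)
    centroid_false = mean(h for h, p in zip(hours, passed) if not p)
    return abs(x - centroid_true) < abs(x - centroid_false)


def predict_score(x):
    """Regression: predict a real number with a least-squares line."""
    mx, my = mean(hours), mean(scores)
    covariance = sum((h - mx) * (s - my) for h, s in zip(hours, scores))
    variance = sum((h - mx) ** 2 for h in hours)
    return my + (covariance / variance) * (x - mx)


def cluster(values, rounds=10):
    """Clustering: 1-D k-means with k=2 (assumes at least two distinct values)."""
    low, high = min(values), max(values)
    for _ in range(rounds):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


def recommend(user):
    """Recommendation: suggest what the most similar other user likes."""
    def jaccard(a, b):
        return len(a & b) / len(a | b)

    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return sorted(likes[nearest] - likes[user])


print(classify(5.0))       # Boolean prediction
print(predict_score(4.0))  # numeric prediction
print(cluster(hours))      # two groups, found without using any labels
print(recommend("bob"))    # items liked by bob's most similar user
```

Note that classification and regression both learn from labelled examples (`passed`, `scores`), while clustering only looks at the feature values themselves.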
As already discussed, the stages in the knowledge discovery process are:
- Opportunity Assessment & Business Understanding
- Data Understanding & Data Acquisition
- Data Cleaning and Transformation
- Model Building
- Policy Construction
- Evaluation, Residuals and Metrics
- Model Deployment, Monitoring, Model Updates
Let’s see these stages in more detail.
Data Science is not new. In fact, it’s been around for many years.
Over that time, various groups of data professionals have defined and documented methodologies that are useful when conducting a data science project. These attempts to make the process of discovering knowledge scientific have resulted in similar steps, so it is safe to say that some core principles underlie the data science process.
Data Science is the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge, and formulate actionable results.
The term “data science” has existed for decades; Peter Naur first used it in 1960 as a substitute for computer science. In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods used in a wide range of applications. In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe for their biennial conference. There, for the first time, the term data science was included in the title of the conference (“Data Science, classification, and related methods”).
So what does Data Science actually mean?
According to Wikipedia, in computing, a graph database is a database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data.
In the early 2000s, engineers and scientists explored models for working with data beyond relational databases, particularly models centered on graphs. They were blown away by the idea that it might be possible to replace tabular SQL semantics with a graph-centric model that would be much easier for developers to work with when navigating connected data. They sensed that, armed with a graph data model, a development team might not waste half of its time working with relational databases.
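To illustrate the nodes, edges and properties mentioned above, here is a minimal in-memory sketch in Python. The `TinyGraph` class, its method names and the sample people are all hypothetical and exist only for this example; it is not how any particular graph database is implemented, and real products add persistence, indexing and a query language on top of the same basic idea.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}                 # node id -> properties dict
        self.edges = defaultdict(list)  # source id -> [(label, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.edges[source].append((label, target, props))

    def neighbors(self, source, label):
        """Follow outgoing edges that carry the given label."""
        return [target for (lbl, target, _) in self.edges[source] if lbl == label]


g = TinyGraph()
g.add_node("alice", kind="person", age=34)
g.add_node("bob", kind="person", age=29)
g.add_node("carol", kind="person", age=41)
g.add_edge("alice", "KNOWS", "bob", since=2015)
g.add_edge("bob", "KNOWS", "carol", since=2019)

# Friends of friends: two hops along KNOWS edges, with no join tables involved.
friends_of_friends = [
    fof
    for friend in g.neighbors("alice", "KNOWS")
    for fof in g.neighbors(friend, "KNOWS")
]
print(friends_of_friends)
```

In a relational schema the same question would typically need a self-join on a relationship table for every hop, which is the kind of friction with connected data that the graph-centric model aims to remove.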
The various NoSQL databases available today differ quite a bit, but there are common threads uniting them: flexibility, scalability, availability, lower costs and special capabilities.
NoSQL means ‘Not only SQL’, also known as ‘non-relational’. These are databases specifically introduced to handle the growth in data types, data access and data availability needs.
Today’s needs require a database that is capable of providing a scalable, flexible solution to efficiently and safely manage the massive flow of data to and from a global user base.