“Come and join us, and we will show you how to predict the future”.
I tend to get invitations like this all the time. Friendly people from marketing want to meet me and, eventually, sell me tools or services to predict the future. So it makes me wonder: if they actually have a machine to predict the future, why on earth are they spending their energy selling software and organizing events? Perhaps you could pause for a second and consider what you would do if you had such a machine.
Although artificial intelligence has led to remarkable achievements in recent years, expectations for what the field is able and will be able to achieve in the next decade tend to run much higher than what will actually turn out to be possible. While some world-changing applications like autonomous cars are already within reach, many more are likely to remain elusive for a long time.
The risk with high expectations for the short term is that, as technology fails to deliver, research investment will dry up, slowing down progress for a long time.
This has happened before. Twice in the past, AI went through a cycle of intense optimism followed by disappointment and skepticism, and a dearth of funding as a result. It started with symbolic AI in the 1960s. In these early days, projections about AI were flying high. One of the best-known pioneers and proponents of the symbolic AI approach was Marvin Minsky, who claimed in 1967: “Within a generation […] the problem of creating ‘artificial intelligence’ will substantially be solved”. Three years later, in 1970, he also made a more precisely quantified prediction: “in from three to eight years we will have a machine with the general intelligence of an average human being”. In 2016, such an achievement still appears to be far in the future, so far in fact that we have no way to predict how long it will take, but in the 1960s and early 1970s, several experts believed it to be right around the corner.
In the 1980s, a new take on symbolic AI, “expert systems”, started gathering steam among large companies. A few initial success stories triggered a wave of investment, with corporations around the world starting their own in-house AI departments to develop expert systems. Around 1985, companies were spending over a billion dollars a year on the technology, but by the early 1990s these systems had proven expensive to maintain, difficult to scale, and limited in scope, and interest died down. Thus began the second AI winter.
It might be that we are currently witnessing the third cycle of AI hype and disappointment, and we are still in the phase of intense optimism. The best attitude to adopt is to moderate our expectations for the short term and make sure that people less familiar with the technical side of the field still have a clear idea of what we can and cannot deliver.
With that said as an introduction, let’s see what we can actually achieve. We may not have a crystal ball, but we do have significant algorithmic advances, powerful hardware, large datasets and benchmarks. All of these help us better understand the world and recognize patterns, and hence build models of the world; when we apply such a model, we gain an insight into what might happen in the future.
But as the statistician George Box said, “All models are wrong”. If all models are wrong, what are we doing here? Are we wasting our time? Should we discard science and turn to astrology instead?
Of course not. Box’s point is that the scientist cannot obtain a “correct” model by excessive elaboration; he or she should instead seek an economical description of the phenomena.
It would be very remarkable if any system existing in the real world could be exactly represented by any simple model.
The very word “model” implies simplification and idealization. The idea that complex systems can be exactly described by a few formulae is absurd. A model is a simplification or approximation of reality, and hence it will not reflect all of reality.
No models are true – not even the Newtonian laws. When we construct a model, we leave out all the details which we, with the knowledge at our disposal at that time, consider inessential. Models should not be true, but it is important that they are applicable.
Cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an “ideal” gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules.
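To make Box’s point concrete, here is a minimal sketch of the ideal gas law as a “wrong but useful” model. The numbers are standard physical constants; for one mole of many real gases at standard conditions, the prediction is within a fraction of a percent of measured values.

```python
# The ideal gas law PV = nRT as a "wrong but useful" model:
# it holds exactly for no real gas, yet predicts the molar volume
# of many gases at standard conditions remarkably well.

R = 8.314  # gas constant, J / (mol * K)

def ideal_gas_volume(pressure_pa, temperature_k, moles=1.0):
    """Volume in cubic metres predicted by PV = nRT."""
    return moles * R * temperature_k / pressure_pa

# One mole at 0 degrees C (273.15 K) and 1 atm (101325 Pa):
# the textbook figure of roughly 22.4 litres.
v = ideal_gas_volume(101325.0, 273.15)
print(round(v * 1000, 1))  # 22.4 (litres)
```

The model is “wrong” (real molecules have volume and attract each other), yet it is illuminating and useful, which is the only question of interest.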
For such a model there is no need to ask the question “Is the model true?”. If “truth” is to be the “whole truth” the answer must be “No”. The only question of interest is “Is the model illuminating and useful?”.
So, can we provide businesses with illuminating and useful information? Yes, we can.
If our problem is time-related, then we already discussed in the previous sections some popular techniques that can give us insights.
ARIMA has been with us since the 1970s; it is well researched, rests on a solid methodology, and provides good results. It is definitely the way to start with any time-series problem, and it then serves as a benchmark for any subsequent attempt.
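To illustrate why such models make a good first benchmark, here is a minimal sketch of the simplest member of the family, an AR(1) model fitted by least squares. In practice you would use a full ARIMA implementation (e.g. the one in statsmodels); the synthetic data and coefficients below are made up for illustration.

```python
import random

# Minimal AR(1) sketch: model tomorrow's value as a linear function of
# today's, x[t] = c + phi * x[t-1]. This is the "AR" piece of ARIMA.

def fit_ar1(series):
    """Least-squares estimates of c and phi in x[t] = c + phi * x[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    c = my - phi * mx
    return c, phi

# Synthetic series from a known process: x[t] = 2 + 0.7 * x[t-1] + noise.
random.seed(0)
series = [0.0]
for _ in range(500):
    series.append(2.0 + 0.7 * series[-1] + random.gauss(0, 0.5))

c, phi = fit_ar1(series)
one_step_forecast = c + phi * series[-1]
print(round(phi, 2))  # recovered coefficient, near the true 0.7
```

A model this simple is cheap to fit and hard to beat, which is exactly what you want from a benchmark: any fancier method has to earn its keep against it.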
Machine learning algorithms, although very powerful, have to be treated with caution. Data must be pre-processed in a special way for time series, and there are many ways to do it depending on the problem. Many machine learning algorithms can easily be used off the shelf, but each has its own hyperparameters that can drastically change its performance, and the number of possible alternatives is staggeringly large. Consequently, the parametrization possibilities are practically infinite, and quite often the answer to “why did you use the value ABC for parameter X?” is “why not? I had a good feeling!” We may be lucky, or skillful, and create a good model in a timely fashion. But we may not. In a real-life business context, as opposed to a research laboratory, change happens fast and data keep arriving at a high rate. The risk is creating a model that is already obsolete by the time it goes into production.
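As an example of the time-series-specific pre-processing mentioned above, one common approach is to turn the series into a supervised-learning table with a sliding window: the previous `lag` values become the features and the next value the target. Note that the window size is itself yet another hyperparameter to choose.

```python
# Sliding-window pre-processing: convert a time series into (features,
# target) pairs that any off-the-shelf ML algorithm can consume.

def sliding_window(series, lag):
    """Return (X, y) where X[i] holds `lag` consecutive values and
    y[i] is the value that immediately follows them."""
    X, y = [], []
    for i in range(len(series) - lag):
        X.append(series[i:i + lag])
        y.append(series[i + lag])
    return X, y

series = [10, 12, 13, 12, 15, 16, 18]
X, y = sliding_window(series, lag=3)
print(X[0], y[0])  # [10, 12, 13] 12
```

The resulting rows must still be split into train and test sets chronologically, not shuffled at random, or the model gets to peek into its own future.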
Personally, my next step after ARIMA is Deep Learning.
As a rule of thumb, LSTM deep learning works better when a large amount of training data is available, while ARIMA also performs well on smaller datasets.
Traditional machine learning has the advantage of feature introspection: we can see which features drove a classification decision, which is important for analytics. But that reliance on hand-engineered features is also what keeps it from working with unlabeled and unstructured data, and from attaining the record-breaking accuracy of the latest deep learning models.
Feature engineering is one of the major choke points of traditional machine learning since so few people are able to do it well and quickly enough to adapt to changing data.
For cases where feature introspection is necessary (e.g. the law requires that we justify a decision, say to deny a loan due to predicted credit risk), the use of Deep Learning is problematic, even prohibitive.
A recommended workaround is using Deep Learning in an ensemble with machine-learning algorithms, allowing each one to vote and relying on each for its strength. Alternatively, one can perform various analyses on the results of deep nets to form hypotheses about their decision-making.
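The voting idea can be sketched in a few lines. The model names and predictions below are made up for illustration; in a real system each entry would come from a trained model.

```python
from collections import Counter

# Minimal sketch of the ensemble-voting workaround: each model casts a
# vote, and the majority label wins. Interpretable members can then supply
# the human-readable justification, while the deep net boosts accuracy.

def majority_vote(predictions):
    """predictions: dict mapping model name -> predicted label.
    Returns the label chosen by the most models."""
    counts = Counter(predictions.values())
    return counts.most_common(1)[0][0]

preds = {
    "logistic_regression": "approve",  # interpretable
    "decision_tree": "deny",           # interpretable
    "deep_net": "approve",             # accurate but opaque
}
print(majority_vote(preds))  # approve
```

Weighted voting (trusting the more accurate models more) is a natural refinement, but even this plain majority scheme lets each model contribute its strength.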
And a final thought: all models are wrong, but some are useful.
This Blog is created and maintained by Iraklis Mardiris