One question I am often asked is whether an organization needs a data science team.
How an organization interacts with data science depends on what kind of organization it is.
In large part, it depends on the size of the organization. When it is a startup, an early-stage company, or just one person with a very small team, we may not need to worry much yet about how to do experimentation, machine learning, prediction, and downstream calculations. The first order of business is making sure we keep our data in order, and the way to do that is to focus on infrastructure.
So the first thing we need to do is build out the infrastructure for storing the data: the databases and so forth, the software that will be run to pull those data, the servers that will serve the data to other people, and the servers that we will interact with in order to get the data out. All of that requires building up infrastructure first. Thus, the first people we hire into a data science team are often not people we would necessarily call data scientists, in the sense that they’re not analyzing the data and they’re not doing machine learning. They might do a little bit of that, but mostly they’re involved in making sure the machine is running and making sure the data is getting collected, it’s secure, it’s stored, and so forth.
When we are a mid-size organization, hopefully we have the basic infrastructure in place, and we can start thinking about building out our real data science team. To do that, we can bring on board people who are actually called data scientists. Those are the folks who will actually use the data. They might run some experiments. They might build machine learning algorithms. They might analyze the data to see if we can identify any patterns or trends in behavior that we care about.
This is the point where we are thinking about actually building the data science team. We are also thinking about implementing these data science ideas and products. So again, a data scientist might build something like a machine learning algorithm that predicts, for example, consumer behavior. Once we have that algorithm built, we might need to implement it back into our system, and we might need to scale it up so that it can be run on the whole data set. We might also want to build some sort of visualization that people who aren’t necessarily data scientists can interact with.
To do so, we might have to turn back to the data engineering team. There might still be infrastructure concerns, because by this point we have hopefully collected a large set of data. We need to keep it secure, and we need a database that we are able to scale.
A large organization has all of those same things. We now have a data infrastructure, and we might have a data science team that’s running experiments. We may be using those experiments to make decisions. But now we have one additional component, which is managing the team and keeping everybody on task and coordinated.
At this point the data science manager role becomes a little more involved, in the sense that we might be coordinating multiple teams of data scientists working on different projects. We might have a team that works exclusively on building machine learning projects. We might have another team that works exclusively on running experiments and inferring what we can from those experiments. And then someone has to be in charge of coordinating those activities and making sure they’re connected to the right people within the organization, whether that’s the marketing team, the business group, or whoever else we are collaborating with. We have to be able to connect those people. And so at that scale the full data science infrastructure is in place.
This blog is created and maintained by Iraklis Mardiris.