Five Ways Not to Do a Data Science Project

Yesterday, Sebastian gave a talk at the Business Intelligence and Analytics meetup at RapidMiner in Dortmund. He took a slightly different approach to talking about data science projects: instead of enthusiastically discussing the merits and benefits of establishing data science in a company, he pointed out possible pitfalls along the way. It seems that over recent years, many people – especially managers and supervisors – have forgotten what has long been known: mistakes are allowed, even helpful. Indeed, one of the best ways to learn is from your own mistakes. Luckily for you, we have already made some of them, so you don’t have to make them yourself.

Click here for the presentation as a .pdf document in German and here for an English version, or read on for a short summary of the five strategies you can adopt if you want your project to fail as quickly and thoroughly as possible. Or, you know, learn from our mistakes and improve your project’s chances of success.

Photo: Business Intelligence and Analytics Dortmund

1. Getting Lost in the Marketing Jungle 

Data science has been on everyone’s lips for a while now, but, as so often with the latest trends, everybody likes to talk about it while no one really knows what it is. It sounds great, all the big players on the market have it, so obviously we need it too, and on as large a scale as possible, right?

Well, maybe take a step back and have a map with you when you enter the jungle.

While data science can be very beneficial for your company, it is crucial to know exactly what you want and how to get it. Data science isn’t just a product; it needs to be deeply embedded within the company. To get there, a long learning process on all sides is essential. But if you are looking for a quick and easy one-size-fits-all solution, here are five steps that will make your project a complete failure:

1. Mix up infrastructure and methods

2. Buy big and commit for the long term

3. Leave the project with your overwhelmed employees

4. Let the project run without further support

5. Increase pressure and blame your employees for failures

So now you have an ill-fitting infrastructure with a whole hullabaloo of methods, you have spent your budget on purchases you might not even need, and you have to explain to your dissatisfied employees that they will be left alone with the problems they have now as well as with those that might come up later on. What else can go wrong?

2. Going Backwards

Often, the capabilities and opportunities that data science and machine learning offer haven’t been fully understood by a company’s employees. The ways of data science differ from their established work processes, creating fear: if the computer can now do all I do and more, doesn’t that mean I have become dispensable? 

Behind this is a lack of understanding of what machine learning can and cannot do: if the machine learns something, that does not mean the engineer is able to construct a better machine. Also, the machine can only learn within a static context, so it will not start constructing something entirely new. Within such a context, however, it predicts the most probable future better than any human can. This allows much better tactical decisions in repetitive contexts. Strategic decisions that change the context, however, remain the domain of humans, who can draw on background knowledge and intuition.

Understanding this is what we call the predictive mindset, and it is absolutely essential for a successful project. If the employees don’t even know the possibilities, how can they effectively use them to their advantage? How can they tell the data science team what they need? 

So, a good way to fail is to:

  • Ask your uninformed employees in the respective departments about their project ideas
  • Put all of these into a large request catalog
  • Put the project out to tender and commission someone to do it
  • Simply assume the work is finished and done

3. Going at It from the Wrong End

There is a problem inherent to data science projects that you don’t find as much in other, more classic projects: you never really know how good your results will be in the end. What is more, you cannot know how good the best possible solution would be. So before you start, you have to think about what you consider a good solution – but you also have to accept that there is no such thing as a “correct” solution. Data science projects are exploratory by nature, and you will have to draw your map while you are walking.

Underlying this is a chicken-and-egg problem: to conduct a project, you have to plan it and the necessary infrastructure. But to do so, you need experience. And to gain experience, you have to conduct a project.

So what are the steps to make your project fail?

1. Employ a lot of bright data scientists

2. Invest all your money in infrastructure

3. Start by building a perfect data warehouse. Then carry out the first data science project on this “perfect” basis

4. Have vaguely defined goals

4. Walking Alone

Data science itself might be new, but the fields in which it is applied are not – knowledge accumulated over centuries can and should be used to the advantage of everyone involved. To this end, communication between data scientists and domain experts is essential. A data scientist in their ivory tower might find brilliant solutions – just probably for problems nobody actually has. Exchanging ideas and data across departments leads to understanding on both sides, which is a prerequisite for success. You don’t want success? Then simply do this:

1. Put your data science team in an ivory tower far from the reality of your company

2. Let them say “Give me data and I’ll explain your world to you” to your domain experts

3. Cut all communications between data science and domain experts

4. Let them solve problems that no one actually has

5. Confusing Humans with Robots

Data science solutions tend to interfere with established processes and work methods. While they can be used as a supportive element, they can also partially or fully automate procedures – and who wouldn’t feel obsolete if they had been doing these things themselves until now?

If the domain experts start to boycott the project, it will be hard to make it a success, because their knowledge will be missing. Even worse, suppose your machine learning works just fine: if nobody ever uses its predictions, what is it worth?

So another great recipe for failure would be to:

  • Ask experts to focus all their energy on a data science project aimed at process automation
  • Design a solution without talking to the experts
  • Force experts to use said solution
  • And then fire half of them

 

The only way to succeed is to include all stakeholders affected by the project. Fears must be addressed openly and dispelled to build support among employees. To that end, an understanding of data science and a common language for it must be spread throughout the company.