What is Data Science?

Artificial intelligence, big data analytics, data mining, industry 4.0... all these buzzwords can be quite confusing. They do, however, have one thing in common:

They usually describe methods to gain value from data.

At first, the value of data may lie in gaining insights or in providing interactive support for decision-makers. Where applicable, industrial or business processes can then be partially automated through a feedback loop with experts. Once possible solutions have been evaluated and one of them has proven to work well, the goal can be to fully automate these processes.

To reach these goals, the interdisciplinary field of data science draws on methods from computer science, mathematics, and statistics. In machine learning specifically, these include decision trees and deep learning for classification and regression problems, as well as unsupervised learning methods such as the k-means algorithm for cluster analysis.
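
To make cluster analysis a little more concrete, here is a minimal sketch of the k-means algorithm (Lloyd's iteration) in plain Python on 2-D points. The function name `kmeans`, the sample points, and the choice of simply using the first k points as starting centroids are illustrative assumptions; library implementations such as scikit-learn's `KMeans` use smarter initialization.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Group 2-D points into k clusters (minimal sketch of Lloyd's algorithm)."""
    # Assumption: the first k points serve as initial centroids (simple and deterministic).
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: every point joins the cluster of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep its old centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical data: two visibly separate groups of points.
    pts = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.2, 0.8), (9.5, 8.0)]
    centers, assignment = kmeans(pts, k=2)
    print(assignment)  # [0, 1, 0, 1, 0, 1]
```

No labels are given to the algorithm; it discovers the two groups on its own, which is what makes k-means an unsupervised method.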

In addition, most use cases incorporate expert knowledge from the company's or organisation's own field. Here, the data scientist acts as the link between domain experts and data-based algorithms.

This Venn diagram puts four of the most widely used terms in relation to one another: deep learning is a special case of machine learning, which in turn is an important part of data science and can further be considered a type of artificial intelligence (AI). These and other terms such as big data, business analytics, and industry 4.0 are explained in more detail below.

Data Science and Other Terms: A Short Dictionary

Data Science

“The job of the data scientist is to ask the right questions. If I ask a question like ‘how many clicks did this link get?’ which is something we look at all the time, that’s not a data science question. It’s an analytics question. If I ask a question like, ‘based on the previous history of links on this publisher’s site, can I predict how many people from France will read this in the next three hours?’ that’s more of a data science question.”

―Hilary Mason, Founder, Fast Forward Labs

Data science is closely connected to artificial intelligence, big data, and industry 4.0, and generally concerns working with data in order to gain value from it. It is thus an umbrella term for most of the other terms explained below, such as machine learning, text mining, and business analytics.

Artificial Intelligence

Of the terms presented here, artificial intelligence (AI) may well be the most overloaded and the most prone to misconceptions. AI describes computer-aided techniques that are able to solve problems on their own. In this broadest sense, even a calculator can be considered an artificial intelligence. At the opposite extreme, inspired by movies and TV series, AI often comes with the notion that machines might become sentient and develop superhuman intelligence. However, we are still very far from this being a real possibility.

The media mostly mention AI in connection with future technologies such as self-driving cars, or chatbots trying to pass the Turing test, i.e. giving answers so similar to a human's that a human conversation partner cannot tell whether they are communicating with another human or with a machine. In fact, however, most people have likely already used established, modern AI technologies such as automatic speech recognition, search engines, or machine translation.

Machine Learning and Human Learning

With data science, data can be processed in a way that assists humans in making strategic decisions. Humans can then learn from that data and base their decisions on it.

In machine learning, on the other hand, the computer uses algorithms to look for patterns in the data that allow it to make statements about a current or future state within a defined context, in order to automate decisions. There are numerous examples of this, from spam filters and personalized recommendations on Netflix, to speech and text recognition by digital assistants, predictive maintenance (see below), automated quality control and monitoring, and autonomous vehicles. So far, however, most machine learning techniques operate as a black box, i.e. humans cannot trace how the automated decisions are made.
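
The key difference to hand-written rules is that the pattern is learned from labeled examples. A very small illustration is a decision stump, i.e. a decision tree with a single split, which learns from data the threshold on one numeric feature that best separates two classes. The feature (number of links in an e-mail), the function names, and the toy data below are hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, label predicted above the threshold)


def fit_stump(xs: List[float], ys: List[int]) -> Stump:
    """Learn the threshold and orientation with the fewest training errors.

    Prediction rule: above_label if x > threshold, else 1 - above_label.
    """
    best_t, best_above, best_err = xs[0], 1, len(xs) + 1
    # Candidate thresholds: each observed value (the split lies just above it).
    for t in sorted(set(xs)):
        for above in (0, 1):
            errors = sum(
                1 for x, y in zip(xs, ys)
                if (above if x > t else 1 - above) != y
            )
            if errors < best_err:
                best_t, best_above, best_err = t, above, errors
    return best_t, best_above


def predict(stump: Stump, x: float) -> int:
    """Apply the learned rule to a new observation."""
    t, above = stump
    return above if x > t else 1 - above


if __name__ == "__main__":
    # Hypothetical training data: links per e-mail, label 1 = spam, 0 = not spam.
    links = [0, 1, 2, 8, 10, 12]
    is_spam = [0, 0, 0, 1, 1, 1]
    stump = fit_stump(links, is_spam)
    print(stump)              # (2, 1)
    print(predict(stump, 5))  # 1
```

Unlike most machine learning models, a stump is trivially traceable; the point here is only that the decision rule comes from the data rather than from a programmer.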


Deep Learning

Deep learning is a specific type of machine learning. Here, neural networks, i.e. systems of interconnected artificial neurons loosely modeled on the human brain, are used to develop predictive models. These (artificial) neural networks contain many layers of neurons between input and output (hence "deep" learning). Deep learning is currently quite popular, but there are also many other machine learning techniques that may yield better results, depending on the use case.
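
The "layers" can be sketched as follows: each layer computes weighted sums of the previous layer's outputs and passes them through a nonlinear activation. This is only the forward pass of a small fully connected network with fixed, made-up weights; training, i.e. adjusting the weights (usually by backpropagation), is omitted, and the ReLU activation and layer sizes are illustrative choices.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def relu(v: Vector) -> Vector:
    # Nonlinear activation: negative values are set to zero.
    return [max(0.0, x) for x in v]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    # One fully connected layer: each neuron computes a weighted sum of all inputs plus a bias.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def forward(x: Vector, layers: List[Tuple[Matrix, Vector]]) -> Vector:
    """Pass input x through a stack of (weights, bias) layers.

    ReLU follows every hidden layer; the final output layer stays linear.
    """
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if i < len(layers) - 1:
            x = relu(x)
    return x


if __name__ == "__main__":
    # Hypothetical network: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
    layers = [
        ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
        ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 1.0]),
        ([[2.0, -1.0]], [0.5]),
    ]
    print(forward([3.0, 1.0], layers))  # [7.5]
```

A deep network used in practice stacks many more and much wider layers, with weights learned from data instead of set by hand.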

Text Mining

Text mining is an application of linguistic data processing in which texts are the data from which value is to be generated. Text mining aims to extract information that can be processed further in subsequent steps; for example, it can be used to train machine learning algorithms to classify new texts. A special case is web mining, which analyses the contents of internet documents, i.e. of websites such as Twitter, Facebook, or news outlets.
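
As one possible illustration of training a classifier on texts, here is a minimal multinomial naive Bayes classifier with add-one (Laplace) smoothing, a classic text classification technique chosen here for brevity; the text does not prescribe a particular algorithm. The whitespace tokenizer, the class names, and the tiny training set are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]


def tokenize(text: str) -> List[str]:
    # Deliberately simple tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def train(docs: List[Tuple[str, str]]) -> Model:
    """Count word frequencies per class from (text, label) pairs."""
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    class_counts: Counter = Counter()
    vocab: Set[str] = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def classify(model: Model, text: str) -> str:
    """Return the class with the highest log-probability score for the text."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label in class_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(class_counts[label] / total_docs)
        n_words = sum(word_counts[label].values())
        for token in tokenize(text):
            # Add-one smoothing so an unseen word does not zero out the whole score.
            score += math.log((word_counts[label][token] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical training texts.
    model = train([
        ("cheap pills buy now", "spam"),
        ("win money now", "spam"),
        ("meeting agenda for monday", "ham"),
        ("project report attached", "ham"),
    ])
    print(classify(model, "buy cheap money"))  # spam
```

Summing log-probabilities instead of multiplying probabilities avoids numerical underflow on longer texts.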

Business Analytics

Business analytics is applied data science in a business environment. Specifically, business processes can be improved by using data relevant to the respective business context in order to derive insights and predictions from them.

Predictive Maintenance

Predictive Maintenance is a special application scenario of machine learning and one of the core components of industry 4.0. It aims at predicting the optimal point in time for maintenance of machines and installations, preventing disruptions and their negative effects such as unplanned downtime or quality deficiencies.

In contrast to preventive maintenance with routine services or inspections, predictive maintenance is based on machine or production data recorded periodically or continuously by sensors, not on statistics about average or expected lifetimes. Because the predictive approach carries out maintenance work only when necessary, both downtime and maintenance costs can be reduced. At the same time, it retains the advantages preventive maintenance has over unplanned, breakdown-induced maintenance, such as longer plant lifetime, improved plant safety with fewer accidents and fewer negative health or environmental effects, and optimized supply of spare parts.
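
The predictive idea can be sketched in a strongly simplified form: instead of servicing after a fixed interval, fit a trend to recorded sensor values and estimate when a wear limit will be reached. The linear least-squares trend, the vibration readings, and the limit below are hypothetical; real systems typically rely on richer machine learning models and several sensor signals.

```python
from typing import List, Optional, Tuple


def fit_line(ts: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * t + intercept.

    Requires at least two distinct time stamps.
    """
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    return slope, mean_y - slope * mean_t


def predict_limit_time(ts: List[float], ys: List[float], limit: float) -> Optional[float]:
    """Estimate when the fitted trend reaches `limit`; None if there is no upward trend."""
    slope, intercept = fit_line(ts, ys)
    if slope <= 0:
        return None  # signal is flat or falling: no maintenance date predicted from it
    return (limit - intercept) / slope


if __name__ == "__main__":
    # Hypothetical sensor log: vibration level (mm/s) recorded every 10 days.
    days = [0, 10, 20, 30, 40]
    vibration = [1.0, 1.2, 1.4, 1.6, 1.8]
    # Hypothetical wear limit of 3.0 mm/s.
    print(round(predict_limit_time(days, vibration, 3.0), 1))  # 100.0
```

Maintenance could then be scheduled shortly before the predicted day, rather than at a fixed interval that may be too early or too late for this particular machine.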

Industry 4.0

Industry 4.0 is a term used mainly in German-speaking countries to describe the (primarily economic) changes that digitization will bring. Characteristic of it is the interaction of digital technologies, such as those explained here, with classic business processes, which are expected to change drastically. It is becoming increasingly apparent that digital economic processes differ greatly from classic ones.

Digitization and industry 4.0 are met with high expectations, such as potential cost savings, but new risks can also be foreseen.

Big Data

Big data describes both a system architecture and a new programming paradigm. Put descriptively, data is nowadays produced in quantities so large that it can no longer be processed by conventional system architectures. For this reason, data is spread across several systems. To deal with these large, distributed volumes of data, new programming concepts are necessary, as it would otherwise not be possible to access the data in a reasonable time frame: processes must instead be executed in parallel. Examples of technologies used here are Apache Hadoop and Apache Spark.
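
The programming idea of splitting data and processing the parts in parallel can be illustrated with the MapReduce pattern popularized by Apache Hadoop, here as a word count in plain Python. This is not the API of Hadoop or Spark: the chunks merely stand in for data stored on different machines, and a thread pool stands in for a cluster of workers. In CPython, threads do not speed up CPU-bound work, so real workloads would use processes or an actual cluster framework.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def map_chunk(lines: List[str]) -> Counter:
    # Map step: each worker counts the words in its own chunk independently.
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge(a: Counter, b: Counter) -> Counter:
    # Reduce step: combine the partial counts of two workers.
    return a + b


def word_count(chunks: List[List[str]], workers: int = 4) -> Counter:
    """Count words across chunks processed in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    return reduce(merge, partial_counts, Counter())


if __name__ == "__main__":
    # Hypothetical chunks, imagined as living on three different machines.
    chunks = [
        ["big data is big"],
        ["data is spread out"],
        ["spark and hadoop process data"],
    ]
    print(word_count(chunks).most_common(1))  # [('data', 3)]
```

Because each map step needs only its own chunk, the work scales out by adding workers; only the much smaller partial results have to be brought together in the reduce step.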