Predictive Analytics in a Big Data Context

Course Overview

Type of course: classroom training, duration: 2 days

Predictive Analytics in a Big Data Context is a two-day course focusing on data science technologies for handling large amounts of data. We start by giving an overview of the technologies available and put keywords like data lake, in-memory and Hadoop into a meaningful overall context. Participants will learn how big data technology can be used to solve data science problems.

This training course continues the scenario from the basic training courses and scales it to big data. For this purpose we use RapidMiner Radoop, which allows large amounts of data to be analyzed in the familiar environment of RapidMiner Studio. Because execution is distributed over a Hadoop cluster, the amount of data that can be processed grows with the cluster rather than being limited by a single machine.

After the course, participants will have in-depth knowledge of the pros and cons of big data technologies and will know how to process large amounts of data using RapidMiner on Hadoop. Participants work on their own laptops during the course, so they can take home both the knowledge and the example solutions and use them as a basis for their own big data challenges.

The course alternates between theory and proven best practices on the one hand and the practical application of the acquired knowledge on the other. Participants form a data science team and work together on the tasks set by the course instructor.

Course Objectives

The skills acquired by participants of the training course include:

  • Understanding of big data infrastructure with its possibilities and limitations
  • Connecting a desktop computer with a Hadoop cluster
  • Exploration of large volumes of data
  • Extracting and loading data
  • Producing big data analyses with RapidMiner
  • Knowledge of methods to efficiently process large volumes of data

Training Content

What is big data and when does it make sense?

When can analytics benefit from big data?

Introduction to Hadoop

  • General infrastructure
  • Hadoop integration with RapidMiner: Radoop
  • Introduction to the Radoop user interface
  • Connecting desktop computers or laptops with a Hadoop cluster

Data Exploration

  • Searching tables
  • Statistics and data aggregation for high level information

Extracting and Loading Data

  • Formulating queries
  • Entering data in Hadoop

Analytics Processes on Hadoop

  • In-Hadoop training
  • Profiling and natural aggregation
  • In-memory training, in-Hadoop scoring

Beyond Natural Aggregation

  • Chunking
  • Voting
  • In-Hadoop modelling
  • Clustering

Batch-oriented Processing of Large Volumes of Data

Previous Knowledge Required

For this training course, you will need the knowledge from the previous courses Basics 1 & 2 and Deployment - Predictive Analytics in Live Use. If you have already acquired equivalent knowledge from comparable training courses, please contact us.

Target Group

Data scientists, advanced analysts


After you have attended the training courses Deployment - Predictive Analytics in Live Use, Predictive Analytics in a Big Data Context, and Text and Web Mining, you can take an exam to acquire the “RapidMiner Expert” certificate and show off your new qualification.


Predictive Analytics in a Big Data Context    € 2000 per participant (plus VAT)
Certificate                                   Free of charge for participants*
*A later examination can be taken online at any time for an additional € 200.

If two or more participants register, a price of € 1440 per participant will be charged (plus VAT).

Information about arrival, accommodation and meals.