So, finally we have done it! If you have followed our Twitter posts or some remarks in the RapidMiner Community, you may already be aware that we have been working on something larger. Now, after more than a month of spit-polishing our brand new extension with documentation, tests and an improved user interface, it's done. It's available now!

Wow, this really is a great feeling right now. I guess you would have an easier time sharing my feelings and excitement, if you knew what the extension is actually about. So please find below a very short collection of images. A long description of the features can be found here.

Loops to save time and complexity

The new loop operators allow you to use arbitrary input ports. If you need to continuously change one specific object, you can get it from the loop port, which in the first iteration holds a copy of the object on the first input port. Once the first iteration has been executed, its results are forwarded back to the input loop port for the next one. No Remember / Recall is needed to continuously aggregate data as above.

The screenshot shows how data is appended to previous results and then swapped onto the disk.
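For readers who think in code rather than in ports, the feedback idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not how the extension is implemented: `loop_with_feedback` and `append_rows` are hypothetical names, and plain lists stand in for example sets.

```python
import copy
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def loop_with_feedback(initial: S, items: Iterable[T], body: Callable[[S, T], S]) -> S:
    """Run body once per item, feeding each result back in as the next input."""
    # The first iteration sees a copy of the object on the input port.
    state = copy.copy(initial)
    for item in items:
        # The result of this iteration becomes the input of the next one.
        state = body(state, item)
    return state


def append_rows(collected: list, batch: list) -> list:
    """Hypothetical loop body: append the new batch to what we have so far."""
    return collected + batch


batches = [[1, 2], [3], [4, 5]]
result = loop_with_feedback([], batches, append_rows)
print(result)
```

The point is that the aggregated object travels through the loop itself, so nothing has to be stored and fetched by name between iterations.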

Continuous Tests for your process collection

Here you see a simple process that combines our new Loop Repository operator, which allows you to iterate over processes, with the testing operator. It loads each process and the respective result and then compares them. If the result does not match, regardless of whether it is data, a performance vector or a model, an exception is thrown and you know that something went wrong in your infrastructure.
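The same idea, stripped down to a rough Python sketch: walk over a collection of stored processes, run each one, and fail loudly as soon as a result differs from the stored reference. Everything here is an assumption made for illustration: the dictionary plays the role of the repository, `run_regression_tests` and `ResultMismatch` are made-up names, and a plain `!=` replaces the real comparison of data, performance vectors and models.

```python
from typing import Any, Callable, Dict, Tuple


class ResultMismatch(Exception):
    """Raised when a process no longer reproduces its stored result."""


def run_regression_tests(repository: Dict[str, Tuple[Callable[[], Any], Any]]) -> int:
    """Run every stored process and compare its output with the stored result."""
    checked = 0
    for name, (process, expected) in repository.items():
        actual = process()
        if actual != expected:
            raise ResultMismatch(f"{name}: expected {expected!r}, got {actual!r}")
        checked += 1
    return checked


# Hypothetical repository: process name -> (process, stored reference result).
repository = {
    "sum_rows": (lambda: sum([1, 2, 3]), 6),
    "sorted_labels": (lambda: sorted(["b", "a"]), ["a", "b"]),
}
checked = run_regression_tests(repository)
print(f"{checked} processes reproduced their stored results")
```

Scheduled regularly, a loop like this tells you that something in the infrastructure changed before anybody notices it in production results.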

On Disk Memory and Caching

The Storage Statistics Panel gives you an overview of how much memory you are consuming on your hard disk (or better, solid state disk) and how many cache entries are currently stored.

Above you see how tens of thousands of columns are stored, each of them in a data set with around 60000 rows. And this extra memory comes with a speed penalty of just a factor of three.

The new Cross Validation

The new cross validation looks nearly the same from the inside...

...but allows you to generate final preprocessing models if they are built during the training phase. The training phase may also use other external input. Finally, the cross validation can output test results for all rows, like an X-Prediction, with no time overhead. But that's not all...

Unlike the default version, it will make use of your powerful multi-core CPU, like here on my laptop.

The same is true for the loop operators, which can execute iterations in parallel if the results do not depend on each other! A great way to speed up complex operations on larger (but not yet big) data sets.
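To make the two ideas in this section a bit more tangible, here is a small Python sketch of a cross validation that returns a test prediction for every row and runs its folds concurrently, which works because the folds do not depend on each other. It is a toy under stated assumptions, not the operator's implementation: the "model" is just the mean of the training rows, the function names are made up, and threads are used for brevity (for CPU-heavy work in Python you would rather use processes).

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import List, Tuple


def make_folds(n_rows: int, k: int) -> List[List[int]]:
    """Assign row indices to k folds in round-robin fashion."""
    return [list(range(i, n_rows, k)) for i in range(k)]


def run_fold(values: List[float], test_idx: List[int]) -> List[Tuple[int, float]]:
    """Train on all rows outside the fold, then predict the rows inside it."""
    test_set = set(test_idx)
    train = [v for i, v in enumerate(values) if i not in test_set]
    model = mean(train)  # toy model: always predict the training mean
    return [(i, model) for i in test_idx]


def cross_val_predict(values: List[float], k: int) -> List[float]:
    """Return one out-of-fold prediction per row, with folds run concurrently."""
    folds = make_folds(len(values), k)
    predictions = [0.0] * len(values)
    with ThreadPoolExecutor() as pool:
        for fold_result in pool.map(lambda idx: run_fold(values, idx), folds):
            for i, pred in fold_result:
                predictions[i] = pred
    return predictions


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predictions = cross_val_predict(values, k=3)
print(predictions)
```

Every row is predicted exactly once by a model that never saw it during training, so the per-row results come out of the validation itself instead of a second pass over the data.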

New Panels

Not only can you use your multi-core CPU to speed up some operators, you can also run multiple processes at once, right within your Studio. Simply drag them onto the panel or select the play button!

You see above that they are shown beside the tasks of operators. Finished processes let you access their results until you clean the list. You can even access the logs of the processes during runtime!

Free Demo Version

The good news is: Everybody who wants to try it can just proceed with the download from our product page here. While the free demo version is restricted in some features, most of them are free to use for everybody! But if you like it, we will be very happy to welcome you as a paying customer...

Of course, we are also happy to receive feedback about the features: what's great, what's missing and so on. But first you should familiarize yourself with the extension: Here's the list of the features.

As the benefit of some of the features may not be obvious to everybody, we will release a series of videos explaining when, why and how to use them. Most of the features are really meant for the hard-boiled data scientist in a production environment that not all of us face every day. On the other hand, the extension also features several operators that are very useful for beginners as well, as they simply make things easier to achieve.