We are excited to announce that we have a new extension in development, which will connect RapidMiner to Apache Solr and bring Solr's fast and efficient text analysis functions to your RapidMiner work.
Apache Solr is an enterprise search platform from the Apache Lucene project. Its features include full-text search, faceted search, and computing statistics, and a major advantage is the speed with which you receive your search results. With the new extension, you will be able to harness this speed: the Solr Extension acts as an interface between the search platform and RapidMiner, directly tapping into the Solr database. You will be able to carry out a faceted search (a search whose results are returned already grouped into categories) in RapidMiner, rather than having to aggregate the resulting ExampleSet yourself.
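To make the idea of faceting concrete, here is a minimal sketch in plain Python of the per-category counting that a faceted search returns alongside the hits. It does not use Solr or RapidMiner; the `hits` list, the `category` field, and the `facet_counts` helper are made up for illustration only.

```python
from collections import Counter

# Hypothetical search hits; field names and values are invented for illustration.
hits = [
    {"id": 1, "category": "news"},
    {"id": 2, "category": "blog"},
    {"id": 3, "category": "news"},
]


def facet_counts(documents, field):
    """Count how many documents fall into each value of `field`."""
    return Counter(doc[field] for doc in documents if field in doc)


counts = facet_counts(hits, "category")
for value, count in counts.most_common():
    print(value, count)
```

Without faceting, this grouping is the step you would otherwise rebuild by aggregating the result set after the search has returned.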
The principal advantage of this extension is Solr's efficient indexing strategy, which allows you to carry out text analysis orders of magnitude faster than with RapidMiner alone. Besides the more efficient computation, you also save time designing your RapidMiner process, since you receive a data set that is already clean and adapted and do not have to filter it yourself through preprocessing steps in RapidMiner. With that comes another advantage: processes will be substantially less cluttered and better organized, since you can drop all those operators.
As Solr development focuses more and more on the JSON Facet API, the Solr Extension will of course support JSON faceted search, and it will work well in combination with our WebAutomation Extension.
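For readers who have not used the JSON Facet API, the following sketch shows roughly what a terms-facet request body looks like and how the buckets in a response can be read. It is not the extension's implementation: the helper names, the `category` field, the query string, and the sample response are assumptions chosen for illustration, and nothing is sent over the network.

```python
import json


def build_facet_request(query, facet_name, field, limit=10):
    """Build a JSON request body with one terms facet on `field`."""
    return {
        "query": query,
        "limit": 0,  # return facet counts only, no documents
        "facet": {
            facet_name: {"type": "terms", "field": field, "limit": limit}
        },
    }


def parse_buckets(response, facet_name):
    """Extract (value, count) pairs from a JSON facet response."""
    facet = response.get("facets", {}).get(facet_name, {})
    return [(b["val"], b["count"]) for b in facet.get("buckets", [])]


# Hypothetical field and query, for illustration only.
body = build_facet_request("text:rapidminer", "by_category", "category", limit=5)
payload = json.dumps(body)  # this string would be POSTed to a collection's /query endpoint

# Hand-written sample in the shape Solr uses for terms facets; not real data.
sample_response = {
    "facets": {
        "count": 3,
        "by_category": {
            "buckets": [
                {"val": "news", "count": 2},
                {"val": "blog", "count": 1},
            ]
        },
    }
}

buckets = parse_buckets(sample_response, "by_category")
print(buckets)
```

Setting `limit` to 0 at the top level asks only for the facet counts, which is the case where faceting replaces a separate aggregation step.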
We hope you are as excited as we are about this new extension and we will be back with new info as soon as possible!