integration of non-SQL database MongoDB client API

turicum Member Posts: 15 Contributor II
edited November 2018 in Help
Hi everybody,

I'd like to use RM with a non-SQL database, namely MongoDB (www.mongodb.org). A Java db client API is already available: would it be difficult to integrate it into RM?

Thanks!
Alex

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Alex,
    I think this depends :) It would probably not cost us more than one or two days to integrate it (without having looked deeper into the matter), but if you are not familiar with how RapidMiner operators are constructed, especially the InputReader, it might take some time. Unfortunately, there's no dedicated tutorial available for implementing new input operators, and in fact I didn't manage to finish the "Extending RapidMiner" tutorial anyway...

    Greetings,
      Sebastian
  • turicum Member Posts: 15 Contributor II
    Hi Sebastian,

    thank you for your reply!

    Is there any .java file I can look at as an example and/or that I can extend to integrate mongodb's API?

    thanks!
    Alex
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Alex,
    the best approach would be to make an extension for that. This way you are flexible in adding these operators to various RapidMiner 5 installations. You could take a look at the source code of any of our extensions to get an impression of how this works and what is needed.

    Your class should extend AbstractExampleSource, which has only two methods you should implement: createExampleSet and getGeneratedMetaData. The latter is optional, but without it, all the nice meta data transformations available in RapidMiner 5 won't work.

    You could take a look at any subclass of AbstractExampleSource to get an impression of how things work.

    Greetings,
      Sebastian
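
The two-method shape described above can be sketched without any RapidMiner dependency. This is only an illustration under stated assumptions: SketchExampleSource, SketchTable and DocumentSource are hypothetical stand-ins, not RapidMiner classes, and the MongoDB documents are modeled as plain Java maps rather than fetched through the MongoDB client API. The real AbstractExampleSource signatures should be taken from the RapidMiner sources.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the table an operator delivers (NOT RapidMiner's ExampleSet).
final class SketchTable {
    final List<String> columns;
    final List<Object[]> rows;

    SketchTable(List<String> columns, List<Object[]> rows) {
        this.columns = columns;
        this.rows = rows;
    }
}

// Hypothetical stand-in that only mirrors the two-method shape described above.
abstract class SketchExampleSource {
    // Builds the full table; plays the role of createExampleSet.
    abstract SketchTable createExampleSet();

    // Optional column description; null means "no meta data available".
    List<String> getGeneratedMetaData() {
        return null;
    }
}

// Reads schemaless documents, modeled here as plain maps instead of MongoDB objects.
final class DocumentSource extends SketchExampleSource {
    private final List<Map<String, Object>> documents;

    DocumentSource(List<Map<String, Object>> documents) {
        this.documents = documents;
    }

    // Union of all field names, in first-seen order.
    private List<String> columnNames() {
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, Object> doc : documents) {
            names.addAll(doc.keySet());
        }
        return new ArrayList<>(names);
    }

    @Override
    SketchTable createExampleSet() {
        List<String> columns = columnNames();
        List<Object[]> rows = new ArrayList<>();
        for (Map<String, Object> doc : documents) {
            Object[] row = new Object[columns.size()];
            for (int i = 0; i < row.length; i++) {
                row[i] = doc.get(columns.get(i)); // null marks a missing field
            }
            rows.add(row);
        }
        return new SketchTable(columns, rows);
    }

    @Override
    List<String> getGeneratedMetaData() {
        return columnNames();
    }
}
```

In this sketch the optional method is overridden so that column names are available before any rows are built, which is the kind of information a design-time meta data step would need.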
  • turicum Member Posts: 15 Contributor II
    Hi Sebastian,

    thank you for the suggestion, I'll check out those classes!

    Cheers,
    Alex
  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi Alex,

    in fact, it might be even easier to extend AbstractDataReader (which is a subclass of AbstractExampleSource). There you only need to implement one method returning a DataSet, which is a kind of iterator over the data and resembles a ResultSet. Additionally, while constructing the DataSet you might also want to call the method setColumnNames(String[] columnNames) to name the columns correctly depending on your data. Using this mechanism, you will not need to care about meta data generation or the data generation itself - only the extraction of values from the data source needs to be implemented. Have a look at the CSVDataReader or the DatabaseDataReader and you will understand how it works. However, be aware that with this mechanism your database will also be accessed for the generation of the meta data.

    Kind regards,
    Tobias
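
The cursor idea described above (one method returning an iterator-like object that resembles a ResultSet, plus column names set while it is constructed) might look roughly like the following. This is a sketch under assumptions: SketchDataSet and DocumentReader are hypothetical stand-ins written for illustration, their method names are not taken from RapidMiner's DataSet or AbstractDataReader, and documents are again plain maps instead of MongoDB objects.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical cursor interface, loosely shaped like a ResultSet (NOT RapidMiner's DataSet).
interface SketchDataSet {
    boolean next();                     // advance to the next row, false when exhausted
    boolean isMissing(int columnIndex); // true if the current row has no value there
    String getString(int columnIndex);  // value of the current row as text, or null
}

// Hypothetical reader: only the extraction of values is implemented here.
final class DocumentReader {
    private final List<Map<String, Object>> documents;
    private String[] columnNames = new String[0];

    DocumentReader(List<Map<String, Object>> documents) {
        this.documents = documents;
    }

    // Stand-in for the setColumnNames(String[] columnNames) call mentioned above.
    void setColumnNames(String[] columnNames) {
        this.columnNames = columnNames.clone();
    }

    String[] getColumnNames() {
        return columnNames.clone();
    }

    SketchDataSet getDataSet() {
        // Name the columns from the union of field names, in first-seen order.
        Set<String> keys = new LinkedHashSet<>();
        for (Map<String, Object> doc : documents) {
            keys.addAll(doc.keySet());
        }
        final String[] names = keys.toArray(new String[0]);
        setColumnNames(names);

        final Iterator<Map<String, Object>> it = documents.iterator();
        return new SketchDataSet() {
            private Map<String, Object> current;

            @Override
            public boolean next() {
                if (!it.hasNext()) {
                    current = null;
                    return false;
                }
                current = it.next();
                return true;
            }

            @Override
            public boolean isMissing(int columnIndex) {
                return current == null || current.get(names[columnIndex]) == null;
            }

            @Override
            public String getString(int columnIndex) {
                return isMissing(columnIndex) ? null : String.valueOf(current.get(names[columnIndex]));
            }
        };
    }
}
```

A caller would loop with while (dataSet.next()) and pull values by column index, much as one would with a JDBC ResultSet.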

  • turicum Member Posts: 15 Contributor II
    Hi Tobias,

    what do you mean by "your database will be accessed also for the generation of the meta data"?

    Thanks
    Alex
  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi Alex,

    throughout process design, meta data is propagated through the process to support the user in designing it. Therefore, some data readers (those that extend AbstractDataReader) pre-read some data (i.e. a couple of rows) during the process design phase (once you have added the operator to the process and configured its settings) in order to generate meta data. Hence, if you implement your operator using AbstractDataReader, data will be read from your database twice: once to generate meta data during process design and a second time during process execution.

    Kind regards,
    Tobias
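
The "read twice" behaviour described above can be made concrete with a small counting sketch. Everything here is an assumption made for illustration: CountingSource, TwoPhaseReader and the PREVIEW_ROWS value are hypothetical and do not reflect how AbstractDataReader is actually implemented or how many rows it pre-reads.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical data source that counts how often it is opened and how many rows are fetched.
final class CountingSource {
    private final List<Map<String, Object>> documents;
    private int openCount = 0;
    private int rowsRead = 0;

    CountingSource(List<Map<String, Object>> documents) {
        this.documents = documents;
    }

    Iterator<Map<String, Object>> open() {
        openCount++;
        final Iterator<Map<String, Object>> inner = documents.iterator();
        return new Iterator<Map<String, Object>>() {
            @Override
            public boolean hasNext() {
                return inner.hasNext();
            }

            @Override
            public Map<String, Object> next() {
                rowsRead++;
                return inner.next();
            }
        };
    }

    int openCount() { return openCount; }
    int rowsRead() { return rowsRead; }
}

// Hypothetical reader with a design-time preview and a run-time full read.
final class TwoPhaseReader {
    static final int PREVIEW_ROWS = 2; // assumed preview size, chosen for the example

    private final CountingSource source;

    TwoPhaseReader(CountingSource source) {
        this.source = source;
    }

    // Design phase: look at a couple of rows to guess the column names.
    Set<String> guessMetaData() {
        Set<String> columns = new LinkedHashSet<>();
        Iterator<Map<String, Object>> it = source.open();
        for (int i = 0; i < PREVIEW_ROWS && it.hasNext(); i++) {
            columns.addAll(it.next().keySet());
        }
        return columns;
    }

    // Execution phase: read everything, which opens the source a second time.
    List<Map<String, Object>> readAll() {
        List<Map<String, Object>> rows = new ArrayList<>();
        Iterator<Map<String, Object>> it = source.open();
        while (it.hasNext()) {
            rows.add(it.next());
        }
        return rows;
    }
}
```

One consequence worth keeping in mind for a schemaless store: a preview of only a few documents can miss fields that first appear in later documents, so meta data guessed this way may be incomplete.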
  • turicum Member Posts: 15 Contributor II
    Hi!

    Once I have extended AbstractDataReader to access the database, what's the easiest way to integrate the new class into RM?

    Thanks
    Alex
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Alex,
    the easiest way is to buy the "How to Extend RapidMiner 5.0" tutorial in our web shop. It explains in detail, over 40 pages, what you have to do. Additionally, it comes with a sample project for Eclipse that will make it very easy for you to deploy a new extension.

    Greetings,
      Sebastian