[New Extension] Projects

MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
Hello Everybody!

I've just released a new extension called "Projects". This extension adds two new entries to your repository actions:
These let you create a standard setup for a project. The extension automatically creates a folder structure, adds standard processes with predefined documentation, and so on. The structure for the project looks like this:

Some of the processes already have an implementation. The 04-Learning one looks like this:


I hope that these templates can make your life easier and make you more efficient. I appreciate any feedback on the templates and, of course, on further enhancements!

Best,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany

Comments

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
    Hi Martin,

    nice tool! It is nice to have some built-in support for an organized project in RM. Is there a plan to integrate CRISP-DM or Microsoft TDSP more tightly in the future?

    Regards,
    Sebastian
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    What do you mean by "integrate"?

    The proposed project structure can be mapped to CRISP-DM. Loosely, it is:
    - Data Prep
    - Model (Optimize..)
    - Investigate Results

    What are your ideas? I am just doing an update here anyway. If there is something quick to add, I would be happy to do it.

    BR,
    Martin
  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
    Hi Martin,

    I would like to have more support for documentation, especially an easy way of saving visualizations within the process (I have already mentioned this to Egi). It would be great to have a "Dashboard" view to save visualizations and comments. I also normally keep a table describing the variables in Excel or Google Docs; maybe that could be done directly in RM as well.

    Regards,
    Sebastian
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    That's not a quick one I can do :/ But Egi and crew are on it.

    best
    Martin
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi Everybody,

    I've just released v0.1.1 of the Projects extension. It covers:
    === 0.1.1 ===
    * Added an option to add only the learner templates to your project. This includes the performance subprocess.
    * Learner templates now use data from ../../data/prepped instead of /Samples/Titanic. Added Titanic as an example in data/prepped/.
    * Added a Radial SVM, Linear SVM, kNN and Naive Bayes template to learners
    Plans for 0.2 are:
    * Threshold optimization
    * Feature Selection

    Best,
    Martin
  • christos_karras Member Posts: 50 Guru
    mschmitz,

    That's an interesting extension, and I was trying to adapt our current project to use it, but I found that it's not really obvious how to use it as intended. Is there any documentation explaining the typical workflow with this extension?

    For example:
    - How to change the version number? (I tried setting a "version" macro with Set Macro in the "! Main" process, but results are still written to version 1)
    - How to store data from one step to reuse it in the next step? (If I try to run steps individually, I see they expect data from the previous step to be in the repository, but I could not find anything in the default generated processes that stores outputs to the repository.)
    - How to specify the learners to use when training? (I found the answer: edit the "Used Learners" repository entry, but it would be useful to have it documented.)
    - How to compare the different trained models to decide which one to use? (Is there a single view showing the performance of all models, or should we individually open each "%{version}/learner/performance" repository entry to review it?)
    - Once we have decided which model should be used in production, how do we "promote" the model to become the production model? (For example, is there a predefined variable/macro to reference the selected model in the repository?)

    An example of a full solution might also be helpful, but eventually we would need documentation so that it can be used by non-expert users (for example, to allow other users to maintain, re-train, or improve a model after we have completed initial development). If we end up using the extension, we could also contribute some documentation, but if you have answers to the questions above, or existing documentation or examples to get started, that would be helpful.

    Thanks
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi @christos_karras ,
    Thank you for your detailed feedback. I would build it in, but this extension will be somewhat outdated given the new features we will show at Wisdom next week. Those will make all of our lives much easier.

    Best,
    Martin