"Move Rapidminer project from one eclipse installation to another"

leitoldleitold Member Posts: 22 Contributor I
edited June 6 in Help
Hello,

after not working on the project for a long time, I am finally in the process of finishing my Rapidminer extension to implement a method from computational physics. Now, in order to test it thoroughly, I would like to copy my whole Eclipse project to another machine, ideally on another platform (I did the development on Linux and would like to test everything on Mac OS and / or Windows as well).

So, in Eclipse, I did a File -> Export -> Archive File on the source machine, and then the same with "Import" on the target machine. However, RM then simply fails to run with the error "Could not find or load main class com.rapidminer.gui.RapidMinerGUI". The problem persists whether the target machine runs Linux or Mac OS; in fact, the Linux target is just the same Eclipse version under another user account on my source machine. It seems to me that some configuration left over from the original project prevents the new project from properly executing RM, but so far I have not found out what exactly the problem is.

Thanks a lot for your help in advance!
Lennex

Answers

  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    I assume the goal is to test the extension on different OS? If so, you can just install Studio on these systems and place your extension in the USER_HOME/.RapidMiner/extensions folder. It will be picked up by Studio 6.4 and later and will be loaded by Studio.

    Regards,
    Marco
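The suggestion above amounts to copying the extension jar into USER_HOME/.RapidMiner/extensions. A minimal Java sketch of that copy step follows. The class and method names (`ExtensionInstaller`, `extensionsDir`, `install`) are hypothetical helpers, not part of RapidMiner, and the sketch assumes that USER_HOME corresponds to the Java `user.home` system property.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of RapidMiner.
final class ExtensionInstaller {

    /** Resolves <userHome>/.RapidMiner/extensions, the folder named in the post above. */
    static Path extensionsDir(String userHome) {
        return Paths.get(userHome, ".RapidMiner", "extensions");
    }

    /** Copies the jar into the extensions folder, creating the folder if it is missing. */
    static Path install(Path jar, String userHome) throws IOException {
        Path dir = extensionsDir(userHome);
        Files.createDirectories(dir);
        return Files.copy(jar, dir.resolve(jar.getFileName()), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ExtensionInstaller <path-to-extension.jar>");
            return;
        }
        // Assumption: USER_HOME is the JVM's user.home property.
        Path target = install(Paths.get(args[0]), System.getProperty("user.home"));
        System.out.println("Installed to " + target);
    }
}
```

Studio has to be restarted afterwards so that it scans the folder again.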
  • mschmitzmschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,107  RM Data Scientist
    Hi Lennex,

    just for the record - I am very interested in that extension. I hold a PhD in astroparticle physics, so I am always curious what other physicists are doing with RM.

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • leitoldleitold Member Posts: 22 Contributor I
    Marco Boeck wrote:

    I assume the goal is to test the extension on different OS? If so, you can just install Studio on these systems and place your extension in the USER_HOME/.RapidMiner/extensions folder. It will be picked up by Studio 6.4 and later and will be loaded by Studio.
    Hello Marco, thanks for your suggestion. I actually use the community version 5.3, so the folder is lib/plugins in that case. The problem with that approach is that on Mac OS my extension does not work properly. In particular, some of the settings in the operator have no effect, i.e. they are not saved by RM at all, and hence the operator fails. So I thought it would be a good idea for debugging to copy the whole project to the Mac OS machine, and not only the extension. Strangely enough, on another Linux machine my extension works fine.

    @Martin:
    Don't worry, I will post the final result here soon :-).

    Cheers,
    Lennex
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    I'd encourage you to have a look at our Studio 6.5 Community release then because a lot has improved in the past years :)
    https://rapidminer.com/get-more-open-core/

    I can't really help with setting up 5.3 for development, sorry.
    If I should hazard a guess, text fields are not working, correct? If so, pressing Enter after editing will fix that in 5.3.

    Regards,
    Marco
  • leitoldleitold Member Posts: 22 Contributor I
    OK, I will definitely look into the 6.5 Community release. In 5.3 on the Mac, pressing Enter does not fix the problem though (it does indeed occur for text fields, regardless of whether they allow any string or just integer values to be entered).

    Cheers,
    Christian
  • mschmitzmschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,107  RM Data Scientist
    Hey,

    there was a bug with this effect. It happened because Apple changed its Java version. This should be fixed in 6.5.

    Best,
    Martin
  • leitoldleitold Member Posts: 22 Contributor I
    So, I have now _kind of_ successfully imported the RM 6.5 source from GitHub (https://github.com/rapidminer/rapidminer-studio) into my Eclipse workspace. However, the build fails because one import cannot be resolved:

    "The import com.rapidminer.license cannot be resolved".

    Apart from that, everything seems to work, but of course I cannot test it properly without the missing import.

    Cheers,
    Christian
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    did you refresh your Gradle dependencies?

    Regards,
    Marco
  • leitoldleitold Member Posts: 22 Contributor I
    Hm, I'm afraid I don't know how to do that. What is Gradle and how can I update its dependencies in Eclipse?

    Cheers,
    Lennex
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    that's our build system. We no longer use Ant but instead Gradle. Thus all the build.gradle files :)
    You need to import that as a Gradle project in Eclipse and then refresh the Gradle dependencies.
    An updated Developer "How to" will follow at some point, I don't have any dates though, I'm afraid.

    Regards,
    Marco
  • leitoldleitold Member Posts: 22 Contributor I
    Wow, so finally I got RM 6.5 working from within my Eclipse. It turned out that my previous Eclipse version was way too old and had no Gradle support, so I had to upgrade to the latest version before I could finally add the Gradle plugin.

    I still have a rather weird issue: when trying to start, I get an exception when RM tries to check its version. In particular, it is happening in PlatformUtilities.java, in the part
        /**
         * Initializes the current version by reading the version.properties file
         */
        private static void initializeReleaseVersion() {
            synchronized (INIT_VERSION_LOCK) {
                currentVersion = readResourceProperty(VERSION_PROPERTY_KEY);
                if (currentVersion == null) {
                    logInfo("Could not read current version from resources. Looking for 'gradle.properties'...");
                    currentVersion = readConfigProperty(GRADLE_PROPERTIES, VERSION_PROPERTY_KEY);
                    if (currentVersion == null) {
                        throw new IllegalStateException("Could not initialize RapidMiner Studio version from properties file");
                    }
                }
            }
        }
    the IllegalStateException is thrown. So far I have simply hardcoded the version number and commented out the throw statement, and now RM starts and works fine, but that seems like a rather stupid "solution"...

    Anyway, thanks a lot for your patience!
    Lennex
  • leitoldleitold Member Posts: 22 Contributor I
    I have found a somewhat better solution: In the "Arguments" tab of "Run configurations", I can pass variables, in particular using

    -Drapidminer.home=/path/to/my/directory/where/rm/resides

    Then, the gradle.properties file is found without problems.

    Cheers,
    Lennex
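To check a run configuration before starting Studio, something like the following Java sketch could confirm that `rapidminer.home` is set and that a `gradle.properties` file with a version entry exists there. `HomeCheck` and `readVersion` are hypothetical names, and the property key `version` is an assumption: the snippet quoted earlier only shows the constant `VERSION_PROPERTY_KEY`, not its value.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical diagnostic helper, not part of RapidMiner.
final class HomeCheck {

    /**
     * Returns the "version" entry of <home>/gradle.properties,
     * or null if the file or the key is missing.
     * Assumption: the key is literally called "version".
     */
    static String readVersion(Path home) throws IOException {
        Path file = home.resolve("gradle.properties");
        if (!Files.isRegularFile(file)) {
            return null;
        }
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        return props.getProperty("version");
    }

    public static void main(String[] args) throws IOException {
        String home = System.getProperty("rapidminer.home");
        if (home == null) {
            System.err.println("rapidminer.home is not set; pass -Drapidminer.home=... as a VM argument");
            return;
        }
        String version = readVersion(Paths.get(home));
        System.out.println(version == null
                ? "No version entry found under " + home
                : "Found version " + version);
    }
}
```

Run it with the same -Drapidminer.home=... VM argument as the Studio launch configuration.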
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Edit: Sorry, my bad. Disregard this post.
  • leitoldleitold Member Posts: 22 Contributor I
    Hm, I cannot find this class in my version. Never mind, it now runs satisfactorily and it is definitely RM Studio 6.5 :-).

    Cheers,
    Lennex
  • leitoldleitold Member Posts: 22 Contributor I
    So, this is all still work in progress, but you can find a somewhat preliminary version of my extension here:

    http://christian.leitold.info/index.php?page=string-coordinate

    I'm afraid it's all rather technical if you are not from my particular field of computational physics, but maybe it's interesting anyway :-). Once the code is properly cleaned up and documented, I would also like to publish the extension on the Marketplace. Are there any particular requirements apart from licensing it under AGPL 3?

    Cheers,
    Lennex
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    no, there are no particular requirements at the moment :)

    Regards,
    Marco
  • mschmitzmschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,107  RM Data Scientist
    Hi Lennex,

    I tried out the extension and have a few questions:

    Can you point me to some (industrial) use cases for this?

    What is the implemented algorithm?
    I skimmed your diploma thesis. Is it the Metropolis algorithm with TPS?

    And now my key question: what is it for, and how do I use it?

    So let me tell you how i think this extension is used:
    You have an initial set of (measured?) values for some polymer. This is the data in your .csv file. From this "baseline" you try to simulate other outcomes under other conditions. Am I right about that?

    If so - What could be other use cases for this?

    Think about the following: You have a company A producing a product, which is created in some chemical reaction based on a recipe. Using the recipes you can try to predict the quality of the product. Can your algorithm be used to predict other recipe <-> outcome pairs?

    I have tested it on my Windows machine.
    At first it ran fine. After some changes to the settings (those you mention on your webpage) I got a
    Process failed - 11
    error message.
    The stack trace and process XML are attached.

    Best,
    Martin

    Exception: java.lang.ArrayIndexOutOfBoundsException
    Message: 11
    Stack trace:
     cl.stringcoordinate.StringCoordinate.learn(StringCoordinate.java:291)
     com.rapidminer.operator.learner.AbstractLearner.doWork(AbstractLearner.java:142)
     com.rapidminer.operator.Operator.execute(Operator.java:974)
     com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:35)
     com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:779)
     com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:377)
     com.rapidminer.operator.Operator.execute(Operator.java:974)
     com.rapidminer.Process.run(Process.java:1037)
     com.rapidminer.Process.run(Process.java:939)
     com.rapidminer.Process.run(Process.java:892)
     com.rapidminer.Process.run(Process.java:887)
     com.rapidminer.Process.run(Process.java:877)
     com.rapidminer.gui.ProcessThread.run(ProcessThread.java:51)
    Process

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="read_csv" compatibility="6.4.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
           <parameter key="csv_file" value="C:\Users\Martin\Desktop\database_polymer.csv"/>
           <parameter key="first_row_as_names" value="false"/>
           <list key="annotations">
             <parameter key="0" value="Name"/>
           </list>
           <parameter key="encoding" value="windows-1252"/>
           <list key="data_set_meta_data_information">
             <parameter key="0" value="pB.true.real.label"/>
             <parameter key="1" value="NA.true.integer.attribute"/>
             <parameter key="2" value="NB.true.integer.attribute"/>
             <parameter key="3" value="U.true.real.attribute"/>
             <parameter key="4" value="Rg2.true.real.attribute"/>
             <parameter key="5" value="a.true.real.attribute"/>
             <parameter key="6" value="Q4.true.real.attribute"/>
             <parameter key="7" value="Q6.true.real.attribute"/>
             <parameter key="8" value="I1.true.real.attribute"/>
             <parameter key="9" value="I2.true.real.attribute"/>
             <parameter key="10" value="I3.true.real.attribute"/>
             <parameter key="11" value="Ncore.true.integer.attribute"/>
             <parameter key="12" value="Nconpart.true.integer.attribute"/>
             <parameter key="13" value="Ncompactpart.true.integer.attribute"/>
           </list>
         </operator>
         <operator activated="true" class="normalize" compatibility="6.4.000" expanded="true" height="94" name="Normalize" width="90" x="313" y="75"/>
         <operator activated="true" class="stringcoordinate:string_coordinate_extension" compatibility="0.1.000" expanded="true" height="130" name="String Coordinate" width="90" x="514" y="75">
           <parameter key="fixed_attribute" value="U"/>
           <parameter key="use_committor_data" value="true"/>
           <parameter key="attribute_for_NA" value="NA"/>
           <parameter key="attribute_for_NB" value="NB"/>
           <parameter key="reverse_string" value="true"/>
         </operator>
         <connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="String Coordinate" to_port="training set"/>
         <connect from_op="String Coordinate" from_port="model" to_port="result 1"/>
         <connect from_op="String Coordinate" from_port="stringCoordinateVisualization" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
  • leitoldleitold Member Posts: 22 Contributor I
    Martin,

    cool that you have already tested the extension. I will try to answer your questions one by one...
    Martin Schmitz wrote:
    Can you point me to some (industrial) use cases for this?
    In principle, with the string reaction coordinate, one should be able to predict the progress of _any_ process that has a well-defined start and end point. The basic idea is to project a very complex "reaction" or transition process, characterized by many variables or features, onto a one-dimensional progress coordinate ranging from, say, 0 (start) to 1 (end). So that could have a lot of possible applications; the question, of course, is whether it performs better than existing machine learning techniques.
    What is the implemented algorithm?
    I skimmed your diploma thesis. Is it the Metropolis algorithm with TPS?

    And now my key question: what is it for, and how do I use it?
    So let me tell you how i think this extension is used:
    You have an initial set of (measured?) values for some polymer. This is the data in your .csv file. From this "baseline" you try to simulate other outcomes under other conditions. Am I right about that?
    Let's start with the polymer: it is similar to the one investigated in my diploma thesis, though there are differences. Basically, instead of using the Metropolis algorithm, we perform a Langevin molecular dynamics simulation. That is, the actual physical movement of the particles that comprise the polymer is simulated. The "Langevin" part of the simulation says that we also would like to consider the presence of some solvent, say, water, in an implicit way: by simply adding a friction term as well as random forces to the equation of motion.

    Now, it turns out that when the conditions are right, that is, the temperature has a certain value, the polymer can exist in either one of two stable states: folded or unfolded. What we would like to learn is what the transition between these two states looks like on a molecular level. The data in the csv file are obtained by preparing the polymer in some "random" (intermediate) configuration and then starting the simulation. After some time, the polymer will end up in either state A (unfolded) or state B (folded). Now, due to the stochastic nature of Langevin molecular dynamics, if we repeat this experiment many times, some number of simulation runs will have ended in A, and another number in B. That is what NA and NB in the data file mean. pB, or the committor, is calculated as pB = NB / (NA + NB) and is hence also called the folding probability, as it is the probability for a given configuration to end up in the folded state. All the other variables characterize the configuration of the polymer. So the _goal_ of the analysis is to predict the folding probability, pB, given a vector of variables characterizing the polymer configuration.
    If so - What could be other use cases for this?

    Think about the following: You have a company A producing a product, which is created in some chemical reaction based on a recipe. Using the recipes you can try to predict the quality of the product. Can your algorithm be used to predict other recipe <-> outcome pairs?
    I'm afraid it doesn't work that way. However, given some reaction, you might be able to predict the most efficient way to drive the reaction, i.e. the best recipe.
    I have tested it on my Windows machine.
    At first it ran fine. After some changes to the settings (those you mention on your webpage) I got a
    Process failed - 11
    error message.
    The stack trace and process XML are attached.
    OK, I have tried to run your process on my installation, and it failed with the same error. I am pretty sure that the problem is in the Normalize operator: it is not configured. The optimization in String Coordinate implicitly assumes that all variables (except NA and NB of course, which are integer numbers) are within [0, 1], so I use the range transformation. If values are way outside this range, the operator fails with an ArrayIndexOutOfBoundsException for a reason I have not yet identified. I will definitely investigate, thanks! I am at a conference right now, so it might take some time until I can find the problem.

    Cheers,
    Lennex
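The committor defined in the post above, pB = NB / (NA + NB), can be sketched in Java as follows. This is only an illustration of the formula, not code from the extension; `Committor` and `foldingProbability` are hypothetical names, and rejecting the case NA + NB = 0 is a design choice made here because pB is undefined without any finished runs.

```java
// Illustration of pB = NB / (NA + NB); not taken from the extension's source.
final class Committor {

    /**
     * Folding probability for one starting configuration.
     *
     * @param nA number of simulation runs that ended in state A (unfolded)
     * @param nB number of simulation runs that ended in state B (folded)
     */
    static double foldingProbability(int nA, int nB) {
        if (nA < 0 || nB < 0) {
            throw new IllegalArgumentException("run counts must be non-negative");
        }
        long total = (long) nA + nB; // long avoids int overflow for very large counts
        if (total == 0) {
            throw new IllegalArgumentException("pB is undefined without any finished runs");
        }
        return (double) nB / total;
    }

    public static void main(String[] args) {
        // 25 runs ended unfolded, 75 ended folded
        System.out.println(foldingProbability(25, 75)); // prints 0.75
    }
}
```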
  • StaryVenaStaryVena Member Posts: 126  Maven
    Marco Boeck wrote:

    Hi,

    since 6.5 the main entry point is the com.rapidminer.launcher.GUILauncher class. If you use that to start Studio in Eclipse, it should find everything automatically.

    Regards,
    Marco
    Hello Marco,
    I can't find the launcher package or the GUILauncher class in the repository - https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer
    Am I missing something, or where is the class located?

    Best wishes
    Vaclav
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    sorry, my bad. Ignore that post :)
    RapidMinerGUI.main() is fine.

    Regards,
    Marco
  • leitoldleitold Member Posts: 22 Contributor I
    I have fixed the bug discovered by Martin, so again the download link for the latest version:

    http://christian.leitold.info/index.php?page=string-coordinate

    It was pure chance that the bug didn't occur in the demo: the normalization simply changed the order of attributes and thus prevented it. While the extension will no longer crash, it is still important to have all your attributes in the interval [0,1] to get meaningful results.

    Cheers,
    Lennex
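For readers who want to check their data against the [0,1] requirement, here is a minimal Java sketch of a range (min-max) transformation for a single attribute column, plus a range check. It is a generic illustration under stated assumptions, not the code of the Normalize operator or of the extension; `RangeTransform`, `toUnitInterval` and `inUnitInterval` are hypothetical names, and mapping a constant column to 0 is an arbitrary choice.

```java
import java.util.Arrays;

// Generic min-max scaling sketch; not the Normalize operator's implementation.
final class RangeTransform {

    /** Scales one attribute column linearly to [0, 1]; a constant column maps to 0. */
    static double[] toUnitInterval(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            scaled[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    /** Returns true only if every value lies in [0, 1]; NaN counts as out of range. */
    static boolean inUnitInterval(double[] values) {
        for (double v : values) {
            if (!(v >= 0.0 && v <= 1.0)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] scaled = toUnitInterval(new double[] {2.0, 4.0, 6.0});
        System.out.println(Arrays.toString(scaled)); // prints [0.0, 0.5, 1.0]
        System.out.println(inUnitInterval(scaled));  // prints true
    }
}
```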
  • mschmitzmschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,107  RM Data Scientist
    Hi again,

    why don't you put the normalization inside your operator if it does not work otherwise? Would prevent people from doing "wrong" things.

    Cheers,
    Martin
  • leitoldleitold Member Posts: 22 Contributor I
    Hi,
    Martin Schmitz wrote:
    why don't you put the normalization inside your operator if it does not work otherwise? Would prevent people from doing "wrong" things.
    I have thought about that, and my idea was that the operator should only do one thing, especially as there _is_ already an operator to do the normalization. But now that you mention it, I will add an option to do the normalization within the new operator.

    Cheers,
    Lennex