"Move Rapidminer project from one eclipse installation to another"

leitold · September 2015

Hello,

after not working on the project for a long time, I am finally in the process of finishing my Rapidminer extension to implement a method from computational physics. Now, in order to test it thoroughly, I would like to copy my whole Eclipse project to another machine, ideally on another platform (I did the development on Linux and would like to test everything on Mac OS and / or Windows as well).

So, in Eclipse, I did a File -> Export -> Archive File on the source machine, and then the same with "Import" on the target machine. However, RM simply fails to run there, with the error "Could not find or load main class com.rapidminer.gui.RapidMinerGUI". The problem persists whether the target machine runs Linux or Mac OS. As a matter of fact, the Linux target is just the same Eclipse version under another user account on my source machine. It seems to me that some configuration left over from the original project prevents the new project from properly executing RM, but so far I have not found out what exactly the problem is.

Thanks a lot for your help in advance!
Lennex

Marco_Boeck · September 2015

Hi,

I assume the goal is to test the extension on different OS? If so, you can just install Studio on these systems and place your extension in the USER_HOME/.RapidMiner/extensions folder. It will be picked up by Studio 6.4 and later and will be loaded by Studio.

Regards,
Marco

MartinLiebig · September 2015

Hi Lennex,

just for the record: I am very interested in that extension. I hold a PhD in astroparticle physics, so I am always curious what other physicists are doing with RM.

Cheers,
Martin

leitold · September 2015

Marco Boeck wrote:

I assume the goal is to test the extension on different OS? If so, you can just install Studio on these systems and place your extension in the USER_HOME/.RapidMiner/extensions folder. It will be picked up by Studio 6.4 and later and will be loaded by Studio.

Hello Marco, thanks for your suggestion. I actually use the community version 5.3, so the folder is lib/plugins in that case. The problem with that approach is that on Mac OS my extension does not work properly. In particular, some of the settings in the operator have no effect, i.e. they are not saved by RM at all, and hence the operator fails. So I thought it would be a good idea for debugging to copy the whole project to the Mac OS machine, and not only the extension. Strangely enough, on another Linux machine my extension works fine.

@Martin:
Don't worry, I will post the final result here soon :-).

Cheers,
Lennex

Marco_Boeck · September 2015

Hi,

I'd encourage you to have a look at our Studio 6.5 Community release then, because a lot has improved in the past years:

https://rapidminer.com/get-more-open-core/

I can't really help with setting up 5.3 for development, sorry.
If I had to hazard a guess, text fields are not working, correct? If so, pressing Enter after editing will fix that in 5.3.

Regards,
Marco

leitold · September 2015

OK, I will definitely look into the 6.5 Community release. In 5.3 on the Mac, pressing Enter does not fix the problem though (it does indeed occur for text fields, regardless of whether they accept any string or only integer values).

Cheers,
Christian

MartinLiebig · September 2015

Hey,

there was a bug with this effect. It happened because Apple changed its Java version. In 6.5 this should be fixed.

Best,
Martin

leitold · September 2015

So, I have now _kind of_ successfully imported the RM 6.5 source from GitHub (https://github.com/rapidminer/rapidminer-studio) into my Eclipse workspace. However, the build fails because one import cannot be resolved:

"The import com.rapidminer.license cannot be resolved".

Apart from that, everything seems to work, but of course I cannot test it properly without the missing import.

Cheers,
Christian

Marco_Boeck · September 2015

Hi,

did you refresh your Gradle dependencies?

Regards,
Marco

leitold · September 2015

Hm, I'm afraid I don't know how to do that. What is Gradle and how can I update its dependencies in Eclipse?

Cheers,
Lennex

Marco_Boeck · September 2015

Hi,

that's our build system. We no longer use Ant but Gradle instead, hence all the build.gradle files.

You need to import that as a Gradle project in Eclipse and then refresh the Gradle dependencies.
An updated Developer "How to" will follow at some point, I don't have any dates though, I'm afraid.

Regards,
Marco

leitold · September 2015

Wow, so finally I got RM 6.5 working from within my Eclipse. It turned out that my previous Eclipse version was way too old and had no Gradle support, so I had to upgrade to the latest version before I could finally add the Gradle plugin.

I still have a rather weird issue: when trying to start, I get an exception when RM tries to check its version. In particular, it is happening in PlatformUtilities.java, in the part

	/**
	 * Initializes the current version by reading the version.properties file
	 */
	private static void initializeReleaseVersion() {
		synchronized (INIT_VERSION_LOCK) {
			currentVersion = readResourceProperty(VERSION_PROPERTY_KEY);
			if (currentVersion == null) {
				logInfo("Could not read current version from resources. Looking for 'gradle.properties'...");
				currentVersion = readConfigProperty(GRADLE_PROPERTIES, VERSION_PROPERTY_KEY);
				if (currentVersion == null) {
					throw new IllegalStateException("Could not initialize RapidMiner Studio version from properties file");
				}
			}
		}
	}

the IllegalStateException is thrown. So far I have simply hardcoded the version number and commented out the throw statement here, and now RM starts and works fine, but that seems like a rather stupid "solution"...

Anyway, thanks a lot for your patience!
Lennex

leitold · September 2015

I have found a somewhat better solution: In the "Arguments" tab of "Run configurations", I can pass variables, in particular using

-Drapidminer.home=/path/to/my/directory/where/rm/resides

Then, the gradle.properties file is found without problems.
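For anyone who wants to check this setup before launching RM: a -D option only becomes a Java system property when it is passed as a VM argument, not as a program argument. The following minimal Java sketch reads that property and looks for gradle.properties in the given directory. The class name, the helper method, and the property key "version" are assumptions made for illustration; the thread does not show the actual values of GRADLE_PROPERTIES or VERSION_PROPERTY_KEY in PlatformUtilities.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;

final class RapidMinerHomeCheck {

    /**
     * Reads one key from <home>/gradle.properties.
     * Returns an empty Optional if the file or the key is missing.
     * Hypothetical helper, not RapidMiner's own lookup code.
     */
    static Optional<String> readProperty(Path home, String key) {
        Path file = home.resolve("gradle.properties");
        if (!Files.isRegularFile(file)) {
            return Optional.empty();
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Optional.ofNullable(props.getProperty(key));
    }

    public static void main(String[] args) {
        // Set via the VM argument -Drapidminer.home=/path/to/rm
        String home = System.getProperty("rapidminer.home");
        if (home == null) {
            System.out.println("rapidminer.home is not set");
            return;
        }
        // "version" is an assumed key name, see the note above.
        Optional<String> version = readProperty(Paths.get(home), "version");
        System.out.println(version.map(v -> "Found version " + v)
                .orElse("No version found in " + home));
    }
}
```

If this prints that the property is not set, the -D option most likely ended up in the program arguments field of the run configuration rather than in the VM arguments field.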

Cheers,
Lennex

Marco_Boeck · September 2015

Edit: Sorry, my bad. Disregard this post.

leitold · September 2015

Hm, I cannot find this class in my version. Never mind, it now runs satisfactorily and it is definitely RM Studio 6.5 :-).

Cheers,
Lennex

leitold · September 2015

So, this is all still work in progress, but you can find a somewhat preliminary version of my extension here:

http://christian.leitold.info/index.php?page=string-coordinate

I'm afraid it's all rather technical if you are not from my particular field of computational physics, but maybe it's interesting anyway :-). Once the code is properly cleaned up and documented, I would also like to publish the extension on the Marketplace. Are there any particular requirements apart from licensing it under AGPL 3?

Cheers,
Lennex

Marco_Boeck · September 2015

Hi,

no, there are no particular requirements at the moment.

Regards,
Marco

MartinLiebig · September 2015

Hi Lennex,

I tried out the extension and have a few questions:

Can you point me to some (industrial) use cases for this?

What is the implemented algorithm?
I quickly skimmed your diploma thesis. Is it the Metropolis algorithm with TPS?

And now my key question: what is it for, and how do I use it?
So let me tell you how I think this extension is used:
You have an initial set of (measured?) values for some polymer. This is the data in your .csv file. From this "baseline" you try to simulate other outcomes under other conditions. Am I right about that?

If so, what could be other use cases for this?

Think about the following: you have a company A producing a product that is created in some chemical reaction based on a recipe. Using the recipes, you can try to predict the quality of the product. Can your algorithm be used to predict other recipe <-> outcome pairs?

I have tested it on my Windows machine. At first it ran fine. After some changes to the settings (those you mention on your webpage) I got a "Process failed - 11" error message.
Stack trace and process XML are attached.

Best,
Martin


Exception: java.lang.ArrayIndexOutOfBoundsException
Message: 11
Stack trace:
  cl.stringcoordinate.StringCoordinate.learn(StringCoordinate.java:291)
  com.rapidminer.operator.learner.AbstractLearner.doWork(AbstractLearner.java:142)
  com.rapidminer.operator.Operator.execute(Operator.java:974)
  com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:35)
  com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:779)
  com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:377)
  com.rapidminer.operator.Operator.execute(Operator.java:974)
  com.rapidminer.Process.run(Process.java:1037)
  com.rapidminer.Process.run(Process.java:939)
  com.rapidminer.Process.run(Process.java:892)
  com.rapidminer.Process.run(Process.java:887)
  com.rapidminer.Process.run(Process.java:877)
  com.rapidminer.gui.ProcessThread.run(ProcessThread.java:51)

Process


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.4.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="6.4.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\Martin\Desktop\database_polymer.csv"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="pB.true.real.label"/>
          <parameter key="1" value="NA.true.integer.attribute"/>
          <parameter key="2" value="NB.true.integer.attribute"/>
          <parameter key="3" value="U.true.real.attribute"/>
          <parameter key="4" value="Rg2.true.real.attribute"/>
          <parameter key="5" value="a.true.real.attribute"/>
          <parameter key="6" value="Q4.true.real.attribute"/>
          <parameter key="7" value="Q6.true.real.attribute"/>
          <parameter key="8" value="I1.true.real.attribute"/>
          <parameter key="9" value="I2.true.real.attribute"/>
          <parameter key="10" value="I3.true.real.attribute"/>
          <parameter key="11" value="Ncore.true.integer.attribute"/>
          <parameter key="12" value="Nconpart.true.integer.attribute"/>
          <parameter key="13" value="Ncompactpart.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="normalize" compatibility="6.4.000" expanded="true" height="94" name="Normalize" width="90" x="313" y="75"/>
      <operator activated="true" class="stringcoordinate:string_coordinate_extension" compatibility="0.1.000" expanded="true" height="130" name="String Coordinate" width="90" x="514" y="75">
        <parameter key="fixed_attribute" value="U"/>
        <parameter key="use_committor_data" value="true"/>
        <parameter key="attribute_for_NA" value="NA"/>
        <parameter key="attribute_for_NB" value="NB"/>
        <parameter key="reverse_string" value="true"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="String Coordinate" to_port="training set"/>
      <connect from_op="String Coordinate" from_port="model" to_port="result 1"/>
      <connect from_op="String Coordinate" from_port="stringCoordinateVisualization" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

leitold · September 2015

Martin,

cool that you have already tested the extension. I will try to answer your questions one by one...

Martin Schmitz wrote:
Can you point me to some (industrial) use cases for this?

In principle, with the string reaction coordinate, one should be able to predict the progress of _any_ process that has a well-defined start and end point. The basic idea is to project a very complex "reaction" or transition process, characterized by many variables or features, onto a one-dimensional progress coordinate ranging from, say, 0 (start) to 1 (end). So it could have a lot of possible applications. The question is, of course, whether it can perform better than existing machine learning techniques.

What is the implemented algorithm?
I quickly skimmed your diploma thesis. Is it the Metropolis algorithm with TPS?

And now my key question: what is it for, and how do I use it?
So let me tell you how I think this extension is used:
You have an initial set of (measured?) values for some polymer. This is the data in your .csv file. From this "baseline" you try to simulate other outcomes under other conditions. Am I right about that?

Let's start with the polymer: it is similar to the one investigated in my diploma thesis, though there are differences. Basically, instead of using the Metropolis algorithm, we perform a Langevin molecular dynamics simulation. That is, the actual physical movement of the particles that comprise the polymer is simulated. The "Langevin" part of the simulation says that we also would like to consider the presence of some solvent, say, water, in an implicit way: by simply adding a friction term as well as random forces to the equation of motion.
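For readers outside the field, the friction term and random forces described above correspond to the textbook Langevin equation of motion. The form below is the standard one, not something stated in this thread: the symbols (particle mass m_i, position r_i, potential energy U, friction coefficient gamma, temperature T, Boltzmann constant k_B, random force R_i) are conventional notation, and the actual simulation may differ in its details.

```latex
m_i \ddot{\mathbf{r}}_i
  = -\nabla_{\mathbf{r}_i} U(\mathbf{r}_1, \dots, \mathbf{r}_N)
    - \gamma m_i \dot{\mathbf{r}}_i
    + \mathbf{R}_i(t),
\qquad
\langle \mathbf{R}_i(t) \rangle = 0,
\qquad
\langle R_{i\alpha}(t)\, R_{j\beta}(t') \rangle
  = 2 \gamma m_i k_B T \,\delta_{ij}\,\delta_{\alpha\beta}\,\delta(t - t')
```

The second and third relations say that the random force has zero mean and is uncorrelated in time, with a strength tied to the friction and the temperature, which is what makes repeated runs from the same configuration end in different states.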

Now, it turns out that when the conditions are right, that is, when the temperature has a certain value, the polymer can exist in either one of two stable states: folded or unfolded. What we would like to learn is what the transition between these two states looks like on a molecular level. The data in the csv file are obtained by preparing the polymer in some "random" (intermediate) configuration and then starting the simulation. After some time, the polymer will end up either in state A (unfolded) or in state B (folded). Now, due to the stochastic nature of Langevin molecular dynamics, if we repeat this experiment many times, some number of simulation runs will have ended in A and another number in B. That is what NA and NB in the data file mean. pB, or the committor, is calculated as pB = NB / (NA + NB) and is hence also called the folding probability, as it is the probability for a given configuration to end up in the folded state. All the other variables characterize the configuration of the polymer. So the _goal_ of the analysis is to predict the folding probability, pB, given a vector of variables characterizing the polymer configuration.
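As a small illustration of the formula pB = NB / (NA + NB), here is a minimal Java sketch. The class and method names are made up for this example and are not taken from the extension's source; the handling of negative counts and of NA + NB = 0 is likewise an assumption.

```java
final class Committor {

    /**
     * Folding probability pB = NB / (NA + NB).
     * nA = number of runs that ended unfolded (state A),
     * nB = number of runs that ended folded (state B).
     * Hypothetical helper, not part of the String Coordinate extension.
     */
    static double foldingProbability(int nA, int nB) {
        if (nA < 0 || nB < 0) {
            throw new IllegalArgumentException("run counts must be non-negative");
        }
        long total = (long) nA + nB; // long avoids int overflow for huge counts
        if (total == 0) {
            throw new IllegalArgumentException("at least one run is required");
        }
        return (double) nB / total;
    }

    public static void main(String[] args) {
        // Example: 100 runs from one configuration, 37 of which ended folded.
        System.out.println(foldingProbability(63, 37));
    }
}
```

A configuration with pB close to 0 almost always unfolds, one with pB close to 1 almost always folds, and values around 0.5 mark configurations from which both outcomes are about equally likely.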

If so, what could be other use cases for this?

Think about the following: you have a company A producing a product that is created in some chemical reaction based on a recipe. Using the recipes, you can try to predict the quality of the product. Can your algorithm be used to predict other recipe <-> outcome pairs?

I'm afraid it doesn't work that way. However, given some reaction, you might be able to predict the most efficient way to drive the reaction, i.e. the best recipe.

I have tested it on my Windows machine. At first it ran fine. After some changes to the settings (those you mention on your webpage) I got a "Process failed - 11" error message.
Stack trace and process XML are attached.

OK, I have tried to run your process on my installation, and it failed with the same error. I am pretty sure that the problem is in the normalization operator: it is not configured. The optimization in String Coordinate implicitly assumes that all variables (except NA and NB of course, which are integers) are within [0, 1], so I use the range transformation. If values are way outside this range, the operator fails with an ArrayIndexOutOfBoundsException, for a reason I do not know yet. I will definitely investigate, thanks! I am at a conference right now, so it might take some time until I can find the problem.
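To make the [0, 1] requirement concrete, here is a minimal Java sketch of a min-max range transformation for one attribute column. It only illustrates the idea; it is not the code of the Normalize operator or of the extension, and the class name and the choice to map a constant column to 0 are assumptions.

```java
final class RangeTransform {

    /**
     * Rescales one attribute column linearly so that its minimum maps to 0
     * and its maximum maps to 1. A constant column (max == min) is mapped
     * to all zeros, which is an arbitrary choice made for this sketch.
     */
    static double[] toUnitInterval(double[] values) {
        double[] scaled = new double[values.length];
        if (values.length == 0) {
            return scaled;
        }
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double span = max - min;
        for (int i = 0; i < values.length; i++) {
            scaled[i] = (span == 0.0) ? 0.0 : (values[i] - min) / span;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] column = {2.0, 4.0, 6.0};
        System.out.println(java.util.Arrays.toString(toUnitInterval(column)));
    }
}
```

In the process itself, the same effect comes from setting the Normalize operator's method to range transformation with the range 0 to 1, instead of leaving the operator unconfigured.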

Cheers,
Lennex

StaryVena · September 2015

Marco Boeck wrote:

Hi,

since 6.5 the main entry point is the com.rapidminer.launcher.GUILauncher class. If you use that to start Studio in Eclipse, it should find everything automatically.

Regards,
Marco

Hello Marco,
I can't find the launcher package or the GUILauncher class in the repository - https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer
Am I missing something, or where is the class located?

Best wishes
Vaclav

Marco_Boeck · September 2015

Hi,

sorry, my bad. Ignore that post.

RapidMinerGUI.main() is fine.

Regards,
Marco

leitold · September 2015

I have fixed the bug discovered by Martin, so here is the download link for the latest version again:

http://christian.leitold.info/index.php?page=string-coordinate

It was pure chance that the bug didn't occur in the demo: the normalization simply changed the order of the attributes and thus prevented the bug from occurring. The extension will no longer crash, but it is still important to have all your attributes in the interval [0, 1] to get meaningful results.

Cheers,
Lennex

MartinLiebig · September 2015

Hi again,

why don't you put the normalization inside your operator if it does not work otherwise? Would prevent people from doing "wrong" things.

Cheers,
Martin

leitold · September 2015

Hi,

Martin Schmitz wrote:
why don't you put the normalization inside your operator if it does not work otherwise? Would prevent people from doing "wrong" things.

I have thought about that, and my idea was that the operator should only do one thing, especially as there _is_ already an operator to do the normalization. But now that you mention it, I will add an option to do the normalization within the new operator.

Cheers,
Lennex
