"[SOLVED] Plotting in RM / result overview: extend scatter plot"
Hello!
I have recently implemented a new learning operator in RM 5, so far only for my own testing purposes. The basic idea is to construct a regression as a projection onto a one-dimensional "string" which resides in the feature space, so obviously only real-valued features can be used. In the end, the "string" is just a number of n-dimensional points connected by straight lines, where n is the number of features used, and some geometric procedure is then used to project a feature vector onto the string.
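The post does not say which geometric procedure is used. One common choice, shown here purely as an assumption, is to take the closest point on any segment of the string. A minimal sketch in plain Java (the class and method names are hypothetical, not part of RapidMiner):

```java
/** Sketch: closest-point projection of a feature vector onto a piecewise-linear "string". */
final class StringProjection {

    /** Index of the closest segment, position t in [0, 1] along it, and squared distance to it. */
    static final class Projection {
        final int segment;
        final double t;
        final double squaredDistance;

        Projection(int segment, double t, double squaredDistance) {
            this.segment = segment;
            this.t = t;
            this.squaredDistance = squaredDistance;
        }
    }

    /**
     * knots[i] is the i-th n-dimensional point of the string; consecutive knots are
     * joined by straight lines. Returns null if there are fewer than two knots.
     */
    static Projection project(double[][] knots, double[] x) {
        Projection best = null;
        for (int i = 0; i + 1 < knots.length; i++) {
            double[] a = knots[i];
            double[] b = knots[i + 1];
            double dot = 0.0;
            double len2 = 0.0;
            for (int d = 0; d < x.length; d++) {
                double ab = b[d] - a[d];
                dot += (x[d] - a[d]) * ab;
                len2 += ab * ab;
            }
            // Clamp t so the foot point stays on the segment; a zero-length segment collapses to a.
            double t = len2 == 0.0 ? 0.0 : Math.max(0.0, Math.min(1.0, dot / len2));
            double dist2 = 0.0;
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - (a[d] + t * (b[d] - a[d]));
                dist2 += diff * diff;
            }
            if (best == null || dist2 < best.squaredDistance) {
                best = new Projection(i, t, dist2);
            }
        }
        return best;
    }
}
```

The pair (segment, t) locates the projected point on the string; how that position is turned into a predicted label is up to the learner and is not covered here.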
Now I would like to visualize the result in a plot. I am thinking of something like the standard scatter plot, where you can select two features as x and y and a third for the color code. In addition, I want to draw the string in the same plot, so that one can see how well it "fits" the data. My problem is that I have no idea where to implement this. In particular, as far as I understand it, the scatter plot is only a way to visualize the ExampleSet which represents the data, and of course the example set contains no information about the learner that was used. On the other hand, the model does not know anything about the training data. So in the end, I would need a "thing" which takes as input an example set (with or without predicted labels, it does not matter) as well as the model from my new operator, and creates a plot which shows both the "string" and the points from the example set in the same drawing. I would be grateful for any hints on how to do this!
Thanks a lot!
jlennex
Answers
There are at least two ways to do this:
1) You can define your own IOObject for this. You define them in the ioobjectsExtensionname.xml file. There you can specify your own renderer, which lets you control how the output of your operator is displayed to the user. The advantage is full control over the result display; however, it will be a bit tricky to use the result of your operator with other RapidMiner Studio operators.
2) You can attach arbitrary data to an IOObject (e.g. an ExampleSet) via the #setUserData(String key, Object value) method. Retrieving works via #getUserData(String key). Your operator could attach a POJO to your result ExampleSet, which is then retrieved by your new plot. The advantage here is that you don't need to create your own IOObject and you can freely use the results of your operator as input for default RM Studio operators. However, it might be a bit hard to add your custom plot for it (and only for it).
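The second option can be sketched as follows. To keep the example runnable without RapidMiner on the classpath, ResultObject is a hypothetical stand-in that only mimics the two user-data methods named above; the key name and the StringKnots POJO are likewise assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class UserDataSketch {

    /** Hypothetical stand-in for an IOObject; it only mimics the two user-data methods. */
    static class ResultObject {
        private final Map<String, Object> userData = new HashMap<>();

        void setUserData(String key, Object value) {
            userData.put(key, value);
        }

        Object getUserData(String key) {
            return userData.get(key);
        }
    }

    /** POJO carrying the points of the "string" (assumed layout: one row per point). */
    static final class StringKnots {
        final double[][] knots;

        StringKnots(double[][] knots) {
            this.knots = knots;
        }
    }

    /** Hypothetical key under which the operator stores the POJO. */
    static final String KEY = "string_knots";

    /** Operator side: attach the string to the result object. */
    static void attach(ResultObject result, double[][] knots) {
        result.setUserData(KEY, new StringKnots(knots));
    }

    /** Plotter side: returns the knots, or null if nothing suitable is attached. */
    static double[][] knotsFor(ResultObject result) {
        Object data = result.getUserData(KEY);
        return data instanceof StringKnots ? ((StringKnots) data).knots : null;
    }
}
```

The type check on the plotter side matters because any code may store anything under a given key; a plot that finds no usable data can simply fall back to the ordinary scatter plot.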
Regards,
Marco
Thanks a lot for your help. I think I will go for the second suggestion. I might just attach an object containing all the information needed to draw the string to the example set, and then create a modified version of the "scatter plot" plotter that can use this information.
Regards,
jlennex
It seems that in RM 5.3, the #setUserData method is not available for an IOObject. Is there an easy way to work around this, or would it be better to use RM 6 instead?
Also, so far I am having trouble finding out how exactly the ExampleSet from the output port is converted to the DataTable that is used for the actual plotting procedure in the end (I have already created a custom plotter which is an extension of the ScatterPlotter2 class).
Kind regards,
Lennex
If your extension is supposed to work in 5.3, you will have to either use something which works in both versions or use different code depending on the version the extension is running on.
You could also maintain two extensions (one for 5, one for 6). That decision is up to you. If you don't need to support RM Studio 5, I would suggest using version 6 because it's a major improvement over version 5.
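One generic way to run different code depending on the version is to probe for the method via reflection instead of calling it directly. The sketch below is plain Java and only illustrates the technique: NewStyleResult and OldStyleResult are hypothetical stand-ins for an object with and without #setUserData, not RapidMiner classes.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

final class VersionProbe {

    /** Hypothetical stand-in for an object from a version that has setUserData. */
    public static class NewStyleResult {
        public final Map<String, Object> stored = new HashMap<>();

        public void setUserData(String key, Object value) {
            stored.put(key, value);
        }
    }

    /** Hypothetical stand-in for an object from a version without setUserData. */
    public static class OldStyleResult {
    }

    /**
     * Calls target.setUserData(key, value) if such a public method exists.
     * Returns false if it does not, so the caller can switch to a fallback.
     */
    static boolean trySetUserData(Object target, String key, Object value) {
        try {
            Method m = target.getClass().getMethod("setUserData", String.class, Object.class);
            m.invoke(target, key, value);
            return true;
        } catch (NoSuchMethodException e) {
            // Older API: the method is simply not there.
            return false;
        } catch (ReflectiveOperationException e) {
            // The method exists but could not be called; treat this as a real error.
            throw new IllegalStateException("setUserData exists but failed", e);
        }
    }
}
```

Because the method is looked up by name at runtime, the same compiled extension loads on both versions; the price is that typos in the method name are only detected when the code runs.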
To get a DataTable you can use the DataTableExampleSetAdapter class. There is a constructor DataTableExampleSetAdapter(ExampleSet exampleSet, AttributeWeights weights, boolean ignoreId) which takes the ExampleSet, the weights (or null) and a simple boolean as parameters. It does not copy the ExampleSet, so it does not waste memory.
Regards,
Marco
OK, thanks, I might just use RM 6 instead, but so far I could not find any documentation for extension development in version 6.
Also, what I meant by the earlier question is this: suppose I attach the necessary additional data to an ExampleSet which is delivered at the output port of my new operator. Now, somewhere in between, a DataTable is extracted from the ExampleSet and passed to the plotter class. There are two problems with that:
a) I do not know where exactly this happens when I execute the process (my first question).
b) I need the additional data as well, so somehow I have to pass it on too, which means the existing code in between also has to be modified, so in the end this might all get very messy.
Considering all that, it might be better to create a new IOObject after all, and create a custom renderer for it, as in your first suggestion.
Related to that, is it possible to track the "flow" of information in a RM process in Eclipse? In particular, I would like to "see" what happens to the output of my operator when I connect the output port to the result port on the right-hand side of the process view.
Kind regards,
Lennex
a) See DockableResultDisplay#showData(final IOContainer container, final String statusMessage). This will take you to Renderer#getVisualizationComponent(Object renderable, IOContainer ioContainer) which shows you available implementations when pressing Ctrl-T on it in Eclipse. ExampleSetDataRenderer is the one for ExampleSets.
b) You are correct, this would be messy.
Regarding the "flow" of information, have a look at the Process#run() method. This is basically the whole process and you can see what is going on. At the end of a process, Mainframe#processEnded() is interesting as it creates the actual result display.
Regards,
Marco
Thanks again for all your suggestions. I thought it might be a good idea to actually sketch my solution here. In the end, I just created an additional ExampleSet with the string points to be plotted, and an extra attribute indicating that these points are special. Then, this ExampleSet is merged with the original one, and the merged set is delivered to an additional output port. The relevant part of the code is basically taken from the ExampleSetUnion operator.
Plotting is then handled by an extension of the ScatterPlotter2 class, which detects the special points and plots them on top of the original data as dots connected with straight lines. The approach with the ExampleSet has the advantage that one can immediately inspect all the values of the string points, in addition to plotting them.
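The idea of merging the string points into the data and marking them with an extra attribute can be illustrated independently of RapidMiner. The following is a plain-Java sketch under that assumption; it is not the ExampleSetUnion code, and Row, merge and select are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class MergedPlotData {

    /** One merged row: the feature values plus the marker that plays the role of the extra attribute. */
    static final class Row {
        final double[] values;
        final boolean stringPoint;

        Row(double[] values, boolean stringPoint) {
            this.values = values;
            this.stringPoint = stringPoint;
        }
    }

    /** Appends the string points to the data rows, flagging them as special. */
    static List<Row> merge(double[][] data, double[][] knots) {
        List<Row> merged = new ArrayList<>();
        for (double[] d : data) {
            merged.add(new Row(d, false));
        }
        for (double[] k : knots) {
            merged.add(new Row(k, true));
        }
        return merged;
    }

    /** Returns the rows whose marker equals stringPoint, in their original order. */
    static List<double[]> select(List<Row> rows, boolean stringPoint) {
        List<double[]> result = new ArrayList<>();
        for (Row r : rows) {
            if (r.stringPoint == stringPoint) {
                result.add(r.values);
            }
        }
        return result;
    }
}
```

A plotter built on this would draw select(rows, false) as ordinary scatter points and connect select(rows, true) in order with straight lines. Keeping the string points in their original order is what makes the connected line meaningful.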
Regards,
Lennex
Cool, thanks for sharing.
Regards,
Marco