
"Modifying an existing operator (in separate extension)"

colocolo Member Posts: 236 Maven
edited May 2019 in Help
Hi,

I am working on a modification of the "Get Page" operator to address some issues that come up in daily use. To keep track of the modifications and to protect them from being overwritten by updates of the original operator, I want to integrate the modified operator into my own extension, where I develop supplementary operators as they become useful. I copied GetWebpageOperator.java from the Web Mining extension, renamed the class and added the operator to a group (using the XML files). But during startup of RapidMiner there are some errors:
WARNING: Failed to register operator: com.rapidminer.operator.OperatorCreationException: Operator cannot be constructed: 'get_page_advanced(com.rapidminer.matthias.webmining.retrieval.GetWebpageAdvancedOperator)': com/rapidminer/operator/text/Document
com.rapidminer.operator.OperatorCreationException: Operator cannot be constructed: 'get_page_advanced(com.rapidminer.matthias.webmining.retrieval.GetWebpageAdvancedOperator)': com/rapidminer/operator/text/Document
at com.rapidminer.operator.OperatorDescription.createOperatorInstance(OperatorDescription.java:347)
at com.rapidminer.tools.OperatorService.registerOperator(OperatorService.java:430)
at com.rapidminer.tools.OperatorService.parseOperators(OperatorService.java:260)
at com.rapidminer.tools.OperatorService.parseOperators(OperatorService.java:256)
at com.rapidminer.tools.OperatorService.parseOperators(OperatorService.java:256)
at com.rapidminer.tools.OperatorService.parseOperators(OperatorService.java:256)
at com.rapidminer.tools.OperatorService.parseOperators(OperatorService.java:232)
at com.rapidminer.tools.OperatorService.registerOperators(OperatorService.java:206)
at com.rapidminer.tools.plugin.Plugin.registerOperators(Plugin.java:471)
at com.rapidminer.tools.plugin.Plugin.registerAllPluginOperators(Plugin.java:725)
at com.rapidminer.tools.OperatorService.init(OperatorService.java:167)
at com.rapidminer.RapidMiner.init(RapidMiner.java:465)
at com.rapidminer.gui.RapidMinerGUI.run(RapidMinerGUI.java:221)
at com.rapidminer.gui.RapidMinerGUI.launch(RapidMinerGUI.java:505)
at com.rapidminer.gui.RapidMinerGUI.main(RapidMinerGUI.java:488)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.rapidminer.operator.OperatorDescription.createOperatorInstanceByDescription(OperatorDescription.java:360)
at com.rapidminer.operator.OperatorDescription.createOperatorInstance(OperatorDescription.java:339)
... 14 more
Caused by: java.lang.NoClassDefFoundError: com/rapidminer/operator/text/Document
at com.rapidminer.matthias.webmining.retrieval.GetWebpageAdvancedOperator.<init>(GetWebpageAdvancedOperator.java:64)
... 20 more
Caused by: java.lang.ClassNotFoundException: com.rapidminer.operator.text.Document
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at com.rapidminer.tools.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:102)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 21 more
Well, I think the problem is that the operator now lives in an isolated place. But the error refers to the Document type of the Text Processing extension, which is also a separate extension. What am I missing to make this work for my operator too? I tried to integrate the extended AbstractReader into my local project, but this didn't help. Since I have mainly worked on smaller Java projects until now, I am not really familiar with how different packages and libraries are connected (what is allowed and what is required). Perhaps you can give me a hint about what else is necessary?
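The NoClassDefFoundError at the bottom of the trace means that, at run time, the class loader of the new extension could not see com.rapidminer.operator.text.Document, even though the code compiled in Eclipse. A hypothetical diagnostic for this kind of problem (the class ClassVisibilityProbe is an illustration of my own, not part of RapidMiner) asks a given class loader which of a list of classes it can resolve:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of the RapidMiner API.
final class ClassVisibilityProbe {

    private ClassVisibilityProbe() { }

    /** Returns true if the named class can be resolved through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize = false: only resolve the class, do not run its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the class names that the loader cannot resolve, in the order given. */
    static List<String> missing(ClassLoader loader, String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (!isVisible(name, loader)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassVisibilityProbe.class.getClassLoader();
        System.out.println(missing(loader, "java.lang.String", "com.rapidminer.operator.text.Document"));
    }
}
```

Called from inside the operator with getClass().getClassLoader(), a check like this would show which classes the extension's own loader cannot reach; run standalone, it simply reports the Text Processing class as missing.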

Thanks in advance and best regards
Matthias

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    looks like your plugin cannot find the text extension. Did you define it as a dependency in your Ant build file?

    If not, add these lines to your build.xml:
    <property name="extension.needsVersion" value="5.1" />
    <property name="extension.dependencies" value="rmx_text[5.1]" />
    Regards,
    Marco
  • colocolo Member Posts: 236 Maven
    Hi Marco,

    thank you very much. I had only taken care of the build path and included the relevant projects there, but I didn't look at the build file. I'm not really familiar with build files...

    I defined the dependency and the error message is gone. But now a similar one appears, this time complaining about a class from the Web Mining extension. Since "Get Page" uses functionality from both extensions, I tried to add a second dependency, but this did not seem to work. Is it possible to define a second dependency? Or is there another way to get an operator working that depends on multiple extensions?

    I would be glad if you could help me with that once again.

    Best regards
    Matthias

    P.S. Maybe you are interested in my additions to the operator. They address character encoding (charset) issues that occur when retrieving websites and writing them into document objects. The current operator only checks the HTTP "Content-Type" header, which is often not set properly.
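    The post does not show what the additions actually do. One common way to go beyond the header check is to fall back to a charset declared inside the HTML itself, and only then to a default. The sketch below illustrates that order; the class and method names, the 2048-byte search window and the caller-supplied fallback are all assumptions of this sketch, not the Web Mining extension's code:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not the Web Mining extension's code): header charset first,
// then a charset declared inside the HTML, then a caller-supplied fallback.
final class CharsetGuesser {

    // Matches e.g. charset=UTF-8 or charset="utf-8"; the name must start with a letter or digit
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9._:-]*)", Pattern.CASE_INSENSITIVE);

    // Assumption of this sketch: only the first 2048 bytes are searched for a declaration
    private static final int SNIFF_BYTES = 2048;

    private CharsetGuesser() { }

    /** Returns the supported charset declared in text such as "text/html; charset=UTF-8", or null. */
    static Charset fromDeclaration(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return null;
    }

    /** Picks a charset: Content-Type header, then a declaration in the page head, then the fallback. */
    static Charset guess(String contentTypeHeader, byte[] body, Charset fallback) {
        Charset declared = fromDeclaration(contentTypeHeader);
        if (declared != null) {
            return declared;
        }
        // ISO-8859-1 maps every byte to one char, so ASCII markup stays readable for sniffing
        String head = new String(body, 0, Math.min(body.length, SNIFF_BYTES), StandardCharsets.ISO_8859_1);
        declared = fromDeclaration(head);
        return declared != null ? declared : fallback;
    }
}
```

A byte order mark, which browsers also honor, is not handled in this sketch.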
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    try this:
    <property name="extension.dependencies" value="rmx_text[5.1], rmx_web[5.1]" />
    Regards,
    Marco

    P.S.: I will point a developer involved in the web extension to this thread, who knows :)
  • colocolo Member Posts: 236 Maven
    Hi Marco,

    thanks for your reply. I had already tried this, but it had no effect (the second dependency seems to be ignored this way). I also tried
    <property name="extension.dependencies" value="rmx_text[5.1]; rmx_web[5.1]" />
    which results in the following error message during RM startup:
    Exception in thread "main" java.lang.UnsupportedOperationException: Only one dependent plugin allowed!
    at com.rapidminer.tools.plugin.Plugin.checkDependencies(Plugin.java:323)
    at com.rapidminer.tools.plugin.Plugin.registerAllPluginDescriptions(Plugin.java:653)
    at com.rapidminer.tools.plugin.Plugin.initAll(Plugin.java:875)
    at com.rapidminer.RapidMiner.init(RapidMiner.java:456)
    at com.rapidminer.gui.RapidMinerGUI.run(RapidMinerGUI.java:221)
    at com.rapidminer.gui.RapidMinerGUI.launch(RapidMinerGUI.java:505)
    at com.rapidminer.gui.RapidMinerGUI.main(RapidMinerGUI.java:488)
    Any further ideas? :)

    Regards
    Matthias
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    hm I did not know that ;D
    Have you tried just setting it to "rmx_web[5.1]"? The web plugin depends on the text plugin, so the text plugin should be included automatically (I hope..).
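    This workaround relies on dependencies being transitive: if rmx_web depends on rmx_text, then depending on rmx_web should also make the Text Processing classes reachable. The thread does not say how RapidMiner resolves this internally (note the "I hope"), so the following is only an illustration of the transitive-closure idea over a hypothetical dependency map, not RapidMiner's code:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: collects every extension reachable from one declared dependency.
final class TransitiveDeps {

    private TransitiveDeps() { }

    /** Depth-first walk over the map "extension -> direct dependencies"; cycles are tolerated. */
    static Set<String> closure(String declared, Map<String, List<String>> dependsOn) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(declared);
        while (!todo.isEmpty()) {
            String ext = todo.pop();
            if (seen.add(ext)) {
                for (String dep : dependsOn.getOrDefault(ext, Collections.emptyList())) {
                    todo.push(dep);
                }
            }
        }
        return seen;
    }
}
```

With a map in which rmx_web lists rmx_text as its only dependency, the closure of rmx_web contains both extensions.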

    Regards,
    Marco
  • colocolo Member Posts: 236 Maven
    Hi Marco,

    What an idea, as simple as it is brilliant! Now everything works fine :)

    But the general question of how to build extensions that depend on multiple other extensions persists, in particular when those extensions are unrelated and cannot be chained the way the Web Mining and Text Processing extensions can. This is nothing urgent for me right now, but it should perhaps be considered, especially because the property is called extension.dependencies, which is definitely plural :D
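    Conceptually, an extension with several unrelated dependencies needs a class loader that can delegate to more than one other loader. The thread does not show how RapidMiner implements this, so the following is only a hypothetical sketch of the idea (the class name is my own, and this is not RapidMiner's PluginClassLoader): the parent is asked first, as usual, and then each dependency's loader in turn.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not RapidMiner's PluginClassLoader: a loader that first asks its
// parent and then each dependency's loader in turn.
final class MultiDependencyClassLoader extends ClassLoader {

    private final List<ClassLoader> dependencyLoaders;

    MultiDependencyClassLoader(ClassLoader parent, List<ClassLoader> dependencyLoaders) {
        super(parent); // a null parent means the bootstrap loader
        this.dependencyLoaders = new ArrayList<>(dependencyLoaders);
    }

    // ClassLoader.loadClass calls findClass only after the parent failed to find the class.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        for (ClassLoader dependency : dependencyLoaders) {
            try {
                return dependency.loadClass(name);
            } catch (ClassNotFoundException notHere) {
                // not provided by this dependency, try the next one
            }
        }
        throw new ClassNotFoundException(name);
    }
}
```

A real plugin loader would also load the extension's own JAR (for example by extending URLClassLoader); that part is left out here to keep the delegation logic visible.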

    Thank you again!

    Best regards
    Matthias
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    actually this should already work, but I don't know whether it has been released in an update yet.

    By the way: your improvements might be valuable to the community. Have you ever thought about contributing them instead of maintaining your own extension?

    Greetings,
      Sebastian
  • colocolo Member Posts: 236 Maven
    Hi,

    @Sebastian: I will first describe a newly discovered problem and reply to your question below the post.

    Now that everything works in my Eclipse-based RapidMiner, I wanted to use the new operator on our RapidAnalytics server. Earlier operators from my own extension work fine on the server. The new one is the first that depends on other plugins, and dependencies seem to be handled differently on the server (although the dependency between the Web Mining and Text Processing extensions seems to work fine). I simply copied the relevant JARs (Web Mining, Text Processing, my own extension) from my RapidMiner_Vega\lib\plugins to the server's plugin directory. When I try to execute a process containing the new "GetPage" derivative, I receive a message about a problem similar to the one from my first post:
    Process cannot be created: com.rapidminer.tools.XMLException: Cannot create operator: Operator cannot be constructed: 'get_page_advanced(com.rapidminer.matthias.webmining.retrieval.GetWebpageAdvancedOperator)': com/rapidminer/operator/text/Document.
    I also tried the Web Start version, where I cannot even drag the operator into my process area. The other operators from my extension work fine.
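    One way to narrow this down on the server is to check whether the class named in the message is actually present in one of the JARs copied into the plugin directory. A hypothetical helper for that (the class name and the directory scan are my own illustration, not a RapidAnalytics feature) could look like this:

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical diagnostic, not a RapidMiner or RapidAnalytics tool: lists which JARs in a
// directory contain a given class file.
final class JarClassLocator {

    private JarClassLocator() { }

    /** Converts "a.b.C" to the JAR entry name "a/b/C.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the sorted file names of all JARs in pluginDir that contain the class. */
    static List<String> jarsContaining(File pluginDir, String className) throws IOException {
        File[] jars = pluginDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IOException("Not a readable directory: " + pluginDir);
        }
        List<String> hits = new ArrayList<>();
        for (File jar : jars) {
            try (JarFile jarFile = new JarFile(jar)) {
                if (jarFile.getJarEntry(entryName(className)) != null) {
                    hits.add(jar.getName());
                }
            }
        }
        Collections.sort(hits); // listFiles gives no ordering guarantee
        return hits;
    }
}
```

If the class does turn up in the Text Processing JAR, the JARs themselves are most likely fine, and the problem lies in how the server resolves the dependency between the extensions.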

    I am not sure whether I should start a new topic for this on the RapidAnalytics board, but I will try here first, since it is the same problem we solved before, just in a different setting.

    I would be glad if the solution was as simple as before and you can help me with this once again :)

    Best regards
    Matthias


    The latest operator from the Subversion repository definitely did not detect character encodings correctly when they were not declared in the HTTP header. You can check the website of my home town (http://www.siegburg.de), for example: the document is UTF-8 encoded, but no charset is declared in the HTTP header. The operator therefore falls back to the hard-coded Latin-1 encoding, which produces garbled symbols instead of our special German characters. This makes it easy to check whether your version of the operator handles this case. The publicly available one doesn't.
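    The symptom is easy to reproduce without any web request: encode German text as UTF-8 and decode the bytes as ISO-8859-1 (Latin-1), which is what a hard-coded Latin-1 fallback amounts to. In the hypothetical demo below, each "ü" comes out as "Ã¼". Because Latin-1 maps every byte to exactly one character, the damage is reversible:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: reproduces decoding UTF-8 bytes with a hard-coded Latin-1 charset.
final class MojibakeDemo {

    private MojibakeDemo() { }

    /** Encodes text as UTF-8 (as the web server sends it) and decodes it as ISO-8859-1. */
    static String decodeAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the original bytes from the Latin-1 string and decodes them correctly as UTF-8. */
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "B\u00FCrger"; // "Bürger"
        String garbled = decodeAsLatin1(word);
        System.out.println(garbled + " -> " + repair(garbled));
    }
}
```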
    Up to now I have only added new operators or functions that are very specific to my current work. I guess they are not very valuable to others at the moment; maybe this will change in the future. The additions to the "GetPage" operator are the first improvements that might be worth contributing. But I don't have much time to follow special coding conventions, since I'm developing these things while working on my master's thesis. Because you made some improvements when we started working with the Web Mining extension (after a few days with one of your consultants), I decided to put my own code in a separate extension, to avoid merging code again and again whenever requested updates were released (for example POST requests or cookie handling). But if you check your "GetPage" operator and see that the issue I mentioned has not been fixed, I am willing to share my improvements. It's not many lines of code, so the integration should be easy.