Create Operator or Output with Groovy

darktemptation Member Posts: 4 Contributor I
edited June 2019 in Help
My problem is quite simple:
I want to load a web page with Groovy into a text operator (i.e. a Document) and then extract certain elements (e.g. the text of every <li>).

Now I can fetch the HTML from a page with

"http://rapid-i.com".toURL().text
But the Script Operator does not return anything to the output, even when I use the "return" from Groovy.

Can somebody give me a hint?

Answers

  • steffen Member Posts: 347  Maven
    Hm ... maybe I didn't grasp the problem, but RapidMiner cannot deal with arbitrary Groovy types. You have to convert the output into an IO type that RapidMiner understands.

    ... and "no", I do not know which one and how  :-\.
  • haddock Member Posts: 849  Guru
    Hi there,

    I think you were getting nothing back because you were being redirected. The following (with the '/' on the end of the URL) produces words of infinite beauty, wisdom, etc.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location>//R5 Forum/groovout</location>
          <location/>
          <location/>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="478" width="915">
          <operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="176" y="9">
            <parameter key="script" value="operator.getProcess().getLog().log(&quot;http://rapid-i.com/&quot;.toURL().text)"/>
          </operator>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • darktemptation Member Posts: 4 Contributor I
    Thanks for the hints so far.

    But @haddock:

    Even if I try your setting, the log tells me that there is nothing delivered to the Result 1 Port.

    And if you like you can also take another page where you aren't redirected.

    "http://www.aboutgroovy.com".toURL().text

    So the problem is still how to get the fetched result to the output port with a data type that RM knows.
    Maybe the question then has to be: how can I create an IOObject with a String/Text attribute in Groovy (as Steffen suggests)?
  • haddock Member Posts: 849  Guru
    Ooops, I thought you weren't getting anything back. If you want to manipulate the contents, you can do so with macros, and from there pass them on to logs and example sets, like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros>
          <macro>
            <key>HTML</key>
            <value/>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="258" width="915">
          <operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="45" y="30">
            <parameter key="script" value="&#13;operator.getProcess().getMacroHandler().addMacro(&quot;HTML&quot;, &quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10));&#10;//def html=&quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10)"/>
          </operator>
          <operator activated="true" class="provide_macro_as_log_value" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="230" y="74">
            <parameter key="macro_name" value="HTML"/>
          </operator>
          <operator activated="true" class="log" expanded="true" height="76" name="Log" width="90" x="359" y="73">
            <list key="log">
              <parameter key="HTML?" value="operator.Provide Macro as Log Value.value.macro_value"/>
            </list>
          </operator>
          <operator activated="true" class="log_to_data" expanded="true" height="94" name="Log to Data" width="90" x="511" y="71"/>
          <connect from_op="Execute Script" from_port="output 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
          <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_op="Log to Data" to_port="through 1"/>
          <connect from_op="Log to Data" from_port="exampleSet" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    There's some Groovy stuff on the Wiki which may be helpful.

  • darktemptation Member Posts: 4 Contributor I
    Thanks haddock

    This goes in the right direction for what I was looking for. After a few modifications I found a way to do it.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros>
          <macro>
            <key>HTML</key>
            <value/>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="500" width="655">
          <operator activated="true" class="execute_script" expanded="true" height="60" name="Execute Script" width="90" x="45" y="30">
            <parameter key="script" value="&#13;operator.getProcess().getMacroHandler().addMacro(&quot;HTML&quot;, &quot;http://rapid-i.com/&quot;.toURL().text);&#10;//def html=&quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10)"/>
          </operator>
          <operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="246" y="300">
            <parameter key="text" value="%{HTML}"/>
            <parameter key="add label" value="true"/>
            <parameter key="label_type" value="text"/>
            <parameter key="label_value" value="raw html"/>
          </operator>
          <connect from_op="Create Document" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    So I just used the scripting Operator to define the macro and used the macro as parameter for the "Create Document" Operator. This is a nice solution I think.

    The next step is to find all the "<li>" elements and write them all into one attribute.

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi,
    A detailed description of how to use the Groovy operator and how to return objects from it is given in the "How to Extend RapidMiner" tutorial.

    Greetings,
      Sebastian
  • darktemptation Member Posts: 4 Contributor I
    Hi Sebastian

    After a short search I found the tutorial in the Shop
    http://rapid-i.com/component/page,shop.product_details/flypage,flypage.tpl/product_id,52/category_id,5/option,com_virtuemart/Itemid,180/

    I guess this is the one you mean.
    But the detailed description there says:

    Together with the white paper you receive two projects for Eclipse. The one is an extension containing all examples covered in the book and the other is a template for building own Extensions with Eclipse.
    What book are you talking about there? There is only a white paper to purchase/download.
    Would it be possible to get a short preview of the content? After all, this white paper costs €40 (or CHF 60), so I would like to know exactly what it covers and maybe see an example.

    Best regards

    Darktemptation
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi,
    This "book" is the white paper. It has 45 DIN A4 pages, so other people would format it differently and call it a book.
    Sorry, but what examples do you mean? It covers everything you need to write your own extensions. In fact, we use it internally to teach new colleagues. It contains everything I know about extensions, and I wrote the published ones.

    Greetings,
      Sebastian
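
The thread leaves darktemptation's follow-up open: how to find all the <li> texts in the fetched HTML and collect them in one attribute. Below is a minimal sketch of one possible approach, written in plain Java; since Groovy accepts most Java syntax, it should be adaptable to a script as well. It uses regular expressions rather than a real HTML parser, so it assumes simple, non-nested <li> elements. The class name ListItemExtractor, its methods, and the " | " separator are hypothetical and not part of RapidMiner, and passing the result back into a process (for example through a macro, as in the posts above) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a RapidMiner class.
final class ListItemExtractor {

    // One <li ...>...</li> element. DOTALL lets an item span several lines;
    // \b keeps tags such as <link> from being mistaken for <li>.
    private static final Pattern LI = Pattern.compile(
            "<li\\b[^>]*>(.*?)</li\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Any remaining markup inside an item, e.g. <a href="...">.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns the plain text of every <li> element, in document order.
    static List<String> extractListItems(String html) {
        List<String> items = new ArrayList<>();
        Matcher matcher = LI.matcher(html);
        while (matcher.find()) {
            // Replace inner tags with a space, then collapse whitespace.
            String text = TAG.matcher(matcher.group(1)).replaceAll(" ");
            text = text.replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                items.add(text);
            }
        }
        return items;
    }

    // Joins all item texts into one string, suitable for a single attribute.
    static String joinListItems(String html, String separator) {
        return String.join(separator, extractListItems(html));
    }

    public static void main(String[] args) {
        String html = "<ul><li>First</li>\n"
                + "<li class=\"x\"><a href=\"#\">Second</a> item</li></ul>";
        System.out.println(joinListItems(html, " | "));
    }
}
```

For nested lists or malformed markup, a proper HTML parser would likely be more robust than regular expressions.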