Create Operator or Output with Groovy

darktemptation
edited June 19 in Help
My problem is quite simple:
I want to load a web page with Groovy into a text object (i.e. a Document) and then extract certain attributes from it (e.g. the texts of all <li> elements).

I can already fetch the HTML of a page with

"http://rapid-i.com".toURL().text
But the Execute Script operator does not deliver anything to its output port, even when I use Groovy's "return".

Can somebody give me a hint?

Answers

  • steffen
    Hm... maybe I didn't grasp the problem, but RapidMiner cannot deal with arbitrary Groovy types. You have to convert the output into an IOObject type that RapidMiner understands.

    ...and no, I do not know which one or how :-\.
  • haddock
    Hi there,

    I think you were getting nothing back because you were being redirected. The following process (with the '/' at the end of the URL) produces words of infinite beauty, wisdom, etc., etc.:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location>//R5 Forum/groovout</location>
          <location/>
          <location/>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="478" width="915">
          <operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="176" y="9">
            <parameter key="script" value="operator.getProcess().getLog().log(&quot;http://rapid-i.com/&quot;.toURL().text)"/>
          </operator>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • darktemptation
    Thanks for the hints so far.

    But @haddock:

    Even with your setup, the log tells me that nothing is delivered to the result 1 port.

    And if you like, you can also try another page where you aren't redirected:

    "http://www.aboutgroovy.com".toURL().text

    So the problem is still how to get the fetched result to the output port, with a data type that RapidMiner knows.
    So maybe the question should be: how can I create an IOObject with a string/text attribute in Groovy (as Steffen suggests)?
  • haddock
    Oops, I thought you weren't getting anything back. If you want to manipulate the contents, you can do that with macros, and from there pass them on to the log and to an example set, like this...
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros>
          <macro>
            <key>HTML</key>
            <value/>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="258" width="915">
          <operator activated="true" class="execute_script" expanded="true" height="76" name="Execute Script" width="90" x="45" y="30">
            <parameter key="script" value="&#13;operator.getProcess().getMacroHandler().addMacro(&quot;HTML&quot;, &quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10));&#10;//def html=&quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10)"/>
          </operator>
          <operator activated="true" class="provide_macro_as_log_value" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="230" y="74">
            <parameter key="macro_name" value="HTML"/>
          </operator>
          <operator activated="true" class="log" expanded="true" height="76" name="Log" width="90" x="359" y="73">
            <list key="log">
              <parameter key="HTML?" value="operator.Provide Macro as Log Value.value.macro_value"/>
            </list>
          </operator>
          <operator activated="true" class="log_to_data" expanded="true" height="94" name="Log to Data" width="90" x="511" y="71"/>
          <connect from_op="Execute Script" from_port="output 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
          <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_op="Log to Data" to_port="through 1"/>
          <connect from_op="Log to Data" from_port="exampleSet" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    There's some Groovy stuff on the Wiki which may be helpful.

  • darktemptation
    Thanks, haddock.

    This goes in the right direction of what I was looking for. After a few modifications I found out how to do it:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros>
          <macro>
            <key>HTML</key>
            <value/>
          </macro>
        </macros>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="500" width="655">
          <operator activated="true" class="execute_script" expanded="true" height="60" name="Execute Script" width="90" x="45" y="30">
            <parameter key="script" value="&#13;operator.getProcess().getMacroHandler().addMacro(&quot;HTML&quot;, &quot;http://rapid-i.com/&quot;.toURL().text);&#10;//def html=&quot;http://rapid-i.com/&quot;.toURL().text.substring(0,10)"/>
          </operator>
          <operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="246" y="300">
            <parameter key="text" value="%{HTML}"/>
            <parameter key="add label" value="true"/>
            <parameter key="label_type" value="text"/>
            <parameter key="label_value" value="raw html"/>
          </operator>
          <connect from_op="Create Document" from_port="output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    So I just used the Execute Script operator to define the macro, and then used the macro as the text parameter of the "Create Document" operator. I think this is a nice solution.

    The next step is to find all the "<li>" elements and write them all into one attribute.
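    The <li> extraction step can be sketched roughly as follows. This is a minimal example under stated assumptions, not a RapidMiner operator and not the method used in this thread: it is plain Java (Groovy accepts most Java syntax, but this is not claimed to run unchanged in the Execute Script operator), it assumes the page's HTML is already available as a String (for example the value of "http://rapid-i.com/".toURL().text before it goes into the HTML macro), and the names LiExtractor, extractListItems and joinListItems are hypothetical. It uses a regular expression instead of a real HTML parser, so nested lists or items without a closing </li> are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the text of all <li> elements out of an HTML string.
public class LiExtractor {
    // Matches <li ...>content</li>; case-insensitive, and DOTALL lets an item span lines.
    // The lazy (.*?) stops at the first </li>, so adjacent items are matched separately.
    // \b prevents matching tags like <link>.
    private static final Pattern LI =
        Pattern.compile("<li\\b[^>]*>(.*?)</li>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any tag inside an item, e.g. <a href="...">, which is removed.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    public static List<String> extractListItems(String html) {
        List<String> items = new ArrayList<>();
        Matcher m = LI.matcher(html);
        while (m.find()) {
            // Strip inner tags, collapse runs of whitespace, trim the ends.
            String text = TAG.matcher(m.group(1)).replaceAll("")
                             .replaceAll("\\s+", " ")
                             .trim();
            if (!text.isEmpty()) {
                items.add(text);
            }
        }
        return items;
    }

    // Joins all item texts into one string, e.g. for a single text attribute.
    public static String joinListItems(String html, String separator) {
        return String.join(separator, extractListItems(html));
    }

    public static void main(String[] args) {
        String html = "<ul><li>Home</li><li class=\"x\"><a href=\"/p\">Products</a></li>\n"
                    + "<LI>About  us</LI></ul>";
        System.out.println(joinListItems(html, "; ")); // prints "Home; Products; About us"
    }
}
```

    Joining with a separator gives one string per page, which matches the goal of writing all <li> texts into a single attribute; a proper HTML parser would be the more robust choice for real-world pages.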

  • land
    Hi,
    a detailed description of how to use the Groovy operator, and how to return things from it, is given in the "How to Extend RapidMiner" tutorial.

    Greetings,
      Sebastian
  • darktemptation
    Hi Sebastian

    After a short search I found the tutorial in the shop:
    http://rapid-i.com/component/page,shop.product_details/flypage,flypage.tpl/product_id,52/category_id,5/option,com_virtuemart/Itemid,180/

    I guess you mean this.
    But the detailed description there says:

    Together with the white paper you receive two projects for Eclipse. The one is an extension containing all examples covered in the book and the other is a template for building own Extensions with Eclipse.
    What book are you talking about there? There is only a white paper to purchase/download.
    And would it be possible to get a short preview of the content? After all, this white paper costs €40 (or CHF 60.-), so I want to know exactly what it covers and maybe see an example.

    Best regards

    Darktemptation
  • land
    Hi,
    this "book" is the white paper. The white paper has 45 DIN A4 pages; other people would format it differently and call it a book.
    Sorry, but which examples do you mean? It covers everything you need to write your own extensions. In fact, we use it internally to teach new colleagues... Everything I know about extensions is in there, and I wrote the published ones...

    Greetings,
      Sebastian