Extracting the most representative 10 keywords from web page

singing_bird_1 (Contributor II)

Re: Extracting the most representative 10 keywords from web page

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
  <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="open_file" compatibility="7.5.003" expanded="true" height="68" name="Open File" width="90" x="45" y="34">
        <parameter key="filename" value="C:\Users\Mennatollah\Desktop\url_test_test.csv"/>
      </operator>
      <operator activated="true" class="read_csv" compatibility="7.5.003" expanded="true" height="68" name="Read CSV" width="90" x="179" y="34">
        <parameter key="use_quotes" value="false"/>
        <parameter key="parse_numbers" value="false"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.5.003" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="187"/>
      <operator activated="true" class="web:retrieve_webpages" compatibility="7.3.000" expanded="true" height="68" name="Get Pages" width="90" x="447" y="34">
        <parameter key="link_attribute" value="att1"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.5.003" expanded="true" height="103" name="Multiply" width="90" x="380" y="289"/>
      <operator activated="true" class="textSmiley Tonguerocess_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="581" y="34">
        <parameter key="keep_text" value="true"/>
        <parameter key="prune_method" value="by ranking"/>
        <parameter key="prune_below_rank" value="0.009"/>
        <parameter key="prune_above_rank" value="0.095"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="web:extract_html_text_content" compatibility="7.3.000" expanded="true" height="68" name="Extract Content" width="90" x="45" y="34">
            <parameter key="ignore_non_html_tags" value="false"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="7.5.003" expanded="true" height="103" name="Multiply (2)" width="90" x="448" y="44"/>
          <connect from_port="document" to_op="Extract Content" to_port="document"/>
          <connect from_op="Extract Content" from_port="document" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_port="document 2"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
          <portSpacing port="sink_document 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Open File" from_port="file" to_op="Read CSV" to_port="file"/>
      <connect from_op="Read CSV" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 2"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Here is the XML code.

Thank you!
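One thing worth checking in the process above: inside Process Documents from Data, the documents go straight from Extract Content to the output, with no tokenization step, so the page text may never be split into individual words. A minimal sketch of that inner subprocess with tokenization added might look like the fragment below. The operator class keys (text:tokenize, text:transform_cases, text:filter_stopwords_english) are assumed from the Text Processing extension and may differ in your version, so adding the same operators through the Process view is the safer route. Multiply (2) is left out to keep the sketch short.

```xml
<!-- Sketch only: inner subprocess of "Process Documents from Data" with tokenization added. -->
<!-- Operator class keys are assumed from the Text Processing extension; verify them in your version. -->
<process expanded="true">
  <operator activated="true" class="web:extract_html_text_content" compatibility="7.3.000" expanded="true" height="68" name="Extract Content" width="90" x="45" y="34">
    <parameter key="ignore_non_html_tags" value="false"/>
  </operator>
  <!-- Split the extracted page text into word tokens (default mode assumed) -->
  <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="34"/>
  <!-- Lower-case every token so "Data" and "data" count as one term -->
  <operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases" width="90" x="313" y="34"/>
  <!-- Drop common English words so they do not crowd out real keywords -->
  <operator activated="true" class="text:filter_stopwords_english" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="447" y="34"/>
  <connect from_port="document" to_op="Extract Content" to_port="document"/>
  <connect from_op="Extract Content" from_port="document" to_op="Tokenize" to_port="document"/>
  <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
  <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
  <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
  <portSpacing port="source_document" spacing="0"/>
  <portSpacing port="sink_document 1" spacing="0"/>
  <portSpacing port="sink_document 2" spacing="0"/>
</process>
```

To get the 10 strongest terms from there, one option is to convert the word list output of Process Documents from Data into an example set (WordList to Data), sort it by total occurrences in descending order, and keep the first 10 rows (Filter Example Range). Note that this ranks terms across all retrieved pages together, not per page.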

Community Manager

Re: Extracting the most representative 10 keywords from web page

Hello @singing_bird_1, OK, we're making some progress. Thank you for pasting your XML. It seems that you are running RapidMiner 7.5, which is an old version. Some of your operators were updated in 7.6, and your pasted XML contains things like

"textSmiley Tonguerocess_document_from_data"

which does not work (it looks like the forum turned the ":p" in "text:process_document_from_data" into a smiley). Can you please try updating RapidMiner to 7.6, opening your process, going to the XML tab, copying exactly what is there, and pasting it here again in this thread?




Scott Genzer
Senior Community Manager
RapidMiner, Inc.