Information Extraction - TextVisualizer Problem

jcurry1 · December 2012

Hi Folks,

I have an issue with the TextVisualizer component that is part of the Information Extraction plugin. When I try to run any of the plugin's sample processes that use the TextVisualizer (downloadable from the link below), I get an error like the one below.

Anyone familiar with the Information Extraction plugin around to help? I am using the latest RapidMiner: 5.2.008 and the latest Information Extraction plugin on SourceForge. The other Information Extraction components seem to work fine so far.

Dec 31, 2012 4:15:01 PM WARNING: Error creating renderer: java.lang.ClassCastException: com.rapidminer.operator.visualization.TextVisualizer cannot be cast to com.rapidminer.operator.visualization.TextVisualizer

Sample processes were downloaded from:

http://ieplugin4rm.svn.sourceforge.net/viewvc/ieplugin4rm/informationextractionplugin_Vega/trunk/informationExtraction_Vega/samples/

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="540" width="815">
      <operator activated="true" class="text:read_document" compatibility="5.2.004" expanded="true" height="60" name="Read Document" width="90" x="45" y="30">
        <parameter key="file" value="C:\Documents and Settings\jcurry\My Documents\A_AAAAAA\samples\toyText.txt"/>
      </operator>
      <operator activated="true" class="text:documents_to_data" compatibility="5.2.004" expanded="true" height="76" name="Documents to Data" width="90" x="179" y="30">
        <parameter key="text_attribute" value="text"/>
      </operator>
      <operator activated="true" class="informationExtraction:sentence_tokenizer" compatibility="1.0.000" expanded="true" height="76" name="SentenceTokenizer" width="90" x="45" y="165">
        <parameter key="attribute" value="text"/>
        <parameter key="new token-name" value="sentence"/>
      </operator>
      <operator activated="true" class="informationExtraction:word_tokenizer" compatibility="1.0.000" expanded="true" height="76" name="WordTokenizer" width="90" x="179" y="165">
        <parameter key="attribute" value="sentence"/>
        <parameter key="new token-name" value="word"/>
      </operator>
      <operator activated="true" class="informationExtraction:text_visualizer" compatibility="1.0.000" expanded="true" height="76" name="TextVisualizer" width="90" x="313" y="165">
        <parameter key="text-attribute" value="word"/>
        <parameter key="label-attribute" value="sentence"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="300">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="batch|word"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.2.008" expanded="true" height="60" name="Store" width="90" x="179" y="300">
        <parameter key="repository_entry" value="./ToyTextTokenized"/>
      </operator>
      <connect from_op="Read Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Documents to Data" from_port="example set" to_op="SentenceTokenizer" to_port="example set input"/>
      <connect from_op="SentenceTokenizer" from_port="example set output" to_op="WordTokenizer" to_port="example set input"/>
      <connect from_op="WordTokenizer" from_port="example set output" to_op="TextVisualizer" to_port="example set input"/>
      <connect from_op="TextVisualizer" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="TextVisualizer" from_port="text visualizer port" to_port="result 1"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Store" to_port="input"/>
      <connect from_op="Store" from_port="through" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="54"/>
      <portSpacing port="sink_result 2" spacing="18"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Regards,
John.

Asunto · January 2013

Hello All,

I have the same problem here, unfortunately.
I am using the same examples.

Jan 6, 2013 8:57:21 PM INFO: Process //VoorbeeldenInformationExtractionPlugin/1_Read-Tokenize-Visualize starts
Jan 6, 2013 8:57:21 PM INFO: Loading initial data.
Jan 6, 2013 8:57:22 PM INFO: Saving results.
Jan 6, 2013 8:57:22 PM INFO: Process //VoorbeeldenInformationExtractionPlugin/1_Read-Tokenize-Visualize finished successfully after 0 s
Jan 6, 2013 8:57:22 PM WARNING: Error creating renderer: java.lang.ClassCastException: com.rapidminer.operator.visualization.TextVisualizer cannot be cast to com.rapidminer.operator.visualization.TextVisualizer

Can anybody help? I'm very interested in getting the plugin going correctly...

Regards,
Kees.

jcurry1 · February 2013

I am waiting to hear back from Mr. Jungerman, the creator of the component, once he finds some time to look at the issue. In the meantime, he mentioned that the plugin may work with an older version of RapidMiner, namely one based on the Vega build. So perhaps it is worth going back. I am not sure which RapidMiner release corresponds to Vega; maybe one of the 4.x releases available below. I will try that.

If anyone has success with a specific older version, please update this thread with the version number.

http://sourceforge.net/projects/rapidminer/files/1.%20RapidMiner/

aborg · February 2013

Vega was 5.0.

fnl · February 2015

Here is how to set up the Information Extraction (IE) plugin for RapidMiner correctly and avoid the problems described above, including the reported ClassCastException (CCE).
Note that the IE plugin on the marketplace is outdated and should not be used; instead, you need to install the plugin manually.
I can confirm this procedure works for RM 6.2, and I suppose it should also work for the 5.x series, given that the plugin has not been updated since.

1. Install the IE plugin jar; download it from:

http://sourceforge.net/projects/ieplugin4rm/

Then, copy the jar to $RAPIDMINER_HOME/lib/plugins
where $RAPIDMINER_HOME is the base directory where RapidMiner itself is installed.

2. Download the examples with

svn checkout https://svn.code.sf.net/p/ieplugin4rm/code/informationextractionplugin_Vega/trunk/informationExtraction_Vega/samples/

Then, install them by placing them into a repository of yours ("MyRepo"):

cp -r samples ~/.RapidMiner/repositories/MyRepo/IE_samples

Note that you first have to create your own repository within RapidMiner; "MyRepo" does not exist by default!
Open your repository tree (here "MyRepo") in RapidMiner, and you are done!

You can load and run the examples. Note that you might need to fix/set the paths of the text "databases", but this should be trivial.
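
For anyone who prefers to script it, the two steps above can be sketched roughly as follows. This is only a sketch under assumptions: the function names are made up for illustration (they are not part of RapidMiner or the plugin), and it assumes the jar has already been downloaded and the samples already checked out as described above.

```shell
#!/bin/sh
# Sketch only: install_ie_plugin and install_ie_samples are illustrative
# helper names, not commands shipped with RapidMiner or the IE plugin.

# Step 1: copy the downloaded plugin jar into RapidMiner's plugin directory.
#   $1 = path to the downloaded IE plugin jar
#   $2 = RapidMiner base directory (what the post calls $RAPIDMINER_HOME)
install_ie_plugin() {
    jar_path=$1
    rm_home=$2
    [ -f "$jar_path" ] || { echo "jar not found: $jar_path" >&2; return 1; }
    [ -d "$rm_home/lib/plugins" ] || { echo "no lib/plugins under: $rm_home" >&2; return 1; }
    cp "$jar_path" "$rm_home/lib/plugins/"
}

# Step 2: copy the checked-out samples into an existing local repository.
#   $1 = the checked-out samples directory
#   $2 = repository directory, e.g. ~/.RapidMiner/repositories/MyRepo
install_ie_samples() {
    samples_dir=$1
    repo_dir=$2
    [ -d "$samples_dir" ] || { echo "samples not found: $samples_dir" >&2; return 1; }
    # The repository has to be created inside RapidMiner beforehand.
    [ -d "$repo_dir" ] || { echo "create the repository first: $repo_dir" >&2; return 1; }
    # Refuse to overwrite or nest into an earlier copy.
    [ ! -e "$repo_dir/IE_samples" ] || { echo "already present: $repo_dir/IE_samples" >&2; return 1; }
    cp -r "$samples_dir" "$repo_dir/IE_samples"
}
```

You would call it along the lines of `install_ie_plugin ./ie-plugin.jar "$RAPIDMINER_HOME"` followed by `install_ie_samples ./samples ~/.RapidMiner/repositories/MyRepo`, where `ie-plugin.jar` is just a placeholder for whatever the downloaded jar is actually called.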

ZeiniNahed · December 2017

I am trying to use the Information Extraction plugin with RapidMiner 8, but I get no result. I have also searched for a way to extract certain sentences that include certain words, but found nothing. Is there any way to do that?

This is the only reference I have found: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.645.7232&rep=rep1&type=pdf

I need help.

Thanks in advance.

sgenzer · December 2017

hello @ZeiniNahed - so that extension has not been supported in a long time (see here). I would strongly recommend using our very popular text processing extension instead.

Scott
