Out-of-memory problem

hodeffd Member Posts: 2 Contributor I
edited November 2018 in Help
I am a newbie (a student) with this software. I have watched some tutorials and picked up some information about it.
I have a text mining project: I was given two sets of texts, one for each class, and another set of texts that needs to be classified into one of those classes.

This is what I have done:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.002">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.002" expanded="true" name="Process">
   <process expanded="true" height="588" width="968">
     <operator activated="true" class="text:process_document_from_file" compatibility="5.2.001" expanded="true" height="76" name="Process Documents from Files" width="90" x="84" y="179">
       <list key="text_directories">
         <parameter key="auth" value="C:\david computer backup\david university\year 3\machine learning\texts\auth"/>
         <parameter key="other" value="C:\david computer backup\david university\year 3\machine learning\texts\other"/>
       </list>
       <process expanded="true">
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.2.002" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="120">
       <parameter key="attribute_filter_type" value="no_missing_values"/>
     </operator>
     <operator activated="true" class="set_role" compatibility="5.2.002" expanded="true" height="76" name="Set Role" width="90" x="282" y="117">
       <parameter key="name" value="label"/>
       <parameter key="target_role" value="label"/>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="x_validation" compatibility="5.2.002" expanded="true" height="112" name="Validation" width="90" x="447" y="120">
       <process expanded="true" height="588" width="459">
         <operator activated="true" class="decision_tree" compatibility="5.2.002" expanded="true" height="76" name="Decision Tree" width="90" x="180" y="138"/>
         <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
         <connect from_op="Decision Tree" from_port="model" to_port="model"/>
         <portSpacing port="source_training" spacing="0"/>
         <portSpacing port="sink_model" spacing="0"/>
         <portSpacing port="sink_through 1" spacing="0"/>
       </process>
       <process expanded="true" height="588" width="459">
         <operator activated="true" class="apply_model" compatibility="5.2.002" expanded="true" height="76" name="Apply Model" width="90" x="76" y="147">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="performance" compatibility="5.2.002" expanded="true" height="76" name="Performance" width="90" x="180" y="255"/>
         <connect from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
         <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
         <portSpacing port="source_model" spacing="0"/>
         <portSpacing port="source_test set" spacing="0"/>
         <portSpacing port="source_through 1" spacing="0"/>
         <portSpacing port="sink_averagable 1" spacing="0"/>
         <portSpacing port="sink_averagable 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Process Documents from Files" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
     <connect from_op="Process Documents from Files" from_port="word list" to_port="result 2"/>
     <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
     <connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="training"/>
     <connect from_op="Validation" from_port="training" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>
This is just like in this tutorial: http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-part-5.html

The problem is that the data is huge, so I get the error below. Sorry for the long post.


Stack trace:
------------

Exception: java.lang.RuntimeException
Message: Cannot clone com.rapidminer.example.set.SplittedExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.RuntimeException: Cannot clone com.rapidminer.example.set.SimpleExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.OutOfMemoryError: GC overhead limit exceeded. Cause: java.lang.OutOfMemoryError: GC overhead limit exceeded.. Cause: java.lang.RuntimeException: Cannot clone com.rapidminer.example.set.SimpleExampleSet: java.lang.reflect.InvocationTargetException. Target: java.lang.OutOfMemoryError: GC overhead limit exceeded. Cause: java.lang.OutOfMemoryError: GC overhead limit exceeded..
Stack trace:
 com.rapidminer.example.set.AbstractExampleSet.clone(AbstractExampleSet.java:375)
 com.rapidminer.operator.learner.tree.TreeBuilder.learnTree(TreeBuilder.java:90)
 com.rapidminer.operator.learner.tree.AbstractTreeLearner.learn(AbstractTreeLearner.java:119)
 com.rapidminer.operator.learner.AbstractLearner.doWork(AbstractLearner.java:152)
 com.rapidminer.operator.Operator.execute(Operator.java:833)
 com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
 com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
 com.rapidminer.operator.validation.ValidationChain.executeLearner(ValidationChain.java:214)
 com.rapidminer.operator.validation.ValidationChain.learn(ValidationChain.java:305)
 com.rapidminer.operator.validation.XValidation.performIteration(XValidation.java:159)
 com.rapidminer.operator.validation.XValidation.estimatePerformance(XValidation.java:151)
 com.rapidminer.operator.validation.ValidationChain.doWork(ValidationChain.java:273)
 com.rapidminer.operator.Operator.execute(Operator.java:833)
 com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
 com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
 com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
 com.rapidminer.operator.Operator.execute(Operator.java:833)
 com.rapidminer.Process.run(Process.java:925)
 com.rapidminer.Process.run(Process.java:848)
 com.rapidminer.Process.run(Process.java:807)
 com.rapidminer.Process.run(Process.java:802)
 com.rapidminer.Process.run(Process.java:792)
 com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)



Process:
------------

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.002">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.002" expanded="true" name="Process">
   <parameter key="logverbosity" value="init"/>
   <parameter key="random_seed" value="2001"/>
   <parameter key="send_mail" value="never"/>
   <parameter key="notification_email" value=""/>
   <parameter key="process_duration_for_mail" value="30"/>
   <parameter key="encoding" value="UTF-8"/>
   <parameter key="parallelize_main_process" value="false"/>
   <process expanded="true" height="588" width="968">
     <operator activated="true" class="text:process_document_from_file" compatibility="5.2.001" expanded="true" height="76" name="Process Documents from Files" width="90" x="84" y="179">
       <list key="text_directories">
         <parameter key="auth" value="C:\david computer backup\david university\year 3\machine learning\texts\auth"/>
         <parameter key="other" value="C:\david computer backup\david university\year 3\machine learning\texts\other"/>
       </list>
       <parameter key="file_pattern" value="*"/>
       <parameter key="extract_text_only" value="true"/>
       <parameter key="use_file_extension_as_type" value="true"/>
       <parameter key="content_type" value="txt"/>
       <parameter key="encoding" value="UTF-8"/>
       <parameter key="create_word_vector" value="true"/>
       <parameter key="vector_creation" value="TF-IDF"/>
       <parameter key="add_meta_information" value="true"/>
       <parameter key="keep_text" value="false"/>
       <parameter key="prune_method" value="none"/>
       <parameter key="prunde_below_percent" value="3.0"/>
       <parameter key="prune_above_percent" value="30.0"/>
       <parameter key="prune_below_rank" value="0.05"/>
       <parameter key="prune_above_rank" value="0.05"/>
       <parameter key="datamanagement" value="double_sparse_array"/>
       <parameter key="parallelize_vector_creation" value="false"/>
       <process expanded="true" height="588" width="968">
         <operator activated="true" class="text:tokenize" compatibility="5.2.001" expanded="true" height="60" name="Tokenize" width="90" x="74" y="145">
           <parameter key="mode" value="non letters"/>
           <parameter key="characters" value=".:"/>
           <parameter key="language" value="English"/>
           <parameter key="max_token_length" value="3"/>
         </operator>
         <connect from_port="document" to_op="Tokenize" to_port="document"/>
         <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.2.002" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
       <parameter key="attribute_filter_type" value="no_missing_values"/>
       <parameter key="attribute" value=""/>
       <parameter key="attributes" value=""/>
       <parameter key="use_except_expression" value="false"/>
       <parameter key="value_type" value="attribute_value"/>
       <parameter key="use_value_type_exception" value="false"/>
       <parameter key="except_value_type" value="time"/>
       <parameter key="block_type" value="attribute_block"/>
       <parameter key="use_block_type_exception" value="false"/>
       <parameter key="except_block_type" value="value_matrix_row_start"/>
       <parameter key="invert_selection" value="false"/>
       <parameter key="include_special_attributes" value="false"/>
     </operator>
     <operator activated="true" class="set_role" compatibility="5.2.002" expanded="true" height="76" name="Set Role" width="90" x="313" y="30">
       <parameter key="name" value="label"/>
       <parameter key="target_role" value="label"/>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="x_validation" compatibility="5.2.002" expanded="true" height="112" name="Validation" width="90" x="447" y="30">
       <parameter key="create_complete_model" value="false"/>
       <parameter key="average_performances_only" value="true"/>
       <parameter key="leave_one_out" value="false"/>
       <parameter key="number_of_validations" value="10"/>
       <parameter key="sampling_type" value="stratified sampling"/>
       <parameter key="use_local_random_seed" value="false"/>
       <parameter key="local_random_seed" value="1992"/>
       <parameter key="parallelize_training" value="false"/>
       <parameter key="parallelize_testing" value="false"/>
       <process expanded="true" height="588" width="459">
         <operator activated="true" class="decision_tree" compatibility="5.2.002" expanded="true" height="76" name="Decision Tree" width="90" x="180" y="138">
           <parameter key="criterion" value="gain_ratio"/>
           <parameter key="minimal_size_for_split" value="4"/>
           <parameter key="minimal_leaf_size" value="2"/>
           <parameter key="minimal_gain" value="0.1"/>
           <parameter key="maximal_depth" value="20"/>
           <parameter key="confidence" value="0.25"/>
           <parameter key="number_of_prepruning_alternatives" value="3"/>
           <parameter key="no_pre_pruning" value="false"/>
           <parameter key="no_pruning" value="false"/>
         </operator>
         <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
         <connect from_op="Decision Tree" from_port="model" to_port="model"/>
         <portSpacing port="source_training" spacing="0"/>
         <portSpacing port="sink_model" spacing="0"/>
         <portSpacing port="sink_through 1" spacing="0"/>
       </process>
       <process expanded="true" height="588" width="459">
         <operator activated="true" class="apply_model" compatibility="5.2.002" expanded="true" height="76" name="Apply Model" width="90" x="76" y="147">
           <list key="application_parameters"/>
           <parameter key="create_view" value="false"/>
         </operator>
         <operator activated="true" class="performance" compatibility="5.2.002" expanded="true" height="76" name="Performance" width="90" x="180" y="255">
           <parameter key="use_example_weights" value="true"/>
         </operator>
         <connect from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
         <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
         <portSpacing port="source_model" spacing="0"/>
         <portSpacing port="source_test set" spacing="0"/>
         <portSpacing port="source_through 1" spacing="0"/>
         <portSpacing port="sink_averagable 1" spacing="0"/>
         <portSpacing port="sink_averagable 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Process Documents from Files" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
     <connect from_op="Process Documents from Files" from_port="word list" to_port="result 2"/>
     <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
     <connect from_op="Set Role" from_port="example set output" to_op="Validation" to_port="training"/>
     <connect from_op="Validation" from_port="training" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • hodeffd Member Posts: 2 Contributor I



    System properties:
    ------------

    os properties:
     os.name = Windows 7
     os.version = 6.1
     os.arch = x86
    java properties:
     java.home = C:\Program Files (x86)\Rapid-I\RapidMiner5\jre
     java.endorsed.dirs = C:\Program Files (x86)\Rapid-I\RapidMiner5\jre\lib\endorsed
     java.vendor.url = http://java.sun.com/
     java.version = 1.6.0_31
     java.vendor.url.bug = http://java.sun.com/cgi-bin/bugreport.cgi
     java.runtime.name = Java(TM) SE Runtime Environment
     java.specification.name = Java Platform API Specification
     java.io.tmpdir = C:\Users\David\AppData\Local\Temp\
     java.vm.info = mixed mode
     java.vm.specification.name = Java Virtual Machine Specification
     java.awt.printerjob = sun.awt.windows.WPrinterJob
     java.specification.vendor = Sun Microsystems Inc.
     java.vm.name = Java HotSpot(TM) Client VM
     java.library.path = C:\Program Files (x86)\Rapid-I\RapidMiner5\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:\Program Files\Common Files\Microsoft Shared\Windows Live;C:\Program Files (x86)\Common Files\Microsoft Shared\Windows Live;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Common Files\Lenovo;C:\Program Files (x86)\Common Files\Ulead Systems\MPEG;C:\Program Files (x86)\Windows Live\Shared;C:\Program Files (x86)\Lenovo\Access Connections\;C:\SWTOOLS\ReadyApps;C:\Program Files (x86)\Intel\Services\IPT\;C:\Program Files (x86)\Symantec\VIP Access Client\;C:\Program Files\Intel\WiFi\bin\;C:\Program Files\Common Files\Intel\WirelessCommon\;C:\Program Files\Common Files\Lenovo;C:\Program Files\Intel\WiFi\bin\;C:\Program Files\Common Files\Intel\WirelessCommon\;C:\Program Files (x86)\SSH Communications Security\SSH Secure Shell;.
     java.class.version = 50.0
     java.awt.graphicsenv = sun.awt.Win32GraphicsEnvironment
     java.vm.specification.version = 1.0
     java.ext.dirs = C:\Program Files (x86)\Rapid-I\RapidMiner5\jre\lib\ext;C:\Windows\Sun\Java\lib\ext
     java.vm.vendor = Sun Microsystems Inc.
     java.vm.version = 20.6-b01
     java.class.path = lib/launcher.jar
     java.vm.specification.vendor = Sun Microsystems Inc.
     java.runtime.version = 1.6.0_31-b05
     java.vendor = Sun Microsystems Inc.
     java.specification.version = 1.6
    RapidMiner Parameters:
     ftp.nonProxyHosts =
     ftp.proxyHost =
     ftp.proxyPassword =
     ftp.proxyPort =
     ftp.proxySet = false
     ftp.proxyUsername =
     http.nonProxyHosts =
     http.proxyHost =
     http.proxyPassword =
     http.proxyPort =
     http.proxySet = false
     http.proxyUsername =
     https.proxyHost =
     https.proxyPassword =
     https.proxyPort =
     https.proxySet = false
     https.proxyUsername =
     rapidminer.general.capabilities.warn = false
     rapidminer.general.debugmode = false
     rapidminer.general.encoding = UTF-8
     rapidminer.general.fractiondigits.numbers = 3
     rapidminer.general.fractiondigits.percent = 2
     rapidminer.general.locale.language = en
     rapidminer.general.logfile.format = no
     rapidminer.general.max_rows_used_for_guessing = 100
     rapidminer.general.md_nominal_values_limit = 100
     rapidminer.general.number_of_threads = 0
     rapidminer.general.randomseed = 2001
     rapidminer.general.timezone = SYSTEM
     rapidminer.gui.add_breakpoint_results_to_history = false
     rapidminer.gui.attributeeditor.columnlimit = 20
     rapidminer.gui.attributeeditor.rowlimit = 50
     rapidminer.gui.auto_switch_to_resultview = true
     rapidminer.gui.autowire_input = true
     rapidminer.gui.autowire_output = true
     rapidminer.gui.beep.breakpoint = true
     rapidminer.gui.beep.error = true
     rapidminer.gui.beep.success = true
     rapidminer.gui.close_results_before_run = ask
     rapidminer.gui.confirm_exit = false
     rapidminer.gui.disconnect_on_disable = true
     rapidminer.gui.evaluate_meta_data_for_sql_queries = true
     rapidminer.gui.fetch_data_base_table_names = true
     rapidminer.gui.log_level = CONFIG
     rapidminer.gui.max_displayed_values = 50
     rapidminer.gui.max_sortable_rows = 100000
     rapidminer.gui.max_statistics_rows = 100000
     rapidminer.gui.messageviewer.highlight.errors = 255,51,204
     rapidminer.gui.messageviewer.highlight.logservice = 184,184,184
     rapidminer.gui.messageviewer.highlight.notes = 51,151,51
     rapidminer.gui.messageviewer.highlight.warnings = 51,51,255
     rapidminer.gui.messageviewer.rowlimit = 1000
     rapidminer.gui.plaf = system
     rapidminer.gui.plotter.colors.classlimit = 10
     rapidminer.gui.plotter.legend.classlimit = 20
     rapidminer.gui.plotter.legend.maxcolor = 255,0,0
     rapidminer.gui.plotter.legend.mincolor = 0,0,255
     rapidminer.gui.plotter.matrixplot.size = 200
     rapidminer.gui.plotter.rows.maximum = 5000
     rapidminer.gui.processinfo.show = true
     rapidminer.gui.resolve_relative_repository_locations = true
     rapidminer.gui.result_display_type = docking
     rapidminer.gui.save_before_run = ask
     rapidminer.gui.save_on_process_creation = false
     rapidminer.gui.savedialog = true
     rapidminer.gui.snap_to_grid = true
     rapidminer.gui.transfer_usagestats = ask
     rapidminer.gui.undolist.size = 10
     rapidminer.gui.update.check = true
     rapidminer.init.plugins = true
     rapidminer.init.plugins.location =
     rapidminer.parallel.number_of_threads = 8
     rapidminer.paren.wizard.meta_learning_model =
     rapidminer.tools.db.assist.show_only_standard_tables = true
     rapidminer.tools.editor =
     rapidminer.tools.gnuplot.command = gnuplot
     rapidminer.tools.mail.default_recipient =
     rapidminer.tools.mail.method = SMTP
     rapidminer.tools.mail.process_duration_for_mail = 30
     rapidminer.tools.sendmail.command = /usr/sbin/sendmail
     rapidminer.tools.smtp.host =
     rapidminer.tools.smtp.passwd =
     rapidminer.tools.smtp.port =
     rapidminer.tools.smtp.user =
     rapidminer.update.check = true
     rapidminer.update.incremental = true
     rapidminer.update.to_home = true
     rapidminer.update.url = http://rapidupdate.de:80/UpdateServer
     rapidminer.version = 5.2.001
     socksProxyHost =
     socksProxyPort =

    I hope someone can help me. Thanks!

    :)
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    If the data is huge, you only have two possibilities:

    a) Reduce the size of your data, e.g. work on a subset of your input data (create a folder that contains only some of your files).
    b) Increase the maximum amount of memory that RapidMiner can use. You are on Windows 7, so you can go to the "scripts" folder of your RapidMiner installation, open RapidMinerGUI.bat in an editor, and increase the value of the variable MAX_JAVA_MEMORY.
    Then start RapidMiner using that batch file.
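
For option a), one way to build the smaller folders is a short script run outside RapidMiner. The following is only a minimal sketch in Python, not part of RapidMiner: the function name `copy_sample`, the 10% fraction, and the `texts` / `texts_small` folder names are assumptions made for illustration. It copies a reproducible random subset of each class folder into a new folder, which can then be entered in "Process Documents from Files" in place of the full folders. The system properties posted above report `os.arch = x86`, which suggests a 32-bit JVM, so the room for raising the heap in option b) may be limited and working on a subset may be the more practical route.

```python
import random
import shutil
from pathlib import Path


def copy_sample(src_dir, dst_dir, fraction=0.1, seed=2001):
    """Copy a random fraction of the files in src_dir into dst_dir.

    Returns the names of the copied files. Hypothetical helper for
    illustration only; it is not a RapidMiner API.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    # Sort first so the same seed always selects the same files.
    files = sorted(p for p in src.iterdir() if p.is_file())
    if not files:
        return []
    # Keep at least one file so a class folder never ends up empty.
    k = max(1, int(len(files) * fraction))
    chosen = random.Random(seed).sample(files, k)
    dst.mkdir(parents=True, exist_ok=True)
    for path in chosen:
        shutil.copy2(path, dst / path.name)
    return [p.name for p in chosen]


if __name__ == "__main__":
    # Assumed layout: texts/auth and texts/other hold the two classes.
    base = Path("texts")
    for label in ("auth", "other"):
        if (base / label).is_dir():
            copied = copy_sample(base / label, Path("texts_small") / label)
            print(label, len(copied), "files copied")
```

Sampling each class folder separately keeps both classes represented in the smaller data set. If the process runs on the subset, the fraction can be raised step by step until memory becomes the limit again.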

    Best,
    Marius