Error loading simple product master file

saikat_nandi Member Posts: 2 Contributor I
edited November 2018 in Help
I would appreciate help with the following error, which I get when trying to read a simple .txt file with the Read CSV operator.

I am trying to load a simple product file with 75,000 records. Each record has fields like Product ID, Category, Sub-Category, Product Description, etc.; there are no more than 20 attributes.

I am getting the following error:

Apr 30, 2010 12:38:46 AM SEVERE: Process failed: operator cannot be executed (18). Check the log messages...
Apr 30, 2010 12:38:46 AM SEVERE: Here:          Process[1] (Process)
          subprocess 'Main Process'
      ==>  +- Read CSV[1] (Read CSV)
Apr 30, 2010 12:38:46 AM SEVERE: java.lang.ArrayIndexOutOfBoundsException: 18

The XML version of the process is below:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="100" width="145">
      <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="file_name" value="C:\My Documents\ITEM_MASTER.TXT"/>
        <parameter key="trim_lines" value="true"/>
        <parameter key="use_first_row_as_attribute_names" value="false"/>
        <parameter key="parse_numbers" value="false"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="18"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
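A side note, not part of the original exchange: an ArrayIndexOutOfBoundsException with index 18 from a reader of delimited files often means that some row split into more fields than the reader expected (index 18 would be a 19th column). That is only a guess here, since later in the thread the fix is attributed to a changed encoding. The Python sketch below, which is not a RapidMiner feature, counts the fields in every record and lists the records whose count differs from the most common one. The delimiter and encoding are assumptions you would have to set to match the actual file.

```python
import csv
from collections import Counter


def field_count_report(lines, delimiter=","):
    """Return (most common field count, [(record_number, field_count), ...] for deviating records)."""
    counts = [len(row) for row in csv.reader(lines, delimiter=delimiter)]
    if not counts:
        return None, []
    # Treat the most frequent field count as the "expected" width of the table.
    expected = Counter(counts).most_common(1)[0][0]
    # Record numbers are 1-based and count parsed records, not physical lines.
    deviating = [(i + 1, n) for i, n in enumerate(counts) if n != expected]
    return expected, deviating


if __name__ == "__main__":
    sample = ["P1,Toys,Dolls", "P2,Books,Fiction", "P3,Toys,Cars,extra", "P4,Games,Cards"]
    print(field_count_report(sample))  # (3, [(3, 4)])
```

For the real file you would pass an open file object instead of `sample`, e.g. one opened with `newline=""` and the file's encoding; for a tab-separated file, use `delimiter="\t"`.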

Thanks for your help.
Saikat

Answers

  • SebastianLoh Member Posts: 99 Maven
    Hi Saikat,

    It is hard to tell exactly where the error is. Could you provide the CSV file? That would help to find the problem. My guess, however, is that the file is simply too big for your memory. Try cutting it into quarters (18,750 records each) and loading each of the smaller files: if all of them load on their own, the problem is memory; if one of them fails, the problem is in the data. Maybe this helps.
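    A minimal Python sketch of the split Sebastian suggests, assuming a plain one-record-per-line file with no header row (the process above sets use_first_row_as_attribute_names to false) and no quoted fields that span lines. The output names (for example ITEM_MASTER_part1.TXT) are made up for illustration. With 75,000 lines and four parts, each part gets 18,750 lines.

```python
from pathlib import Path


def split_file(src, parts=4, encoding="utf-8"):
    """Split a text file into `parts` consecutive chunks of (almost) equal line count."""
    src = Path(src)
    # newline="" keeps the original line endings unchanged on both read and write.
    with src.open(encoding=encoding, newline="") as f:
        lines = f.readlines()
    size = -(-len(lines) // parts)  # ceiling division, so no line is dropped
    outputs = []
    for i in range(parts):
        chunk = lines[i * size:(i + 1) * size]
        if not chunk:
            break
        out = src.with_name(f"{src.stem}_part{i + 1}{src.suffix}")
        with out.open("w", encoding=encoding, newline="") as g:
            g.writelines(chunk)
        outputs.append(out)
    return outputs
```

    The `encoding` argument has to match the file; a wrong guess here garbles the chunks in the same way a wrong encoding setting garbles the import.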

    Ciao Sebastian
  • saikat_nandi Member Posts: 2 Contributor I
    Thanks for your reply.

    The error went away after I changed the encoding from System to UTF-16. I also stopped parsing numbers, but the encoding change was probably what fixed the error.
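    One plausible reading of why the encoding change helped (the thread does not confirm it): in UTF-16 each ASCII character takes two bytes, one of them zero, so a reader using a single-byte system encoding sees extra NUL characters and can split lines into the wrong fields. Many UTF-16 files start with a byte order mark (BOM), and a short Python sketch like this one can check for it before choosing an encoding in Read CSV. A file without a BOM returns None here, which does not rule out UTF-16.

```python
import codecs

# UTF-32 BOMs are checked first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a Python codec name that decodes `data` and strips its BOM, or None if no BOM is found."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None


if __name__ == "__main__":
    sample = "P1,Toys\n".encode("utf-16")  # Python's utf-16 codec writes a BOM first
    codec_name = sniff_bom(sample)
    print(codec_name, repr(sample.decode(codec_name)))  # utf-16 'P1,Toys\n'
```

    To check a real file, read only its first four bytes in binary mode and pass them to `sniff_bom`.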

    Now I am getting an error saying that there is not enough memory. My process is a simple beginner's one: read a file of about 75K products so that I can run some queries on it later.

    If memory is typically an issue, how do people do real-life data exploration? Is it done outside RapidMiner, using an RDBMS or something similar, with only a subset of the data then used for model training in RapidMiner?

    Thanks,
    Saikat
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,642 RM Founder
    Hi,

    I am pretty sure that 75K records with about 20 attributes should not be a problem for RapidMiner running on a decent system. The question is what system you are running RapidMiner on and, if applicable, how much memory you have given it. Check the memory monitor in the Results perspective.
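    A back-of-the-envelope check of that claim, assuming (an assumption, not something stated in the thread) that each value is held as an 8-byte double: 75,000 × 20 values come to 12,000,000 bytes, about 11.4 MiB. The estimate ignores JVM object overhead and the strings behind nominal attributes such as Product Description, so treat it as a lower bound rather than a prediction.

```python
def estimate_table_bytes(rows: int, attributes: int, bytes_per_value: int = 8) -> int:
    """Lower-bound memory estimate for a rows x attributes table of fixed-size values."""
    return rows * attributes * bytes_per_value


if __name__ == "__main__":
    raw = estimate_table_bytes(75_000, 20)
    # 1 MiB = 2**20 bytes
    print(f"{raw:,} bytes, about {raw / 2**20:.1f} MiB")  # 12,000,000 bytes, about 11.4 MiB
```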

    I am repeating myself, but we are able to work with RapidMiner on 120 million records with a typical system setup. It is a question of the data source (data coming from a database instead of files is always a good idea ;) and of the analysis process design. Not everything can be done on data sets of every size, and the analyst has to know what can work in principle and what cannot.
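    To make the database route from Saikat's question concrete, here is a Python sketch using the standard-library sqlite3 module, not RapidMiner: it loads delimited records into a table once, after which exploratory queries run in the database and only their much smaller results need to go into an analysis tool. The table name `items` and the column names are hypothetical.

```python
import csv
import sqlite3


def load_into_sqlite(lines, columns, delimiter=",", db=":memory:"):
    """Load delimited records into a table named `items` and return the open connection."""
    conn = sqlite3.connect(db)
    # Column names are quoted identifiers; SQLite accepts columns declared without types.
    col_defs = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE items ({col_defs})")
    # executemany consumes the csv reader lazily; a record with the wrong
    # number of fields raises sqlite3.ProgrammingError.
    conn.executemany(
        f"INSERT INTO items VALUES ({placeholders})",
        csv.reader(lines, delimiter=delimiter),
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    records = ["P1,Toys,Dolls", "P2,Books,Fiction", "P3,Toys,Cars"]
    conn = load_into_sqlite(records, ["product_id", "category", "sub_category"])
    query = "SELECT category, COUNT(*) FROM items GROUP BY category ORDER BY category"
    print(conn.execute(query).fetchall())  # [('Books', 1), ('Toys', 2)]
```

    Passing a file path as `db` instead of ":memory:" keeps the table on disk, so later sessions can query it without re-reading the text file.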

    Cheers,
    Ingo