
"Read CSV fails with null pointer exception"

alejandro_tobon Member Posts: 16 Maven
edited June 2019 in Help
Hi, when I read a file of 2,001 lines (including the header row), the process works perfectly, but when I read a file of 2,002 lines, the process crashes. I already checked my file and replaced the line that was causing the problem with another one that doesn't, but the problem still happens.
My process has just one operator, which reads the file. Is there a restriction on reading CSV files that I don't know about?
The error that RapidMiner shows is:
"The setup does not seem to contain any obvious error", and the log messages say: NullPointerException.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="145" width="279">
      <operator activated="true" breakpoints="after" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="179" y="75">
        <parameter key="file_name" value="C:\Users\Alejandro\Desktop\ToSinc\Austral\Tesis\Security\ParcerCVSNVD\Data\nvdcve-2.0-2009-0-2495-10.csv"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • alejandro_tobon Member Posts: 16 Maven
    I have both files, the 2,000-line file and the 2,001-line file, but they are 29 MB in size. Please let me know how I can send them to you.

    Thanks
  • alejandro_tobon Member Posts: 16 Maven
    The files are on RapidShare's servers:

    http://rapidshare.com/files/383481678/Files.rar.html

    At this link you will find 2 files: one with 2,000 lines of data, which works perfectly, and one with 2,001 lines of data, which doesn't work.

    Thanks
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    there's no limit on the file's length or the number of columns, but it seems that the requirement that every row have the same number of columns has been violated. Did you check this?

    We are currently revising the import to improve its functionality and will add your file to the tests it must pass, or on which it must at least give a meaningful error.

    Greetings,
      Sebastian
  • alejandro_tobon Member Posts: 16 Maven
    Hi Sebastian.

    Regarding your question: yes, I did check. In fact, the only difference between the file that works and the file that doesn't is a single record, and if you replace this record with a record from the beginning of the file, it still doesn't work. Please let me know if you get to test it, and whether you know another way to make it work.

    Thanks.
    ;)
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    did you test whether, for example, the second-to-last line causes the problem?
    I forwarded your data set to my colleague who is responsible for coding the import wizards and operators. He's currently improving them and has added your file to the test set.

    Greetings,
      Sebastian
  • alejandro_tobon Member Posts: 16 Maven
    Hi Sebastian.

    Yes, I did test it. In fact, I tried copying the first line of the file that doesn't work into the file that works, and it didn't work either.

    I know it is a lot of information. At first I tried to insert this data into a database, but that wasn't possible because it has 14,000 columns and SQL Server has a limit of 1,200 columns, so I decided to write this information to a CSV file instead, since I think RapidMiner can handle it.

    This data is part of my thesis for the Master's program in Data Mining and Knowledge Discovery. It is an investigation into building a classification model to categorize software vulnerabilities, and I chose RapidMiner as my tool because it's really amazing software for analyzing data and creating data mining models.
    :)
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    thank you for the compliment :) I have another idea that could be worth a try: the Read AML operator provides a data loading and configuration wizard. Maybe this works for you...

    Greetings,
      Sebastian
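Sebastian's question about whether every row has the same number of columns is hard to answer by eye with 14,000 columns, but a short script can check it. The following is a minimal sketch in Python, not part of RapidMiner; the function name `find_ragged_rows` is hypothetical, and it assumes a comma-separated file whose first row is the header. It compares each record's field count with the header's and reports the physical line number of every mismatch. Because Python's `csv` module respects quoting, a comma or line break inside a quoted field does not count as an extra column, which is exactly the kind of content that can make "lines" and "records" disagree.

```python
import csv
import sys
from typing import List, Tuple


def find_ragged_rows(
    path: str, delimiter: str = ",", encoding: str = "utf-8"
) -> Tuple[int, List[Tuple[int, int]]]:
    """Return the header's field count and every record whose count differs.

    Each mismatch is reported as (line number, field count), where the line
    number is the physical line on which the record ends. Blank lines show up
    as records with 0 fields.
    """
    mismatches: List[Tuple[int, int]] = []
    # newline="" lets the csv module handle line breaks inside quoted fields;
    # errors="replace" keeps an unexpected encoding from aborting the check.
    with open(path, newline="", encoding=encoding, errors="replace") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)
        if header is None:
            return 0, mismatches  # empty file: nothing to compare
        expected = len(header)
        for row in reader:
            if len(row) != expected:
                mismatches.append((reader.line_num, len(row)))
    return expected, mismatches


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_columns.py data.csv
    expected, bad = find_ragged_rows(sys.argv[1])
    print(f"header has {expected} columns")
    for line_num, count in bad:
        print(f"line {line_num}: {count} columns")
    if not bad:
        print("all records match the header")
```

If this reports no mismatches for the 2,001-line file, the column-count hypothesis is ruled out, and the failure more likely lies in the reader itself rather than in the data.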