
Missing rows with ExampleSource

MuehliMan Member Posts: 85 Maven
edited November 2018 in Help
I am trying to import a large dataset into RM. My source is a CSV file with about 200 rows and approx. 250 columns.
(ExampleCSVSource gives an error complaining that there are different columns in line...)

Using ExampleSource and the ExampleSource Wizard, I can see in the lower part of the window that there are 189 rows and 251 columns to import, so I click the Finish button.
When I click the Edit... button to view my dataset, I get a table with all 251 columns but only 19 examples.

Where are my missing rows? Any help is welcome!
BTW: I am still using version 4.1

Answers

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello,

    in the AttributeEditor, you can define which rows should be shown and press the Update button in the panel on the left. You could of course also simply load the data and see if all data is there. Just run the process and check the meta data view and the data view.

    (ExampleCSVSource gives an error complaining that there are different columns in line...)
    If you have missing values in this data set at the end of the lines I would suggest upgrading to RapidMiner 4.2 since there was a bug for previous versions ignoring missing values at the end of lines in CSV files.

    Cheers,
    Ingo
  • MuehliMan Member Posts: 85 Maven
    Hi Ingo,
    mierswa wrote:

    in the AttributeEditor, you can define which rows should be shown and press the Update button in the panel on the left. You could of course also simply load the data and see if all data is there. Just run the process and check the meta data view and the data view.
    The AttributeEditor shows at most 19 examples (20 rows). If I open the dat file in a text editor, I find all of the entries there. Could there be an option that prevents RM from showing all entries?

    Another funny thing is that if I import my data into OpenOffice, export it as an XLS file, and load that file with ExcelExampleSource, I get all columns and rows.
    mierswa wrote:

    If you have missing values in this data set at the end of the lines I would suggest upgrading to RapidMiner 4.2 since there was a bug for previous versions ignoring missing values at the end of lines in CSV files.
    I have read about that bug in other threads and have switched to version 4.2.

    Thanks for your help,
    Markus
  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello,

    the attribute editor stops reading if anything goes wrong. So I assume that there is something unusual about line 19 or 20. Probably there is a problem with quoting, or the definition of the column separators does not match your data format. If you like, and the data is not too sensitive, you could post an excerpt of your data and I could have a look at what the problem might be.

    Cheers,
    Ingo
  • MuehliMan Member Posts: 85 Maven
    hi,

    I am pretty sure that you are right with your assumption. I'll try and go through the CSV File with a text editor to check commas and the columns.

    Greets,
    Markus
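
For anyone who would rather not go through a CSV file by hand, the check discussed above (inconsistent column counts and quoting problems around the line where reading stops) can be sketched in a few lines of Python. This is only a rough sketch and not part of RapidMiner: the function names `find_irregular_rows` and `lines_with_unbalanced_quotes` are made up for illustration, and the comma separator and double-quote character are assumptions that need to be adjusted to the actual file.

```python
import csv
import io


def find_irregular_rows(lines, delimiter=",", quotechar='"'):
    """Return (expected_columns, [(line_number, field_count), ...]).

    The first row is treated as the reference; every later row whose
    field count differs from it is reported. The delimiter and quote
    character are assumptions and must match the real file.
    """
    reader = csv.reader(lines, delimiter=delimiter, quotechar=quotechar)
    header = next(reader)
    expected = len(header)
    problems = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the physical line where this row ended
            problems.append((reader.line_num, len(row)))
    return expected, problems


def lines_with_unbalanced_quotes(text, quotechar='"'):
    """Return line numbers that contain an odd number of quote characters.

    Heuristic only: a legitimate quoted field that spans several lines
    is flagged as well.
    """
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count(quotechar) % 2 == 1
    ]


if __name__ == "__main__":
    # Tiny made-up sample: line 3 is one column short.
    sample = "a,b,c\n1,2,3\n4,5\n6,7,8\n"
    print(find_irregular_rows(io.StringIO(sample)))
    # Tiny made-up sample: line 3 has a quote that is never closed.
    print(lines_with_unbalanced_quotes('id,name\n1,"Smith"\n2,"Jones\n3,"Lee"\n'))
```

To run it against a real file, pass an open file object instead of the `io.StringIO` sample, for example `open("data.csv", newline="")`, where `data.csv` is a placeholder name. If the reported line numbers cluster around line 19 or 20, that would be consistent with the import stopping there.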