RapidMiner 9.8 Beta is now available

Be one of the first to get your hands on the new features. More details and downloads here:

GET RAPIDMINER 9.8 BETA

Why is

SkyTraderSkyTrader Member Posts: 88 Contributor I
edited July 18 in Help
Hi, I've looked for a similar issue in search but can't see why  the"Ignore Errors" button isn't working on an Excel file Import?

Tagged:

Best Answer

Answers

  • SkyTraderSkyTrader Member Posts: 88 Contributor I
    My comment above got posted accidentally without finihsing it. I've added the questions I have on an image of the failed Excel import.the faile
  • jacobcybulskijacobcybulski Member, University Professor Posts: 235   Unicorn
    Ignore Errors is designed for "little" errors, such as encountering a nominal value when numerics are expected or when the date is not formatted the way it is expected. When a "big" error is found, such as the corrupted Excel file or the file format in a very new Excel version, the reader gives up and crashes. I have always had better luck in reading CSV or JSON files, or even SAS files than Excel files. I suggest to save your Excel file into CSV and use a CSV reader.
    SkyTrader
  • SkyTraderSkyTrader Member Posts: 88 Contributor I
    edited July 18
    Thanks Jacob, well I got slightly further with a csv! But RM still thinks the variables in the first row are errors. They are variables periods for technical financial indicators like the RSI, as explained in the image above.

    Oddly RM doesn't even pull in Row 1 (SMA and EMA etc variables like 20 day or 50 day EMAverage) on this import, yet still "calls them out" as being errors?

    Would it be better if I just deleted Row 1 and its variables?
  • jacobcybulskijacobcybulski Member, University Professor Posts: 235   Unicorn
    It is best to format your dates as proper date strings before exporting from Excel.
    SkyTrader
  • SkyTraderSkyTrader Member Posts: 88 Contributor I
    Success. Thanks very much. My first large data set imported into RM.

    I'd already decided to delete Row 1 and rename anything that looked similar.

    "format your dates as proper date strings"
    I think this is currently the best format dd/mm/yyyy.... but then I found about a years worth of dates with a different format so that was converted.

    Then it showed I had commas on the Open data (Dow Jones), so I deleted those, then it found more in another column and another...
    so eventually I selected every relevant column and formatted the whole workbook and then it imported fine.

    Cheers again Jacob.

  • jacobcybulskijacobcybulski Member, University Professor Posts: 235   Unicorn
    Great, well done!
    SkyTrader
  • SkyTraderSkyTrader Member Posts: 88 Contributor I
    Ps/ On a column that doesn't calculate the first 19 cells like a 20 day moving average does it matter that RM depicts those empty column cells as a "?" Cheers,
  • jacobcybulskijacobcybulski Member, University Professor Posts: 235   Unicorn
    "?" indicates missing values, if in the results tab you go to Statistics, you will see how many missing values you have in each column. You can then decide what to do with those columns / attributes, you can either remove such columns (using Select Attributes) or you can "impute" those missing values (e.g. with column average using Replace Missing Values). I suspect that in your case you'd want the column to calculate the value, in which case you probably want to delete that column and add a new column using a Generate Attribute which would have a calculation inside.
  • SkyTraderSkyTrader Member Posts: 88 Contributor I
    Thanks, those kind of missing values are to be expected in Excel or RM because a 20 day moving average  won't give an values until Row 20. I just wasn't sure how RM would treat them.
Sign In or Register to comment.