Questions on importing Excel files whose columns have mixed formats

surfreta Member Posts: 3 Contributor I
edited December 2019 in Help
Hello, I am trying to read an Excel file into RapidMiner. However, this Excel file has mixed data formats. For instance, a given column may contain some cells that are numerical values, while other cells are plain text. How should I set up the formats when using the Read Excel operator?

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    RapidMiner can only handle columns with a consistent data type. So in your case you have to define all columns as polynominal to make sure that the Read Excel operator can read the complete file correctly. Then you can process the file further in RapidMiner. If there is a condition that determines the type of a row, you could use Filter Examples to split the data set into one part that contains the text values and another that contains the numbers. Then use Parse Numbers on the number part to convert it from polynominal to numerical.
    Please note that Parse Numbers will silently fail if the column contains a value that cannot be converted to a number.

    Best regards,
    Marius
  • amenaakhterchy Member Posts: 7 Contributor I

    Hello,

    I am also facing the same problem. My dataset contains mixed formats. When I import the Excel file, it shows the error "cannot get numeric value from a text file". How can I solve this?

    Thank you

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    As Marius suggested originally, the best solution is to set the data type to polynominal during the Excel import. This allows both numeric and non-numeric values, so all the data will be imported. Once all the data is in RapidMiner, you can convert the numerical data using "Parse Numbers" and decide how you want to deal with the non-numeric values (leave them as missing values, impute the missing values, etc.).

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
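
The approach described in the answers above (import everything as text, then either split the rows or convert what can be converted and treat the rest as missing) can be illustrated outside RapidMiner. The following is only a minimal sketch in plain Python, not RapidMiner code and not how the Read Excel or Parse Numbers operators are implemented. The sample column, the helper name parse_number, and the choice of mean imputation are assumptions made purely for illustration.

```python
import math
from statistics import mean


def parse_number(text):
    """Return a float for a numeric string, or None if the cell is not a number."""
    try:
        value = float(text.strip())
    except (ValueError, AttributeError):
        return None
    # Treat "nan" and "inf" as non-numeric text rather than as numbers.
    return value if math.isfinite(value) else None


# Hypothetical column, as it might look after importing every cell as text.
raw_column = ["12", "3.5", "n/a", "7", "unknown", ""]

# Convert what can be converted; everything else becomes a missing value.
parsed = [parse_number(cell) for cell in raw_column]

# Option 1: split the rows into a numeric part and a text part.
numeric_rows = [(i, v) for i, v in enumerate(parsed) if v is not None]
text_rows = [(i, raw_column[i]) for i, v in enumerate(parsed) if v is None]

# Option 2: keep a single numeric column and impute the missing values
# (here with the mean of the known values, as one possible choice).
known = [v for v in parsed if v is not None]
fill = mean(known)
imputed = [fill if v is None else v for v in parsed]

print(parsed)
print(text_rows)
print(imputed)
```

Reading the column as text first means no cell is rejected at import time. The decision about what to do with non-numeric cells is then made explicitly, instead of being hidden in a conversion step that fails quietly.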