Copy Dataset Properties

btibert Member, University Professor Posts: 146 Guru
This is an "is it possible, and if not, what is the best way to handle this" type of question.

My use case involves two files: a training file and a validation set. The training file is used to fit the model, and the validation set has the same columns minus the label. I am doing a decent amount of preprocessing and want to reuse that work on the validation set.

I am hitting a roadblock because when I use Read CSV on the validation set, the guessed data type for a given column differs between the two files (train = polynominal, validation = integer). Even though I can carry the preprocessing steps forward via Apply Model, the column is not dummy encoded by the Nominal to Numerical preprocessing model I am carrying forward, because it is no longer read as nominal. As a result, applying the model to the validation set fails because the dummy-encoded columns it expects are not present.

I know that I could fix the types manually on load or with an operator, but I am wondering whether there is a way to "copy data types" between datasets when columns share the same name. I would prefer that this type of error not happen during my in-class data competitions, and with a dataset that has 50 columns, my goal is to avoid having students set column types one by one.


Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    edited September 2019
    Hi,
    Nominal to Numerical has a preprocessing model. You can group this with your prediction model so that you always apply both at the same time.
    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • btibert Member, University Professor Posts: 146 Guru
    Thanks Martin, I used that, but it's more about how the two files get read in with different data types to start with. I can set the types manually, but I have been trying to use operators explicitly for everything, which is why I was curious whether there is a way to "copy data types" from one raw file to another. By data types, I simply mean numeric, nominal, text, id, etc. Not a huge deal; I was just wondering, as I am the farthest thing from an expert on all of the tooling that is baked into RM.
  • btibert Member, University Professor Posts: 146 Guru
    Got it, that makes sense in terms of how to do it.  Thank you.
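
For readers who want to see the "copy data types" idea outside RapidMiner, here is a minimal Python sketch. It is not a RapidMiner feature or operator, only an illustration under some assumptions: both files are first read as raw strings, each column's type is decided from the training rows alone, and those types are then applied to every validation column with the same name. The function names (`infer_types`, `apply_types`), the type labels, and the sample data are all hypothetical.

```python
import csv
import io

# Hypothetical type labels mapped to the Python converter used for each.
CONVERTERS = {"integer": int, "real": float, "nominal": str}


def read_rows(text):
    """Parse CSV text into a list of dicts; every value stays a string."""
    return list(csv.DictReader(io.StringIO(text)))


def _parses_as(converter, value):
    """Return True if converter(value) succeeds."""
    try:
        converter(value)
        return True
    except ValueError:
        return False


def infer_types(rows):
    """Guess one type label per column from the non-empty values in rows."""
    types = {}
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] != ""]
        if values and all(_parses_as(int, v) for v in values):
            types[name] = "integer"
        elif values and all(_parses_as(float, v) for v in values):
            types[name] = "real"
        else:
            types[name] = "nominal"
    return types


def apply_types(rows, types):
    """Convert each value using the type recorded for its column name.

    Columns missing from `types` are left as strings; empty cells become None.
    A value that cannot be converted raises ValueError.
    """
    typed = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if value == "":
                new_row[name] = None
            elif name in types:
                new_row[name] = CONVERTERS[types[name]](value)
            else:
                new_row[name] = value
        typed.append(new_row)
    return typed


# Illustrative data: "zip" looks nominal in training but integer in validation.
train_rows = read_rows("zip,income,label\n02115,50000,yes\nA1B2,62000,no\n")
valid_rows = read_rows("zip,income\n02116,48000\n10001,51000\n")

train_types = infer_types(train_rows)           # types decided by training data
valid_guess = infer_types(valid_rows)           # what validation alone would guess
valid_typed = apply_types(valid_rows, train_types)
```

In this sample, the validation file on its own would be guessed as having an integer `zip` column, while reusing `train_types` keeps `zip` as a nominal string, which is the mismatch described in the question. The complementary approach from Martin's answer is to keep the fitted preprocessing model together with the prediction model so both are always applied together.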