Transform data into table with every attribute representation
moritz_moeller
Member Posts: 5 Learner I
Hey there,
Since my data set is too big to analyze with a clustering algorithm (and I don't want to wait as long as that would take), I want to transform it into a smaller set.
My question is whether it is possible to reduce it to a data set in which every attribute value is still represented in a representative proportion. For example: I have a data set with 3 columns, each with 5 different possible values (i.e. 15 values in total), and 10 million rows. Now I want a data set that contains all 3 columns with all of their values but only 100k rows, so that I can analyze it. Is there an option to do that automatically in RM? If not, I think I have to do it manually somehow.
Thanks and Greetings,
Moritz
Best Answer

SGolbert Posts: 342 Unicorn
Hi Moritz,
I haven't found dimension reduction techniques for polynominal variables in RM. Maybe it is possible to use feature selection.
Regarding the rows, these are the examples you are using for training and testing. It is up to you how many examples you want to use. There is no need to use all the rows, at least while you are not deploying the final model. Of course, it also depends on the kind of data; if it is a time series, the approach should be different.
Regards,
Sebastian
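For readers who want to see the row-reduction idea concretely, here is a minimal sketch of stratified downsampling in plain Python, outside RapidMiner. It is not a RapidMiner operator and not the method proposed in this thread; the function name `stratified_sample`, its parameters, and the synthetic demo data are all assumptions made for illustration. Rows are grouped by their combination of values in the chosen columns, each group is sampled in proportion to its size, and at least one row per group is kept so that no observed value disappears.

```python
import random
from collections import defaultdict


def stratified_sample(rows, target_size, key_columns, seed=0):
    """Downsample rows, keeping every observed value combination.

    rows        -- list of tuples (one tuple per example)
    target_size -- approximate number of rows wanted in the result
    key_columns -- indices of the columns that define the strata
    seed        -- seed for reproducible sampling
    (All names here are hypothetical, chosen for this sketch.)
    """
    if not rows:
        raise ValueError("rows must not be empty")

    rng = random.Random(seed)

    # Group rows by their combination of values in the key columns.
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[i] for i in key_columns)].append(row)

    fraction = target_size / len(rows)
    sample = []
    for members in strata.values():
        # Proportional share, but never fewer than one row per stratum,
        # so rare combinations are not lost. The result size is therefore
        # only approximately target_size.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Synthetic stand-in for the data described above: 3 columns with
# 5 possible values each (scaled down to 10,000 rows for the demo).
data_rng = random.Random(42)
values = ["a", "b", "c", "d", "e"]
rows = [tuple(data_rng.choice(values) for _ in range(3)) for _ in range(10_000)]

sample = stratified_sample(rows, target_size=1_000, key_columns=(0, 1, 2))
print(len(rows), "rows reduced to", len(sample))
```

Because every stratum contributes at least one row, the result can be slightly larger than `target_size` when there are many rare combinations. With many columns the number of combinations grows quickly, so in that case it may be more practical to stratify on only a few important columns.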