How to set the whole dataset (1000*100) as a label

TeeH Member Posts: 18 Contributor II
Let's say I have 4 datasets, each with 1000 rows and 100 columns, and each dataset represents a different variable (4 variables in total). Three of them are predictors and one is the target. How do I set a whole 1000*100 dataset as the label so that I can build a predictive model using the other 3 datasets as predictors? Think of these datasets together as one multidimensional dataset.

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    what's your use case? Which problem are you trying to solve?

    Traditional machine learning models predict one number (regression) or class (classification) from the predictors. Are you trying to generate an entire dataset here? Including predictors and the label variable?

    Off the top of my head, I don't know of a way to do this in RapidMiner. It would take advanced hackery with generative neural networks or something similar.

    Regards,
    Balázs
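One way to fit a 1000*100 target into the "one number per example" framing described above is to treat every cell as its own example: each grid is flattened into a column, giving 100,000 rows with 3 predictor columns and 1 label column. The following is only a minimal sketch in Python with NumPy, not a RapidMiner process. It assumes the four datasets are aligned cell by cell; the array names, the random data, and the least-squares model are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
ROWS, COLS = 1000, 100

# Hypothetical stand-ins for the four 1000 x 100 datasets.
predictor_a = rng.normal(size=(ROWS, COLS))
predictor_b = rng.normal(size=(ROWS, COLS))
predictor_c = rng.normal(size=(ROWS, COLS))
target = 2.0 * predictor_a - predictor_b + 0.5 * predictor_c


def grids_to_table(predictors, label):
    """Flatten equally shaped grids into (X, y) with one row per cell."""
    for grid in predictors:
        if grid.shape != label.shape:
            raise ValueError("all grids must share the same shape")
    X = np.column_stack([grid.ravel() for grid in predictors])
    y = label.ravel()
    return X, y


X, y = grids_to_table([predictor_a, predictor_b, predictor_c], target)
print(X.shape, y.shape)  # (100000, 3) (100000,)

# Placeholder model: ordinary least squares with an intercept column.
design = np.column_stack([X, np.ones(len(y))])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

# Predictions are one number per row; reshaping restores the grid layout.
predicted_grid = (design @ coef).reshape(ROWS, COLS)
```

The label is still a single column here. The 1000*100 shape only comes back at the end, when the per-cell predictions are reshaped into a grid.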
  • TeeH Member Posts: 18 Contributor II
    I'm trying to predict vegetation change using climate variables. I'm using zonal statistics generated from a multidimensional dataset. It is a spatiotemporal dataset comprising the time dimension, the pixel ID, and the value of each variable.
  • TeeH Member Posts: 18 Contributor II
    A follow-up to my first question: apart from prediction, is it possible to generate a new dataset from the other datasets, for example by carrying out a simple computation like addition?
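Outside RapidMiner, a cell-by-cell computation such as addition is straightforward once the datasets share the same shape. The sketch below uses Python with NumPy and two tiny made-up grids; `add_grids` is a hypothetical helper name, not an existing library function.

```python
import numpy as np


def add_grids(first, second):
    """Return the element-wise sum of two grids with identical shapes."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    if first.shape != second.shape:
        raise ValueError("grids must have the same shape")
    return first + second


# Hypothetical 2 x 3 grids standing in for two 1000 x 100 datasets.
grid_a = np.arange(6, dtype=float).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
grid_b = np.full((2, 3), 10.0)

combined = add_grids(grid_a, grid_b)
print(combined.tolist())  # [[10.0, 11.0, 12.0], [13.0, 14.0, 15.0]]
```

The explicit shape check is a deliberate choice: without it, NumPy broadcasting could silently combine grids of different sizes.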
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Oh, so your target variable is the vegetation change in every area one by one.

    Predicting each of the 1000x100 areas one by one will take time, but that's what one would do in this situation.

    I guess you have historical data for each area (selected by x, y coordinates, for example), such as the change 4 years ago, 3 years ago, etc. In this case you would build a model for each area, maybe taking the neighboring cells into consideration.

    I would start with a small part (not 100,000 models at once) or aggregate areas into larger ones to get a more robust prediction.

    Regards,
    Balázs
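The two suggestions above, aggregating areas into larger ones and building one model per area from its history, can be sketched together. This is a hedged illustration in Python with NumPy, not a RapidMiner process. The 5 yearly snapshots, the 10 x 10 block size, the synthetic data, and the straight-line trend model are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
YEARS, ROWS, COLS = 5, 1000, 100  # hypothetical: 5 yearly snapshots
BLOCK = 10                        # hypothetical: aggregate 10 x 10 cells

# Hypothetical history: a per-cell linear trend plus a little noise.
years = np.arange(YEARS, dtype=float)
slope = rng.uniform(-1.0, 1.0, size=(ROWS, COLS))
noise = rng.normal(scale=0.1, size=(YEARS, ROWS, COLS))
history = slope * years[:, None, None] + noise


def aggregate_blocks(stack, block):
    """Average a (time, rows, cols) stack over non-overlapping square blocks."""
    t, r, c = stack.shape
    if r % block or c % block:
        raise ValueError("grid size must be divisible by the block size")
    return stack.reshape(t, r // block, block, c // block, block).mean(axis=(2, 4))


coarse = aggregate_blocks(history, BLOCK)  # shape (5, 100, 10)

# One straight-line model per aggregated area; np.polyfit fits every
# column of a 2-D array against the same x values in a single call.
flat = coarse.reshape(YEARS, -1)           # shape (5, 1000)
trend, intercept = np.polyfit(years, flat, deg=1)

# Extrapolate each area's line one step past the last snapshot.
next_year = (trend * YEARS + intercept).reshape(coarse.shape[1:])
print(coarse.shape, next_year.shape)  # (5, 100, 10) (100, 10)
```

With these assumed sizes, aggregation cuts the number of per-area models from 100,000 to 1,000, and averaging 100 cells per block also damps the per-cell noise.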
  • TeeH Member Posts: 18 Contributor II
    Yeah! I did aggregate, but I wanted to do the prediction at the pixel level. Maybe I should try using image operators...