Generating Simulated DataFrame

cedric_anover · August 2017

This is an operator idea: the operator takes a dataset as input, analyzes/estimates the distribution of each attribute (column), and then outputs another DataFrame (which may have a different number of rows) with the same attributes/columns but different, simulated observations/examples.

Input:

DF (Type: DataFrame)

Parameters:

nrow (Type: Int) = number of rows to generate; defaults to the number of rows in DF

Output:

DF_Out (Type: DataFrame)
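
To make the proposed input/parameter/output contract concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not an existing RapidMiner operator: a dict of column lists stands in for the DataFrame, the names `simulate_column` and `simulate_table` are hypothetical, numeric columns are sampled from a simple histogram estimate, and every column is simulated independently of the others.

```python
import random
from collections import Counter


def simulate_column(values, n, rng, bins=10):
    """Draw n simulated values from the estimated marginal distribution of one column."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        lo, hi = min(values), max(values)
        if lo == hi:  # constant column: nothing to estimate
            return [float(lo)] * n
        width = (hi - lo) / bins
        counts = [0] * bins
        for v in values:
            # The maximum value falls into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # Pick a bin in proportion to its count, then a uniform point inside it.
        chosen = rng.choices(range(bins), weights=counts, k=n)
        return [lo + (b + rng.random()) * width for b in chosen]
    # Non-numeric column: sample categories by observed frequency.
    freq = Counter(values)
    return rng.choices(list(freq.keys()), weights=list(freq.values()), k=n)


def simulate_table(df, nrow=None, seed=None):
    """df is a dict of equally long column lists (assumed stand-in for a DataFrame)."""
    rng = random.Random(seed)
    n_in = len(next(iter(df.values())))
    n = n_in if nrow is None else nrow  # nrow defaults to the input row count
    return {name: simulate_column(col, n, rng) for name, col in df.items()}


df = {"age": [23, 35, 41, 29, 52, 38], "city": ["A", "B", "A", "C", "A", "B"]}
df_out = simulate_table(df, nrow=10, seed=0)
```

Here `df_out` has the same columns as `df` but 10 simulated rows. Because each column is drawn separately, any relationship between `age` and `city` in the input is not carried over.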

MartinLiebig · August 2017

Dear @cedric_anover,

what do you mean by "analyze/estimate the distribution"? A simple check against the usual distributions, like Normal, Poisson or Cauchy?

I think this is rather academic, because real-life distributions are rarely that simple. If you don't use the histogram as an estimate of the pdf, you run into problems. In any case, these simulations do not take dependencies between two (or more) attributes into account. If you want to do it more correctly, I suppose you are forced to use techniques like Markov chains.

Best,

Martin
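
The point about dependencies between attributes can be illustrated with a small sketch. It uses made-up synthetic data and a hypothetical `pearson` helper, and it contrasts column-wise resampling with a row-wise bootstrap. The bootstrap is only a simple contrast chosen for this example; it is not the Markov chain approach mentioned above.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)
n = 2000

# Synthetic data (an assumption for illustration): y depends strongly on x.
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * a + rng.gauss(0, 0.5) for a in x]

# Column-wise simulation: each attribute is resampled on its own.
x_ind = rng.choices(x, k=n)
y_ind = rng.choices(y, k=n)

# Row-wise bootstrap: whole rows are resampled, so pairs stay together.
rows = rng.choices(list(zip(x, y)), k=n)
x_row, y_row = zip(*rows)

r_orig = pearson(x, y)          # strong positive correlation
r_ind = pearson(x_ind, y_ind)   # close to zero: the dependency is lost
r_row = pearson(x_row, y_row)   # close to r_orig: the dependency survives
```

Both simulated datasets reproduce the marginal distributions of `x` and `y` reasonably well, but only the row-wise version keeps the relationship between them.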

sgenzer · October 2017

Pending response from user.

Altair RapidMiner Community

Declined · Last Updated May 2019
