Polynominal Value Reduction

seanv507 Member Posts: 2 Contributor I
edited November 2018 in Help
Hi,
I would like to replicate a process I have done in Python/scikit-learn/R:

I am looking at advertising click-through rate (CTR) prediction: millions of rows and, say, ~5 polynominal features, each with up to 1000 different values (e.g. feature = Website, Country, etc.).

Since the feature data is "skewed", i.e. many values have very few instances in the data while a few values have very many, I want to restrict each polynominal feature to those values that change the CTR significantly from the base CTR (and replace the "long tail" with a single "NA" category for each polynominal feature).

Is there any way of doing this within RapidMiner?
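For reference, the reduction described above can be sketched in Python with pandas. This is only a minimal sketch under stated assumptions, not the process actually used: the function name `collapse_levels`, the z-score test against the base CTR, and the `z_threshold` and `min_count` defaults are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd


def collapse_levels(df, feature, target, z_threshold=2.0, min_count=30, other="NA"):
    """Keep the values of `feature` whose CTR differs clearly from the base CTR;
    map every other value to the single category `other`.

    Assumptions (illustrative only): `target` is a 0/1 click column, and
    "significant" means |z| > z_threshold under a binomial approximation.
    """
    base_ctr = df[target].mean()
    # Per-value row count and click count.
    stats = df.groupby(feature)[target].agg(n="count", clicks="sum")
    # Standard error of a value's CTR if its true rate were the base CTR.
    se = np.sqrt(base_ctr * (1.0 - base_ctr) / stats["n"])
    z = (stats["clicks"] / stats["n"] - base_ctr) / se
    # Keep values that are both frequent enough and far from the base CTR.
    keep = stats.index[(z.abs() > z_threshold) & (stats["n"] >= min_count)]
    reduced = df[feature].where(df[feature].isin(keep), other)
    return reduced, list(keep)
```

The list of kept values would be computed on the training data only and then reused to map the same feature in the test data, so that both sets share one set of categories.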

Answers

  • fras Member Posts: 93 Contributor II
    Hi,

    As far as I understand the problem, I would do two things first:

    - take a sample of your data (reduce the rows to about 1%)
    - apply the operator "NominalToBinominal"

    Then analyse how sparse your data is.
    For more advice, example data would be useful.
  • seanv507 Member Posts: 2 Contributor I
    CTR data is "unbalanced", i.e. there is a ~1% chance of clicking. So subsampling is good, but I have to do it only on the "non-click" class and then reweight that class in the training algorithm [e.g. if the data contains 100 clicks and 100000 non-clicks, I am happy to subsample the non-clicks].

    The feature data is JUST IDs: WebsiteID, AdID, etc. [e.g. google.com=1, yahoo.com=2, cnbc.com=3, ...], so there is no description of the website.

    So yes, I want to do NominalToBinominal, but then/at the same time/before that I want to FILTER out those binominals [e.g. certain websites] for which there is little training data.
    (see e.g. http://www.kaggle.com/about/papers ... click through rate)
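The subsampling of non-clicks with reweighting mentioned in the last reply could look roughly like this in pandas. It is a sketch under assumptions, not a RapidMiner process: the function name `downsample_non_clicks`, the `weight` column, and the `keep_fraction` parameter are hypothetical names introduced for illustration, and the click column is assumed to be 0/1.

```python
import numpy as np
import pandas as pd


def downsample_non_clicks(df, target, keep_fraction, seed=0):
    """Keep every click and a random `keep_fraction` of the non-clicks.

    Adds a `weight` column so that the weighted class totals match the
    original data (assumes `target` is 0 for non-click, 1 for click).
    """
    clicks = df[df[target] == 1]
    non_clicks = df[df[target] == 0].sample(frac=keep_fraction, random_state=seed)
    out = pd.concat([clicks, non_clicks])
    # Each retained non-click stands in for 1 / keep_fraction original rows.
    out["weight"] = np.where(out[target] == 1, 1.0, 1.0 / keep_fraction)
    return out
```

In scikit-learn, a column like `weight` can typically be passed as `sample_weight` to an estimator's `fit` method, so that the predicted click probabilities stay on the scale of the original, unsampled data.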