
Polynominal Value Reduction

seanv507 Member Posts: 2 Contributor I
edited November 2018 in Help
I would like to replicate a process I have done in Python/scikit-learn/R:

I am looking at advertising click-through rate (CTR) prediction: millions of rows and about 5 polynominal features, each with up to 1000 different values (e.g. feature = Website, Country, etc.).

Since the feature data is "skewed", i.e. many values have very few instances in the data while a few values have very many, I want to restrict each polynominal feature to those values that change the CTR significantly from the base CTR, and replace the "long tail" with a single "NA" category for each polynominal feature.
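
As a rough illustration of the reduction described above, here is a minimal pure-Python sketch. It is not a RapidMiner process and not necessarily the original scikit-learn/R code: the function name `reduce_levels`, the two-sided z-test against the base CTR, and the threshold of 2 standard errors are all assumptions chosen for the example, and the normal approximation is crude for very rare values.

```python
import math
from collections import defaultdict

def reduce_levels(values, clicks, z_threshold=2.0, other_label="NA"):
    """Keep values whose CTR differs from the base CTR by more than
    z_threshold standard errors; map every other value to other_label.
    (Hypothetical helper; the z-test is an assumed significance criterion.)"""
    base_ctr = sum(clicks) / len(values)
    counts = defaultdict(int)
    click_counts = defaultdict(int)
    for v, c in zip(values, clicks):
        counts[v] += 1
        click_counts[v] += c

    keep = set()
    for level, n in counts.items():
        # Standard error of a proportion, assuming the true CTR equals base_ctr
        se = math.sqrt(base_ctr * (1.0 - base_ctr) / n)
        if se == 0:
            continue  # base CTR is 0 or 1: nothing can differ from it
        z = (click_counts[level] / n - base_ctr) / se
        if abs(z) > z_threshold:
            keep.add(level)

    reduced = [v if v in keep else other_label for v in values]
    return reduced, keep

# Toy data: two frequent websites with clearly different CTRs, two rare ones
values = ["a"] * 1000 + ["b"] * 1000 + ["c"] * 5 + ["d"] * 3
clicks = ([1] * 100 + [0] * 900) + ([1] * 10 + [0] * 990) + ([1] + [0] * 4) + [0] * 3
reduced, kept = reduce_levels(values, clicks)
```

With this toy data the two frequent values are kept and the two rare values fall into the "NA" category, because their small counts make the standard error too large to call the difference significant.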

Is there any way of doing this within RapidMiner?


  • fras Member Posts: 93 Contributor II

    As far as I understand the problem, I would do two things first:

    - take a sample of your data (reduce the rows, e.g. to 1%)
    - apply operator "NominalToBinominal"

    Then analyse how sparse your data is.
    For more advice, examples would be useful.
  • seanv507 Member Posts: 2 Contributor I
    CTR data is "unbalanced", i.e. there is only a ~1% chance of a click. So subsampling is good, but I have to do it only on the "non-click" class and then reweight that class in the training algorithm (e.g. if the data contains 100 clicks and 100000 non-clicks, I am happy to subsample the non-clicks).
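
The subsample-and-reweight idea above could look roughly like the following sketch. The helper name `downsample_negatives`, the `click` key, and the keep rate of 1% are assumptions made for illustration; the weight `1 / keep_rate` is the usual inverse-sampling-probability correction, to be passed to whatever learner accepts per-example weights.

```python
import random

def downsample_negatives(rows, label_key="click", keep_rate=0.01, seed=0):
    """Keep every click, keep each non-click with probability keep_rate,
    and attach a weight that undoes the subsampling in expectation.
    (Hypothetical helper; names and defaults are assumptions.)"""
    rng = random.Random(seed)
    sampled = []
    for row in rows:
        if row[label_key] == 1:
            sampled.append((row, 1.0))               # clicks are never dropped
        elif rng.random() < keep_rate:
            sampled.append((row, 1.0 / keep_rate))   # each kept non-click stands for 1/keep_rate rows
    return sampled

# Toy data matching the example in the post: 100 clicks, 100000 non-clicks
rows = [{"click": 1}] * 100 + [{"click": 0}] * 100000
sampled = downsample_negatives(rows)
n_pos = sum(1 for row, w in sampled if row["click"] == 1)
n_neg = sum(1 for row, w in sampled if row["click"] == 0)
```

Roughly 1000 non-clicks should survive, each carrying a weight of about 100, so the weighted class balance stays close to the original.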

    The feature data is JUST IDs: WebsiteID, AdID etc. [e.g. ...], so there is no description of the website.

    So yes, I want to apply NominalToBinominal, but then (or at the same time, or before) I want to FILTER out those binominals, e.g. certain websites, for which there is little training data
    (see e.g. ... click-through rate).
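
To make the "binominal plus filter" step concrete, here is a small sketch that only creates 0/1 columns for IDs with enough training rows. This is plain Python rather than a RapidMiner operator chain; the function name `binominal_columns`, the `min_count` cutoff, and the column naming scheme are assumptions for the example.

```python
from collections import Counter

def binominal_columns(values, name="WebsiteID", min_count=50):
    """One 0/1 column per value that occurs at least min_count times.
    Rows whose value was filtered out get all zeros.
    (Hypothetical helper; the count-based cutoff is an assumption.)"""
    counts = Counter(values)
    kept = sorted(level for level, n in counts.items() if n >= min_count)
    rows = [
        {f"{name}={level}": int(v == level) for level in kept}
        for v in values
    ]
    return kept, rows

# Toy data: two websites with enough rows, one with almost none
website_ids = ["w1"] * 60 + ["w2"] * 55 + ["w3"] * 2
kept_levels, encoded = binominal_columns(website_ids)
```

Here `w3` gets no column of its own, so its rows are encoded as all zeros, which plays the same role as a shared "rare" category.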