Options

Compound search for attribute selection

wesselwessel Member Posts: 537 Maven
edited June 2019 in Help
Compound search combines intermediate search result to search the space of all possible attribute subsets more effectively.
For example in a dataset with 100 attributes, and the optimal subset containing 10 attributes, forward search can only find this optimal subset after 10 generations.
Compound search will put all promising attributes in the second generation.
So if attributes 1 to 10 have some good accuracy and the rest has 0% accuracy, it will test permutations of attributes 1 to 10 in the second generation.

A paper explaining this idea in more detail, and proving that this is indeed a good idea with several experiments:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.40.4640
robotics.stanford.edu/~ronnyk/fssWrapper.ps 

Answers

  • Options
    fischerfischer Member Posts: 439 Maven
    Hi,

    thanks for the hint. Sounds interesting. Are there any implementations yet? Would be happy to incorporate that into the RM core.

    Cheers,
    Simon
  • Options
    wesselwessel Member Posts: 537 Maven
    I can implement it.

    It is just a search algorithm.

    But my java experience is minor.

    I have 2 university courses on Java programming, but somehow seems not enough.
  • Options
    wesselwessel Member Posts: 537 Maven
    Is it possible to write a piece of code that is not a complete implementation into Rapid miner?

    But just has the algorithm?
  • Options
    fischerfischer Member Posts: 439 Maven
    Hi,

    as a start, you can look into the forward selection:

      com.rapidminer.operator.features.selection.ForwardSelectionOperator

    If you want to do it, you can checkout RM from SVN and build it directly into the core, so you don't have to bother with making a plugin in the first place. You can still do that later, or, if you want, you can contribute the code and we would probably build it into the core.

    Cheers,
    Simon
  • Options
    wesselwessel Member Posts: 537 Maven
    Right I implemented all the functions I need for the algorithm.
    A HashMap data type all containing all states searched so far.
    A Array data type top, containing the top-N states.
    A function to sort top by fitness.
    A function mutate, which randomly inverts attribute on/off for 1 attribute and creates a new state.
    A function crossover, which randomly copies attribute on/off from source A or B and creates a new state.



    Now I will see if I can somehow modify the forward selection operator, and recompile it.

    And I guess to see if it works I can download some datasets from UCI and compare kappa score and runtime.
    // BitSet Demonstration.
    import java.util.BitSet;
    import java.util.HashSet;
    import java.util.Random;
    import java.util.Arrays;

    class SearchBitSetSpace {
    public static int bsLen = 16; // number of bits in BitSet
    public static int topLen = 6; // number of States in top pool

    // BitSet strongest = top[LAST]
    public static State[] top = new State[topLen]; // top-N states
    public static HashSet<BitSet> all = new HashSet<BitSet>();
    public static Random ran = new Random();

    public static void main(String args[]) {
    initTop();
    updateTopViaCrossOver();
    updateTopViaMutate();
    updateTopViaCrossOver();
    updateTopViaMutate();
    updateTopViaCrossOver();
    updateTopViaMutate();

    for (int i = 0; i < topLen; i++) {
    echo(top);
    }

    echo(all);
    }

    public static void updateTopViaMutate() {
    for (int i = 0; i < topLen; i++) {
    // create mutant
    BitSet bs = mutate(top.bs);
    // add bs to all and check if it was already contained
    if(all.add(bs) == false) {
    continue;
    }
    int bsfitness = fitness(bs);

    // check if mutant is stronger than top[0], which is the current weakest
    if(bsfitness > top[0].fitness) {
    top[0].bs = bs;
    top[0].fitness = bsfitness;
    sortTop();
    }
    }
    }

    public static void updateTopViaCrossOver() {
    // create crossover from strongest and 1 other top-BitSet
    BitSet strongest = top[topLen - 1].bs;

    for (int i = 0; i < topLen - 1; i++) {
    // create crossover, top is the other BitSet
    BitSet bs = cross(strongest, top.bs);
    //add bs to all and check if it was already contained
    if(all.add(bs) == false) {
    continue;
    }
    int bsfitness = fitness(bs);

    // check if crossover is stronger than top[0], which is the current weakest
    if(bsfitness > top[0].fitness) {
    top[0].bs = bs;
    top[0].fitness = bsfitness;
    sortTop();
    }
    }
    }

    public static void sortTop() {
    Arrays.sort(top);
    }

    // fill the top pool with different BitSets
    public static void initTop() {
    for (int i = 0; i < topLen; i++) {
    BitSet a = new BitSet(bsLen);
    a.set(i);
    top = new State(a, fitness(a));
    all.add(a);
    }
    }

    // this is a fake fitness function
    public static int fitness(BitSet a) {
    return a.cardinality();
    }

    // randomly change 1 bit in a BitSet
    public static BitSet mutate(BitSet a) {
    BitSet copy = (BitSet) a.clone();
    copy.flip(ran.nextInt(bsLen));
    return copy;
    }

    // randomly inherit bits from BitSet a or b
    public static BitSet cross(BitSet a, BitSet b) {
    BitSet copy = (BitSet) a.clone();

    for (int i = 0; i < bsLen; i++) {
    if (ran.nextBoolean() == true) {
    copy.set(i, b.get(i));
    }
    }
    return copy;
    }

    // print a BitSet BitString
    public static void print(BitSet bs) {
    for (int i = 0; i < bsLen; i++) {
    if (bs.get(i) == true) {
    System.out.print("1");
    } else {
    System.out.print("0");
    }
    }
    System.out.println();
    }

    public static void echo(Object o) {
    System.out.println(o);
    }
    }

    class State implements Comparable<State> {
    BitSet bs;
    int fitness;

    public State(BitSet bs, int fitness) {
    this.bs = bs;
    this.fitness = fitness;
    }

    public String toString() {
    return bs.toString() + " :: " + fitness;
    }

    public int compareTo(State state) {
    return this.fitness - state.fitness;
    }
    }
  • Options
    wesselwessel Member Posts: 537 Maven
    Hmm, seems forward selection uses a boolean array to represent a state.
    But since I need to remember all states that I once visited the memory size of this array is gonna be a problem.

    boolean[] selected = new boolean[numberOfAttributes];

    memory size:  numberOfAttributes * 8 bit + array overhead

    It think its better to use a BitSet.

    BitSet selected = new BitSet(numberOfAttributes);

    memory size: numberOfAttributes * 1 bit + bitset overhead




    I think the part below is the heuristic / fitness part / performance?
    innerExampleSource.deliver(exampleSet); // change attributes in dataset?
    getSubprocess(0).execute(); // makes call to learning algorithm?


    boolean[] selected = new boolean[numberOfAttributes];
    PerformanceVector bestPerformance = null;
    for (i = 0; i < numberOfSteps; i++) {
    int bestIndex = 0;
    boolean gain = false;
    for (int current = 0; current < numberOfAttributes; current++) {
    if (!selected[current]) {
    // switching on
    attributes.addRegular(attributeArray[current]);

    // evaluate performance
    innerExampleSource.deliver(exampleSet);
    getSubprocess(0).execute();

    PerformanceVector performance = innerPerformanceSink.getData();
    if (bestPerformance == null || performance.compareTo(bestPerformance) > 0 ) {
    bestIndex = current;
    bestPerformance = performance;
    gain = true;
    }

    // switching off
    attributes.remove(attributeArray[current]);
    }
    }

  • Options
    wesselwessel Member Posts: 537 Maven
    Maybe its not such a good idea to implement this anyway.
    I'm not sure if RM is compatible.
    To store all search results, states need to be optimized for memory.

    The trick is to use information from previous search results to guide the search.

    This is done by combining the top search result.
    Which is a lot like cross over in genetic search algorithms.

    The big difference is, is that there is no random selection to create pairs to generate off springs.
    But instead you use best first selection.
  • Options
    fischerfischer Member Posts: 439 Maven
    Hi,

    all of what you say makes a lot of sense. Looks like you already made some progress.

    However, I don't understand what you mean by "I'm not sure if RM is compatible." Compatible with what?

    Apart from that, I don't see any questions in your post :-) except of these two:

    I think the part below is the heuristic / fitness part / performance?
          innerExampleSource.deliver(exampleSet); // change attributes in dataset?
          getSubprocess(0).execute(); // makes call to learning algorithm?

    Yes. The first pushes the example set to the inner OutputPorts of the subprocess (the ones on the left side). It can then be used by the subprocess, which typically contains the learner and an evaluator. This is executed by the second line of code. After that, you can collect the results from the inner sinks (InputPorts).

    Cheers,
    Simon
Sign In or Register to comment.