A strange result by repeating invoking the apply() method

gfyang (Member, Posts: 29, Maven)
edited November 2018 in Help
Hi,

I am building a text classifier with the following code:

// build the text input
OperatorChain textInput = (OperatorChain) OperatorService.createOperator("TextInput");
List<String[]> para = new ArrayList<String[]>();
String[] para1 = {"graphics", "c:/data/Reuters/acq"};
String[] para2 = {"hardware", "c:/data/Reuters/corn"};
String[] para3 = {"hardware", "c:/data/Reuters/crude"};
String[] para4 = {"hardware", "c:/data/Reuters/earn"};
String[] para5 = {"hardware", "c:/data/Reuters/grain"};
para.add(para1);
para.add(para2);
para.add(para3);
para.add(para4);
para.add(para5);
textInput.setListParameter("texts", para);
textInput.setParameter("prune_below", "3");
textInput.setParameter("output_word_list", "d:/test/word.list");

Operator stringTokenizer = OperatorService.createOperator("StringTokenizer");
Operator stopWord = OperatorService.createOperator("EnglishStopwordFilter");
Operator tokenLen = OperatorService.createOperator("TokenLengthFilter");
tokenLen.setParameter("min_chars", "3");
Operator stemmer = OperatorService.createOperator("PorterStemmer");
Operator gramGenerator = OperatorService.createOperator("TermNGramGenerator");
textInput.addOperator(stringTokenizer);
textInput.addOperator(stopWord);
textInput.addOperator(tokenLen);
textInput.addOperator(stemmer);
textInput.addOperator(gramGenerator);

// build the validation
OperatorChain xValidation = (OperatorChain) OperatorService.createOperator("XValidation");
OperatorChain applierChain = (OperatorChain) OperatorService.createOperator("OperatorChain");
xValidation.setParameter("keep_example_set", "true");
Operator naiveBayes = OperatorService.createOperator("KernelNaiveBayes");
Operator modelApplier = OperatorService.createOperator("ModelApplier");
Operator performance = OperatorService.createOperator("ClassificationPerformance");
performance.setParameter("accuracy", "true");
applierChain.addOperator(modelApplier);
applierChain.addOperator(performance);
xValidation.addOperator(naiveBayes);
xValidation.addOperator(applierChain);

// start applying
IOContainer container = textInput.apply(new IOContainer());
container = xValidation.apply(container);
PerformanceVector pv = container.get(PerformanceVector.class);
double precision = pv.getCriterion("accuracy").getAverage();
// the result is 0.89

container = xValidation.apply(container);
pv = container.get(PerformanceVector.class);
precision = pv.getCriterion("accuracy").getAverage();
// the result is 0.86

container = xValidation.apply(container);
pv = container.get(PerformanceVector.class);
precision = pv.getCriterion("accuracy").getAverage();
// the result is 0.90

xValidation.apply(container) is invoked three times and gives three different results. Why?

Since nothing has changed in either the data or the learner, I would expect the results to be identical.

Sincerely yours,
gfyang
Answers

  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi,
    You should only expect identical results if you set all operators to use a local random seed. Otherwise they all draw from the same continuous stream of random numbers, so each run gets different numbers and produces different results. For example, XValidation then splits the data set into different subsets on each run.

    Greetings,
      Sebastian
  • gfyang (Member, Posts: 29, Maven)
    Hi, Sebastian,

    Thank you for the answer.

    For the example above, how do I set the random seed in RM so that the three runs use the same random numbers and give the same xValidation results? It may be related to RandomGenerator, but I am sorry to say I have not figured it out.

    Sincerely yours,
    gfyang
  • gfyang (Member, Posts: 29, Maven)
    Hi, Sebastian,

    I have figured it out. It is so easy  :D

    XValidation has a parameter, sampling_type. If it is set to linear, there is no random factor. For shuffled or stratified sampling, just set another parameter, local_random_seed.

    Thank you very much for the help.

    Sincerely yours,
    gfyang
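
For later readers, here is a minimal sketch of why the three runs differed and why a local seed fixes it. It does not use RapidMiner at all: the class SeedDemo and its methods are made up for illustration, and java.util.Random stands in for the shared random stream Sebastian describes. Linear sampling is modelled as consecutive blocks and shuffled sampling as a random permutation, which is a simplification and not RapidMiner's actual implementation. In the code from the question, the corresponding step would presumably be a call such as xValidation.setParameter("local_random_seed", "1") before apply(), where the value 1 is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeedDemo {

    // Build the index list 0..n-1.
    static List<Integer> range(int n) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        return indices;
    }

    // Cut an index list into k consecutive folds.
    static List<List<Integer>> split(List<Integer> indices, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        int n = indices.size();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>(indices.subList(f * n / k, (f + 1) * n / k)));
        }
        return folds;
    }

    // "linear" sampling (simplified): consecutive blocks, no random numbers involved.
    static List<List<Integer>> linearFolds(int n, int k) {
        return split(range(n), k);
    }

    // "shuffled" sampling (simplified): the order is drawn from the given generator.
    static List<List<Integer>> shuffledFolds(int n, int k, Random rng) {
        List<Integer> indices = range(n);
        Collections.shuffle(indices, rng);
        return split(indices, k);
    }

    public static void main(String[] args) {
        // Shared generator: the second run continues the stream where the
        // first run stopped, so it draws different numbers.
        Random shared = new Random(42);
        System.out.println("shared stream, run 1: " + shuffledFolds(12, 3, shared));
        System.out.println("shared stream, run 2: " + shuffledFolds(12, 3, shared));

        // Local seed: every run starts from a fresh generator with the same
        // seed, so the folds are identical from run to run.
        System.out.println("local seed, run 1:    " + shuffledFolds(12, 3, new Random(42)));
        System.out.println("local seed, run 2:    " + shuffledFolds(12, 3, new Random(42)));

        // Linear sampling never touches a generator; the folds are
        // [[0, 1], [2, 3], [4, 5]] on every run.
        System.out.println("linear:               " + linearFolds(6, 3));
    }
}
```

The two "local seed" lines print the same folds, while the two "shared stream" lines will in general differ, which mirrors the 0.89 / 0.86 / 0.90 spread in the question.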