"Hierarchical Text Classification"

mdc Member Posts: 58 Maven
edited May 2019 in Help
Hi,

I am planning to do a hierarchical classification using a top-down approach. My idea of the top-down approach is to first classify the exampleset into the top classes, then filter the examples of each top class and apply a second classification into that top class's subclasses.

Here is an example procedure:
1. generate exampleset
2. classify into class X, Y or Z
3. filter examples of predicted X class
4. classify X examples into subclass X1, X2 or X3

... iterate over the other top classes

9. merge the filtered examples into one exampleset.
10. end
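
Outside RapidMiner, the procedure above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the toy word-overlap classifier, the function names (`train`, `predict`, `train_hierarchy`, `classify_top_down`) and the sample data are all hypothetical and stand in for whatever learner and exampleset are actually used.

```python
from collections import Counter, defaultdict

def train(examples):
    """Toy classifier (assumption): one word-count profile per label."""
    profiles = defaultdict(Counter)
    for text, label in examples:
        profiles[label].update(text.lower().split())
    return dict(profiles)

def predict(model, text):
    """Pick the label whose profile overlaps most with the text's words."""
    words = text.lower().split()
    # sorted() makes ties resolve deterministically (smallest label first)
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))

def train_hierarchy(examples):
    """examples: list of (text, top_label, sub_label) triples."""
    top_model = train([(text, top) for text, top, _ in examples])
    sub_models = {}
    for top in {top for _, top, _ in examples}:
        # One subclass model per top class, trained only on that class's examples
        sub_models[top] = train([(t, sub) for t, tp, sub in examples if tp == top])
    return top_model, sub_models

def classify_top_down(top_model, sub_models, texts):
    # Step 2: classify every example into a top class
    top_pred = [(text, predict(top_model, text)) for text in texts]
    merged = []
    # Steps 3 onward: per top class, filter its examples and apply its subclass model
    for top in sorted(sub_models):
        subset = [text for text, pred in top_pred if pred == top]
        merged.extend((text, top, predict(sub_models[top], text)) for text in subset)
    # Step 9: merged now holds the filtered examples of all top classes
    return merged

# Hypothetical training data: (text, top class, subclass)
TRAIN = [
    ("goal scored in football match", "sports", "football"),
    ("tennis serve and racket", "sports", "tennis"),
    ("new laptop processor released", "tech", "hardware"),
    ("software update fixes bug", "tech", "software"),
]

top_model, sub_models = train_hierarchy(TRAIN)
result = classify_top_down(top_model, sub_models, ["football goal", "software bug"])
for row in result:
    print(row)
```

Note that a mistake at the top level cannot be corrected further down: an example routed to the wrong top class only ever sees that class's subclass model.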

I was about to start building the process when I realized that using the ModelApplier also requires applying the training word list to the exampleset. That means I need as many training word lists as there are models. How do I load the other training word lists? I know that the first training word list can be loaded in the TextInput operator, but what about the other word lists?

Or is there a better way of doing this in RM?

thanks in advance.
Matthew

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    why don't you always use the same wordlist, built over the complete corpus?

    Greetings,
      Sebastian
  • mdc Member Posts: 58 Maven
    Hi,

    If I understand correctly, I can use the wordlist built from the original exampleset and then apply the different models one after another, without needing to load a new wordlist:

    - TextInput <--- load the wordlist
    - Apply ModelXYZ
    - Filter X
    - Apply ModelX1X2X3
    etc

    Is this correct?

    thanks
    Matthew
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Yes. This way it should work.



    Greetings,
      Sebastian
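
The shared-wordlist idea confirmed above can be illustrated with a small Python sketch. It assumes a plain term-frequency representation and a toy centroid model; `build_wordlist`, `vectorize`, `train_centroids` and `predict` are hypothetical helpers, not RapidMiner operators. The point is that one word list built over the complete corpus yields vectors with the same attributes for every model in the hierarchy.

```python
from collections import Counter

def build_wordlist(corpus):
    """One word list over the complete corpus, in sorted order."""
    return sorted({word for text in corpus for word in text.lower().split()})

def vectorize(text, wordlist):
    """Term-frequency vector over the shared word list; unknown words are dropped."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in wordlist]

def train_centroids(vectors, labels):
    """Toy model (assumption): per-label sum of the training vectors."""
    sums = {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return sums

def predict(model, vec):
    """Label whose summed vector has the largest dot product with vec."""
    return max(sorted(model),
               key=lambda label: sum(a * b for a, b in zip(model[label], vec)))

# Hypothetical corpus with top classes and subclasses
corpus = ["red apple", "green apple", "fast car", "slow car"]
top = ["fruit", "fruit", "vehicle", "vehicle"]
sub = ["red", "green", "fast", "slow"]

wordlist = build_wordlist(corpus)                 # built once
vectors = [vectorize(text, wordlist) for text in corpus]

top_model = train_centroids(vectors, top)
fruit_model = train_centroids(
    [v for v, t in zip(vectors, top) if t == "fruit"],
    [s for s, t in zip(sub, top) if t == "fruit"],
)

# The same vector is passed to both levels; no second word list is loaded
new_vec = vectorize("a green apple", wordlist)
top_label = predict(top_model, new_vec)
sub_label = predict(fruit_model, new_vec)
print(top_label, sub_label)
```

Words that never occur within one top class simply become all-zero attributes for that class's subclass model, so they cost some space but do not break model application.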
  • TechCrunch Member Posts: 1 Contributor I
    I have a similar question. I have a review dataset in which each review is labeled 1, 2, 4 or 5. I want to first label the test data as 1, 2, 4 or 5, then filter the training data down to the reviews labeled 1 and 2 and reclassify the test reviews predicted as 1 or 2 with a new model trained on that subset. Similarly for 4 and 5. I'm not sure whether I can use the hierarchical classification operator in that case.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    of course you can do that. Or, if you want more control, first use e.g. the Generate Attributes operator to combine 1 and 2, and 4 and 5, into the coarse labels 1_2 and 4_5. Then train a classifier to separate 1_2 from 4_5, and pass the data on to the next, more fine-grained classifiers.

    Best regards,
    Marius
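
As a rough illustration of the coarse-then-fine setup described above, here is a minimal Python sketch. It assumes a toy word-overlap classifier and made-up reviews; the `COARSE` mapping plays the role of the label-combining step, and all names (`train`, `predict`, `classify`) are hypothetical rather than RapidMiner operators.

```python
from collections import Counter, defaultdict

# Combine the original ratings into coarse groups: 1 and 2 -> 1_2, 4 and 5 -> 4_5
COARSE = {1: "1_2", 2: "1_2", 4: "4_5", 5: "4_5"}

def train(examples):
    """Toy classifier (assumption): one word-count profile per label."""
    profiles = defaultdict(Counter)
    for text, label in examples:
        profiles[label].update(text.lower().split())
    return dict(profiles)

def predict(model, text):
    """Pick the label whose profile overlaps most with the text's words."""
    words = text.lower().split()
    return max(sorted(model), key=lambda label: sum(model[label][w] for w in words))

# Hypothetical training reviews: (text, rating)
reviews = [
    ("terrible broken awful", 1),
    ("poor disappointing", 2),
    ("good solid", 4),
    ("excellent perfect amazing", 5),
]

# Coarse classifier: separates 1_2 from 4_5
coarse_model = train([(text, COARSE[rating]) for text, rating in reviews])

# Fine-grained classifiers: one per coarse group, trained only on that group's reviews
fine_models = {
    group: train([(text, rating) for text, rating in reviews if COARSE[rating] == group])
    for group in set(COARSE.values())
}

def classify(text):
    """Route a review through the coarse model, then the matching fine model."""
    group = predict(coarse_model, text)
    return group, predict(fine_models[group], text)

print(classify("excellent and amazing"))
print(classify("awful broken thing"))
```

Keeping the coarse and fine models as separate objects is what gives the extra control: each stage can use its own learner, parameters, or evaluation.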