Creating ExcelExampleSet

rtaank Member Posts: 10 Contributor II
edited November 2018 in Help
Hi, I have successfully managed to read an Excel sheet containing 30 rows of unique data (with just one regular attribute).

I am then trying to pipe this into the NaiveBayes operator, but I get the following error on execution:

Mar 11, 2009 12:53:25 PM: [Fatal] UserError occured in 1st application of NaiveBayes (NaiveBayes)
Mar 11, 2009 12:53:25 PM: [Fatal] Process failed: Input example set has no attributes
          Root[1] (Process)
          +- ExcelExampleSource[1] (ExcelExampleSource)
here ==> +- NaiveBayes[1] (NaiveBayes)

Any ideas why this is the case?

I want to classify the 30 pieces of text (i.e. each row in the Excel sheet) into associated groups.

Thanks.

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello,

    several remarks:

    1. you have to use the Text Plugin in order to transform your texts into word vectors with the StringTextInput operator
    2. you do not seem to have a label --> clustering seems more appropriate than NaiveBayes which is a classification method

    Cheers,
    Ingo
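
To make the "word vectors" remark concrete, here is a minimal sketch in plain Python of what such a transformation does conceptually. It is not the Text Plugin's implementation; the function names and the two sample rows are made up for illustration, and a real setup would typically also apply steps such as stop-word filtering or TF-IDF weighting.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


def build_vocabulary(texts):
    """Collect every distinct token and give it a fixed column position."""
    return sorted({token for text in texts for token in tokenize(text)})


def to_word_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical rows standing in for the single text column of the sheet.
rows = ["The cat sat on the mat", "The dog chased the cat"]
vocabulary = build_vocabulary(rows)
vectors = [to_word_vector(row, vocabulary) for row in rows]
```

Each word in the vocabulary becomes one numeric attribute, so a single text column turns into many attributes that a learner can work with.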
  • rtaank Member Posts: 10 Contributor II
    Thanks for that.

    So which clustering algorithm do you recommend for standard written English text?
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    There is no standard algorithm. Just try several and check which one delivers the results you like best. If performance is an issue, I would start with KMeans. If you want something hierarchical and you do not have too many examples, I would try agglomerative clustering.

    Cheers,
    Ingo
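
For readers who have not met k-means before, the idea behind the KMeans suggestion can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not RapidMiner's operator: the centroids are seeded with the first `k` vectors to keep the run deterministic (real implementations usually seed randomly), and the toy vectors are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def k_means(vectors, k, max_iterations=100):
    """Group vectors into k clusters; returns (assignments, centroids)."""
    # Assumption: seed with the first k vectors so results are repeatable.
    centroids = [list(v) for v in vectors[:k]]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignments, centroids


# Invented toy word vectors: two documents per obvious group.
toy_vectors = [[2, 0], [0, 2], [3, 0], [0, 3]]
assignments, centroids = k_means(toy_vectors, k=2)
```

Note that no label column is involved anywhere: the groups come purely from the distances between the vectors.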
  • rtaank Member Posts: 10 Contributor II
    Okay, I will consider those clustering algorithms. Performance really isn't an issue, but I will experiment with the various unsupervised algorithms.

    Going back to your original responses/remarks however:

    1. you have to use the Text Plugin in order to transform your texts into word vectors with the StringTextInput operator

    my_response: Will I need to do this for the clustering algorithms too, or just for the classification algorithms?

    2. you do not seem to have a label --> clustering seems more appropriate than NaiveBayes which is a classification method

    my_response: What are these labels? I have been through the documentation but cannot work out why the labels are required. Also, within my Excel sheet, do I need to have another column for these labels? What are they used for? Ideally I would like to use supervised learning in order to produce a model.

    Thanks Ingo.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    will i need to do this for the clustering algorithms too? or just for the classification algorithms?
    Yes, you need to do this for the clustering algorithms too.

    what are these labels? i have been through the documentation but cannot fully interpret why the labels are required? also, within my excel sheet, do i need to have another column for these labels? what are they used for? ideally i would like to use supervised learning in order to produce a model.
    Labels are the classes you provide during the training phase. The different values of the label column will then be predicted by a classification model for new and unseen data (which no longer needs a given label). For supervised learning, you will always need a label (target, class... you name it). If you are not able to provide a label, then you usually perform an unsupervised learning method instead (like clustering).

    Cheers,
    Ingo
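
The role of the label column can be illustrated with a small, self-contained sketch of a Naive Bayes text classifier in plain Python. This is an assumption-laden toy, not the NaiveBayes operator: the training texts, the `spam`/`ham` labels, and the function names are all invented, and Laplace (add-one) smoothing is one common choice among several.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_texts):
    """Count examples per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Start from the prior: how common this label was in training.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data: the second element of each pair is the label.
training = [
    ("cheap pills buy now", "spam"),
    ("win cash now", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train_naive_bayes(training)
prediction = predict(model, "buy cheap cash")
```

In spreadsheet terms, the second element of each training pair is the extra label column; the unseen text passed to `predict` has no such column, and the model fills it in.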
  • rtaank Member Posts: 10 Contributor II
    Thanks Ingo, a fantastic explanation!