classification or clustering

Macd · January 2023

Hi,

I am currently working with a dataset that contains text, and I have some questions about how to handle it.
- Because of the size of the dataset, I want to use the Filter Examples operator to keep only one type of title, and then sample to reduce the number of items. How exactly can this be done?
- I want to apply the necessary classifications to solve the business problem. I use the operators Retrieve, Nominal to Text, Process Documents, and Tokenize. Can somebody tell me what I am doing wrong here?
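For context, Process Documents (with Tokenize inside it) turns each text into a vector of word weights. A rough plain-Python sketch of that idea, assuming a tokenizer that splits on non-letter characters and TF-IDF weighting. This is not RapidMiner's implementation, and `tokenize` and `tf_idf` are hypothetical helper names:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split on runs of non-letter characters and lowercase the tokens.

    Assumption: mimics a 'split on non letters' tokenizer; real text
    processing usually adds stopword filtering, stemming, etc.
    """
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def tf_idf(documents):
    """Turn a list of strings into one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["Refund request for late delivery",
        "Delivery was late again",
        "Question about my invoice"]
for vec in tf_idf(docs):
    print(sorted(vec.items()))
```

Terms that occur in every document get weight 0, which is why rare, distinctive words end up driving the classification.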

BalazsBarany · January 2023

Hi!

The Filter Examples operator has conditions for nominal attributes such as "contains", "starts with", or "matches". These should help you filter on the title.
Sampling is done with one of the Sample operators.
Academy video: https://academy.rapidminer.com/learn/video/sampling-weighting-intro
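In RapidMiner these are two operators chained one after the other. Outside RapidMiner, the same two steps (keep rows whose title meets a condition, then draw a random sample) can be sketched in plain Python. This is a minimal illustration, not how the operators are implemented; the helper names, the `"title"` field, the made-up data, and the assumption that "matches" means the whole value must match a regular expression are all hypothetical:

```python
import random
import re


def filter_examples(rows, attribute, condition, value):
    """Keep only rows whose (nominal) attribute satisfies the condition."""
    checks = {
        "contains": lambda s: value in s,
        "starts_with": lambda s: s.startswith(value),
        # Assumption: "matches" means the whole value must match the regex.
        "matches": lambda s: re.fullmatch(value, s) is not None,
    }
    check = checks[condition]  # KeyError for an unknown condition name
    return [row for row in rows if check(row[attribute])]


def sample(rows, size, seed=42):
    """Simple random sample without replacement; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    return rng.sample(rows, min(size, len(rows)))


# Made-up data: 100 documents with two kinds of titles.
rows = [
    {"title": "Data Engineer" if i % 2 else "Sales Manager", "text": f"document {i}"}
    for i in range(100)
]
engineers = filter_examples(rows, "title", "contains", "Engineer")
subset = sample(engineers, 10)
print(len(engineers), len(subset))  # 50 10
```

Filtering first and sampling second means the sample only ever contains the title you care about.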

I don't think you're doing anything wrong in the steps you describe for your document classification. You need a target (label) attribute for the classification, and then you apply a learner such as Naive Bayes or a Support Vector Machine to the data inside a Cross Validation.
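To illustrate what "a learner in a cross validation" means, here is a minimal plain-Python sketch of multinomial Naive Bayes (add-one smoothing) evaluated with k-fold cross-validation. It is a toy under stated assumptions, not RapidMiner's Naive Bayes or Cross Validation operator: the folds are plain shuffled splits rather than stratified ones, the tokenizer is a simple split on non-letters, and the example texts and labels are made up.

```python
import math
import random
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercased tokens, splitting on any run of non-letter characters."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def train_nb(docs, labels):
    """Fit multinomial Naive Bayes: class counts and per-class term counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of term frequencies
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {term for counts in word_counts.values() for term in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n_label / n_docs)  # log prior P(label)
        for term in tokenize(doc):
            if term in vocab:  # terms never seen in training are skipped
                score += math.log((word_counts[label][term] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(docs, labels, k=5, seed=0):
    """Mean accuracy over k shuffled (non-stratified) folds."""
    if len(docs) < k:
        raise ValueError("need at least k documents")
    indices = list(range(len(docs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = train_nb([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / len(accuracies)


# Made-up example texts, each with a label.
docs = [
    "late delivery refund", "refund for late order", "order arrived late",
    "want a refund now", "delivery late again",
    "question about invoice", "invoice amount wrong", "send invoice copy",
    "invoice payment question", "copy of invoice please",
]
labels = ["complaint"] * 5 + ["invoice"] * 5
print(f"mean accuracy: {cross_validate(docs, labels, k=5):.2f}")
```

The key point is that each model is trained only on the training folds and scored on the held-out fold, so the averaged accuracy estimates performance on texts the model has not seen.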

Text Mining is a large topic. Please check out this course in the Academy:
https://academy.rapidminer.com/courses/text-and-web-mining-with-rapidminer

Regards,
Balázs


Altair RapidMiner Community
