Text Mining classification problem with two data sets
mschmidkon
Member Posts: 2 Contributor I
Hey!
I have an issue with text mining and keyword-based classification across two data sets. The goal is to classify products according to their textual descriptions.
INITIAL SITUATION:
I've got two data sets. The first contains a unique identifier (a number representing a product) and four columns of text describing that product (short/long text description etc.). The second contains two columns: the first is text describing a label for classification, and the second is a classification code. The goal is to classify the products from data set 1 according to data set 2. To do that, identical word occurrences have to be identified, and the classification code with the most occurrences of similar words should be chosen. The process should take one product from the first data set and look up all labels from the second data set in order to find the best-fitting label.
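As a rough illustration of the matching described above, here is a minimal sketch in plain Python (not a RapidMiner process). It assumes simple lowercase word tokens with no stemming or stopword filtering; the function names `tokenize` and `best_label` and the example rows are hypothetical, invented only to show the idea.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def best_label(product_texts, labels):
    """Return (code, score) for the label whose words occur most often.

    product_texts: the text columns of one product (list of strings).
    labels: list of (label_text, classification_code) pairs.
    Returns (None, 0) if no label shares a word with the product.
    """
    # Count every word across all text columns of the product.
    product_counts = Counter()
    for text in product_texts:
        product_counts.update(tokenize(text))

    best_code, best_score = None, 0
    for label_text, code in labels:
        label_words = set(tokenize(label_text))
        # Score = total occurrences of the label's words in the product texts.
        score = sum(product_counts[word] for word in label_words)
        if score > best_score:  # on ties, the first label seen is kept
            best_code, best_score = code, score
    return best_code, best_score

# Hypothetical example data, for illustration only.
labels = [("cordless power drill", "C100"), ("garden hose", "C200")]
product = ["Cordless drill 18V", "Powerful cordless drill with battery", "", ""]
print(best_label(product, labels))  # ('C100', 4)
```

Without stemming, "powerful" does not match "power" here, which is one reason the stemming step in the actual process matters.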
CURRENT SITUATION:
I created a RapidMiner process that reads the two CSV files separately and converts the input with 'Process Documents from Data', including Tokenizing, Filter Stopwords, Stem and Generate n-Grams. The result set contains the occurrences of the tokenized words. Now I want to compare the result sets of the two data sets (they don't have the same number of attributes in the same order, but some attributes are identical), with the goal of finding 'similar' words and classifying the products. Does anybody know how to compare these two data sets with a RapidMiner operator and how to classify these products?
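One way to think about comparing two word-occurrence tables whose attributes differ is to treat each row as a sparse vector keyed by token, so that only the shared tokens contribute to the comparison. The sketch below is plain Python rather than a RapidMiner operator; it swaps in cosine similarity as the comparison measure, which is an assumption and not something the process above already does. The names `cosine_similarity` and `classify` and the sample vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse term-count vectors given as dicts."""
    shared = set(a) & set(b)  # tokens missing from either side contribute 0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(product_vectors, label_vectors):
    """Map each product id to the code of its most similar label.

    A product that shares no token with any label is mapped to None.
    """
    result = {}
    for pid, pvec in product_vectors.items():
        best_code, best_sim = None, 0.0
        for code, lvec in label_vectors.items():
            sim = cosine_similarity(pvec, lvec)
            if sim > best_sim:
                best_code, best_sim = code, sim
        result[pid] = best_code
    return result

# Hypothetical token counts, as they might look after tokenizing and stemming.
products = {1001: {"cordless": 2, "drill": 2, "batteri": 1}}
label_table = {
    "C100": {"cordless": 1, "power": 1, "drill": 1},
    "C200": {"garden": 1, "hose": 1},
}
print(classify(products, label_table))  # {1001: 'C100'}
```

Because the vectors are keyed by token rather than by column position, the two tables do not need the same attributes in the same order.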
Thank you very much!
Michael
Best Answer
rfuentealba
RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Hey @mschmidkon,
Would you mind sharing your process with us, so that we can give you better guidance?
All the best,
Rod.