"[SOLVED]Classify PDF files using a set of wordlists"
My SPAD 7.4 license expired, and it is taking forever for my institute to negotiate a new one, so I decided to move to RapidMiner.
I am wondering how to use RM to achieve the following.
I have a large set (about 900) of equity reports in PDF format to analyze. Each report is between 1 and 30 pages long, but only the sentences containing the word “quality” are relevant to my analysis. I also have two word lists, constructed by someone else, of the negative and positive words used to describe “quality”. What process in RM can I use to analyze the sentences containing “quality” and then classify each PDF as (1) NEGATIVE VIEW if it describes “quality” using any word from the negative list, (2) POSITIVE VIEW if it describes “quality” using any word from the positive list, or (3) UNKNOWN if it uses neither positive nor negative words?
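To make the rule I am after concrete, here is a minimal sketch in plain Python. It is not a RapidMiner process, only an illustration of the logic. It assumes the text has already been extracted from each PDF into a string, that sentences can be split naively on end punctuation, and that both word lists contain single words. The function name `classify_text` and the sample word lists are made up for the example. My description does not say what should happen when a report uses words from both lists, so the sketch arbitrarily checks the negative list first.

```python
import re


def classify_text(text, positive_words, negative_words, keyword="quality"):
    """Classify extracted report text by the list words used in `keyword` sentences."""
    positive = {w.lower() for w in positive_words}
    negative = {w.lower() for w in negative_words}

    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)

    found_positive = False
    found_negative = False
    for sentence in sentences:
        # Lowercased word tokens; punctuation and quote marks are dropped.
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if keyword not in tokens:
            continue  # only sentences mentioning the keyword are relevant
        if tokens & positive:
            found_positive = True
        if tokens & negative:
            found_negative = True

    # Assumption: if both kinds of words occur, negative takes precedence.
    if found_negative:
        return "NEGATIVE VIEW"
    if found_positive:
        return "POSITIVE VIEW"
    return "UNKNOWN"


# Hypothetical word lists and text, for illustration only.
sample_positive = ["excellent", "strong"]
sample_negative = ["poor", "weak"]
sample_text = "Revenue grew. The earnings quality is poor. Management is excellent."
print(classify_text(sample_text, sample_positive, sample_negative))
```

In the sample text only the second sentence mentions “quality”, so the word “excellent” in the third sentence is ignored. Multi-word list entries would need phrase matching instead of the token-set intersection used here.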