Using a custom theme/feature lexicon in opinion mining
I am working on a multi-faceted solution based around web opinion mining. The basic overview of the solution is as follows:
1. Given a company, scrape review details from a number of opinion websites and forums.
2. Perform text analytics to stratify themes across both negative and positive reviews (e.g. Value for Money, Customer Service, Quality).
3. For positive and negative reviews, identify the products and product groups that are mentioned, so that we can link opinion (positive/negative) with products (e.g. milk, meat) and themes (e.g. quality, value for money).
4. At the same time, perform basket analysis to look for products that are mentioned together (e.g. bread and milk).
5. In parallel, I have built a process that measures similarity between postings to flag possible duplicates and/or spamming by competitors.
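As an illustration of the basket analysis step, here is a minimal sketch in plain Python that counts how often pairs of product terms are mentioned in the same review. The product terms and review texts are made up for the example, and the whitespace tokenisation is a simplifying assumption; a real process would reuse whatever tokenisation produces the word vectors.

```python
from collections import Counter
from itertools import combinations

# Hypothetical product terms; in practice these would come from the product lexicon file.
product_terms = {"bread", "milk", "meat"}

# Hypothetical review texts standing in for the scraped postings.
reviews = [
    "the bread and milk were fresh",
    "milk was off but the bread was fine",
    "great meat counter",
]

def product_pair_counts(reviews, product_terms):
    """Count how many reviews mention each pair of product terms together."""
    counts = Counter()
    for text in reviews:
        # Naive whitespace tokenisation (assumption), then keep only product terms.
        mentioned = sorted(set(text.lower().split()) & product_terms)
        # Every unordered pair of distinct products in this review counts once.
        counts.update(combinations(mentioned, 2))
    return counts

pair_counts = product_pair_counts(reviews, product_terms)
```

Pairs are stored as alphabetically sorted tuples, so ("bread", "milk") and ("milk", "bread") are never counted separately.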
To support the theme and product analysis in the steps above, I have built basic lexicon files: a theme lexicon that lists themes and their terms, and a product lexicon that lists product groups and their product terms. My problem comes when I try to join these files to the word vector output from my process documents step. Any guidance on how to achieve this?
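To make the intended join concrete, here is a minimal sketch in plain Python rather than in the tool I am using. It assumes the word vector can be read as one set of term weights per document and that the lexicon can be read as a term-to-theme lookup. All names, terms and weights below are hypothetical.

```python
from collections import defaultdict

# Hypothetical theme lexicon: term -> theme (would be loaded from the lexicon file).
theme_lexicon = {
    "cheap": "Value for Money",
    "price": "Value for Money",
    "staff": "Customer Service",
    "fresh": "Quality",
}

# Hypothetical word vector output: one entry per document, one weight per term.
word_vectors = {
    "review_1": {"cheap": 0.5, "milk": 0.25, "staff": 0.25},
    "review_2": {"fresh": 0.75, "bread": 0.25},
}

def join_to_lexicon(word_vectors, lexicon):
    """Sum term weights per (document, lexicon category)."""
    scores = defaultdict(dict)
    for doc_id, terms in word_vectors.items():
        for term, weight in terms.items():
            category = lexicon.get(term.lower())
            if category is None:
                continue  # term is not in the lexicon, so it is ignored
            scores[doc_id][category] = scores[doc_id].get(category, 0.0) + weight
    return dict(scores)

theme_scores = join_to_lexicon(word_vectors, theme_lexicon)
```

The join key here is the term itself, which means the word vector usually has to be reshaped from one column per term to one row per (document, term) pair before it can be matched. The terms must also match exactly: if the word vector contains stemmed or lower-cased tokens, the lexicon terms need the same treatment. The same function would work for the product lexicon by passing a term-to-product-group lookup instead.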
Many thanks in advance!