Patents mining
Hello,
I'm a real newbie and am posting here to ask for help.
In the context of a patent set analysis, I have an export (csv/xlsx) of a list of patents in a semi-structured format: each row is a patent, and each column is an attribute (e.g. patent title, abstract, novelty, etc.).
Given the large size of the patent set (>6500 hits), I would like to automate the patent analysis as follows:
1- identify topics (keywords) for each patent
2- cluster patents based on these keywords
3- display clusters with their respective weights
I assume that 1 and 2 can be done with RapidMiner, while 3 could be done with Gephi. But this is only an assumption, as I am a real beginner here: I have never used RapidMiner.
Therefore, any indication on feasibility/guidance on how to start would be really appreciated.
Thank you,
Peter
Answers
Indeed, there are a couple of projects where RapidMiner is the key tool for analysing patent data. Using the text mining extension, documents can be tokenized and clustered based on word vectors. It doesn't matter whether your documents/patents are spread over a file system or already stored in an Excel sheet or database.
In particular, TF-IDF transformation and n-grams are used to segment patents effectively.
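To make the idea of word vectors, TF-IDF and n-grams more concrete, here is a minimal sketch in plain Python. It is not how RapidMiner implements these steps internally; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `top_terms`) and the three sample abstracts are made up for illustration only. It builds a TF-IDF vector per document from single words plus bigrams, lists the highest-weighted terms as candidate keywords, and compares documents by cosine similarity, which is the kind of score a clustering step could work from.

```python
import math
import re
from collections import Counter


def tokenize(text, n_max=2):
    """Lowercase the text, split it into words, and append n-grams up to n_max words."""
    words = re.findall(r"[a-z]+", text.lower())
    terms = list(words)
    for n in range(2, n_max + 1):
        terms += ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return terms


def tfidf_vectors(docs, n_max=2):
    """Return one {term: tf-idf weight} dictionary per document."""
    counts = [Counter(tokenize(doc, n_max)) for doc in docs]
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values())
        vectors.append(
            {t: (f / total) * math.log(n_docs / doc_freq[t]) for t, f in c.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_terms(vector, k=3):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Invented sample abstracts, standing in for the abstract column of the export.
abstracts = [
    "Lithium battery electrode with a silicon anode coating",
    "Silicon anode coating for a lithium battery cell",
    "Wind turbine blade with a reinforced composite shell",
]
vectors = tfidf_vectors(abstracts)

for i, vec in enumerate(vectors):
    print(i, top_terms(vec))
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

With real data, the abstracts would come from the csv/xlsx column rather than a hard-coded list, and terms that occur in every document (such as "a" here) get a weight of zero, which is why TF-IDF helps separate patents by their distinctive vocabulary.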
We are offering a training course on this on 21-22 May 2014 in Dortmund.
- Frank