Patent mining

pfb Member Posts: 1 Contributor I
edited November 2018 in Help

I'm a complete newbie and am posting here to ask for help.
For a patent set analysis, I have an export (csv/xlsx) of a list of patents in a semi-structured format: each row is a patent and each column is an attribute (e.g. patent title, abstract, novelty, etc.).

Given the large size of the patent set (>6500 hits), I would like to automate the patent analysis as follows:
1- identify topics (keywords) for each patent
2- cluster patents based on these keywords
3- display clusters with their respective weights

I assume that 1 and 2 can be done in RapidMiner, while 3 could be done with Gephi. But this is only an assumption, as I am a real beginner here: I have never used RapidMiner.

Therefore, any comments on feasibility or guidance on how to start would be much appreciated.

Thank you,


  • fras Member Posts: 93 Contributor II
    Hi Peter,

    Indeed, there are a couple of projects
    where RapidMiner is the key tool for analysing patent
    data. Using the Text Mining extension, documents can be tokenized and
    clustered based on word vectors. It doesn't matter whether your
    documents/patents are spread over a file system or already stored in
    an Excel sheet or a database.
    TF-IDF transformation and n-grams in particular are used to segment patents effectively.
    We offer a training on this on 21/22 May 2014 in Dortmund.
    - Frank
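
For readers who want to see the idea behind the reply (tokenize, add n-grams, weight with TF-IDF, then group by word-vector similarity) outside RapidMiner, here is a minimal Python sketch using only the standard library. It is not the RapidMiner process itself. The function names, the four sample texts, and the 0.1 threshold are all made up for illustration, and the single-pass threshold grouping is a simple stand-in for a real clustering algorithm such as k-means.

```python
import math
import re
from collections import Counter


def tokenize(text, use_bigrams=True):
    """Lowercase, split into word tokens, optionally append word bigrams (2-grams)."""
    words = re.findall(r"[a-z]+", text.lower())
    tokens = list(words)
    if use_bigrams:
        tokens += [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return tokens


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vec = {t: (c / len(tokens)) * math.log(n_docs / doc_freq[t])
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def top_keywords(vec, k=3):
    """The k highest-weighted terms of one document (ties broken alphabetically)."""
    return [t for t, _ in sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def group_by_similarity(vectors, threshold=0.1):
    """Single-pass grouping: join the first group whose seed document is
    similar enough, otherwise start a new group. The threshold is an assumption."""
    seeds, labels = [], []
    for vec in vectors:
        for label, seed in enumerate(seeds):
            if cosine(vec, vectors[seed]) >= threshold:
                labels.append(label)
                break
        else:
            seeds.append(len(labels))      # current document becomes a new seed
            labels.append(len(seeds) - 1)
    return labels


# Hypothetical abstracts standing in for one text column of the csv/xlsx export.
docs = [
    "Lithium battery electrode with improved lithium anode coating",
    "Battery electrode coating for lithium cells",
    "Wireless antenna array for mobile communication",
    "Antenna array design for wireless communication devices",
]
vectors = tfidf_vectors(docs)
labels = group_by_similarity(vectors)
for doc, vec, label in zip(docs, vectors, labels):
    print(label, top_keywords(vec), "|", doc)
```

On these four sample texts the two battery abstracts end up in one group and the two antenna abstracts in another. With a real set of several thousand patents you would most likely also want stop-word removal and a proper clustering algorithm. The group sizes could then be exported as node weights for a tool such as Gephi.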