Patents mining
Hello,
I'm a real newbie and am posting here to ask for help.
In the context of a patent set analysis, I have an export (csv/xlsx) of a list of patents in a semi-structured format: each row is a patent, and the columns hold attributes (such as patent title, abstract, novelty, etc.).
Given the large size of the patent set (>6500 hits), I would like to automate the patent analysis as follows:
1- identify topics (keywords) for each patent
2- cluster patents based on these keywords
3- display clusters with their respective weights
I assume that 1 and 2 can be done with RapidMiner, while 3 could be done with Gephi. But this is only an assumption, as I am a real beginner here: I have never used RapidMiner.
Therefore, any indication on feasibility/guidance on how to start would be really appreciated.
Thank you,
Peter
Answers
Indeed, there are a couple of projects where RapidMiner is the key tool for analysing patent data. Using the Text Mining extension, documents can be tokenized and clustered based on word vectors. It doesn't matter whether your documents/patents are spread over a file system or already stored in an Excel sheet or database.
TF-IDF transformation and n-grams in particular are used to segment patents effectively.
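To make the idea concrete outside of RapidMiner, here is a minimal plain-Python sketch of the same pipeline: word n-grams, TF-IDF weighting, keywords per patent, and a simple clustering with cluster sizes. It is only an illustration under assumptions, not the RapidMiner process itself: the four abstracts are made up, the function names (`tokenize`, `tfidf_vectors`, `kmeans`) are hypothetical, the clustering is a basic spherical k-means chosen for brevity, and k=2 is picked by hand. No stop-word removal is done, so in real data filler words would need filtering.

```python
import math
import re
from collections import Counter

def tokenize(text, n_max=2):
    """Lowercase the text and return word n-grams for n = 1..n_max."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def normalise(vec):
    """Scale a sparse {term: weight} vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def tfidf_vectors(docs):
    """Return one unit-length {term: tf-idf weight} dict per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        # Smoothed IDF, so terms that occur everywhere keep a small weight.
        vec = {t: tf * (math.log((1 + len(docs)) / (1 + doc_freq[t])) + 1)
               for t, tf in c.items()}
        vectors.append(normalise(vec))
    return vectors

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans(vectors, k, iterations=10):
    """Spherical k-means with a deterministic farthest-point start."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the document least similar to the seeds chosen so far.
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                  for v in vectors]
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:
                centroids[j] = normalise(total)
    return labels, centroids

# Made-up abstracts standing in for one text column of the csv/xlsx export.
abstracts = [
    "Lithium battery electrode with improved anode coating",
    "Anode coating process for a lithium battery cell",
    "Image sensor pixel array with low noise readout",
    "Low noise readout circuit for an image sensor",
]

vectors = tfidf_vectors(abstracts)
labels, centroids = kmeans(vectors, k=2)

# Step 1: highest-weighted terms per patent.
for text, vec in zip(abstracts, vectors):
    print(sorted(vec, key=vec.get, reverse=True)[:3], "<-", text)

# Steps 2 and 3: cluster sizes and the terms that characterise each cluster.
for cluster, size in sorted(Counter(labels).items()):
    centroid = centroids[cluster]
    top = sorted(centroid, key=centroid.get, reverse=True)[:3]
    print(f"cluster {cluster}: {size} patents, top terms {top}")
```

On these toy abstracts the two battery texts and the two image-sensor texts end up in separate clusters. For a real set of several thousand patents you would read the abstract column from the file instead of the hard-coded list, and the cluster labels plus sizes could then be exported for a tool such as Gephi.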
We offer a training course on this on 21-22 May 2014 in Dortmund.
- Frank