I need help building taxonomies from large number of documents
boatanchorguy
Thank you to Marius for the "Read Before Posting" instructions. Following his suggestions:
1. Describe what you are doing.
I need to build many taxonomies from a large number of documents.
2. If you are working with data, give a detailed description of your data (number of examples and attributes, attribute types, label type etc.).
Over several months I ran a very large number of searches, and I now have several thousand documents to process: mostly PDF, with some MS Word, Excel, and PowerPoint files.
3. Describe which results or actions you are expecting.
I need good, clean taxonomies for many different topics. I am hoping to set up a repeatable method in RapidMiner, but I have not found an obvious way to do this.
Ideally, for each topic, the method would: a) pre-process the documents, filtering on items such as the relevant word or key phrase in the title or abstract; b) assemble the filtered documents; c) (optional) extract tables of contents, indices, glossaries, etc.; d) extract and amalgamate the sub-topics relevant to that particular topic; e) generate the taxonomy.
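For readers wondering what steps a), b), d) and e) might look like mechanically, here is a minimal plain-Python sketch. It does not use RapidMiner; it is only an illustration of the idea, under several assumptions: every file has already been converted to plain text and split into title, abstract and body (the `Document` class, `SAMPLE_CORPUS` and all function names are hypothetical), step c) is skipped, and "sub-topics" are approximated by two-word phrases that recur across the selected documents. The result is a flat topic-to-sub-topics mapping, not a full multi-level taxonomy.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: each PDF/Word/Excel/PowerPoint file has already been converted
# to plain text and split into title, abstract and body fields.
@dataclass
class Document:
    title: str
    abstract: str
    body: str

# A deliberately tiny stopword list; a real run would need a much larger one.
STOPWORDS = {"the", "and", "with", "for", "from", "this", "that", "are", "was", "were"}

def tokenize(text: str) -> list[str]:
    """Lower-case, keep alphabetic words longer than two letters, drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]

def matches_topic(doc: Document, key_phrase: str) -> bool:
    """Step a: keep a document if the key phrase occurs in its title or abstract."""
    phrase = key_phrase.lower()
    return phrase in doc.title.lower() or phrase in doc.abstract.lower()

def candidate_subtopics(docs, key_phrase, top_n=5, min_df=2):
    """Step d: rank two-word phrases by how many selected documents contain them."""
    key_words = set(tokenize(key_phrase))
    doc_freq = Counter()
    for doc in docs:
        tokens = tokenize(f"{doc.title} {doc.abstract} {doc.body}")
        # A set, so each bigram counts at most once per document.
        bigrams = {(a, b) for a, b in zip(tokens, tokens[1:])}
        for a, b in bigrams:
            # Skip phrases that merely repeat the topic itself.
            if a not in key_words and b not in key_words:
                doc_freq[f"{a} {b}"] += 1
    # Sort by descending document frequency, then alphabetically for stable output.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, df in ranked if df >= min_df][:top_n]

def build_taxonomy(corpus, topics, top_n=5):
    """Steps b and e: gather the matching documents per topic, attach sub-topics."""
    taxonomy = {}
    for topic in topics:
        selected = [d for d in corpus if matches_topic(d, topic)]
        taxonomy[topic] = candidate_subtopics(selected, topic, top_n)
    return taxonomy

# Made-up three-document corpus, for illustration only.
SAMPLE_CORPUS = [
    Document("Solar panel efficiency",
             "We study photovoltaic cells and energy storage.",
             "Battery storage systems and photovoltaic cells."),
    Document("Advances in solar panel design",
             "Photovoltaic cells with thin film coatings.",
             "Energy storage remains key."),
    Document("Wind turbine noise", "An acoustic study.", "Blade design."),
]

if __name__ == "__main__":
    # Prints {'solar panel': ['energy storage', 'photovoltaic cells'], 'wind turbine': []}
    print(build_taxonomy(SAMPLE_CORPUS, ["solar panel", "wind turbine"]))
```

Counting document frequency rather than raw word counts keeps one long document from dominating a topic. Because stopwords are removed before bigrams are formed, some phrases can span a removed word (for example "cells energy" from "cells and energy"); with a real corpus, a better tokenizer and a hierarchical step such as clustering would be needed to get a genuine multi-level taxonomy.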
I am new to RapidMiner and relatively new to data mining in general, so please keep it simple for me.
Please help me understand any and all methods I could use to accomplish this.
And please let me know if I am following the proper procedures for this forum, or how I can improve this post.
Thank you very much.
Sam