"Any Text Processing 5 extension examples?"
thomas0221
Member Posts: 4 Contributor I
Dear RapidMiner Experts,
I was able to get RapidMiner 4.6 and Text Plugin 4.6 working with the help of "rapidminer-text-4.6-tutorial.pdf", "rapidminer-text-4.6-examples.zip", and other online resources, including the discussions in this user forum. However, when I try basic text mining tasks (such as the ones based on the ideas in "rapidminer-text-4.6-examples.zip") in RapidMiner 5 with the Text Processing 5 extension, I have no luck. It seems that some members of this forum have figured out how to use the Text Processing 5 extension in RapidMiner 5 for some of the basic tasks that can be accomplished in V4.6. So I wonder whether some of the experts could share working examples of text mining process XML files for RapidMiner 5. I understand that the RapidMiner 5 product team has limited resources and time, and so has not had a chance to provide a complete tutorial and examples for the Text Processing 5 extension (for the same reason, V4.6 has a Web Crawler but V5 does not yet). I hope some community members can help out by sharing sample XML files of text mining processes. I would greatly appreciate the help. Documentation, tutorials, and examples are the single defining factor in whether one gets the software to work or not.
By the way, I have been using RapidMiner for only about 10 days and I am impressed with its rich features. In RapidMiner 5 I like the new flow design (compared to V4.6's tree process), the meta-data availability on the design page, and the quick fix suggestions. However, I find that processes designed in RapidMiner 4.6 cannot be imported into RapidMiner 5, and RapidMiner 5's process XML files cannot be opened in V4.6. I understand that there were significant changes from V4.6 to V5, and that many operators were renamed and reorganized to be more logical. I guess one workaround for getting a V4.6 process to work in V5 is simply to redesign the process from scratch in V5.
Thanks,
Thomas
Answers
-Jen
In general, RapidMiner 4.x process files can be imported into RapidMiner 5.0 very well. We made a huge effort to write an import mechanism, even though the process structure has changed completely and several operators have been redesigned to make their parameter settings more user friendly and understandable. We wrote import rules even for the old plugins, and we would have done the same for the Text Plugin. Unfortunately, we found the old version much too limited and hard to maintain, and it didn't fit very well into the RapidMiner design with IO Objects, because it tended to write everything into temporary files. So we decided to redesign it from scratch, keep the best ideas (and there were many), and combine them with an up-to-date way of handling data objects. The result is a more flexible, more powerful, and much faster (!) extension that unfortunately changed so much that old processes cannot be adapted automatically. So you need to redesign your processes only if they contain operators of the former Text Plugin.
Here I will give you a basic example of how to work with the Text Processing Extension. The process below loads data that contains two attributes of type text. They are selected for vector creation by the specify weights parameter of the Process Documents operator.
Inside the Process Documents operator, all letters are first changed to lower case, then the texts are split into single tokens, and finally the tokens are stemmed. Each token of the document that is finally delivered to the Process Documents operator becomes part of the word list and hence a single attribute in the resulting word vector that forms the example set.
During this transformation, meta data might be attached to the documents. If you set a breakpoint inside the Process Documents operator, you will see all meta data to the right of the text. This meta data is added as additional attributes to the resulting ExampleSet if the add_meta_information parameter of the Process Documents operator is checked.
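The steps just described (lower-casing, tokenizing, stemming, then building a word list with one attribute per token) can be illustrated outside RapidMiner with a minimal Python sketch. This is not the extension's implementation: the function names, the crude suffix-stripping stemmer, the English sample texts, and the use of raw term counts as vector values are all assumptions made for illustration.

```python
import re
from collections import Counter


def crude_stem(token):
    # Hypothetical stand-in for a real stemmer: strips a few English suffixes.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def process_document(text):
    # Step 1: lower case. Step 2: split into tokens (runs of letters).
    # Step 3: stem each token.
    lowered = text.lower()
    tokens = re.findall(r"[^\W\d_]+", lowered)
    return [crude_stem(t) for t in tokens]


def build_word_vectors(texts):
    # Every distinct token that survives processing joins the word list
    # and becomes one attribute (column) of the word vectors.
    token_lists = [process_document(t) for t in texts]
    word_list = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        # Assumption: raw term counts are used as the vector values.
        vectors.append([counts[w] for w in word_list])
    return word_list, vectors


# Made-up sample texts, not taken from the hotel data set.
texts = [
    "Friendly staff, always helping.",
    "The staff helped us; rooms cleaned daily.",
]
word_list, vectors = build_word_vectors(texts)
print(word_list)
for row in vectors:
    print(row)
```

Note how "helping" and "helped" both end up in the single attribute "help", which is the point of stemming before the word list is built.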
Here's the process:
And here's what the data looks like:
role     name              type
label    score             numeric
regular  reasons_negative  text
regular  reasons_positive  text
regular  customer_age      polynominal
regular  customer_type     polynominal
regular  customer_group    polynominal
and here's a small snippet from the data:
1 7.3 Hoher Preis für Internetnutzung. Schnelles Hotel - schnell in der City. 41-50 Jahre geschäftlich allein reisend
2 8.7 Bei dem Preis für´s Frühstück fehlt uns ein wenig der Fisch aber es geht auch mal ohne. Auch nach unserem 3. Besuch in diesem Hotel. Alles in Ordnung, besonders das Personal, immer freundlich, immer hilfsbereit, kurz gesagt immer gut drauf. 51-60 Jahre geschäftlich als Paar reisend
I hope this helps you get your processes running again. After that, you will discover the new possibilities bit by bit. In any case, we will add a basic tutorial as soon as possible.
Greetings,
Sebastian
Thank you so much for your example text processing XML code. Based on your example, I finally figured out how to use the Text Processing extension. What tripped me up (and may trip up other newbies) is that the RapidMiner 5 design workspace has parent processes and child sub-processes. I need to navigate from a parent operator (such as Process Documents from Data or cross validation) to its child sub-process by double-clicking the parent operator. Then, on the child sub-process page, I can add Tokenize, the stopword filter, stemming, and so on. I should not add these operators at the parent level. This may be why I could not get the RapidMiner 5 Text Processing Extension to work in the first place: I put Process Documents from Data, Tokenize, the stopword filter, stemming, and so on at the same level and tried to connect them. This is only my partial understanding, and I might be wrong. In the RapidMiner 4.6 Text Plugin, in tree design mode, everything appears on the same page. Moving to RapidMiner 5, I need to understand the parent-child sub-process relationship. In case other users want to see a simple example, I attach my text mining process XML file below. You can change the text directories to your local ones; I use the example data that comes with wvtool-1.1.
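The parent-child relationship described above can be pictured as a nested structure. The following Python sketch is only a mental model, not RapidMiner's API or file format: the `Operator` class, the `outline` helper, and the exact operator labels are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Operator:
    # Hypothetical model: an operator has a name and, optionally,
    # a child sub-process made of further operators.
    name: str
    subprocess: List["Operator"] = field(default_factory=list)


def outline(op, depth=0):
    # Render the nesting as an indented outline, similar to a tree view.
    lines = ["  " * depth + op.name]
    for child in op.subprocess:
        lines.extend(outline(child, depth + 1))
    return lines


# Text operators live inside the parent's sub-process, not beside the parent.
process = Operator("Process", [
    Operator("Process Documents from Data", [
        Operator("Tokenize"),
        Operator("Stopword filter"),
        Operator("Stem"),
    ]),
])

print("\n".join(outline(process)))
```

Placing Tokenize next to Process Documents from Data at the top level, rather than inside its sub-process, corresponds to the mistake described in the post.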
Thomas
If you are used to the tree, you can add the Tree as a view in RapidMiner 5, too. It gives you an overview of what your process does. There have been only slight changes, because subprocesses are now modeled explicitly instead of the implicit design in RapidMiner 4.x.
Greetings,
Sebastian
Thank you for your help. I found "Tree View" in RapidMiner V5 under "View" -- "Show View", so I can use the Tree view in RapidMiner V5.
In RapidMiner V5, I see a new feature for searching operators by name. I can type in part of the name of an operator that I vaguely remember, and the software finds relevant ones for me. However, in RapidMiner 4.6 I do not see such an operator search filter. Is there any way to search for operators in RapidMiner V4.6?
Moreover, RapidMiner V4.6 has a Box View from which I can export the process design to a JPEG file. In RapidMiner V5, I cannot find such a Box View. So does RapidMiner V5 support only the Flow View and the Tree View? Is there no Box View anymore?
Thanks!
Thomas
The search box in RapidMiner 4.6 is below the operator tree, but it is there. Alternatively, you can use the new operator dialog, where you can filter and search by various properties.
The box view is gone now because the data flow is modeled explicitly rather than implicitly, so the process is no longer well defined by the execution order of the operators alone.
Greetings,
Sebastian
Can you provide a snippet of 100 rows (or more) of the file "D01 - ProcessedHotelCustomerSatisfaction_de"? I couldn't find it in the sample repository.
Do you plan to provide samples for the new text processing extension?
We had already planned to deliver them with the first version. I will see what we can do.
Greetings,
Sebastian