Arabic word recognition
I was wondering if someone could solve the encoding problem for the Arabic language. Basically, by choosing the right encoding format in the content_encoding parameter, the system displays the Arabic words correctly in the result view. However, three problems arise:
1. The message viewer, when I apply a model, displays the words as "?????".
2. The word list produced also consists of question marks instead of words.
3. When I try to use the StopWordFilter, the system isn't able to match the Arabic words in order to filter them.
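To illustrate what I suspect is happening: the same replacement behaviour shows up in plain Java (which RapidMiner is built on) whenever text is written through a charset that cannot encode Arabic. A minimal sketch, with ISO-8859-1 used only as an example of such a charset:

import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    public static void main(String[] args) {
        String arabic = "الرهن العقاري";

        // Round-tripping through a charset without Arabic support replaces
        // every unmappable character with '?', i.e. exactly the "?????"
        // seen in the message viewer and the word list.
        String latin1 = new String(
                arabic.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
        System.out.println(latin1);                // ????? ???????
        System.out.println(latin1.equals(arabic)); // false

        // A UTF-8 round trip keeps the text intact.
        String utf8 = new String(
                arabic.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(utf8.equals(arabic));   // true
    }
}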
Thanks in advance;
Hassan
Answers
Did you try to also define the encoding in the main process operator (root)? Maybe this helps.
About the stop words: RapidMiner currently does not ship a stop word filter for Arabic, but you could simply create one with the file-based stop word filter (I don't remember the exact operator name right now).
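As a rough illustration of what such a file-based filter amounts to (this is only a sketch, not the operator's actual implementation; the file name and sample words below are made up, assuming one stop word per line in a UTF-8 file):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class FileBasedArabicStopwords {
    public static void main(String[] args) throws IOException {
        // Hypothetical stop word file: one Arabic word per line, saved as UTF-8.
        // Reading it with an explicit charset avoids the '?' replacement problem.
        Set<String> stopwords = new HashSet<>(
                Files.readAllLines(Path.of("arabic_stopwords.txt"), StandardCharsets.UTF_8));

        // Drop every token that appears in the stop word set.
        List<String> tokens = List.of("في", "الرهن", "العقاري", "من");
        List<String> kept = tokens.stream()
                .filter(token -> !stopwords.contains(token))
                .collect(Collectors.toList());
        System.out.println(kept);
    }
}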
Cheers,
Ingo
Thanks for your prompt response.
Actually, I have defined the encoding in the root process and in the preferences, and it didn't work. However, I would like to know whether there is any enhancement of the output encoding in RapidMiner, because, as I said in the beginning, reading the input data works perfectly.
I am looking forward to your help in resolving this problem.
Cheers;
Hassan
Hmm, that's sort of weird. I must admit that we do not have any experience with Arabic characters, but we know that the output should also work for Chinese characters, so I assume there is no fundamental problem with this. Could you provide us with some texts so we can try to find out what's going on?
Thanks and cheers,
Ingo
I have been waiting for your response.
Here is a sample of Arabic text:
ان الرهن العقاري ذا الأصول الإسلامية، عُمل فيه بطرق موسعة وناجحة بكل المقاييس، في الدول الأجنبية، ونقل هذا النظام عن طريق باحثين تخصصوا في الرهن العقاري، إلى دول إسلامية مثل ماليزيا، وسنغافورة، وكذلك البحرين ودبي.
I appreciate your reaction, and I really need to sort this out. Also, to keep you informed about the problem: it appears to happen when the program writes its output; the output looks as though it is written with the default encoding rather than the specified encoding.
I am eagerly awaiting your reply, because I need to sort this out to start my dissertation.
Cheers;
Hassan
I must admit that I was not even able to work properly with the test sample, since I had no program available that could display it. I wanted to create a small data file containing some of the words, together with an .aml file describing the data, in order to work with that, but I didn't manage to create those files; at least I was not able to see anything, and I assume I lost the information about the characters somewhere in the process.
My suggestion: please create a data file together with an .aml file which I can load directly with the ExampleSource operator. Please also specify the encoding in the .aml file and attach both files here, together with the information about the correct encoding. Maybe then I will be able to sort out what is happening in the output.
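One way to make sure no characters get lost while creating such files is to write them with an explicit charset instead of the platform default. A minimal sketch, assuming Java 11+ and a made-up file name (the .aml file would then have to declare the same encoding):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WriteArabicSample {
    public static void main(String[] args) throws IOException {
        String sample = "ان الرهن العقاري ذا الأصول الإسلامية";

        // Write with an explicit charset rather than the platform default,
        // so the Arabic characters survive the trip to disk.
        Path file = Path.of("arabic_sample.txt");
        Files.writeString(file, sample, StandardCharsets.UTF_8);

        // Reading it back with the same charset restores the text exactly.
        String back = Files.readString(file, StandardCharsets.UTF_8);
        System.out.println(back.equals(sample)); // true
    }
}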
Cheers,
Ingo
https://sourceforge.net/tracker/?func=detail&aid=2724678&group_id=131810&atid=722307
If you need to save a model file as something other than binary, then changes also have to be made to the ModelWriter operator.
Regards,
Andreas
Thank you for this hint. Since we do not face non-Latin text in our usual day-to-day work, we weren't fully aware of this. But we will keep it in mind while revising the text plugin for the next major version of RapidMiner.
Greetings,
Sebastian
Thanks,
Steve