Bug in Execute R operator
Rapidminerpartner
Hello everyone, how are you?
I am using the "Execute R" operator.
However, if a column name in the input table contains Korean characters
(that is, if the column name is in Korean),
it crashes. (An error message appears mentioning a Java exception...)
So please fix this problem for Korean users.
Thank you in advance, and see you again.
KMC
Below is the .rmp file...
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve 101_DT_1B04005N_Y_2016---" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Local Repository/processes/101_DT_1B04005N_Y_2016---"/>
      </operator>
      <operator activated="true" class="r_scripting:execute_r" compatibility="9.1.000" expanded="true" height="103" name="Execute R" width="90" x="246" y="34">
        <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function(data)&#10;{&#10;    print('Hello, world!')&#10;    # output can be found in Log View&#10;    print(str(data))&#10;    # your code goes here&#10;    # for example:&#10;    data2 &lt;- as.data.table(matrix(1:16,4,4))&#10;    # connect 2 output ports to see the results&#10;    return(list(data,data2))&#10;}&#10;"/>
      </operator>
      <connect from_op="Retrieve 101_DT_1B04005N_Y_2016---" from_port="output" to_op="Execute R" to_port="input 1"/>
      <connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Well, I didn't know I could attach files.
Here you are, and thank you.
I tried your dataset with your process and it didn't give me any error, but the attribute names are changed inside the R script. I added a breakpoint before the Execute R operator, and it showed the exact attribute names as they appear in the CSV file attached to your post. But once the data is processed by the R script, some symbols are replaced with boxes, as shown in the center figure below. I also see that you didn't write any R code and are just using the default script of the Execute R operator. I loaded the CSV data separately with read.csv in RStudio and observed that R changes your attribute names. This is shown in the last image in the screenshot below.
To check whether RapidMiner itself changes the attribute names, I also used the data imported from the CSV file to train a decision tree in RapidMiner instead of running the R script. There is no change in the attribute names.
So my understanding is that R is changing your attribute names because it cannot handle some special characters. I am not sure what error you are getting; if you have any images of the error, please attach them.
@sgenzer might be able to help with this.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Hello, varunm1
Thank you for your help.
I will read your detailed message this evening when I return from my office.
I will also attach the error message window.
Have a nice day, varunm1!
Scott
Hello, varunm1 and everybody
I have uploaded the repository data file (file extension .ioo).
Please try testing my process with this attached data file.
I believe all of you will see the error message.
Thank you.
I have also uploaded screenshots showing the error messages.
Thanks.
I am able to reproduce the error with the repository file you provided. I am a bit confused, because your data in the repository file consists entirely of boxes. The earlier .csv file that you provided, which I uploaded, is fine and doesn't throw any errors. I am not sure why this exception occurs; maybe Dr. YY can help you with this. Thanks.
Error:
Thanks for sharing the data and process. I was able to reproduce the same error as you and @varunm1. The issue is not in RapidMiner itself, because it can handle Korean input and parentheses in column names, but we can make this better in future integrations. @varunm1, one trick to make R recognize Korean text is to set the locale for the R environment. However, a locale set with R's Sys.setlocale function will be overridden by RapidMiner's locale and encoding settings.
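To illustrate why locale and encoding settings matter for Korean column names, here is a small Python sketch. It is not RapidMiner's or R's actual code path; it assumes the two encodings most likely to be involved are UTF-8 and CP949 (the legacy Korean Windows code page), and it uses the column name "시점" from this thread. The same text has different bytes under each encoding, so a consumer that assumes the wrong one sees garbled names.

```python
# Sketch (assumption: an encoding mismatch is one way column names get garbled).
# The same Korean column name has different byte representations in
# UTF-8 and CP949, the legacy Korean Windows code page.
name = "시점"

utf8_bytes = name.encode("utf-8")    # 3 bytes per Hangul syllable
cp949_bytes = name.encode("cp949")   # 2 bytes per Hangul syllable

# Decoding with the matching encoding round-trips cleanly.
assert utf8_bytes.decode("utf-8") == name
assert cp949_bytes.decode("cp949") == name

# Decoding UTF-8 bytes as Latin-1 never raises, but yields garbled text.
garbled = utf8_bytes.decode("latin-1")

print(len(utf8_bytes), len(cp949_bytes))  # prints: 6 4
print(garbled == name)                    # prints: False
```

The point is only that producer and consumer must agree on one encoding; which side RapidMiner or R picks depends on the settings discussed above.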
As you may know, R's data frame functions (for example read.csv) do not keep special characters in column names by default, so they automatically convert characters such as (, ), {, }, [ and ] into dots. However, when we pass a data frame with special characters in its column names from RapidMiner to R, this fails and RapidMiner shows the exception "script terminated abnormally".
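The renaming rule described above can be approximated in a few lines. This is a simplified Python re-implementation of what R's make.names and make.unique do (invalid characters become ".", an "X" is prepended when a name does not start with a letter or with a dot not followed by a digit, and repeats get ".1", ".2" suffixes). It is not R's actual code and skips R's reserved-word handling; the `unicode_letters` flag is a hypothetical stand-in for whether the active locale treats Hangul as letters, and every column name except "시점" is a hypothetical example.

```python
def is_name_char(ch: str, unicode_letters: bool) -> bool:
    """Return True if ch may appear in a syntactic R-style name."""
    if ch in "._":
        return True
    if ch.isascii():
        return ch.isalnum()
    # Hypothetical locale switch: non-ASCII letters (e.g. Hangul) count as
    # letters only when the locale supports them.
    return unicode_letters and ch.isalpha()


def starts_validly(name: str, unicode_letters: bool) -> bool:
    """A name must start with a letter, or with a dot not followed by a digit."""
    if not name:
        return False
    first = name[0]
    if first == ".":
        return len(name) == 1 or not name[1].isdigit()
    return first.isalpha() and is_name_char(first, unicode_letters)


def make_name(name: str, unicode_letters: bool = True) -> str:
    """Approximate R's make.names() for one name (reserved words not handled)."""
    cleaned = "".join(ch if is_name_char(ch, unicode_letters) else "." for ch in name)
    return cleaned if starts_validly(cleaned, unicode_letters) else "X" + cleaned


def make_unique(names: list[str]) -> list[str]:
    """Append .1, .2, ... to repeated names, skipping names already present."""
    taken = set(names)
    seen: set[str] = set()
    counters: dict[str, int] = {}
    result = []
    for n in names:
        if n not in seen:
            seen.add(n)
            result.append(n)
            continue
        k = counters.get(n, 0) + 1
        while f"{n}.{k}" in taken:
            k += 1
        candidate = f"{n}.{k}"
        counters[n] = k
        taken.add(candidate)
        result.append(candidate)
    return result


def sanitize_columns(names: list[str], unicode_letters: bool = True) -> list[str]:
    """Sanitize each name, then de-duplicate, like make.names(unique = TRUE)."""
    return make_unique([make_name(n, unicode_letters) for n in names])


print(sanitize_columns(["시점", "인구(명)", "인구[명]", "2016 Total"]))
# prints: ['시점', '인구.명.', '인구.명..1', 'X2016.Total']
print(sanitize_columns(["시점"], unicode_letters=False))
# prints: ['..']
```

The second call shows how, in this sketch, a locale that does not recognize Hangul would reduce a Korean name to dots, which is one way distinct columns could collide or become unrecognizable.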
I tried two modifications: either renaming the attributes or selecting attributes (removing the columns with parentheses in their Korean names) fixes the issue.
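The rename workaround amounts to giving every column a plain ASCII alias before the data reaches R and restoring the original names afterwards. The Python sketch below shows only that idea; in RapidMiner itself the renaming is done with operators such as Rename, and the `col` prefix and the function names here are hypothetical.

```python
def alias_columns(names: list[str], prefix: str = "col") -> tuple[list[str], dict[str, str]]:
    """Replace every column name with an ASCII alias (hypothetical prefix)."""
    aliases = [f"{prefix}{i}" for i in range(1, len(names) + 1)]
    return aliases, dict(zip(aliases, names))


def restore_columns(names: list[str], mapping: dict[str, str]) -> list[str]:
    """Map aliases back to original names; unknown names pass through unchanged."""
    return [mapping.get(n, n) for n in names]


original = ["시점", "인구(명)"]  # "인구(명)" is a hypothetical column name
aliases, mapping = alias_columns(original)
print(aliases)                                        # prints: ['col1', 'col2']
# ... the aliased table would be processed by the R script here ...
print(restore_columns(aliases, mapping))              # prints: ['시점', '인구(명)']
print(restore_columns(["col1", "new_col"], mapping))  # prints: ['시점', 'new_col']
```

Passing unknown names through unchanged lets columns created inside the R script survive the restore step.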
Thank you, yyhuang and varunm1.
I will read your comments again when I return from the office this evening.
Have a nice day!
Hello, YY
Thank you for your help.
I will get back to you after checking my source file.
I thought RapidMiner didn't support Korean column names.
I will check it as you suggested.
Have a nice day!
Hello, YY and varunm1
I have to report that there's still a problem.
YY said it would be OK if there were no special characters in the column (attribute) names,
but I just checked, and it crashes even in that case.
I added a "Select Attributes" operator to the process
so that it selects just one attribute, the fifth one ("시점", meaning "point in time"), which doesn't contain special characters,
but it still crashes in that case.
I have attached the screenshots, so please help me solve the problem.
Thank you, and see you.
Hello, YY and varunm1
Here are the .xml and .rmp files.
Please check them for me.
I am unable to reproduce this error; it's working fine for me. @yyhuang, I have a question: why am I seeing boxes instead of Korean characters? Am I missing some setting?
Good question. I guess you and @Rapidminerpartner are using Windows.
@Michael also helped test the same data, and the encoding handling under macOS is smoother.
https://answers.microsoft.com/en-us/windows/forum/all/korean-characters-shown-as-blocks/471ca66a-c09c-4d18-85ed-7aed8afde075
If you have never installed a language pack other than English, you may have trouble displaying Korean characters on Windows.
So I did the following on my Windows 10 machine:
I installed the Korean language pack. (I had installed the Chinese pack a long time ago for testing Chinese text mining.)
The system settings on Windows are tricky. Hope it helps.
YY
Thank you @varunm1 for all your help testing and troubleshooting!!