
In RapidMiner Go, the linear regression algorithm used some inputs I did not select.

BillPBillP Member Posts: 9 Learner I
edited April 2020 in Product Feedback
I hope my question has not been asked before. In short, RapidMiner Go seems to be running a regression with variables I did not select. An explanation follows. In RapidMiner Go I dropped a CSV file with 64 columns and almost 2900 rows. I wanted to predict a single column (of numbers) using linear regression and decision tree ("Easily Interpretable"). The first two columns were date and time. The other columns were numbers. I selected only 5 inputs, and an indicator on that page said 5 were selected. I ran the regression, and in the Data Metrics it reported the correlation for the 5 inputs that I selected plus 7 others I did not select. Assuming that it ran the regression with the 7 inputs I did not select, how do I run the regression with only the 5 inputs I selected? Thanks very much. Regards, Bill

Sent to Engineering · Last Updated

IC-1842

Comments

  • varunm1varunm1 Member Posts: 1,207 Unicorn
    edited April 2020
    Hello @BillP

    Can you cross-check whether the model was built on more attributes than you selected? To do so, click the model link after it executes, then scroll down to see how many attributes have coefficients.



    Coefficients checking:


    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • BillPBillP Member Posts: 9 Learner I
    Thank you, Varun. I did what you suggested (actually, I have checked the coefficients many, many times) and there are 12 coefficients, not 5 as there should be. The 7 extra inputs seem to have been randomly chosen. When I go to Model Simulator and move the sliders for the inputs that I did not select over their entire ranges, they move the predicted variable by only a small amount, 0.01 or 0.001%. The "weight" for some of them seems significant, but moving the slider for that variable doesn't move the predicted value very much. It is as if the regression went haywire. I don't know why.
  • BillPBillP Member Posts: 9 Learner I
    I suppose it doesn't matter, but the fit is very good. The plot of model versus actual value looks a lot better than I expected it would be.
  • varunm1varunm1 Member Posts: 1,207 Unicorn
    That seems weird. Can you do the following and provide me your process?

    In RapidMiner Go, click on the model link as described earlier. There is an option called "Export" in the top-right corner. If you click on that, you will see an option called "Download Process". Can you download that process file and attach it here so I can check it?


    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • BillPBillP Member Posts: 9 Learner I
    Thanks. I attached the exported model.
  • varunm1varunm1 Member Posts: 1,207 Unicorn
    Hello @BillP

    Thanks for sharing this. I will take a look. Also, if possible, please share your data here or in a private message so that I can rerun the analysis and explain the reason for this phenomenon.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • BillPBillP Member Posts: 9 Learner I
    Hi Varun, Thanks so much. By private message do you mean your private email at your web site?
  • varunm1varunm1 Member Posts: 1,207 Unicorn
    edited April 2020
    I got your email. In the future you can use the message option on the RapidMiner community as well. If you click on my name, it will take you to my profile, where you can find the "Message" option in the right corner. You can send a message with the file attached. Sample image below.




    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • BillPBillP Member Posts: 9 Learner I
    Thanks!
  • BillPBillP Member Posts: 9 Learner I
    As you indicated in your private message to me, I removed the commas from the 6 column headings that I did not select but that still ended up in the analysis. After removing the commas, the linear regression model results showed only 5 inputs and the estimation results were fantastic. I hope this info about commas in the column headings helps anyone who might have the same problem. However, I don't think anyone in the future will find this thread unless the subject is something like "Commas in column headings cause a problem when selecting inputs". I am a little surprised that this was not caught long ago. I use commas to separate a label and the units of that label, such as "Mass flow, t/h". The following should not cause a problem in RapidMiner Go: "Mass flow [t/h]". Thanks so much for your help and have a good day!
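The renaming described above ("Mass flow, t/h" to "Mass flow [t/h]") can be automated before uploading a file. The following is a minimal sketch using only Python's standard library; the function names `sanitize_header` and `sanitize_csv_text` are hypothetical helpers, and the sketch assumes the first row of the CSV holds the column headings and that the text after the first comma is the unit.

```python
import csv
import io


def sanitize_header(name: str) -> str:
    """Rewrite 'Label, unit' as 'Label [unit]' so the heading has no comma."""
    label, sep, unit = name.partition(",")
    label = label.strip()
    # Any further commas inside the unit part are replaced so none remain.
    unit = unit.strip().replace(",", ";")
    if not sep or not unit:
        return label
    return f"{label} [{unit}]"


def sanitize_csv_text(text: str) -> str:
    """Return the CSV text with only its header row rewritten."""
    rows = list(csv.reader(io.StringIO(text)))
    if rows:
        rows[0] = [sanitize_header(h) for h in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = '"Mass flow, t/h",angle\n1.5,30\n'
    print(sanitize_csv_text(sample))
```

Data rows are passed through unchanged; only the headings are touched, which matches the manual fix reported in this thread.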
  • BillPBillP Member Posts: 9 Learner I
    How do I credit you, Varun, with the answer? I can't click yes on my comment because I just acted on your advice.
  • varunm1varunm1 Member Posts: 1,207 Unicorn
    Hello Bill,

    No problem. Let's keep this question open, as I want our friends at RM to check this and maybe open a ticket to resolve this comma issue. I am not sure if there is already a NOTE that says we cannot use commas in attribute names, but I will wait to see this resolved so that there won't be future issues for anyone.

    @sgenzer any inputs here?
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • BillPBillP Member Posts: 9 Learner I
    If there are commas in the column headings, should there be a warning to the user when they are detected? It is not that inconvenient to remove them once you know that they should not be there. The inconvenience is finding out that information. Congrats on noticing the commas; that was very astute.
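Until such a warning exists in the product, a user can run a similar check locally before uploading. This is only a sketch of the idea, not anything RapidMiner Go provides; `headers_with_commas` and `warn_on_comma_headers` are hypothetical names, and the first CSV row is assumed to be the header.

```python
import csv
import io
import warnings


def headers_with_commas(csv_text: str) -> list:
    """Return the column headings (first CSV row) that contain a comma."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [h for h in header if "," in h]


def warn_on_comma_headers(csv_text: str) -> list:
    """Emit one warning per offending heading and return the flagged names."""
    flagged = headers_with_commas(csv_text)
    for name in flagged:
        warnings.warn(f"Column heading contains a comma: {name!r}")
    return flagged


if __name__ == "__main__":
    warn_on_comma_headers('"Mass flow, t/h",angle\n1.5,30\n')
```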
  • varunm1varunm1 Member Posts: 1,207 Unicorn
    edited April 2020
    Hello @sgenzer ,

    To reproduce this error, please upload this CSV file to RapidMiner Go, select "angle" as the prediction variable, and select the attributes in the image below (none of which contain a comma). Keep the default selections in the next window, choose "Easily Interpretable", leave everything else as default, and run the analysis.



    Once the analysis is done, we can observe that the GLM model also used unselected attributes, as shown below.



    The cause appears to be the presence of a comma (",") in the attribute name. My understanding is that the regex used in the Load and Process Data --> Remove column module is being tricked by the comma. I don't see this behavior once the commas are removed from the attribute names. Also, with a comma in the attribute name, this does not happen in Auto Model.

    I am not sure if there is an instruction not to use commas in attribute names.
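The thread does not show how RapidMiner Go actually builds its column-removal step, so the following is only a hypothetical illustration of one way a comma in a name can defeat such a step. It assumes the names to remove are joined into a single comma-separated string and split again later; under that assumption, "Mass flow, t/h" breaks into fragments that match no column, so the column is never removed. The function and column names are invented for the example.

```python
def remove_columns_naive(columns, to_remove):
    """Hypothetical buggy removal: names round-trip through a comma-joined string."""
    spec = ",".join(to_remove)  # e.g. "Mass flow, t/h,temp"
    parsed = {part.strip() for part in spec.split(",")}
    # "Mass flow, t/h" is now the fragments "Mass flow" and "t/h",
    # neither of which equals the real column name.
    return [c for c in columns if c not in parsed]


def remove_columns_safe(columns, to_remove):
    """Compare whole names directly, so commas inside a name are harmless."""
    unwanted = set(to_remove)
    return [c for c in columns if c not in unwanted]


if __name__ == "__main__":
    cols = ["angle", "speed", "Mass flow, t/h", "temp"]
    drop = ["Mass flow, t/h", "temp"]
    print(remove_columns_naive(cols, drop))  # the comma-named column survives
    print(remove_columns_safe(cols, drop))
```

If the real step uses a regular expression instead, the analogous fix would be to escape each name (for example with `re.escape`) before building the pattern, but that too is an assumption about the implementation.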
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • sgenzersgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    thx @varunm1 I'm pushing this to Prod Feedback and will report to the RM Go team.