In RapidMiner Go the linear regression algorithm used some inputs I did not select.
BillP
Member Posts: 9 Learner I
I hope my question has not been asked before. In short, RapidMiner Go seems to be running a regression with variables I did not select. An explanation follows. In RapidMiner Go I dropped a CSV file with 64 columns and almost 2900 rows. I wanted to predict a single column (of numbers) using linear regression and decision tree ("Easily Interpretable"). The first two columns were date and time; the other columns were numbers. I selected only 5 inputs, and an indicator on that page said 5 were selected. I ran the regression, and in the Data Metrics it reported the correlation for the 5 inputs that I selected plus 7 others I did not select. Assuming that it ran the regression with the 7 inputs I did not select, how do I run the regression with only the 5 inputs I selected? Thanks very much. Regards, Bill
Comments
Can you cross-check whether the model is built on more attributes than you selected? You can do so by clicking on the model link after it executes and then scrolling down to see how many attributes have coefficients.
Coefficients checking:
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
In RapidMiner Go, once you click on the model link as mentioned earlier, you have an option called "Export" in the top right corner. If you click on that, you will see an option called "Download Process". Can you download that process file and attach it here so I can check?
Varun
Thanks for sharing this. I will take a look. Also, if possible, try to share your data here or in a private message so that I can rerun it and explain the reason for this phenomenon.
Varun
No problem. Let's keep this question open, as I want our friends at RM to check this and maybe open a ticket to resolve this comma issue. I am not sure whether there is already a note saying that commas cannot be used in attribute names, but I will wait to see this resolved so that there won't be future issues for anyone.
@sgenzer any inputs here?
Varun
To reproduce this error, please upload this CSV file to RapidMiner Go, select "angle" as the prediction variable, and select the attributes in the image below (none of which contain a comma). Use the default selections in the next window, choose Easily Interpretable, leave everything else as default, and run the analysis.
Once the analysis is done, we can observe that the GLM model also used unselected attributes, as shown below.
The cause appears to be the presence of a comma (",") in the attribute names. My understanding is that the REGEX function in the Load and Process Data --> Remove column module is being tricked by the comma. I don't see this once the commas are removed from the attribute names. Also, with a comma in the attribute name, this doesn't happen in Auto Model.
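To illustrate how a comma could trick a regex-based column filter, here is a minimal Python sketch. It is not RapidMiner Go's actual code. It assumes (hypothetically) that the names of the columns to remove are joined into one comma-separated string and later split again before the regex is built. The function `remove_columns` and the column names are made up for the example.

```python
import re

def remove_columns(columns, to_remove):
    """Hypothetical sketch of a regex-based 'remove column' step.

    Assumption (not confirmed for RapidMiner Go): the names to drop are
    serialised as a single comma-separated string and split back later.
    """
    serialised = ",".join(to_remove)   # a name containing ',' is now ambiguous
    parts = serialised.split(",")      # "pos x,y" comes back as "pos x" and "y"
    pattern = re.compile("|".join(re.escape(p) for p in parts))
    # Drop every column whose full name matches one of the alternatives.
    return [c for c in columns if not pattern.fullmatch(c)]

columns = ["angle", "speed", "temp", "pos x,y", "force a,b"]
selected = ["angle", "speed", "temp"]
to_remove = [c for c in columns if c not in selected]

kept = remove_columns(columns, to_remove)
print(kept)  # the two comma-containing columns are still present
```

In this sketch the fragments "pos x", "y", "force a" and "b" match no real column, so the unselected columns survive the removal step and reach the model. With comma-free names the same function removes them as expected.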
I am not sure whether there is an instruction not to use commas in attribute names.
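Until this is resolved, one workaround is to strip the commas from the attribute names before uploading the file. A minimal sketch using Python's standard `csv` module follows. The function name `sanitize_header`, the replacement character and the sample data are assumptions for illustration only.

```python
import csv
import io

def sanitize_header(src, dst, replacement="_"):
    """Copy a CSV from src to dst, replacing commas in the header names."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    # Only the first row (the attribute names) is changed.
    writer.writerow([name.replace(",", replacement) for name in header])
    # All data rows are copied through unchanged.
    writer.writerows(reader)

# Small in-memory demo; the quoted header "pos x,y" contains a comma.
src = io.StringIO('angle,"pos x,y"\r\n1,2\r\n')
dst = io.StringIO()
sanitize_header(src, dst)
print(dst.getvalue())
```

For a real file, pass handles opened with `newline=""`, for example `open("input.csv", newline="")` and `open("clean.csv", "w", newline="")` (both file names are placeholders). Check that no two attribute names become identical after the replacement.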
Varun