How to find out average ratios between more than 2 variables?
CaptainChaos
Member Posts: 17 Contributor II
in Help
Hi guys,
My data is in the format shown below:
It is real data from a transport agency where I am currently writing my bachelor thesis. I will first explain the data, even though most of it should be pretty clear. The first attribute, "id", is simply the ID of the vehicle sending the data. The second attribute, "DrivenDistance", is the distance the truck travelled; the third attribute is the time the truck travelled, in seconds; the fourth attribute is the litres of fuel the truck used for the travelled distance; the fifth attribute, "Weight", is the average weight of the truck during the journey; the sixth attribute is the grade calculated for the driver based on his driving style; and the seventh attribute, "RoutDificulty", indicates how hard the route is to drive. For example, driving through the mountains with a lot of weight and speed gives a higher mark.
So what I would like to find out is what the ratio between these variables is on average, in order to check the plausibility of each variable. For example, I would like to draw conclusions like: "Given the DrivenDistance, Time, Weight, RoutDificulty and DriverNote, the TotalConsumption should be between x and y".
So I started by calculating the correlations between the attributes, and they are pretty weak, with one exception: TotalConsumption is strongly correlated with DrivenDistance (0.944), which is pretty logical. But I know from field tests that Weight and RoutDificulty should influence it more than the correlations show (0.0177 and 0.22).
So my question is whether there is any way to find out or draw conclusions about the ratios between more than 2 variables. Should I use a method other than the correlation matrix? Or should I change my process, listed below?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="446" width="628">
      <operator activated="true" class="read_excel" compatibility="5.1.006" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
        <parameter key="excel_file" value="C:\Users\Rojas\Desktop\BA_A-z\Analyse\Rapidminer_Forum.xls"/>
        <parameter key="imported_cell_range" value="A1:G77"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="VEHICLEID.true.integer.id"/>
          <parameter key="1" value="DrivenDistance.true.numeric.attribute"/>
          <parameter key="2" value="Time.true.integer.attribute"/>
          <parameter key="3" value="TotalConsumptio.true.real.attribute"/>
          <parameter key="4" value="Weight.true.numeric.attribute"/>
          <parameter key="5" value="DriverNote.true.real.attribute"/>
          <parameter key="6" value="RoutDificulty.true.real.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="correlation_matrix" compatibility="5.1.006" expanded="true" height="94" name="Correlation Matrix" width="90" x="246" y="120">
        <parameter key="squared_correlation" value="true"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Correlation Matrix" to_port="example set"/>
      <connect from_op="Correlation Matrix" from_port="example set" to_port="result 1"/>
      <connect from_op="Correlation Matrix" from_port="matrix" to_port="result 2"/>
      <connect from_op="Correlation Matrix" from_port="weights" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
Any advice would be highly appreciated (if I didn't explain it in sufficient detail or logically enough, please ask me; English isn't my native language) ;D
Answers
There are other means of estimating the importance of an attribute. Try e.g. the operators in the Attribute Weighting group (Weight by ...). Some classifiers also create an attribute weighting, e.g. the SVM. If the classifier is capable of detecting attribute interactions, it can deliver more valuable results than good old simple correlation, which looks at only one attribute at a time.
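To illustrate why a one-attribute correlation can understate an influence, here is a minimal sketch in plain Python (not RapidMiner). Everything in it is an assumption made for illustration: the data is synthetic, the attribute names are only borrowed from the question, and the formula that generates consumption is invented. It shows that when consumption is dominated by distance, the raw correlation between weight and consumption can look weak, while the correlation between weight and consumption per kilometre is strong.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Synthetic stand-in for the real data set (all numbers are invented).
rng = random.Random(0)
distance = [rng.uniform(50, 800) for _ in range(200)]  # km
weight = [rng.uniform(10, 40) for _ in range(200)]     # tonnes

# Assumed relationship: litres per km grow with weight, plus a little noise.
consumption = [d * (0.20 + 0.005 * w) + rng.gauss(0, 1.0)
               for d, w in zip(distance, weight)]

# Normalise away the dominant attribute: litres per km instead of litres.
per_km = [c / d for c, d in zip(consumption, distance)]

r_distance = pearson(distance, consumption)
r_weight_raw = pearson(weight, consumption)
r_weight_per_km = pearson(weight, per_km)

print(f"distance vs consumption:        {r_distance:.3f}")
print(f"weight   vs consumption:        {r_weight_raw:.3f}")
print(f"weight   vs consumption per km: {r_weight_per_km:.3f}")
```

On real data the effect will be less clean, but the same idea applies: dividing out the dominant attribute, or using a model that looks at several attributes together, can reveal influences that the plain correlation matrix hides.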
Good luck for your thesis!
-Marius
Thanks for your answer, but the Captain is still off course...
I tried out the SVM model but don't know if I used it correctly, so I am asking you some more questions, sorry for that. If I understood correctly, when using the SVM I have to select one attribute as a label and one as the attribute to be predicted. I hope I am right so far; please correct me if not. The problem I have is that the data set isn't labeled yet, so I chose the id as the label, which is maybe a bit stupid because now each label consists of only one data record. So my next questions are: first, should I try to put the records into classes/labels, and second, would it be useful to put the values of TotalConsumption into intervals like 30-32 and so on, because I think predicting exact values like 30.46 is not very likely to work?
Furthermore, I would like to know what the values shown by the SVM's Kernel Model mean:
Last but not least, I am not getting any result in the Performance Vector (SVM); it just shows two messages: "svm_objective_function: -6327798.250" and "no_support_vectors: 76.000".
My process:
Sorry for asking so much, but the manual didn't help me with any of these issues!
Thanks in advance for any solutions, suggestions or explanations.
Let's start with your dataset: you are right, the SVM needs a label to work correctly. However, this label does not need to be categorical: you can also predict continuous values. This is called regression (in contrast to classification for categorical values).
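To make the regression idea concrete, here is a minimal sketch in plain Python rather than RapidMiner. It is not the SVM discussed in this thread: it swaps in an ordinary least-squares linear regression because that fits in a few lines. The data is synthetic, the linear relationship is an assumption, and `fit_linear`, `predict` and `plausible_range` are hypothetical helper names. The continuous label is the consumption, and the spread of the residuals gives a rough "between x and y" band for a given trip.

```python
import random

def fit_linear(rows, targets):
    """Least-squares fit of target ~ b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + list(row) for row in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented system (X^T X | X^T y).
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coef, row):
    """Predicted label for one record (coef[0] is the intercept)."""
    return coef[0] + sum(c * x for c, x in zip(coef[1:], row))

def plausible_range(coef, residual_sd, row, width=2.0):
    """Prediction plus/minus `width` residual standard deviations."""
    centre = predict(coef, row)
    return centre - width * residual_sd, centre + width * residual_sd

# Synthetic records: (distance in km, weight in tonnes, route difficulty).
rng = random.Random(1)
rows = [(rng.uniform(50, 800), rng.uniform(10, 40), rng.uniform(1, 5))
        for _ in range(200)]
# Assumed linear relationship plus noise; the real data may well differ.
consumption = [5 + 0.30 * d + 0.5 * w + 2.0 * diff + rng.gauss(0, 3.0)
               for d, w, diff in rows]

coef = fit_linear(rows, consumption)
residuals = [c - predict(coef, row) for row, c in zip(rows, consumption)]
dof = len(rows) - len(coef)
residual_sd = (sum(e * e for e in residuals) / dof) ** 0.5

trip = (400.0, 25.0, 3.0)
low, high = plausible_range(coef, residual_sd, trip)
print(f"plausible consumption for {trip}: {low:.1f} to {high:.1f} litres")
```

If the residuals are roughly normally distributed, a band of two standard deviations covers about 95% of records, so a reported consumption far outside it would be a candidate for an implausible value. Whether a linear model is adequate for the real data is something to check, for example by looking at the residuals.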
Now let's have a look at your process. First of all, you said you are interested in weights, but you did not connect the weights output of the SVM. You should do that. You are probably not interested in all the other ports in your context.
Defining the id as the label does not make any sense, as you correctly stated. But you said that the relation of one or several attributes to another attribute might be interesting: set that attribute as the label.
Last but not least, I think you would benefit from getting a deeper understanding of data mining in general and of how to do it with RapidMiner. At least for the latter, the video tutorials on our website (also linked from the post in my signature) are a good start.
Best,
Marius