"Usage of Excel, CSV and SPSS Data Files"
misskeynes
Hello,
I am just getting started with this powerful tool, and after following the tutorial I am thrilled by the many possibilities RapidMiner offers despite (or maybe because of) being open-source software. Unfortunately, I am having some problems using my data, which is in SPSS, Excel or CSV format. When I use the Wizard, my SPSS files are not recognized and the data looks totally chaotic in the preview. My CSV file is recognized, but there are problems with the columns: the wizard tells me that different numbers of columns were detected in different rows. I could not find a solution to my problems in the tutorial or in the community discussions. Can somebody please help me?
I'm working with MS Windows XP. Is that a problem? I have installed all the latest Java updates etc.
Thanks a lot in advance,
Greetings from Munich,
Vera
Answers
Unfortunately, the wizard does not support Excel or SPSS files directly. To read these files, simply drag and drop an [tt]ExcelExampleSource[/tt] or [tt]SPSSExampleSource[/tt] operator from the New Operator tab (group IO.Examples) into the operator tree and specify the file you want to read in the [tt]filename[/tt] parameter. For CSV files, the appropriate operator is [tt]CSVExampleSource[/tt]. I hope that solves your problem.
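The CSV problem from the original question (different numbers of columns detected in different rows) often means that some rows contain an extra or missing separator, for instance an unquoted separator inside a text field. The following minimal Python sketch, independent of RapidMiner, counts the fields in each row and reports the rows that deviate from the most common count. The function name `find_ragged_rows` and the default `;` separator are assumptions for illustration; adjust the delimiter to match your file.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def find_ragged_rows(lines: Iterable[str], delimiter: str = ";") -> List[Tuple[int, int]]:
    """Return (row_number, field_count) for rows whose field count differs
    from the most common field count (hypothetical diagnostic helper)."""
    rows = list(csv.reader(lines, delimiter=delimiter))
    counts = Counter(len(row) for row in rows)
    if not counts:
        return []  # empty input: nothing to report
    expected, _ = counts.most_common(1)[0]
    # Row numbers are 1-based, as shown in a text editor.
    return [(i, len(row)) for i, row in enumerate(rows, start=1) if len(row) != expected]


if __name__ == "__main__":
    sample = io.StringIO("id;age;city\n1;34;Munich\n2;41;Dortmund;extra\n3;29;Berlin\n")
    print(find_ragged_rows(sample))  # [(3, 4)]
```

To check a real file, open it with `newline=""` and pass the file object, e.g. `with open("mydata.csv", newline="") as f: print(find_ragged_rows(f))`, where `mydata.csv` is a placeholder name.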
Regards,
Tobias
Thanks for your quick reply.
It seems to be a bit like Murphy's law: now I'm able to import the SPSS data, great! But when dragging the SPSSExampleSource to the left, I get this error message:
G Jun 24, 2008 2:01:27 PM: [Error] Parameter 'filename' is not set and has no default value.
Then, when trying to run a decision tree (or anything else) on the data, I get this error message:
Error in: DecisionTree (DecisionTree) Input example set does not have a label attribute Many operators like classification and regression methods or the PerformancEvaluator require the input example sets to have a label or class attribute. If this not the case, applying these operators is pointless. If you read the data using an ExampleSource, you can specify the label attribute by using a 'label' tag in the attribute description file.
How can I define these attributes? It seemed so easy using the wizard...
Sorry for the stupid questions. Is there anything else, like a tutorial or similar, that would help me get comfortable with the software? I'm very eager to get to know it better, since it seems to be the missing link for solving most of my analytical problems.
Thanks in advance,
Greetings from Munich,
Vera
Well, it might of course be a little complicated for novice users, but I think once you have understood the concepts or, as you said, got comfortable with it, the functionality of RapidMiner is nearly overwhelming. A good general start for learning RapidMiner is the built-in tutorial, which gives a kind of guided tour by presenting some example processes. The first processes mainly show how to load data, then how to define special attributes (see below) and then how to learn a model on the data. You might only need to look at the first few steps of the tutorial to understand how to do some simple analyses with RapidMiner.
But anyway, you are not far from successfully setting up an analysis. You have already dragged an [tt]SPSSExampleSource[/tt] operator into the operator tree. The error shown is simply because you have not specified a filename yet. Click on the operator in the operator tree and its parameters will show up on the right side. There, specify a file in the parameter [tt]filename[/tt] and that's it. Next, you have to specify a label, or target variable as statisticians call it. This can be done with the operator [tt]ChangeAttributeRole[/tt], which can again be found in the operator lists on the right side in the group Preprocessing.Attributes.Filter. In the parameters of the [tt]ChangeAttributeRole[/tt] operator, specify the attribute ([tt]name[/tt]) that should be your label and choose label as [tt]target_role[/tt]. After that you can apply the [tt]DecisionTree[/tt] operator.
I hope that clarifies the procedure a little. Otherwise, I have attached a demo process which should suit your needs exactly. You just have to set the [tt]filename[/tt] in the input operator. If you have more questions, please feel free to ask.
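Conceptually, [tt]ChangeAttributeRole[/tt] marks one existing column as the special "label" attribute, and a learner such as [tt]DecisionTree[/tt] refuses to run without one. The toy Python model below only illustrates that idea and mimics two error messages quoted in this thread; it is not RapidMiner's API, and `ExampleSet`, `change_attribute_role` and `require_label` are hypothetical names. It also assumes that a special role such as label can be held by only one attribute at a time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExampleSet:
    """Toy stand-in for a RapidMiner example set (hypothetical, not the real API)."""

    columns: Dict[str, List[object]]  # attribute name -> values
    roles: Dict[str, str] = field(default_factory=dict)  # attribute name -> special role

    def label(self) -> Optional[str]:
        # Name of the attribute that currently carries the "label" role, if any.
        for name, role in self.roles.items():
            if role == "label":
                return name
        return None


def change_attribute_role(data: ExampleSet, name: str, target_role: str) -> ExampleSet:
    # Like ChangeAttributeRole in spirit: the named attribute must already exist
    # (names are compared case-sensitively, as plain dict keys are).
    if name not in data.columns:
        raise ValueError(f"The attribute '{name}' does not exist.")
    # Assumption: a special role is unique, so any previous holder loses it.
    for other, role in list(data.roles.items()):
        if role == target_role:
            del data.roles[other]
    data.roles[name] = target_role
    return data


def require_label(data: ExampleSet) -> str:
    # A learner such as a decision tree needs a label (target) attribute.
    label = data.label()
    if label is None:
        raise ValueError("Input example set does not have a label attribute")
    return label


if __name__ == "__main__":
    data = ExampleSet({"AGE": [34, 41], "GROUP": ["a", "b"]})  # made-up data
    change_attribute_role(data, "GROUP", "label")
    print(require_label(data))  # GROUP
```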
Regards from Dortmund,
Tobias
Thank you so much for your reply.
There is a light at the end of the tunnel now...
but I still receive this error message:
Error in: ChangeAttributeRole (ChangeAttributeRole) The attribute 'DEGREE' does not exist. The example set does not contain an attribute with the given name.
I also tried filling in a target variable which exists in my dataset instead of DEGREE, but it didn't work either.
Have you got any further advice for me?
Sorry for bugging you... but I just can't wait to have a first success, because I too believe that the possibilities RapidMiner offers are overwhelming, as you said.
Thanks in advance,
Vera
Well... the attribute DEGREE presumably does not exist in your data set. Hence, you have to fill in the name of a variable which exists in your data set. I think the matching is case-sensitive, so you have to be quite exact when specifying the name. You might want to put a breakpoint after the input operator (by double-clicking on it) and then look at the meta data of your data set to find the name of a suitable attribute, which you can then fill into the [tt]name[/tt] parameter of the [tt]ChangeAttributeRole[/tt] operator.
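To illustrate the case-sensitivity point, here is a small hypothetical Python helper (not part of RapidMiner) that accepts only an exact, case-sensitive match but points out a name that differs only in capitalization. The attribute names in the usage lines are made up.

```python
from typing import Iterable


def resolve_attribute(name: str, available: Iterable[str]) -> str:
    """Return `name` if it exactly matches an attribute name (case-sensitive).

    Hypothetical helper: if the only match differs in capitalization, the
    error message suggests the correctly spelled name.
    """
    names = list(available)
    if name in names:
        return name
    hints = [n for n in names if n.lower() == name.lower()]
    if hints:
        raise ValueError(f"The attribute '{name}' does not exist. Did you mean '{hints[0]}'?")
    raise ValueError(f"The attribute '{name}' does not exist.")


if __name__ == "__main__":
    attrs = ["AGE", "INCOME", "SATISFACTION"]  # made-up attribute names
    print(resolve_attribute("INCOME", attrs))  # INCOME
    try:
        resolve_attribute("income", attrs)
    except ValueError as err:
        print(err)  # The attribute 'income' does not exist. Did you mean 'INCOME'?
```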
Hope that helps,
regards,
Tobias
I was not sure if DEGREE was a technical term or a variable name from your data set, so I tried both: DEGREE and my target variable name. My mistake was writing the variable name in lower-case letters instead of capital letters. Now my variable seems to be accepted, but now I receive a new error:
Error in: DecisionTree (DecisionTree) This learning scheme does not have sufficient capabilities for the given data set: numerical label not supported Each learning scheme has particular capabilities for data set handling. For example, some learners can only handle numerical attributes and can not learn from nominal attributes. Please perform a preprocessing step to transform your data set or use an alternative learning scheme. In case of a polynominal label attribute, i.e. a classification task with more than two classes, you can use a learning scheme capable only for binominal classes by wrapping a Binary2MultiClassLearner around the learning operator.
But I think I just have to clean my data and get rid of all variables I do not need for the decision tree. No kitchen-sink approach possible with this tool, it seems ;-) SPSS is more generous, but I forgot that there is no possibility to drop variables from the analysis during the process.
So thanks a lot, I'm on my way into a brighter future with RapidMiner!
Have a nice day and thank you for your patience,
Vera
Hope that enlightens your future with RapidMiner even more ;-)
Regards,
Tobias
Thank you! I just figured out that "numerical" does not refer to the variable type but to whether a variable is discrete or metric (is that what you call it in English? I must brush up my English a little...). Now I have changed the level to nominal and everything is fine.
Now I'll try my best to derive some reasonable models from my data.
Thank you so much,
Best regards,
Vera
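The idea behind this fix, treating numeric codes as categories so that a decision tree accepts them as a label, can be sketched outside RapidMiner as follows. This is a hypothetical Python helper, not a RapidMiner operator: it turns numeric codes into nominal strings, optionally using a value-label mapping in the style of SPSS value labels (the `value_labels` parameter and the example codes are assumptions for illustration).

```python
from typing import Dict, List, Optional, Sequence


def to_nominal(values: Sequence[float], value_labels: Optional[Dict[float, str]] = None) -> List[str]:
    """Turn numeric codes into nominal (categorical) values.

    `value_labels` is a hypothetical mapping in the style of SPSS value labels;
    codes without a label simply become their string form.
    """
    labels = value_labels or {}
    result: List[str] = []
    for v in values:
        if v in labels:
            result.append(labels[v])
        elif float(v).is_integer():
            result.append(str(int(v)))  # 2.0 -> "2", so equal codes give the same category
        else:
            result.append(str(v))
    return result


if __name__ == "__main__":
    satisfaction = [1, 2, 2.0, 3]  # made-up numeric codes
    print(to_nominal(satisfaction, {1: "low", 2: "medium"}))
    # ['low', 'medium', 'medium', '3']
```

The values themselves do not change; only their interpretation switches from numbers to category names, which is what a classification learner expects of its label.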