Read SPSS ==> Attribute Selector returns no attributes at all
Hello everyone,
I'm new here so please be gentle on me.
Anyway, I have an SPSS file ready to be imported into RapidMiner. I've created an SPSS Reader from import --> data --> SPSS.
The reader works just fine and when I set it to output immediately I get a nice table of my data.
However, I need to filter this data (I need to remove some attributes). So I added an "Attribute Selector" and connected the two. This worked just fine when importing normal CSV files.
For some reason the Attribute Selector node never receives any output from the SPSS node. The same happens when I put a node like the Sample filter node in there: it never receives output from SPSS Reader. Yet, the SPSS reader does return my table correctly when I assign it directly to the final result port.
What might be causing this? Am I doing something wrong?
Answers
The output is received normally. What you are missing is the metadata for the Read SPSS operator, which is used during process design to show the expected outcome after each operator. Missing metadata will not break your process: you can still put, for example, the Select Attributes operator after it and it will work anyway.
However, I don't see how you could see metadata for the Read CSV operator. It will not show any metadata either, because it would actually have to read the file for that, which is often not desired (and can severely impact performance).
Regards,
Marco
As mentioned, the attribute selector returns no elements. So how do I select the elements that it should not filter out?
Umm, by simply typing in the names of the desired attributes? You can do this either in the subset selection GUI, by typing and adding each desired attribute, or with a regular expression, by putting "|" between the attribute names.
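If you have a list of attribute names at hand, the "|"-separated expression can be generated instead of typed. This is only a rough sketch in Python, run outside RapidMiner; the attribute names are made up for illustration, and it assumes the resulting expression is pasted into the regular expression parameter by hand:

```python
import re

# Hypothetical attribute names; replace them with the ones in your data.
wanted = ["age", "income", "zip_code"]

# Escape each name so that characters such as "." or "(" are taken literally,
# then join the names with "|" (regex alternation).
expression = "|".join(re.escape(name) for name in wanted)

print(expression)  # age|income|zip_code
```

RapidMiner is Java-based, so its regular expressions follow Java's syntax rather than Python's; for plain names joined with "|" the two should behave the same.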
If you want the most comfortable way, just use the preferred way, which is also suggested in the manual: load the data and directly store it in your repository with the "Store" operator. Then use the repository entry; it will deliver all metadata and you can use it everywhere during process design.
Cheers,
Ingo
So how do I, for example, set it to only allow the variables test1, test2 and test3?
Also, the Store method does not allow me to import SPSS files.
The other option would be to use the parameter setting "regular_expression" with the expression "test1|test2|test3". That is easy for only a few attributes, but less comfortable for hundreds of them.
As I said before, the best way is to use the repository to get all the good things about metadata propagation (read the manual about how and why). So let's resolve your problem that "Store" does not import SPSS files: of course not. It is not supposed to import, but to store. Import the file with the, well, import operator "Read SPSS". You will end up with a small import process consisting of only two operators: "Read SPSS" and "Store". In a second (third...) process, you can start the actual data transformation and analysis from the example set stored in the repository, and the SPSS file will no longer be used. Then you will always get the (transformed) metadata during process design, at least as far as possible.
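To see what such an expression would and would not select, here is a small illustration in Python rather than in RapidMiner itself. It assumes the expression has to match the whole attribute name, and the extra attribute names are invented for the example:

```python
import re

pattern = re.compile("test1|test2|test3")

# Hypothetical attribute names as they might appear in the imported data.
attributes = ["id", "test1", "test2", "test3", "test10", "pretest1"]

# fullmatch() requires the whole name to match one of the alternatives,
# so "test10" and "pretest1" are not selected.
selected = [name for name in attributes if pattern.fullmatch(name)]

print(selected)  # ['test1', 'test2', 'test3']
```

If the operator matched only part of the name instead, "test10" and "pretest1" would slip through as well, so it is worth checking the result on your own data.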
Cheers,
Ingo
Thanks a lot for the help.
Glad to hear that.
About the store: did you really use two separate processes? (First one: just read the data and store it; second one: use the stored data set as the starting point for the selection.) From your text I would assume that you only used "Store", which alone is of course not going to help at all. Please refer to the manual for more information about how to use the repository and why. It is really worth the effort, since it will dramatically improve your RapidMiner experience.
Cheers,
Ingo