"Append" problem
Dear Rapidminer Community
I have a strange problem with "Append" that I hope you can help me with!
I have 7 datasets with the same attributes that I want to append, but I am only able to append 5 of them. I have tried everything I can think of to get identical attribute names, but it still doesn't work. I even copied the attribute names from a working ("appendable") dataset into the table that can't be appended. It still doesn't work. Furthermore, I have made sure that the data types are the same everywhere. And I have restarted my computer just to make sure this wasn't the problem.
Is there anything else I can check? The error message I get is: "The attributes of the example set have to have the same attributes". But there is no further information about where the problem is.
I am quite clueless about what else I could do.
Best regards
Felix
Answers
Hello @felix_w - I'd recommend posting your process XML and attaching your dataset. This way we can replicate what you're doing and help you better.
Thanks!
Scott
Hi Scott,
Normally I would do that, but the data and the process are sensitive and I can't post them here. :-/
Do you maybe have any general idea what could be the problem?
Best regards
Felix
Hi @felix_w,
- Have you tried appending the same "appendable" dataset seven times?
- Have you tried appending the same "non-appendable" dataset seven times?
Regards,
Lionel
Hi Lionel,
I have 7 different datasets. I have been able to append 5 of them. The other 2 can't be appended to those 5, nor to each other, and I have no idea why.
Regards
Felix
Update: I am using exactly the same process setup for the non-working datasets as for those that work. Moreover, I also copied the content from the Excel file into a new Excel file and used the attribute names from a working dataset. Still not appendable.
Has anybody else had this problem before?
It's really hard to tell without seeing the datasets. The sets need to be EXACTLY the same to append. Have you tried doing just two at a time? Or Union instead of Append?
Scott
Hi @felix_w,
Did you make sure that the Attribute names have no leading or trailing blanks?
The Append Operator requires that all Attributes are available in all ExampleSets which you want to Append - Is this true in your task?
Best,
Edin
And finally: the value types need to be exactly the same as well. So if one column is binominal in one dataset but polynominal in another, this will fail as well. Typically the error message should give you a hint about the nature of the problem, though...
Hope one of these ideas is the right one and let us know how it goes,
Ingo
As @IngoRM said, the data types in RapidMiner need to be EXACTLY the same. I've gotten tripped up on this before with numbers, where one dataset maybe had only whole numbers, so RapidMiner decided it was "integer" data type, while another set had decimal values, which RapidMiner treats as "real" type. Having the same name and general data type isn't enough. So my suggestion is to go through all the attributes one by one in the two sets that won't append and make sure there are no differences in the data types.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
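The checks suggested so far (no stray blanks in names, every attribute present in every set, identical value types) can be sketched outside RapidMiner as a simple schema comparison. This is only a sketch in Python, not RapidMiner code: the function `diff_schemas` and the example schemas are hypothetical, and the attribute names and types would have to be copied by hand from the metadata or Statistics view of each example set.

```python
from typing import Dict, List

# Hypothetical schema: attribute name -> value type as shown in RapidMiner.
Schema = Dict[str, str]


def diff_schemas(reference: Schema, other: Schema) -> List[str]:
    """List the reasons why `other` does not line up with `reference`."""
    problems: List[str] = []
    # Names that only differ by leading/trailing blanks are easy to miss.
    for name in other:
        if name != name.strip():
            problems.append(f"name {name!r} has leading/trailing whitespace")
    # Attributes present in one set but not in the other.
    for name in sorted(reference.keys() - other.keys()):
        problems.append(f"missing attribute {name!r}")
    for name in sorted(other.keys() - reference.keys()):
        problems.append(f"extra attribute {name!r}")
    # Same name, different value type (e.g. integer vs. real).
    for name in sorted(reference.keys() & other.keys()):
        if reference[name] != other[name]:
            problems.append(
                f"type mismatch on {name!r}: {reference[name]} vs {other[name]}"
            )
    return problems


# Made-up example data, not taken from the thread.
working = {"Date": "date", "Volume": "real", "Region": "polynominal"}
failing = {"Date": "date", "Volume": "integer", "Region ": "polynominal"}

problems = diff_schemas(working, failing)
for line in problems:
    print(line)
```

In this made-up case the report flags the trailing blank in "Region " and the integer/real mismatch on "Volume", which are exactly the kinds of differences that look identical at a glance.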
Hi again Felix,
Maybe you can use the Select Attributes operator to append your 7 datasets with only the first attribute, then only the second, and so on, in order to determine which attribute(s) is (are) problematic.
I hope it helps
Regards,
Lionel
Hi all,
Thank you very much for all your posts and help!
I have now selected each attribute separately and looked for the troublemaker. I found out that two attributes are troublesome, although they are spelled exactly the same as in the other datasets and have the same data type. The attribute name contains a "€" sign; could this be a problem? But as I said, it has not caused trouble in the other datasets, which I don't understand.
The full attribute name is "Entry Netz€/Mwh". When I tried to rename it (with the Rename operator), I got an error message which stated: "Potential problem detected; The attribute Entry Netz€/Mwh is missing in the input example set." That is not true, because it is there, but as soon as I try to rename it I get this error message. The "quick fix" only tells me to stay with the old name and not change it.
Any solutions for this? Is the € sign maybe the big problem? And why can't I rename the attribute?
Best regards
Felix
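Two names that look the same on screen can still differ in a character that is invisible or nearly so. The sketch below is Python, not RapidMiner, and it assumes the two names have been copied out as text. It reports the first position where they differ, together with the Unicode code points. The helper names and the no-break space in the example are hypothetical; nothing in this thread confirms that this is what happened here.

```python
import unicodedata
from typing import Optional, Tuple


def describe(ch: str) -> str:
    """Render a character as its code point plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}"


def first_difference(a: str, b: str) -> Optional[Tuple[int, str, str]]:
    """Return (index, char in a, char in b) for the first mismatch, or None."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i, describe(a[i]), describe(b[i])
    if len(a) != len(b):
        # One name is a prefix of the other, e.g. a trailing blank.
        left = describe(a[n]) if len(a) > n else "<end of name>"
        right = describe(b[n]) if len(b) > n else "<end of name>"
        return n, left, right
    return None


good = "Entry Netz\u20ac/Mwh"          # ordinary space after "Entry"
suspect = "Entry\u00a0Netz\u20ac/Mwh"  # hypothetical: no-break space instead

print(first_difference(good, suspect))
# (5, 'U+0020 SPACE', 'U+00A0 NO-BREAK SPACE')
```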
Ignore the "potential problem" error message; it has to do with metadata that isn't propagating. You should still be able to run the process with the Rename operator.
The original attribute contains two special characters, the one you mention (I don't have it on my keyboard) and also the "/" sign. These could both be causing problems. I would use Rename to get rid of both of those.
You can also use "Rename by Replacing" and use regular expressions to get rid of any non-word characters. That's usually what I do anyways to avoid these types of problems. Other programs can be finicky about special characters in attribute names as well if you ever need to export from RapidMiner.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
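To illustrate what replacing non-word characters does to the name from this thread, here is a small Python sketch. It is not the Rename by Replacing operator itself, and `sanitize` is a hypothetical helper. RapidMiner uses Java regular expressions, which may treat non-ASCII letters differently from Python's Unicode-aware `\W`; for this particular name the result should be the same either way, since all of its letters are ASCII.

```python
import re


def sanitize(name: str, replacement: str = "_") -> str:
    """Replace every character that is not a letter, digit or underscore."""
    return re.sub(r"\W", replacement, name)


print(sanitize("Entry Netz\u20ac/Mwh"))  # Entry_Netz__Mwh
```

Note that the blank is replaced too, and that two different original names could end up identical after sanitizing, so the renamed attributes are worth checking for collisions.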
Hi!
Many of these warnings are just warnings. It's more likely that at some point in your process RapidMiner just doesn't carry the metadata anymore, so the Rename operator doesn't know about that attribute. The special characters are usually not at all problematic. I've seen example sets with Chinese and Japanese characters; RapidMiner itself sometimes generates attribute names with special characters.
You might want to check out this building block: Append with Union
Make sure to read "How do I use Building Blocks?" before using it.
To get good results, you really need to have the same set of attributes. But this will tolerate small differences (one attribute missing, some additional ones). It could still fail if you have the same name but not the same type.
Regards,
Balázs
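The idea of an append that tolerates small differences can be sketched as follows: take the union of all attribute names and fill in missing values where a dataset lacks an attribute. This is a Python illustration of the general idea only, not the actual implementation of the building block; `append_with_union` and the example rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def append_with_union(*tables: List[Row]) -> List[Row]:
    """Stack tables row-wise, filling attributes a table lacks with None."""
    # Collect attribute names in first-seen order.
    columns: List[str] = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Re-emit every row with the full set of attributes.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Made-up example data.
a = [{"id": 1, "price": 10.5}]
b = [{"id": 2, "region": "AT"}]

combined = append_with_union(a, b)
print(combined)
```

As with the building block described above, a sketch like this does nothing about attributes that share a name but have different types; those would still need to be aligned first.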
Thank you all for your suggestions and continuous help
I have now solved the problem differently, because "Rename" and "Rename by Replacing" didn't work for me. I only got this error message (see picture).
For everybody else who is reading this post and is looking for a solution:
I changed the names of the problematic attributes directly in the "Read Excel" operator for all the files, and now it works and I can append all the sets together.
Again, thank you all very much for your support!
Best regards
Felix
Ah, this might be related to an encoding issue. I saw that fancy E with two lines through it in your attribute name. That might be causing the error. Maybe.
Can you just use a Rename by Replacing operator and replace with an underscore? Use \W to match all the non-word characters (anything that is not a letter, digit, or underscore).
Yes @Thomas_Ott, I've been thinking this may be an encoding issue all along.
@felix_w - what is your encoding scheme for both your system and for your Read Excel?
Scott
The encoding I used in Read Excel is "SYSTEM", which is the default, I think.
My laptop uses German regional settings, I guess, as I live in Austria ;-) Normally it should know the "€" sign.
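For readers wondering how an encoding mismatch could make a name with "€" stop matching, here is a small Python sketch. It only demonstrates the general mechanism; whether this is what actually happened with Read Excel in this thread is not confirmed. The same character is stored as different bytes in different encodings, so text written in one encoding and read in another yields a different string.

```python
EURO = "\u20ac"  # the euro sign

utf8_bytes = EURO.encode("utf-8")      # three bytes
cp1252_bytes = EURO.encode("cp1252")   # one byte in the Windows Western code page


def can_encode(text: str, encoding: str) -> bool:
    """Report whether `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Writing the name as UTF-8 and reading it back as cp1252 changes it.
name_written = "Entry Netz\u20ac/Mwh"
name_misread = name_written.encode("utf-8").decode("cp1252")

print(utf8_bytes, cp1252_bytes)
print(can_encode(EURO, "latin-1"))   # False: ISO-8859-1 has no euro sign
print(name_misread == name_written)  # False: the names no longer match
```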