"split operator"
Hi all,
I'm having problems with the split transformation.
Example wise, my data set is:
ID, File name
I'm trying to extract the extension from the file name and create a new attribute with it.
and end up with the following:
ID, File name, File extension
It just doesn't seem to be doing anything. When I write the output to a CSV, the file is exactly the same as the original; no new attributes are generated.
Can anybody advise?
Thanks
What did you use as the split pattern? A simple dot? Be aware that this pattern is a regular expression, so if you want to split at a literal dot you have to escape it with a backslash (otherwise it is treated as a metacharacter that matches any character): \.
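To illustrate the escaping point, here is a minimal sketch in Python rather than in RapidMiner itself. It assumes that Python's `re.split` is a fair stand-in for a regex-based split; the helper name `split_filename` and the sample file names are made up for the illustration, and the way empty pieces are handled may differ in RapidMiner.

```python
import re

def split_filename(name, pattern):
    """Split `name` at every match of the regular expression `pattern`."""
    return re.split(pattern, name)

# Unescaped dot: "." matches every single character, so every character
# is treated as a separator and only empty pieces are left over.
unescaped = split_filename("report.pdf", ".")

# Escaped dot: only literal dots act as separators.
escaped = split_filename("report.pdf", r"\.")

# A file name with several dots produces one piece per dot-separated part,
# which is why a data set with "free" file names can yield many attributes.
multi = split_filename("my.long.file.name.txt", r"\.")

print(unescaped)
print(escaped)
print(multi)
```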
Afterwards you have to rename the attributes if you want the names you posted. You might also consider the "Generate Extract" operator, which lets you name the new attribute directly and extracts part of an existing attribute via a regular expression.
Regards,
Matthias
Thanks for the reply.
If I escape the dot I get 20 new attributes. Freedom when choosing file names is not always good. SNIF
Using the following expression:
\.[^.]*$
which should match the last dot followed by any non-dot characters up to the end of the string.
But the result set is exactly the same.
I suppose the whole pattern match is used as the split point. If you have a filename "some.filename.ext", your pattern matches everything from the last dot onward: ".ext". The filename is therefore split at ".ext", which is always the last part, so the piece after the split is always empty. If you still don't want to switch to the "Generate Extract" operator, you must use a lookahead assertion (because the text checked by an assertion is not included in the overall match). If you want to match only the last dot in a string you could use \.(?!.*\.)
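The difference between the two patterns can be checked outside RapidMiner. The following is a small sketch in Python, assuming its `re.split` treats the matched text as the separator in the same way described above; the constant and function names are chosen only for this illustration.

```python
import re

# Pattern from the question: consumes the last dot AND the extension.
NAIVE = r"\.[^.]*$"
# Pattern with a negative lookahead: consumes only the last dot.
# (?!.*\.) succeeds only if no further dot follows, and the text it
# inspects is not part of the match, so the extension survives the split.
LAST_DOT = r"\.(?!.*\.)"

def split_at(pattern, name):
    """Split `name` at every match of `pattern`."""
    return re.split(pattern, name)

naive_parts = split_at(NAIVE, "some.filename.ext")
lookahead_parts = split_at(LAST_DOT, "some.filename.ext")

print(naive_parts)      # ['some.filename', '']
print(lookahead_parts)  # ['some.filename', 'ext']
```

With the first pattern the extension is swallowed by the separator, so nothing useful is left after it; with the lookahead only the dot itself is removed.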
Regards,
Matthias
Thanks a lot, that did the trick.
As for the generate extract operator, I can't find it under data transformation operators.
Cheers
miguel
You can find "Generate Extract" under Data Transformation, Attribute Set Reduction and Transformation, Generation. But it's much simpler to use the operator search feature.
If you want to use it, here is a little example: The first capturing group is used as the value for the new attribute.
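The example process itself is not reproduced in this thread. As a rough stand-in, the idea of "first capturing group becomes the new attribute value" can be sketched in Python. The pattern `\.([^.]+)$`, the function name `extract_extension` and the sample rows are assumptions made for this illustration, not the settings of the original process.

```python
import re
from typing import Optional

# Hypothetical pattern: the parentheses form the first capturing group,
# which holds everything after the last dot.
EXTENSION = re.compile(r"\.([^.]+)$")

def extract_extension(file_name: str) -> Optional[str]:
    """Return the text of the first capturing group, or None if no match."""
    match = EXTENSION.search(file_name)
    return match.group(1) if match else None

# Sample data in the shape "ID, File name" from the question.
rows = [(1, "some.filename.ext"), (2, "report.pdf"), (3, "README")]

# Append the extracted value as a third column: "ID, File name, File extension".
with_extension = [(row_id, name, extract_extension(name)) for row_id, name in rows]

for row in with_extension:
    print(row)
```

File names without a dot get no value here (`None`), which in a data set would correspond to a missing value.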
Regards,
Matthias
Note that "Generate Extract" is in the Text Mining Extension.
Best,
Simon
Perhaps a note in the operator description saying which extension the operator belongs to would be a simple way to make this clearer?
Regards,
Matthias
If you post a process to the forum or to myExperiment, the XML code will contain the information about which extension an operator comes from. RapidMiner will offer a quick fix to install missing extensions when opening such processes.
In RapidMiner itself, extensions are mostly color-coded.
Best,
Simon