Loop through DataSet with given Header
Artur_Heinle22
Hello everyone,
I have the following problem. I have an unsorted data set from which I pulled out the headers. Now I want to build a new data set from the data and the headers. The data set is in RapidMiner; I only use Excel here to show an example.
Data Set
Header
Result
The data should then be arranged in a new data set as shown in the result.
Look for the headers in the data set. If a cell in a row contains one of the headers, take the next value and put it in the new data set under the matching header; otherwise ignore the value. If the header is a special one such as Sonstiges or Index, take every value up to the next header or the end of the row, combine them into a single value, and save it in the result data set. The headers in the raw data set are not always in the same order as in the result data set.
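The rule described above can be sketched in plain Python, for example as the body of a scripting operator. This is only a minimal sketch under assumptions: the header names in `HEADERS`, the helper name `parse_row`, and the space separator are hypothetical, and each row is assumed to arrive as a simple list of cell values.

```python
# Hypothetical header names; the real list comes from the Header data set.
HEADERS = ["Article", "Date", "Type", "Color", "Sonstiges", "Index"]
# Special headers that collect every value up to the next header or row end.
MULTI_VALUE = {"Sonstiges", "Index"}


def parse_row(cells, headers=HEADERS, multi_value=MULTI_VALUE, sep=" "):
    """Turn one unsorted row (a list of cells) into a dict keyed by header."""
    known = set(headers)
    result = {h: None for h in headers}
    i = 0
    while i < len(cells):
        cell = cells[i]
        if cell not in known:
            i += 1  # a value that does not follow a header is ignored
            continue
        i += 1  # move to the cell after the header
        if cell in multi_value:
            collected = []
            # Gather values until the next header or the end of the row.
            while i < len(cells) and cells[i] not in known:
                if cells[i] not in (None, ""):
                    collected.append(str(cells[i]))
                i += 1
            result[cell] = sep.join(collected) if collected else None
        elif i < len(cells) and cells[i] not in known:
            result[cell] = cells[i]  # ordinary header: take only the next value
            i += 1
    return result
```

Applying `parse_row` to every row and collecting the dicts gives one record per row with the columns in the order of `HEADERS`, regardless of the order in which the headers appear in the raw row.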
Can this be done with RapidMiner operators, or do I have to build something myself with Execute Script?
Best regards,
Artur
Answers
First filter by Article, then by Date, then by Type, and finally by Color. Then you can use the Aggregate operator to concatenate your Others and your Index values so you get a single record per combination, which you can then append together, filter by filter.
So filter Article values
--> filter Date values (for this article)
--> filter Type values (for this date and article)
--> filter Color values (for this... you get it)
--> concat
And loop till the end.
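Outside RapidMiner, the same filter-then-concatenate idea can be sketched as a group-by in plain Python. This is only an illustration under assumptions: the column names, the helper name `concat_by_key`, and the `|` separator are hypothetical, and the records are assumed to already carry the key columns.

```python
def concat_by_key(records,
                  keys=("Article", "Date", "Type", "Color"),
                  concat=("Sonstiges", "Index"),
                  sep="|"):
    """Group records by the key columns and concatenate the other values."""
    groups = {}  # key tuple -> {column: [values]}; dicts keep insertion order
    for rec in records:
        key = tuple(rec.get(k) for k in keys)
        bucket = groups.setdefault(key, {c: [] for c in concat})
        for c in concat:
            value = rec.get(c)
            if value not in (None, ""):
                bucket[c].append(str(value))
    # One output record per distinct key combination.
    return [
        dict(zip(keys, key), **{c: sep.join(vals) for c, vals in bucket.items()})
        for key, bucket in groups.items()
    ]
```

Grouping on the full key tuple plays the role of the nested filters: each distinct Article/Date/Type/Color combination ends up as a single record.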
Any idea whether a single character is causing the 'jump' in your data? Is it the same in your Excel file? If not, something probably went wrong in the conversion. Otherwise, getting rid of this 'jumper' on import might make things a lot easier for you.
It seems you have missing attributes (the question marks), caused by an unexpected tab or similar, and empty fields, which are expected and simply indicate that there is no data for that field.
If so, you could first loop through all the records and remove the attributes that are missing; if you then append the records again, your data set might line up again.
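As a rough sketch of that realignment in plain Python, assuming each record is a list of cells in which missing values (the question marks) arrive as `None` while expected empty fields are empty strings; the helper name `realign` is hypothetical:

```python
def realign(rows):
    """Drop missing cells from each row, then pad rows to a common width."""
    # Remove only missing cells (None); empty strings are kept in place.
    compacted = [[c for c in row if c is not None] for row in rows]
    width = max((len(r) for r in compacted), default=0)
    # Pad on the right so every row has the same number of columns again.
    return [r + [None] * (width - len(r)) for r in compacted]
```

Removing the missing cells shifts the remaining values to the left, so values pushed out of place by a stray tab move back under their intended columns, provided the empty fields themselves were preserved.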
Anyway, the order isn't a problem at all, as RapidMiner groups and sorts behind the scenes when filtering by value. The spread might be more problematic: from Att6 onwards things seem to start going wrong.
You could also do some tricks by using the Loop Attributes operator in combination with Loop Examples. First you loop and validate the content: if it's empty or missing, you ignore it; if it contains Type, you know the next attribute should be your Type value. You store these in macros, do the same for your other known fields, and then generate new attributes from your macro values and remove the rest. This way you could also reconstruct your table as it should have been.
It's a bit hard to explain, but maybe you can figure it out with this. If not, you can always share your file so I can have a play with it as well. Seems like a nice challenge 😉. But first try to learn it yourself, of course.
Unfortunately, there are no magical tools that turn garbage into beautiful data automatically, so you are more or less forced to loop over the examples and then, for each example, over the attributes to get rid of the jumps and reorganize the order.
Is there any way to export your base set as an Excel file that you could share?