The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Vlookup in Rapid Miner?
Hello!
I'm quite new to Rapid Miner and was wondering if there's a VLOOKUP equivalent in it?
I work with two sets of data - the first one being sales file where our supplier lets us know which products have sold, how much etc. The second file is a reference file, which I use to check all product names with codes (ie: Product 1 = GBCIF281939499).
Is there a way to do this in Rapid Miner?
Kyra
1
Answers
hello @kyra,
if you choose the "Turbo Prep" in the Views, you will see that one of the actions there is called PIVOT, if you click it, it will give you the opportunity to group the data by the columns you want, and it also gives the possibility to aggregate. try it might be it will help you with your concern
as I am also new to RapidMiner, I would be so pleased if this answer could help :smileyhappy:
Cheers and Good luck
Jana
Hi @kyra,
the concept of Excel's VLOOKUP is called a "join" everywhere else.
Use the Join operator with the two input example sets, and set the common attribute (the product code).
Decide if you only want to keep matching rows (inner join) or also keep entries without matching product codes (left or right join).
Regards,
Balázs
Thank you, that was super helpful!
Is there a way to create a process where Rapid Miner creates a new column with the matching product codes?
For example - after cross referencing both files, a new column is created in the output file called Final Product Code (for example), where it shows you a list of all the Product Codes it was able to match, and highlights the ones that weren't matching or missing?
Hi @kyra!
The Join operator has a setting for keeping both key attributes. (You might have to activate the Expert mode in the Parameters panel.)
To create a new attribute, you use Generate Attributes. It has a graphical formula editor which lists the functions you'll need (if(), missing() etc.).
Regards,
Balázs
Thanks for your response - perhaps I need to provide more info as I'm still having a bit of trouble creating the workflow i want:
Am I right in thinking that Turbo Prep is not available in the free version? Is that a paid add on?
How about setting up a "Master Mapping Table" that maps all of the field names in the data that should map back to each of your core values, like Region, ISRC, Amount, etc? would imagine that there are other line of business applications in your company that could also make use of such a Mapping Table - so it could be worth starting a separate project to resolve these inconsistencies. Data Warehouses couldn't function without these types of tables, which are also known as Dimension Tables.
Once the Mapping Table has been put together, it seems to me that you could do what you want to do with the RapidMiner JOIN operator - with the join type being INNER.
Key point: the Mapping Table has to be maintained - because if another "business synonym" for one of your "core fields" enters the data, you'll need to map this new "synonym" to the appropriate "core" field name. This is like what you need to do if you're tracking retail product sales - as new products come to market, you need to segment them into categories and sub-categorises so that aggregates by categories and sub-categories capture the sales of these new products. Hope this helps, and please write back if you have any questions. Best wishes, Michael Martin