How to handle a CSV file that has JSON columns in RapidMiner Studio
Hello everyone,
I am new to RapidMiner Studio. I want to create a prediction model using the TMDB-Box-Office dataset. The dataset is provided as a CSV file, but some of its columns contain JSON data. Could you suggest a process that reads this file correctly and prepares it for building a prediction model? The dataset is attached to the post.
Your help is much appreciated!
Thanks,
xc
Best Answer
rjones13 Member Posts: 204 Unicorn
Hi cx,
It definitely looks like it. I must admit I wasn't aware of that functionality! I will run a few tests on my side over the weekend to confirm, but you should be fine with this.
Best,
Roland
Answers
I ran a quick test, and it's definitely possible to read the file in. However, what format do you have planned for the data? Do the JSON entries need to be turned into additional columns?
Best,
Roland
Thanks for the quick response. Most datasets I have used to build models in RapidMiner Studio have regular rows and columns, with exactly one value in each cell. This dataset embeds JSON data in several columns. I wonder whether RapidMiner Studio can read this format and process it correctly to build models, or whether I first need to transform it into a regular dataset in which each cell has only one value before I supply it to an algorithm. If I have to transform it manually, I will probably need to do some pivoting on the dataset.
Thanks,
xc
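The transformation xc describes (turning each JSON cell into regular single-value columns) can be sketched outside RapidMiner Studio in plain Python. This is only an illustration under stated assumptions: the sample rows, the `title` and `genres` column names, the `id`/`name` layout of the JSON entries, and the helper functions `extract_names` and `flatten_json_column` are all hypothetical, and the cells are assumed to hold strictly valid JSON. If the real file uses single-quoted, Python-style literals instead, `ast.literal_eval` would be the parser to swap in for `json.loads`.

```python
import csv
import io
import json

# Hypothetical sample standing in for the real file; the column names
# ("title", "genres") and the JSON layout are assumptions for illustration.
SAMPLE_CSV = '''title,genres
Movie A,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""name"": ""Drama""}]"
Movie B,"[{""id"": 18, ""name"": ""Drama""}]"
Movie C,
'''


def extract_names(cell):
    """Parse one JSON cell and return the sorted list of its 'name' values."""
    if not cell.strip():
        return []  # an empty cell is treated as "no entries"
    return sorted(item["name"] for item in json.loads(cell))


def flatten_json_column(rows, column):
    """Replace a JSON column with one 0/1 indicator column per distinct name."""
    parsed = [extract_names(row[column]) for row in rows]
    all_names = sorted({name for names in parsed for name in names})
    flat_rows = []
    for row, names in zip(rows, parsed):
        # Keep every other column unchanged and drop the raw JSON column.
        flat = {key: value for key, value in row.items() if key != column}
        for name in all_names:
            flat[f"{column}_{name}"] = 1 if name in names else 0
        flat_rows.append(flat)
    return flat_rows


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
flat_rows = flatten_json_column(rows, "genres")
for flat in flat_rows:
    print(flat)
```

Each output row has exactly one value per cell, which is the "regular" shape most learners expect. Indicator columns are only one possible design; a column with many distinct values (cast or crew, for example) may be better reduced to a count or to its most frequent entries first.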
This surprises me slightly, as I'm not seeing the same behaviour. Would you be able to send a screenshot of the Select Task stage of Auto Model, like the one I've shown here:
Best,
Roland
Please see the attached image for the Select Task stage and the result. I didn't see the genre values in the decision tree, but the cast and crew values are there.
Thanks,
xc