"XML parser seems to lack robustness"
aruberutou
Member Posts: 23 Contributor II
Hello,
RapidMiner is a lovely tool and has helped my work tremendously. One glaring weakness, however, is the limited capacity of "Read XML".
It chokes on even moderately sized XML files. As a workaround, I have taken to using BaseX to process my giant XML files (1-2 GB or so) into lighter-weight, Excel-readable XML files. I then load that Excel file into RapidMiner.
Obviously, this is not a deal-breaker, but it would be nicer if I could simply do everything within RapidMiner.
Thanks,
Answers
The old version 5.3 does have a problem reading XML files, and I know the library was updated for 6.4, so it should be better now.
However, if you are still having problems with how fast it runs, try exploring the XML parsing features in Groovy Script; they're pretty good.
I had to read large XML files with 5.3 and solved the issue by writing a short Groovy script to parse the files as needed and return an example set to RM.
Good luck!
Thanks for the follow-up. I am actually not at all familiar with Groovy script. How would I go about setting that up? I am indeed using the most current version of RapidMiner, but I still get performance issues. Perhaps part of the problem is that I am using the wizard interface rather than something more programmatic.
Thanks for the tip!
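
For anyone wondering what the parsing half of such a script might look like, here is a minimal sketch in plain Java using the JDK's StAX streaming parser. It is not the script mentioned above, which was never posted. The class name StreamingXmlRows, the method forEachRecord, and the "flat record" layout (every child of a record element holds only text) are assumptions made for illustration. The point is that a streaming reader keeps only one record in memory at a time, which is what makes multi-gigabyte files manageable.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: names and record layout are assumptions, not RapidMiner API.
final class StreamingXmlRows {

    // Streams the document and passes one row per <recordTag> element to the sink.
    // Assumes each child of a record is a text-only element: child name -> text.
    static void forEachRecord(InputStream in, String recordTag,
                              Consumer<Map<String, String>> sink) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Turn off DTDs and external entities, a sensible default for files of unknown origin.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            Map<String, String> row = null; // non-null only while inside a record
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (row == null) {
                        if (name.equals(recordTag)) {
                            row = new LinkedHashMap<>();
                        }
                    } else {
                        // getElementText() reads up to the child's end tag and
                        // throws if the child contains nested elements.
                        row.put(name, reader.getElementText().trim());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && row != null && reader.getLocalName().equals(recordTag)) {
                    sink.accept(row); // record complete: hand it off and forget it
                    row = null;
                }
            }
        } finally {
            reader.close();
        }
    }
}
```

Groovy accepts most Java syntax, so something along these lines should be adaptable to RapidMiner's Execute Script operator, though it may need small adjustments depending on the bundled Groovy version. Turning the rows into an example set is left out on purpose, since that part depends on the RapidMiner version in use.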