"Any tips on optimizing the Read XML operator?"
JEdward
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
Hi,
I have a rather lengthy process that at one point reads an XML file using the Read XML operator, and I've found that this step is a bottleneck in execution speed.
The XML file has only 1,000 records with 20 attributes, of which the operator extracts only 4. Yet each run takes around 1 minute 30 seconds. (That doesn't sound like much, but the process is going to loop over several hundred of these files.)
Are there any tips for speeding up the execution time of this operator?
Would it help to turn off "Parse Numbers" or "Read not matching values as missings", or to change data management from double_array to a different value?
Answers
One possible workaround is to split the XML file into several smaller pieces.
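To make the splitting idea concrete outside of RapidMiner, here is a minimal Python sketch (not a RapidMiner process) that streams a large XML file with `xml.etree.ElementTree.iterparse` and writes every `chunk_size` record elements to a separate file. It assumes each record is a direct child of the document root. The function name `split_xml`, the `<record>` tag, the `<root>` wrapper used for the output files and the `chunk_NNNN.xml` naming scheme are all hypothetical; adjust them to your own schema.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_xml(src, out_dir, record_tag="record", chunk_size=250, root_tag="root"):
    """Split `src` into files of at most `chunk_size` <record_tag> elements.

    Hypothetical helper: assumes records are direct children of the document
    root. Each output file wraps its records in a new <root_tag> element.
    Returns the list of written file paths, in document order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    batch = []

    def flush():
        # Write the current batch of records to the next chunk file.
        if not batch:
            return
        chunk_root = ET.Element(root_tag)
        chunk_root.extend(batch)
        path = out_dir / f"chunk_{len(paths):04d}.xml"
        ET.ElementTree(chunk_root).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
        batch.clear()

    context = ET.iterparse(str(src), events=("start", "end"))
    _, doc_root = next(context)  # first event is the start of the document root

    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            batch.append(elem)
            if len(batch) >= chunk_size:
                flush()
                # Drop already-written records so memory stays bounded.
                doc_root.clear()

    flush()  # write any remaining records
    return paths
```

The resulting chunk files could then be read one at a time by Read XML. Streaming with `iterparse` keeps only one batch of records in memory at a time, which is the main reason for preferring it over loading the whole document first.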
The following process is only an example and will not fit your needs exactly: