"XML parser seems to lack robustness"
aruberutou
Member Posts: 23 Contributor II
Hello,
RapidMiner is a lovely tool and has helped my work tremendously. One glaring weakness, however, is the limited capacity of "Read XML".
It chokes on even moderately sized XML files. As a workaround, I have taken to using BaseX to convert my giant XML files (1-2 GB or so) into lighter-weight, Excel-readable XML files. I then load the resulting file into RapidMiner.
Obviously, this is not a deal-breaker, but it would be nicer if I could simply do everything within RapidMiner.
Thanks,
Answers
The old version 5.3 does have a problem reading XML files, and I know the library was updated for 6.4, so it should be better now.
However, if you are still having problems with its speed, try exploring the XML parsing features in Groovy Script; they're pretty good.
I had to read large XML files with 5.3 and solved the issue by writing a short Groovy script that parsed the files as needed and returned an example set to RM.
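To give a rough idea of the approach, here is a minimal sketch of streaming XML parsing on the JVM using the standard StAX API (`javax.xml.stream`). It is written in plain Java rather than Groovy, and it is not the script I actually used. The class name `XmlRowReader`, the method `forEachRow`, and the assumed file layout (one repeating record element whose direct children are the columns) are all hypothetical, and the step of turning the rows into a RapidMiner example set is deliberately left out.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams a large XML document and hands each record
// to a callback, so the whole file never has to sit in memory at once.
final class XmlRowReader {
    private XmlRowReader() {}

    // Assumes every <recordTag> element holds one row and each of its
    // direct child elements is one column (an assumption about the input).
    static void forEachRow(Reader in, String recordTag,
                           Consumer<Map<String, String>> onRow) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Turn off DTD and external entity processing for untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            Map<String, String> row = null;   // row currently being built
            String field = null;              // column currently being read
            StringBuilder text = new StringBuilder();
            int depth = 0;                    // current element nesting depth
            int recordDepth = 0;              // depth at which the record opened
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    String name = xml.getLocalName();
                    if (row == null && name.equals(recordTag)) {
                        row = new LinkedHashMap<>();
                        recordDepth = depth;
                    } else if (row != null && depth == recordDepth + 1) {
                        field = name;
                        text.setLength(0);
                    }
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    if (field != null) {
                        text.append(xml.getText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (row != null && depth == recordDepth + 1) {
                        // A direct child of the record just closed: store the column.
                        row.put(field, text.toString().trim());
                        field = null;
                    } else if (row != null && depth == recordDepth) {
                        // The record itself closed: emit the finished row.
                        onRow.accept(row);
                        row = null;
                    }
                    depth--;
                }
            }
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML input", e);
        }
    }
}
```

Calling something like `XmlRowReader.forEachRow(reader, "item", row -> ...)` would then process one record at a time. A callback is used instead of returning a list so that a 1-2 GB file does not have to be held in memory; whether that is fast enough for your data is something you would have to measure.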
Good luck!
Thanks for the follow-up. I am actually not at all familiar with Groovy script. How would I go about setting that up? I am indeed using the most current version of RapidMiner, but I still get performance issues. Perhaps part of the problem is that I am using the wizard interface rather than something more programmatic.
Thanks for the tip!