
"XML parser seems to lack robustness"

aruberutou Member Posts: 23 Contributor II
edited June 2019 in Help
Hello,

RapidMiner is a lovely tool and has helped my work tremendously. One glaring weakness, however, is the limited capacity of "Read XML".

It chokes on even moderately sized XML files. As a work-around, I have taken to using BaseX to process my giant XML files (1-2 GB or so) into lighter-weight, Excel-readable XML files. I then load the resulting Excel file into RapidMiner.

Obviously, this is not a deal-breaker, but it would be nicer if I could simply do everything within RapidMiner.

Thanks,

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Which version of RapidMiner are you using? 
    The old version 5.3 does have a problem with reading XML files, and I know that the library was updated for 6.4, so it should be better now. 

    However, if it still runs too slowly for you, try exploring some of the XML parsing features in Groovy Script; they're pretty good. 
    I had to read large XML files with 5.3 and solved the issue by writing a short Groovy script that parsed the files as needed and returned an example set to RM. 

    Good luck!
  • aruberutou Member Posts: 23 Contributor II
    Hi,

    Thanks for the follow-up. I am actually not at all familiar with Groovy script. How would I go about setting that up? I am indeed using the most current version of RapidMiner, but I still get performance issues. Perhaps part of the problem is that I am using the wizard interface rather than something more programmatic.

    Thanks for the tip!
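
The thread does not show JEdward's Groovy script, so the following is only a rough sketch of the same idea (stream through a large XML file and flatten it into a table) written in Python rather than Groovy. It plays the role of the BaseX pre-processing step described above. The function name `xml_records_to_csv`, the element name `record`, and the field names `id` and `name` are all hypothetical; they would need to match the actual structure of your files.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_source, csv_target, record_tag, fields):
    """Stream `record_tag` elements from xml_source and write one CSV row each.

    xml_source: a file path or binary file object containing the XML.
    csv_target: a writable text file object.
    record_tag and fields are assumptions about the document layout.
    Returns the number of records written.
    """
    writer = csv.writer(csv_target)
    writer.writerow(fields)
    count = 0
    # iterparse hands back elements as they are completed, so the whole
    # document never has to be held in memory at once.
    context = ET.iterparse(xml_source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Missing child elements become empty cells.
            writer.writerow([elem.findtext(f, default="") for f in fields])
            count += 1
            root.clear()  # drop already-processed records from the tree
    return count


if __name__ == "__main__":
    # Tiny in-memory demo; for a real file, pass its path as xml_source
    # and an opened output file (newline="") as csv_target.
    demo = b"<data><record><id>1</id><name>a</name></record></data>"
    out = io.StringIO()
    xml_records_to_csv(io.BytesIO(demo), out, "record", ["id", "name"])
    print(out.getvalue())
```

The resulting CSV can then be loaded with RapidMiner's Read CSV operator instead of going through Read XML. Clearing the root after each record is what keeps memory use flat on files in the 1-2 GB range; without it, the parsed tree would keep growing as the file is read.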