"XML parser seems to lack robustness"

aruberutou Member Posts: 23 Contributor II
edited June 2019 in Help

RapidMiner is a lovely tool and has helped my work tremendously. One glaring weakness, however, is the limited capacity of "Read XML".

It chokes on even moderately sized XML files. As a workaround, I have taken to using BaseX to process my giant XML files (1-2 GB or so) into lighter-weight, Excel-readable XML files. I then load that Excel file into RapidMiner.

Obviously, this is not a deal-breaker, but it would be nicer if I could simply do everything within RapidMiner.



    JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Which version of RapidMiner are you using? 
    The old version 5.3 does have a problem with reading XML files, and I know that the library was updated for 6.4, so it should be better now. 

    However, if you are still having problems with how fast it runs, try exploring some of the XML parsing features in Groovy Script. They're pretty good. 
    I had to read large XML files with 5.3 and solved the issue by writing a short Groovy script to parse the files for me as needed and return an example set back to RM. 
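
    As a rough illustration of that approach (not the actual script I used), here is a minimal sketch of streaming a large XML file with Java's standard StAX API, which a Groovy script can call directly. It reads one record at a time instead of loading the whole document into memory. The class name `StreamingXmlReader`, the method `forEachRecord`, and the `recordTag` parameter are made up for this example, and it assumes each record is a flat element whose children hold plain text. Turning the rows into an example set is left out, since that part depends on your RapidMiner version.

    ```java
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;
    import java.io.InputStream;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.function.Consumer;

    // Hypothetical helper: streams flat records out of a large XML file.
    public class StreamingXmlReader {

        // Calls handler once per <recordTag> element, passing child name -> text.
        // Assumes records are flat: <record><a>..</a><b>..</b></record>.
        public static void forEachRecord(InputStream in, String recordTag,
                                         Consumer<Map<String, String>> handler)
                throws XMLStreamException {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Do not process DTDs or resolve external entities.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

            XMLStreamReader reader = factory.createXMLStreamReader(in);
            Map<String, String> current = null;   // record being built, if any
            String field = null;                  // child element being read, if any
            StringBuilder text = new StringBuilder();
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        String name = reader.getLocalName();
                        if (name.equals(recordTag)) {
                            current = new LinkedHashMap<>();
                        } else if (current != null) {
                            field = name;
                            text.setLength(0);
                        }
                    } else if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        // Text may arrive in several chunks, so accumulate it.
                        if (field != null) {
                            text.append(reader.getText());
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        String name = reader.getLocalName();
                        if (current != null && name.equals(recordTag)) {
                            handler.accept(current);  // hand the finished record over
                            current = null;
                        } else if (current != null && name.equals(field)) {
                            current.put(field, text.toString().trim());
                            field = null;
                        }
                    }
                }
            } finally {
                reader.close();
            }
        }
    }
    ```

    You would call it with something like `StreamingXmlReader.forEachRecord(new FileInputStream(path), "item", row -> ...)`, where `"item"` stands for whatever your repeating element is called. Because each record is handed to the callback as soon as it is complete, memory use should stay roughly constant no matter how big the file is, as long as the callback itself does not keep every row around.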

    Good luck!
    aruberutou Member Posts: 23 Contributor II

    Thanks for the follow-up. I am actually not at all familiar with Groovy script. How would I go about setting that up? I am indeed using the most current version of RapidMiner, but I still get performance issues. Perhaps part of the problem is that I am using the wizard interface rather than something more programmatic.

    Thanks for the tip!