Split Document by Content

andkandk Member Posts: 21 Maven
edited November 2019 in Help

As I am a newbie to RM, I have a question regarding the "Split Document by Content" operator. I have to supply an input and an output folder, so I don't really get why the software asks me to connect the ports. In general, is there a description of the port shortcuts, that is, thr, op, doc, etc.? Am I right to suppose that purchasing "How to extend RM 5.0" would help here? The name implies that it is mainly aimed at developers, but I assume it is a further development of the former "rm tutorial...." and so also contains detailed descriptions of the different operators and processes.

Help would be highly appreciated! Best regards,



  • haddockhaddock Member Posts: 849 Maven
    Bonjour André,

    No need to buy the Extenders manual, but it is probably a good idea to check out page 33 of the normal manual (available here: http://rapid-i.com/content/view/26/84/) for info about what the graphics mean. Whether you have to connect your operators depends on the context; don't be alarmed by warning messages. If in doubt, start the process to find out! RM is pretty forgiving.

    Good weekend!

  • andkandk Member Posts: 21 Maven
    Thank you. OK, I really should have read the manual more carefully! ;)

    Best regards,

  • andkandk Member Posts: 21 Maven

    I am sorry, but I couldn't find a solution to my problem in the manual. I am trying to split a collection of XML documents by an XPath query with the "split file by content" operator; for now, this is all I want to do. I supply the input folder (on my hard disk) in the "texts" property and the output folder in "output", and of course I also define the XPath at which each document in the collection should be split. Nevertheless, the operator still asks me to connect ports. I can't understand why this is the case, as all this process should do is split my files and store them at the defined location. The description of the operator says: input: through 1, output: through 1 .... What does that mean? What is this thr port for? I couldn't find this information in the manual. It would be very nice if someone could help me with this.

    Best regards,
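
For readers who want to see what "splitting a collection of XML documents by an XPath query" amounts to outside of RM, here is a minimal Python sketch. It is not how the RM operator is implemented; the function name `split_xml_by_xpath`, the output file naming scheme, and the restriction to `*.xml` files are all assumptions made for illustration. It relies on the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def split_xml_by_xpath(input_dir, output_dir, xpath):
    """Write every element matching `xpath` in each XML file to its own file.

    Hypothetical helper for illustration only; not RapidMiner's implementation.
    """
    in_path = Path(input_dir)
    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for xml_file in sorted(in_path.glob("*.xml")):
        root = ET.parse(str(xml_file)).getroot()
        # findall() evaluates the (limited) XPath relative to the root element.
        for index, element in enumerate(root.findall(xpath)):
            # Assumed naming scheme: <original name>_<match index>.xml
            target = out_path / f"{xml_file.stem}_{index}.xml"
            ET.ElementTree(element).write(
                str(target), encoding="utf-8", xml_declaration=True
            )
            written.append(target)
    return written
```

Calling `split_xml_by_xpath("in", "out", "./doc")` would, under these assumptions, write one file per `<doc>` child of each document's root element.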


  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    You don't need to connect the ports at all; they won't change the data anyway. They are just a way to define the order of execution. Without any connection, no order is defined, so even though you want to process the files that result from the split, the split might run only after the processing has been attempted.
    There's a button to show the actual order if you don't want to use the through ports.
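
The point about connections defining the execution order can be illustrated with a small dependency sketch in Python. This is only an analogy, not RM's scheduler: the function `execution_order` and the operator names used with it are hypothetical, and the ordering is done with the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter


def execution_order(operators, connections):
    """Return one valid run order for `operators`.

    `connections` is a list of (upstream, downstream) pairs, playing the
    role of a through-port connection: upstream must run before downstream.
    Hypothetical helper for illustration only.
    """
    # Start with every operator having no predecessors.
    sorter = TopologicalSorter({op: set() for op in operators})
    for upstream, downstream in connections:
        # Register upstream as a predecessor of downstream.
        sorter.add(downstream, upstream)
    return list(sorter.static_order())
```

With a connection such as `("Split File", "Process Files")`, the split is guaranteed to come first in the returned order. With an empty connection list, any order is valid, which mirrors the situation described above where the processing might be attempted before the split.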
