Concatenate examples from XML files

RomB Member Posts: 3 Contributor I
edited January 2019 in Help
Dear all,

I have a bunch of XML files that I need to import into RapidMiner to extract some elements.
In each file the text is spread across several <p> tags. In the output example set I get one example per <p> tag, with the file name duplicated:
File Text
File1.xml abc
File1.xml def
File2.xml ghi
File2.xml jkl
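
For readers who want to reproduce this intermediate table outside RapidMiner, here is a minimal Python sketch of the same extraction step. It is only an illustration, not the RapidMiner process itself: the `documents` dictionary and the `<doc>` root element are hypothetical stand-ins for the real files, and the sketch assumes the XML uses no namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-ins for the XML files (assumption);
# real files could be parsed with ET.parse(path).getroot() instead.
documents = {
    "File1.xml": "<doc><p>abc</p><p>def</p></doc>",
    "File2.xml": "<doc><p>ghi</p><p>jkl</p></doc>",
}


def extract_paragraph_rows(docs):
    """Return one (file name, text) row per <p> element, in document order."""
    rows = []
    for name, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        # ".//p" matches every <p> element at any depth below the root.
        for p in root.findall(".//p"):
            # itertext() also collects text nested inside child elements.
            rows.append((name, "".join(p.itertext())))
    return rows


rows = extract_paragraph_rows(documents)
for name, text in rows:
    print(name, text)
```

Each `<p>` becomes its own row, which is why the file name repeats once per paragraph.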

I would like to concatenate the examples that belong to the same file, so that all the text from its <p> tags ends up in a single cell:
File AllText
File1.xml abcdef
File2.xml ghijkl
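
The transformation I am after is a group-by on File followed by a string concatenation of Text. As a hedged illustration only (plain Python, not a RapidMiner process; the function name `concatenate_by_file` and the sample rows are made up for this sketch), it could look like this:

```python
# Sample rows in the (File, Text) layout shown above.
rows = [
    ("File1.xml", "abc"),
    ("File1.xml", "def"),
    ("File2.xml", "ghi"),
    ("File2.xml", "jkl"),
]


def concatenate_by_file(rows, separator=""):
    """Group rows by file name and join their texts in the original order."""
    grouped = {}
    for name, text in rows:
        # setdefault creates the list the first time a file name is seen.
        grouped.setdefault(name, []).append(text)
    # Dictionaries keep insertion order, so files appear in first-seen order.
    return [(name, separator.join(parts)) for name, parts in grouped.items()]


all_text = concatenate_by_file(rows)
for name, text in all_text:
    print(name, text)
```

Passing `separator=" "` would keep a space between the paragraphs instead of gluing them together.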

I followed this thread, but I'm a real beginner and I didn't really understand how it works:
In my case it didn't work. I get the message "Undefined macro: File_value", although in the example process I used, the macro didn't seem to be defined at this step either.
Any help would be appreciated!
Best
Romain

Answers

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344   Unicorn
    Hi Romain,

    You have a bug in the Loop Examples operator, more specifically in the Branch operator inside. The condition is
    File = %{File_value} [%{example}]
    but the macro File_value doesn't exist!

    I don't really know what you are trying to do, but at least we have pinpointed the problem's location!

    Regards,
    Sebastian
  • RomB Member Posts: 3 Contributor I
    Hi Sebastian,

    Thank you for your answer. Yes, the Branch operator is exactly where I'm stuck.
    To be clearer: I tried to reproduce, with my XML files, the process I'm attaching, which was built as a generic use case. In that test process the Branch operator doesn't fail, even though I don't see the macro defined earlier.
    Can you see the difference?

    Thanks again,
    Best regards,

    Romain
  • SGolbert RapidMiner Certified Analyst, Member Posts: 344   Unicorn
    Hi Romain,

    Sorry, but you lost me with this process. I can see that att3 is generated according to some criterion, but that criterion is not clear to me. If I knew what the goal was, I could probably build a simpler process. I suspect that an appropriate Generate Attributes call could do the job much more easily.

    Regards,
    Sebastian
  • RomB Member Posts: 3 Contributor I
    Hi Sebastian,

    Sorry, I didn't mean to lose you.

    I tried to explain my goal in my first post. When you import a bunch of XML files with the operators Loop Files + Read XML, you can select the XML elements you want to extract with an XPath query.
    In each file I queried several <p> tags that together contain all the text I want to extract. This gives me an example set with 2 attributes: Text and File. The text inside each <p> tag goes into the Text attribute, and one example (row) is created per <p>. The file name is repeated for every <p> found in that file.

    File Text
    File1.xml abc
    File1.xml def
    File2.xml ghi
    File2.xml jkl

    Now I want to concatenate the examples that belong to the same file, so that each file has a single cell of concatenated text instead.
    File AllText
    File1.xml abcdef
    File2.xml ghijkl

    I'm attaching a process without the concatenation part, so maybe it is clearer.

    The Test7_XML.xml process does the concatenation. att1 holds the identifiers (e.g. File in my case), att2 is the text content (Text in my case), and att3 is the concatenation of att2 for each att1 (AllText in my case).
    However, when I try to adapt Test7_XML.xml to my case with XML files, it fails at the Branch operator.

    Thanks again for your invaluable help.

    Romain