"Read XML, reading Parent attributes"
francis_sathiak
Member Posts: 2 Learner I
Hey All,
I have been using the Data Import Wizard. In step 4, where you define your XPaths, I set up an attribute using "../" to pull a value from the parent element into every entry. Step 4 actually shows the correct current value, but in step 5, when it shows the preview of the first 100 rows, that attribute is blank. It is also blank when I export the data.
My data looks like this:
<routeCondition>
  <spatialRuleCode>INCLUSION</spatialRuleCode>
  <segment> <persistentIdentifier>NSW5</persistentIdentifier> <segmentText>UNNAMED</segmentText> </segment>
  <segment> <persistentIdentifier>NSW5128</persistentIdentifier> <segmentText>FEDERAL</segmentText> </segment>
<routeCondition>
  <spatialRuleCode>EXCLUSION</spatialRuleCode>
  <segment> <persistentIdentifier>NSW5005</persistentIdentifier> <segmentText>FEDERAL</segmentText> </segment>
  <segment> <persistentIdentifier>NSW5025</persistentIdentifier> <segmentText>FEDERAL</segmentText> </segment>
  <segment> <persistentIdentifier>NSW5505</persistentIdentifier> <segmentText>FEDERAL</segmentText> </segment>
  <segment> <persistentIdentifier>NSW500517065</persistentIdentifier> <segmentText>FEDERAL</segmentText> </segment>
<routeCondition>
  <spatialRuleCode>INCLUSION</spatialRuleCode>
  <segment> <persistentIdentifier>1706</persistentIdentifier> <segmentText>FEDERAL</segmentText> </segment>
  <segment> <persistentIdentifier>7030</persistentIdentifier> <segmentText>FEDERAL</segmentText> </segment>
Because the number of segments changes for each route condition, I selected //routeCondition/segment as the XPath for examples.
I set up my attributes as follows:
../spatialRuleCode/text()
persistentIdentifier[1]/text()
segmentText[1]/text()
I was hoping for this output:
INCLUSION NSW5 UNNAMED
INCLUSION NSW5128 FEDERAL
EXCLUSION NSW5005 FEDERAL
EXCLUSION NSW5025 FEDERAL
EXCLUSION NSW5505 FEDERAL
INCLUSION NSW1706 UNNAMED
INCLUSION NSW7030 FEDERAL
But the spatialRuleCode attribute is always blank. Any help? Hopefully it's just my '..' notation that is wrong.
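As a sanity check outside RapidMiner, the row-building logic described in the question can be sketched with Python's standard library. This is a minimal sketch, not the wizard's implementation, and it assumes a well-formed copy of the data: the posted snippet never closes its `routeCondition` elements, so the sketch adds the closing tags and a `<root>` wrapper, and keeps only the first two route conditions for brevity. ElementTree keeps no parent pointers and cannot step above the element a search starts from, so the hypothetical `rows_via_parent` helper emulates the `..` step with an explicit child-to-parent map.

```python
import xml.etree.ElementTree as ET

# Assumption: the posted data made well-formed (closing </routeCondition> tags
# and a <root> wrapper added) and trimmed to the first two route conditions.
DATA = """<root>
  <routeCondition>
    <spatialRuleCode>INCLUSION</spatialRuleCode>
    <segment><persistentIdentifier>NSW5</persistentIdentifier><segmentText>UNNAMED</segmentText></segment>
    <segment><persistentIdentifier>NSW5128</persistentIdentifier><segmentText>FEDERAL</segmentText></segment>
  </routeCondition>
  <routeCondition>
    <spatialRuleCode>EXCLUSION</spatialRuleCode>
    <segment><persistentIdentifier>NSW5005</persistentIdentifier><segmentText>FEDERAL</segmentText></segment>
  </routeCondition>
</root>"""


def rows_via_parent(xml_text):
    """One row per //routeCondition/segment; spatialRuleCode is read via the parent."""
    root = ET.fromstring(xml_text)
    # Build a child -> parent map so we can emulate the XPath ".." step.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    rows = []
    for segment in root.iterfind(".//routeCondition/segment"):
        route_condition = parent_of[segment]  # the ".." step
        rows.append((
            route_condition.findtext("spatialRuleCode"),  # ../spatialRuleCode/text()
            segment.findtext("persistentIdentifier"),     # persistentIdentifier[1]/text()
            segment.findtext("segmentText"),              # segmentText[1]/text()
        ))
    return rows


for row in rows_via_parent(DATA):
    print(*row)
# INCLUSION NSW5 UNNAMED
# INCLUSION NSW5128 FEDERAL
# EXCLUSION NSW5005 FEDERAL
```

In standard XPath 1.0, `../spatialRuleCode/text()` evaluated from a `segment` node selects the text of its parent's `spatialRuleCode`, so the notation itself is valid; the thread does not establish why the wizard's step-5 preview leaves that column empty.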
Best Answer
kayman
Member Posts: 662 Unicorn
When you have nested and repetitive XML, I think it's better to use the XSLT operator (part of the text mining extension).
Find attached a working example based on your data (though the XML you provided is not well-formed, so I modified it a bit):
<operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
  <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="XML" width="90" x="112" y="34">
      <parameter key="text" value="&lt;root&gt;
  &lt;routeCondition&gt;
    &lt;spatialRuleCode&gt;EXCLUSION&lt;/spatialRuleCode&gt;
    &lt;segment&gt;&lt;persistentIdentifier&gt;NSW5005&lt;/persistentIdentifier&gt;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&lt;/segment&gt;
    &lt;segment&gt;&lt;persistentIdentifier&gt;NSW5025&lt;/persistentIdentifier&gt;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&lt;/segment&gt;
    &lt;segment&gt;&lt;persistentIdentifier&gt;NSW5505&lt;/persistentIdentifier&gt;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&lt;/segment&gt;
    &lt;segment&gt;&lt;persistentIdentifier&gt;NSW500517065&lt;/persistentIdentifier&gt;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&lt;/segment&gt;
  &lt;/routeCondition&gt;
  &lt;routeCondition&gt;
    &lt;spatialRuleCode&gt;INCLUSION&lt;/spatialRuleCode&gt;
    &lt;segment&gt;&lt;persistentIdentifier&gt;1706&lt;/persistentIdentifier&gt;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&lt;/segment&gt;
    &lt;segment&gt;&lt;persistentIdentifier&gt;7030&lt;/persistentIdentifier&gt;&lt;segmentText&gt;FEDERAL&lt;/segmentText&gt;&lt;/segment&gt;
  &lt;/routeCondition&gt;
&lt;/root&gt;"/>
    </operator>
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="XSLT" width="90" x="112" y="136">
      <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
&lt;xsl:stylesheet version=&quot;1.0&quot; xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot;&gt;
  &lt;xsl:output method=&quot;xml&quot; version=&quot;1.0&quot; encoding=&quot;UTF-8&quot; indent=&quot;yes&quot;/&gt;
  &lt;xsl:template match=&quot;/&quot;&gt;
    &lt;root&gt;
      &lt;xsl:for-each select=&quot;//routeCondition&quot;&gt;
        &lt;xsl:variable name=&quot;spatialRuleCode&quot; select=&quot;spatialRuleCode&quot;/&gt;
        &lt;xsl:for-each select=&quot;segment&quot;&gt;
          &lt;row spatialRuleCode=&quot;{$spatialRuleCode}&quot; persistentIdentifier=&quot;{persistentIdentifier}&quot; segmentText=&quot;{segmentText}&quot;/&gt;
        &lt;/xsl:for-each&gt;
      &lt;/xsl:for-each&gt;
    &lt;/root&gt;
  &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;"/>
    </operator>
    <operator activated="true" class="text:process_xslt" compatibility="8.1.000" expanded="true" height="82" name="Process XSLT" width="90" x="246" y="34"/>
    <operator activated="true" class="text:cut_document" compatibility="8.1.000" expanded="true" height="68" name="Cut Document" width="90" x="380" y="34">
      <parameter key="query_type" value="XPath"/>
      <list key="string_machting_queries"/>
      <list key="regular_expression_queries"/>
      <list key="regular_region_queries"/>
      <list key="xpath_queries">
        <parameter key="row" value="//row"/>
      </list>
      <list key="namespaces"/>
      <parameter key="ignore_CDATA" value="false"/>
      <parameter key="assume_html" value="false"/>
      <list key="index_queries"/>
      <list key="jsonpath_queries"/>
      <process expanded="true">
        <operator activated="true" class="text:extract_information" compatibility="8.1.000" expanded="true" height="68" name="Extract Information" width="90" x="179" y="34">
          <parameter key="query_type" value="XPath"/>
          <list key="string_machting_queries"/>
          <list key="regular_expression_queries"/>
          <list key="regular_region_queries"/>
          <list key="xpath_queries">
            <parameter key="spatialRuleCode" value=".//@spatialRuleCode"/>
            <parameter key="persistentIdentifier" value=".//@persistentIdentifier"/>
            <parameter key="segmentText" value=".//@segmentText"/>
          </list>
          <list key="namespaces"/>
          <parameter key="ignore_CDATA" value="false"/>
          <parameter key="assume_html" value="false"/>
          <list key="index_queries"/>
          <list key="jsonpath_queries"/>
        </operator>
        <connect from_port="segment" to_op="Extract Information" to_port="document"/>
        <connect from_op="Extract Information" from_port="document" to_port="document 1"/>
        <portSpacing port="source_segment" spacing="0"/>
        <portSpacing port="sink_document 1" spacing="0"/>
        <portSpacing port="sink_document 2" spacing="0"/>
      </process>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="514" y="34">
      <parameter key="text_attribute" value="tmp"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="9.0.003" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="34">
      <parameter key="attribute_filter_type" value="subset"/>
      <parameter key="attributes" value="query_key|tmp"/>
      <parameter key="invert_selection" value="true"/>
    </operator>
    <connect from_op="XML" from_port="output" to_op="Process XSLT" to_port="document"/>
    <connect from_op="XSLT" from_port="output" to_op="Process XSLT" to_port="xslt document"/>
    <connect from_op="Process XSLT" from_port="document" to_op="Cut Document" to_port="document"/>
    <connect from_op="Cut Document" from_port="documents" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
  </process>
</operator>
</process>
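For readers who want to see what the attached process computes without opening RapidMiner, here is a rough Python analogue using only the standard library's `xml.etree.ElementTree`; it is a sketch of the data flow, not how the RapidMiner operators work internally. The idea it mirrors is in the stylesheet: the outer `xsl:for-each` stores each `routeCondition`'s `spatialRuleCode` in a variable and stamps it onto every `<row>` built by the inner loop, so the later XPath `.//@spatialRuleCode` never needs to navigate upward. `flatten` and `to_table` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Input: the well-formed sample embedded in the attached process ("XML" operator).
DATA = """<root>
  <routeCondition>
    <spatialRuleCode>EXCLUSION</spatialRuleCode>
    <segment><persistentIdentifier>NSW5005</persistentIdentifier><segmentText>FEDERAL</segmentText></segment>
    <segment><persistentIdentifier>NSW5025</persistentIdentifier><segmentText>FEDERAL</segmentText></segment>
    <segment><persistentIdentifier>NSW5505</persistentIdentifier><segmentText>FEDERAL</segmentText></segment>
    <segment><persistentIdentifier>NSW500517065</persistentIdentifier><segmentText>FEDERAL</segmentText></segment>
  </routeCondition>
  <routeCondition>
    <spatialRuleCode>INCLUSION</spatialRuleCode>
    <segment><persistentIdentifier>1706</persistentIdentifier><segmentText>FEDERAL</segmentText></segment>
    <segment><persistentIdentifier>7030</persistentIdentifier><segmentText>FEDERAL</segmentText></segment>
  </routeCondition>
</root>"""

FIELDS = ("spatialRuleCode", "persistentIdentifier", "segmentText")


def flatten(xml_text):
    """Mirror the stylesheet: one <row> per segment, carrying the parent's code down."""
    source = ET.fromstring(xml_text)
    out = ET.Element("root")
    for condition in source.iter("routeCondition"):   # xsl:for-each select="//routeCondition"
        code = condition.findtext("spatialRuleCode")   # xsl:variable name="spatialRuleCode"
        for segment in condition.findall("segment"):   # inner xsl:for-each select="segment"
            ET.SubElement(out, "row", {
                "spatialRuleCode": code,
                "persistentIdentifier": segment.findtext("persistentIdentifier"),
                "segmentText": segment.findtext("segmentText"),
            })
    return out


def to_table(rows_root):
    """Rough analogue of Cut Document (//row) followed by Extract Information (@...)."""
    return [tuple(row.get(name) for name in FIELDS) for row in rows_root.iter("row")]


for record in to_table(flatten(DATA)):
    print(*record)
```

Carrying the value down from the outer loop is the design choice that sidesteps the parent-axis lookup entirely: each row is self-contained by the time it is cut and read.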
Answers