how to extract metadata of a table (other than using R)?

jujujuju Member Posts: 39 Guru
edited November 2018 in Help

Hi, how do I programmatically extract the metadata of a table?

For example, how do I extract the column name of the label?

I want to automate some workflow based on what the data look like. Thanks

 

I know how to extract metadata in "Execute R". If there's no easy way using RM functions, I'll use R.

 

 

 


 

Sample code:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="6.4.000" expanded="true" height="60" name="Generate Data" width="90" x="112" y="30">
<parameter key="number_examples" value="10"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

 


Best Answer

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Solution Accepted

    Hey,

     

    Ok, got it. There might be another way (it's getting late ;-)) but right now the only one I was able to come up with was a small script in the Execute Script operator. The process below does this: it writes the name of the label column into the macro "label_name", or the string "null" if there is no label.

     

    Hope this helps, and I will come back if I can figure out a completely code-free way ;-)

     

    Best,

    Ingo

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.2.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Sonar"/>
    </operator>
    <operator activated="true" class="execute_script" compatibility="7.2.001" expanded="true" height="82" name="Execute Script" width="90" x="179" y="34">
    <parameter key="script" value="ExampleSet exampleSet = operator.getInput(ExampleSet.class);&#10;&#10;Attribute label = exampleSet.getAttributes().getLabel();&#10;if (label != null)&#10;&#9;operator.getProcess().getMacroHandler().addMacro(&quot;label_name&quot;, label.getName());&#10;else&#10;&#9;operator.getProcess().getMacroHandler().addMacro(&quot;label_name&quot;, &quot;null&quot;);&#10;&#9;&#10;return exampleSet;"/>
    </operator>
    <connect from_op="Retrieve Sonar" from_port="output" to_op="Execute Script" to_port="input 1"/>
    <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
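For readers who don't know Java, the logic of the script embedded in the process above can be sketched in plain Java. This is only an illustration, not the RapidMiner API: the `rolesByColumn` map stands in for the ExampleSet's metadata, the `macros` map stands in for the process's macro store, and the column names and the class name `LabelMacroSketch` are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelMacroSketch {

    /**
     * Returns the name of the first column whose role is "label",
     * or the string "null" if no column has that role
     * (the same fallback the script in the process uses).
     * rolesByColumn is a hypothetical stand-in for the table's metadata.
     */
    public static String findLabelName(Map<String, String> rolesByColumn) {
        for (Map.Entry<String, String> entry : rolesByColumn.entrySet()) {
            if ("label".equals(entry.getValue())) {
                return entry.getKey();
            }
        }
        return "null";
    }

    public static void main(String[] args) {
        // Illustrative metadata: column name -> role.
        Map<String, String> roles = new LinkedHashMap<>();
        roles.put("attribute_1", "attribute");
        roles.put("class", "label");

        // Hypothetical stand-in for the process's macro store.
        Map<String, String> macros = new LinkedHashMap<>();
        macros.put("label_name", findLabelName(roles));

        System.out.println(macros);
    }
}
```

In the real process the result does not appear in the output data: the script returns the table unchanged and the value only lives in the macro "label_name".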

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,

     

    Well, I don't know all your requirements but it sounds like a combination of Branch with Generate Macro might be able to do the job.  Otherwise you can go with Execute Script / R.

     

    Sorry, I think that I would need to know a bit more (maybe an example for an application?) to be more specific...

     

    Cheers,

    Ingo

  • jujujuju Member Posts: 39 Guru

    Thank you very much Ingo!

     

    My specific question is: given a data table, how do I set a macro to the column name of the label (or to null if there is no label)?

     

    The context is an automated workflow, which can be implemented using Branch or Select Subprocess.

     

    Hope this makes it clear. Thank you for your help!

  • jujujuju Member Posts: 39 Guru

    Thank you Ingo!

     

    Scripting is actually good: neat for complex logic. :) Thanks!

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Glad to hear this :-) Have a nice weekend,

    Ingo

  • jujujuju Member Posts: 39 Guru

    Hi Ingo, thanks again for your help!

     

    I don't know Java. I tested the code and it just displayed the dataset itself...


     

     

     

     

    My implementation using R (see the Log panel for what the R script prints):

    (If no label or attribute column is found, the result is an empty list, not NULL.)


     

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="6.4.000" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
    <parameter key="number_examples" value="20"/>
    <parameter key="number_of_attributes" value="3"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="6.4.000" expanded="true" height="76" name="Generate ID" width="90" x="179" y="30"/>
    <operator activated="true" breakpoints="after" class="set_role" compatibility="6.4.000" expanded="true" height="76" name="Set Role" width="90" x="313" y="30">
    <parameter key="attribute_name" value="att1"/>
    <parameter key="target_role" value="weight"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="r_scripting:execute_r" compatibility="6.4.000" expanded="true" height="76" name="Execute R" width="90" x="447" y="30">
    <parameter key="script" value="library(data.table)&#10;library(reshape2)&#10;&#10;rm_main = function(dat, in_rapidminer = T){&#10; &#10; cat('Starting R script now ...\n')&#10; &#10; dat = as.data.frame(dat) #tables are in data.table format in RM&#10;&#10; &#10; # find columns of x (attribute) and y (response) ####&#10; if(in_rapidminer){&#10; &#10; meta = melt(metaData)&#10; cat('\n\n', 'metadata melted: \n')&#10; print(meta)&#10; &#10; names(meta) = c('value', 'variable', 'column', 'data')&#10; cat('\n\n', 'renamed: \n')&#10; print(meta)&#10; &#10; meta = dcast(meta, formula = data + column ~ variable)&#10; cat('\n\n', 'reorganized:\n')&#10; print(meta)&#10; &#10; y_name = meta[meta$role %in% 'label', 'column']&#10; x_name = meta[meta$role %in% 'attribute', 'column']&#10; &#10; y = names(dat) %in% y_name&#10; x = names(dat) %in% x_name&#10; }&#10; &#10; cat('\n\n')&#10; cat('y column name:', y_name, '\n')&#10; cat('y column index:', which(y), '\n')&#10; cat('x column name:', x_name, '\n')&#10; cat('x column index:', which(x), '\n')&#10;&#10; data.table(x = 'nothing')&#10;}"/>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Execute R" to_port="input 1"/>
    <connect from_op="Execute R" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

     

     

     

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    In my code I only extracted the name of the label column into the macro "label_name"; the script then returns the data unchanged as its output. You would need to add the other columns as well if you want to go down the Java route. You can only see the effect if you use the macro afterwards or open the "Macros" panel from the "View" menu. It probably would have been helpful if I had mentioned that :-)

     

    Cheers,

    Ingo
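As a rough sketch of what "adding the other columns" could look like, the plain-Java example below builds one macro per role instead of only "label_name". It is a model of the idea, not RapidMiner code: the `rolesByColumn` map stands in for the table's metadata, the returned map stands in for the macro store, and the class name `RoleMacrosSketch`, the `<role>_name` macro naming, and the comma-joined format are all assumptions made for illustration. The sample columns mirror the Generate Data / Generate ID / Set Role process posted earlier in the thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class RoleMacrosSketch {

    /**
     * Builds one macro per role found in the metadata.
     * The macro key is "<role>_name"; columns sharing a role are
     * joined with commas. Roles that do not occur produce no macro.
     */
    public static Map<String, String> macrosByRole(Map<String, String> rolesByColumn) {
        // Group column names by role, keeping first-seen order.
        Map<String, StringJoiner> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : rolesByColumn.entrySet()) {
            grouped.computeIfAbsent(entry.getValue(), role -> new StringJoiner(","))
                   .add(entry.getKey());
        }

        // Turn each group into a macro name -> macro value pair.
        Map<String, String> macros = new LinkedHashMap<>();
        for (Map.Entry<String, StringJoiner> entry : grouped.entrySet()) {
            macros.put(entry.getKey() + "_name", entry.getValue().toString());
        }
        return macros;
    }

    public static void main(String[] args) {
        // Illustrative metadata: column name -> role.
        Map<String, String> roles = new LinkedHashMap<>();
        roles.put("id", "id");
        roles.put("att1", "weight");
        roles.put("att2", "attribute");
        roles.put("att3", "attribute");
        roles.put("label", "label");

        System.out.println(macrosByRole(roles));
    }
}
```

Because a role that does not occur yields no macro at all, a later Branch-style check would have to handle the missing macro, or the sketch could be changed to write a "null" placeholder the way the label-only script does.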

     

     

  • jujujuju Member Posts: 39 Guru

    Ahhh in the macro! Yes actually we need to send it into a macro to use it later.

    Thank you Ingo! Have a good weekend :)
