Using Coordinate Data in predictive models

davidnealbrown Member Posts: 4 Contributor I
edited December 2018 in Help

I have data that includes X, Y, Z coordinates plus a condition and a label of interest (the outcome). How do I build a predictive model that uses these coordinates as a factor?

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,

     

    Is this data in tabular format, i.e. can you read it into RapidMiner and end up with four columns for x, y, z, and condition, plus a fifth one with the outcome information? If yes, all you need to do is define the outcome column as the label using the "Set Role" operator and then use any of the included machine learning algorithms to build the model.

     

    And if this does not tell you anything, I highly recommend going through the tutorials first. They are available on the help screen, which comes up when you press the icon in the top right corner of RapidMiner.

     

    Hope this helps,

    Ingo

  • davidnealbrown Member Posts: 4 Contributor I

    Thank you, Ingo. This does help, and yes, the data sets are in this format. So if I understand you correctly, you are saying that the RapidMiner models will automatically consider the x, y, and z coordinates of each example together rather than considering each axis individually. Is that correct?

     

    Just to clarify further, I am working with fMRI data: brain activation coordinates in Talairach standardized space. Here is a sample spreadsheet. I know there is not enough data for a model; it is for illustration purposes only.

     

    Thanks!

     

    DNB

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    What @IngoRM is saying is that the coordinates will just be inputs to your model and won't carry with them the implied information of being in some 3D space tied to a reference point. A set of coordinates tied to no reference point (e.g. the Earth) would be meaningless.

     

    In your case I would look at transforming the coordinates into a new feature, maybe a distance feature. Without knowing more about your particular task, you could calculate the distance from each set of coordinates to the others, or to your reference point. That might be more useful than just the raw coordinates. Something to think about. :)
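
    The distance feature suggested here could be sketched roughly as follows. This is a minimal Python illustration outside RapidMiner, not a RapidMiner operator. The column names `x`, `y`, `z`, the new column name `dist_to_ref`, the helper names, and the choice of the coordinate origin as the default reference point are all assumptions made for the example, and the sample values are made up.

```python
import math

def distance_to_reference(point, reference=(0.0, 0.0, 0.0)):
    """Euclidean distance between a 3D point and a reference point."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(point, reference)))

def add_distance_feature(rows, reference=(0.0, 0.0, 0.0)):
    """Return copies of the rows with an extra 'dist_to_ref' column.

    Assumes each row is a dict with numeric 'x', 'y' and 'z' entries
    (hypothetical column names, not taken from the original spreadsheet).
    """
    augmented_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        new_row["dist_to_ref"] = distance_to_reference(
            (row["x"], row["y"], row["z"]), reference
        )
        augmented_rows.append(new_row)
    return augmented_rows

# Made-up coordinates, for illustration only
sample_rows = [
    {"x": 3.0, "y": 4.0, "z": 0.0},
    {"x": -1.0, "y": 2.0, "z": 2.0},
]
augmented = add_distance_feature(sample_rows)
```

    The new column can then be fed to the learner alongside (or instead of) the raw coordinates. Whether the origin is a sensible reference point depends on the task.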

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hello @davidnealbrown thanks for the csv file.  That's a very interesting use case!  Some thoughts/questions...

    - so as @Thomas_Ott said, RapidMiner has no idea what x, y, and z coordinates are.  They are just numerical attributes.  But I cannot see any reason why that would be a problem in your use case.

    - what are the cluster #s?  Are these the results of a segmentation process or some medical terminology?

    - I assume you're trying to predict Label via decision tree, right?  So your process looks something like this?  

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve DecisionTree_Test" width="90" x="45" y="85">
    <parameter key="repository_entry" value="//RapidMiner OneDrive/random community stuff/DecisionTree_Test"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="179" y="85">
    <parameter key="attribute_name" value="Label"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:cross_validation" compatibility="7.6.001" expanded="true" height="145" name="Cross Validation" width="90" x="313" y="85">
    <process expanded="true">
    <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.6.001" expanded="true" height="82" name="Decision Tree" width="90" x="112" y="34"/>
    <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
    <connect from_op="Decision Tree" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="246" y="34"/>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve DecisionTree_Test" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="model" to_port="result 1"/>
    <connect from_op="Cross Validation" from_port="example set" to_port="result 2"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>
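
    For anyone reading this without RapidMiner open, the idea behind the process above (mark one column as the label, then cross-validate a learner on the remaining numeric columns) can be sketched in plain Python. This is only a conceptual sketch under stated assumptions: a 1-nearest-neighbour rule is swapped in for the decision tree to keep it short, the folds are contiguous and unshuffled (unlike the sampling options of the Cross Validation operator), the condition is assumed to be encoded as a number, and the rows are made up.

```python
def k_fold_indices(n_rows, k):
    """Split indices 0..n_rows-1 into k contiguous folds of near-equal size.

    Assumes 1 <= k <= n_rows so that no fold is empty.
    """
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nearest_neighbour_predict(train, features):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    def squared_distance(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    closest = min(train, key=lambda row: squared_distance(row[0], features))
    return closest[1]

def cross_validate(rows, k):
    """Return mean accuracy over k folds; rows are (features, label) pairs."""
    accuracies = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        test = [rows[i] for i in test_idx]
        correct = sum(nearest_neighbour_predict(train, f) == y for f, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Made-up rows: features are (x, y, z, condition), followed by the label
toy_rows = [
    ((0.0, 0.0, 0.0, 0), "A"),
    ((10.0, 10.0, 10.0, 1), "B"),
    ((1.0, 0.0, 0.0, 0), "A"),
    ((11.0, 10.0, 10.0, 1), "B"),
    ((0.0, 1.0, 0.0, 0), "A"),
    ((10.0, 11.0, 10.0, 1), "B"),
]
accuracy = cross_validate(toy_rows, k=3)
```

    Note that the learner only ever sees four numbers per row. Nothing tells it that the first three belong together as a position in space, which is the point made earlier in the thread.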

    Very cool.  Keep us posted.


    Scott

     

    [EDIT: and thank you - I just googled "Talairach standardized space" and learned a whole new thing.  :)]

  • davidnealbrown Member Posts: 4 Contributor I

    Thanks!
