Features affecting Bottom Line (Revenue)

msacs09 Member Posts: 55 Contributor II
edited December 2018 in Help
Experts,

Can you please help me work out feature weights, i.e. the contributing factors that affect revenue? We would like to understand why some instances of revenue are low and some are high, and what the differentiator is. Please see the sample data. I want to see which features are affecting the revenue percentages.

Can you please help me with how to approach this? I have a lot of nominal attributes. Should I convert everything to numerical? Can you point me to a sample process, please?

As always, thank you for your valuable advice and time.

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    If you are interested in understanding the univariate relationships between Revenue and other attributes one at a time, you should look at the Weighting operators.  Weight by Correlation is good for numerical attributes, and Weight by Information Gain or Weight by Chi Square are good for nominal attributes.
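
As a rough illustration of what such univariate weights measure, here is a minimal sketch in plain Python, outside RapidMiner. It computes a Pearson correlation for a numerical attribute and a chi-square statistic for a nominal attribute against a binned Revenue. The toy rows, the `units` column, and the median binning are assumptions made up for this example, and the operators' own normalization and binning may well differ.

```python
import math
from collections import Counter

# Hypothetical toy rows: (segment, units, revenue_pct); not the poster's data.
rows = [
    ("global", 10, 0.018), ("global", 12, 0.019),
    ("local", 3, 0.011), ("local", 4, 0.012),
    ("med", 20, 0.020), ("med", 22, 0.021),
]

def pearson(xs, ys):
    """Pearson correlation between two numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def chi_square(cats, classes):
    """Chi-square statistic of the contingency table of two nominal sequences."""
    n = len(cats)
    observed = Counter(zip(cats, classes))
    cat_tot, cls_tot = Counter(cats), Counter(classes)
    stat = 0.0
    for c in cat_tot:
        for k in cls_tot:
            expected = cat_tot[c] * cls_tot[k] / n
            stat += (observed[(c, k)] - expected) ** 2 / expected
    return stat

revenue = [r[2] for r in rows]
median = sorted(revenue)[len(revenue) // 2]
# Bin the numeric label into two classes so a nominal test applies (an assumption).
revenue_class = ["high" if v >= median else "low" for v in revenue]

corr_units = pearson([r[1] for r in rows], revenue)
chi_segment = chi_square([r[0] for r in rows], revenue_class)
print(f"units vs revenue: r = {corr_units:.3f}")
print(f"segment vs revenue class: chi2 = {chi_segment:.3f}")
```

Each number describes one attribute in isolation, so two attributes that only matter in combination can both score low here.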

    These will only show you individual relationships.  Your question may actually be about what combinations of factors are most associated with Revenue.  If that is the case and you are interested in exploring multivariate relationships, then that is basically a supervised machine learning problem.  In that case, you probably want to build a simple predictive model to start, using a highly interpretable algorithm.  I suggest a simple Decision Tree model so you can get a sense of what combinations of factors are associated with different levels of Revenue.
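
To make the idea of "combinations of factors" concrete, the following is a simplified sketch of a regression tree on nominal attributes, again in plain Python. At each node it greedily picks the attribute whose split reduces the sum of squared errors of Revenue the most. The data values are invented, and this toy version has no pruning or minimum leaf size, so it is not a reimplementation of RapidMiner's Decision Tree operator.

```python
from collections import defaultdict

# Hypothetical toy examples; attribute names echo the thread, values are made up.
data = [
    {"segment": "global", "Sector": "AD", "Revenue": 0.018},
    {"segment": "global", "Sector": "ES", "Revenue": 0.018},
    {"segment": "local", "Sector": "AD", "Revenue": 0.016},
    {"segment": "local", "Sector": "ES", "Revenue": 0.011},
    {"segment": "med", "Sector": "AD", "Revenue": 0.020},
    {"segment": "med", "Sector": "ES", "Revenue": 0.020},
]

def sse(rows):
    """Sum of squared errors around the mean label of the rows."""
    mean = sum(r["Revenue"] for r in rows) / len(rows)
    return sum((r["Revenue"] - mean) ** 2 for r in rows)

def split(rows, attr):
    """Group rows by the value of one nominal attribute."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r)
    return groups

def grow(rows, attrs, depth=0):
    """Return text lines of a tree that greedily minimizes SSE at each split."""
    gains = {a: sse(rows) - sum(sse(g) for g in split(rows, a).values()) for a in attrs}
    best = max(gains, key=gains.get, default=None)
    if best is None or gains[best] < 1e-12:
        return None  # no useful split left; the caller prints this node as a leaf
    lines = []
    rest = [a for a in attrs if a != best]
    for value, group in sorted(split(rows, best).items()):
        prefix = "|   " * depth + f"{best} = {value}"
        sub = grow(group, rest, depth + 1)
        if sub is None:
            mean = sum(r["Revenue"] for r in group) / len(group)
            lines.append(f"{prefix}: {mean:.3f} {{count={len(group)}}}")
        else:
            lines.append(prefix)
            lines.extend(sub)
    return lines

tree_lines = grow(data, ["segment", "Sector"])
print("\n".join(tree_lines))
```

In this toy data, Sector only matters within the local segment, which is the kind of interaction a one-attribute-at-a-time weight cannot show.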

    In both cases, the tutorial processes that ship with RapidMiner are useful for understanding the basic setup and use of these operators.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • msacs09 Member Posts: 55 Contributor II
    @Telcontar120 Thank you, sir. Your understanding is exactly right: I need to "explore multivariate relationships affecting Revenue". May I request a sample or similar process that I can learn from, please?
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    edited December 2018
    Here's a simple cross-validation with a Decision Tree for a numerical dataset.  You'll need to substitute your own dataset, of course, and make sure Revenue is set as the label.
    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000-BETA2">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.1.000-BETA2" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="120"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.1.000-BETA2" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Validation" width="90" x="380" y="34">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="10"/>
            <parameter key="sampling_type" value="shuffled sampling"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.1.000-BETA2" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
                <parameter key="criterion" value="least_square"/>
                <parameter key="maximal_depth" value="10"/>
                <parameter key="apply_pruning" value="true"/>
                <parameter key="confidence" value="0.1"/>
                <parameter key="apply_prepruning" value="true"/>
                <parameter key="minimal_gain" value="0.01"/>
                <parameter key="minimal_leaf_size" value="2"/>
                <parameter key="minimal_size_for_split" value="4"/>
                <parameter key="number_of_prepruning_alternatives" value="3"/>
              </operator>
              <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
              <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="33" y="148">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Unlike a linear regression, a decision tree accepts both nominal and numerical attributes.</description>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <connect from_op="Performance" from_port="example set" to_port="test set results"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
              <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
            </process>
            <description align="center" color="transparent" colored="false" width="126">A cross validation including a decision tree.</description>
          </operator>
          <connect from_op="Retrieve Polynomial" from_port="output" to_op="Validation" to_port="example set"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <connect from_op="Validation" from_port="test result set" to_port="result 2"/>
          <connect from_op="Validation" from_port="performance 1" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
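
For readers unfamiliar with what the Validation operator above does, here is a hedged plain-Python sketch of 10-fold cross-validation with shuffled sampling. The learner is replaced by a trivial baseline that predicts the training mean, the label values are invented, and the seed of 120 merely echoes the `random_seed` in the process; none of this reproduces RapidMiner's internal sampling.

```python
import random

def k_fold_indices(n, k, seed=120):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(labels, k=10):
    """Average RMSE over k folds for a baseline that predicts the training mean."""
    folds = k_fold_indices(len(labels), k)
    scores = []
    for test in folds:
        held_out = set(test)
        # Train on everything outside the current fold (about 90 % when k = 10).
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        prediction = sum(train) / len(train)
        mse = sum((labels[i] - prediction) ** 2 for i in test) / len(test)
        scores.append(mse ** 0.5)
    return sum(scores) / len(scores)

# Hypothetical label values standing in for Revenue.
revenue = [0.010 + 0.001 * (i % 7) for i in range(50)]
folds = k_fold_indices(len(revenue), 10)
print(len(folds), sorted(len(f) for f in folds))
print(f"mean RMSE of baseline: {cross_validate(revenue):.4f}")
```

Comparing the tree's cross-validated error against a baseline like this one gives a sense of whether the splits it found are worth presenting.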

  • msacs09 Member Posts: 55 Contributor II
    edited November 2018
    @Telcontar120 Thank you very much, sir. Can you suggest the best way to represent this in a chart? Which charts in RapidMiner would help us interpret the output below for the business folks? Does the sample below say that med has the highest margin, since its count is 10? Basically, I want to extract the decision tree model and present it in a meaningful way.

           RegressionTree

    segment = global: 0.018 {count=4}
    segment = local
    |   Sector = AD: 0.016 {count=3}
    |   Sector = ES: 0.011 {count=2}
    segment = med: 0.020 {count=10}
  • msacs09 Member Posts: 55 Contributor II
    Telcontar120 Thank you sir. Is there a sample process around exploring multivariate relationships please?
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    The sample process I provided earlier in this thread is suitable for exploring and showing multivariate relationships via a decision tree.  You could also swap the learner and do something similar with a linear regression or GLM.
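
One possible way to present such a tree to business users is to flatten its leaves into a small table that can feed a bar chart. The sketch below parses the RegressionTree text posted earlier in this thread. It assumes that the number before the braces is the leaf's predicted Revenue and that `count` is the number of training examples in the leaf; the regular expression and the helper name `leaves` are made up for this illustration.

```python
import re

# Tree printout copied from the post above.
tree_text = """\
segment = global: 0.018 {count=4}
segment = local
|   Sector = AD: 0.016 {count=3}
|   Sector = ES: 0.011 {count=2}
segment = med: 0.020 {count=10}
"""

LEAF = re.compile(r"^(?P<cond>.+?): (?P<value>[-\d.]+) \{count=(?P<count>\d+)\}$")

def leaves(text):
    """Flatten an indented tree printout into (rule, predicted value, count) rows."""
    path, rows = [], []
    for line in text.splitlines():
        depth = line.count("|   ")
        body = line.replace("|   ", "").strip()
        del path[depth:]  # drop conditions left over from deeper or sibling branches
        match = LEAF.match(body)
        if match:
            rule = " AND ".join(path + [match["cond"]])
            rows.append((rule, float(match["value"]), int(match["count"])))
        else:
            path.append(body)  # inner node: remember its condition for the leaves below
    return rows

# Sort by predicted value so the highest-revenue rule comes first.
table = sorted(leaves(tree_text), key=lambda row: row[1], reverse=True)
for rule, value, count in table:
    print(f"{rule:<35} {value:.3f}  n={count}")
```

Under that reading, med ranks first because its predicted value of 0.020 is the largest, not because its count is 10; the count only says how many rows support the estimate.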
