Weird (?) Polynomial Regression results

ogjtech Member Posts: 5 Contributor II
Hi,

I'm trying to analyse a dataset of Airbnb accommodations to find out which attributes give a listing a high rating score (i.e. which attributes are most influential).
I do this by setting the 'review_scores_rating' attribute as the label, training a Polynomial Regression operator, applying the resulting model and looking at the results.
The predicted values, however, are very weird: the label typically lies between 0 and 100, yet the predicted values range from -200000 to +100000.
I'm completely new to data mining and don't really understand how the Polynomial Regression operator works.

Below is my process. What exactly am I doing wrong, or how should I go about this differently?
Also, please find attached the dataset I'm using for this process.
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve airbnb" width="90" x="45" y="34">
        <parameter key="repository_entry" value="data/airbnb"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.3.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="custom_filters"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="city.equals.NYC"/>
          <parameter key="filters_entry_key" value="number_of_reviews.ge.5"/>
          <parameter key="filters_entry_key" value="city.is_not_missing."/>
          <parameter key="filters_entry_key" value="number_of_reviews.is_not_missing."/>
          <parameter key="filters_entry_key" value="property_type.is_not_missing."/>
          <parameter key="filters_entry_key" value="room_type.is_not_missing."/>
          <parameter key="filters_entry_key" value="amenities.is_not_missing."/>
          <parameter key="filters_entry_key" value="accommodates.is_not_missing."/>
          <parameter key="filters_entry_key" value="bathrooms.is_not_missing."/>
          <parameter key="filters_entry_key" value="bed_type.is_not_missing."/>
          <parameter key="filters_entry_key" value="cancellation_policy.is_not_missing."/>
          <parameter key="filters_entry_key" value="cleaning_fee.is_not_missing."/>
          <parameter key="filters_entry_key" value="host_has_profile_pic.is_not_missing."/>
          <parameter key="filters_entry_key" value="host_identity_verified.is_not_missing."/>
          <parameter key="filters_entry_key" value="host_response_rate.is_not_missing."/>
          <parameter key="filters_entry_key" value="instant_bookable.is_not_missing."/>
          <parameter key="filters_entry_key" value="neighbourhood.is_not_missing."/>
          <parameter key="filters_entry_key" value="number_of_reviews.is_not_missing."/>
          <parameter key="filters_entry_key" value="review_scores_rating.is_not_missing."/>
          <parameter key="filters_entry_key" value="bedrooms.is_not_missing."/>
          <parameter key="filters_entry_key" value="beds.is_not_missing."/>
          <parameter key="filters_entry_key" value="review_scores_rating.ge.80"/>
        </list>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.3.001" expanded="true" height="82" name="Subprocess" width="90" x="380" y="34">
        <process expanded="true">
          <operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="accommodates|bathrooms|bed_type|bedrooms|beds|cancellation_policy|cleaning_fee|host_has_profile_pic|host_identity_verified|host_response_rate|id|instant_bookable|number_of_reviews|property_type|review_scores_rating|room_type"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="9.3.001" expanded="true" height="82" name="Subprocess (2)" width="90" x="313" y="136">
            <process expanded="true">
              <operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" height="82" name="Select Attributes (3)" width="90" x="45" y="136">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value="amenities|id"/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
              </operator>
              <operator activated="true" class="replace" compatibility="9.3.001" expanded="true" height="82" name="Replace" width="90" x="112" y="289">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="amenities"/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="nominal"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="file_path"/>
                <parameter key="block_type" value="single_value"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="single_value"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="replace_what" value="[{}&quot;]"/>
                <parameter key="replace_by" value=""/>
              </operator>
              <operator activated="true" class="split" compatibility="9.3.001" expanded="true" height="82" name="Split" width="90" x="179" y="442">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="amenities"/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="nominal"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="file_path"/>
                <parameter key="block_type" value="single_value"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="single_value"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="split_pattern" value=","/>
                <parameter key="split_mode" value="ordered_split"/>
              </operator>
              <operator activated="true" class="de_pivot" compatibility="9.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="313" y="442">
                <list key="attribute_name">
                  <parameter key="amenity" value="amenities.*"/>
                </list>
                <parameter key="index_attribute" value="nr"/>
                <parameter key="create_nominal_index" value="false"/>
                <parameter key="keep_missings" value="false"/>
              </operator>
              <operator activated="true" class="blending:pivot" compatibility="9.3.001" expanded="true" height="82" name="Pivot" width="90" x="447" y="442">
                <parameter key="group_by_attributes" value="id"/>
                <parameter key="column_grouping_attribute" value="amenity"/>
                <list key="aggregation_attributes">
                  <parameter key="nr" value="count"/>
                </list>
                <parameter key="use_default_aggregation" value="false"/>
                <parameter key="default_aggregation_function" value="first"/>
              </operator>
              <operator activated="true" class="replace_missing_values" compatibility="9.3.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="581" y="442">
                <parameter key="return_preprocessing_model" value="false"/>
                <parameter key="create_view" value="false"/>
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="default" value="zero"/>
                <list key="columns"/>
              </operator>
              <operator activated="true" class="rename_by_replacing" compatibility="9.3.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="715" y="442">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="replace_what" value="count.nr._*"/>
              </operator>
              <connect from_port="in 1" to_op="Select Attributes (3)" to_port="example set input"/>
              <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Replace" to_port="example set input"/>
              <connect from_op="Replace" from_port="example set output" to_op="Split" to_port="example set input"/>
              <connect from_op="Split" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
              <connect from_op="De-Pivot" from_port="example set output" to_op="Pivot" to_port="input"/>
              <connect from_op="Pivot" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
              <connect from_op="Replace Missing Values" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
              <connect from_op="Rename by Replacing" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Handle amenities attribute&lt;br/&gt;</description>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.3.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
            <parameter key="attribute_name" value="review_scores_rating"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.3.001" expanded="true" height="82" name="Join" width="90" x="581" y="34">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="id" value="id"/>
            </list>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="9.3.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="916" y="34">
            <parameter key="return_preprocessing_model" value="false"/>
            <parameter key="create_view" value="false"/>
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="bed_type|cancellation_policy|property_type|room_type|amenities|cleaning_fee|host_has_profile_pic|host_identity_verified|instant_bookable"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="coding_type" value="dummy coding"/>
            <parameter key="use_comparison_groups" value="false"/>
            <list key="comparison_groups"/>
            <parameter key="unexpected_value_handling" value="all 0 and warning"/>
            <parameter key="use_underscore_in_name" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="1117" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="bed_type_Airbed|bed_type_Couch|cancellation_policy_long_term|cancellation_policy_super_strict_30|cancellation_policy_super_strict_60|property_type_Bed &amp; Breakfast|property_type_Boat|property_type_Boutique hotel|property_type_Bungalow|property_type_Cabin|property_type_Camper/RV|property_type_Castle|property_type_Cave|property_type_Chalet|property_type_Dorm|property_type_Earth House|property_type_Guest suite|property_type_Guesthouse|property_type_Hostel|property_type_In-law|property_type_Serviced apartment|property_type_Tent|property_type_Timeshare|property_type_Tipi|property_type_Train|property_type_Treehouse|property_type_Vacation home|property_type_Villa|property_type_Yurt|room_type_Shared room|id"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <connect from_port="in 1" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="original" to_op="Subprocess (2)" to_port="in 1"/>
          <connect from_op="Subprocess (2)" from_port="out 1" to_op="Join" to_port="right"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
        <description align="center" color="transparent" colored="false" width="126">Data prepping&lt;br/&gt;</description>
      </operator>
      <operator activated="true" class="split_data" compatibility="9.3.001" expanded="true" height="103" name="Split Data" width="90" x="648" y="34">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
        <parameter key="sampling_type" value="shuffled sampling"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="polynomial_regression" compatibility="9.3.001" expanded="true" height="82" name="Polynomial Regression" width="90" x="916" y="34">
        <parameter key="max_iterations" value="5000"/>
        <parameter key="replication_factor" value="1"/>
        <parameter key="max_degree" value="5"/>
        <parameter key="min_coefficient" value="-100.0"/>
        <parameter key="max_coefficient" value="100.0"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model" width="90" x="983" y="289">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="performance_regression" compatibility="9.3.001" expanded="true" height="82" name="Performance" width="90" x="1050" y="136">
        <parameter key="main_criterion" value="first"/>
        <parameter key="root_mean_squared_error" value="true"/>
        <parameter key="absolute_error" value="false"/>
        <parameter key="relative_error" value="false"/>
        <parameter key="relative_error_lenient" value="false"/>
        <parameter key="relative_error_strict" value="false"/>
        <parameter key="normalized_absolute_error" value="false"/>
        <parameter key="root_relative_squared_error" value="false"/>
        <parameter key="squared_error" value="false"/>
        <parameter key="correlation" value="false"/>
        <parameter key="squared_correlation" value="true"/>
        <parameter key="prediction_average" value="false"/>
        <parameter key="spearman_rho" value="false"/>
        <parameter key="kendall_tau" value="false"/>
        <parameter key="skip_undefined_labels" value="true"/>
        <parameter key="use_example_weights" value="true"/>
      </operator>
      <connect from_op="Retrieve airbnb" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Subprocess" to_port="in 1"/>
      <connect from_op="Subprocess" from_port="out 1" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Polynomial Regression" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Polynomial Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <connect from_op="Performance" from_port="example set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
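
For readers who do not use RapidMiner, the "Handle amenities attribute" subprocess in the process above (Replace, Split, De-Pivot, Pivot, Replace Missing Values) can be read roughly as the sketch below. It is only an approximation in plain Python: the function name `amenity_indicators` and the sample rows are hypothetical, and the raw format of the `amenities` column is assumed from the `[{}"]` pattern used in the Replace operator.

```python
import re
from collections import Counter


def amenity_indicators(rows):
    """Turn (id, raw amenities string) pairs into one count column per amenity."""
    parsed = {}
    for listing_id, raw in rows:
        # Strip braces and double quotes, like the Replace operator's [{}"] pattern.
        cleaned = re.sub(r'[{}"]', "", raw)
        # Split on commas, like the Split operator; ignore empty pieces.
        names = [name for name in cleaned.split(",") if name]
        parsed[listing_id] = Counter(names)
    # Every listing gets the same set of columns, as after the Pivot step.
    columns = sorted({name for counts in parsed.values() for name in counts})
    # An amenity a listing does not have becomes 0 (Replace Missing Values, default "zero").
    return {
        listing_id: {column: counts.get(column, 0) for column in columns}
        for listing_id, counts in parsed.items()
    }


# Hypothetical sample rows, not taken from the attached dataset.
sample = [(1, '{TV,"Wireless Internet",Kitchen}'), (2, "{Kitchen,Heating}")]
table = amenity_indicators(sample)
print(table[1])
print(table[2])
```

Each listing ends up with one numeric column per amenity, which is the form the regression operator receives after the Join.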


Best Answer

  • lionelderkrikor Posts: 862   Unicorn
    Solution Accepted
    Hi @ogjtech,

    First, congratulations on supplying your process and dataset and on asking your question rigorously. You have understood the rules of this community well...

    Indeed, what you observe is strange.
    If you are new to data mining, I advise you to submit your prepared data to Auto Model. With this tool, RapidMiner will automatically create relevant models for your regression problem.
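
As for how the Polynomial Regression predictions can leave the 0 to 100 range so dramatically, one possible contributor (a rough sketch, not a confirmed diagnosis) is the size of individual polynomial terms. The snippet assumes the fitted model is a sum of terms of the form coefficient × attribute^degree, which is an assumption about the operator and not something stated in this thread. The bounds `max_coefficient=100.0` and `max_degree=5` come from the posted process; the helper `largest_term` and the attribute values are hypothetical.

```python
def largest_term(max_coefficient, max_degree, x):
    """Upper bound on |coefficient * x**degree| for one attribute value x, assuming |x| >= 1."""
    return abs(max_coefficient) * abs(x) ** max_degree


# Parameter values from the posted process: max_coefficient=100.0, max_degree=5.
# The attribute values below are hypothetical examples.
for name, x in [("dummy-coded flag", 1), ("bedrooms", 3), ("accommodates", 10)]:
    print(f"{name:>18}: up to {largest_term(100.0, 5, x):,.0f}")
```

Under that assumption, a single unscaled attribute with a value of 10 could contribute a term as large as 10,000,000, so predictions in the hundreds of thousands would not be surprising for a label that lies between 0 and 100.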

    Here are the steps:

     - Rename your attributes Cat(s) and Dog(s) to Cat and Dog in your Excel file. Indeed, it seems that brackets "()" in an attribute name raise an error in Auto Model.
     - In your process, set a "Breakpoint After" on the Subprocess ("Data prepping") operator (right-click on the operator).
     - Execute the process.
     - The process stops after the Subprocess and the Results panel displays your prepared dataset. Then click on Auto Model.
     - Then click on "Predict" and select your target attribute/label ("review_scores_rating").
     - Then click NEXT several times until you reach the final window.

    You will see the different performances of the proposed models.

    I personally submitted your prepared data to Auto Model and, with the Generalized Linear Model, I obtained relevant predictions (between 0 and 100).


    Hope this helps,

    Regards,

    Lionel



Answers

  • ogjtech Member Posts: 5 Contributor II
    Hi,

    Thank you for the swift reply, and sorry for the late approval.
    This did give me the results I wanted and really helped me continue my research!

    Many thanks again for the help!

    Regards,

    Jeroen