Generalized Linear Model (GLM)

bkruger Member Posts: 17 Contributor II
edited November 2018 in Help

Is there a standard process for building GLMs in RapidMiner, or can someone please point me to an example process?



  • am_das Member Posts: 2 Learner III


    Did you ever receive any content or an example of the GLM functionality in RapidMiner?

    I am looking for the same.




  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist

    It has been available for quite a while now. Just search for "glm" in the operators.




    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • am_das Member Posts: 2 Learner III

    Yes, I read through the operator documentation. I would like more information on how to specify beta constraints.

    My model has around 30 input variables, and I want to constrain the coefficients of a few variables to be positive (because I know the relationship is positive) by specifying a lower bound of 0 and an upper bound of +infinity. I am struggling to implement this in the parameters window (see screenshots):

    - What is the 'category' input right next to the attribute name?

    - How do I enter +infinity as the upper bound?




    [Screenshots: Capture_beta_constraint.JPG; Capture-documentation.JPG (screenshot of the documentation)]

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist

    Hi @am_das,


    That is a good question. The beta constraints parameter can be set up in your GLM operator.


    [Screenshot: Screen Shot 2018-03-28 at 11.43.27 AM.png]


    In my attached process, I used the Deals data with customer profiles. The input data has a categorical variable, "Payment Method". Suppose I know that the coefficient (beta) for its "credit card" category needs to be positive; I then set up the constraint for the coefficient of that category.

    upper_bounds (optional): the upper bound of the beta. It must be greater than or equal to lower_bounds. You need to enter a real number there.

    Hope this helps.
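
To illustrate what a lower and an upper bound do to a fitted coefficient, here is a minimal, self-contained Python sketch. It is not the RapidMiner or H2O implementation: it fits a plain least-squares model by projected gradient descent and clips each coefficient to its bounds after every step. The function name `fit_bounded_linear`, the toy data, and the use of `1e6` as a large finite stand-in for "+infinity" are all assumptions made for illustration.

```python
def fit_bounded_linear(X, y, lower, upper, lr=0.01, steps=5000):
    """Least-squares fit by projected gradient descent with per-coefficient bounds.

    Illustrative only; this is a hypothetical helper, not RapidMiner's GLM.
    """
    n, p = len(X), len(X[0])
    # Start at 0 for every coefficient, moved inside its bounds if necessary.
    beta = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(steps):
        # Residuals of the current fit: prediction minus target.
        resid = [sum(b * x for b, x in zip(beta, row)) - yi
                 for row, yi in zip(X, y)]
        # Gradient of the mean squared error with respect to each coefficient.
        grad = [2.0 / n * sum(r * row[j] for r, row in zip(resid, X))
                for j in range(p)]
        # Gradient step, then projection back onto [lower, upper].
        beta = [min(max(b - lr * g, lo), hi)
                for b, g, lo, hi in zip(beta, grad, lower, upper)]
    return beta


# Toy data generated from y = 2*x1 - 1*x2 (no noise).
X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [2, -1, 1, 3, 0]

BIG = 1e6  # large finite number used in place of +/- infinity (assumption)

# Effectively unconstrained: recovers approximately [2.0, -1.0].
free = fit_bounded_linear(X, y, lower=[-BIG, -BIG], upper=[BIG, BIG])

# Second coefficient constrained to be non-negative: it ends up at the bound 0.0.
bounded = fit_bounded_linear(X, y, lower=[-BIG, 0.0], upper=[BIG, BIG])

print([round(b, 3) for b in free])
print([round(b, 3) for b in bounded])
```

When the data favour a negative coefficient, the constrained fit pins that coefficient at the lower bound and the other coefficients shift to compensate.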


    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
    <operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve Deals" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Deals"/>
    </operator>
    <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="124" name="Generalized Linear Model" width="90" x="179" y="34">
    <parameter key="specify_beta_constraints" value="true"/>
    <list key="beta_constraints">
    <parameter key="Payment Method.credit card" value="0\.01.5\.0.0\.0.0\.0"/>
    </list>
    <list key="expert_parameters"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve Deals-Testset" width="90" x="179" y="238">
    <parameter key="repository_entry" value="//Samples/data/Deals-Testset"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="380" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="8.1.001" expanded="true" height="82" name="Performance" width="90" x="514" y="85">
    <list key="class_weights"/>
    </operator>
    <connect from_op="Retrieve Deals" from_port="output" to_op="Generalized Linear Model" to_port="training set"/>
    <connect from_op="Generalized Linear Model" from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_op="Retrieve Deals-Testset" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
    <connect from_op="Performance" from_port="performance" to_port="result 1"/>
    <connect from_op="Performance" from_port="example set" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>
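
The `beta_constraints` entry in the process above looks cryptic because the key and value appear to be packed tuples: fields are separated by `.`, and a literal dot inside a field is escaped as `\.`. The following Python sketch decodes both strings under that assumption. The helper name `split_tuple` is hypothetical, and reading the first two decoded numbers as the lower and upper bound is a guess based on the post (the operator documentation should be checked for the actual field order).

```python
def split_tuple(encoded, sep=".", esc="\\"):
    """Split a string on unescaped separators, un-escaping as it goes."""
    parts, current, i = [], [], 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == esc and i + 1 < len(encoded):
            # Escaped character: keep the next character literally.
            current.append(encoded[i + 1])
            i += 2
        elif ch == sep:
            # Unescaped separator: close the current field.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


# Key and value copied from the beta_constraints entry in the process XML.
key_fields = split_tuple("Payment Method.credit card")
value_fields = split_tuple(r"0\.01.5\.0.0\.0.0\.0")

print(key_fields)    # ['Payment Method', 'credit card']
print(value_fields)  # ['0.01', '5.0', '0.0', '0.0']

# Assumption: the first two numbers are the lower and upper bound.
lower, upper = float(value_fields[0]), float(value_fields[1])
print(lower <= upper)  # True
```

Read this way, the key pairs the attribute name with the category asked about earlier in the thread ("credit card" within "Payment Method"), and the value holds four numbers, of which 0.01 and 5.0 would be a small positive lower bound and a finite upper bound.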


