Problem with generalized linear model (lambda search)

scottchung64 · March 2018

Hi all,

I'm trying to do classification using a generalized linear model.

With the default settings, the lambda value is chosen by H2O (as described in the documentation).

However, I found that performance is much better when I use lambda search.

I don't understand the difference between these two methods.

Does the better performance from lambda search come from overfitting?

Thanks!

Best,

Scott

yyhuang · March 2018

Hi @scottchung64,

You are correct. Lambda search is used to control the regularization in order to avoid overfitting. Regularization introduces penalties into the model-building process, and GLM needs to find good values for the two regularization parameters, alpha and lambda. The lambda parameter controls the amount of regularization applied to the model.
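
For reference, the penalty that alpha and lambda control can be written out explicitly. The form below is the elastic-net penalty as given in the H2O GLM documentation; the symbols $\beta$ (coefficient vector, intercept excluded) and $p$ (number of coefficients) are notation introduced here, not taken from this thread.

```latex
\lambda \left( \alpha \lVert \beta \rVert_1 + \frac{1-\alpha}{2} \lVert \beta \rVert_2^2 \right)
= \lambda \left( \alpha \sum_{j=1}^{p} |\beta_j| + \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 \right)
```

A larger lambda means a stronger overall penalty, while alpha only decides how that penalty is split between the L1 and L2 terms.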

When you activate lambda search in the GLM operator, it takes longer to find the best parameter values.
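
To make the idea concrete, here is a minimal sketch of what a lambda search does in principle: fit the model once per lambda on a decreasing, log-spaced path and keep the value that scores best on held-out data. This is not H2O's implementation. The one-dimensional ridge fit, the function names, and the default `lambda_min_ratio` are all assumptions chosen to keep the example self-contained.

```python
import math
import random


def lambda_path(lambda_max, lambda_min_ratio=1e-4, n_lambdas=10):
    """Log-spaced values from lambda_max down to lambda_max * lambda_min_ratio."""
    if n_lambdas < 2:
        raise ValueError("n_lambdas must be at least 2")
    step = math.log(lambda_min_ratio) / (n_lambdas - 1)
    return [lambda_max * math.exp(i * step) for i in range(n_lambdas)]


def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge fit without intercept (a stand-in for a GLM fit)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(xs, ys, beta):
    """Mean squared error of the prediction beta * x."""
    return sum((y - beta * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def lambda_search(train, valid, lambdas):
    """Fit once per lambda; return (lambda, validation error, beta) of the best fit."""
    best = None
    for lam in lambdas:
        beta = fit_ridge_1d(train[0], train[1], lam)
        err = mse(valid[0], valid[1], beta)
        if best is None or err < best[1]:
            best = (lam, err, beta)
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy data is reproducible
    xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    train = (xs[:30], ys[:30])
    valid = (xs[30:], ys[30:])
    lam, err, beta = lambda_search(train, valid, lambda_path(10.0))
    print(f"best lambda={lam:.5f}  validation MSE={err:.4f}  beta={beta:.3f}")
```

The extra run time comes from the repeated fits. Because the winning lambda is picked on the validation rows rather than the training rows, a gain from the search is not automatically a sign of overfitting, provided the final performance is measured on data that took no part in the selection.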

YY

staskhalitov · September 2018

Is it possible to initiate an alpha search?

I see this: "Providing multiple alpha values via the advanced parameters triggers a search."

But how do I actually provide multiple values? What is the format?

yyhuang · September 2018

Hi @staskhalitov,

Good point. You will need to edit the "expert parameters" list, as in this process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
      </operator>
      <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="124" name="Generalized Linear Model" origin="GENERATED_TUTORIAL" width="90" x="179" y="34">
        <parameter key="lambda_search" value="true"/>
        <parameter key="number_of_lambdas" value="3"/>
        <parameter key="alpha" value="0.6"/>
        <list key="beta_constraints"/>
        <list key="expert_parameters">
          <parameter key="additional_alphas" value="0.2"/>
          <parameter key="additional_alphas" value="0.1"/>
        </list>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="380" y="34">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_classification" compatibility="9.0.002" expanded="true" height="82" name="Performance" origin="GENERATED_TUTORIAL" width="90" x="514" y="85">
        <list key="class_weights"/>
      </operator>
      <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Generalized Linear Model" to_port="training set"/>
      <connect from_op="Generalized Linear Model" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Generalized Linear Model" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <connect from_op="Performance" from_port="example set" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>

Hope it helps.

YY

staskhalitov · September 2018

So I tried your XML, but it seems the model just uses whatever value of alpha is in the initial settings (0.6 in your example).

It doesn't look like it considered the additional alphas, 0.2 and 0.1, from the expert parameters.

How do I actually initiate a search for alpha, per this description?

alpha
Description: The alpha parameter controls the distribution between the L1 (Lasso) and L2 (Ridge regression) penalties. A value of 1.0 for alpha represents Lasso, and an alpha value of 0.0 produces Ridge regression. Providing multiple alpha values via the advanced parameters triggers a search. Default is 0.0 for the L-BFGS solver, else 0.5.
Range: real; 0.0-1.0
Optional: true
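
A small sketch of how alpha mixes the two penalties may help when reading the description above. It evaluates the elastic-net penalty from the H2O GLM documentation, lambda * (alpha * L1 + (1 - alpha) / 2 * L2^2), for a fixed coefficient vector. The function name and the example coefficients are made up for illustration.

```python
def elastic_net_penalty(beta, alpha, lam):
    """Return lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2).

    beta is the list of coefficients (intercept excluded).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


if __name__ == "__main__":
    beta = [3.0, -4.0]  # hypothetical coefficients
    for alpha in (0.0, 0.5, 1.0):
        # alpha = 0.0 is pure Ridge (L2), alpha = 1.0 is pure Lasso (L1)
        print(alpha, elastic_net_penalty(beta, alpha, lam=0.1))
```

With alpha fixed, lambda scales the whole penalty, which is why lambda and alpha can be searched independently.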

If I leave the initial alpha (0.6) blank and have additional alphas in the expert parameters, I get an error.

yyhuang · September 2018

Hi @staskhalitov,

Thanks for the follow-up! Great catch. I double-checked the model descriptions, and unfortunately the additional alpha values are not used for an alpha search. We are investigating the bug. @phellinger

In the meantime, you can run a grid search manually with a loop. Here is an example:

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.0.002" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="313" y="187">
        <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="9.0.002" expanded="true" height="68" name="Generate Data" width="90" x="179" y="34">
        <parameter key="target_function" value="grid function"/>
        <parameter key="number_examples" value="5"/>
        <parameter key="number_of_attributes" value="1"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="1.0"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" compatibility="9.0.002" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="34"/>
      <operator activated="true" class="concurrency:loop_values" compatibility="9.0.002" expanded="true" height="124" name="Loop Values" width="90" x="514" y="34">
        <parameter key="attribute" value="att1"/>
        <parameter key="iteration_macro" value="alpha"/>
        <parameter key="enable_parallel_execution" value="false"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.0.002" expanded="true" height="145" name="Cross Validation" width="90" x="112" y="34">
            <process expanded="true">
              <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="124" name="Generalized Linear Model" origin="GENERATED_TUTORIAL" width="90" x="112" y="85">
                <parameter key="alpha" value="%{alpha}"/>
                <parameter key="standardize" value="false"/>
                <list key="beta_constraints"/>
                <list key="expert_parameters">
                  <parameter key="additional_alphas" value="0.3"/>
                  <parameter key="additional_alphas" value="0.1"/>
                  <parameter key="additional_alphas" value="0.55"/>
                  <parameter key="keep_cross_validation_predictions" value="true"/>
                </list>
              </operator>
              <connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/>
              <connect from_op="Generalized Linear Model" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="112" y="34">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="9.0.002" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
                <parameter key="classification_error" value="true"/>
                <parameter key="kappa" value="true"/>
                <parameter key="AUC" value="true"/>
                <parameter key="recall" value="true"/>
                <parameter key="f_measure" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="performance_to_data" compatibility="9.0.002" expanded="true" height="82" name="Performance to Data" width="90" x="313" y="136"/>
          <operator activated="true" class="generate_attributes" compatibility="9.0.002" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="85">
            <list key="function_descriptions">
              <parameter key="ALPHA" value="%{alpha}"/>
            </list>
          </operator>
          <connect from_port="input 2" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Cross Validation" from_port="model" to_port="output 1"/>
          <connect from_op="Cross Validation" from_port="performance 1" to_op="Performance to Data" to_port="performance vector"/>
          <connect from_op="Performance to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Performance to Data" from_port="performance vector" to_port="output 3"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="output 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="source_input 3" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="21"/>
          <portSpacing port="sink_output 3" spacing="42"/>
          <portSpacing port="sink_output 4" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Loop Values" to_port="input 2"/>
      <connect from_op="Generate Data" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
      <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
      <connect from_op="Loop Values" from_port="output 2" to_port="result 2"/>
      <connect from_op="Loop Values" from_port="output 3" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
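
For readers who find the process XML hard to follow, the logic of the loop can be sketched in a few lines of Python: build a grid of alpha values, score each one with cross-validation, tag every score with its alpha, and compare. This is only an outline under assumptions. The evenly spaced grid is a guess at what the Generate Data grid function produces, `cv_score` is a hypothetical callable standing in for the Cross Validation operator, and the toy scoring function exists only so the sketch runs.

```python
def alpha_grid(n_values=5, lower=0.0, upper=1.0):
    """Evenly spaced alpha candidates between lower and upper, inclusive."""
    if n_values < 2:
        raise ValueError("n_values must be at least 2")
    step = (upper - lower) / (n_values - 1)
    return [lower + i * step for i in range(n_values)]


def search_alpha(alphas, cv_score):
    """Score every alpha and return (best (alpha, score) pair, all pairs).

    cv_score is a hypothetical callable: alpha -> cross-validated score,
    where a higher score is better.
    """
    results = [(alpha, cv_score(alpha)) for alpha in alphas]
    best = max(results, key=lambda pair: pair[1])
    return best, results


if __name__ == "__main__":
    # Toy stand-in for "train a GLM with this alpha and cross-validate it".
    def toy_score(alpha):
        return -(alpha - 0.3) ** 2

    best, results = search_alpha(alpha_grid(), toy_score)
    for alpha, score in results:
        print(f"alpha={alpha:.2f}  score={score:.4f}")
    print("best alpha:", best[0])
```

In the process, the table collected at the second result port plays the role of `results`: one row of performance values per alpha, labelled through the ALPHA attribute.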

Best,

YY

sgenzer · September 2018

Altair RapidMiner Community

Declined · Last Updated March 2020
