Process freezes in the optimization operator


Hi,
My process starts logging the same sentence over and over again. Sometimes this happens at 20% of the optimization, sometimes at 30%, and sometimes at 99%, which is extremely annoying after hours of waiting for it to finish. Here are the log entries:
May 7, 2021 12:40:02 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6931 imp=.1E1 bdf=.0E0
May 7, 2021 12:40:04 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6928 imp=.1E1 bdf=.56E-1
May 7, 2021 12:40:06 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.693 imp=.1E1 bdf=.28E-1
May 7, 2021 12:40:07 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6931 imp=.1E1 bdf=.0E0
May 7, 2021 12:40:09 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6928 imp=.1E1 bdf=.56E-1
May 7, 2021 12:40:11 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.693 imp=.1E1 bdf=.28E-1
May 7, 2021 12:40:12 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6931 imp=.1E1 bdf=.0E0
May 7, 2021 12:40:14 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6928 imp=.1E1 bdf=.56E-1
May 7, 2021 12:40:16 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.693 imp=.1E1 bdf=.28E-1
May 7, 2021 12:40:17 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6931 imp=.1E1 bdf=.0E0
May 7, 2021 12:40:19 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6928 imp=.1E1 bdf=.56E-1
May 7, 2021 12:40:21 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.693 imp=.1E1 bdf=.28E-1
May 7, 2021 12:40:22 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6931 imp=.1E1 bdf=.0E0
May 7, 2021 12:40:24 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6928 imp=.1E1 bdf=.56E-1
May 7, 2021 12:40:26 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.693 imp=.1E1 bdf=.28E-1
May 7, 2021 12:40:27 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6931 imp=.1E1 bdf=.0E0
May 7, 2021 12:40:29 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6928 imp=.1E1 bdf=.56E-1
May 7, 2021 12:40:31 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.693 imp=.1E1 bdf=.28E-1
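In case it helps, the log itself suggests the solver is not actually progressing: iter stays at 0 and obj just cycles through the same three values instead of decreasing. A minimal check over the lines above (my own standard-library Python sketch, not part of the process):

import re

# Parse a few of the pasted H2O log lines and show that the iteration counter
# never advances and the objective only cycles among the same three values.
log_lines = [
    "May 7, 2021 12:40:02 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6931 imp=.1E1 bdf=.0E0",
    "May 7, 2021 12:40:04 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6928 imp=.1E1 bdf=.56E-1",
    "May 7, 2021 12:40:06 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.693 imp=.1E1 bdf=.28E-1",
    "May 7, 2021 12:40:07 PM INFO: H2O: 2% - iter=0 lmb=.0E0 obj=0.6931 imp=.1E1 bdf=.0E0",
]

pattern = re.compile(r"iter=(\d+).*?obj=([\d.]+)")
iters = set()
objs = []
for line in log_lines:
    match = pattern.search(line)
    if match:
        iters.add(int(match.group(1)))
        objs.append(float(match.group(2)))

print("iteration counters seen:", iters)                 # {0}
print("distinct objective values:", sorted(set(objs)))   # [0.6928, 0.693, 0.6931]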
This only seems to happen when I'm optimizing a logistic regression model. Please help me, I don't know what to do. Here is my process.
Thank you,
Ana
<?xml version="1.0" encoding="UTF-8"?><process version="9.9.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.9.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="false" class="read_csv" compatibility="9.9.000" expanded="true" height="68" name="Read train" width="90" x="45" y="340"> <parameter key="csv_file" value="C:/Users/ASUS/Documents/Mestrado BBC/tese/4. Feature Extraction/Lesion_data/lesion_trainSet.csv"/> <parameter key="column_separators" value=","/> <parameter key="trim_lines" value="false"/> <parameter key="use_quotes" value="true"/> <parameter key="quotes_character" value="""/> <parameter key="escape_character" value="\"/> <parameter key="skip_comments" value="false"/> <parameter key="comment_characters" value="#"/> <parameter key="starting_row" value="1"/> <parameter key="parse_numbers" value="true"/> <parameter key="decimal_character" value="."/> <parameter key="grouped_digits" value="false"/> <parameter key="grouping_character" value=","/> <parameter key="infinity_representation" value=""/> <parameter key="date_format" value=""/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"/> <parameter key="read_not_matching_values_as_missings" value="true"/> </operator> <operator activated="false" class="read_csv" compatibility="9.9.000" expanded="true" height="68" name="Read rad1" width="90" x="45" y="136"> <parameter key="csv_file" value="C:/Users/ASUS/Documents/Mestrado BBC/tese/4. Feature Extraction/Lesion_data/lesion_rad1_features.csv"/> <parameter key="column_separators" value=","/> <parameter key="trim_lines" value="false"/> <parameter key="use_quotes" value="true"/> <parameter key="quotes_character" value="""/> <parameter key="escape_character" value="\"/> <parameter key="skip_comments" value="false"/> <parameter key="comment_characters" value="#"/> <parameter key="starting_row" value="1"/> <parameter key="parse_numbers" value="true"/> <parameter key="decimal_character" value="."/> <parameter key="grouped_digits" value="false"/> <parameter key="grouping_character" value=","/> <parameter key="infinity_representation" value=""/> <parameter key="date_format" value=""/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"/> <parameter key="read_not_matching_values_as_missings" value="true"/> </operator> <operator activated="false" class="read_csv" compatibility="9.9.000" expanded="true" height="68" name="Read rad2" width="90" x="45" y="238"> <parameter key="csv_file" value="C:/Users/ASUS/Documents/Mestrado BBC/tese/4. 
Feature Extraction/Lesion_data/lesion_rad2_features.csv"/> <parameter key="column_separators" value=","/> <parameter key="trim_lines" value="false"/> <parameter key="use_quotes" value="true"/> <parameter key="quotes_character" value="""/> <parameter key="escape_character" value="\"/> <parameter key="skip_comments" value="false"/> <parameter key="comment_characters" value="#"/> <parameter key="starting_row" value="1"/> <parameter key="parse_numbers" value="true"/> <parameter key="decimal_character" value="."/> <parameter key="grouped_digits" value="false"/> <parameter key="grouping_character" value=","/> <parameter key="infinity_representation" value=""/> <parameter key="date_format" value=""/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"/> <parameter key="read_not_matching_values_as_missings" value="true"/> </operator> <operator activated="false" class="python_scripting:execute_python" compatibility="9.8.000" expanded="true" height="145" name="Stability analysis" width="90" x="179" y="187"> <parameter key="script" value="import pandas as pd import re def icc(ratings, model='twoway', type='agreement', unit='single', confidence_level=0.95): import numpy as np from scipy.stats import f ratings = np.asarray(ratings) if (model, type, unit) not in {('oneway', 'agreement', 'single'), ('twoway', 'agreement', 'single'), ('twoway', 'consistency', 'single'), ('oneway', 'agreement', 'average'), ('twoway', 'agreement', 'average'), ('twoway', 'consistency', 'average'), }: raise ValueError('Using not implemented configuration.') n_subjects, n_raters = ratings.shape if n_subjects < 1: raise ValueError('Using one subject only. 
Add more subjects to calculate ICC.') #print("Ratings:", ratings) #print("n_subjects:", n_subjects) #print("n_raters:", n_raters) SStotal = np.var(ratings, ddof=1) * (n_subjects * n_raters - 1) alpha = 1 - confidence_level MSr = np.var(np.mean(ratings, axis=1), ddof=1) * n_raters MSw = np.sum(np.var(ratings, axis=1, ddof=1) / n_subjects) MSc = np.var(np.mean(ratings, axis=0), ddof=1) * n_subjects MSe = (SStotal - MSr * (n_subjects - 1) - MSc * (n_raters - 1)) / ((n_subjects - 1) * (n_raters - 1)) # Single Score ICCs if unit == 'single': if model == 'oneway': # ICC(1,1) One-Way Random, absolute coeff = (MSr - MSw) / (MSr + (n_raters - 1) * MSw) Fvalue = MSr / MSw df1 = n_subjects - 1 df2 = n_subjects * (n_raters - 1) pvalue = 1 - f.cdf(Fvalue, df1, df2) # Confidence interval FL = Fvalue / f.ppf(1 - alpha, df1, df2) FU = Fvalue * f.ppf(1 - alpha, df2, df1) lbound = (FL - 1) / (FL + (n_raters - 1)) ubound = (FU - 1) / (FU + (n_raters - 1)) elif model == 'twoway': if type == 'agreement': # ICC(2,1) Two-Way Random, absolute coeff = (MSr - MSe) / (MSr + (n_raters - 1) * MSe + (n_raters / n_subjects) * (MSc - MSe)) Fvalue = MSr / MSe df1 = n_subjects - 1 df2 = (n_subjects - 1) * (n_raters - 1) pvalue = 1 - f.cdf(Fvalue, df1, df2) # Confidence interval Fj = MSc / MSe vn = (n_raters - 1) * (n_subjects - 1) * ( (n_raters * coeff * Fj + n_subjects * (1 + (n_raters - 1) * coeff) - n_raters * coeff)) ** 2 vd = (n_subjects - 1) * n_raters ** 2 * coeff ** 2 * Fj ** 2 + ( n_subjects * (1 + (n_raters - 1) * coeff) - n_raters * coeff) ** 2 v = vn / vd FL = f.ppf(1 - alpha, n_subjects - 1, v) FU = f.ppf(1 - alpha, v, n_subjects - 1) lbound = (n_subjects * (MSr - FL * MSe)) / (FL * ( n_raters * MSc + (n_raters * n_subjects - n_raters - n_subjects) * MSe) + n_subjects * MSr) ubound = (n_subjects * (FU * MSr - MSe)) / (n_raters * MSc + ( n_raters * n_subjects - n_raters - n_subjects) * MSe + n_subjects * FU * MSr) elif type == 'consistency': # ICC(3,1) Two-Way Mixed, consistency coeff = (MSr - MSe) / (MSr + (n_raters - 1) * MSe) Fvalue = MSr / MSe df1 = n_subjects - 1 df2 = (n_subjects - 1) * (n_raters - 1) pvalue = 1 - f.cdf(Fvalue, df1, df2) # Confidence interval FL = Fvalue / f.ppf(1 - alpha, df1, df2) FU = Fvalue * f.ppf(1 - alpha, df2, df1) lbound = (FL - 1) / (FL + (n_raters - 1)) ubound = (FU - 1) / (FU + (n_raters - 1)) elif unit == 'average': if model == 'oneway': # ICC(1,k) One-Way Random, absolute coeff = (MSr - MSw) / MSr Fvalue = MSr / MSw df1 = n_subjects - 1 df2 = n_subjects * (n_raters - 1) pvalue = 1 - f.cdf(Fvalue, df1, df2) # Confidence interval FL = (MSr / MSw) / f.ppf(1 - alpha, df1, df2) FU = (MSr / MSw) * f.ppf(1 - alpha, df2, df1) lbound = 1 - 1 / FL ubound = 1 - 1 / FU elif model == 'twoway': if type == 'agreement': # ICC(2,k) Two-Way Random, absolute coeff = (MSr - MSe) / (MSr + (MSc - MSe) / n_subjects) Fvalue = MSr / MSe df1 = n_subjects - 1 df2 = (n_subjects - 1) * (n_raters - 1) pvalue = 1 - f.cdf(Fvalue, df1, df2) # Confidence interval icc2 = (MSr - MSe) / (MSr + (n_raters - 1) * MSe + (n_raters / n_subjects) * (MSc - MSe)) Fj = MSc / MSe vn = (n_raters - 1) * (n_subjects - 1) * ( (n_raters * icc2 * Fj + n_subjects * (1 + (n_raters - 1) * icc2) - n_raters * icc2)) ** 2 vd = (n_subjects - 1) * n_raters ** 2 * icc2 ** 2 * Fj ** 2 + ( n_subjects * (1 + (n_raters - 1) * icc2) - n_raters * icc2) ** 2 v = vn / vd FL = f.ppf(1 - alpha, n_subjects - 1, v) FU = f.ppf(1 - alpha, v, n_subjects - 1) lb2 = (n_subjects * (MSr - FL * MSe)) / (FL * ( n_raters * MSc + (n_raters * n_subjects - 
n_raters - n_subjects) * MSe) + n_subjects * MSr) ub2 = (n_subjects * (FU * MSr - MSe)) / (n_raters * MSc + ( n_raters * n_subjects - n_raters - n_subjects) * MSe + n_subjects * FU * MSr) lbound = lb2 * n_raters / (1 + lb2 * (n_raters - 1)) ubound = ub2 * n_raters / (1 + ub2 * (n_raters - 1)) elif type == 'consistency': # ICC(3,k) Two-Way Mixed, consistency coeff = (MSr - MSe) / MSr Fvalue = MSr / MSe df1 = n_subjects - 1 df2 = (n_subjects - 1) * (n_raters - 1) pvalue = 1 - f.cdf(Fvalue, df1, df2) # Confidence interval FL = Fvalue / f.ppf(1 - alpha, df1, df2) FU = Fvalue * f.ppf(1 - alpha, df2, df1) lbound = 1 - 1 / FL ubound = 1 - 1 / FU return coeff, Fvalue, df1, df2, pvalue, lbound, ubound def rm_main(rad1, rad2, train): patientIDs = list(train['ID']) rad1_p = list(rad1['ID']) rad2_p = list(rad2['ID']) both_rad = [value for value in rad1_p if value in rad2_p] both_rad = [value for value in both_rad if value in patientIDs] rad1 = rad1.set_index('ID') rad2 = rad2.set_index('ID') df_rad1 = rad1.loc[both_rad, :] df_rad2 = rad2.loc[both_rad, :] feature_names = df_rad1.columns[1:-1] robustness_analysis = {} for i in feature_names: a = df_rad1[i] b = df_rad2[i] features = pd.concat([a, b], axis=1) d = {} coeff, Fvalue, df1, df2, pvalue, lbound, ubound = icc(features) d['coeff'] = coeff d['Fvalue'] = Fvalue d['df1'] = df1 d['df2'] = df2 d['pvalue'] = pvalue d['lbound'] = lbound d['ubound'] = ubound robustness_analysis[i] = d robust = pd.DataFrame(robustness_analysis, columns = robustness_analysis.keys(), index = ['coeff', 'Fvalue', 'df1', 'df2', 'pvalue', 'lbound', 'ubound']) not_keep = list(robust.loc['lbound'] <= 0.8) features_to_eliminate = feature_names[not_keep] elm = list(features_to_eliminate) for col in train.columns: if col in elm: del train[col] print("Number of patients used in the stability analysis:", len(both_rad)) print("Number of features eliminated:", len(elm)) t = 0 d = 0 a = 0 for i in list(features_to_eliminate): if re.search('T2W', i): t += 1 elif re.search('DWI', i): d += 1 elif re.search('ADC', i): a += 1 print('T2W:', t) print('DWI:', d) print('ADC:', a) return train"/> <parameter key="notebook_cell_tag_filter" value=""/> <parameter key="use_default_python" value="true"/> <parameter key="package_manager" value="conda (anaconda)"/> <parameter key="use_macros" value="false"/> </operator> <operator activated="true" class="read_csv" compatibility="9.9.000" expanded="true" height="68" name="Read train (2)" width="90" x="45" y="34"> <parameter key="csv_file" value="C:/Users/ASUS/Documents/Mestrado BBC/tese/4. 
Feature Extraction/Gland_data/gland_trainSet_stable.csv"/> <parameter key="column_separators" value=","/> <parameter key="trim_lines" value="false"/> <parameter key="use_quotes" value="true"/> <parameter key="quotes_character" value="""/> <parameter key="escape_character" value="\"/> <parameter key="skip_comments" value="false"/> <parameter key="comment_characters" value="#"/> <parameter key="starting_row" value="1"/> <parameter key="parse_numbers" value="true"/> <parameter key="decimal_character" value="."/> <parameter key="grouped_digits" value="false"/> <parameter key="grouping_character" value=","/> <parameter key="infinity_representation" value=""/> <parameter key="date_format" value=""/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"/> <parameter key="read_not_matching_values_as_missings" value="true"/> </operator> <operator activated="true" class="filter_examples" compatibility="9.9.000" expanded="true" height="103" name="Remove missing data" width="90" x="179" y="34"> <parameter key="parameter_expression" value=""/> <parameter key="condition_class" value="no_missing_attributes"/> <parameter key="invert_filter" value="false"/> <list key="filters_list"/> <parameter key="filters_logic_and" value="true"/> <parameter key="filters_check_metadata" value="true"/> </operator> <operator activated="true" class="set_role" compatibility="9.9.000" expanded="true" height="82" name="Set Role (2)" width="90" x="313" y="34"> <parameter key="attribute_name" value="ID"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"> <parameter key="ID" value="id"/> <parameter key="Target" value="label"/> </list> </operator> <operator activated="true" class="multiply" compatibility="9.9.000" expanded="true" height="103" name="Multiply (2)" width="90" x="447" y="34"/> <operator activated="true" class="sample" compatibility="9.9.000" expanded="true" height="82" name="Sample (3)" width="90" x="581" y="34"> <parameter key="sample" value="absolute"/> <parameter key="balance_data" value="true"/> <parameter key="sample_size" value="100"/> <parameter key="sample_ratio" value="0.1"/> <parameter key="sample_probability" value="0.1"/> <list key="sample_size_per_class"> <parameter key="False" value="51"/> <parameter key="True" value="51"/> </list> <list key="sample_ratio_per_class"/> <list key="sample_probability_per_class"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> </operator> <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.9.000" expanded="true" height="145" name="with Downsampling" width="90" x="715" y="34"> <list key="parameters"> <parameter key="Remove Correlated Attributes.correlation" value="[0.4;1.0;6;linear]"/> <parameter key="MRMR-FS.k" value="[10;24;7;linear]"/> <parameter key="Logistic Regression.alpha" value="[0.0;1.0;5;linear]"/> </list> <parameter key="error_handling" value="fail on error"/> <parameter key="log_performance" value="true"/> <parameter key="log_all_criteria" value="false"/> <parameter key="synchronize" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="concurrency:cross_validation" compatibility="9.9.000" 
expanded="true" height="145" name="Cross Validation (2)" width="90" x="45" y="34"> <parameter key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="4"/> <parameter key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="subprocess" compatibility="9.9.000" expanded="true" height="82" name="Remove outliers (2)" width="90" x="45" y="34"> <process expanded="true"> <operator activated="true" class="multiply" compatibility="9.9.000" expanded="true" height="103" name="Multiply (3)" width="90" x="45" y="34"/> <operator activated="true" class="normalize" compatibility="9.9.000" expanded="true" height="103" name="Normalize (2)" width="90" x="112" y="187"> <parameter key="return_preprocessing_model" value="false"/> <parameter key="create_view" value="false"/> <parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="real"/> <parameter key="block_type" value="value_series"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_series_end"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="method" value="Z-transformation"/> <parameter key="min" value="0.0"/> <parameter key="max" value="1.0"/> <parameter key="allow_negative_values" value="false"/> </operator> <operator activated="true" class="detect_outlier_lof" compatibility="9.9.000" expanded="true" height="82" name="Detect Outlier (LOF)" width="90" x="246" y="187"> <parameter key="minimal_points_lower_bound" value="10"/> <parameter key="minimal_points_upper_bound" value="20"/> <parameter key="distance_function" value="euclidian distance"/> </operator> <operator activated="true" class="python_scripting:execute_python" compatibility="9.8.000" expanded="true" height="124" name="Execute Python (3)" width="90" x="380" y="34"> <parameter key="script" value="import pandas # rm_main is a mandatory function, # the number of arguments has to be the number of input ports (can be none), # or the number of input ports plus one if "use macros" parameter is set # if you want to use macros, use this instead and check "use macros" parameter: #def rm_main(data,macros): def rm_main(ori, norm): ids = list(norm.loc[norm['outlier']<2, 'ID']) data = ori.set_index('ID', drop = False).loc[ids,:] return data"/> <parameter key="notebook_cell_tag_filter" value=""/> <parameter key="use_default_python" value="true"/> <parameter key="package_manager" value="conda (anaconda)"/> <parameter key="use_macros" value="false"/> </operator> <operator activated="true" class="set_role" compatibility="9.9.000" expanded="true" height="82" name="Set Role (4)" width="90" x="581" y="34"> <parameter key="attribute_name" value="ID"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"> <parameter key="Target" value="label"/> <parameter key="ID" value="id"/> </list> </operator> <connect from_port="in 1" to_op="Multiply (3)" to_port="input"/> <connect from_op="Multiply (3)" from_port="output 1" 
to_op="Execute Python (3)" to_port="input 1"/> <connect from_op="Multiply (3)" from_port="output 2" to_op="Normalize (2)" to_port="example set input"/> <connect from_op="Normalize (2)" from_port="example set output" to_op="Detect Outlier (LOF)" to_port="example set input"/> <connect from_op="Detect Outlier (LOF)" from_port="example set output" to_op="Execute Python (3)" to_port="input 2"/> <connect from_op="Execute Python (3)" from_port="output 1" to_op="Set Role (4)" to_port="example set input"/> <connect from_op="Set Role (4)" from_port="example set output" to_port="out 1"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="source_in 2" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> </process> </operator> <operator activated="true" class="remove_correlated_attributes" compatibility="9.9.000" expanded="true" height="82" name="Remove Correlated Attributes" width="90" x="179" y="34"> <parameter key="correlation" value="1.0"/> <parameter key="filter_relation" value="greater"/> <parameter key="attribute_order" value="random"/> <parameter key="use_absolute_correlation" value="true"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> </operator> <operator activated="true" class="featselext:mrmr_feature_selection" compatibility="1.1.004" expanded="true" height="82" name="MRMR-FS" width="90" x="313" y="34"> <parameter key="normalize_weights" value="false"/> <parameter key="sort_weights" value="false"/> <parameter key="sort_direction" value="ascending"/> <parameter key="sets_or_ranks" value="sets"/> <parameter key="calculate full ranking" value="true"/> <parameter key="k" value="24"/> <parameter key="relevance_redundancy_relation" value="quotient"/> <parameter key="use_ensemble_method" value="none"/> <parameter key="ensemble_size" value="10"/> <parameter key="logging" value="false"/> </operator> <operator activated="true" class="h2o:logistic_regression" compatibility="9.9.000" expanded="true" height="124" name="Logistic Regression" width="90" x="447" y="34"> <parameter key="solver" value="AUTO"/> <parameter key="reproducible" value="false"/> <parameter key="maximum_number_of_threads" value="4"/> <parameter key="use_regularization" value="true"/> <parameter key="lambda_search" value="false"/> <parameter key="number_of_lambdas" value="0"/> <parameter key="lambda_min_ratio" value="0.0"/> <parameter key="early_stopping" value="true"/> <parameter key="stopping_rounds" value="3"/> <parameter key="stopping_tolerance" value="0.001"/> <parameter key="standardize" value="true"/> <parameter key="non-negative_coefficients" value="false"/> <parameter key="add_intercept" value="true"/> <parameter key="compute_p-values" value="true"/> <parameter key="remove_collinear_columns" value="true"/> <parameter key="missing_values_handling" value="MeanImputation"/> <parameter key="max_iterations" value="0"/> <parameter key="max_runtime_seconds" value="0"/> </operator> <connect from_port="training set" to_op="Remove outliers (2)" to_port="in 1"/> <connect from_op="Remove outliers (2)" from_port="out 1" to_op="Remove Correlated Attributes" to_port="example set input"/> <connect from_op="Remove Correlated Attributes" from_port="example set output" to_op="MRMR-FS" to_port="example set"/> <connect from_op="MRMR-FS" from_port="example set" to_op="Logistic Regression" to_port="training set"/> <connect from_op="Logistic Regression" from_port="model" to_port="model"/> <portSpacing port="source_training 
set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.9.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="performance_binominal_classification" compatibility="9.9.000" expanded="true" height="82" name="CV-D" width="90" x="179" y="34"> <parameter key="manually_set_positive_class" value="true"/> <parameter key="positive_class" value="True"/> <parameter key="main_criterion" value="recall"/> <parameter key="accuracy" value="false"/> <parameter key="classification_error" value="false"/> <parameter key="kappa" value="true"/> <parameter key="AUC (optimistic)" value="false"/> <parameter key="AUC" value="true"/> <parameter key="AUC (pessimistic)" value="false"/> <parameter key="precision" value="true"/> <parameter key="recall" value="true"/> <parameter key="lift" value="false"/> <parameter key="fallout" value="false"/> <parameter key="f_measure" value="false"/> <parameter key="false_positive" value="false"/> <parameter key="false_negative" value="false"/> <parameter key="true_positive" value="false"/> <parameter key="true_negative" value="false"/> <parameter key="sensitivity" value="false"/> <parameter key="specificity" value="false"/> <parameter key="youden" value="false"/> <parameter key="positive_predictive_value" value="false"/> <parameter key="negative_predictive_value" value="false"/> <parameter key="psep" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> </operator> <operator activated="true" class="operator_toolbox:performance_auprc" compatibility="2.9.000" expanded="true" height="82" name="Performance (AUPRC)" width="90" x="313" y="34"> <parameter key="main_criterion" value="first"/> <parameter key="accuracy" value="false"/> <parameter key="AUC" value="false"/> <parameter key="AUPRC" value="true"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> </operator> <operator activated="true" class="radiomics_test:my_own_operator" compatibility="1.0.000" expanded="true" height="82" name="Performance (Fbeta-score)" width="90" x="447" y="34"> <parameter key="Manually set positive class" value="true"/> <parameter key="Positive class" value="True"/> <parameter key="Make Fbeta-score the main criterion" value="true"/> <parameter key="Beta" value="2.0"/> </operator> <connect from_port="model" to_op="Apply Model (2)" to_port="model"/> <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/> <connect from_op="Apply Model (2)" from_port="labelled data" to_op="CV-D" to_port="labelled data"/> <connect from_op="CV-D" from_port="performance" to_op="Performance (AUPRC)" to_port="performance"/> <connect from_op="CV-D" from_port="example set" to_op="Performance (AUPRC)" to_port="labelled data"/> <connect from_op="Performance (AUPRC)" from_port="performance" to_op="Performance (Fbeta-score)" to_port="performance vector"/> <connect from_op="Performance (AUPRC)" from_port="example set" to_op="Performance (Fbeta-score)" to_port="labelled example set"/> <connect from_op="Performance (Fbeta-score)" from_port="performance vector" to_port="performance 1"/> <connect from_op="Performance (Fbeta-score)" from_port="labelled example set" to_port="test set 
results"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Cross Validation (2)" to_port="example set"/> <connect from_op="Cross Validation (2)" from_port="model" to_port="model"/> <connect from_op="Cross Validation (2)" from_port="test result set" to_port="output 1"/> <connect from_op="Cross Validation (2)" from_port="performance 1" to_port="performance"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_performance" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> </operator> <operator activated="true" class="store" compatibility="9.9.000" expanded="true" height="68" name="Store" width="90" x="916" y="85"> <parameter key="repository_entry" value="../Models_mRMR/G_D_mRMR_LR-EN"/> </operator> <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.9.000" expanded="true" height="145" name="without downsampling" width="90" x="715" y="187"> <list key="parameters"> <parameter key="Remove Correlated Attributes (2).correlation" value="[0.4;1.0;6;linear]"/> <parameter key="MRMR-FS (2).k" value="[10;24;7;linear]"/> <parameter key="Logistic Regression (2).alpha" value="[0.0;1.0;5;linear]"/> </list> <parameter key="error_handling" value="fail on error"/> <parameter key="log_performance" value="true"/> <parameter key="log_all_criteria" value="false"/> <parameter key="synchronize" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="concurrency:cross_validation" compatibility="9.9.000" expanded="true" height="145" name="Cross Validation" width="90" x="45" y="34"> <parameter key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="4"/> <parameter key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="subprocess" compatibility="9.9.000" expanded="true" height="82" name="Remove outliers" width="90" x="45" y="34"> <process expanded="true"> <operator activated="true" class="multiply" compatibility="9.9.000" expanded="true" height="103" name="Multiply (4)" width="90" x="45" y="34"/> <operator activated="true" class="normalize" compatibility="9.9.000" expanded="true" height="103" name="Normalize" width="90" x="112" y="187"> <parameter key="return_preprocessing_model" value="false"/> <parameter key="create_view" value="false"/> <parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="real"/> <parameter key="block_type" value="value_series"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" 
value="value_series_end"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="method" value="Z-transformation"/> <parameter key="min" value="0.0"/> <parameter key="max" value="1.0"/> <parameter key="allow_negative_values" value="false"/> </operator> <operator activated="true" class="detect_outlier_lof" compatibility="9.9.000" expanded="true" height="82" name="Detect Outlier (LOF) (2)" width="90" x="246" y="187"> <parameter key="minimal_points_lower_bound" value="10"/> <parameter key="minimal_points_upper_bound" value="20"/> <parameter key="distance_function" value="euclidian distance"/> </operator> <operator activated="true" class="python_scripting:execute_python" compatibility="9.8.000" expanded="true" height="124" name="Execute Python (4)" width="90" x="380" y="34"> <parameter key="script" value="import pandas # rm_main is a mandatory function, # the number of arguments has to be the number of input ports (can be none), # or the number of input ports plus one if "use macros" parameter is set # if you want to use macros, use this instead and check "use macros" parameter: #def rm_main(data,macros): def rm_main(ori, norm): ids = list(norm.loc[norm['outlier']<2, 'ID']) data = ori.set_index('ID', drop = False).loc[ids,:] return data"/> <parameter key="notebook_cell_tag_filter" value=""/> <parameter key="use_default_python" value="true"/> <parameter key="package_manager" value="conda (anaconda)"/> <parameter key="use_macros" value="false"/> </operator> <operator activated="true" class="set_role" compatibility="9.9.000" expanded="true" height="82" name="Set Role (5)" width="90" x="581" y="34"> <parameter key="attribute_name" value="ID"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"> <parameter key="Target" value="label"/> <parameter key="ID" value="id"/> </list> </operator> <connect from_port="in 1" to_op="Multiply (4)" to_port="input"/> <connect from_op="Multiply (4)" from_port="output 1" to_op="Execute Python (4)" to_port="input 1"/> <connect from_op="Multiply (4)" from_port="output 2" to_op="Normalize" to_port="example set input"/> <connect from_op="Normalize" from_port="example set output" to_op="Detect Outlier (LOF) (2)" to_port="example set input"/> <connect from_op="Detect Outlier (LOF) (2)" from_port="example set output" to_op="Execute Python (4)" to_port="input 2"/> <connect from_op="Execute Python (4)" from_port="output 1" to_op="Set Role (5)" to_port="example set input"/> <connect from_op="Set Role (5)" from_port="example set output" to_port="out 1"/> <portSpacing port="source_in 1" spacing="0"/> <portSpacing port="source_in 2" spacing="0"/> <portSpacing port="sink_out 1" spacing="0"/> <portSpacing port="sink_out 2" spacing="0"/> </process> </operator> <operator activated="true" class="remove_correlated_attributes" compatibility="9.9.000" expanded="true" height="82" name="Remove Correlated Attributes (2)" width="90" x="179" y="34"> <parameter key="correlation" value="0.2"/> <parameter key="filter_relation" value="greater"/> <parameter key="attribute_order" value="random"/> <parameter key="use_absolute_correlation" value="true"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> </operator> <operator activated="true" class="featselext:mrmr_feature_selection" compatibility="1.1.004" expanded="true" height="82" name="MRMR-FS (2)" width="90" x="313" y="34"> <parameter key="normalize_weights" value="false"/> <parameter 
key="sort_weights" value="false"/> <parameter key="sort_direction" value="ascending"/> <parameter key="sets_or_ranks" value="sets"/> <parameter key="calculate full ranking" value="true"/> <parameter key="k" value="100"/> <parameter key="relevance_redundancy_relation" value="quotient"/> <parameter key="use_ensemble_method" value="none"/> <parameter key="ensemble_size" value="10"/> <parameter key="logging" value="false"/> </operator> <operator activated="true" class="h2o:logistic_regression" compatibility="9.9.000" expanded="true" height="124" name="Logistic Regression (2)" width="90" x="581" y="34"> <parameter key="solver" value="AUTO"/> <parameter key="reproducible" value="false"/> <parameter key="maximum_number_of_threads" value="4"/> <parameter key="use_regularization" value="true"/> <parameter key="lambda_search" value="false"/> <parameter key="number_of_lambdas" value="0"/> <parameter key="lambda_min_ratio" value="0.0"/> <parameter key="early_stopping" value="true"/> <parameter key="stopping_rounds" value="3"/> <parameter key="stopping_tolerance" value="0.001"/> <parameter key="standardize" value="true"/> <parameter key="non-negative_coefficients" value="false"/> <parameter key="add_intercept" value="true"/> <parameter key="compute_p-values" value="true"/> <parameter key="remove_collinear_columns" value="true"/> <parameter key="missing_values_handling" value="MeanImputation"/> <parameter key="max_iterations" value="0"/> <parameter key="max_runtime_seconds" value="0"/> </operator> <connect from_port="training set" to_op="Remove outliers" to_port="in 1"/> <connect from_op="Remove outliers" from_port="out 1" to_op="Remove Correlated Attributes (2)" to_port="example set input"/> <connect from_op="Remove Correlated Attributes (2)" from_port="example set output" to_op="MRMR-FS (2)" to_port="example set"/> <connect from_op="MRMR-FS (2)" from_port="example set" to_op="Logistic Regression (2)" to_port="training set"/> <connect from_op="Logistic Regression (2)" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.9.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="performance_binominal_classification" compatibility="9.9.000" expanded="true" height="82" name="CV-nD" width="90" x="179" y="34"> <parameter key="manually_set_positive_class" value="true"/> <parameter key="positive_class" value="True"/> <parameter key="main_criterion" value="recall"/> <parameter key="accuracy" value="false"/> <parameter key="classification_error" value="false"/> <parameter key="kappa" value="true"/> <parameter key="AUC (optimistic)" value="false"/> <parameter key="AUC" value="true"/> <parameter key="AUC (pessimistic)" value="false"/> <parameter key="precision" value="true"/> <parameter key="recall" value="true"/> <parameter key="lift" value="false"/> <parameter key="fallout" value="false"/> <parameter key="f_measure" value="false"/> <parameter key="false_positive" value="false"/> <parameter key="false_negative" value="false"/> <parameter key="true_positive" value="false"/> <parameter key="true_negative" value="false"/> <parameter key="sensitivity" value="false"/> <parameter key="specificity" value="false"/> <parameter key="youden" 
value="false"/> <parameter key="positive_predictive_value" value="false"/> <parameter key="negative_predictive_value" value="false"/> <parameter key="psep" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> </operator> <operator activated="true" class="operator_toolbox:performance_auprc" compatibility="2.9.000" expanded="true" height="82" name="Performance (AUPRC) (2)" width="90" x="313" y="34"> <parameter key="main_criterion" value="first"/> <parameter key="accuracy" value="false"/> <parameter key="AUC" value="false"/> <parameter key="AUPRC" value="true"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> </operator> <operator activated="true" class="radiomics_test:my_own_operator" compatibility="1.0.000" expanded="true" height="82" name="Performance (Fbeta-score) (3)" width="90" x="447" y="34"> <parameter key="Manually set positive class" value="true"/> <parameter key="Positive class" value="True"/> <parameter key="Make Fbeta-score the main criterion" value="true"/> <parameter key="Beta" value="2.0"/> </operator> <connect from_port="model" to_op="Apply Model" to_port="model"/> <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/> <connect from_op="Apply Model" from_port="labelled data" to_op="CV-nD" to_port="labelled data"/> <connect from_op="CV-nD" from_port="performance" to_op="Performance (AUPRC) (2)" to_port="performance"/> <connect from_op="CV-nD" from_port="example set" to_op="Performance (AUPRC) (2)" to_port="labelled data"/> <connect from_op="Performance (AUPRC) (2)" from_port="performance" to_op="Performance (Fbeta-score) (3)" to_port="performance vector"/> <connect from_op="Performance (AUPRC) (2)" from_port="example set" to_op="Performance (Fbeta-score) (3)" to_port="labelled example set"/> <connect from_op="Performance (Fbeta-score) (3)" from_port="performance vector" to_port="performance 1"/> <connect from_op="Performance (Fbeta-score) (3)" from_port="labelled example set" to_port="test set results"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Cross Validation" to_port="example set"/> <connect from_op="Cross Validation" from_port="model" to_port="model"/> <connect from_op="Cross Validation" from_port="test result set" to_port="output 1"/> <connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_performance" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> </operator> <operator activated="true" class="store" compatibility="9.9.000" expanded="true" height="68" name="Store (2)" width="90" x="916" y="238"> <parameter key="repository_entry" value="../Models_mRMR/G_nD_mRMR_LR-EN"/> </operator> <operator activated="false" class="python_scripting:execute_python" compatibility="9.8.000" expanded="true" height="82" name="DeLong Test (AUPRC) (3)" width="90" x="916" y="340"> <parameter key="script" value="import pandas import scipy.stats as st from sklearn 
import metrics from sklearn.metrics import precision_recall_curve from sklearn.metrics import auc def kernel(X, Y): return .5 if Y==X else int(Y < X) def structural_components(X, Y): V10 = [1/len(Y) * sum([kernel(x, y) for y in Y]) for x in X] V01 = [1/len(X) * sum([kernel(x, y) for x in X]) for y in Y] return V10, V01 def get_S_entry(V_A, V_B, auc_A, auc_B): return 1/(len(V_A)-1) * sum([(a-auc_A)*(b-auc_B) for a,b in zip(V_A, V_B)]) def z_score(var_A, var_B, covar_AB, auc_A, auc_B): return (auc_A - auc_B)/((var_A + var_B - 2*covar_AB)**(.5)) def group_preds_by_label(preds, actual): X = [p for (p, a) in zip(preds, actual) if a=='True'] Y = [p for (p, a) in zip(preds, actual) if not a=='True'] return X, Y def rm_main(dataA, dataB): preds_A = dataA.loc[:, 'prediction(Target)'] preds_B = dataB.loc[:, 'prediction(Target)'] actual_A = dataA.loc[:, 'Target'] actual_B = dataB.loc[:, 'Target'] X_A, Y_A = group_preds_by_label(preds_A, actual_A) X_B, Y_B = group_preds_by_label(preds_B, actual_B) V_A10, V_A01 = structural_components(X_A, Y_A) V_B10, V_B01 = structural_components(X_B, Y_B) a_A = [1 if elem == 'True' else 0 for elem in actual_A] a_B = [1 if elem == 'True' else 0 for elem in actual_B] p_A = [1 if elem == 'True' else 0 for elem in preds_A] p_B = [1 if elem == 'True' else 0 for elem in preds_B] precision_A, recall_A, thresholds_A = precision_recall_curve(a_A, p_A) auc_A = auc(recall_A, precision_A) precision_B, recall_B, thresholds_B = precision_recall_curve(a_B, p_B) auc_B = auc(recall_B, precision_B) # Compute entries of covariance matrix S (covar_AB = covar_BA) var_A = (get_S_entry(V_A10, V_A10, auc_A, auc_A) * 1/len(V_A10) + get_S_entry(V_A01, V_A01, auc_A, auc_A) * 1/len(V_A01)) var_B = (get_S_entry(V_B10, V_B10, auc_B, auc_B) * 1/len(V_B10) + get_S_entry(V_B01, V_B01, auc_B, auc_B) * 1/len(V_B01)) covar_AB = (get_S_entry(V_A10, V_B10, auc_A, auc_B) * 1/len(V_A10) + get_S_entry(V_A01, V_B01, auc_A, auc_B) * 1/len(V_A01)) # Two tailed test z = z_score(var_A, var_B, covar_AB, auc_A, auc_B) p = st.norm.sf(abs(z))*2 print('Is AUPRC_A significantly different from AUPRC_B?') print('CV p-value:', p) return p"/> <parameter key="notebook_cell_tag_filter" value=""/> <parameter key="use_default_python" value="true"/> <parameter key="package_manager" value="conda (anaconda)"/> <parameter key="use_macros" value="false"/> </operator> <connect from_op="Read train" from_port="output" to_op="Stability analysis" to_port="input 3"/> <connect from_op="Read rad1" from_port="output" to_op="Stability analysis" to_port="input 1"/> <connect from_op="Read rad2" from_port="output" to_op="Stability analysis" to_port="input 2"/> <connect from_op="Read train (2)" from_port="output" to_op="Remove missing data" to_port="example set input"/> <connect from_op="Remove missing data" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/> <connect from_op="Set Role (2)" from_port="example set output" to_op="Multiply (2)" to_port="input"/> <connect from_op="Multiply (2)" from_port="output 1" to_op="Sample (3)" to_port="example set input"/> <connect from_op="Multiply (2)" from_port="output 2" to_op="without downsampling" to_port="input 1"/> <connect from_op="Sample (3)" from_port="example set output" to_op="with Downsampling" to_port="input 1"/> <connect from_op="with Downsampling" from_port="performance" to_port="result 1"/> <connect from_op="with Downsampling" from_port="model" to_op="Store" to_port="input"/> <connect from_op="without downsampling" from_port="performance" to_port="result 2"/> 
<connect from_op="without downsampling" from_port="model" to_op="Store (2)" to_port="input"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> </process> </operator> </process>