Validation process hangs up
Dear all,
I created a simple validation process which is intended to be run several times in order to examine the optimal number of training cycles. (--> run with 10 cycles, run with 20 cycles, run with 200 cycles and compare performances)
With 10 training cycles of the neural net this process works fine. But once set to 200 the process hangs up. It keeps running and running and running. Finally, the "send bug report" dialog appears.
To determine the root cause I tried to modify several things:
- set training cycles to a lower number --> works at very small numbers of training cycles only
- additionally, I tried to find out whether this behaviour is related to a certain column or value. I deleted some columns and ran the process (--> works). Then I took only the previously deleted columns and ran the process (--> works). Same observation with selected rows. So apparently the error is not related to the data values themselves. But with the original file it doesn't work. --> overall result: works sometimes
- used the Generate Data operator with the same number of rows and columns instead of my Excel sheet --> works
To reproduce this error I uploaded my Excel sample file here:
http://datahost.bplaced.net/sample.xls
Any kind of help appreciated... :-\
Regards
Sachs
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="447" width="622">
<operator activated="true" class="read_excel" compatibility="5.2.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="120">
<parameter key="excel_file" value="C:\sample.xls"/>
<parameter key="imported_cell_range" value="A1:AR74"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value="dd.MM.yyyy"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="a.true.real.attribute"/>
<parameter key="1" value="b.true.real.attribute"/>
<parameter key="2" value="c.true.real.attribute"/>
<parameter key="3" value="d.true.real.attribute"/>
<parameter key="4" value="e.true.real.attribute"/>
<parameter key="5" value="f.true.real.attribute"/>
<parameter key="6" value="g.true.real.attribute"/>
<parameter key="7" value="h.true.real.attribute"/>
<parameter key="8" value="i.true.real.attribute"/>
<parameter key="9" value="j.true.real.attribute"/>
<parameter key="10" value="k.true.real.attribute"/>
<parameter key="11" value="l.true.real.attribute"/>
<parameter key="12" value="m.true.real.attribute"/>
<parameter key="13" value="n.true.real.attribute"/>
<parameter key="14" value="o.true.real.attribute"/>
<parameter key="15" value="p.true.real.attribute"/>
<parameter key="16" value="q.true.real.attribute"/>
<parameter key="17" value="r.true.real.attribute"/>
<parameter key="18" value="s.true.real.attribute"/>
<parameter key="19" value="t.true.real.attribute"/>
<parameter key="20" value="u.true.real.attribute"/>
<parameter key="21" value="v.true.real.attribute"/>
<parameter key="22" value="w.true.real.attribute"/>
<parameter key="23" value="x.true.real.attribute"/>
<parameter key="24" value="y.true.real.attribute"/>
<parameter key="25" value="z.true.real.attribute"/>
<parameter key="26" value="aa.true.real.attribute"/>
<parameter key="27" value="bb.true.real.attribute"/>
<parameter key="28" value="cc.true.real.attribute"/>
<parameter key="29" value="dd.true.real.attribute"/>
<parameter key="30" value="ee.true.real.attribute"/>
<parameter key="31" value="ff.true.real.attribute"/>
<parameter key="32" value="gg.true.real.attribute"/>
<parameter key="33" value="hh.true.real.attribute"/>
<parameter key="34" value="ii.true.real.attribute"/>
<parameter key="35" value="jj.true.real.attribute"/>
<parameter key="36" value="kk.true.real.attribute"/>
<parameter key="37" value="ll.true.real.attribute"/>
<parameter key="38" value="mm.true.real.attribute"/>
<parameter key="39" value="nn.true.real.attribute"/>
<parameter key="40" value="oo.true.real.attribute"/>
<parameter key="41" value="pp.true.real.attribute"/>
<parameter key="42" value="label.true.real.label"/>
<parameter key="43" value="ID.true.real.id"/>
</list>
</operator>
<operator activated="false" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="number_examples" value="74"/>
<parameter key="number_of_attributes" value="44"/>
<parameter key="attributes_lower_bound" value="0.0"/>
<parameter key="attributes_upper_bound" value="150.0"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="5.2.000" expanded="true" height="112" name="Validation" width="90" x="246" y="30">
<parameter key="training_window_width" value="20"/>
<parameter key="training_window_step_size" value="10"/>
<parameter key="test_window_width" value="20"/>
<process expanded="true" height="465" width="295">
<operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" height="76" name="Neural Net" width="90" x="102" y="30">
<list key="hidden_layers"/>
<parameter key="training_cycles" value="200"/>
</operator>
<connect from_port="training" to_op="Neural Net" to_port="training set"/>
<connect from_op="Neural Net" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="465" width="295">
<operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="series:forecasting_performance" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="170" y="30">
<parameter key="horizon" value="1"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
Moreover, I found that when pushing the stop button the process still keeps running and CPU usage remains high...
I looked at your data: some columns have missing values, and one consists only of missing values! If you take them out, the problem disappears. This glitch has already been reported here: http://rapid-i.com/rapidforum/index.php/topic,5104.msg18296.html#msg18296 .
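A check like the one described above can be sketched in a few lines of Python. This is only an illustration outside RapidMiner: the column names and the toy rows are hypothetical, and a missing cell is assumed to arrive as `None` or `NaN`.

```python
import math


def missing_counts(rows, names):
    """Count missing cells (None or NaN) per column."""
    counts = {name: 0 for name in names}
    for row in rows:
        for name, value in zip(names, row):
            if value is None or (isinstance(value, float) and math.isnan(value)):
                counts[name] += 1
    return counts


def all_missing_columns(rows, names):
    """Return the columns in which every single cell is missing."""
    counts = missing_counts(rows, names)
    return [name for name in names if rows and counts[name] == len(rows)]


# Hypothetical toy data: column "b" is empty, column "a" has one gap.
names = ["a", "b", "c"]
rows = [
    [1.0, None, 3.0],
    [None, None, 6.0],
    [7.0, float("nan"), 9.0],
]

counts = missing_counts(rows, names)
empty = all_missing_columns(rows, names)
print(counts)
print(empty)
```

Columns reported by `all_missing_columns` carry no information at all and would be candidates for removal before training.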
Hi haddock,
thanks for having a look into my issue here. I have to admit that I haven't found the other post...
Ok, it seems obvious that this error is related to missing values then. (Stupid me! I posted the wrong file... In the end it showed the same behaviour, but the original one didn't have empty columns, just some missing values.)
Nevertheless, I wonder if there is a rule of thumb on how many missing values a neural net can handle. I had a generated data set where I deleted only a few single values just to see if a neural net can handle this - success. However, it tends to throw errors if either the number of missing values or the number of training cycles increases.
My real data is indeed similar to what I've posted. If I filter out all the examples with missing values, there won't be much left to build a model with... Any ideas on that?
(And there is still the issue that pressing the stop button doesn't stop the process... though this becomes irrelevant once the model works fine.)
Bye & take care
Sachs
I don't use neuros, but my guess is that you'll need to replace missing values with something, perhaps the average of the given values for the attribute, or at least a value that doesn't distort too much. It's not just neuros that choke on missing values, and incomplete data inevitably invites bias, so you have my sympathy!
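The "replace with the average" idea can be sketched as follows. This is a plain-Python illustration, not the RapidMiner implementation; the helper names and the toy rows are made up, and a missing cell is assumed to be `None` or `NaN`.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def impute_column_means(rows):
    """Replace each missing cell with the mean of the present values in its column.

    A column with no present values has no mean; its cells are left as None,
    so such a column still has to be dropped separately.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [row[j] for row in rows if not is_missing(row[j])]
        means.append(sum(present) / len(present) if present else None)
    filled = [
        [means[j] if is_missing(value) else value for j, value in enumerate(row)]
        for row in rows
    ]
    return filled, means


# Hypothetical toy data with one gap per column.
rows = [
    [1.0, None],
    [3.0, 4.0],
    [None, 8.0],
]
filled, means = impute_column_means(rows)
print(means)
print(filled)
```

Mean imputation keeps every example in the data set, at the price of shrinking the variance of the imputed attributes, which is one form of the bias mentioned above.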
Best wishes.
Ok... I tested a new process for hours now and I came to the point where I am convinced that something is wrong with the neural net.
My investigation revealed that:
- I can run the attached process with a certain number of attributes
- One more attribute and the neural net hangs up (doesn't matter which one)
- If I take another arbitrary attribute out then it works again
- If I reduce the lengths of the attributes' names then I can run the neural net with more attributes (still not with all)
- The higher the number of training cycles, the fewer attributes can be handled
To reproduce this error try the following:
- Run the process as provided here --> ok
- Add attribute "s" to the subset in the "select attribute" operator --> fail
- Change number of training cycles to 10 --> ok again
I uploaded another sample (this time there are no missing values, etc.)
http://datahost.bplaced.net/sample2.xls
I am completely at a loss...
Looking forward to hearing from you
Sachs
Sorry, I cannot confirm this, all works OK on my Linux 16GB box.
Sorry, there was a typo in my post [now modified]: attribute "r" is already in the list. I meant please add attribute "s" and the process will hang up.
PS: I run RapidMiner on Windows 7 with 4GB RAM.
Have a nice day
Sachs
Indeed, this does hang, so we're left to ponder why, and here we hit the reason that I don't use neuros, namely that convergence to a solution cannot be guaranteed. You can hit local minima which trap the search and so hang the machine. Apologies if you're fully aware of this, otherwise just imagine black holes in your search space. Support vector machines do not have this property.
There could be something wrong with the implementation, but there is something frail in the whole neuro approach.
Sorry not to be more definite.
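To picture the local-minimum effect mentioned above, here is a minimal sketch of plain gradient descent on a one-dimensional toy function with two valleys. The function, the learning rate and the step count are arbitrary choices for illustration and have nothing to do with RapidMiner's Neural Net operator.

```python
def f(x):
    """Toy loss with a deep valley near x = -1.30 and a shallow one near x = 1.13."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, learning_rate=0.01, steps=1000):
    """Run a fixed number of gradient steps starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x


# Same procedure, two different starting points.
x_left = gradient_descent(-2.0)
x_right = gradient_descent(2.0)

print(x_left, f(x_left))
print(x_right, f(x_right))
```

Starting from the right, the search settles in the shallow valley and never reaches the deeper one on the left. Note that this toy loop runs a fixed number of steps, so it terminates in both cases; what differs is the quality of the end point.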