RAPIDMINER DATA SCIENCE COMPETITION: FARMING ON "MARS" – SEPTEMBER 12 TO OCTOBER 13, 2017

sgenzer · September 2017

Hello all community members -

Welcome to the 2nd RapidMiner Data Science Competition: Farming on "Mars"!

Our sponsor and we are super excited to bring this open competition to our 270,000+ users, and we hope you have a great time exploring this unique use case. Below is a brief summary of the competition and its rules; complete documentation can be found in the attachments below. PLEASE READ all the attached documentation before beginning the competition, and let the best model win!

Summary

One of the major challenges of the human colonization of “Mars” is the introduction of Earth-independent food production facilities, i.e. farming. A key element to farming on “Mars” will be the fertilization of available soil, which in its current state is not farmable due to a lack of nutrients. In order to address this, an experimental setup has been created under “Martian” environmental conditions to produce bio-fertilizer made from algae and measure the usable yield after each production run. This yield varies based on the exact quantities of certain base nutrients and the optional addition of one of two possible additional nutrients, α or β, inserted into the bio-fertilizer at some time t during the production run. The research facility has already done 1653 production runs, each one lasting 36 hours with 41 sensors recording data every hour, and recorded the potential yield of each one. These are your data to work with during this challenge.
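For orientation, here is a minimal Python sketch of what one production-run file looks like. It assumes the column order declared in the Read Excel operators of the processes posted later in this thread (Id, hour, Label, sensor1 to sensor41, yieldIncreaseA, yieldIncreaseB) and assumes readings at t = 0 through t = 36. The helper names are hypothetical. The sketch only checks that this layout reproduces the A1:AT38 cell range those processes import.

```python
# Hypothetical helpers; the column order is taken from the Read Excel
# metadata in the processes posted in this thread.
N_SENSORS = 41
LAST_HOUR = 36  # assumption: readings at t = 0, 1, ..., 36 (37 rows per run)


def expected_columns() -> list[str]:
    """Column names of one production-run sheet, in sheet order."""
    return (
        ["Id", "hour", "Label"]
        + [f"sensor{i}" for i in range(1, N_SENSORS + 1)]
        + ["yieldIncreaseA", "yieldIncreaseB"]
    )


def excel_column_letter(index: int) -> str:
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def expected_cell_range() -> str:
    """Range covering the header row plus one row per hourly reading."""
    last_column = excel_column_letter(len(expected_columns()))
    last_row = 1 + (LAST_HOUR + 1)  # header row + hours 0..36
    return f"A1:{last_column}{last_row}"


print(len(expected_columns()))  # 46
print(expected_cell_range())    # A1:AT38
```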

Challenge

The goal of the challenge is to build a model that will classify which additional nutrient, α or β (A and B in the data files), should be inserted, and at what time t, to most likely boost yield during a production run. The metric to be optimized is the cumulative score over the 178 production runs in the test set; the baseline example has a cumulative score of 1000.

Submission and Evaluation

All submissions in this competition need to be posted in this thread with the entire XML of the process and the score. This includes the finished models, as well as the entire training process and all pre-processing steps. The deadline for submissions is October 13, 2017 at 23:59:59 UTC.

RapidMiner Server Instance

In order to increase the efficiency of model training and to demonstrate RapidMiner’s powerful parallel processing capabilities with its new SaaS on Amazon AWS EC2, RapidMiner has agreed to provide a free Server EC2 instance for all participants for the duration of this competition. This server instance can be used by any participant free of charge, as often as desired, as long as all use is restricted to this competition. Participants wishing to use this server must send @sgenzer a private message to register and obtain the connection details. The instance URL is https://competitions.rapidminer.com and will be online only for the duration of the competition.

Winner and Prizes

The winner of the competition will be the entry with the highest aggregate score over the 178 test production runs, provided that score is at least 1000, after applying the test dataset to the submitted models. All submissions will be validated by RapidMiner and the competition’s sponsor within 72 hours of their submission. The winners of this RapidMiner Data Science Challenge will be announced by October 17, 2017 in the competition’s thread.

RapidMiner and the competition sponsor will award the following prizes to the winners:

1st place: US$1000

2nd place: US$250

3rd place: US$100

PLUS all participants who submit a valid entry in the thread before the deadline will be eligible to win one or more amazing RapidMiner “swag” items. Supplies are limited and will be awarded on a first-come, first-served basis.

Restrictions

All participants of the RapidMiner Data Science Competitions must be registered users in good standing of the RapidMiner User Community and age 18 or older at the time of entry. Employees, directors, consultants, and any other persons affiliated with RapidMiner, Inc. are not eligible to participate in this competition.

Good luck everyone and reply to this thread with questions and your models!

Scott

Links: Training Data Set

Test Data Set

Annotated Data Set Example

bigD · September 2017

Hi Scott,

Looks like an interesting problem :) It appears that 'run 1341' in the test dataset may be corrupted.

Cheers

Dan

sgenzer · September 2017

Hi Dan -

Hmm. I just downloaded the zip from the link above and I see no problems with the files.


Download again?

https://rapidminer-my.sharepoint.com/personal/sgenzer_rapidminer_com/_layouts/15/guestaccess.aspx?docid=1c7686d0d5c0241e9b293c07bb98beeec&authkey=AfDPdBh_3zuwnerIo59cyA8

Scott

bigD · September 2017

I guess it does have run 1341 but it also has a corrupted fragment at the bottom of the list. I'll just delete it.

D.

16B543J · September 2017

Hi,

I would like to clarify a few points on the explanation given.

"These are the production yield increases for the production run at each hour of production. For this example, all yield increases for nutrient A (column AS) will be scored as invalid (-100) because it was shown later that nutrient B was needed (see cell C10). For column AT, the score is determined by which hour nutrient B was inserted: if nutrient B was inserted at t=0, score = 62.5 If nutrient B was inserted at t=5 hours, score = 59.5. If nutrient A was inserted at t = 24 hours, score = 54.3"

1. If nutrient B was inserted at t=5 hours, score = 59.5. It should be 59.9.

2. If nutrient A was inserted at t = 24 hours, score = 54.3. This statement is true only when the Label is equal to "A".

Pls clarify. Thank you.

sgenzer · September 2017

hello @16B543J - thanks for your questions. I am assuming you are referring to the annotated training set 1? Here are my answers.

"These are the production yield increases for the production run at each hour of production. For this example, all yield increases for nutrient A (column AS) will be scored as invalid (-100) because it was shown later that nutrient B was needed (see cell C10)."

1. Yes, that is correct.

"For column AT, the score is determined by which hour nutrient B was inserted: if nutrient B was inserted at t=0, score = 62.5 If nutrient B was inserted at t=5 hours, score = 59.5. If nutrient A was inserted at t = 24 hours, score = 54.3"

"1. If nutrient B was inserted at t=5 hours, score = 59.5. It should be 59.9."

"2. If nutrient A was inserted at t = 24 hours, score = 54.3. This statement is true only when the Label is equal to 'A'."

2. I'm not really sure what your question is. For the annotated training set 1, if nutrient B was inserted at t=5, the score would be 59.9, and if nutrient B was inserted at t=24 hrs, the score would be 54.3. If nutrient A is inserted at any time, the score is -100.

Thanks and good luck!

Scott


16B543J · September 2017

Thanks Scott for the clarification.

Andrew · September 2017

Hello Scott

I noticed that around 7% of the rows contain missing values for the attributes sensor41, yieldIncreaseA and yieldIncreaseB. For example, training set 1001 shows this. Is this intentional?

Andrew

Andrew · September 2017

Hello Scott

Could you change the annotation in cell AS:5 of the worked example to match your reply, to avoid confusing later readers?

regards

Andrew

sgenzer · September 2017

Hello @Andrew - thank you for the feedback. I finally got the aha moment about what @16B543J was referring to yesterday, i.e. the text explanation in the pink boxes. I think I have looked at that so many times that I glanced over it completely. My apologies. I will update the file in a few minutes.

As for your question about missing values: yes, there are many. These are actually real data from our sponsor, and hence there are all sorts of wonky things in them.

Scott
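Here is a minimal Python sketch, using hypothetical in-memory rows (None marks an empty cell). It measures how many rows are affected and fills missing yield cells. Filling with 0 mirrors the Replace Missing Values setting (replenishment value 0) used in the sample processes later in this thread. It is one possible choice, not an official rule.

```python
from typing import Any

Row = dict[str, Any]  # one hourly reading; None marks a missing cell
YIELD_COLUMNS = ("yieldIncreaseA", "yieldIncreaseB")


def missing_fraction(rows: list[Row],
                     columns: tuple[str, ...] = ("sensor41",) + YIELD_COLUMNS) -> float:
    """Share of rows in which at least one of `columns` is missing (None)."""
    if not rows:
        return 0.0
    n_missing = sum(1 for row in rows if any(row.get(c) is None for c in columns))
    return n_missing / len(rows)


def fill_missing_yields(rows: list[Row], value: float = 0.0) -> list[Row]:
    """Return copies of `rows` with missing yield cells set to `value` (input is not modified)."""
    filled = []
    for row in rows:
        row = dict(row)
        for column in YIELD_COLUMNS:
            if row.get(column) is None:
                row[column] = value
        filled.append(row)
    return filled


# Hypothetical rows, for illustration only
rows = [
    {"hour": 0, "sensor41": 0.7, "yieldIncreaseA": None, "yieldIncreaseB": None},
    {"hour": 1, "sensor41": 0.8, "yieldIncreaseA": 12.0, "yieldIncreaseB": 9.5},
]
print(missing_fraction(rows))                          # 0.5
print(fill_missing_yields(rows)[0]["yieldIncreaseA"])  # 0.0
```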

jacobcybulski · September 2017

Hi Scott, I am joining this discussion a bit late...

I need some clarification on how the data was collected. According to the spec, in any run nutrient A or B can be added to the bio-fertiliser once at some time t. What is not clear: Is nutrient added before or after the reading of the sensors at time t? For example, at time t=0 was nutrient added before the very first reading of the sensors or after the first reading? It is crucial as it seems in a number of cases t=0 was the best option to add the nutrient, however, it would not make any sense to do so without taking the very first reading and it seems only sensor 41 was kind enough to give any data at that time.

Thanks a lot -- Jacob

sgenzer · September 2017

hello @jacobcybulski - thanks for your question. Here's the answer I have received from the sponsor (who created the data set):

"The answer to Jacob's question would be that the nutrient is added right after the reading of all sensors is available for the specific point in time. The situation at t = 0 is a bit special and he makes a fair point. I have personally completely disregarded the option of making predictions at t=0 in my models, as there is only one sensor that provides data at this point in time. However, this does not mean that making a prediction at t = 0 is entirely implausible."

Scott
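The sponsor's answer implies a no-look-ahead constraint. A decision at hour t may use readings up to and including hour t, but nothing later. Here is a minimal Python sketch of applying a model that way. `policy` is a hypothetical stand-in for your model that returns "A", "B", or None to keep waiting. `start_hour` lets you skip t=0, as the sponsor did.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]  # one hourly reading of a production run
# Hypothetical model interface: sees the readings so far, returns "A", "B" or None (wait)
Policy = Callable[[list[Row]], Optional[str]]


def visible_rows(run: list[Row], t: int) -> list[Row]:
    """Readings available at hour t: the nutrient goes in right after hour t's readings."""
    return [row for row in run if row["hour"] <= t]


def apply_without_lookahead(run: list[Row], policy: Policy,
                            start_hour: int = 0) -> Optional[tuple[str, int]]:
    """Walk forward hour by hour and return the first (nutrient, hour) the policy commits to."""
    for t in sorted({row["hour"] for row in run}):
        if t < start_hour:
            continue  # e.g. start_hour=1 skips t=0, where only one sensor reports
        choice = policy(visible_rows(run, t))
        if choice is not None:
            return choice, t
    return None  # the policy never committed during the run


def toy_policy(history: list[Row]) -> Optional[str]:
    """Illustrative only: choose B once the latest sensor41 reading reaches 5."""
    return "B" if history[-1]["sensor41"] >= 5 else None


# Toy run (hours 0..36, readings in hour order), for illustration only
run = [{"hour": h, "sensor41": float(h)} for h in range(37)]
print(apply_without_lookahead(run, toy_policy))  # ('B', 5)
```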

jacobcybulski · September 2017

Thanks a lot Scott -- Jacob

16B543J · October 2017

Hi Scott,

In a data row where the label is B, the value for "yieldIncreaseA" is "19900". Can I assume the value is "-100" and ignore the "19900" value?

Thanks

sgenzer · October 2017

Hello @16B543J - that is correct. If the label is B and nutrient A is added at any point in the production run, the score is -100 irrespective of what is in column "yieldIncreaseA".

Scott
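This is a minimal Python sketch of per-run scoring, combining the scoring answers in this thread. The wrong nutrient scores -100 regardless of what the yield column contains. The right nutrient scores the yield increase at the chosen hour. Treating a missing yield as 0 is an assumption borrowed from the sample processes, and the function names are hypothetical. The example values are the label-B figures from annotated training set 1. The 19900 entry in yieldIncreaseA is illustrative and is placed there to show that it is ignored.

```python
from typing import Optional

PENALTY = -100.0            # wrong nutrient inserted
WINNING_MINIMUM = 1000.0    # cumulative score a winning entry must reach


def run_score(label: str, nutrient: str, hour: int,
              yield_a: dict[int, Optional[float]],
              yield_b: dict[int, Optional[float]]) -> float:
    """Score one run for a predicted nutrient ("A" or "B") inserted at `hour`."""
    if nutrient != label:
        # e.g. label B but A inserted: -100 whatever yieldIncreaseA contains
        return PENALTY
    yields = yield_a if nutrient == "A" else yield_b
    value = yields.get(hour)
    # Assumption: a missing yield counts as 0, as in the sample processes
    return 0.0 if value is None else value


def cumulative_score(scores: list[float]) -> float:
    """Sum of the per-run scores (over the 178 test runs in the competition)."""
    return float(sum(scores))


def is_eligible(total: float) -> bool:
    """The winning aggregate score must be at least 1000."""
    return total >= WINNING_MINIMUM


# Label-B values from annotated training set 1; the 19900 entry is illustrative
yield_b = {0: 62.5, 5: 59.9, 24: 54.3}
yield_a = {5: 19900.0}
print(run_score("B", "B", 5, yield_a, yield_b))  # 59.9
print(run_score("B", "A", 5, yield_a, yield_b))  # -100.0
```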

sgenzer · October 2017

Good morning competitors,

Just wanted to remind everyone that there are less than 2 weeks left for this competition. In this vein, I would like to share again how submissions must be made in order to be valid:

The deadline for submissions is October 13, 2017 at 23:59:59 UTC. Absolutely no exceptions whatsoever.
All submissions in this competition need to be posted in this thread with the entire XML of the process and the score. This includes the finished models, as well as the entire training process and all pre-processing steps. Please use the </> tool to post your XML process.
All processes that are submitted on this thread will be independently evaluated by both the sponsor and myself and all decisions will be final. If we cannot run your process the first time, we may reach out to you here on the community (via PM) for questions. If you do not respond within 12 hours, we will consider your entry disqualified. If we cannot run your process after a second attempt, we will consider your entry disqualified.
It is our hope that we will be able to determine a winner by October 17 and announce it here on this thread. If, for some reason, we are delayed, we will announce it here.
All winners posted in this thread will be notified independently via PM on the RapidMiner User Community. If a winner does not respond within 72 hours, we will award to the next highest submission.
The RapidMiner Competition Server will be completely erased on or after October 17. If you wish to retain any data or processes on the server, please ensure that you have copied them to your own repository before this date.
All submissions must be made by people who are over 18 years old and are not current employees of RapidMiner, Inc. Contestants may be residents or citizens of any country, but we reserve the right to refuse payment if it is forbidden by the U.S. Department of the Treasury. Any fees incurred while remitting payment may be subtracted from the prize money at the discretion of RapidMiner, Inc.

As usual, please do not hesitate to ask questions as they arise on this thread. Good luck to everyone!

Scott

Andrew · October 2017

To check my understanding, I've implemented a process that uses the label of the test data as the correct class, and I've assumed that t=0 is the time when the sample is introduced. I then calculate a score as the sum of the t=0 values of yieldIncreaseA or yieldIncreaseB, which gives a result of 14354.5. Obviously I'm cheating, but my questions are:

Is my method of selecting the yield and calculation of the final score correct?
I am setting the yield values to 0 if they are missing - is that what you are expecting?

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="concurrency:loop_files" compatibility="7.5.000" expanded="true" height="82" name="test" width="90" x="45" y="34">
        <parameter key="directory" value="D:\RMCompetition\RM_Competition_TestData_random\RM_Competition_TestData_random"/>
        <parameter key="filter_by_glob" value="test*.xlsx"/>
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="7.5.000" expanded="true" height="68" name="Read Excel (2)" width="90" x="246" y="85">
            <parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
            <parameter key="imported_cell_range" value="A1:AT38"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Id.true.integer.attribute"/>
              <parameter key="1" value="hour.true.integer.attribute"/>
              <parameter key="2" value="Label.true.polynominal.attribute"/>
              <parameter key="3" value="sensor1.true.integer.attribute"/>
              <parameter key="4" value="sensor2.true.integer.attribute"/>
              <parameter key="5" value="sensor3.true.integer.attribute"/>
              <parameter key="6" value="sensor4.true.integer.attribute"/>
              <parameter key="7" value="sensor5.true.integer.attribute"/>
              <parameter key="8" value="sensor6.true.integer.attribute"/>
              <parameter key="9" value="sensor7.true.integer.attribute"/>
              <parameter key="10" value="sensor8.true.integer.attribute"/>
              <parameter key="11" value="sensor9.true.integer.attribute"/>
              <parameter key="12" value="sensor10.true.integer.attribute"/>
              <parameter key="13" value="sensor11.true.integer.attribute"/>
              <parameter key="14" value="sensor12.true.integer.attribute"/>
              <parameter key="15" value="sensor13.true.integer.attribute"/>
              <parameter key="16" value="sensor14.true.integer.attribute"/>
              <parameter key="17" value="sensor15.true.integer.attribute"/>
              <parameter key="18" value="sensor16.true.integer.attribute"/>
              <parameter key="19" value="sensor17.true.integer.attribute"/>
              <parameter key="20" value="sensor18.true.integer.attribute"/>
              <parameter key="21" value="sensor19.true.integer.attribute"/>
              <parameter key="22" value="sensor20.true.integer.attribute"/>
              <parameter key="23" value="sensor21.true.integer.attribute"/>
              <parameter key="24" value="sensor22.true.integer.attribute"/>
              <parameter key="25" value="sensor23.true.integer.attribute"/>
              <parameter key="26" value="sensor24.true.integer.attribute"/>
              <parameter key="27" value="sensor25.true.integer.attribute"/>
              <parameter key="28" value="sensor26.true.integer.attribute"/>
              <parameter key="29" value="sensor27.true.integer.attribute"/>
              <parameter key="30" value="sensor28.true.integer.attribute"/>
              <parameter key="31" value="sensor29.true.integer.attribute"/>
              <parameter key="32" value="sensor30.true.integer.attribute"/>
              <parameter key="33" value="sensor31.true.integer.attribute"/>
              <parameter key="34" value="sensor32.true.integer.attribute"/>
              <parameter key="35" value="sensor33.true.integer.attribute"/>
              <parameter key="36" value="sensor34.true.integer.attribute"/>
              <parameter key="37" value="sensor35.true.integer.attribute"/>
              <parameter key="38" value="sensor36.true.integer.attribute"/>
              <parameter key="39" value="sensor37.true.integer.attribute"/>
              <parameter key="40" value="sensor38.true.integer.attribute"/>
              <parameter key="41" value="sensor39.true.numeric.attribute"/>
              <parameter key="42" value="sensor40.true.numeric.attribute"/>
              <parameter key="43" value="sensor41.true.numeric.attribute"/>
              <parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
              <parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
            </list>
          </operator>
          <connect from_port="file object" to_op="Read Excel (2)" to_port="file"/>
          <connect from_op="Read Excel (2)" from_port="output" to_port="output 1"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="7.5.000" expanded="true" height="82" name="Append (2)" width="90" x="179" y="34"/>
      <operator activated="true" class="select_attributes" compatibility="7.5.000" expanded="true" height="82" name="Select Attributes (3)" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="Label|hour|yieldIncreaseA|yieldIncreaseB"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.5.000" expanded="true" height="103" name="Filter Examples (4)" width="90" x="447" y="34">
        <list key="filters_list">
          <parameter key="filters_entry_key" value="hour.eq.0"/>
        </list>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="7.5.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="581" y="34">
        <parameter key="default" value="value"/>
        <list key="columns"/>
        <parameter key="replenishment_value" value="0"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.5.000" expanded="true" height="82" name="Generate Attributes" width="90" x="715" y="34">
        <list key="function_descriptions">
          <parameter key="Score" value="if(Label == &quot;A&quot;, yieldIncreaseA, yieldIncreaseB)"/>
        </list>
      </operator>
      <operator activated="true" class="aggregate" compatibility="7.5.000" expanded="true" height="82" name="Aggregate" width="90" x="849" y="34">
        <list key="aggregation_attributes">
          <parameter key="Score" value="sum"/>
        </list>
      </operator>
      <connect from_op="test" from_port="output 1" to_op="Append (2)" to_port="example set 1"/>
      <connect from_op="Append (2)" from_port="merged set" to_op="Select Attributes (3)" to_port="example set input"/>
      <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Filter Examples (4)" to_port="example set input"/>
      <connect from_op="Filter Examples (4)" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
      <connect from_op="Aggregate" from_port="original" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Andrew
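For anyone who wants to cross-check this arithmetic outside RapidMiner, here is a minimal Python sketch of what the process above computes, using hypothetical rows. Because it reads the true labels, it is a sanity check of the scoring, not a model. The 14354.5 total comes from Andrew's run on the test files and is not reproduced here.

```python
from typing import Any

Row = dict[str, Any]  # one hourly reading; None marks a missing cell


def oracle_t0_score(rows: list[Row]) -> float:
    """Sum of the hour-0 yield for each run's TRUE label (missing yields count as 0).

    Mirrors the posted process: Filter Examples (hour = 0), Replace Missing
    Values (0), Score = if(Label == "A", yieldIncreaseA, yieldIncreaseB), sum.
    """
    total = 0.0
    for row in rows:
        if row["hour"] != 0:
            continue  # only the t=0 row of each run is used
        column = "yieldIncreaseA" if row["Label"] == "A" else "yieldIncreaseB"
        value = row.get(column)
        total += 0.0 if value is None else value
    return total


# Hypothetical rows from two runs, for illustration only
rows = [
    {"Id": 1, "hour": 0, "Label": "A", "yieldIncreaseA": 40.0, "yieldIncreaseB": 5.0},
    {"Id": 1, "hour": 1, "Label": "A", "yieldIncreaseA": 38.5, "yieldIncreaseB": 4.0},
    {"Id": 2, "hour": 0, "Label": "B", "yieldIncreaseA": 7.0, "yieldIncreaseB": None},
]
print(oracle_t0_score(rows))  # 40.0
```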

sgenzer · October 2017

Hello @Andrew and all -

Yes, a sample scoring process would be useful. Here is one that can be used if you like. Note that this "model" does nothing but always pick nutrient A at hour 13, which is not a good idea.

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
 <context>
 <input/>
 <output/>
 <macros/>
 </context>
 <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
 <parameter key="random_seed" value="-1"/>
 <process expanded="true">
 <operator activated="true" class="concurrency:loop_files" compatibility="7.6.001" expanded="true" height="82" name="train" width="90" x="45" y="136">
 <parameter key="directory" value="/Users/GenzerConsulting/OneDrive - RapidMiner/OneDrive Repository/RM Competitions/Comp1-Mars Farming-Sept 2017/RM_Competition_TrainingData_random"/>
 <parameter key="filter_type" value="regex"/>
 <parameter key="filter_by_glob" value="tra*.xlsx"/>
 <parameter key="filter_by_regex" value="train.*.xlsx"/>
 <process expanded="true">
 <operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (3)" width="90" x="112" y="34">
 <parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
 <parameter key="imported_cell_range" value="A1:AT38"/>
 <parameter key="first_row_as_names" value="false"/>
 <list key="annotations">
 <parameter key="0" value="Name"/>
 </list>
 <list key="data_set_meta_data_information">
 <parameter key="0" value="Id.true.integer.attribute"/>
 <parameter key="1" value="hour.true.integer.attribute"/>
 <parameter key="2" value="Label.true.polynominal.attribute"/>
 <parameter key="3" value="sensor1.true.integer.attribute"/>
 <parameter key="4" value="sensor2.true.integer.attribute"/>
 <parameter key="5" value="sensor3.true.integer.attribute"/>
 <parameter key="6" value="sensor4.true.integer.attribute"/>
 <parameter key="7" value="sensor5.true.integer.attribute"/>
 <parameter key="8" value="sensor6.true.integer.attribute"/>
 <parameter key="9" value="sensor7.true.integer.attribute"/>
 <parameter key="10" value="sensor8.true.integer.attribute"/>
 <parameter key="11" value="sensor9.true.integer.attribute"/>
 <parameter key="12" value="sensor10.true.integer.attribute"/>
 <parameter key="13" value="sensor11.true.integer.attribute"/>
 <parameter key="14" value="sensor12.true.integer.attribute"/>
 <parameter key="15" value="sensor13.true.integer.attribute"/>
 <parameter key="16" value="sensor14.true.integer.attribute"/>
 <parameter key="17" value="sensor15.true.integer.attribute"/>
 <parameter key="18" value="sensor16.true.integer.attribute"/>
 <parameter key="19" value="sensor17.true.integer.attribute"/>
 <parameter key="20" value="sensor18.true.integer.attribute"/>
 <parameter key="21" value="sensor19.true.integer.attribute"/>
 <parameter key="22" value="sensor20.true.integer.attribute"/>
 <parameter key="23" value="sensor21.true.integer.attribute"/>
 <parameter key="24" value="sensor22.true.integer.attribute"/>
 <parameter key="25" value="sensor23.true.integer.attribute"/>
 <parameter key="26" value="sensor24.true.integer.attribute"/>
 <parameter key="27" value="sensor25.true.integer.attribute"/>
 <parameter key="28" value="sensor26.true.integer.attribute"/>
 <parameter key="29" value="sensor27.true.integer.attribute"/>
 <parameter key="30" value="sensor28.true.integer.attribute"/>
 <parameter key="31" value="sensor29.true.integer.attribute"/>
 <parameter key="32" value="sensor30.true.integer.attribute"/>
 <parameter key="33" value="sensor31.true.integer.attribute"/>
 <parameter key="34" value="sensor32.true.integer.attribute"/>
 <parameter key="35" value="sensor33.true.integer.attribute"/>
 <parameter key="36" value="sensor34.true.integer.attribute"/>
 <parameter key="37" value="sensor35.true.integer.attribute"/>
 <parameter key="38" value="sensor36.true.integer.attribute"/>
 <parameter key="39" value="sensor37.true.integer.attribute"/>
 <parameter key="40" value="sensor38.true.integer.attribute"/>
 <parameter key="41" value="sensor39.true.numeric.attribute"/>
 <parameter key="42" value="sensor40.true.numeric.attribute"/>
 <parameter key="43" value="sensor41.true.numeric.attribute"/>
 <parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
 <parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
 </list>
 </operator>
 <connect from_port="file object" to_op="Read Excel (3)" to_port="file"/>
 <connect from_op="Read Excel (3)" from_port="output" to_port="output 1"/>
 <portSpacing port="source_file object" spacing="0"/>
 <portSpacing port="source_input 1" spacing="0"/>
 <portSpacing port="sink_output 1" spacing="0"/>
 <portSpacing port="sink_output 2" spacing="0"/>
 </process>
 </operator>
 <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (3)" width="90" x="179" y="136"/>
 <operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="136">
 <parameter key="attribute_filter_type" value="subset"/>
 <parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
 <parameter key="default" value="value"/>
 <list key="columns"/>
 <parameter key="replenishment_value" value="0"/>
 <description align="center" color="transparent" colored="false" width="126">replace missing yield values with zero</description>
 </operator>
 <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Subprocess (4)" width="90" x="514" y="187">
 <process expanded="true">
 <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal (2)" width="90" x="45" y="34">
 <parameter key="attribute_filter_type" value="single"/>
 <parameter key="attribute" value="Id"/>
 </operator>
 <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (3)" width="90" x="179" y="34">
 <parameter key="attribute_name" value="Id"/>
 <parameter key="target_role" value="id"/>
 <list key="set_additional_roles"/>
 </operator>
 <operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values (2)" width="90" x="313" y="34">
 <parameter key="attribute" value="Id"/>
 <parameter key="iteration_macro" value="id"/>
 <process expanded="true">
 <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (6)" width="90" x="45" y="34">
 <list key="function_descriptions">
 <parameter key="nutrientPrediction" value="&quot;A&quot;"/>
 <parameter key="hourPrediction" value="13"/>
 <parameter key="hourPredictionMatch" value="if(hour==hourPrediction,TRUE,FALSE)"/>
 </list>
 <description align="center" color="transparent" colored="false" width="126">THIS IS WHAT YOUR MODEL SHOULD DO - THIS OPERATOR JUST ASSIGNS A FIXED NUTRIENT (A) AND HOUR (13)</description>
 </operator>
 <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples (3)" width="90" x="179" y="34">
 <list key="filters_list">
 <parameter key="filters_entry_key" value="Id.equals.%{id}"/>
 <parameter key="filters_entry_key" value="hourPredictionMatch.equals.true"/>
 </list>
 </operator>
 <connect from_port="input 1" to_op="Generate Attributes (6)" to_port="example set input"/>
 <connect from_op="Generate Attributes (6)" from_port="example set output" to_op="Filter Examples (3)" to_port="example set input"/>
 <connect from_op="Filter Examples (3)" from_port="example set output" to_port="output 1"/>
 <portSpacing port="source_input 1" spacing="0"/>
 <portSpacing port="source_input 2" spacing="0"/>
 <portSpacing port="sink_output 1" spacing="0"/>
 <portSpacing port="sink_output 2" spacing="0"/>
 </process>
 </operator>
 <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (4)" width="90" x="447" y="34"/>
 <operator activated="true" class="order_attributes" compatibility="7.6.001" expanded="true" height="82" name="Reorder Attributes (4)" width="90" x="581" y="34">
 <parameter key="attribute_ordering" value="Label|hour|hourPrediction|nutrientPrediction|yieldIncreaseA|yieldIncreaseB"/>
 </operator>
 <connect from_port="in 1" to_op="Numerical to Polynominal (2)" to_port="example set input"/>
 <connect from_op="Numerical to Polynominal (2)" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
 <connect from_op="Set Role (3)" from_port="example set output" to_op="Loop Values (2)" to_port="input 1"/>
 <connect from_op="Loop Values (2)" from_port="output 1" to_op="Append (4)" to_port="example set 1"/>
 <connect from_op="Append (4)" from_port="merged set" to_op="Reorder Attributes (4)" to_port="example set input"/>
 <connect from_op="Reorder Attributes (4)" from_port="example set output" to_port="out 1"/>
 <portSpacing port="source_in 1" spacing="0"/>
 <portSpacing port="source_in 2" spacing="0"/>
 <portSpacing port="sink_out 1" spacing="0"/>
 <portSpacing port="sink_out 2" spacing="0"/>
 </process>
 <description align="center" color="transparent" colored="false" width="126">MODELING</description>
 </operator>
 <operator activated="true" class="concurrency:loop_files" compatibility="7.6.001" expanded="true" height="82" name="test" width="90" x="45" y="544">
 <parameter key="directory" value="/Users/genzerconsulting/OneDrive - RapidMiner/OneDrive Repository/RM Competitions/Comp1-Mars Farming-Sept 2017/RM_Competition_TestData_random"/>
 <parameter key="filter_type" value="regex"/>
 <parameter key="filter_by_glob" value="test*.xlsx"/>
 <parameter key="filter_by_regex" value="test.*.xlsx"/>
 <process expanded="true">
 <operator activated="true" class="read_excel" compatibility="7.6.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="45" y="34">
 <parameter key="excel_file" value="D:\RMCompetition\RM_Competition_TrainingData_random\RM_Competition_TrainingData_random\training set - run 1.xlsx"/>
 <parameter key="imported_cell_range" value="A1:AT38"/>
 <parameter key="first_row_as_names" value="false"/>
 <list key="annotations">
 <parameter key="0" value="Name"/>
 </list>
 <list key="data_set_meta_data_information">
 <parameter key="0" value="Id.true.integer.attribute"/>
 <parameter key="1" value="hour.true.integer.attribute"/>
 <parameter key="2" value="Label.true.polynominal.attribute"/>
 <parameter key="3" value="sensor1.true.integer.attribute"/>
 <parameter key="4" value="sensor2.true.integer.attribute"/>
 <parameter key="5" value="sensor3.true.integer.attribute"/>
 <parameter key="6" value="sensor4.true.integer.attribute"/>
 <parameter key="7" value="sensor5.true.integer.attribute"/>
 <parameter key="8" value="sensor6.true.integer.attribute"/>
 <parameter key="9" value="sensor7.true.integer.attribute"/>
 <parameter key="10" value="sensor8.true.integer.attribute"/>
 <parameter key="11" value="sensor9.true.integer.attribute"/>
 <parameter key="12" value="sensor10.true.integer.attribute"/>
 <parameter key="13" value="sensor11.true.integer.attribute"/>
 <parameter key="14" value="sensor12.true.integer.attribute"/>
 <parameter key="15" value="sensor13.true.integer.attribute"/>
 <parameter key="16" value="sensor14.true.integer.attribute"/>
 <parameter key="17" value="sensor15.true.integer.attribute"/>
 <parameter key="18" value="sensor16.true.integer.attribute"/>
 <parameter key="19" value="sensor17.true.integer.attribute"/>
 <parameter key="20" value="sensor18.true.integer.attribute"/>
 <parameter key="21" value="sensor19.true.integer.attribute"/>
 <parameter key="22" value="sensor20.true.integer.attribute"/>
 <parameter key="23" value="sensor21.true.integer.attribute"/>
 <parameter key="24" value="sensor22.true.integer.attribute"/>
 <parameter key="25" value="sensor23.true.integer.attribute"/>
 <parameter key="26" value="sensor24.true.integer.attribute"/>
 <parameter key="27" value="sensor25.true.integer.attribute"/>
 <parameter key="28" value="sensor26.true.integer.attribute"/>
 <parameter key="29" value="sensor27.true.integer.attribute"/>
 <parameter key="30" value="sensor28.true.integer.attribute"/>
 <parameter key="31" value="sensor29.true.integer.attribute"/>
 <parameter key="32" value="sensor30.true.integer.attribute"/>
 <parameter key="33" value="sensor31.true.integer.attribute"/>
 <parameter key="34" value="sensor32.true.integer.attribute"/>
 <parameter key="35" value="sensor33.true.integer.attribute"/>
 <parameter key="36" value="sensor34.true.integer.attribute"/>
 <parameter key="37" value="sensor35.true.integer.attribute"/>
 <parameter key="38" value="sensor36.true.integer.attribute"/>
 <parameter key="39" value="sensor37.true.integer.attribute"/>
 <parameter key="40" value="sensor38.true.integer.attribute"/>
 <parameter key="41" value="sensor39.true.numeric.attribute"/>
 <parameter key="42" value="sensor40.true.numeric.attribute"/>
 <parameter key="43" value="sensor41.true.numeric.attribute"/>
 <parameter key="44" value="yieldIncreaseA.true.numeric.attribute"/>
 <parameter key="45" value="yieldIncreaseB.true.real.attribute"/>
 </list>
 </operator>
 <operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values (2)" width="90" x="179" y="34">
 <parameter key="attribute_filter_type" value="subset"/>
 <parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
 <parameter key="default" value="value"/>
 <list key="columns"/>
 <parameter key="replenishment_value" value="0"/>
 </operator>
 <connect from_port="file object" to_op="Read Excel (2)" to_port="file"/>
 <connect from_op="Read Excel (2)" from_port="output" to_op="Replace Missing Values (2)" to_port="example set input"/>
 <connect from_op="Replace Missing Values (2)" from_port="example set output" to_port="output 1"/>
 <portSpacing port="source_file object" spacing="0"/>
 <portSpacing port="source_input 1" spacing="0"/>
 <portSpacing port="sink_output 1" spacing="0"/>
 <portSpacing port="sink_output 2" spacing="0"/>
 </process>
 </operator>
 <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (2)" width="90" x="179" y="544"/>
 <operator activated="true" class="replace_missing_values" compatibility="7.6.001" expanded="true" height="103" name="Replace Missing Values (3)" width="90" x="313" y="544">
 <parameter key="attribute_filter_type" value="subset"/>
 <parameter key="attributes" value="yieldIncreaseA|yieldIncreaseB"/>
 <parameter key="default" value="value"/>
 <list key="columns"/>
 <parameter key="replenishment_value" value="0"/>
 <description align="center" color="transparent" colored="false" width="126">replace missing yield values with zero</description>
 </operator>
 <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Subprocess (5)" width="90" x="514" y="544">
 <process expanded="true">
 <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal (4)" width="90" x="45" y="34">
 <parameter key="attribute_filter_type" value="single"/>
 <parameter key="attribute" value="Id"/>
 </operator>
 <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (4)" width="90" x="179" y="34">
 <parameter key="attribute_name" value="Id"/>
 <parameter key="target_role" value="id"/>
 <list key="set_additional_roles"/>
 </operator>
 <operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values (4)" width="90" x="313" y="34">
 <parameter key="attribute" value="Id"/>
 <parameter key="iteration_macro" value="id"/>
 <process expanded="true">
 <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="45" y="34">
 <list key="function_descriptions">
 <parameter key="nutrientPrediction" value="&quot;A&quot;"/>
 <parameter key="hourPrediction" value="13"/>
 <parameter key="hourPredictionMatch" value="if(hour==hourPrediction,TRUE,FALSE)"/>
 </list>
 <description align="center" color="transparent" colored="false" width="126">THIS IS WHAT YOUR MODEL SHOULD DO - THIS OPERATOR JUST HARD-CODES THE NUTRIENT (A) AND HOUR (13) AS PLACEHOLDERS</description>
 </operator>
 <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples (2)" width="90" x="179" y="34">
 <list key="filters_list">
 <parameter key="filters_entry_key" value="Id.equals.%{id}"/>
 <parameter key="filters_entry_key" value="hourPredictionMatch.equals.true"/>
 </list>
 </operator>
 <connect from_port="input 1" to_op="Generate Attributes (2)" to_port="example set input"/>
 <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Filter Examples (2)" to_port="example set input"/>
 <connect from_op="Filter Examples (2)" from_port="example set output" to_port="output 1"/>
 <portSpacing port="source_input 1" spacing="0"/>
 <portSpacing port="source_input 2" spacing="0"/>
 <portSpacing port="sink_output 1" spacing="0"/>
 <portSpacing port="sink_output 2" spacing="0"/>
 </process>
 </operator>
 <operator activated="true" class="append" compatibility="7.6.001" expanded="true" height="82" name="Append (6)" width="90" x="447" y="34"/>
 <operator activated="true" class="order_attributes" compatibility="7.6.001" expanded="true" height="82" name="Reorder Attributes (5)" width="90" x="581" y="34">
 <parameter key="attribute_ordering" value="Label|hour|hourPrediction|nutrientPrediction|yieldIncreaseA|yieldIncreaseB"/>
 </operator>
 <connect from_port="in 1" to_op="Numerical to Polynominal (4)" to_port="example set input"/>
 <connect from_op="Numerical to Polynominal (4)" from_port="example set output" to_op="Set Role (4)" to_port="example set input"/>
 <connect from_op="Set Role (4)" from_port="example set output" to_op="Loop Values (4)" to_port="input 1"/>
 <connect from_op="Loop Values (4)" from_port="output 1" to_op="Append (6)" to_port="example set 1"/>
 <connect from_op="Append (6)" from_port="merged set" to_op="Reorder Attributes (5)" to_port="example set input"/>
 <connect from_op="Reorder Attributes (5)" from_port="example set output" to_port="out 1"/>
 <portSpacing port="source_in 1" spacing="0"/>
 <portSpacing port="source_in 2" spacing="0"/>
 <portSpacing port="sink_out 1" spacing="0"/>
 <portSpacing port="sink_out 2" spacing="0"/>
 </process>
 <description align="center" color="transparent" colored="false" width="126">APPLY MODEL</description>
 </operator>
 <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="103" name="Subprocess (2)" width="90" x="715" y="544">
 <process expanded="true">
 <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="45" y="34">
 <list key="function_descriptions">
 <parameter key="nutrientCorrect" value="if(Label==nutrientPrediction,TRUE,FALSE)"/>
 </list>
 <description align="center" color="transparent" colored="false" width="126">nutrientCorrect and hourMatch</description>
 </operator>
 <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="179" y="34">
 <list key="function_descriptions">
 <parameter key="SCORE" value="if(nutrientCorrect==TRUE&amp;&amp;Label==&quot;A&quot;,yieldIncreaseA,&#10;if(nutrientCorrect==TRUE&amp;&amp;Label==&quot;B&quot;,yieldIncreaseB,-100))"/>
 </list>
 <description align="center" color="transparent" colored="false" width="126">SCORE</description>
 </operator>
 <operator activated="true" class="order_attributes" compatibility="7.6.001" expanded="true" height="82" name="Reorder Attributes (2)" width="90" x="313" y="34">
 <parameter key="attribute_ordering" value="Label|hour|hourPrediction|nutrientPrediction|nutrientCorrect|yieldIncreaseA|yieldIncreaseB|SCORE"/>
 </operator>
 <operator activated="true" class="aggregate" compatibility="7.6.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34">
 <list key="aggregation_attributes">
 <parameter key="SCORE" value="sum"/>
 </list>
 </operator>
 <connect from_port="in 1" to_op="Generate Attributes (3)" to_port="example set input"/>
 <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Generate Attributes (4)" to_port="example set input"/>
 <connect from_op="Generate Attributes (4)" from_port="example set output" to_op="Reorder Attributes (2)" to_port="example set input"/>
 <connect from_op="Reorder Attributes (2)" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
 <connect from_op="Aggregate" from_port="example set output" to_port="out 1"/>
 <connect from_op="Aggregate" from_port="original" to_port="out 2"/>
 <portSpacing port="source_in 1" spacing="0"/>
 <portSpacing port="source_in 2" spacing="0"/>
 <portSpacing port="sink_out 1" spacing="0"/>
 <portSpacing port="sink_out 2" spacing="0"/>
 <portSpacing port="sink_out 3" spacing="0"/>
 </process>
 <description align="center" color="transparent" colored="false" width="126">SCORING</description>
 </operator>
 <connect from_op="train" from_port="output 1" to_op="Append (3)" to_port="example set 1"/>
 <connect from_op="Append (3)" from_port="merged set" to_op="Replace Missing Values" to_port="example set input"/>
 <connect from_op="Replace Missing Values" from_port="example set output" to_op="Subprocess (4)" to_port="in 1"/>
 <connect from_op="test" from_port="output 1" to_op="Append (2)" to_port="example set 1"/>
 <connect from_op="Append (2)" from_port="merged set" to_op="Replace Missing Values (3)" to_port="example set input"/>
 <connect from_op="Replace Missing Values (3)" from_port="example set output" to_op="Subprocess (5)" to_port="in 1"/>
 <connect from_op="Subprocess (5)" from_port="out 1" to_op="Subprocess (2)" to_port="in 1"/>
 <connect from_op="Subprocess (2)" from_port="out 1" to_port="result 1"/>
 <connect from_op="Subprocess (2)" from_port="out 2" to_port="result 2"/>
 <portSpacing port="source_input 1" spacing="0"/>
 <portSpacing port="sink_result 1" spacing="0"/>
 <portSpacing port="sink_result 2" spacing="0"/>
 <portSpacing port="sink_result 3" spacing="0"/>
 <description align="center" color="yellow" colored="false" height="196" resized="true" width="182" x="470" y="498">this is the model from above</description>
 <description align="center" color="yellow" colored="false" height="309" resized="false" width="183" x="458" y="11">this is your model that you have built from the training set - it will generate two new attributes: nutrientPrediction and hourPrediction - my &amp;quot;model&amp;quot; here always picks nutrient A at hour 13.</description>
 <description align="center" color="yellow" colored="false" height="267" resized="true" width="178" x="671" y="447">this is the scoring of my &amp;quot;model&amp;quot; - pretty terrible. The goal is to get this aggregate score &amp;#8805; 1000</description>
 </process>
 </operator>
</process>
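
For readers following the XML, the baseline "model" in Subprocess (5) can be sketched in plain Python. This is a minimal sketch, not RapidMiner's internals: it assumes each row is a dict keyed by the column names from the Read Excel metadata (Id, hour, Label, yieldIncreaseA, yieldIncreaseB), and the function names are hypothetical. It mirrors Replace Missing Values (missing yields become 0), Generate Attributes (2) (fixed nutrient "A", hour 13) and Filter Examples (2) (keep only the row where hour equals hourPrediction, once per run Id). A real model would replace the fixed nutrient and hour with its own predictions.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # hypothetical row representation: column name -> value


def fill_missing_yields(rows: List[Row]) -> List[Row]:
    """Replace Missing Values step: a missing yieldIncreaseA/B becomes 0."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in ("yieldIncreaseA", "yieldIncreaseB"):
            if new_row.get(col) is None:
                new_row[col] = 0.0
        filled.append(new_row)
    return filled


def baseline_predictions(rows: List[Row], nutrient: str = "A", hour: int = 13) -> List[Row]:
    """Per run Id, attach the fixed prediction and keep only the row at the predicted hour."""
    picked = []
    for run_id in sorted({row["Id"] for row in rows}):  # one iteration per Id, like Loop Values (4)
        for row in rows:
            if row["Id"] == run_id and row["hour"] == hour:  # hourPredictionMatch == true
                picked.append({**row, "nutrientPrediction": nutrient, "hourPrediction": hour})
    return picked


rows = [
    {"Id": 1, "hour": 12, "Label": "A", "yieldIncreaseA": 3.0, "yieldIncreaseB": None},
    {"Id": 1, "hour": 13, "Label": "A", "yieldIncreaseA": 5.0, "yieldIncreaseB": None},
    {"Id": 2, "hour": 13, "Label": "B", "yieldIncreaseA": None, "yieldIncreaseB": 7.0},
]
selected = baseline_predictions(fill_missing_yields(rows))
print([(r["Id"], r["nutrientPrediction"], r["hour"]) for r in selected])
```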

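The SCORE expression in Generate Attributes (4) and the sum in Aggregate can be read as the sketch below. It is an illustration under the assumption that rows are dicts with the attribute names used in the XML; the function names are hypothetical. The hour is not scored directly: it enters only through which row was kept for each run, so the yield columns on that row are the ones credited.

```python
from typing import Any, Iterable, Mapping

PENALTY = -100.0  # score when the predicted nutrient does not match Label, as in Generate Attributes (4)


def score_row(row: Mapping[str, Any]) -> float:
    """SCORE for one selected row: the realized yield if the nutrient guess matches Label, else the penalty."""
    label = row["Label"]
    predicted = row["nutrientPrediction"]
    if predicted == label == "A":
        return float(row["yieldIncreaseA"])
    if predicted == label == "B":
        return float(row["yieldIncreaseB"])
    return PENALTY


def total_score(rows: Iterable[Mapping[str, Any]]) -> float:
    """Aggregate step: sum of SCORE over all selected rows (one per production run)."""
    return sum(score_row(row) for row in rows)


example = [
    {"Label": "A", "nutrientPrediction": "A", "yieldIncreaseA": 5.0, "yieldIncreaseB": 0.0},
    {"Label": "B", "nutrientPrediction": "A", "yieldIncreaseA": 0.0, "yieldIncreaseB": 7.0},
]
print(total_score(example))  # 5.0 + (-100.0) = -95.0
```

Because every wrong nutrient costs 100 points, a model that is unsure may do better by choosing the nutrient it is most confident about than by chasing the largest possible yield.
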
Scott

Andrew · October 2017

Hello all

I've managed to score 1134 but I'll post my process at the last possible moment.

regards

Andrew

sgenzer · October 2017

Boom! Well done, @Andrew! Anyone else coming in? There are prizes for 2nd and 3rd place.

Scott

jacobcybulski · October 2017

Hi Scott,

Just a question on the submission method. I am sure it has been addressed in the original competition document and your post above; I just want to ensure I am following your instructions to the letter. Here are some of my assumptions:

the submission is simply a post to this discussion area;
the submission can include multiple processes, including those called by "Execute Process";
we should refrain from using Python or R and instead focus on the pure RM solution;
the post needs to explain how to run the included processes;
all included XML inserts would be saved into the same folder with correct names;
you are not happy accepting a zipped directory of all RMP files;
we need to explain the method of data pre-processing and that we do not violate any rules;
you are going to penalise any copy-cats and any attempts of plagiarism of the submitted solutions;
finally, can I assume that the data provided to us has been nicely unzipped into two folders, or must we rely on the zipped data as it was provided to the competitors?

Jacob

P.S. Lots of questions and I am yet to get some good results to submit

sgenzer · October 2017

Hello @jacobcybulski - all good questions. Let me answer below.

the submission is simply a post to this discussion area;

YES.

the submission can include multiple processes, including those called by "Execute Process";

YES, as long as the processes that "Execute Process" calls are also included in your submission.

we should refrain from using Python or R and instead focus on the pure RM solution;

NO. You can use Python and R if you like; no rule stipulated otherwise. Of course, all scripts need to be executable, open source, etc. We are not going to spend time making sure that dependencies are in place.

the post needs to explain how to run the included processes;

I would expect that it is fairly self-evident how to run your process. If you think that it needs some explanation, by all means go ahead. Otherwise we may reach out to you (see my previous post) if we cannot get it to run or do not understand something.

all included XML inserts would be saved into the same folder with correct names;

You can submit via XML posted directly in this thread using the </> tool, attached as .rmp files, or one attached .zip file with XML or .rmp inside. Any of these methods are fine.

you are not happy accepting a zipped directory of all RMP files;

NO. It is perfectly fine to attach a .zip file with all your .rmp files, as long as anyone can open the zip.

we need to explain the method of data pre-processing

No. We are not requiring any explanation. We will, of course, be looking at your process to ensure that you are not gaming the system (e.g. rigging the process so that your score is inflated). I always assume that people are honest and have integrity until proven otherwise.

and that we do not violate any rules;

Yes - for all rules stated in this thread by me.

you are going to penalise any copy-cats and any attempts of plagiarism of the submitted solutions;

So again, this is a collegial competition and I always assume that people are honest and have integrity. In addition, all RM processes are rather similar (we all use the same operators), so examining millions of subprocesses for copied snippets is neither feasible nor desirable. All submissions are public and open for the sake of transparency and so that we can learn from one another (the main objective of these competitions).

That said, the sponsor and I reserve the right to disqualify a submission if we deem it necessary, and if someone does something genuinely dishonest, I absolutely have the right to disqualify the submission and permanently ban the user from this community.

finally, can I assume that the data provided to us has been nicely unzipped into two folders, or must we rely on the zipped data as it was provided to the competitors?

We will have the data in both zipped and unzipped forms, so it does not matter. However, your process needs to read the data as originally posted.

Jacob

P.S. Lots of questions and I am yet to get some good results to submit

Wahoo! Well done, Jacob. Six days left!

Scott

sgenzer · October 2017

Hello all competitors - FYI, the Competition Server is currently locked up, so any jobs sent to it will not be queued. I will ask my colleagues to do a hard reboot first thing tomorrow morning.

UPDATED - COMPETITION SERVER IS BACK UP AND RUNNING (9:30AM EST).

Thanks for your understanding. Lots of lessons learned here for me too. Three more days to go!

Scott

16B543J · October 2017

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve RC2_TestData_178" width="90" x="45" y="289">
        <parameter key="repository_entry" value="//Local Repository/data/RC2_TestData_178"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples (test)" width="90" x="179" y="289">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="no_missing_attributes"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes (test)" width="90" x="313" y="289">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="yieldIncrease|sensor9|sensor8|sensor7|sensor6|sensor5|sensor41|sensor40|sensor4|sensor39|sensor38|sensor37|sensor36|sensor35|sensor34|sensor33|sensor32|sensor31|sensor30|sensor3|sensor29|sensor28|sensor27|sensor26|sensor25|sensor24|sensor23|sensor22|sensor21|sensor20|sensor2|sensor19|sensor18|sensor17|sensor16|sensor15|sensor14|sensor13|sensor12|sensor11|sensor10|sensor1|hour|Label"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role (test)" width="90" x="447" y="289">
        <parameter key="attribute_name" value="yieldIncrease"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_numerical" compatibility="7.6.001" expanded="true" height="103" name="Nominal to Numerical (test)" width="90" x="581" y="289">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Label"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="coding_type" value="dummy coding"/>
        <parameter key="use_comparison_groups" value="false"/>
        <list key="comparison_groups"/>
        <parameter key="unexpected_value_handling" value="all 0 and warning"/>
        <parameter key="use_underscore_in_name" value="false"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
        <parameter key="repository_entry" value="../data/RC2_TrainDate_1475"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <parameter key="parameter_expression" value=""/>
        <parameter key="condition_class" value="no_missing_attributes"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.001" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="yieldIncrease|Label|hour|sensor9|sensor8|sensor7|sensor6|sensor5|sensor41|sensor40|sensor4|sensor39|sensor38|sensor37|sensor36|sensor35|sensor34|sensor33|sensor32|sensor31|sensor30|sensor3|sensor29|sensor28|sensor27|sensor26|sensor25|sensor24|sensor23|sensor22|sensor21|sensor20|sensor2|sensor19|sensor18|sensor17|sensor16|sensor15|sensor14|sensor13|sensor12|sensor11|sensor10|sensor1"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
        <parameter key="attribute_name" value="yieldIncrease"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="nominal_to_numerical" compatibility="7.6.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="581" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Label"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="coding_type" value="dummy coding"/>
        <parameter key="use_comparison_groups" value="false"/>
        <list key="comparison_groups"/>
        <parameter key="unexpected_value_handling" value="all 0 and warning"/>
        <parameter key="use_underscore_in_name" value="false"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="7.6.001" expanded="true" height="103" name="Normalize" width="90" x="715" y="34">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="method" value="Z-transformation"/>
        <parameter key="min" value="0.0"/>
        <parameter key="max" value="1.0"/>
        <parameter key="allow_negative_values" value="false"/>
      </operator>
      <operator activated="true" class="local_polynomial_regression" compatibility="7.6.001" expanded="true" height="82" name="Local Polynomial Regression" width="90" x="715" y="187">
        <parameter key="degree" value="2"/>
        <parameter key="ridge_factor" value="1.0E-9"/>
        <parameter key="use_robust_estimation" value="false"/>
        <parameter key="use_weights" value="true"/>
        <parameter key="iterations" value="20"/>
        <parameter key="numerical_measure" value="EuclideanDistance"/>
        <parameter key="kernel_type" value="radial"/>
        <parameter key="kernel_gamma" value="1.0"/>
        <parameter key="kernel_sigma1" value="1.0"/>
        <parameter key="kernel_sigma2" value="0.0"/>
        <parameter key="kernel_sigma3" value="2.0"/>
        <parameter key="kernel_degree" value="3.0"/>
        <parameter key="kernel_shift" value="1.0"/>
        <parameter key="kernel_a" value="1.0"/>
        <parameter key="kernel_b" value="0.0"/>
        <parameter key="neighborhood_type" value="Fixed Number"/>
        <parameter key="k" value="5"/>
        <parameter key="fixed_distance" value="5.0"/>
        <parameter key="distance" value="10.0"/>
        <parameter key="at_least" value="20"/>
        <parameter key="smoothing_kernel" value="Triweight"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="7.6.001" expanded="true" height="103" name="Normalize (test)" width="90" x="715" y="289">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="method" value="Z-transformation"/>
        <parameter key="min" value="0.0"/>
        <parameter key="max" value="1.0"/>
        <parameter key="allow_negative_values" value="false"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.6.001" expanded="true" height="82" name="Apply Model" width="90" x="849" y="187">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="performance_regression" compatibility="7.6.001" expanded="true" height="82" name="Performance" width="90" x="983" y="187">
        <parameter key="main_criterion" value="first"/>
        <parameter key="root_mean_squared_error" value="true"/>
        <parameter key="absolute_error" value="false"/>
        <parameter key="relative_error" value="false"/>
        <parameter key="relative_error_lenient" value="false"/>
        <parameter key="relative_error_strict" value="false"/>
        <parameter key="normalized_absolute_error" value="false"/>
        <parameter key="root_relative_squared_error" value="false"/>
        <parameter key="squared_error" value="false"/>
        <parameter key="correlation" value="false"/>
        <parameter key="squared_correlation" value="false"/>
        <parameter key="prediction_average" value="false"/>
        <parameter key="spearman_rho" value="false"/>
        <parameter key="kendall_tau" value="false"/>
        <parameter key="skip_undefined_labels" value="true"/>
        <parameter key="use_example_weights" value="true"/>
      </operator>
      <operator activated="false" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="983" y="289">
        <list key="function_descriptions">
          <parameter key="DIFF" value="yieldIncrease-[prediction(yieldIncrease)]"/>
        </list>
        <parameter key="keep_all" value="true"/>
      </operator>
      <connect from_op="Retrieve RC2_TestData_178" from_port="output" to_op="Filter Examples (test)" to_port="example set input"/>
      <connect from_op="Filter Examples (test)" from_port="example set output" to_op="Select Attributes (test)" to_port="example set input"/>
      <connect from_op="Select Attributes (test)" from_port="example set output" to_op="Set Role (test)" to_port="example set input"/>
      <connect from_op="Set Role (test)" from_port="example set output" to_op="Nominal to Numerical (test)" to_port="example set input"/>
      <connect from_op="Nominal to Numerical (test)" from_port="example set output" to_op="Normalize (test)" to_port="example set input"/>
      <connect from_op="Retrieve" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
      <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Local Polynomial Regression" to_port="training set"/>
      <connect from_op="Local Polynomial Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Normalize (test)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 2"/>
      <connect from_op="Performance" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
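
In the process above, Normalize and Normalize (test) each compute their own Z-transformation statistics (neither passes a preprocessing model to the other), and with include_special_attributes set to true the label yieldIncrease appears to be standardized as well. The sketch below, with hypothetical helper names and one made-up column, contrasts fitting the mean and standard deviation on the training rows and reusing them on the test rows against fitting them on the test rows separately. It uses the population standard deviation; RapidMiner's exact variance convention is not assumed. In RapidMiner the first pattern is usually wired through the Normalize operator's preprocessing model and Apply Model.

```python
import statistics
from typing import Dict, List, Tuple

Row = Dict[str, float]  # hypothetical row representation: column name -> numeric value


def fit_z(train: List[Row], columns: List[str]) -> Dict[str, Tuple[float, float]]:
    """Learn (mean, standard deviation) per column from the given rows only."""
    params = {}
    for col in columns:
        values = [row[col] for row in train]
        std = statistics.pstdev(values)  # population std (an assumption, see lead-in)
        params[col] = (statistics.mean(values), std if std > 0 else 1.0)  # guard constant columns
    return params


def apply_z(rows: List[Row], params: Dict[str, Tuple[float, float]]) -> List[Row]:
    """Standardize every row with the stored statistics."""
    return [
        {**row, **{col: (row[col] - mean) / std for col, (mean, std) in params.items()}}
        for row in rows
    ]


train = [{"sensor1": 1.0}, {"sensor1": 3.0}]
test = [{"sensor1": 5.0}, {"sensor1": 7.0}]

shared = apply_z(test, fit_z(train, ["sensor1"]))  # training mean 2, std 1
separate = apply_z(test, fit_z(test, ["sensor1"]))  # test mean 6, std 1
print([r["sensor1"] for r in shared], [r["sensor1"] for r in separate])
```

The same raw test value lands at different standardized positions under the two schemes, so a model trained on one scaling is evaluated on another when the sets are normalized independently.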

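The learner in this process is Local Polynomial Regression (degree 2, a fixed neighbourhood of k = 5 by Euclidean distance, Triweight smoothing kernel, ridge factor 1.0E-9), scored by root mean squared error in the Performance operator. The one-feature sketch below illustrates the idea: for each query point, fit a weighted least-squares quadratic to its k nearest neighbours and evaluate it at the query point. It is not RapidMiner's implementation: the bandwidth rule (distance to the k-th neighbour), the helper names and the toy data are assumptions, and the posted process uses all selected sensors, hour and the dummy-coded nutrient as features.

```python
import math
from typing import List, Sequence


def triweight(u: float) -> float:
    """Triweight kernel (1 - u^2)^3 on |u| < 1, zero outside; normalizing constant omitted."""
    return (1.0 - u * u) ** 3 if abs(u) < 1.0 else 0.0


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_poly_predict(xs: Sequence[float], ys: Sequence[float], x0: float,
                       k: int = 5, degree: int = 2, ridge: float = 1e-9) -> float:
    """Weighted least-squares polynomial fit on the k nearest neighbours of x0, evaluated at x0."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    # ASSUMPTION: bandwidth = distance to the k-th neighbour (that neighbour then gets weight 0)
    h = max(abs(x - x0) for x, _ in nearest) or 1.0
    p = degree + 1
    a = [[0.0] * p for _ in range(p)]
    b = [0.0] * p
    for x, y in nearest:
        w = triweight(abs(x - x0) / h)
        basis = [(x - x0) ** j for j in range(p)]  # centred basis: the intercept is the fit at x0
        for i in range(p):
            b[i] += w * basis[i] * y
            for j in range(p):
                a[i][j] += w * basis[i] * basis[j]
    for i in range(p):
        a[i][i] += ridge  # small ridge term for numerical stability
    return solve(a, b)[0]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, the criterion selected in the Performance operator."""
    return math.sqrt(sum((t - q) ** 2 for t, q in zip(actual, predicted)) / len(actual))


xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
print(local_poly_predict(xs, ys, 2.5))  # close to 6.25, since the toy data are exactly quadratic
```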
jacobcybulski · March 2019

Hi Scott @sgenzer, are there any plans for a new RapidMiner competition? I have 400 master's students in their 2nd week of a Predictive Analytics class with RapidMiner, and they are all eager to try some "global" challenges. The past challenges were a great incentive to learn! Otherwise, I'd be doomed to send them to Kaggle.

Jacob

sgenzer · March 2019

Hello @jacobcybulski! It is good to hear from you. Unfortunately no, there are no competitions scheduled, as the response was lower than we had hoped. You can send them to Kaggle or, as an alternative, DrivenData.

Scott
