[SOLVED] Write results in different files automatically?

T-Unit · September 2012

Hi everyone,

I'm doing some clustering and want to build several cluster models (k-medoids) with different model parameters (number of clusters, max runs, max optimization steps, ...). To do so, I used the "Optimize Parameters" operator to generate several cluster models with different parameters. I need the clustered data for further analysis (it doesn't matter whether the parameter combinations used are optimal or not), so I use the "Write Excel" operator to export the generated data to an Excel sheet. But this way, only the clustered data of the first run (e.g. when k was 2) ends up in the final Excel file. In the "Optimize Parameters" operator I tell the process to vary (for example) the number of clusters from k=2 to 20.

My Question:
Is it possible to change the name of the output file automatically while the process is running?

What I mean is this:
choose k=2 --> do the clustering --> save the results to file named "results_k_2.xls"
choose k=3 --> do the clustering --> save the results to file named "results_k_3.xls"
...
choose k=20 --> do the clustering --> save the results to file named "results_k_20.xls"

Thanks for your help.

Greetings,
Thomas

awchisholm · September 2012

Hello,

You can use macros to do this. If you have a macro containing k, you can create another macro from it that contains the file name you want, and use that as the file parameter of the Write Excel operator.
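A minimal sketch of that idea as a RapidMiner 5 process fragment, assuming a macro named `k` is already defined when these operators run. The operator name "Set Filename" and the macro name `filename` are hypothetical, and the parameter keys (`macro`, `value`, `excel_file`) are given as I recall them from RapidMiner 5.x, so compare them against the XML your own Studio version generates before relying on them.

```xml
<!-- Hypothetical fragment: place inside the subprocess that runs once per k. -->
<!-- Assumes a macro named "k" already holds the current number of clusters. -->
<operator activated="true" class="set_macro" compatibility="5.2.009" expanded="true" name="Set Filename">
  <!-- Build the file name from k; %{k} is replaced by the macro's current value. -->
  <parameter key="macro" value="filename"/>
  <parameter key="value" value="results_k_%{k}.xls"/>
</operator>
<operator activated="true" class="write_excel" compatibility="5.2.009" expanded="true" name="Write Excel">
  <!-- Use the derived macro as the output file name. -->
  <parameter key="excel_file" value="%{filename}"/>
</operator>
```

In RapidMiner 5 the order of the operator elements is the execution order, so Set Filename has to come before Write Excel.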

Regards

Andrew

T-Unit · September 2012

Hello Andrew,

First of all, thanks for your fast reply.

Your idea sounds logical to me, but to be honest I don't have any clue how to work with macros in RapidMiner. I know neither how and where to define them nor how to use them in the process. Could you recommend a website where working with macros in RapidMiner is explained in detail? Your blog post from September 15th gives a short overview of what macros can be used for, but I can't apply it to my process.

Regards,
Thomas

awchisholm · September 2012

Hello

You could modify this example

http://rapidminernotes.blogspot.co.uk/2012/07/chopping-files-into-smaller-bits.html

regards

Andrew

Skirzynski · September 2012

Macros are a kind of named variable that you can set and use anywhere in the process. There are two ways to set a macro:

1. In the Context tab of your process
2. With the macro operators in Utility/Macros (see the Help tab for usage)

To use a macro you write %{name_of_the_macro}, e.g. results_k_%{k}.xls. Don't forget to define a macro k with the macro operator before you use it.
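For the first option, the process XML stores context macros inside the `<macros>` element of `<context>`. A sketch, assuming the `<macro><key>…</key><value>…</value></macro>` layout used by RapidMiner 5.x; the default value 2 is only a placeholder that a macro operator can overwrite during the run.

```xml
<context>
  <input/>
  <output/>
  <macros>
    <!-- Context macro "k" with a placeholder default value. -->
    <macro>
      <key>k</key>
      <value>2</value>
    </macro>
  </macros>
</context>
```

Any parameter can then reference it, e.g. a file parameter set to results_k_%{k}.xls.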

T-Unit · September 2012

Hello Marcin,

Using the "Set Macro" operator, I created a macro called "k". How can I give this macro the current number of clusters of the "Clustering" operator (the number of clusters is set by the "Optimize Parameters" operator and changes from 2 to 20)? I tried "operator.Clustering.parameter.k", but this didn't work. Instead of different files like "results_k_2.xls", "results_k_3.xls", ..., I got only one file named "results_k_operator.Clustering.parameter.k.xls". Maybe it's impossible to access a model's parameter values directly?

Regards,
Thomas

Skirzynski · September 2012

I thought there was a predefined macro for this, but I was wrong. Unfortunately there is no easy way to do this, only a workaround: log the operator's parameter, convert the log to an example set, and extract a macro from the last example (index -1) of that example set. Here is an example process:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.009">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.009" expanded="true" name="Process">
    <process expanded="true" height="520" width="643">
      <operator activated="true" class="retrieve" compatibility="5.2.009" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
        <parameter key="repository_entry" value="//Samples/data/Iris"/>
      </operator>
      <operator activated="true" class="optimize_parameters_grid" compatibility="5.2.009" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="246" y="75">
        <list key="parameters">
          <parameter key="Clustering.k" value="[2.0;20;19;linear]"/>
        </list>
        <process expanded="true" height="538" width="643">
          <operator activated="true" class="k_means" compatibility="5.2.009" expanded="true" height="76" name="Clustering" width="90" x="45" y="30">
            <parameter key="k" value="20"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.2.009" expanded="true" height="76" name="Apply Model" width="90" x="179" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="log" compatibility="5.2.009" expanded="true" height="76" name="Log" width="90" x="45" y="210">
            <list key="log">
              <parameter key="k" value="operator.Clustering.parameter.k"/>
            </list>
          </operator>
          <operator activated="true" class="log_to_data" compatibility="5.2.009" expanded="true" height="94" name="Log to Data" width="90" x="179" y="210"/>
          <operator activated="true" class="extract_macro" compatibility="5.2.009" expanded="true" height="60" name="Extract Macro" width="90" x="313" y="165">
            <parameter key="macro" value="k"/>
            <parameter key="macro_type" value="data_value"/>
            <parameter key="attribute_name" value="k"/>
            <parameter key="example_index" value="-1"/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="write_csv" compatibility="5.2.009" expanded="true" height="76" name="Write CSV" width="90" x="380" y="300">
            <parameter key="csv_file" value="/home/marcin/temp/result_k_%{k}.csv"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.2.009" expanded="true" height="76" name="Performance" width="90" x="514" y="300"/>
          <connect from_port="input 1" to_op="Clustering" to_port="example set"/>
          <connect from_op="Clustering" from_port="cluster model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Clustering" from_port="clustered set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_op="Log to Data" to_port="through 1"/>
          <connect from_op="Log to Data" from_port="exampleSet" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Log to Data" from_port="through 1" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="through" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

Please note that the "Extract Macro" operator has to be executed before you use the macro (click on the blue double-arrow with the question mark to check and alter the execution order).

T-Unit · September 2012

Thanks for your fast reply and your suggestion, Marcin!

I searched the forum and found a thread that helped me solve my problem (Sebastian Land's hint did it):
http://rapid-i.com/rapidforum/index.php/topic,1014.0.html

So here is my adaptation:
I put a "Clone Parameters" operator after the clustering operator and connected it to the "Set Macro" operator. In the "Clone Parameters" operator I entered the following:
source: Clustering.k
target: Set Macro.value

This way, the current value of k is copied into the macro's value, and the macro is then used to generate the different file names (results_k_2.xls, results_k_3.xls, and so on).
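The setup described above might look roughly like this inside the Optimize Parameters (Grid) subprocess. This is a sketch, not Thomas's actual process: the Clone Parameters list key `name_map` and the other parameter keys are assumptions based on RapidMiner 5.x, the file path should be adjusted, and the operator names must match those in your process, because "Clustering.k" and "Set Macro.value" refer to operators by name.

```xml
<!-- Copy the current k of the "Clustering" operator into the "value" parameter of "Set Macro". -->
<operator activated="true" class="clone_parameters" compatibility="5.2.009" expanded="true" name="Clone Parameters">
  <list key="name_map">
    <parameter key="Clustering.k" value="Set Macro.value"/>
  </list>
</operator>
<!-- Defines macro k; the placeholder value is overwritten by Clone Parameters in each iteration. -->
<operator activated="true" class="set_macro" compatibility="5.2.009" expanded="true" name="Set Macro">
  <parameter key="macro" value="k"/>
  <parameter key="value" value="2"/>
</operator>
<!-- One file per k: results_k_2.xls, results_k_3.xls, ... -->
<operator activated="true" class="write_excel" compatibility="5.2.009" expanded="true" name="Write Excel">
  <parameter key="excel_file" value="results_k_%{k}.xls"/>
</operator>
```

As with Extract Macro in the earlier workaround, execution order matters: Clustering, then Clone Parameters, then Set Macro, then Write Excel.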

The solution is fairly simple, but I'm sure I would never have solved this problem on my own (or even expected the "Clone Parameters" operator to do it). I hope this helps other users with the same problem.

Regards and thanks to all who tried to help me,
Thomas

Altair RapidMiner Community
