Wrinting an excel file with multiple sheets

mandeep_kumar · January 2018

Hi

I'm looking for an option to write an excel/csv file with multiple sheets in it. The current "Write Excel" operator in Rapidminer can only write to a single sheet (under sheet name).

Is it possible to merge 2 excel files with different tab names into 1 file containing data from all tabs (across files) in Append mode ?

Thanks

Mandeep

sgenzer · January 2018

hello @mandeep_kumar - welcome to the community. There are several threads on this topic including one exactly about this two weeks ago. Did you do a search?

https://community.rapidminer.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=false&q=excel%20multiple%20sheets

Scott

mandeep_kumar · January 2018

@sgenzer

I have gone through these threads and i don't think there is any option except to have Execute R/python script that can perform.

Do you know if there are any operators (in rapidminer) that can write data in multiple sheets in a single excel file.

thanks

Mandeep

lionelderkrikor · January 2018

Hi @mandeep_kumar,

To perform this task, you will need of Advanced Reporting Extension (to download and install from the MarketPlace).

Here a process, where 2 exampleSets are written in two differents sheets of a same Excel file.

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.7.000" expanded="true" height="68" name="ExampleSet_1" width="90" x="45" y="34">
        <parameter key="Input Csv" value="Att1,Att2&#10;1,4&#10;2,5&#10;3,6"/>
      </operator>
      <operator activated="true" class="operator_toolbox:create_exampleset_from_doc" compatibility="0.7.000" expanded="true" height="68" name="ExampleSet_2" width="90" x="45" y="136">
        <parameter key="Input Csv" value="Att1,Att2&#10;1,8&#10;2,10&#10;3,12"/>
      </operator>
      <operator activated="true" class="collect" compatibility="8.0.001" expanded="true" height="103" name="Collect" width="90" x="179" y="85"/>
      <operator activated="true" class="loop_collection" compatibility="8.0.001" expanded="true" height="82" name="Loop Collection" width="90" x="313" y="85">
        <parameter key="set_iteration_macro" value="true"/>
        <parameter key="macro_name" value="sheetNumber"/>
        <process expanded="true">
          <operator activated="true" class="rmx_adv_reporting:open_report_excel" compatibility="2.1.693" expanded="true" height="103" name="Open Report (2)" width="90" x="447" y="34">
            <parameter key="form_template_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Test_export_Excel_Extension\RM_Export_Excel.xlsx"/>
            <parameter key="target_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Test_export_Excel_Extension\RM_Export_Excel.xlsx"/>
            <process expanded="true">
              <operator activated="true" class="rmx_adv_reporting:write_data_entry_excel" compatibility="2.1.693" expanded="true" height="68" name="Write Data Entry (2)" width="90" x="514" y="34">
                <parameter key="sheet" value="%{sheetNumber}"/>
                <parameter key="fit_to_range" value="true"/>
                <parameter key="ignore_names" value="false"/>
              </operator>
              <connect from_port="input 1" to_op="Write Data Entry (2)" to_port="example set"/>
              <connect from_op="Write Data Entry (2)" from_port="exampe set" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="single" to_op="Open Report (2)" to_port="input 1"/>
          <connect from_op="Open Report (2)" from_port="output 1" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="ExampleSet_1" from_port="output" to_op="Collect" to_port="input 1"/>
      <connect from_op="ExampleSet_2" from_port="output" to_op="Collect" to_port="input 2"/>
      <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

NB : You have to create preliminarily in the Excel file, the sheets in which you want to write.

I hope this help,

Regards,

Lionel

Brian_Wells · March 2019

So it appears that this is still a shortcoming even in RM 9.2 and I have to say that I am stunned by such a glaring omission in the base feature set. If I did not love RapidMiner Studio so much I probably wouldn't care but simply use Alteryx or some other tool when I need to do large scale data reporting. Given how common the format is, how much it is used throughout most companies and the fact that there is no reasonable alternative format to Excel's collection of tabular data tables within a single file, this should be #1 on the feature enhancement list with no close second. Having to purchase a third party extension to do this, even if the fee is nominal, is completely unacceptable for a piece of software that I pay $10,000/yr for, as is having to create my own operator using R or Apache POI, Python, etc.

Please address this soon, guys. I am fighting for wide-spread RapidMiner adoption in my company and I can guarantee that this will be a major stumbling block for a lot of folks.

kayman · March 2019

If you don't mind using python this can be easily be achieved using the xlsxwriter package.
I share you point that having it out of the box available would be a great thing, but till then this may help also.

Find attached a working sample writing different datasets to different tabs

<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="9.1.000" expanded="true" height="68" name="Generate Data" width="90" x="112" y="136">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="100"/>
        <parameter key="number_of_attributes" value="5"/>
        <parameter key="attributes_lower_bound" value="-10.0"/>
        <parameter key="attributes_upper_bound" value="10.0"/>
        <parameter key="gaussian_standard_deviation" value="10.0"/>
        <parameter key="largest_radius" value="10.0"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="9.1.000" expanded="true" height="68" name="Generate Data (2)" width="90" x="112" y="238">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="100"/>
        <parameter key="number_of_attributes" value="5"/>
        <parameter key="attributes_lower_bound" value="-10.0"/>
        <parameter key="attributes_upper_bound" value="10.0"/>
        <parameter key="gaussian_standard_deviation" value="10.0"/>
        <parameter key="largest_radius" value="10.0"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="9.1.000" expanded="true" height="68" name="Generate Data (3)" width="90" x="112" y="340">
        <parameter key="target_function" value="random"/>
        <parameter key="number_examples" value="100"/>
        <parameter key="number_of_attributes" value="5"/>
        <parameter key="attributes_lower_bound" value="-10.0"/>
        <parameter key="attributes_upper_bound" value="10.0"/>
        <parameter key="gaussian_standard_deviation" value="10.0"/>
        <parameter key="largest_radius" value="10.0"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="python_scripting:execute_python" compatibility="9.1.000" expanded="true" height="145" name="Execute Python (2)" width="90" x="313" y="136">
        <parameter key="script" value="import pandas as pd&#10;import xlsxwriter&#10;import os&#10;&#10;user = os.getlogin()&#10;tpath = 'C:\\Users\\' + user + '\\data\\tmp'&#10;&#10;def rm_main(d3,d1,d2):&#10;&#10;    writer = pd.ExcelWriter(tpath + '/multitab.xlsx', engine='xlsxwriter')&#10;&#10;    d1.to_excel(writer, 'set1')  &#10;    d2.to_excel(writer, 'set2')&#10;    d3.to_excel(writer, 'set3')&#10;    # Save the result &#10;    writer.save()&#10;&#10;    return"/>
        <parameter key="use_default_python" value="true"/>
        <parameter key="package_manager" value="conda (anaconda)"/>
      </operator>

      <connect from_op="Generate Data" from_port="output" to_op="Execute Python (2)" to_port="input 1"/>
      <connect from_op="Generate Data (2)" from_port="output" to_op="Execute Python (2)" to_port="input 2"/>
      <connect from_op="Generate Data (3)" from_port="output" to_op="Execute Python (2)" to_port="input 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

sgenzer · March 2019

stay tuned @Brian_Wells

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Wrinting an excel file with multiple sheets

Answers