Example Process for Reporting Extension
Usually, you can use RapidMiner Server to create web-based reports and even complete web applications. Other users prefer to deliver results to other data visualization products like Qlik or Tableau, which are also supported by RapidMiner.
But there is another simple way to generate visual outputs as a result of your processes. This is done by using the Reporting extension for RapidMiner which is available here: https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_reporting
The Reporting extension requires no part of the RapidMiner platform other than RapidMiner Studio, and it does not require any third-party products. It simply generates the elements of HTML pages or PDF reports while RapidMiner executes the different steps of a process. The output is very similar to the notebooks used by many data scientists.
Please download and install the extension first.
How to generate PDF / HTML / ... reports with a RapidMiner process?
The extension works quite simply. The first thing you need to do is "open" a new report and give it a name. This is done with the operator "Generate Report". In the settings of this operator you specify the name of the report (important: you will need to use the same report name for all other operators later on!). You can also select the type of the report (HTML, PDF, etc.) and configure its look.
Pro-Tip: You can even generate multiple reports in the same process by using different report names.
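As a rough sketch of this tip, the process below opens two reports side by side. It reuses only the operator class and the parameter keys that appear in the full example further down ("report_name" and "pdf_output_file"). The report names, the operator names and the output paths are placeholders chosen for illustration, so adjust them before running anything.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.4.000">
  <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
      <!-- First report: operators that should write into it use report_name="SummaryReport" (placeholder name) -->
      <operator activated="true" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="82" name="Generate Summary Report" width="90" x="45" y="34">
        <parameter key="report_name" value="SummaryReport"/>
        <!-- Placeholder path: point this to a folder on your own machine -->
        <parameter key="pdf_output_file" value="C:\Users\YourName\Desktop\Summary.pdf"/>
      </operator>
      <!-- Second report: a different report_name keeps its content separate from the first one -->
      <operator activated="true" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="82" name="Generate Detail Report" width="90" x="179" y="34">
        <parameter key="report_name" value="DetailReport"/>
        <!-- Placeholder path: point this to a folder on your own machine -->
        <parameter key="pdf_output_file" value="C:\Users\YourName\Desktop\Detail.pdf"/>
      </operator>
    </process>
  </operator>
</process>
```

Every later reporting operator then decides which document it contributes to purely through its "report_name" value.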
Then you can use further operators to add elements to your report while RapidMiner executes the process. Those operators are:
- Add Section: adds a new section to the report. This section gets a name as well as a level which basically defines the hierarchy of your reporting document.
- Add Text: adds an arbitrary text to the report.
- Add Pagebreak: adds a page break (for example in PDF reports).
- Report: this is the key operator for reporting all dynamic content like data or models and will be explained below.
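To make the first two operators more concrete, here is a minimal fragment, meant to be placed inside the process after a "Generate Report" operator that opened a report named "Report1". The parameter keys are taken from the full example further down. The section and text values are made up for illustration, and the parameter for the section level is left out because its key does not appear in that example.

```xml
<!-- Starts a new section in the report called "Report1"; the section name is an illustrative value -->
<operator activated="true" class="reporting:add_section" compatibility="5.3.000" expanded="true" height="82" name="Add Section" width="90" x="179" y="34">
  <parameter key="report_name" value="Report1"/>
  <parameter key="report_section_name" value="Introduction"/>
</operator>
<!-- Adds a free-text block with its own header to the same report -->
<operator activated="true" class="reporting:add_text" compatibility="5.3.000" expanded="true" height="68" name="Add Text" width="90" x="313" y="34">
  <parameter key="report_name" value="Report1"/>
  <parameter key="report_text_header" value="About this report"/>
  <parameter key="report_text" value="This report summarizes the Iris data set."/>
</operator>
```

Both operators carry the same "report_name" as the "Generate Report" operator, which is what ties them to that report.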
The operator "Report" takes an arbitrary input and turns it into a graphical representation. Just like for all other reporting operators, you need to specify the name of the report to which the visualization should be added. Then you can configure how the object is rendered by clicking on the button "Configure Report..." in the settings of the operator.
Here you can specify the output type and what the output should look like, e.g. whether the data itself or a chart should be exported.
The process below is a working example which generates a PDF report based on the Iris data set. Read here about how to import the XML description below. Also make sure that you edit the filename (the parameter "pdf_output_file") in the "Generate Report" operator so that it points to a location on your own machine.
<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="reporting:generate_report" compatibility="5.3.000" expanded="true" height="82" name="Generate Report" width="90" x="45" y="34">
<parameter key="report_name" value="Report1"/>
<parameter key="pdf_output_file" value="C:\Users\IngoMierswa\Desktop\Report1_Test.pdf"/>
</operator>
<operator activated="true" class="reporting:add_section" compatibility="5.3.000" expanded="true" height="82" name="Add Section" width="90" x="179" y="34">
<parameter key="report_name" value="Report1"/>
<parameter key="report_section_name" value="Data"/>
</operator>
<operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve Iris" width="90" x="51" y="136">
<parameter key="repository_entry" value="//Samples/data/Iris"/>
</operator>
<operator activated="true" class="reporting:report" compatibility="5.3.000" expanded="true" height="68" name="Report" width="90" x="185" y="136">
<parameter key="report_name" value="Report1"/>
<parameter key="report_item_header" value="Data"/>
<parameter key="specified" value="true"/>
<parameter key="reportable_type" value="Data Table"/>
<parameter key="renderer_name" value="Data View"/>
<list key="parameters">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="min_row" value="1"/>
<parameter key="max_row" value="150"/>
</list>
</operator>
<operator activated="true" class="reporting:add_section" compatibility="5.3.000" expanded="true" height="82" name="Add Section (3)" width="90" x="313" y="136">
<parameter key="report_name" value="Report1"/>
<parameter key="report_section_name" value="Class Distribution"/>
</operator>
<operator activated="true" class="reporting:report" compatibility="5.3.000" expanded="true" height="68" name="Report (3)" width="90" x="447" y="136">
<parameter key="report_name" value="Report1"/>
<parameter key="report_item_header" value="Class Distribution"/>
<parameter key="specified" value="true"/>
<parameter key="reportable_type" value="Data Table"/>
<parameter key="renderer_name" value="Plot View"/>
<list key="parameters">
<parameter key="plotter" value="Pie"/>
<parameter key="scatter_axis_x_axis_log_scale" value="false"/>
<parameter key="scatter_axis_y_axis_log_scale" value="false"/>
<parameter key="scatter_jitter_amount" value="0"/>
<parameter key="scatter_rotate_labels" value="false"/>
<parameter key="scatter_multiple_axis_x_axis_log_scale" value="false"/>
<parameter key="scatter_multiple_jitter_amount" value="0"/>
<parameter key="scatter_multiple_rotate_labels" value="false"/>
<parameter key="scatter_matrix_jitter_amount" value="0"/>
<parameter key="bubble_axis_x_axis_log_scale" value="false"/>
<parameter key="bubble_rotate_labels" value="false"/>
<parameter key="parallel_rotate_labels" value="false"/>
<parameter key="parallel_local_normalization" value="false"/>
<parameter key="series_rotate_labels" value="false"/>
<parameter key="series_multiple_rotate_labels" value="false"/>
<parameter key="som_jitter_amount" value="0"/>
<parameter key="block_axis_x_axis_log_scale" value="false"/>
<parameter key="block_axis_y_axis_log_scale" value="false"/>
<parameter key="block_jitter_amount" value="0"/>
<parameter key="block_rotate_labels" value="false"/>
<parameter key="deviation_rotate_labels" value="false"/>
<parameter key="deviation_local_normalization" value="false"/>
<parameter key="histogram_absolute_values" value="false"/>
<parameter key="histogram_rotate_labels" value="false"/>
<parameter key="histogram_log_scale" value="false"/>
<parameter key="histogram_number_of_bins" value="40"/>
<parameter key="histogram_opaqueness" value="100"/>
<parameter key="histogram_color_absolute_values" value="false"/>
<parameter key="histogram_color_rotate_labels" value="false"/>
<parameter key="histogram_color_log_scale" value="false"/>
<parameter key="histogram_color_number_of_bins" value="40"/>
<parameter key="histogram_color_opaqueness" value="100"/>
<parameter key="bars_absolute_values" value="false"/>
<parameter key="bars_rotate_labels" value="false"/>
<parameter key="bars_aggregation" value="none"/>
<parameter key="bars_use_distinct" value="false"/>
<parameter key="bars_orientation" value="vertical"/>
<parameter key="bars_stacked_absolute_values" value="false"/>
<parameter key="bars_stacked_rotate_labels" value="false"/>
<parameter key="bars_stacked_aggregation" value="none"/>
<parameter key="bars_stacked_use_distinct" value="false"/>
<parameter key="bars_stacked_orientation" value="vertical"/>
<parameter key="pareto_rotate_labels" value="false"/>
<parameter key="pareto_sorting_direction" value="Descending Keys"/>
<parameter key="pareto_show_bar_labels" value="true"/>
<parameter key="pareto_show_cumulative_labels" value="false"/>
<parameter key="distribution_rotate_labels" value="false"/>
<parameter key="web_absolute_values" value="false"/>
<parameter key="web_rotate_labels" value="false"/>
<parameter key="web_aggregation" value="none"/>
<parameter key="web_use_distinct" value="false"/>
<parameter key="pie_axis_group_by_column" value="label"/>
<parameter key="pie_plot_column" value="label"/>
<parameter key="pie_absolute_values" value="false"/>
<parameter key="pie_aggregation" value="count"/>
<parameter key="pie_use_distinct" value="false"/>
<parameter key="pie_explosion_amount" value="0"/>
<parameter key="pie_3d_absolute_values" value="false"/>
<parameter key="pie_3d_aggregation" value="none"/>
<parameter key="pie_3d_use_distinct" value="false"/>
<parameter key="ring_absolute_values" value="false"/>
<parameter key="ring_aggregation" value="none"/>
<parameter key="ring_use_distinct" value="false"/>
<parameter key="ring_explosion_amount" value="0"/>
</list>
</operator>
<operator activated="true" class="reporting:add_text" compatibility="5.3.000" expanded="true" height="68" name="Add Text" width="90" x="45" y="238">
<parameter key="report_name" value="Report1"/>
<parameter key="report_text_header" value="End of Report"/>
<parameter key="report_text" value="This is the end of this report."/>
</operator>
<connect from_op="Generate Report" from_port="through 1" to_op="Add Section" to_port="through 1"/>
<connect from_op="Retrieve Iris" from_port="output" to_op="Report" to_port="reportable in"/>
<connect from_op="Report" from_port="reportable out" to_op="Add Section (3)" to_port="through 1"/>
<connect from_op="Add Section (3)" from_port="through 1" to_op="Report (3)" to_port="reportable in"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
</process>
Comments
Great! Thank you!