The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here

Advanced Reporting Extension published by Old World Computing

landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
edited November 2018 in Knowledge Base

The idea of the Advanced Reporting Extension published by Old World Computing is to use the capabilities of RapidMiner to automate any regular reporting task that results in an Excel sheet. There have been many projects and data science departments that simply drown in these kind of request, consuming all resource before you can get to the really fun part of data science. Now you can simply start at the beginning to create a nearly zero overhead reporting, even if you don't have or can't use real business intelligence tools like tableau or qlik.

How does that work?

 

 

Step 1: Create a template in Excel

 First we create a dummy sheet and add all of the desired layout components, diagrams, texts and of course areas for data.

We can use any formatting, chart type or conditional coloring that we like, including the nice spark lines. Just one thing is important: We need to reserve space for inserting the data. What will happen later is, that we overwrite parts of the content of the table with data from RapidMiner. So if we have more than three employees, we would need either let more space between the table and the diagram, or just put the data into a separate sheet and reference this in the diagram. But if you are used to Excel reporting, you probably know all these tricks...

Insert some dummy values so that you can see the charts in action.

Don't forget to save the file. We will need it later.

 image

 

 

Step 2: Create a process in RapidMiner to load the data

RapidMiner is very versatile to get the data into the shape you want. It can read and combine many different formats and sources and then aggregate, join, pivot and process the data into the shape that you need it.

On the right you see a process combining data from four different sources with multiple joins and preprocessing steps to match the data. Such a process could just deliver us the data we want to put into our nice Worktime sheet. 

Of course it could be much simpler and just contain a single SQL query or also be very much more complex involving calling of webservices, Big Data and analytics on hadoop, some machine learning or whatever. The trick is that we can leverage the entire flexibility of RapidMiner to get the data we want to put into an Excel sheet.

image

 

 

Step 3: Open Report

Once we have the data in the desired format, we add an Open Report (Excel) operator from our extension. You see it on the right hand side in the operator tree. We need to point the operator on two files: The template file we created and saved in Step 1. You can either use the parameter form template file or the tem input port. The second file can be specified as target file parameter or by using the tar output port.

Why are there ports for the files? Because it allows you to handle the files conveniently in scenarios where you want to do stuff with them in the process later. You could even create a template file in a RapidMiner process, or less fancy and more realistic: Store the file in the repository of a RapidMiner Server to share among many users. The output file port is most useful if you want to either zip the result or return it as a webservice result in a RapidMiner Server Webservice or Web Application.

Any data we want to insert into the Excel file, we need to forward to the input ports of the Open Report (Excel) operator. Don't worry, there will always be another input port if you connect the last one. We will use the data delivered to these ports in the inner subprocess to do the actual insertion.

image

 

 

Step 4: Insert Tabular Data

If we entered the inner process of the Open Report (Excel), we can add the Write Data Entry (Excel) operator to insert an ExampleSet into the excel. We have done so with the first ExampleSet on the screenshot on the right. The operator allows to select which attributes to use and where to place it. Therefore you specify the sheet where it will be insert by it's index. Then you point it to a fill range. A range can be either open ended by specifying the left upper cell of the area or closed, if followed by a colon and the right lower cell. So B2 would start in the second column, second row. B2:D4 would allow to fill 2 rows and 2 columns.

For our little employee table from Step 1, we set it to B11:C13. Unless we select fit to range, the process will now fail if our data does not fit into this range.

We will add another operator of this type to output the second table.

image

 

 

Step 5: Insert Data

The only thing missing is the version tag, so that people know what this report was about, when they open them at some point later.

Therefore we first use a Generate Macro operator from RapidMiner's core functionality to create a process variable (or macro as they call it) containing the current date and time. We then add a Write Cell (Excel) operator from the Advanced Reporting Extension and connect the ports. Although there will be no data flowing from the Generate Macro operator to the Write Cell (Excel) operator, the connection makes sure that the Generate Macro will be executed first and set the process variable before it is read.

Then we just need to point the Write Cell (Excel) operator to the right fill position, which is F5 in our case. Setting the value and type correctly and we are good to go.

Short notice on dates: There is an unlimited number of different date formats out there. If you want to write a date to excel, you first need to parse the date format that the value has in RapidMiner. So if you enter something like 2017-03-29 23:59:59 as value, you should enter "yyyy-MM-dd HH:mm:ss" in the date format parameter of the Write Cell (Excel) operator. Once it knows the date, it will automatically transform it in the correct format of the Excel Template Sheet, where you set it with the Cell Format.

image

 

 

Once the subprocess is finished the target file will be written and you just need to mail it to someone else and be done with it.

We would like to recommend to just automate about everything right from the beginning. There will be nothing like a "I just need to do this once". In 90% of all cases, you will need to do it twice and then the additional overhead of the automation already would have paid off. So please feel free to download the extension, order a license and ask any questions you might have. In case you are not convinced, yet, the free version let's you access the full functionality and only limits the number of Write operators to one within each subprocess.

Download it here.

Tagged:

Answers

  • IngoRMIngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Sebastian,
    this is awesome! I tried it and it worked very well. A small comments: it looks like the "Activate License Button" did not get the proper UI label - I only got the I18N key instead. Otherwise it worked like a charm! Many thanks, Ingo
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi Ingo,

    that's indeed true, but only if you didn't install it together with our Jackhammer extension. And obviously nobody would NOT like to have it, so simply install it as well and there is a label.

    Okay, just kidding, we will fix that with the next version. Thanks for letting me know!

     

    Greetings,

     Sebastian

     

    PS: The Advanced Reporting extension is now available over the marketplace!

  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi Sebastian,

     

    you've a great vision on what RapidMiner users need!

     

    I'm looking forward to using this extension. 

     

    Regards,

    Balázs

  • Telcontar120Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Sebastian, this is a terrific extension and brings much needed reporting automation capabilities to RapidMiner.  I've already ordered my license and can't wait to start using this extension!

    Brian

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • manuel_martinmanuel_martin Member Posts: 2 Learner III
    Hi Sebastian,

    is it possible, to write an IOObject Collection into one excel-File via your extension, with the result, that each example set of the collection is written in one sheet of the excel-file?

    Best regards,

    Manuel
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi Manuel,

    in principle, yes. Only requirement is, that the sheets are already existing in the template file, as the extension is thought to fill prepared reporting forms.

    So if you have an excel template file with say 12 sheets and your collection does contain the 12 data sets, you can simply put the Write Excel Report operator into a Loop Collection operator and set the sheet parameter of the operator to "%{a}". This macro will be replaced with the execution count of the operator, meaning it will automatically increment from 1 to 12 if you loop over the collection.

    Does that help? However, I just got the idea that a copy sheet operator would also be handy, in case you have a flexible number of sheets and don't want to prepare all of them manually. What do you think?

     

    Greetings,

    Sebastian

  • manuel_martinmanuel_martin Member Posts: 2 Learner III
    Thanks for your fast answer! This really solves most parts of my issue. A solution, like the one you related to at the end of your post, would be perfect to deal with varying numbers of sheets. Nevertheless, your answer already helps a lot.

    Regards,
    Manuel
Sign In or Register to comment.