Generate multiple visualizations from a RapidMiner process

christos_karras Member Posts: 50 Guru
edited December 2019 in Product Ideas
I would like to automate the generation of RapidMiner visualizations for cases such as:
- generate a set of scatter plots for the most correlated pairs of variables in a data set and store all resulting visualizations in the RapidMiner repository so that they can be easily viewed
- re-generate a previously generated visualization on new data, with exactly the same appearance as the previous time I generated it (without having to manually export and re-import a JSON definition of the visualization)

The only way I found to do this was through the Python scripting operator (use matplotlib and save the results to a PNG file in the RapidMiner repository). Is there a way to do the same thing using RapidMiner's visualizations (either using built-in operators or using an extension)?
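A minimal sketch of the Python workaround described above, assuming the data arrives as a pandas DataFrame. The helper names, the `plots` output folder, and the file naming pattern are illustrative assumptions, and the `rm_main` entry point reflects the usual convention of the Python scripting operator rather than anything stated in this thread.

```python
import itertools
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def top_correlated_pairs(data, n_pairs=3):
    """Return the n_pairs column pairs with the largest absolute Pearson correlation."""
    corr = data.select_dtypes("number").corr().abs()
    pairs = [
        (a, b, corr.loc[a, b])
        for a, b in itertools.combinations(corr.columns, 2)
        if pd.notna(corr.loc[a, b])  # skip constant columns, whose correlation is NaN
    ]
    pairs.sort(key=lambda p: p[2], reverse=True)
    return [(a, b) for a, b, _ in pairs[:n_pairs]]


def save_scatter_plots(data, out_dir, n_pairs=3):
    """Write one PNG scatter plot per selected pair and return the file paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for x, y in top_correlated_pairs(data, n_pairs):
        fig, ax = plt.subplots()
        ax.scatter(data[x], data[y], s=10)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
        ax.set_title(f"{x} vs. {y}")
        path = out_dir / f"{x}-{y}-ScatterPlot.png"  # hypothetical naming scheme
        fig.savefig(path, dpi=150)
        plt.close(fig)  # free the figure so long loops do not accumulate memory
        written.append(path)
    return written


def rm_main(data):
    # Assumed entry point for the Python scripting operator; "plots" is a
    # hypothetical folder, to be replaced with a real repository location.
    save_scatter_plots(data, "plots")
    return data
```

The plots produced this way are static PNG images, so they do not offer the interactivity of RapidMiner's own visualizations.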
3 votes

Open for Voting

IC-1721

Comments

  • pschlunder Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research

    Thanks for the input. That is indeed a very useful scenario. For now, you can use the Python approach you described, or use the Reporting extension to generate visualizations automatically.

    In the future, a mechanism like the one you described could be a nice way to go. How would you like to see it built? One option might be close to what you've described:
    a new "Apply Plot Configuration" operator that applies a plot configuration object to a data set. The output would be a plot.

    Let me know what your ideas around it are.

    - Philipp
  • christos_karras Member Posts: 50 Guru
    Hi @pschlunder,

    I was not aware of the Reporting extension. I experimented a bit with it today and will experiment more later, but it doesn't seem as easy to use as what I had in mind. It also generates static images instead of interactive visualizations. That is useful if the intention is to share visualizations with others, but not if the visualization is meant as a starting point for further data exploration.

    Also, with the Reporting extension, if I use a visualization defined in a JSON file, the JSON is not actually stored in the RapidMiner process, so the process breaks if I delete the JSON file. This will be a problem when processes that generate reports are shared by storing them in a RapidMiner Server repository.

    An "Apply Plot Configuration" operator as you describe is closer to what I had in mind, but I would like to elaborate on it:
    - The operator should include a "designer" view that opens the same UI as the Visualization designer in the results tab.
    - The plot configuration should be stored directly in the process file (as part of the operator's configuration), not in an external JSON file. Reusing existing configurations by exporting and importing JSON files should also be possible.
    - It should be possible to reference macros in the designer, for example macros that define the variable name(s) to plot on the X axis, Y axis, color, size, etc.
    - If used within a Loop operator, it should be possible to have a single "IOObjectCollection" (or something similar) that combines all plots, so that they can be stored as a single object in the repository. It should also be possible to specify the name of each object in the collection, for example "Var1-Var2-ScatterPlot".
    - Each stored visualization should be fully interactive. For example, it should be possible to zoom in on one of the generated scatter plots, customize the color for a specific scatter plot, etc.
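The macro and collection ideas in the list above can be illustrated with a small sketch. Everything in it is hypothetical: the configuration fields, the function names, and the dictionary standing in for a collection are not RapidMiner's actual plot configuration schema or API. Only the `%{name}` placeholder style is borrowed from RapidMiner's macro syntax.

```python
import json
import re

# Matches placeholders written in the %{macro_name} style.
MACRO = re.compile(r"%\{([^}]+)\}")


def resolve_macros(value, macros):
    """Recursively replace %{name} placeholders in strings, lists and dicts.

    An undefined macro raises KeyError instead of being silently left in place.
    """
    if isinstance(value, str):
        return MACRO.sub(lambda m: str(macros[m.group(1)]), value)
    if isinstance(value, list):
        return [resolve_macros(v, macros) for v in value]
    if isinstance(value, dict):
        return {k: resolve_macros(v, macros) for k, v in value.items()}
    return value


# Hypothetical plot configuration; the field names are illustrative only.
PLOT_TEMPLATE = {
    "name": "%{x_var}-%{y_var}-ScatterPlot",
    "type": "scatter",
    "x": "%{x_var}",
    "y": "%{y_var}",
    "color": "%{color_var}",
}


def build_plot_collection(pairs, color_var):
    """Resolve the template once per variable pair, keyed by the plot name."""
    collection = {}
    for x_var, y_var in pairs:
        macros = {"x_var": x_var, "y_var": y_var, "color_var": color_var}
        config = resolve_macros(PLOT_TEMPLATE, macros)
        collection[config["name"]] = config
    return collection


if __name__ == "__main__":
    plots = build_plot_collection([("Var1", "Var2"), ("Var1", "Var3")], "label")
    print(json.dumps(plots, indent=2))
```

Keeping the template separate from the macro values is what would allow the same configuration to be re-applied to new data with an identical appearance.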

    Thanks
  • pschlunder Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 96 RM Research

    Thanks for the detailed input. I'll share it with colleagues for when we're working on it!

    Regarding the problem with the JSON file when sharing the process: you can load the JSON file with the Read JSON operator and then store the loaded ExampleSet in the repository to reuse it later. This might help.

    Cheers,
    Philipp