TurboPrep as usual intermediate step in a process

land · August 2018

Hi all,

I just saw the Turbo Prep preview video. It looks really great, and we are looking forward to using it, as it will probably be helpful in a good share of our daily work.

However, I'm not sure whether this is usable in an expert's workflow. I would like to have this functionality as a regular view on the data that has been produced by a process. When I run into a breakpoint and see that the data is still wrong, I would like to be able to just fix it with Turbo Prep and have the resulting operator chain inserted after the operator that created the data. THAT would be a really great integration into the power of the process engine!

Otherwise one would need to store the data in the repository, load it into Turbo Prep, fix the problems, copy the created process, re-open the producing process, paste, remove the Retrieve operator, and connect the ports. That's a lot of overhead, and it will probably slow me down so much that it's not efficient anymore.

PS: Is there an extension API so that we could add our own functions to Turbo Prep? There may be domain-specific repetitive steps that could be solved very nicely with that.

Greetings,

Sebastian

sgenzer · August 2018

cc @IngoRM

IngoRM · August 2018

Hi Sebastian,

Fantastic idea - this is exactly the type of feedback I have been hoping for. This should be doable in principle but would require some careful planning and some additional UI changes (since we would need to distinguish between the different ways TP was started...). I will think about this a bit and see if and how we can do this.

On the API: although every function is nicely put into interfaces, there is currently no public extension API (yet). Another thing worth thinking about for sure.

Now a personal comment on the working modes (processes vs. TP): I found myself using processes for anything which requires process control structures (loops, macros etc. - still a lot of things especially if you put stuff into production). But for anything else, especially for some visualization or model prototypes, I now use TP wherever I can.

Part of the reason for this separation is for sure the missing link you described. But part of the reason is also simply that some process-related work will never be possible in a truly data-centric work mode. So let's see how people use and combine the two and how to improve both and their seamless interaction - because I truly think that there is a place for both approaches. Which is exactly why I loved your feedback!

Cheers,

Ingo

land · August 2018

Hi Ingo,

I hope for the best! I think the Renderer interface should give you all you need, as the IOObject will know the operator that last produced it. So you should be able to insert the chain at the right position. Of course one would need to shift the operators, which would require some logic to not destroy the layout, but well.

And I completely agree: in the more complex scenarios we are experiencing at our customers' sites, we will always need a process-based approach to be able to automate things. If you handle data from dozens or hundreds of machines in a single use case, the manual, completely data-centric way is... well, infeasible. We rather have to go over loops, adjust what is done based on control tables, etc.

But there is always a place where you will need to do data preparation inside such a process. In many situations, there are several. And being able to do that right where I need it WITHIN the process would be awesome (and I really don't use that term often).

The only bad thing is that I have to completely reinvent our training courses, as we will probably solve the problems much more quickly now. Which may be a good measure of the quality of your development, by the way.

Greetings,

Sebastian

IngoRM · August 2018

Ok, an "awesome" from you is not very frequent indeed :-)

"Of course one would need to shift the operators, which would require some logic to not destroy the layout, but well."

Yeah, this is one of the things I was a bit afraid of as well. But we could put all the generated operators into one subprocess so that on the high level, not too much would be ruined anyway...

However, I am more concerned about hacking my way into the execution engine somehow. I would either pick up the result from the result perspective as you describe. Or, if you want to start TP from a process you currently design, I could envision something like a right click on a port and then run the process until the port is reached, get the data result at the port, and fire up TP. At the end, we add the new subprocess at this port (or on its connection if it is connected).
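The last step of that idea (attach at the port, or on its connection if it is already connected) can be sketched roughly as follows. This is only a toy model in Python: `Process`, `splice_at_port` and the port strings are hypothetical names made up for illustration and are not RapidMiner API.

```python
from dataclasses import dataclass, field


# Hypothetical toy model of a process graph (NOT RapidMiner API).
@dataclass
class Process:
    # Maps a source port ("Operator.port") to the destination port it feeds.
    connections: dict = field(default_factory=dict)


def splice_at_port(process, source_port, chain_in, chain_out):
    """Attach a generated chain (chain_in ... chain_out) at source_port.

    If the port already feeds another port, the chain is placed on that
    connection; otherwise it is simply attached to the open port.
    """
    old_target = process.connections.get(source_port)
    process.connections[source_port] = chain_in
    if old_target is not None:
        # Re-route the previous consumer to the end of the new chain.
        process.connections[chain_out] = old_target
    return process


if __name__ == "__main__":
    p = Process({"Retrieve.output": "Model.training set"})
    splice_at_port(p, "Retrieve.output", "TP.input", "TP.output")
    print(p.connections)
```

Running the process up to the port and launching TP are left out here, since they depend on engine internals that this thread does not describe.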

No matter which way, this would require some changes in the core first since TP and AM are both in an extension and the extension API would not really allow for any of this as far as I can see... I will discuss with the core team and see what we can do to change that :-)

Thanks again!

land · August 2018

Hi Ingo,

I would strongly oppose putting everything into a subprocess. It clutters up your process quite fast, and all those different levels are hard enough to keep track of. If you put preprocessing into yet another level, it gets really confusing. After years of larger-scale RM projects, we discourage this completely in our best practices.

However, it shouldn't be hard to implement an algorithm for shifting the operators. At least not if the user sticks to some golden rules of process design.
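As a rough illustration of such a shifting step, here is a minimal sketch in Python. It assumes one possible "golden rule", namely that the process is laid out strictly left to right; `Box`, `make_room` and the 180-pixel spacing are hypothetical and not taken from RapidMiner.

```python
from dataclasses import dataclass


# Hypothetical stand-in for an operator's position on the canvas.
@dataclass
class Box:
    name: str
    x: int
    y: int


def make_room(boxes, after_name, count, spacing=180):
    """Shift every operator right of `after_name` and return free slots.

    Assumes a left-to-right layout: everything with a larger x than the
    anchor operator is moved right by count * spacing, and the new
    operators get the freed positions on the anchor's row.
    """
    anchor = next(b for b in boxes if b.name == after_name)
    shift = count * spacing
    for b in boxes:
        if b.x > anchor.x:
            b.x += shift
    return [(anchor.x + (i + 1) * spacing, anchor.y) for i in range(count)]


if __name__ == "__main__":
    layout = [Box("Retrieve", 0, 0), Box("Join", 180, 0), Box("Model", 360, 0)]
    print(make_room(layout, "Join", 2))
    print(layout)
```

Operators on parallel branches to the right are shifted as well, which keeps relative positions intact; a layout that does not follow the left-to-right assumption would need more logic than this.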

"However, I am more concerned about hacking my way into the execution engine somehow. I would either pick up the result from the result perspective as you describe."

I don't think there's a lot of hacking involved, unless the API has changed drastically since 7.5.

It's pretty easy: you just add a new renderer in the ioobjects.xml for the ExampleSets (you can do so in the extension; to my knowledge, it is simply added to the ones already registered). This renderer then creates a new view that starts TP with the ExampleSet. The ExampleSet itself has getProcessingHistory, which gives you the last operator and port that touched it. You can simply get the currently opened process, find the operator and port by name, and add the chain there. One should only check what happens with data loaded from repositories. As the processing history is transient, it's probably not restored, so a check for that case should be pretty simple, too.
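The lookup and the repository check could look roughly like this. It is a language-neutral sketch in Python, not the actual RapidMiner API: `HistoryEntry`, `DataSet` and `find_insertion_point` are hypothetical stand-ins, and the assumption that repository data arrives with an empty history is only the guess made above.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-in for one processing-history entry.
@dataclass
class HistoryEntry:
    operator_name: str
    port_name: str


# Hypothetical stand-in for a data set that carries its history.
@dataclass
class DataSet:
    history: list = field(default_factory=list)


def find_insertion_point(data, operator_names) -> Optional[HistoryEntry]:
    """Return the last operator/port that touched the data, or None.

    None means the chain cannot be placed automatically: either the
    history is empty (assumed here for data loaded from a repository)
    or the operator is not part of the currently opened process.
    """
    if not data.history:
        return None
    last = data.history[-1]
    if last.operator_name not in operator_names:
        return None
    return last


if __name__ == "__main__":
    data = DataSet([HistoryEntry("Retrieve", "output"),
                    HistoryEntry("Join", "join")])
    print(find_insertion_point(data, {"Retrieve", "Join", "Model"}))
    print(find_insertion_point(DataSet(), {"Retrieve"}))
```

Matching by name only works as long as operator names are unique within the opened process, which this sketch takes for granted.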

Hope this feature makes it into 9.1!

Greetings,

Sebastian
