Repository Paths in RapidMiner Studio

sgenzer · May 2019

Repository Paths are very powerful tools in RapidMiner but can be tricky to understand for beginners. Let's go over the basics...

What's a Repository?

A repository is simply a folder that holds all of your RapidMiner data sets (we call them "ExampleSets"), processes, and other file objects that you will create using RapidMiner Studio. This folder can be stored locally on your computer, or on a RapidMiner Server.

Let's assume you're just working on your local computer for now. When you start RapidMiner Studio for the first time, it will automatically create a "Local Repository" for you. If you want to see where it is, just go to your .RapidMiner folder on your computer, go into "repositories", and then into "Local Repository". You should see two folders here: 'data' and 'processes'. These correspond EXACTLY to what you see in the Repository panel in RapidMiner Studio.

Image: https://us.v-cdn.net/6030995/uploads/editor/r8/elb70aep1ngg.png

Image: https://us.v-cdn.net/6030995/uploads/editor/mp/93r5v9qfn7a7.png

[You will notice there are some extra 'properties' files on your computer that you do not see in RapidMiner. They store metadata and you should just ignore them.]
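If you want to check that folder-to-panel correspondence from outside RapidMiner, here is a minimal Python sketch. It is only an illustration, not anything RapidMiner ships: the function name `list_repository_entries` is made up, and it assumes the metadata files end in '.properties' and that your .RapidMiner folder sits in your home directory.

```python
from pathlib import Path


def list_repository_entries(repo_dir: Path) -> dict[str, list[str]]:
    """Map each top-level folder (e.g. 'data', 'processes') to the files in it.

    Files ending in '.properties' are skipped, on the assumption that these
    are the metadata files that the Repository panel does not show.
    """
    entries: dict[str, list[str]] = {}
    for folder in sorted(p for p in repo_dir.iterdir() if p.is_dir()):
        entries[folder.name] = sorted(
            f.name
            for f in folder.iterdir()
            if f.is_file() and f.suffix != ".properties"
        )
    return entries


# Assumed default location; adjust if your .RapidMiner folder lives elsewhere.
default_repo = Path.home() / ".RapidMiner" / "repositories" / "Local Repository"
if default_repo.is_dir():
    for folder_name, names in list_repository_entries(default_repo).items():
        print(folder_name, names)
```

The listing should show the same 'data' and 'processes' folders as the Repository panel, although the file names on disk may carry extensions that the panel hides.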

How does RapidMiner find things in your Local Repository?

Let's use the standard Retrieve operator to see how this works. Let's say you drag the Titanic data set from the Samples folder and put it in a process:

Image: https://us.v-cdn.net/6030995/uploads/editor/6b/i08sv3d6l1aa.png

RapidMiner automatically inserted the repository path for this ExampleSet in the "repository entry" parameter. What does this mean? The most important item here is the double forward slash '//' in front. This means that the path starts from your 'root' folder (we call this an absolute path). Samples is a special, built-in repository whose folder you cannot see, but its path works the same way. So what if you change "Samples" to "Local Repository" in the path? You get an error!

Image: https://us.v-cdn.net/6030995/uploads/editor/4u/jftlbh7sv4sy.png

Why? Because RapidMiner Studio is looking for an ExampleSet in your Local Repository -> data folder, and it's not there. Now copy-and-paste that Titanic data set from Samples to Local Repository -> data and run the process again. Great! It found the ExampleSet right where you told it to.

Image: https://us.v-cdn.net/6030995/uploads/editor/m7/66hsu1t3dxs5.png
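To make the anatomy of an absolute path concrete, here is a small Python sketch that splits one into its repository name and the folders below it. It illustrates the idea only and is not RapidMiner's own code: the helper `split_absolute_path` is hypothetical, and the example string '//Samples/data/Titanic' is assumed to be what the "repository entry" parameter shows for the Titanic sample.

```python
def split_absolute_path(entry: str) -> tuple[str, list[str]]:
    """Split '//Repository/folder/name' into (repository, [folder, ..., name])."""
    if not entry.startswith("//"):
        raise ValueError(f"not an absolute repository path: {entry!r}")
    parts = [p for p in entry[2:].split("/") if p]
    if not parts:
        raise ValueError("the path names no repository")
    return parts[0], parts[1:]


print(split_absolute_path("//Samples/data/Titanic"))
# ('Samples', ['data', 'Titanic'])
print(split_absolute_path("//Local Repository/data/Titanic"))
# ('Local Repository', ['data', 'Titanic'])
```

The first element after '//' is the repository, which is why swapping "Samples" for "Local Repository" sends the lookup to a completely different place.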

Why won't processes with 'Retrieve' work when I share them with someone else?

Most likely this is due to repository path errors. For example, say you shared that process we just did with a friend. It would not work because it would be looking for 'Titanic' on her computer under Local Repository -> data, and most likely it's not there!

How can I define repository paths in RapidMiner so that I can share them with someone else?

The best way to do this is with relative repository paths, rather than absolute paths (the double slash //). You can do this as follows:

- Save your data set (ExampleSet) in the same folder as your process. Then just put the name of the ExampleSet in 'repository entry' with no other path information. RapidMiner will automatically look inside the same folder as the process if no path is specified.

- Save your data set (ExampleSet) in the data folder in your Local Repository and change the repository path to '../data/[name]'. The '..' tells RapidMiner to go up one level from the folder that holds the current process, so '../data' points to the 'data' folder sitting next to it.
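The two options above can be sketched in Python as follows. This is a simplified model of the resolution rules described in this post, not RapidMiner's actual implementation; `resolve_entry`, the process name 'my_process', and the repository name 'Shared Repo' are all made up for the example.

```python
def resolve_entry(process_location: str, entry: str) -> str:
    """Resolve a 'repository entry' value relative to where the process is saved.

    process_location is the absolute path of the process itself,
    e.g. '//Local Repository/processes/my_process'.
    """
    if entry.startswith("//"):
        return entry  # already absolute, nothing to resolve
    # Start in the folder that contains the process (drop the process name).
    parts = process_location[2:].split("/")[:-1]
    for step in entry.split("/"):
        if step == "..":
            if len(parts) <= 1:
                raise ValueError("path climbs above the repository root")
            parts.pop()  # go up one folder
        elif step not in ("", "."):
            parts.append(step)
    return "//" + "/".join(parts)


# Option 1: ExampleSet saved next to the process.
print(resolve_entry("//Local Repository/processes/my_process", "Titanic"))
# //Local Repository/processes/Titanic

# Option 2: ExampleSet saved in the sibling 'data' folder.
print(resolve_entry("//Local Repository/processes/my_process", "../data/Titanic"))
# //Local Repository/data/Titanic

# The same relative entry still resolves when the repository has another name.
print(resolve_entry("//Shared Repo/processes/my_process", "../data/Titanic"))
# //Shared Repo/data/Titanic
```

The last call shows why relative paths are easier to share: the entry never mentions the repository name, so it keeps working as long as the folder layout around the process stays the same.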

sonny_plankton · February 2020

Hello,

thank you for your remarks on this topic.

Is it possible to retrieve Python scripts (.py) as well with the Retrieve operator? Although the file is stored in the same GitHub repo as the main process, it returns a "can not retrieve repository data" error.

It does, however, work well with the Open File operator. But then we are facing the problem of an absolute path pointing to where the .py file was once stored.

Thanks in advance, highly appreciated.
sonny.

sgenzer · February 2020

hi @sonny_plankton I'm going to cc my colleague @Marco_Boeck to see if he has any insight here.

Scott

Marco_Boeck · February 2020

Hi,

Not right now. But without spoiling too much, we have something in the pipeline that will make the cross-functional team experience that much better and is very much related to your question.

Regards,
Marco

sonny_plankton · March 2020

Thank you very much @sgenzer and @Marco_Boeck for taking the time to answer.

Looking forward to the enhancements connecting our Python and RapidMiner specialists. If anyone is facing the same issue, we have simply stored the Python script as a text file in the operator.

Philipp

sgenzer · March 2020

hi @sonny_plankton yes good progress is being made here. Keep an eye out for our next beta release; you may find what you're looking for.

Scott
