Repository Paths in RapidMiner Studio
sgenzer
Repository Paths are very powerful tools in RapidMiner but can be tricky to understand for beginners. Let's go over the basics...
What's a Repository?
A repository is simply a folder that holds all of your RapidMiner data sets (we call them "ExampleSets"), processes, and other file objects that you will create using RapidMiner Studio. This folder can be stored locally on your computer, or on a RapidMiner Server.
Let's assume you're just working on your local computer for now. When you start RapidMiner Studio for the first time, it will automatically create a "Local Repository" for you. If you want to see where it is, just go to your .RapidMiner folder on your computer, go into "repositories", and then into "Local Repository". You should see two folders here: 'data' and 'processes'. These correspond EXACTLY to what you see in the Repository panel in RapidMiner Studio.
[You will notice that there are some extra 'properties' files on your computer that you do not see in RapidMiner. They store metadata, and you should just ignore them.]
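If you would rather check this from a script than click through folders, here is a minimal Python sketch that lists the sub-folders of a repository folder on disk. It assumes your .RapidMiner folder sits in your home directory, which may not match your installation, and the function name list_repository_folders is made up for this illustration.

```python
from pathlib import Path
from typing import List


def list_repository_folders(repo_dir: Path) -> List[str]:
    """Return the names of the sub-folders found directly inside repo_dir."""
    if not repo_dir.is_dir():
        # Nothing to list if the folder does not exist on this machine.
        return []
    # Plain files (such as the 'properties' files) are skipped on purpose.
    return sorted(p.name for p in repo_dir.iterdir() if p.is_dir())


# Assumed location; adjust it if your .RapidMiner folder lives elsewhere.
default_repo = Path.home() / ".RapidMiner" / "repositories" / "Local Repository"

if __name__ == "__main__":
    print(list_repository_folders(default_repo))
```

On a fresh installation this should show the same 'data' and 'processes' folders you see in the Repository panel.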
How does RapidMiner find things in your Local Repository?
Let's use the standard Retrieve operator to see how this works. Let's say you drag the Titanic data set from the Samples folder and put it in a process.
RapidMiner automatically inserts the repository path for this ExampleSet in the "repository entry" parameter. What does this mean? The most important item here is the double-forward-slash '//' in front. This means that the path starts from your 'root' folder (we call this an absolute path). The Samples repository is a special, built-in one that you cannot see on your computer, but its path works the same way.
So what if you change "Samples" to "Local Repository" in the path? You get an error!!
Why? Because RapidMiner Studio is looking for an ExampleSet in your Local Repository -> data folder, and it's not there. Now copy-and-paste that Titanic data set from the Samples to Local Repository -> data and run again. Great! It found the ExampleSet right where you told it to.
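To make the anatomy of an absolute path concrete, here is a small Python sketch that splits one into the repository name and the location inside that repository. It only illustrates the naming convention described above and is not RapidMiner's own code; split_absolute_path is a hypothetical helper, and //Samples/data/Titanic is assumed to be the path Studio inserts for the sample Titanic data set.

```python
from typing import List, Tuple


def split_absolute_path(path: str) -> Tuple[str, List[str]]:
    """Split '//Repository/folder/.../entry' into (repository, [folder, ..., entry])."""
    if not path.startswith("//"):
        raise ValueError("not an absolute repository path: " + path)
    parts = path[2:].split("/")
    if not parts[0]:
        raise ValueError("missing repository name in: " + path)
    return parts[0], parts[1:]


repo, location = split_absolute_path("//Samples/data/Titanic")
print(repo, location)

# Swapping only the repository name gives the path that failed above
# until the Titanic data set was copied into Local Repository -> data.
swapped = "//" + "/".join(["Local Repository"] + location)
print(swapped)
```

The point of the sketch is that everything after the '//' is taken literally: the first part names the repository, and the rest must exist inside it exactly as written.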
Why won't processes with 'Retrieve' work when I share them with someone else?
Most likely this is due to repository path errors. For example, say you shared that process we just did with a friend. It would not work because it would be looking for 'Titanic' on her computer under Local Repository -> data, and most likely it's not there!
How can I define repository paths in RapidMiner so that I can share them with someone else?
The best way to do this is with relative repository paths, rather than absolute paths (the double slash //). You can do this in either of two ways:
- Save your data set (ExampleSet) in the same folder as your process. Then just put the name of the ExampleSet in 'repository entry' with no other path information. RapidMiner will automatically look inside the same folder as the process if no path is specified.
- Save your data set (ExampleSet) in the data folder in your Local Repository and change the repository path to '../data/[name]'. When RapidMiner sees '..' at the start of a path, it goes up one level from the folder containing the current process, so '../data' refers to the 'data' folder sitting next to that folder.
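The two options above can be sketched in Python as a small path resolver. This is only an illustration of the lookup rules as described in this post, not RapidMiner's actual implementation; resolve_entry, the process name my_process, and the repository name "Shared Repo" are all hypothetical.

```python
import posixpath


def resolve_entry(process_path: str, entry: str) -> str:
    """Resolve a 'repository entry' value against the process that uses it."""
    if entry.startswith("//"):
        # Absolute path: used as-is, wherever the process lives.
        return entry
    # Split '//Repository/folder/process' into repository name and the rest.
    repo, _, inside = process_path[2:].partition("/")
    process_folder = posixpath.dirname("/" + inside)
    # Join with the process folder, then collapse any '..' segments.
    resolved = posixpath.normpath(posixpath.join(process_folder, entry))
    return "//" + repo + resolved


mine = "//Local Repository/processes/my_process"
theirs = "//Shared Repo/processes/my_process"

print(resolve_entry(mine, "Titanic"))            # same folder as the process
print(resolve_entry(mine, "../data/Titanic"))    # sibling 'data' folder
print(resolve_entry(theirs, "../data/Titanic"))  # follows the process to its new home
print(resolve_entry(theirs, "//Local Repository/data/Titanic"))  # still pinned to the old place
```

The last two calls show why relative paths survive sharing: the relative entry is resolved inside whatever repository the process ends up in, while the absolute entry keeps pointing at a repository the other person may not have.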
Comments
Thank you for your remarks on this topic.
Is it possible to retrieve Python scripts (.py) with the Retrieve operator as well? Although the file is
stored in the same GitHub repo as the main process, it returns a "can not retrieve repository data" error.
It does, however, work well with the Open File operator. But then we face the problem of an absolute
path pointing to where the .py file was once stored.
Thanks in advance, highly appreciated.
sonny.
Scott
Not right now. But without spoiling too much, we have something in the pipeline that will make the cross-functional team experience that much better, and it is very much related to your question.
Regards,
Marco
Looking forward to the enhancements connecting our Python and RapidMiner specialists. If anyone is facing
the same issue: we have simply stored the Python script as text in the operator.
Philipp
Scott