[SOLVED] RapidMiner Working Directory on Windows?
I am trying to use relative path names for files (e.g., in Read CSV) on Windows, so that I can easily move processes between machines with different file structures. RapidMiner's working directory gets set to %USERPROFILE%. Is there an easy way to set the working directory to a more useful value, e.g., at startup? Any thoughts?
BTW, I tried using %{process_path}, but it gets set to the (useless) value "\\Repository-Name\Process-Name". Since process_path is documented as being the absolute path name of the current process, this seems to be broken. Comments?
Thanks in advance (as always!) for any help.
Answers
It would greatly help us if you described your use case in more detail.
In general, however, the concept of RapidMiner is based on repositories, not on files. So the process_path macro is working correctly, since it specifies the absolute path of the process in the repository.
If you want to share processes and data, I strongly recommend installing the RapidAnalytics server. The feature that will be useful for you is the ability to define a so-called Remote RapidAnalytics Repository, which you can access from any RapidMiner instance in the network to share both processes and data. That means you have to import your CSV files to the server only once and can then use that data seamlessly from any RapidMiner instance. You can even execute long-running processes on the server, so that your workstation is not blocked, or execute recurring tasks automatically on a daily, weekly, or user-defined schedule.
Another option for storing the data would be a central SQL database: again, you would import your data only once and then access it directly from the server without using any files.
Last but not least, you can run RapidMiner processes from the command line and pass macros to the process with the -M parameter. The syntax would be:
Best regards,
Marius
I'm working on Windows with RapidMiner 5.2. I have a repository named "Test Repository" that is stored at (say) C:\My Repositories\Test Repository. I create a new process in that repository, name it "pathtest", and add the "Read CSV" operator. I set the operator's file name to "test.csv" and run the process. I get the following error:
The file 'java.io.FileNotFoundException: C:\Documents and Settings\srt19170\text.csv (The system cannot find the file specified)' does not exist.
The error details state: "... For a given filename, RapidMiner resolves the filename against the directory the experiment file is stored in..."
This seems to be a clear bug. RapidMiner is not resolving the filename against the directory the experiment file is stored in (C:\My Repositories\Test Repository) but against the value of %USERPROFILE% (at least on Windows). (Side note: you should correct the error message to remove the outdated "experiment" reference.) I haven't checked all the operators, but "Read Excel" exhibits the same behavior, so I assume this is broken in all the I/O operators.
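The symptom fits the general rule for relative paths: a bare name such as test.csv is resolved against the running program's current working directory, not against the folder of the repository. The minimal Python sketch below contrasts that default with joining the name onto an explicit base folder. The helper names are hypothetical and this is not RapidMiner's own resolution code; PureWindowsPath is used so the Windows-style result can be checked on any operating system.

```python
from pathlib import Path, PureWindowsPath


def resolve_against_cwd(name: str) -> Path:
    # A bare relative name is interpreted relative to the current working
    # directory of the running program (in the post above, %USERPROFILE%).
    return Path(name).resolve()


def resolve_against_base(base: str, name: str) -> PureWindowsPath:
    # Hypothetical alternative: join the name onto an explicit base folder,
    # such as the folder in which the repository is stored.
    return PureWindowsPath(base) / name


print(resolve_against_cwd("test.csv"))
print(resolve_against_base(r"C:\My Repositories\Test Repository", "test.csv"))
# -> C:\My Repositories\Test Repository\test.csv
```

The first call changes its answer whenever the working directory changes, which is exactly why relative names break when a process moves between machines.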
Let's move on to %{process_path}. According to the documentation, this should be set to the "absolute path name for the current process". In this case I would expect that to be "C:\My Repositories\Test Repository\pathtest". If I change Read CSV to use the filename "%{process_path}\test.csv", I get the error:
The file 'java.io.FileNotFoundException: \\Test Repository\pathtest\text.csv (The system cannot find the file specified)' does not exist.
The value of %{process_path} is clearly not the "absolute path name for the current process." It's not even a legitimate UNC path name on Windows: it uses the repository name as a server name.
So regardless of my particular problem, it seems to me that both of these are broken, at least in the sense that they don't behave as described in the documentation.
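Parsing the substituted string as a Windows path reproduces the problem described above: a string that begins with two backslashes is read as a UNC path, so the repository name ends up in the server position. The sketch assumes the process_path value reported in this thread and uses a simplified, hypothetical expand_macros helper for %{name} substitution; it only approximates RapidMiner's macro handling.

```python
import re
from pathlib import PureWindowsPath
from typing import Mapping


def expand_macros(template: str, macros: Mapping[str, str]) -> str:
    # Simplified stand-in for %{name} substitution: each %{name} is replaced
    # by macros[name]; unknown names are left untouched. A function is used as
    # the replacement so that backslashes in values are inserted literally.
    return re.sub(r"%\{([^}]+)\}",
                  lambda m: macros.get(m.group(1), m.group(0)),
                  template)


# process_path value as reported in this thread (a repository location).
macros = {"process_path": r"\\Test Repository\pathtest"}
path = PureWindowsPath(expand_macros(r"%{process_path}\test.csv", macros))

print(path.drive)  # -> \\Test Repository\pathtest  (UNC server + share)
print(path.name)   # -> test.csv
```

Here "Test Repository" is taken as the server and "pathtest" as the share, which is why Windows reports that it cannot find the file.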
As for my problem, I appreciate the suggestion to use a central server, but that's not an option in my case. The two machines I'm moving the process between don't share any common infrastructure. Defining a macro at RapidMiner startup may be my best option, but I think RapidMiner should have some way to provide the path name of the current process.
You'll probably see a NullPointerException emerging from the RapidMiner.quit() method, which is obviously a bug, but does not harm the process execution. I already filed an internal bug for that issue.
As I explained in my previous post, this actually is the absolute path of the current process, not as a file system path but as a RapidMiner repository path, and that will probably not change. I'll propose a file system path for the process to the developers, but I don't think we'll introduce such functionality. After all, the process is located in a repository, and the repository should contain nothing but RapidMiner processes and RapidMiner data; providing the file system path of the process would encourage users to misuse the repository folder as a place for their data files. If you are dealing with file system paths, please use external scripts to find them and start RapidMiner via the command line.
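Following that advice, a small external script can determine the machine-specific folder and pass it to the process as a macro. The Python sketch below only assembles the argument list. The launcher name rapidminer, the environment variable RM_DATA_DIR, the macro name data_dir, and the exact -M&lt;name&gt;=&lt;value&gt; spelling are all assumptions (the thread names only the -M parameter), so check the usage message of your installation's start script before running the command.

```python
import os
from pathlib import Path
from typing import List, Mapping


def find_data_dir(default: Path) -> Path:
    # Hypothetical convention: each machine may set RM_DATA_DIR to its own
    # data folder; if it is unset, fall back to the given default.
    return Path(os.environ.get("RM_DATA_DIR", str(default))).resolve()


def build_command(launcher: str, process: str,
                  macros: Mapping[str, str]) -> List[str]:
    # Assumed argument form: the process location followed by one
    # "-M<name>=<value>" entry per macro.
    args = [launcher, process]
    for name, value in macros.items():
        args.append(f"-M{name}={value}")
    return args


data_dir = find_data_dir(Path.home() / "rm-data")
cmd = build_command("rapidminer", "//Test Repository/pathtest",
                    {"data_dir": str(data_dir)})
print(cmd)
# Once the syntax is confirmed, the list can be passed to subprocess.run(cmd, check=True).
```

Inside the process, the Read CSV file name could then be written as %{data_dir}\test.csv, so only the script, not the process, differs between machines.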
However, we will probably introduce new ways of storing non-RapidMiner files in the repository. That will give you the following option:
- Copy your CSV file into the repository folder.
- In RapidMiner, use the Open File operator to create a file object from your file.*
- Pass the file object directly into the Read CSV operator.
- Continue as if you were reading the CSV directly from disk (as you did before).
This will probably not be part of the next release, but it is planned for the future.
* By using the Open File operator to read files from the repository folder, you no longer need to know the file system path of your processes.
Best regards,
Marius
(BTW, it's a little disheartening to install a new release and discover that the basic run scripts don't work. It seems the scripts were not tested at all before the 5.3 release.)
The other problem with this solution is that, at least in 5.3, you cannot set a macro value on the command line without specifying a process, and if you specify a process, RapidMiner runs it and exits. This makes it impossible to open RapidMiner for development and set the macro at the same time.
One option for me is to use Set Macro and force it to execute first. On every new machine I can modify the Set Macro operator to hold the appropriate file path. Unfortunately, with many machines and many processes, this becomes a lot of work and is error-prone. Of course, if the command-line option for setting a macro were fixed, this wouldn't be necessary.
Instead of Set Macro, consider defining the macros in the Process Context. This has two advantages:
1. You do not have to find the Set Macro operator in each process, but can edit all relevant macros in one single view.
2. Using the Process Context allows you to override the macros defined there from the command line, which is not as easily possible with the Set Macro operator.
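Point 2 boils down to a simple precedence rule: a value supplied on the command line replaces the default stored in the Process Context, and every other macro keeps its default. The sketch below models that rule as a Python dictionary merge; the helper name and the example values are hypothetical, and this illustrates the described behavior rather than RapidMiner's implementation.

```python
from typing import Dict, Mapping


def effective_macros(context_defaults: Mapping[str, str],
                     command_line: Mapping[str, str]) -> Dict[str, str]:
    # Start from the defaults defined once in the Process Context ...
    merged = dict(context_defaults)
    # ... and let any value passed on the command line override them.
    merged.update(command_line)
    return merged


defaults = {"data_dir": r"C:\data", "separator": ";"}
print(effective_macros(defaults, {"data_dir": r"D:\projects\data"}))
print(effective_macros(defaults, {}))  # no overrides: defaults are used unchanged
```

Because the defaults live in one place, moving to a new machine only requires passing the values that actually differ there.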
I would like to ask you to describe your setup with these multiple machines, what you are doing, and your usual workflow for developing and running processes. With the big picture, we may be able to give better advice.
Best regards,
Marius
Happy Mining