Answers
There may be a plausible explanation for your observation. I do not know enough about the virtual computing cores of an EC2 instance to give a technical explanation, but depending on your process setup, a local core may be faster than a virtual CPU core. If your process makes heavy use of an operator whose internal processing steps cannot be parallelized because of the chosen algorithm, your local core may very likely compute this step faster than one of the virtual cores.
Cheers,
Helge
I have an association rules workflow:
- pivoting the user-item pairs
- transforming the table into a binary one
- creating frequent item sets with FP-Growth
- applying the Create Association Rules operator
Does that mean that these algorithms are not parallelized?
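For readers who want to see the four steps above outside RapidMiner, here is a minimal Python sketch. It is an illustration only: the toy data and all function names are hypothetical, and the frequent item sets are found by brute-force enumeration rather than by FP-Growth, so it says nothing about how the RapidMiner operators are implemented or parallelized.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: (user, item) pairs standing in for the real table.
pairs = [
    ("u1", "bread"), ("u1", "milk"),
    ("u2", "bread"), ("u2", "milk"), ("u2", "eggs"),
    ("u3", "milk"), ("u3", "eggs"),
    ("u4", "bread"), ("u4", "milk"),
]

def pivot_to_binary(pairs):
    """Pivot (user, item) pairs into one 0/1 row per user."""
    items = sorted({item for _, item in pairs})
    baskets = defaultdict(set)
    for user, item in pairs:
        baskets[user].add(item)
    table = {u: {i: int(i in b) for i in items} for u, b in baskets.items()}
    return items, table

def frequent_itemsets(table, items, min_support):
    """Brute-force search (NOT FP-Growth) for item sets with enough support."""
    n = len(table)
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = sum(all(row[i] for i in combo) for row in table.values())
            support = count / n
            if support >= min_support:
                result[frozenset(combo)] = support
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size
    return result

def association_rules(itemsets, min_confidence):
    """Derive rules premise -> conclusion from the frequent item sets."""
    rules = []
    for itemset, support in itemsets.items():
        for k in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), k):
                premise = frozenset(premise)
                # Every subset of a frequent set is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / itemsets[premise]
                if confidence >= min_confidence:
                    rules.append((premise, itemset - premise, confidence))
    return rules
```

Calling `pivot_to_binary(pairs)`, then `frequent_itemsets(table, items, 0.5)`, then `association_rules(itemsets, 0.8)` runs the whole chain on the toy data. The brute-force search is exponential in the number of items and is only meant to make the data flow visible.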
To be clear I will give more details:
- I'm using RapidMiner on my machine and launching the process from it on RapidAnalytics (Amazon). RapidAnalytics reads the data from a MySQL database.
- I have 1 234 600 user-item pairs.
- Simply reading from the database on RapidAnalytics took 20 minutes. Reading the CSV file that corresponds to this database on my local machine takes 1-2 seconds.
- Running the process on a sample of 1000 pairs took several seconds on my computer, and on the server it did not finish within one hour.
I'm using the Community Edition of RapidMiner (5.3) and RapidAnalytics (Community Edition) version 1.3.015.
Alina
In this case we can easily reject the "single-thread" theory. There are some algorithms you simply cannot distribute, but this is not applicable here. The huge differences in runtimes make it much more likely that there is a connectivity issue somewhere. Where is your database located? Could you try to upload your example set to the server and check the runtime again?
Cheers,
Helge
The database is located on the server (on the EC2 instance).
I created the database because, when I tried to get the data onto the server:
- the upload button said it had uploaded the data (.csv), but in fact I couldn't find it there
- putting the file or the RapidMiner data object into the RapidAnalytics repository gave a socket error (after half an hour). Only storing workflows works perfectly.
Note: I tried each of these two methods several times and got the same result.
Thank you!!
Alina
Do I understand you correctly that a process which runs on your local machine in a few seconds does not finish on your server after one hour? If so, there must be a significant issue in the server configuration. Could you check whether there is enough hard disk capacity available for the server and its database? Logged in to the server, you can click on >>Administration -> System information<< and have a look at the server log and memory usage. Maybe you can post the log file or send more details about the errors you receive when trying to access the server repository.
Cheers,
Helge
Yes, you understand it correctly:
Server information:
Time Sep 25, 2014 9:02:39 AM
Up since Sep 25, 2014 8:51:52 AM
Total memory 4.5 GB
Maximum memory 12 GB
Free memory 2.2 GB
In the logs there are a lot of errors; I'm not sure whether this is normal for RapidMiner.
In the server logs there are a lot of X11 errors (the EC2 server has no graphical interface).
The java version:
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
I will post only the error lines from the logs, since I have 10 719 lines of logs (boot + server).
I have started copying the files to the server through the RapidAnalytics repository; I will post the error message as soon as I have it!
Thank you!
Note: I do not have this time out error when I try to copy any other workflow!
Thank you!
Thank you for the detailed information. Here are two configuration changes you might try:
1) In RapidMiner you can set a higher timeout value for server connections (Tools -> Preferences -> System -> connection.timeout). Increasing this value might help when connecting to servers via the Internet.
2) It looks like your MySQL database server is running very slowly. This is a known issue for version 5.5 using the InnoDB engine. Please check your MySQL configuration (usually /etc/my.cnf) and, if necessary, try to set the following options:
Please back up your data before experimenting with these settings. Hope it helps.
Cheers,
Helge
I tried changing the timeout value in RapidMiner: it stopped (with the timeout error) after an hour and a half. (Note: I transferred the same file with FileZilla in 5 minutes.)
With the MySQL configuration changes the reading time has improved (it now takes 3 minutes). Thank you very much!!
However, the same process that works on my local computer (the association rules one that I described in my second post) gives a stack overflow error after three and a half minutes in the FP-Growth operator. Watching the memory and the CPU, I see that only 1.5 of the 4 CPUs are used, and memory is at 50% plus about 20% cache.
Thank you!
Alina
The error message shows that there is not enough memory left for the Java stack. This could be a misconfiguration or maybe an infinite recursion (a bug, so to speak) in the FP-Growth operator. Since I have not found anything about such a bug in our bug tracker, let's see if you can increase the Java stack size a little:
1) Please edit bin/run.conf in your RA installation folder and add -Xss4m to the java command line.
2) I noticed you have the Java setting -XX:MaxPermSize=4096m active in your setup. You may try to decrease this value to something like 512m, since Java allocates this on top of the heap memory. In your setup this may cause a memory allocation of up to 19 gigs.
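To illustrate why a recursive algorithm can run out of stack without being stuck in an infinite recursion, here is a small Python sketch. It is only an analogy: Python's recursion limit stands in for the JVM thread stack size that -Xss controls, and the `Node` class and helper functions are hypothetical, not taken from the FP-Growth operator. The point is that the recursion depth follows the depth of the data, so a long enough path overflows a fixed-size stack even when the code is correct.

```python
class Node:
    """One node of a hypothetical tree; a stand-in for a prefix-tree node."""
    def __init__(self, child=None):
        self.child = child

def build_chain(depth):
    """Build a degenerate tree: a single path of `depth` nodes."""
    node = None
    for _ in range(depth):
        node = Node(node)
    return node

def depth_recursive(node):
    """Recursive traversal: every level of the tree costs one stack frame."""
    if node is None:
        return 0
    return 1 + depth_recursive(node.child)

def depth_iterative(node):
    """Same result with a loop: stack usage no longer grows with the data."""
    count = 0
    while node is not None:
        count += 1
        node = node.child
    return count

def try_recursive(node):
    """Return the depth, or None if the interpreter's stack limit is hit."""
    try:
        return depth_recursive(node)
    except RecursionError:
        return None
```

With Python's default recursion limit, `try_recursive(build_chain(10_000))` returns `None`, while `depth_iterative` handles the same chain. Raising the limit (in Java, the stack size) only moves the threshold; whether a larger stack is enough depends on how deep the data makes the recursion go.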
Cheers,
Helge
I made the changes to the run.conf script, but it didn't work (I still get the error message on the FP-Growth operator):
My Java configuration:
I didn't know the Java option Xss before; is there any chance that I would need a larger Xss value (than 4)?
Thank you!
Alina
(The blocking point is in replacing missing values with zeros!!!)
I will try to increase the Xss value even more...
Do you have any other ideas?
Thank you,
Alina
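The step Alina identifies as the blocking point, replacing missing values with zeros after the pivot, can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and toy data, not RapidMiner's implementation. It also counts the cells of the pivoted table, which grows as the number of users times the number of items, however few pairs are actually present.

```python
def pivot(pairs):
    """Pivot (user, item) pairs into a dense table; absent cells are None."""
    users = sorted({u for u, _ in pairs})
    items = sorted({i for _, i in pairs})
    present = set(pairs)
    return {
        u: {i: (1 if (u, i) in present else None) for i in items}
        for u in users
    }

def replace_missing_with_zero(table):
    """Return a copy of the table with every missing cell set to 0."""
    return {
        u: {i: (0 if v is None else v) for i, v in row.items()}
        for u, row in table.items()
    }

def cell_count(table):
    """Number of cells in the dense table (users times items)."""
    return sum(len(row) for row in table.values())
```

For example, three pairs with three distinct users and three distinct items already produce a table of nine cells, six of which had to be filled in.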