Job container forcibly killed
I ran a modeling job on RapidMiner Server. After all the operators finished successfully and the results were stored correctly, the process ended in an error state:
"Job container '1' was killed forcefully and therefore the job execution has been stopped. Reason: Restart of job container has been invoked".
I didn't change the container restart policies, so they are still at their defaults.
Has anyone else encountered anything similar? Any suggestions on how to diagnose the issue?
Best Answer
aschaferdiek Employee-RapidMiner, Member Posts: 76 RM Engineering
Hi @rur68. With 9.5.0 we introduced the persistent job container, which speeds up execution. Job containers are no longer shut down after each job execution but are instead kept alive, so jobs don't need to wait for the job container boot time any more. Just mentioning it as it's a fundamental change in architecture.
Related to your problem:
- Would it be possible to try out our latest 9.7.1 Server/AI Hub release?
- Do you experience any network related problems on the machine the JobAgent is running on? The timeout we see in the log could be related to network problems.
- Do you have enough memory on the machine where the JobAgent is running? Does the process maybe take more than the job container has?
- You could try to increase the maximum number of tolerated errors and the time between the health checks of the Job Container. If those limits are exceeded, the Job Container is flagged as unresponsive, killed and then restarted. You can give it a try by adding the following properties to the agent.properties file. If they help, then your machine is probably overloaded or your local network interface might be experiencing hiccups.
# amount of errors tolerated before shutdown
jobagent.container.maxErrorAmountBeforeSpawn = 10
# time between errors in milliseconds
jobagent.container.maxTimeBetweenErrors = 100005
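If you would rather script the change than edit agent.properties by hand, here is a minimal Python sketch. It is not part of RapidMiner; the helper name `set_properties` is made up for illustration, and it only understands plain `key = value` lines (no line continuations or `:` separators). The two keys and values are simply copied from the post above.

```python
def set_properties(text: str, updates: dict[str, str]) -> str:
    """Return properties text with every key in `updates` set to its new value.

    Existing `key = value` lines are rewritten in place, missing keys are
    appended at the end, and comments and blank lines are left untouched.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        is_setting = (
            stripped
            and not stripped.startswith(("#", "!"))  # comment markers
            and "=" in stripped
        )
        if is_setting:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                out.append(f"{key} = {remaining.pop(key)}")
                continue
        out.append(line)
    # Append any keys that were not already present in the file.
    for key, value in remaining.items():
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Hypothetical starting content; a real agent.properties will look different.
example = "# existing settings\njobagent.container.maxErrorAmountBeforeSpawn = 5\n"
updated = set_properties(
    example,
    {
        "jobagent.container.maxErrorAmountBeforeSpawn": "10",
        "jobagent.container.maxTimeBetweenErrors": "100005",
    },
)
print(updated)
```

To apply it for real you would read the file, pass its content through the function and write the result back, ideally after making a backup copy. The Job Agent probably needs a restart to pick up the change, but that is an assumption on my side.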
Answers
I got many "matrix is singular" warnings in the log. It's probably because of the data that I'm trying to predict. Once I excluded that problem, the job ran well.
But this error keeps coming up since I upgraded to RM Server 9.6. Some of my jobs that were fine in the previous version now end in an error state like this. I don't know what's going on; is version 9.6 more sensitive to warnings like this?
FYI, the previous version I used was 9.0. And this is not the only job that causes the job container to be killed. I have another job that always ends in an error state like this, even though the result is stored successfully.
1. Unfortunately, it's not possible to try 9.7.1 Server/AI Hub right now. But what is the fundamental change in architecture in these versions?
2. I don't think we have a network problem, because other jobs run fine.
3. It's also not the memory; I already increased it.
4. This is the only option I could try. I already did, and it works. But I'm still confused about why the job ran well in version 9.0 but gets errors like this in 9.6.
Anyway, thank you very much @aschaferdiek.
I had already added the suggested properties to the agent.properties file, and it worked before.
Now the same job doesn't succeed at all because of a connection refused error while building the model using deep learning. The process ends with the error state "Job container '1' was killed forcefully and therefore the job execution has been stopped. Reason: Restart of job container has been invoked". I've attached the log file. Any suggestions on how to handle this connection refusal in the middle of the process?
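As a first diagnostic step, it may help to see how often each symptom mentioned in this thread shows up in the log. The following is only a rough Python sketch: the search phrases are guesses at the log wording (for example, "connection refused" is assumed, not taken from the attached log), and the sample lines are invented for illustration.

```python
# Phrases to look for; the exact wording in a real log is an assumption.
PATTERNS = ("matrix is singular", "connection refused", "killed forcefully")


def count_symptoms(log_lines, patterns=PATTERNS):
    """Count how many lines contain each phrase, ignoring case."""
    counts = {pattern: 0 for pattern in patterns}
    for line in log_lines:
        lowered = line.lower()
        for pattern in patterns:
            if pattern in lowered:
                counts[pattern] += 1
    return counts


# Invented sample lines, not taken from a real Job Agent log.
sample = [
    "WARN Matrix is singular",
    "java.net.ConnectException: Connection refused",
    "INFO process finished",
    "warn matrix is singular",
]
print(count_symptoms(sample))
```

For a real log, you could pass an open file object instead of the sample list, since the function only iterates over lines. If the connection refused lines cluster right before the container is killed, that would point toward the health check failing rather than the process itself.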