RM Server: scheduled job terminating abnormally
Hi,
I have a long-running scheduled job on RM Server 9.6 that normally runs without any issues. However, it failed last night and shows the following error:
Execution exited abnormally
Failed to submit job. Reason: I/O error on POST request for "http://localhost:10002/jobs": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
com.rapidminer.execution.jobagent.service.exception.ServiceException: Failed to submit job. Reason: I/O error on POST request for "http://localhost:10002/jobs": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
at com.rapidminer.execution.jobagent.clients.rest.JobContainerRestClient.submitJob(JobContainerRestClient.java:73)
at com.rapidminer.execution.jobagent.service.executor.JobExecutorService.startProcess(JobExecutorService.java:227)
at com.rapidminer.execution.jobagent.service.executor.JobExecutorService.executeProcess(JobExecutorService.java:150)
at com.rapidminer.execution.jobagent.queue.JobMessageConsumer.executeJob(JobMessageConsumer.java:171)
at com.rapidminer.execution.jobagent.queue.JobMessageConsumer.acceptJobMessage(JobMessageConsumer.java:90)
at sun.reflect.GeneratedMethodAccessor171.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:180)
at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:112)
at org.springframework.jms.listener.adapter.MessagingMessageListenerAdapter.invokeHandler(MessagingMessageListenerAdapter.java:104)
at org.springframework.jms.listener.adapter.MessagingMessageListenerAdapter.onMessage(MessagingMessageListenerAdapter.java:69)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:719)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:679)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:649)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:317)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:255)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1168)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1062)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.springframework.web.client.ResourceAccessException: I/O error on POST request for "http://localhost:10002/jobs": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:675)
at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:637)
at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:558)
at com.rapidminer.execution.jobagent.clients.rest.JobContainerRestClient.submitJob(JobContainerRestClient.java:69)
... 21 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1587)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at org.springframework.http.client.SimpleClientHttpResponse.getRawStatusCode(SimpleClientHttpResponse.java:52)
at org.springframework.web.client.DefaultResponseErrorHandler.hasError(DefaultResponseErrorHandler.java:54)
at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:697)
at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:662)
... 24 more
This is the second time in 14 days that it has failed in this way, so it's not a one-off. Any ideas where I should be looking to resolve this?
Many thanks,
Paul
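The root exception in the trace is `java.net.SocketTimeoutException: Read timed out`, thrown from `HttpURLConnection.getResponseCode` while the job agent (`JobContainerRestClient.submitJob`) was POSTing to `http://localhost:10002/jobs`. That pattern means the TCP connection to the job container was accepted and the request was sent, but no HTTP response arrived within the client's read timeout. That is different from a refused connection (which would show up as a `ConnectException`) and suggests the container was reachable but possibly too busy or starved to answer in time. The self-contained sketch below reproduces that failure mode against a dummy local server that accepts the connection and never replies. The class name and the timeout values are illustrative assumptions, not RapidMiner settings, and the code does not contact a real job container.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class ReadTimeoutDemo {

    // Starts a local server that accepts the TCP connection but never sends an
    // HTTP response, then POSTs to it with the given read timeout. Returns the
    // SocketTimeoutException message on timeout, otherwise a short description.
    public static String postToSilentServer(int readTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port
            Thread silentAcceptor = new Thread(() -> {
                try (Socket client = server.accept()) {
                    // Hold the connection open without replying, well past the timeout.
                    Thread.sleep(readTimeoutMillis * 5L);
                } catch (IOException | InterruptedException ignored) {
                    // Demo only: the socket is closed by try-with-resources.
                }
            });
            silentAcceptor.setDaemon(true);
            silentAcceptor.start();

            HttpURLConnection conn = (HttpURLConnection) URI
                    .create("http://127.0.0.1:" + server.getLocalPort() + "/jobs")
                    .toURL()
                    .openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setConnectTimeout(1_000);          // time allowed to establish TCP
            conn.setReadTimeout(readTimeoutMillis); // time allowed to wait for response bytes
            try (OutputStream body = conn.getOutputStream()) {
                body.write("{}".getBytes(StandardCharsets.UTF_8));
            }
            try {
                return "HTTP " + conn.getResponseCode();
            } catch (SocketTimeoutException e) {
                return e.getMessage();              // typically "Read timed out"
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return "I/O error: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(postToSilentServer(500));
    }
}
```

If the same timeout shows up only at particular times of night, it points toward something on the host (or in the container) making the container slow to respond at that moment, rather than a configuration error in the job itself.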
Answers
However, I made an interesting discovery. The machine runs two job containers, and in the other container there is a scheduled process that runs every 3 minutes; at 2am every night it fails when it attempts to write to an external MySQL database. I need to talk to the DB team, but maybe they run some kind of backup or similar process on that database at that time, which would cause that error.
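Before taking this to the DB team, it may help to confirm that failures in both containers really cluster in the same hour. A minimal sketch, assuming you have already extracted the failure timestamps from each container's job history or logs yourself (that parsing is not shown, and the timestamps in `main` are made-up placeholders): it counts failures per hour of day, so a spike at hour 2 in both containers' counts would support a shared nightly cause.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FailureHours {

    // Counts failures per hour of day (0-23), sorted by hour.
    public static Map<Integer, Integer> countByHour(List<LocalDateTime> failureTimes) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (LocalDateTime t : failureTimes) {
            counts.merge(t.getHour(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder timestamps; substitute the real ones from each container.
        List<LocalDateTime> failures = Arrays.asList(
                LocalDateTime.parse("2024-01-01T02:00:05"),
                LocalDateTime.parse("2024-01-02T02:01:10"),
                LocalDateTime.parse("2024-01-02T14:30:00"));
        System.out.println(countByHour(failures)); // {2=2, 14=1}
    }
}
```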
The fact that both of these issues happen at 2am can't be a coincidence. While that error is occurring, could RM be dumping a lot of error logging to the file system, and could that cause the intermittent failure in the other job container running on the same machine?
Thanks,
Paul
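One way to test the logging theory is to record how much log data has accumulated, and how much free space remains on that volume, shortly before and shortly after the 2am window. A minimal sketch using only `java.nio.file`; the class name and the default `logs` path are hypothetical placeholders (where the job containers actually write their logs depends on the installation), and it measures disk usage only, not CPU, memory, or I/O contention.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class LogVolumeCheck {

    // Total size in bytes of all regular files under dir, including subdirectories.
    public static long directorySizeBytes(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(LogVolumeCheck::sizeOrZero)
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usable bytes left on the file system that holds dir.
    public static long usableBytes(Path dir) {
        try {
            return Files.getFileStore(dir).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A log file may be rotated or deleted mid-walk; count it as 0 bytes then.
    private static long sizeOrZero(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            return 0L;
        }
    }

    public static void main(String[] args) {
        // Hypothetical default path: pass the job container's real log directory as args[0].
        Path logDir = Paths.get(args.length > 0 ? args[0] : "logs");
        System.out.printf("%s: %d bytes of logs, %d bytes free on volume%n",
                logDir, directorySizeBytes(logDir), usableBytes(logDir));
    }
}
```

If the log directory barely grows across the window and free space stays comfortable, the file-system explanation becomes less likely.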