Looking at the log on the query coordinator node, some of the fragments run on the broken node never report back a completed status. But reading the log of the broken node, it looks like the fragments finished. I wonder if a network issue can cause queries to hang rather than fail? Is there any retry logic in reporting fragment running status?
>> Is there any retry logic in reporting fragment running status?
Just took a look at the code; it seems there is no retry logic for reporting exec-fragment status.
After the coordinator dispatches the fragments to the different backend hosts for execution:
Coordinator::Exec
  Status fragments_exec_status = ParallelExecutor::Exec(
      bind<Status>(mem_fn(&Coordinator::ExecRemoteFragment), this, _1),
      reinterpret_cast<void**>(&backend_exec_states_[backend_num - num_hosts]),
      num_hosts, &latencies);
Then the different workers execute the fragments in parallel, each actually performing the RPC call "backend_client->ExecPlanFragment".
ParallelExecutor::Exec
  for (int i = 0; i < num_args; ++i) {
    stringstream ss;
    ss << "worker-thread(" << i << ")";
    worker_threads.AddThread(new Thread("parallel-executor", ss.str(),
        &ParallelExecutor::Worker, function, args[i], &lock, &status, latencies));
  }
  worker_threads.JoinAll();
  return status;
Hmm, there is retry logic when executing the RPC call: if "backend_client->ExecPlanFragment" fails, "backend_client.Reopen()" is called to re-establish the connection and the call is retried.
>> I wonder if a network issue can cause queries to hang rather than fail?
In your case, the networking issue may have caused the query to hang. However, Impala has various timeout settings, including query timeouts: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_timeouts.html
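For example, the idle-query and idle-session timeouts can be set as impalad startup flags so a hung query is eventually cancelled (the values below are illustrative, not recommendations):

```shell
# Illustrative impalad startup flags; tune values for your workload.
impalad --idle_query_timeout=600 \
        --idle_session_timeout=1800
# --idle_query_timeout:   cancel a query idle for this many seconds
# --idle_session_timeout: expire a session idle for this many seconds
```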