We're currently struggling to stabilize a large Impala cluster.
The graph below shows the network usage of statestored.
We have one node for statestored and catalogd, and 84 nodes for impalad. Impala 2.0 is installed and being tested.
Until 18:32, statestored and catalogd had been running for about 15 min without --load_catalog_in_background.
From 18:36, statestored and catalogd were running for about 15 min without --load_catalog_in_background.
At 19:00, all 84 + 1 nodes were running WITH --load_catalog_in_background.
Note that I have set -statestore_subscriber_
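For context, the flags in question appear on the daemon command lines roughly like this. The values are illustrative only, and --statestore_subscriber_timeout_seconds is just one of the statestore_subscriber_* options that the truncated flag above could refer to:

    # catalogd: load all table metadata eagerly at startup and broadcast it via the statestore
    catalogd --load_catalog_in_background=true

    # impalad: the subscriber timeout controls how long a missing statestore update is tolerated
    # (illustrative value, not necessarily our actual setting)
    impalad --statestore_subscriber_timeout_seconds=120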
I know that all the traffic here is generated by the statestored process, because I monitor the SendQ size constantly.
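To watch it yourself, something along these lines works, assuming the default state_store_port of 24000:

    # per-connection Recv-Q/Send-Q for the sockets statestored serves on port 24000
    watch -n 1 "ss -tn sport = :24000"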
I believe that this behavior may be related to one of the enhancements in version 2.1: "separated heartbeat and catalog data sync".
Any idea?
What you are seeing is the effect of the --load_catalog_in_background flag. When the catalog service starts, if that flag is set to true, it will try to load all the catalog metadata eagerly. When that metadata is loaded, it will be broadcast to the Impala daemons via the statestore.
When --load_catalog_in_background is false, the catalog service will only load a skeleton version of the metadata and broadcast that (the skeleton is mostly just the table names). Tables are then loaded when requested for the first time by a query.
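As a concrete illustration of the lazy path (the database and table names below are placeholders): with --load_catalog_in_background=false, the first statement that touches a table pays the metadata-load cost, while a repeat of the same statement does not:

    # first access: catalogd loads this table's metadata on demand, so the query is noticeably slower
    impala-shell -q "SELECT COUNT(*) FROM my_db.my_table"
    # second access: the metadata is already loaded, so only the query itself runs
    impala-shell -q "SELECT COUNT(*) FROM my_db.my_table"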
If you have a large catalog, it can take a long time to complete the initial broadcast. In versions before Impala 2.1, this could cause Impala daemons to incorrectly think their connection to the statestore was failing (because they were not getting the periodic heartbeat+metadata messages), and would lead to the missed-deadline messages you have noticed. Starting in Impala 2.1, the statestore sends metadata separately from heartbeats, and should be less prone to this problem as it does not have to send all the metadata in a heartbeat message.
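If you want to see how much data the statestore is carrying, its debug web UI exposes a topics page (assuming the default statestore web UI port of 25010, with statestored-host as a placeholder for your statestore node):

    # lists each statestore topic, including the catalog-update topic, with its current size
    curl -s http://statestored-host:25010/topics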
Could you try Impala 2.1, and let us know if you see any change in behaviour? You may still want to keep --load_catalog_in_background=
I will try 2.1, but before doing that, is it compatible with the other components in CDH 5.1.3?
I have installed and tested it a little on a smaller test cluster, but that cluster is running CDH 5.2.0.
Are there any problems you can think of before I try the new version on our live system?
Since Impala is part of CDH, I think you would need to upgrade all the platform components to be compatible with Impala 2.1. If that's not possible in your current environment, let me know and we can look at tuning your 2.0 deployment further.