Wednesday, December 31, 2014

[cdh-user] Job failure for large input data

We have a MapReduce application running on CDH 4 (Hadoop 2.0.0-cdh4.7.0). It runs fine with a smaller amount of input data/files, but when we push in more data it fails immediately after job submission, with map progress still at 0%, and the following error:

14/12/30 23:47:28 INFO mapreduce.Job: The url to track the job: http://.../application_1415275960253_50244/
14/12/30 23:47:28 INFO mapreduce.Job: Running job: job_1415275960253_50244
14/12/30 23:47:36 INFO mapreduce.Job: Job job_1415275960253_50244 running in uber mode : false
14/12/30 23:47:36 INFO mapreduce.Job:  map 0% reduce 0%
14/12/30 23:47:36 INFO mapreduce.Job: Job job_1415275960253_50244 failed with state FAILED due to: Application application_1415275960253_50244 failed 1 times due to AM Container for appattempt_1415275960253_50244_000001 exited with  exitCode: 1 due to:
.Failing this attempt.. Failing the application.
There is no other information. The job tracking URL also shows: "The requested application exited before setting a tracking URL".

We typically see this error once the number of splits rises above 95K-100K, corresponding to around 45K input files totalling around 10 TB.


We would greatly appreciate help on ways to handle this, preferably a Hadoop command-line parameter that raises whatever internal limit we are hitting, so we can work around the issue quickly.



Can you check the job tracker logs for more information? Also, can you see what value you have set for the property mapreduce.job.split.metainfo.maxsize (or mapreduce.jobtracker.split.metainfo.maxsize)? Can you set this property to -1, retry your job, and let me know your feedback? Thanks.
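For reference, one quick way to see what these two keys are currently set to on the client is a small check program like the sketch below. This is only a sketch using the standard Hadoop Configuration/JobConf API; the class name is hypothetical, and it shows the client-side values (from mapred-default.xml / mapred-site.xml on the classpath), not necessarily what the JobTracker or AM daemons load.

import org.apache.hadoop.mapred.JobConf;

// Sketch (hypothetical class name): print the client-side values of both
// candidate keys. JobConf pulls in mapred-default.xml and mapred-site.xml
// from the classpath; 10000000 (about 10 MB) is used if a key is absent.
public class SplitMetaInfoCheck {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        System.out.println("mapreduce.job.split.metainfo.maxsize = "
                + conf.getLong("mapreduce.job.split.metainfo.maxsize", 10000000L));
        System.out.println("mapreduce.jobtracker.split.metainfo.maxsize = "
                + conf.getLong("mapreduce.jobtracker.split.metainfo.maxsize", 10000000L));
    }
}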



Thanks a lot for the quick reply. 

I did actually try that after finding the error below in the YARN logs:
org.apache.hadoop.yarn.YarnException: java.io.IOException: Split metadata size exceeded 10000000. Aborting job,
However, I could not override it at the job level via the command line (e.g. "hadoop jar process.jar -Dmapreduce.job.split.metainfo.maxsize=-1 -inputDir=/data"); I got the same failure. It seems it can only be changed in mapred-site.xml and not via command-line parameters.
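One thing worth checking when a -D option seems to be ignored: generic options like -D are only applied if the driver's main() runs the arguments through ToolRunner/GenericOptionsParser, and they must appear before the application's own arguments. The sketch below shows a driver that does honor -D; the ProcessJob class name and the input/output arguments are assumptions, not taken from the thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver; only the property name discussed in the thread matters here.
public class ProcessJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any -Dkey=value options, because
        // ToolRunner feeds the command line through GenericOptionsParser.
        Job job = Job.getInstance(getConf(), "process");
        job.setJarByClass(ProcessJob.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ProcessJob(), args));
    }
}

With a driver like this, an invocation along the lines of "hadoop jar process.jar -Dmapreduce.job.split.metainfo.maxsize=-1 /inputDir /outputDir" carries the override into the job configuration. If the driver instead parses its own flags (such as -inputDir) without going through GenericOptionsParser, the -D option is effectively ignored, which is one possible reason a command-line override appears not to take effect.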



We can do this through mapred-site.xml and also by passing it when the job is invoked. Make sure you know which parameter is actually causing the problem for your CDH version: mapreduce.job.split.metainfo.maxsize or mapreduce.jobtracker.split.metainfo.maxsize. We had a similar issue once, and when we checked the job configuration .xml file that was created, it had one of these parameters at the 10M default while we were overriding the other one.

You can see the job configuration file from your job ID URL; check which parameter name carries the 10M value.

In Hive we set the following property, set mapreduce.job.split.metainfo.maxsize=-1 (or set mapreduce.jobtracker.split.metainfo.maxsize=-1), and it worked. Since you are running a plain Hadoop job, overriding the config parameter with the -D method is correct, but make sure you override the right parameter, the one actually used by the MapReduce daemons. Let me know your feedback.

set mapreduce.jobtracker.split.metainfo.maxsize=-1 
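Outside of Hive, the same per-job override can also be applied programmatically in the driver, which avoids touching mapred-site.xml. The sketch below simply sets both candidate keys before submission, since which one is honored depends on the CDH/Hadoop version; the class name and path arguments are made up for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: disable the split metainfo size check for this job only.
public class ProcessJobNoMetaLimit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // -1 removes the 10,000,000-byte cap; both names are set here because
        // the effective one differs between MR1/MR2 and CDH versions.
        conf.setLong("mapreduce.job.split.metainfo.maxsize", -1L);
        conf.setLong("mapreduce.jobtracker.split.metainfo.maxsize", -1L);

        Job job = Job.getInstance(conf, "process-no-meta-limit");
        job.setJarByClass(ProcessJobNoMetaLimit.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}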



Thanks again for the help, Sonal.

Since we did not want to change the default in mapred-site.xml, we increased the split size at the job level (using mapreduce.input.fileinputformat.split.minsize), which indirectly reduced the number of splits.

That fixed the problem (for now).
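For reference, that job-level workaround looks roughly like the sketch below: raising the minimum split size makes each split cover more data, so fewer splits (and less split metadata) are generated. The driver name, path arguments, and the 512 MB figure are illustrative assumptions, not values from the thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: reduce the number of splits by raising the minimum split size.
public class ProcessJobBigSplits {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "process-big-splits");
        job.setJarByClass(ProcessJobBigSplits.class);

        // Equivalent to -Dmapreduce.input.fileinputformat.split.minsize=536870912
        // (example value: 512 MB). Larger splits mean fewer map tasks and a
        // smaller split metainfo file, at the cost of less parallelism per file.
        FileInputFormat.setMinInputSplitSize(job, 512L * 1024 * 1024);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Note that with plain FileInputFormat a split never spans more than one file, so with around 45K input files the split count cannot drop much below 45K; combining small files (for example with CombineFileInputFormat) would be the next step if that is still too many.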



Good to know. 

