Debug MapReduce Jobs
When you work with Hadoop MapReduce jobs for this course, use the following checks to debug common issues:
- Do your input files exist, and are they in the expected format?
  - You can use the `hadoop fs -cat file | head` command to find this out.
- Does your output folder already exist?
  - Hadoop won't run if the output folder already exists.
  - Use `hadoop fs -rm -r -f output_dir` to remove the output folder.
- Does your Python mapper/reducer have syntax or run-time errors?
- If you pass step 3 (no syntax or run-time errors), does your mapper/reducer file have the wrong line endings?
  - Follow the tips here to view and convert line endings.
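
The script checks in the last two items can be automated before you submit a job. The sketch below is one possible way to do it, not the course's official tool: `check_script` and `convert_to_lf` are hypothetical helper names, and it assumes that "wrong line endings" means Windows-style CRLF, which typically breaks the `#!` line when the script runs on a Linux node.

```python
import pathlib


def check_script(path):
    """Return a list of problems found in a mapper/reducer script (empty if none)."""
    problems = []
    data = pathlib.Path(path).read_bytes()

    # Assumption: the "wrong" line endings are Windows-style CRLF.
    if b"\r\n" in data:
        problems.append("CRLF line endings")

    # compile() parses the source without running it,
    # so this catches syntax errors only, not run-time errors.
    try:
        compile(data, str(path), "exec")
    except SyntaxError as exc:
        problems.append(f"syntax error on line {exc.lineno}: {exc.msg}")
    return problems


def convert_to_lf(path):
    """Rewrite the file in place with Unix (LF) line endings."""
    p = pathlib.Path(path)
    p.write_bytes(p.read_bytes().replace(b"\r\n", b"\n"))
```

For example, `check_script("mapper.py")` returns an empty list for a clean script; if it reports CRLF line endings, `convert_to_lf("mapper.py")` rewrites the file and the check can be run again.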
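
Run-time errors in a mapper or reducer are often easier to see on a small sample outside Hadoop. The following is a minimal sketch under two assumptions: that the scripts follow the Hadoop Streaming convention of reading lines from standard input and writing lines to standard output, and that a plain line sort is a good enough stand-in for Hadoop's shuffle on a small input. `run_stage` and `run_local` are hypothetical names.

```python
import subprocess
import sys


def run_stage(script, text):
    """Run one script with `text` on stdin and return what it writes to stdout."""
    result = subprocess.run(
        [sys.executable, script],
        input=text,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface the Python traceback directly instead of digging through job logs.
        raise RuntimeError(f"{script} failed:\n{result.stderr}")
    return result.stdout


def run_local(mapper, reducer, input_text):
    """Approximate map -> sort -> reduce on a small sample, without Hadoop."""
    mapped = run_stage(mapper, input_text)
    # Sorting whole lines puts equal keys next to each other, as the reducer expects.
    # The exact order may differ from Hadoop's, but the grouping is the same.
    shuffled = "".join(line + "\n" for line in sorted(mapped.splitlines()))
    return run_stage(reducer, shuffled)
```

Calling `run_local("mapper.py", "reducer.py", sample_text)` with a few lines of real input either returns the reducer output or raises a `RuntimeError` containing the traceback of the script that failed.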