Extension of MapReduce improving performance on Hadoop
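- Subject (keywords): MapReduce
- Publisher: Korea University, Graduate School of Convergence Software (고려대학교 융합소프트웨어전문대학원)
- Advisor: Hyuk Yoo (유혁)
- Year of publication: 2011
- Degree conferred: August 2011
- Degree: Master's
- Department: Department of Embedded Software, Graduate School of Convergence Software
- Pages: 29
- URI: http://www.dcollection.net/handler/korea/000000026614
- Language: English
- Original submission no.: 000045669473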
Abstract
MapReduce is a programming model suited to parallel and distributed data processing, and Hadoop is a popular open-source implementation of MapReduce. One of the major issues in MapReduce is handling failures in a cluster environment. MapReduce uses two methods for this: failure recovery and speculative tasks. However, the speculative task method raises certain issues, such as how many duplicate tasks are needed and how to define the threshold that identifies a delayed task. To address these issues, this project modifies the speculative task scheduling method on the Hadoop platform. The previous speculative method permits multiple backup tasks to run concurrently for a delayed task; when the fastest one finishes, its result is adopted and the outputs of the other duplicates are discarded. The modified method instead kills the main culprit task causing the delay once the same work has been assigned to another node. With this method, unassigned tasks can be assigned to nodes sooner and have a better chance of preserving their data locality. The modified module is evaluated in terms of response time and compared with the previous method.
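The difference between the two policies described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the thesis's implementation or Hadoop's scheduler code: the names `Attempt`, `find_stragglers`, `speculate`, and the straggler threshold `STRAGGLER_GAP` are all hypothetical. A straggler is assumed to be any attempt whose progress trails the mean by more than a fixed gap. With `kill_original=False`, the original keeps running alongside its backup (first finisher wins); with `kill_original=True`, the original is killed once the backup is placed, so its slot is freed for unassigned tasks.

```python
from dataclasses import dataclass

# Hypothetical threshold: an attempt is a straggler if its progress is more
# than this far below the mean progress of all running attempts.
STRAGGLER_GAP = 0.2


@dataclass
class Attempt:
    task_id: int
    node: str
    progress: float  # hypothetical progress score in [0.0, 1.0]


def find_stragglers(attempts):
    """Return attempts whose progress trails the mean by more than STRAGGLER_GAP."""
    if not attempts:
        return []
    mean = sum(a.progress for a in attempts) / len(attempts)
    return [a for a in attempts if a.progress < mean - STRAGGLER_GAP]


def speculate(attempts, free_nodes, kill_original):
    """Launch one backup attempt per straggler on a free node.

    kill_original=False: the straggler keeps running next to its backup,
                         and whichever finishes first wins.
    kill_original=True:  the straggler is killed as soon as its backup is
                         placed, and its node is returned as freed.
    Returns (running attempts, list of freed nodes).
    """
    free = list(free_nodes)  # copy so the caller's list is not mutated
    running = list(attempts)
    freed = []
    for straggler in find_stragglers(attempts):
        if not free:
            break  # no slot available for a backup
        node = free.pop(0)
        running.append(Attempt(straggler.task_id, node, 0.0))  # backup starts at 0
        if kill_original:
            # Remove by identity, then hand the straggler's node back to the pool.
            running = [a for a in running if a is not straggler]
            freed.append(straggler.node)
    return running, freed
```

For example, with attempts at progress 0.9, 0.8, and 0.3 and one free node, only the 0.3 attempt is flagged. The duplicate policy ends with four running attempts and no freed node, while the kill policy ends with three running attempts and one freed node that could take an unassigned task, which is the effect on scheduling and data locality the abstract describes.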
Contents
1. Introduction = 1
2. Background = 3
2.1 Background on MapReduce and speculative tasks = 3
2.2 General problems of the current speculative method = 6
3. Solution approaches considered = 8
3.1 Advantage of the deterministic method = 9
4. Evaluation of the proposed solution = 10
4.1 Comparing execution time = 11
4.2 Task run-time evaluation = 13
4.3 Comparison of progress rate = 16
5. Analysis of the results = 18
6. Conclusion = 20
7. References = 21