Extension of MapReduce improving performance on Hadoop

Abstract

MapReduce is a programming model suited to parallel and distributed data processing, and Hadoop is a popular open-source implementation of it. One of the major issues in MapReduce is handling failures in a cluster environment. MapReduce uses two methods for this: failure recovery and speculative tasks. However, the speculative task method raises open questions, such as how many duplicate tasks are needed and how to define the threshold that identifies a delayed task. To ameliorate these issues, this project modifies the speculative task scheduling method on the Hadoop platform. The existing speculative method permits multiple backup tasks to run concurrently for a delayed task. When the fastest task finishes, its result is adopted and the outcomes of the other duplicate tasks are ignored. The modified method instead eliminates the culprit task causing the delay after assigning the same work to another node. With this method, unassigned tasks can be assigned to nodes sooner and have a better chance of preserving data locality. The modified module is evaluated in terms of response time and compared with the previous method.
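
The difference between the two scheduling policies described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation and does not use Hadoop's API. The `Attempt` record, the `plan_speculation` function, and the straggler rule (progress rate below half of the mean rate) are all hypothetical choices made for illustration, since the abstract does not state the actual threshold.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    task_id: str
    node: str
    progress_rate: float  # hypothetical metric: fraction of work done per second


def find_stragglers(attempts, slow_fraction=0.5):
    """Return attempts whose rate is below slow_fraction * mean rate (assumed rule)."""
    if not attempts:
        return []
    cutoff = slow_fraction * mean(a.progress_rate for a in attempts)
    return [a for a in attempts if a.progress_rate < cutoff]


def plan_speculation(attempts, idle_nodes, kill_slow_attempt, slow_fraction=0.5):
    """Build a list of scheduler actions for the current set of running attempts.

    kill_slow_attempt=False models the existing method: the backup and the
    slow attempt run concurrently until one of them finishes.
    kill_slow_attempt=True models the modified method: the slow attempt is
    killed once its work has been assigned to another node, freeing its slot.
    """
    actions = []
    free = list(idle_nodes)
    for attempt in find_stragglers(attempts, slow_fraction):
        if not free:
            break  # no idle node available for a backup
        actions.append(("launch_backup", attempt.task_id, free.pop(0)))
        if kill_slow_attempt:
            actions.append(("kill", attempt.task_id, attempt.node))
    return actions


running = [
    Attempt("t1", "n1", 0.10),
    Attempt("t2", "n2", 0.09),
    Attempt("t3", "n3", 0.01),  # far slower than the others
]
existing = plan_speculation(running, ["n4"], kill_slow_attempt=False)
modified = plan_speculation(running, ["n4"], kill_slow_attempt=True)
```

In this toy model the only difference between the two plans is the extra `kill` action, which releases the slow attempt's slot so that a pending task could be scheduled there sooner.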

Contents

1. Introduction

2. Background
2.1 Background on MapReduce and speculative tasks
2.2 General problems of the current speculative method

3. Solution approaches considered
3.1 Advantage of the deterministic method

4. Evaluation of the proposed solution
4.1 Comparing execution time
4.2 Task run-time evaluation
4.3 Comparison of progress rate

5. Analysis of the results

6. Conclusion

7. References