검색 상세

Fast Tuple Extraction Algorithms for Streaming XML Data

초록/요약

An eXtensible Markup Language (XML) is rapidly becoming the de facto standard for retrieving and exchanging web information, and is having a significant impact on the evolution of existing database management systems. The efficient query processing on streaming XML data is a very important research challenge. The query processing on streaming XML data is an emerging technology designed to support queries (i.e., XPath or XQeury) over continuous streams of XML data. The interest in the query processing on streaming XML data is growing due to the increasing number of real world applications such as monitoring systems for stock, email, and sensor data that need to analyze incoming data streams. In this thesis, we focus on the tuple extraction query which is the most important issue in the query processing techniques on streaming XML data. Recently, a new approach, called StreamTX, was proposed to support multiple tuple extraction queries efficiently with minimal buffer sizes. However, we empirically observe that StreamTX still incurs computational overhead unnecessarily during processing for tuple extraction query, since it builds on TwigStack, an XML query processing algorithm originally developed for stored XML. To solve above-mentioned problems, we propose the three algorithms to process the tuple extraction queries on streaming XML data. We first design a non-recursive XQStream algorithm to handle inefficient recursive calls of StreamTX. Subsequently, we extend the basic XQStream by incorporating two novel schemes: (1) the relational pointer to efficiently and effectively evaluate the structural relationship of elements, and (2) the pattern reuse to reduce redundant path evaluations for pattern matching. The performance evaluation on various datasets provides new empirical findings. First, XQStream++, which incorporates the relational pointer and the pattern reuse scheme into XQStream, significantly outperforms the state-of-the-art algorithms in the running time with a small, nearly constant memory usage. Second, the most recently released XQuery engines outperform StreamTX in the running time.

more

목차

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
List of Algorithms . . . . . . . . . . . . . . . . . . . . . . . . ix
1 Introduction 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . 1
1.1.1 Overview of XML Query Processing for Stored
XML Data . . . . . . . . . . . . . . . . . . . . . . 5
1.1.2 Overview of XML Query Processing for Streaming
XML Data . . . . . . . . . . . . . . . . . . . 8
1.2 Research Problem . . . . . . . . . . . . . . . . . . . . . . 12
1.2.1 Introduction to StreamTX . . . . . . . . . . . . . 13
1.2.2 Problems of StreamTX . . . . . . . . . . . . . . . 15
1.3 Contributions of This Thesis . . . . . . . . . . . . . . . . 18
1.4 Organization of This Thesis . . . . . . . . . . . . . . . . 19
2 Literature Reviews 22
2.1 XML and XML Query Language Model . . . . . . . . . . 22
2.1.1 Introduction to XML Data Model . . . . . . . . . 22
2.1.2 Introduction to XML Query Language Model . . 27
2.2 XML Labeling Schemes . . . . . . . . . . . . . . . . . . . 30
2.2.1 Region-Encoding Scheme . . . . . . . . . . . . . 30
2.2.2 Pre x-Encoding Scheme . . . . . . . . . . . . . . 32
2.3 XML Query Processing for Stored XML Data . . . . . . 33
2.4 XML Query Processing for Streaming XML Data . . . . 36
2.5 Keyword Search and Parallel Processing in XML Query
Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3 Preliminary 39
3.1 Streaming XML Data Model . . . . . . . . . . . . . . . . 39
3.2 Tuple Extraction Query . . . . . . . . . . . . . . . . . . 44
4 Algorithms for Tuple Extraction Query 46
4.1 Basic Idea of Algorithms . . . . . . . . . . . . . . . . . . 47
4.2 Non-Recursive Algorithm: XQStream . . . . . . . . . . . 52
4.3 Extended Algorithm: XQStream+ . . . . . . . . . . . . . 60
4.3.1 Relational Pointer . . . . . . . . . . . . . . . . . . 60
4.3.2 XQStream+ Algorithm . . . . . . . . . . . . . . . 63
4.4 Fast Tuple Extraction Algorithm: XQStream++ . . . . . 68
4.4.1 Pattern Reuse . . . . . . . . . . . . . . . . . . . . 68
4.4.2 XQStream++ Algorithm . . . . . . . . . . . . . . 72
5 Performance Evaluation 80
5.1 Experimental Setup . . . . . . . . . . . . . . . . . . . . . 80
5.2 Evaluation Results . . . . . . . . . . . . . . . . . . . . . 85
5.2.1 Comparison in Running Time . . . . . . . . . . . 85
5.2.2 Comparison in Element Access Frequency . . . . 94
5.2.3 Comparison in Memory Footprint . . . . . . . . . 98
6 Conclusion and Future Works 105
6.1 Summary of This Thesis . . . . . . . . . . . . . . . . . . 105
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 107
Bibliography 108

more