DATA-MINING BASED DATABASE INTRUSION DETECTION USING INTERNAL QUERY TREE

Abstract

With the development of information technology, a massive amount of sensitive personal and proprietary information has accumulated in databases, so detecting database intrusion is becoming increasingly important for organizations. Database intrusion can be broadly categorized into SQL injection attacks (SQLIAs) and SQL direct attacks (SQLDAs). In an SQLIA, malicious users access the database indirectly by exploiting vulnerabilities in database-driven web applications: they alter the original SQL statements within the application through user input values. In an SQLDA, malicious users with privileged or compromised user accounts access the database directly and abuse SQL statements to harvest data. In this thesis, we propose a framework that detects both types of database intrusion using SVM classification with various kernel functions. The key issue for such a framework is how to represent the internal query tree, collected from the database log, in a form suitable for the SVM classification algorithm so that SQLIAs and SQLDAs are detected with good performance. To address this issue, we first propose a novel method that converts the query tree into an n-dimensional feature vector, using a multi-dimensional sequence as an intermediate representation. The intermediate step is needed because the complexity and variability of the query tree structure make direct conversion into an n-dimensional feature vector difficult. Second, we propose a method that extracts semantic features as well as syntactic features when generating the feature vector. Third, we propose a method that transforms string feature values into numeric feature values by combining multiple statistical models. The combined model maps each string value to a single numeric value that captures multiple characteristics of that string.
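
The two-step conversion described above (query tree, then multi-dimensional sequence, then n-dimensional feature vector) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Node` class, the node labels, the `(depth, label)` sequence items, and the count-based vector are hypothetical and are not the thesis's actual representation or feature set.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical query-tree node: a type label plus child nodes."""
    label: str
    children: List["Node"] = field(default_factory=list)


def to_sequence(node: Node, depth: int = 0) -> List[Tuple[int, str]]:
    """Step 1: preorder walk that emits one (depth, label) item per node."""
    items = [(depth, node.label)]
    for child in node.children:
        items.extend(to_sequence(child, depth + 1))
    return items


def to_vector(sequence: List[Tuple[int, str]], vocabulary: List[str]) -> List[int]:
    """Step 2: fixed-length vector of label counts plus the maximum depth."""
    counts = [sum(1 for _, label in sequence if label == v) for v in vocabulary]
    max_depth = max(depth for depth, _ in sequence)
    return counts + [max_depth]


# Illustrative tree for a query shaped like: SELECT a, b FROM t WHERE c = 1
tree = Node("Query", [
    Node("TargetList", [Node("Var"), Node("Var")]),
    Node("Where", [Node("OpExpr", [Node("Var"), Node("Const")])]),
])
VOCAB = ["Query", "TargetList", "Where", "OpExpr", "Var", "Const"]  # assumed label set

seq = to_sequence(tree)
vec = to_vector(seq, VOCAB)
```

Trees of different shapes and sizes all map to vectors of length `len(VOCAB) + 1`, which is the property a fixed-dimension classifier such as an SVM requires.
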
To demonstrate the feasibility of our proposals in practical environments, we implement a database intrusion detection system based on PostgreSQL, a popular open source database system, and perform experiments. The experimental results using the internal query trees of PostgreSQL show that our proposal is effective, with an AUC of 99.6% in detecting SQLIAs and an AUC of 99.2% in detecting SQLDAs. In other words, with a probability of at least 99%, a malicious query is ranked as more likely to be a database intrusion than a normal query. Finally, we perform additional experiments to compare our proposal with other methods. For SQLIA detection, we compare our proposal with methods that use syntax-focused feature extraction and with feature transformation based on only a single statistical model. For SQLDA detection, we compare our proposal with methods that use syntax-focused feature extraction based on SVM classification and Naïve Bayesian classification. The experimental results show that our proposal significantly increases the probability of correctly detecting database intrusion, covering both SQLIAs and SQLDAs, compared with the previous methods.
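
The ranking interpretation of AUC used in the abstract can be made concrete with a short Python sketch. It computes AUC directly as the fraction of (malicious, normal) query pairs in which the malicious query receives the higher intrusion score. The function name `pairwise_auc` and the scores are made up for illustration and are not results from the thesis.

```python
from typing import Sequence


def pairwise_auc(malicious_scores: Sequence[float], normal_scores: Sequence[float]) -> float:
    """Fraction of (malicious, normal) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for m in malicious_scores:
        for n in normal_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(malicious_scores) * len(normal_scores))


# Hypothetical classifier scores (higher means "more likely an intrusion").
malicious = [0.9, 0.8, 0.4]
normal = [0.5, 0.3, 0.2, 0.1]

auc = pairwise_auc(malicious, normal)  # 11 of the 12 pairs are ranked correctly
```

An AUC of 99.6% therefore means that almost every such pair is ordered correctly, independent of any particular decision threshold.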


Table of Contents

1. INTRODUCTION
1.1 MOTIVATION
1.2 RELATED WORKS
1.2.1 Detection of SQL Injection Attacks
1.2.2 Detection of SQL Direct Attacks
1.3 CONTRIBUTIONS
2. PRELIMINARIES
2.1 TREE STRUCTURE AND MULTI-DIMENSIONAL SEQUENCE
2.2 SUPPORT VECTOR MACHINE AND KERNEL FUNCTIONS
3. DATABASE INTRUSION DETECTION FRAMEWORK
3.1 DESIGN CONCEPTS
3.2 DESIGN ARCHITECTURE
4. QUERY TREE REPRESENTATION
4.1 TWO-STEP CONVERSION PROCESS
4.2 MULTI-DIMENSIONAL SEQUENCE GENERATION
4.2.1 Feature Extraction
4.2.2 Feature Transformation with Combined Statistical Model
4.3 FEATURE VECTOR GENERATION
4.3.1 Feature Vector Representation for SQLIAs
4.3.2 Feature Vector Representation for SQLDAs
5. EXPERIMENTS
5.1 EXPERIMENTAL ENVIRONMENT
5.2 PRACTICAL EXAMPLES
5.3 EXPERIMENT OF DETECTING SQLIAS
5.3.1 Data preparation
5.3.2 Data preprocessing
5.3.3 SVM model generation and evaluation
5.3.4 Comparison
5.4 EXPERIMENTS OF DETECTING SQLDAS
5.4.1 Data preparation
5.4.2 Data preprocessing
5.4.3 SVM model generation and evaluation
5.4.4 Comparison
6. CONCLUSION AND FUTURE WORKS
BIBLIOGRAPHY
