General set covering for feature selection in data mining
- 주제(키워드) Combinatorial optimization , Data mining
- 발행기관 고려대학교 대학원
- 지도교수 류홍서
- 발행년도 2012
- 학위수여년월 2012. 8
- 학위구분 석사
- 학과 일반대학원 산업경영공학과
- 원문페이지 43 p
- 실제URI http://www.dcollection.net/handler/korea/000000035049
- 본문언어 영어
- 제출원본 000045719727
초록/요약
Set covering has widely been accepted as a staple feature selection tool for data mining and classification purposes. In data mining, we need a small yet sufficient large number of good features and the common practice is to use set covering models a multiple number of times for the purpose. In this paper, we present a generalized version of this classical combinatorial optimization model to make it better suited for its applications in data mining and develop a procedure based upon surrogate relaxation and subgradient method for meta-heuristic solution. And we also develop the primal heuristic procedure: we improved the strategy to construct a better feasible solution, which considered the interplay among the columns. Mathematically and also numerically with experiments on 25 set covering instances from Beasley’s OR library, we demonstrate the utility of the proposed model and the proposed solution method. We also compare the results of Lagrangian-based heuristic and surrogate-based heuristic for general set covering problems, and discuss the difference between this two widespread used heuristic methods.
more목차
1 Introduction 1
1.1 Motivation 1
1.2 Organization 3
2 Set covering problems 4
2.1 Overview of standard set covering problem 4
2.2 General set covering problem 5
2.3 Feature selection in data mining 6
2.4 Utility of (gSC) for feature selection 8
3 Lagrangian and surrogate duality 11
3.1 Lagrangian relaxation and duality 11
3.2 Surrogate relaxation and duality 13
3.3 Subgradient optimization 16
3.4 Proposed algorithm 20
4 Preliminary numerical results 23
4.1 Results of (sSC) and (gSC) 23
4.2 Lagrangian vs. surrogate methods 25
5. Conclusion 32
Reference 33

