Semantic-Based Feature Engineering and Explainable Machine Learning for Online Community Intelligence
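- Subject (keywords) Online platform community, Online customer review, Review helpfulness, Information quality management, Text mining, Machine learning, Explainable artificial intelligence
- Publisher Korea University Graduate School
- Advisor 이홍철
- Year of publication 2024
- Degree conferral date 2024. 8
- Degree Doctorate
- Department and major Graduate School, Department of Industrial and Management Engineering
- Specialization Industrial and Management Engineering
- Full-text pages 104 p
- URI http://www.dcollection.net/handler/korea/000000289057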
- UCI I804:11009-000000289057
- DOI 10.23186/korea.000000289057.11009.0001569
- Language of text English
Abstract/Summary
This study focuses on predicting the helpfulness of online customer reviews (OCRs) and identifying the determinants of helpfulness in two types of online communities. The approach consists of three stages. Initially, feature engineering is performed using various text mining techniques. Subsequently, machine learning models are constructed to predict the potential helpfulness of OCRs. Finally, explainable artificial intelligence (XAI) methods are employed to identify the determinants of helpfulness. The results indicate the effectiveness of the ensemble model for prediction. Additionally, extended semantic features derived from text mining play a crucial role in determining helpfulness. This paper provides insights for customers on writing more helpful reviews, enables managers to conduct intelligent OCR management by identifying valuable reviews, and assists companies in extracting the voice of the customer. Moreover, it contributes to ongoing research on understanding the determinants of review helpfulness.
Table of contents
ABSTRACT i
Korean abstract ii
PREFACE iv
TABLE OF CONTENTS v
LIST OF TABLES viii
LIST OF FIGURES ix
CHAPTER 1. Introduction 1
1.1 Background and purpose 1
1.2 Organization of this paper 2
CHAPTER 2. Related work 3
2.1 Online community platform and review helpfulness 3
2.2 Feature engineering and text mining for online reviews 4
2.3 Determinants of online customer review helpfulness 5
2.4 Predictive machine learning and explainable artificial intelligence 6
CHAPTER 3. Proposed methodologies 8
3.1 Semantic-based intelligent text analytics 8
3.1.1 Lexicon-based sentiment analysis 8
3.1.2 Deep-learning-based sentiment analysis 8
3.1.3 Readability analysis 9
3.1.4 Topic labeling using latent Dirichlet allocation 11
3.1.5 Typo analysis 12
3.2 Semantic-based feature engineering using topic modeling 13
3.3 Predictive machine learning and boosting algorithms 15
3.4 Ensemble meta learning 18
3.5 Explainable artificial intelligence (XAI) methodologies 21
3.5.1 Partial Dependence Plot 21
3.5.2 Permutation Feature Importance 22
3.5.3 Shapley additive explanations 23
CHAPTER 4. Explainable machine learning and ensemble for experience goods review 25
4.1 Helpfulness of physical experience goods review 25
4.2 Case study data – online customer review of experience goods 27
4.3 Overall process 28
4.4 Feature engineering for physical experience goods 31
4.4.1 Basic features of unstructured data 31
4.4.2 Expanded features of unstructured data using intelligent text analytics 34
4.5 Machine learning model construction 35
4.6 Results 36
4.6.1 Regression results 36
4.6.2 Interpretation using partial dependence plot 37
4.6.3 Interpretation using permutation feature importance 40
4.6.4 Interpretation using Shapley additive explanations 41
4.6.5 Ablation experiments 47
4.6.6 Ensemble learning for review filtering 49
4.7 Discussion 51
4.7.1 Summary and key findings 51
4.7.2 Implications 51
4.7.3 Limitations and future works 52
CHAPTER 5. Semantic-based feature engineering for online employee review 53
5.1 Helpfulness of online employee review 53
5.2 Theoretical background 55
5.2.1 Elaboration likelihood model on review helpfulness 55
5.2.2 Signaling theory on review helpfulness 56
5.3 Case study data - online employee review 58
5.4 Text preprocessing and construction of company neologism dictionary 61
5.5 Topic modeling for feature engineering 63
5.6 Overall process 65
5.7 Results 67
5.7.1 LDA results 67
5.7.2 Variable operationalization 69
5.7.3 Helpfulness prediction results 71
5.7.4 Ablation analysis 72
5.7.5 Post-hoc analysis 73
5.8 Discussion 76
5.8.1 Summary and key findings 76
5.8.2 Implications for theory 77
5.8.3 Implications for practice 78
5.8.4 Limitations and future works 79
CHAPTER 6. Conclusion 80
REFERENCES 81
SUPPLEMENTARY MATERIALS 90
A. Distribution for target values for two datasets 90

