Stock Forecasting with Large Language Models: A Comparative Study Using Bilingual News and Time Series Data

Abstract

This study investigates the integration of bilingual financial news (English and Chinese) with historical stock price data to enhance stock price prediction. Traditional time series models, such as LSTM and GRU, are compared with large language models (LLMs), including GPT-4o mini and XuanYuan2-70B-Chat, under various configurations. The LLMs outperformed the traditional models for the medium and longer time windows (w = 3 and w = 5) by effectively leveraging unstructured news data. Ablation studies show that news sentiment contributes notably to short-term prediction accuracy. However, for the 1-day window, LSTM achieved superior performance by capturing short-term market dependencies with minimal noise. Combining the English and Chinese datasets did not outperform using English news alone, underscoring the importance of data consistency over sheer volume. The findings highlight both the potential and the limitations of LLMs in financial forecasting, offering insights into aligning data features with model capabilities for robust predictions.
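As a concrete illustration of the setup the abstract describes, the short sketch below (hypothetical Python, not code from the thesis; the names build_windows, prices, and sentiment are assumptions for illustration) builds moving-window samples in which w consecutive days of [closing price, news sentiment] features predict the next day's close, for the window sizes w = 1, 3, and 5 compared in the study.

import numpy as np

def build_windows(prices, sentiment, w):
    # Each sample stacks w consecutive days of [close, sentiment] features;
    # the target is the closing price on the following day.
    X, y = [], []
    for t in range(len(prices) - w):
        X.append(np.column_stack([prices[t:t + w], sentiment[t:t + w]]))
        y.append(prices[t + w])
    return np.array(X), np.array(y)

# Toy stand-in for one ticker: 30 trading days of closes and a daily
# news-sentiment score in [-1, 1] (both synthetic, for illustration only).
rng = np.random.default_rng(0)
prices = 100 + rng.normal(0, 1, 30).cumsum()
sentiment = rng.uniform(-1, 1, 30)

for w in (1, 3, 5):  # the window sizes the study compares
    X, y = build_windows(prices, sentiment, w)
    print(f"w={w}: X {X.shape}, y {y.shape}")  # e.g. w=3: X (27, 3, 2), y (27,)

Under this framing, a 1-day window gives a model only the previous day's state, the regime in which the abstract reports LSTM performing best, while w = 3 and w = 5 expose the multi-day context that the LLMs exploited through the accompanying news.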

Abstract (Korean)

This study explores how to improve stock price prediction accuracy by integrating financial news written in English and Chinese with historical stock price data. The performance of traditional time series models such as LSTM and GRU is compared with that of large language models (LLMs) such as GPT-4o mini and XuanYuan2-70B-Chat under various configurations. The results show that for the relatively medium-to-long time windows (w = 3 and w = 5), the LLMs outperformed the traditional models by effectively leveraging unstructured text data. The ablation study confirmed that news sentiment has a significant effect on short-term predictions. For the 1-day window, however, LSTM achieved the best performance by capturing short-term market dependencies with minimal noise. Combining the English and Chinese news datasets did not improve performance, suggesting that data consistency and suitability matter more than sheer volume. These findings delineate both the potential and the limitations of LLMs in financial forecasting and offer insights for achieving more robust predictions by aligning data characteristics with model architectures.

Table of Contents

Abstract
Korean Abstract (국문초록)
Table of Contents
List of Tables
List of Figures
1 Introduction
  1.1 Background
  1.2 Problem Statement
  1.3 Research Objectives
2 Literature Review
  2.1 Advances in Financial Time Series Prediction
  2.2 Role of News Sentiment in Financial Markets
  2.3 Large Language Models in Financial Applications
3 Data Preparation
  3.1 Data Sources
    3.1.1 Stock Data
    3.1.2 News Data
  3.2 News Preprocessing
    3.2.1 Company Identification in Chinese News
    3.2.2 Duplicate Removal
    3.2.3 Handling Non-Trading Days
  3.3 Exploratory Data Analysis
    3.3.1 Stock Data Analysis
    3.3.2 Text-Return Correlation
4 Experimental Design
  4.1 Objectives
  4.2 Model Selection
    4.2.1 Time Series Models
    4.2.2 Large Language Models (LLMs)
  4.3 Feature Vector Composition
  4.4 Moving Window Construction and Input Structures
    4.4.1 Data Processing for Zero-Shot Prompting
    4.4.2 Data Processing for Few-Shot Prompting
    4.4.3 Equivalence in Data Utilization
    4.4.4 Significance of Window Size
  4.5 Prompt Design for LLMs
    4.5.1 Zero-Shot Learning
    4.5.2 Few-Shot Learning
  4.6 Model Training and Evaluation
    4.6.1 Time Series Models
    4.6.2 Large Language Models
    4.6.3 Evaluation Metrics
5 Results and Analysis
  5.1 Time Series Models vs. LLMs
  5.2 Zero-Shot vs. Few-Shot Learning
  5.3 Language-Specific Impact
  5.4 Impact of Time Window Selection on LLM Performance
  5.5 Ablation Study: Impact of News Data
6 Discussion
  6.1 Multilingual Dataset Experiments
  6.2 Challenges in Processing Non-Trading Day News
  6.3 Computational Efficiency
7 Conclusion
  7.1 Key Findings
  7.2 Contributions
  7.3 Limitations and Future Work
References
Appendix
  Figures
