META VIEW

Item SOMS Field Content Language
Title dc.title Real-time Sound-Effects Synthesis of Raw-Waveform Audio with Generative Adversarial Networks
Author dc.creator Minwook Chang
Author (Second Language) somsterms.otherName 장민욱
Affiliation somsterms.affiliation Graduate School, Department of Computer Science, Software Major
Subject (Keywords) dc.subject Sound Synthesis, Generative Adversarial Network, Virtual Reality
Publisher dc.publisher Korea University Graduate School
Advisor somsterms.advisor 김정현
Year of Publication dcterms.issued 2020
Degree Conferred somsterms.awarded 2020. 2
Document Type somsterms.subType Thesis
Degree Level somsterms.thesisDegree Master's
Department somsterms.major Department of Computer Science, Graduate School (College of Informatics)
Specialty somsterms.specialty Software Major
Format dc.format application/pdf
File Size dcterms.extent 1051537 bytes
Medium dcterms.medium application/pdf
Pages somsterms.page 54 p.
URL dc.identifier http://dcollection.korea.ac.kr/common/orgView/000000127372
UCI somsterms.UCI I804:11009-000000127372
DOI somsterms.DOI 10.23186/korea.000000127372.11009.0000942
Language dc.language English
Submitted Original somsterms.isBasedOn 000046026271
Abstract dcterms.abstract Conventional methods for real-time sound effects in 3D graphical and virtual environments have relied on preparing all of the needed samples ahead of time and simply replaying them as needed, or on parametrically modifying a basic set of samples with physically based techniques such as spring-damper simulation and modal analysis/synthesis. In this work, we propose (1) applying the generative adversarial network (GAN) approach to this problem and (2) PUGAN, a novel generative model that progressively synthesizes high-quality raw-waveform audio.
We demonstrate our claim by training a GAN on the sounds of different drums and synthesizing the sounds on the fly in a virtual drum-playing environment. A perceptual test revealed that subjects could neither discern the synthesized sounds from the ground truth nor perceive any noticeable delay after the corresponding physical event.
PUGAN leverages the recently proposed idea of progressively generating higher-resolution images by stacking multiple encoder-decoder architectures. To apply it effectively to raw audio generation, we propose two novel modules: (1) a neural upsampling layer and (2) a sinc convolutional layer. Compared to the existing state-of-the-art model, WaveGAN, which uses a single decoder architecture, our model generates audio signals and converts them to higher resolutions progressively while using a significantly smaller number of parameters, e.g., 20x fewer for 44.1 kHz output. Our experiments show that the audio signals can be generated in real time with quality comparable to that of WaveGAN in terms of both inception score and human evaluation.
English
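The abstract names a sinc convolutional layer as one of PUGAN's two novel modules. As a rough illustration of what such a layer can look like, here is a minimal sketch of SincNet-style learnable band-pass filters, assuming a PyTorch implementation; the class name SincConv, the cutoff initialization, and the Hamming window are illustrative assumptions, not details taken from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SincConv(nn.Module):
    """1-D convolution whose kernels are learnable band-pass sinc filters.

    Each output channel learns just two scalars (low cutoff and bandwidth),
    so the layer needs far fewer parameters than a free-form convolution
    with the same kernel size.
    """

    def __init__(self, out_channels, kernel_size, sample_rate=44100):
        super().__init__()
        assert kernel_size % 2 == 1, "odd kernel keeps filters symmetric"
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate
        # Spread the initial pass bands across the spectrum (hypothetical init).
        self.low_hz = nn.Parameter(
            torch.linspace(30.0, sample_rate / 2 - 200.0, out_channels).unsqueeze(1))
        self.band_hz = nn.Parameter(torch.full((out_channels, 1), 100.0))
        # Fixed pieces: a time axis in seconds and a Hamming window.
        n = torch.arange(-(kernel_size // 2), kernel_size // 2 + 1).float()
        self.register_buffer("t", (n / sample_rate).unsqueeze(0))
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):  # x: (batch, 1, time)
        low = torch.abs(self.low_hz)
        high = torch.clamp(low + torch.abs(self.band_hz),
                           max=self.sample_rate / 2)

        # A band-pass filter is the difference of two low-pass sinc filters.
        def lowpass(fc):
            return 2 * fc * torch.sinc(2 * fc * self.t)

        filters = (lowpass(high) - lowpass(low)) * self.window
        filters = filters / (filters.abs().sum(dim=1, keepdim=True) + 1e-8)
        return F.conv1d(x, filters.unsqueeze(1), padding=self.kernel_size // 2)
```

Because only the two cutoffs per channel are trained, a front end built this way keeps the parameter count small, which is consistent with the abstract's emphasis on a model far smaller than WaveGAN.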
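Similarly, for the progressive-upsampling idea (a lightweight base generator followed by bandwidth extension (BWE) stages, per Sections 4.2.1 and 4.2.2 of the table of contents below), here is a minimal sketch of how such stages might be stacked, again assuming PyTorch; the layer shapes, channel counts, and the linear-interpolation upsampler are hypothetical stand-ins for the thesis's neural upsampling layer, not its actual architecture.

```python
import torch
import torch.nn as nn


class BWEStage(nn.Module):
    """Doubles the sampling rate of a raw waveform (one progressive step)."""

    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="linear", align_corners=False),
            nn.Conv1d(1, channels, kernel_size=9, padding=4),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, 1, kernel_size=9, padding=4),
            nn.Tanh(),
        )

    def forward(self, x):  # x: (batch, 1, time) -> (batch, 1, 2 * time)
        return self.net(x)


class ProgressiveGenerator(nn.Module):
    """A lightweight base generator plus a chain of BWE stages."""

    def __init__(self, latent_dim=100, base_len=2756, num_stages=4):
        super().__init__()
        # Toy stand-in for the low-rate base generator.
        self.base = nn.Sequential(nn.Linear(latent_dim, base_len), nn.Tanh())
        self.stages = nn.ModuleList(BWEStage() for _ in range(num_stages))

    def forward(self, z, up_to=None):
        x = self.base(z).unsqueeze(1)      # (batch, 1, base_len)
        for stage in self.stages[:up_to]:  # grow the resolution progressively
            x = stage(x)
        return x


# Usage: 2756 samples doubled 4 times gives 44096 samples (~1 s at 44.1 kHz).
g = ProgressiveGenerator()
audio = g(torch.randn(8, 100))  # (8, 1, 44096)
```

The design point this sketch tries to capture is that each stage only refines an already-plausible low-rate signal, so the per-stage networks can stay small and the model can be trained and run stage by stage.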
Table of Contents dcterms.tableOfContents CHAPTER 1. INTRODUCTION 1
CHAPTER 2. RELATED WORK 5
2.1 PHYSICALLY BASED SOUND SYNTHESIS 5
2.2 GAN-BASED AUDIO GENERATION 6
2.3 AUDIO-TO-AUDIO CONVERSION 8
CHAPTER 3. DATA CHARACTERISTICS: AUDIO VERSUS IMAGE 9
CHAPTER 4. PUGAN: PROGRESSIVE UPSAMPLING GAN 12
4.2 GENERATOR 15
4.2.1 Lightweight WaveGAN module 15
4.2.2 Bandwidth extension module (BWE) 16
4.3 DISCRIMINATOR 18
CHAPTER 5. EXPERIMENT 19
5.1 VIRTUAL ENVIRONMENT EXPERIMENT 19
5.1.1 Dataset 19
5.1.2 Experimental design 20
5.2 PUGAN EXPERIMENT 24
5.2.1 Dataset 24
5.2.2 Training 25
5.2.3 Inception score (IS) 26
5.2.4 Human evaluation 27
CHAPTER 6. RESULTS AND DISCUSSION 28
6.1 VIRTUAL ENVIRONMENT RESULTS 28
6.1.1 Naturalness and realism 28
6.1.2 Perceived delay 30
6.2 PUGAN RESULTS 32
6.2.1 Inception score and human evaluation 32
6.2.2 Computation cost 35
CHAPTER 7. CONCLUSION AND FUTURE WORK 37
REFERENCES 39
ACKNOWLEDGEMENT