Prof. Qingyang Hong

Ph.D., City University of Hong Kong (2005)

Research interests: speaker recognition; speech recognition and speech synthesis; large-model technologies

Email: qyhong (AT) xmu.edu.cn

Homepage: http://speech.xmu.edu.cn/qyhong

Biography:

Professor at Xiamen University. His main research interests are speech recognition and speaker recognition. He has been principal investigator of three National Natural Science Foundation of China (NSFC) grants and two Innovation Fund projects of the Ministry of Science and Technology. He founded the Xiamen University Speech Lab and led the XMUSPEECH team to first place in two consecutive Oriental Language Recognition (OLR) challenges, and his group developed China's first Hokkien (Minnan) speech synthesis system. With extensive industrial R&D experience, he has collaborated with Huawei, HiSilicon, TD Tech and other companies on a large number of intelligent speech projects; the core technologies have been deployed in Huawei smartphones, the Speak Hokkien (说咱闽南话) app, the Shengyun (声云) speech transcription service, and judicial, social security, securities and electric power systems in more than a dozen provinces and cities across China. He is the author of the monograph Speech Recognition: Principles and Applications (《语音识别:原理与应用》) and released ASV-Subtools, an open-source speaker verification toolkit that supports academic research and industrial deployment at home and abroad. He served as chair of the National Symposium on Speaker Recognition Research and Applications in 2020 and 2021, and is vice chair of the Speech Information Technical Committee of the Chinese Information Processing Society of China. He received the Outstanding Author Award from Publishing House of Electronics Industry and the Outstanding Technical Cooperation Achievement Award from Huawei.

Research Projects (Principal Investigator):

  • "Research on Speaker Diarization Based on Graph-Structured Modeling," NSFC General Program, 2023.1-2026.12.

  • "Model Training and Decoding Optimization for End-to-End Speech Recognition," industry-funded project (TD Tech), 2021.6-2021.12.

  • "In-Ear Voiceprint Project, Phase II," industry-funded project (Huawei Device), 2020.1-2020.12.

  • "Voiceprint and Short-Phrase Wake-Up Technology Cooperation Project," industry-funded project (HiSilicon), 2019.9-2020.8.

  • "Research on Speaker Feature Extraction and Recognition in Complex Scenarios," NSFC General Program, 2019.1-2022.12.

  • "In-Ear Voiceprint Technology Cooperation Project," industry-funded project (Huawei Device), 2018.8-2019.3.

  • "Research on Key Back-End Technologies for Free-Field Voice Interaction," industry-funded project (Huawei Technologies), 2017.12-2018.12.

  • "Hokkien Intelligent Spoken Dialogue System," industry-funded project, 2017.8-2018.7.

  • "Research on Key Technologies for Voice Interaction," industry-funded project (Huawei Technologies), 2016.4-2017.12.

  • "Research on Cross-Channel Speaker Recognition Based on Transfer Learning," NSFC Young Scientists Fund, 2012.1-2014.12.

Monograph:

Qingyang Hong and Lin Li, Speech Recognition: Principles and Applications (《语音识别:原理与应用》), Publishing House of Electronics Industry, 2nd edition, February 2023.

Open-Source Toolkit (Speaker Verification):

ASV-Subtools: https://github.com/Snowdar/asv-subtools

Selected Publications:

[1] Wenhao Guan, Qi Su, Haodong Zhou, Shiyu Miao, Xingjia Xie, Lin Li, Qingyang Hong, “ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech,” ICASSP 2024.

[2] Tao Li, Feng Wang, Wenhao Guan, Lingyan Huang, Qingyang Hong, Lin Li, “Improving Multi-speaker ASR with Overlap-aware Encoding and Monotonic Attention,” ICASSP 2024.

[3] Longjie Luo, Tao Li, Lin Li, Qingyang Hong, “The XMUSPEECH System for Audio-Visual Target Speaker Extraction in MISP 2023 Challenge,” ICASSP 2024.

[4] Yishuang Li, Hukai Huang, Zhicong Chen, Wenhao Guan, Jiayan Lin, Lin Li, Qingyang Hong, “SR-Hubert: an Efficient Pre-trained Model for Speaker Verification,” ICASSP 2024.

[5] Feng Wang, Lingyan Huang, Tao Li, Qingyang Hong, Lin Li, “Conformer-based Language Embedding with Self-Knowledge Distillation for Spoken Language Identification,” INTERSPEECH 2023.

[6] Lingyan Huang, Tao Li, Haodong Zhou, Qingyang Hong, Lin Li, “Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding,” INTERSPEECH 2023.

[7] Wenhao Guan, Tao Li, Yishuang Li, Hukai Huang, Qingyang Hong, Lin Li, “Interpretable Style Transfer for Text-to-Speech with ControlVAE and Diffusion Bridge,” INTERSPEECH 2023.

[8] Dexin Liao, Tao Jiang, Feng Wang, Lin Li, Qingyang Hong, “Towards A Unified Conformer Structure: from ASR to ASV Task,” ICASSP 2023.

[9] Qiulin Wang, Wenxuan Hu, Lin Li, Qingyang Hong, “Meta Learning with Adaptive Loss Weight for Low-Resource Speech Recognition,” ICASSP 2023.

[10] Tao Li, Haodong Zhou, Jie Wang, Qingyang Hong, Lin Li, “The XMU System for Audio-Visual Diarization and Recognition in MISP Challenge 2022,” ICASSP 2023.

[11] Jie Wang, Zhicong Chen, Haodong Zhou, Lin Li, Qingyang Hong, “Community Detection Graph Convolutional Network for Overlap-Aware Speaker Diarization,” ICASSP 2023.

[12] Zhicong Chen, Jie Wang, Wenxuan Hu, Lin Li, Qingyang Hong, “Unsupervised Speaker Verification Using Pre-Trained Model and Label Correction,” ICASSP 2023.

[13] Jie Wang, Yuji Liu, Binling Wang, Yiming Zhi, Song Li, Shipeng Xia, Jiayang Zhang, Feng Tong, Lin Li, Qingyang Hong, “Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting,” INTERSPEECH 2022.

[14] Binling Wang, Feng Wang, Wenxuan Hu, Qiulin Wang, Jing Li, Dong Wang, Lin Li, Qingyang Hong, “Oriental Language Recognition (OLR) 2021: Summary and Analysis,” INTERSPEECH 2022.

[15] Fuchuan Tong, Siqi Zheng, Haodong Zhou, Xingjia Xie, Qingyang Hong, Lin Li, “Deep Representation Decomposition for Rate-invariant Speaker Verification,” Odyssey 2022.

[16] Lin Li, Fuchuan Tong, Qingyang Hong, “When Speaker Recognition Meets Noisy Labels: Optimizations for Front-Ends and Back-Ends,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1586-1599, 2022, doi: 10.1109/TASLP.2022.3169977.

[17] Fuchuan Tong, Siqi Zheng, Min Zhang, Binling Wang, Song Li, Yafeng Chen, Hongbin Suo, Lin Li, Qingyang Hong, “Graph Convolutional Network Based Semi-supervised Learning on Multi-Speaker Meeting Data,” ICASSP 2022.

[18] Wenxuan Hu, Qiulin Wang, Song Li, Qingyang Hong, Lin Li, “Research on End-to-End Multilingual Speech Recognition,” Journal of Signal Processing, 2021, vol. 37, no. 10, pp. 1816-1824.

[19] Yan Liu, Zheng Li, Lin Li and Qingyang Hong, “Phoneme-aware and Channel-wise Attentive Learning for Text Dependent Speaker Verification,” INTERSPEECH 2021.

[20] Fuchuan Tong, Yan Liu, Song Li, Jie Wang, Lin Li and Qingyang Hong, “Automatic Error Correction for Speaker Embedding Learning with Noisy Label,” INTERSPEECH 2021.

[21] Zheng Li, Yan Liu, Lin Li and Qingyang Hong, “Additive Phoneme-aware Margin Softmax Loss for Language Recognition,” INTERSPEECH 2021.

[22] Song Li, Beibei Ouyang, Fuchuang Tong, Dexin Liao, Lin Li and Qingyang Hong, “Real-time End-to-End Monaural Multi-Speaker Speech Recognition,” INTERSPEECH 2021.

[23] Dexin Liao, Jing Li, Yiming Zhi, Song Li, Qingyang Hong, and Lin Li, “An Integrated Framework for Two-pass Personalized Voice Trigger,” INTERSPEECH 2021.

[24] Jing Li, Binling Wang, Yiming Zhi, Zheng Li, Lin Li, Qingyang Hong, and Dong Wang, “Oriental Language Recognition (OLR) 2020: Summary and Analysis,” INTERSPEECH 2021.

[25] Lin Li, Zheng Li, Yan Liu, Qingyang Hong, “Deep joint learning for language recognition,” Neural Networks, 141 (2021) 72-86.

[26] Song Li, Beibei Ouyang, Dexin Liao, Shipeng Xia, Lin Li, Qingyang Hong, “End-to-end Multi-accent Speech Recognition with Unsupervised Accent Modelling,” ICASSP 2021.

[27] Song Li, Beibei Ouyang, Lin Li, Qingyang Hong, “Light-TTS: Lightweight Multi-speaker Multi-lingual Text-to-speech,” ICASSP 2021.

[28] Fuchuan Tong, Miao Zhao, Jianfeng Zhou, Hao Lu, Zheng Li, Lin Li, Qingyang Hong, “ASV-Subtools: Open Source Toolkit for Automatic Speaker Verification”, ICASSP 2021.

[29] Song Li, Beibei Ouyang, Lin Li, Qingyang Hong, “LightSpeech: Lightweight Non-regressive Multi-speaker Text-to-speech”, IEEE Spoken Language Technology Workshop (SLT 2021), Jan 2021, Shenzhen, China.

[30] Zheng Li, Miao Zhao, Lin Li, Qingyang Hong, “Multi-feature Learning with Canonical Correlation Analysis Constraint for Text-independent Speaker Verification”, IEEE Spoken Language Technology Workshop (SLT 2021), Jan 2021, Shenzhen, China.

[31] Zheng Li, Miao Zhao, Qingyang Hong, Lin Li, Zhiyuan Tang, Dong Wang, Liming Song and Cheng Yang, “AP20-OLR Challenge: Three Tasks and Their Baselines”, APSIPA ASC 2020.

[32] Song Li, Lin Li, Qingyang Hong and Lingling Liu, "Improving Transformer-based Speech Recognition with Unsupervised Pre-training and Multi-task Semantic Knowledge Learning", INTERSPEECH 2020.

[33] Tao Jiang, Miao Zhao, Lin Li, Qingyang Hong, "The XMUSPEECH System for Short-Duration Speaker Verification Challenge 2020", INTERSPEECH 2020.

[34] Zheng Li, Miao Zhao, Jing Li, Lin Li, Qingyang Hong, "On the Usage of Multi-feature Integration for Speaker Verification and Language Identification", INTERSPEECH 2020.

[35] Zheng Li, Miao Zhao, Jing Li, Yiming Zhi, Lin Li, Qingyang Hong, "The XMUSPEECH System for AP19-OLR Challenge", INTERSPEECH 2020.

[36] Hao Lu, Jianfeng Zhou, Miao Zhao, Wendian Lei, Qingyang Hong, Lin Li, “XMU-TS Systems for NIST SRE19 CTS Challenge,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020), Barcelona, Spain, May 2020, pp. 7569-7573.

[37] Shijiang Yan, Yue Chen, Wanling Yan, Binbin Xu, Lin Li, Qingyang Hong, “Design and Implementation of an End-to-End Hokkien Speech Synthesis System,” Journal of Xiamen University (Natural Science), 2020, vol. 59, no. 6, pp. 90-96.

[38] Jianfeng Zhou, Tao Jiang, Zheng Li, Lin Li, Qingyang Hong, “Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function,” INTERSPEECH 2019.

[39] Rongjin Li, Miao Zhao, Zheng Li, Lin Li, Qingyang Hong, “Anti-Spoofing Speaker Verification System with Multi-Feature Integration and Multi-Task Learning,” INTERSPEECH 2019.

[40] Jianfeng Zhou, Tao Jiang, Lin Li, Qingyang Hong, Zhe Wang, Bingyin Xia, “Training Multi-Task Adversarial Network for Extracting Noise-robust Speaker Embedding,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019), Brighton, UK, May 2019, pp. 6196-6200.

[41] Qingyang Hong, Lin Li, Jun Zhang, Lihong Wan, Huiyang Guo, “Transfer learning for PLDA-based speaker verification,” Speech Communication, 92:90-99, 2017.

[42] Qingyang Hong, Lin Li, Jun Zhang, Lihong Wan, Feng Tong, “Transfer Learning for Speaker Verification on Short Utterances,” INTERSPEECH 2016.

[43] Qingyang Hong, Jun Zhang, Lin Li, Lihong Wan, Feng Tong, “A Transfer Learning Method for PLDA-based Speaker Verification,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, China, Mar. 2016, pp. 5455-5459.

[44] Lin Li, Lihong Wan, Qingyang Hong, Jun Zhang, Ming Li, “Speaker Recognition System Based on Probability-Modified PLDA,” Journal of Tianjin University (Science and Technology), Aug. 2015, pp. 692-696.

[45] Qingyang Hong, Lin Li, Ming Li, Ling Huang, Lihong Wan and Jun Zhang, “Modified-prior PLDA and Score Calibration for Duration Mismatch Compensation in Speaker Recognition System,” INTERSPEECH 2015.

[46] Hong Qingyang, Wang Sheng, Liu Zhijian, “A robust speaker-adaptive and text-prompted speaker verification system,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8833, pp. 385-393, 2014.

[47] Wanli Chen, Qingyang Hong, Ximin Li, “GMM-UBM for Text-Dependent Speaker Recognition,” 2012 Third IEEE/IET International Conference on Audio, Language and Image Processing (ICALIP 2012), July 16-18, 2012, Shanghai, China.

[48] Q.Y. Hong and S. Kwong, “A Discriminative Training Approach for Text-independent Speaker Recognition,” Signal Processing, 85 (7), July 2005, pp. 1449-1463.

[49] Q.Y. Hong and S. Kwong, “A Genetic Classification Method for Speaker Recognition,” Engineering Applications of Artificial Intelligence, 18 (1), February 2005, pp. 13-19.

[50] Q.Y. Hong and S. Kwong, “Discriminative Training for Speaker Identification,” Electronics Letters, 40 (4), February 2004, pp. 280-281.

[51] Q.Y. Hong and S. Kwong, “Discriminative Training for Speaker Identification Based on Maximum Model Distance Algorithm,” ICASSP 2004.

Patents:

  • Invention patent: Family-call system based on voiceprint recognition technology and its communication method (Patent No. ZL 2010 1 0274490.6)

  • Invention patent: Microphone array speech enhancement device capable of suppressing moving noise (Patent No. ZL 2012 1 0497016.9)

  • Invention patent: Noise-robust voiceprint recognition device combining spectral subtraction and dynamic time warping (Patent No. ZL 2013 1 0370030.7)

  • Invention patent: Microphone array speech enhancement device with sound source direction tracking and method thereof (Patent No. ZL 2012 1 0320004.9)

  • Invention patent: Text-prompted voiceprint access control system (Patent No. ZL 2013 1 0294975.5)

  • Invention patent: Method for detecting cheating in satisfaction surveys based on voiceprint recognition technology (Patent No. ZL 2013 1 0754586.6)

  • Invention patent: Microphone array speech enhancement device adaptable to strong background noise (Patent No. ZL 2016 1 0080236.X)

  • Invention patent: Sound-focusing microphone array long-distance pickup device with phase self-correction (Patent No. ZL 2016 1 0080008.2)

  • Invention patent: Microphone array speech enhancement device for conditions without direct sound (Patent No. ZL 2017 1 0408164.1)

  • Invention patent: Method for adding punctuation to text based on weighted finite-state transducers (Patent No. ZL 2018 1 1180949.9)

  • Invention patent: Spoofed speech detection method based on deep neural networks (Patent No. ZL 2019 1 0590712.6)

  • Invention patent: Method and system for integrating multiple types of acoustic features based on deep neural networks (Patent No. ZL 2020 1 0073244.8)

  • Invention patent: High-quality speech synthesis method based on rectified flow models (Patent No. ZL 2023 1 1587465.7)