Paper: arXiv
Authors: Yunchao He, Jian Luan, Yujun Wang
Abstract: Sequence expansion between encoder and decoder is a critical challenge in sequence-to-sequence TTS. Attention-based methods achieve great naturalness but suffer from instability issues such as missing and repeated phonemes, and they offer no accurate duration control. Duration-informed methods, by contrast, make it easy to adjust phoneme duration but show obvious degradation in speech naturalness. This paper proposes PAMA-TTS to address the problem. It takes advantage of both flexible attention and explicit duration models. Based on the monotonic attention mechanism, PAMA-TTS also leverages token duration and the relative position of a frame, especially countdown information, i.e., how many future frames remain before the present phoneme ends. These cues help the attention move forward along the token sequence under soft but reliable control. Experimental results show that PAMA-TTS achieves the highest naturalness while having on-par or even better duration controllability than the duration-informed model.
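The countdown information mentioned in the abstract can be illustrated with a minimal sketch. It assumes that an integer duration (in frames) is available for each phoneme, and it only derives per-frame position features. The function name `frame_position_features` and the convention that the countdown reaches 0 on a phoneme's last frame are assumptions made for illustration, not details taken from the paper. How PAMA-TTS feeds such features into its attention is not shown here.

```python
from typing import List, Tuple


def frame_position_features(durations: List[int]) -> List[Tuple[int, int, int]]:
    """Expand per-phoneme durations into per-frame position features.

    Hypothetical helper, not the authors' code. For every output frame it
    returns (token_index, frames_elapsed, countdown), where countdown is the
    number of future frames left before the current phoneme ends (assumed to
    be 0 on the phoneme's last frame).
    """
    features = []
    for token_index, duration in enumerate(durations):
        for elapsed in range(duration):
            countdown = duration - 1 - elapsed  # frames remaining after this one
            features.append((token_index, elapsed, countdown))
    return features


if __name__ == "__main__":
    # Two phonemes lasting 2 and 3 frames respectively.
    for row in frame_position_features([2, 3]):
        print(row)
```

Scaling the durations before calling the function (for example, doubling them) changes how long each phoneme's countdown runs, which is one way to picture the duration control described above.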
Random samples from the test set.
Proposed PAMA-TTS | Tacotron-length-regulator (TLR) | Tacotron-stepwise (TSW) |
---|---|---|
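1: 周霖先生您好,我这边是《小米贷款》审核机器人-小融,现在我们这边有一个为您提额的活动,需要您补充信息,请问您现在方便吗? (English: Hello, Mr. Zhou Lin. This is Xiaorong, the review robot from Xiaomi Loan. We currently have a campaign to raise your credit limit, which requires some additional information from you. Is now a convenient time?) | ||
2: 好的,您目前正常经营的公司名称是下面哪一个,一、河北商玛商贸有限公司;二、汕尾市城区正力货运代理店;三、南京雄起服饰有限公司;四、以上都不是,请您选择一二三四任意一个 (English: OK. Which of the following is the name of the company you currently operate: one, Hebei Shangma Trading Co., Ltd.; two, Shanwei Chengqu Zhengli Freight Forwarding Agency; three, Nanjing Xiongqi Apparel Co., Ltd.; four, none of the above. Please choose one, two, three, or four.) | ||
3: 请问您公司于哪一年成立? (English: In which year was your company founded?) | ||
4: 好的,您的主营业务是什么? (English: OK, what is your main line of business?) | ||
5: 好的,您本年度每个月平均销售额是下面哪一个,一、小于等于10万;二、11万(至)30万;三、31万(至)50万;四、50万以上,以上请您选择一二三四任意一个 (English: OK. Which of the following is your average monthly sales this year: one, 100,000 or less; two, 110,000 to 300,000; three, 310,000 to 500,000; four, more than 500,000. Please choose one, two, three, or four.) | ||
6: 好的,您上个月的销售额是多少万元? (English: OK, how much were your sales last month, in units of 10,000 yuan?) | ||
7: 抱歉,我没有听清,方便重复一下么????????? (English: Sorry, I didn't catch that. Could you please repeat it?) | ||
8: 好的,您的毛利是下面哪一个?一、小于等于百分之5;二、百分之6(至)百分之8; 三、百分之9(至)百分之10;四、百分之10以上,以上请您选择一二三四任意一个 (English: OK. Which of the following is your gross margin? One, 5 percent or less; two, 6 to 8 percent; three, 9 to 10 percent; four, more than 10 percent. Please choose one, two, three, or four.) | ||
9: 好的,最后请您说下您的身份证号码后四位? (English: OK, finally, please tell me the last four digits of your ID card number.) | ||
10: 好的非常感谢,本次已经核实身份完毕,祝您生活愉快,再见。 (English: OK, thank you very much. Your identity has now been verified. Have a nice day. Goodbye.) | ||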