Audio samples from "PAMA-TTS: PROGRESSION-AWARE MONOTONIC ATTENTION FOR STABLE SEQ2SEQ TTS WITH ACCURATE PHONEME DURATION CONTROL"

Paper: arXiv

Authors: Yunchao He, Jian Luan, Yujun Wang

Abstract: Sequence expansion between encoder and decoder is a critical challenge in sequence-to-sequence TTS. Attention-based methods achieve great naturalness but suffer from instability issues such as missing and repeated phonemes, and they do not offer accurate duration control. Duration-informed methods, by contrast, can easily adjust phoneme duration but show obvious degradation in speech naturalness. This paper proposes PAMA-TTS to address the problem. It takes advantage of both flexible attention and explicit duration models. Building on the monotonic attention mechanism, PAMA-TTS also leverages the token duration and the relative position of a frame, especially countdown information, i.e. the number of future frames after which the present phoneme will end. These help the attention move forward along the token sequence under soft but reliable control. Experimental results show that PAMA-TTS achieves the highest naturalness while having on-par or even better duration controllability than the duration-informed model.
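
To make the "countdown information" in the abstract concrete, here is a minimal sketch that expands per-phoneme durations (in frames) into per-frame position features. The function name, the tuple layout, and the convention that the countdown reaches 0 on a phoneme's last frame are assumptions made for illustration; this page does not specify how PAMA-TTS encodes these features or feeds them to the attention.

```python
from typing import List, Tuple


def frame_position_features(durations: List[int]) -> List[Tuple[int, int, int]]:
    """Expand per-token durations into per-frame features.

    Returns one (token_index, token_duration, countdown) tuple per frame.
    Assumption for this sketch: countdown is the number of future frames
    that still belong to the current token, so it is 0 on the last frame.
    """
    features = []
    for token_index, duration in enumerate(durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        for frame_in_token in range(duration):
            countdown = duration - 1 - frame_in_token
            features.append((token_index, duration, countdown))
    return features


# Hypothetical example: three phonemes lasting 3, 1, and 2 frames.
example = frame_position_features([3, 1, 2])
for row in example:
    print(row)
```

In this sketch, a countdown of 0 marks the frame at which a monotonic attention would be expected to step to the next token.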

Comparison among systems

Recordings

1:
2:
3:
4:
5:

Synthesized Speech

Random samples from the test set.

Systems: Proposed PAMA-TTS, Tacotron-length-regulator (TLR), Tacotron-stepwise (TSW)
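1: 周霖先生您好,我这边是《小米贷款》审核机器人-小融,现在我们这边有一个为您提额的活动,需要您补充信息,请问您现在方便吗? (English: Hello, Mr. Zhou Lin. This is Xiaorong, the review robot for Xiaomi Loan. We currently have a credit limit increase offer for you that requires some additional information. Is now a convenient time?)
2: 好的,您目前正常经营的公司名称是下面哪一个,一、河北商玛商贸有限公司;二、汕尾市城区正力货运代理店;三、南京雄起服饰有限公司;四、以上都不是,请您选择一二三四任意一个 (English: OK. Which of the following is the name of the company you currently operate: one, Hebei Shangma Trading Co., Ltd.; two, Shanwei Chengqu Zhengli Freight Agency; three, Nanjing Xiongqi Clothing Co., Ltd.; four, none of the above. Please choose any one of one, two, three, or four.)
3: 请问您公司于哪一年成立? (English: In which year was your company founded?)
4: 好的,您的主营业务是什么? (English: OK, what is your main line of business?)
5: 好的,您本年度每个月平均销售额是下面哪一个,一、小于等于10万;二、11万(至)30万;三、31万(至)50万;四、50万以上,以上请您选择一二三四任意一个 (English: OK. Which of the following is your average monthly sales figure for this year: one, 100,000 or less; two, 110,000 to 300,000; three, 310,000 to 500,000; four, more than 500,000. Please choose any one of one, two, three, or four.)
6: 好的,您上个月的销售额是多少万元? (English: OK, what were your sales last month, in units of ten thousand yuan?)
7: 抱歉,我没有听清,方便重复一下么? (English: Sorry, I did not catch that. Could you repeat it?)
8: 好的,您的毛利是下面哪一个?一、小于等于百分之5;二、百分之6(至)百分之8;三、百分之9(至)百分之10;四、百分之10以上,以上请您选择一二三四任意一个 (English: OK. Which of the following is your gross margin? One, 5 percent or less; two, 6 percent to 8 percent; three, 9 percent to 10 percent; four, more than 10 percent. Please choose any one of one, two, three, or four.)
9: 好的,最后请您说下您的身份证号码后四位? (English: OK. Finally, please tell me the last four digits of your ID card number.)
10: 好的非常感谢,本次已经核实身份完毕,祝您生活愉快,再见。 (English: OK, thank you very much. Your identity has now been verified. Have a nice day. Goodbye.)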

Speed Control of PAMA-TTS

Fast speed: duration x0.75

Samples

Slow speed: duration x1.5

Samples
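
The speed settings above amount to multiplying each phoneme duration by a constant factor. A minimal sketch of that arithmetic follows, assuming durations are integer frame counts, rounding half up, and keeping at least one frame per phoneme. The function name, the rounding rule, and the one-frame floor are assumptions for illustration; this page does not describe how PAMA-TTS applies the scaling internally.

```python
from typing import List


def scale_durations(durations: List[int], factor: float, min_frames: int = 1) -> List[int]:
    """Scale per-phoneme durations (in frames) by a speed factor.

    factor < 1 shortens phonemes (faster speech); factor > 1 lengthens them.
    Assumptions of this sketch: round half up, and never drop below
    min_frames so that no phoneme disappears entirely.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [max(min_frames, int(d * factor + 0.5)) for d in durations]


# Hypothetical durations for three phonemes, in frames.
base = [4, 8, 12]
print(scale_durations(base, 0.75))  # fast speed: duration x0.75
print(scale_durations(base, 1.5))   # slow speed: duration x1.5
```

The one-frame floor is a design choice in this sketch: without it, very short phonemes could round to zero frames at fast speeds and be skipped.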