Prosody Morph: A Python library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.


ProMo (Prosody Morph)


Questions? Comments? Feedback? Chat with us on gitter!

Join the chat at https://gitter.im/pythonProMo/Lobby

A library for manipulating pitch and duration in an algorithmic way, for resynthesizing speech.

This library can be used to resynthesize pitch in natural speech using pitch contours taken from other speech samples, generated pitch contours, or through algorithmic manipulations of the source pitch contour.

1   Common Use Cases

What can you do with this library?

Apply the pitch or duration from one speech sample to another.

  • alignment happens both in time and in hertz

    • after the morph process, the source pitch points will be at the same absolute pitch and relative time as they are in the target file
    • time is relative to the start and stop time of the interval being considered (e.g. the pitch value at 80% of the duration of the interval). Relative time is used so that the source and target files don't have to be the same length.
    • temporal morphing is a minor effect if the sampling frequency is high, but it can be significant when, for example, using a stylized pitch contour with few pitch samples.
  • modifications can be done between entire wav files or between corresponding intervals as specified in a TextGrid or other annotation (indicating the boundaries of words, stressed vowels, etc.)

    • the larger the file, the less useful the results are likely to be without using a transcript of some sort
    • the transcripts do not have to match in lexical content, only in the number of intervals (same number of words or phones, etc.)
  • modifications can be scaled (it is possible to generate a wav file with a pitch contour that is 30% or 60% between the source and target contours).

  • the pitch range and average pitch can also be morphed independently.

  • resynthesis is performed by Praat.

  • pitch can be obtained from Praat (such as by using praatIO) or from other sources (e.g. ESPS getF0)

  • plots of the resynthesis (such as the ones below) can be generated
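The alignment in time and in hertz, and the scaling of a morph to a percentage, can be illustrated with a few lines of plain Python. This is a minimal sketch, not ProMo's implementation: `morph_interval` and its helpers are hypothetical names, contours are assumed to be lists of (time, hz) pairs sorted by time, and each target point is blended with the source pitch found at the same relative time within its interval.

```python
def to_relative(points, start, end):
    """Map (time, hz) points to (relative_time, hz), where 0.0 is the
    interval start and 1.0 is the interval end."""
    duration = end - start
    return [((t - start) / duration, hz) for t, hz in points]


def value_at(rel_points, rel_t):
    """Linearly interpolate hz at rel_t from (relative_time, hz) points sorted
    by time, holding the first/last value outside the sampled range."""
    if rel_t <= rel_points[0][0]:
        return rel_points[0][1]
    if rel_t >= rel_points[-1][0]:
        return rel_points[-1][1]
    for (t0, v0), (t1, v1) in zip(rel_points, rel_points[1:]):
        if t0 <= rel_t <= t1:
            if t1 == t0:
                return v0
            return v0 + (v1 - v0) * (rel_t - t0) / (t1 - t0)
    raise ValueError("points must be sorted by time")


def morph_interval(source, target, src_span, tgt_span, ratio):
    """Hypothetical helper: move the source contour `ratio` of the way
    (0.0 to 1.0) toward the target contour, aligning points by relative time.
    At ratio=1.0 the output has the target's pitch values at the target's
    relative times, stretched to fit the source interval."""
    src_rel = to_relative(source, *src_span)
    tgt_rel = to_relative(target, *tgt_span)
    start, end = src_span
    morphed = []
    for rel_t, tgt_hz in tgt_rel:
        src_hz = value_at(src_rel, rel_t)  # source pitch at the same relative time
        hz = src_hz + (tgt_hz - src_hz) * ratio  # scale the pitch change
        morphed.append((start + rel_t * (end - start), hz))
    return morphed


# A 1-second source interval and a 2-second target interval.
source = [(0.0, 200.0), (0.5, 250.0), (1.0, 180.0)]
target = [(2.0, 120.0), (3.0, 150.0), (4.0, 100.0)]
for ratio in (0.33, 0.66, 1.0):
    print(ratio, morph_interval(source, target, (0.0, 1.0), (2.0, 4.0), ratio))
```

Because this sketch samples the output at the target's points, a sparse target (such as a stylized contour) also reshapes the timing of the result, which is the temporal effect mentioned in the list.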

2   Illustrative example

Consider the phrase "Mary rolled the barrel". In the first recording (examples/mary1.wav), "Mary rolled the barrel" was said in response to a question such as "Did John roll the barrel?". On the other hand, in the second recording (examples/mary2.wav) the utterance was said in response to a question such as "What happened yesterday?".

"Mary" in "mary1.wav" is produced with more emphasis than in "mary2.wav". It is longer and carries a more dramatic pitch excursion. Using ProMo, we can make mary1.wav sound similar to mary2.wav, even though they were spoken in a different way and by different speakers.

Duration and pitch carry meaning. Change these, and you can change the meaning being conveyed.

Note that modifying pitch and duration too much can introduce artifacts. Such artifacts can be heard even when morphing the pitch of mary1.wav to mary2.wav.

Pitch morphing (examples/pitch_morph_example.py):

The following image shows morphing of pitch from mary1.wav to mary2.wav on a word-by-word level in increments of 33% (33%, 66%, 100%). Note that the morph adjusts the temporal dimension of the target signal to fit the duration of the source signal (the source and generated contours are both shorter than the target contour). This occurs at the level of the whole file unless the user specifies an equal number of segments to align in time (e.g. using word-level transcriptions, as done here, or phone-level transcriptions).

examples/files/mary1_mary2_f0_morph.png
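The segment-by-segment time alignment can be illustrated with a stdlib-only sketch. It is an illustration under assumptions, not ProMo's code: `warp_contour` is a hypothetical name, intervals are plain (start, end) pairs such as word boundaries taken from a TextGrid, and pitch values are left untouched so that only the time axis is shown. Only the number of intervals has to match, not their labels.

```python
def warp_contour(target_points, tgt_intervals, src_intervals):
    """Hypothetical helper: move each (time, hz) target point onto the source
    timeline, interval by interval, keeping its relative position inside the
    matching interval. Points outside every annotated interval are dropped."""
    if len(tgt_intervals) != len(src_intervals):
        raise ValueError("source and target need the same number of intervals")
    warped = []
    for t, hz in target_points:
        for (t0, t1), (s0, s1) in zip(tgt_intervals, src_intervals):
            if t0 <= t <= t1:
                rel = (t - t0) / (t1 - t0)  # relative position in the target interval
                warped.append((s0 + rel * (s1 - s0), hz))
                break
    return warped


# Word boundaries as (start, end); labels are not needed, only the count.
src_words = [(0.0, 0.4), (0.4, 0.7), (0.7, 1.0)]
tgt_words = [(0.0, 0.6), (0.6, 0.9), (0.9, 1.5)]
print(warp_contour([(0.3, 150.0), (1.2, 110.0), (2.0, 90.0)], tgt_words, src_words))
```

Each word is stretched or compressed independently, so a long target word and a short source word line up without distorting their neighbors.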

With the ability to morph pitch range and average pitch, it becomes easier to morph contours produced by different speakers:

The following image shows four different pitch manipulations. On the upper left is the raw morph. Notice that the final output (black line) is very close to the target. Differences stem from duration differences.

However, the average pitch and pitch range are qualities of speech that can signify differences in gender in addition to other aspects of speaker identity. By resetting the average pitch and pitch range to that of the source, it is possible to morph the contour while maintaining aspects of the source speaker's identity.

The image in the upper right contains a morph followed by a reset of the average pitch to the source speaker's average pitch. The bottom left shows a morph followed by a reset of the speaker's pitch range. In the bottom right, the pitch range was reset and then the speaker's average pitch was reset.

The longer the speech sample, the more representative the pitch range and mean pitch will be of the speaker. In this example both are skewed higher by the pitch accent on the first word.

Here the average pitch of the source (a female speaker) is much higher than the target (a male speaker) and the resulting morph sounds like it comes from a different speaker than the source or target speakers. The three recordings that involve resetting pitch range and/or average pitch sound much more natural.

examples/files/mary1_mary2_f0_morph_compare.png
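Resetting the average pitch and the pitch range can be sketched as simple operations on a list of Hz values. This is an illustration under stated assumptions, not ProMo's f0 morph code: the function names are hypothetical, "average pitch" is taken to be the arithmetic mean in Hz, and "pitch range" is taken to be max minus min in Hz; the library may define either differently.

```python
from statistics import mean


def reset_average(contour, reference):
    """Hypothetical helper: shift every value so the contour's mean
    equals the reference speaker's mean (shape and range unchanged)."""
    shift = mean(reference) - mean(contour)
    return [hz + shift for hz in contour]


def reset_range(contour, reference):
    """Hypothetical helper: scale the contour around its own mean so that
    max - min matches the reference speaker's max - min (mean unchanged)."""
    center = mean(contour)
    current = max(contour) - min(contour)
    wanted = max(reference) - min(reference)
    scale = wanted / current if current else 1.0  # flat contour: leave as-is
    return [center + (hz - center) * scale for hz in contour]


morphed = [100.0, 120.0, 140.0]  # e.g. a contour morphed toward a male target
source = [200.0, 260.0, 230.0]   # the original (female) source speaker
print(reset_average(reset_range(morphed, source), source))
```

Here the range is reset first and the average second, as in the last panel described above; with these definitions the two steps do not interfere, since scaling around the mean leaves the mean unchanged.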

Duration morphing (examples/duration_manipulation_example.py):

The following image shows morphing of duration from mary1.wav to mary2.wav on a word-by-word basis in increments of 33% (33%, 66%, 100%). This process can operate over an entire file or, similar to pitch morphing, with annotated segments, as done in this example.

examples/files/mary1_mary2_dur_morph.png
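A word-by-word duration morph comes down to one stretch factor per matched interval. The sketch below is an assumption-based illustration, not ProMo's code: `duration_factors` is a hypothetical name and intervals are (start, end) pairs. Praat expresses this kind of relative duration change with a DurationTier; how ProMo builds that tier is not shown here.

```python
def duration_factors(src_intervals, tgt_intervals, ratio):
    """Hypothetical helper: for each matched interval, return the factor by
    which the source interval must be stretched (>1) or compressed (<1) to
    move its duration `ratio` of the way toward the target duration."""
    if len(src_intervals) != len(tgt_intervals):
        raise ValueError("source and target need the same number of intervals")
    factors = []
    for (s0, s1), (t0, t1) in zip(src_intervals, tgt_intervals):
        src_dur = s1 - s0
        tgt_dur = t1 - t0
        new_dur = src_dur + (tgt_dur - src_dur) * ratio
        factors.append(new_dur / src_dur)
    return factors


src_words = [(0.0, 0.5), (0.5, 0.8)]
tgt_words = [(0.0, 0.25), (0.25, 0.85)]
for ratio in (0.33, 0.66, 1.0):
    print(ratio, duration_factors(src_words, tgt_words, ratio))
```

Treating the whole file as a single interval gives the file-level variant mentioned above.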

3   Tutorials

Tutorials for learning about prosody manipulation and how to use ProMo are available.

Tutorial 1.1: Intro to ProMo

Tutorial 1.2: Pitch manipulation tutorial

4   Major revisions

Ver 1.3 (May 29, 2017)

  • added tutorials
  • f0Morph() can now exclude certain regions from the morph process if desired

Ver 1.2 (January 27, 2017)

  • added code for reshaping pitch accents (shift alignment, add plateau, or change height)

Ver 1.1 (February 22, 2016)

  • f0 morph code for modifying speaker pitch range and average pitch
  • (October 20, 2016) added integration tests with Travis CI and Coveralls support.

Ver 1.0 (January 19, 2016)

  • first public release.

Beta (July 1, 2013)

  • first version which was utilized in my dissertation work

5   Requirements

Python 2.7.* or above

Python 3.3.* or above (or below, probably)

My praatIO library is used extensively and can be downloaded here

Matplotlib is needed if you want to plot graphs. Matplotlib website

SciPy is needed if you want to use interpolation, typically when you have stylized pitch contours (in Praat PitchTier format, for example) that you want to use in your morphing. SciPy website
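To illustrate what interpolating a stylized contour means, the sketch below fills in a sparse set of pitch points at a fixed time step. ProMo relies on SciPy for this; the sketch swaps in plain linear interpolation from the standard library so it runs without SciPy. `resample_contour` is a hypothetical name, and linear interpolation is an assumption (SciPy's `scipy.interpolate.interp1d` offers linear and other kinds).

```python
from bisect import bisect_right


def resample_contour(points, step):
    """Linearly interpolate sparse (time, hz) points, sorted by time,
    at a fixed time step between the first and last point."""
    times = [t for t, _ in points]
    values = [hz for _, hz in points]
    n_steps = int(round((times[-1] - times[0]) / step))
    dense = []
    for i in range(n_steps + 1):
        t = times[0] + i * step
        j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
        if j >= len(times):
            dense.append((t, values[-1]))  # at (or just past) the last point
            continue
        t0, t1 = times[j - 1], times[j]
        v0, v1 = values[j - 1], values[j]
        dense.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return dense


# A stylized contour: a rise and a fall described by three points.
stylized = [(0.0, 100.0), (0.1, 200.0), (0.3, 150.0)]
for t, hz in resample_contour(stylized, 0.05):
    print(f"{t:.2f}s {hz:.1f} Hz")
```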

Matplotlib and SciPy are non-trivial to install, as they depend on several large packages. You can visit their websites for more information. I recommend the following instructions, which install Matplotlib and SciPy using Python wheels. These will install all required libraries in one fell swoop.

On Mac, open a terminal and type:

python -m pip install matplotlib

python -m pip install scipy

On Windows, open a cmd or powershell window and type:

<<path to python>> -m pip install matplotlib

<<path to python>> -m pip install scipy

e.g. C:\python27\python.exe -m pip install matplotlib


6   Usage

See /examples for example usages

7   Installation

If you are on Windows, you can use the installer found here (check that it is up to date, though): Windows installer

ProMo is on PyPI and can be installed or upgraded from the command-line shell with pip like so:

python -m pip install promo --upgrade

Otherwise, to manually install, after downloading the source from github, from a command-line shell, navigate to the directory containing setup.py and type:

python setup.py install

If python is not in your path, you'll need to enter the full path e.g.:

C:\Python36\python.exe setup.py install

8   Citing ProMo

If you use ProMo in your research, please cite it like so:

Tim Mahrt. ProMo: The Prosody-Morphing Library. https://github.com/timmahrt/ProMo, 2016.

9   Acknowledgements

Development of ProMo was possible thanks to NSF grant BCS 12-51343 to Jennifer Cole, José I. Hualde, and Caroline Smith and to the A*MIDEX project (n° ANR-11-IDEX-0001-02) to James Sneed German funded by the Investissements d'Avenir French Government program, managed by the French National Research Agency (ANR).
