Title:

Relaxing the WDO Assumption in Blind Extraction of Speakers from Speech Mixtures, Journal of Telecommunications and Information Technology, 2010, no. 4

Author:

Ding, Ning ; Hamada, Nozomu ; Kasprzak, Włodzimierz

Subject and keywords:

W-disjoint orthogonal ; spectrogram analysis ; time-frequency masking ; harmonic frequencies ; speech reconstruction ; blind source extraction ; histogram clustering

Description:

The time-frequency masking approach to blind speech extraction consists of two main steps: feature clustering in a space spanned by delay time and attenuation rate, and spectrogram masking to reconstruct the sources. Usually a binary mask is generated under the strong W-disjoint orthogonal (WDO) assumption (disjoint orthogonal representations in the frequency domain). In practice, this assumption is most often violated, which leads to poor quality of the reconstructed sources. In this paper we propose relaxing the WDO assumption by allowing some frequency bins to be shared by both sources. Because instantaneous fundamental frequencies are detected, mask creation is supported by exploiting the harmonic structure of speech. The proposed method is shown to be effective and reliable in experiments with both simulated and real recorded mixtures.
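
The two steps named in the description can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the function names `tf_features` and `masks`, the `share_margin` parameter, and the plain distance-based rule for sharing bins are assumptions made for illustration only, and the harmonic-structure analysis described in the paper is not implemented here. The sketch assumes two-channel spectrograms of shape (frequencies, frames) and source centers that have already been found by clustering.

```python
import numpy as np

def tf_features(X1, X2, freqs, eps=1e-12):
    """Per-bin attenuation and delay estimates from two channel spectrograms.

    X1, X2: complex arrays of shape (F, T); freqs: bin frequencies in Hz, shape (F,).
    The delay estimate ignores phase wrapping (an assumption of this sketch).
    """
    ratio = (X2 + eps) / (X1 + eps)
    attenuation = np.abs(ratio)
    omega = 2.0 * np.pi * np.asarray(freqs, dtype=float)[:, None]
    # The DC bin carries no delay information; dividing by inf yields 0 there.
    delay = -np.angle(ratio) / np.where(omega == 0.0, np.inf, omega)
    return attenuation, delay

def masks(attenuation, delay, centers, share_margin=0.0):
    """Boolean masks of shape (n_sources, F, T).

    centers: array of shape (n_sources, 2) holding (attenuation, delay) pairs.
    share_margin = 0 gives the usual binary (WDO-style) mask, apart from exact ties.
    share_margin > 0 lets a bin belong to every source whose center is almost
    as close as the nearest one (a hypothetical relaxation rule).
    In practice the two features have different units and would need scaling.
    """
    feats = np.stack([attenuation, delay], axis=-1)            # (F, T, 2)
    centers = np.asarray(centers, dtype=float)
    dist = np.linalg.norm(feats[None] - centers[:, None, None, :], axis=-1)
    nearest = dist.min(axis=0)                                  # (F, T)
    return dist <= nearest + share_margin

if __name__ == "__main__":
    att = np.array([[1.0, 2.0], [1.4, 1.0]])
    dly = np.zeros_like(att)
    centers = [[1.0, 0.0], [2.0, 0.0]]
    print(masks(att, dly, centers).astype(int))
    print(masks(att, dly, centers, share_margin=0.3).astype(int))
```

Each source would then be reconstructed by multiplying a mixture spectrogram by its mask and inverting the transform. With a positive `share_margin`, a bin whose features lie between two centers is kept for both sources instead of being forced onto one of them.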

Publisher:

Instytut Łączności - Państwowy Instytut Badawczy, Warszawa

Date of publication:

2010, no. 4

Resource type:

article

Format:

application/pdf

Resource identifier:

ISSN 1509-4553, online: ISSN 1899-8852

DOI:

10.26636/jtit.2010.4.1096

ISSN:

1509-4553

eISSN:

1899-8852

Source:

Journal of Telecommunications and Information Technology

Language:

English

Rights:

Biblioteka Naukowa Instytutu Łączności
