PoP-IDLMA: Product-of-Prior Independent Deeply Learned Matrix Analysis for Multichannel Music Source Separation
Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2680-2694, 2023, 2023
Independent deeply learned matrix analysis (IDLMA) is a state-of-the-art determined audio source separation method based on pretrained deep neural networks (DNNs). Owing to the excellent expression power of DNNs, IDLMA can handle a wider range of sources than conventional source models such as nonegative matrix factorization (NMF). However, owing to its supervised nature, the separation performance of IDLMA often degrades in the presence of timbral mismatches between the training data and the to-be-separated data. In this paper, we propose two source models that encompass the NMF- and DNN-based source models by constructing a prior distribution of the source power spectrogram (product of priors: PoP) on the basis of the product-of-expert concept. Since the NMF-based source model works well for a fully blind situation, the proposed models can handle the timbral mismatch without losing the expression power of DNNs. By introducing the PoP-based source models into IDLMA, we propose IDLMA extensions (PoP-IDLMAs) and derive their efficient parameter estimation algorithms on the basis of the majorization–minimization algorithm. Experimental results demonstrated the effectiveness of the proposed PoP-IDLMAs and that the proposed models greatly improve the source power estimation in frequency bands above 500 Hz.