Combining enhanced DINO with prototypical networks for self-supervised speaker verification

doi:10.1117/12.3069505

Published Apr 23, 2025 · Xianmei Wan, Guihua Liao, Ying Lou

Abstract

Training speaker-discriminative and robust speaker verification systems without explicit speaker labels remains a persistent challenge. In this paper, we propose a new self-supervised speaker verification approach, Enhanced DINO with Prototypical Networks (EDPN), which facilitates self-supervised speaker representation learning. EDPN adds the prototypical networks training strategy to the self-distillation framework, integrating the advantages of contrastive and non-contrastive learning. Incorporating prototypical networks into the self-supervised enhanced DINO framework yields superior performance. A series of experiments on the VoxCeleb datasets demonstrates the efficacy of our self-supervised score normalization algorithm in the enhanced DINO framework, leading to state-of-the-art results in self-supervised speaker verification on VoxCeleb.
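
The abstract does not give the loss formulation, so the following is only a rough sketch of how a DINO-style self-distillation term and a prototypical-network term might be combined. It assumes the standard DINO cross-entropy between a centered, sharpened teacher distribution and the student distribution, and the standard prototypical loss with squared Euclidean distance, where each utterance in a batch is treated as its own pseudo-class. The function names, the temperatures, and the `weight` parameter are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def log_softmax(x, temp):
    """Numerically stable log-softmax of x / temp over the last axis."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def dino_loss(student_logits, teacher_logits, center,
              t_student=0.1, t_teacher=0.04):
    """Cross-entropy between teacher and student distributions (non-contrastive).

    The teacher output is centered and sharpened with a lower temperature,
    as in standard DINO. Temperatures here are assumed defaults.
    """
    p_teacher = np.exp(log_softmax(teacher_logits - center, t_teacher))
    log_p_student = log_softmax(student_logits, t_student)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())


def prototypical_loss(support, query):
    """Prototypical-network loss (contrastive across utterances).

    support: (N, S, D) embeddings of S segments from each of N utterances.
    query:   (N, D) one held-out segment embedding per utterance.
    Utterance i is treated as pseudo-class i (an assumption of this sketch).
    """
    prototypes = support.mean(axis=1)                       # (N, D)
    diff = query[:, None, :] - prototypes[None, :, :]       # (N, N, D)
    logits = -(diff ** 2).sum(axis=-1)                      # negative squared distance
    log_prob = log_softmax(logits, 1.0)                     # (N, N)
    return float(-np.diag(log_prob).mean())


def combined_loss(student_logits, teacher_logits, center,
                  support, query, weight=1.0):
    """Hypothetical weighted sum of the two terms; `weight` is an assumption."""
    return (dino_loss(student_logits, teacher_logits, center)
            + weight * prototypical_loss(support, query))
```

In this sketch the DINO term never compares different utterances, while the prototypical term pushes each query segment toward its own utterance prototype and away from the others, which is one way to read the abstract's "contrastive and non-contrastive" combination.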

Study Snapshot

Enhanced DINO with Prototypical Networks (EDPN) effectively facilitates self-supervised speaker verification, achieving state-of-the-art results on VoxCeleb datasets.
