M1, M2, DENS

Laboratoire des Systèmes Perceptifs


29 rue d'Ulm

75005 Paris France


English or French

We study the learning dynamics that shape the emergence of auditory perception, particularly speech perception. To this end, the laboratory relies on in vivo models (ferret hearing) and in silico models (latest-generation artificial neural networks).
In this context, a substantial body of behavioral and neural psychoacoustic experiments already exists (Basirat, Dehaene, and Dehaene-Lambertz 2014; Ding et al. 2016; Saffran, Aslin, and Newport 1996; Shannon et al. 1995). These experiments, developed since the 1950s by many researchers, have mostly been carried out on children or adults; few have been conducted with animals or computational models.
To unify these experimental results into a coherent theory, we propose building an open-source tool that generates the experimental stimuli needed to replicate these experiments.
A generation API has already been developed by the doctoral student who will supervise the project.
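To give a concrete idea of what generating such stimuli can involve, here is a minimal sketch of a syllable stream in the spirit of statistical-learning experiments (Saffran, Aslin, and Newport 1996). It does not use or describe the laboratory's existing API: the word list, the function names `make_stream` and `transitional_probabilities`, and the no-immediate-repetition rule are all illustrative assumptions.

```python
import random
from collections import Counter

# Hypothetical trisyllabic nonsense words (illustrative only, not the lab's material).
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]


def make_stream(n_words, seed=0):
    """Concatenate randomly ordered words, never using the same word twice in a row."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != previous])
        stream.extend(word)
        previous = word
    return stream


def transitional_probabilities(stream):
    """Estimate P(next syllable | current syllable) from bigram counts."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: count / first_counts[pair[0]] for pair, count in pair_counts.items()}


stream = make_stream(n_words=300)
tp = transitional_probabilities(stream)
print("within-word  tu -> pi:", tp[("tu", "pi")])
print("across-word  ro -> go:", tp.get(("ro", "go"), 0.0))
```

In this sketch, transitions inside a word are fully predictable while transitions across word boundaries are not, which is the statistical cue such experiments manipulate. Turning the syllable sequence into audio is left out here, since that step would depend on the existing API.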

The recruited student will be responsible for:

- The inventory and classification of pre-existing acoustic experiments.
- The generation of experimental stimuli with the existing API.
- The setup of an open-source project repository.
- Benchmark development with simple acoustic models.
- Continued development of synthesis and acoustic control techniques.

During the internship, the student will gain the following skills:

- Extensive knowledge of the literature in the field.
- Use and development within the open-source Python/AI ecosystem (Hugging Face, PyTorch, …).
- Use of self-supervised network architectures [EnCodec: (Défossez et al. 2022), wav2vec 2.0: (Baevski et al. 2020)].

Recruitment criteria:
The internship will consist of implementing Python code. The trainee must therefore be fluent in Python from the start. Familiarity with AI tools is not required. The trainee must be fluent in English or French and will write all code (comments, documentation, variable names) in English. Scientific integrity is mandatory.

Applicants are invited to contact Yves Boubenec:
boubenec [at] ens.fr


  • Baevski, Alexei, Henry Zhou, Abdelrahman Mohamed, and Michael Auli. 2020. "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations." arXiv. https://doi.org/10.48550/arXiv.2006.11477.
  • Basirat, Anahita, Stanislas Dehaene, and Ghislaine Dehaene-Lambertz. 2014. "A Hierarchy of Cortical Responses to Sequence Violations in Three-Month-Old Infants." Cognition 132 (2): 137-50. https://doi.org/10.1016/j.cognition.2014.03.013.
  • Défossez, Alexandre, Jade Copet, Gabriel Synnaeve, and Yossi Adi. 2022. "High Fidelity Neural Audio Compression." arXiv. http://arxiv.org/abs/2210.13438.
  • Ding, Nai, Lucia Melloni, Hang Zhang, Xing Tian, and David Poeppel. 2016. "Cortical Tracking of Hierarchical Linguistic Structures in Connected Speech." Nature Neuroscience 19 (1): 158-64. https://doi.org/10.1038/nn.4186.
  • Saffran, Jenny R., Richard N. Aslin, and Elissa L. Newport. 1996. "Statistical Learning by 8-Month-Old Infants." Science 274 (5294): 1926-28. https://doi.org/10.1126/science.274.5294.1926.
  • Shannon, Robert V., Fan-Gang Zeng, Vivek Kamath, John Wygonski, and Michael Ekelid. 1995. "Speech Recognition with Primarily Temporal Cues." Science 270 (5234): 303-4. https://doi.org/10.1126/science.270.5234.303.