Pitch and harmonic complex sounds are essential for music, speech, and auditory object grouping. We have shown that marmosets perceive pitch much as humans do, and that a specialized pitch-sensitive region lies near the anterolateral border of primary auditory cortex. We are currently investigating how auditory circuits carry out these computations and how the pitch-sensitive area emerges during development.

Complex pitch perception mechanisms are shared by humans and marmosets

(Song et al., 2016) Complex pitch perception plays a pivotal role in human audition, especially in speech and music perception. It has been suggested that pitch perception mechanisms demonstrated in humans are not shared by nonhuman species. Here we provide evidence that a New World monkey, the common marmoset, shares all primary features of complex pitch perception mechanisms with humans. Combined with previous findings of a specialized pitch processing region in both marmoset and human auditory cortex, this evidence suggests that pitch perception mechanisms likely originated early in primate evolution.

Figure 2. RESs dominate marmoset pitch strength, similar to humans. (A and B) Spectra (A) and waveforms (B) of the background sounds used in marmoset F0DL (F0 difference limen) measurements for ALL (black), PTF0 (red), RESs (green), and URSs (blue) (noise masker not shown). (C) Example psychometric curves from subject M13W under the four conditions in A. Darker and lighter lines indicate the first and second measured curves, respectively. The gray line indicates a 50% corrected hit rate. (D) F0DLs under each condition across all tested animals and measurements. Error bars indicate means ± SD, with box plots above (n = 8 for each). (E) F0DL dominance ratios, defined as the F0DL with all harmonics presented together divided by the F0DL measured under each decomposed condition. The gray line indicates a reference ratio of 1. Error bars indicate means ± SD, with box plots to the right (n = 8 for each).
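
The dominance ratio in panel E is simple arithmetic on F0DLs. The sketch below additionally assumes, hypothetically, that an F0DL is read off a psychometric curve as the ΔF0 at which the corrected hit rate crosses 50% (the gray line in panel C), using linear interpolation between tested ΔF0 values. The function names and toy numbers are illustrative only; they are not the study's data or analysis code.

```python
def threshold_at(deltas, rates, criterion=0.5):
    """Return the ΔF0 at which a rising psychometric curve first crosses `criterion`.

    Assumption: linear interpolation between adjacent tested ΔF0 values;
    the study may have fitted a sigmoid or used a logarithmic ΔF0 axis instead.
    """
    points = list(zip(deltas, rates))
    for (d0, r0), (d1, r1) in zip(points, points[1:]):
        if r0 < criterion <= r1:
            return d0 + (criterion - r0) / (r1 - r0) * (d1 - d0)
    raise ValueError("psychometric curve never crosses the criterion")


def dominance_ratio(f0dl_all, f0dl_condition):
    """F0DL with all harmonics together divided by the F0DL for one condition.

    A ratio near 1 means that condition alone supports F0 discrimination about
    as well as the full complex; a smaller ratio means a larger (worse) F0DL.
    """
    return f0dl_all / f0dl_condition


# Hypothetical toy data: ΔF0 (% of F0) vs. corrected hit rate.
f0dl_all = threshold_at([1, 2, 4, 8], [0.0, 0.25, 0.75, 1.0])
f0dl_cond = threshold_at([2, 4, 8, 16], [0.0, 0.25, 0.75, 1.0])
print(f0dl_all, f0dl_cond, dominance_ratio(f0dl_all, f0dl_cond))
```

With these toy curves the interpolated F0DLs are 3.0 and 6.0, giving a dominance ratio of 0.5: the decomposed condition supports only half the discrimination performance of the full complex.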

The role of harmonic resolvability in marmoset pitch perception

(Osmanski et al., 2013) Pitch is one of the most fundamental percepts in the auditory system and can be extracted using either spectral or temporal information in an acoustic signal. Although pitch perception has been extensively studied in human subjects, it is far less clear how nonhuman primates perceive pitch. We have addressed this question in a series of behavioral studies in which marmosets, a vocal nonhuman primate species, were trained to discriminate harmonic complex tones differing in either spectral (fundamental frequency f0) or temporal envelope (repetition rate) cues. We found that marmosets used temporal envelope information to discriminate pitch for acoustic stimuli with higher-order harmonics and lower f0 values, and spectral information for acoustic stimuli with lower-order harmonics and higher f0 values. We further measured frequency resolution in marmosets using a psychophysical task in which pure tone thresholds were measured as a function of notched noise masker bandwidth. Results show that only the first four harmonics are resolved at low f0 values and up to 16 harmonics are resolved at higher f0 values. Resolvability in marmosets differs from that in humans, where the first five to nine harmonics are consistently resolved across most f0 values, and is likely the result of a smaller marmoset cochlea. In sum, these results show that marmosets use two mechanisms to extract pitch (harmonic templates [spectral] for resolved harmonics, and envelope extraction [temporal] for unresolved harmonics) and that species differences in stimulus resolvability need to be taken into account when investigating and comparing mechanisms of pitch perception across animals.
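
The spectral route described above, matching resolved harmonics to a harmonic template, can be illustrated with a simple harmonic sieve. This is a textbook-style sketch, not the model used by Osmanski et al.; the function names, the 1% tolerance, and the rule of returning the highest f0 that explains every component are assumptions made for illustration. Note that it recovers f0 even when the fundamental itself is absent.

```python
def _fits(freq_hz, candidate_hz, tolerance):
    """True if freq_hz lies within a relative tolerance of a harmonic of candidate_hz."""
    n = round(freq_hz / candidate_hz)  # nearest harmonic number
    return n >= 1 and abs(freq_hz - n * candidate_hz) <= tolerance * n * candidate_hz


def template_f0(components_hz, max_divisor=20, tolerance=0.01):
    """Return the highest f0 whose harmonic template contains every component.

    Candidates are the lowest component divided by 1, 2, 3, ...; subharmonics
    of the true f0 also fit, so the first (highest) match is returned.
    """
    lowest = min(components_hz)
    for divisor in range(1, max_divisor + 1):
        candidate = lowest / divisor
        if all(_fits(f, candidate, tolerance) for f in components_hz):
            return candidate
    return None  # no template within the search range explains all components


print(template_f0([400, 600, 800]))   # harmonics 2-4 of a missing 200-Hz f0
print(template_f0([600, 900, 1200]))  # harmonics 2-4 of a missing 300-Hz f0
```

A template of this kind only works when the individual components are resolved; for unresolved harmonics the abstract's second mechanism, envelope extraction, takes over.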

Model of harmonic resolvability. A, Resolvability boundaries for marmosets (solid black line) and humans (dashed black line) derived from psychophysical equivalent rectangular bandwidth (ERB) estimates. Harmonics falling to the left of each boundary are fully resolved, whereas harmonics falling to the right are unresolved or partially resolved. Compared with humans, marmosets have poorer spectral resolution below 300 Hz. Marmosets can fully resolve up to 12 harmonics at 400 Hz, matching results from Experiment 1, in which temporal envelope cues were no longer salient for H1–H9 or H4–H12 stimuli above ∼450 Hz. B, Comparison of harmonic resolvability in marmosets estimated from behavioral data and from the tuning bandwidths of neurons recorded in A1. The psychophysical curves form a lower envelope for the resolvability estimates derived from neural tuning bandwidths.
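
How an ERB estimate turns into a resolvability boundary can be sketched for the human case only. The sketch uses the Glasberg and Moore (1990) human ERB formula, ERB(f) = 24.7(4.37f/1000 + 1) Hz, together with an assumed criterion: harmonic n counts as resolved when the spacing between harmonics (f0) exceeds the ERB at n·f0. The criterion and function names are illustrative assumptions, and the marmoset ERB estimates from the study are not reproduced here.

```python
def human_erb_hz(freq_hz):
    """Human auditory-filter ERB in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def resolved_harmonic_count(f0_hz, erb=human_erb_hz, max_harmonic=50):
    """Count consecutive harmonics 1, 2, ... that pass the assumed criterion.

    Assumed criterion: harmonic n is resolved if f0 (the spacing between
    neighboring harmonics) is larger than the filter ERB at n * f0.
    """
    count = 0
    for n in range(1, max_harmonic + 1):
        if f0_hz > erb(n * f0_hz):
            count = n
        else:
            break  # ERB grows with frequency, so later harmonics also fail
    return count


for f0 in (100, 200, 400):
    print(f0, resolved_harmonic_count(f0))
```

Under this simple criterion the human count comes out at 6, 8, and 8 harmonics for f0 values of 100, 200, and 400 Hz. Passing a species-specific `erb` function (for example, one fitted to marmoset notched-noise data) would shift the boundary, which is the comparison panel A draws.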

Pitch-selective units found in the auditory cortex of marmosets

(Bendor and Wang, 2005) Pitch perception is critical for identifying and segregating auditory objects[1], especially in the context of music and speech. The perception of pitch is not unique to humans and has been experimentally demonstrated in several animal species[2,3]. Pitch is the subjective attribute of a sound's fundamental frequency (f0) that is determined by both the temporal regularity and average repetition rate of its acoustic waveform. Spectrally dissimilar sounds can have the same pitch if they share a common f0. Even when the acoustic energy at f0 is removed (the 'missing fundamental'), the same pitch is still perceived[1]. Despite its importance for hearing, how pitch is represented in the cerebral cortex is unknown. Here we show the existence of neurons in the auditory cortex of marmoset monkeys that respond to both pure tones and missing fundamental harmonic complex sounds with the same f0, providing a neural correlate for pitch constancy[1]. These pitch-selective neurons are located in a restricted low-frequency cortical region near the anterolateral border of the primary auditory cortex, a location consistent with that of a pitch-selective area identified in recent imaging studies in humans[4,5].
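
Because pitch follows the repetition rate of the waveform rather than energy at f0, a pure tone and a missing-fundamental complex with the same f0 share the same period. The sketch below makes this concrete with a plain autocorrelation pitch estimator, a standard temporal pitch model swapped in here for illustration; it is not the neural mechanism reported by Bendor and Wang, and the sample rate, duration, and 50–1000 Hz search range are arbitrary assumptions.

```python
import math


def harmonic_complex(f0_hz, harmonics, fs_hz=16000, duration_s=0.05):
    """Sum of equal-amplitude cosine harmonics of f0 (harmonics=[1] is a pure tone)."""
    n_samples = int(fs_hz * duration_s)
    return [
        sum(math.cos(2 * math.pi * h * f0_hz * i / fs_hz) for h in harmonics)
        for i in range(n_samples)
    ]


def autocorr_pitch(signal, fs_hz=16000, fmin_hz=50, fmax_hz=1000):
    """Return fs / lag for the lag (within the f0 search range) that maximizes
    the unnormalized autocorrelation of the signal."""
    min_lag = int(fs_hz / fmax_hz)
    max_lag = int(fs_hz / fmin_hz)

    def r(lag):
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=r)
    return fs_hz / best_lag


pure_tone = harmonic_complex(200, [1])
missing_fundamental = harmonic_complex(200, [2, 3, 4, 5, 6])  # no energy at 200 Hz
print(autocorr_pitch(pure_tone), autocorr_pitch(missing_fundamental))
```

Both stimuli yield an estimate of 200 Hz even though they share no spectral component, which is the kind of pitch constancy the pitch-selective neurons described above exhibit.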

Figure 1. Error bars represent the standard error of the mean (s.e.m.). The dotted black lines indicate the significance level for discharge rate (±2 standard deviations from the spontaneous discharge rate). a, Frequency spectra of a series of harmonic complex stimuli. The fundamental frequency component (f0) and its higher harmonics have equal amplitudes of 50 dB SPL. b, Peristimulus time histogram (left) and tuning curve (right) of the neuron's responses to the stimuli in a. Stimuli were presented from 500 to 1,000 ms (shaded region in the left plot). c, Frequency tuning of the neuron derived from pure tones. d, Response of the neuron to a pure tone at its characteristic frequency (182 Hz) across sound levels (rate-level function). The inset shows an overlay of 2,434 digitized action potentials recorded from this neuron (displayed within a 2-ms window). e, The neuron's responses to individual harmonics (harmonics 1–12) at three sound levels. All harmonics above the f0 component (first harmonic) lay outside the neuron's excitatory frequency response area and did not elicit significant responses. SPL, sound pressure level.
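
The caption's significance criterion can be written as a one-line test. This sketch assumes the spontaneous rate is summarized by its mean and sample standard deviation across trials, and that a driven response counts as significant when its mean rate falls outside the spontaneous mean ± 2 SD (so both excitation and suppression qualify). The function name and the toy rates are hypothetical.

```python
from statistics import mean, stdev


def is_significant(driven_rates, spontaneous_rates, n_sd=2.0):
    """True if the mean driven rate lies outside the spontaneous mean ± n_sd SDs.

    Assumption: SD is the sample standard deviation of spontaneous rates across
    trials (spikes/s); the original analysis may define it differently.
    """
    mu = mean(spontaneous_rates)
    sd = stdev(spontaneous_rates)
    return abs(mean(driven_rates) - mu) > n_sd * sd


spont = [4.0, 6.0, 5.0, 5.0]  # hypothetical spontaneous rates, spikes/s
print(is_significant([12.0, 14.0], spont))  # strong excitatory response
print(is_significant([5.5, 6.0], spont))    # inside the ±2 SD band
print(is_significant([1.0, 2.0], spont))    # suppression below the band
```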

Figure 2. a, Characteristic frequency topographic map from the left hemisphere of one marmoset. Pitch-selective neurons (black squares) were found clustered near the anterolateral border of AI. Frequency reversals indicate the borders between AI/R and R/RT (rostral temporal field). b, Characteristic frequency distributions of pitch-selective and non-pitch-selective neurons within the pitch area of three marmosets. M, medial; C, caudal; L, lateral; R, rostral; CF, characteristic frequency.