Enhancing Mandarin Tone and Vowel Recognition in Cochlear Implant Simulation through Combined Task Practice

Article information

Audiol Speech Res. 2024;20(4):199-207
Publication date (electronic) : 2024 October 31
doi : https://doi.org/10.21848/asr.240167
1Callier Center for Communication Disorders, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, TX, USA
2Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
3Program of Speech and Hearing Science, College of Health Solutions, Arizona State University, Tempe, AZ, USA
Correspondence: Seeon Kim, PhD Callier Center for Communication Disorders, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, TX 75080, USA Tel: +1-972-883-3660 Fax: +1-972-883-3622 E-mail: Seeon.Kim@UTDallas.edu
Received 2024 October 1; Revised 2024 October 8; Accepted 2024 October 11.

Abstract

Purpose

Mandarin tone recognition with cochlear implant simulation can be enhanced via targeted auditory training. However, such training generally does not transfer to different tasks, even when they share the same stimuli. As such, this study investigated whether combined Mandarin tone and vowel recognition training leads to greater improvements in Mandarin tone, vowel, and sentence recognition compared to tone- or vowel-only training.

Methods

Twenty-nine native Mandarin-speaking young adults with normal hearing were randomly assigned to one of three training regimens: Tone-Only, Vowel-Only, or Tone-Vowel. Mandarin tone, vowel, and sentence recognition were tested before and after training.

Results

Tone-Only training improved Mandarin tone recognition but not vowel recognition, while Vowel-Only training improved Mandarin vowel recognition but not tone recognition. The combined Tone-Vowel training improved both tone and vowel recognition. Tone-Vowel training also led to improvements in Mandarin sentence recognition, although these gains were not significantly greater than those observed in the Tone-Only or Vowel-Only groups.

Conclusion

These findings suggest that, given an equivalent amount of training, combined tone and vowel recognition training enhances both tone and vowel recognition, whereas tone- or vowel-only training improves performance in the targeted task alone. Combined tone and vowel recognition training may therefore be necessary to achieve improvements in Mandarin tone, vowel, and sentence recognition.

INTRODUCTION

A cochlear implant (CI) is an electronic device that assists nearly one million individuals with severe-to-profound hearing loss. To restore the sense of hearing, a CI converts sound waves into electrical signals that stimulate the auditory nerve, thereby allowing the brain to perceive sound. However, this process involves interactions among frequency channels during electric stimulation and relies on the extraction of the temporal envelope, which degrades the spectral and temporal resolution available to CI users (Arslan & Luo, 2022; Friesen et al., 2001; Wilson et al., 1991). Due to limited fundamental frequency and temporal envelope cues, CI users find it challenging to distinguish similar speech sounds, such as vowels and consonants, and to accurately perceive the pitch, timing, and duration of speech sounds (Luo et al., 2020; Nie et al., 2006; Peng et al., 2008). As a result, CI users face difficulties perceiving both segmental (i.e., consonants and vowels) and suprasegmental (i.e., pitch, intensity, and duration) information in speech compared to individuals with normal hearing (NH). Tonal languages like Mandarin Chinese encode both segmental and suprasegmental information. For example, Mandarin has six single vowels (/a/, /o/, /e/, /i/, /u/, and /ü/), and each vowel can carry any of four pitch-contour patterns (tone 1: high-flat; tone 2: low-rising; tone 3: low-falling-rising; and tone 4: high-falling). Therefore, Mandarin-speaking CI users face extreme difficulties because Mandarin vowels carry both tone and vowel information that is crucial for Mandarin sentence recognition (Cabrera et al., 2019; Chen et al., 2014; Luo et al., 2008; Peng et al., 2017).

Targeted auditory training focuses on specific aspects of auditory processing through repetitive listening tasks designed to improve particular skills, such as sound discrimination, recognition of speech in noise, or identification of phonemes, vowels, and tones. Previous studies have demonstrated that Mandarin tone and vowel recognition in CI users can be improved via targeted auditory training (Kim et al., 2021; Wu et al., 2007; Zhang et al., 2021). For example, Kim et al.(2021) trained NH Mandarin-speaking young adults on Mandarin tone recognition with CI simulation for 5 consecutive days, with 1 hour of training per day. The Mandarin tone training involved a four-alternative forced-choice (4-AFC) tone recognition task and provided trial-by-trial auditory and visual feedback. The training group showed significant improvement in Mandarin tone recognition that was retained for a week, whereas the no-training group, who were only exposed to Mandarin tones without any training, did not improve. Zhang et al.(2021) highlighted the benefits of high-variability phonetic training in lexical tone perception for Mandarin-speaking pediatric CI users. Their training, similar to the 4-AFC Mandarin tone identification of Kim et al.(2021), was conducted over five sessions in 3 weeks. The training significantly improved tone identification and generalized to an untrained discrimination task, in which participants judged whether pairs of tones were the same (e.g., tone 1-tone 1) or different (e.g., tone 1-tone 2). Gains on both the trained identification and untrained discrimination tasks were sustained for up to 10 weeks. Similarly, Wu et al.(2007) observed that computer-assisted speech training with monosyllabic words significantly enhanced vowel, consonant, and tone recognition among hearing-impaired children, with these improvements largely retained for up to 2 months.
The training was conducted at home for half an hour per day, 5 days per week, for 10 weeks. Initially, the training involved a discrimination task, in which participants were required to discriminate a distinct sound from two identical ones presented in a 3-AFC setup. Once participants achieved over 80% accuracy in the word discrimination task, the training progressed to an identification task, where they were trained to recognize final vowels, consonants, and tones separately. In conclusion, these studies collectively underscore the efficacy of targeted auditory training in enhancing Mandarin tone and vowel recognition among CI users, demonstrating not only immediate improvements but also the durability of these improvements over time.

Despite the benefits of targeted auditory training, improvements are often limited to the specific tasks practiced, a phenomenon known as near transfer. The skills developed through training typically do not extend to different, untrained tasks; such generalization, referred to as far transfer, is rarely observed. Far transfer is especially restricted in non-speech auditory training, even when the tasks share the same stimuli (Grimault et al., 2003; Wright et al., 2010; Wright & Zhang, 2009). On the other hand, auditory training studies for CI users using speech stimuli have shown mixed results regarding task transfer. For instance, Schumann et al.(2015) found that phoneme training improved sentence recognition in noisy environments for CI users. Cheng et al.(2018) reported that training in music contour identification could enhance both lexical tone recognition and sentence comprehension for Mandarin-speaking CI users. Conversely, sentence recognition training did not lead to better vowel and consonant recognition (Fu et al., 2005), likely because the contextual cues in sentences offer limited benefit for phoneme recognition. Furthermore, Loebach & Pisoni(2008) suggested that auditory training is more likely to transfer to easier tasks (such as simple words or meaningful sentences) than to more difficult ones (such as complex words, anomalous sentences, or environmental sounds) when using CI simulation. Overall, the transfer of auditory training remains limited, highlighting the challenge of achieving broader generalization across tasks in auditory training programs.

Mandarin tone recognition is crucial for Mandarin sentence recognition, with strong positive correlations between the two in CI recipients (Chen et al., 2014; Fu & Shannon, 1998; Fu & Galvin, 2003). While targeted Mandarin tone recognition training has been shown to improve performance in related tasks (Kim et al., 2021; Wu et al., 2007; Zhang et al., 2021), it remains unclear whether targeted Mandarin tone recognition training extends its benefit to Mandarin vowel or sentence recognition, and vice versa. Given the interdependence of Mandarin tone and vowel recognition in speech perception, combined practice of both tasks may offer a more effective strategy for enhancing overall speech recognition than focusing solely on Mandarin tone or vowel recognition training. Prior combined perceptual learning studies have primarily examined tasks such as auditory frequency and temporal-interval discrimination (Wright et al., 2010), auditory amplitude modulation depth and amplitude modulation rate discrimination (Maidment et al., 2015), and visual orientation and spatial-frequency discrimination (Szpiro et al., 2014), demonstrating enhanced perception across both trained tasks. However, our investigation uniquely employed speech stimuli with CI simulation and applied a combined auditory training approach, offering a novel perspective on perceptual learning in the context of CI rehabilitation.

Here we compare the effectiveness of three training regimens to investigate whether combined Mandarin tone and vowel recognition training leads to greater improvements in Mandarin tone, vowel, and sentence recognition than tone- or vowel-only training. The first regimen, Tone-Only, involves Mandarin tone recognition training exclusively throughout the session. The second, Vowel-Only, focuses solely on Mandarin vowel recognition training for the entire session. The third, Tone-Vowel, combines Mandarin tone recognition training followed by Mandarin vowel recognition training within the same session. All three regimens employ the same total number of training trials. However, within the Tone-Vowel regimen, the number of trials allocated to Mandarin tone recognition is half that of the Tone-Only regimen, and likewise the number allocated to Mandarin vowel recognition is half that of the Vowel-Only regimen.

MATERIALS AND METHODS

Participants

Twenty-nine Mandarin-speaking NH young adults (17 females, 12 males; 19-27 years old; average, 28.6 years old) with pure-tone hearing thresholds better than 20 dB hearing level at octave frequencies from 250 to 8,000 Hz in both ears were recruited from the student population of the Southern University of Science and Technology. They provided informed consent before the study and were compensated for their participation. They all passed the Mandarin version of the Montreal Cognitive Assessment test (Nasreddine et al., 2005) with a score higher than 26 out of 30 points. They also scored at least 85% correct in a Mandarin tone and vowel recognition test of original speech without CI simulation. This study was reviewed and approved by the Institutional Review Board of Arizona State University (ASU).

Speech and signal processing

Six Mandarin vowels (/a/, /o/, /e/, /i/, /u/, and /ü/ in Pinyin) derived from the Chinese Standard Database (Wang, 1993), produced by five female and five male talkers in four lexical tones (240 stimuli), were used for both Mandarin tone and vowel recognition testing and training. Five talkers were used for Mandarin tone recognition testing (120 stimuli), and the other five talkers were used for Mandarin tone recognition training (120 stimuli) (68.20% correct vs. 68.89% correct). Twelve sets of the Mandarin hearing in noise test (MHINT) were used for the sentence recognition test in quiet before and after the training. Each set contained twenty sentences: one set was used for familiarization, five sets for the pre-test, five sets for the post-test, and one set for practice. The practice set was divided into 10 parts (two sentences each) and was used exclusively for practice sentences at the beginning of each testing session (pre- and post-tests).

The original speech, without CI simulation, was used only for screening at the beginning of the study, while the CI simulation was used for all subsequent testing and training. The CI simulation replicated the method outlined by Kim et al.(2021). A four-channel noise vocoder (Shannon et al., 1995), using the continuous interleaved sampling (Wilson et al., 1991) strategy, was chosen to mimic real CI performance. To pre-process the speech signal, a first-order Butterworth 1,200-Hz high-pass filter was applied by the vocoder. The speech was then divided into four frequency channels, ranging from 200 to 7,000 Hz, using fourth-order Butterworth bandpass filters. The cutoff frequencies for these four channels were determined by the Greenwood(1990) function, set at 200, 591, 1,426, 3,205, and 7,000 Hz. The temporal envelope for each channel was extracted via half-wave rectification, followed by a fourth-order Butterworth 300-Hz low-pass filter. The extracted envelopes were used to amplitude-modulate a wideband noise carrier, which was subsequently filtered by the corresponding bandpass filter. The amplitude-modulated, bandpass-filtered noise from each frequency channel was finally added together to generate the noise-band vocoded signal. All speech stimuli, with and without CI simulation, were normalized to the same root mean square (RMS) level to ensure consistency.
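The vocoder chain above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the sampling rate and the human Greenwood constants (A = 165.4, k = 0.88, a = 2.1) are assumptions, but with channel edges spaced equally in cochlear position they reproduce the quoted cutoffs of 200, 591, 1,426, 3,205, and 7,000 Hz.

```python
import numpy as np
from scipy.signal import butter, lfilter

FS = 16000  # assumed sampling rate; not specified in the text

def greenwood_edges(f_low=200.0, f_high=7000.0, n_channels=4):
    """Channel cutoffs equally spaced in cochlear position (Greenwood, 1990)."""
    A, k, a = 165.4, 0.88, 2.1
    pos = lambda f: np.log10(f / A + k) / a       # frequency -> cochlear position
    freq = lambda x: A * (10.0 ** (a * x) - k)    # cochlear position -> frequency
    return freq(np.linspace(pos(f_low), pos(f_high), n_channels + 1))

def vocode(speech, fs=FS, n_channels=4):
    """Noise-band vocoding: analysis filtering, envelope extraction,
    noise-carrier modulation, and channel summation."""
    target_rms = np.sqrt(np.mean(speech ** 2))    # for final RMS normalization
    b, a = butter(1, 1200 / (fs / 2), "highpass")  # 1,200-Hz pre-emphasis
    speech = lfilter(b, a, speech)
    noise = np.random.randn(len(speech))          # wideband noise carrier
    be, ae = butter(4, 300 / (fs / 2), "lowpass")  # envelope smoothing filter
    edges = greenwood_edges(n_channels=n_channels)
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        bb, ab = butter(4, [lo / (fs / 2), hi / (fs / 2)], "bandpass")
        band = lfilter(bb, ab, speech)                 # analysis band
        env = lfilter(be, ae, np.maximum(band, 0.0))   # half-wave rect. + low-pass
        out += lfilter(bb, ab, noise * env)            # modulated, band-limited noise
    return out * (target_rms / np.sqrt(np.mean(out ** 2)))
```

Note that the final scaling normalizes the vocoded output to the RMS level of the input, matching the text's statement that all stimuli were equated in RMS.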

Procedure

This study took place in a quiet room at each participant’s home, where they used their own computer and headphones. A Zoom meeting was held during each session to provide real-time support for participants and track their progress. The Gorilla Experiment Builder (Anwyl-Irvine et al., 2020) on www.gorilla.sc was used to manage task instructions, stimulus presentation, and response collection. At the start of the study, each participant adjusted the volume of a calibration 1-kHz pure tone at the normalized RMS level to be comfortably loud, and this volume was maintained throughout the sessions. The validity of online training and testing was confirmed in the Kim et al.(2021) study.

Screening

Participants initially completed a Mandarin tone and vowel recognition test without CI simulation and were required to achieve a minimum score of 85% to participate. On each trial, one of 120 testing stimuli was randomly presented. After listening to the stimulus, participants selected both the Mandarin tone and vowel simultaneously by clicking one of 24 response buttons (four tones × six vowels) shown on the computer screen. No feedback was provided.

Mandarin tone and vowel recognition test

A preview of the Mandarin tones and vowels with CI simulation was provided before the pre-test as a pre-learning session. During the preview, stimuli from each training talker were presented in the order of vowels /a/, /o/, /e/, /i/, /u/, and /ü/, and for each vowel in the order of tone 1, tone 2, tone 3, and tone 4. While participants listened to each stimulus, the corresponding Mandarin tone and vowel were shown on the computer screen. After previewing all the training stimuli once, participants performed the Mandarin tone and vowel recognition test.

Mandarin tone and vowel recognition were tested separately with CI simulation before and after the training. Each was tested twice, in ABBA or BAAB order counterbalanced across participants. Participants chose a Mandarin tone by clicking one of four response buttons (four tones) shown on the computer screen for the Mandarin tone recognition test, and chose a Mandarin vowel by clicking one of six response buttons (six vowels) for the Mandarin vowel recognition test. No feedback was provided. The results were converted into percent correct scores.

Mandarin sentence recognition test

A familiarization session using 20 Mandarin sentences with CI simulation was provided prior to the pre-test. As participants listened to each Mandarin sentence, the corresponding text was displayed on the computer screen. After completing the familiarization session, participants proceeded to the Mandarin sentence recognition test.

Mandarin sentence recognition was tested with CI simulation before and after the training. Participants listened to each Mandarin sentence and repeated it out loud. No feedback was provided. A total of 22 Mandarin sentences were presented in each test, with the first two serving as practice and excluded from the results. Only the remaining 20 Mandarin sentences were scored, and the percent correct score was based on the number of correctly repeated words.
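As a minimal illustration of this scoring rule, the percent correct could be computed as below. The position-wise, whitespace-tokenized matching is an assumption for illustration; the paper does not describe how responses were tokenized or aligned.

```python
# Illustrative word-level scoring of repeated sentences: percent of target
# words repeated correctly, matched by position (tokenization is assumed).
def score_sentences(targets, responses):
    total = correct = 0
    for target, response in zip(targets, responses):
        t_words, r_words = target.split(), response.split()
        total += len(t_words)
        correct += sum(t == r for t, r in zip(t_words, r_words))
    return 100.0 * correct / total
```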

Training regimen

Participants were randomly assigned to three groups. The first group (seven females, three males; 26-30 years old; average, 28.1 years old) received only Mandarin tone recognition training (Tone-Only); the second group (five females, five males; 26-30 years old; average, 28 years old) received only Mandarin vowel recognition training (Vowel-Only); and the third group (five females, five males; 26-30 years old; average, 28.6 years old) received both Mandarin tone and vowel recognition training (Tone-Vowel).

Training started the day after the pre-test and lasted 1 hour per day for 4 consecutive days. Each day consisted of four training sessions with feedback and two testing sessions without feedback, for a total of six sessions. Each session involved a set of 120 training trials. The three groups differed as follows: 1) the Tone-Only group underwent four sessions of Mandarin tone recognition training and two sessions of Mandarin tone recognition testing; 2) the Vowel-Only group completed four sessions of Mandarin vowel recognition training and two sessions of Mandarin vowel recognition testing; and 3) the Tone-Vowel group completed two sessions each of tone and vowel recognition training, with these sessions interleaved, along with one testing session each for tone and vowel recognition. Across all groups, the number of training stimuli provided per day was the same (Figure 1).

Figure 1.

Overview of daily training regimens for the Tone-Only, Vowel-Only, and Tone-Vowel groups. Participants engaged in 1-hour training sessions for 4 days, with each block consisting of 120 trials featuring six vowels and four tones spoken by five different talkers. The pre- and post-tests consisted of separate Mandarin tone, vowel, and sentence recognition tests.

During the training, both visual and auditory feedback were given. After each correct trial, the green text “Correct! It is x!” (with x being tone 1, 2, 3, or 4 or vowel a, e, o, u, i, or ü, depending on the training stimulus) was shown on the computer screen for 2.5 seconds. In contrast, after each incorrect trial, the red text “Incorrect! It is x!” was shown for 2.5 seconds, and the stimulus was replayed for participants to learn from the mistake.

Data analysis

Mandarin tone, vowel, and sentence recognition scores on the testing stimuli were analyzed separately using linear mixed-effects models. Training group (Tone-Only, Vowel-Only, and Tone-Vowel) and testing session (pre- and post-test) were treated as fixed effects, and subject was treated as a random effect.
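The model structure described above can be sketched as follows. This is an illustrative example with simulated scores, assuming Python's statsmodels (the paper does not report the software used); group and session enter as interacting fixed effects, with a random intercept per subject.

```python
# Illustrative linear mixed-effects model with simulated (not real) data:
# score ~ group * session with a random intercept for each subject.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for subj in range(30):                              # 10 listeners per group
    group = ["ToneOnly", "VowelOnly", "ToneVowel"][subj % 3]
    base = rng.normal(60, 5)                        # subject-specific baseline
    for session in ["pre", "post"]:
        gain = 6.0 if session == "post" and group != "VowelOnly" else 0.0
        rows.append({"subject": subj, "group": group, "session": session,
                     "score": base + gain + rng.normal(0, 2)})
data = pd.DataFrame(rows)

# Fixed effects: group, session, and their interaction; random intercept per subject.
model = smf.mixedlm("score ~ group * session", data, groups=data["subject"])
result = model.fit()
print(result.summary())
```

With three groups and two sessions, the fixed-effect part has six parameters: an intercept, two group contrasts, one session contrast, and two interaction terms.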

RESULTS

Figure 2 illustrates the Mandarin tone, vowel, and sentence recognition scores from the pre- and post-test sessions for listeners in the Tone-Only, Vowel-Only, and Tone-Vowel groups.

Figure 2.

Percent correct recognition of tone (left), vowel (middle), and sentence (right) in the pre- and post-tests across the training groups: Tone-Only (red), Vowel-Only (blue), and Tone-Vowel (green).

Mandarin tone recognition

A linear mixed model analysis revealed no significant main effect of training group on Mandarin tone recognition scores, while a significant main effect of testing session was observed (F1,27 = 40.30; p < 0.001). The interaction between training group and testing session was marginally significant (F2,27 = 3.22; p = 0.055), suggesting a potential trend for the effect of session to differ across training groups. Pairwise comparisons indicated significant improvements in Mandarin tone recognition scores after training for the Tone-Only and Tone-Vowel groups, but not for the Vowel-Only group. Specifically, the Tone-Only group showed a significant increase in mean Mandarin tone recognition scores from 60.5% to 67.1% (p < 0.001), and the Tone-Vowel group improved from 60.6% to 66.9% (p < 0.001). However, the Vowel-Only group did not exhibit a significant change in Mandarin tone recognition scores (62.0% to 63.6%; p = 0.565).

Mandarin vowel recognition

A linear mixed model analysis revealed no significant main effect of training group on Mandarin vowel recognition scores. However, a significant main effect was observed for testing session (F1,28 = 32.88; p < 0.001), along with a significant interaction between training group and testing session (F2,28 = 13.64; p < 0.001). Pairwise comparisons indicated that Mandarin vowel recognition scores significantly improved after training in the Vowel-Only and Tone-Vowel groups but not in the Tone-Only group. Specifically, the Vowel-Only group showed a significant increase in vowel recognition scores from 53.2% to 71.9% (p < 0.001), and the Tone-Vowel group improved from 54.5% to 66.6% correct (p < 0.001), whereas scores in the Tone-Only group did not significantly change (50.0% to 47.9%; p = 0.475). Additionally, post-test vowel recognition scores were significantly higher in the Vowel-Only group than in the Tone-Only group (p = 0.003), and in the Tone-Vowel group than in the Tone-Only group (p = 0.024). The mean post-test vowel recognition scores were 47.9% for the Tone-Only group, 71.9% for the Vowel-Only group, and 68.3% for the Tone-Vowel group.

Mandarin sentence recognition

A linear mixed model analysis revealed significant main effects of training group (F2,28 = 3.55; p = 0.042) and testing session (F1,28 = 4.64; p = 0.040) on Mandarin sentence recognition. However, no significant interaction was observed between training group and testing session. Pairwise comparisons indicated that Mandarin sentence recognition scores significantly improved after training for the Tone-Vowel group (p = 0.027), with mean scores increasing from 69.1% to 74.3%. In contrast, the Tone-Only and Vowel-Only groups did not show significant improvements. Notably, the pre-test Mandarin sentence recognition score of the Tone-Vowel group (69.1%) was significantly lower than that of the Tone-Only group (80.0%; p = 0.019).

DISCUSSION

This study investigated whether Mandarin tone recognition training with CI simulation improves Mandarin vowel recognition, and whether Mandarin vowel recognition training enhances Mandarin tone recognition. It also examined the effectiveness of a combined task practice approach for improving both Mandarin tone and vowel recognition, and whether these improvements transfer to Mandarin sentence recognition. The key findings are as follows. First, targeted Mandarin tone recognition training significantly improved Mandarin tone recognition, but these improvements did not transfer to Mandarin vowel recognition. Similarly, targeted Mandarin vowel recognition training significantly enhanced Mandarin vowel recognition but did not transfer to Mandarin tone recognition. In contrast, combined training in Mandarin tone and vowel recognition produced significant improvements in both Mandarin tone and vowel recognition, as well as in Mandarin sentence recognition. This finding suggests that a combined training approach can enhance multiple tasks simultaneously. However, because the interaction between training group and testing session was not significant for sentence recognition, the improvement in Mandarin sentence recognition observed in the Tone-Vowel group may not be attributable to the specific training effect, but rather could be influenced by other factors, such as baseline differences.

In line with our expectations, only near transfer was observed: the Tone-Only group showed significant improvements in Mandarin tone recognition, and the Vowel-Only group demonstrated significant improvements in Mandarin vowel recognition. These findings confirm that targeted auditory training leads to improvements in the specific tasks that were directly trained, aligning with previous research (Ingvalson et al., 2013; Kim et al., 2021; Wu et al., 2007; Zhang et al., 2021). Such improvements did not extend to untrained tasks, despite sharing the same stimuli, as has also been observed in earlier studies (Grimault et al., 2003; Meinhardt, 2002; Sigman & Gilbert, 2000; Wright et al., 2010; Wright & Zhang, 2009). These observations highlight that improvements resulting from targeted perceptual auditory training derive primarily from task-specific practice rather than mere exposure to the stimuli themselves (Brace & Sussman, 2021; Kim et al., 2021; Wright et al., 2010).

The Tone-Vowel group exhibited significant improvements in both Mandarin tone and vowel recognition, highlighting the potential of combined task practice to enhance learning of multiple tasks simultaneously. This finding aligns with previous research by Wright et al.(2010), Maidment et al.(2015), and Szpiro et al.(2014), which demonstrated the effectiveness of combined task practice in improving performance across multiple domains. For example, in a study by Szpiro et al.(2014), 28 participants were trained to discriminate Gabor patches with higher visual spatial frequency, more clockwise orientation, or both. Combined training in both spatial frequency and orientation led to improvements in both areas, whereas training on the orientation task alone did not produce significant gains, despite an equivalent amount of training.

Despite the Tone-Vowel group receiving only half the number of Mandarin tone recognition training trials of the Tone-Only group, and half the number of Mandarin vowel recognition training trials of the Vowel-Only group, its improvements in both Mandarin tone and vowel recognition were equivalent. This can be explained by the view that learning is not a direct result of stimulus exposure alone but requires a sufficient amount of task-specific practice to trigger the sensitization of neural processes. Once these processes reach a sensitized state, additional stimulus exposure, or practicing different tasks with the same stimuli, can further enhance performance on the trained tasks (Szpiro et al., 2014; Wright et al., 2010). As a result, similar improvements were observed with only half the amount of task-specific training.

Regarding Mandarin sentence recognition, improvements were observed only in the Tone-Vowel group. However, the pre-test Mandarin sentence recognition score of this group was significantly lower than that of the Tone-Only group, suggesting a possible “ceiling effect”: when baseline proficiency is already high, as appeared to be the case for the Tone-Only group, there is limited room for improvement. This possibility underscores the importance of considering baseline performance levels when interpreting the effectiveness of training interventions. Further investigation, such as incorporating background noise into the Mandarin sentence recognition task, could provide additional insight into the benefits of combined Tone-Vowel training.

This study found that, with the same amount of training, combined Mandarin tone and vowel recognition training improves both tone and vowel recognition, while training focused solely on either tone or vowel recognition enhances performance only in the specific task being trained. The CI simulation in this study effectively demonstrated the benefits of a combined auditory training approach, as well as both near and far transfer resulting from joint bottom-up training. However, the study did not account for the variability among real CI users, such as biographical, etiological, surgical, and biological factors. Given that targeted auditory training focusing on vowel, consonant, and tone recognition and on melodic contour identification has shown transfer effects on speech perception in both English-speaking CI users (Fu et al., 2005; Green et al., 2019; Ingvalson et al., 2013; Stacey et al., 2010; Zhang et al., 2012) and Mandarin-speaking CI users (Cheng et al., 2018; Wu et al., 2007; Zhang et al., 2021), a combined approach to auditory training may yield more effective outcomes in real CI users. Further studies involving real CI users are needed to provide a more comprehensive understanding of auditory training outcomes.

Notes

Ethical Statement

This study was approved by the Institutional Review Board of Arizona State University (STUDY00018133).

Declaration of Conflicting Interests

There are no conflicts of interest.

Funding

The research was supported by Arizona State University, the Southern University of Science and Technology of China, and a Gorilla grant.

Author Contributions

Conceptualization: Seeon Kim, Xin Luo. Data collection: Qi Gao. Formal analysis: Seeon Kim. Funding acquisition: Fei Chen, Xin Luo. Writing: Seeon Kim. Approval of final manuscript: all authors.

Acknowledgements

We appreciate the time and effort of all participants in this study.

References

1. Anwyl-Irvine A., Massonnié J., Flitton A., Kirkham N., Evershed J.. 2020;Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods 52:388–407.
2. Arslan N. O., Luo X.. 2022;Assessing the relationship between pitch perception and neural health in cochlear implant users. Journal of the Association for Research in Otolaryngology 23(6):875–887.
3. Brace K. M., Sussman E. S.. 2021;The role of attention and explicit knowledge in perceiving bistable auditory input. Psychophysiology 58(9):e13875.
4. Cabrera L., Liu H. M., Granjon L., Kao C., Tsao F. M.. 2019;Discrimination and identification of lexical tones and consonants in Mandarin-speaking children using cochlear implants. The Journal of the Acoustical Society of America 146(4):2291–2302.
5. Chen J. K. C., Chuang A. Y. C., McMahon C., Tung T. H., Li L. P. H.. 2014;Contribution of nonimplanted ear to pitch perception for prelingually deafened cochlear implant recipients. Otology and Neurotology 35(8):1409–1414.
6. Cheng X., Liu Y., Shu Y., Tao D. D., Wang B., Yuan Y., et al. 2018;Music training can improve music and speech perception in pediatric mandarin-speaking cochlear implant users. Trends in Hearing 22:1–12.
7. Friesen L. M., Shannon R. V., Baskent D., Wang X.. 2001;Speech recognition in noise as a function of the number of spectral channels: Comparison of acoustic hearing and cochlear implants. The Journal of the Acoustical Society of America 110(2):1150–1163.
8. Fu Q. J., Galvin J. J.. 2003;The effects of short-term training for spectrally mismatched noise-band speech. The Journal of the Acoustical Society of America 113(2):1065–1072.
9. Fu Q. J., Galvin J. J., Wang X., Nogaki G.. 2005;Moderate auditory training can improve speech performance of adult cochlear implant patients. Acoustics Research Letters Online 6(3):106–111.
10. Fu Q. J., Shannon R. V.. 1998;Effects of amplitude nonlinearity on phoneme recognition by cochlear implant users and normal-hearing listeners. The Journal of the Acoustical Society of America 104(5):2570–2577.
11. Green T., Faulkner A., Rosen S.. 2019;Computer-based connected-text training of speech-in-noise perception for cochlear implant users. Trends in Hearing 23:2331216519843878.
12. Greenwood D. D.. 1990;A cochlear frequency-position function for several species—29 years later. The Journal of the Acoustical Society of America 87(6):2592–2605.
13. Grimault N., Micheyl C., Carlyon R. P., Bacon S. P., Collet L.. 2003;Learning in discrimination of frequency or modulation rate: Generalization to fundamental frequency discrimination. Hearing Research 184(1-2):41–50.
14. Ingvalson E. M., Lee B., Fiebig P., Wong P. C.. 2013;The effects of short-term computerized speech-in-noise training on postlingually deafened adult cochlear implant recipients. Journal of Speech, Language, and Hearing Research 56(1):81–88.
15. Kim S., Chou H. H., Luo X.. 2021;Mandarin tone recognition training with cochlear implant simulation: Amplitude envelope enhancement and cue weighting. The Journal of the Acoustical Society of America 150(2):1218–1230.
16. Loebach J. L., Pisoni D. B.. 2008;Perceptual learning of spectrally degraded speech and environmental sounds. The Journal of the Acoustical Society of America 123(2):1126–1139.
17. Luo X., Fu Q. J., Wei C. G., Cao K. L.. 2008;Speech recognition and temporal amplitude modulation processing by Mandarin-speaking cochlear implant users. Ear and Hearing 29(6):957–970.
18. Luo X., Kolberg C., Pulling K. R., Azuma T.. 2020;Psychoacoustic and demographic factors for speech recognition of older adult cochlear implant users. Journal of Speech, Language, and Hearing Research 63(6):1712–1725.
19. Maidment D. W., Kang H., Gill E. C., Amitay S.. 2015;Acquisition versus consolidation of auditory perceptual learning using mixed-training regimens. PLoS One 10(3):e0121953.
20. Meinhardt G.. 2002;Learning to discriminate simple sinusoidal gratings is task specific. Psychological Research 66(2):143–156.
21. Nasreddine Z. S., Phillips N. A., Bédirian V., Charbonneau S., Whitehead V., Collin I., et al. 2005;The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society 53(4):695–699.
22. Nie K., Barco A., Zeng F. G.. 2006;Spectral and temporal cues in cochlear implant speech perception. Ear and Hearing 27(2):208–217.
23. Peng K. A., Kuan E. C., Hagan S., Wilkinson E. P., Miller M. E.. 2017;Cochlear nerve aplasia and hypoplasia: Predictors of cochlear implant success. Otolaryngology-Head and Neck Surgery 157(3):392–400.
24. Peng S. C., Tomblin J. B., Turner C. W.. 2008;Production and perception of speech intonation in pediatric cochlear implant recipients and individuals with normal hearing. Ear and Hearing 29(3):336–351.
25. Schumann A., Serman M., Gefeller O., Hoppe U.. 2015;Computer-based auditory phoneme discrimination training improves speech recognition in noise in experienced adult cochlear implant listeners. International Journal of Audiology 54(3):190–198.
26. Shannon R. V., Zeng F. G., Kamath V., Wygonski J., Ekelid M.. 1995;Speech recognition with primarily temporal cues. Science 270(5234):303–304.
27. Sigman M., Gilbert C. D.. 2000;Learning to find a shape. Nature Neuroscience 3(3):264–269.
28. Stacey P. C., Raine C. H., O’Donoghue G. M., Tapper L., Twomey T., Summerfield A. Q.. 2010;Effectiveness of computer-based auditory training for adult users of cochlear implants. International Journal of Audiology 49(5):347–356.
29. Szpiro S. F., Wright B. A., Carrasco M.. 2014;Learning one task by interleaving practice with another task. Vision Research 101:118–124.
30. Wang R. H.. 1993. The standard Chinese database (Unpublished master’s thesis). China: University of Science and Technology of China.
31. Wilson B. S., Finley C. C., Lawson D. T., Wolford R. D., Eddington D. K., Rabinowitz W. M.. 1991;Better speech recognition with cochlear implants. Nature 352(6332):236–238.
32. Wright B. A., Sabin A. T., Zhang Y., Marrone N., Fitzgerald M. B.. 2010;Enhancing perceptual learning by combining practice with periods of additional sensory stimulation. Journal of Neuroscience 30(38):12868–12877.
33. Wright B. A., Zhang Y.. 2009;A review of the generalization of auditory learning. Philosophical Transactions of the Royal Society B: Biological Sciences 364(1515):301–311.
34. Wu J. L., Yang H. M., Lin Y. H., Fu Q. J.. 2007;Effects of computer-assisted speech training on Mandarin-speaking hearing-impaired children. Audiology and Neurotology 12(5):307–312.
35. Zhang T., Dorman M. F., Fu Q. J., Spahr A. J.. 2012;Auditory training in patients with unilateral cochlear implant and contralateral acoustic stimulation. Ear and Hearing 33(6):e70–e79.
36. Zhang H., Ding H., Zhang Y.. 2021;High-variability phonetic training benefits lexical tone perception: An investigation on Mandarin-speaking pediatric cochlear implant users. Journal of Speech, Language, and Hearing Research 64(6):2070–2084.


Figure 1.

Overview of daily training regimens for the Tone-Only, Vowel-Only, and Tone-Vowel groups. Participants completed 1-hour training sessions on each of 4 days, with each block consisting of 120 trials featuring six vowels and four tones spoken by five different talkers. Both the pre- and post-tests consisted of the same separate Mandarin tone, vowel, and sentence recognition tests.

Figure 2.

Percent correct recognition of tone (left), vowel (middle), and sentence (right) in the pre- and post-tests across the training groups: Tone-Only (red), Vowel-Only (blue), and Tone-Vowel (green).