TY - GEN
T1 - GINA: Group Gender Identification Using Privacy-Sensitive Audio Data
AU - SHEN, Jiaxing
AU - LEDERMAN, Oren
AU - CAO, Jiannong
AU - BERG, Florian
AU - TANG, Shaojie
AU - PENTLAND, Alex Sandy
N1 - This work was completed during the first author's visit to the MIT Media Lab. It was partially supported by funding for the Project of Strategic Importance provided by The Hong Kong Polytechnic University (Project Code: 1-ZE26). It was also supported by the demonstration project on large data provided by The Hong Kong Polytechnic University (project account code: 9A5V) and by the NSFC Key Grant, Project No. 61332004.
PY - 2018
Y1 - 2018
AB - Group gender is essential in understanding social interaction and group dynamics. With the increasing privacy concerns of studying face-to-face communication in natural settings, many participants are not open to raw audio recording. Existing voice-based gender identification methods rely on acoustic characteristics caused by physiological differences and phonetic differences. However, these methods might become ineffective with privacy-sensitive audio for two main reasons. First, compared to raw audio, privacy-sensitive audio contains significantly fewer acoustic features. Moreover, natural settings generate various uncertainties in the audio data. In this paper, we make the first attempt to identify group gender using privacy-sensitive audio. Instead of extracting acoustic features from privacy-sensitive audio, we focus on conversational features including turn-taking behaviors and interruption patterns. However, conversational behaviors are unstable in gender identification as human behaviors are affected by many factors like emotion and environment. We utilize ensemble feature selection and a two-stage classification to improve the effectiveness and robustness of our approach. Ensemble feature selection could reduce the risk of choosing an unstable subset of features by aggregating the outputs of multiple feature selectors. In the first stage, we infer the gender composition (mixed-gender or same-gender) of a group which is used as an additional input feature for identifying group gender in the second stage. The estimated gender composition significantly improves the performance as it could partially account for the dynamics in conversational behaviors. According to the experimental evaluation of 100 people in 273 meetings, the proposed method outperforms baseline approaches and achieves an F1-score of 0.77 using linear SVM.
KW - Gender detection
KW - Group gender identification
KW - Nonlinguistic audio analysis
UR - http://www.scopus.com/inward/record.url?scp=85061358045&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2018.00061
DO - 10.1109/ICDM.2018.00061
M3 - Conference paper (refereed)
AN - SCOPUS:85061358045
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 457
EP - 466
BT - 2018 IEEE International Conference on Data Mining (ICDM)
PB - IEEE
T2 - 18th IEEE International Conference on Data Mining, ICDM 2018
Y2 - 17 November 2018 through 20 November 2018
ER -