Large-Scale Video Gesture Recognition Based on 3D CNN

Published: 2017-08-01 Authors: Miao Qiguang (苗启广), Li Yunan (李宇楠), Xu Xin (徐昕)

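[Summary] A large-scale video gesture recognition algorithm based on a three-dimensional convolutional neural network (CNN) is proposed. First, to obtain input data of a uniform scale, all input videos are normalized in the time domain to 32 frames. Then, to describe gesture features from different perspectives, optical flow videos are generated from the true-color (RGB) video data. The RGB and optical flow videos are each passed through the C3D model (a 3D CNN model) to extract features, which are fused by feature concatenation and fed to a support vector machine (SVM) classifier to improve recognition performance. The method achieves 46.70% accuracy on the validation set of the Chalearn LAP Isolated Gesture Dataset (IsoGD).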

[Key terms] gesture recognition; three-dimensional convolutional neural network; optical flow; SVM

[Abstract] This paper proposes an effective 3D convolutional neural network (CNN)-based method for large-scale gesture recognition. To obtain compact and uniform data for training and feature extraction, all inputs are unified into 32-frame videos. To describe gesture features from different aspects, optical flow data are generated from the red, green, blue (RGB) videos. The spatiotemporal features of the RGB and optical flow data are then extracted separately with the C3D model (a 3D CNN model) and fused by feature concatenation to boost performance. Finally, the classes are predicted with a linear support vector machine (SVM) classifier. The proposed method achieves 46.70% accuracy on the validation set of the Chalearn LAP Isolated Gesture Dataset (IsoGD).

[Keywords] gesture recognition; 3D CNN; optical flow; SVM
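
The two steps of the abstract that need no trained model, temporal normalization to 32 frames and feature concatenation, can be sketched as below. This is a minimal illustration, not the authors' implementation: the abstract does not say how frames are selected, so evenly spaced sampling is an assumption, and the names `sample_frame_indices` and `fuse_features` are hypothetical. Optical flow generation, C3D feature extraction, and SVM training are left out.

```python
from typing import List, Sequence


def sample_frame_indices(num_frames: int, target_len: int = 32) -> List[int]:
    """Pick target_len frame indices spread evenly over a clip.

    Assumption: evenly spaced sampling. Clips shorter than target_len
    repeat some frames; longer clips skip some frames.
    """
    if num_frames <= 0 or target_len <= 0:
        raise ValueError("num_frames and target_len must be positive")
    if target_len == 1:
        return [0]
    step = (num_frames - 1) / (target_len - 1)
    # Round to the nearest frame and clamp to the last valid index.
    return [min(num_frames - 1, int(i * step + 0.5)) for i in range(target_len)]


def fuse_features(rgb_feat: Sequence[float], flow_feat: Sequence[float]) -> List[float]:
    """Concatenate the RGB and optical flow feature vectors into one vector."""
    return list(rgb_feat) + list(flow_feat)


if __name__ == "__main__":
    indices = sample_frame_indices(num_frames=50)
    print(len(indices), indices[0], indices[-1])
    print(fuse_features([0.1, 0.2], [0.3]))
```

In a full pipeline, the selected indices would be used to build the 32-frame RGB and optical flow clips, and the concatenated vector would be the input to the linear SVM.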
