Large-Scale Video Gesture Recognition Based on 3D CNN

Published: 2017-08-01 Authors: MIAO Qiguang, LI Yunan, XU Xin

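[Summary] A large-scale video gesture recognition algorithm based on a three-dimensional convolutional neural network (CNN) is proposed. First, to obtain input data of uniform scale, all input videos are normalized in the temporal domain to 32 frames. Then, to describe gesture features from different perspectives, optical flow videos are generated from the true-color (RGB) video data. The RGB and optical flow videos are each passed through the C3D model (a 3D CNN model) to extract features, which are fused by feature concatenation and fed to a support vector machine (SVM) classifier to improve recognition performance. The method achieves 46.70% accuracy on the validation set of the ChaLearn LAP Isolated Gesture Dataset (IsoGD).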

[Keywords] gesture recognition; three-dimensional convolutional neural network; optical flow; SVM
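
The temporal normalization step, which brings every input video to 32 frames, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text states only that videos are normalized to 32 frames, so uniform index sampling is an assumption, and the function name `normalize_length` and the `(T, H, W, C)` array layout are hypothetical.

```python
import numpy as np


def normalize_length(frames: np.ndarray, target_len: int = 32) -> np.ndarray:
    """Resample a clip of shape (T, H, W, C) to exactly target_len frames.

    Assumption: uniform index sampling. The source only says that each
    video is normalized to 32 frames in the temporal domain.
    """
    num_frames = frames.shape[0]
    if num_frames == 0:
        raise ValueError("clip has no frames")
    # Evenly spaced positions over [0, T-1]. Rounding repeats frames when
    # the clip is shorter than target_len and skips frames when it is longer.
    idx = np.round(np.linspace(0, num_frames - 1, target_len)).astype(int)
    return frames[idx]
```

With this sketch, a 50-frame clip is subsampled and a 10-frame clip is stretched by repeating frames; in both cases the first and last frames are kept.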

[Abstract] This paper proposes an effective 3D convolutional neural network (CNN)-based method for large-scale gesture recognition. To obtain compact and uniform data for training and feature extraction, all inputs are unified into 32-frame videos. To describe gesture features from different aspects, optical flow data are generated from the red, green, blue (RGB) videos. The spatiotemporal features of the RGB and optical flow data are then extracted separately with the C3D model (a 3D CNN model) and fused by feature concatenation to boost performance. Finally, the classes are predicted with a linear support vector machine (SVM) classifier. The proposed method achieves 46.70% accuracy on the validation set of the ChaLearn LAP Isolated Gesture Dataset (IsoGD).

[Keywords] gesture recognition; 3D CNN; optical flow; SVM
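
The fusion of the RGB and optical flow features before classification can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the per-stream features are assumed to arrive as `(num_clips, dim)` arrays already extracted by the C3D model, and the L2 normalization applied before concatenation is an added assumption that the source does not mention.

```python
import numpy as np


def l2_normalize(features: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (assumption: not stated in the source)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)


def fuse_features(rgb_feat: np.ndarray, flow_feat: np.ndarray) -> np.ndarray:
    """Concatenate per-clip RGB and optical flow features along the feature axis.

    Both inputs are (num_clips, dim) arrays; the result has shape
    (num_clips, dim_rgb + dim_flow).
    """
    if rgb_feat.shape[0] != flow_feat.shape[0]:
        raise ValueError("both streams must describe the same clips")
    return np.concatenate(
        [l2_normalize(rgb_feat), l2_normalize(flow_feat)], axis=1
    )
```

The fused vectors would then be passed to a linear SVM for training and prediction, for example scikit-learn's `LinearSVC`; that choice of library is likewise an assumption.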
