Deep Learning Models and Code for Pose Estimation

In its dense form, pose estimation aims to map the human pixels of an RGB image or video to the 3D surface of the human body. It is a multifaceted task that involves several related problems: object detection, keypoint localization, segmentation, and more.
Applications of pose estimation include problems that require going beyond plain landmark localization, such as graphics, augmented reality (AR), or human-computer interaction (HCI). Pose estimation also involves many aspects of 3D-based object recognition.
In this post, we share several open-source deep learning models and code for pose estimation. If we missed an implementation that you think deserves to be shared, leave it in the comments below.
DensePose
GitHub | Dataset | Paper
The inspiration for this post came from Facebook Research, who released their code, models, and dataset for DensePose last week. Facebook shared DensePose-COCO, a large-scale ground-truth dataset for human pose estimation. The dataset consists of image-to-surface correspondences manually annotated on 50K COCO (Common Objects in Context) images. This is a comprehensive resource for deep learning researchers, and a good source of data for pose estimation, part segmentation, and related tasks.
The DensePose paper proposes DensePose-RCNN, a variant of Mask-RCNN, to densely regress part-specific UV coordinates within every human region at multiple frames per second. It is based on DenseReg. The goal of the model is to determine, for each pixel, the body part it belongs to and its location in the 2D parameterization of that part.
DensePose adopts the architecture of Mask-RCNN with Feature Pyramid Network (FPN) features and ROI-Align pooling. Additionally, the authors introduce a fully-convolutional network on top of the ROI pooling. For more in-depth technical details, check out the DensePose paper.
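To make the per-pixel output more concrete, here is a minimal sketch of one way to hold such predictions: each pixel stores a triple (I, U, V), where I is a body-part index (0 for background) and U, V are coordinates in that part's 2D chart. This is not DensePose code. The toy array is hand-written, and the name `pixels_on_part` is hypothetical.

```python
from typing import List, Tuple

# Each pixel holds (part_index, u, v); part_index 0 means background.
# This layout is an assumption made for illustration.
IUV = List[List[Tuple[int, float, float]]]


def pixels_on_part(iuv: IUV, part: int) -> List[Tuple[int, int, float, float]]:
    """Return (row, col, u, v) for every pixel assigned to `part`."""
    hits = []
    for r, row in enumerate(iuv):
        for c, (i, u, v) in enumerate(row):
            if i == part:
                hits.append((r, c, u, v))
    return hits


# Toy 2x3 "image": two pixels on part 1, one on part 2, the rest background.
toy = [
    [(0, 0.0, 0.0), (1, 0.25, 0.50), (1, 0.30, 0.55)],
    [(0, 0.0, 0.0), (2, 0.80, 0.10), (0, 0.0, 0.0)],
]
part1 = pixels_on_part(toy, 1)
```

Looking up a pixel's (U, V) pair on the chart of its part is what ties an image location to a point on the body surface.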

OpenPose
GitHub | Dataset
OpenPose is a real-time multi-person keypoint detection library for body, face, and hand estimation, developed by the CMU Perceptual Computing Lab.
OpenPose provides both 2D and 3D multi-person keypoint detection, as well as a calibration toolbox for estimating domain-specific parameters. It accepts a wide variety of inputs: image, video, webcam, IP camera, and more. It also produces output in a wide variety of formats: rendered images and video with keypoints drawn on them (PNG, JPG, AVI), keypoints saved in readable formats (JSON, XML, YML), and even an array class. Input and output parameters are adjustable to suit different needs.
OpenPose provides a C++ API and runs on both CPU and GPU, including versions compatible with AMD graphics cards.
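As a rough illustration of consuming the JSON keypoint output mentioned above, the sketch below regroups a flat list of numbers into (x, y, confidence) triples per person. The field names `people` and `pose_keypoints_2d` are assumptions that can differ between OpenPose versions, the sample string is made up, and `parse_people` is a hypothetical helper, not part of OpenPose.

```python
import json
from typing import List, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence)


def parse_people(text: str, key: str = "pose_keypoints_2d") -> List[List[Keypoint]]:
    """Parse a keypoint JSON string into one list of keypoints per person.

    `key` names the per-person field holding a flat [x0, y0, c0, x1, ...]
    list; the default is an assumption, so check your own output files.
    """
    data = json.loads(text)
    people = []
    for person in data.get("people", []):
        flat = person.get(key, [])
        # Regroup the flat list into (x, y, confidence) triples.
        people.append([tuple(flat[i:i + 3]) for i in range(0, len(flat) - 2, 3)])
    return people


# Made-up sample with one person and two keypoints.
sample = '{"people": [{"pose_keypoints_2d": [10.0, 20.0, 0.9, 30.0, 40.0, 0.5]}]}'
people = parse_people(sample)
```

Low-confidence keypoints are usually worth filtering out before any downstream use, for example by keeping only triples whose third value exceeds a threshold you choose.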

Realtime Multi-Person Pose Estimation
GitHub | Paper
This implementation is closely related to OpenPose, and versions of its models are available in a wide variety of frameworks. The authors present a bottom-up approach to realtime multi-person pose estimation that does not rely on a person detector.
This approach uses a nonparametric representation, which the authors refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. For more technical details about the implementation and theory, refer to the paper.
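The idea behind PAF-based association can be sketched as follows: a PAF is a 2D vector field, and a candidate limb between two detected joints is scored by how well the field aligns with the segment joining them. The sketch below samples the field along the segment and averages the dot product with the segment's unit direction. It is a simplified illustration under our own assumptions (nearest-pixel sampling, a hand-made field, the hypothetical name `paf_score`), not the authors' implementation.

```python
import math
from typing import List, Tuple

# field[y][x] = (vx, vy): one 2D vector per pixel.
Field = List[List[Tuple[float, float]]]
Point = Tuple[float, float]  # (x, y)


def paf_score(field: Field, a: Point, b: Point, samples: int = 10) -> float:
    """Average alignment between the field and the segment a -> b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit vector along the candidate limb
    total = 0.0
    for k in range(samples):
        t = k / (samples - 1) if samples > 1 else 0.0
        # Nearest-pixel lookup; both endpoints must lie inside the field.
        x = int(round(a[0] + t * dx))
        y = int(round(a[1] + t * dy))
        vx, vy = field[y][x]
        total += vx * ux + vy * uy
    return total / samples


# Toy 5x5 field in which every vector points along +x.
toy_field = [[(1.0, 0.0) for _ in range(5)] for _ in range(5)]
horizontal = paf_score(toy_field, (0, 2), (4, 2))  # aligned with the field
vertical = paf_score(toy_field, (2, 0), (2, 4))    # perpendicular to the field
```

A candidate limb that follows the field scores close to 1, while one that cuts across it scores close to 0, which is what lets joints be grouped into people without a separate person detector.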
One of the best features of this approach is that it has been implemented in many different frameworks, and code and models are readily available for your framework of choice:

  • OpenPose C++ Library
  • TensorFlow implementation
  • Keras implementation one and two
  • PyTorch implementation one, two, and three
  • MXNet implementation

AlphaPose
GitHub | Paper
AlphaPose is an accurate multi-person pose estimator that claims to be the first open-source system to achieve 70+ mAP on the COCO dataset and 80+ mAP on the MPII dataset. AlphaPose performs both pose estimation and pose tracking on images, videos, or lists of images. It produces a variety of outputs, including images with keypoints drawn on them in PNG, JPEG, and AVI formats, as well as keypoints in JSON format, making it a good fit for application-focused uses.
At present, there is both a TensorFlow implementation and a PyTorch implementation.
AlphaPose uses a regional multi-person pose estimation (RMPE) framework to facilitate pose estimation in the presence of inaccurate human bounding boxes. There are three components: Symmetric Spatial Transformer Network (SSTN), Parametric Pose Non-Maximum-Suppression (NMS), and Pose-Guided Proposals Generator (PGPG). For more technical details, refer to the paper.
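To give a feel for what non-maximum suppression over poses does, here is a deliberately simplified greedy version: poses are visited from highest to lowest score, and a pose is dropped if it lies too close to one already kept. The distance used here (mean Euclidean distance between matching joints) and the fixed threshold are our own stand-ins. The paper's parametric criterion is more elaborate and is not reproduced here, and `greedy_pose_nms` is a hypothetical name.

```python
import math
from typing import List, Tuple

Pose = List[Tuple[float, float]]  # one (x, y) per joint, same joint order in every pose


def mean_joint_distance(p: Pose, q: Pose) -> float:
    """Mean Euclidean distance between corresponding joints (illustrative metric)."""
    return sum(math.hypot(a[0] - b[0], a[1] - b[1]) for a, b in zip(p, q)) / len(p)


def greedy_pose_nms(poses: List[Pose], scores: List[float], min_dist: float) -> List[int]:
    """Return indices of kept poses, highest score first."""
    order = sorted(range(len(poses)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a pose only if it is far enough from every pose kept so far.
        if all(mean_joint_distance(poses[i], poses[j]) >= min_dist for j in kept):
            kept.append(i)
    return kept


# Toy example: poses 0 and 1 are near-duplicates, pose 2 is a different person.
toy_poses = [
    [(10.0, 10.0), (10.0, 20.0)],
    [(11.0, 10.0), (11.0, 20.0)],
    [(50.0, 10.0), (50.0, 20.0)],
]
toy_scores = [0.9, 0.8, 0.7]
kept = greedy_pose_nms(toy_poses, toy_scores, min_dist=5.0)
```

The near-duplicate with the lower score is suppressed, which is the behavior needed when overlapping or inaccurate bounding boxes yield several pose estimates for the same person.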

Human Body Pose Estimation
Website | GitHub | Dataset | ArtTrack Paper | DeeperCut Paper
This code repository presents a TensorFlow implementation of the Human Body Pose Estimation algorithm described in the ArtTrack and DeeperCut papers. The model is trained on the MPII Human Pose Database, a rich collection of images for evaluating articulated human pose estimation.
This project considers the task of articulated human pose estimation of multiple people in real-world images. The authors' approach solves detection and pose estimation jointly, which differs from previous approaches that first detect people and subsequently estimate their body pose. The implementation uses CNN-based part detectors and an integer linear program. For more technical details, refer to the ArtTrack and DeeperCut papers.

DeepPose
Paper
DeepPose is a comparatively old paper, from 2014, that proposes a method for human pose estimation based on Deep Neural Networks (DNNs), formulated as a DNN-based regression problem towards body joints. It reasons about pose in a holistic fashion and has a simple yet powerful formulation.
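As a small illustration of what regression towards body joints can look like in practice, the sketch below expresses joint coordinates relative to a person's bounding box and compares a prediction against the target with a squared (L2) error. The box layout, the toy numbers, and the function names are assumptions made for this example, and no network is involved.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y)
Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height), assumed layout


def normalize(joints: List[Point], box: Box) -> List[Point]:
    """Express joints relative to the box center, scaled by the box size."""
    cx, cy, w, h = box
    return [((x - cx) / w, (y - cy) / h) for x, y in joints]


def denormalize(joints: List[Point], box: Box) -> List[Point]:
    """Map box-relative joints back to image coordinates."""
    cx, cy, w, h = box
    return [(x * w + cx, y * h + cy) for x, y in joints]


def l2_loss(pred: List[Point], target: List[Point]) -> float:
    """Sum of squared coordinate differences over all joints."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, target))


# Toy example: a 200x400 box centered at (100, 100) and two joints.
box = (100.0, 100.0, 200.0, 400.0)
joints = [(100.0, 100.0), (200.0, 300.0)]
target = normalize(joints, box)
restored = denormalize(target, box)
loss = l2_loss([(0.0, 0.0), (0.5, 0.25)], target)
```

Working in box-relative coordinates keeps the regression targets in a similar numeric range regardless of where the person appears in the image or how large they are.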
DeepPose does not appear to have an official implementation available online. However, there have been efforts to replicate its results:
  • Chainer implementation
  • TensorFlow implementation
DeepPose is interesting because it was among the first applications of deep learning to human pose estimation. It achieved state-of-the-art results at the time of its publication and provided a baseline for many of the more recent approaches.
Pose estimation is an increasingly popular problem within the computer vision community. With the recent release of new pose estimation datasets such as DensePose-COCO by Facebook Research, there are now more resources for work in this area. In our opinion, there are many directions in which you can take pose estimation, and the release of these resources is sure to spur new interest in the field. Hopefully, we'll see many new and innovative ideas and implementations soon.
Did we miss your favorite model or implementation for pose estimation? Post it in the comments below, and we'll update the post accordingly!
