High Dimensional Heterogeneous Data based Animation Techniques for Southeast Asian Intangible Cultural Heritage Digital Content



The diversity of cultural heritage in Southeast Asia is found not only in museums, temples and archaeological sites, but also in present-day daily culture, where traditional forms of cultural heritage are passed down from ancestors over many generations. The United Nations Educational, Scientific and Cultural Organisation (UNESCO) now lists many of the living traditional art forms in these countries as Intangible Cultural Heritage (ICH), including traditional folk puppetry, dances, and local operas. Many of these are in need of urgent safeguarding, as some are on the verge of extinction.

Funded by the Horizon 2020 programme of the European Commission, the European and ASEAN partners (National Centre for Computer Animation, Bournemouth University, UK; Centre de Recherche en Informatique de Lens-CNRS at University of Artois, France; Human Machine Interaction Laboratory (HMI) at Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi, Vietnam; College of Arts Media and Technology at Chiang Mai University, Thailand; College of ICT at Can Tho University, Vietnam; Vision, Virtual and Visualization Lab at Universiti Teknologi Malaysia, Malaysia) have teamed up to carry out this project. We focus mainly on performing-arts-related ICH, particularly traditional dances and folk puppetry. Our professional strengths and expertise in computer animation and visualization offer effective means of preserving traditional art forms that involve dynamic movement.

The overall aim of this project is two-fold. The first is to develop novel computer animation and visualization algorithms and tools to preserve and exploit the performing-arts-related intangible cultural heritage of Southeast Asia. These techniques will bring technical innovation to both computer animation and ICH preservation, including data capture and digitization; digital asset/data access, reuse and management; and artistic style learning and transfer. The second is to promote knowledge exchange between the EU and Southeast Asian partners with a view to developing the researchers involved: their careers will be advanced by learning from the other partners, participating in technological developments, and publishing innovative research in prestigious conferences and journals.

HMI team

No. | Work title | People | Operation in
1 | 2D & 3D skeleton creation from 2D videos | Prof. Nguyen Thanh Thuy, Dr. Ma Thi Chau, Dr. Nong Thi Hoa, Hoang Trung Kien, Nguyen Kim Hung | CRIL, France
2 | 3D skeleton creation from MoCap and MS Kinect | Assoc. Prof. Le Thanh Ha, Nguyen Xuan Thanh | HMI, VNU-UET
3 | Style learning | Assoc. Prof. Bui The Duy, Dr. Tran Quoc Long, Le Thanh Ha, Nguyen Hoang Anh, Nguyen Xuan Thanh, Nguyen Phuc Loi | NCCA, UK
4 | Skeleton and style model database | Dr. Vu Thi Hong Nhan, Nguyen Dinh Vuong | HMI, VNU-UET
5 | 3D modeling and presentation | Dr. Ngo Thi Duyen, Le Truong Giang | HMI, VNU-UET



Secondments

No. | Fullname | Host | Period
1 | Dr. Tran Quoc Long | NCCA-BU, UK | 2 months (4/15/2016 – 6/15/2016)
2 | Dr. Ma Thi Chau | CRIL-Artois, France | 7 months (4/15/2016 – 1/15/2017)
3 | MSc. Nguyen Xuan Thanh | NCCA-BU, UK | 12 months (7/15/2016 – 7/15/2017)
4 | Prof. Nguyen Thanh Thuy | CRIL-Artois, France | 3 months (10/15/2016 – 1/15/2017)
5 | Assoc. Prof. Le Thanh Ha | CRIL-Artois, France | 1 month (10/15/2016 – 11/15/2016)
6 | Assoc. Prof. Le Thanh Ha | NCCA-BU, UK | 2 months (11/15/2016 – 1/15/2017)
7 | Prof. Changhong Liu (Bournemouth University) | HMI | 10 days (1/11/2017 – 1/20/2017)
8 | MSc. Do Khac Phong | NCCA-BU, UK | 12 months (7/15/2017 – 7/15/2018)
9 | Dr. Ma Thi Chau | NCCA-BU, UK | 2 months (10/15/2017 – 12/15/2017)
10 | MSc. Nguyen Ba Tung | NCCA-BU, UK | 12 months (12/1/2017 – 12/1/2018)
11 | Assoc. Prof. Sylvain Lagrue (Université d’Artois) | HMI | 9 months (10/15/2017 – 7/15/2018)
12 | Assoc. Prof. Nathalie Chetcuti-Sperandio (Université d’Artois) | HMI | 9 months (10/15/2017 – 7/15/2018)


Data

Coming soon.



Methods

Figure 1. General framework

Create 2D skeleton from videos

This module automatically extracts 2D human pose motion from 2D color videos using the DeepCut method proposed in [1]. DeepCut formulates multi-person pose estimation as the joint partitioning and labeling of a set of body-part hypotheses generated by CNN-based part detectors; two CNN variants are proposed to generate representative sets of body-part candidates. Estimating the articulated poses of an unknown number of people is then cast as an optimization problem whose goal is to select a subset of body-part candidates as nodes of a graph, label each selected part with a class (e.g., “arm”, “leg”, “torso”), and partition the parts into groups that each belong to the same person.

Input:
– A motion capture dataset for training the convolutional neural network.
– 2D videos.

Output:
– 2D skeletons of the video characters.
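The sketch below illustrates, in Python, how such an extraction module might be driven. The detector itself is abstracted as a hypothetical `detect_poses(frame)` callback standing in for the CNN-based part detection and graph partitioning of [1]; the per-frame results are written to JSON so they can later be loaded into the editor described below.

```python
# Minimal sketch of the per-frame 2D pose extraction loop.
# `detect_poses` is a hypothetical stand-in for the DeepCut pipeline of [1]:
# it takes a BGR frame and returns, for each detected person, a list of
# (part_label, x, y, confidence) tuples.
import json
import cv2  # OpenCV, used here only for video decoding

def extract_2d_skeletons(video_path, detect_poses, out_path):
    """Run a 2D pose detector on every frame and store the results as JSON."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        people = [
            [{"part": p, "x": float(x), "y": float(y), "conf": float(c)}
             for (p, x, y, c) in person]
            for person in detect_poses(frame)
        ]
        frames.append(people)
    cap.release()
    with open(out_path, "w") as f:
        json.dump(frames, f)
```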

This video demonstrates 2D skeleton extraction and tracking in a dance video with a complicated background.

2D skeleton editor

Complex dances can cause inaccurate detection and tracking of human joints. We therefore implemented a tool to correct the skeleton motion that the DeepCut module [1] extracts automatically from 2D video. The tool provides a GUI for manually editing the 2D skeleton motion frame by frame. It supports:
– Reading a 2D video and the corresponding extracted 2D skeleton motion.
– Showing the input frame by frame, letting the user edit the 2D skeleton by dragging any joint independently.
– Saving the edited 2D skeleton.

This video demonstrates the skeleton movement editor.
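A minimal sketch of how such an editor can be built on matplotlib’s pick-and-drag events is given below. It assumes the skeleton motion is a NumPy array of shape (num_frames, num_joints, 2) in pixel coordinates, and omits video playback and frame navigation for brevity.

```python
# Minimal sketch of a frame-by-frame 2D joint editor (assumed data layout:
# joints[frame, joint] = (x, y)). Click a joint, drag it, release to drop.
import numpy as np
import matplotlib.pyplot as plt

class SkeletonEditor:
    def __init__(self, joints):
        self.joints = joints          # edited in place
        self.frame = 0                # current frame index
        self.selected = None          # joint currently being dragged
        self.fig, self.ax = plt.subplots()
        self.scatter = self.ax.scatter(*joints[self.frame].T, picker=5)
        self.fig.canvas.mpl_connect("pick_event", self.on_pick)
        self.fig.canvas.mpl_connect("motion_notify_event", self.on_move)
        self.fig.canvas.mpl_connect("button_release_event", self.on_release)

    def on_pick(self, event):
        self.selected = event.ind[0]  # index of the joint under the cursor

    def on_move(self, event):
        if self.selected is None or event.xdata is None:
            return
        self.joints[self.frame, self.selected] = (event.xdata, event.ydata)
        self.scatter.set_offsets(self.joints[self.frame])
        self.fig.canvas.draw_idle()

    def on_release(self, event):
        self.selected = None

# Demo with random data; a real session would load the extracted skeleton.
editor = SkeletonEditor(np.random.rand(100, 14, 2) * 400)
plt.show()
np.save("edited_skeleton.npy", editor.joints)  # save the corrected motion
```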

3D skeleton from 2D skeleton

This module estimates 3D human pose from 2D human pose using the method of [2]. First, a pose-dependent model of joint-angle limits is learned from a shared motion capture dataset. Then, following a general parameterization of body pose, a new multi-stage method estimates 3D pose from 2D joint locations using an over-complete dictionary of poses. Finally, the pose-dependent joint-limit model is applied to eliminate impossible poses.

Input:
– A motion capture dataset for training the pose-dependent joint-limit model.
– 2D skeleton motion.

Output:
– 3D skeleton motion.
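As a rough illustration of the dictionary-based lifting step, the following sketch fits dictionary coefficients to a single frame of 2D joints by regularized least squares. The mean pose, basis, and weak-perspective camera matrix are assumed to be given, and the camera estimation, sparsity prior, and joint-angle-limit pruning of the full method [2] are omitted.

```python
# Simplified single-frame 2D-to-3D lifting with an over-complete pose
# dictionary, in the spirit of [2] (not the authors' implementation).
import numpy as np

def lift_pose(pose_2d, mean_pose, basis, cam, lam=0.1):
    """Fit dictionary coefficients to observed 2D joints.

    pose_2d:   (J, 2) observed 2D joint locations
    mean_pose: (J, 3) mean 3D pose
    basis:     (K, J, 3) over-complete dictionary of 3D pose deviations
    cam:       (2, 3) weak-perspective projection matrix
    lam:       ridge penalty, a crude stand-in for the sparsity prior
    """
    K = basis.shape[0]
    # Column k of A is the projection of basis pose k into the image plane.
    A = np.stack([(basis[k] @ cam.T).ravel() for k in range(K)], axis=1)
    b = (pose_2d - mean_pose @ cam.T).ravel()
    # Regularized normal equations: (A^T A + lam I) c = A^T b
    coeffs = np.linalg.solve(A.T @ A + lam * np.eye(K), A.T @ b)
    return mean_pose + np.tensordot(coeffs, basis, axes=1)  # (J, 3) pose
```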

3D skeleton from Microsoft Kinect

Coming soon

3D skeleton from Motion Capture Device

Coming soon

Style learning and transfer

A deep learning framework [3] is used to extract human motion styles from skeleton movement and to generate new human motions from high-level parameters. First, a human motion manifold is trained on a large, free motion capture dataset using a convolutional auto-encoder. Motion style is then defined by the Gram matrix G, the average similarity or co-activation of the trained manifold units. Second, to map high-level parameters onto the motion manifold, a deep feed-forward network and a disambiguation network are stacked on top of the trained auto-encoder; this network is trained to produce realistic, smooth motion sequences from parameters such as a curve over the terrain that the character should follow, or a target location for punching and kicking. Combining the trained motion manifold with the trained feed-forward and disambiguation networks, the framework can learn and transfer motion style smoothly. In detail, given motion data C containing the desired content and motion data S containing the desired style, the cost function over the motion manifold H is:

Style(H) = s*|G(S) – G(H)| + c*|C – H|,

where s and c are the relative importance given to style and content, respectively. Minimizing Style(H) yields an H that has the content of C and the style of S; backpropagating H through the trained motion manifold network then generates a realistic motion corresponding to H.
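A minimal sketch of this optimization in PyTorch is shown below. It assumes C and S have already been encoded on the motion manifold as (units × frames) tensors and optimizes H directly with gradient descent; in the full framework [3], the resulting H is decoded back to joint positions through the auto-encoder.

```python
# Minimal sketch of minimizing Style(H) = s*|G(S) - G(H)| + c*|C - H|.
# C and S are assumed to be manifold encodings of shape (units, frames).
import torch

def gram(h):
    # Average co-activation of manifold units over time
    return (h @ h.t()) / h.shape[1]

def transfer_style(C, S, s=1.0, c=1.0, steps=500, lr=0.1):
    H = C.clone().requires_grad_(True)   # start from the content motion
    opt = torch.optim.Adam([H], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = s * torch.norm(gram(S) - gram(H)) + c * torch.norm(C - H)
        loss.backward()
        opt.step()
    return H.detach()  # decode through the auto-encoder to obtain the motion
```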

Input:
– A database of varied 3D human motion sequences.
– A motion sequence X containing the desired style.
– A motion sequence Y containing the desired content.

Output:
– A motion that has the content of Y and the style of X.

This video demonstrates the visual result of applying a zombie style to a neutral human motion sequence.


References

[1] Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres, Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele. 2016. DeepCut: Joint Subset Partition and Labeling for Multi Person Pose Estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016).

[2] Ijaz Akhter and Michael J. Black. 2015. Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015).

[3] Daniel Holden, Jun Saito, and Taku Komura. 2016. A Deep Learning Framework for Character Motion Synthesis and Editing. ACM Trans. Graph. 35, 4, Article 138 (July 2016), 11 pages.