Abstract
The reconstruction of 3D objects from monocular images is an active field of research in 3D computer vision, further boosted by advancements in deep learning. In the context of the human body, modeling realistic 3D virtual avatars from 2D images has emerged as a recent trend with the advent of AR/VR and the metaverse. The problem is challenging owing to the non-rigid nature of the human body, especially due to garments. Various attempts have been made to solve the problem, at least for relatively tight clothing styles, but loose clothing styles still pose a significant challenge. The problem has also sparked considerable interest in the fashion e-commerce domain, where the objective is to model 3D garments independently of the underlying body, in order to enable compelling applications such as virtual try-on systems. 3D garment digitization has garnered growing interest in recent years as demand for online window-shopping and other e-commerce activities has increased, driven in part by the COVID-19 pandemic.
Though the problem of 3D garment digitization seems intriguing, solving it is not as straightforward as it looks. Most existing works in the field are deep-learning-based solutions. The majority of these methods rely on predefined garment templates, which simplify the task of texture synthesis but restrict usage to a fixed set of garment styles for which templates are available. Additionally, these methods do not address issues such as complex poses and self-occlusions, which are very common in in-the-wild settings. Template-free methods have also been explored, enabling the modeling of arbitrary clothing styles; however, they lack texture information, which is essential for a high-quality photorealistic appearance. This thesis aims to resolve the aforementioned issues by providing novel solutions. The main objective is the 3D digitization of garments from a monocular RGB image of a person wearing the garment, in both template-based and template-free settings.
Initially, we address challenges in existing state-of-the-art template-based methods. We handle complex human poses, occlusions, etc., by proposing a robust keypoint regressor that estimates keypoints on the input monocular image. These keypoints define a thin-plate-spline (TPS) based warping of texture from the input image to the UV space of a predefined template. We then employ a deep inpainting network to handle missing texture information. To train these neural networks, we curate a synthetic dataset of garments with varying textures, draped on 3D human characters in various complex poses. This dataset enables robust training and generalization to real images. We achieve state-of-the-art results for specific clothing styles (e.g., t-shirts and trousers). However, template-based methods cannot model arbitrary garment styles; therefore, we next aim to handle arbitrary garment styles in a template-free setting.
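To make the TPS warping step concrete, the sketch below shows one way to warp image texture into a template's UV space given matched keypoints. This is a minimal illustration, not the thesis implementation: the function name, the use of SciPy's RBFInterpolator with its thin-plate-spline kernel for backward mapping, and the (row, col) keypoint convention are all assumptions made for exposition.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.ndimage import map_coordinates

def tps_warp_to_uv(image, img_kps, uv_kps, uv_size=(512, 512)):
    """Warp `image` into a UV texture map of size `uv_size`, given
    matched keypoints (in (row, col) order) in image space and in the
    template's UV space. Illustrative sketch only; the actual pipeline
    obtains `img_kps` from a learned keypoint regressor."""
    # Backward mapping: fit a thin-plate spline from UV keypoints to
    # image keypoints, so every UV pixel knows where to sample from.
    tps = RBFInterpolator(uv_kps, img_kps, kernel="thin_plate_spline")

    # Dense grid of UV pixel coordinates, shape (H*W, 2).
    h, w = uv_size
    rr, cc = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    uv_grid = np.stack([rr.ravel(), cc.ravel()], axis=1).astype(np.float64)

    # Source image coordinates for every UV pixel.
    src = tps(uv_grid)  # (H*W, 2)

    # Bilinearly sample each colour channel at the mapped locations.
    channels = [
        map_coordinates(image[..., c], src.T, order=1, mode="constant")
        for c in range(image.shape[-1])
    ]
    return np.stack(channels, axis=-1).reshape(h, w, image.shape[-1])
```

UV regions that map outside the garment visible in the input come back empty or unreliable; those are precisely the regions the deep inpainting network would then be trained to fill.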
Existing state-of-the-art template-free methods can model the geometric details of arbitrary garment styles to some extent, but fail to recover texture information. To model arbitrary garment geometry, we propose to use an explicit, sparse representation originally introduced for modeling the human body. This representation handles self-occlusions and loose clothing as well. We extend this representation by introducing semantic segmentation information to differentiate between the various clothing categories (top wear/bottom wear) and the human body present in the input image. Furthermore, this representation is exploited in a novel way to provide seams for texture mapping, thereby retaining high-quality textural details and enabling a range of useful applications such as texture editing, appearance manipulation, and texture super-resolution. The proposed method is the first to model arbitrary garment styles while also recovering textures.
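Since the abstract describes the representation only at a high level, the following sketch illustrates the general idea of combining a layered explicit representation with per-layer semantic segmentation: each depth layer is back-projected through pinhole camera intrinsics, and segmentation labels select the points belonging to a given garment class. The layered-depth assumption, the intrinsics matrix K, and all names here are illustrative, not the thesis's exact formulation.

```python
import numpy as np

def unproject_layer(depth, seg, K, label):
    """Back-project one depth layer into a 3D point cloud, keeping
    only pixels whose semantic label matches `label` (e.g. top wear,
    bottom wear, or body). `K` is an assumed 3x3 pinhole intrinsics
    matrix; `depth` and `seg` are per-layer H x W maps."""
    mask = (seg == label) & (depth > 0)
    vv, uu = np.nonzero(mask)          # pixel rows and columns
    z = depth[vv, uu]
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (uu - cx) * z / fx             # standard pinhole unprojection
    y = (vv - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # (N, 3) points for this class

# A garment's full geometry is the union of its points across layers,
# so occluded back-facing surfaces are recovered as well:
# points = np.concatenate([unproject_layer(d, s, K, label)
#                          for d, s in zip(depth_layers, seg_layers)])
```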
We evaluate our proposed solutions on various publicly available datasets, outperforming existing state-of-the-art methods. We also discuss the limitations of the proposed methods and outline potential solutions that can be explored. Finally, we discuss future extensions of the proposed methods. We believe this thesis significantly advances the research landscape in 3D garment digitization and accelerates progress in this direction.