3D motion and skeleton construction from monocular video