Vision-based human detection by fine-tuned SSD models