Pankesh Bamotra

Data Scientist, Coupang

I am a data scientist and I work on image classification, visual search, and object detection.

Projects | CV pdf


2020 lockdown blues inspired me to create, draw, sketch, cut, and fold. I'm lovin' it.



I love to read books on everything from political thought, economics, and philosophy to, of course, machine learning and programming. For that matter, the list is never-ending.

Notes coming soon | Books wishlist

📅 Weak logs *


Latest posts

18 March 2021

Working with broken images in PyTorch

Too often I’ve run into this problem with PyTorch: the DataLoader fails because there’s a bad image in the dataset. One solution would be to write a script that loads each image and deletes the bad ones. But I wanted something more elegant, and the following code is an attempt at smoothly skipping bad images within batches while also handling non-RGB images.
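The full code lives in the post itself. As a rough sketch of the general idea only (not necessarily what the post does), one common pattern is to have the dataset return `None` for an unreadable image and let a custom `collate_fn` drop those entries before batching. The names `SkipBrokenImages`, `load_rgb`, and `drop_none_collate` are made up for illustration, and the sketch assumes Pillow for decoding and a PyTorch version that exposes `torch.utils.data.default_collate`.

```python
from typing import Any, Callable, List, Optional, Sequence


class SkipBrokenImages:
    """Map-style dataset wrapper (hypothetical name).

    Returns None instead of raising when a sample cannot be loaded.
    DataLoader only needs __len__ and __getitem__ from a map-style dataset.
    """

    def __init__(
        self,
        paths: Sequence[str],
        loader: Callable[[str], Any],
        transform: Optional[Callable[[Any], Any]] = None,
    ) -> None:
        self.paths = paths
        self.loader = loader
        self.transform = transform

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Optional[Any]:
        try:
            image = self.loader(self.paths[index])
        except Exception:
            # Deliberately broad: decoders fail in many different ways.
            # Only the loader is guarded, so transform bugs still surface.
            return None
        return self.transform(image) if self.transform else image


def load_rgb(path: str) -> Any:
    """Open an image with Pillow and force 3-channel RGB.

    Assumption: converting grayscale, RGBA, palette, or CMYK images to RGB
    is acceptable for the model being trained.
    """
    from PIL import Image  # lazy import: the wrapper itself does not need Pillow

    with Image.open(path) as img:
        return img.convert("RGB")


def drop_none_collate(
    batch: List[Optional[Any]],
    collate: Optional[Callable[[List[Any]], Any]] = None,
) -> Optional[Any]:
    """Drop failed (None) samples, then collate whatever is left.

    Returns None if every sample in the batch failed, so the training loop
    has to skip None batches.
    """
    good = [sample for sample in batch if sample is not None]
    if not good:
        return None
    if collate is None:
        from torch.utils.data import default_collate  # lazy import, see above

        collate = default_collate
    return collate(good)
```

With this sketch, wiring it up would look like `DataLoader(SkipBrokenImages(paths, load_rgb, transform), batch_size=32, collate_fn=drop_none_collate)`. The trade-off is that batches can come out smaller than `batch_size`, or as `None` when every image in a batch is bad.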

18 May 2019

Efficiently processing large image datasets in Python

I have been working on computer vision projects for some time now, and moving over from the NLP domain, the first thing I realized was that image datasets are yuge! I typically process 500 GiB to 1 TB of data at a time while training deep learning models. Out of the box, I rely on PyTorch's ImageFolder class, but disk reads are so slow (innit?). I was reading through open source projects