
Pankesh Bamotra

Data Scientist, Coupang

I am a data scientist and I work on image classification, visual search, and object detection.



2020 lockdown blues inspired me to create, draw, sketch, cut, and fold. I'm lovin' it.



I love to read books on everything from political thought, economics, and philosophy to, of course, machine learning and programming. For that matter, the list is never-ending.




Latest posts

Working with broken images in PyTorch

Too often I've run into this problem with PyTorch: the DataLoader fails because there's a bad image somewhere in the dataset. One solution would be to write a script that loads each image up front and deletes the bad ones. But I wanted something more elegant, and the following code is an attempt at smoothly skipping the bad images within each batch while also being able to process non-RGB images.
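As a rough illustration of the idea (not the code from the post itself), here is a minimal sketch of one common way to skip bad samples: the dataset returns `None` when loading fails, and the collate step drops those entries before batching. The names `SkipBrokenDataset`, `drop_broken_collate`, and the `loader` argument are hypothetical and chosen only for this example. The sketch uses only the standard library so the logic stays visible.

```python
from typing import Any, Callable, List, Optional, Sequence


class SkipBrokenDataset:
    """Map-style dataset (hypothetical name) that yields None for unreadable samples."""

    def __init__(self, paths: Sequence[str], loader: Callable[[str], Any]):
        self.paths = list(paths)
        # `loader` is an assumption: any callable that turns a path into a sample
        # and raises OSError or ValueError when the file cannot be decoded.
        self.loader = loader

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> Optional[Any]:
        try:
            return self.loader(self.paths[index])
        except (OSError, ValueError):
            # Signal a broken sample instead of crashing the whole epoch.
            return None


def drop_broken_collate(
    batch: List[Optional[Any]],
    base_collate: Callable[[List[Any]], Any] = list,
) -> Optional[Any]:
    """Remove None entries, then hand the survivors to `base_collate`."""
    good = [sample for sample in batch if sample is not None]
    if not good:
        # Every sample in this batch was broken; the caller should skip it.
        return None
    return base_collate(good)
```

With PyTorch, one would presumably pass a loader that opens the file with Pillow and calls `.convert("RGB")` (which also covers grayscale and CMYK inputs), and give the `DataLoader` a `collate_fn` that calls `drop_broken_collate` with `torch.utils.data.default_collate` as `base_collate`. Batches can then come out smaller than `batch_size`, and the training loop needs to skip a batch that is `None`.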


Efficiently processing large image datasets in Python

I have been working on computer vision projects for some time now and, coming from the NLP domain, the first thing I realized was that image datasets are yuge! I typically process 500 GiB to 1 TB of data at a time while training deep learning models. Out of the box, I rely on PyTorch's ImageFolder class, but disk reads are so slow (innit?). I was reading through open source projects