Squeezing image loading performance using libjpeg-turbo in PyTorch
Sometimes in your project you seek cheap thrills out of changing a couple of lines of code. This is what happened when I came across the libjpeg-turbo project. The turbo version of libjpeg claims to be 2-6x faster than vanilla libjpeg at common image processing operations. Performance is what everyone wants, right?
I work with PyTorch, so I started looking for Python bindings for libjpeg-turbo. Fortunately, GitHub user @ajkxyz has provided CFFI bindings for libjpeg-turbo. Great! According to the author, in single-threaded mode we should expect about a 30% improvement in image loading with these bindings. I was all in. 👀
Using these Python bindings is straightforward, so I went ahead and implemented an end-to-end PyTorch dataloader that uses the libjpeg-turbo backend to load images. I also numpy-fied the random horizontal and vertical flipping operations to avoid expensive PIL.Image calls. For the code below, I assume that the image folder is flat and contains images with the same dimensions.
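Here is a minimal sketch of such a dataset. I'm assuming the bindings in question are the jpeg4py package (which needs the libturbojpeg shared library installed on the system); the class name `TurboJpegDataset` and the `flip_prob` parameter are illustrative, not the exact code from the post.

```python
import os
import random

import jpeg4py
import numpy as np
from torch.utils.data import Dataset


class TurboJpegDataset(Dataset):
    """Loads JPEGs from a flat folder, decoding with libjpeg-turbo via jpeg4py."""

    def __init__(self, root, flip_prob=0.5):
        self.paths = [
            os.path.join(root, f)
            for f in sorted(os.listdir(root))
            if f.lower().endswith((".jpg", ".jpeg"))
        ]
        self.flip_prob = flip_prob

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # jpeg4py decodes straight into an HxWx3 uint8 numpy array.
        img = jpeg4py.JPEG(self.paths[idx]).decode()
        # numpy-fied random flips: plain slicing instead of PIL.Image calls.
        if random.random() < self.flip_prob:
            img = img[:, ::-1]  # horizontal flip
        if random.random() < self.flip_prob:
            img = img[::-1, :]  # vertical flip
        # Negative-stride views must be made contiguous before tensor conversion.
        return np.ascontiguousarray(img)
```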
I use a custom collate function because torchvision's built-in ToTensor is a fairly involved transform that slows down the array-to-tensor conversion step. This collate function is borrowed from NVIDIA/apex, which lets you do half-precision training. Let's keep that discussion for another post.
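Below is a sketch in the spirit of apex's `fast_collate` (simplified here to handle image-only batches, since the flat folder above carries no labels): it copies the uint8 arrays into one pre-allocated tensor instead of converting each image to float the way ToTensor does, leaving normalization to be done later on the GPU.

```python
import numpy as np
import torch


def fast_collate(batch):
    """Collate a list of HxWx3 uint8 numpy arrays (same dimensions assumed)."""
    h, w = batch[0].shape[:2]
    tensor = torch.zeros((len(batch), 3, h, w), dtype=torch.uint8)
    for i, img in enumerate(batch):
        # HWC -> CHW without a float conversion or normalization.
        tensor[i] = torch.from_numpy(np.ascontiguousarray(img.transpose(2, 0, 1)))
    return tensor
```

Wiring it up is just a matter of passing it to the DataLoader:

```python
loader = torch.utils.data.DataLoader(
    TurboJpegDataset("images/"), batch_size=32, collate_fn=fast_collate
)
```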