Document Analysis and Recognition is one of the foundational services for building an Enterprise Knowledge Repository and Document Archive system. In this task I studied various image processing techniques that are crucial to the successful operation of an Optical Character Recognition (OCR) system, and learned to build an image processing pipeline that feeds the transformed image data to Tesseract to extract text from images and digitize it.
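As a rough illustration of the kind of preprocessing such a pipeline might contain, the sketch below converts an image to grayscale and binarizes it with Otsu's method. The choice of these two steps, and the helper names `to_grayscale`, `otsu_threshold`, and `binarize`, are assumptions made for this example; the summary above does not say which techniques the actual pipeline used. It is written in plain Python so that it runs without any image library.

```python
def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) pixels to 8-bit luminance (ITU-R BT.601 weights)."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
        for row in rgb_image
    ]


def otsu_threshold(gray_image):
    """Return the gray level that maximizes the between-class variance."""
    histogram = [0] * 256
    for row in gray_image:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    background_count, background_sum = 0, 0
    for level in range(256):
        # Pixels with value <= level form the "background" class.
        background_count += histogram[level]
        background_sum += level * histogram[level]
        foreground_count = total - background_count
        if background_count == 0 or foreground_count == 0:
            continue
        mean_bg = background_sum / background_count
        mean_fg = (total_sum - background_sum) / foreground_count
        variance = background_count * foreground_count * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray_image, threshold):
    """Map pixels above the threshold to white (255) and the rest to black (0)."""
    return [[255 if value > threshold else 0 for value in row] for row in gray_image]


# Hypothetical toy input: a dark 2x2 "glyph" on a light 4x4 background.
light, dark = (220, 220, 220), (30, 30, 30)
page = [[light] * 4 for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        page[r][c] = dark

gray = to_grayscale(page)
threshold = otsu_threshold(gray)
binary = binarize(gray, threshold)
```

In a real pipeline the binarized image would then be handed to Tesseract, for example through a Python wrapper such as pytesseract's `image_to_string`; that step is left out here so the sketch stays self-contained.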
Vision Transformers (ViT) are among the state-of-the-art deep learning models and mark a significant advance in the field of computer vision. The idea of ViT was first introduced in the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale [4]. The paper presents a transformer-based vision framework that splits an image into fixed-size patches and applies the self-attention mechanism to the resulting patch sequence to perform image classification. It argues that large-scale training can compensate for the lack of image-specific inductive biases, such as locality and translation equivariance, that are built into CNN-based architectures.
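To make the "image as a sequence of 16x16 words" idea concrete, here is a minimal sketch in plain Python. It cuts an image into non-overlapping patches, flattens each patch into a token, and runs one step of scaled dot-product self-attention over the tokens. This is a simplification under stated assumptions: the function names `patchify` and `self_attention` are made up for this example, and the learned linear projections, class token, position embeddings, and multi-head structure of the actual model are all omitted (queries, keys, and values are simply the raw patch vectors).

```python
import math


def patchify(image, patch_size):
    """Split an H x W x C nested-list image into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = []
            for row in image[top:top + patch_size]:
                for pixel in row[left:left + patch_size]:
                    patch.extend(pixel)  # append all channel values of this pixel
            patches.append(patch)
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (Q = K = V)."""
    scale = math.sqrt(len(tokens[0]))
    outputs = []
    for query in tokens:
        # Similarity of this token to every token, turned into attention weights.
        weights = softmax(
            [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        )
        # Each output is a weighted average of all tokens.
        outputs.append(
            [sum(w * value[i] for w, value in zip(weights, tokens))
             for i in range(len(query))]
        )
    return outputs


# Hypothetical toy input: a 32x32 RGB image with values in [0, 1).
image = [[[((r + c) % 7) / 7.0] * 3 for c in range(32)] for r in range(32)]

patches = patchify(image, 16)       # 2 x 2 grid of 16x16 patches
attended = self_attention(patches)  # one attended vector per patch
```

With 16x16 patches, each RGB patch flattens to 16 * 16 * 3 = 768 values, and a 224x224 input would yield (224 / 16)^2 = 196 such tokens. Every output token is a mixture of all patches, which is how attention lets distant image regions interact without a convolution's fixed local window.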