arXiv:2406.17148v1 Announce Type: new
Abstract: In LaTeX text recognition using Transformer-based architectures, this paper identifies certain “bias” issues. For instance, $e-t$ is frequently misrecognized as $e^{-t}$. This bias stems from the inherent characteristics of the dataset. To mitigate this bias, we propose a LaTeX printed text recognition model trained on a mixed dataset of pseudo-formulas and pseudo-text. The model employs a Swin Transformer as the encoder and a RoBERTa model as the decoder. Experimental results demonstrate that this approach reduces “bias”, enhancing the accuracy and robustness of text recognition. For clear images, the model strictly adheres to the image content; for blurred images, it integrates both image and contextual information to produce reasonable recognition results.
Source link
lol