Tokenization of vector graphics in the context of tactile graphics synthesis. Scientific Papers. Ukrainian Academy of Printing

Author(s)	Collection number	Pages
Dzhurynskyi Ye. A., Maik V. Z.	№ 2 (67)	11-20

Summary

In inclusive publishing, developing tactile illustrations requires a specific set of competencies from the designer. Fine arts specialists who also know how to produce raised (convex) tactile graphics are relatively scarce on the labor market, so finding and hiring such an employee is a non-trivial task for a publishing house, one that costs both time and money. Publishing houses are also forced to budget for training such workers. These applied problems can be addressed by information systems for tactile graphics synthesis that use artificial intelligence to partially or completely replace the designer of tactile illustrations. This work considers one stage of the tactile graphics synthesis problem: the tokenization of vector graphics, a format well suited to representing tactile graphics. Tokenization here means converting the original information (in this case, vector graphics) into another representation that is better suited for use by an artificial intelligence model. The purpose of the research is to determine whether it is expedient to tokenize tactile graphics in vector representation for the task of tactile graphics synthesis. The paper considers two tokenization methods that differ in the architecture of the underlying model: a VAE-based model and a transformer-based model. Although both models were originally developed to solve other problems, their approach can be borrowed and adapted to the problem studied here. The work analyzes both models, identifies their advantages and disadvantages, and formalizes their principle of operation. The analysis finds that the considered models nullify the advantages of the vector representation of tactile graphics, primarily because of their low bandwidth (other contributing circumstances are given in the main part of the paper). The study concludes that the examined approaches cannot be used.
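To make the notion of tokenization concrete, the following minimal sketch converts an SVG path string into a sequence of integer tokens and back. It is an illustration only and is not the scheme used by either model analyzed in the paper: the command set, the canvas size `CANVAS`, the grid resolution `GRID`, and the function names are all assumptions chosen for the example.

```python
import re

# Assumed toy vocabulary: command tokens first, then quantized coordinate bins.
COMMANDS = ["M", "L", "C", "Z"]   # absolute SVG path commands only (assumption)
GRID = 64                          # assumed number of coordinate bins
CANVAS = 256.0                     # assumed canvas size in user units

# Matches a single command letter or a simple decimal number.
_ITEM_RE = re.compile(r"[MLCZ]|-?\d+(?:\.\d+)?")


def quantize(value):
    """Map a coordinate in [0, CANVAS] to an integer bin in [0, GRID - 1]."""
    clamped = min(max(value, 0.0), CANVAS)
    return min(int(clamped / CANVAS * GRID), GRID - 1)


def tokenize_path(d):
    """Turn a path string with absolute M/L/C/Z commands into token ids."""
    tokens = []
    for item in _ITEM_RE.findall(d):
        if item in COMMANDS:
            tokens.append(COMMANDS.index(item))
        else:
            tokens.append(len(COMMANDS) + quantize(float(item)))
    return tokens


def detokenize(tokens):
    """Rebuild a path string, placing each coordinate at its bin center."""
    parts = []
    for t in tokens:
        if t < len(COMMANDS):
            parts.append(COMMANDS[t])
        else:
            center = (t - len(COMMANDS) + 0.5) * CANVAS / GRID
            parts.append(str(center))
    return " ".join(parts)


tokens = tokenize_path("M 0 0 L 128 64 Z")
print(tokens)              # [0, 4, 4, 1, 36, 20, 3]
print(detokenize(tokens))  # M 2.0 2.0 L 130.0 66.0 Z
```

In this toy scheme the round trip does not reproduce the original coordinates exactly, because each coordinate is snapped to one of `GRID` bins. This shows one way a discrete token representation can give up some of the precision that a vector format otherwise offers.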

Keywords: information technology, artificial intelligence, model, tokenization technique, illustration requirements, image processing, tactile graphics, inclusive illustration, inclusive literature, Braille.

doi: 10.32403/1998-6912-2023-2-67-11-20
