Preparation of illustrations for inclusive literature with the help of artificial intelligence models of image from text synthesis. Scientific Papers. Ukrainian Academy of Printing

Author(s)	Collection number	Pages
Dzhurynskyi Ye. A., Maik V. Z.	№ 1 (66)	155-163

Summary

A significant problem in printed inclusive literature is the preparation of illustrations, which must convey to the reader the information that the author expresses graphically. Because the target audience of inclusive literature is people with visual impairments, such illustrations must comply with specific requirements that take into account both the technical limitations of print media and the characteristics of the human tactile and nervous systems. One of the pressing problems in the field of convex-tactile illustrations is the large amount of time needed to prepare even a single illustration. This problem stems primarily from the shortage of competent personnel who have a fine arts education and the skills to prepare an image as a convex-tactile illustration for inclusive literature. In addition, such images tend to be prepared intuitively, often at the discretion of a particular editor or printer, which hinders the consistent development of illustration in inclusive literature.
Artificial intelligence tools capable of solving these problems are considered. Among them are the text-to-image synthesis models Midjourney, Stable Diffusion, and DALL·E 2. These models are powerful tools capable of solving a wide range of problems while providing highly unique and varied results. Their main advantage is the automation of the illustration preparation process. First, automation can address one of the urgent problems of the field: the shortage of competent personnel. Second, the automated approach saves significant time: an illustration that could previously take hours or days to prepare in the traditional way can be produced with the help of artificial intelligence in seconds or minutes.
Experimental studies are conducted to determine the capabilities of various text-to-image artificial intelligence models for producing illustrations for inclusive literature while observing the main principles and requirements for preparing such illustrations. The experiments show that the models considered cannot yet fully prepare images as illustrations for inclusive literature, because their output does not fully satisfy the requirements for such illustrations. Nevertheless, given the principle on which these models operate, the method has potential and can be used in solving this problem.

Keywords: visually impaired people, printing, artificial intelligence, image synthesis, text-to-image conversion, Midjourney, Stable Diffusion, DALL·E 2, information technology, inclusive technology, inclusive literature, tactile literature, image, tactile illustration, requirements for illustrations, ergonomics.

doi: 10.32403/1998-6912-2023-1-66-155-163
