Title: MAL-E: Understanding Text-to-Image Generation
Authors: Joachim, Silvia; Hennecke, Martin
Date: 2025-07-11
ISBN: 978-3-98989-054-1
URI: https://fis.uni-bamberg.de/handle/uniba/108885
Type: conference object
Language: English
Keywords: Artificial Intelligence; CS Unplugged; Inclusive Material; Generative AI; AI Education; Neural Networks; K-12 students
DDC: 004

Abstract: Generative AI, particularly image generation, has attracted considerable attention in recent years. These technologies are here to stay. Understanding how they work demystifies them, showing that they are driven by algorithms, not magic. We present a learning and experimentation module for unplugged text-to-image generation, which we have called MAL-E. It explains several key steps in the process, starting with Tokenization, Embedding, and Positional Encoding. Students learn Masked Self-Attention, an important step of the decoder-only Transformer. They also learn about Image Component Selection and Image Assembly, which produce the final image. MAL-E provides an inclusive, hands-on way to understand how pre-trained neural networks and transformers create images from text prompts.
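The Masked Self-Attention step named in the abstract can be illustrated numerically. The following NumPy toy is a minimal sketch under stated assumptions: the weight matrices are random placeholders rather than trained values, the dimensions are arbitrary, and it does not reproduce MAL-E's unplugged materials — it only shows the causal mask that lets each token attend to itself and earlier tokens.

```python
import numpy as np

def masked_self_attention(x, rng=np.random.default_rng(0)):
    """Toy causal (masked) self-attention over token embeddings
    x of shape (seq_len, d_model). Weights are random placeholders."""
    seq_len, d = x.shape
    # Hypothetical random projections for queries, keys, and values.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(d)
    # Causal mask: forbid attending to future positions.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -np.inf
    # Row-wise softmax turns scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

x = np.random.default_rng(1).standard_normal((4, 8))
out, w = masked_self_attention(x)
print(out.shape)                        # (4, 8)
print(np.allclose(w.sum(axis=1), 1.0))  # True: each row is a distribution
print(w[0, 1:])                         # first token sees no future tokens
```

Masking before the softmax is what makes the attention "decoder-only": setting future scores to negative infinity drives their softmax weights to zero, so each position is computed from the prompt's earlier tokens only.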