Beyond Content Generation: AI-Based Layout Generation for Graphic Design

At Wix, we’re actively exploring how people can use creativity tools powered by AI technology to improve and reinvent the website building experience. The traditional website design process involves several steps: selecting a layout that balances text and visual content, crafting compelling titles and textual content, and uploading relevant visuals.

Wix’s AI group is working on introducing innovative machine learning and algorithmic solutions to enhance and automate such workflows. For example, the Wix AI Text Creator suite leverages the generative capabilities of Large Language Models to enable users to create compelling titles and engaging content for their websites.

Wix also employs advanced text-to-image models, like DALL-E, to infuse users’ websites with fresh and relevant visual content.

AI Layout Generation

Even with all these tools, designing layouts for professional-looking websites remains a challenge. The task is complex, requires design expertise and consumes considerable time. As a result, users are often faced with the choice of selecting from a set of pre-designed templates or hiring a professional designer.

To broaden our users' creative options, Wix's AI research group is exploring ways to automate the creation of custom layouts. Our goal is to not merely suggest suitable layout templates for a site, but to create unique and highly personalized layouts.

In our recent work on conditioned layout generation, presented at ICCV 2023, we introduced the Diffusion Layout Transformer (DLT), a general framework for layout generation. This framework features a flexible conditioning mechanism that offers users intuitive yet detailed control over the design process, while ensuring high quality outputs.

Our research primarily focuses on enhancing the website building experience. However, the flexible and general design of DLT makes it suitable for a wide variety of graphic design applications. These include creating layouts for mobile app user interfaces, as well as generating designs for information slides, magazines, scientific papers, infographics, indoor scenes, and more.

The natural representation of layouts contains both continuous attributes, like the size and location of visual components, and discrete attributes like the category of components.

Unlike previous approaches, which use either continuous or discrete layout attributes, our approach provides a generative diffusion process that operates jointly on both. Additionally, our framework offers flexible layout editing and enables conditioning on any subset of the component attributes.

AI layout generation in the design process

To offer flexible layout design, we allow practitioners to fix specific component attributes, such as class (e.g., image, text, button) and position. Once these attributes are set, the system generates the remaining attributes. Alternatively, users can place a few components and let the generative model complete the layout.

DLT allows us to generate layouts for a broad range of design tasks in a single framework:

Unconditioned: generating a layout from scratch, often used for inspiration.

Conditioned: predefining certain components and their attributes, like their positions on the canvas, and letting the generative process fill in the rest.

The ability to condition layout generation on a subset of component attributes, such as specific buttons, images, or text, is essential for real-world applications involving user interaction.
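As a rough illustration of conditioning on a subset of attributes, the sketch below derives a per-attribute mask from a partially specified layout and overlays the user's fixed values on a generated layout. The attribute names, the `condition_mask` and `apply_conditioning` helpers, and the hard-coded `generated` values are all hypothetical stand-ins; the post does not describe how DLT enforces conditions internally.

```python
ATTRS = ("category", "x", "y", "w", "h")

def condition_mask(partial_layout):
    """True where the user fixed an attribute, False where the model decides."""
    return [{a: comp.get(a) is not None for a in ATTRS} for comp in partial_layout]

def apply_conditioning(generated, partial_layout):
    """Keep user-fixed attributes and take everything else from the model output."""
    mask = condition_mask(partial_layout)
    merged = []
    for gen, known, m in zip(generated, partial_layout, mask):
        merged.append({a: known[a] if m[a] else gen[a] for a in ATTRS})
    return merged

# The user fixes a button's class and position, and only the class of an image.
partial_layout = [
    {"category": "button", "x": 0.35, "y": 0.85},
    {"category": "image"},
]

# Stand-in for the output of a generative model (values are made up).
generated = [
    {"category": "text", "x": 0.1, "y": 0.2, "w": 0.3, "h": 0.08},
    {"category": "image", "x": 0.1, "y": 0.3, "w": 0.8, "h": 0.5},
]

final_layout = apply_conditioning(generated, partial_layout)
```

An empty `partial_layout` entry (`{}`) leaves a component fully open, so the same bookkeeping covers both the unconditioned and the conditioned case.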

Representing layouts for generative models

Layouts are typically represented as a set of components, where each component includes several attributes such as category (e.g., image, title, button), position, and size.

It's important to note that real-world applications often incorporate additional layout attributes like color, text style, and even content to achieve modern and creative designs.
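The representation described above could be sketched as a small data structure. This is only an illustration: the category vocabulary, the `Component` class, the top-left box convention, and the normalization to the unit canvas are assumptions, not the format used in DLT.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical category vocabulary; each dataset defines its own.
CATEGORIES = ["image", "title", "text", "button"]

@dataclass
class Component:
    category: int  # discrete attribute: index into CATEGORIES
    x: float       # continuous attributes: left edge, normalized to [0, 1]
    y: float       # top edge
    w: float       # width
    h: float       # height

def to_vectors(layout: List[Component]) -> Tuple[List[int], List[List[float]]]:
    """Split a layout into its discrete part (classes) and continuous part (boxes)."""
    classes = [c.category for c in layout]
    boxes = [[c.x, c.y, c.w, c.h] for c in layout]
    return classes, boxes

layout = [
    Component(CATEGORIES.index("title"), 0.1, 0.05, 0.8, 0.1),
    Component(CATEGORIES.index("image"), 0.1, 0.2, 0.8, 0.5),
    Component(CATEGORIES.index("button"), 0.35, 0.8, 0.3, 0.08),
]
classes, boxes = to_vectors(layout)
```

Keeping the discrete and continuous parts separate makes it straightforward to treat them with different noise processes later on. Extra attributes such as color or text style would be added as further fields.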

Our proposed architecture

At the core of our architecture, we employ a Transformer encoder. This design choice not only makes our model non-autoregressive but also allows for flexible conditioning during inference, which differs from previous layout generation methods. During training, we randomly hide (mask) entire components within the layout or specific attributes, and the model learns to generate the full layout from what remains visible. This approach strengthens our model’s robustness to diverse conditioning scenarios.
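The random masking used during training can be pictured with a short sketch. It picks one of two modes per training sample: hiding whole components, or hiding specific attribute types across the layout. The function name, the two-mode split, and the probabilities `p_component` and `p_hide` are assumptions made for illustration; the post does not state the actual masking schedule.

```python
import random

ATTRS = ("category", "x", "y", "w", "h")

def sample_training_mask(num_components, rng, p_component=0.5, p_hide=0.5):
    """Return mask[i][attr]: True if the value stays visible as a condition,
    False if it is hidden and must be generated by the model."""
    if rng.random() < p_component:
        # Mode 1: hide entire components (all attributes share one flag).
        return [dict.fromkeys(ATTRS, rng.random() >= p_hide)
                for _ in range(num_components)]
    # Mode 2: hide specific attribute types in every component.
    visible = {a: rng.random() >= p_hide for a in ATTRS}
    return [dict(visible) for _ in range(num_components)]

mask = sample_training_mask(3, random.Random(0))
```

Because the model sees many different masks during training, any subset of attributes supplied at inference time resembles a situation it has already encountered.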

Like common image diffusion models, such as DALL-E 2, our model takes component embeddings and a diffusion timestep as inputs. After multiple diffusion iterations, it outputs clean coordinates and classes for the components.

Our model employs a new approach we call a joint discrete-continuous diffusion process. The layout diffusion process starts from a layout sampled from the data distribution, with all of its component attributes. We apply a continuous diffusion process to the continuous attributes, like size or position, and a discrete diffusion process to the discrete attributes, like component category. The model is trained using a combined loss function that integrates both the discrete and continuous parts.
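To make the idea of a joint process more concrete, here is a minimal sketch of one forward (noising) step and a combined loss. It makes several assumptions that are not taken from the post: a cosine noise schedule, Gaussian noise for boxes, an absorbing "masked" class for the discrete part, and a loss that adds mean squared error to a weighted cross-entropy. All names (`alpha_bar`, `noise_layout`, `combined_loss`, `MASK`) are hypothetical.

```python
import math
import random

MASK = -1  # hypothetical id for the absorbing "masked" class

def alpha_bar(t, T):
    """Assumed cosine schedule: signal level 1 at t=0, close to 0 at t=T."""
    return math.cos(0.5 * math.pi * t / T) ** 2

def noise_layout(boxes, classes, t, T, rng):
    """Noise both parts of a layout at the same timestep t."""
    a = alpha_bar(t, T)
    # Continuous part: blend each coordinate with Gaussian noise.
    noisy_boxes = [
        [math.sqrt(a) * v + math.sqrt(1 - a) * rng.gauss(0, 1) for v in box]
        for box in boxes
    ]
    # Discrete part: keep the class with probability a, otherwise mask it.
    noisy_classes = [c if rng.random() < a else MASK for c in classes]
    return noisy_boxes, noisy_classes

def combined_loss(pred_boxes, true_boxes, class_probs, true_classes, weight=1.0):
    """Mean squared error on coordinates plus weighted cross-entropy on classes."""
    count = sum(len(box) for box in true_boxes)
    mse = sum(
        (p - q) ** 2
        for pb, tb in zip(pred_boxes, true_boxes)
        for p, q in zip(pb, tb)
    ) / count
    ce = -sum(
        math.log(probs[c]) for probs, c in zip(class_probs, true_classes)
    ) / len(true_classes)
    return mse + weight * ce

rng = random.Random(0)
boxes = [[0.1, 0.05, 0.8, 0.1], [0.1, 0.2, 0.8, 0.5]]
classes = [1, 0]
noisy_boxes, noisy_classes = noise_layout(boxes, classes, t=50, T=100, rng=rng)
```

At `t=0` the layout is returned unchanged, and near `t=T` the boxes are almost pure noise and the classes are almost always masked, so a single timestep controls the corruption level of both attribute types.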

Results

To get a feel for our model’s results, let’s look at the diffusion process output under three common conditioning settings.

The first example shows output with no conditions set. We specify the total number of components and then the model generates the layout from scratch.

In the next example, we specify the types of components we have and ask the model to generate the positions and sizes of the components.

In the last example, we specify the sizes of the components and let the model position them.

Evaluation

To evaluate the generated layouts, we used popular layout datasets for diverse graphic design tasks, including annotated document images (PubLayNet), Android UI screens (RICO), and digital images (Magazine).

We evaluated the generated layouts using common design aesthetics metrics. We found that our proposed solution outperformed existing ones on layout synthesis and editing tasks, while maintaining computational complexity that is on par with previous approaches.
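The post does not list the individual metrics. As one example of the kind of aesthetics measure often reported in layout generation work, the sketch below computes the average pairwise intersection area between component boxes, where lower values mean fewer overlapping components. The exact definition and normalization vary between papers, so treat this as an assumed variant rather than the metric used in the DLT evaluation.

```python
from itertools import combinations

def intersection_area(a, b):
    """Boxes are (left, top, width, height) in normalized canvas units."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return max(dx, 0.0) * max(dy, 0.0)

def mean_pairwise_overlap(boxes):
    """Average intersection area over all unordered pairs of boxes."""
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    return sum(intersection_area(a, b) for a, b in pairs) / len(pairs)

overlapping = [(0.0, 0.0, 0.5, 0.5), (0.25, 0.25, 0.5, 0.5)]
disjoint = [(0.0, 0.0, 0.25, 0.25), (0.5, 0.5, 0.25, 0.25)]
overlap_score = mean_pairwise_overlap(overlapping)
disjoint_score = mean_pairwise_overlap(disjoint)
```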

You can find more details about our evaluation methodology and experimental results in our DLT paper.

Next steps

Generative AI opens up new horizons for graphic design applications. But utilizing AI for website creation should extend beyond design. It should be combined with a comprehensive system of business functionalities, like SEO, analytics, payments, and more.

The current AI revolution is just beginning to unleash AI’s true potential. We believe that AI can reduce complexity and create value for our users, further driving innovation in the website building space.

Over the next few years, we anticipate that AI technologies will bring many opportunities to significantly improve how people and organizations create and maintain their online presence.

This post was written by Dr. Eli Brosh

Acknowledgement: We would like to thank the many people who contributed to this post and offered valuable feedback on initial drafts: Elad Levi, Mykola Mykhailych, Meir Perez, Shai Hoshkover and David Nukrai. A special thanks to Olga Diadenko for her help.
