Diffusion Models in Biotechnology

Divya Ramamoorthy | Mar 27, 2025 min read

Different types of data require distinct generative AI models. Large language models excel at generating text, while diffusion models are designed for image generation. This article delves into how diffusion models work and their applications in biotechnology.

Understanding Diffusion Models

Diffusion models operate on the principle that refining an image step-by-step is easier than generating it from scratch. Think of painting: an artist begins with rough shapes and gradually adds detail. Diffusion models follow a similar approach:

  1. Forward Diffusion: Noise is incrementally added to an image, progressively degrading its quality. This process is used to create the training data for a diffusion model.

  2. Reverse Diffusion: A model is trained to remove the noise and reconstruct the original image. The accuracy of the reconstruction determines the model’s performance.

Diffusion process

The diffusion process involves gradually adding noise to an image (forward diffusion) and then training a model to remove that noise step-by-step (reverse diffusion).

Core Components of Diffusion Models

Noise Scheduler: The noise scheduler determines the amount and pattern of noise applied at each timestep. Noise can be added linearly or through non-linear functions like cosine or quadratic distributions.

Noise-Predictor Model: The noise-predictor model estimates the noise present at each step. Traditionally, diffusion models used a U-Net architecture, originally developed for image segmentation. However, recent advancements have shifted toward transformer-based architectures, especially in protein diffusion models.

Conditional Diffusion: Conditional diffusion allows image generation based on additional inputs, such as text descriptions. For example, in text-conditioned generation, a text encoder converts prompts into numerical representations. These are incorporated into the noise prediction model through techniques like cross-attention, which maps parts of the input to corresponding text instructions.

Variational Autoencoder: Rather than operating in full-resolution pixel space, diffusion models can work in a lower-dimensional latent space. This approach, introduced in the Latent Diffusion Model (Rombach et al., 2022), improves efficiency by compressing images and working with the compressed data directly.

Applications in Biology

Diffusion models excel at capturing spatial patterns, making them highly valuable in designing 3D molecular structures. Their applications include:

  • Protein structure design
  • Protein-ligand interactions
  • Small molecule generation

One of the most mature applications is de novo protein design, where new protein structures are generated from scratch. Proteins are composed of chains of amino acids, each linked with specific angles and orientations. Diffusion models can represent proteins as a set of 3D coordinates and orientations, allowing for the generation of realistic and functional structures. By applying constraints that enforce symmetrical properties, researchers can improve model performance and generate more stable and functional protein structures.

RFDiffusion Image of generative protein diffusion model, from “De novo design of protein structure and function with RFdiffusion” (Watson et al., 2023)

What’s on the horizon?

Flow-matching is a promising advance over diffusion models. Diffusion models are often time-consuming to train and sensitive to the training schedule. Flow-matching offers a more robust and efficient framework, enabling faster convergence and improved results.

Additionally, hybrid approaches that combine diffusion models with graph neural networks (GNNs) are being explored to better capture molecular relationships and further enhance performance in biodesign.

Conclusion

While diffusion models have shown remarkable promise in biotechnology, particularly in protein design and molecular generation, they still face significant challenges. Generating realistic 3D structures is often the first step of many to get from an idea to a potent, manufacturable drug. However, with innovations like flow-matching and hybrid architectures, the field is making steady progress. These advancements pave the way for more efficient and accurate models, aiming to accelerate breakthroughs in drug discovery and synthetic biology.