Vectors Unleashed: AI Machine Learning in Capsid Engineering

24 January 2025

Viral vectors, such as AAV, are used for the delivery of RNA and DNA therapeutics. Machine learning, a type of AI, is enabling the rapid iterative development and improvement of these therapeutic vectors.

AAV Vectors

Parvoviruses are a group of small, non-enveloped DNA viruses that can infect many different species, tissues, and cell types. Because they can target so many different cells, scientists have been exploring their potential to deliver genes for treating human cancer and other diseases. Research shows that these viruses can be very useful for gene therapy, but also that it is important to control where and how the genes are delivered in the body. Fortunately, viruses have preferences for particular cell types, known as "tropism". One promising example is adeno-associated virus (AAV), a type of parvovirus. AAV’s virus shell, or capsid, can be easily modified, which allows scientists to expand its ability to target specific cells more effectively.

AAV Capsid

AAV capsids consist of 60 viral protein (VP) molecules: a mixture of three overlapping gene products, VP1, VP2, and VP3, encoded by the cap open reading frame (ORF) and arranged with T = 1 icosahedral symmetry. The surface of the AAV capsid has special structures, called motifs, that are essential for attaching to the host cell, which is the first step in infection. Different types (serotypes) of AAV use different receptors on the host cell. For example, AAV2 primarily binds to a receptor called heparan sulfate proteoglycan (HSPG) on the cell surface. Once AAV2 attaches, it can use additional receptors, such as αVβ5 integrin, fibroblast growth factor receptor-1 (FGFR-1), or hepatocyte growth factor receptor (c-Met), to enter the cell and start the infection process. Research in this area aims to help scientists design AAV capsids that target certain cells more specifically, making them more effective for gene therapy. By understanding how the virus interacts with different receptors, scientists can modify the capsid so that it targets only the desired cells.

Capsid Engineering

Recent cases of severe AAV toxicity, including patient deaths, have raised concerns about the safety of AAV gene therapy. To address these concerns, scientists have developed several strategies to reduce immune responses against AAV capsids or the therapeutic genes they deliver. Some of these strategies include using immunosuppressants, creating capsid decoys, performing plasmapheresis, using IgG protease, depleting CpG motifs, and inducing regulatory T cells. However, immune responses remain a significant challenge for AAV gene therapy, limiting who can receive treatment, increasing safety risks, and reducing the long-term effectiveness of the therapy. Additionally, the efficiency of current AAV vectors in delivering genes needs further improvement.

As a result, there is a strong demand for new AAV capsids with better properties. This has led to many recent collaborations focused on developing next-generation AAV vectors. To discover improved capsids for better gene delivery, researchers are using three main approaches: rational design, directed evolution, and in silico design. These methods aim to create AAV-based therapies that are more effective, safer, and more durable.

Schematic diagram of capsid engineering to improve CNS tropism. (A) Methods for AAV capsid diversification. (B) Workflow for capsid engineering, including multi-species selection and tailored enhancement of cellular tropism, immune evasion, and viral packaging. Credit: Ghauri, M. S., & Ou, L. (2023). AAV Engineering for Improving Tropism to the Central Nervous System. Biology, 12(2), 186. https://doi.org/10.3390/biology12020186

Rational Design

Rational design is a method of engineering AAV capsids by using existing knowledge of AAV biology to create different capsid variants. These variants are then tested and refined to improve specific functions, like how well the capsid can bind to receptors, enter cells, or move within the cell. To guide this process, researchers need a deep understanding of AAV biology to know which parts of the capsid can be modified for better performance.

One of the early successes in rational design of AAV capsids involved using site-directed mutagenesis to target specific tyrosine residues on the capsid surface, improving gene delivery to the central nervous system (CNS). Additionally, researchers have inserted high-affinity ligands or peptides into the capsid to enhance its ability to target specific cells. Using rational design, scientists have also placed cell-targeting peptides, identified through bacteriophage display libraries, at precise locations within the capsid protein sequence, such as between residues 587 and 588 of VP1. These engineered capsids improve gene delivery, allowing for lower doses of viral vectors while still delivering more genetic material.
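
The two kinds of edit described above, swapping a single surface residue and inserting a peptide at a chosen position, can be sketched as simple string operations on a protein sequence. This is only an illustration: the function names, the 10-residue toy sequence, the choice of phenylalanine as the replacement residue, and the inserted peptide are all assumptions made for the example, not real capsid data. With a full VP1 sequence, the insertion discussed above would correspond to a call such as insert_between(vp1, 587, peptide).

```python
def substitute(protein: str, position: int, new_residue: str, expected: str) -> str:
    """Replace the residue at a 1-based position, after checking what is there now."""
    idx = position - 1
    if not 0 <= idx < len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[idx] != expected:
        raise ValueError(f"expected {expected} at {position}, found {protein[idx]}")
    return protein[:idx] + new_residue + protein[idx + 1:]


def insert_between(protein: str, left_position: int, peptide: str) -> str:
    """Insert a peptide between 1-based residues left_position and left_position + 1."""
    if not 1 <= left_position < len(protein):
        raise ValueError(f"no residue pair starts at position {left_position}")
    return protein[:left_position] + peptide + protein[left_position:]


# Hypothetical toy sequence (NOT a real VP1 sequence), with a tyrosine at position 5.
toy_protein = "MSTAYGLPDW"

# Point mutation: tyrosine (Y) to phenylalanine (F), an assumed replacement.
point_mutant = substitute(toy_protein, 5, "F", expected="Y")

# Peptide insertion between residues 8 and 9; "GGSGG" is a placeholder peptide.
insertion_mutant = insert_between(toy_protein, 8, "GGSGG")

print(point_mutant)
print(insertion_mutant)
```

Checking the expected residue before substituting guards against off-by-one numbering errors, which are easy to make when positions are quoted relative to VP1 but the sequence in hand starts elsewhere.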

Directed Evolution

Directed evolution was first used to change the cellular targeting (tropism) of native AAV capsids. One simple way to create diverse variants is through error-prone PCR, which introduces random mutations into the AAV cap sequence. This method has been applied to various AAV serotypes, allowing researchers to select for improved capsid versions. While error-prone PCR creates small changes, combining it with other techniques like DNA shuffling and random peptide display can further optimize capsids for better gene delivery. Directed evolution has been particularly valuable in altering AAV's natural tropism and improving its ability to target specific tissues, expanding research and therapeutic opportunities, especially in understanding neurophysiology and diseases in the central nervous system (CNS).
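
As a rough illustration of how error-prone PCR diversifies a sequence, the sketch below copies a DNA string and substitutes each base with a small probability, producing a library of variants. It is a simplified model under stated assumptions: real error-prone PCR has a biased mutation spectrum and can also introduce insertions and deletions, the mutation rate is an arbitrary value, and the parent sequence is an arbitrary toy fragment rather than a real cap sequence.

```python
import random

BASES = "ACGT"


def error_prone_copy(seq: str, mutation_rate: float, rng: random.Random) -> str:
    """Copy a DNA sequence, substituting each base with probability mutation_rate."""
    copied = []
    for base in seq:
        if rng.random() < mutation_rate:
            # Substitute with one of the three other bases, chosen uniformly.
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def make_library(parent: str, n_variants: int, mutation_rate: float, seed: int = 0) -> list:
    """Generate n_variants independently mutated copies of the parent sequence."""
    rng = random.Random(seed)  # seeded so the toy library is reproducible
    return [error_prone_copy(parent, mutation_rate, rng) for _ in range(n_variants)]


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Arbitrary toy fragment used only for illustration.
parent = "ATGTCAACGGCATACGGTCTGCCAGATTGG"
library = make_library(parent, n_variants=5, mutation_rate=0.05)

for variant in library:
    print(variant, "mutations:", hamming(parent, variant))
```

In a directed evolution experiment the library would then go through selection, for example recovery from a target tissue, and the surviving variants would seed the next round of mutagenesis.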

(Link for the two sections above: https://pmc.ncbi.nlm.nih.gov/articles/PMC9953251/#:~:text=There%20have%20been%20many%20endeavors,evolution%2C%20and%20in%20silico%20design)

Machine Learning

Machine learning (ML) is a branch of artificial intelligence (AI) that enables a computer or other system to learn and improve from experience. ML behaviour is not explicitly programmed; instead, the algorithms learn from training data. For example, a model might learn from a data set that a particular behaviour (e.g. smoking) increases the likelihood of a particular result (lung cancer). ML can identify subtle as well as clear associations and correlations, and models can stay flexible and improve as they process more data. ML is widely used in fields like computer vision, speech recognition, and medicine.

There are three main types of machine learning models and algorithms, each designed for specific tasks:

Supervised Learning: In this type, the model learns by example. Developers provide a dataset with input and corresponding output, helping the AI recognize patterns to predict outcomes for new data. There are two main types of supervised learning:
- Regression models: These predict a continuous value from the relationships between variables.
- Classification models: These categorize items into groups.
Unsupervised Learning: In unsupervised learning, there is no human guidance. The AI is given large amounts of data and must find patterns on its own. A common approach in unsupervised learning is clustering, where the AI groups similar items together based on patterns in the data, even if the data is unlabelled.
Reinforcement Learning: This model learns through feedback, where the AI interacts with an environment and receives rewards or penalties based on its actions. Over time, the algorithm learns to take actions that maximize long-term rewards.
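
The supervised and unsupervised approaches listed above can be illustrated with a few lines of plain Python. This is a minimal sketch, not a recommended toolkit: the three methods chosen (a least-squares line for regression, a nearest-centroid rule for classification, and k-means for clustering) are just simple representatives of each family, and all numbers and labels are invented toy values. Reinforcement learning is left out for brevity.

```python
from statistics import mean


def fit_line(xs, ys):
    """Supervised regression: least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_centroids(points, labels):
    """Supervised classification: learn one mean point (centroid) per label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: tuple(mean(c) for c in zip(*pts)) for label, pts in groups.items()}


def predict(centroids, point):
    """Assign a new point the label of its nearest centroid."""
    return min(centroids, key=lambda label: sq_dist(centroids[label], point))


def kmeans(points, init_centers, n_iter=10):
    """Unsupervised clustering: k-means with a fixed number of iterations."""
    centers = list(init_centers)
    for _ in range(n_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(centers[i], p))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster; keep it if the cluster is empty.
        centers = [tuple(mean(c) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers, clusters


# Toy data, invented for illustration only.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (4.8, 5.2)]
labels = ["low", "low", "high", "high"]          # labels are given: supervised
centroids = fit_centroids(points, labels)
new_label = predict(centroids, (4.0, 4.5))

centers, clusters = kmeans(points, init_centers=[(0.0, 0.0), (6.0, 6.0)])  # no labels used

print(slope, intercept)
print(new_label)
print(clusters)
```

The key difference is visible in the function signatures: the two supervised functions need the known outputs (ys or labels) to learn from, while kmeans receives only the points and has to discover the two groups by itself.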

Mikos et al. developed a model to predict residue mutations that improve AAV production fitness. This model uses evolutionary principles and structural AAV data, along with microenvironment characteristics, without needing domain knowledge or experimental fitness data. By analysing the 3D positions of residues and their surrounding environment, the model identifies beneficial mutations that enhance production fitness. The results showed that the model's predictions led to a threefold increase in the percentage of beneficial mutations compared to random mutations. This approach could accelerate the development of new AAV variants with better production fitness, improving their potential as clinical pharmaceuticals.

Limitations of ML

Models trained solely to evaluate a virus's ability to replicate (fitness) may overlook other crucial steps in the infection process, such as how the virus binds to cells, enters the cell, or escapes after entry. Similarly, models that focus on how virus parts interact may not account for factors like how effectively antibodies can neutralize the virus. To overcome these limitations, newer machine learning methods combine multiple factors (such as virus productivity, cell binding, and neutralization) into a single framework. This allows the model to optimize for all these aspects simultaneously, providing a more comprehensive understanding of the virus's behaviour and improving predictions for its performance.
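
One simple way to picture combining several properties into a single objective is a weighted average of per-property predictions. The sketch below is an assumption-laden illustration, not the method used by any particular published framework: the class and function names, the equal default weights, and every score are hypothetical placeholder values chosen to show how a variant that looks best on one property can lose out once all three are considered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantScores:
    """Hypothetical model predictions for one capsid variant, each scaled to 0..1."""
    name: str
    production: float      # predicted production fitness (higher is better)
    binding: float         # predicted target-cell binding (higher is better)
    neutralization: float  # predicted antibody neutralization (lower is better)


def combined_score(v: VariantScores, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted average of the three properties, with neutralization inverted."""
    w_prod, w_bind, w_neut = weights
    total = (w_prod * v.production
             + w_bind * v.binding
             + w_neut * (1.0 - v.neutralization))
    return total / (w_prod + w_bind + w_neut)


def rank(variants, weights=(1.0, 1.0, 1.0)):
    """Sort variants from best to worst combined score."""
    return sorted(variants, key=lambda v: combined_score(v, weights), reverse=True)


# Placeholder values invented for illustration; not experimental data.
variants = [
    VariantScores("variant_A", production=0.9, binding=0.2, neutralization=0.8),
    VariantScores("variant_B", production=0.6, binding=0.7, neutralization=0.3),
    VariantScores("variant_C", production=0.3, binding=0.9, neutralization=0.6),
]

best_by_production = max(variants, key=lambda v: v.production)
ranking = rank(variants)

print("best by production alone:", best_by_production.name)
print("combined ranking:", [v.name for v in ranking])
```

In this toy example the variant with the highest production score ranks last once binding and neutralization are included, which is the failure mode of single-objective models described above. The weights are a design choice: raising one weight expresses that the corresponding property matters more for a given therapy.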

Image: AI and machine learning in medicine. Credit: Cell Guidance Systems
