I am thrilled to share our latest preprint "Peptide Sequencing Via Protein Language Models", led by first author Thuong Pham, a CS PhD student in my group. We developed a new AI method that predicts the complete sequence of a protein from limited wet lab data, which could significantly improve how we study proteins and understand diseases. This work was co-led by Joseph Buonomo's group and was a four-lab collaboration also involving Alison Ravenscraft's and Justyn Jaworski's groups.

In clinical genetics and wet lab research, everything we measure (genomics, transcriptomics) serves as a proxy for the proteome. Our new approach addresses a significant gap in protein sequencing. Protein Language Models are revolutionizing protein engineering, and in this work we show that they are also highly useful for protein measurement.

🔬 What's the breakthrough? We introduce a protein language model that can predict the complete sequence of a peptide from a limited set of amino acids. Traditionally, protein sequencing has relied on either mass spectrometry or innovative Edman degradation platforms, which often struggle to sequence non-native peptides accurately. Future innovations in click chemistry will allow non-native sequencing of chains of amino acids.

⚙️ How does it work? Our method simulates partial sequencing data by selectively masking amino acids that are challenging to identify experimentally, based on protein sequences from the UniRef database. We then architecturally modified and fine-tuned a ProtBert-derived transformer model to predict these masked residues, achieving per-amino-acid accuracy of up to 90.5% with only four amino acids ([KCYM]) known.

🧬 Why is this important? This approach allows a probabilistic reconstruction of the complete protein sequence from limited experimental data, verified through structural assessment using AlphaFold and TM-score. Our model also shows promise for evolutionary analysis across species, opening new avenues for advancements in proteomics and structural biology.

🌐 Impact: By integrating simulated experimental constraints with computational predictions, we aim to enhance protein sequence analysis, potentially accelerating discoveries in proteomics and beyond. Our new protein language model could improve diagnostics by enabling precise peptide sequencing from limited data, improving liquid biopsy accuracy. This method will help enhance the detection and profiling of low-abundance proteins in fluids like blood and urine, offering new cancer biomarkers in the form of proteoform-based diagnostics for proteins such as PD-1/PD-L1, CTLA4, EpCAM, and EpEX.

#Proteomics #Bioinformatics #ProteinSequencing #AI #MachineLearning #StructuralBiology #Genomics #Transcriptomics #Biotechnology #ResearchInnovation #ClinicalGenetics #ClickChemistry #CancerResearch #LiquidBiopsy #Diagnostics

Paper: https://lnkd.in/gB6zzqff
Code: https://lnkd.in/gyd2BBjk
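The masking step described in the post above can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the mask symbol `X`, the function names `mask_sequence` and `masked_accuracy`, and the choice to score accuracy only over masked positions are all assumptions made for the example. The only detail taken from the post is that the four residues K, C, Y, and M are treated as known.

```python
# Illustrative sketch only; names and the mask symbol are hypothetical.
KNOWN = frozenset("KCYM")  # residues treated as experimentally identifiable (from the post)
MASK = "X"                 # assumed placeholder for an unidentified residue


def mask_sequence(seq, known=KNOWN, mask=MASK):
    """Keep residues in `known`; replace every other residue with `mask`."""
    return "".join(aa if aa in known else mask for aa in seq)


def masked_accuracy(true_seq, masked_seq, predicted_seq, mask=MASK):
    """Fraction of masked positions where the prediction matches the truth.

    Assumption: accuracy is scored over masked positions only. The post
    does not say how its per-amino-acid accuracy is computed.
    """
    if not (len(true_seq) == len(masked_seq) == len(predicted_seq)):
        raise ValueError("all three sequences must have the same length")
    positions = [i for i, aa in enumerate(masked_seq) if aa == mask]
    if not positions:
        return 1.0  # nothing was masked, so nothing could be wrong
    correct = sum(true_seq[i] == predicted_seq[i] for i in positions)
    return correct / len(positions)


if __name__ == "__main__":
    true_seq = "MKTAYIAKQRC"            # made-up peptide for illustration
    masked = mask_sequence(true_seq)
    print(masked)                        # MKXXYXXKXXC
    guess = "MKTAYLAKQRC"               # a made-up prediction with one error
    print(masked_accuracy(true_seq, masked, guess))
```

In a real pipeline the masked string would be passed to a fine-tuned language model that fills in each `X`; here the prediction is hard-coded so the scoring logic can be read on its own.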
Advanced Protein Research Techniques
Summary
Advanced protein research techniques involve cutting-edge methods and technologies to study, design, and sequence proteins, which are essential molecules in biological processes. These innovations, often powered by AI, are transforming areas like drug discovery, disease diagnostics, and bioengineering.
- Adopt AI-driven tools: Utilize protein language models or generative AI systems to predict, sequence, or design proteins, significantly reducing laboratory data requirements and speeding up research processes.
- Integrate experimental data: Combine AI methodologies with experimental techniques like X-ray crystallography or cryo-EM to enhance the accuracy and reliability of protein structural predictions.
- Explore new applications: Apply these advancements to diverse fields including healthcare, environmental science, and sustainable manufacturing for innovative solutions to global challenges.
-
Google DeepMind has just introduced #AlphaProteo, an advanced AI system that's changing the game in protein design 🧬✨. But what is this for?? Here’s the scoop:

🔹 What Are Proteins and Why Do They Matter? Proteins are like the body's tiny machines: they drive almost every process in our cells, including growth, immune responses, and more. Imagine proteins as keys that fit into specific locks; their ability to bind (or attach) to each other is crucial for our health.

🔹 Meet AlphaProteo: Previous tools could predict how existing proteins interact, but AlphaProteo goes a step further. It actually designs NEW proteins that can strongly attach to specific target proteins. This capability is a huge leap forward for science and medicine!

🔹 Why Is This Important? AlphaProteo can help speed up drug development, improve disease understanding, and even create better diagnostic tools. It’s already shown success with important targets like:
➣ SARS-CoV-2 spike protein (involved in COVID-19 infection)
➣ VEGF-A (linked to cancer and diabetes complications)

🔹 Unmatched Performance:
➣ Stronger Ties: Binders designed by AlphaProteo are 3 to 300 times better at binding compared to existing methods.
➣ Quick and Accurate: High success rates mean fewer trials and faster results in the lab.

🔹 Real-World Validation: Leading research groups, including those from the Francis Crick Institute, have confirmed that AlphaProteo’s designed proteins work as promised. Some even prevented SARS-CoV-2 and its variants from infecting cells!

🔹 Future Potential: This technology could pave the way for new treatments, early disease diagnosis, and much more. It’s even being explored for applications like sustainable manufacturing processes and environmental clean-up.

Curious to learn more? Dive into the details and see how AlphaProteo is paving the way for the future of protein design👇 https://lnkd.in/gujQ2K83

#Biotech #AI #HealthTech #ProteinDesign #DrugDiscovery #Innovation #Google #DeepMind #ArtificialIntelligence #FutureofAI
-
Our innovative AI-driven approach uses a specialized protein language model to identify the most effective protein variants, reducing data requirements by over 99%. This breakthrough addresses significant data challenges in research, enabling scientific progress even with limited resources. By streamlining the research process, Capgemini's method paves the way for new innovations in healthcare, agriculture, and environmental science.

Developed in the bespoke gen AI-driven biotechnology lab of Cambridge Consultants, the deep tech powerhouse of the Capgemini Group, this methodology is already delivering remarkable results:
🚀 60% boost in plastic degradation efficiency
🌟 Sevenfold increase in the brightness of the Green Fluorescent Protein benchmark

For the pharmaceutical industry, this translates to faster drug discovery, improved diagnostic tools, and novel bioengineering applications. Capgemini is at the forefront of this exciting development, driving technological advancements in this rapidly evolving field.

#lifesciences #bioeconomy #AI #proteinengineering #biotechnology Capgemini Life Sciences and Healthcare
-
In the past month, new tools have improved protein design, benchmarking, and refinement, showing both progress and areas for improvement.

▪️ Proteína by NVIDIA is a flow-based generative model for de novo protein backbone design, trained on 21 million AlphaFold structures. Its 400M-parameter transformer supports CATH conditioning, classifier-free guidance, and autoguidance for precise structure control. Proteína generates proteins up to 800 residues, outperforming RFdiffusion and Genie2 in scale and accuracy.

▪️ MotifBench is a comprehensive benchmark for evaluating motif-scaffolding methods in protein design. Featuring 30 challenging test cases from the Protein Data Bank, it provides a standardized evaluation pipeline and identifies key limitations in methods like RFdiffusion, highlighting the need for more advanced approaches.

▪️ ROCKET enhances AlphaFold2’s protein structure predictions by integrating experimental data from X-ray, cryo-EM, and cryo-ET without requiring retraining. It refines large structural changes, improves accuracy at low resolution, and allows automated, experiment-guided model building.

At RECEPTOR.AI, we're closely following these innovations while recognizing their dependence on high-quality data. Rather than relying solely on model adjustments, we emphasize integrating experimental insights and physics-informed approaches with AI. This combined strategy helps overcome current limitations and accelerates meaningful advances in protein design and drug discovery.

GIF credit: Ian Haydon / Institute for Protein Design (via MIT News)

#ai #ml #artificialintelligence #biotech #drugdiscovery