From smarter LLM pruning to training performance insights on AMD Instinct MI355X + ROCm 7.0, here are three must-read updates for anyone building with AMD. 👇 Dive into the details:
- Týr-the-Pruner: Search-based Global Structural Pruning for LLMs: https://bit.ly/44bt0f8
- The vLLM MoE Playbook: A Practical Guide to TP, DP, PP and Expert Parallelism: https://bit.ly/48CPKWD
- Optimizing LLM Workloads: AMD Instinct MI355X GPUs Drive Competitive Performance: https://bit.ly/4iL04kf
AMD Developer
Semiconductor Manufacturing
Advancing AI innovation together. Built with devs, for devs. Supported through an open ecosystem. Powered by AMD.
About us
- Website: https://www.amd.com/en/developer
- Industry: Semiconductor Manufacturing
Updates
AMD Developer reposted this
48-hour AMD open robotics hackathon @ Tokyo.
Results on whether a hot dog is a sandwich or not remain inconclusive. Comment to settle the debate. 🌭
Watch 8 local LLMs on a single AMD Strix Halo mini PC debate whether a hot dog is a sandwich! Built this demo in an hour using 🍋Lemonade, which now enables many LLMs to run in parallel: https://lnkd.in/ebfd_f75 Thanks to the brilliant Steve Reinhardt for the idea :) Adrian Macias Victoria Godsoe Daniel Holanda Noronha Ramakrishnan Sivakumar Kalin Ovtcharov Tomasz Iniewicz Ramine Roane
AMD Developer reposted this
🎉 TensorWave turns 2! 🎉 Two years ago, we set out to prove that an AMD-only cloud could power serious AI at scale. Today, we’re supporting some of the most demanding AI teams with high-performance GPU clusters, open ecosystems, and a focus on real-world results... not hype. Huge thank you to our customers, partners, and the TensorWave team for believing in a different way to build AI infrastructure. Here’s to the next chapter. Head to https://tensorwave.com/ to learn more.
Training massive Mixture-of-Experts (MoE) models like DeepSeek-V3 and Llama 4 Scout efficiently is one of the toughest challenges in modern AI. These models push GPUs, networks, and compilers to their limits. To tackle this, AMD and Meta's PyTorch teams joined forces to tune TorchTitan and Primus-Turbo, AMD's open-source kernel library, for the new Instinct MI325X GPUs. Together, they reached near-ideal scaling across 1,024 GPUs, showing that efficiency and scale don't have to be a trade-off. 🖊️ AMD Contributors: Liz L., Yanyuan Qin, Yuankai Chen, Xinyu Kang, Xiaobo Chen, Zhen Huang, Shekhar Pandey, Zhenyu Gu, Andy Luo Meta Contributors: Matthias Reso, Hamid Shojanazeri, Tianyu Liu, Jiani Wang, Howard Huang, Wei Feng 📎 Read our latest blog: https://lnkd.in/gRtpteN6 #PyTorchFoundation #OpenSourceAI #TorchTitan #MoE
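"Near-ideal scaling" can be made concrete with the standard scaling-efficiency formula: achieved aggregate throughput divided by what perfect linear scaling would deliver. A minimal sketch follows; the throughput numbers are hypothetical placeholders for illustration, not figures from the blog.

```python
def scaling_efficiency(per_gpu_throughput: float,
                       total_throughput: float,
                       n_gpus: int) -> float:
    """Achieved throughput vs. perfect linear scaling across n_gpus."""
    ideal = per_gpu_throughput * n_gpus  # perfect linear scaling
    return total_throughput / ideal

# Hypothetical numbers (tokens/sec) purely for illustration:
eff = scaling_efficiency(per_gpu_throughput=1_000.0,
                         total_throughput=950_000.0,
                         n_gpus=1024)
print(f"{eff:.1%}")  # 92.8%
```

Efficiencies close to 100% at 1,024 GPUs mean communication and load imbalance overheads are nearly hidden, which is what "efficiency and scale don't have to be a trade-off" is claiming.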
Making GenAI faster, smarter, and more cost-effective. Fireworks AI is tackling all three with tuned inference, fine-tuning workflows, and a platform built for real-world scale, powered in part by AMD. Hear more from their CEO and co-founder Lin Qiao.
AMD Developer reposted this
Excited to share our NeurIPS 2025 Tutorial on How to Build Agents to Generate Kernels for Faster LLMs (and Other Models!) A collaboration across institutions: AMD, Stanford, Google DeepMind, Arm, Nvidia, Meta, Modular AI, UC Irvine, and MLCommons.
- If you're an AI researcher, check out the parts of AI compute (e.g. GPUs) that affect your inference & training jobs -- from why you should pick batch sizes in certain multiples of 2 to how your trillions of matrix multiplications make their way through different types of memory on a chip.
- Also chat with us about the trend of self-improving LLMs that can write code to make themselves even faster. Agents -> SFT data -> RL with profiling tools for strong reward signals.
- Come build with us! It's an exciting future where LLMs can make the next generation of AI research (RL, even more sparsity, hybrid models with SSMs) super fast on any new hardware generation -- on-the-fly in your IDE.
Sina Rafati, Hao Li, Azalia Mirhoseini, Anna Goldie, Laurence Moroney, Vartika Singh, Mark Saroufim, Chris Lattner, Sitao Huang, David Kanter, Simon Guo, Kesavan Ramakrishnan, Vincent Ouyang, Tim Gianitsos, Nithyashree Manohar, Sharon Zhou, with acknowledgements to the AMD GEAK team Zicheng Liu, Dong Li, Ziqiong Liu, Pratik Prabhanjan Brahma and the AMD ROCm Kernel team ZHAOYI LI and the AMD Omni team Muhammad Awad, Keith Lowery, Cole Ramos and kernel engineers John Tyler, Muhammad Osama and of course the one and only Emad Barsoum.
When they're out, we'll drop:
- arXiv paper link
- GitHub repo link
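The "batch sizes in certain multiples of 2" point comes from how GPU matrix units process work in fixed-size tiles: a dimension that isn't a tile multiple gets padded up, and the padding is wasted compute. A toy sketch of that effect, assuming a hypothetical 64-wide tile (real tile sizes vary by GPU, data type, and kernel):

```python
import math

def round_up(n: int, multiple: int) -> int:
    """Pad a dimension up to the next tile multiple."""
    return math.ceil(n / multiple) * multiple

def wasted_fraction(batch: int, tile: int = 64) -> float:
    """Fraction of compute spent on padding when batch isn't a tile multiple.

    tile=64 is an illustrative assumption, not a specific GPU's tile width.
    """
    padded = round_up(batch, tile)
    return (padded - batch) / padded

print(wasted_fraction(256))  # 0.0       -- aligned batch, no waste
print(wasted_fraction(257))  # 0.196875  -- padded to 320, ~20% wasted
```

One element over the multiple can cost an entire extra tile of work, which is why power-of-two-aligned batch and hidden sizes tend to be faster.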