Christian Schulz’s Post

Christian Schulz

Software Engineering Manager | Project Management | People Leader | Digital Transformation | SAFe SCRUM Master | ESG Advocate | Delivery Leader

Exciting advancements at the intersection of machine learning and system architecture. A new development by Albert Gu of Carnegie Mellon University and Tri Dao of Princeton University introduces the Mamba architecture, refining the state space sequence approach with notable gains in efficiency and performance.

Mamba stands out by generating output up to five times faster at inference than transformers of similar size while matching or exceeding their accuracy, and it handles long input sequences of up to a million tokens. It achieves this with a design that keeps computational and memory demands in check; in conventional transformers, those costs escalate sharply as input length grows. Building on the structured state space sequence (S4) family of models, Mamba scales linearly with input length, unlike the quadratic growth of self-attention in vanilla transformers, which makes it a compelling alternative for processing extensive sequences without the usual burdens.

This approach marks a significant stride toward more efficient and capable AI systems and is likely to inspire further research and applications in domains such as motion analysis and vision. For anyone following the evolving landscape of AI architectures, Mamba offers a glimpse into the future of high-efficiency, high-performance computing.

#AI #MachineLearning #Innovation #DataScience #ArtificialIntelligence #TechnologyNews
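
To make the scaling argument concrete, here is a minimal, illustrative sketch in plain NumPy (not Mamba's actual implementation) of the linear-time recurrence at the heart of state space models: a fixed-size hidden state is updated once per token, so time and memory grow linearly with sequence length, whereas self-attention builds a T-by-T score matrix. The function name ssm_scan and the matrices A, B, C are placeholders of my own choosing; Mamba additionally makes these parameters input-dependent ("selective") and computes the recurrence with a hardware-aware parallel scan.

import numpy as np

def ssm_scan(u, A, B, C):
    # u: (T, d_in) input sequence; A: (n, n); B: (n, d_in); C: (d_out, n).
    # Parameter names are illustrative, not the paper's notation.
    T = u.shape[0]
    n = A.shape[0]
    x = np.zeros(n)            # hidden state: fixed size regardless of T
    ys = []
    for t in range(T):         # one pass over the sequence: O(T) time, O(n) state
        x = A @ x + B @ u[t]   # state update
        ys.append(C @ x)       # readout
    return np.stack(ys)        # (T, d_out)

# Example: a 1,000-step sequence with a 16-dimensional state
rng = np.random.default_rng(0)
u = rng.normal(size=(1000, 4))
A = 0.9 * np.eye(16)
B = 0.1 * rng.normal(size=(16, 4))
C = 0.1 * rng.normal(size=(2, 16))
y = ssm_scan(u, A, B, C)
print(y.shape)  # (1000, 2)

The point of the sketch is the cost profile, not the modeling details: doubling the sequence length doubles the work and leaves the state size unchanged, whereas attention's pairwise score matrix grows fourfold.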

DeepLearning.AI


Researchers from Carnegie Mellon University and Princeton University introduced the Mamba architecture, an approach that challenges traditional transformers in processing efficiency and memory usage. In tests, Mamba exceeded the performance of similar-sized transformers in both speed and accuracy across tasks like text generation and DNA sequence prediction. Read our summary of the paper in #TheBatch: https://hubs.la/Q02sWR050

Mamba, A New Approach That May Outperform Transformers

deeplearning.ai
