Do you think LLMs can THINK? Beyond Transformers lie new horizons in AI architecture
Exploring Revolutionary Models and Their Potential
The era of Transformer-based models has undeniably transformed the field of artificial intelligence, particularly in natural language processing.
However, as we've discussed in previous newsletters, these models come with inherent limitations that restrict their capabilities in certain areas.
Imagine this: recently…
Apple stated that “LLMs do not perform genuine reasoning.”
Yes, you heard it right! A team of researchers from Apple has recently published a paper asserting that large language models (LLMs), which underpin some of the most widely used artificial intelligence products such as ChatGPT and Llama, do not possess genuine reasoning capabilities.
According to their findings, the intelligence attributed to these models is significantly overstated. The researchers conducted a series of tests showing that the models’ apparent reasoning is largely, if not entirely, based on memorization rather than true cognitive ability.
In response, researchers and innovators are exploring new architectural paradigms that could overcome these constraints and open a completely new path toward more advanced AI systems.
Introducing Alternative Architectures
One such alternative is the Liquid Neural Network (LNN), brought into production for the first time by Liquid.ai, a startup spun out of MIT. LNNs differ fundamentally from traditional neural networks by featuring dynamic, adaptable connections, allowing them to learn and adjust in real time. This fluidity makes them particularly adept at handling tasks that require adaptation to changing conditions, a capability that is crucial for more generalized intelligence.
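To make that idea concrete, here is a toy sketch in plain NumPy of a liquid time-constant style neuron update, in which the input itself modulates how fast each neuron's state evolves. This is only an illustrative simplification, assuming a basic Euler integration step; the `ltc_step` function and all of its parameter names are mine, not Liquid.ai's code.

```python
import numpy as np

def ltc_step(x, u, W_in, W_rec, b, tau, A, dt=0.01):
    """One Euler-integration step of a toy liquid time-constant neuron layer.

    The effective time constant of each neuron depends on the current input,
    which is what makes the dynamics 'liquid' (input-dependent) rather than fixed.
    x: hidden state (n,), u: input (m,), tau: base time constants (n,),
    A: equilibrium term (n,).
    """
    # Input-dependent gate: modulates how fast each neuron's state evolves
    f = np.tanh(W_in @ u + W_rec @ x + b)
    # Continuous-time update: dx/dt = -x/tau + f * (A - x), discretized with step dt
    dxdt = -x / tau + f * (A - x)
    return x + dt * dxdt

# Toy usage: 8 hidden neurons, 3 input features, 50 time steps of random input
rng = np.random.default_rng(0)
n, m = 8, 3
x = np.zeros(n)
W_in = rng.normal(size=(n, m)) * 0.5
W_rec = rng.normal(size=(n, n)) * 0.1
b, tau, A = np.zeros(n), np.ones(n), np.ones(n)
for t in range(50):
    x = ltc_step(x, rng.normal(size=m), W_in, W_rec, b, tau, A)
print(np.round(x, 3))
```

The point of the sketch is simply that the same weights produce different effective dynamics depending on the input stream, which is the adaptability the LNN literature emphasizes.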
Another innovative approach is the RWKV model, which stands for Receptance Weighted Key Value. This architecture aims to merge the benefits of Transformers and RNNs, offering efficient parallel training together with linear-time inference. These models tackle the computational inefficiencies associated with long-sequence processing! RWKV represents a step towards more scalable and efficient AI models.
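For intuition, here is a deliberately simplified toy sketch of the idea behind RWKV's time-mixing step. It omits the per-token bonus term of the real architecture and just shows how an attention-like, exponentially decayed weighted average over the past can be maintained with two running accumulators, so each new token costs a constant amount of work at inference. The function and parameter names are mine, not the official RWKV kernel.

```python
import numpy as np

def rwkv_wkv_simplified(ks, vs, w):
    """Simplified RWKV-style 'WKV' recurrence (toy version).

    At each step the output is an exponentially time-decayed weighted average of
    past values, maintained with two running accumulators, so the per-token cost
    is O(1) in sequence length instead of O(T) as in full attention.
    ks, vs: (T, d) keys and values; w: (d,) positive decay rates.
    """
    T, d = vs.shape
    num = np.zeros(d)   # running weighted sum of values
    den = np.zeros(d)   # running sum of weights
    outputs = []
    for t in range(T):
        weight = np.exp(ks[t])
        num = np.exp(-w) * num + weight * vs[t]
        den = np.exp(-w) * den + weight
        outputs.append(num / (den + 1e-9))
    return np.stack(outputs)

# Toy usage: 16 tokens, 4-dimensional channels
rng = np.random.default_rng(1)
out = rwkv_wkv_simplified(rng.normal(size=(16, 4)),
                          rng.normal(size=(16, 4)),
                          w=np.ones(4) * 0.5)
print(out.shape)  # (16, 4)
```

During training the same quantity can be computed in parallel across the sequence, which is where the "Transformer-like training, RNN-like inference" claim comes from.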
The Resurgence of RNNs
Recurrent Neural Networks (RNNs), once the go-to architecture for sequential data, have seen a resurgence in interest as researchers seek alternatives to Transformers. Institutions like Borealis AI and Mila (Université de Montréal) are revisiting classic RNN variants like LSTMs and GRUs, exploring ways to enhance their capabilities and make them more competitive with Transformer models.
The appeal of RNNs lies in their ability to handle sequences efficiently, with computational requirements that scale linearly with sequence length. Can we create AI systems that can better manage temporal data and understand causal relationships?
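As a refresher, here is a minimal single-layer GRU in plain NumPy. It is a textbook formulation rather than any particular lab's code, and it illustrates the linear-cost property mentioned above: each token updates a fixed-size hidden state with a fixed amount of work, so total cost grows linearly with sequence length.

```python
import numpy as np

def gru_forward(xs, Wz, Uz, Wr, Ur, Wh, Uh):
    """Run a single-layer GRU over a sequence, one step at a time.

    xs: (T, m) inputs; W* matrices are (n, m), U* matrices are (n, n).
    Returns the (T, n) stack of hidden states.
    """
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    n = Wz.shape[0]
    h = np.zeros(n)
    hs = []
    for x in xs:
        z = sigmoid(Wz @ x + Uz @ h)               # update gate
        r = sigmoid(Wr @ x + Ur @ h)               # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
        h = (1 - z) * h + z * h_tilde              # blend old and new state
        hs.append(h)
    return np.stack(hs)

# Toy usage: 32 tokens with 6 features each, 10 hidden units
rng = np.random.default_rng(2)
T, m, n = 32, 6, 10
mats = [rng.normal(size=s) * 0.3 for s in [(n, m), (n, n)] * 3]
print(gru_forward(rng.normal(size=(T, m)), *mats).shape)  # (32, 10)
```

The research question is whether variants of this kind of recurrence, suitably modernized, can close the quality gap with Transformers while keeping this efficiency.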
The Role of Academia in AI Advancement
Academic institutions are at the forefront of this architectural shift, driving innovation and pushing the boundaries of what's possible in AI. However, as highlighted by Fei-Fei Li, Director of Stanford’s AI Lab, universities are facing a significant resource gap compared to tech giants.
…is the moonshot mentality to invest in public-sector AI… it’s not a secret that all the resources, both in terms of talent, data and compute, are concentrated in the big tech industry, and America’s public sector and academia are falling off the cliff pretty fast in terms of AI resources. Stanford’s natural language processing lab has 64 GPUs. — from source
Stanford’s Natural Language Processing Group has 64 GPUs for all of its work. Meanwhile, tech behemoths like Meta, Google and Microsoft funnel billions of dollars into AI. Meta alone aims to procure 350,000 of the specialized GPUs needed to run the enormous calculations behind AI models.
How is such a massive resource gap opening up, even against the country’s richest universities?
This disparity threatens to stifle academic research and could slow down progress in AI development.
Investing in academic AI research is essential for maintaining a vibrant ecosystem of ideas and ensuring that foundational breakthroughs continue to occur. Why is it so hard to see this issue?
Embracing the Future of AI
As we look to the future, it's clear that the path to more advanced AI systems will require a departure from current architectures and a willingness to explore new paradigms.
How can we accelerate the development of AI models that are not only more capable but also more aligned with human values and needs?
How can we support the role of academic researchers?
In our next newsletter, we'll take a deep dive into practical applications of these emerging architectures and discuss how they might transform various industries.
GIFT of the day: an article about Liquid Neural Networks (LNNs), a new type of artificial intelligence that works like a liquid.
This is only the start!
Hope you will find all of this useful. I am using Substack only for the newsletter: every week I share free links to my paid articles on Medium here. Follow me and read my latest articles at https://medium.com/@fabio.matricardi
Check out my Substack page if you missed some posts. And since it is free, feel free to share it!