Meet RwaKuv: when an RNN beats Transformers
RWKV is the ultimate attention-free, multi-language model family!
Do you want to see an open-source language model that speaks and understands more than 100 languages and can compete with the giants?
Meet RWKV, a cutting-edge architecture that combines the best of both worlds: the efficiency of recurrent neural networks (RNNs) and the power of Transformers.
To my surprise, even the 1.5B and 3B parameter models are remarkably good and fast. They can be the right choice for all the PoorGPUguys like me, or for any AI enthusiast who wants to explore new opportunities.
RwaKuv (RWKV)
RWKV (pronounced RwaKuv) is an RNN with GPT-level LLM performance. Until this project started, the LLM community considered RNNs not good enough to be on par with the Transformer architecture on NLP tasks.
Even though that statement looked carved in stone, a small group of geniuses, supported by sponsors, decided to create RWKV: an open-source, non-profit project under the Linux Foundation.
They broke the taboo and demonstrated not only that RNNs are still in the game, but also that they can be trained further. As a result, we got a new family of models that:
👨🏫 can be trained directly like a GPT Transformer
🕸️ combines the best of RNNs and Transformers
🌏 supports 101 languages
⚖️📈 great performance
⏭️ fast inference
⏭️ fast training
📑📚 saves VRAM while offering practically “infinite” context length
🪆 free text embeddings
🤹♂️ 100% attention-free!
And this overcomes many of the Transformers' limitations!
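The “infinite context in constant VRAM” claim comes from RWKV's recurrent formulation: instead of attending over all past tokens, the model carries a fixed-size state that is updated one token at a time. Here is a toy, scalar sketch of that idea, loosely modeled on the WKV recurrence from the RWKV paper (real models use per-channel vectors and learned decay parameters; the numbers `w` and `u` below are made up for illustration):

```python
import math

def wkv_step(state, k, v, w=0.5, u=0.5):
    """One recurrent step: constant-size state, no growing KV cache.

    state = (a, b): running weighted sums (numerator, denominator).
    k, v: the current token's "key" and "value" scalars.
    w: decay applied to history; u: bonus weight for the current token.
    """
    a, b = state
    # Output mixes the accumulated history with the current token.
    out = (a + math.exp(u + k) * v) / (b + math.exp(u + k))
    # Decay old history by e^{-w}, then fold in the current token.
    a = math.exp(-w) * a + math.exp(k) * v
    b = math.exp(-w) * b + math.exp(k)
    return (a, b), out

# Process a "sequence": memory use stays constant no matter its length.
state = (0.0, 0.0)
outputs = []
for k, v in [(0.1, 1.0), (0.2, 2.0), (0.3, 3.0)]:
    state, out = wkv_step(state, k, v)
    outputs.append(out)

print(outputs)  # the first output equals the first value: 1.0
```

Note that the state is always just two numbers per channel, which is why context length does not blow up memory the way a Transformer's KV cache does.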
What does the community think about it?
As usual, it is easier to follow the known path: RWKV models need a different prompting strategy, so ChatGPT fans can be initially bewildered, not knowing what to do.
After the first 30 minutes of reassessing our habits, we can make full use of these multi-language models.
You can also find some pioneers on X and other social networks talking about them. The core questions for us are really only three:
Why should I use it?
How hard is it to use them?
What do you need to start (money/hardware)?
1. Why should I use it?
This is a fair question. Honestly, I was reluctant too, until I watched a few videos and started to read papers and comments.
The main push, though, was discovering that you can use the Transformers library! This means the architecture itself is not an obstacle to start experimenting with and evaluating the quality and performance of the RWKV models.
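As a minimal sketch of what that looks like, the snippet below loads an RWKV checkpoint through Hugging Face Transformers and generates a few tokens. It assumes `transformers>=4.29` (the release that added RWKV support) and `torch` are installed; the small `RWKV/rwkv-4-169m-pile` checkpoint is chosen here only to keep the download light, and larger checkpoints should load the same way:

```python
# Hedged sketch: load an RWKV model with Hugging Face Transformers on CPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/rwkv-4-169m-pile"  # small checkpoint, assumed available on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # no GPU required

inputs = tokenizer("RWKV is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```

Because the model runs through the standard `generate` API, you can reuse whatever evaluation or chat tooling you already have for GPT-style models.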
It is 100% free, open source, open code, and under the Linux Foundation. And it supports more than 100 languages, fluently.
2. How hard is it to use them?
The setup and use are really straightforward. It is totally beginner-friendly. You can read more in this week's giveaway article from Medium.
There is a convenient automatic setup with a chat interface and model handling.
3. What do you need to start (money/hardware)?
No money required! First of all, you can run the RWKV models even on a CPU-only computer or laptop. With 8 GB of RAM you can still run the weights up to 3B parameters at decent speed.
There are also no costs for inference or tokens. The LLM runs completely and only locally: no API keys, no cost per token or embedding. You can see from the table below that the so-called top models have huge costs.
If you are curious and want more...
If you want to learn more about the RWKV models and how to use them, you can read this article from Medium for free.
This is only the start!
Hope you will find all of this useful. Feel free to contact me on Medium.
I am using Substack only for the newsletter. Here, every week, I give away free links to my paid articles on Medium.
Follow me and Read my latest articles https://medium.com/@fabio.matricardi