One-Man-Band Generative AI
Learn everything you can do with only llama.cpp and any computer. All for free, and locally.
Generative AI requires computing power and heavy libraries.
Almost two years ago, Georgi Gerganov started a revolution: he wrote, in pure C++, an inference library that can run a Large Language Model on basically every PC on the planet.
Over time, the library inspired others too: with the same core library you can now also do Automatic Speech Recognition (ASR), image generation, and image-to-text.
In this newsletter I will give away to you all of it. It is my Christmas present, after all!
Table of Contents
- Talk with your documents
- Speech recognition
- Create images
Talk with your documents - aka how to create a full RAG application for free, locally on your PC
The quest for a portable and slim Large Language Model application is a long journey. Until yesterday I thought I had to stick to PyTorch forever.
llama.cpp is an amazing library: with about 50 MB of code you can run high-performing AI models right on your PC.
But when it comes to talking to your documents, you need embeddings, and that usually means an extra 1.5 GB of PyTorch.
But what if there is a way around it?
In this article, I will guide you step by step in creating a full Talk-to-your-documents chatbot using only llama-cpp-python.
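To give you a taste of the full tutorial, here is a minimal sketch of the whole RAG loop using only llama-cpp-python for both embeddings and chat. The GGUF filenames are placeholders (assumptions): swap in any embedding model and any instruct model you have downloaded.

```python
# Minimal local RAG sketch with llama-cpp-python only — no PyTorch.
# Model filenames below are ASSUMED placeholders, not part of the tutorial.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def top_k(query_vec, doc_vecs, k=2):
    """Indices of the k document vectors closest to the query."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:k]


def answer(question, docs):
    # Heavy part: only runs when the GGUF files are actually on disk.
    from llama_cpp import Llama

    embedder = Llama(model_path="all-MiniLM-L6-v2.Q8_0.gguf",  # assumed filename
                     embedding=True, verbose=False)
    doc_vecs = [embedder.embed(d) for d in docs]
    q_vec = embedder.embed(question)
    context = "\n".join(docs[i] for i in top_k(q_vec, doc_vecs))

    llm = Llama(model_path="qwen2.5-1.5b-instruct.Q4_K_M.gguf",  # assumed filename
                n_ctx=4096, verbose=False)
    out = llm.create_chat_completion(messages=[
        {"role": "system",
         "content": "Answer using only the context below.\n" + context},
        {"role": "user", "content": question},
    ])
    return out["choices"][0]["message"]["content"]
```

The retrieval step is just cosine similarity over the embedding vectors, so no vector database is needed for a handful of documents.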
Advanced Voice Recognition at your fingertips, no GPU required. A full tutorial, from scratch!
Born from the quest to bridge the gap between dreams and hardware realities, llama.cpp defies conventions, turning modest CPUs into engines of speech-recognition excellence and challenging the belief that only elite hardware can fuel AI dreams.
How does it defy the odds? And how can we use it too?
Buckle up and follow me through the article: you will build yourself a beautiful application, with Streamlit and whisper.cpp, running on your computer.
What is Whisper?
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
OpenAI open-sourced the models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing. And this is great news for us!
In OpenAI's own words: "We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English."
If you like a little more technical explanation, here you have a little glimpse:
Whisper.cpp is a high-performance inference implementation of OpenAI's Whisper automatic speech recognition (ASR) model, written entirely in C++.
Following the same principles as llama.cpp, Georgi Gerganov made another miracle happen: this C/C++ implementation basically lets anyone use Whisper.
The entire high-level implementation of the model is contained in whisper.h and whisper.cpp. The rest of the code is part of the ggml machine learning library. Such a lightweight implementation makes it easy to integrate the model into different platforms and applications.
Seeing is believing: so here, as a gift, is a full tutorial to run it on your laptop.
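As a teaser of what the tutorial builds, here is a sketch of how a Python app (a Streamlit one, for instance) can call the whisper.cpp command-line tool to transcribe a WAV file. The binary and model paths are assumptions: point them at wherever you compiled whisper.cpp and downloaded a ggml model.

```python
# Sketch: drive the whisper.cpp CLI from Python.
# "./main" and the model path are ASSUMED locations — adjust to your build.
import subprocess


def build_whisper_cmd(binary, model, wav, language="en"):
    """Assemble the whisper.cpp CLI invocation (-otxt writes a .txt transcript)."""
    return [binary, "-m", model, "-f", wav, "-l", language, "-otxt"]


def transcribe(wav_path):
    cmd = build_whisper_cmd("./main",                   # whisper.cpp binary
                            "models/ggml-base.en.bin",  # ggml Whisper model
                            wav_path)
    subprocess.run(cmd, check=True)  # with -otxt, writes <wav_path>.txt
    with open(wav_path + ".txt") as f:
        return f.read().strip()
```

Because the heavy lifting stays inside the C++ binary, the Python side needs nothing but the standard library.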
Stable-Diffusion-Cpp-Python — a revolution
Image generation is really fascinating: it makes you feel like there are no limits to your imagination, because you can see the results almost immediately.
Except that, if you want to do it on your own computer, it is not really that immediate… The hardware specs to run Stable Diffusion are quite demanding:
- An NVIDIA GPU with 6 GB of VRAM (though you might be able to make 4 GB work)
- SDXL requires even more VRAM to generate larger images
- You can make AMD GPUs work, but they require tinkering
The pioneer Georgi Gerganov wrote an amazing library in pure C++ to run LLMs in a quantized format, so you can run them even without a GPU, with just a CPU and enough RAM: the library is called llama.cpp.
Following his example, other amazing programmers adapted the source code to make it available for the Stable Diffusion models.
Meet @leejet’s stable-diffusion.cpp library, later ported to Python: the real protagonist of this article. We will use stable-diffusion-cpp-python to generate images with our poorly equipped laptops.
In this additional GIFT, a full tutorial on how to create images with stable-diffusion-cpp-python.
But be prepared!
You will get results… but it will take time. On a 12th-generation Intel CPU it takes from 30 to 60 seconds per step.
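To set expectations, here is a sketch of a text-to-image call with stable-diffusion-cpp-python, plus a quick back-of-the-envelope estimate of the CPU run time at the 30-60 seconds per step observed above. The model filename is an assumed placeholder, and the keyword arguments may differ slightly between versions of the library, so check its README for your install.

```python
# Sketch: text-to-image on CPU with stable-diffusion-cpp-python.
# Model filename and exact keyword names are ASSUMPTIONS — verify against
# the version of stable-diffusion-cpp-python you installed.


def estimate_seconds(steps, sec_per_step):
    """Rough wall-clock time for a CPU generation run."""
    return steps * sec_per_step


def generate(prompt, steps=20):
    from stable_diffusion_cpp import StableDiffusion

    sd = StableDiffusion(model_path="v1-5-pruned-emaonly.safetensors")  # assumed file
    images = sd.txt_to_img(prompt=prompt, width=512, height=512,
                           sample_steps=steps)
    images[0].save("output.png")


# On a 12th-gen Intel CPU, 20 steps at ~45 s/step is roughly 15 minutes:
print(estimate_seconds(20, 45) / 60)  # 15.0
```

Lowering `sample_steps` is the easiest lever: fewer steps means proportionally less waiting, at some cost in image quality.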
This is only the start!
Hope you will find all of this useful. I am using Substack only for the newsletter: here, every week, I give away free links to my paid articles on Medium. Follow me and read my latest articles: https://medium.com/@fabio.matricardi
Check out my Substack page if you missed some posts. And, since it is free, feel free to share it!