PHI-3: Microsoft's Powerful Mini LLM
A Highly Capable Language Model Locally on Your Phone! And we already have the weights and GGUF binaries.
This is ThePoorGPUguy's extra edition 😁
This is exactly the title of the official paper that introduces to the world the latest model born in the Microsoft powerhouse. From the abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.
Microsoft just unleashed Phi-3-Mini, a super-slim AI model that packs a serious punch. Don't let its size fool you: with 3.8 billion parameters trained on a staggering 3.3 trillion tokens (really, really, really… a lot of data!), Phi-3-Mini goes toe-to-toe with much bigger models like Mixtral 8x7B and GPT-3.5.
Remember how everyone was raving about Llama 2? Apparently, Phi-3-Mini, at just 3.8 billion parameters, laughs in the face of Llama 2's 7 billion. This little powerhouse fits right on your phone, making it super accessible for everyone.
Some specs, please?
We didn’t have any weights or repository available at first; according to Microsoft, these were the previews.
Update: 04:44, April 24 2024, Shanghai time. We have the weights on Hugging Face!
And they are already quantized!!!
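If you want to try the quantized weights right away, here is a minimal sketch using llama-cpp-python. The file name is an assumption based on Microsoft's GGUF release on Hugging Face (microsoft/Phi-3-mini-4k-instruct-gguf); check the Hub for the exact name before running.

```python
# Minimal sketch: run the quantized phi-3-mini GGUF locally with
# llama-cpp-python. The file name below is an assumption -- download the
# actual GGUF from the microsoft/Phi-3-mini-4k-instruct-gguf repo first.
from llama_cpp import Llama

llm = Llama(
    model_path="./Phi-3-mini-4k-instruct-q4.gguf",  # local 4-bit GGUF file
    n_ctx=4096,  # phi-3-mini's default 4K context window
)

# The prompt follows the chat template discussed below.
output = llm(
    "<|user|>\nWhat can a 3.8B model do on a phone?<|end|>\n<|assistant|>",
    max_tokens=256,
    stop=["<|end|>"],  # stop generating at the end-of-turn token
    echo=False,        # don't repeat the prompt in the output
)
print(output["choices"][0]["text"])
```

The whole point of a 3.8-billion-parameter model is that the 4-bit file is small enough to load on very modest hardware, no GPU required.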
phi-3-mini
The phi-3-mini model is a transformer decoder architecture with a default context length of 4K. Microsoft also introduces a long-context version, called phi-3-mini-128K, that uses LongRoPE to extend the context length to 128K.
To best benefit the open-source community, phi-3-mini is built upon a similar block structure as Llama-2 and uses the same tokenizer, with a vocabulary size of 32064. This means… super compatibility!
The model uses a hidden dimension of 3072, 32 heads, and 32 layers. Phi-3-mini is trained in bfloat16 on a total of 3.3T tokens.
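You can double-check those numbers yourself with a quick sketch in transformers. The repo name is an assumption based on the release-time name on the Hub (microsoft/Phi-3-mini-4k-instruct).

```python
# Quick sketch to verify the specs above, assuming the model is published
# as "microsoft/Phi-3-mini-4k-instruct" on the Hugging Face Hub.
from transformers import AutoConfig

# trust_remote_code is needed while the Phi-3 architecture is not yet
# bundled with your installed transformers version.
config = AutoConfig.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True
)

print(config.hidden_size)              # expected: 3072
print(config.num_attention_heads)      # expected: 32
print(config.num_hidden_layers)        # expected: 32
print(config.max_position_embeddings)  # expected: 4096 for the 4K version
print(config.vocab_size)               # expected: 32064 (Llama-2-style tokenizer)
```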
The model is already chat-finetuned… and we also have the chat prompt template:
<|user|>\n Question <|end|>\n <|assistant|>
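Here is a tiny, dependency-free sketch that wraps a question in that template, so you can feed plain questions to any backend (llama.cpp, transformers, and so on):

```python
# Format a single-turn question with phi-3-mini's chat template.
def phi3_prompt(question: str) -> str:
    """Wrap a user question in the phi-3 chat template shown above."""
    return f"<|user|>\n{question}<|end|>\n<|assistant|>"

print(phi3_prompt("What is LongRoPE?"))
# <|user|>
# What is LongRoPE?<|end|>
# <|assistant|>
```

If you load the model through transformers, calling tokenizer.apply_chat_template on a list of role/content messages should produce the same string, so you don't have to hand-roll it.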
Let’s wait for our own eyes to witness the wonders… The initial benchmarks, anyway, are quite astonishing:
Despite its compact size, Phi-3-Mini boasts performance levels that rival larger models such as Mixtral 8x7B and GPT-3.5.
Meanwhile, the Phi-3-small model (7 billion parameters, with a default context length of 8K) achieves an MMLU score of 75.3, outperforming Meta’s recently launched Llama 3 8B Instruct, which scores 66.
This is a game-changer for AI, guys. Phi-3-Mini proves that big things can come in small packages. Get ready for a future where powerful AI is at your fingertips, literally!
In the meantime…
Free for you: my latest article on the wonders Small Language Models can do!
Hope you will find all of this useful. Feel free to contact me on Medium.
I am using Substack only for the newsletter. Here, every week, I give away free links to my paid articles on Medium.
Follow me and read my latest articles: https://medium.com/@fabio.matricardi