On Thursday, Meta released the first variants of its new Llama-3 large language model, including Llama-3 8b, which has been trained on an incredible 15 trillion tokens. After spending a full weekend with it, I’m in complete awe of what Meta has created. It’s a beast!
For starters, I’m able to run it on my MacBook M3 Max with 48 GB RAM. This is true of other compact LLMs as well, up to and including Mixtral 8x7b, but given that Llama-3 has “only” 8 billion parameters, it’s lightning fast. Being able to run it locally also means I can access it anywhere at any time, without worrying about internet connectivity or throttling. Plus, my Mac shrugs it off – no fan noise, no battery drain.
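For context, a quick back-of-the-envelope calculation shows why an 8-billion-parameter model fits comfortably in 48 GB of RAM. These are rough estimates of weight memory only (they ignore the KV cache and runtime overhead), and the function is my own illustration, not anything from Meta; local runners typically use 4-bit quantization.

```python
def model_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough memory footprint of model weights alone, in gigabytes."""
    return num_params * bytes_per_param / 1e9

# Llama-3 8b weight memory at common precisions (rough estimates):
print(model_memory_gb(8e9, 2.0))   # fp16:  16.0 GB
print(model_memory_gb(8e9, 0.5))   # 4-bit:  4.0 GB - fits easily in 48 GB
```

Even at full fp16 precision the weights fit, and a quantized build leaves plenty of headroom, which is why the machine barely notices the load.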
Maybe the most interesting aspect of my weekend adventures is that I found Llama-3 to be superior to ChatGPT 4 when it comes to polishing emails and other business communications.
The vanilla version of Llama-3 is also the only compact LLM (and in fact the only model apart from ChatGPT 4 that I’m aware of) that can solve the logical problems I’ve presented to pretty much every LLM since October 2022:
1. The Sock Problem. Imagine being in a dark room with a box of black and white socks. How many times do you need to reach into the box to guarantee a matching pair?
2. The Ball-and-Van Problem. A ball rolls onto a street, and a van approaches. There’s a loud bang, and after the van passes, a sheet of plastic is lying on the street. What happened?
The first problem tests reasoning; the second tests real-world understanding. The vanilla version of Llama-3 8b solves both correctly.
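For reference, the sock puzzle is an instance of the pigeonhole principle: with two colors, the worst case is drawing one sock of each color, so a third draw must produce a match. A quick sketch (the function names are mine, purely for illustration), including a brute-force check over every possible draw sequence:

```python
from itertools import product

def draws_for_matching_pair(num_colors: int) -> int:
    """Minimum draws guaranteeing a matching pair, by the pigeonhole principle."""
    return num_colors + 1

def guarantees_pair(n_draws: int, num_colors: int) -> bool:
    """True if every possible sequence of n_draws contains a repeated color."""
    return all(
        len(set(seq)) < len(seq)
        for seq in product(range(num_colors), repeat=n_draws)
    )

# Black and white socks: 2 colors, so 3 draws suffice.
print(draws_for_matching_pair(2))   # 3
print(guarantees_pair(2, 2))        # False - you might draw one of each
print(guarantees_pair(3, 2))        # True  - a pair is unavoidable
```

The point of the puzzle, of course, is whether a model can reason its way to this answer rather than pattern-match a similar riddle.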
The instruct version doesn’t do as well on this particular task, but that’s just a case of horses for courses. What the instruct version does excel at is following instructions, which is, of course, key for companies like ours, which develop software that relies on LLMs for natural language understanding. I look forward to seeing these improvements incorporated into Melody, Mike, and our various other generative AI-based solutions.
What about other Llama-3 variants? I haven’t tested Llama-3 70b yet, but I hear good things about it. And Llama-3 400b appears to perform at levels comparable to ChatGPT 4. Further variants are expected in the coming weeks and months, including versions with larger context sizes – the 8k context window of Llama-3 8b is the only disappointment in an otherwise hugely impressive release (Meta has indicated that it’s working on an update).
By the way: We’ve organized a few generative AI events, some in partnership with NVIDIA. At these events, we talk about AI generally, answer questions about what’s possible given the current state of the art, and give product demos using real-world applications. We’re currently in the process of planning further events in London, New York, and other cities. If you or your organization would like to receive an invitation, please let me know.