In my previous column, I wrote about how declines in workforce quantity and quality are taking their toll on CPG customers. I sketched out how OEMs might leverage generative AI to create a new generation of machines that talk to operators and technicians, in any language. Essentially, packaging and processing OEMs can build their own version of ChatGPT right into their equipment, allowing operators and technicians to carry on a dialog with the machine. In this column I’ll walk through actual small language models that your engineers can download and begin experimenting with today.
Wait a minute, you say...there’s no way that your CPG customers will stand for your machine maintaining an open connection to the cloud in order for generative AI to work. You’re right about that, which is why there’s so much buzz right now in the AI community about a new generation of self-contained large language models that are designed to run on a local machine. No Internet connection required. Yes, you read that right.
In fact, several language models (which some have taken to calling small language models, or SLMs) are available for download, either as open source or under a commercial-friendly license that costs you exactly nothing. Why free? There’s an arms race right now among the big tech companies in AI model development. Users of these models, like you, benefit from their innovation. For more context, see this recent post from Mark Zuckerberg on why he believes the future of AI is open source, and what’s in it for his company.
Small language models can run on any PC-based control or PC-based HMI, or even a dedicated PC embedded in your machine for the purpose of running a generative AI interface. A recurring theme around small language models is that they are a fraction of the size of current large language models, require a fraction of the computing power, and can yield performance coming close to that of large language models, depending on the application. They also eliminate any latency associated with round-trip communications to the cloud. All of the above makes this a game-changing technology.
The size of language models is measured by the number of parameters. (If you’re curious, see a detailed explanation of what is meant by parameters when it comes to LLMs.) For example, OpenAI’s GPT-3, the predecessor of the model behind the original ChatGPT, has 175 billion parameters.
In this column I’m going to focus on several SLMs that your engineers can begin experimenting with, all of which are 8 billion parameters or smaller. In fact, the more interesting models are 2 billion parameters and smaller. Counterintuitively, when it comes to embedding AI into your equipment, smaller is better: The smaller the number of parameters, the less memory and computing power is needed. I’m guessing that if you create your own SLM specific to your machine that hoovers up every single word on every page of every manual ever written on that equipment, including hours of interviews with your design engineers, 2-billion-parameter models will be plenty powerful enough.
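To put rough numbers on "smaller is better," here is a back-of-the-envelope sketch in Python that your engineers can adapt. It estimates only the memory needed to hold a model's weights, using the common rule of thumb of parameters multiplied by bytes per parameter. The function name, the precision labels, and the decision to ignore runtime overhead are my own simplifying assumptions, so treat the results as a floor, not a spec. The parameter counts are the ones quoted in this column.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: weights dominate memory use. Activations, the context cache,
# and runtime overhead are ignored, so real-world usage will be higher.

# Bytes used to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,  # full precision
    "fp16": 2.0,  # half precision
    "int8": 1.0,  # 8-bit quantization
    "int4": 0.5,  # 4-bit quantization
}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Parameter counts quoted in this column (labels are for illustration only).
PARAM_COUNTS = {
    "135M": 135e6,
    "360M": 360e6,
    "1.7B": 1.7e9,
    "2.6B": 2.6e9,
    "3.8B": 3.8e9,
    "8B": 8e9,
}

if __name__ == "__main__":
    for label, params in PARAM_COUNTS.items():
        cells = ", ".join(
            f"{precision}: {weight_memory_gb(params, precision):.2f} GB"
            for precision in BYTES_PER_PARAM
        )
        print(f"{label:>5} -> {cells}")
```

By this estimate, an 8-billion-parameter model needs roughly 16 GB for its weights at half precision, while a 2-billion-parameter model quantized to 4 bits needs about 1 GB, which is the kind of gap that decides whether a model fits on the hardware already in your machine.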
One final technical note. Many of these models are designed to run faster on a PC with a GPU (Graphics Processing Unit), typically from Nvidia. These are the chips every AI company is desperate to lay their hands on to power their data centers. But GPUs can also be found in everyday PC workstations. Even some HMIs may contain GPUs to aid in visualization and graphics. You’ll want to check the hardware that you are using in or on your equipment, as a GPU will make these models run faster. That said, speed differences may be immaterial depending on what you design and how much information you incorporate into your custom models.
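As a starting point for that hardware check, here is a minimal Python sketch that asks whether an Nvidia GPU is visible on a given PC. It assumes the Nvidia driver is installed, since the driver is what provides the nvidia-smi command-line tool. The function name is hypothetical, and GPUs from other vendors will not be detected, so an empty result does not prove that no GPU is present.

```python
import shutil
import subprocess
from typing import List


def nvidia_gpu_names() -> List[str]:
    """Return the names of Nvidia GPUs reported by nvidia-smi.

    Returns an empty list if nvidia-smi is not on the PATH or fails to run.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # No Nvidia driver tools found on this machine.
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []  # The tool exists but could not report any GPUs.
    # One GPU name per non-empty line of output.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = nvidia_gpu_names()
    if gpus:
        print("Nvidia GPU(s) found:", ", ".join(gpus))
    else:
        print("No Nvidia GPU detected; models would run on the CPU.")
```

Running this on the HMI or embedded PC you already ship is a quick way to learn whether GPU acceleration is even on the table before your engineers pick a model size.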
For details on the following models, have your engineers download the spreadsheet we compiled, which covers all of the language models and includes:
- Name of the model
- Company that provides the model
- Whether it’s open source
- Links to detailed information on the model
- Links to AI chats with Perplexity for details on the model
All the models are free, and all run on a PC or PC-based HMI. The spreadsheet also covers the hardware requirements of each model, where I was able to discern it. Once your engineers download this spreadsheet, they can begin tinkering.
The five families
When it comes to small language models, I am tracking five major players that are actively competing in the space right now: Google, Microsoft, Meta (Facebook), Mistral, and Hugging Face. (OpenAI is conspicuously missing from the list, but I’ll get to that.)
Let’s start with Google. You probably have used Google’s Gemini chatbot (formerly Bard). In February of this year, Google released its Gemma small language model, built from the same research and technology as its Gemini large language models. In June, Google released Gemma 2, the next version. And at the end of July, it released Gemma 2 2B, which weighs in at only 2.6 billion parameters and which Google claims outperforms the GPT-3.5 models behind the ChatGPT that stunned the world less than two years ago. Gemma 2 2B was specifically designed to balance performance with efficiency, along with purported AI safety enhancements. Google is taking AI very seriously and does not want to let OpenAI or any other AI company win the AI arms race.
Not surprisingly, Microsoft is also investing heavily in small language models. In addition to literally investing in OpenAI, Microsoft is building its own AI tech: Earlier this year it released its Phi-3 model family, which comes in several sizes. Its Phi-3-mini model is the one that interests me most, and it weighs in at 3.8 billion parameters. Microsoft is claiming that its Phi-3 family outperforms “models of the same size and next size up across a variety of language, reasoning, coding, and math benchmarks.” Microsoft also offers an earlier and even smaller model, Phi-1.5 (1.3 billion parameters), but it appears to be more of a research proof of concept and may or may not be ready for prime time. But these models are evolving rapidly, and this one’s worth keeping an eye on.
Meta (formerly known as Facebook) has open-sourced its Llama family of AI models, which come in several sizes. Meta is iterating Llama at a breakneck pace, moving from version 2 last year to version 3 this spring and version 3.1 this summer. The smallest model is Llama 3.1 8B, which is an 8-billion-parameter model. The 8B model might require a fairly beefy machine to run, but I’d keep an eye on this family. I wouldn’t be at all surprised if Meta releases an even smaller model (1B to 3B in size) before too long. One note about Llama: Before you dismiss the idea of building something from Facebook into your serious machine application, I would remind you that tech companies like Meta have been known to release highly durable technologies that gain wide purchase due to their reliability and functionality. Meta’s React, an open-source front-end JavaScript library for building user interfaces for web applications, is a great example, and millions of websites use it today. It’s not going anywhere.
Another company serious about the AI arms race is the French AI company Mistral. Mistral is well known in AI circles and is one of the few companies outside of the U.S. and China working on AI models, especially small language models. Mistral offers three different 7-billion-parameter models that can run on an embedded PC, starting with Mistral 7B, which it released last year. This summer Mistral also released Codestral Mamba, a model specialized for code generation that Mistral says is also capable of reasoning. It could be worth checking out.
There’s one last model to mention, and it’s intriguing because it’s a series of really small models. The family is called SmolLM, from the company Hugging Face, and is available in 135-million-, 360-million-, and 1.7-billion-parameter sizes. Before you dismiss this one because of the funny name, know that Hugging Face is extremely well known and well respected in the AI community as a provider of tools and data to that community. Confusingly, the actual language models from most of the other companies mentioned in this column can also be downloaded from Hugging Face. (More on that in a future column.) But SmolLM is its own offering.
For packaging and processing machinery applications, SmolLM is worth experimenting with if the other models run into hardware constraints on your existing HMI or PC hardware. Another reason to experiment with SmolLM: The models are so compact they can reportedly run on an inexpensive Raspberry Pi, which as many techies and geeks know is a single-board computer that can fit in the palm of your hand. Raspberry Pis were originally designed for hobbyists, but the Raspberry Pi 4 with at least 4GB of RAM, which costs only around $50, can easily be incorporated into a packaging or processing machine without messing with your existing controls architecture.
What about the famous ChatGPT from the company OpenAI, which started the generative AI revolution? Is there a small language model version available? As of this writing, the only small language model OpenAI offers is a low-cost model that it calls GPT-4o mini. But that model runs only in the cloud, ruling it out for use in packaging and processing equipment, for now.
One other company that is worth keeping your eye on is none other than Apple. Researchers at that company have just released (and open-sourced) a family of self-contained small language models, one at 7 billion parameters and one at 1.4 billion parameters. In my view, these models seem more experimental at this time and do not seem like candidates for OEM use just now.
Companies like Anthropic and xAI (the maker of Grok) do not currently offer small language models.
Conclusion
The small language model space is clearly heating up. SLMs portend a future where AI can be built into many products without requiring a permanent connection to the cloud.
I don’t endorse any particular model, but I do endorse the idea of allocating some engineering time to begin experimenting with one or more of these technologies. As mentioned earlier in this column, you can download our Researched List of all of the above models, complete with detailed specs and links to learn more and download. These models are developing at a breathtaking pace: In the few weeks I spent writing this article, I had to update it repeatedly with newer information. We’ll continue to update this article and the researched list above as things develop.
What about accuracy, hallucinations and overall risk of something bad happening? I’ll tackle that in my next column, so stay tuned.
OEM Magazine is pleased to inaugurate this semi-occasional column tracking the rapid advances in AI and how packaging and processing machine builders can leverage them to build next-generation equipment. Reach out to Dave at [email protected] and let him know what you think or what you’re working on when it comes to AI.