In the field of artificial intelligence, few innovations have been as disruptive as Large Language Models (LLMs), renowned for their ability to understand and generate human-like text. These models, powered by cutting-edge technologies, are reshaping industries and the way we interact with technology. While the Generative Pre-trained Transformer (GPT) series, developed by OpenAI, stands as a testament to LLMs’ potential, it is the accessibility and versatility of platforms like Hugging Face that are truly driving this revolution.
What is Hugging Face’s Model Hub?
Traditionally, tech giants operated within proprietary frameworks, guarding their advanced AI models closely. However, a growing realization of the benefits of open-source development has led companies to release some of their most sophisticated models to the wider community. Meta, Microsoft, and others have embraced this new era of openness, contributing pre-trained models to Hugging Face’s Model Hub.
Hugging Face’s Model Hub serves as a central repository for pre-trained AI models, datasets, and resources. This open platform is democratizing AI by enabling developers, researchers, and businesses to access, fine-tune, and deploy these models for various applications. The involvement of industry giants in open-sourcing their models on this platform adds a new layer of significance to its ecosystem.
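To give a concrete sense of what the Model Hub offers, here is a minimal sketch (outside the Helicon flow described later) of pulling a pre-trained model from the Hub with the Hugging Face transformers library; the repository name is all you need:

```python
# A minimal sketch: download a pre-trained model and its tokenizer
# directly from the Hugging Face Model Hub.
from transformers import AutoModel, AutoTokenizer

# The Hub repository name identifies both the weights and the tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
```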
The integration of models from major tech companies into Hugging Face’s Model Hub brings collaborative innovation. Developers and researchers from around the world can now build on these models, creating novel applications, improving performance, and tailoring solutions to specific needs. This level of collaboration has the potential to accelerate AI advancements at an unprecedented pace, while simultaneously leveling the playing field for smaller players who may not have had access to such resources before.
One of the most exciting aspects is integrating LLMs into production pipelines. There are several ways to achieve this, but in this article we want to present the solution powered by Helicon, Radicalbit’s MLOps platform.
Helicon manages the whole lifecycle of a model in production, from data processing before inference, through model serving, to performance monitoring. One of its latest features is the integration of the Hugging Face Model Hub into Helicon, so that LLMs can be deployed with no code and a few clicks.
How to run a Hugging Face Model in Helicon
Here’s what you need to run a Hugging Face model:
- An active Helicon subscription (if you don’t have one or you want to start a free trial, click here);
- 2 minutes to spare.
First of all, you need to go to the MLOps section of the platform and create a New Model, choosing a Name and providing a Description.

Helicon gives you two options for uploading models: MLflow and Hugging Face. We’re going to roll with the Hugging Face option (we will write a dedicated blog post about MLflow deployment).

Helicon has a direct line to the Hugging Face APIs. To get your model, choose the Task and the Model Repository name, and you’re all set. For this demonstration, we will import a fill-mask model called bert-base-uncased, a variant of BERT (Bidirectional Encoder Representations from Transformers) developed by Google (here’s the paper: https://arxiv.org/abs/1810.04805), able to fill the masked words in a sentence with the most probable ones.
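For reference, here is roughly what this model does under the hood. The snippet below uses the transformers library directly and is independent of the Helicon deployment described here:

```python
from transformers import pipeline

# "fill-mask" is the Hub task for masked-language models like BERT.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model ranks the most probable tokens for the [MASK] slot.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```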

All the previous steps have taken about 30 seconds of your time; the remaining 90 will be spent deploying the model and making it available for inference. Press the Serve button and enjoy your Hugging Face model!

A few considerations before concluding.
There are plenty of tools and solutions to deploy a Hugging Face model, so why should you choose Helicon?
Helicon provides an entire ecosystem of features and perks for your machine-learning models, such as versioning, performance monitoring in production, drift and data integrity detection, and much more. In addition, it exploits the power of Pipelines to pre-process and post-process your data just before and after inference, an essential step to turn raw data into model-ready input and to shape predictions for your custom use cases.
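As a toy illustration of the idea (the actual Pipelines are configured inside Helicon, so the function names below are purely hypothetical), pre- and post-processing simply wrap the inference step:

```python
# Hypothetical sketch: the real pre/post-processing lives in Helicon Pipelines.
def preprocess(raw_text: str) -> str:
    # Normalize raw input before it reaches the model.
    return raw_text.strip().lower()

def postprocess(predictions: list) -> str:
    # Keep only the most probable completion.
    best = max(predictions, key=lambda p: p["score"])
    return best["sequence"]

def predict(raw_text: str, model) -> str:
    # Pipeline: pre-process -> inference -> post-process.
    return postprocess(model(preprocess(raw_text)))
```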
One last mention concerns Applications, which let you build and expose a service combining pre/post-processing and inference in one shot, accessible via an HTTP call as a single, unified service.
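Assuming the Application is exposed over HTTP, a client call might look like the following sketch; the endpoint URL and payload schema are hypothetical, so check your own deployment for the actual values:

```python
import requests

# Hypothetical endpoint and payload; consult your Helicon deployment
# for the real URL and request schema.
response = requests.post(
    "https://<your-helicon-instance>/applications/<app-id>/predict",
    json={"text": "The capital of France is [MASK]."},
)
response.raise_for_status()
print(response.json())
```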
Do you want to know more about Helicon? Visit our website and book your free demonstration!