Inference
Task to use the Hugging Face Inference API
The Serverless Inference API offers a fast and free way to explore thousands of models for a variety of tasks. Whether you’re prototyping a new application or experimenting with ML capabilities, this API gives you instant access to high-performing models across multiple domains:
- Text Generation: Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
- Image Generation: Easily create customized images, including LoRAs for your own styles.
- Document Embeddings: Build search and retrieval systems with SOTA embeddings.
- Classical AI Tasks: Ready-to-use models for text classification, image classification, speech recognition, and more.
type: "io.kestra.plugin.huggingface.Inference"
Use inference for text classification
id: huggingface_inference_text
namespace: company.team
tasks:
  - id: huggingface_inference
    type: io.kestra.plugin.huggingface.Inference
    model: cardiffnlp/twitter-roberta-base-sentiment-latest
    apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
    inputs: "I want a refund"
Use inference for image classification.
id: huggingface_inference
namespace: company.team
tasks:
  - id: huggingface_inference_image
    type: io.kestra.plugin.huggingface.Inference
    model: google/vit-base-patch16-224
    apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
    inputs: "{{ read('my-base64-image.txt') }}"
    parameters:
      function_to_apply: sigmoid
      top_k: 3
    waitForModel: true
    useCache: false
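Use inference for text generation. A minimal sketch, assuming google/gemma-2-2b-it (listed as an example under the Model property below) accepts a plain prompt through inputs and that generation settings such as max_new_tokens and temperature are forwarded through the parameters map; the flow id and prompt are illustrative.
id: huggingface_inference_generation
namespace: company.team

tasks:
  - id: huggingface_inference
    type: io.kestra.plugin.huggingface.Inference
    model: google/gemma-2-2b-it
    apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
    inputs: "Summarize the benefits of serverless inference in two sentences."
    parameters:
      # assumed generation parameters from the Hugging Face text-generation API
      max_new_tokens: 100
      temperature: 0.7
    waitForModel: true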
API Key
Hugging Face API key (e.g., hf_********)
Inputs
Inputs required for the specific model
Model
Model used for the Inference API (e.g., cardiffnlp/twitter-roberta-base-sentiment-latest, google/gemma-2-2b-it)
API endpoint
The Hugging Face Inference API base URL; defaults to https://api-inference.huggingface.co/models
Options
Options used to customize the HTTP client
Parameters
Map of optional parameters specific to the selected model
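As an illustration of the parameters map, the sketch below runs zero-shot classification, where the Hugging Face API expects candidate_labels inside parameters; the model facebook/bart-large-mnli and the label set are illustrative assumptions rather than values taken from this plugin's documentation.
tasks:
  - id: zero_shot_classification
    type: io.kestra.plugin.huggingface.Inference
    model: facebook/bart-large-mnli   # assumed zero-shot classification model
    apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
    inputs: "I want a refund"
    parameters:
      candidate_labels:   # model-specific labels passed through the parameters map
        - refund
        - complaint
        - praise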
Output returned by the Hugging Face API
HTTP client timeout options:
- The time allowed to establish a connection to the server before failing.
- The time an idle connection can remain in the client's connection pool before being closed.
- The time allowed for a read connection to remain idle before closing it.
- The maximum time allowed for reading data from the server before failing.
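A minimal sketch of tuning these timeouts through the options map; the option keys (connectTimeout, connectionPoolIdleTimeout, readIdleTimeout, readTimeout) and the ISO-8601 duration values are assumptions based on Kestra's standard HTTP client configuration, so verify the exact names in the generated plugin reference.
tasks:
  - id: huggingface_inference
    type: io.kestra.plugin.huggingface.Inference
    model: cardiffnlp/twitter-roberta-base-sentiment-latest
    apiKey: "{{ secret('HUGGINGFACE_API_KEY') }}"
    inputs: "I want a refund"
    options:
      connectTimeout: PT10S              # time allowed to establish the connection
      connectionPoolIdleTimeout: PT30S   # idle time allowed in the connection pool
      readIdleTimeout: PT60S             # idle time allowed on an open read connection
      readTimeout: PT60S                 # maximum time allowed to read the response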