ollama_chat
Beta
Generates responses to messages in a chat conversation using the Ollama API and external tools.
Introduced in version 4.32.0.
# Common configuration fields, showing default values
label: ""
ollama_chat:
  model: llama3.1 # No default (required)
  prompt: "" # No default (optional)
  image: 'root = this.image.decode("base64")' # Decode base64 encoded image. No default (optional)
  response_format: text
  max_tokens: 0 # No default (optional)
  temperature: 0 # No default (optional)
  save_prompt_metadata: false
  tools: [] # No default (required)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
# All configuration fields, showing default values
label: ""
ollama_chat:
  model: llama3.1 # No default (required)
  prompt: "" # No default (optional)
  system_prompt: "" # No default (optional)
  image: 'root = this.image.decode("base64")' # Decode base64 encoded image. No default (optional)
  response_format: text
  max_tokens: 0 # No default (optional)
  temperature: 0 # No default (optional)
  num_keep: 0 # No default (optional)
  seed: 42 # No default (optional)
  top_k: 0 # No default (optional)
  top_p: 0 # No default (optional)
  repeat_penalty: 0 # No default (optional)
  presence_penalty: 0 # No default (optional)
  frequency_penalty: 0 # No default (optional)
  stop: [] # No default (optional)
  save_prompt_metadata: false
  max_tool_calls: 3
  tools: [] # No default (required)
  runner:
    context_size: 0 # No default (optional)
    batch_size: 0 # No default (optional)
    gpu_layers: 0 # No default (optional)
    threads: 0 # No default (optional)
    use_mmap: false # No default (optional)
    use_mlock: false # No default (optional)
  server_address: http://127.0.0.1:11434 # No default (optional)
  cache_directory: /opt/cache/connect/ollama # No default (optional)
  download_url: "" # No default (optional)
This processor sends prompts to your chosen Ollama large language model (LLM) and generates text from the responses using the Ollama API and external tools.
By default, the processor starts and runs a locally-installed Ollama server. Alternatively, to use an already running Ollama server, add your server details to the server_address field. You can download and install Ollama from the Ollama website.
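For example, the following minimal sketch (the model name and address are illustrative) points the processor at an already running server instead of starting a local one:
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        prompt: "${!content().string()}"
        server_address: http://127.0.0.1:11434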
For more information, see the Ollama documentation and examples.
Fields
model
The name of the Ollama LLM to use. For a full list of models, see the Ollama website.
Type: string
# Examples
model: llama3.1
model: gemma2
model: qwen2
model: phi3
prompt
The prompt you want to generate a response for. By default, the processor submits the entire payload as a string. This field supports interpolation functions.
Type: string
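As a sketch, interpolation functions let you wrap the message content in instructions; the wording here is illustrative:
# Example
prompt: "Summarize the following text in one paragraph: ${!content().string()}"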
system_prompt
The system prompt to submit to the Ollama LLM. This field supports interpolation functions.
Type: string
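For example, a static system prompt can be paired with a per-message prompt; the wording is illustrative:
# Example
system_prompt: "You are a helpful assistant. Keep answers short."
prompt: "${!content().string()}"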
image
An optional image to submit along with the prompt value. The mapping result must be a byte array.
Type: bloblang
# Examples
image: 'root = this.image.decode("base64")' # Decode base64 encoded image
response_format
The format of the response that the Ollama model generates. If you specify JSON output, the prompt should also instruct the model to respond in JSON.
Type: string
Default: "text"
Options: text, json.
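For example, when requesting JSON output, also instruct the model in the prompt to return JSON; the requested keys in this sketch are illustrative:
# Example
response_format: json
prompt: 'Extract the city and country from this text and return a JSON object with the keys "city" and "country": ${!content().string()}'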
max_tokens
The maximum number of tokens to predict and output. Limiting the amount of output means that requests are processed faster and have a fixed limit on the cost.
Type: int
temperature
The temperature of the model. Increasing the temperature makes the model answer more creatively.
Type: int
num_keep
Specify the number of tokens from the initial prompt to retain when the model resets its internal context. By default, this value is set to 4. Use -1 to retain all tokens from the initial prompt.
Type: int
seed
Sets the random number seed to use for generation. Setting this to a specific number will make the model generate the same text for the same prompt.
Type: int
# Examples
seed: 42
top_k
Reduces the probability of generating nonsense. A higher value, for example 100, will give more diverse answers. A lower value, for example 10, will be more conservative.
Type: int
top_p
Works together with top_k. A higher value, for example 0.95, will lead to more diverse text. A lower value, for example 0.5, will generate more focused and conservative text.
Type: float
repeat_penalty
Sets how strongly to penalize repetitions. A higher value, for example 1.5, will penalize repetitions more strongly. A lower value, for example 0.9, will be more lenient.
Type: float
presence_penalty
Positive values penalize new tokens if they have appeared in the text so far. This increases the model’s likelihood to talk about new topics.
Type: float
frequency_penalty
Positive values penalize new tokens based on the frequency of their appearance in the text so far. This decreases the model’s likelihood to repeat the same line verbatim.
Type: float
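The sampling fields above can be combined in a single processor config. The values in this sketch are illustrative, not recommendations:
# Example
temperature: 1
top_k: 40
top_p: 0.9
repeat_penalty: 1.1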
stop
Sets the stop sequences to use. When one of these patterns is encountered, the LLM stops generating text and returns the final response.
Type: array
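For example, the following sketch stops generation when either of two illustrative sequences appears in the output:
# Example
stop: ["###", "Observation:"]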
save_prompt_metadata
Set to true to save the prompt value to a metadata field (@prompt) on the corresponding output message. If you use the system_prompt field, its value is also saved to an @system_prompt metadata field on each output message.
Type: bool
Default: false
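As a sketch, the saved metadata can then be read downstream with a mapping processor; the output field names here are illustrative:
# Example
pipeline:
  processors:
    - ollama_chat:
        model: llama3.1
        prompt: "${!content().string()}"
        save_prompt_metadata: true
    - mapping: |
        root.answer = content().string()
        root.prompt = @prompt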
max_tool_calls
The maximum number of sequential calls you can make to external tools to retrieve additional information to answer a prompt.
Type: int
Default: 3
tools
The external tools the LLM can invoke, such as functions, APIs, or web browsing. You can build a series of processors that include definitions of these tools, and the specified LLM can choose when to invoke them to help answer a prompt. For more information, see examples.
Type: array
tools[].description
A description of what the tool does. The LLM uses this to decide when to invoke the tool.
Type: string
tools[].parameters.properties.<name>.enum
Sets this parameter to an enum, and defines the specific set of values the LLM must use.
Type: array
Default: []
tools[].processors
The Redpanda Connect processors to execute when the LLM invokes the external tool.
Type: array
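A tool definition has the following overall shape. The tool name, parameters, and processor in this sketch are hypothetical; see the tool-calling example in the Examples section for a complete pipeline:
tools:
  - name: LookupOrder # Hypothetical tool name
    description: "Retrieve the status of an order by its ID"
    parameters:
      required: ["order_id"]
      properties:
        order_id:
          type: string
          description: the ID of the order to look up
    processors:
      - mapping: |
          root.status = "shipped" # Placeholder for a real lookup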
runner
Options for the model runner that are used when the model is first loaded into memory.
Type: object
runner.context_size
Sets the size of the context window used to generate the next token. Using a larger context window uses more memory and takes longer to process.
Type: int
runner.gpu_layers
This option allows offloading some layers to the GPU for computation. This generally results in increased performance. By default, the runtime decides the number of layers dynamically.
Type: int
runner.threads
Set the number of threads to use during generation. For optimal performance, it is recommended to set this value to the number of physical CPU cores your system has. By default, the runtime decides the optimal number of threads.
Type: int
runner.use_mmap
Map the model into memory. This is only supported on Unix systems, and it allows loading only the necessary parts of the model as needed.
Type: bool
runner.use_mlock
Lock the model in memory, preventing it from being swapped out when memory-mapped. This option can improve performance but reduces some of the advantages of memory-mapping because it uses more RAM to run and can slow down load times as the model loads into RAM.
Type: bool
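For example, the runner options are set as a nested block under the processor. The values below are illustrative and depend on your hardware:
# Example
runner:
  context_size: 4096
  gpu_layers: 20
  threads: 8
  use_mmap: true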
server_address
The address of the Ollama server to use. Leave this field blank to have the processor start and run a local Ollama server, or specify the address of your own local or remote server.
Type: string
# Examples
server_address: http://127.0.0.1:11434
Examples
Analyze an image and generate a description for it
This configuration fetches image URLs from stdin and uses the LLaVA LLM to describe the images.
input:
  stdin:
    scanner:
      lines: {}
pipeline:
  processors:
    - http:
        verb: GET
        url: "${!content().string()}"
    - ollama_chat:
        model: llava
        prompt: "Describe the following image"
        image: "root = content()"
output:
  stdout:
    codec: lines
Use a series of processors to make calls to external tools
In this configuration, the Llama 3.2 model executes a number of processors, which make a tool call to retrieve weather data for a specific city.
input:
  generate:
    count: 1
    mapping: |
      root = "What is the weather like in Chicago?"
pipeline:
  processors:
    - ollama_chat:
        model: llama3.2
        prompt: "${!content().string()}"
        tools:
          - name: GetWeather
            description: "Retrieve the weather for a specific city"
            parameters:
              required: ["city"]
              properties:
                city:
                  type: string
                  description: the city to look up the weather for
            processors:
              - http:
                  verb: GET
                  url: 'https://wttr.in/${!this.city}?T'
                  headers:
                    User-Agent: curl/8.11.1 # Returns a text string from the weather website
output:
  stdout: {}
Check prompts for unsafe content
This configuration runs prompts through the Llama Guard 3 model to check that they comply with safety and security standards. The model marks prompts that do not comply as unsafe, and the Bloblang mapping deletes them. To check the safety of responses, see the ollama_moderation processor.
pipeline:
  processors:
    - branch:
        processors:
          - ollama_chat:
              model: llama-guard3
              prompt: ${!this.prompt}
        result_map: |
          root.safety = content()
    # Drop the message from the pipeline if it's unsafe
    - mapping: |
        root = if this.safety.has_prefix("unsafe") {
          deleted()
        }
    # The input prompt has been checked for safety.
    - ollama_chat:
        model: llama3.2
        prompt: ${!this.prompt}