LlamaCPP Integration

Located in Scripts/Runtime/Llamacpp/

Provides high-performance LLM inference by connecting to a remote llama.cpp server.

Key Scripts

| Script | Purpose |
|--------|---------|
| LLM.cs | Server configuration component. Manages connection settings (port, context size, API key). |
| LLMCaller.cs | Base class for making LLM requests. Handles local/remote switching and request management. |
| LLMChatTemplates.cs | Chat template definitions for different model formats. |
| LLMInterface.cs | Request/response data structures for the llama.cpp API. |
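
LLMInterface.cs mirrors the JSON bodies that the llama.cpp server expects. For orientation only, here is a rough sketch of the kind of request LLMCaller ends up sending to the server's /completion endpoint; the CompletionRequest and LlamaCppRawClient names are illustrative and not part of the package, and the exact fields defined in LLMInterface.cs may differ:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using UnityEngine;

// Illustrative only: a bare-bones, non-streaming request to a llama.cpp server,
// roughly what LLMCaller/LLMInterface wrap for you.
[Serializable]
class CompletionRequest {
    public string prompt;
    public int n_predict = 128; // maximum number of tokens to generate
    public bool stream = false; // single JSON reply instead of streamed chunks
}

public static class LlamaCppRawClient {
    static readonly HttpClient http = new HttpClient();

    public static async Task<string> Complete(string host, string prompt, string apiKey = null) {
        var body = JsonUtility.ToJson(new CompletionRequest { prompt = prompt });
        var content = new StringContent(body, Encoding.UTF8, "application/json");
        // Only needed if the server was started with an API key.
        if (!string.IsNullOrEmpty(apiKey))
            http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
        var response = await http.PostAsync($"http://{host}/completion", content);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync(); // raw JSON; the generated text is in its "content" field
    }
}
```

In practice there is no need to build requests by hand: LLMCaller wraps all of this, including streaming.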

Basic Usage

```csharp
// The LLMCaller component that handles the connection (assign in the Inspector)
public LLMCaller llmCharacter;

// UI element that displays the streamed response (Text type assumed)
public UnityEngine.UI.Text responseText;

// Send a chat message and receive a streaming response
async void SendPrompt(string prompt) {
    string response = await llmCharacter.Chat(prompt, OnPartialResponse);
    responseText.text = response; // full response once generation finishes
}

// Callback for streaming tokens
void OnPartialResponse(string partial) {
    responseText.text = partial;
}
```
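
If the LLMCaller lives on the same GameObject as this script, it can also be resolved at startup instead of being assigned in the Inspector. A minimal sketch, assuming LLMCaller is attached as a component:

```csharp
// Optional: fall back to the component on this GameObject if nothing was assigned.
// Assumes LLMCaller is a Unity component attached alongside this script.
void Awake() {
    if (llmCharacter == null)
        llmCharacter = GetComponent<LLMCaller>();
}
```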

Configuration

- Remote Mode — Set remote = true and configure host (e.g., localhost:13333)
- Context Size — Adjust contextSize on the LLM component (default: 8192)
- Chat Template — Set the appropriate template for your model in chatTemplate
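
These options can also be applied from code before the first request. A minimal sketch, assuming the field names listed above (remote, contextSize, chatTemplate) plus hypothetical host and port fields, and assuming remote/host belong to the LLMCaller while contextSize and chatTemplate belong to the LLM component; adjust to wherever the fields actually live in your version:

```csharp
using UnityEngine;

// Hypothetical runtime configuration. Which component owns each field is an
// assumption; check the LLM and LLMCaller inspectors for the real locations.
public class LlamaCppSetup : MonoBehaviour {
    public LLM llm;          // server configuration component
    public LLMCaller caller; // request component

    void Awake() {
        caller.remote = true;        // talk to a remote llama.cpp server
        caller.host = "localhost";   // server address (assumed field name)
        caller.port = 13333;         // server port (assumed field name)

        llm.contextSize = 8192;      // default noted above
        llm.chatTemplate = "chatml"; // model-specific template (placeholder value)
    }
}
```
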
💡 Tip: Use the recommended qwen2.5-3b-instruct-q8_0.gguf model from the Getting Started guide for best results with the RPG Generator demo.