Documentation

Last Updated: February 12, 2026

1. Our Mission

KaliXPro was designed by quants and students from Harvard and Cambridge to close the technology gap in prediction markets. We built the platform from the ground up, using low-latency languages like Rust instead of Python alongside a set of purpose-built systems, because we would rather level the playing field than sell you on bombastic claims.

Our architecture is built from scratch: a Rust-native core for sub-millisecond execution, a TypeScript orchestration layer for agent intelligence, and a modular monorepo that lets every component be tested and deployed independently. No black boxes, no repackaged APIs — just transparent infrastructure designed for speed and reliability.

2. Getting Started

Creating Your Account

Sign in at kalixpro.com/signin using one of our supported authentication providers:

After authentication, you are redirected to the Console — your central dashboard for managing context, monitoring performance, and configuring trading parameters.

First Login

On your first login, the Console opens to the Context Control overview. This is your home base. From here you can see live metrics, recent activity, and quick actions. Explore the sidebar to navigate between Context Control, Trading, and Settings.

3. Console Overview

The Console is a protected area — you must be signed in to access it. The sidebar navigation includes:

Your session is shown in the sidebar footer with your avatar, name, and a sign-out button. A status indicator shows system health at a glance.

4. Context Control Dashboard

The Context Control dashboard is the heart of KaliXPro. It gives you complete visibility into how your AI agents use tokens, and full control over optimization.

Key Metrics

Top Regressions

Tracks which context sections are growing fastest. Each regression shows the section name, current token count, previous count, and delta. Use this to identify which sections need budget adjustment or optimization.

Quick Actions

5. Budget Configuration

Control exactly how many tokens are allocated to each section of your LLM requests. Over-allocating wastes money; under-allocating degrades quality. Budgets let you find the right balance.

Model Presets

Select a model preset to load recommended budget allocations. Available presets include gemini-3-flash, gemini-3-pro, and vertex-tuned-orchestrator. Each preset has different input/output limits and section allocations optimized for its context window.

Global Limits

Section Budgets

Each section has a configurable token budget:

Dry Run Preview

Before saving, run a simulation to see how your budget changes would affect real traffic. The preview shows the number of affected requests, estimated truncation rate change, and projected cost savings.
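As a rough mental model of what the dry run computes, the sketch below clamps per-section token usage to its budget and reports which sections would be truncated. The section names and shapes are illustrative assumptions, not KaliXPro's actual API.

```typescript
// Hypothetical sketch of per-section budget enforcement.
// Field names (systemPrompt, history, ...) are illustrative only.
type SectionBudgets = Record<string, number>;
type SectionUsage = Record<string, number>;

interface DryRunResult {
  truncated: string[]; // sections that would be cut
  tokensSaved: number; // tokens removed by enforcement
}

function dryRun(usage: SectionUsage, budgets: SectionBudgets): DryRunResult {
  const truncated: string[] = [];
  let tokensSaved = 0;
  for (const [section, used] of Object.entries(usage)) {
    const budget = budgets[section] ?? Infinity; // unbudgeted sections pass through
    if (used > budget) {
      truncated.push(section);
      tokensSaved += used - budget;
    }
  }
  return { truncated, tokensSaved };
}
```

For example, a request using 1,200 system-prompt tokens against a 1,000-token budget would report one truncated section and 200 tokens saved.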

6. Caching

Caching reduces costs by reusing common prompt prefixes across requests. When system prompts or tool schemas don't change between calls, cached tokens are served at a fraction of the cost.

Live Stats

Cache Policy

Savings Breakdown

A per-category breakdown showing tokens saved and dollar value for each cache type: system prompt cache, tool schema cache, and conversation prefix cache.
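The dollar value in each category follows directly from how many tokens were served from cache. A minimal sketch of that arithmetic, assuming cached tokens are billed at 10% of the full rate (an illustrative figure, not KaliXPro's actual billing):

```typescript
// Illustrative cache-savings math. The 10% cached-token rate is an
// assumption for the example, not a documented price.
function cacheSavings(
  cachedTokens: number,
  pricePerMTokens: number, // full price per 1M input tokens, USD
  cachedRate = 0.1         // cached tokens billed at this fraction of full price
): number {
  const fullCost = (cachedTokens / 1_000_000) * pricePerMTokens;
  return fullCost * (1 - cachedRate);
}
```

At $3 per million tokens, serving 2M tokens from cache would save about $5.40 versus paying full price.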

Cache Invalidation

Use the "Invalidate All Caches" button when you update prompts or tool schemas. This forces all subsequent requests to build fresh cache entries.

7. Guardrails & Cost Control

Guardrails prevent runaway costs from unexpectedly large context windows. Set hard limits and get alerts before spend gets out of hand.

Long Context Threshold

Set a maximum token limit for any single request (default: 100,000). A visual bar shows warning level (80%) and block level (100%).
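The warn/block logic described above can be sketched in a few lines; the thresholds mirror the documented 80% warning and 100% block levels, and the default limit is the documented 100,000 tokens.

```typescript
// Sketch of the long-context guardrail: warn at 80% of the limit,
// block at 100%. Mirrors the documented defaults.
type GuardrailStatus = "ok" | "warning" | "blocked";

function checkRequest(tokens: number, limit = 100_000): GuardrailStatus {
  if (tokens >= limit) return "blocked";
  if (tokens >= limit * 0.8) return "warning";
  return "ok";
}
```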

Blocking Behavior

Cost Controls

Manual Override

For exceptional cases, you can request an override by typing ALLOW OVERRIDE in the confirmation field. All overrides are logged with timestamp, request ID, and user attribution.

8. Conversation Memory

Manage how conversation history is stored and compressed. Older messages are automatically summarized to save tokens while preserving context.

Memory Stats

Verbatim History

Configure how many recent conversation turns are kept word-for-word (1 to 20). Everything older is compressed into summaries.

Pinned Items

Pin critical context that should persist across all compression cycles. Categories include constraints, decisions, tasks, and general context. Pinned items are never summarized or removed.

Compression

View before/after comparisons of compression. A typical compression reduces a 180-token exchange to a 25-token summary — an 86% reduction — without losing the essential information.
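The verbatim-window idea behind this can be sketched as follows: keep the last N turns word-for-word and collapse everything older into a single summary entry. The `summarize` parameter is a stand-in for the real compression step, which this sketch does not implement.

```typescript
// Minimal sketch of verbatim-history compression: the most recent
// `verbatimTurns` turns survive unchanged; older turns are replaced
// by one summary entry. `summarize` is a placeholder for the real
// summarization step.
interface Turn { role: "user" | "assistant"; text: string }

function compressHistory(
  turns: Turn[],
  verbatimTurns: number,
  summarize: (older: Turn[]) => string
): Turn[] {
  if (turns.length <= verbatimTurns) return turns; // nothing old enough to compress
  const older = turns.slice(0, turns.length - verbatimTurns);
  const recent = turns.slice(turns.length - verbatimTurns);
  return [{ role: "assistant", text: summarize(older) }, ...recent];
}
```

With five stored turns and a verbatim window of two, the result is three entries: one summary followed by the two newest turns.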

9. Retrieval Packing

Optimize how retrieved documents (RAG) are packed into your context window. Smart packing eliminates redundancy and maximizes information density.

Retrieval Budget

Set a token limit for retrieved chunks (default: 2,000). A visual meter shows how much of the budget is currently used.

Chunking Settings

Smart Packing vs. Naive Top-K

The pack preview lets you compare approaches. A naive top-K approach might retrieve 5 chunks using 2,400 tokens with 35% redundancy. Smart packing selects 3 optimized chunks using 385 tokens with only 8% redundancy — an 84% token savings.
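One way to picture the difference: naive top-K ranks chunks by relevance score alone, while budget-aware packing ranks by relevance per token and fills the budget greedily. The sketch below shows only that greedy idea; KaliXPro's actual packer also measures redundancy between chunks, which this example omits.

```typescript
// Hedged sketch of budget-aware packing: rank chunks by score per
// token and greedily fill the token budget. Redundancy detection,
// which the real packer performs, is omitted for brevity.
interface Chunk { id: string; tokens: number; score: number }

function packChunks(chunks: Chunk[], budget: number): Chunk[] {
  const byDensity = [...chunks].sort(
    (a, b) => b.score / b.tokens - a.score / a.tokens
  );
  const picked: Chunk[] = [];
  let used = 0;
  for (const c of byDensity) {
    if (used + c.tokens <= budget) {
      picked.push(c);
      used += c.tokens;
    }
  }
  return picked;
}
```

A large, high-scoring chunk can still lose to two smaller chunks that together carry more information per token within the budget.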

10. Tool Management

Manage the tool schemas that your AI agents use. Every tool definition consumes tokens — optimizing them directly reduces cost per request.

Schema Registry

View all registered tools with their token footprint, duplicate fragment count, and compact savings potential. Toggle tools on or off as needed.

Compact Mode

Enable compact mode to automatically deduplicate shared fields across tool schemas. This reduces the total schema footprint without changing tool behavior.
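The deduplication idea can be illustrated with a simplified sketch: identical property definitions that appear in more than one tool schema are hoisted into a shared table and replaced with `$ref` pointers. Real JSON Schema handling is more involved; this is not KaliXPro's implementation, only the shape of the technique.

```typescript
// Simplified illustration of compact mode: identical property
// definitions appearing in multiple tool schemas are hoisted into a
// shared $defs table and replaced with $ref pointers.
type Props = Record<string, object>;

function compact(schemas: Record<string, Props>) {
  const counts = new Map<string, number>();   // serialized def -> occurrence count
  const shared = new Map<string, string>();   // serialized def -> shared key
  const defs: Record<string, object> = {};
  const out: Record<string, Record<string, object>> = {};

  // First pass: count identical definitions across all tools.
  for (const props of Object.values(schemas))
    for (const def of Object.values(props)) {
      const key = JSON.stringify(def);
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }

  // Second pass: hoist any definition seen more than once.
  for (const [tool, props] of Object.entries(schemas)) {
    out[tool] = {};
    for (const [name, def] of Object.entries(props)) {
      const key = JSON.stringify(def);
      if ((counts.get(key) ?? 0) > 1) {
        if (!shared.has(key)) {
          const id = `shared${shared.size}`;
          shared.set(key, id);
          defs[id] = def;
        }
        out[tool][name] = { $ref: `#/$defs/${shared.get(key)}` };
      } else {
        out[tool][name] = def;
      }
    }
  }
  return { defs, schemas: out };
}
```

Because tool behavior depends only on the resolved schema, replacing repeated inline definitions with references changes the footprint but not the semantics.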

Output Policy

Configure how tool outputs are handled:

Redaction Rules

Pre-configured regex patterns automatically redact sensitive data from tool outputs before they enter the context window. Built-in rules cover API keys, email addresses, and phone numbers. Each rule can be individually enabled or disabled, and you can test redaction with sample text before deploying.
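A redaction pass of this kind looks roughly like the sketch below. The patterns are illustrative stand-ins, not KaliXPro's built-in rules, and production patterns would need broader coverage of key and phone formats.

```typescript
// Sample redaction pass. These patterns are illustrative only and do
// not reflect KaliXPro's actual built-in rules.
const redactionRules: { name: string; pattern: RegExp }[] = [
  { name: "api_key", pattern: /\b(?:sk|pk)-[A-Za-z0-9]{16,}\b/g },
  { name: "email", pattern: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g },
  { name: "phone", pattern: /\+?\d{1,3}[ -]?\(?\d{3}\)?[ -]?\d{3}[ -]?\d{4}\b/g },
];

function redact(text: string): string {
  // Apply each rule in order, replacing matches with a labeled marker.
  return redactionRules.reduce(
    (t, rule) => t.replace(rule.pattern, `[REDACTED:${rule.name}]`),
    text
  );
}
```

Running sample text through `redact` before deploying a rule is the same test-first workflow the Console's sample-text tester supports.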

11. Trading

KaliXPro connects to multiple prediction market venues and crypto exchanges for real-time signal detection and execution.

Supported Venues

Execution Modes

Domain Agents

Six specialized AI agents analyze different market domains:

Signal Sources

Signals are aggregated from multiple sources with configurable weights:

Risk Management

12. Pricing

Starter — $99/month ($79/month annual)

Professional — $299/month ($239/month annual)

Enterprise — $999/month ($799/month annual)

13. Support & Contact

For questions, issues, or feedback, reach us at support@kalixpro.com.

Interested in early access? Join the beta program for priority onboarding and direct access to the engineering team.

Enterprise customers receive 24/7 dedicated support with a named account manager and guaranteed response SLAs.