Documentation
Last Updated: February 12, 2026
1. Our Mission
KaliXPro was designed by quants and students from Harvard and Cambridge to bridge the technology gap in prediction markets. Through a completely ground-up build process — low-latency programming languages like Rust instead of Python, and a host of purpose-built systems — we aim to level the playing field rather than sell you on bombastic claims.
Our architecture is built from scratch: a Rust-native core for sub-millisecond execution, a TypeScript orchestration layer for agent intelligence, and a modular monorepo that lets every component be tested and deployed independently. No black boxes, no repackaged APIs — just transparent infrastructure designed for speed and reliability.
2. Getting Started
Creating Your Account
Sign in at kalixpro.com/signin using one of our supported authentication providers:
- Google — Sign in with your Google account
- X (Twitter) — Sign in with your X account
- Meta (Facebook) — Sign in with your Meta account
After authentication, you are redirected to the Console — your central dashboard for managing context, monitoring performance, and configuring trading parameters.
First Login
On your first login, the Console opens to the Context Control overview. This is your home base. From here you can see live metrics, recent activity, and quick actions. Explore the sidebar to navigate between Context Control, Trading, and Settings.
3. Console Overview
The Console is a protected area — you must be signed in to access it. The sidebar navigation includes:
- Context Control — Manage AI token budgets, caching, guardrails, memory, retrieval, and tools
- Trading — Configure and monitor prediction market trading activity
- Settings — Account preferences and configuration
Your session is shown in the sidebar footer with your avatar, name, and a sign-out button. A status indicator shows system health at a glance.
4. Context Control Dashboard
The Context Control dashboard is the heart of KaliXPro. It gives you complete visibility into how your AI agents use tokens, and full control over optimization.
Key Metrics
- Avg Input Tokens — Average token usage per request, with trend indicator
- Truncation Rate — Percentage of requests that needed context truncation
- Retrieval Share — Portion of tokens consumed by RAG / document retrieval
- Tool Schema Share — Portion of tokens consumed by tool definitions
- Cache Hit Rate — How often cached prompt prefixes are reused
Top Regressions
Tracks which context sections are growing fastest. Each regression shows the section name, current token count, previous count, and delta. Use this to identify which sections need budget adjustment or optimization.
Quick Actions
- View Latest Request — Inspect the most recent LLM request in detail
- Adjust Budgets — Jump to the budget configuration page
- Tool Optimizer — Open the tool schema management page
- Export Report — Download a context usage report
5. Budget Configuration
Control exactly how many tokens are allocated to each section of your LLM requests. Over-allocating wastes money; under-allocating degrades quality. Budgets let you find the right balance.
Model Presets
Select a model preset to load recommended budget allocations. Available presets include gemini-3-flash, gemini-3-pro, and vertex-tuned-orchestrator. Each preset has different input/output limits and section allocations optimized for its context window.
Global Limits
- Input Limit — Maximum tokens for the entire input (e.g., 32,000)
- Output Limit — Maximum tokens for the model response (e.g., 8,000)
- Safety Reserve — Buffer tokens kept aside to prevent overflow (e.g., 2,000)
Section Budgets
Each section has a configurable token budget:
- system — System instructions (default: 2,000)
- developer — Developer messages (default: 1,000)
- conversation — Chat history (default: 8,000)
- retrieval — RAG document chunks (default: 4,000)
- tool_schemas — Function/tool definitions (default: 2,000)
- tool_results — Function output data (default: 3,000)
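The relationship between section budgets and global limits can be sketched as a simple validation step: the section budgets must fit inside the input limit minus the safety reserve. The names below (`GlobalLimits`, `validateBudgets`) are illustrative, not the platform's actual API.

```typescript
// Hypothetical sketch: checking section budgets against the global limits above.
type SectionBudgets = Record<string, number>;

interface GlobalLimits {
  inputLimit: number;    // max tokens for the entire input
  outputLimit: number;   // max tokens for the model response
  safetyReserve: number; // buffer kept aside to prevent overflow
}

// Returns unallocated headroom, or throws if sections overshoot the usable input.
function validateBudgets(limits: GlobalLimits, sections: SectionBudgets): number {
  const allocated = Object.values(sections).reduce((a, b) => a + b, 0);
  const usable = limits.inputLimit - limits.safetyReserve;
  if (allocated > usable) {
    throw new Error(`Sections (${allocated}) exceed usable input (${usable})`);
  }
  return usable - allocated;
}

// With the documented defaults, the six sections total 20,000 tokens,
// leaving 10,000 of headroom under a 32,000 input limit with a 2,000 reserve.
const headroom = validateBudgets(
  { inputLimit: 32000, outputLimit: 8000, safetyReserve: 2000 },
  { system: 2000, developer: 1000, conversation: 8000,
    retrieval: 4000, tool_schemas: 2000, tool_results: 3000 },
);
```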
Dry Run Preview
Before saving, run a simulation to see how your budget changes would affect real traffic. The preview shows the number of affected requests, estimated truncation rate change, and projected cost savings.
6. Caching
Caching reduces costs by reusing common prompt prefixes across requests. When system prompts or tool schemas don't change between calls, cached tokens are served at a fraction of the cost.
Live Stats
- Hit Rate — Percentage of requests that reused cached content
- Total Requests — Volume processed in the current period
- Cached Tokens — Total tokens currently held in cache
- Estimated Savings — Dollar amount saved via caching
Cache Policy
- Short Cache TTL — Time-to-live for frequently changing content (default: 300 seconds)
- Long Cache TTL — Time-to-live for stable content like system prompts (default: 3,600 seconds)
- Min Prefix Length — Minimum token count to qualify for caching (default: 1,000)
- Enable Prefix Caching — Master toggle to enable or disable caching entirely
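How these four settings interact can be sketched as a small decision function: caching must be enabled, the prefix must meet the minimum length, and stable content gets the long TTL while frequently changing content gets the short one. The function shape is an assumption for illustration, not the real API.

```typescript
// Illustrative sketch of applying a cache policy like the one above.
interface CachePolicy {
  shortTtlSeconds: number;  // frequently changing content (default 300)
  longTtlSeconds: number;   // stable content like system prompts (default 3600)
  minPrefixTokens: number;  // minimum prefix length to qualify (default 1000)
  enabled: boolean;         // master toggle
}

// Returns the TTL to cache a prefix with, or null if it does not qualify.
function cacheTtl(policy: CachePolicy, prefixTokens: number, stable: boolean): number | null {
  if (!policy.enabled || prefixTokens < policy.minPrefixTokens) return null;
  return stable ? policy.longTtlSeconds : policy.shortTtlSeconds;
}

const defaults: CachePolicy = {
  shortTtlSeconds: 300, longTtlSeconds: 3600, minPrefixTokens: 1000, enabled: true,
};
cacheTtl(defaults, 2500, true);  // stable system prompt → long TTL
cacheTtl(defaults, 400, true);   // below minimum prefix length → not cached
```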
Savings Breakdown
A per-category breakdown showing tokens saved and dollar value for each cache type: system prompt cache, tool schema cache, and conversation prefix cache.
Cache Invalidation
Use the "Invalidate All Caches" button when you update prompts or tool schemas. This forces all subsequent requests to build fresh cache entries.
7. Guardrails & Cost Control
Guardrails prevent runaway costs from unexpectedly large context windows. Set hard limits and get alerts before spend gets out of hand.
Long Context Threshold
Set a maximum token limit for any single request (default: 100,000). A visual bar shows warning level (80%) and block level (100%).
Blocking Behavior
- Block by Default — Automatically reject requests that exceed the threshold
- Require Confirmation — Force a typed confirmation to override blocked requests
Cost Controls
- Max Daily Spend — Hard dollar limit per day (default: $500)
- Alert Threshold — Percentage of daily spend that triggers an alert (default: 80%)
Manual Override
For exceptional cases, you can request an override by typing ALLOW OVERRIDE in the confirmation field. All overrides are logged with timestamp, request ID, and user attribution.
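The warn/block behavior described above reduces to a simple threshold check — warn at 80% of the limit, block at 100%. This is a minimal sketch under those assumptions; the names are illustrative.

```typescript
// Sketch of the long-context guardrail: warn at 80%, block at 100%.
type Verdict = "ok" | "warn" | "block";

function checkContextSize(tokens: number, threshold: number): Verdict {
  if (tokens >= threshold) return "block";
  if (tokens >= threshold * 0.8) return "warn";
  return "ok";
}

checkContextSize(50_000, 100_000);  // well under the limit → ok
checkContextSize(85_000, 100_000);  // past the 80% warning level → warn
checkContextSize(120_000, 100_000); // over the threshold → block
```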
8. Conversation Memory
Manage how conversation history is stored and compressed. Older messages are automatically summarized to save tokens while preserving context.
Memory Stats
- Total Memory — Token count of all stored context
- Pinned Items — Tokens used by pinned (always-included) items
- Messages — Total message count in history
- Summaries — Number of compressed summary blocks
Verbatim History
Configure how many recent conversation turns are kept word-for-word (1 to 20). Everything older is compressed into summaries.
Pinned Items
Pin critical context that should persist across all compression cycles. Categories include constraints, decisions, tasks, and general context. Pinned items are never summarized or removed.
Compression
View before/after comparisons of compression. A typical compression reduces a 180-token exchange to a 25-token summary — an 86% reduction — without losing the essential information.
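The verbatim-window behavior can be sketched as follows: keep the last N turns word-for-word and hand everything older to a summarizer. The `summarize` callback here is a stand-in for whatever compression model actually runs.

```typescript
// Sketch of memory compression: last N turns stay verbatim, older turns
// are collapsed into one summary block. summarize() is a placeholder.
interface Turn { role: "user" | "assistant"; text: string }

function compressHistory(
  turns: Turn[],
  verbatimTurns: number,
  summarize: (older: Turn[]) => string,
) {
  const cut = Math.max(0, turns.length - verbatimTurns);
  const older = turns.slice(0, cut);
  return {
    summary: older.length > 0 ? summarize(older) : null,
    verbatim: turns.slice(cut), // kept word-for-word
  };
}
```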
9. Retrieval Packing
Optimize how retrieved documents (RAG) are packed into your context window. Smart packing eliminates redundancy and maximizes information density.
Retrieval Budget
Set a token limit for retrieved chunks (default: 2,000). A visual meter shows how much of the budget is currently used.
Chunking Settings
- Max Chunk Size — Maximum tokens per chunk (default: 512)
- Overlap Tokens — Shared tokens between adjacent chunks for continuity (default: 50)
- Boundary Strategy — How chunks are split: sentence, paragraph, or semantic
Smart Packing vs. Naive Top-K
The pack preview lets you compare approaches. A naive top-K approach might retrieve 5 chunks using 2,400 tokens with 35% redundancy. Smart packing selects 3 optimized chunks using 385 tokens with only 8% redundancy — an 84% token savings.
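One way to get from naive top-K to a smart pack is a greedy pass: rank chunks by score per token, then take each chunk only if it fits the budget and is not too similar to what's already selected. This is a hedged sketch — the word-overlap check below is a naive stand-in for whatever similarity measure the real packer uses.

```typescript
// Sketch of a smart-packing pass: greedy selection by score-per-token,
// skipping near-duplicate chunks, under a fixed retrieval budget.
interface Chunk { id: string; tokens: number; score: number; text: string }

// Naive redundancy measure: fraction of shared words (illustrative only).
function overlap(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/));
  const wb = new Set(b.toLowerCase().split(/\s+/));
  let shared = 0;
  for (const w of wa) if (wb.has(w)) shared++;
  return shared / Math.min(wa.size, wb.size);
}

function packChunks(chunks: Chunk[], budget: number, maxOverlap = 0.5): Chunk[] {
  // Highest information density first.
  const ranked = [...chunks].sort((a, b) => b.score / b.tokens - a.score / a.tokens);
  const picked: Chunk[] = [];
  let used = 0;
  for (const c of ranked) {
    if (used + c.tokens > budget) continue;
    if (picked.some((p) => overlap(p.text, c.text) > maxOverlap)) continue;
    picked.push(c);
    used += c.tokens;
  }
  return picked;
}
```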
10. Tool Management
Manage the tool schemas that your AI agents use. Every tool definition consumes tokens — optimizing them directly reduces cost per request.
Schema Registry
View all registered tools with their token footprint, duplicate fragment count, and compact savings potential. Toggle tools on or off as needed.
Compact Mode
Enable compact mode to automatically deduplicate shared fields across tool schemas. This reduces the total schema footprint without changing tool behavior.
Output Policy
Configure how tool outputs are handled:
- Token Limit — Max tokens per tool result (default: 4,000)
- Byte Limit — Max bytes before encoding (default: 32,000)
- Summarization — Choose between None (truncate), Auto (summarize when over 50%), or Aggressive (always summarize)
- Artifact Retention — Keep all artifacts, only the latest, or none
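The summarization setting above can be read as a three-way policy over the token limit. This sketch shows one plausible interpretation — aggressive always summarizes, auto summarizes past 50% of the limit, and none truncates anything over the limit. The function is illustrative, not the shipped behavior.

```typescript
// Sketch of the tool-output policy: decide what to do with a tool result
// given its size, the token limit (default 4,000), and the chosen mode.
type SummarizePolicy = "none" | "auto" | "aggressive";
type Action = "pass" | "summarize" | "truncate";

function handleToolResult(tokens: number, tokenLimit: number, policy: SummarizePolicy): Action {
  if (policy === "aggressive") return "summarize";          // always summarize
  if (policy === "auto" && tokens > tokenLimit * 0.5) return "summarize"; // over 50%
  if (tokens > tokenLimit) return "truncate";               // hard cap
  return "pass";
}

handleToolResult(1000, 4000, "auto"); // small result passes through
handleToolResult(3000, 4000, "auto"); // over 50% of the limit → summarize
handleToolResult(5000, 4000, "none"); // over the limit, no summarization → truncate
```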
Redaction Rules
Pre-configured regex patterns automatically redact sensitive data from tool outputs before they enter the context window. Built-in rules cover API keys, email addresses, and phone numbers. Each rule can be individually enabled or disabled, and you can test redaction with sample text before deploying.
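A redaction pass like the one described can be sketched as applying each enabled pattern in turn. The patterns below are deliberately simplified examples in the spirit of the built-in rules — they are not the production rule set.

```typescript
// Illustrative redaction pass: each rule replaces matches with a labeled
// placeholder before the output enters the context window.
const redactionRules: { name: string; pattern: RegExp }[] = [
  { name: "api_key", pattern: /\bsk-[A-Za-z0-9]{16,}\b/g },       // simplified key shape
  { name: "email", pattern: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g },     // simplified email
  { name: "phone", pattern: /\b\+?\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g }, // simplified phone
];

function redact(text: string): string {
  return redactionRules.reduce(
    (out, rule) => out.replace(rule.pattern, `[REDACTED:${rule.name}]`),
    text,
  );
}

redact("Contact ops@example.com with key sk-abcdef1234567890xy");
// → both the email and the key are replaced with labeled placeholders
```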
11. Trading
KaliXPro connects to multiple prediction market venues and crypto exchanges for real-time signal detection and execution.
Supported Venues
- Polymarket — Decentralized prediction market. Supports political and sports markets. Configurable exposure caps.
- Kalshi — CFTC-regulated prediction market. Supports event contracts with low taker fees (7 bps).
- Coinbase — Crypto trading via Advanced Trade API. Supports BTC-USDC, ETH-USDC, SOL-USDC with limit orders at midpoint pricing.
Execution Modes
- Paper — Scan and analyze only, no real trades placed. Use this for testing.
- Canary — Limited live trading with strict position caps. Use this for validation.
- Live — Full trading enabled with configured risk limits.
Domain Agents
Six specialized AI agents analyze different market domains:
- Sports — NFL, NBA, Soccer, UFC event analysis
- Finance — Market movements, Fed decisions, economic indicators
- Crypto — BTC, ETH, SOL price action and on-chain signals
- Elections — Political races, polling data, sentiment
- Weather — Weather-driven market impact analysis
- Orchestrator — Routes signals to the right agent, filters noise, manages the pipeline
Signal Sources
Signals are aggregated from multiple sources with configurable weights:
- Reddit sentiment analysis
- Google Trends momentum
- News API coverage
- Live odds and pricing data
Risk Management
- Per-venue and total exposure caps
- Stop-loss and take-profit thresholds
- Kill switch with configurable triggers
- Circuit breakers for rapid market moves
- Minimum confidence score requirements
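Several of these controls naturally combine into a single pre-trade gate: check the kill switch, the confidence floor, and both exposure caps before any order goes out. This is a hedged sketch — all names and thresholds are illustrative, not the engine's actual risk module.

```typescript
// Sketch of a pre-trade risk gate combining the controls listed above.
interface RiskLimits {
  perVenueCap: number;   // max exposure per venue, in USD
  totalCap: number;      // max total exposure, in USD
  minConfidence: number; // minimum signal confidence score (0-1)
  killSwitch: boolean;   // global halt
}

function allowTrade(
  limits: RiskLimits,
  venueExposure: number, // current exposure at this venue
  totalExposure: number, // current exposure across all venues
  size: number,          // proposed trade size, in USD
  confidence: number,    // signal confidence score
): boolean {
  if (limits.killSwitch) return false;
  if (confidence < limits.minConfidence) return false;
  if (venueExposure + size > limits.perVenueCap) return false;
  if (totalExposure + size > limits.totalCap) return false;
  return true;
}
```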
12. Pricing
Starter — $99/month ($79/month annual)
- 2 Domain Agents (Sports + Finance)
- Up to $10k monthly trading volume
- Basic signal alerts
- Standard latency (~500ms)
- Email support
Professional — $299/month ($239/month annual)
- All 6 Domain Agents
- Up to $100k monthly trading volume
- Real-time signal streaming
- Low latency (~100ms)
- Full API access
- Priority support
Enterprise — $999/month ($799/month annual)
- All 6 Domain Agents + Custom agents
- Unlimited trading volume
- Ultra-low latency (~50ms)
- Dedicated infrastructure
- Custom model fine-tuning
- White-glove onboarding
- 24/7 dedicated support
13. Support & Contact
For questions, issues, or feedback, reach us at support@kalixpro.com.
Interested in early access? Join the beta program for priority onboarding and direct access to the engineering team.
Enterprise customers receive 24/7 dedicated support with a named account manager and guaranteed response SLAs.