Documentation
Last Updated: February 12, 2026
1. Our Mission
KaliXPro was designed by quants and students from Harvard and Cambridge to bridge the technology gap in prediction markets. Through a completely ground-up build process — low-latency programming languages like Rust instead of Python, and a host of purpose-built systems — we aim to level the playing field rather than sell you on bombastic claims.
Our architecture is built from scratch: a Rust-native core for sub-millisecond execution, a TypeScript orchestration layer for agent intelligence, and a modular monorepo that lets every component be tested and deployed independently. No black boxes, no repackaged APIs — just transparent infrastructure designed for speed and reliability.
2. Getting Started
Creating Your Account
Sign in at kalixpro.com/signin using one of our supported authentication providers:
- Google — Sign in with your Google account
- X (Twitter) — Sign in with your X account
- Meta (Facebook) — Sign in with your Meta account
After authentication, you are redirected to the Console — your central dashboard for managing context, monitoring performance, and configuring trading parameters.
First Login
On your first login, the Console opens to the Context Control overview. This is your home base. From here you can see live metrics, recent activity, and quick actions. Explore the sidebar to navigate between Context Control, Trading, and Settings.
3. Console Overview
The Console is a protected area — you must be signed in to access it. The sidebar navigation includes:
- Context Control — Manage AI token budgets, caching, guardrails, memory, retrieval, and tools
- Trading — Configure and monitor prediction market trading activity
- Settings — Account preferences and configuration
Your session is shown in the sidebar footer with your avatar, name, and a sign-out button. A status indicator shows system health at a glance.
4. Context Control Dashboard
The Context Control dashboard is the heart of KaliXPro. It gives you complete visibility into how your AI agents use tokens, and full control over optimization.
Key Metrics
- Avg Input Tokens — Average token usage per request, with trend indicator
- Truncation Rate — Percentage of requests that needed context truncation
- Retrieval Share — Portion of tokens consumed by RAG / document retrieval
- Tool Schema Share — Portion of tokens consumed by tool definitions
- Cache Hit Rate — How often cached prompt prefixes are reused
Top Regressions
Tracks which context sections are growing fastest. Each regression shows the section name, current token count, previous count, and delta. Use this to identify which sections need budget adjustment or optimization.
Quick Actions
- View Latest Request — Inspect the most recent LLM request in detail
- Adjust Budgets — Jump to the budget configuration page
- Tool Optimizer — Open the tool schema management page
- Export Report — Download a context usage report
5. Budget Configuration
Control exactly how many tokens are allocated to each section of your LLM requests. Over-allocating wastes money; under-allocating degrades quality. Budgets let you find the right balance.
Model Presets
Select a model preset to load recommended budget allocations. Available presets include gemini-3-flash, gemini-3-pro, and vertex-tuned-orchestrator. Each preset has different input/output limits and section allocations optimized for its context window.
Global Limits
- Input Limit — Maximum tokens for the entire input (e.g., 32,000)
- Output Limit — Maximum tokens for the model response (e.g., 8,000)
- Safety Reserve — Buffer tokens kept aside to prevent overflow (e.g., 2,000)
Section Budgets
Each section has a configurable token budget:
- system — System instructions (default: 2,000)
- developer — Developer messages (default: 1,000)
- conversation — Chat history (default: 8,000)
- retrieval — RAG document chunks (default: 4,000)
- tool_schemas — Function/tool definitions (default: 2,000)
- tool_results — Function output data (default: 3,000)
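The relationship between section budgets and global limits can be sketched as a simple validation step: the section budgets must fit inside the input limit minus the safety reserve. The names below (`GlobalLimits`, `validateBudgets`) are illustrative, not the platform's actual API.

```typescript
// Hypothetical sketch: checking section budgets against the global limits above.
type SectionBudgets = Record<string, number>;

interface GlobalLimits {
  inputLimit: number;    // max tokens for the entire input
  outputLimit: number;   // max tokens for the model response
  safetyReserve: number; // buffer kept aside to prevent overflow
}

// Returns unallocated headroom, or throws if sections overshoot the usable input.
function validateBudgets(limits: GlobalLimits, sections: SectionBudgets): number {
  const allocated = Object.values(sections).reduce((a, b) => a + b, 0);
  const usable = limits.inputLimit - limits.safetyReserve;
  if (allocated > usable) {
    throw new Error(`Sections (${allocated}) exceed usable input (${usable})`);
  }
  return usable - allocated;
}

// With the documented defaults, the six sections total 20,000 tokens,
// leaving 10,000 of headroom under a 32,000 input limit with a 2,000 reserve.
const headroom = validateBudgets(
  { inputLimit: 32000, outputLimit: 8000, safetyReserve: 2000 },
  { system: 2000, developer: 1000, conversation: 8000,
    retrieval: 4000, tool_schemas: 2000, tool_results: 3000 },
);
```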
Dry Run Preview
Before saving, run a simulation to see how your budget changes would affect real traffic. The preview shows the number of affected requests, estimated truncation rate change, and projected cost savings.
6. Caching
Caching reduces costs by reusing common prompt prefixes across requests. When system prompts or tool schemas don't change between calls, cached tokens are served at a fraction of the cost.
Live Stats
- Hit Rate — Percentage of requests that reused cached content
- Total Requests — Volume processed in the current period
- Cached Tokens — Total tokens currently held in cache
- Estimated Savings — Dollar amount saved via caching
Cache Policy
- Short Cache TTL — Time-to-live for frequently changing content (default: 300 seconds)
- Long Cache TTL — Time-to-live for stable content like system prompts (default: 3,600 seconds)
- Min Prefix Length — Minimum token count to qualify for caching (default: 1,000)
- Enable Prefix Caching — Master toggle to enable or disable caching entirely
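How these four settings interact can be sketched as a small decision function: caching must be enabled, the prefix must meet the minimum length, and stable content gets the long TTL while frequently changing content gets the short one. The function shape is an assumption for illustration, not the real API.

```typescript
// Illustrative sketch of applying a cache policy like the one above.
interface CachePolicy {
  shortTtlSeconds: number;  // frequently changing content (default 300)
  longTtlSeconds: number;   // stable content like system prompts (default 3600)
  minPrefixTokens: number;  // minimum prefix length to qualify (default 1000)
  enabled: boolean;         // master toggle
}

// Returns the TTL to cache a prefix with, or null if it does not qualify.
function cacheTtl(policy: CachePolicy, prefixTokens: number, stable: boolean): number | null {
  if (!policy.enabled || prefixTokens < policy.minPrefixTokens) return null;
  return stable ? policy.longTtlSeconds : policy.shortTtlSeconds;
}

const defaults: CachePolicy = {
  shortTtlSeconds: 300, longTtlSeconds: 3600, minPrefixTokens: 1000, enabled: true,
};
cacheTtl(defaults, 2500, true);  // stable system prompt → long TTL
cacheTtl(defaults, 400, true);   // below minimum prefix length → not cached
```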
Savings Breakdown
A per-category breakdown showing tokens saved and dollar value for each cache type: system prompt cache, tool schema cache, and conversation prefix cache.
Cache Invalidation
Use the "Invalidate All Caches" button when you update prompts or tool schemas. This forces all subsequent requests to build fresh cache entries.
7. Guardrails & Cost Control
Guardrails prevent runaway costs from unexpectedly large context windows. Set hard limits and get alerts before spend gets out of hand.
Long Context Threshold
Set a maximum token limit for any single request (default: 100,000). A visual bar shows warning level (80%) and block level (100%).
Blocking Behavior
- Block by Default — Automatically reject requests that exceed the threshold
- Require Confirmation — Force a typed confirmation to override blocked requests
Cost Controls
- Max Daily Spend — Hard dollar limit per day (default: $500)
- Alert Threshold — Percentage of daily spend that triggers an alert (default: 80%)
Manual Override
For exceptional cases, you can request an override by typing ALLOW OVERRIDE in the confirmation field. All overrides are logged with timestamp, request ID, and user attribution.
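The warn/block behavior described above reduces to a simple threshold check — warn at 80% of the limit, block at 100%. This is a minimal sketch under those assumptions; the names are illustrative.

```typescript
// Sketch of the long-context guardrail: warn at 80%, block at 100%.
type Verdict = "ok" | "warn" | "block";

function checkContextSize(tokens: number, threshold: number): Verdict {
  if (tokens >= threshold) return "block";
  if (tokens >= threshold * 0.8) return "warn";
  return "ok";
}

checkContextSize(50_000, 100_000);  // well under the limit → ok
checkContextSize(85_000, 100_000);  // past the 80% warning level → warn
checkContextSize(120_000, 100_000); // over the threshold → block
```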
8. Conversation Memory
Manage how conversation history is stored and compressed. Older messages are automatically summarized to save tokens while preserving context.
Memory Stats
- Total Memory — Token count of all stored context
- Pinned Items — Tokens used by pinned (always-included) items
- Messages — Total message count in history
- Summaries — Number of compressed summary blocks
Verbatim History
Configure how many recent conversation turns are kept word-for-word (1 to 20). Everything older is compressed into summaries.
Pinned Items
Pin critical context that should persist across all compression cycles. Categories include constraints, decisions, tasks, and general context. Pinned items are never summarized or removed.
Compression
View before/after comparisons of compression. A typical compression reduces a 180-token exchange to a 25-token summary — an 86% reduction — without losing the essential information.
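The verbatim-window behavior can be sketched as follows: keep the last N turns word-for-word and hand everything older to a summarizer. The `summarize` callback here is a stand-in for whatever compression model actually runs.

```typescript
// Sketch of memory compression: last N turns stay verbatim, older turns
// are collapsed into one summary block. summarize() is a placeholder.
interface Turn { role: "user" | "assistant"; text: string }

function compressHistory(
  turns: Turn[],
  verbatimTurns: number,
  summarize: (older: Turn[]) => string,
) {
  const cut = Math.max(0, turns.length - verbatimTurns);
  const older = turns.slice(0, cut);
  return {
    summary: older.length > 0 ? summarize(older) : null,
    verbatim: turns.slice(cut), // kept word-for-word
  };
}
```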
9. Retrieval Packing
Optimize how retrieved documents (RAG) are packed into your context window. Smart packing eliminates redundancy and maximizes information density.
Retrieval Budget
Set a token limit for retrieved chunks (default: 2,000). A visual meter shows how much of the budget is currently used.
Chunking Settings
- Max Chunk Size — Maximum tokens per chunk (default: 512)
- Overlap Tokens — Shared tokens between adjacent chunks for continuity (default: 50)
- Boundary Strategy — How chunks are split: sentence, paragraph, or semantic
Smart Packing vs. Naive Top-K
The pack preview lets you compare approaches. A naive top-K approach might retrieve 5 chunks using 2,400 tokens with 35% redundancy. Smart packing selects 3 optimized chunks using 385 tokens with only 8% redundancy — an 84% token savings.
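One way to get from naive top-K to a smart pack is a greedy pass: rank chunks by score per token, then take each chunk only if it fits the budget and is not too similar to what's already selected. This is a hedged sketch — the word-overlap check below is a naive stand-in for whatever similarity measure the real packer uses.

```typescript
// Sketch of a smart-packing pass: greedy selection by score-per-token,
// skipping near-duplicate chunks, under a fixed retrieval budget.
interface Chunk { id: string; tokens: number; score: number; text: string }

// Naive redundancy measure: fraction of shared words (illustrative only).
function overlap(a: string, b: string): number {
  const wa = new Set(a.toLowerCase().split(/\s+/));
  const wb = new Set(b.toLowerCase().split(/\s+/));
  let shared = 0;
  for (const w of wa) if (wb.has(w)) shared++;
  return shared / Math.min(wa.size, wb.size);
}

function packChunks(chunks: Chunk[], budget: number, maxOverlap = 0.5): Chunk[] {
  // Highest information density first.
  const ranked = [...chunks].sort((a, b) => b.score / b.tokens - a.score / a.tokens);
  const picked: Chunk[] = [];
  let used = 0;
  for (const c of ranked) {
    if (used + c.tokens > budget) continue;
    if (picked.some((p) => overlap(p.text, c.text) > maxOverlap)) continue;
    picked.push(c);
    used += c.tokens;
  }
  return picked;
}
```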
10. Tool Management
Manage the tool schemas that your AI agents use. Every tool definition consumes tokens — optimizing them directly reduces cost per request.
Schema Registry
View all registered tools with their token footprint, duplicate fragment count, and compact savings potential. Toggle tools on or off as needed.
Compact Mode
Enable compact mode to automatically deduplicate shared fields across tool schemas. This reduces the total schema footprint without changing tool behavior.
Output Policy
Configure how tool outputs are handled:
- Token Limit — Max tokens per tool result (default: 4,000)
- Byte Limit — Max bytes before encoding (default: 32,000)
- Summarization — Choose between None (truncate), Auto (summarize when over 50%), or Aggressive (always summarize)
- Artifact Retention — Keep all artifacts, only the latest, or none
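The summarization setting above can be read as a three-way policy over the token limit. This sketch shows one plausible interpretation — aggressive always summarizes, auto summarizes past 50% of the limit, and none truncates anything over the limit. The function is illustrative, not the shipped behavior.

```typescript
// Sketch of the tool-output policy: decide what to do with a tool result
// given its size, the token limit (default 4,000), and the chosen mode.
type SummarizePolicy = "none" | "auto" | "aggressive";
type Action = "pass" | "summarize" | "truncate";

function handleToolResult(tokens: number, tokenLimit: number, policy: SummarizePolicy): Action {
  if (policy === "aggressive") return "summarize";          // always summarize
  if (policy === "auto" && tokens > tokenLimit * 0.5) return "summarize"; // over 50%
  if (tokens > tokenLimit) return "truncate";               // hard cap
  return "pass";
}

handleToolResult(1000, 4000, "auto"); // small result passes through
handleToolResult(3000, 4000, "auto"); // over 50% of the limit → summarize
handleToolResult(5000, 4000, "none"); // over the limit, no summarization → truncate
```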
Redaction Rules
Pre-configured regex patterns automatically redact sensitive data from tool outputs before they enter the context window. Built-in rules cover API keys, email addresses, and phone numbers. Each rule can be individually enabled or disabled, and you can test redaction with sample text before deploying.
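A redaction pass like the one described can be sketched as applying each enabled pattern in turn. The patterns below are deliberately simplified examples in the spirit of the built-in rules — they are not the production rule set.

```typescript
// Illustrative redaction pass: each rule replaces matches with a labeled
// placeholder before the output enters the context window.
const redactionRules: { name: string; pattern: RegExp }[] = [
  { name: "api_key", pattern: /\bsk-[A-Za-z0-9]{16,}\b/g },       // simplified key shape
  { name: "email", pattern: /\b[\w.+-]+@[\w-]+\.[\w.]+\b/g },     // simplified email
  { name: "phone", pattern: /\b\+?\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/g }, // simplified phone
];

function redact(text: string): string {
  return redactionRules.reduce(
    (out, rule) => out.replace(rule.pattern, `[REDACTED:${rule.name}]`),
    text,
  );
}

redact("Contact ops@example.com with key sk-abcdef1234567890xy");
// → both the email and the key are replaced with labeled placeholders
```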
11. Trading
KaliXPro connects to multiple prediction market venues and crypto exchanges for real-time signal detection and execution.
Supported Venues
- Polymarket — Decentralized prediction market. Supports political and sports markets. Configurable exposure caps.
- Kalshi — CFTC-regulated prediction market. Supports event contracts with low taker fees (7 bps).
- Coinbase — Crypto trading via Advanced Trade API. Supports BTC-USDC, ETH-USDC, SOL-USDC with limit orders at midpoint pricing.
Execution Modes
- Paper — Scan and analyze only, no real trades placed. Use this for testing.
- Canary — Limited live trading with strict position caps. Use this for validation.
- Live — Full trading enabled with configured risk limits.
Domain Agents
Six specialized AI agents analyze different market domains:
- Sports — NFL, NBA, Soccer, UFC event analysis
- Finance — Market movements, Fed decisions, economic indicators
- Crypto — BTC, ETH, SOL price action and on-chain signals
- Elections — Political races, polling data, sentiment
- Weather — Weather-driven market impact analysis
- Orchestrator — Routes signals to the right agent, filters noise, manages the pipeline
Signal Sources
Signals are aggregated from multiple sources with configurable weights:
- Reddit sentiment analysis
- Google Trends momentum
- News API coverage
- Live odds and pricing data
Risk Management
- Per-venue and total exposure caps
- Stop-loss and take-profit thresholds
- Kill switch with configurable triggers
- Circuit breakers for rapid market moves
- Minimum confidence score requirements
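Several of these controls naturally combine into a single pre-trade gate: check the kill switch, the confidence floor, and both exposure caps before any order goes out. This is a hedged sketch — all names and thresholds are illustrative, not the engine's actual risk module.

```typescript
// Sketch of a pre-trade risk gate combining the controls listed above.
interface RiskLimits {
  perVenueCap: number;   // max exposure per venue, in USD
  totalCap: number;      // max total exposure, in USD
  minConfidence: number; // minimum signal confidence score (0-1)
  killSwitch: boolean;   // global halt
}

function allowTrade(
  limits: RiskLimits,
  venueExposure: number, // current exposure at this venue
  totalExposure: number, // current exposure across all venues
  size: number,          // proposed trade size, in USD
  confidence: number,    // signal confidence score
): boolean {
  if (limits.killSwitch) return false;
  if (confidence < limits.minConfidence) return false;
  if (venueExposure + size > limits.perVenueCap) return false;
  if (totalExposure + size > limits.totalCap) return false;
  return true;
}
```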
12. Pricing
Starter — $99/month ($79/month annual)
- 2 Domain Agents (Sports + Finance)
- Up to $10k monthly trading volume
- Basic signal alerts
- Standard latency (~500ms)
- Email support
Professional — $299/month ($239/month annual)
- All 6 Domain Agents
- Up to $100k monthly trading volume
- Real-time signal streaming
- Low latency (~100ms)
- Full API access
- Priority support
Enterprise — $999/month ($799/month annual)
- All 6 Domain Agents + Custom agents
- Unlimited trading volume
- Ultra-low latency (~50ms)
- Dedicated infrastructure
- Custom model fine-tuning
- White-glove onboarding
- 24/7 dedicated support
13. Support & Contact
For questions, issues, or feedback, reach us at support@kalixpro.com.
Interested in early access? Join the beta program for priority onboarding and direct access to the engineering team.
Enterprise customers receive 24/7 dedicated support with a named account manager and guaranteed response SLAs.