Guardrails

NeuraMeter guardrails let you set thresholds on cost and context usage, with three modes that determine how violations are handled.

Modes

| Mode | Behavior | Use Case |
| --- | --- | --- |
| `notify` | Alert only, never stop | Default. Safe for production. |
| `block` | Throw an error on hard limit violations | Prevent runaway costs |
| `auto-optimize` | Call your callback to fix and retry | Self-healing agents |

Configuration

```typescript
import { NeuraMeter } from '@neurameter/core';

const meter = new NeuraMeter({
  apiKey: 'nm_xxx',
  projectId: 'proj_xxx',
  guards: {
    mode: 'notify', // 'notify' | 'block' | 'auto-optimize'

    // Soft limits (trigger alerts)
    maxInputTokens: 50_000,
    maxContextUtilization: 0.80, // 80% of context window
    maxCostPerCall: 0.50,        // $0.50 per call
    maxCostPerHour: 10.0,        // $10/hour per agent

    // Hard limits (block mode only; throws NeuraMeterGuardError)
    maxInputTokensHard: 100_000,
    maxContextUtilizationHard: 0.95,
    maxCostPerCallHard: 2.0,

    // Notifications
    notifySlackWebhook: 'https://hooks.slack.com/services/xxx',
    notifyDashboard: true,

    // Auto-optimize callback (auto-optimize mode only)
    onOptimize: async (event) => {
      // Your optimization logic here
      return { action: 'retry', messages: optimizedMessages };
    },
  },
});
```

Guard Rules

Input Token Limit

Checks the estimated input tokens against a threshold.

```typescript
guards: {
  maxInputTokens: 50_000,      // soft limit: alerts
  maxInputTokensHard: 100_000, // hard limit: blocks (block mode)
}
```
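The soft/hard split works the same way for every rule. The sketch below illustrates the semantics (it is not NeuraMeter's internal implementation): exceeding the soft limit produces a notification, exceeding the hard limit produces a block decision.

```typescript
// Illustrative sketch of soft/hard threshold evaluation.
type Decision = 'allow' | 'notify' | 'block';

function checkTokenLimit(
  estimatedTokens: number,
  soft: number,
  hard: number,
): Decision {
  if (estimatedTokens > hard) return 'block';  // hard limit: stop the call
  if (estimatedTokens > soft) return 'notify'; // soft limit: alert only
  return 'allow';
}
```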

Context Utilization

Checks what percentage of the model’s context window is being used.

```typescript
guards: {
  maxContextUtilization: 0.80,     // 80%: soft
  maxContextUtilizationHard: 0.95, // 95%: hard
}
```

The utilization is calculated as:

utilization = estimatedInputTokens / modelContextLimit
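To make the formula concrete, here is a standalone sketch. The chars-divided-by-4 token estimate and the 128,000-token context limit are assumptions for illustration; NeuraMeter's own estimator and model table may differ.

```typescript
// Rough token estimate: ~4 characters per token (common heuristic).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// utilization = estimatedInputTokens / modelContextLimit
function contextUtilization(text: string, modelContextLimit: number): number {
  return estimateTokens(text) / modelContextLimit;
}

// 409,600 chars ~ 102,400 tokens against an assumed 128k window = 0.8
const u = contextUtilization('x'.repeat(409_600), 128_000);
```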

Cost Per Call

Estimates the input cost before the API call is made.

```typescript
guards: {
  maxCostPerCall: 0.50,    // $0.50: soft
  maxCostPerCallHard: 2.0, // $2.00: hard
}
```
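The pre-call estimate reduces to tokens times the model's input price. A minimal sketch, with an assumed price of $2.50 per million input tokens (check your provider's current rates):

```typescript
// Input cost in dollars, given a per-million-token price.
function estimateInputCost(
  tokens: number,
  dollarsPerMillionTokens: number,
): number {
  return (tokens / 1_000_000) * dollarsPerMillionTokens;
}

// 100k input tokens at an assumed $2.50 / 1M tokens
const cost = estimateInputCost(100_000, 2.5); // 0.25
```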

Cost Per Hour

Tracks a rolling 1-hour cost total per agent.

```typescript
guards: {
  maxCostPerHour: 10.0, // $10/hour
}
```
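A rolling window like this can be modeled as a list of timestamped costs where stale entries are pruned on read. This is a hypothetical sketch of the bookkeeping, not NeuraMeter's internal data structure:

```typescript
// Rolling 1-hour cost window: record (timestamp, cost) pairs and
// sum the entries from the last hour.
class RollingHourlyCost {
  private entries: { at: number; cost: number }[] = [];

  record(cost: number, at: number = Date.now()): void {
    this.entries.push({ at, cost });
  }

  totalLastHour(now: number = Date.now()): number {
    const cutoff = now - 60 * 60 * 1000;
    // Drop entries older than one hour, then sum the rest.
    this.entries = this.entries.filter((e) => e.at >= cutoff);
    return this.entries.reduce((sum, e) => sum + e.cost, 0);
  }
}
```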

Checking Guards

Call checkGuards() before making an API call:

```typescript
const result = meter.checkGuards({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Analyze this 50-page document...' },
  ],
  model: 'gpt-4o',
  provider: 'openai',
  agentName: 'MyAgent',
});
```

GuardCheckResult

```typescript
interface GuardCheckResult {
  decision: 'allow' | 'notify' | 'block' | 'optimized';
  triggeredRules: TriggeredRule[];
  contextAnalysis: ContextAnalysis | null;
  suggestion?: string;
}
```

| Decision | Meaning |
| --- | --- |
| `allow` | No rules triggered |
| `notify` | Rules triggered, alert sent |
| `block` | Hard limit exceeded in block mode |
| `optimized` | Auto-optimize callback was called |
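One way to act on the decision is a simple gate before the API call. The branching below mirrors the `decision` values above; the logging and return values are illustrative choices, not prescribed by the library:

```typescript
type Decision = 'allow' | 'notify' | 'block' | 'optimized';

// Returns true if the API call should proceed.
function shouldProceed(result: { decision: Decision; suggestion?: string }): boolean {
  switch (result.decision) {
    case 'allow':
      return true; // nothing triggered
    case 'notify':
      console.warn('Guard triggered:', result.suggestion ?? '(no suggestion)');
      return true; // soft limit: alert was sent, call proceeds
    case 'optimized':
      return true; // onOptimize already produced a retry
    case 'block':
      return false; // hard limit: do not make the call
  }
}
```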

Block Mode

In block mode, hard limit violations throw NeuraMeterGuardError:

```typescript
import { NeuraMeterGuardError } from '@neurameter/core';

try {
  meter.checkGuards({ /* ... */ });
  const response = await openai.chat.completions.create({ /* ... */ });
} catch (err) {
  if (err instanceof NeuraMeterGuardError) {
    console.log(err.rule);       // 'input_tokens'
    console.log(err.current);    // 120000
    console.log(err.threshold);  // 100000
    console.log(err.suggestion); // 'Reduce input tokens...'
  }
}
```

Auto-Optimize Mode

In auto-optimize mode, provide an onOptimize callback:

```typescript
guards: {
  mode: 'auto-optimize',
  maxContextUtilization: 0.80,
  onOptimize: async (event) => {
    // event.type: 'context_utilization' | 'cost_per_call' | 'input_tokens'
    // event.suggestion: human-readable optimization hint
    // event.metrics.messages: the current messages array
    const compressed = await summarizeOldMessages(event.metrics.messages);
    return {
      action: 'retry',      // 'retry' | 'notify' | 'block'
      messages: compressed, // optimized messages for retry
      model: 'gpt-4o-mini', // optionally switch to a cheaper model
    };
  },
}
```

OptimizeResult

| Action | Meaning |
| --- | --- |
| `retry` | Retry the call with the returned messages/model |
| `notify` | Fall back to notification only |
| `block` | Stop the call |

Slack Notifications

Guard triggers can be sent to Slack:

```typescript
guards: {
  notifySlackWebhook: 'https://hooks.slack.com/services/T.../B.../xxx',
}
```

Each triggered rule sends a message with:

  • Agent name
  • Rule type and values
  • Optimization suggestion
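For reference, Slack incoming webhooks accept a JSON body with a `text` field. The sketch below shows the kind of payload those three fields might produce; the exact message NeuraMeter sends may differ, and `buildSlackMessage` is a hypothetical helper, not a library export.

```typescript
// Assemble a Slack incoming-webhook payload from guard details.
function buildSlackMessage(
  agent: string,
  rule: string,
  current: number,
  threshold: number,
  suggestion: string,
): { text: string } {
  return {
    text: [
      `Guard triggered for ${agent}`,
      `Rule: ${rule} (${current} > ${threshold})`,
      `Suggestion: ${suggestion}`,
    ].join('\n'),
  };
}

// Send with any HTTP client, e.g.:
// await fetch(webhookUrl, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(buildSlackMessage(/* ... */)),
// });
```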

Suggestions

NeuraMeter generates context-aware suggestions for each triggered rule:

| Rule | Example Suggestion |
| --- | --- |
| `context_utilization` | "Summarize conversation history to save ~60% of input tokens" |
| `input_tokens` | "Reduce input tokens from 120,000 to under 50,000" |
| `cost_per_call` | "Consider using a cheaper model (e.g., gpt-4o-mini)" |
| `cost_per_hour` | "Hourly cost limit exceeded ($12.50 > $10.00). Throttle agent calls" |