# Guardrails

NeuraMeter guardrails let you set thresholds on cost and context usage, with three modes that determine how violations are handled.

## Modes
| Mode | Behavior | Use case |
|---|---|---|
| `notify` | Alert only, never stop | Default. Safe for production. |
| `block` | Throw an error on hard-limit violations | Prevent runaway costs |
| `auto-optimize` | Call your callback to fix and retry | Self-healing agents |
## Configuration
```ts
import { NeuraMeter } from '@neurameter/core';

const meter = new NeuraMeter({
  apiKey: 'nm_xxx',
  projectId: 'proj_xxx',
  guards: {
    mode: 'notify', // 'notify' | 'block' | 'auto-optimize'

    // Soft limits (trigger alerts)
    maxInputTokens: 50_000,
    maxContextUtilization: 0.80, // 80% of context window
    maxCostPerCall: 0.50, // $0.50 per call
    maxCostPerHour: 10.0, // $10/hour per agent

    // Hard limits (block mode only — throws NeuraMeterGuardError)
    maxInputTokensHard: 100_000,
    maxContextUtilizationHard: 0.95,
    maxCostPerCallHard: 2.0,

    // Notifications
    notifySlackWebhook: 'https://hooks.slack.com/services/xxx',
    notifyDashboard: true,

    // Auto-optimize callback (auto-optimize mode only)
    onOptimize: async (event) => {
      // Your optimization logic here
      return { action: 'retry', messages: optimizedMessages };
    },
  },
});
```

## Guard Rules
### Input Token Limit
Checks the estimated input tokens against a threshold.
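How NeuraMeter estimates input tokens is not specified here. Purely as an illustration (not the library's actual estimator), the common heuristic of roughly four characters per token gives a sense of how such an estimate could be derived:

```typescript
// Illustrative only — NOT NeuraMeter's actual estimator.
// Assumes the rough heuristic of ~4 characters per token.
type ChatMessage = { role: string; content: string };

function estimateInputTokens(messages: ChatMessage[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}
```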
```ts
guards: {
  maxInputTokens: 50_000,      // soft limit — alerts
  maxInputTokensHard: 100_000, // hard limit — blocks (block mode)
}
```

### Context Utilization
Checks what percentage of the model’s context window is being used.
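Utilization is the ratio of estimated input tokens to the model's context limit. A minimal sketch, assuming a 128,000-token context window for `gpt-4o` (the limits table below is illustrative, not part of the library):

```typescript
// Illustrative context-limit table — the real values come from the provider.
const MODEL_CONTEXT_LIMITS: Record<string, number> = {
  'gpt-4o': 128_000,
  'gpt-4o-mini': 128_000,
};

// utilization = estimatedInputTokens / modelContextLimit
function contextUtilization(estimatedInputTokens: number, model: string): number {
  const limit = MODEL_CONTEXT_LIMITS[model];
  if (!limit) throw new Error(`Unknown context limit for model: ${model}`);
  return estimatedInputTokens / limit;
}
```

For example, 102,400 estimated tokens against a 128,000-token window is a utilization of 0.80, exactly the soft threshold above.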
```ts
guards: {
  maxContextUtilization: 0.80,     // 80% — soft
  maxContextUtilizationHard: 0.95, // 95% — hard
}
```

The utilization is calculated as:

```
utilization = estimatedInputTokens / modelContextLimit
```

### Cost Per Call
Estimates the input cost before the API call is made.
```ts
guards: {
  maxCostPerCall: 0.50,    // $0.50 — soft
  maxCostPerCallHard: 2.0, // $2.00 — hard
}
```

### Cost Per Hour
Tracks rolling 1-hour cost per agent.
```ts
guards: {
  maxCostPerHour: 10.0, // $10/hour
}
```

## Checking Guards
Call `checkGuards()` before making an API call:

```ts
const result = meter.checkGuards({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Analyze this 50-page document...' },
  ],
  model: 'gpt-4o',
  provider: 'openai',
  agentName: 'MyAgent',
});
```

### GuardCheckResult
```ts
interface GuardCheckResult {
  decision: 'allow' | 'notify' | 'block' | 'optimized';
  triggeredRules: TriggeredRule[];
  contextAnalysis: ContextAnalysis | null;
  suggestion?: string;
}
```

| Decision | Meaning |
|---|---|
| `allow` | No rules triggered |
| `notify` | Rules triggered, alert sent |
| `block` | Hard limit exceeded in block mode |
| `optimized` | Auto-optimize callback was called |
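Reading the table above, only a `block` decision stops the call; the other three let it proceed. A small sketch of how a caller might branch on the decision (the helper name is hypothetical, not part of the SDK):

```typescript
// Decision values taken from the GuardCheckResult interface above.
type Decision = 'allow' | 'notify' | 'block' | 'optimized';

// Hypothetical helper: should the API call go ahead?
function shouldProceed(decision: Decision): boolean {
  switch (decision) {
    case 'allow':     // no rules triggered
    case 'notify':    // alert sent, but the call may continue
    case 'optimized': // onOptimize already adjusted the request
      return true;
    case 'block':     // hard limit exceeded in block mode
      return false;
  }
}
```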
## Block Mode

In block mode, hard-limit violations throw `NeuraMeterGuardError`:
```ts
import { NeuraMeterGuardError } from '@neurameter/core';

try {
  meter.checkGuards({ /* ... */ });
  const response = await openai.chat.completions.create({ /* ... */ });
} catch (err) {
  if (err instanceof NeuraMeterGuardError) {
    console.log(err.rule);       // 'input_tokens'
    console.log(err.current);    // 120000
    console.log(err.threshold);  // 100000
    console.log(err.suggestion); // 'Reduce input tokens...'
  }
}
```

## Auto-Optimize Mode
In auto-optimize mode, provide an `onOptimize` callback:

```ts
guards: {
  mode: 'auto-optimize',
  maxContextUtilization: 0.80,
  onOptimize: async (event) => {
    // event.type: 'context_utilization' | 'cost_per_call' | 'input_tokens'
    // event.suggestion: human-readable optimization hint
    // event.metrics.messages: the current messages array
    const compressed = await summarizeOldMessages(event.metrics.messages);
    return {
      action: 'retry',      // 'retry' | 'notify' | 'block'
      messages: compressed, // optimized messages for retry
      model: 'gpt-4o-mini', // optionally switch to cheaper model
    };
  },
}
```

### OptimizeResult
| Action | Meaning |
|---|---|
| `retry` | Retry the call with the returned messages/model |
| `notify` | Fall back to notification only |
| `block` | Stop the call |
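A callback doesn't have to summarize. As one concrete alternative, it could simply drop older turns and downgrade the model for the retry; the trimming strategy here is an assumption for illustration, not NeuraMeter behavior:

```typescript
// Illustrative onOptimize strategy: keep system prompts plus the most
// recent turns. Message shape follows the usual chat-completions format.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

function trimHistory(messages: Message[], keepRecent = 6): Message[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-keepRecent)];
}

const onOptimize = async (event: { metrics: { messages: Message[] } }) => ({
  action: 'retry' as const,
  messages: trimHistory(event.metrics.messages),
  model: 'gpt-4o-mini', // optionally downgrade for the retry
});
```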
## Slack Notifications
Guard triggers can be sent to Slack:
```ts
guards: {
  notifySlackWebhook: 'https://hooks.slack.com/services/T.../B.../xxx',
}
```

Each triggered rule sends a message with:
- Agent name
- Rule type and values
- Optimization suggestion
## Suggestions
NeuraMeter generates context-aware suggestions for each triggered rule:
| Rule | Example suggestion |
|---|---|
| `context_utilization` | "Summarize conversation history to save ~60% of input tokens" |
| `input_tokens` | "Reduce input tokens from 120,000 to under 50,000" |
| `cost_per_call` | "Consider using a cheaper model (e.g., gpt-4o-mini)" |
| `cost_per_hour` | "Hourly cost limit exceeded ($12.50 > $10.00). Throttle agent calls" |