# Guardrails

NeuraMeter guardrails let you set thresholds on cost and context usage, with three modes that determine how violations are handled.

## Modes
| Mode | Behavior | Use case |
|---|---|---|
| `notify` | Alert only, never stop | Default. Safe for production. |
| `block` | Throw an error on hard-limit violations | Prevent runaway costs |
| `auto-optimize` | Call your callback to fix and retry | Self-healing agents |
## Configuration
```ts
import { NeuraMeter } from '@neurameter/core';

const meter = new NeuraMeter({
  apiKey: 'nm_xxx',
  projectId: 'proj_xxx',
  guards: {
    mode: 'notify', // 'notify' | 'block' | 'auto-optimize'

    // Soft limits (trigger alerts)
    maxInputTokens: 50_000,
    maxContextUtilization: 0.80, // 80% of context window
    maxCostPerCall: 0.50, // $0.50 per call
    maxCostPerHour: 10.0, // $10/hour per agent

    // Hard limits (block mode only — throws NeuraMeterGuardError)
    maxInputTokensHard: 100_000,
    maxContextUtilizationHard: 0.95,
    maxCostPerCallHard: 2.0,

    // Notifications
    notifySlackWebhook: 'https://hooks.slack.com/services/xxx',
    notifyDashboard: true,

    // Auto-optimize callback (auto-optimize mode only)
    onOptimize: async (event) => {
      // Your optimization logic here
      return { action: 'retry', messages: optimizedMessages };
    },
  },
});
```

## Guard Rules
### Input Token Limit
Checks the estimated input tokens against a threshold.
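How NeuraMeter estimates input tokens is not specified here. Purely as an illustration (not the library's actual estimator), the common heuristic of roughly four characters per token gives a sense of how such an estimate could be derived:

```typescript
// Illustrative only — NOT NeuraMeter's actual estimator.
// Assumes the rough heuristic of ~4 characters per token.
type ChatMessage = { role: string; content: string };

function estimateInputTokens(messages: ChatMessage[]): number {
  const chars = messages.reduce((n, m) => n + m.content.length, 0);
  return Math.ceil(chars / 4);
}
```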
```ts
guards: {
  maxInputTokens: 50_000,      // soft limit — alerts
  maxInputTokensHard: 100_000, // hard limit — blocks (block mode)
}
```

### Context Utilization
Checks what percentage of the model’s context window is being used.
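Utilization is the ratio of estimated input tokens to the model's context limit. A minimal sketch, assuming a 128,000-token context window for `gpt-4o` (the limits table below is illustrative, not part of the library):

```typescript
// Illustrative context-limit table — the real values come from the provider.
const MODEL_CONTEXT_LIMITS: Record<string, number> = {
  'gpt-4o': 128_000,
  'gpt-4o-mini': 128_000,
};

// utilization = estimatedInputTokens / modelContextLimit
function contextUtilization(estimatedInputTokens: number, model: string): number {
  const limit = MODEL_CONTEXT_LIMITS[model];
  if (!limit) throw new Error(`Unknown context limit for model: ${model}`);
  return estimatedInputTokens / limit;
}
```

For example, 102,400 estimated tokens against a 128,000-token window is a utilization of 0.80, exactly the soft threshold above.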
```ts
guards: {
  maxContextUtilization: 0.80,     // 80% — soft
  maxContextUtilizationHard: 0.95, // 95% — hard
}
```

The utilization is calculated as:

```
utilization = estimatedInputTokens / modelContextLimit
```

### Cost Per Call
Estimates the input cost before the API call is made.
```ts
guards: {
  maxCostPerCall: 0.50,    // $0.50 — soft
  maxCostPerCallHard: 2.0, // $2.00 — hard
}
```

### Cost Per Hour
Tracks rolling 1-hour cost per agent.
```ts
guards: {
  maxCostPerHour: 10.0, // $10/hour
}
```

## Checking Guards
Call `checkGuards()` before making an API call:

```ts
const result = meter.checkGuards({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Analyze this 50-page document...' },
  ],
  model: 'gpt-4o',
  provider: 'openai',
  agentName: 'MyAgent',
});
```

### GuardCheckResult
```ts
interface GuardCheckResult {
  decision: 'allow' | 'notify' | 'block' | 'optimized';
  triggeredRules: TriggeredRule[];
  contextAnalysis: ContextAnalysis | null;
  suggestion?: string;
}
```

| Decision | Meaning |
|---|---|
| `allow` | No rules triggered |
| `notify` | Rules triggered, alert sent |
| `block` | Hard limit exceeded in block mode |
| `optimized` | Auto-optimize callback was called |
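Reading the table above, only a `block` decision stops the call; the other three let it proceed. A small sketch of how a caller might branch on the decision (the helper name is hypothetical, not part of the SDK):

```typescript
// Decision values taken from the GuardCheckResult interface above.
type Decision = 'allow' | 'notify' | 'block' | 'optimized';

// Hypothetical helper: should the API call go ahead?
function shouldProceed(decision: Decision): boolean {
  switch (decision) {
    case 'allow':     // no rules triggered
    case 'notify':    // alert sent, but the call may continue
    case 'optimized': // onOptimize already adjusted the request
      return true;
    case 'block':     // hard limit exceeded in block mode
      return false;
  }
}
```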
## Block Mode

In block mode, hard-limit violations throw `NeuraMeterGuardError`:
```ts
import { NeuraMeterGuardError } from '@neurameter/core';

try {
  meter.checkGuards({ /* ... */ });
  const response = await openai.chat.completions.create({ /* ... */ });
} catch (err) {
  if (err instanceof NeuraMeterGuardError) {
    console.log(err.rule);       // 'input_tokens'
    console.log(err.current);    // 120000
    console.log(err.threshold);  // 100000
    console.log(err.suggestion); // 'Reduce input tokens...'
  }
}
```

## Auto-Optimize Mode
In auto-optimize mode, provide an `onOptimize` callback:

```ts
guards: {
  mode: 'auto-optimize',
  maxContextUtilization: 0.80,
  onOptimize: async (event) => {
    // event.type: 'context_utilization' | 'cost_per_call' | 'input_tokens'
    // event.suggestion: human-readable optimization hint
    // event.metrics.messages: the current messages array
    const compressed = await summarizeOldMessages(event.metrics.messages);
    return {
      action: 'retry',      // 'retry' | 'notify' | 'block'
      messages: compressed, // optimized messages for retry
      model: 'gpt-4o-mini', // optionally switch to cheaper model
    };
  },
}
```

### OptimizeResult
| Action | Meaning |
|---|---|
| `retry` | Retry the call with the returned messages/model |
| `notify` | Fall back to notification only |
| `block` | Stop the call |
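A callback doesn't have to summarize. As one concrete alternative, it could simply drop older turns and downgrade the model for the retry; the trimming strategy here is an assumption for illustration, not NeuraMeter behavior:

```typescript
// Illustrative onOptimize strategy: keep system prompts plus the most
// recent turns. Message shape follows the usual chat-completions format.
type Message = { role: 'system' | 'user' | 'assistant'; content: string };

function trimHistory(messages: Message[], keepRecent = 6): Message[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-keepRecent)];
}

const onOptimize = async (event: { metrics: { messages: Message[] } }) => ({
  action: 'retry' as const,
  messages: trimHistory(event.metrics.messages),
  model: 'gpt-4o-mini', // optionally downgrade for the retry
});
```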
## Slack Notifications
Guard triggers can be sent to Slack:
```ts
guards: {
  notifySlackWebhook: 'https://hooks.slack.com/services/T.../B.../xxx',
}
```

Each triggered rule sends a message with:
- Agent name
- Rule type and values
- Optimization suggestion
## Suggestions
NeuraMeter generates context-aware suggestions for each triggered rule:
| Rule | Example suggestion |
|---|---|
| `context_utilization` | "Summarize conversation history to save ~60% of input tokens" |
| `input_tokens` | "Reduce input tokens from 120,000 to under 50,000" |
| `cost_per_call` | "Consider using a cheaper model (e.g., gpt-4o-mini)" |
| `cost_per_hour` | "Hourly cost limit exceeded ($12.50 > $10.00). Throttle agent calls" |