# Baffle Hub - Rule Architecture

## Overview

Baffle Hub uses a distributed rule system where the Hub generates and manages rules, and Agents download and enforce them locally using optimized SQLite queries. This architecture provides sub-millisecond rule evaluation while maintaining centralized intelligence and control.

## Core Principles

1. **Hub-side Intelligence**: Pattern detection and rule generation happen on the Hub
2. **Agent-side Enforcement**: Rule evaluation happens locally on Agents for speed
3. **Incremental Sync**: Agents poll for rule updates using timestamp-based cursors
4. **Dynamic Backpressure**: Hub controls event sampling based on load
5. **Temporal Rules**: Rules can expire automatically (e.g., 24-hour bans)
6. **Soft Deletes**: Rules are disabled, not deleted, for proper sync and audit trail

## Rule Types

### 1. Network Rules (`network_v4`, `network_v6`)

Block or allow traffic based on IP address or CIDR ranges.

**Use Cases**:
- Block scanner IPs (temporary or permanent)
- Block datacenter/VPN/proxy ranges
- Allow trusted IP ranges
- Geographic blocking via IP ranges

**Evaluation**:
- **Most specific CIDR wins** (longest prefix)
- `/32` beats `/24` beats `/16` beats `/8`
- Agent uses optimized range queries on `ipv4_ranges`/`ipv6_ranges` tables

**Example**:
```json
{
  "id": 12341,
  "rule_type": "network_v4",
  "action": "deny",
  "conditions": { "cidr": "185.220.100.0/22" },
  "priority": 22,
  "expires_at": "2024-11-04T12:00:00Z",
  "enabled": true,
  "source": "auto:scanner_detected",
  "metadata": {
    "reason": "Tor exit node hitting /.env",
    "auto_generated": true
  }
}
```

### 2. Rate Limit Rules (`rate_limit`)

Control request rate per IP or per CIDR range.

**Scopes** (Phase 1):
- **Global per-IP**: Limit requests per IP across all paths
- **Per-CIDR**: Different limits for different network ranges

**Scopes** (Phase 2+):
- **Per-path per-IP**: Different limits for `/api/*`, `/login`, etc.
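Both network rules and per-CIDR rate limits resolve overlapping ranges by picking the most specific CIDR. A minimal sketch of that selection follows, assuming rules are plain hashes with a `cidr` key. The `most_specific_rule` helper is hypothetical and only illustrates the idea; the Agent itself is described as using range queries against SQLite.

```ruby
require "ipaddr"

# Hypothetical helper (not part of Baffle Hub): returns the rule whose CIDR
# contains `ip` and has the longest prefix, or nil when nothing matches.
def most_specific_rule(rules, ip)
  addr = IPAddr.new(ip)
  rules
    .select { |rule| IPAddr.new(rule[:cidr]).include?(addr) }
    .max_by { |rule| IPAddr.new(rule[:cidr]).prefix }
end

rules = [
  { id: 1, cidr: "0.0.0.0/0",   limit: 100 },
  { id: 2, cidr: "10.0.0.0/8",  limit: 50 },
  { id: 3, cidr: "10.0.1.0/24", limit: 10 }
]

most_specific_rule(rules, "10.0.1.5")[:id] # => 3 (the /24 wins)
most_specific_rule(rules, "10.0.2.5")[:id] # => 2 (only /8 and /0 match)
most_specific_rule(rules, "8.8.8.8")[:id]  # => 1 (falls back to the catch-all)
```

A linear scan like this is fine for a handful of in-memory rate limit rules; for large rule sets an indexed range lookup is the better fit.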
**Evaluation**:
- Agent maintains in-memory counters per IP
- Finds most specific CIDR rule for the IP
- Applies that rule's rate limit configuration
- Optional: Persist counters to SQLite for restart resilience

**Example (Phase 1)**:
```json
{
  "id": 12342,
  "rule_type": "rate_limit",
  "action": "rate_limit",
  "conditions": {
    "cidr": "0.0.0.0/0",
    "scope": "global"
  },
  "priority": 0,
  "enabled": true,
  "source": "manual",
  "metadata": {
    "limit": 100,
    "window": 60,
    "per_ip": true
  }
}
```

**Example (Phase 2+)**:
```json
{
  "id": 12343,
  "rule_type": "rate_limit",
  "action": "rate_limit",
  "conditions": {
    "cidr": "0.0.0.0/0",
    "scope": "per_path",
    "path_pattern": "/api/login"
  },
  "metadata": {
    "limit": 5,
    "window": 60,
    "per_ip": true
  }
}
```

### 3. Path Pattern Rules (`path_pattern`)

Detect suspicious path access patterns (mainly for Hub analytics).

**Use Cases**:
- Detect scanners hitting `/.env`, `/.git`, `/wp-admin`
- Identify bots with suspicious path traversal
- Trigger automatic IP bans when patterns match

**Evaluation**:
- Agent does lightweight pattern matching
- When matched, sends event to Hub with `matched_pattern: true`
- Hub analyzes and creates IP block rules if needed
- Agent picks up new IP block rule in next sync (~10s)

**Example**:
```json
{
  "id": 12344,
  "rule_type": "path_pattern",
  "action": "log",
  "conditions": {
    "patterns": ["/.env", "/.git/*", "/wp-admin/*", "/.aws/*", "/phpMyAdmin/*"]
  },
  "enabled": true,
  "source": "default:scanner_detection",
  "metadata": {
    "auto_ban_ip": true,
    "ban_duration_hours": 24,
    "description": "Common scanner paths"
  }
}
```

## Rule Actions

| Action | Description | HTTP Response |
|--------|-------------|---------------|
| `allow` | Pass request through | Continue to app |
| `deny` | Block request | 403 Forbidden |
| `rate_limit` | Enforce rate limit | 429 Too Many Requests |
| `redirect` | Redirect to URL | 301/302 + Location header |
| `challenge` | Show CAPTCHA (Phase 2+) | 403 with challenge |
| `log` | Log only, don't block | Continue to app |

## Rule Priority & Specificity

### Network Rules

- **Priority is determined by CIDR prefix length**
- Longer prefix (more specific) = higher priority
- `/32` (single IP) beats `/24` (256 IPs) beats `/8` (16M IPs)
- Example: Block `10.0.0.0/8` but allow `10.0.1.0/24`
  - Request from `10.0.1.5` → matches `/24` → allowed
  - Request from `10.0.2.5` → matches `/8` only → blocked

### Rate Limit Rules

- Most specific CIDR match wins
- Per-path rules take precedence over global (Phase 2+)

### Path Pattern Rules

- All patterns are evaluated (not exclusive)
- Used for detection, not blocking
- Multiple pattern matches = stronger signal for ban

## Rule Synchronization

### Timestamp-Based Cursor

Agents use `updated_at` timestamps as sync cursors to handle rule updates and deletions.

**Why `updated_at` instead of `id`?**
- Handles rule updates (e.g., disabling a rule updates `updated_at`)
- Handles rule deletions via `enabled=false` flag
- Simple for agents: "give me everything that changed since X"

**Agent Sync Flow**:
```
1. Agent starts: last_sync = nil
2. GET /api/:key/rules → Full sync, store latest updated_at
3. Every 10s or 1000 events: GET /api/:key/rules?since=
4. Process rules: add new, update existing, remove disabled
5. Update last_sync to latest updated_at from response
```

**Query Overlap**: Hub queries `updated_at >= since - 0.5s` to tolerate clock skew and rules that share the same millisecond timestamp. Because of this overlap, an Agent may receive a rule it has already processed, so applying a rule must be idempotent.

### API Endpoints

#### 1. Version Check (Lightweight)

```http
GET /api/:public_key/rules/version

Response:
{
  "version": 1730646645123000,
  "count": 150,
  "sampling": {
    "allowed_requests": 0.5,
    "blocked_requests": 1.0,
    "rate_limited_requests": 1.0,
    "effective_until": "2024-11-03T12:30:55.123Z"
  }
}
```

**Timestamp Format**: The `version` field uses a **microsecond Unix timestamp** (e.g., `1730646645123000`) for efficient machine comparison. For backward compatibility, the API also accepts ISO8601 timestamps in the `since` parameter.

#### 2. Incremental Sync

```http
GET /api/:public_key/rules?since=1730646000000000

Response:
{
  "version": 1730646645123000,
  "sampling": { ... },
  "rules": [
    {
      "id": 12341,
      "rule_type": "network_v4",
      "action": "deny",
      "conditions": { "cidr": "1.2.3.4/32" },
      "priority": 32,
      "expires_at": "2024-11-04T12:00:00Z",
      "enabled": true,
      "source": "auto:scanner_detected",
      "metadata": { "reason": "Hitting /.env" },
      "created_at": "2024-11-03T12:00:00Z",
      "updated_at": "2024-11-03T12:00:00Z"
    },
    {
      "id": 12340,
      "rule_type": "network_v4",
      "action": "deny",
      "conditions": { "cidr": "5.6.7.8/32" },
      "priority": 32,
      "enabled": false,
      "source": "manual",
      "metadata": { "reason": "False positive" },
      "created_at": "2024-11-02T10:00:00Z",
      "updated_at": "2024-11-03T12:25:00Z"
    }
  ]
}
```

#### 3. Full Sync

```http
GET /api/:public_key/rules

Response:
{
  "version": 1730646645123000,
  "sampling": { ... },
  "rules": [ ...all enabled rules... ]
}
```

## Dynamic Event Sampling

Hub controls how many events Agents send based on load.

### Sampling Strategy

**Hub monitors**:
- SolidQueue job depth
- Events/second rate
- Database write latency

**Sampling rates**:

| Queue Depth | Allowed | Blocked | Rate Limited |
|-------------|---------|---------|--------------|
| 0-1,000 | 100% | 100% | 100% |
| 1,001-5,000 | 50% | 100% | 100% |
| 5,001-10,000 | 20% | 100% | 100% |
| 10,001+ | 5% | 100% | 100% |

**Phase 2+: Path-based sampling**:
```json
{
  "sampling": {
    "allowed_requests": 0.1,
    "blocked_requests": 1.0,
    "paths": {
      "block": ["/.env", "/.git/*"],
      "allow": ["/health", "/metrics"]
    }
  }
}
```

**Agent respects sampling**:
- Always sends blocked/rate-limited events
- Samples allowed events based on rate
- Can prioritize suspicious paths over routine traffic

## Temporal Rules (Expiration)

Rules can have an `expires_at` timestamp for automatic expiration.
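As a rough sketch of how `enabled` and `expires_at` combine, a rule can be treated as active only while it is enabled and not yet expired. The `rule_active?` helper is hypothetical, and evaluating expiry directly (rather than waiting for the Hub to disable the rule) is an assumption made for illustration, not documented behavior.

```ruby
require "time"

# Hypothetical predicate: a rule is active when it is enabled and either
# has no expires_at or its expires_at (ISO8601 string) is still in the future.
def rule_active?(rule, now: Time.now.utc)
  return false unless rule[:enabled]

  expires_at = rule[:expires_at]
  expires_at.nil? || Time.iso8601(expires_at) > now
end

ban = { id: 12341, enabled: true, expires_at: "2024-11-04T12:00:00Z" }

rule_active?(ban, now: Time.utc(2024, 11, 3, 12, 0, 0)) # => true
rule_active?(ban, now: Time.utc(2024, 11, 5, 12, 0, 0)) # => false
```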
**Use Cases**:
- 24-hour scanner bans
- Temporary rate limit adjustments
- Time-boxed maintenance blocks

**Cleanup**:
- `ExpiredRulesCleanupJob` runs hourly
- Disables rules where `expires_at < now`
- Agent picks up disabled rules in next sync

**Example**:
```ruby
# Hub auto-generates rule when scanner detected:
Rule.create!(
  rule_type: "network_v4",
  action: "deny",
  conditions: { cidr: "1.2.3.4/32" },
  expires_at: 24.hours.from_now,
  source: "auto:scanner_detected",
  metadata: { reason: "Hit /.env 5 times in 10 seconds" }
)

# 24 hours later: ExpiredRulesCleanupJob disables it
# Agent syncs and removes from ipv4_ranges table
```

## Rule Sources

The `source` field tracks rule origin for audit and filtering.

**Source Formats**:
- `manual` - Created by user via UI
- `auto:scanner_detected` - Auto-generated from scanner pattern
- `auto:rate_limit_exceeded` - Auto-generated from rate limit abuse
- `auto:bot_detected` - Auto-generated from bot behavior
- `imported:fail2ban` - Imported from external source
- `imported:crowdsec` - Imported from CrowdSec
- `default:scanner_paths` - Default rule set

## Database Schema

### Hub Schema

```ruby
create_table "rules" do |t|
  # Identification
  # The integer primary key `id` is added automatically by create_table;
  # redeclaring it inside the block raises an ArgumentError in Rails.
  t.string :source, limit: 100

  # Rule definition
  t.string :rule_type, null: false
  t.string :action, null: false
  t.json :conditions, null: false
  t.json :metadata

  # Priority & lifecycle
  t.integer :priority
  t.datetime :expires_at
  t.boolean :enabled, default: true, null: false

  # Timestamps (updated_at is sync cursor!)
  t.timestamps

  # Indexes
  t.index [:updated_at, :id] # Primary sync query
  t.index :enabled
  t.index :expires_at
  t.index :source
  t.index :rule_type
end
```

### Agent Schema (Existing)

```ruby
create_table "ipv4_ranges" do |t|
  t.integer :network_start, limit: 8, null: false
  t.integer :network_end, limit: 8, null: false
  t.integer :network_prefix, null: false
  t.integer :waf_action, default: 0, null: false
  t.integer :priority, default: 100
  t.string :redirect_url, limit: 500
  t.integer :redirect_status
  t.string :source, limit: 50
  t.timestamps

  t.index [:network_start, :network_end, :network_prefix]
  t.index :waf_action
end

create_table "ipv6_ranges" do |t|
  t.binary :network_start, limit: 16, null: false
  t.binary :network_end, limit: 16, null: false
  t.integer :network_prefix, null: false
  t.integer :waf_action, default: 0, null: false
  t.integer :priority, default: 100
  t.string :redirect_url, limit: 500
  t.integer :redirect_status
  t.string :source, limit: 50
  t.timestamps

  t.index [:network_start, :network_end, :network_prefix]
  t.index :waf_action
end
```

## Agent Rule Processing

### Network Rules

```ruby
require "ipaddr"

# Agent receives network rule from Hub:
rule = {
  id: 12341,
  rule_type: "network_v4",
  action: "deny",
  conditions: { cidr: "10.0.0.0/8" },
  priority: 8,
  enabled: true
}

# Agent converts to ipv4_ranges entry:
cidr = IPAddr.new("10.0.0.0/8")
Ipv4Range.upsert({
  source: "hub:12341",
  network_start: cidr.to_i,            # first address in the range
  network_end: cidr.to_range.end.to_i, # last address in the range
  network_prefix: 8,
  waf_action: 1, # deny
  priority: 8
}, unique_by: :source) # unique_by requires a unique index on :source

# Agent evaluates request (longest prefix first):
# SELECT * FROM ipv4_ranges
# WHERE ? BETWEEN network_start AND network_end
# ORDER BY network_prefix DESC
# LIMIT 1
```

### Rate Limit Rules

```ruby
# Agent stores in memory:
@rate_limit_rules = {
  "global" => { limit: 100, window: 60, cidr: "0.0.0.0/0" }
}

@rate_counters = {
  "1.2.3.4" => { count: 50, window_start: Time.now }
}

# On each request:
def check_rate_limit(ip)
  rule = find_most_specific_rate_limit_rule(ip)
  now = Time.now
  counter = @rate_counters[ip] ||= { count: 0, window_start: now }

  # Reset window if expired. Mutate the stored hash in place so the
  # reset is not lost when the method returns.
  if now - counter[:window_start] > rule[:window]
    counter[:count] = 0
    counter[:window_start] = now
  end

  counter[:count] += 1

  if counter[:count] > rule[:limit]
    { action: "rate_limit", status: 429 }
  else
    { action: "allow" }
  end
end
```

### Path Pattern Rules

```ruby
# Agent evaluates patterns (dots escaped so they match literally):
PATH_PATTERNS = [/\.env$/, /\.git/, /wp-admin/]

def check_path_patterns(path)
  matched = PATH_PATTERNS.any? { |pattern| path.match?(pattern) }

  if matched
    # Send event to Hub with flag
    send_event_to_hub(
      path: path,
      matched_pattern: true,
      waf_action: "log" # Don't block yet
    )
    # Hub will analyze and create IP block rule if needed
  end
end
```

## Hub Intelligence (Auto-Generation)

### Scanner Detection

```ruby
# PathScannerDetectorJob
class PathScannerDetectorJob < ApplicationJob
  SCANNER_PATHS = %w[/.env /.git /wp-admin /phpMyAdmin /.aws].freeze

  def perform
    # Find IPs hitting scanner paths (exact path matches only)
    scanner_ips = Event
      .where("request_path IN (?)", SCANNER_PATHS)
      .where("timestamp > ?", 5.minutes.ago)
      .group(:ip_address)
      .having("COUNT(*) >= 3")
      .pluck(:ip_address)

    scanner_ips.each do |ip|
      # Create 24h ban rule. As written this assumes an IPv4 address and
      # does not check whether the IP already has an active ban.
      Rule.create!(
        rule_type: "network_v4",
        action: "deny",
        conditions: { cidr: "#{ip}/32" },
        priority: 32,
        expires_at: 24.hours.from_now,
        source: "auto:scanner_detected",
        metadata: {
          reason: "Hit #{SCANNER_PATHS.join(', ')}",
          auto_generated: true
        }
      )
    end
  end
end
```

### Rate Limit Abuse Detection

```ruby
# RateLimitAnomalyJob
class RateLimitAnomalyJob < ApplicationJob
  def perform
    # Find IPs exceeding normal rate
    abusive_ips = Event
      .where("timestamp > ?", 1.minute.ago)
      .group(:ip_address)
      .having("COUNT(*) > 200") # >200 req/min
      .pluck(:ip_address)

    abusive_ips.each do |ip|
      # Create aggressive rate limit or block
      Rule.create!(
        rule_type: "rate_limit",
        action: "rate_limit",
        conditions: { cidr: "#{ip}/32", scope: "global" },
        priority: 32,
        expires_at: 1.hour.from_now,
        source: "auto:rate_limit_exceeded",
        metadata: {
          limit: 10,
          window: 60,
          per_ip: true
        }
      )
    end
  end
end
```

## Performance Characteristics

### Hub

- **Rule query**: O(log n) with `(updated_at, id)` index
- **Version check**: Single index lookup
- **Rule generation**: Background jobs, no request impact

### Agent

- **Network rule lookup**: O(log n) via B-tree index on `(network_start, network_end)`
- **Rate limit check**: O(1) hash lookup in memory
- **Path pattern check**: O(n) regex match (n = number of patterns)
- **Overall request evaluation**: <1ms for typical case

### Sync Efficiency

- **Incremental sync**: Only changed rules since last sync
- **Typical sync payload**: <10 KB for 50 rules
- **Sync frequency**: Every 10s or 1000 events
- **Version check**: <1 KB response

## Future Enhancements (Phase 2+)

### Per-Path Rate Limiting

- Different limits for `/api/*`, `/login`, `/admin`
- Agent tracks multiple counters per IP

### Path-Based Event Sampling

- Send all `/admin` requests
- Skip `/health`, `/metrics`
- Sample 10% of regular traffic

### Challenge Actions

- CAPTCHA challenges for suspicious IPs
- JavaScript challenges for bot detection

### Scheduled Rules

- Block during maintenance windows
- Time-of-day rate limits

### Multi-Project Rules (Phase 10+)

- Global rules across all projects
- Per-project rule overrides

## Summary

The Baffle Hub rule system provides:

- **Fast local enforcement** (sub-millisecond)
- **Centralized intelligence** (Hub analytics)
- **Efficient synchronization** (timestamp-based incremental sync)
- **Dynamic adaptation** (backpressure control via sampling)
- **Temporal flexibility** (auto-expiring rules)
- **Audit trail** (soft deletes, source tracking)

This architecture scales from single-server deployments to distributed multi-agent installations while staying simple and focused on the "low-hanging fruit" of WAF functionality.
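To tie the synchronization pieces together, the Agent sync flow described under Rule Synchronization (add new, update existing, remove disabled, advance the cursor) can be sketched as a small in-memory store. The `RuleStore` class and its method names are hypothetical, and the response is assumed to be parsed JSON with symbol keys; a real Agent would write to its SQLite tables instead.

```ruby
require "time"

# Hypothetical in-memory rule store keyed by rule id.
class RuleStore
  attr_reader :rules, :cursor

  def initialize
    @rules = {}
    @cursor = nil # nil means "no sync yet", so the next sync is a full sync
  end

  # Applies one sync response. Safe to call repeatedly with overlapping
  # responses, since each rule is either overwritten or deleted by id.
  def apply(response)
    response[:rules].each do |rule|
      if rule[:enabled]
        @rules[rule[:id]] = rule # add new or update existing
      else
        @rules.delete(rule[:id]) # disabled on the Hub: remove locally
      end
    end

    # Advance the cursor to the latest updated_at seen so far; keep the
    # previous cursor when the response contains no rules.
    latest = response[:rules].map { |r| Time.iso8601(r[:updated_at]) }.max
    @cursor = [@cursor, latest].compact.max
    self
  end
end

store = RuleStore.new
store.apply(
  rules: [
    { id: 12341, enabled: true,  updated_at: "2024-11-03T12:00:00Z" },
    { id: 12340, enabled: false, updated_at: "2024-11-03T12:25:00Z" }
  ]
)
store.rules.keys # => [12341]
```

Keying by `id` and overwriting on every apply is what makes the 0.5s query overlap harmless: a rule delivered twice ends up in the same state.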