# Baffle Hub - Rule Architecture

## Overview

Baffle Hub uses a distributed rule system: the Hub generates and manages rules, and Agents download them and enforce them locally using optimized SQLite queries. This architecture provides sub-millisecond rule evaluation while keeping intelligence and control centralized.

## Core Principles

1. **Hub-side Intelligence**: Pattern detection and rule generation happen on the Hub
2. **Agent-side Enforcement**: Rule evaluation happens locally on Agents for speed
3. **Incremental Sync**: Agents poll for rule updates using timestamp-based cursors
4. **Dynamic Backpressure**: The Hub controls event sampling based on load
5. **Temporal Rules**: Rules can expire automatically (e.g., 24-hour bans)
6. **Soft Deletes**: Rules are disabled, not deleted, to support sync and preserve an audit trail

## Rule Types

### 1. Network Rules (`network_v4`, `network_v6`)

Block or allow traffic based on IP address or CIDR range.

**Use Cases**:
- Block scanner IPs (temporary or permanent)
- Block datacenter/VPN/proxy ranges
- Allow trusted IP ranges
- Geographic blocking via IP ranges

**Evaluation**:
- **Most specific CIDR wins** (longest prefix, i.e. the smallest range)
- `/32` beats `/24` beats `/16` beats `/8`
- Agent uses optimized range queries on the `ipv4_ranges`/`ipv6_ranges` tables

**Example**:
```json
{
  "id": 12341,
  "rule_type": "network_v4",
  "action": "deny",
  "conditions": { "cidr": "185.220.100.0/22" },
  "priority": 22,
  "expires_at": "2024-11-04T12:00:00Z",
  "enabled": true,
  "source": "auto:scanner_detected",
  "metadata": {
    "reason": "Tor exit node hitting /.env",
    "auto_generated": true
  }
}
```

### 2. Rate Limit Rules (`rate_limit`)

Control request rate per IP or per CIDR range.

**Scopes** (Phase 1):
- **Global per-IP**: Limit requests per IP across all paths
- **Per-CIDR**: Different limits for different network ranges

**Scopes** (Phase 2+):
- **Per-path per-IP**: Different limits for `/api/*`, `/login`, etc.

**Evaluation**:
- Agent maintains in-memory counters per IP
- Finds most specific CIDR rule for the IP
- Applies that rule's rate limit configuration
- Optional: Persist counters to SQLite for restart resilience

**Example (Phase 1)**:
```json
{
  "id": 12342,
  "rule_type": "rate_limit",
  "action": "rate_limit",
  "conditions": {
    "cidr": "0.0.0.0/0",
    "scope": "global"
  },
  "priority": 0,
  "enabled": true,
  "source": "manual",
  "metadata": {
    "limit": 100,
    "window": 60,
    "per_ip": true
  }
}
```

**Example (Phase 2+)**:
```json
{
  "id": 12343,
  "rule_type": "rate_limit",
  "action": "rate_limit",
  "conditions": {
    "cidr": "0.0.0.0/0",
    "scope": "per_path",
    "path_pattern": "/api/login"
  },
  "metadata": {
    "limit": 5,
    "window": 60,
    "per_ip": true
  }
}
```

### 3. Path Pattern Rules (`path_pattern`)

Detect suspicious path access patterns (mainly for Hub analytics).

**Use Cases**:
- Detect scanners hitting `/.env`, `/.git`, `/wp-admin`
- Identify bots with suspicious path traversal
- Trigger automatic IP bans when patterns match

**Evaluation**:
- Agent does lightweight pattern matching
- When matched, sends event to Hub with `matched_pattern: true`
- Hub analyzes and creates IP block rules if needed
- Agent picks up new IP block rule in next sync (~10s)

**Example**:
```json
{
  "id": 12344,
  "rule_type": "path_pattern",
  "action": "log",
  "conditions": {
    "patterns": ["/.env", "/.git/*", "/wp-admin/*", "/.aws/*", "/phpMyAdmin/*"]
  },
  "enabled": true,
  "source": "default:scanner_detection",
  "metadata": {
    "auto_ban_ip": true,
    "ban_duration_hours": 24,
    "description": "Common scanner paths"
  }
}
```
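
The glob-style patterns in this example could be matched directly with Ruby's `File.fnmatch`. The following is a minimal sketch that assumes the Agent treats `conditions.patterns` as shell globs; `matched_patterns` and `SCANNER_PATTERNS` are hypothetical names. Because it returns every matching pattern instead of stopping at the first, it fits the "all patterns are evaluated" behaviour.

```ruby
# Hypothetical helper: returns every pattern from a path_pattern rule that
# matches the request path. Without File::FNM_PATHNAME, "*" also matches "/",
# so "/.git/*" covers nested paths such as "/.git/refs/heads/main".
def matched_patterns(patterns, path)
  patterns.select { |pattern| File.fnmatch(pattern, path) }
end

SCANNER_PATTERNS = ["/.env", "/.git/*", "/wp-admin/*", "/.aws/*", "/phpMyAdmin/*"].freeze

matched_patterns(SCANNER_PATTERNS, "/.git/config") # => ["/.git/*"]
matched_patterns(SCANNER_PATTERNS, "/index.html")  # => []
```

Matching is case-sensitive. Passing `File::FNM_CASEFOLD` as a third argument would also catch variants such as `/PHPMyAdmin/`.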

## Rule Actions

| Action | Description | HTTP Response |
|--------|-------------|---------------|
| `allow` | Pass request through | Continue to app |
| `deny` | Block request | 403 Forbidden |
| `rate_limit` | Enforce rate limit | 429 Too Many Requests |
| `redirect` | Redirect to URL | 301/302 + Location header |
| `challenge` | Show CAPTCHA (Phase 2+) | 403 with challenge |
| `log` | Log only, don't block | Continue to app |
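
This is a minimal sketch of how an Agent might turn these actions into HTTP responses. It assumes a Rack-style `[status, headers, body]` triple, and `response_for` and its keyword arguments are hypothetical. `redirect_url` and `redirect_status` mirror columns in the Agent's range tables. Returning `nil` stands for "continue to the app".

```ruby
# Hypothetical sketch: translate a rule action into a Rack-style
# [status, headers, body] triple. nil means "continue to the app".
def response_for(action, redirect_url: nil, redirect_status: 302)
  case action
  when "allow", "log"
    nil
  when "deny"
    [403, { "content-type" => "text/plain" }, ["Forbidden"]]
  when "rate_limit"
    [429, { "content-type" => "text/plain" }, ["Too Many Requests"]]
  when "redirect"
    raise ArgumentError, "redirect requires redirect_url" if redirect_url.nil?
    raise ArgumentError, "redirect_status must be 301 or 302" unless [301, 302].include?(redirect_status)
    [redirect_status, { "location" => redirect_url }, []]
  when "challenge"
    # Phase 2+: rendering the actual CAPTCHA page is out of scope here
    [403, { "content-type" => "text/html" }, ["Challenge required"]]
  else
    # Assumption: unknown actions are treated as configuration errors
    raise ArgumentError, "unknown action: #{action}"
  end
end
```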

## Rule Priority & Specificity

### Network Rules
- **Priority is determined by CIDR prefix length**
- Longer prefix (more specific, smaller range) = higher priority
- `/32` (single IP) beats `/24` (256 IPs) beats `/8` (16M IPs)
- Example: Block `10.0.0.0/8` but allow `10.0.1.0/24`
  - Request from `10.0.1.5` → matches `/24` → allowed
  - Request from `10.0.2.5` → matches `/8` only → blocked
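
The worked example above can be checked with a small in-memory longest-prefix match using Ruby's standard `IPAddr`. This is a sketch, not the Agent's implementation, which uses a SQLite range query over `ipv4_ranges`. `NetRule` and `evaluate_network_rules` are hypothetical names.

```ruby
require "ipaddr"

# Hypothetical stand-in for a synced network rule.
NetRule = Struct.new(:cidr, :action) do
  def network
    @network ||= IPAddr.new(cidr)
  end
end

# Longest-prefix match: among rules whose network contains ip, the one with
# the largest prefix length wins. Returns the action, or nil if none match.
def evaluate_network_rules(rules, ip)
  addr = IPAddr.new(ip)
  match = rules
    .select { |rule| rule.network.family == addr.family && rule.network.include?(addr) }
    .max_by { |rule| rule.network.prefix }
  match&.action
end

rules = [NetRule.new("10.0.0.0/8", "deny"), NetRule.new("10.0.1.0/24", "allow")]

evaluate_network_rules(rules, "10.0.1.5")    # => "allow" (/24 beats /8)
evaluate_network_rules(rules, "10.0.2.5")    # => "deny"  (only /8 matches)
evaluate_network_rules(rules, "192.168.1.1") # => nil     (no rule matches)
```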

### Rate Limit Rules
- Most specific CIDR match wins
- Per-path rules take precedence over global (Phase 2+)
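
The two rate-limit precedence rules can be combined into one selection step. This sketch assumes one possible ordering: per-path beats global first, then the longest CIDR prefix wins within the same scope. Rules use the JSON shape of the rate-limit examples, and `select_rate_limit_rule` is a hypothetical helper.

```ruby
require "ipaddr"

# Hypothetical sketch: pick the rate-limit rule that applies to a request.
# Ordering assumed: per-path (Phase 2+) > global, then longest CIDR prefix.
def select_rate_limit_rule(rules, ip, path)
  addr = IPAddr.new(ip)

  candidates = rules.select do |rule|
    cond = rule.fetch("conditions")
    network = IPAddr.new(cond.fetch("cidr"))
    next false unless network.family == addr.family && network.include?(addr)

    # Global rules apply to every path; per-path rules only to matching paths
    cond["scope"] != "per_path" || File.fnmatch(cond.fetch("path_pattern"), path)
  end

  candidates.max_by do |rule|
    cond = rule.fetch("conditions")
    [cond["scope"] == "per_path" ? 1 : 0, IPAddr.new(cond.fetch("cidr")).prefix]
  end
end

rules = [
  { "conditions" => { "cidr" => "0.0.0.0/0", "scope" => "global" },
    "metadata" => { "limit" => 100, "window" => 60 } },
  { "conditions" => { "cidr" => "1.2.3.4/32", "scope" => "global" },
    "metadata" => { "limit" => 10, "window" => 60 } },
  { "conditions" => { "cidr" => "0.0.0.0/0", "scope" => "per_path", "path_pattern" => "/api/login" },
    "metadata" => { "limit" => 5, "window" => 60 } }
]

select_rate_limit_rule(rules, "5.6.7.8", "/about").dig("metadata", "limit")     # => 100
select_rate_limit_rule(rules, "1.2.3.4", "/about").dig("metadata", "limit")     # => 10
select_rate_limit_rule(rules, "1.2.3.4", "/api/login").dig("metadata", "limit") # => 5
```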

### Path Pattern Rules
- All patterns are evaluated (not exclusive)
- Used for detection, not blocking
- Multiple pattern matches = stronger signal for ban

## Rule Synchronization

### Timestamp-Based Cursor

Agents use `updated_at` timestamps as sync cursors to handle rule updates and deletions.

**Why `updated_at` instead of `id`?**
- Handles rule updates (e.g., disabling a rule updates `updated_at`)
- Handles rule deletions via the `enabled=false` flag
- Simple for agents: "give me everything that changed since X"

**Agent Sync Flow**:
```
1. Agent starts: last_sync = nil
2. GET /api/:public_key/rules → Full sync, store latest updated_at
3. Every 10s or 1000 events: GET /api/:public_key/rules?since=<last_sync>
4. Process rules: add new, update existing, remove disabled
5. Update last_sync to latest updated_at from response
```

**Query Overlap**: The Hub queries `updated_at >= since - 0.5s` to tolerate clock skew and rules written with the same timestamp. Rules near the cursor can therefore be returned more than once, so Agents must apply them idempotently (keyed by rule `id`).
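
Steps 4 and 5 of the sync flow, together with the idempotency the overlap requires, can be sketched as an in-memory cache. `RuleCache` is a hypothetical class, and the response is assumed to be parsed JSON with string keys. The cursor is taken from the response's `version` field, which is assumed here to encode the latest `updated_at` in microseconds.

```ruby
# Hypothetical in-memory rule cache for the Agent sync flow.
class RuleCache
  attr_reader :rules, :last_sync

  def initialize
    @rules = {}       # rule id => rule hash
    @last_sync = nil  # cursor sent as ?since= on the next request
  end

  # response: parsed body of GET /api/:public_key/rules[?since=...]
  # full: true for a full sync, which replaces everything held locally
  def apply(response, full: false)
    @rules.clear if full
    response.fetch("rules").each do |rule|
      if rule["enabled"]
        @rules[rule["id"]] = rule   # add new or update existing; re-delivery is harmless
      else
        @rules.delete(rule["id"])   # soft-deleted (disabled) on the Hub
      end
    end
    @last_sync = response.fetch("version")
  end
end
```

On startup, calling `apply(body, full: true)` drops any locally held rule that is absent from the full sync response. Later incremental responses only add, update, or remove the rules they mention.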

### API Endpoints

#### 1. Version Check (Lightweight)

```http
GET /api/:public_key/rules/version

Response:
{
  "version": 1730646645123000,
  "count": 150,
  "sampling": {
    "allowed_requests": 0.5,
    "blocked_requests": 1.0,
    "rate_limited_requests": 1.0,
    "effective_until": "2024-11-03T12:30:55.123Z"
  }
}
```

**Timestamp Format**: The `version` field uses a **microsecond Unix timestamp** (e.g., `1730646645123000`) for efficient machine comparison. For backward compatibility, the API also accepts ISO8601 timestamps in the `since` parameter.
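
The Hub could normalize the `since` parameter so that both accepted formats compare as microseconds, and then apply the 0.5 s overlap window in that unit. The following is a sketch of that approach. `parse_since` and `query_lower_bound` are hypothetical helpers.

```ruby
require "time"

# Hypothetical helper: accept a microsecond Unix timestamp or an ISO8601
# string and return microseconds since the epoch.
def parse_since(param)
  # Base 10 explicitly, so leading zeros are not read as octal
  return Integer(param, 10) if param.match?(/\A\d+\z/)

  time = Time.iso8601(param)
  time.to_i * 1_000_000 + time.usec
end

# Lower bound for the Hub's updated_at query, including the 0.5 s overlap.
def query_lower_bound(since_us)
  since_us - 500_000
end
```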

#### 2. Incremental Sync

```http
GET /api/:public_key/rules?since=1730646000000000

Response:
{
  "version": 1730646645123000,
  "sampling": { ... },
  "rules": [
    {
      "id": 12341,
      "rule_type": "network_v4",
      "action": "deny",
      "conditions": { "cidr": "1.2.3.4/32" },
      "priority": 32,
      "expires_at": "2024-11-04T12:00:00Z",
      "enabled": true,
      "source": "auto:scanner_detected",
      "metadata": { "reason": "Hitting /.env" },
      "created_at": "2024-11-03T12:00:00Z",
      "updated_at": "2024-11-03T12:00:00Z"
    },
    {
      "id": 12340,
      "rule_type": "network_v4",
      "action": "deny",
      "conditions": { "cidr": "5.6.7.8/32" },
      "priority": 32,
      "enabled": false,
      "source": "manual",
      "metadata": { "reason": "False positive" },
      "created_at": "2024-11-02T10:00:00Z",
      "updated_at": "2024-11-03T12:25:00Z"
    }
  ]
}
```

#### 3. Full Sync

```http
GET /api/:public_key/rules

Response:
{
  "version": 1730646645123000,
  "sampling": { ... },
  "rules": [ ...all enabled rules... ]
}
```

## Dynamic Event Sampling

Hub controls how many events Agents send based on load.

### Sampling Strategy

**Hub monitors**:
- SolidQueue job depth
- Events/second rate
- Database write latency

**Sampling rates**:

| Queue Depth | Allowed | Blocked | Rate Limited |
|-------------|---------|---------|--------------|
| 0-1,000 | 100% | 100% | 100% |
| 1,001-5,000 | 50% | 100% | 100% |
| 5,001-10,000 | 20% | 100% | 100% |
| 10,001+ | 5% | 100% | 100% |
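
The sampling tiers above map directly onto a lookup function. This is a minimal sketch in which `sampling_for` is a hypothetical name. How the Hub measures queue depth (for example from SolidQueue) is left out. The returned keys match the `sampling` object in the version check response.

```ruby
# Hypothetical sketch: queue depth -> sampling rates sent to Agents.
# Only allowed requests are sampled; blocked and rate-limited are always 100%.
def sampling_for(queue_depth)
  allowed =
    case queue_depth
    when 0..1_000 then 1.0
    when 1_001..5_000 then 0.5
    when 5_001..10_000 then 0.2
    else 0.05
    end

  {
    "allowed_requests" => allowed,
    "blocked_requests" => 1.0,
    "rate_limited_requests" => 1.0
  }
end
```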

**Phase 2+: Path-based sampling**:
```json
{
  "sampling": {
    "allowed_requests": 0.1,
    "blocked_requests": 1.0,
    "paths": {
      "block": ["/.env", "/.git/*"],
      "allow": ["/health", "/metrics"]
    }
  }
}
```

**Agent respects sampling**:
- Always sends blocked/rate-limited events
- Samples allowed events based on rate
- Can prioritize suspicious paths over routine traffic
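
On the Agent side, the send-or-drop decision could look like the following sketch. `send_event?` is hypothetical. Treating any pattern-matched event as always sent is one assumed way to prioritize suspicious paths. The random generator is injectable so the behaviour can be tested deterministically.

```ruby
# Hypothetical agent-side sampling decision. Rates come from the Hub's
# `sampling` object; a missing rate defaults to 1.0 (send everything).
def send_event?(waf_action, sampling, matched_pattern: false, rng: Random.new)
  # Assumption: suspicious-path events bypass sampling entirely
  return true if matched_pattern

  rate =
    case waf_action
    when "deny" then sampling.fetch("blocked_requests", 1.0)
    when "rate_limit" then sampling.fetch("rate_limited_requests", 1.0)
    else sampling.fetch("allowed_requests", 1.0) # allow, log, ...
    end

  rng.rand < rate # Random#rand returns a float in [0, 1)
end
```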

## Temporal Rules (Expiration)

Rules can have an `expires_at` timestamp for automatic expiration.

**Use Cases**:
- 24-hour scanner bans
- Temporary rate limit adjustments
- Time-boxed maintenance blocks

**Cleanup**:
- `ExpiredRulesCleanupJob` runs hourly
- Disables rules where `expires_at < now`
- Agent picks up disabled rules in next sync (so an expired rule can stay in force for up to an hour plus one sync interval)
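
The cleanup step only reaches Agents if disabling a rule also moves its `updated_at` forward past their cursors. The following is an in-memory sketch of that predicate and update. `disable_expired!` is hypothetical. With ActiveRecord's `update_all`, `updated_at` would have to be set explicitly, because `update_all` skips callbacks and does not touch timestamps.

```ruby
# Hypothetical in-memory version of the cleanup step. Returns the rules that
# were disabled; each gets a fresh updated_at so the since-based sync sees it.
def disable_expired!(rules, now: Time.now.utc)
  expired = rules.select do |rule|
    rule[:enabled] && rule[:expires_at] && rule[:expires_at] < now
  end

  expired.each do |rule|
    rule[:enabled] = false
    rule[:updated_at] = now # bump the sync cursor
  end
end
```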

**Example**:
```ruby
# Hub auto-generates rule when scanner detected:
Rule.create!(
  rule_type: "network_v4",
  action: "deny",
  conditions: { cidr: "1.2.3.4/32" },
  expires_at: 24.hours.from_now,
  source: "auto:scanner_detected",
  metadata: { reason: "Hit /.env 5 times in 10 seconds" }
)

# 24 hours later: ExpiredRulesCleanupJob disables it
# Agent syncs and removes from ipv4_ranges table
```

## Rule Sources

The `source` field tracks rule origin for audit and filtering.

**Source Formats**:
- `manual` - Created by user via UI
- `auto:scanner_detected` - Auto-generated from scanner pattern
- `auto:rate_limit_exceeded` - Auto-generated from rate limit abuse
- `auto:bot_detected` - Auto-generated from bot behavior
- `imported:fail2ban` - Imported from external source
- `imported:crowdsec` - Imported from CrowdSec
- `default:scanner_paths` - Default rule set
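
Since every `source` value is either a bare word or a `prefix:detail` pair, filtering by origin reduces to splitting on the first colon. `parse_source` below is a hypothetical helper sketching that split.

```ruby
# Hypothetical helper: split a source value into its kind and optional detail,
# e.g. to group or filter rules by origin (manual, auto, imported, default).
def parse_source(source)
  kind, detail = source.split(":", 2)
  { kind: kind, detail: detail }
end

parse_source("auto:scanner_detected") # => { kind: "auto", detail: "scanner_detected" }
parse_source("manual")                # => { kind: "manual", detail: nil }
```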

## Database Schema

### Hub Schema

```ruby
create_table "rules" do |t|
  # Identification (the integer `id` primary key is created by create_table;
  # declaring it again with primary_key: true would raise an error)
  t.string :source, limit: 100

  # Rule definition
  t.string :rule_type, null: false
  t.string :action, null: false
  t.json :conditions, null: false
  t.json :metadata

  # Priority & lifecycle
  t.integer :priority
  t.datetime :expires_at
  t.boolean :enabled, default: true, null: false

  # Timestamps (updated_at is the sync cursor!)
  t.timestamps

  # Indexes
  t.index [:updated_at, :id] # Primary sync query
  t.index :enabled
  t.index :expires_at
  t.index :source
  t.index :rule_type
end
```

### Agent Schema (Existing)

```ruby
create_table "ipv4_ranges" do |t|
  t.integer :network_start, limit: 8, null: false
  t.integer :network_end, limit: 8, null: false
  t.integer :network_prefix, null: false
  t.integer :waf_action, default: 0, null: false
  t.integer :priority, default: 100
  t.string :redirect_url, limit: 500
  t.integer :redirect_status
  t.string :source, limit: 50
  t.timestamps

  t.index [:network_start, :network_end, :network_prefix]
  t.index :waf_action
end

create_table "ipv6_ranges" do |t|
  t.binary :network_start, limit: 16, null: false
  t.binary :network_end, limit: 16, null: false
  t.integer :network_prefix, null: false
  t.integer :waf_action, default: 0, null: false
  t.integer :priority, default: 100
  t.string :redirect_url, limit: 500
  t.integer :redirect_status
  t.string :source, limit: 50
  t.timestamps

  t.index [:network_start, :network_end, :network_prefix]
  t.index :waf_action
end
```

## Agent Rule Processing

### Network Rules

```ruby
require "ipaddr"

# Agent receives network rule from Hub:
rule = {
  id: 12341,
  rule_type: "network_v4",
  action: "deny",
  conditions: { cidr: "10.0.0.0/8" },
  priority: 8,
  enabled: true
}

# Agent converts it to an ipv4_ranges entry.
# upsert with unique_by: :source assumes a unique index on ipv4_ranges.source,
# which the Agent schema does not declare yet.
cidr = IPAddr.new(rule[:conditions][:cidr])
source = "hub:#{rule[:id]}"

if rule[:enabled]
  Ipv4Range.upsert({
    source: source,
    network_start: cidr.to_i,
    network_end: cidr.to_range.end.to_i,
    network_prefix: cidr.prefix, # 8 for 10.0.0.0/8
    waf_action: 1,               # deny
    priority: rule[:priority]
  }, unique_by: :source)
else
  Ipv4Range.where(source: source).delete_all # rule was disabled on the Hub
end

# Agent evaluates a request (longest prefix first):
#   SELECT * FROM ipv4_ranges
#   WHERE ? BETWEEN network_start AND network_end
#   ORDER BY network_prefix DESC
#   LIMIT 1
```

### Rate Limit Rules

```ruby
# Agent stores in memory:
@rate_limit_rules = {
  "global" => { limit: 100, window: 60, cidr: "0.0.0.0/0" }
}

@rate_counters = {
  "1.2.3.4" => { count: 50, window_start: Time.now }
}

# On each request (fixed-window counter):
def check_rate_limit(ip)
  rule = find_most_specific_rate_limit_rule(ip)
  return { action: "allow" } if rule.nil?

  now = Time.now
  counter = @rate_counters[ip] ||= { count: 0, window_start: now }

  # Reset the window if it has expired. Store the new counter back in the
  # hash; reassigning only the local variable would lose the reset.
  if now - counter[:window_start] >= rule[:window]
    counter = @rate_counters[ip] = { count: 0, window_start: now }
  end

  counter[:count] += 1

  if counter[:count] > rule[:limit]
    { action: "rate_limit", status: 429 }
  else
    { action: "allow" }
  end
end
```

### Path Pattern Rules

```ruby
# Agent evaluates patterns (dots escaped, so "." only matches a literal dot):
PATH_PATTERNS = [%r{/\.env\z}, %r{/\.git(/|\z)}, %r{/wp-admin(/|\z)}].freeze

def check_path_patterns(path)
  matched = PATH_PATTERNS.any? { |pattern| path.match?(pattern) }

  if matched
    # Send event to Hub with flag
    send_event_to_hub(
      path: path,
      matched_pattern: true,
      waf_action: "log" # Don't block yet
    )

    # Hub will analyze and create IP block rule if needed
  end
end
```

## Hub Intelligence (Auto-Generation)

### Scanner Detection

```ruby
require "ipaddr"

# PathScannerDetectorJob
class PathScannerDetectorJob < ApplicationJob
  SCANNER_PATHS = %w[/.env /.git /wp-admin /phpMyAdmin /.aws].freeze

  def perform
    # Find IPs hitting scanner paths (exact matches) 3+ times in 5 minutes
    scanner_ips = Event
      .where(request_path: SCANNER_PATHS)
      .where("timestamp > ?", 5.minutes.ago)
      .group(:ip_address)
      .having("COUNT(*) >= 3")
      .pluck(:ip_address)

    scanner_ips.each do |ip|
      # Pick rule type and host prefix from the address family
      # (assumes ip_address is stored as a string)
      ipv6 = IPAddr.new(ip).ipv6?
      prefix = ipv6 ? 128 : 32

      # Create 24h ban rule. An IP still scanning on the next run gets
      # another rule; deduplicate if that matters.
      Rule.create!(
        rule_type: ipv6 ? "network_v6" : "network_v4",
        action: "deny",
        conditions: { cidr: "#{ip}/#{prefix}" },
        priority: prefix,
        expires_at: 24.hours.from_now,
        source: "auto:scanner_detected",
        metadata: {
          reason: "Hit scanner paths 3+ times in 5 minutes",
          auto_generated: true
        }
      )
    end
  end
end
```

### Rate Limit Abuse Detection

```ruby
require "ipaddr"

# RateLimitAnomalyJob
class RateLimitAnomalyJob < ApplicationJob
  def perform
    # Find IPs exceeding normal rate
    abusive_ips = Event
      .where("timestamp > ?", 1.minute.ago)
      .group(:ip_address)
      .having("COUNT(*) > 200") # >200 req/min
      .pluck(:ip_address)

    abusive_ips.each do |ip|
      # Host prefix: /32 for IPv4, /128 for IPv6 (assumes ip_address is a string)
      prefix = IPAddr.new(ip).ipv6? ? 128 : 32

      # Create an aggressive one-hour rate limit
      Rule.create!(
        rule_type: "rate_limit",
        action: "rate_limit",
        conditions: { cidr: "#{ip}/#{prefix}", scope: "global" },
        priority: prefix,
        expires_at: 1.hour.from_now,
        source: "auto:rate_limit_exceeded",
        metadata: {
          limit: 10,
          window: 60,
          per_ip: true
        }
      )
    end
  end
end
```

## Performance Characteristics

### Hub
- **Rule query**: O(log n + k) with the `(updated_at, id)` index, where k is the number of changed rules returned
- **Version check**: Single index lookup
- **Rule generation**: Background jobs, no request impact

### Agent
- **Network rule lookup**: index range scan on `(network_start, network_end, network_prefix)`; close to O(log n) when few ranges start below the address, though the `network_start <= ?` bound alone can scan many rows in the worst case
- **Rate limit check**: O(1) hash lookup in memory
- **Path pattern check**: O(n) regex match (n = number of patterns)
- **Overall request evaluation**: <1ms for typical case

### Sync Efficiency
- **Incremental sync**: Only changed rules since last sync
- **Typical sync payload**: <10 KB for 50 rules
- **Sync frequency**: Every 10s or 1000 events
- **Version check**: <1 KB response

## Future Enhancements (Phase 2+)

### Per-Path Rate Limiting
- Different limits for `/api/*`, `/login`, `/admin`
- Agent tracks multiple counters per IP

### Path-Based Event Sampling
- Send all `/admin` requests
- Skip `/health`, `/metrics`
- Sample 10% of regular traffic

### Challenge Actions
- CAPTCHA challenges for suspicious IPs
- JavaScript challenges for bot detection

### Scheduled Rules
- Block during maintenance windows
- Time-of-day rate limits

### Multi-Project Rules (Phase 10+)
- Global rules across all projects
- Per-project rule overrides

## Summary

The Baffle Hub rule system provides:
- **Fast local enforcement** (sub-millisecond)
- **Centralized intelligence** (Hub analytics)
- **Efficient synchronization** (timestamp-based incremental sync)
- **Dynamic adaptation** (backpressure control via sampling)
- **Temporal flexibility** (auto-expiring rules)
- **Audit trail** (soft deletes, source tracking)

This architecture scales from single-server deployments to distributed multi-agent installations while maintaining simplicity and pragmatic design choices focused on the "low-hanging fruit" of WAF functionality.