
Watchtower NOC Dashboard

Status: In Progress · Origin: Polk State College · Since: 2026.01
FastAPI · React · TypeScript · Redis · WebSocket · ReactFlow · Zustand · TailwindCSS · Proxmox API · LibreNMS API

Overview

Watchtower is a self-hosted Network Operations Center dashboard that provides unified visibility into network infrastructure, virtualization platforms, and system health. It aggregates data from multiple monitoring systems into a single interactive topology visualization with live status updates via WebSocket.

Open Source

50+ commits, TypeScript 52% / Python 43%

The Numbers: Cost Avoidance

Enterprise monitoring platforms carry significant licensing costs. For a 200-endpoint departmental network, here's what commercial alternatives would cost:

πŸ’Έ SolarWinds (200 nodes)
Annual License: $29,448
Implementation + Training: $5,000–$13,000
3-Year TCO: $110K–$150K
Per Endpoint: $147–$177/year

βœ“ Watchtower
Annual License: $0
Implementation: $0
3-Year TCO: $0
Per Endpoint: $0

3-Year Savings: $110K–$150K (100% license cost reduction; 200+ endpoints monitored at zero additional licensing cost)
πŸ›οΈ
State College Spending Montgomery College: $4.1M/year on NOC. UNC System: $5.35M/year on network monitoring. 48% of colleges spend $750K–$2.5M annually.
πŸ“Š
Market Context 83% of universities lack effective SIEM. Custom NOC development costs $100K–$500K. Average IT outage costs $14,056/minute.

SolarWinds pricing: Vendr Β· Netdata Β· ManageEngine

Architecture

DATA SOURCES
LibreNMS SNMP Metrics Device status, interfaces, port traffic, CDP/LLDP neighbors
Netdisco L2 Discovery MAC tables, device inventory, switch port mappings
Proxmox Virtualization VM/container lists, CPU/memory, storage pools
Speedtest CLI WAN Health Download/upload speeds, latency, jitter
β–Ό
Async Polling (httpx)
BACKEND
APScheduler Polling jobs at 30s-300s intervals
β†’
Aggregator Merges sources into unified model
β†’
Redis In-memory cache
β†’
FastAPI REST + WebSocket
β–Ό
WebSocket Push
FRONTEND
TopologyCanvas ReactFlow interactive map
  • Cluster nodes (expand/collapse)
  • Device status indicators
  • Animated link edges
  • L2/L3 view toggle
Zustand Store Global state + WS subscription

Features

Data Flow Example

How a device status change propagates through the system:

1
Detection APScheduler polls LibreNMS API at 30s interval
2
Aggregation Device status change detected, Pydantic model updated
3
Cache Update Redis CACHE_TOPOLOGY key updated with new state
4
Broadcast WebSocket pushes delta to all connected clients
5
State Update Zustand store receives message, updates React state
6
Re-render Device node color changes, alert count increments

Feature Deep Dive: WebSocket Real-Time Updates

A bidirectional WebSocket connection pushes device status changes, alert notifications, and speedtest results from the backend to every connected dashboard client, with no page refreshes or client-side polling required.

⚑ Use Case: Instant Incident Detection

NOC operators need immediate notification when device status changes. Instead of waiting 30-60 seconds for the next poll cycle, the WebSocket pushes updates in real-time: a switch going down appears on the dashboard within seconds. This enables faster incident response and reduces mean time to detection (MTTD).

Architecture

⏰ POLLING SCHEDULER
poll_devices()
Every 30s
Fetch status from LibreNMS β†’ Compare previous vs current β†’ broadcast_status_changes()
poll_alerts()
Every 30s
Detect new/resolved alerts β†’ broadcast_new_alerts() / broadcast_resolved_alerts()
run_speedtest()
Every 5m
Run speedtest CLI β†’ broadcast_speedtest_result()
β–Ό broadcast()
πŸ”Œ ConnectionManager (ws_manager)
active_connections: list[WebSocket]
broadcast(message) β†’ For each connection: await connection.send_json(message)
β–Ό send_json()
πŸ–₯️ WEBSOCKET CLIENTS
Browser Tab 1
Browser Tab 2
Browser Tab 3
useWebSocket() hook
device_status_change β†’ updateDeviceStatus() β†’ Canvas rerender
new_alerts β†’ addAlert() β†’ Toast + Bell badge
alerts_resolved β†’ removeAlert() β†’ Toast removed
speedtest_result β†’ setSpeedtestStatus() β†’ WAN link color
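The ConnectionManager in the middle of the diagram above can be sketched as a small asyncio class. This is a minimal sketch under assumptions from this page (the attribute and method names shown in the diagram; `send_json` follows FastAPI's WebSocket API), not the project's exact implementation:

```python
import asyncio


class ConnectionManager:
    """Tracks open WebSocket clients and fans out JSON messages to all of them."""

    def __init__(self) -> None:
        self.active_connections: list = []
        self._lock = asyncio.Lock()  # guards the connection list

    async def connect(self, ws) -> None:
        async with self._lock:
            self.active_connections.append(ws)

    async def disconnect(self, ws) -> None:
        async with self._lock:
            if ws in self.active_connections:
                self.active_connections.remove(ws)

    async def broadcast(self, message: dict) -> None:
        # Snapshot under the lock, then send outside it so a slow client
        # doesn't block connect/disconnect.
        async with self._lock:
            connections = list(self.active_connections)
        for conn in connections:
            try:
                await conn.send_json(message)
            except Exception:
                # Dead socket: drop it from the active list
                await self.disconnect(conn)
```

A module-level instance of this class would play the role of the `ws_manager` singleton referenced throughout this page.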

Message Types: Server β†’ Client

connected
Sent immediately after WebSocket connection established
{
  "type": "connected",
  "message": "Connected to Watchtower"
}
device_status_change
Sent when device status changes (up ↔ down)
{
  "type": "device_status_change",
  "timestamp": "2026-01-28T10:42:15.123Z",
  "changes": [{
    "device_id": 42,
    "hostname": "cat-1.corp.local",
    "old_status": "up",
    "new_status": "down"
  }]
}
new_alerts
Sent when new LibreNMS alerts fire
{
  "type": "new_alerts",
  "timestamp": "2026-01-28T10:42:15.123Z",
  "alerts": [{
    "id": 4521,
    "device_id": 42,
    "hostname": "cat-1.corp.local",
    "severity": "critical",
    "title": "Device unreachable"
  }]
}
alerts_resolved
Sent when alerts are cleared (device recovered)
{
  "type": "alerts_resolved",
  "timestamp": "2026-01-28T10:45:30.456Z",
  "alert_ids": [4521, 4522]
}
speedtest_result
Sent after each speedtest completes
{
  "type": "speedtest_result",
  "timestamp": "2026-01-28T10:45:00.000Z",
  "result": {
    "download_mbps": 245.5,
    "upload_mbps": 125.3,
    "ping_ms": 12.5,
    "indicator": "normal"
  }
}
ping / pong
Keepalive messages (client sends ping every 30s)
{ "type": "ping" }  // Client β†’ Server
{ "type": "pong" }  // Server β†’ Client

Connection Lifecycle

CLIENT
SERVER
new WebSocket('/ws/updates')
ws_manager.connect(ws)
active_connections.append(ws)
{"type": "connected", ...}
{"type": "ping"}
every 30s
{"type": "pong"}
{"type": "device_status_change", ...}
poll_devices() detects change
Connection lost / closed
ws_manager.disconnect(ws)
active_connections.remove(ws)
Exponential backoff reconnect
1s β†’ 2s β†’ 4s β†’ 8s β†’ 16s β†’ 30s max

Reconnection Strategy

DISCONNECTION DETECTED
β–Ό
Calculate backoff delay delay = min(1000 * 2^attempts, 30000)
attempt 0: 1000ms (1s)
attempt 1: 2000ms (2s)
attempt 2: 4000ms (4s)
attempt 3: 8000ms (8s)
attempt 4: 16000ms (16s)
attempt 5+: 30000ms (30s max)
β–Ό
setTimeout(reconnect, delay)
β–Ό
Attempt connection
SUCCESS β†’ reconnectAttempts = 0 FAILURE β†’ goto "Calculate backoff"
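The delay table above reduces to a one-line function. The real implementation lives in the TypeScript `useWebSocket` hook; this is a Python sketch of the same formula:

```python
def backoff_delay_ms(attempts: int) -> int:
    """delay = min(1000 * 2^attempts, 30000), matching the table above."""
    return min(1000 * 2 ** attempts, 30_000)
```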

Device Hostname Matching

PROBLEM
LibreNMS sends: "cat-1.corp.local"
Topology uses: "cat-1"
ALGORITHM: findTopologyDeviceId(hostname, devices)
1
IP Address Match IF device.ip === hostname β†’ RETURN device.id
2
Fuzzy Hostname Match IF hostname.toLowerCase().includes(device_id.toLowerCase()) β†’ RETURN device.id Example: "cat-1.corp.local".includes("cat-1") β†’ true
3
No Match RETURN null, log warning
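The three rules above can be sketched as follows. The real `findTopologyDeviceId` is TypeScript in the frontend; this Python sketch assumes a simple `devices` mapping of topology id β†’ IP address:

```python
def find_topology_device_id(hostname: str, devices: dict[str, str]):
    """Resolve a LibreNMS hostname to a topology device id, or None."""
    # 1. Exact IP address match
    for device_id, ip in devices.items():
        if ip == hostname:
            return device_id
    # 2. Fuzzy hostname match: "cat-1.corp.local" contains "cat-1"
    for device_id in devices:
        if device_id.lower() in hostname.lower():
            return device_id
    # 3. No match (caller logs a warning)
    return None
```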

Component Structure

Backend (Python/FastAPI)
websocket.py
ConnectionManager
β”œβ”€ active_connections: list[WebSocket]
β”œβ”€ connect(ws) β†’ add to list
β”œβ”€ disconnect(ws) β†’ remove from list
└─ broadcast(msg) β†’ send to all
ws_manager (singleton)
websocket_endpoint(ws)
main.py
app.websocket("/ws/updates")
polling/scheduler.py
broadcast_status_changes()
broadcast_new_alerts()
broadcast_resolved_alerts()
broadcast_speedtest_result()
Frontend (React/TypeScript)
hooks/useWebSocket.ts
State: isConnected, wsRef, reconnectAttempts
handleMessage()
β”œβ”€ "connected" β†’ console.log
β”œβ”€ "device_status_change" β†’ updateDeviceStatus()
β”œβ”€ "new_alerts" β†’ addAlert()
β”œβ”€ "alerts_resolved" β†’ removeAlert()
└─ "speedtest_result" β†’ setSpeedtestStatus()
connect(), disconnect()
store/nocStore.ts
isConnected: boolean
updateDeviceStatus()
updateAlertCount()
setSpeedtestStatus()

URL Construction

const WS_URL = `${
  window.location.protocol === 'https:' ? 'wss:' : 'ws:'
}//${window.location.host}/ws/updates`
http://watchtower.local:5173 β†’ ws://watchtower.local:5173/ws/updates
https://watchtower.example.com β†’ wss://watchtower.example.com/ws/updates

Debug Helper (Browser Console)

Exposed via window.watchtower in App.tsx:
window.watchtower.setDeviceDown('cat-1') Simulate device going down
window.watchtower.setDeviceUp('cat-1') Simulate device coming back up
window.watchtower.listDevices() List all devices and their status
window.watchtower.setSpeedtestDown() Red external links
window.watchtower.setSpeedtestDegraded() Yellow external links
window.watchtower.setSpeedtestNormal() Green external links
window.watchtower.getStore() Get full store state

Implementation Details

Aspect Implementation
Protocol WebSocket (RFC 6455)
Endpoint /ws/updates
Protocol auto-detect ws:// for HTTP, wss:// for HTTPS
Ping interval 30 seconds (client β†’ server)
Reconnect delays 1s, 2s, 4s, 8s, 16s, 30s max
Connection tracking ws_manager.active_connections list
Thread safety asyncio.Lock() for connection list
Message format JSON (send_json / JSON.parse)
Device matching Fuzzy hostname matching with IP fallback

Feature Deep Dive: Interactive Topology Canvas

The core visualization engine that renders the physical network topology using ReactFlow. Features expandable device clusters, automatic collision detection, drag-and-drop positioning with localStorage persistence, and dynamic edge routing that adapts to expanded/collapsed states.

πŸ—ΊοΈ Use Case: At-a-Glance Network View with Drill-Down

NOC operators need to see the entire network at a glance while being able to drill down into specific clusters. The canvas provides an interactive map where clusters can be expanded to show individual devices, nodes can be dragged to custom positions (persisted across sessions), and the layout automatically adjusts to prevent overlapping nodes.

Architecture

TOPOLOGY DATA (from WebSocket)
topology.clusters[] topology.devices topology.connections[] topology.external_links[]
β–Ό
topologyToNodes()
For each cluster: IF expanded β†’ Create DeviceNode for each device (grid layout)
ELSE β†’ Create ClusterNode (collapsed view)
For each external_link: β†’ Create ExternalNode (WAN endpoints)
β–Ό
resolveOverlaps()
1. Calculate bounding boxes for all nodes
2. Check each pair for collision
3. Push overlapping nodes apart
4. Repeat until no overlaps (max 50 iterations)
β–Ό
ReactFlow Canvas
Pan: Click + drag background
Zoom: Scroll wheel (0.2x–2x)
Select: Click node β†’ sidebar details
Expand: Double-click cluster β†’ show devices
Collapse: Double-click device β†’ hide cluster
Move: Drag node β†’ position saved to localStorage

Visual Layout: Collapsed View

Default view shows clusters as single nodes with device status indicators:

Physical (L2) Logical (L3) Reset Layout
πŸ›‘οΈ
Firewalls
2 devices | All healthy
πŸ”Œ
Core Network
3 devices | All healthy
☁ Internet
πŸ–₯️
Servers
4 devices | All healthy
πŸ“‘
Access Layer
3 devices | 1 down
Double-click cluster to expand

Visual Layout: Expanded View

When a cluster is expanded, individual devices are shown in a grid layout:

Physical (L2) Logical (L3) Reset Layout
Firewalls
Core Network (expanded)
cat-1 10.2.10.1
cat-2 10.2.10.2
cat-3 10.2.10.3
Access Layer
Double-click device to collapse cluster

Node Types

ClusterNode (collapsed)
πŸ”Œ
Core Network
8 devices    All healthy
Double-click to expand
DeviceNode (expanded)
cat-1 10.2.10.1
Double-click to collapse cluster
ExternalNode (WAN)
☁
Internet
Glowing border color reflects speedtest status

Collision Detection Algorithm

INPUT
nodes[] after position assignment
CONSTANTS
NODE_SIZES = { cluster: 180Γ—120, device: 160Γ—80, external: 140Γ—100 }
PADDING = 40px
STEP 1
Calculate bounding boxes for each node:
box = { x: node.x - PADDING, y: node.y - PADDING, width: size + PADDING*2, height: size + PADDING*2 }
STEP 2
Check for overlaps:
boxesOverlap(a, b) = !(a.right < b.x || b.right < a.x || a.bottom < b.y || b.bottom < a.y)
STEP 3
Calculate push vector (push along shortest overlap axis):
IF overlapX < overlapY β†’ push horizontally ELSE β†’ push vertically
STEP 4
Iterate until resolved:
WHILE hasOverlap AND iteration < 50: recalculate, check pairs, apply push
OUTPUT
nodes[] with non-overlapping positions
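Steps 1–2 amount to padded axis-aligned bounding-box tests. A Python sketch using the constants from this page (the actual code is TypeScript; function names here are illustrative):

```python
PADDING = 40
NODE_SIZES = {"cluster": (180, 120), "device": (160, 80), "external": (140, 100)}


def bounding_box(x: float, y: float, kind: str):
    """Return (left, top, right, bottom), expanded by PADDING on every side."""
    w, h = NODE_SIZES[kind]
    return (x - PADDING, y - PADDING, x + w + PADDING, y + h + PADDING)


def boxes_overlap(a, b) -> bool:
    # Two boxes overlap unless one is entirely left of / above the other
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])
```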

Position Persistence

localStorage Key: watchtower-node-positions
Stored Structure:
{
  "cluster-core": { "x": 450, "y": 150 },
  "cluster-servers": { "x": 450, "y": 450 },
  "cat-1": { "x": 350, "y": 120 },
  "cat-2": { "x": 550, "y": 120 },
  "external-Internet": { "x": 80, "y": 150 }
}
Save: On drag end (when dragging === false)
Load: On component mount
Reset: Clear localStorage, regenerate auto-layout

Event Handlers

Event Action
Single click device selectDevice(id) β†’ opens sidebar detail panel
Single click cluster No action (expands on double-click)
Double-click cluster toggleClusterExpanded(id) β†’ expand to show devices
Double-click device toggleClusterExpanded(clusterId) β†’ collapse back
Drag node Update position, save to localStorage on drop
Scroll wheel Zoom in/out (0.2x - 2x range)
Click + drag background Pan the canvas

Grid Layout for Expanded Devices

1 Determine grid: cols = min(3, deviceCount), rows = ceil(deviceCount / cols)
2 Spacing: spacingX = 200px, spacingY = 120px
3 Center on cluster position: startX = cluster.x - gridWidth / 2
4 Position each device: x = startX + (i % cols) * spacingX
Example: 8 devices at cluster (450, 300)
[0] ─── [1] ─── [2]
 β”‚       β”‚       β”‚
[3] ─── [4] ─── [5]
 β”‚       β”‚
[6] ─── [7]
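The four steps above, sketched in Python (the real layout code is TypeScript; the `gridWidth` center-to-center formula is an assumption consistent with the example):

```python
SPACING_X, SPACING_Y = 200, 120  # device spacing from this page


def grid_positions(device_count: int, cluster_x: float, cluster_y: float):
    """Return (x, y) for each device in a max-3-column grid centered on the cluster."""
    cols = min(3, device_count)
    grid_width = (cols - 1) * SPACING_X          # center-to-center width
    start_x = cluster_x - grid_width / 2
    return [
        (start_x + (i % cols) * SPACING_X,        # column offset
         cluster_y + (i // cols) * SPACING_Y)     # row offset
        for i in range(device_count)
    ]
```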

Edge Routing

Edges dynamically connect based on cluster expansion state:

Both expanded: device β†’ device
Source expanded, target collapsed: device β†’ cluster
Source collapsed, target expanded: cluster β†’ device
Both collapsed: cluster β†’ cluster

Edges are deduplicated: only one edge per node pair using sorted key.

Component Hierarchy

TopologyCanvas.tsx (wrapper with ReactFlowProvider)
TopologyCanvasInner
useNocStore (topology, expandedClusters, selectDevice)
savedPositionsRef (localStorage persistence)
processedNodes = useMemo(() => topologyToNodes β†’ resolveOverlaps)
initialEdges = useMemo(() => topologyToEdges)
ReactFlow
nodeTypes: { cluster, device, external, vlanGroup }
edgeTypes: { traffic, physical }
<Background /> (grid pattern)
<Controls /> (zoom buttons)
<Panel position="top-right"> (view toggle, reset)

Implementation Details

Aspect Implementation
ReactFlow version @xyflow/react (v12+)
Min/Max zoom 0.2x - 2x
Collision padding 40px between nodes
Max collision iterations 50
Grid columns (expanded) Max 3 devices per row
Device spacing 200px Γ— 120px
Cluster spacing 450px Γ— 350px
Position save trigger On drag end (dragging === false)
Fit view padding 0.2 (20%)

Feature Deep Dive: CDP/LLDP Link Discovery

Automatic discovery of physical network connections by polling CDP (Cisco Discovery Protocol) and LLDP (Link Layer Discovery Protocol) neighbor data from LibreNMS. Discovered links are merged with static topology.yaml connections to create a complete network map.

πŸ”— Use Case: Automated Cable Documentation

Manually documenting every cable connection in a network is tedious and error-prone. CDP/LLDP discovery automates this: switches advertise their neighbors, LibreNMS collects this data via SNMP, and Watchtower polls it to build the topology graph automatically. Static connections in topology.yaml fill gaps for devices that don't support CDP/LLDP (firewalls, servers, APs).

Architecture

🌐 NETWORK INFRASTRUCTURE
cat-1 (switch)
◄─ CDP/LLDP ─►
cat-2 (switch)
◄─ CDP/LLDP ─►
...
SNMP polling β–Ό
LibreNMS
/api/v0/resources/links
Returns: local_device_id, local_port_id, remote_device_id, remote_hostname, remote_port, protocol
β–Ό GET /api/v0/resources/links
βš™οΈ WATCHTOWER BACKEND
poll_links() [scheduler.py]
1. Fetch all CDP/LLDP links from LibreNMS
2. For each link.local_port_id: call /ports/{port_id} to get port name
3. Build link_data with resolved port names
4. Cache to Redis (CACHE_LINKS, TTL: 600s)
CACHE_LINKS = "watchtower:links"
[{
  "id": 123,
  "local_device_id": 42,
  "local_port": "TenGigabitEthernet1/0/1",
  "remote_device_id": 43,
  "remote_hostname": "cat-2.corp.local",
  "remote_port": "TenGigabitEthernet1/0/1",
  "protocol": "cdp"
}]
β–Ό get_aggregated_topology()
πŸ”€ AGGREGATOR [aggregator.py]
1
Build device ID mappings topo_to_librenms: {"cat-1": 42, "cat-2": 43}
2
Process cached CDP/LLDP links Map IDs β†’ Create Connection objects β†’ Track seen_ports
3
Add static connections from topology.yaml Skip if port already discovered via CDP/LLDP
4
Merge into Topology.connections[]
β–Ό
πŸ–₯️ FRONTEND
TopologyCanvas renders edges with port labels (Gi0/1 ↔ Gi0/2)

Data Models

LibreNMSLink (from API)
class LibreNMSLink(BaseModel):
    id: int | None
    local_device_id: int
    local_port_id: int | None
    local_port: str | None
    remote_hostname: str | None
    remote_port: str | None
    remote_device_id: int | None
    protocol: str | None  # "cdp", "lldp"
Connection (Topology Model)
class Connection(BaseModel):
    id: str  # "cdp-cat-1-Te1-0-1"
    source: ConnectionEndpoint
    target: ConnectionEndpoint
    connection_type: ConnectionType
    speed: int  # Mbps
    status: ConnectionStatus
    utilization: float | None

Port Name Resolution

PROBLEM
LibreNMS /resources/links returns local_port_id (integer) but NOT the actual port name like "TenGigabitEthernet1/0/1"
SOLUTION
1. Collect unique port_ids that need resolution
2. For each port_id: GET /api/v0/ports/{port_id}
3. Extract ifName or ifDescr
4. Cache in port_names dict
GOTCHA: LibreNMS returns {"port": [{...}]} (list inside "port" key)
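The gotcha above is easy to handle defensively. A sketch of the extraction step, using the response shape shown in the LibreNMS API section later on this page:

```python
def extract_port_name(response: dict):
    """Pull ifName (falling back to ifDescr) out of a /ports/{port_id} response."""
    # LibreNMS wraps the record in a one-element list under the "port" key
    ports = response.get("port") or []
    if not ports:
        return None
    entry = ports[0]
    return entry.get("ifName") or entry.get("ifDescr")
```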

Deduplication Logic

SCENARIO: CDP/LLDP discovers a link that's also defined in topology.yaml
seen_ports: set[tuple[str, str]] = set()

# When processing CDP/LLDP link:
if local_port:
    seen_ports.add((source_topo_id, local_port))
if remote_port:
    seen_ports.add((target_topo_id, remote_port))

# When processing static connection:
if (source_device, source_port) in seen_ports:
    continue  # Skip - already discovered
βœ“ CDP/LLDP takes priority (more accurate)
βœ“ Static connections fill gaps
βœ“ No duplicate edges in topology

Static Connection Merging

Devices that don't run CDP/LLDP:
Firewalls (Palo Alto, Fortinet) · Servers (Linux, Windows) · Access points (some models) · IoT devices · "Dumb" unmanaged switches
topology.yaml example:
connections:
  - id: "fw-to-core"
    source:
      device: pa-3410
      port: ethernet1/1
    target:
      device: cat-1
      port: TenGigabitEthernet1/0/48
    connection_type: uplink
    speed: 10000
Connection Types:
trunk: Inter-switch links (CDP/LLDP default)
access: End-device connections
uplink: Upstream connections
stack: Stacking cables
peer: HA/cluster peer links
management: Management network

Connection Status Derivation

Status is derived from device status, not link state
source_device = devices.get(source_topo_id)
target_device = devices.get(target_topo_id)

IF source.status == DOWN OR target.status == DOWN:
    conn_status = DOWN
ELIF source.status == UNKNOWN OR target.status == UNKNOWN:
    conn_status = UNKNOWN
ELSE:
    conn_status = UP
Why not use link state? CDP/LLDP is neighbor discovery, not link monitoring. Device status is more reliable.
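As a runnable sketch of the rules above (status values assumed to be lowercase strings):

```python
def derive_connection_status(source_status: str, target_status: str) -> str:
    """Connection inherits the worst of its two endpoint device statuses."""
    statuses = (source_status, target_status)
    if "down" in statuses:
        return "down"
    if "unknown" in statuses:
        return "unknown"
    return "up"
```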

Component Structure

backend/app/polling/
librenms.py
LibreNMSLink (Pydantic model)
get_all_links() β†’ list[LibreNMSLink]
get_port(port_id) β†’ LibreNMSPort
scheduler.py
CACHE_LINKS = "watchtower:links"
poll_links(): fetch links, resolve ports, cache
Scheduled every 300s, immediate on startup
aggregator.py
get_aggregated_topology()
β”œβ”€ Load CACHE_LINKS from Redis
β”œβ”€ Map LibreNMS IDs β†’ topology IDs
β”œβ”€ Create Connection objects
β”œβ”€ Track seen_ports for deduplication
└─ Add static connections from topology.yaml

LibreNMS API Endpoints

GET /api/v0/resources/links
Returns all CDP/LLDP discovered neighbors
{
  "links": [{
    "id": 123,
    "local_device_id": 42,
    "local_port_id": 1001,
    "remote_device_id": 43,
    "remote_hostname": "cat-2.corp.local",
    "remote_port": "Te1/0/1",
    "protocol": "cdp"
  }]
}
GET /api/v0/ports/{port_id}
Returns port details including interface name
{
  "port": [{
    "port_id": 1001,
    "device_id": 42,
    "ifName": "TenGigabitEthernet1/0/1",
    "ifDescr": "TenGigabitEthernet1/0/1",
    "ifAlias": "Uplink to cat-2",
    "ifOperStatus": "up"
  }]
}

Implementation Details

Aspect Implementation
Discovery protocols CDP (Cisco), LLDP (IEEE 802.1AB)
Poll interval 300 seconds (5 minutes)
Cache TTL 600 seconds (10 minutes)
Cache key watchtower:links
Startup behavior Immediate poll via next_run_time=dt.now()
Port resolution Separate API call per unique port_id
Priority CDP/LLDP over static topology.yaml
Default connection type TRUNK for discovered links
Default speed 10000 Mbps for discovered links

Limitations

⚠️
Devices not discovered Firewalls, servers, APs, IoT, unmanaged switches don't run CDP/LLDP Workaround: Define in topology.yaml
⚠️
One-way discovery CDP/LLDP may show link on one device but not the other Workaround: Static connections fill gaps
⚠️
Port name variations Cisco uses "TenGigabitEthernet1/0/1", "Te1/0/1", "tengige1/0/1" Frontend may normalize for display

Feature Deep Dive: Speedtest Widget

This feature monitors internet connectivity health by running periodic Ookla speed tests and displaying results in a real-time dashboard widget. It also affects the visual status of external WAN links on the topology map.

🌐 Use Case: WAN Health Monitoring

Provides at-a-glance WAN health visibility for NOC operators. Instead of manually running speed tests when users report "slow internet," the dashboard continuously monitors and logs connection quality. Historical CSV data enables capacity planning and ISP SLA verification.

How It Works

βš™οΈ CONFIG (config.yaml)
speedtest:
  enabled: true
  interval_minutes: 5           # Run test every 5 minutes
  server_id: null               # null = auto-select closest
  interface: "eth0"             # Specific NIC (optional)
  thresholds:
    degraded_download_mbps: 200 # Yellow if below
    degraded_ping_ms: 50        # Yellow if above
    down_download_mbps: 10      # Red if below
  logging:
    enabled: true
    path: "/opt/watchtower/data/speedtest.csv"
β–Ό
⏱️ SCHEDULER (APScheduler)
poll_speedtest() job runs at configured interval
Calls run_speedtest() async function
β–Ό
🐍 BACKEND (speedtest.py)
1 Build command: /usr/local/bin/speedtest --format=json --accept-license --accept-gdpr
2 Execute via asyncio subprocess (no shell) with 2-minute timeout
3 Parse JSON: download/upload bandwidth (bytes/sec β†’ Mbps), ping, jitter, packet loss, server info
4 Create SpeedtestResult dataclass
5 Cache result in Redis (1 hour TTL)
6 Append to CSV file for historical logging
7 Broadcast via WebSocket to all connected clients
β–Ό
πŸ“‘ API ENDPOINTS
GET /api/speedtest Returns cached result with status indicator
POST /api/speedtest/trigger Manual test trigger (60-second cooldown enforced)
GET /api/speedtest/export Download CSV history file
β–Ό
πŸ“‹ API RESPONSE (GET /api/speedtest)
{
  "timestamp": "2026-01-27T14:30:00Z",
  "download_mbps": 487.23,
  "upload_mbps": 52.41,
  "ping_ms": 12.3,
  "jitter_ms": 2.1,
  "packet_loss_pct": 0,
  "server_id": 12345,
  "server_name": "Spectrum",
  "server_location": "Tampa, FL",
  "result_url": "https://www.speedtest.net/result/xxxxx",
  "status": "success",
  "indicator": "normal"    // Derived from thresholds
}
β–Ό
πŸ“Š FRONTEND (SpeedtestWidget.tsx)
🌐 Internet Speed ● β–Ό
↓ 487.2 Mbps Down
↑ 52.4 Mbps Up
12.3ms ping | 2.1ms jitter
Spectrum (Tampa, FL)
Last test: 3m ago
loading β†’ Skeleton placeholder
no_data β†’ "Run Test" button only
testing β†’ Spinner animation
ready β†’ Full result display

Status Indicator Logic

The widget displays a colored status dot based on configurable thresholds:

Status Color Condition
normal ● Green Download β‰₯ 200 Mbps AND ping ≀ 50ms
degraded ● Yellow Download < 200 Mbps OR ping > 50ms
down ● Red Download < 10 Mbps OR test error
Status Logic (Python)
def get_status(result, thresholds):
    # CLI errors always report as "down"
    if result.status.startswith("error"):
        return "down"

    if result.download_mbps < thresholds.down_download_mbps:        # critical
        return "down"
    if (result.download_mbps < thresholds.degraded_download_mbps
            or result.ping_ms > thresholds.degraded_ping_ms):       # warning
        return "degraded"
    return "normal"

External Link Coloring

The speedtest status also affects the topology visualization:

Topology Canvas
Firewall
ISP Router
normal β†’ Green animated link
degraded β†’ Yellow animated link
down β†’ Red static link

Data Flow

1
Scheduled Test (every 5 minutes) APScheduler triggers poll_speedtest() β†’ Runs Ookla CLI β†’ Parses output β†’ Caches in Redis β†’ Logs to CSV β†’ Broadcasts via WebSocket
2
Manual Test (user-triggered) Frontend POSTs to /api/speedtest/trigger β†’ Backend checks 60s cooldown β†’ Runs test in background β†’ Result arrives via WebSocket
3
Page Load Frontend fetches GET /api/speedtest for cached result β†’ Displays immediately β†’ Subscribes to WebSocket for live updates
4
CSV Export User clicks export β†’ Backend returns FileResponse β†’ Filename: speedtest_history_2026-01-27.csv

CSV Log Format

speedtest.csv
timestamp,download_mbps,upload_mbps,ping_ms,jitter_ms,packet_loss_pct,server_id,server_name,server_location,result_url,status
2026-01-27T14:30:00Z,487.23,52.41,12.3,2.1,0,12345,Spectrum,"Tampa, FL",https://speedtest.net/result/...,success
2026-01-27T14:35:00Z,491.87,51.22,11.8,1.9,0,12345,Spectrum,"Tampa, FL",https://speedtest.net/result/...,success

Technical Details

CLI tool Ookla Speedtest CLI (/usr/local/bin/speedtest)
Execution asyncio subprocess - no shell injection risk
Timeout 2 minutes (handles hung tests)
Output format JSON (--format=json)
Conversion (bytes/sec Γ— 8) / 1,000,000 = Mbps
Cache TTL 1 hour in Redis
Cooldown 60 seconds between manual triggers
Interface binding Optional --interface flag for multi-NIC hosts
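The conversion row above in code:

```python
def to_mbps(bytes_per_sec: float) -> float:
    """Convert a bytes/sec bandwidth figure to megabits/sec."""
    # (bytes/sec x 8 bits) / 1,000,000 = Mbps
    return bytes_per_sec * 8 / 1_000_000
```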

Why This Approach

πŸ†
Ookla CLI Industry standard, trusted by ISPs, consistent results
⚑
Async subprocess Non-blocking, doesn't tie up web server threads
πŸ“‘
WebSocket push Instant updates without polling overhead
πŸ›‘οΈ
Cooldown enforcement Prevents API abuse and bandwidth waste
πŸ“Š
CSV logging Long-term trending without database complexity
πŸŽ›οΈ
Configurable thresholds Adapts to different connection speeds (fiber vs cable vs DSL)

Feature Deep Dive: Port Group Monitoring

This feature aggregates real-time traffic across multiple switch ports that share a common SNMP description pattern. Built to monitor computer labs where dozens of workstations connect through various switches but need to be viewed as a single logical group.

🎬 Use Case: Digital Media Labs

The Digital Media Mac Pro Lab has ~140 workstation ports spread across multiple access switches. Each port has an SNMP description (ifAlias) containing "digital-media" (the department identifier). Instead of monitoring 140 individual ports, this feature sums all their traffic into a single dashboard widget showing aggregate bandwidth consumption.

How It Works

βš™οΈ CONFIG (config.yaml)
port_groups:
  - name: "Digital Media Labs"
    description: "Digital Media Mac Pro Lab"
    match_alias: "digital-media"  # Pattern to match
    thresholds:
      warning_mbps: 500         # Yellow threshold
      critical_mbps: 800        # Red threshold
    logging:
      enabled: true
      path: "/opt/watchtower/data/port_groups.csv"
β–Ό
🐍 BACKEND (portgroups.py)
1 Fetch ALL ports from LibreNMS API
2 Filter ports where ifAlias contains "digital-media" (case-insensitive) β†’ ~140 ports
3 Filter to only "up" ports (ifOperStatus == "up")
4 Sum traffic rates: ifInOctets_rate + ifOutOctets_rate (bytes/sec)
5 Convert: (bytes/sec Γ— 8) / 1,000,000 = Mbps
6 Determine status vs thresholds: <500 = ok, β‰₯500 = warning, β‰₯800 = critical
β–Ό
πŸ“‘ API RESPONSE (GET /api/port-groups)
{
  "name": "Digital Media Labs",
  "description": "Digital Media Mac Pro Lab",
  "port_count": 140,           // Total matching ports
  "active_port_count": 87,     // Ports currently up
  "in_mbps": 360.0,            // Inbound Mbps
  "out_mbps": 96.0,            // Outbound Mbps
  "total_mbps": 456.0,         // Combined
  "status": "ok",              // Based on thresholds
  "thresholds": { "warning_mbps": 500, "critical_mbps": 800 }
}
β–Ό
πŸ“Š FRONTEND (PortGroupWidget.tsx)
πŸ”² Digital Media Labs ●
Digital Media Mac Pro Lab
↓ 360.0 Mbps In
↑ 96.0 Mbps Out
Total: 456.0 Mbps 57%
0 500 800
87 of 140 ports active
πŸ“₯ Export CSV
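The six backend steps above can be sketched as one aggregation function. This is a sketch, not the project's `portgroups.py`: the port dict keys follow the LibreNMS field names cited on this page (`ifAlias`, `ifOperStatus`, `ifInOctets_rate`, `ifOutOctets_rate`), and the function name is hypothetical:

```python
def aggregate_port_group(ports, match_alias, warning_mbps=500, critical_mbps=800):
    """Filter ports by ifAlias pattern, sum traffic of up ports, grade vs thresholds."""
    # Case-insensitive substring match on the SNMP port description
    matching = [p for p in ports
                if match_alias.lower() in (p.get("ifAlias") or "").lower()]
    # Only ports that are operationally up contribute traffic
    active = [p for p in matching if p.get("ifOperStatus") == "up"]
    in_mbps = sum(p.get("ifInOctets_rate", 0) for p in active) * 8 / 1_000_000
    out_mbps = sum(p.get("ifOutOctets_rate", 0) for p in active) * 8 / 1_000_000
    total = in_mbps + out_mbps
    if total >= critical_mbps:
        status = "critical"
    elif total >= warning_mbps:
        status = "warning"
    else:
        status = "ok"
    return {"port_count": len(matching), "active_port_count": len(active),
            "in_mbps": round(in_mbps, 1), "out_mbps": round(out_mbps, 1),
            "total_mbps": round(total, 1), "status": status}
```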

Technical Details

Pattern matching Case-insensitive substring match on ifAlias field
Traffic source LibreNMS ifInOctets_rate / ifOutOctets_rate (SNMP counters)
Only active ports Filters ifOperStatus == "up" before summing
Conversion (bytes/sec Γ— 8) / 1,000,000 = Mbps
Polling interval Widget fetches every 60s (matches LibreNMS interface polling)
Thresholds Configurable per-group warning/critical values in Mbps
Persistence CSV logging for historical analysis and capacity planning

Why This Approach

πŸ“‘
SNMP-based Uses existing LibreNMS infrastructure, no agent installation needed
πŸ”€
Pattern matching Network admins already use naming conventions in port descriptions
⚑
Real-time 60-second granularity matches SNMP polling cycle
πŸ“ˆ
Scalable Works whether a lab has 10 ports or 500 ports
πŸ“Š
Exportable CSV export enables capacity planning spreadsheets and reports

Feature Deep Dive: Diagnosing Slowdowns

When users report "the internet is slow," the question is: is it the ISP or something internal? Both the Speedtest Widget and Port Group Monitoring export CSV logs. Comparing these two data sources pinpoints whether the bottleneck is upstream (ISP) or within the campus network.

The Diagnostic Question

🌐 ISP Problem
πŸ“Ά Speedtest Slow
πŸ“Š Lab Traffic Normal/Low

WAN link is degraded but internal network has capacity. Problem is with the upstream provider.

πŸ”€ Internal Bottleneck
πŸ“Ά Speedtest Normal
πŸ“Š Lab Traffic Near Threshold

ISP link is healthy but lab ports are saturated. Congestion is on the access or distribution layer.

How It Works

1
Collect Speedtest CSV Scheduled tests (every 5-15 min) log timestamp, download Mbps, upload Mbps, latency /opt/watchtower/data/speedtest.csv
2
Collect Port Group CSV Polled traffic (every 60s) logs timestamp, group name, active ports, in/out Mbps /opt/watchtower/data/port_groups.csv
3
Correlate by Timestamp Join both CSVs on timestamp (within tolerance) to see WAN health alongside lab traffic
4
Identify Pattern Slowdowns with low internal traffic β†’ ISP. Slowdowns with high internal traffic β†’ congestion.
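Step 3 can be sketched as a simple join over the two logs. This sketch matches on exact timestamps for brevity; a production version would match within a tolerance window, as noted above:

```python
def correlate(speedtest_rows, portgroup_rows):
    """Join parsed speedtest.csv and port_groups.csv rows on timestamp."""
    lab_by_ts = {row["timestamp"]: row for row in portgroup_rows}
    joined = []
    for s in speedtest_rows:
        lab = lab_by_ts.get(s["timestamp"])
        if lab is not None:
            joined.append({
                "timestamp": s["timestamp"],
                "wan_download_mbps": float(s["download_mbps"]),
                "lab_total_mbps": float(lab["total_mbps"]),
            })
    return joined
```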

Example Analysis

speedtest.csv
timestamp,download_mbps,upload_mbps,latency_ms
2026-01-27T14:00:00,850.2,410.5,12
2026-01-27T14:15:00,180.4,95.2,45    ← degraded
2026-01-27T14:30:00,175.8,88.1,52    ← degraded
2026-01-27T14:45:00,820.1,405.3,14
port_groups.csv (Digital Media Labs)
timestamp,group,active_ports,in_mbps,out_mbps,total_mbps
2026-01-27T14:00:00,Digital Media Labs,87,360.0,96.0,456.0
2026-01-27T14:15:00,Digital Media Labs,85,340.2,88.4,428.6
2026-01-27T14:30:00,Digital Media Labs,82,310.5,82.1,392.6
2026-01-27T14:45:00,Digital Media Labs,88,375.2,102.3,477.5
πŸ”
Conclusion: ISP Issue

At 14:15 and 14:30, speedtest dropped from ~850 Mbps to ~180 Mbps while lab traffic stayed in its normal band, roughly 390–430 Mbps. The labs weren't saturating anything; the WAN link was degraded. Time to call the ISP.

Why This Matters

🎯
Accurate Diagnosis Eliminates guesswork about where the problem actually is
πŸ“ž
ISP Accountability CSV logs provide evidence when escalating to the provider
πŸ“ˆ
Capacity Planning Historical data shows when internal upgrades are actually needed
⏱️
Faster Resolution Skip the "have you tried rebooting" phase with concrete data

Feature Deep Dive: Cisco Port Grid

This feature renders a visual representation of physical switch ports that mirrors the actual front panel of a Cisco Catalyst switch: odd ports (1, 3, 5...) on top and even ports (2, 4, 6...) on the bottom, grouped into banks of 24 and organized by line card/module.

πŸ”Œ Use Case: Physical Correlation

Network engineers standing at a rack can immediately correlate dashboard status with physical ports. Port 23 is always top-right of a bank, port 48 is always bottom-right. This layout matches what they see on the actual Cisco Catalyst hardware, eliminating mental translation.

Data Flow

πŸ“‘ LIBRENMS API
GET /api/v0/devices/{id}/ports

Returns array of interfaces:
[
  { name: "Gi1/0/1", ifOperStatus: "up", ifSpeed: 1000000000 },
  { name: "Gi1/0/2", ifOperStatus: "down" },
  { name: "Te1/1/1", ifOperStatus: "up", ifSpeed: 10000000000 }
]
β–Ό
Backend Aggregator β†’ Topology β†’ WebSocket β†’ Frontend Store
β–Ό
πŸ”§ INTERFACE NAME PARSING (portUtils.ts)
Input: "Gi1/0/24"
         β”‚ β”‚ β”‚ └── Port number (24)
         β”‚ β”‚ └──── Module/slot (0)
         β”‚ └────── Stack number (1)
         └──────── Type prefix (Gi = Gigabit)

Output: {
  type: 'gigabit',
  typePrefix: 'Gi',
  stack: 1,
  module: 0,
  port: 24,
  sortKey: 10024,    // (1 Γ— 10000) + (0 Γ— 100) + 24
  isUplink: false,
  original: 'Gi1/0/24'
}
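That parsing rule can be sketched in a few lines. This is an illustrative Python version (the production parser is portUtils.ts, in TypeScript); the function name is hypothetical, but the output fields mirror the shape shown above:

```python
import re

# Cisco-style interface names: letters, then stack/module/port numbers.
IFACE_RE = re.compile(r"^([A-Za-z]+?)(\d+)/(\d+)/(\d+)$")

# 10G+ prefixes treated as uplinks, per the interface type table.
UPLINK_PREFIXES = {"Te", "Fo", "Hu", "Twe"}

def parse_interface(name):
    """Parse 'Gi1/0/24' into its components; return None for names
    that don't match (VLANs, mgmt, loopback β†’ the 'other' bucket)."""
    m = IFACE_RE.match(name)
    if not m:
        return None
    prefix = m.group(1)
    stack, module, port = int(m.group(2)), int(m.group(3)), int(m.group(4))
    return {
        "typePrefix": prefix,
        "stack": stack,
        "module": module,
        "port": port,
        "sortKey": stack * 10000 + module * 100 + port,
        "isUplink": prefix in UPLINK_PREFIXES,
        "original": name,
    }
```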
β–Ό
πŸ“Š PORT ORGANIZATION (organizePortsForGrid)
1 Parse each interface name β†’ GridPort object (failures β†’ "other" array)
2 Separate uplinks (10G+: Te*, Fo*, Hu*, Twe*) from access ports (Gi*, Fa*)
3 Group access ports by module (line card): Module 0, Module 1, etc.
4 Within each module, create banks of 24 ports: Bank 0 (1-24), Bank 1 (25-48)
5 Within each bank, split by odd/even for physical row layout
β–Ό
πŸ“‹ GRID DATA STRUCTURE
{
  modules: [
    {
      moduleId: 0,
      label: "Slot 0",
      banks: [
        { label: "1-24", oddPorts: [12 ports], evenPorts: [12 ports] },
        { label: "25-48", oddPorts: [12 ports], evenPorts: [12 ports] }
      ]
    }
  ],
  uplinks: [GridPort array],   // 10G+ SFP ports
  other: [Interface array]     // VLANs, mgmt, loopback
}
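Steps 4-5 of the organization pass (banks of 24, split into odd/even rows) can be sketched as follows. This is an illustrative Python version of logic that actually lives in organizePortsForGrid in portUtils.ts; the function name and dict shape are assumptions:

```python
def organize_banks(ports, bank_size=24):
    """Group one module's access ports into banks of 24, then split each
    bank into odd (top row) and even (bottom row) ports, matching the
    physical front-panel layout."""
    ports = sorted(ports, key=lambda p: p["port"])
    max_port = ports[-1]["port"] if ports else 0
    banks = []
    for start in range(1, max_port + 1, bank_size):
        end = start + bank_size - 1
        in_bank = [p for p in ports if start <= p["port"] <= end]
        banks.append({
            "label": f"{start}-{end}",
            "oddPorts": [p for p in in_bank if p["port"] % 2 == 1],
            "evenPorts": [p for p in in_bank if p["port"] % 2 == 0],
        })
    return banks
```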

Interface Type Support

Prefix Full Name Speed Uplink?
Hu HundredGigabitEthernet 100G Yes
Fo FortyGigabitEthernet 40G Yes
Twe TwentyFiveGigE 25G Yes
Te TenGigabitEthernet 10G Yes
Gi GigabitEthernet 1G No
Fa FastEthernet 100M No

Visual Rendering

CISCO sw-core-01.network.local
Slot 0
 1  3  5  7  9 11   13 15 17 19 21 23
 2  4  6  8 10 12   14 16 18 20 22 24
Slot 1
 1  3  5  7  9 11   13 15 17 19 21 23
 2  4  6  8 10 12   14 16 18 20 22 24
Slot 2
 1  3  5  7  9 11   13 15 17 19 21 23
 2  4  6  8 10 12   14 16 18 20 22 24
58 up 4 down 10 disabled
LED Status Colors
Green: Port up, normal operation
Red: Port down, link failure
Amber: Up but >80% utilization
Grey: Admin disabled

Port Detail Panel

Clicking a port opens an expanded detail view:

← Back  Gi1/0/24
Office HP LaserJet
Status ● up
Speed 1G
Utilization 45%
In 12.5 Mbps
Out 2.3 Mbps
Trunk Β· PoE: 12.5W

Component Hierarchy

PortGrid.tsx
└── Chassis Container (dark metallic styling)
    β”œβ”€β”€ Bezel Top: "CISCO" brand + device name
    β”œβ”€β”€ Uplink Module: PortSquare[] (variant="sfp")
    β”œβ”€β”€ Port Modules (for each line card)
    β”‚   └── PortBankDisplay (for each bank of 24)
    β”‚       β”œβ”€β”€ Port number labels (1, 3, 5...)
    β”‚       β”œβ”€β”€ Top row: PortSquare[] (odd ports)
    β”‚       └── Bottom row: PortSquare[] (even ports)
    β”œβ”€β”€ Bezel Bottom: Status legend
    └── Other Interfaces (VLANs, mgmt, loopback)

Technical Details

Data source LibreNMS /api/v0/devices/{id}/ports
Parsing Regex matching for Cisco naming conventions
Layout CSS Flexbox, 12 ports per row
Port sizing RJ45: 18Γ—14px, SFP: 24Γ—20px
State React useState for selected port, hover tooltip
Interactivity Click β†’ detail panel, hover β†’ tooltip
Responsiveness Wraps naturally, banks stack vertically

Why This Layout

πŸ”Œ
Physical accuracy Odd ports top, even bottom; matches actual Cisco Catalyst hardware
πŸ“¦
Banks of 24 Matches line card groupings on physical switches
⬆️
SFP+ at top Mirrors physical uplink placement on switch chassis
🏷️
Module labels For modular switches (4506, etc.) with multiple line cards
πŸ‘οΈ
Rack correlation Engineers can match dashboard to physical port at a glance

Feature Deep Dive: Proxmox Panel

When a Proxmox hypervisor node is clicked in the topology, this panel displays a detailed dashboard showing the node's health, all VMs and containers with live CPU/RAM metrics, and storage pool utilization with color-coded capacity warnings.

πŸ–₯️ Use Case: At-a-Glance Hypervisor Health

NOC operators need quick visibility into what's running on each Proxmox node without logging into the Proxmox web UI. This panel shows running/stopped VMs, resource consumption, and storage capacity in the same sidebar used for network device details.

Data Flow

πŸ–±οΈ USER ACTION
1 User clicks Proxmox node in topology (e.g., "proxmox1")
2 DeviceCard.tsx detects device_type === 'hypervisor'
3 ProxmoxPanel.tsx renders with nodeName prop
β–Ό
GET /api/vms/node/{nodeName}
β–Ό
🐍 BACKEND (vms.py)
1 Look up node in Redis cache (match by name, instance, or normalized input)
2 Filter VMs/LXCs from cache, separate into QEMU and LXC lists
3 Fetch storage pools from Proxmox API (on-demand, not cached)
4 Calculate running/total counts for summary badges
β–Ό
πŸ“‹ API RESPONSE
{
  "node": {
    "node": "proxmox1", "status": "online",
    "cpu": 23.5, "memory": 67.2,
    "maxcpu": 32, "maxmem": 137438953472
  },
  "vms": [
    { "vmid": 100, "name": "dc1", "status": "running",
      "cpu": 2.3, "memory": 45.0 }
  ],
  "lxcs": [
    { "vmid": 200, "name": "librenms", "status": "running",
      "cpu": 5.1, "memory": 32.0 }
  ],
  "storage": [
    { "storage": "local-zfs", "type": "zfspool",
      "used": 500000000000, "total": 2000000000000,
      "used_percent": 25.0 }
  ],
  "vms_running": 8, "vms_total": 10,
  "lxcs_running": 12, "lxcs_total": 12
}

Visual Layout

πŸ–₯️ NODE 1 / 1
Name CPU RAM
proxmox1 23.5% 67.2%
πŸ’» VMS 8 / 10
Name CPU RAM
dc1 2.3% 45.0%
wazuh-manager 12.5% 78.3%
template-win11 0.0% 0.0%
thehive 8.2% 65.4%
πŸ“¦ LXCS 12 / 12
Name CPU RAM
librenms 5.1% 32.0%
netdisco 1.2% 18.5%
watchtower 3.4% 24.2%
wazuh-indexer 15.2% 85.3%
πŸ’Ύ STORAGE 3 / 3
local-zfs zfspool
500 GB / 2.0 TB 25.0%
ceph-pool rbd
3.2 TB / 5.0 TB 64.0%
backup-nfs nfs
8.5 TB / 10.0 TB 85.0%
Legend
Running / Online
Stopped
23.5% CPU (blue)
67.2% RAM (purple)
Normal (<70%) Warning (70-90%) Critical (>90%)

Data Sources

scheduler.py poll_proxmox() runs every 60 seconds
GET /api2/json/nodes β†’ Node health (cached in CACHE_PROXMOX)
GET /api2/json/nodes/{node}/qemu β†’ VMs per node (cached in CACHE_PROXMOX_VMS)
GET /api2/json/nodes/{node}/lxc β†’ Containers per node (cached)
GET /api2/json/nodes/{node}/storage β†’ Storage pools (fetched on-demand)

Node Matching Logic

The backend uses multiple strategies to match clicked nodes flexibly:

1 Exact match "proxmox1" β†’ matches node "proxmox1"
2 Normalized "Proxmox 1" β†’ matches "proxmox1" (ignores spaces, case)
3 Partial "host20" β†’ matches node in instance "pve-netlab"
4 Reverse Key "pve:host20" contains input "host20"
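The four strategies above can be sketched in Python roughly as follows. This is illustrative, not the actual vms.py code (which also matches against the Proxmox instance name); the helper names are assumptions:

```python
def normalize(s):
    """Lowercase and strip everything non-alphanumeric,
    so 'Proxmox 1' and 'proxmox1' compare equal."""
    return "".join(ch for ch in s.lower() if ch.isalnum())

def match_node(user_input, cached_keys):
    """Try exact, normalized, then partial/reverse containment,
    in that order; return the first matching cache key."""
    norm = normalize(user_input)
    for key in cached_keys:              # 1. exact  2. normalized
        if key == user_input or normalize(key) == norm:
            return key
    for key in cached_keys:              # 3. partial  4. reverse containment
        if norm in normalize(key) or normalize(key) in norm:
            return key
    return None
```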

Technical Details

API Auth PVEAPIToken=user@realm!tokenid=secret header
CPU values 0.0-1.0 from API, multiply by 100 for display %
Memory Bytes from API, converted with formatBytes()
Polling 60 seconds (scheduler), 30 seconds (panel refresh)
Storage Fetched on-demand when panel opens (always fresh)
Node matching Multiple strategies for flexible matching

API Endpoints

GET /api/vms All running VMs/LXCs with summary stats
GET /api/vms/summary Just the counts (total_running, total_qemu, total_lxc)
GET /api/vms/node/{node_name} Full node detail with VMs, LXCs, storage (used by panel)

Feature Deep Dive: L2/L3 Topology Toggle

Toggles between two fundamentally different network visualizations: L2 Physical View shows CDP/LLDP discovered switch connections and device clusters, while L3 Logical View groups devices by VLAN membership with gateway routing relationships.

πŸ”€ Use Case: Dual Perspectives for Network Engineers

Network engineers need to see the network from two perspectives. The Physical (L2) view shows "what's plugged into what": actual cable connections discovered via CDP/LLDP. The Logical (L3) view shows "what talks to what": devices grouped by their VLAN assignments with inter-VLAN gateways highlighted.

Architecture

VIEW MODE TOGGLE
User clicks "Physical (L2)" or "Logical (L3)"
L2 Physical
TopologyCanvas.tsx
↓
Uses cached topology data from WebSocket updates
↓
topologyToNodes() / topologyToEdges()
↓
Creates:
  • ClusterNode for each cluster
  • DeviceNode when expanded
  • ExternalNode for WAN links
  • PhysicalLinkEdge for CDP/LLDP
↓
ReactFlow renders L2 topology with clusters, devices, and CDP/LLDP connections
L3 Logical
fetchL3Topology()
↓
GET /api/topology/l3
↓
BACKEND: get_l3_topology()
↓
Aggregator:
  • Load VLANs from Redis
  • Map LibreNMS IDs β†’ topology IDs
  • Group devices by VLAN
  • Identify gateway devices
↓
l3TopologyToNodes() / l3TopologyToEdges()
↓
ReactFlow renders L3 topology with VLAN groups and routing edges

Visual Layout

L2 Physical View (Cyan theme)
Physical (L2) Logical (L3) Visualize Export Reset
Core Network
cat-1 cat-2 cat-3
Firewalls
pa-3410
Access Layer
☁ Internet WAN
LINK COLORS
Healthy (Speedtest OK) Active Link Degraded / Warning Down Dumb switch (no CDP)
L3 Logical View (Purple theme)
Physical (L2) Logical (L3) Reset Layout
πŸ”² VLAN 10 GW
Management
4 devices
GW
πŸ”² VLAN 20 GW
Servers
5 devices
πŸ”² VLAN 100
Student Labs
12 devices
GW
πŸ”² VLAN 200
Faculty
3 devices
VLAN FILTER
β˜‘ VLAN 10 β˜‘ VLAN 20 ☐ VLAN 100 ☐ VLAN 200 [Clear]

VlanGroupNode Component

πŸ”² VLAN 10 GW
Management Network
up (green) down (red) gateway (purple ring)

Gateway Detection Algorithm

1 Build device-to-VLAN mapping For each VLAN membership: device_vlans[device_id].add(vlan_id)
2 Identify gateways IF len(device_vlans[device]) > 1: gateway_devices.append(device)
3 Mark gateways in VLAN groups device.is_gateway = device_id in gateway_devices
β†’
βœ“ Core switches are gateways (in VLAN 1, 10, 20, 100, etc.)
βœ“ Access devices are NOT gateways (only in one VLAN)
βœ“ L3 edges connect VLANs that share gateway devices
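The detection steps above condense to a few lines; here is a minimal Python sketch (illustrative only; the real aggregator works on Pydantic models rather than bare tuples):

```python
from collections import defaultdict

def detect_gateways(memberships):
    """memberships: iterable of (device_id, vlan_id) pairs.
    Any device seen in more than one VLAN is flagged as a gateway."""
    device_vlans = defaultdict(set)
    for device_id, vlan_id in memberships:
        device_vlans[device_id].add(vlan_id)
    return [d for d, vlans in device_vlans.items() if len(vlans) > 1]
```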

L3 Edge Connection Algorithm

1 Build gateway-to-VLANs map gatewayToVlans[gateway_device].append(vlan_id)
2 Create edges between VLANs sharing gateways For each pair (vlan_i, vlan_j): Create edge vlan-i ↔ vlan-j
Example:
Gateway "cat-1" is in VLANs [10, 20, 100]
Creates edges: VLAN 10 ↔ VLAN 20, VLAN 10 ↔ VLAN 100, VLAN 20 ↔ VLAN 100
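A Python sketch of the pairing step (illustrative; the real code is TypeScript and emits ReactFlow edge objects rather than tuples):

```python
from itertools import combinations

def build_l3_edges(gateway_to_vlans):
    """Connect every pair of VLANs that share a gateway, deduplicating
    pairs that appear under more than one gateway."""
    edges = set()
    for vlans in gateway_to_vlans.values():
        for a, b in combinations(sorted(set(vlans)), 2):
            edges.add((a, b))
    return sorted(edges)
```

For the cat-1 example above, a gateway in VLANs [10, 20, 100] yields exactly the three edges listed.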

Data Models

🐍 Backend Models (vlan.py)
class Vlan:
    vlan_id: int           # e.g., 10, 20, 100
    vlan_name: str | None  # e.g., "Management"
    device_count: int

class VlanMembership:
    device_id: str         # Topology device ID
    librenms_device_id: int
    port_name: str | None  # e.g., "Gi1/0/24"
    vlan_id: int
    is_untagged: bool

class L3TopologyNode:
    device_id: str
    display_name: str
    status: str            # "up", "down", "unknown"
    is_gateway: bool       # True if multi-VLAN
    vlan_ids: list[int]

class L3TopologyVlanGroup:
    vlan_id: int
    vlan_name: str | None
    devices: list[L3TopologyNode]
    gateway_devices: list[str]
πŸ“˜ Frontend Types (vlan.ts)
type ViewMode = 'l2' | 'l3'

interface L3Topology {
  vlans: Vlan[]
  memberships: VlanMembership[]
  vlan_groups: L3TopologyVlanGroup[]
  gateway_devices: string[]
}

// Zustand Store (nocStore.ts)
interface NocState {
  viewMode: ViewMode
  l3Topology: L3Topology | null
  selectedVlans: Set<number>

  setViewMode: (mode: ViewMode) => void
  setL3Topology: (t: L3Topology | null) => void
  toggleVlanFilter: (id: number) => void
  clearVlanFilter: () => void
}

Data Flow

πŸ“‘ LIBRENMS API
GET /resources/vlans Returns all VLANs with device_id mappings
β–Ό
⏱️ SCHEDULER (poll_vlans)
1 Runs every 300 seconds
2 Stores CACHE_VLANS (unique VLANs)
3 Stores CACHE_VLAN_MEMBERSHIPS (port→VLAN)
β–Ό
GET /api/topology/l3
β–Ό
πŸ”§ AGGREGATOR (get_l3_topology)
1 Map LibreNMS IDs to topology IDs
2 Group devices by VLAN
3 Detect gateways (multi-VLAN devices)
4 Build vlan_groups with nodes
β–Ό
πŸ“‹ API RESPONSE
{
  "vlans": [{ "vlan_id": 10, "vlan_name": "Management", "device_count": 8 }],
  "memberships": [{ "device_id": "cat-1", "vlan_id": 10, "port_name": "Gi1/0/1" }],
  "vlan_groups": [{
    "vlan_id": 10,
    "vlan_name": "Management",
    "devices": [{ "device_id": "cat-1", "is_gateway": true, "status": "up" }],
    "gateway_devices": ["cat-1", "cat-2"]
  }],
  "gateway_devices": ["cat-1", "cat-2", "cat-3"]
}
β–Ό
βš›οΈ FRONTEND TRANSFORM
1 l3TopologyToNodes() creates VlanGroupNode per VLAN
2 Grid layout: columns = √(vlan_count)
3 l3TopologyToEdges() connects shared-gateway VLANs
4 Purple dashed edges for routing relationships

Component Hierarchy

TopologyCanvas.tsx
β”œβ”€β”€ View Mode Toggle
β”‚ β”œβ”€β”€ "Physical (L2)" button [Cyan when active]
β”‚ └── "Logical (L3)" button [Purple when active]
β”œβ”€β”€ L2 Mode (viewMode === 'l2')
β”‚ β”œβ”€β”€ ClusterNode[] - device clusters with status dots
β”‚ β”œβ”€β”€ DeviceNode[] - expanded cluster devices
β”‚ β”œβ”€β”€ ExternalNode[] - WAN links
β”‚ β”œβ”€β”€ PhysicalLinkEdge[] - CDP/LLDP connections
β”‚ └── Mermaid export buttons
└── L3 Mode (viewMode === 'l3')
    β”œβ”€β”€ VlanGroupNode[] - VLAN boxes with device grids
    β”œβ”€β”€ L3 Edges - purple dashed gateway connections
    └── VLAN Filter Sidebar - checkbox chips

Technical Details

VLAN polling 300s interval (same as topology)
Gateway detection Devices in >1 VLAN automatically flagged
L3 node layout Grid with √(count) columns
L3 edge creation Connect VLANs sharing gateway devices
View persistence LocalStorage for node positions per view mode
VLAN filter Clickable chips, filters displayed groups
Theme colors Cyan = L2/Physical, Purple = L3/Logical
Loading state "Loading L3 topology..." while fetching

UI States

State Display
L2 active Cyan toggle button, cluster/device/link view
L3 active Purple toggle button, VLAN group view
L3 loading Centered "Loading L3 topology..." spinner
VLAN filtered Only selected VLANs visible, Clear button enabled
No filter All VLANs displayed in grid

Tech Stack

Data Source LibreNMS /resources/vlans endpoint
Backend FastAPI, Redis cache
Polling APScheduler (300s interval)
Frontend React, TypeScript, ReactFlow
State Zustand store
Styling Tailwind CSS (purple accent for L3)

Feature Deep Dive: Mermaid Topology Export

Generates a Mermaid flowchart diagram from the live network topology data, allowing users to either view an interactive visualization in a modal or download a .mmd file for use in documentation, wikis, or presentations.

πŸ“ Use Case: Portable Network Documentation

NOC operators and network engineers need to document or share the network topology without giving direct dashboard access. The Mermaid export creates portable diagram code that renders in GitHub READMEs, Confluence, Notion, Obsidian, and any Mermaid-compatible tool. The visualize modal provides immediate in-app viewing with pan/zoom controls.

Architecture

USER INTERACTION
User clicks "Visualize" or "Export Mermaid"
Visualize
handleVisualizeMermaid()
↓
Steps:
  • Get topology from store
  • generateMermaidDiagram(topology)
  • setMermaidDiagram(content)
  • setShowMermaidModal(true)
↓
MermaidModal.tsx
↓
Render:
  • mermaid.render('mermaid-diagram', diagram)
  • Parse SVG output
  • Inject into container
  • Pan/zoom controls
↓
Interactive SVG diagram with dark theme styling
Export
handleExportMermaid()
↓
Steps:
  • Get topology from store
  • generateMermaidDiagram(topology)
  • downloadFile(content, filename)
↓
Browser Download API
↓
Downloads "topology-YYYY-MM-DD.mmd"

Visual Layout

Mermaid Generation Algorithm

INPUT: Topology object from store: { clusters, devices, connections, external_links }
1 Initialize flowchart: lines = ['flowchart TB']
2 Create subgraphs for each cluster
  subgraph {clusterId}["{clusterName}"]
  β€’ sanitizedId = sanitizeMermaidId(deviceId)
  β€’ Shape based on device type (hexagon/trapezoid/rectangle)
  β€’ Label includes name + IP in <small> tag
3 Add external endpoint nodes: {externalId}(("{label}")) // circle/stadium shape
4 Add internal connections
  β€’ Deduplicate bidirectional links (sort endpoints, track in Set)
  β€’ With ports: A ---|"Gi0/1 ↔ Gi0/2"| B
  β€’ Without ports: A --- B
5 Add external link connections (dashed): {sourceId} -.-> {targetId} // dashed arrow
OUTPUT: lines.join('\n')
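A condensed Python sketch of the generator's core (illustrative; the real TypeScript version also emits device shapes, HTML labels, and external links):

```python
import re

def sanitize(s):
    """Mirror of sanitizeMermaidId: non-alphanumerics become underscores."""
    return re.sub(r"[^A-Za-z0-9]", "_", s)

def generate_mermaid(clusters, connections):
    """clusters: {cluster_id: [device_id, ...]};
    connections: [(device_a, device_b), ...].
    Builds subgraphs per cluster and deduplicates bidirectional links."""
    lines = ["flowchart TB"]
    for cluster_id, devices in clusters.items():
        lines.append(f'    subgraph {sanitize(cluster_id)}["{cluster_id}"]')
        for d in devices:
            lines.append(f'        {sanitize(d)}["{d}"]')
        lines.append("    end")
    seen = set()
    for a, b in connections:
        key = tuple(sorted((a, b)))        # dedupe A↔B vs B↔A
        if key not in seen:
            seen.add(key)
            lines.append(f"    {sanitize(a)} --- {sanitize(b)}")
    return "\n".join(lines)
```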

Generated Mermaid Syntax

Mermaid Example output
flowchart TB
    subgraph core_network["Core Network"]
        cat_1{{"Core Switch 1<br/><small>10.2.10.1</small>"}}
        cat_2{{"Core Switch 2<br/><small>10.2.10.2</small>"}}
    end

    subgraph firewalls["Firewalls"]
        pa_3410[/"PA-3410<br/><small>10.2.60.3</small>"\]
    end

    ext_Internet(("Internet"))

    cat_1 ---|"Gi0/1 ↔ Gi0/1"| cat_2
    cat_1 ---|"Te1/0/1 ↔ eth1"| pa_3410
    pa_3410 -.-> ext_Internet

Device Shape Mapping

Device Type Mermaid Shape Syntax Visual
Switch Hexagon {{"label"}} ⬑
Firewall Trapezoid [/"label"\] β—’
Server/Other Rectangle ["label"] β–­
External Stadium/Circle (("label")) ●

Connection Style Mapping

Connection Type Mermaid Syntax Visual
Physical Link A --- B ── (solid)
Physical + Ports A ---|"Gi0/1 ↔ Gi0/2"| B ── with label
External/WAN A -.-> B β”„β”„ (dashed arrow)

ID Sanitization

function sanitizeMermaidId(id: string): string {
  // Replace anything that isn't alphanumeric with an underscore
  // so the result is safe to use as a Mermaid node identifier
  return id.replace(/[^a-zA-Z0-9]/g, '_')
}

// Examples:
// "cat-1.corp.local" β†’ "cat_1_corp_local"
// "Core Switch #1"   β†’ "Core_Switch__1"
// "10.2.10.1"        β†’ "10_2_10_1"

MermaidModal Component

MermaidModal.tsx
β”œβ”€β”€ Mermaid Initialization (dark theme)
β”‚ β”œβ”€β”€ primaryColor: #3b82f6 (blue)
β”‚ β”œβ”€β”€ background: #0d1117 (dark)
β”‚ β”œβ”€β”€ clusterBkg: #161b22
β”‚ └── fontFamily: monospace
β”œβ”€β”€ State
β”‚ β”œβ”€β”€ scale: number (0.1 - 20)
β”‚ β”œβ”€β”€ position: { x, y }
β”‚ β”œβ”€β”€ isDragging: boolean
β”‚ └── error: string | null
β”œβ”€β”€ Render Flow
β”‚ β”œβ”€β”€ mermaid.render('mermaid-diagram', diagram)
β”‚ β”œβ”€β”€ DOMParser.parseFromString(svg, 'text/html')
β”‚ └── NOTE: Uses 'text/html' not 'image/svg+xml' (Mermaid SVG contains HTML)
β”œβ”€β”€ Controls
β”‚ β”œβ”€β”€ Zoom: [ βˆ’ ] [100%] [ + ]
β”‚ β”œβ”€β”€ Reset: [ β€’ ]
β”‚ └── Close: [ βœ• ] or Escape key
└── Interactions
    β”œβ”€β”€ Drag to pan (left mouse button)
    β”œβ”€β”€ Scroll wheel to zoom
    └── Escape key to close

Download Function

function downloadFile(content: string, filename: string) {
  // 1. Create Blob from content
  const blob = new Blob([content], { type: 'text/plain' })

  // 2. Create object URL
  const url = URL.createObjectURL(blob)

  // 3. Create temporary anchor element
  const a = document.createElement('a')
  a.href = url
  a.download = filename  // e.g., "topology-2026-01-27.mmd"

  // 4. Trigger download
  document.body.appendChild(a)
  a.click()

  // 5. Cleanup
  document.body.removeChild(a)
  URL.revokeObjectURL(url)
}

Mermaid Theme Configuration

Colors
primaryColor: '#3b82f6' Nodes
primaryTextColor: '#e5e7eb' Text
lineColor: '#6b7280' Edges
Backgrounds
background: '#0d1117' GitHub dark
clusterBkg: '#161b22' Subgraphs
Flowchart Options
htmlLabels: true Allow <br/>, <small>
curve: 'basis' Smooth edges
nodeSpacing: 50
rankSpacing: 80

Technical Details

Diagram type flowchart TB (top-to-bottom)
Cluster rendering Mermaid subgraph blocks
Device labels HTML with <br/> and <small> for IP
ID sanitization Replace non-alphanumeric with underscore
Deduplication Sort endpoints, join with --, track in Set
SVG parsing Use text/html parser (not XML) for HTML labels
Zoom range 10% - 2000%
Export filename topology-{YYYY-MM-DD}.mmd
Modal size 90vw Γ— 85vh

UI States

State Display
L2 mode Visualize + Export buttons visible
L3 mode Buttons hidden (Mermaid only for L2)
Modal open Dark overlay, centered diagram
Rendering Mermaid processes diagram async
Error Red warning icon + error message
Dragging Cursor changes to grabbing

Tech Stack

Diagram Library Mermaid.js (v10+)
SVG Rendering Browser DOMParser
Pan/Zoom React useState + CSS transform
Download Blob API + Object URL
Styling Tailwind CSS (dark theme)

File Output Format

πŸ“„ topology-2026-01-27.mmd
flowchart TB
    subgraph core_network["Core Network"]
        cat_1{{"Core Switch 1<br/><small>10.2.10.1</small>"}}
        cat_2{{"Core Switch 2<br/><small>10.2.10.2</small>"}}
    end

    subgraph access_layer["Access Layer"]
        sw_bldg1{{"Building 1 Switch<br/><small>10.2.20.1</small>"}}
    end

    ext_Internet(("Internet"))

    cat_1 ---|"Te1/0/1 ↔ Te1/0/1"| cat_2
    cat_1 --- sw_bldg1
    cat_1 -.-> ext_Internet

Feature Deep Dive: Alert System

Real-time alerting system that derives alerts from device status and LibreNMS monitoring data. Features a multi-tier notification approach: bell icon with count badge, toast notifications, and full-screen critical overlay for urgent incidents requiring immediate attention.

πŸ”” Use Case: Escalating Notification Levels

NOC operators need immediate visibility when network devices go down. The alert system provides escalating notification levels: a subtle count badge for awareness, toast popups for new events, and an unmissable full-screen overlay with audio for critical outages. Alerts auto-clear when devices recover, with optional acknowledgment to track incident response.

Architecture

DATA SOURCES
πŸ“Š
Device Status status === 'down' β†’ Alert
πŸ“‘
LibreNMS Alerts Cached alert rules β†’ Alert
β–Ό
🐍 BACKEND (alerts.py)
_get_device_down_alerts() _get_librenms_alerts()
Combine + Sort by timestamp
In-memory tracking: _acknowledged_alerts: set[str]
β–Ό
GET /api/alerts
β–Ό
FRONTEND
useAlerts() hook
alertStore.ts settingsStore.ts
πŸ”” Header.tsx Bell Icon + Badge
πŸ“‹ ToastContainer Toast Stack (max 5)
🚨 CriticalOverlay Full Screen + Audio
βš™οΈ Settings Sound/Volume

Visual Layout

Header Bell Icon
WATCHTOWER
πŸ”” 3
βš™οΈ ☰
Red pulsing badge = critical alerts
Toast Notifications (top-right)
⚠️
core-sw-1
Device unreachable
⚠️
firewall-1
High CPU utilization
βœ“
ap-lobby
Device recovered
+2 more alerts
Critical Overlay (full-screen)
⚠️ CRITICAL ALERT core-sw-1 Device unreachable 10:42:15 AM
Acknowledge View Device
πŸ”΄ Browser tab flashing πŸ”Š Alert sound plays πŸ“’ Browser notification (if hidden)

Data Models

🐍 Backend Models (alert.py)
class AlertSeverity(str, Enum):
    CRITICAL = "critical"   # Device down
    WARNING = "warning"     # Degraded
    INFO = "info"           # Informational
    RECOVERY = "recovery"   # Device back up

class AlertStatus(str, Enum):
    ACTIVE = "active"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"

class Alert(BaseModel):
    id: str              # "device-down-core-sw-1"
    device_id: str       # Topology device ID
    severity: AlertSeverity
    message: str
    details: str | None
    status: AlertStatus
    timestamp: datetime
    acknowledged_at: datetime | None
    acknowledged_by: str | None
πŸ“˜ Frontend Types
interface Alert {
  id: string
  device_id: string
  severity: 'critical' | 'warning' | 'info' | 'recovery'
  message: string
  timestamp: string
  status: 'active' | 'acknowledged' | 'resolved'
}

interface Toast {
  id: string        // "toast-{alertId}-{timestamp}"
  alert: Alert
  dismissed: boolean
}

// alertStore.ts
interface AlertState {
  alerts: Alert[]
  toasts: Toast[]
  criticalOverlay: Alert | null
  // + action methods
}

Alert Generation Flow

⏱️ POLLING (every 30s)
1 Check device status from topology
2 Load LibreNMS alerts from cache
β–Ό
πŸ” ALERT DETECTION
device.status === 'down' β†’ Generate CRITICAL alert
id: "device-down-{device_id}" message: "Device unreachable: {name}"
β–Ό
πŸ”„ COMBINE & DEDUPLICATE
Sort by timestamp (newest first) Return to frontend via /api/alerts
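The device-down branch of the detection step can be sketched as follows. Illustrative Python only; the real _get_device_down_alerts() in alerts.py builds Pydantic Alert models, but the stable ID scheme is the documented one, so repeated polls regenerate (rather than duplicate) the same alert:

```python
from datetime import datetime, timezone

def device_down_alerts(devices):
    """Derive a CRITICAL alert for every device whose status is 'down',
    using the 'device-down-{device_id}' ID scheme described above."""
    return [
        {
            "id": f"device-down-{d['device_id']}",
            "device_id": d["device_id"],
            "severity": "critical",
            "message": f"Device unreachable: {d['name']}",
            "status": "active",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        for d in devices
        if d.get("status") == "down"
    ]
```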

Acknowledgment System

Frontend
POST /api/alert/{id}/acknowledge
β†’
Backend
_acknowledged_alerts.add(alert_id)
Response
{"status": "acknowledged"}
←
Store
alertStore.acknowledgeAlert(id)
ℹ️ Acknowledgments stored in memory; reset on service restart. Auto-clears when device recovers.

Component Hierarchy

Layout.tsx
β”œβ”€β”€ Header.tsx β†’ Bell Icon Button
β”‚ β”œβ”€β”€ SVG bell icon
β”‚ └── Badge (count + color): Red+pulse=critical, Amber=warnings
β”œβ”€β”€ ToastContainer.tsx β†’ Toast[] (max 5)
β”‚ β”œβ”€β”€ Severity icon (⚠️ βœ“ ℹ️)
β”‚ β”œβ”€β”€ StatusDot (pulsing if critical)
β”‚ β”œβ”€β”€ Device ID + Message + Time
β”‚ └── Actions: [Acknowledge] [Dismiss]
└── CriticalOverlay.tsx
    β”œβ”€β”€ Dark backdrop (blur + red glow)
    β”œβ”€β”€ Modal: icon, header, device, buttons
    └── Effects: playAlertSound(), flash title, browser notification

Alert Lifecycle

DEVICE GOES DOWN ↓
Alert: status=ACTIVE
β†’ Bell badge count increases β†’ Toast appears (top-right) β†’ Critical overlay (if enabled)
User clicks [Acknowledge] ↓
status=ACKNOWLEDGED
β†’ Badge remains (still active) β†’ Toast can be dismissed β†’ Overlay closes
Device recovers (status β†’ 'up') ↓
Alert removed from list
β†’ Badge count decreases (or disappears)

Severity Styling

Severity Color Icon Badge
Critical #f85149 ⚠️ Triangle Red + pulse
Warning #d29922 ⚠️ Triangle Amber
Info #58a6ff ℹ️ Circle -
Recovery #3fb950 βœ“ Check -

Critical Overlay Features

Backdrop bg-black/80 backdrop-blur-sm
Red glow box-shadow: inset 0 0 100px 20px rgba(248,81,73,0.3)
Pulsing animation: pulse 2s ease-in-out infinite
Audio new Audio('/alert-sound.mp3').play()
Title flash Toggle a "πŸ”΄ CRITICAL" prefix on the page title every 1s
Notification new Notification('Watchtower Critical Alert', {options})

API Endpoints

GET /api/alerts List all alerts (sorted by time)
GET /api/alerts?status=active Filter by status
GET /api/alert/{id} Get single alert details
POST /api/alert/{id}/acknowledge Mark as acknowledged
POST /api/alert/{id}/resolve Mark as resolved

Technical Details

Polling interval 30s (alerts refresh)
Max toasts 5 visible (+N more indicator)
Toast animation 300ms slide + fade
Auto-dismiss Configurable per-severity
Device down ID device-down-{device_id}
LibreNMS ID librenms-{alert_id}
Auto-clear When device status returns to 'up'

Roadmap: Remote Notifications

πŸš€ Planned: Phase 9

Remote notification support to alert operators even when away from the dashboard. Critical alerts will trigger notifications through multiple channels for redundancy.

πŸ’¬
Discord Webhooks Post alerts to dedicated NOC channel with severity-based formatting
πŸ“§
Email Alerts SMTP integration for critical alerts with device details and quick links
πŸ“ˆ
Escalation Rules Time-based escalation if alerts remain unacknowledged

Tech Stack

Backend FastAPI, Pydantic models
Alert Sources Device status + LibreNMS cache
Acknowledgment In-memory set (service lifetime)
Frontend State Zustand (alertStore, settingsStore)
Notifications Browser Notification API, Web Audio API
Styling Tailwind CSS (status colors)

Key Technical Decisions

WebSocket over polling Reduces latency and server load for real-time updates
Redis caching Fast in-memory access for frequently-read topology data
Multi-source aggregation No single monitoring tool has complete network visibility
ReactFlow for topology Handles complex graph layouts with built-in pan/zoom/interactions
Zustand over Redux Simpler state management for medium complexity app
APScheduler Reliable job scheduling with configurable intervals per data type
YAML configuration Human-readable, easy to version control (minus secrets)
TypeScript throughout Type safety for complex nested data models

API Surface

REST Endpoints
/api/topology /api/topology/l3 /api/alerts /api/vms /api/vms/node/{name} /api/speedtest /api/port-groups /api/port-groups/export /api/ports/search /api/ports/aliases
WebSocket
/ws/topology Pushes on cache changes

Development Roadmap

Phase 1 βœ“
Core Dashboard
  • FastAPI backend
  • React frontend
  • Basic topology
Phase 2 βœ“
Dual-View Topology
  • L2/L3 toggle
  • VLAN filtering
  • Cluster nodes
Phase 3 βœ“
Integrations
  • LibreNMS polling
  • Proxmox panel
  • Netdisco data
Phase 4 βœ“
Auto-Discovery
  • CDP/LLDP links
  • Dynamic topology
  • Collision detection
Phase 5 βœ“
Monitoring Widgets
  • Speedtest
  • Port groups
  • Port search
Phase 6 βœ“
Export & Polish
  • Mermaid export
  • CSV logging
  • Collapsible widgets
Phase 7 β—‹
Authentication
  • JWT auth
  • User sessions
  • Role-based access
Phase 8 β—‹
Settings UI
  • Config editor
  • Polling controls
  • Theme options
Phase 9 β—‹
Remote Notifications
  • Discord webhooks
  • Email alerts
  • Escalation rules

Current Status

Core functionality deployed and operational
  • βœ“ FastAPI backend with Redis caching
  • βœ“ React frontend with ReactFlow topology
  • βœ“ LibreNMS + Netdisco + Proxmox integrations
  • βœ“ WebSocket real-time updates
  • βœ“ CDP/LLDP auto-discovery
  • βœ“ L2/L3 topology view toggle
  • βœ“ Cisco-style port grid visualization
  • βœ“ Speedtest widget with CSV export
  • βœ“ Port group traffic monitoring
  • βœ“ Mermaid diagram export
  • β—‹ JWT authentication
  • β—‹ Settings UI