Designing and Building an Open Source SOC

[+] Status: In Progress [+] Origin: Polk State College [+] Date: 2025.12

>> TECH_STACK:

[Wazuh][TheHive][Cortex][MISP][Zeek][Suricata][Elasticsearch][Docker][Proxmox][Ubuntu][MCP]

A full-stack Security Operations Center built entirely with open-source tools, deployed on virtualized infrastructure. This project provides enterprise-grade threat detection, incident response, and threat intelligence capabilities for monitoring the college network environment.

// Institutional Value Created

By building this SOC in-house with open-source tools, Polk State achieved enterprise-grade security capabilities while avoiding the substantial costs of outsourced alternatives.

Cost Avoidance (3-Year): $140K–$217K vs. outsourced MSSP at $45/endpoint/month

Educational Platform Value: $50K–$150K, equivalent to commercial cyber range capability

Endpoints Monitored: ~200 (Windows, Linux, and network devices)

Integrated Components: 7 VMs (SIEM, IR, threat intel, NIDS)

// Industry Comparison (200 Endpoints)

Basic MSSP (3-year): $324,000 ($45/endpoint/month industry baseline)

Open-Source Build (3-year): $107K–$184K, implied by the cost-avoidance range above (infrastructure + implementation + maintenance)
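
The figures in this comparison can be cross-checked with a few lines of arithmetic. The sketch below assumes that cost avoidance is simply the MSSP baseline minus the in-house cost; that relationship is an assumption for illustration, not something the comparison states explicitly.

```python
# Figures taken from the comparison above (USD).
ENDPOINTS = 200
MSSP_RATE_PER_ENDPOINT_MONTH = 45
MONTHS = 36  # 3 years


def mssp_baseline(endpoints: int, rate: int, months: int) -> int:
    """Total outsourced cost over the period."""
    return endpoints * rate * months


baseline = mssp_baseline(ENDPOINTS, MSSP_RATE_PER_ENDPOINT_MONTH, MONTHS)

# Stated 3-year cost-avoidance range.
avoidance_low, avoidance_high = 140_000, 217_000

# Assumption: avoidance = baseline - in-house cost,
# so the implied in-house cost is baseline - avoidance.
implied_build_low = baseline - avoidance_high
implied_build_high = baseline - avoidance_low

print(f"MSSP baseline (3-year): ${baseline:,}")
print(f"Implied in-house cost: ${implied_build_low:,} to ${implied_build_high:,}")
```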

For context: LSU's statewide SOC serving 25 institutions costs $7.5M annually (~$300K/institution); Cleveland State received $451K in state capital funding for IT security infrastructure; and Houston Community College's SOC/Cyber Range RFP was estimated at $150K–$300K.

🎓 Educational Platform: Production-grade training environment for cybersecurity students; equivalent to $50K–$150K commercial cyber range platforms (CYRIN, Cyberbit, TryHackMe Enterprise)

🔒 Vendor Independence: Zero licensing costs on core components; full ownership of customizations and institutional knowledge

📈 Scalable Architecture: Built for growth; infrastructure supports hundreds of additional endpoints without licensing increases

🏫 Proven Higher Ed Model: Joins 14+ universities nationwide operating student-powered SOCs (LSU, UNLV, UC, Fairfield, Maryville); Florida allocated $20M for state college cybersecurity programs

INFRASTRUCTURE: ⚙ Proxmox VE 9.2.1 hypervisor; ⇄ LACP bond + VLAN bridge vmbr0 (802.3ad); ACL-protected boundary

SOC STACK: Wazuh (SIEM/XDR), TheHive (IR platform), Cortex (analysis), MISP (threat intel), NIDS (network IDS)

ENDPOINTS (monitored networks): Windows, Linux, network devices, cloud services

The SOC stack consists of seven specialized VMs, each serving a distinct security function:

Wazuh Server

OS: Ubuntu 24.04

Software: Wazuh Manager, Filebeat

Role: SIEM Core, Log Aggregation, Alerting

Wazuh Indexer

OS: Ubuntu 24.04

Software: Wazuh Indexer (OpenSearch fork)

Role: Database/Storage for SIEM data

Wazuh Dashboard

OS: Ubuntu 24.04

Software: Wazuh Dashboard (OpenSearch Dashboards fork)

Role: Web UI for SIEM

TheHive 5

OS: Ubuntu 22.04

Software: TheHive, Cassandra, ES

Role: Incident Response (Case Mgmt)

Cortex

OS: Ubuntu 22.04

Software: Cortex, Docker

Role: Observable Analysis Engine

MISP

OS: Ubuntu 24.04

Software: MISP 2.5, MariaDB, Redis

Role: Threat Intelligence Platform

NIDS

OS: Ubuntu 24.04

Software: Suricata, Zeek, Wazuh Agent

Role: Network Intrusion Detection

Data flows between components via authenticated APIs and agent protocols:

💻 Endpoints → TCP/1514 → 🛡 Wazuh Manager

👁 NIDS (Suricata + Zeek) → TCP/1514 → 🛡 Wazuh Manager

🛡 Wazuh Manager → HTTPS/9200 → 🗃 Wazuh Indexer

📈 Dashboard → Query → 🗃 Wazuh Indexer

🛡 Wazuh Manager → Alert Threshold Trigger → 📝 custom-w2thive.py → API/9000 → 🐝 TheHive

🐝 TheHive → API/9001 → 🧠 Cortex

🔎 MISP ↔ Threat Intel Sync ↔ 🐝 TheHive
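
One way to reason about the ACL-protected boundary is to treat these flows as data and derive the inbound ports each component must accept. The sketch below is illustrative only: the flow list restates the ports named in this write-up, and the function name `inbound_ports` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source, destination, protocol/port) as described in this document.
FLOWS: List[Tuple[str, str, str]] = [
    ("Endpoints", "Wazuh Manager", "TCP/1514"),
    ("NIDS", "Wazuh Manager", "TCP/1514"),
    ("Wazuh Manager", "Wazuh Indexer", "HTTPS/9200"),
    ("Wazuh Manager", "TheHive", "API/9000"),
    ("TheHive", "Cortex", "API/9001"),
    ("TheHive", "MISP", "API/443"),
]


def inbound_ports(flows: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group the ports each destination must accept, for drafting allow rules."""
    ports = defaultdict(set)
    for _src, dst, proto in flows:
        ports[dst].add(proto)
    return {dst: sorted(p) for dst, p in ports.items()}


for component, allowed in inbound_ports(FLOWS).items():
    print(f"{component}: allow inbound {', '.join(allowed)}")
```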

01 Endpoint Intrusion Detection

Example: PowerShell-based attack (MITRE T1059.001)

Detection: Wazuh Agent detects suspicious PowerShell execution

Transport: Log sent to Wazuh Manager via TCP/1514

Correlation: Manager matches the event against rules mapped to MITRE ATT&CK

Escalation: High-severity alert pushed to TheHive via API

Investigation: Analyst promotes alert to case, extracts observables

Enrichment: Cortex queries VirusTotal, AbuseIPDB for context

02 Network Threat Detection

Example: Malware Command & Control communication

Capture: NIDS receives mirrored traffic via SPAN port

Analysis: Suricata matches signature, Zeek logs metadata

Ingestion: Wazuh Agent reads JSON logs, ships to Manager

Alerting: Manager generates alert from NIDS input

Intel Check: MISP queried for known IoC correlation

🛡 Wazuh — The Nervous System

Central log collection, correlation, file integrity monitoring, and alert generation.

Ingestion

Agents: Endpoints send logs via TCP/1514 to Manager
NIDS: Zeek/Suricata JSON logs via local Wazuh Agent

Processing

Parses incoming logs with decoders, then evaluates them against rules
Tags matching rules with MITRE ATT&CK technique IDs
Triggers alerts based on rule matches

Storage

Filebeat ships alerts via HTTPS/9200 to Wazuh Indexer
Indexer provides long-term storage and search

🐝 TheHive — The Brain

Central hub for alert management, case creation, and response coordination.

Backend

Cassandra: Stores case data, tasks, and logs
Elasticsearch: Indexing with X-Pack security and thehive_role

Wazuh Integration

custom-w2thive.py integration script on Wazuh Manager
Converts high-severity alerts to JSON, POSTs to TheHive API
Analysts promote alerts to cases for investigation
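
The conversion step can be sketched roughly as follows. This is not the actual custom-w2thive.py; it is a minimal illustration. The URL, API key, `MIN_LEVEL` threshold, severity mapping, and sample alert are all hypothetical, and the payload field names follow the TheHive 5 alert API as I understand it, so they should be checked against the deployed version. The request is built but not sent.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical values: replace with the real TheHive URL and API key.
THEHIVE_URL = "https://thehive.example.internal:9000"
THEHIVE_API_KEY = "REPLACE_ME"
MIN_LEVEL = 10  # assumed cut-off for "high severity" (Wazuh levels run 0-15)


def wazuh_to_thehive(alert: dict) -> dict:
    """Map a Wazuh alert dict to a TheHive alert payload."""
    rule = alert.get("rule", {})
    agent = alert.get("agent", {})
    level = int(rule.get("level", 0))
    # Assumed mapping from Wazuh level to TheHive severity (1=low .. 4=critical).
    severity = 4 if level >= 15 else 3 if level >= 12 else 2 if level >= 7 else 1
    observables = []
    src_ip = alert.get("data", {}).get("srcip")
    if src_ip:
        observables.append({"dataType": "ip", "data": src_ip})
    return {
        "type": "wazuh",
        "source": "wazuh-manager",
        "sourceRef": str(alert.get("id", "")),
        "title": rule.get("description", "Wazuh alert"),
        "description": json.dumps(alert, indent=2),
        "severity": severity,
        "tags": ["wazuh", f"rule={rule.get('id', '')}", f"agent={agent.get('name', '')}"],
        "observables": observables,
    }


def build_request(alert: dict) -> Optional[urllib.request.Request]:
    """Return a POST request for alerts at or above MIN_LEVEL, else None."""
    if int(alert.get("rule", {}).get("level", 0)) < MIN_LEVEL:
        return None
    body = json.dumps(wazuh_to_thehive(alert)).encode("utf-8")
    return urllib.request.Request(
        f"{THEHIVE_URL}/api/v1/alert",
        data=body,
        headers={
            "Authorization": f"Bearer {THEHIVE_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Made-up sample alert; the rule ID and host name are placeholders.
sample = {
    "id": "1700000000.12345",
    "rule": {"id": "100001", "level": 12, "description": "Suspicious PowerShell execution"},
    "agent": {"name": "win-lab-01"},
    "data": {"srcip": "203.0.113.7"},
}
request = build_request(sample)
print(request.full_url if request else "below threshold, not forwarded")
```

Sending the request would be a call to `urllib.request.urlopen(request)`, with TLS verification configured for the environment.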

Cluster Config

Configured as "Cluster of One" using Akka/Pekko
Prevents service shutdown from missing cluster peers

🧠 Cortex — The Muscle

Performs active analysis on observables extracted from cases.

Interaction

TheHive connects via API on port 9001
Receives observables (IPs, hashes, domains)

Analyzers

Python scripts in Docker containers
Queries: VirusTotal, AbuseIPDB, MISP, etc.
Returns enrichment reports to TheHive
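
Submitting an observable to an analyzer might look like the sketch below. The URL, API key, and analyzer ID are placeholders; the `/api/analyzer/<id>/run` path and body fields reflect the Cortex REST API as I understand it and should be verified against the installed version. Only the request is constructed here; nothing is sent.

```python
import json
import urllib.request

# Hypothetical values for illustration.
CORTEX_URL = "https://cortex.example.internal:9001"
CORTEX_API_KEY = "REPLACE_ME"


def build_analyzer_job(analyzer_id: str, data_type: str, value: str, tlp: int = 2) -> urllib.request.Request:
    """Build a POST request asking one analyzer to process one observable."""
    body = {
        "data": value,
        "dataType": data_type,  # e.g. "ip", "hash", "domain"
        "tlp": tlp,
        "message": "Submitted from SOC sketch",
    }
    return urllib.request.Request(
        f"{CORTEX_URL}/api/analyzer/{analyzer_id}/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {CORTEX_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Analyzer IDs are instance-specific; this one is a placeholder.
job = build_analyzer_job("ANALYZER_ID_PLACEHOLDER", "ip", "203.0.113.7")
print(job.get_method(), job.full_url)
```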

🔎 MISP — The Memory

Stores Indicators of Compromise and correlates against threat feeds.

TheHive Integration

Pulls data via API (Port 443) to enrich cases
SSL: ssl.loose.acceptAnyCertificate = true for self-signed certs
Bi-directional: Import events, export case intel
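
A lookup against the MISP API on port 443 could be sketched as follows. The host name and auth key are placeholders, and the helper names are hypothetical. The `/attributes/restSearch` endpoint and the response shape are taken from MISP's REST API as I understand it; the sample response is made up.

```python
import json
import urllib.request
from typing import List

# Hypothetical values for illustration.
MISP_URL = "https://misp.example.internal"
MISP_AUTH_KEY = "REPLACE_ME"


def build_misp_search(value: str) -> urllib.request.Request:
    """Build a restSearch request for attributes matching one value."""
    body = {"returnFormat": "json", "value": value}
    return urllib.request.Request(
        f"{MISP_URL}/attributes/restSearch",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": MISP_AUTH_KEY,
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def matching_event_ids(response: dict) -> List[str]:
    """Return the distinct MISP event IDs that contain a matching attribute."""
    attributes = response.get("response", {}).get("Attribute", [])
    return sorted({a["event_id"] for a in attributes if "event_id" in a})


# Made-up response in the shape restSearch is expected to return.
sample_response = {
    "response": {
        "Attribute": [
            {"event_id": "12", "type": "ip-dst", "value": "203.0.113.7"},
            {"event_id": "7", "type": "ip-src", "value": "203.0.113.7"},
            {"event_id": "12", "type": "ip-src", "value": "203.0.113.7"},
        ]
    }
}
print(matching_event_ids(sample_response))
```

With self-signed certificates, the HTTP client also needs an explicit trust decision, which is the role ssl.loose.acceptAnyCertificate plays on the TheHive side.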

Cortex Integration

MISP Analyzer queries local database
"Have we seen this hash/IP before?"

👁 NIDS — The Eyes

Monitors raw network traffic via mirror port for signatures and anomalies.

Software Stack

Suricata: Signature-based IDS, outputs eve.json
Zeek: Protocol analysis, configured for JSON output (not default TSV)

Integration

Wazuh Agent reads local JSON logs
Ships to Manager for correlation
Enables network-level visibility in SIEM
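
In this deployment the Wazuh Agent does the log reading, but the shape of the records it forwards is easy to see in a few lines. The sketch below filters Suricata EVE records for alerts; the sample lines are made up, and the function name `iter_suricata_alerts` is hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_suricata_alerts(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield a compact summary for each EVE record whose event_type is 'alert'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or corrupt lines
        if event.get("event_type") != "alert":
            continue
        alert = event.get("alert", {})
        yield {
            "timestamp": event.get("timestamp"),
            "src_ip": event.get("src_ip"),
            "dest_ip": event.get("dest_ip"),
            "signature": alert.get("signature"),
            "severity": alert.get("severity"),
        }


# Made-up sample lines in eve.json style (one JSON object per line).
sample_lines = [
    '{"timestamp": "2025-12-01T10:00:00.000000+0000", "event_type": "dns", "src_ip": "10.0.0.5", "dest_ip": "10.0.0.1"}',
    '{"timestamp": "2025-12-01T10:00:01.000000+0000", "event_type": "alert", "src_ip": "10.0.0.5", '
    '"dest_ip": "203.0.113.7", "alert": {"signature": "Example C2 beacon", "severity": 1}}',
    '{"timestamp": "2025-12-01T10:00:02',  # truncated line, ignored
]

alerts = list(iter_suricata_alerts(sample_lines))
for a in alerts:
    print(a["timestamp"], a["src_ip"], "->", a["dest_ip"], a["signature"])
```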

🔌 VLAN Tagging

Tags stripped by Proxmox Bridge (vmbr0) before reaching VM. Guest OS sees standard interface (ens18) — no internal VLAN config needed.

🔒 TheHive + Elasticsearch

TheHive 5 requires ES 8.x with X-Pack security. Must create thehive_role with proper index permissions.

📄 Zeek JSON Output

Default is TSV. Add @load policy/tuning/json-logs.zeek to local.zeek for Wazuh compatibility.

💾 LVM Resizing

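The Ubuntu Server installer's default LVM layout leaves part of the virtual disk unallocated (about 50% here). Run lvextend on the root logical volume (for example with -l +100%FREE), then resize2fs, to utilize the full virtual disk.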

🔗 MISP Auth

Generate authKey in MISP, configure in TheHive's application.conf to bridge systems.

☕ JVM Memory Tuning

TheHive runs on the JVM with an Elasticsearch backend, and both are memory-hungry. A JVM commits its initial heap (-Xms) at startup, can grow up to -Xmx, and often does not return freed heap to the OS. Tune the -Xms/-Xmx flags so the combined heaps perform well without exhausting host RAM.

🐧 Ubuntu 22.04 LTS for TheHive & Cortex

Ubuntu 22.04 LTS ("Jammy Jellyfish") provides a stable, secure foundation for running TheHive and Cortex, particularly in containerized environments. It ships updated dependencies such as OpenSSL 3.0 and receives standard security maintenance until 2027, extendable to 2032 through Ubuntu Pro (ESM).

Optimal Docker Environment: TheHive and Cortex are frequently deployed together using Docker Compose for 24/7 SOCs. Jammy's robust container support makes it ideal for high-load, single-server setups.
Modern Security Requirements: Provides necessary libraries and security hardening, including OpenSSL 3.0 and the 5.15 Linux kernel, which support current Java versions (TheHive) and secure network communication (Cortex analyzers).
Stability & Performance: LTS nature ensures the OS remains stable for years, essential for security infrastructure that cannot afford downtime.
Efficient Resource Handling: Optimized for enterprise-class deployments "from data center to edge," helping manage the resource-intensive nature of TheHive (with its Elasticsearch backend) and Cortex running together.

// Roadmap: AI-Powered SOC Assistant

The next phase of this project is building an AI chatbot that can interact with all SOC components through Model Context Protocol (MCP) servers. This enables natural language queries and actions across the entire security stack.

🤖 Local AI Chatbot (LLM-powered security assistant), connected over MCP to:

wazuh-mcp Wazuh

Query alerts by severity/timeframe
Search indexed events
Get agent status
Retrieve rule information

Endpoint: Wazuh API (55000); alert and event searches are served by the Wazuh Indexer (9200)
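
A "query alerts by severity/timeframe" tool could translate its arguments into a search body along these lines. This sketch assumes alerts are read from the Wazuh Indexer's wazuh-alerts-* indices using the `rule.level` and `timestamp` fields; the function name, default window, and level threshold are hypothetical, and the server is not yet built.

```python
import json


def build_alert_query(min_level: int, window: str = "now-4h", size: int = 50) -> dict:
    """Build a search body for alerts at or above min_level within a time window."""
    return {
        "size": size,
        "sort": [{"timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"range": {"rule.level": {"gte": min_level}}},
                    {"range": {"timestamp": {"gte": window}}},
                ]
            }
        },
    }


# Assumed threshold: treat level 12 and above as the alerts worth surfacing.
query = build_alert_query(min_level=12)
print(json.dumps(query, indent=2))
```

The resulting body would be POSTed to the `_search` endpoint of the alerts index pattern with the indexer's credentials.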

thehive-mcp TheHive

Create/update cases
Add observables
Manage tasks
Search case history

Endpoint: TheHive API (9000)

cortex-mcp Cortex

Run analyzers on observables
Get analysis reports
List available analyzers

Endpoint: Cortex API (9001)

misp-mcp MISP

Search threat intel
Query IoC database
Get event details
Correlate observables

Endpoint: MISP API (443)

// Example Analyst Workflows

💬 "Show me all critical Wazuh alerts from the last 4 hours"

Chatbot → wazuh-mcp → Wazuh API → Returns filtered alerts

💬 "Create a case in TheHive for this suspicious IP and run VirusTotal analysis"

Chatbot → thehive-mcp (create case) → cortex-mcp (run analyzer) → Returns case link + report

💬 "Have we seen this hash before in our threat intel?"

Chatbot → misp-mcp → MISP search → Returns matching events/context
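
The chaining in these workflows can be pictured as a small dispatch table. The sketch below is plain Python, not an MCP implementation: the tool names, the `make_handler` factory, and the stub handlers are all hypothetical and only show how one request might fan out to several servers in order.

```python
from typing import Callable, Dict, List, Tuple


def make_handler(server: str, action: str) -> Callable[[dict], dict]:
    """Create a stub handler; a real one would call the named MCP server."""
    def handler(args: dict) -> dict:
        return {"server": server, "action": action, "args": args}
    return handler


# Hypothetical tool registry, one entry per planned capability.
TOOLS: Dict[str, Callable[[dict], dict]] = {
    "wazuh.search_alerts": make_handler("wazuh-mcp", "search_alerts"),
    "thehive.create_case": make_handler("thehive-mcp", "create_case"),
    "cortex.run_analyzer": make_handler("cortex-mcp", "run_analyzer"),
    "misp.search": make_handler("misp-mcp", "search"),
}


def run_plan(plan: List[Tuple[str, dict]]) -> List[dict]:
    """Execute tool calls in order, failing fast on an unknown tool name."""
    results = []
    for tool_name, args in plan:
        handler = TOOLS.get(tool_name)
        if handler is None:
            raise KeyError(f"unknown tool: {tool_name}")
        results.append(handler(args))
    return results


# The second workflow above, expressed as a two-step plan.
plan = [
    ("thehive.create_case", {"title": "Suspicious IP 203.0.113.7"}),
    ("cortex.run_analyzer", {"dataType": "ip", "data": "203.0.113.7"}),
]
for step in run_plan(plan):
    print(step["server"], step["action"])
```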

⚡ Faster Triage: Natural language queries replace manual dashboard navigation

🔗 Unified Interface: One conversation thread spans all four platforms

📚 Training Tool: Students learn SOC workflows through guided AI interactions

Core infrastructure deployed and operational

✓ Proxmox VE hypervisor configured
✓ Wazuh stack (Manager, Indexer, Dashboard) deployed
✓ TheHive 5 + Cortex integrated
✓ MISP threat intelligence platform online
✓ NIDS (Suricata + Zeek) monitoring network
◯ Integrating additional threat intelligence feeds
◯ Building custom detection rules
◯ Developing response playbooks
◯ Building AI chatbot interface
◯ Developing wazuh-mcp server
◯ Developing thehive-mcp server
◯ Developing cortex-mcp server
◯ Developing misp-mcp server