Designing and Building an Open Source SOC

[+] Status: In Progress [+] Origin: Polk State College [+] Date: 2025.12
>> TECH_STACK:
[Wazuh][TheHive][Cortex][MISP][Zeek][Suricata][Elasticsearch][Docker][Proxmox][Ubuntu][MCP]

A full-stack Security Operations Center built entirely with open-source tools, deployed on virtualized infrastructure. This project provides enterprise-grade threat detection, incident response, and threat intelligence capabilities for monitoring the college network environment.

// Institutional Value Created

By building this SOC in-house with open-source tools, Polk State achieved enterprise-grade security capabilities while avoiding the substantial costs of outsourced alternatives.

  • $140K–$217K: Cost Avoidance (3-Year), vs. outsourced MSSP at $45/endpoint/month
  • $50K–$150K: Educational Platform Value, equivalent to commercial cyber range capability
  • ~200: Endpoints Monitored (Windows, Linux, and network devices)
  • 7 VMs: Integrated Components (SIEM, IR, threat intel, NIDS)

// Industry Comparison (200 Endpoints)

Basic MSSP (3-year) $324,000 $45/endpoint/month industry baseline
vs
Open-Source Build (3-year) $0 Infrastructure + implementation + maintenance
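
The MSSP baseline above follows directly from the stated rate and endpoint count. A minimal sketch of that arithmetic, using only the figures quoted in this section (the function name is illustrative):

```python
# Figures taken from the comparison above.
ENDPOINTS = 200
MSSP_RATE_PER_ENDPOINT_MONTH = 45  # USD
MONTHS = 36  # three years


def mssp_cost(endpoints: int, rate: int, months: int) -> int:
    """Total outsourced cost in USD over the contract period."""
    return endpoints * rate * months


baseline = mssp_cost(ENDPOINTS, MSSP_RATE_PER_ENDPOINT_MONTH, MONTHS)
print(f"${baseline:,}")  # $324,000
```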

For context: LSU's statewide SOC serving 25 institutions costs $7.5M annually (~$300K/institution); Cleveland State received $451K in state capital funding for IT security infrastructure; and Houston Community College's SOC/Cyber Range RFP was estimated at $150K–$300K.

🎓 Educational Platform: Production-grade training environment for cybersecurity students; equivalent to $50K–$150K commercial cyber range platforms (CYRIN, Cyberbit, TryHackMe Enterprise)
🔒 Vendor Independence: Zero licensing costs on core components; full ownership of customizations and institutional knowledge
📈 Scalable Architecture: Built for growth; infrastructure supports hundreds of additional endpoints without licensing increases
🏫 Proven Higher Ed Model: Joins 14+ universities nationwide operating student-powered SOCs (LSU, UNLV, UC, Fairfield, Maryville); Florida allocated $20M for state college cybersecurity programs

INFRASTRUCTURE

  • Proxmox VE 9.2.1 Hypervisor
  • LACP Bond + VLAN Bridge vmbr0 (802.3ad)
  • ACL-Protected Boundary

SOC STACK

  • Wazuh: SIEM/XDR
  • TheHive: IR Platform
  • Cortex: Analysis
  • MISP: Threat Intel
  • NIDS: Network IDS

ENDPOINTS (Monitored Networks)

  • Windows
  • Linux
  • Network Devices
  • Cloud Services

The SOC stack consists of seven specialized VMs, each serving a distinct security function:

Wazuh Server
OS: Ubuntu 24.04
Software: Wazuh Manager, Filebeat
Role: SIEM Core, Log Aggregation, Alerting
Wazuh Indexer
OS: Ubuntu 24.04
Software: Wazuh Indexer (OpenSearch fork)
Role: Database/Storage for SIEM data
Wazuh Dashboard
OS: Ubuntu 24.04
Software: Wazuh Dashboard (OpenSearch Dashboards fork)
Role: Web UI for SIEM
TheHive 5
OS: Ubuntu 22.04
Software: TheHive, Cassandra, ES
Role: Incident Response (Case Mgmt)
Cortex
OS: Ubuntu 22.04
Software: Cortex, Docker
Role: Observable Analysis Engine
MISP
OS: Ubuntu 24.04
Software: MISP 2.5, MariaDB, Redis
Role: Threat Intelligence Platform
NIDS
OS: Ubuntu 24.04
Software: Suricata, Zeek, Wazuh Agent
Role: Network Intrusion Detection

Data flows between components via authenticated APIs and agent protocols:

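  • 💻 Endpoints → TCP/1514 → 🛡 Wazuh Manager
  • 👁 NIDS (Suricata + Zeek) → TCP/1514 → 🛡 Wazuh Manager
  • 🛡 Wazuh Manager → HTTPS/9200 → 🗃 Wazuh Indexer
  • 🗃 Wazuh Indexer ← Query ← 📈 Dashboard
  • 🛡 Wazuh Manager → Alert Threshold Trigger → 📝 custom-w2thive.py → API/9000 → 🐝 TheHive
  • 🐝 TheHive → API/9001 → 🧠 Cortex
  • 🔎 MISP ↔ 🐝 TheHive (Threat Intel Sync)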
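
When bringing the stack up, it helps to confirm that each listening port in the flow above is reachable from the host that needs it. The following is a rough sketch using only the standard library. The ports come from this section, but the hostnames are hypothetical placeholders and the helper names are illustrative.

```python
import socket

# Ports are from the data-flow summary; hostnames are made-up placeholders.
FLOWS = {
    "wazuh-manager (agent traffic)": ("wazuh-manager.example", 1514),
    "wazuh-indexer (HTTPS)": ("wazuh-indexer.example", 9200),
    "thehive (API)": ("thehive.example", 9000),
    "cortex (API)": ("cortex.example", 9001),
    "misp (API)": ("misp.example", 443),
}


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_flows(flows: dict) -> dict:
    """Map each flow name to whether its port accepted a connection."""
    return {name: tcp_reachable(host, port) for name, (host, port) in flows.items()}
```

Calling check_flows(FLOWS) from inside the SOC network returns one True/False per flow. A successful connection only shows that the port is open, not that authentication or TLS is configured correctly.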

01 Endpoint Intrusion Detection

Example: PowerShell-based attack (MITRE T1059.001)

  1. Detection: Wazuh Agent detects suspicious PowerShell execution
  2. Transport: Log sent to Wazuh Manager via TCP/1514
  3. Correlation: Manager correlates against MITRE ATT&CK rules
  4. Escalation: High severity alert pushed to TheHive via API
  5. Investigation: Analyst promotes alert to case, extracts observables
  6. Enrichment: Cortex queries VirusTotal, AbuseIPDB for context

02 Network Threat Detection

Example: Malware Command & Control communication

  1. Capture: NIDS receives mirrored traffic via SPAN port
  2. Analysis: Suricata matches signature, Zeek logs metadata
  3. Ingestion: Wazuh Agent reads JSON logs, ships to Manager
  4. Alerting: Manager generates alert from NIDS input
  5. Intel Check: MISP queried for known IoC correlation

🛡 Wazuh — The Nervous System

Central log collection, correlation, file integrity monitoring, and alert generation.

Ingestion

  • Agents: Endpoints send logs via TCP/1514 to Manager
  • NIDS: Zeek/Suricata JSON logs via local Wazuh Agent

Processing

  • Parses logs with decoders, then evaluates the extracted fields against rules
  • Correlates events against MITRE ATT&CK framework
  • Triggers alerts based on rule matches

Storage

  • Ships data via HTTPS/9200 to Wazuh Indexer
  • Indexer provides long-term storage and search
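
Once alerts are in the Indexer, they can be searched with a standard query body. The sketch below builds one for high-level alerts in a recent time window. It assumes the default wazuh-alerts-* index pattern with rule.level and timestamp fields; the level threshold of 12 is an illustrative choice, not a value from this deployment.

```python
import json


def build_alert_query(min_level: int, hours: int, size: int = 50) -> dict:
    """Build a search body for alerts at or above min_level in the last N hours."""
    return {
        "size": size,
        "sort": [{"timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"range": {"rule.level": {"gte": min_level}}},
                    {"range": {"timestamp": {"gte": f"now-{hours}h"}}},
                ]
            }
        },
    }


# Assumed threshold: level 12 and above, last 4 hours.
query = build_alert_query(min_level=12, hours=4)
print(json.dumps(query, indent=2))
```

The resulting body would be sent to the Indexer's _search endpoint for the alerts index over HTTPS/9200 with valid credentials.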

🐝 TheHive — The Brain

Central hub for alert management, case creation, and response coordination.

Backend

  • Cassandra: Stores case data, tasks, and logs
  • Elasticsearch: Indexing with X-Pack security and thehive_role

Wazuh Integration

  • custom-w2thive.py integration script runs on the Wazuh Manager
  • Converts high-severity alerts to JSON, POSTs to TheHive API
  • Analysts promote alerts to cases for investigation
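
The conversion step can be pictured as follows. This is not the project's custom-w2thive.py, only a minimal sketch of the same idea: filter by rule level, map the level to a TheHive severity, and POST the result to TheHive's alert endpoint. The threshold, the severity mapping, and the tag format are assumptions.

```python
import json
import urllib.request
from typing import Optional

MIN_LEVEL = 10  # assumed escalation threshold; the source does not state one


def to_thehive_severity(level: int) -> int:
    """Map a Wazuh rule level (0-15) to TheHive severity 1-4 (assumed mapping)."""
    if level >= 15:
        return 4
    if level >= 12:
        return 3
    if level >= 7:
        return 2
    return 1


def wazuh_to_thehive_alert(alert: dict) -> Optional[dict]:
    """Return a TheHive alert payload, or None if the alert is below threshold."""
    rule = alert.get("rule", {})
    level = int(rule.get("level", 0))
    if level < MIN_LEVEL:
        return None
    observables = []
    src_ip = alert.get("data", {}).get("srcip")
    if src_ip:
        observables.append({"dataType": "ip", "data": src_ip})
    agent_name = alert.get("agent", {}).get("name", "unknown")
    return {
        "type": "wazuh_alert",
        "source": "wazuh",
        "sourceRef": str(alert.get("id", "")),
        "title": rule.get("description", "Wazuh alert"),
        "description": json.dumps(alert, indent=2),
        "severity": to_thehive_severity(level),
        "tags": ["wazuh", f"rule={rule.get('id', 'unknown')}", f"agent={agent_name}"],
        "observables": observables,
    }


def post_alert(payload: dict, base_url: str, api_key: str) -> int:
    """POST the payload to TheHive (API on port 9000) and return the HTTP status."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/alert",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the payload builder separate from the HTTP call makes the mapping easy to test without a running TheHive instance.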

Cluster Config

  • Configured as "Cluster of One" using Akka/Pekko
  • Prevents service shutdown from missing cluster peers

🧠 Cortex — The Muscle

Performs active analysis on observables extracted from cases.

Interaction

  • TheHive connects via API on port 9001
  • Receives observables (IPs, hashes, domains)

Analyzers

  • Python scripts in Docker containers
  • Queries: VirusTotal, AbuseIPDB, MISP, etc.
  • Returns enrichment reports to TheHive
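
The analyzer contract is essentially "observable in, JSON report out." The sketch below mimics that shape with the standard library only. Real Cortex analyzers are normally built on the cortexutils helper library, and the field names here (dataType, data, summary.taxonomies, full) are modeled on its conventions as an assumption. The LocalList analyzer and its known_bad set are invented for illustration.

```python
SUPPORTED_TYPES = {"ip", "hash", "domain"}


def analyze(job: dict, known_bad: set) -> dict:
    """Return a Cortex-style report for one observable (illustrative shape only)."""
    data_type = job.get("dataType")
    value = job.get("data")
    if data_type not in SUPPORTED_TYPES or not value:
        return {"success": False, "errorMessage": f"unsupported input: {data_type}"}
    listed = value in known_bad
    return {
        "success": True,
        # Short verdict for display next to the observable.
        "summary": {
            "taxonomies": [
                {
                    "level": "malicious" if listed else "safe",
                    "namespace": "LocalList",
                    "predicate": "Match",
                    "value": "hit" if listed else "none",
                }
            ]
        },
        # Full detail for the enrichment report.
        "full": {"observable": value, "dataType": data_type, "listed": listed},
    }


# 203.0.113.7 is a documentation-range address used as a stand-in.
report = analyze({"dataType": "ip", "data": "203.0.113.7"}, known_bad={"203.0.113.7"})
print(report["summary"]["taxonomies"][0]["level"])  # malicious
```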

🔎 MISP — The Memory

Stores Indicators of Compromise and correlates against threat feeds.

TheHive Integration

  • Pulls data via API (Port 443) to enrich cases
  • SSL: ssl.loose.acceptAnyCertificate = true for self-signed certs
  • Bi-directional: Import events, export case intel

Cortex Integration

  • MISP Analyzer queries local database
  • "Have we seen this hash/IP before?"
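
The "have we seen this before?" lookup corresponds to a search against MISP's attribute store. A minimal sketch using MISP's attributes/restSearch endpoint is shown below. The base URL and key are placeholders, the helper names are illustrative, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_misp_search(base_url: str, auth_key: str, value: str) -> urllib.request.Request:
    """Build a restSearch request for attributes matching a hash, IP, or domain."""
    body = {"returnFormat": "json", "value": value}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/attributes/restSearch",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": auth_key,  # MISP authKey, passed as-is
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def count_hits(response_json: dict) -> int:
    """Count matching attributes in a restSearch JSON response."""
    return len(response_json.get("response", {}).get("Attribute", []))


# Placeholder host and key; 203.0.113.7 is a documentation-range address.
request = build_misp_search("https://misp.example", "PLACEHOLDER_KEY", "203.0.113.7")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would perform the search. With a self-signed certificate on the MISP host, the client also needs an SSL context that trusts that certificate.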

👁 NIDS — The Eyes

Monitors raw network traffic via mirror port for signatures and anomalies.

Software Stack

  • Suricata: Signature-based IDS, outputs eve.json
  • Zeek: Protocol analysis, configured for JSON output (not default TSV)

Integration

  • Wazuh Agent reads local JSON logs
  • Ships to Manager for correlation
  • Enables network-level visibility in SIEM
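
Both sensors write one JSON object per line, which is what makes them easy for the agent to pick up. The sketch below shows how a Suricata alert event and a Zeek conn.log entry can be reduced to the same few fields. The sample lines are synthetic, and the normalized record layout is an illustrative choice, not what Wazuh produces internally.

```python
import json
from typing import Optional


def parse_suricata_alert(line: str) -> Optional[dict]:
    """Extract key fields from an eve.json line; ignore non-alert events."""
    event = json.loads(line)
    if event.get("event_type") != "alert":
        return None
    return {
        "sensor": "suricata",
        "src_ip": event.get("src_ip"),
        "dest_ip": event.get("dest_ip"),
        "dest_port": event.get("dest_port"),
        "detail": event.get("alert", {}).get("signature"),
    }


def parse_zeek_conn(line: str) -> dict:
    """Extract key fields from a Zeek conn.log line written as JSON."""
    event = json.loads(line)
    return {
        "sensor": "zeek",
        "src_ip": event.get("id.orig_h"),
        "dest_ip": event.get("id.resp_h"),
        "dest_port": event.get("id.resp_p"),
        "detail": event.get("service") or event.get("proto"),
    }


# Synthetic sample lines using documentation-range addresses.
eve_line = json.dumps({
    "event_type": "alert", "src_ip": "192.0.2.10", "dest_ip": "203.0.113.7",
    "dest_port": 443, "alert": {"signature": "Example C2 signature"},
})
zeek_line = json.dumps({
    "id.orig_h": "192.0.2.10", "id.resp_h": "203.0.113.7",
    "id.resp_p": 443, "proto": "tcp", "service": "ssl",
})

print(parse_suricata_alert(eve_line))
print(parse_zeek_conn(zeek_line))
```

Note that Zeek's JSON output uses flat keys such as id.orig_h, while Suricata nests alert details under the alert key.
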
🔌 VLAN Tagging

Tags stripped by Proxmox Bridge (vmbr0) before reaching VM. Guest OS sees standard interface (ens18) — no internal VLAN config needed.

🔒 TheHive + Elasticsearch

TheHive 5 requires ES 8.x with X-Pack security. Must create thehive_role with proper index permissions.

📄 Zeek JSON Output

Zeek writes TSV logs by default. Add @load policy/tuning/json-logs.zeek to local.zeek so the Wazuh Agent can read the logs as JSON.

💾 LVM Resizing

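The Ubuntu Server installer's default LVM layout typically assigns only about half of the volume group to the root logical volume. Run lvextend on the root LV, then resize2fs, to use the full virtual disk.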

🔗 MISP Auth

Generate authKey in MISP, configure in TheHive's application.conf to bridge systems.

JVM Memory Tuning

TheHive runs on the JVM with an Elasticsearch backend, and both are memory-hungry. A JVM commits its initial heap (-Xms) at startup, can grow to -Xmx, and often does not return that memory to the OS. Set -Xms/-Xmx so the combined heaps leave headroom in host RAM.
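
A quick way to sanity-check heap settings is to add them up against the VM's memory. The sketch below does that with a rough allowance for non-heap JVM memory and the OS. Every number in it (VM size, heap sizes, the 25% overhead factor, the 2 GB OS reserve) is an assumption for illustration, not this deployment's configuration.

```python
def heap_budget(host_ram_mb: int, heaps_mb: dict,
                os_reserve_mb: int = 2048, overhead_factor: float = 1.25):
    """Return (fits, needed_mb) for a set of -Xmx values on one host.

    overhead_factor is a rough allowance for non-heap JVM memory
    (metaspace, thread stacks, direct buffers).
    """
    needed = sum(int(heap * overhead_factor) for heap in heaps_mb.values())
    needed += os_reserve_mb
    return needed <= host_ram_mb, needed


# Hypothetical 16 GB VM running TheHive with its storage backends.
fits, needed = heap_budget(
    host_ram_mb=16384,
    heaps_mb={"thehive": 4096, "elasticsearch": 4096, "cassandra": 2048},
)
print(fits, needed)  # True 14848
```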

🐧 Ubuntu 22.04 LTS for TheHive & Cortex

Ubuntu 22.04 LTS ("Jammy Jellyfish") provides a stable base for running TheHive and Cortex, including in containerized deployments. It ships OpenSSL 3.0 and the 5.15 Linux kernel, with standard support through 2027 that can be extended to 2032 with Ubuntu Pro (ESM).

  • Docker Environment: TheHive and Cortex are frequently deployed together using Docker Compose, and Cortex runs its analyzers as Docker containers. Jammy's container support suits single-server setups.
  • Modern Security Libraries: OpenSSL 3.0 and the 5.15 Linux kernel support current Java runtimes (TheHive) and secure network communication (Cortex analyzers).
  • Stability: The LTS release cycle keeps the OS stable for years, which matters for security infrastructure that cannot afford downtime.
  • Resource Handling: Ubuntu Server is positioned for deployments "from data center to edge," which suits the resource-intensive combination of TheHive (with its Elasticsearch backend) and Cortex.

// Roadmap: AI-Powered SOC Assistant

The next phase of this project is building an AI chatbot that can interact with all SOC components through Model Context Protocol (MCP) servers. This enables natural language queries and actions across the entire security stack.

🤖 Local AI Chatbot (LLM-Powered Security Assistant), connected to each platform through an MCP server:

wazuh-mcp (Wazuh)
  • Query alerts by severity/timeframe
  • Search indexed events
  • Get agent status
  • Retrieve rule information
  Endpoint: Wazuh API (55000)

thehive-mcp (TheHive)
  • Create/update cases
  • Add observables
  • Manage tasks
  • Search case history
  Endpoint: TheHive API (9000)

cortex-mcp (Cortex)
  • Run analyzers on observables
  • Get analysis reports
  • List available analyzers
  Endpoint: Cortex API (9001)

misp-mcp (MISP)
  • Search threat intel
  • Query IoC database
  • Get event details
  • Correlate observables
  Endpoint: MISP API (443)
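
Each MCP server in the list above exposes a set of named tools that the chatbot can call with structured arguments. The sketch below illustrates only that routing idea with a plain registry. It does not use the MCP SDK, the tool names are hypothetical, and the handlers return placeholder data instead of calling the real APIs.

```python
from typing import Callable, Dict

# Registry of tool name -> handler function.
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(name: str):
    """Decorator that registers a handler under a tool name."""
    def register(func: Callable[..., dict]) -> Callable[..., dict]:
        TOOLS[name] = func
        return func
    return register


@tool("wazuh.search_alerts")
def search_alerts(min_level: int, hours: int) -> dict:
    # Placeholder: a real server would query the Wazuh API or Indexer here.
    return {"backend": "wazuh", "min_level": min_level, "hours": hours}


@tool("misp.search_ioc")
def search_ioc(value: str) -> dict:
    # Placeholder: a real server would call the MISP API here.
    return {"backend": "misp", "value": value}


def call_tool(name: str, **arguments) -> dict:
    """Dispatch a tool call by name, as the assistant would after parsing a request."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


print(call_tool("wazuh.search_alerts", min_level=12, hours=4))
```

Keeping one server per platform means each one holds only the credentials for its own API.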

// Example Analyst Workflows

💬 "Show me all critical Wazuh alerts from the last 4 hours"
Chatbot → wazuh-mcp → Wazuh API → Returns filtered alerts
💬 "Create a case in TheHive for this suspicious IP and run VirusTotal analysis"
Chatbot → thehive-mcp (create case) → cortex-mcp (run analyzer) → Returns case link + report
💬 "Have we seen this hash before in our threat intel?"
Chatbot → misp-mcp → MISP search → Returns matching events/context

Faster Triage: Natural language queries replace manual dashboard navigation
🔗 Unified Interface: One conversation thread spans all four platforms
📚 Training Tool: Students learn SOC workflows through guided AI interactions

Core infrastructure deployed and operational
  • Proxmox VE hypervisor configured
  • Wazuh stack (Manager, Indexer, Dashboard) deployed
  • TheHive 5 + Cortex integrated
  • MISP threat intelligence platform online
  • NIDS (Suricata + Zeek) monitoring network

In progress
  • Integrating additional threat intelligence feeds
  • Building custom detection rules
  • Developing response playbooks

Next phase: AI assistant
  • Building AI chatbot interface
  • Developing wazuh-mcp server
  • Developing thehive-mcp server
  • Developing cortex-mcp server
  • Developing misp-mcp server