Achieving Sub-Second Latency: The Hexnode Three-Tier Database Topology

In large-scale enterprise environments managing over 500,000 endpoints, the centralized database often becomes the primary performance bottleneck. A traditional monolithic database architecture struggles to simultaneously process massive volumes of device telemetry (“Write” operations) while serving real-time search queries for hundreds of concurrent technicians (“Read” operations).

To address this, Hexnode employs a High-Concurrency Three-Tier Database Topology. This architecture abandons the “one database does it all” model in favor of a specialized, multi-tiered approach. By physically separating the ingestion of device data from the processing of administrative actions, we guarantee sub-second dashboard responsiveness (<300ms latency) and real-time inventory updates, even during peak load scenarios.

The Three-Tier Database Architecture

Our database topology is structured into three distinct layers, each optimized for a specific type of workload. This ensures that a surge in activity in one area does not degrade performance in another.

1. The Primary Write Master (“Source of Truth”)

This layer is the operational heart of the Hexnode deployment. It is dedicated exclusively to recording state changes and maintaining data integrity.

  • Role: The Write Master handles high-velocity “Write” operations. This includes recording device check-ins, logging command statuses (e.g., “Wipe Initiated”), and escrowing sensitive data like LAPS passwords.
  • The “Thundering Herd” Defense: When a network outage occurs and is resolved, hundreds of thousands of devices may attempt to reconnect simultaneously, a phenomenon known as the “Thundering Herd.” The Write Master is engineered to absorb this massive burst of data ingestion.
  • Data Durability: Every transaction is durably recorded in a write-ahead log (WAL) before it is applied. This guarantees 100% data durability (ACID compliance), meaning no command or log is ever lost, even in the event of a sudden hardware failure.
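The write-ahead principle described above can be illustrated with a minimal sketch: the change is appended (and fsync'd) to a durable log before the in-memory state is touched, so a crash can always be recovered by replaying the log. All class and file names here are illustrative, not Hexnode's actual implementation.

```python
import json
import os
import tempfile

class WriteAheadLog:
    """Toy write-ahead log: every transaction is persisted to disk
    before it is applied to the live state."""

    def __init__(self, path):
        self.path = path
        self.state = {}  # in-memory "table": device_id -> status

    def record(self, device_id, status):
        # 1. Append the transaction to the durable log first.
        with open(self.path, "a") as log:
            log.write(json.dumps({"id": device_id, "status": status}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # survive a sudden power loss
        # 2. Only then apply it to the live state.
        self.state[device_id] = status

    def replay(self):
        # After a crash, rebuild state from the log: nothing is lost.
        state = {}
        with open(self.path) as log:
            for line in log:
                entry = json.loads(line)
                state[entry["id"]] = entry["status"]
        return state

path = os.path.join(tempfile.mkdtemp(), "commands.wal")
wal = WriteAheadLog(path)
wal.record("SN-001", "Wipe Initiated")
wal.record("SN-002", "Lock Sent")
assert wal.replay() == wal.state  # replayed log matches live state
```

Because the log write completes before the state mutation, a failure between the two steps loses nothing: the replay simply re-applies the logged transaction.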

2. Horizontal Sharding (Intelligent Workload Partitioning)

As an endpoint fleet grows from 50,000 to 500,000, storing all data in a single table becomes inefficient and risky. Hexnode mitigates this through Horizontal Sharding.

  • The Logic: Instead of a single massive database, device records are logically partitioned across multiple physical database nodes.
  • Eliminating “Lock Contention”: In a monolithic database, if an admin updates an app for 50,000 devices, the database might “lock” the table, forcing other admins to wait. With sharding, workloads are isolated. An update running on the “North America” shard has absolutely no impact on the performance of the “Europe” shard. This ensures that administrative actions remain fast and fluid, regardless of what other teams are doing.
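Shard routing comes down to a deterministic function from a record's key to a physical node. The document's example partitions by region; the sketch below uses hash-based partitioning, a common alternative scheme, with hypothetical shard names.

```python
import hashlib

SHARDS = ["shard-na", "shard-eu", "shard-apac"]  # hypothetical node names

def shard_for(device_id: str) -> str:
    """Consistently map a device record to one physical shard."""
    digest = hashlib.sha256(device_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# The same device always routes to the same shard...
assert shard_for("SN-12345") == shard_for("SN-12345")

# ...while the fleet as a whole spreads across all nodes, so a bulk
# update on one shard never locks tables on the others.
buckets = {shard_for(f"SN-{i:05d}") for i in range(1000)}
assert buckets == set(SHARDS)
```

Because each shard holds a disjoint slice of the fleet, a long-running bulk operation only ever acquires locks on its own node.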

3. Dedicated Read-Replicas (The “Technician Fast Lane”)

Administrative work, such as searching for devices, viewing dashboards, and generating reports, consists almost entirely of “Read” operations. To ensure these actions are never slowed down by backend processing, we utilize Dedicated Read-Replicas.

  • Mechanism: We maintain a cluster of 4x Read-Replicas, which are real-time, read-only copies of the Primary Master.
  • The Technician Experience: When you type a serial number into the Global Search bar, your query is routed directly to a Read-Replica. Because this node is not burdened with processing millions of device check-ins, it returns your results instantly.
  • Impact: This separation guarantees a search latency of < 300ms, providing a “zero-lag” experience for technicians, even when the system is processing heavy background tasks.
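Read/write splitting of this kind is typically done by a thin routing layer in front of the database: writes go to the primary, reads round-robin across the replica cluster. The following is a minimal sketch with invented node names, not Hexnode's actual router.

```python
import itertools

class RoutingLayer:
    """Sketch of read/write splitting: state changes hit the primary,
    reads are spread round-robin across the read-only replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, query: str) -> str:
        verb = query.strip().split()[0].upper()
        if verb in ("INSERT", "UPDATE", "DELETE"):
            return self.primary      # writes go to the Write Master
        return next(self._replicas)  # searches go to a Read-Replica

router = RoutingLayer("primary", [f"replica-{i}" for i in range(1, 5)])
assert router.route("UPDATE devices SET status = 'wiped'") == "primary"
assert router.route("SELECT * FROM devices WHERE serial = 'SN-001'") == "replica-1"
```

A technician's search never competes with ingestion traffic because the two workloads physically land on different nodes.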

Performance Benchmarks

This topology is rigorously tested to meet specific performance targets, ensuring operational continuity for large administrative teams.

Metric | Target | Architectural Enabler | Description
Search Latency | < 300ms | Read-Replicas | Time taken to return a specific device record from the search bar.
Command Throughput | 10k+ Ops/Sec | Write Master (WAL) | Number of simultaneous commands (e.g., Lock, Wipe) the system can process per second.
Inventory Refresh | Real-Time | MQTT Streaming | Delay between a device changing state and the dashboard reflecting that change.
Bulk Reporting | < 10 Seconds | Analytical Sharding | Time taken to generate complex reports (e.g., “All Non-Compliant Windows Devices”) for 50k+ records.

Data Governance & Security

At an enterprise scale, performance must not come at the expense of security. Our topology incorporates advanced protection mechanisms for sensitive data.

  • Tenant Isolation: For MSPs and multi-tenant environments, data is strictly siloed at the schema or physical database level. This ensures that a query executing in “Tenant A” is technically incapable of accessing data from “Tenant B,” preventing cross-tenant data leakage.
  • LAPS Escrow Encryption: Highly sensitive credentials, such as Local Admin Passwords (LAPS), are stored in specialized, hardened tables using Field-Level Encryption (FLE).
  • Hardware Security Module (HSM): Highly critical keys and certificates are stored in a physical HSM.
  • High Availability (HA): We employ automatic synchronous replication to a Standby Master node. If the Primary Master fails, the Standby node takes over instantly with Zero Data Loss (RPO = 0), ensuring the portal remains online and operational.
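The tenant-isolation guarantee above rests on binding every session to exactly one tenant's schema or database at connect time, so no query path can reach another tenant's data. A minimal sketch, with entirely hypothetical tenant names and DSNs:

```python
# Hypothetical tenant -> isolated DSN mapping; in a schema-per-tenant
# design each entry points at a separate schema or physical database.
TENANT_DSN = {
    "tenant-a": "postgresql://db-a/tenant_a",
    "tenant-b": "postgresql://db-b/tenant_b",
}

def connection_for(tenant_id: str) -> str:
    """Resolve the single DSN a session may use. There is no API that
    accepts a query plus an arbitrary tenant, so cross-tenant access
    is structurally impossible rather than merely forbidden."""
    try:
        return TENANT_DSN[tenant_id]
    except KeyError:
        raise PermissionError(f"unknown tenant {tenant_id!r}")

assert connection_for("tenant-a") == "postgresql://db-a/tenant_a"
```

The design choice is that isolation is enforced at connection resolution, one layer below the query engine, rather than by per-row filtering that a buggy query could bypass.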

Concurrency Management for 500+ Admins

To support hundreds of technicians working simultaneously without causing database “deadlocks,” we employ sophisticated traffic management strategies.

  • Connection Pooling: Establishing a new connection to a database is resource-intensive. We use advanced middleware to maintain a pool of “warm” connections. This allows thousands of application requests to be served rapidly without the overhead of constantly opening and closing connections.
  • Intelligent Query Throttling: The system automatically categorizes and prioritizes traffic. Critical management commands (e.g., “Remote Wipe,” “Device Lock”) are given the “Fast Lane.” Less critical background tasks (e.g., “Generate Monthly Usage Report”) are throttled during peak hours to ensure they do not impact real-time security operations.
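The connection-pooling idea above can be sketched in a few lines: connections are opened once and recycled, so request volume is decoupled from connection count. This is a toy illustration (the `fake_connect` helper is invented), not the actual pooling middleware.

```python
import queue

class ConnectionPool:
    """Minimal pool of 'warm' connections: requests borrow an already
    open connection instead of paying the connect/teardown cost."""

    def __init__(self, size, connect):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())  # open once, reuse many times

    def acquire(self, timeout=1.0):
        # Blocks (up to timeout) when every connection is checked out,
        # which also caps total load on the database.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

opened = 0
def fake_connect():
    """Hypothetical stand-in for an expensive database handshake."""
    global opened
    opened += 1
    return f"conn-{opened}"

pool = ConnectionPool(size=3, connect=fake_connect)

# 1,000 application requests are served by only 3 real connections.
for _ in range(1000):
    conn = pool.acquire()
    pool.release(conn)
assert opened == 3
```

The bounded queue doubles as a back-pressure mechanism: when all connections are busy, new requests wait briefly instead of piling more load onto the database.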