
Client Overview

A large enterprise messaging services provider required a highly available, secure, and scalable networking infrastructure to support their next-generation messaging platform. The platform needed to reliably handle 50,000 transactions per second (TPS) while serving thousands of enterprise customers across geographical regions.

Business Challenge

The customer’s legacy environment faced several limitations:

  • Scalability constraints within the on-premises datacenter
  • High latency and inefficient routing for cloud workloads
  • Lack of seamless hybrid connectivity between their private datacenter and AWS
  • Need for secure multi-tenant isolation as their customer base grew
  • Strict requirements for 99.99% uptime, security, and audit compliance

The goal was to build a modern, hybrid, high-performance networking environment capable of scaling messaging workloads elastically across on-prem and AWS.

 

Solution Overview

1. Datacenter Network Infrastructure


1.1 High-Availability Physical Network Topology

The datacenter was designed using a leaf–spine architecture to ensure deterministic latency, non-blocking throughput, and seamless scalability across 10 racks.

Key High-Availability Features:

  • Dual Spine Switches

    • All leaf switches uplink to both spine switches using ECMP.
    • Provides automatic traffic rerouting during spine failure. 
  • Redundant Leaf Switch Pairs per Rack

    • Rack servers/storage connect to Leaf-A and Leaf-B.
    • Enables switch-level failover with zero downtime. 
  • Physical Redundancy in Cabling

    • Dual-path fiber/copper cabling across racks.
    • Separated cable trays to reduce risk of physical failure. 
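The spine failover behaviour described above can be sketched as a small hash-based path selection. This is only an illustration under stated assumptions: real ECMP hashing is vendor-specific, and the spine names, addresses, and ports below are hypothetical.

```python
import zlib

# Dual spines, as in the topology above (names are hypothetical).
SPINES = ["spine-1", "spine-2"]


def pick_spine(flow, alive):
    """Map a flow 5-tuple onto one of the surviving spines, ECMP-style.

    Hashing keeps every packet of a flow on the same path; when a spine
    is withdrawn, flows are simply rehashed over the remaining ones.
    """
    if not alive:
        raise RuntimeError("no spine available")
    key = "|".join(str(field) for field in flow).encode()
    return alive[zlib.crc32(key) % len(alive)]


# (src IP, dst IP, protocol, src port, dst port): illustrative values only
flow = ("10.0.1.10", "10.0.7.20", 6, 40000, 5222)

print(pick_spine(flow, SPINES))       # one of the two spines
print(pick_spine(flow, ["spine-2"]))  # spine-1 failed: traffic uses spine-2
```

Because the selection depends only on the flow and the set of live spines, a spine failure moves affected flows without any per-flow state on the leaf.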

1.2 High Availability for Routers, Firewalls & Load Balancers

Edge Routers

  • Dual redundant routers configured with:
    • BGP routing for dynamic failover and load balancing
    • Graceful restart and BFD for rapid convergence
    • Policy-based routing to steer workloads onto optimal paths

Firewalls (Active-Active/Active-Passive)

  • Next-Generation Firewalls deployed in high-availability clusters with:
    • Session state synchronization
    • Health monitoring of interfaces and routes
    • Automatic failover without service interruption

Load Balancers

  • Hardware load balancers configured as an HA pair using:
    • VRRP / clustering protocol
    • Floating IPs for failover
    • Multi-gigabit throughput redundancy for messaging flows
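As a rough sketch of how a floating IP moves between the two load balancers, the snippet below models a VRRP-style election: the healthy node with the highest priority owns the address, with the higher IP address as tie-breaker. The node names, priorities, and addresses are hypothetical, and a vendor clustering protocol may behave differently.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int   # higher priority wins the election
    ip: str         # used only as a tie-breaker
    healthy: bool = True


def elect_master(nodes):
    """Return the healthy node that should hold the floating IP, or None."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda n: (n.priority, int(ipaddress.ip_address(n.ip))),
    )


lb_a = Node("lb-a", priority=200, ip="192.0.2.11")
lb_b = Node("lb-b", priority=100, ip="192.0.2.12")

print(elect_master([lb_a, lb_b]).name)  # lb-a holds the floating IP
lb_a.healthy = False                    # simulate a failure of lb-a
print(elect_master([lb_a, lb_b]).name)  # lb-b takes over
```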

1.3 Server & Storage Connectivity with Ethernet Bonding

Server NIC Bonding

To maximize throughput and availability:

  • Dual/quad 10G–25G NICs per server
  • LACP (802.3ad) for link bonding
  • Active/Active or Active/Passive mode as per workload
  • NICs terminated on two different leaf switches

Benefits:

  • Tolerates the failure of a single NIC or leaf switch
  • Higher throughput
  • Zero-downtime network maintenance
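The throughput benefit of bonding is easiest to see with a little arithmetic. The sketch below assumes a bond of two 25G links (one possible configuration from the list above) and the usual LACP behaviour of pinning each flow to a single member link; the function name is made up for illustration.

```python
def bond_capacity(links_gbps, failed=()):
    """Return (aggregate Gbps, per-flow ceiling Gbps) for a bonded interface.

    links_gbps: speed of each member link
    failed:     indexes of member links that are currently down
    """
    alive = [speed for i, speed in enumerate(links_gbps) if i not in failed]
    aggregate = sum(alive)           # many flows together can use all links
    per_flow = max(alive, default=0) # one flow stays on one member link
    return aggregate, per_flow


print(bond_capacity([25, 25]))              # (50, 25)
print(bond_capacity([25, 25], failed={0}))  # (25, 25)
```

In other words, the gain is in aggregate capacity across many flows; a single flow is still limited to one member link, and a link failure halves the aggregate without interrupting service.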

Storage Connectivity

Storage arrays configured using:

  • MPIO for path redundancy
  • LACP-based uplinks to dual leaf switches
  • Redundant controllers
  • Dedicated VLANs and jumbo frames for enhanced I/O performance

1.4 VLAN, VRF & Security Segmentation

  • VLANs created for compute, storage, management, customer workloads, and security zones
  • VRFs used for tenant isolation
  • Firewall contexts (VSYS/virtual systems) implemented for multi-tenancy
  • Segmentation supports regulatory compliance and strict customer isolation
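To illustrate why VRFs give tenant isolation, the sketch below models each VRF as its own routing table with longest-prefix matching. Two tenants can then reuse the same address space without seeing each other's routes. The tenant names, prefixes, and next hops are hypothetical.

```python
import ipaddress


class Vrf:
    """A per-tenant routing table, independent of every other VRF."""

    def __init__(self, name):
        self.name = name
        self.routes = {}  # ip_network -> next hop

    def add_route(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, dst):
        """Longest-prefix match within this VRF only."""
        addr = ipaddress.ip_address(dst)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


tenant_a = Vrf("tenant-a")
tenant_b = Vrf("tenant-b")

# Both tenants use the same prefix; the tables never mix.
tenant_a.add_route("10.0.0.0/24", "fw-context-a")
tenant_b.add_route("10.0.0.0/24", "fw-context-b")

print(tenant_a.lookup("10.0.0.5"))  # fw-context-a
print(tenant_b.lookup("10.0.0.5"))  # fw-context-b
```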

1.5 Dynamic Routing Architecture

Routing optimized using:

  • OSPF or BGP with ECMP between leaf and spine
  • BGP used at network edges for WAN and AWS connectivity
  • Route flap dampening to prevent instability
  • BFD enabling sub-second failover
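The sub-second figure follows from BFD timer arithmetic. In asynchronous mode (RFC 5880), the local detection time is the remote detect multiplier times the agreed transmit interval, which is the larger of the local required minimum RX interval and the remote desired minimum TX interval. The 300 ms timers and multiplier of 3 below are assumed values, not the deployment's actual settings.

```python
def bfd_detection_time_ms(local_required_min_rx_ms,
                          remote_desired_min_tx_ms,
                          remote_detect_mult):
    """Detection time in asynchronous mode, per RFC 5880 section 6.8.4."""
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval


# Assumed timers: 300 ms on both sides, three missed packets declare failure.
print(bfd_detection_time_ms(300, 300, 3))  # 900 (ms), i.e. sub-second
```

For comparison, default BGP hold timers are typically in the tens of seconds, which is why BFD is paired with the routing protocol for fast failure detection.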

1.6 Monitoring, Telemetry & Failover Testing

A comprehensive observability framework was deployed:

  • SNMP/NETCONF/gNMI for real-time device telemetry
  • Flow logs and packet captures for troubleshooting
  • Automated link/node failover simulations
  • Continuous baseline performance checks
  • Alerts for latency, packet loss, and throughput changes
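A baseline-driven alert of the kind listed above could look roughly like the following. It is a minimal sketch, assuming a simple "mean plus three standard deviations" rule; the sample values and the threshold rule are illustrative and not taken from the deployment.

```python
from statistics import mean, pstdev


def latency_alert(baseline_ms, sample_ms, sigmas=3.0):
    """Return True when a sample exceeds the baseline mean by `sigmas` std devs."""
    threshold = mean(baseline_ms) + sigmas * pstdev(baseline_ms)
    return sample_ms > threshold


# Hypothetical baseline of recent round-trip latencies in milliseconds.
baseline = [1.0, 1.1, 0.9, 1.0, 1.0]

print(latency_alert(baseline, 1.05))  # False: within normal variation
print(latency_alert(baseline, 2.0))   # True: well outside the baseline
```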

1.7 Datacenter Results

  • Zero single points of failure
  • Sub-second failover for critical network elements
  • Up to 2× aggregate throughput from server/storage bonding
  • Predictable latency for messaging workloads
  • Stable environment supporting 50,000 TPS
  • Full redundancy from rack to core to WAN edge

2. AWS Cloud Network Architecture

To support elastic scaling and distributed workloads, multiple AWS VPCs were deployed based on application clusters and customer segmentation.

Each VPC included:

  • Public, private, and database subnets
  • NAT gateways and Internet gateways
  • EC2 workloads in Auto Scaling groups
  • VPC peering and AWS Transit Gateway for inter-VPC routing
  • Multi-AZ deployment for resilience
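The subnet layout above can be planned programmatically. The sketch below carves a VPC CIDR into one subnet per tier per Availability Zone using Python's standard `ipaddress` module. The 10.20.0.0/16 block, the /20 subnet size, and the two AZ labels are assumptions chosen for illustration.

```python
import ipaddress

TIERS = ["public", "private", "database"]  # tiers named in the list above
AZS = ["az-a", "az-b"]                     # hypothetical AZ labels


def plan_subnets(vpc_cidr, new_prefix=20):
    """Assign one non-overlapping subnet to each (tier, AZ) pair."""
    blocks = ipaddress.ip_network(vpc_cidr).subnets(new_prefix=new_prefix)
    plan = {}
    for tier in TIERS:
        for az in AZS:
            # next() raises StopIteration if the VPC is too small for the plan
            plan[(tier, az)] = next(blocks)
    return plan


for (tier, az), subnet in plan_subnets("10.20.0.0/16").items():
    print(f"{tier:9} {az}  {subnet}")
```

Leaving the remaining /20 blocks unallocated keeps room for additional tiers or a third AZ without renumbering.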

Key Outcomes:

  • Cloud-native scalability
  • Strong network isolation
  • Multi-AZ high availability
  • Rapid onboarding of new tenants

3. Hybrid Connectivity with AWS Direct Connect

To integrate the datacenter with AWS:

  • Redundant Direct Connect links (1/10/40 Gbps)
  • BGP routing with private VIFs
  • Integration with the on-prem edge routers
  • Backup IPsec VPN for DR and failover
  • Traffic engineering for optimized latency paths
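One common way to make the VPN a pure backup is to give Direct Connect routes a higher BGP local preference on the edge routers. The sketch below models a simplified slice of BGP best-path selection (highest local preference, then shortest AS path). The path names and attribute values are hypothetical, and the full BGP decision process has more steps than shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    local_pref: int    # higher is preferred
    as_path_len: int   # shorter is preferred
    up: bool = True


def best_path(paths):
    """Pick the preferred usable path, or None if everything is down."""
    usable = [p for p in paths if p.up]
    if not usable:
        return None
    return max(usable, key=lambda p: (p.local_pref, -p.as_path_len))


normal = [
    Path("dx-primary", local_pref=200, as_path_len=1),
    Path("dx-secondary", local_pref=200, as_path_len=2),  # e.g. AS-path prepended
    Path("ipsec-vpn", local_pref=100, as_path_len=1),
]
print(best_path(normal).name)  # dx-primary

dx_outage = [
    Path("dx-primary", 200, 1, up=False),
    Path("dx-secondary", 200, 2, up=False),
    Path("ipsec-vpn", 100, 1),
]
print(best_path(dx_outage).name)  # ipsec-vpn
```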

Results:

  • Consistent, low-latency connectivity
  • No reliance on public internet
  • Smooth hybrid cloud operations

Performance & Business Outcomes

✔ Achieved 50,000+ TPS

The combined datacenter + AWS deployment supported sustained high-volume messaging with very low failure rates.

✔ 99.99% Uptime

Through redundant infrastructure across all layers:

  • HA switches
  • Router and firewall clusters
  • Multi-AZ cloud configuration
  • Load-balanced traffic paths
  • Direct Connect redundancy
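To put the 99.99% figure in perspective, the short calculation below converts an availability target into a yearly downtime budget and shows how redundant pairs raise availability. The 99.9% per-component availability is an assumed example value, and the parallel formula assumes the two components fail independently.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability):
    """Yearly downtime allowed by an availability target (0..1)."""
    return (1 - availability) * MINUTES_PER_YEAR


def parallel_availability(component_availability, copies=2):
    """Availability of N redundant components, assuming independent failures."""
    return 1 - (1 - component_availability) ** copies


print(f"{downtime_minutes_per_year(0.9999):.2f} min/year")  # 52.56 min/year
print(f"{parallel_availability(0.999):.6f}")                # 0.999999
```

A single 99.9% component would miss the target on its own, while a redundant pair of such components comfortably exceeds it, which is the reasoning behind redundancy at every layer.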

✔ Multi-Tenant Isolation

Network segmentation, VRFs, firewall contexts, and VPC boundaries ensured enterprise-grade security.

✔ Ability to Scale Rapidly

  • Cloud VPC expansion
  • Datacenter rack-level scalability
  • Auto Scaling for cloud workloads
  • Dynamic routing for workload distribution

Conclusion

Through a robust hybrid networking architecture spanning a high-availability datacenter design and scalable AWS cloud footprint, the client can now support thousands of enterprise customers with peak performance, resilience, and security. The modernized network infrastructure ensures reliable delivery of 50,000 TPS, enabling the messaging platform to scale with business growth.

 

Executive Summary – High-Availability Hybrid Network for Large-Scale Messaging Platform


Overview

A leading enterprise communication services provider required a modern, fault-tolerant, and scalable network architecture to power its next-generation messaging platform. The platform needed to reliably deliver 50,000 transactions per second (TPS) while supporting thousands of enterprise customers with strict uptime and security requirements.

Our team designed and implemented a hybrid datacenter + AWS network infrastructure that ensures high availability, low latency, and seamless scalability.

Business Challenges

  • Legacy infrastructure unable to scale with customer growth
  • High latency and routing inefficiencies for cloud-hosted workloads
  • No unified hybrid connectivity between datacenter and AWS
  • Need for strong multi-tenant isolation and enterprise-grade security
  • Requirement for 99.99% uptime for mission-critical messaging 

Solution Highlights

1. High-Availability Datacenter Network

  • Built a fully redundant leaf–spine architecture across 10 racks
  • Dual spine and dual leaf switches per rack with ECMP-based resiliency
  • Redundant routers, active-active/active-passive firewalls, and HA load balancers
  • Server and storage interfaces configured with LACP bonding and MPIO
  • Isolated network domains using VLANs, VRFs, and firewall contexts
  • Optimized routing using OSPF/BGP with sub-second BFD failover
  • Comprehensive monitoring and automated failover testing

2. Scalable AWS Cloud Architecture

  • Multiple VPCs deployed for tenant isolation and application tiers
  • Multi-AZ EC2 workloads, NAT gateways, and Auto Scaling groups
  • VPC peering and Transit Gateway for simplified central routing
  • Cloud-native security and logging for compliance

3. Hybrid Connectivity with AWS Direct Connect

  • Redundant high-bandwidth Direct Connect circuits
  • BGP integration for routing stability and predictable latency
  • VPN failover for DR and continuous availability
  • Ensured secure private connectivity between datacenter and AWS

Key Outcomes

✔ 50,000+ TPS Sustained Throughput

High-performance network paths and elastic cloud scale supported massive message volumes.

✔ 99.99% Availability

No single point of failure across datacenter, cloud, routing, and connectivity layers.

✔ Strong Security & Tenant Isolation

Segregated networks, VRFs, VPC boundaries, and firewall contexts enforce strict isolation between tenants.

✔ Rapid Scalability

New customers and workloads can be onboarded rapidly with no architectural redesign.

✔ Unified Hybrid Operations

Seamless workload distribution and consistent low latency across on-prem and cloud.

Conclusion

The modern hybrid networking architecture delivered a resilient, scalable, and secure foundation for the client’s high-volume messaging platform. With this design, the customer is now positioned to support future growth, onboard new tenants rapidly, and maintain industry-leading service reliability.

 
