Client Overview
A large enterprise messaging services provider required a highly available, secure, and scalable networking infrastructure to support their next-generation messaging platform. The platform needed to reliably handle 50,000 transactions per second (TPS) while serving thousands of enterprise customers across geographical regions.
Business Challenge
The customer’s legacy environment faced several limitations:
- Scalability constraints within the on-premises datacenter
- High latency and inefficient routing for cloud workloads
- Lack of seamless hybrid connectivity between their private datacenter and AWS
- Need for secure multi-tenant isolation as their customer base grew
- Strict requirements for 99.99% uptime, security, and audit compliance
The goal was to build a modern, hybrid, high-performance networking environment capable of scaling messaging workloads elastically across on-prem and AWS.
Solution Overview
1. Datacenter Network Infrastructure
1.1 High-Availability Physical Network Topology
The datacenter was designed using a leaf–spine architecture to ensure deterministic latency, non-blocking throughput, and seamless scalability across 10 racks.
Key High-Availability Features:
- Dual Spine Switches
  - All leaf switches uplink to both spine switches using ECMP.
  - Provides automatic traffic rerouting during spine failure.
- Redundant Leaf Switch Pairs per Rack
  - Rack servers/storage connect to Leaf-A and Leaf-B.
  - Enables switch-level failover with zero downtime.
- Physical Redundancy in Cabling
  - Dual-path fiber/copper cabling across racks.
  - Separated cable trays to reduce risk of physical failure.
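The ECMP behavior described above can be illustrated with a minimal sketch. It assumes a simple hash of the flow's 5-tuple over the list of live spines; real switches use vendor-specific hardware hashing, and the flow values and switch names here are hypothetical.

```python
import hashlib

def pick_spine(flow, spines):
    """Choose an uplink for a flow by hashing its 5-tuple over the live spines."""
    if not spines:
        raise ValueError("no spine switches available")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(spines)
    return spines[index]

# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port)
flow = ("10.0.1.10", "10.0.2.20", "tcp", 40000, 5222)

both_up = ["spine-1", "spine-2"]
chosen = pick_spine(flow, both_up)

# Simulate losing the chosen spine: the flow rehashes onto the survivor.
survivors = [s for s in both_up if s != chosen]
rerouted = pick_spine(flow, survivors)
print(chosen, "->", rerouted)
```

Because the hash is deterministic, packets of one flow stay on one path while both spines are up, which keeps them in order. Only a topology change moves the flow.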
1.2 High Availability for Routers, Firewalls & Load Balancers
Edge Routers
- Dual redundant routers configured with:
  - BGP routing for dynamic failover and load balancing
  - Graceful restart and BFD for rapid convergence
  - Policy-based routing for workload-optimal paths
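To show why BFD shortens convergence, the sketch below computes the failure detection time as transmit interval multiplied by the detection multiplier. The timer values are assumptions for illustration, not the values used in this deployment, and the calculation assumes both peers use the same timers.

```python
def bfd_detection_time_ms(tx_interval_ms, multiplier):
    """Time to declare a neighbor down: interval x missed-packet multiplier."""
    if tx_interval_ms <= 0 or multiplier <= 0:
        raise ValueError("interval and multiplier must be positive")
    return tx_interval_ms * multiplier

# Assumed timers, for illustration only.
detect_ms = bfd_detection_time_ms(tx_interval_ms=300, multiplier=3)
print(detect_ms)  # 900
```

With these assumed timers a dead neighbor is detected in under one second, whereas relying on BGP hold timers alone typically takes tens of seconds or more.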
Firewalls (Active-Active/Active-Passive)
- Next-Generation Firewalls deployed in high-availability clusters with:
  - Session state synchronization
  - Health monitoring of interfaces and routes
  - Automatic failover without service interruption
Load Balancers
- Hardware load balancers configured as an HA pair using:
  - VRRP / clustering protocol
  - Floating IPs for failover
  - Multi-gigabit throughput redundancy for messaging flows
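Where VRRP is the protocol in use, the time a backup waits before taking over the floating IP follows the master-down interval defined in the VRRPv3 specification (RFC 5798). The sketch below applies that formula. The priorities and the 1-second advertisement interval are assumed values, not the deployed configuration.

```python
def master_down_interval_s(priority, advert_interval_s=1.0):
    """VRRPv3 master-down interval: 3 x advertisement interval + skew time."""
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be in the range 1-254")
    skew_s = (256 - priority) * advert_interval_s / 256
    return 3 * advert_interval_s + skew_s

# Assumed priorities for two backup candidates.
print(master_down_interval_s(100))  # 3.609375
print(master_down_interval_s(200))  # 3.21875
```

The skew term makes the higher-priority backup time out first, so it claims the floating IP before lower-priority peers. Shorter advertisement intervals reduce failover time proportionally.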
1.3 Server & Storage Connectivity with Ethernet Bonding
Server NIC Bonding
To maximize throughput and availability:
- Dual/quad 10G–25G NICs per server
- LACP (802.3ad) for link bonding
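- Active/active (LACP) or active/passive (active-backup) mode, depending on the workload
- NICs terminated on two different leaf switches (an LACP bond spanning two switches assumes the leaf pair runs MLAG or an equivalent multi-chassis LAG)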
Benefits:
- Tolerates the failure of a single NIC or leaf switch
- Higher aggregate throughput
- Zero-downtime network maintenance
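The trade-off between the two bonding modes can be sketched as a simple capacity calculation. This is an illustrative model only: the 25G link speeds are one of the options listed above, and the mode names are labels chosen for this example.

```python
def bond_capacity_gbps(live_link_speeds_gbps, mode):
    """Return usable capacity for the bond members that are currently up."""
    if not live_link_speeds_gbps:
        return 0
    if mode == "active-active":
        # LACP spreads flows across all live members.
        return sum(live_link_speeds_gbps)
    if mode == "active-backup":
        # Only one member carries traffic at a time (equal speeds assumed).
        return max(live_link_speeds_gbps)
    raise ValueError(f"unknown bonding mode: {mode}")

healthy = [25, 25]   # both NICs up
degraded = [25]      # one NIC or its leaf switch has failed

print(bond_capacity_gbps(healthy, "active-active"))   # 50
print(bond_capacity_gbps(degraded, "active-active"))  # 25
print(bond_capacity_gbps(healthy, "active-backup"))   # 25
```

Note that in active/active mode a single flow is still hashed to one member link, so the aggregate figure applies to many concurrent flows rather than to one connection.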
Storage Connectivity
Storage arrays configured using:
- MPIO for path redundancy
- LACP-based uplinks to dual leaf switches
- Redundant controllers
- Dedicated VLANs and jumbo frames for enhanced I/O performance
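The benefit of jumbo frames can be estimated from per-frame overhead. The sketch below assumes untagged Ethernet carrying IPv4 and TCP without options, and compares the share of wire bandwidth left for payload at a 1500-byte and a 9000-byte MTU.

```python
# Per-frame Ethernet overhead in bytes:
# preamble + SFD (8) + header (14) + FCS (4) + inter-frame gap (12)
ETHERNET_OVERHEAD = 38
# IPv4 header (20) + TCP header (20), no options assumed
IP_TCP_HEADERS = 40

def tcp_payload_efficiency(mtu):
    """Fraction of wire bandwidth that carries TCP payload at a given MTU."""
    if mtu <= IP_TCP_HEADERS:
        raise ValueError("MTU too small to carry any payload")
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)

print(f"{tcp_payload_efficiency(1500):.4f}")  # 0.9493
print(f"{tcp_payload_efficiency(9000):.4f}")  # 0.9914
```

Under these assumptions jumbo frames recover roughly four percentage points of bandwidth. They also cut the number of frames per transferred byte by about six, which reduces per-packet processing on storage hosts.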
1.4 VLAN, VRF & Security Segmentation
- VLANs created for compute, storage, management, customer workloads, and security zones
- VRFs used for tenant isolation
- Firewall contexts (VSYS/virtual systems) implemented for multi-tenancy
- Segmentation supports regulatory compliance and enforces strict customer isolation
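One way to keep tenant segments from overlapping is to allocate them from a single plan. The following is a minimal sketch of such a plan; the tenant names, the VRF naming scheme, and the starting VLAN ID are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantSegment:
    tenant: str
    vrf: str
    vlan_id: int

def allocate_segments(tenants, first_vlan=100):
    """Give each tenant its own VRF and a unique VLAN ID."""
    if len(set(tenants)) != len(tenants):
        raise ValueError("tenant names must be unique")
    segments = []
    for offset, tenant in enumerate(tenants):
        vlan_id = first_vlan + offset
        if not 1 <= vlan_id <= 4094:
            raise ValueError("VLAN ID outside the usable 802.1Q range 1-4094")
        segments.append(TenantSegment(tenant, f"VRF-{tenant.upper()}", vlan_id))
    return segments

# Hypothetical tenants, for illustration only.
segments = allocate_segments(["acme", "globex", "initech"])
for segment in segments:
    print(segment)
```

Generating the plan in one place makes it straightforward to audit that no two tenants share a VLAN or a routing table.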
1.5 Dynamic Routing Architecture
Routing optimized using:
- OSPF or BGP with ECMP between leaf and spine
- BGP used at network edges for WAN and AWS connectivity
- Route flap dampening to limit instability from flapping prefixes
- BFD enabling sub-second failover
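Route flap dampening works by assigning a penalty per flap and letting it decay exponentially. The sketch below models that decay. The per-flap penalty, suppress and reuse thresholds, and half-life are commonly cited example values and are assumptions here, not the deployed settings.

```python
import math

# Assumed dampening parameters, for illustration only.
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000
REUSE_LIMIT = 750
HALF_LIFE_MIN = 15.0

def decayed_penalty(penalty, minutes_elapsed, half_life_min=HALF_LIFE_MIN):
    """Penalty halves once per half-life while the route stays stable."""
    return penalty * 0.5 ** (minutes_elapsed / half_life_min)

def minutes_until_reuse(penalty, reuse_limit=REUSE_LIMIT,
                        half_life_min=HALF_LIFE_MIN):
    """Time for the penalty to decay down to the reuse threshold."""
    if penalty <= reuse_limit:
        return 0.0
    return half_life_min * math.log2(penalty / reuse_limit)

penalty = 3 * PENALTY_PER_FLAP          # three flaps in quick succession
print(penalty > SUPPRESS_LIMIT)         # True: the route is suppressed
print(decayed_penalty(penalty, 15))     # 1500.0
print(minutes_until_reuse(penalty))     # 30.0
```

With these assumed values, a prefix that flaps three times is withheld for about 30 minutes, which keeps an unstable link from repeatedly churning the routing table.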
1.6 Monitoring, Telemetry & Failover Testing
A comprehensive observability framework was deployed:
- SNMP/NETCONF/gNMI for real-time device telemetry
- Flow logs and packet captures for troubleshooting
- Automated link/node failover simulations
- Continuous baseline performance checks
- Alerts for latency, packet loss, and throughput changes
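The baseline and alerting checks listed above can be sketched as a comparison of recent samples against thresholds. The field names, the baseline, and the thresholds below are hypothetical; a real deployment would take them from its monitoring system.

```python
from statistics import mean

def check_alerts(samples, baseline_latency_ms,
                 max_latency_ratio=1.5, max_loss_pct=0.1):
    """Return the names of metrics that breach their thresholds."""
    if not samples:
        raise ValueError("at least one sample is required")
    alerts = []
    avg_latency = mean(s["latency_ms"] for s in samples)
    avg_loss = mean(s["loss_pct"] for s in samples)
    if avg_latency > baseline_latency_ms * max_latency_ratio:
        alerts.append("latency")
    if avg_loss > max_loss_pct:
        alerts.append("packet_loss")
    return alerts

# Hypothetical measurements for one link.
normal = [{"latency_ms": 0.4, "loss_pct": 0.0},
          {"latency_ms": 0.5, "loss_pct": 0.0}]
degraded = [{"latency_ms": 1.2, "loss_pct": 0.5},
            {"latency_ms": 1.4, "loss_pct": 0.3}]

print(check_alerts(normal, baseline_latency_ms=0.5))    # []
print(check_alerts(degraded, baseline_latency_ms=0.5))  # ['latency', 'packet_loss']
```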
1.7 Datacenter Results
- No single point of failure
- Sub-second failover for critical network elements
- 2× throughput gains from server/storage bonding
- Predictable latency for messaging workloads
- Stable environment supporting 50,000 TPS
- Full redundancy from rack to core to WAN edge
2. AWS Cloud Network Architecture
To support elastic scaling and distributed workloads, multiple AWS VPCs were deployed based on application clusters and customer segmentation.
Each VPC included:
- Public, private, and database subnets
- NAT gateways and Internet gateways
- EC2 workloads in Auto Scaling groups
- VPC peering and AWS Transit Gateway for inter-VPC routing
- Multi-AZ deployment for resilience
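The per-VPC layout above implies carving each VPC's address range into one subnet per tier and Availability Zone. A minimal sketch using Python's standard `ipaddress` module follows. The CIDR block, tier names, AZ labels, and prefix length are assumptions for illustration.

```python
import ipaddress

def plan_subnets(vpc_cidr, tiers, azs, new_prefix):
    """Assign one non-overlapping subnet to every (tier, AZ) pair."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    needed = len(tiers) * len(azs)
    if needed > len(blocks):
        raise ValueError("VPC range too small for the requested subnets")
    pairs = [(tier, az) for tier in tiers for az in azs]
    return dict(zip(pairs, blocks))

# Hypothetical VPC range and labels.
plan = plan_subnets(
    "10.20.0.0/16",
    tiers=["public", "private", "database"],
    azs=["az-a", "az-b"],
    new_prefix=20,
)
for (tier, az), subnet in plan.items():
    print(f"{tier:9} {az}  {subnet}")
```

Planning the ranges up front avoids overlapping CIDRs, which matters here because overlapping ranges cannot be routed between VPCs over peering or a Transit Gateway.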
Key Outcomes:
- Cloud-native scalability
- Strong network isolation
- Multi-AZ high availability
- Rapid onboarding of new tenants
3. Hybrid Connectivity with AWS Direct Connect
To integrate the datacenter with AWS:
- Redundant Direct Connect links (1/10/40 Gbps)
- BGP routing with private VIFs
- Integration with the on-prem edge routers
- Backup IPsec VPN for DR and failover
- Traffic engineering for optimized latency paths
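The preference for Direct Connect over the VPN backup is typically expressed through BGP attributes. The sketch below is a deliberately simplified model of best-path selection that considers only local preference and AS-path length. Real BGP applies more tie-breakers, and the path names and attribute values are hypothetical.

```python
def best_path(paths):
    """Pick the live path with the highest local preference,
    breaking ties on the shortest AS path."""
    live = [p for p in paths if p["up"]]
    if not live:
        return None
    return max(live, key=lambda p: (p["local_pref"], -p["as_path_len"]))

# Hypothetical paths from the on-prem edge routers to AWS.
paths = [
    {"name": "dx-primary",   "local_pref": 200, "as_path_len": 1, "up": True},
    {"name": "dx-secondary", "local_pref": 150, "as_path_len": 1, "up": True},
    {"name": "vpn-backup",   "local_pref": 100, "as_path_len": 1, "up": True},
]

print(best_path(paths)["name"])  # dx-primary

# Simulate losing both Direct Connect links.
paths[0]["up"] = False
paths[1]["up"] = False
print(best_path(paths)["name"])  # vpn-backup
```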
Results:
- Consistent, low-latency connectivity
- No reliance on public internet
- Smooth hybrid cloud operations
Performance & Business Outcomes
✔ Achieved 50,000+ TPS
The combined datacenter + AWS deployment supported sustained high-volume messaging with very low failure rates.
✔ 99.99% Uptime
Through redundant infrastructure across all layers:
- HA switches
- Router and firewall clusters
- Multi-AZ cloud configuration
- Load-balanced traffic paths
- Direct Connect redundancy
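To put the 99.99% figure in context, the sketch below converts an availability target into allowed downtime per year and shows how two redundant components in parallel raise combined availability. The 99% single-component figure is an assumed example, and the parallel formula assumes independent failures.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability):
    """Allowed downtime per year for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR

def parallel_availability(component_availability, copies=2):
    """Availability of redundant components, assuming independent failures."""
    return 1 - (1 - component_availability) ** copies

print(round(downtime_minutes_per_year(0.9999), 2))  # 52.56
print(round(parallel_availability(0.99), 6))        # 0.9999
```

A 99.99% target therefore leaves under an hour of downtime per year, and under the independence assumption a pair of 99% components is enough to reach it at a single layer.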
✔ Multi-Tenant Isolation
Network segmentation, VRFs, firewall contexts, and VPC boundaries ensured enterprise-grade security.
✔ Ability to Scale Rapidly
- Cloud VPC expansion
- Datacenter rack-level scalability
- Auto Scaling for cloud workloads
- Dynamic routing for workload distribution
Conclusion
Through a robust hybrid networking architecture spanning a high-availability datacenter design and scalable AWS cloud footprint, the client can now support thousands of enterprise customers with peak performance, resilience, and security. The modernized network infrastructure ensures reliable delivery of 50,000 TPS, enabling the messaging platform to scale with business growth.
Executive Summary – High-Availability Hybrid Network for Large-Scale Messaging Platform
Overview
A leading enterprise communication services provider required a modern, fault-tolerant, and scalable network architecture to power its next-generation messaging platform. The platform needed to reliably deliver 50,000 transactions per second (TPS) while supporting thousands of enterprise customers with strict uptime and security requirements.
Our team designed and implemented a hybrid datacenter + AWS network infrastructure that ensures high availability, low latency, and seamless scalability.
Business Challenges
- Legacy infrastructure unable to scale with customer growth
- High latency and routing inefficiencies for cloud-hosted workloads
- No unified hybrid connectivity between datacenter and AWS
- Need for strong multi-tenant isolation and enterprise-grade security
- Requirement for 99.99% uptime for mission-critical messaging
Solution Highlights
1. High-Availability Datacenter Network
- Built a fully redundant leaf–spine architecture across 10 racks
- Dual spine and dual leaf switches per rack with ECMP-based resiliency
- Redundant routers, active-active/active-passive firewalls, and HA load balancers
- Server and storage interfaces configured with LACP bonding and MPIO
- Isolated network domains using VLANs, VRFs, and firewall contexts
- Optimized routing using OSPF/BGP with sub-second BFD failover
- Comprehensive monitoring and automated failover testing
2. Scalable AWS Cloud Architecture
- Multiple VPCs deployed for tenant isolation and application tiers
- Multi-AZ EC2 workloads, NAT gateways, and Auto Scaling groups
- VPC peering and Transit Gateway for simplified central routing
- Cloud-native security and logging for compliance
3. Hybrid Connectivity with AWS Direct Connect
- Redundant high-bandwidth Direct Connect circuits
- BGP integration for routing stability and predictable latency
- VPN failover for DR and continuous availability
- Ensured secure private connectivity between datacenter and AWS
Key Outcomes
✔ 50,000+ TPS Sustained Throughput
High-performance network paths and elastic cloud scale supported massive message volumes.
✔ 99.99% Availability
No single point of failure across datacenter, cloud, routing, and connectivity layers.
✔ Strong Security & Tenant Isolation
Segregated networks, VRFs, VPC boundaries, and firewall contexts enforce strict tenant isolation.
✔ Rapid Scalability
New customers and workloads can be onboarded quickly with no architectural redesign.
✔ Unified Hybrid Operations
Seamless workload distribution and consistent low latency across on-prem and cloud.
Conclusion
The modern hybrid networking architecture delivered a resilient, scalable, and secure foundation for the client’s high-volume messaging platform. With this design, the customer is now positioned to support future growth, onboard new tenants rapidly, and maintain industry-leading service reliability.



