Microservices Strategy
Overview
The HNS Ticketing System follows a modular-monolith strategy with selective service extraction. Most functionality stays in the PHP/Symfony monolith, organized into modules with clear bounded contexts, while two components run as separate services to meet specific scaling and technology requirements.
Architecture Decision
Why Not Full Microservices?
| Consideration | Rationale |
|---|---|
| Team expertise | The team works in PHP/Symfony; a single technology stack reduces cognitive load |
| Transaction integrity | Ticket purchase requires atomic inventory and payment operations |
| Operational complexity | A full split into 15 microservices would require significant DevOps investment |
| Current scale | Most operations stay under 10k concurrent users |
Why Extract the Queue Service and Notification Workers?
| Service | Scale Requirement | Reason for Extraction |
|---|---|---|
| Queue Service | 100k+ concurrent WebSocket connections | PHP's request/response model is unsuitable for long-lived connections |
| Notification Workers | 100k+ push notifications within minutes | Requires asynchronous processing with rate limiting, outside the web request cycle |
Extracted Services
Queue Service
The Queue Service handles all waiting queue functionality for high-demand match sales.
Responsibilities
- Queue join and FIFO position assignment
- Real-time position updates (WebSocket/SSE)
- 20-minute purchase window enforcement
- 30-minute position persistence on disconnect
- Multi-device synchronization (same user = same position)
- Queue closure when sold out
Technology Stack
| Component | Technology | Rationale |
|---|---|---|
| Runtime | Node.js or Go | Holds 100k+ mostly idle connections cheaply (event loop or goroutines) |
| Data store | Redis Sorted Set | O(log N) position inserts and rank lookups |
| Real-time | WebSocket, with SSE fallback | WebSocket is bidirectional, so it also carries client heartbeats |
| Queue metadata | Redis Hash | Per-match counters and status |
| Session tracking | Redis String with TTL | 30-minute position persistence |
Data Model (Redis)
```text
# Queue positions (sorted set)
Key:     queue:{match_id}
Members: user_id
Score:   join_timestamp (FIFO ordering)

# Queue metadata (hash)
Key: queue_meta:{match_id}
Fields:
  - total_size: INTEGER
  - processing_rate: INTEGER (users/minute)
  - created_at: TIMESTAMP
  - status: ENUM (active, paused, closed)

# User session (string with TTL)
Key:   queue_session:{user_id}:{match_id}
Value: {position, queue_token, websocket_id}
TTL:   30 minutes (position persistence)

# Active purchase windows (hash)
Key:   purchase_windows:{match_id}
Field: user_id
Value: {expires_at, session_token}
```
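The ordering rules above can be modeled in a few lines. The following is a minimal in-memory sketch of the `queue:{match_id}` sorted set. It is written in PHP only to match the other examples in this document; the Queue Service itself would use Node.js or Go against Redis. It assumes scores are join times in milliseconds, and the class name `InMemoryQueue` is hypothetical. Re-joining never overwrites an existing score, which mirrors `ZADD` with the `NX` flag and gives a second device the same position. The position is the 0-based rank that `ZRANK` returns, plus one.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory model of queue:{match_id}.
 * Member = user_id, score = join timestamp in milliseconds.
 */
final class InMemoryQueue
{
    /** @var array<string, int> user_id => score */
    private array $scores = [];

    /** Join (or re-join from another device); an existing score is kept, like ZADD NX. */
    public function join(string $userId, int $joinedAtMs): int
    {
        $this->scores[$userId] ??= $joinedAtMs;
        return (int) $this->position($userId);
    }

    /** 1-based position (ZRANK + 1), or null if the user is not in the queue. */
    public function position(string $userId): ?int
    {
        if (!isset($this->scores[$userId])) {
            return null;
        }
        $mine = $this->scores[$userId];
        $ahead = 0;
        foreach ($this->scores as $member => $score) {
            // Redis orders by score, then lexicographically by member on ties.
            if ($score < $mine || ($score === $mine && strcmp((string) $member, $userId) < 0)) {
                $ahead++;
            }
        }
        return $ahead + 1;
    }

    /** Leave voluntarily (ZREM). */
    public function leave(string $userId): void
    {
        unset($this->scores[$userId]);
    }
}
```

The linear scan in `position()` is there for clarity only. Redis answers the same question in O(log N).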
API Endpoints
REST API
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/queue/{match_id}/join | Join queue; returns position and WebSocket URL |
| GET | /api/v1/queue/{match_id}/position | Get current position (polling fallback) |
| DELETE | /api/v1/queue/{match_id} | Leave queue voluntarily |
| POST | /api/v1/queue/{match_id}/sold-out | Internal: monolith signals sold out |
WebSocket Protocol
Connection: wss://queue.hns.hr/ws/{queue_token}
Server → Client:
| Message Type | Frequency | Payload |
|---|---|---|
| position_update | Every 30s or on significant change | {position, estimated_wait_minutes} |
| turn_granted | When user reaches front | {purchase_window_expires_at, checkout_url} |
| queue_closed | On sold out | {reason: "sold_out"} |

Client → Server:
| Message Type | Frequency | Purpose |
|---|---|---|
| heartbeat | Every 60s | Maintain position (prevents the 30-minute timeout) |
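The `estimated_wait_minutes` field can be derived from `processing_rate` in `queue_meta:{match_id}`. The sketch below assumes the estimate is the number of users ahead divided by that rate, rounded up. The protocol above does not define that formula, the `type` envelope field, or the function names; all three are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Assumed estimate: users ahead divided by processing_rate (users/minute),
 * rounded up. The protocol only names the field, not the formula.
 */
function estimatedWaitMinutes(int $position, int $processingRatePerMinute): int
{
    if ($processingRatePerMinute <= 0) {
        throw new InvalidArgumentException('processing_rate must be positive');
    }
    $usersAhead = max(0, $position - 1);
    // Integer ceiling division avoids float rounding.
    return intdiv($usersAhead + $processingRatePerMinute - 1, $processingRatePerMinute);
}

/** Server -> client frame; the "type" envelope field is a hypothetical convention. */
function positionUpdateFrame(int $position, int $processingRatePerMinute): string
{
    return json_encode([
        'type' => 'position_update',
        'position' => $position,
        'estimated_wait_minutes' => estimatedWaitMinutes($position, $processingRatePerMinute),
    ], JSON_THROW_ON_ERROR);
}
```

For example, position 10,001 with a processing rate of 500 users/minute yields an estimate of 20 minutes.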
Communication with Monolith
```text
┌─────────────────┐                 ┌──────────────────────┐
│   Mobile App    │                 │  HNS Monolith (E4)   │
└────────┬────────┘                 └──────────┬───────────┘
         │                                     │
         │ WebSocket                           │
         ▼                                     │
┌─────────────────┐   Redis Pub/Sub            │
│  Queue Service  │ ───────────────────────────┤
│  (Node.js/Go)   │   queue.turn_granted       │
│                 │   queue.expired            │
│                 │ ◄──────────────────────────┤
│                 │   REST: /sold-out          │
└────────┬────────┘                            │
         │                                     │
         ▼                                     ▼
┌──────────────────────────────────────────────────────────┐
│                      Redis Cluster                       │
│           (sorted sets, pub/sub, session TTL)            │
└──────────────────────────────────────────────────────────┘
```
Events Published to Monolith (Redis Pub/Sub):
```jsonc
// Channel: queue_events

// User granted purchase window
{
  "event": "queue.turn_granted",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "match_id": "660e8400-e29b-41d4-a716-446655440001",
  "session_token": "st_abc123xyz",
  "expires_at": "2026-09-15T14:35:00Z"
}

// User's purchase window expired without completing
{
  "event": "queue.expired",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "match_id": "660e8400-e29b-41d4-a716-446655440001",
  "reason": "window_timeout"
}
```
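The next sketch shows where the two events come from. The Queue Service opens a 20-minute window for each user it admits and later reports windows that lapsed without a purchase. State is kept in a plain array standing in for `purchase_windows:{match_id}`. The `PurchaseWindows` class, its methods, and the random `st_` token format (modeled on the example payload) are illustrative assumptions. Publishing the returned events to the `queue_events` channel is left to the caller.

```php
<?php
declare(strict_types=1);

/** Hypothetical model of purchase_windows:{match_id} and the events it emits. */
final class PurchaseWindows
{
    public const WINDOW_SECONDS = 20 * 60; // 20-minute purchase window

    /** @var array<string, array{expires_at: int, session_token: string}> */
    private array $windows = [];

    public function __construct(private string $matchId) {}

    /** Open a window for the user at the front; returns a queue.turn_granted event. */
    public function grant(string $userId, int $now): array
    {
        $token = 'st_' . bin2hex(random_bytes(8));
        $expiresAt = $now + self::WINDOW_SECONDS;
        $this->windows[$userId] = ['expires_at' => $expiresAt, 'session_token' => $token];

        return [
            'event' => 'queue.turn_granted',
            'user_id' => $userId,
            'match_id' => $this->matchId,
            'session_token' => $token,
            'expires_at' => gmdate('Y-m-d\TH:i:s\Z', $expiresAt),
        ];
    }

    /** Close lapsed windows; returns one queue.expired event per user. */
    public function expire(int $now): array
    {
        $events = [];
        foreach ($this->windows as $userId => $window) {
            if ($window['expires_at'] <= $now) {
                unset($this->windows[$userId]);
                $events[] = [
                    'event' => 'queue.expired',
                    'user_id' => (string) $userId,
                    'match_id' => $this->matchId,
                    'reason' => 'window_timeout',
                ];
            }
        }
        return $events;
    }
}
```

Granting a window at 14:15:00 UTC produces the `expires_at` value of `2026-09-15T14:35:00Z` shown in the example event.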
REST Calls from Monolith:
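```http
POST /api/v1/queue/{match_id}/sold-out
X-Internal-Auth: {shared_secret}
Content-Type: application/json

{
  "remaining_sectors": []
}
```
An empty remaining_sectors array means the match is fully sold out; otherwise the array lists the sectors that still have capacity.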
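On the receiving side, the Queue Service has to authenticate the call and decide whether to close the queue. This PHP sketch of that logic compares the shared secret with `hash_equals`, which runs in constant time; the production service would be Node.js or Go. The function name and return shape are hypothetical. The returned status values come from the `status` enum in `queue_meta`. Treating a non-empty `remaining_sectors` list as "keep the queue active" is an assumption, because the request format does not say how partial sell-outs are handled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical handler for POST /api/v1/queue/{match_id}/sold-out.
 * Returns [HTTP status, new queue_meta status or null].
 */
function handleSoldOut(?string $authHeader, string $body, string $sharedSecret): array
{
    // hash_equals compares in constant time, so response timing does not leak the secret.
    if ($authHeader === null || !hash_equals($sharedSecret, $authHeader)) {
        return [401, null];
    }
    try {
        $payload = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException) {
        return [400, null];
    }
    $remaining = $payload['remaining_sectors'] ?? null;
    if (!is_array($remaining)) {
        return [400, null];
    }
    // Empty list: fully sold out, so mark the queue closed (caller broadcasts queue_closed).
    // Non-empty list: assumed to keep the queue active for the listed sectors.
    return [200, $remaining === [] ? 'closed' : 'active'];
}
```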
Scaling Considerations
| Metric | Target | Strategy |
|---|---|---|
| Concurrent connections | 100k+ | Horizontal pod scaling |
| Position updates | < 100ms latency | Redis sorted set + local caching |
| Failover | < 5s | Health checks + auto-restart |
Deployment:
```yaml
# Kubernetes HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-service
  minReplicas: 3
  maxReplicas: 20 # 20 pods x 5,000 connections = 100k at the per-pod target below
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Custom per-pod metric; requires a custom metrics adapter (e.g. Prometheus Adapter)
    - type: Pods
      pods:
        metric:
          name: websocket_connections
        target:
          type: AverageValue
          averageValue: "5000"
```
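The two metrics interact through the HPA's scaling rule. The Kubernetes documentation gives it as `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`, and when several metrics are configured the largest result wins. The sketch below applies that rule to this HPA's targets and clamps the result to `minReplicas`/`maxReplicas`. It ignores the controller's tolerance band and stabilization window, so it is capacity arithmetic, not a model of the controller. The function names are illustrative.

```php
<?php
declare(strict_types=1);

/** One metric's proposal: ceil(currentReplicas * current / target). */
function proposal(int $currentReplicas, float $current, float $target): int
{
    return (int) ceil($currentReplicas * $current / $target);
}

/**
 * Desired replicas for the queue-service HPA above: the largest proposal across
 * both metrics, clamped to [minReplicas, maxReplicas]. Tolerance and
 * stabilization windows are deliberately ignored.
 */
function desiredReplicas(int $currentReplicas, float $avgCpuPercent, float $avgConnectionsPerPod): int
{
    $wanted = max(
        proposal($currentReplicas, $avgCpuPercent, 70.0),          // averageUtilization: 70
        proposal($currentReplicas, $avgConnectionsPerPod, 5000.0), // averageValue: "5000"
    );
    return min(20, max(3, $wanted)); // maxReplicas: 20, minReplicas: 3
}
```

For instance, 10 pods averaging 35% CPU and 7,500 connections each would scale to 15 replicas. The CPU metric alone proposes 5, but the connection metric proposes ceil(10 × 7500 / 5000) = 15.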
Notification Workers
Notification Workers process email and push notification jobs asynchronously, outside the monolith's request cycle.
Responsibilities
- Process transactional emails (order confirmation, ticket delivery, quota invitation)
- Process push notifications (queue updates, "your turn" alerts, reminders)
- Handle external API rate limits
- Retry failed deliveries with exponential backoff
Architecture
Workers are PHP processes that share the monolith codebase but run as separate, long-running processes that consume from Redis queues.
```text
┌────────────────────────────────────────────────────────────────┐
│                          HNS Monolith                          │
│                                                                │
│  Ticket Purchase (E4)   Ticket Management (E9)   Quota (E7)    │
│           │                       │                  │         │
│           └───────────────────────┼──────────────────┘         │
│                                   ▼                            │
│                        ┌──────────────────────┐                │
│                        │     Notification     │                │
│                        │   Event Publisher    │                │
│                        └──────────┬───────────┘                │
└───────────────────────────────────┼────────────────────────────┘
                                    ▼
┌────────────────────────────────────────────────────────────────┐
│                          Redis Queues                          │
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ notification:    │ │ notification:    │ │ notification:    │ │
│ │ email            │ │ push             │ │ push:priority    │ │
│ └────────┬─────────┘ └────────┬─────────┘ └────────┬─────────┘ │
└──────────┼────────────────────┼────────────────────┼───────────┘
           │                    │                    │ (polled first)
           │                    └─────────┬──────────┘
           ▼                              ▼
  ┌──────────────────┐       ┌────────────────────────┐
  │   Email Worker   │       │  Push Workers #1, #2   │
  │     (single)     │       │       (scalable)       │
  │   Mailgun API    │       │    Firebase FCM API    │
  └──────────────────┘       └────────────────────────┘
```
Job Queue Schema
Email Queue (notification:email):
```json
{
  "job_id": "job_abc123",
  "type": "email",
  "template": "order_confirmation",
  "recipient": "user@example.com",
  "payload": {
    "order_id": "ORD-2026-001234",
    "match_name": "Croatia vs Italy",
    "tickets": [
      {"sector": "West A", "row": "12", "seat": "15"}
    ],
    "total_amount": "450.00 EUR"
  },
  "created_at": "2026-09-15T14:00:00Z",
  "attempts": 0,
  "max_attempts": 3
}
```
Push Queue (notification:push):
```json
{
  "job_id": "job_xyz789",
  "type": "push",
  "template": "queue_your_turn",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "fcm_tokens": ["token1", "token2"],
  "payload": {
    "title": "Your turn!",
    "body": "You have 20 minutes to complete your purchase",
    "data": {
      "match_id": "660e8400-e29b-41d4-a716-446655440001",
      "action": "open_checkout"
    }
  },
  "priority": "high",
  "created_at": "2026-09-15T14:00:00Z"
}
```
Priority Push Queue (notification:push:priority):
Used for time-critical messages like "your turn" notifications. Workers poll this queue first.
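The publisher in the monolith has to wrap each message in the envelope shown above and choose a Redis list. The sketch assumes that push jobs marked `"priority": "high"` go to `notification:push:priority` and all other push jobs go to `notification:push`. The function names and the `job_` ID format are illustrative.

```php
<?php
declare(strict_types=1);

/** Choose the Redis list for a job (routing rule assumed from the schemas above). */
function targetQueue(array $job): string
{
    return match ($job['type'] ?? null) {
        'email' => 'notification:email',
        'push' => ($job['priority'] ?? 'normal') === 'high'
            ? 'notification:push:priority'
            : 'notification:push',
        default => throw new InvalidArgumentException('Unknown job type'),
    };
}

/** Fill in the envelope fields shared by both schemas; $fields carries the rest. */
function newJob(string $type, string $template, array $fields, DateTimeImmutable $now): array
{
    return [
        'job_id' => 'job_' . bin2hex(random_bytes(6)),
        'type' => $type,
        'template' => $template,
        'created_at' => $now->setTimezone(new DateTimeZone('UTC'))->format('Y-m-d\TH:i:s\Z'),
    ] + $fields;
}
```

Enqueueing is then an `RPUSH` of the JSON-encoded job onto `targetQueue($job)`. Because producers push to the tail and workers `LPOP` from the head, each list stays FIFO.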
Rate Limiting
| Provider | Limit | Worker Strategy |
|---|---|---|
| Mailgun | 100 emails/second | Single worker with token bucket |
| Firebase FCM | 500 tokens/request, 1000 req/second | Batch sends, multiple workers |
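Here is a rough check of these limits against the 100k-notification burst mentioned earlier. It uses only the numbers in the table and assumes one device token per notification:

```latex
\begin{aligned}
\text{FCM requests} &= \left\lceil \frac{100\,000\ \text{tokens}}{500\ \text{tokens/request}} \right\rceil = 200\ \text{requests} \\
\text{FCM send time} &\ge \frac{200\ \text{requests}}{1000\ \text{requests/s}} = 0.2\ \text{s} \\
\text{Email drain time} &= \frac{100\,000\ \text{emails}}{100\ \text{emails/s}} = 1000\ \text{s} \approx 16.7\ \text{min}
\end{aligned}
```

At this volume the FCM request quota is far from binding. An equally large email burst, by contrast, would take the single rate-limited email worker about 17 minutes.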
Email Worker Rate Limiting:
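```php
// Token bucket: capacity 100, refilled at 100 tokens/second (Mailgun limit)
final class EmailWorker
{
    public function __construct(
        private TokenBucket $bucket,    // blocks when the rate is exceeded
        private MailgunClient $mailgun, // hypothetical Mailgun API wrapper
    ) {}

    public function process(Job $job): void
    {
        $this->bucket->consume(1); // waits for a token before each send
        // Pass recipient and template too: they live outside the payload
        $this->mailgun->send($job->recipient, $job->template, $job->payload);
    }
}
```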
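The `EmailWorker` above depends on a `TokenBucket` that this page does not define. One way to write it, assuming a single worker process, is the classic token bucket: up to 100 tokens, refilled continuously at 100 per second, with `consume()` sleeping until enough tokens are available. The injectable clock and sleep callbacks are an illustrative choice that lets the logic be tested without real waiting. In a Symfony codebase, the `symfony/rate-limiter` component's `token_bucket` policy is an off-the-shelf alternative.

```php
<?php
declare(strict_types=1);

/** Token bucket: holds up to $capacity tokens, refilled continuously at $refillPerSecond. */
final class TokenBucket
{
    private float $tokens;
    private float $lastRefill;
    private Closure $clock;
    private Closure $sleep;

    public function __construct(
        private float $capacity = 100.0,
        private float $refillPerSecond = 100.0,
        ?Closure $clock = null, // returns seconds from a monotonic source
        ?Closure $sleep = null, // blocks for the given number of seconds
    ) {
        $this->clock = $clock ?? static fn (): float => hrtime(true) / 1e9;
        $this->sleep = $sleep ?? static function (float $seconds): void {
            usleep((int) ceil($seconds * 1e6));
        };
        $this->tokens = $capacity; // start full, allowing an initial burst
        $this->lastRefill = ($this->clock)();
    }

    /** Take $n tokens, sleeping until enough have been refilled. */
    public function consume(int $n = 1): void
    {
        if ($n > $this->capacity) {
            throw new InvalidArgumentException('Request exceeds bucket capacity');
        }
        while (true) {
            $this->refill();
            if ($this->tokens >= $n) {
                $this->tokens -= $n;
                return;
            }
            // Wait for the missing tokens; at least 1 µs so the clock always advances.
            ($this->sleep)(max(($n - $this->tokens) / $this->refillPerSecond, 1e-6));
        }
    }

    private function refill(): void
    {
        $now = ($this->clock)();
        $elapsed = max(0.0, $now - $this->lastRefill);
        $this->tokens = min($this->capacity, $this->tokens + $elapsed * $this->refillPerSecond);
        $this->lastRefill = $now;
    }
}
```

Constructed with its defaults, `new TokenBucket()` allows a burst of 100 sends and then settles at 100 per second.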
Push Worker Batching:
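```php
// Drain the priority queue first, then batch up to 500 FCM tokens per request
final class PushWorker
{
    private const QUEUES = ['notification:push:priority', 'notification:push'];

    public function processBatch(): void
    {
        foreach (self::QUEUES as $queue) {
            $jobs = $this->redis->lPop($queue, 500) ?: []; // LPOP with count: Redis >= 6.2
            $groups = []; // jobs with identical payloads share one multicast
            foreach ($jobs as $json) {
                $job = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
                $key = md5(json_encode($job['payload']));
                $groups[$key]['payload'] = $job['payload'];
                $groups[$key]['tokens'] = array_merge($groups[$key]['tokens'] ?? [], $job['fcm_tokens']);
            }
            foreach ($groups as $group) {
                foreach (array_chunk($group['tokens'], 500) as $tokens) {
                    $this->firebase->sendMulticast($group['payload'], $tokens);
                }
            }
        }
    }
}
```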
Retry Strategy
| Attempt | Delay | Action |
|---|---|---|
| 1 | Immediate | First try |
| 2 | 30 seconds | Retry |
| 3 | 5 minutes | Final retry |
| Failed | - | Move to dead letter queue, alert ops |
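The table translates into a small lookup that a worker can consult after each failed delivery. This sketch assumes `attempts` counts deliveries already tried, as in the email job schema (starting at 0, with `max_attempts: 3`). It returns the delay before the next try, or `null` when the job should move to the dead letter queue. The function name is hypothetical.

```php
<?php
declare(strict_types=1);

/** Delay in seconds before each attempt (1-based), from the retry table. */
const RETRY_DELAYS = [1 => 0, 2 => 30, 3 => 300];

/**
 * Given how many attempts have already been made, return the delay before the
 * next one, or null when retries are exhausted (dead letter queue + alert ops).
 */
function nextRetryDelay(int $attemptsSoFar, int $maxAttempts = 3): ?int
{
    $next = $attemptsSoFar + 1;
    if ($next > $maxAttempts || !array_key_exists($next, RETRY_DELAYS)) {
        return null;
    }
    return RETRY_DELAYS[$next];
}
```

This page does not specify how a delayed retry is scheduled; one option is a Redis sorted set scored by due time.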
Monitoring
Workers expose metrics for Prometheus:
```text
# Email worker metrics
notification_email_sent_total{template="order_confirmation"} 1234
notification_email_failed_total{template="order_confirmation"} 12
notification_email_queue_size 45

# Push worker metrics
notification_push_sent_total{template="queue_your_turn"} 98765
notification_push_latency_seconds_bucket{le="0.5"} 95000
notification_push_queue_size 1200
```
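These lines use the Prometheus text exposition format. Each sample is a metric name, optional `{label="value"}` pairs, and a value, and a metric family is normally preceded by `# HELP` and `# TYPE` lines. If a worker renders the format itself instead of using a client library, a minimal helper might look like this. The function name and help strings are illustrative.

```php
<?php
declare(strict_types=1);

/**
 * Render one metric family in the Prometheus text exposition format.
 * $samples is a list of [labels, value] pairs.
 */
function renderMetric(string $name, string $type, string $help, array $samples): string
{
    // HELP text escapes backslash and newline; label values also escape double quotes.
    $help = str_replace(['\\', "\n"], ['\\\\', '\\n'], $help);
    $lines = ["# HELP {$name} {$help}", "# TYPE {$name} {$type}"];
    foreach ($samples as [$labels, $value]) {
        $pairs = [];
        foreach ($labels as $key => $val) {
            $escaped = str_replace(['\\', '"', "\n"], ['\\\\', '\\"', '\\n'], (string) $val);
            $pairs[] = "{$key}=\"{$escaped}\"";
        }
        $labelText = $pairs === [] ? '' : '{' . implode(',', $pairs) . '}';
        $lines[] = $name . $labelText . ' ' . $value;
    }
    return implode("\n", $lines) . "\n";
}
```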
Event Catalog
Complete list of events for cross-service communication:
| Event | Publisher | Consumer | Transport | Trigger |
|---|---|---|---|---|
| queue.turn_granted | Queue Service | Monolith (E4) | Redis Pub/Sub | User reaches front of queue |
| queue.expired | Queue Service | Monolith (E4) | Redis Pub/Sub | Purchase window times out |
| order.completed | Monolith (E4) | Email Worker | Redis Queue | Successful checkout |
| ticket.generated | Monolith (E9) | Email Worker, Push Worker | Redis Queue | Ticket issued |
| quota.invitation | Monolith (E7) | Email Worker, Push Worker | Redis Queue | Quota created |
| queue.position_update | Monolith (E4) | Push Worker | Redis Queue | Batch position updates |
| payment.webhook | Stripe | Monolith (E8) | HTTP → Queue | Payment status change |
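For the rows carried over Redis queues, the catalog doubles as a fan-out table for the monolith's Notification Event Publisher. A lookup derived from it might look like this. Mapping "Push Worker" to the regular `notification:push` list, rather than the priority list, is an assumption, as are the constant and function names.

```php
<?php
declare(strict_types=1);

/** Redis lists each monolith event is enqueued to, derived from the catalog rows. */
const EVENT_QUEUES = [
    'order.completed'       => ['notification:email'],
    'ticket.generated'      => ['notification:email', 'notification:push'],
    'quota.invitation'      => ['notification:email', 'notification:push'],
    'queue.position_update' => ['notification:push'],
];

/** @return list<string> target queues; empty for events not consumed by workers */
function queuesFor(string $event): array
{
    return EVENT_QUEUES[$event] ?? [];
}
```

Events outside this map, such as `queue.turn_granted`, travel over Redis Pub/Sub and need no worker queue.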
Deployment Architecture
```text
┌───────────────────────────────────────────────────────────────────────┐
│                          Kubernetes Cluster                           │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │                       Ingress Controller                        │  │
│  │     api.hns.hr                             queue.hns.hr         │  │
│  └──────────┬───────────────────────────────────────┬──────────────┘  │
│             ▼                                       ▼                 │
│  ┌─────────────────────┐                 ┌─────────────────────┐      │
│  │    HNS Monolith     │                 │    Queue Service    │      │
│  │    (PHP/Symfony)    │                 │    (Node.js/Go)     │      │
│  │   Replicas: 5-20    │                 │   Replicas: 3-20    │      │
│  └──────────┬──────────┘                 └──────────┬──────────┘      │
│             ▼                                       ▼                 │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Data Layer                                                      │  │
│  │  ┌───────────────────────────────────────────────────────────┐  │  │
│  │  │ Redis Cluster (3 nodes)                                   │  │  │
│  │  │   - Queue positions (sorted sets)                         │  │  │
│  │  │   - Notification jobs (lists)                             │  │  │
│  │  │   - Session cache (strings with TTL)                      │  │  │
│  │  │   - Pub/sub channels                                      │  │  │
│  │  └───────────────────────────────────────────────────────────┘  │  │
│  │                                                                 │  │
│  │  ┌───────────────────────────────────────────────────────────┐  │  │
│  │  │ PostgreSQL (Primary + Replica)                            │  │  │
│  │  │   - Tickets, orders, quotas                               │  │  │
│  │  │   - User profiles, blacklist                              │  │  │
│  │  │   - Seat inventory                                        │  │  │
│  │  └───────────────────────────────────────────────────────────┘  │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────────┐  │
│  │ Worker Pods                                                     │  │
│  │  ┌────────────────┐   ┌────────────────┐   ┌────────────────┐   │  │
│  │  │  Email Worker  │   │  Push Worker   │   │  Push Worker   │   │  │
│  │  │  (1 replica)   │   │       #1       │   │       #2       │   │  │
│  │  └────────────────┘   └────────────────┘   └────────────────┘   │  │
│  └─────────────────────────────────────────────────────────────────┘  │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
```
Failure Scenarios
Queue Service Failure
| Scenario | Impact | Mitigation |
|---|---|---|
| Single pod crash | ~5k users lose their WebSocket connection | Kubernetes auto-restart in < 5s |
| Redis unavailable | All queue operations fail | Redis Cluster with replicas |
| Network partition | Risk of split-brain positions | Each queue key is owned by a single Redis master, so positions cannot diverge; writes acknowledged just before a failover can still be lost |

Recovery: clients connected to the failed pod receive an error and reconnect. Their position is preserved in Redis for 30 minutes, so reconnecting within that window resumes the same place in the queue.
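Notification Worker Failure
| Scenario | Impact | Mitigation |
|---|---|---|
| Worker crash | Job processing pauses | Supervisor restarts the worker |
| Mailgun outage | Emails accumulate in the queue | Retry with backoff, then dead letter queue |
| Firebase outage | Push notifications accumulate in the queue | Same as email |

Recovery: jobs still in the Redis queue are processed once the worker restarts. Jobs that the crashed worker had already popped are lost unless workers first move them to a processing list (for example with Redis LMOVE) and re-queue unfinished ones on startup.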
Related Documentation
- Architecture Overview
- E5: Waiting Queue System
- E15: Notifications & Communications
- E5-F1: Queue Join and Position Assignment
Last Updated: January 2026