Microservices Strategy

Overview

The HNS Ticketing System follows a modular-monolith strategy with selective service extraction. Most functionality stays in the PHP/Symfony monolith, organized into modules with clear bounded contexts. Two components are extracted as separate services because they have scaling and technology requirements the monolith cannot meet.

Architecture Decision

Why Not Full Microservices?

| Consideration | Rationale |
| --- | --- |
| Team expertise | PHP/Symfony; a single technology reduces cognitive load |
| Transaction integrity | Ticket purchase requires atomic inventory + payment operations |
| Operational complexity | 15 microservices would require significant DevOps investment |
| Current scale | Most operations serve fewer than 10k concurrent users |

Why Extract Queue Service and Notification Workers?

| Service | Scale Requirement | Technology Mismatch |
| --- | --- | --- |
| Queue Service | 100k+ concurrent WebSocket connections | PHP unsuitable for long-lived connections |
| Notification Workers | 100k+ push notifications in minutes | Async processing with rate limiting required |

Extracted Services

Queue Service

The Queue Service handles all waiting queue functionality for high-demand match sales.

Responsibilities

  • Queue join and FIFO position assignment
  • Real-time position updates (WebSocket/SSE)
  • 20-minute purchase window enforcement
  • 30-minute position persistence on disconnect
  • Multi-device synchronization (same user = same position)
  • Queue closure when sold out
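
The two time limits above can be sketched as plain date arithmetic. This is a minimal illustration in Python, not the service's implementation; the function and constant names are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

PURCHASE_WINDOW = timedelta(minutes=20)   # time allowed to complete checkout
DISCONNECT_GRACE = timedelta(minutes=30)  # position kept after the last heartbeat


def window_expires_at(granted_at: datetime) -> datetime:
    """Moment at which a purchase window granted at `granted_at` closes."""
    return granted_at + PURCHASE_WINDOW


def window_is_open(granted_at: datetime, now: datetime) -> bool:
    """True while the user may still check out."""
    return now < window_expires_at(granted_at)


def position_is_kept(last_seen: datetime, now: datetime) -> bool:
    """True while a disconnected user still holds their queue position."""
    return now - last_seen < DISCONNECT_GRACE


granted = datetime(2026, 9, 15, 14, 15, tzinfo=timezone.utc)
print(window_expires_at(granted).isoformat())
```

A window granted at 14:15 UTC closes at 14:35 UTC, which matches the `expires_at` value used in the event examples in this document.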

Technology Stack

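| Component | Technology | Rationale |
| --- | --- | --- |
| Runtime | Node.js or Go | Event loop (Node.js) or goroutines (Go) handle 100k+ long-lived connections efficiently |
| Data store | Redis Sorted Set | O(log N) position operations |
| Real-time | WebSocket, with SSE as fallback | WebSocket is bidirectional, which heartbeats require |
| Queue metadata | Redis Hash | Queue-level counters and status |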

Data Model (Redis)

# Queue positions (sorted set)
Key: queue:{match_id}
Members: user_id
Score: join_timestamp (FIFO ordering)

# Queue metadata (hash)
Key: queue_meta:{match_id}
Fields:
  - total_size: INTEGER
  - processing_rate: INTEGER (users/minute)
  - created_at: TIMESTAMP
  - status: ENUM (active, paused, closed)

# User session (string with TTL)
Key: queue_session:{user_id}:{match_id}
Value: {position, queue_token, websocket_id}
TTL: 30 minutes (position persistence)

# Active purchase windows (hash)
Key: purchase_windows:{match_id}
Field: user_id
Value: {expires_at, session_token}
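
To make the sorted-set semantics concrete, the sketch below models `queue:{match_id}` in memory. It is written in Python for illustration only and assumes the service would use `ZADD` with the `NX` flag for joins, `ZRANK` for positions and `ZREM` for leaving; the class and method names are hypothetical.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class MatchQueue:
    """In-memory stand-in for the queue:{match_id} sorted set."""

    _entries: list = field(default_factory=list)  # sorted (score, user_id) pairs
    _scores: dict = field(default_factory=dict)   # user_id -> join timestamp

    def join(self, user_id: str, joined_at: float) -> int:
        """Add the user unless already queued (like ZADD NX); return the position."""
        if user_id not in self._scores:
            self._scores[user_id] = joined_at
            bisect.insort(self._entries, (joined_at, user_id))
        return self.position(user_id)

    def position(self, user_id: str) -> int:
        """1-based position, equivalent to ZRANK + 1."""
        score = self._scores[user_id]
        return bisect.bisect_left(self._entries, (score, user_id)) + 1

    def leave(self, user_id: str) -> None:
        """Remove the user (like ZREM); everyone behind moves up."""
        score = self._scores.pop(user_id)
        del self._entries[bisect.bisect_left(self._entries, (score, user_id))]


queue = MatchQueue()
queue.join("alice", 100.0)
queue.join("bob", 101.0)
print(queue.join("alice", 105.0))  # second device: original position is kept
```

Because a repeated join never overwrites the original score, a user who opens the queue on a second device keeps the position earned by the first join.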

API Endpoints

REST API

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/v1/queue/{match_id}/join | Join queue, returns position and WebSocket URL |
| GET | /api/v1/queue/{match_id}/position | Get current position (polling fallback) |
| DELETE | /api/v1/queue/{match_id} | Leave queue voluntarily |
| POST | /api/v1/queue/{match_id}/sold-out | Internal: Monolith signals sold out |

WebSocket Protocol

Connection: wss://queue.hns.hr/ws/{queue_token}

Server → Client:

| Message Type | Frequency | Payload |
| --- | --- | --- |
| position_update | Every 30s or on significant change | {position, estimated_wait_minutes} |
| turn_granted | When user reaches front | {purchase_window_expires_at, checkout_url} |
| queue_closed | On sold out | {reason: "sold_out"} |

Client → Server:

| Message Type | Frequency | Purpose |
| --- | --- | --- |
| heartbeat | Every 60s | Maintain position (prevents 30-min timeout) |
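
One way a `position_update` message could be built is sketched below in Python. Two things are assumptions rather than part of the specification: the `type` field used as a message envelope, and the wait estimate, which simply divides the position by the `processing_rate` (users/minute) stored in the queue metadata.

```python
import json
import math


def estimated_wait_minutes(position: int, processing_rate: int) -> int:
    """Whole minutes until `position` reaches the front at `processing_rate` users/minute."""
    if processing_rate <= 0:
        raise ValueError("processing_rate must be positive")
    return math.ceil(position / processing_rate)


def position_update(position: int, processing_rate: int) -> str:
    """Serialize a position_update message for the WebSocket connection."""
    return json.dumps({
        "type": "position_update",  # envelope field: an assumption for this sketch
        "position": position,
        "estimated_wait_minutes": estimated_wait_minutes(position, processing_rate),
    })


print(position_update(1200, 500))
```

Rounding up avoids telling a user at position 1200 that the wait is two minutes when 500 users per minute gets them to the front in 2.4.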

Communication with Monolith

┌─────────────────┐                    ┌──────────────────────┐
│   Mobile App    │                    │  HNS Monolith (E4)   │
└────────┬────────┘                    └──────────┬───────────┘
         │                                        │
         │ WebSocket                              │
         ▼                                        │
┌─────────────────┐    Redis Pub/Sub              │
│  Queue Service  │ ──────────────────────────────┤
│  (Node.js/Go)   │    queue.turn_granted         │
│                 │    queue.expired              │
│                 │ ◄─────────────────────────────┤
│                 │    REST: /sold-out            │
└────────┬────────┘                               │
         │                                        │
         ▼                                        ▼
┌─────────────────────────────────────────────────────────┐
│                     Redis Cluster                        │
│  (sorted sets, pub/sub, session TTL)                    │
└─────────────────────────────────────────────────────────┘

Events Published to Monolith (Redis Pub/Sub):

// Channel: queue_events

// User granted purchase window
{
  "event": "queue.turn_granted",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "match_id": "660e8400-e29b-41d4-a716-446655440001",
  "session_token": "st_abc123xyz",
  "expires_at": "2026-09-15T14:35:00Z"
}

// User's purchase window expired without completing
{
  "event": "queue.expired",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "match_id": "660e8400-e29b-41d4-a716-446655440001",
  "reason": "window_timeout"
}
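
A subscriber on the `queue_events` channel has to validate each message before acting on it. The Python sketch below shows one possible validation step; the required-field sets are read off the two examples above, and the function name is hypothetical.

```python
import json
from datetime import datetime

# Fields each event must carry, taken from the examples above.
REQUIRED_FIELDS = {
    "queue.turn_granted": {"user_id", "match_id", "session_token", "expires_at"},
    "queue.expired": {"user_id", "match_id", "reason"},
}


def parse_queue_event(raw: str) -> dict:
    """Decode one pub/sub message and reject unknown or incomplete events."""
    event = json.loads(raw)
    name = event.get("event")
    if name not in REQUIRED_FIELDS:
        raise ValueError(f"unknown event: {name!r}")
    missing = REQUIRED_FIELDS[name] - set(event)
    if missing:
        raise ValueError(f"{name} is missing fields: {sorted(missing)}")
    if "expires_at" in event:
        # Older Python versions do not accept a trailing "Z" in fromisoformat().
        event["expires_at"] = datetime.fromisoformat(event["expires_at"].replace("Z", "+00:00"))
    return event
```

Rejecting malformed messages at the boundary keeps the checkout code from having to re-check every field.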

REST Calls from Monolith:

POST /api/v1/queue/{match_id}/sold-out
X-Internal-Auth: {shared_secret}
Content-Type: application/json

{
  "remaining_sectors": []
}

remaining_sectors is empty when the match is fully sold out; otherwise it lists the sectors that still have capacity.
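
On the receiving side, the shared secret is best compared in constant time so that response timing does not leak how many characters matched. A minimal Python sketch, assuming the request headers arrive as a plain dictionary (the function name is hypothetical):

```python
import hmac


def is_authorized(headers: dict, shared_secret: str) -> bool:
    """Check the X-Internal-Auth header against the configured shared secret."""
    if not shared_secret:
        return False  # never accept requests when no secret is configured
    # HTTP header names are case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    presented = normalized.get("x-internal-auth", "")
    return hmac.compare_digest(presented.encode("utf-8"), shared_secret.encode("utf-8"))
```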

Scaling Considerations

| Metric | Target | Strategy |
| --- | --- | --- |
| Concurrent connections | 100k+ | Horizontal pod scaling |
| Position updates | < 100ms latency | Redis sorted set + local caching |
| Failover | < 5s | Health checks + auto-restart |

Deployment:

# Kubernetes HPA example
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: websocket_connections
      target:
        type: AverageValue
        averageValue: "5000"
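
As a rough check of these numbers, the autoscaler's documented rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). For an average-value target this reduces to total connections divided by the per-pod target. The Python sketch below applies only that rule to the `websocket_connections` metric; it ignores the CPU metric, the tolerance band and stabilization windows, so treat it as an estimate.

```python
import math


def desired_replicas(total_connections: int, target_per_pod: int = 5000,
                     min_replicas: int = 3, max_replicas: int = 20) -> int:
    """Replica count implied by the connection metric alone, clamped to the HPA bounds."""
    wanted = math.ceil(total_connections / target_per_pod)
    return max(min_replicas, min(max_replicas, wanted))


for connections in (4_000, 42_500, 100_000, 150_000):
    print(connections, desired_replicas(connections))
```

At 100k connections the result is exactly 20, which equals maxReplicas. Beyond that point the replica count is capped and each pod carries more than 5,000 connections, so the "100k+" target leaves no headroom unless maxReplicas or the per-pod target is raised.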

Notification Workers

Notification Workers process email and push notification jobs asynchronously from the monolith.

Responsibilities

  • Process transactional emails (order confirmation, ticket delivery, quota invitation)
  • Process push notifications (queue updates, your turn, reminders)
  • Handle external API rate limits
  • Retry failed deliveries with exponential backoff

Architecture

Workers are PHP processes that share the monolith codebase but run as separate long-running processes consuming from Redis queues.

┌──────────────────────────────────────────────────────────────┐
│                     HNS Monolith                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │ Ticket Purchase (E4)  Ticket Management (E9)  Quota (E7)│ │
│  │         │                    │                    │      │ │
│  │         └────────────────────┼────────────────────┘      │ │
│  │                              │                           │ │
│  │                    ┌─────────▼─────────┐                 │ │
│  │                    │ Notification      │                 │ │
│  │                    │ Event Publisher   │                 │ │
│  └────────────────────┴─────────┬─────────┴─────────────────┘ │
└─────────────────────────────────┼───────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Redis Queues                               │
│  ┌──────────────────┐  ┌──────────────────┐  ┌───────────────┐  │
│  │ notification:    │  │ notification:    │  │ notification: │  │
│  │ email            │  │ push             │  │ push:priority │  │
│  └────────┬─────────┘  └────────┬─────────┘  └───────┬───────┘  │
└───────────┼─────────────────────┼────────────────────┼──────────┘
            │                     │                    │
            ▼                     ▼                    ▼
┌───────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   Email Worker    │  │  Push Worker #1  │  │  Push Worker #2  │
│   (single)        │  │  (scalable)      │  │  (scalable)      │
│                   │  │                  │  │                  │
│   ┌───────────┐   │  │  ┌───────────┐   │  │  ┌───────────┐   │
│   │  Mailgun  │   │  │  │ Firebase  │   │  │  │ Firebase  │   │
│   │   API     │   │  │  │ FCM API   │   │  │  │ FCM API   │   │
│   └───────────┘   │  │  └───────────┘   │  │  └───────────┘   │
└───────────────────┘  └──────────────────┘  └──────────────────┘

Job Queue Schema

Email Queue (notification:email):

{
  "job_id": "job_abc123",
  "type": "email",
  "template": "order_confirmation",
  "recipient": "user@example.com",
  "payload": {
    "order_id": "ORD-2026-001234",
    "match_name": "Croatia vs Italy",
    "tickets": [
      {"sector": "West A", "row": "12", "seat": "15"}
    ],
    "total_amount": "450.00 EUR"
  },
  "created_at": "2026-09-15T14:00:00Z",
  "attempts": 0,
  "max_attempts": 3
}

Push Queue (notification:push):

{
  "job_id": "job_xyz789",
  "type": "push",
  "template": "queue_your_turn",
  "user_id": "550e8400-e29b-41d4-a716-446655440000",
  "fcm_tokens": ["token1", "token2"],
  "payload": {
    "title": "Your turn!",
    "body": "You have 20 minutes to complete your purchase",
    "data": {
      "match_id": "660e8400-e29b-41d4-a716-446655440001",
      "action": "open_checkout"
    }
  },
  "priority": "high",
  "created_at": "2026-09-15T14:00:00Z"
}

Priority Push Queue (notification:push:priority):

Used for time-critical messages like "your turn" notifications. Workers poll this queue first.
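
The polling order can be sketched as follows. This Python example uses in-memory deques in place of Redis lists, and the function name is hypothetical; it only illustrates that the priority queue is drained before the regular one.

```python
from collections import deque

# Queues in the order workers should drain them.
QUEUE_ORDER = ("notification:push:priority", "notification:push")


def next_batch(queues: dict, limit: int = 500) -> list:
    """Take up to `limit` jobs, exhausting the priority queue before the regular one."""
    batch = []
    for name in QUEUE_ORDER:
        queue = queues.get(name, deque())
        while queue and len(batch) < limit:
            batch.append(queue.popleft())
    return batch


queues = {
    "notification:push:priority": deque(["your_turn_1", "your_turn_2"]),
    "notification:push": deque(["reminder_1", "reminder_2", "reminder_3"]),
}
print(next_batch(queues, limit=3))
```

With this ordering a burst of regular notifications can delay other regular notifications, but never a "your turn" message.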

Rate Limiting

| Provider | Limit | Worker Strategy |
| --- | --- | --- |
| Mailgun | 100 emails/second | Single worker with token bucket |
| Firebase FCM | 500 tokens/request, 1000 req/second | Batch sends, multiple workers |

Email Worker Rate Limiting:

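// Token bucket: capacity 100, refilled at 100 tokens/second (the Mailgun limit)
final class EmailWorker
{
    public function __construct(
        private TokenBucket $bucket,
        private $mailgun, // application-level Mailgun API client
    ) {
    }

    public function process(Job $job): void
    {
        $this->bucket->consume(1); // Blocks until a token is available
        $this->mailgun->send($job->payload);
    }
}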
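
The `TokenBucket` used by the worker is not shown in this document. The following Python sketch illustrates how a blocking token bucket of this kind typically works; the clock and sleep functions are injectable so that the behavior can be tested without waiting, and all names are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Blocking token bucket: `capacity` tokens, refilled continuously."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._sleep = sleep
        self._tokens = capacity  # start full
        self._updated = clock()

    def _refill(self) -> None:
        """Credit tokens for the time elapsed since the last update."""
        now = self._clock()
        earned = (now - self._updated) * self.refill_per_second
        self._tokens = min(self.capacity, self._tokens + earned)
        self._updated = now

    def consume(self, tokens: float = 1) -> None:
        """Block until `tokens` are available, then take them."""
        while True:
            self._refill()
            if self._tokens >= tokens:
                self._tokens -= tokens
                return
            # Sleep just long enough for the shortfall to be refilled.
            self._sleep((tokens - self._tokens) / self.refill_per_second)


bucket = TokenBucket(capacity=100, refill_per_second=100)
bucket.consume()  # returns immediately while tokens remain
```

With capacity 100 and a refill rate of 100 per second, a burst of 100 emails goes out immediately and every further send waits about 10 ms, which keeps the sustained rate at the provider limit.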

Push Worker Batching:

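// Batch up to 500 tokens per Firebase request
class PushWorker
{
    private const MAX_TOKENS_PER_REQUEST = 500;

    public function processBatch(): void
    {
        // Poll the priority queue first, then the regular queue.
        // LPOP with a count requires Redis 6.2+; an empty list yields a falsy result.
        $raw = $this->redis->lPop('notification:push:priority', 500)
            ?: $this->redis->lPop('notification:push', 500)
            ?: [];
        $jobs = array_map(fn (string $json) => json_decode($json, true), $raw);

        // Arrays cannot be PHP array keys, so groupByFcmTokens() must return
        // a list of [tokens, notification] pairs rather than a keyed map.
        foreach ($this->groupByFcmTokens($jobs) as [$tokens, $notification]) {
            foreach (array_chunk($tokens, self::MAX_TOKENS_PER_REQUEST) as $chunk) {
                $this->firebase->sendMulticast($chunk, $notification);
            }
        }
    }
}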

Retry Strategy

| Attempt | Delay | Action |
| --- | --- | --- |
| 1 | Immediate | First try |
| 2 | 30 seconds | Retry |
| 3 | 5 minutes | Final retry |
| Failed | - | Move to dead letter queue, alert ops |
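
The schedule above maps directly onto a small lookup. A Python sketch, with hypothetical names, that decides what to do after a given number of failed attempts:

```python
# Delay in seconds before each attempt, taken from the table above.
RETRY_DELAYS_SECONDS = {1: 0, 2: 30, 3: 300}
MAX_ATTEMPTS = 3


def next_action(failed_attempts: int) -> tuple:
    """Return ("retry", delay_seconds) or ("dead_letter", 0) after `failed_attempts` failures."""
    next_attempt = failed_attempts + 1
    if next_attempt > MAX_ATTEMPTS:
        return ("dead_letter", 0)
    return ("retry", RETRY_DELAYS_SECONDS[next_attempt])


for failures in range(4):
    print(failures, next_action(failures))
```

The `attempts` and `max_attempts` fields in the email job schema carry the state this function needs, so a worker can make the decision without any extra bookkeeping.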

Monitoring

Workers expose metrics for Prometheus:

# Email worker metrics
notification_email_sent_total{template="order_confirmation"} 1234
notification_email_failed_total{template="order_confirmation"} 12
notification_email_queue_size 45

# Push worker metrics
notification_push_sent_total{template="queue_your_turn"} 98765
notification_push_latency_seconds_bucket{le="0.5"} 95000
notification_push_queue_size 1200

Event Catalog

Complete list of events for cross-service communication:

| Event | Publisher | Consumer | Transport | Trigger |
| --- | --- | --- | --- | --- |
| queue.turn_granted | Queue Service | Monolith (E4) | Redis Pub/Sub | User reaches front of queue |
| queue.expired | Queue Service | Monolith (E4) | Redis Pub/Sub | Purchase window times out |
| order.completed | Monolith (E4) | Email Worker | Redis Queue | Successful checkout |
| ticket.generated | Monolith (E9) | Email Worker, Push Worker | Redis Queue | Ticket issued |
| quota.invitation | Monolith (E7) | Email Worker, Push Worker | Redis Queue | Quota created |
| queue.position_update | Monolith (E4) | Push Worker | Redis Queue | Batch position updates |
| payment.webhook | Stripe | Monolith (E8) | HTTP → Queue | Payment status change |

Deployment Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                           Kubernetes Cluster                             │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                    Ingress Controller                            │   │
│  │         api.hns.hr          queue.hns.hr                        │   │
│  └─────────┬───────────────────────┬───────────────────────────────┘   │
│            │                       │                                    │
│            ▼                       ▼                                    │
│  ┌─────────────────────┐  ┌─────────────────────┐                      │
│  │  HNS Monolith       │  │  Queue Service      │                      │
│  │  (PHP/Symfony)      │  │  (Node.js/Go)       │                      │
│  │  Replicas: 5-20     │  │  Replicas: 3-20     │                      │
│  └─────────┬───────────┘  └─────────┬───────────┘                      │
│            │                        │                                   │
│  ┌─────────┼────────────────────────┼─────────────────────────────┐    │
│  │         ▼                        ▼                              │    │
│  │  ┌─────────────────────────────────────────────────────────┐   │    │
│  │  │              Redis Cluster (3 nodes)                     │   │    │
│  │  │   - Queue positions (sorted sets)                        │   │    │
│  │  │   - Notification jobs (lists)                            │   │    │
│  │  │   - Session cache (strings with TTL)                     │   │    │
│  │  │   - Pub/sub channels                                     │   │    │
│  │  └─────────────────────────────────────────────────────────┘   │    │
│  │                                                                 │    │
│  │  ┌─────────────────────────────────────────────────────────┐   │    │
│  │  │              PostgreSQL (Primary + Replica)              │   │    │
│  │  │   - Tickets, orders, quotas                              │   │    │
│  │  │   - User profiles, blacklist                             │   │    │
│  │  │   - Seat inventory                                       │   │    │
│  │  └─────────────────────────────────────────────────────────┘   │    │
│  │                        Data Layer                               │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                    Worker Pods                                   │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │   │
│  │  │ Email Worker │  │ Push Worker  │  │ Push Worker  │           │   │
│  │  │ (1 replica)  │  │ #1           │  │ #2           │           │   │
│  │  └──────────────┘  └──────────────┘  └──────────────┘           │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Failure Scenarios

Queue Service Failure

| Scenario | Impact | Mitigation |
| --- | --- | --- |
| Single pod crash | ~5k users lose their WebSocket connection | Kubernetes auto-restart in < 5s |
| Redis unavailable | All queue operations fail | Redis Cluster with replicas |
| Network partition | Split-brain positions | Each queue key has a single master, which prevents divergent positions |

Recovery: Users with active WebSocket connections receive an error and reconnect. Their position is preserved in Redis for 30 minutes.

Notification Worker Failure

| Scenario | Impact | Mitigation |
| --- | --- | --- |
| Worker crash | Jobs pause | Supervisor restarts worker |
| Mailgun outage | Emails queued | Retry with backoff, dead letter queue |
| Firebase outage | Push queued | Same as email |

Recovery: Jobs that have not yet been dequeued remain in the Redis queue. The worker restarts and continues processing.



Last Updated: January 2026