SLANUSZ-25: Photo capture is not working again

P1 — Production screenshot capture broken for all NUSZ operators

Summary

Operators at NUSZ cannot take screenshots during videochat sessions. All NUSZ operators share a single corporate NAT IP (91.82.81.14), so the IpFilterService rate limiter counts their screenshot requests as if they came from one client: once any combination of operators makes 5 screenshot attempts within 10 seconds, ALL operators are blocked. The cooldown is 10 minutes, which produces recurring 10-15 minute outage windows.

A secondary issue compounds this: 698 RPC transport timeouts on the rpc-transport-css channel indicate that vuer_css is frequently unresponsive to RPC calls, including screenshot requests, most likely because customer WebSocket connections drop.

Original Complaint

“Sziasztok! Kérelk nézzétek meg mi lehet a probléma. Ismét nem tududnk fotókat készíteni vagy ha mégis akkor aaz alábbi hibaüzenet érekezik a rendszertől” — SzaboneNagy.Zsuzsa

Translation: “Hi! Please look at what the problem could be. Again we can’t take photos or if we do, the following error message arrives from the system.”

Root Cause 1: IP-Based Rate Limiting (PRIMARY)

Evidence Chain

Confirmed with high confidence

  1. server/web/api/operator/screenshot.js:10 applies createIpMonitorAndFilter('videochat-operator-screenshot')
  2. IpFilterService (server/service/IpFilterService.js) tracks requests by {IP}:{tag} key
  3. Config: suppressAfterAttempts=5, throttlePeriodMs=10000 (10s), coolDownPeriodMs=600000 (10min)
  4. uncheckedIps is empty (no whitelist)
  5. All NUSZ operators exit through corporate NAT: 91.82.81.14
  6. After 5 screenshots from ANY combination of operators within 10s → ALL operators blocked for 10 minutes
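To make the shared-key effect concrete, here is a minimal sketch of a fixed-window limiter keyed the same way ({IP}:{tag}) and using the thresholds from item 3. SketchIpFilter is a hypothetical stand-in, not the real IpFilterService, whose windowing details may differ; like the flow diagram, it lets 5 requests through and blocks the 6th.

```javascript
// Hypothetical sketch, NOT the real IpFilterService: a fixed-window counter
// keyed by `${ip}:${tag}`, with suppress/throttle/cooldown thresholds.
class SketchIpFilter {
  constructor({ suppressAfterAttempts, throttlePeriodMs, coolDownPeriodMs }) {
    this.limit = suppressAfterAttempts;
    this.windowMs = throttlePeriodMs;
    this.coolDownMs = coolDownPeriodMs;
    this.state = new Map(); // key -> { windowStart, count, blockedUntil }
  }

  check(ip, tag, now = Date.now()) {
    const key = `${ip}:${tag}`; // the operator's identity is not part of the key
    let s = this.state.get(key);
    if (s && s.blockedUntil) {
      if (now < s.blockedUntil) return { allowed: false, retryAfterMs: s.blockedUntil - now };
      s = undefined; // cooldown over: start a fresh window
    }
    if (!s || now - s.windowStart >= this.windowMs) {
      s = { windowStart: now, count: 0, blockedUntil: 0 };
      this.state.set(key, s);
    }
    s.count += 1;
    if (s.count > this.limit) {
      s.blockedUntil = now + this.coolDownMs;
      return { allowed: false, retryAfterMs: this.coolDownMs };
    }
    return { allowed: true, retryAfterMs: 0 };
  }
}

// Replay of the flow-diagram sequence: two operators, one NAT IP, requests 1 s apart.
const filter = new SketchIpFilter({ suppressAfterAttempts: 5, throttlePeriodMs: 10000, coolDownPeriodMs: 600000 });
const NAT_IP = "91.82.81.14";
const TAG = "videochat-operator-screenshot";
const senders = ["OpA", "OpA", "OpB", "OpA", "OpB", "OpA", "OpB"];
const results = senders.map((op, i) => ({ op, ...filter.check(NAT_IP, TAG, i * 1000) }));
// Requests 1-5 pass; request 6 (OpA) trips the limit and request 7 (OpB) is rejected too.
```

OpB made only two earlier requests, yet it is blocked for the full cooldown because the counter belongs to the IP, not to the operator.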

Audit Log Evidence

| Timestamp | Target User | IP | Type |
|---|---|---|---|
| 2026-01-30T08:37:41 | 63 | 91.82.81.14 | videochat-operator-screenshot |
| 2026-02-27T08:28:01 | 9 | 91.82.81.14 | videochat-operator-screenshot |
| 2026-03-13T10:32:38 | 74 | 91.82.81.14 | videochat-operator-screenshot |
| 2026-03-16T12:54:51 | 56 | 91.82.81.14 | videochat-operator-screenshot |

4 different operators, same IP, same throttle type.
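This pattern (several users, one IP, one throttle type) can be flagged mechanically when reviewing throttle events. A minimal sketch, assuming the audit records can be exported as objects with the same four fields as the table; the field names and sharedIpSuspects() are hypothetical, not the auditlog.js record format:

```javascript
// Sketch: find IP + throttle-type pairs whose events span several distinct users,
// which points to a shared NAT address rather than a single abusive client.
// Event objects mirror the audit table above (assumed shape).
const throttleEvents = [
  { timestamp: "2026-01-30T08:37:41", targetUser: 63, ip: "91.82.81.14", type: "videochat-operator-screenshot" },
  { timestamp: "2026-02-27T08:28:01", targetUser: 9, ip: "91.82.81.14", type: "videochat-operator-screenshot" },
  { timestamp: "2026-03-13T10:32:38", targetUser: 74, ip: "91.82.81.14", type: "videochat-operator-screenshot" },
  { timestamp: "2026-03-16T12:54:51", targetUser: 56, ip: "91.82.81.14", type: "videochat-operator-screenshot" },
];

function sharedIpSuspects(events, minDistinctUsers = 2) {
  const usersByKey = new Map(); // "ip|type" -> Set of user ids
  for (const e of events) {
    const key = `${e.ip}|${e.type}`;
    if (!usersByKey.has(key)) usersByKey.set(key, new Set());
    usersByKey.get(key).add(e.targetUser);
  }
  const suspects = [];
  for (const [key, users] of usersByKey) {
    if (users.size < minDistinctUsers) continue;
    const [ip, type] = key.split("|");
    suspects.push({ ip, type, distinctUsers: users.size });
  }
  return suspects;
}

// For the four events above this yields one suspect: 91.82.81.14 with 4 distinct users.
console.log(sharedIpSuspects(throttleEvents));
```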

Flow Diagram

```mermaid
sequenceDiagram
    participant OpA as Operator A
    participant OpB as Operator B
    participant API as screenshot.js
    participant IPF as IpFilterService
    participant Audit as AuditLog

    Note over IPF: Tracking: 91.82.81.14:videochat-operator-screenshot

    OpA->>API: POST /api/screenshot (attempt 1)
    API->>IPF: check(91.82.81.14, screenshot)
    IPF-->>API: OK (1/5)

    OpA->>API: POST /api/screenshot (attempt 2)
    IPF-->>API: OK (2/5)

    OpB->>API: POST /api/screenshot (attempt 3)
    IPF-->>API: OK (3/5)

    OpA->>API: POST /api/screenshot (attempt 4)
    IPF-->>API: OK (4/5)

    OpB->>API: POST /api/screenshot (attempt 5)
    IPF-->>API: OK (5/5)

    OpA->>API: POST /api/screenshot (attempt 6)
    API->>IPF: check(91.82.81.14, screenshot)
    IPF->>Audit: user.user.throttled
    IPF-->>API: BLOCKED (cooldown 10min)
    API-->>OpA: ERROR_FKIPFT02

    Note over IPF: ALL operators from 91.82.81.14 blocked for 10 minutes

    OpB->>API: POST /api/screenshot
    IPF-->>API: BLOCKED
    API-->>OpB: ERROR_FKIPFT02
```

Root Cause 2: RPC Transport Timeouts (SECONDARY)

142 screenshot failures; 698 transport timeouts in total

Independently of throttling, screenshots also fail when the RPC call from vuer_oss to vuer_css (rpc-transport-css) times out:

```
Failed to create remote screenshot Error: RPCCLIENT MESSAGE TIMEOUT rpc-transport-css
```

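Failure distribution by day (top 8 days, 77 of the 142 screenshot failures)

| Date | Screenshot Failures |
|---|---|
| 2026-02-09 | 17 |
| 2026-02-23 | 13 |
| 2026-02-27 | 10 |
| 2026-02-04 | 9 |
| 2026-01-21 | 9 |
| 2026-03-13 | 7 |
| 2026-03-16 | 6 |
| 2026-03-10 | 6 |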
2026-03-106

CSS-side evidence

  • TransportError events in vuer_css logs
  • Customer disconnections with reason “transport close” (“the user has lost connection, or the network was changed from WiFi to 4G”)
  • UNABLE TO MATCH RPC REPLY warnings — responses arrive after timeout
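In a correlation-id RPC client, the timeout and the UNABLE TO MATCH RPC REPLY warning are two ends of the same slow reply: the caller gives up and forgets the request id, so the reply that arrives later has nothing to match. The sketch below assumes that design; SketchRpcClient and its send/onReply hooks are hypothetical and not the code in server/queue/rpc_client/.

```javascript
// Hypothetical correlation-id RPC client (not the real rpc_client).
class SketchRpcClient {
  constructor(channel, send) {
    this.channel = channel;   // e.g. "rpc-transport-css"
    this.send = send;         // assumed transport hook: (message) => void
    this.pending = new Map(); // correlation id -> { resolve, timer }
    this.nextId = 1;
    this.unmatched = 0;       // replies that arrived after their request was dropped
  }

  call(payload, timeoutMs) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id); // forget the request: a late reply can no longer be matched
        reject(new Error(`RPCCLIENT MESSAGE TIMEOUT ${this.channel}`));
      }, timeoutMs);
      this.pending.set(id, { resolve, timer });
      this.send({ id, payload });
    });
  }

  onReply(id, result) {
    const entry = this.pending.get(id);
    if (!entry) {
      this.unmatched += 1;
      console.warn(`UNABLE TO MATCH RPC REPLY ${id}`);
      return;
    }
    clearTimeout(entry.timer);
    this.pending.delete(id);
    entry.resolve(result);
  }
}

// Simulation: vuer_css answers after 50 ms, but the caller only waits 20 ms.
const outbox = [];
const client = new SketchRpcClient("rpc-transport-css", (msg) => outbox.push(msg));
let failure = null;
client.call({ op: "screenshot" }, 20).catch((err) => { failure = err.message; });
setTimeout(() => client.onReply(outbox[0].id, "image-bytes"), 50);
```

One slow vuer_css response therefore shows up twice in the logs: once as a timeout on the vuer_oss side and once as an unmatched reply.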

Likely cause

The customer’s browser disconnects (network change such as WiFi→4G, or poor connectivity) before the screenshot RPC completes: the operator requests a remote screenshot → vuer_oss sends an RPC to vuer_css → vuer_css tries to capture a frame from the customer’s WebRTC stream → the customer is disconnected → timeout.
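If customers often reconnect within a few seconds, a bounded retry around the screenshot RPC (the “retry logic” option in fix 5) may recover some of these failures. A minimal sketch; retryWithBackoff(), withTimeout() and the simulated attempt function are hypothetical, and the timeout and delay values are placeholders, not measured figures:

```javascript
// Reject if `promise` does not settle within `ms`; always clear the timer afterwards.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Try attemptFn up to `attempts` times with exponential backoff between tries.
async function retryWithBackoff(attemptFn, { attempts = 3, timeoutMs = 5000, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(attemptFn(i), timeoutMs, `attempt ${i + 1}`);
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i); // 500 ms, 1 s, ... with defaults
    }
  }
  throw lastError;
}

// Simulated customer: offline during attempt 1 (never answers), back for attempt 2.
const flakyScreenshot = (i) =>
  i === 0 ? new Promise(() => {}) : Promise.resolve(`frame-from-attempt-${i + 1}`);

retryWithBackoff(flakyScreenshot, { attempts: 3, timeoutMs: 50, baseDelayMs: 10 })
  .then((frame) => console.log(frame)); // frame-from-attempt-2
```

Assuming the retry runs inside vuer_oss after the request has already passed the IP filter, it would not consume the shared NAT budget; a retry triggered from the operator's browser would go through the filter again and make the throttling worse. The total wait (attempts × timeout plus backoff) also has to fit within whatever request timeout the operator UI applies.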

Affected Components

| Component | File | Role |
|---|---|---|
| IpFilterService | server/service/IpFilterService.js | Rate limiting by IP+tag |
| Screenshot API | server/web/api/operator/screenshot.js:10 | Applies throttle middleware |
| Audit log | server/auditlog.js:98-99 | Records throttle events |
| Config | config/docker.json > security.ipFilterThresholds | Rate limiter thresholds |
| VideoChatService | server/service/VideoChatService.js | processRemoteScreenshot() |
| TransportCss RPC | server/queue/rpc_client/ | rpc-transport-css channel |

Fix Options

| # | Fix | Type | Time | Risk |
|---|---|---|---|---|
| 1 | Whitelist 91.82.81.14 in uncheckedIps | Config | 5 min | Disables ALL rate limiting for that IP |
| 2 | Increase suppressAfterAttempts to 50+ | Config | 5 min | Weakens login protection globally |
| 3 | Per-tag throttle overrides in IpFilterService | Code | 2 h | None (proper separation) |
| 4 | User-based throttling for authenticated endpoints | Refactor | 4 h | Best long-term solution |
| 5 | Investigate RPC transport timeout root cause | Debug | ? | May need timeout increase or retry logic |
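
Example configuration for fix 3. The defaults match the current thresholds; the overrides block is proposed and only takes effect once IpFilterService supports per-tag overrides:

```json
{
  "security": {
    "ipFilter": {
      "suppressAfterAttempts": 5,
      "throttlePeriodMs": 10000,
      "coolDownPeriodMs": 600000,
      "overrides": {
        "videochat-operator-screenshot": {
          "suppressAfterAttempts": 100,
          "throttlePeriodMs": 60000,
          "coolDownPeriodMs": 60000
        }
      }
    }
  }
}
```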
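With per-tag overrides, the service would merge a tag's entry over the defaults when it looks up thresholds. A minimal sketch of that merge, assuming the config shape proposed above; resolveThresholds() is a hypothetical helper and the "login" tag is illustrative:

```javascript
// Hypothetical helper for fix 3: tag-specific values win, omitted fields fall back to defaults.
function resolveThresholds(ipFilterConfig, tag) {
  const { overrides = {}, ...defaults } = ipFilterConfig;
  return { ...defaults, ...(overrides[tag] || {}) };
}

// Mirrors security.ipFilter from the proposed config.
const ipFilterConfig = {
  suppressAfterAttempts: 5,
  throttlePeriodMs: 10000,
  coolDownPeriodMs: 600000,
  overrides: {
    "videochat-operator-screenshot": {
      suppressAfterAttempts: 100,
      throttlePeriodMs: 60000,
      coolDownPeriodMs: 60000,
    },
  },
};

const screenshotThresholds = resolveThresholds(ipFilterConfig, "videochat-operator-screenshot");
// → { suppressAfterAttempts: 100, throttlePeriodMs: 60000, coolDownPeriodMs: 60000 }
const loginThresholds = resolveThresholds(ipFilterConfig, "login"); // hypothetical tag
// → { suppressAfterAttempts: 5, throttlePeriodMs: 10000, coolDownPeriodMs: 600000 }
```

Tags without an override keep the strict defaults, which is what separates fix 3 from raising suppressAfterAttempts globally (fix 2).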

Prevention

  • All IP-based rate limiting should be reviewed for NAT-awareness
  • Authenticated endpoints should throttle by req.user.id, not req.ip
  • The rate limiter should return a descriptive error with a proper status code (e.g. HTTP 429 Too Many Requests) instead of HTTP 200 with an error string
  • Consider adding a monitoring alert for user.throttled events
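The user-id keying and the proper-status items above (and fix 4) can be combined in one middleware: key authenticated requests by req.user.id, fall back to the IP only for anonymous requests, and reject with HTTP 429 plus a Retry-After header. A minimal sketch; createUserThrottle(), its thresholds and the error body are hypothetical, and on res it uses only standard Node http.ServerResponse members (statusCode, setHeader, end):

```javascript
// Hypothetical throttle middleware keyed by user id (IP only as a fallback).
function createUserThrottle(tag, { limit = 5, windowMs = 10000, coolDownMs = 600000 } = {}) {
  const buckets = new Map(); // key -> { windowStart, count, blockedUntil }

  return function userThrottle(req, res, next) {
    const now = Date.now();
    const principal = req.user && req.user.id != null ? `user:${req.user.id}` : `ip:${req.ip}`;
    const key = `${principal}:${tag}`;
    let b = buckets.get(key);
    if (b && b.blockedUntil && now >= b.blockedUntil) b = undefined; // cooldown over
    if (!b || (!b.blockedUntil && now - b.windowStart >= windowMs)) {
      b = { windowStart: now, count: 0, blockedUntil: 0 };
      buckets.set(key, b);
    }
    if (!b.blockedUntil) {
      b.count += 1;
      if (b.count > limit) b.blockedUntil = now + coolDownMs;
    }
    if (b.blockedUntil) {
      res.statusCode = 429;
      res.setHeader("Retry-After", String(Math.ceil((b.blockedUntil - now) / 1000)));
      res.setHeader("Content-Type", "application/json");
      res.end(JSON.stringify({ error: "TOO_MANY_REQUESTS", tag }));
      return;
    }
    next();
  };
}

// Two operators behind the same NAT IP now get independent budgets.
const NAT_IP = "91.82.81.14";
const throttle = createUserThrottle("videochat-operator-screenshot");
const mockRes = () => ({
  statusCode: 200,
  headers: {},
  setHeader(name, value) { this.headers[name] = value; },
  end(body) { this.body = body; },
});

function send(userId) {
  const res = mockRes();
  let ok = false;
  throttle({ ip: NAT_IP, user: { id: userId } }, res, () => { ok = true; });
  return { ok, status: res.statusCode, retryAfter: res.headers["Retry-After"] };
}

const user63 = [1, 2, 3, 4, 5, 6].map(() => send(63)); // 6th call is rejected with 429
const user9 = send(9); // same IP, separate budget: still allowed
```

With this keying, a burst from user 63 only throttles user 63, and the client receives a machine-readable 429 with a retry hint instead of a 200 carrying ERROR_FKIPFT02.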