SLANUSZ-25: Fotózás újra nem működik (Photo capture is not working again)

P1 — Production screenshot capture broken for all NUSZ operators

Summary

Operators at NUSZ cannot take screenshots during videochat sessions. All NUSZ operators share a single corporate NAT IP (91.82.81.14), and the IpFilterService rate limiter counts screenshot requests per IP and endpoint tag, not per operator. Once 5 screenshot attempts from any combination of operators arrive within 10 seconds, ALL operators are blocked. The cooldown is 10 minutes, which produces recurring 10 to 15 minute outage windows.

A secondary issue compounds this: 698 RPC transport timeouts on the rpc-transport-css channel indicate that vuer_css frequently fails to answer RPC calls, including screenshot requests, likely because customer WebSocket connections drop.

Original Complaint

“Sziasztok! Kérelk nézzétek meg mi lehet a probléma. Ismét nem tududnk fotókat készíteni vagy ha mégis akkor aaz alábbi hibaüzenet érekezik a rendszertől” — SzaboneNagy.Zsuzsa

Translation: “Hi! Please look at what the problem could be. Again we can’t take photos or if we do, the following error message arrives from the system.”

Root Cause 1: IP-Based Rate Limiting (PRIMARY)

Evidence Chain

Confirmed with high confidence

server/web/api/operator/screenshot.js:10 applies createIpMonitorAndFilter('videochat-operator-screenshot')
IpFilterService (server/service/IpFilterService.js) tracks requests by {IP}:{tag} key
Config: suppressAfterAttempts=5, throttlePeriodMs=10000 (10s), coolDownPeriodMs=600000 (10min)
uncheckedIps is empty (no whitelist)
All NUSZ operators exit through corporate NAT: 91.82.81.14
After 5 screenshot attempts from ANY combination of operators within 10s → ALL operators blocked for 10 minutes
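To make the shared-bucket effect concrete, here is a minimal sketch of a limiter keyed by `{IP}:{tag}` with the thresholds listed above. It is not the actual IpFilterService code: the class name `IpTagLimiter`, the `check()` method, the sliding-window counting and the choice to block on the 6th attempt are assumptions for illustration; only the key format and the three thresholds come from this investigation. The replay uses the request order from the flow diagram.

```javascript
// Hypothetical sketch of an IP+tag rate limiter; NOT the real IpFilterService.
// Thresholds mirror security.ipFilter: 5 attempts per 10 s window, 10 min cooldown.
class IpTagLimiter {
  constructor({ suppressAfterAttempts, throttlePeriodMs, coolDownPeriodMs }) {
    this.maxAttempts = suppressAfterAttempts;
    this.windowMs = throttlePeriodMs;
    this.coolDownMs = coolDownPeriodMs;
    this.attempts = new Map();     // key -> timestamps of recent attempts
    this.blockedUntil = new Map(); // key -> end of cooldown (ms)
  }

  // Returns true if the request may proceed, false if it is throttled.
  check(ip, tag, now = Date.now()) {
    const key = `${ip}:${tag}`; // the operator is NOT part of the key
    if ((this.blockedUntil.get(key) ?? 0) > now) return false;

    // Keep only attempts inside the sliding window.
    const recent = (this.attempts.get(key) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.maxAttempts) {
      this.blockedUntil.set(key, now + this.coolDownMs);
      this.attempts.delete(key);
      return false;
    }
    recent.push(now);
    this.attempts.set(key, recent);
    return true;
  }
}

// Replay the flow diagram: operators A and B behind the same NAT IP, one request per second.
const limiter = new IpTagLimiter({
  suppressAfterAttempts: 5, throttlePeriodMs: 10000, coolDownPeriodMs: 600000,
});
const NAT_IP = "91.82.81.14";
const TAG = "videochat-operator-screenshot";
const sequence = ["A", "A", "B", "A", "B", "A", "B"];
const results = sequence.map((op, i) => ({ op, ok: limiter.check(NAT_IP, TAG, i * 1000) }));
console.log(results.map((r) => `${r.op}:${r.ok ? "OK" : "BLOCKED"}`).join(" "));
// prints "A:OK A:OK B:OK A:OK B:OK A:BLOCKED B:BLOCKED"
```

Operator B is blocked on its next request even though B itself sent only two, because both operators feed the same key.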

Audit Log Evidence

Timestamp	Target User	IP	Type
2026-01-30T08:37:41	63	91.82.81.14	videochat-operator-screenshot
2026-02-27T08:28:01	9	91.82.81.14	videochat-operator-screenshot
2026-03-13T10:32:38	74	91.82.81.14	videochat-operator-screenshot
2026-03-16T12:54:51	56	91.82.81.14	videochat-operator-screenshot

4 different operators, same IP, same throttle type.
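The pattern in the table can be checked mechanically by grouping throttle entries per limiter key and counting distinct target users. The sketch below hard-codes the four rows above; the record field names (`ts`, `userId`, `ip`, `type`) and the helper `usersPerThrottleKey` are assumptions, not the real audit log schema.

```javascript
// Audit rows from the table above; field names are hypothetical.
const auditRows = [
  { ts: "2026-01-30T08:37:41", userId: 63, ip: "91.82.81.14", type: "videochat-operator-screenshot" },
  { ts: "2026-02-27T08:28:01", userId: 9,  ip: "91.82.81.14", type: "videochat-operator-screenshot" },
  { ts: "2026-03-13T10:32:38", userId: 74, ip: "91.82.81.14", type: "videochat-operator-screenshot" },
  { ts: "2026-03-16T12:54:51", userId: 56, ip: "91.82.81.14", type: "videochat-operator-screenshot" },
];

// Group throttle events by the limiter key ({IP}:{tag}) and collect distinct users per key.
function usersPerThrottleKey(rows) {
  const groups = new Map();
  for (const { ip, type, userId } of rows) {
    const key = `${ip}:${type}`;
    if (!groups.has(key)) groups.set(key, new Set());
    groups.get(key).add(userId);
  }
  return groups;
}

for (const [key, users] of usersPerThrottleKey(auditRows)) {
  // Several distinct users under one key means they were throttled through a shared bucket.
  console.log(`${key} -> ${users.size} distinct users: ${[...users].join(", ")}`);
}
// prints "91.82.81.14:videochat-operator-screenshot -> 4 distinct users: 63, 9, 74, 56"
```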

Flow Diagram

sequenceDiagram
    participant OpA as Operator A
    participant OpB as Operator B
    participant API as screenshot.js
    participant IPF as IpFilterService
    participant Audit as AuditLog

    Note over IPF: Tracking: 91.82.81.14:videochat-operator-screenshot

    OpA->>API: POST /api/screenshot (attempt 1)
    API->>IPF: check(91.82.81.14, screenshot)
    IPF-->>API: OK (1/5)

    OpA->>API: POST /api/screenshot (attempt 2)
    IPF-->>API: OK (2/5)

    OpB->>API: POST /api/screenshot (attempt 3)
    IPF-->>API: OK (3/5)

    OpA->>API: POST /api/screenshot (attempt 4)
    IPF-->>API: OK (4/5)

    OpB->>API: POST /api/screenshot (attempt 5)
    IPF-->>API: OK (5/5)

    OpA->>API: POST /api/screenshot (attempt 6)
    API->>IPF: check(91.82.81.14, screenshot)
    IPF->>Audit: user.user.throttled
    IPF-->>API: BLOCKED (cooldown 10min)
    API-->>OpA: ERROR_FKIPFT02

    Note over IPF: ALL operators from 91.82.81.14 blocked for 10 minutes

    OpB->>API: POST /api/screenshot
    IPF-->>API: BLOCKED
    API-->>OpB: ERROR_FKIPFT02

Root Cause 2: RPC Transport Timeouts (SECONDARY)

142 screenshot failures; 698 transport timeouts in total

Even when no throttling is active, some screenshots fail because the RPC call from vuer_oss to vuer_css (channel rpc-transport-css) times out:

Failed to create remote screenshot Error: RPCCLIENT MESSAGE TIMEOUT rpc-transport-css

Failure distribution by day

The days listed below account for 77 of the 142 screenshot failures.

Date	Screenshot Failures
2026-02-09	17
2026-02-23	13
2026-02-27	10
2026-02-04	9
2026-01-21	9
2026-03-13	7
2026-03-16	6
2026-03-10	6

CSS-side evidence

TransportError events in vuer_css logs
Customer disconnections with reason transport close (“the user has lost connection, or the network was changed from WiFi to 4G”)
UNABLE TO MATCH RPC REPLY warnings: replies arrive after the caller has already timed out
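Both symptoms fit the usual request/reply pattern over a message queue: the caller keeps a map of pending correlation IDs, rejects on timeout and forgets the ID, so a reply that arrives later has nothing to match. The sketch below is a hypothetical in-memory model of that pattern (the class `RpcClientSketch` and its `call()`/`onReply()` methods are invented names); it is not the code in `server/queue/rpc_client/`, and the error strings only echo the log messages quoted in this section.

```javascript
// Hypothetical in-memory model of a queue-based RPC client; not server/queue/rpc_client/.
class RpcClientSketch {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.pending = new Map(); // correlationId -> { resolve, timer }
    this.nextId = 1;
    this.warnings = [];
  }

  // Register the request, start its timeout, then hand it to `send`.
  call(channel, payload, send) {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(id); // forget the request, so a later reply cannot match
        reject(new Error(`RPCCLIENT MESSAGE TIMEOUT ${channel}`));
      }, this.timeoutMs);
      this.pending.set(id, { resolve, timer });
      send(id, payload);
    });
  }

  // Invoked for every reply taken off the queue.
  onReply(id, result) {
    const entry = this.pending.get(id);
    if (!entry) {
      this.warnings.push(`UNABLE TO MATCH RPC REPLY ${id}`);
      return;
    }
    clearTimeout(entry.timer);
    this.pending.delete(id);
    entry.resolve(result);
  }
}

// Simulate vuer_css answering only after the 100 ms timeout has expired.
const client = new RpcClientSketch(100);
const slowCss = (id) => setTimeout(() => client.onReply(id, "screenshot.png"), 200);
let outcome;

client.call("rpc-transport-css", { op: "screenshot" }, slowCss)
  .then((result) => { outcome = result; })
  .catch((err) => {
    outcome = err.message;
    console.log("Failed to create remote screenshot", err.message);
  });

setTimeout(() => console.log(client.warnings.join("\n")), 300); // prints "UNABLE TO MATCH RPC REPLY 1"
```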

Likely cause

The customer’s browser disconnects (network change such as WiFi→4G, or poor connectivity) before the screenshot RPC completes. The failing path: the operator requests a remote screenshot → vuer_oss sends an RPC to vuer_css → vuer_css tries to capture from the customer’s WebRTC stream → the customer has disconnected → the RPC times out.

Affected Components

Component	File	Role
IpFilterService	`server/service/IpFilterService.js`	Rate limiting by IP+tag
Screenshot API	`server/web/api/operator/screenshot.js:10`	Applies throttle middleware
Audit log	`server/auditlog.js:98-99`	Records throttle events
Config	`config/docker.json` > `security.ipFilter`	Thresholds
VideoChatService	`server/service/VideoChatService.js`	`processRemoteScreenshot()`
TransportCss RPC	`server/queue/rpc_client/`	`rpc-transport-css` channel

Recommended Fixes

#	Fix	Type	Time	Risk
1	Whitelist `91.82.81.14` in `uncheckedIps`	Config	5 min	Disables ALL rate limiting for that IP
2	Increase `suppressAfterAttempts` to 50+	Config	5 min	Weakens login protection globally
3	Per-tag throttle overrides in IpFilterService	Code	2h	None — proper separation
4	User-based throttling for authenticated endpoints	Refactor	4h	Best long-term solution
5	Investigate RPC transport timeout root cause	Debug	?	May need timeout increase or retry logic
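If Fix 5 ends in retry logic, one shape is a bounded retry with a per-attempt timeout and exponential backoff around the vuer_oss → vuer_css call. This is a sketch under assumptions: `withTimeout`, `callWithRetry`, the `callOnce(attempt)` callback and the default timings are illustrative, not existing vuer_oss code. Retrying inside vuer_oss presumably does not pass through the HTTP screenshot endpoint, so it would not consume the operator IP budget; it also cannot help when the customer is really gone, which argues for few attempts and a short total time.

```javascript
// Hypothetical retry helper for a timeout-prone RPC (one option under Fix 5).
// `callOnce(attempt)` is any function returning a Promise; names are illustrative.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout after ${ms} ms (${label})`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function callWithRetry(callOnce, { attempts = 3, timeoutMs = 5000, backoffMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await withTimeout(callOnce(attempt), timeoutMs, `attempt ${attempt}`);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Exponential backoff: backoffMs, 2 * backoffMs, 4 * backoffMs, ...
        await new Promise((resolve) => setTimeout(resolve, backoffMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError; // every attempt failed or timed out
}

// Usage: attempt 1 never answers (stalled stream), attempt 2 succeeds.
const flakyScreenshot = (attempt) =>
  attempt === 1 ? new Promise(() => {}) : Promise.resolve(`screenshot from attempt ${attempt}`);

callWithRetry(flakyScreenshot, { attempts: 3, timeoutMs: 50, backoffMs: 10 })
  .then((result) => console.log(result)); // prints "screenshot from attempt 2"
```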

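Fix 3 Example (Recommended)

The overrides block is a proposed config shape: IpFilterService has to be extended (Fix 3) before it reads per-tag values. The global defaults stay unchanged, so login protection is not weakened.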

{
  "security": {
    "ipFilter": {
      "suppressAfterAttempts": 5,
      "throttlePeriodMs": 10000,
      "coolDownPeriodMs": 600000,
      "overrides": {
        "videochat-operator-screenshot": {
          "suppressAfterAttempts": 100,
          "throttlePeriodMs": 60000,
          "coolDownPeriodMs": 60000
        }
      }
    }
  }
}
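Reading the proposed `overrides` block needs only a small lookup: take the global thresholds and let a per-tag entry replace individual fields. A sketch, assuming the config shape above; the helper name `resolveThrottleSettings` is hypothetical, `"login"` is an illustrative tag name, and the real IpFilterService may structure this differently. With these numbers the screenshot tag tolerates 100 requests per 60 s (about 1.7 per second, shared across the NAT) instead of 5 per 10 s, and a trip costs 1 minute instead of 10.

```javascript
// Proposed config from Fix 3 (the `overrides` key does not exist yet).
const ipFilterConfig = {
  suppressAfterAttempts: 5,
  throttlePeriodMs: 10000,
  coolDownPeriodMs: 600000,
  overrides: {
    "videochat-operator-screenshot": {
      suppressAfterAttempts: 100,
      throttlePeriodMs: 60000,
      coolDownPeriodMs: 60000,
    },
  },
};

// Hypothetical helper: global defaults, with per-tag fields taking precedence.
function resolveThrottleSettings(config, tag) {
  const { overrides = {}, ...defaults } = config;
  return { ...defaults, ...(overrides[tag] ?? {}) };
}

const screenshot = resolveThrottleSettings(ipFilterConfig, "videochat-operator-screenshot");
const login = resolveThrottleSettings(ipFilterConfig, "login"); // illustrative tag: falls back to defaults
console.log(JSON.stringify(screenshot));
// prints {"suppressAfterAttempts":100,"throttlePeriodMs":60000,"coolDownPeriodMs":60000}
console.log(JSON.stringify(login));
// prints {"suppressAfterAttempts":5,"throttlePeriodMs":10000,"coolDownPeriodMs":600000}
```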

Prevention

All IP-based rate limiting should be reviewed for NAT-awareness
Authenticated endpoints should throttle by req.user.id, not req.ip
The rate limiter should return a descriptive error with a rate-limit status code such as HTTP 429, not HTTP 200 with an error string
Consider adding a monitoring alert for user.throttled events
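The first three points can be sketched as a key function plus an explicit rate-limit response. Assumptions: a request object carrying `ip` and an optional `user.id` (as Express with an authentication middleware would populate them), the hypothetical helper names `throttleKey` and `throttledResponse`, and HTTP 429 with a `Retry-After` header, which is the standard rate-limit signalling (RFC 6585, RFC 9110) rather than anything the current code does. Keeping `ERROR_FKIPFT02` in the body is a suggestion so existing clients still recognise the error.

```javascript
// Hypothetical helpers; not part of vuer_oss.
// Authenticated requests are throttled per user, anonymous ones per IP.
function throttleKey(req, tag) {
  const userId = req.user && req.user.id;
  return userId != null ? `user:${userId}:${tag}` : `ip:${req.ip}:${tag}`;
}

// Build an explicit rate-limit response instead of HTTP 200 with an error string.
function throttledResponse(blockedUntilMs, nowMs = Date.now()) {
  const retryAfterSec = Math.max(1, Math.ceil((blockedUntilMs - nowMs) / 1000));
  return {
    status: 429, // Too Many Requests
    headers: { "Retry-After": String(retryAfterSec) },
    body: {
      error: "ERROR_FKIPFT02", // existing code, kept for current clients
      message: `Too many screenshot requests; retry in ${retryAfterSec} s`,
    },
  };
}

// Two operators behind the same NAT now get separate buckets.
const tag = "videochat-operator-screenshot";
const opA = { ip: "91.82.81.14", user: { id: 63 } };
const opB = { ip: "91.82.81.14", user: { id: 9 } };
const anon = { ip: "91.82.81.14" };
console.log(throttleKey(opA, tag));  // prints "user:63:videochat-operator-screenshot"
console.log(throttleKey(opB, tag));  // prints "user:9:videochat-operator-screenshot"
console.log(throttleKey(anon, tag)); // prints "ip:91.82.81.14:videochat-operator-screenshot"
console.log(throttledResponse(600000, 0).headers["Retry-After"]); // prints "600"
```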

Related

FaceKom — Platform overview
vuer_oss — Backend architecture (IpFilterService in Services Layer section)
security-audit — Rate limiting listed as positive finding, but this shows a gap
breakage-risks — NUSZ has 74 core file modifications
customization-branches — NUSZ branch details
debug-agents — Agent pipeline used for this investigation
room-export-blueprint — Room export analysis methodology

Levandor
