vuer_cv (Computer Vision Service)
Role
FaceKom’s computer vision and ML inference service. Provides face detection, face comparison, liveness detection, document recognition, OCR, MRZ reading, barcode detection, hologram verification, anti-spoofing (PAD/deepfake), background masking, and speech detection via HTTP and WebSocket APIs. Communicates with vuer_oss through HTTP/WebSocket, and internally uses Redis RPC between its own ~10 Supervisor-managed processes.
| Property | Value |
|---|---|
| Runtime | Python (uWSGI + asyncio) |
| HTTP Framework | Falcon (WSGI) |
| Inference | ONNX Runtime (GPU/CPU), PyTorch, Detectron2 |
| Repo | TechTeamer/vuer_cv |
| Path (remote) | /workspace/vuer_cv |
| Path (local mount) | /Users/levander/coding/mnt/Facekom/vuer_cv |
| Models | 16 ONNX + 5 non-ONNX (Git LFS) |
| Processes | ~10 Supervisor-managed |
Table of Contents
- Architecture Overview
- Entry Point & Process Model
- ONNX Runtime & Model Loading
- Redis RPC System
- Image Cache (AppCache)
- CV Engines
- HTTP API (REST Endpoints)
- WebSocket System (Real-Time)
- Document Definitions
- Face Pipeline
- Document Pipeline
- Liveness Detection
- Anti-Spoofing (PAD & Deepfake)
- Hologram Detection
- OCR System
- Utility Layer
- Config System
- Security Analysis
- Performance Patterns
- Code Smells & Technical Debt
- Magic Numbers & Hardcoded Thresholds
- Complete File Index
Architecture Overview
vuer_cv is a multi-process Python service. Each process loads different ONNX ML models and communicates via Redis RPC. The HTTP/WebSocket servers are the entry points; satellite processes handle face recognition, document detection, OCR, MRZ reading, text detection, background masking, PAD/deepfake detection, and speech detection.
```
+------------------+
|  vuer_cv (main)  |
|  HTTP (Falcon)   |
|  WebSocket (ws)  |
+--------+---------+
         |
   Redis RPC (blpop)
         |
+-----------+-----------+-----------+-----------+
|           |           |           |           |
app_face.py app_card*.py app_ocr.py app_pad.py app_onnx.py
(face RPC)  (card RPC)  (OCR RPC)  (PAD RPC)  (bg/speech)
```
Supervisor Processes
| RPC Queue | Purpose | Entry Point |
|---|---|---|
| face | Face detection, landmarks, encoding, gender/age, sunglasses | app_face.py |
| cardDetector | Card corner detection, classification, integrity | app_card_detector.py |
| ocr | Document OCR, generic OCR, KAPTCHA | app_ocr.py |
| mrz | MRZ reading from documents | app_mrz.py |
| textDetector | Text region detection in images | app_text_detector.py |
| detectron2 | Barcode detection (via Detectron2) | app_detectron2.py |
| pad | Deepfake detection, PAD (paper/screen) scoring | app_pad.py |
| onnx | Background masking, speech detection | app_onnx.py |
| - | HTTP API (uWSGI) | app_http.py |
| - | WebSocket real-time processing | app_websocket.py |
All managed by Supervisor with Nginx reverse proxy (config generated at runtime via Jinja2 templates from scaling settings).
Entry Point & Process Model
Canonical startup sequence (using app_face.py as example)
1. Import config from `server/cfg.py`
2. Call `checkGpuEnabled(config, "face")` — checks if GPU should be disabled per scaling config
3. Import ONNX models (face_detection, face_landmark_detection, face_encoder, gender_age_prediction, sunglasses_detection)
4. Connect to Redis, create AppCache
5. Define decorated functions using `@LazyImage.convert` (makes them callable via RPC with just an imageId)
6. Run model warmup (5 runs by default, configurable via the `WARMUP_NUM_RUNS` env var)
7. Start the Redis RPC server
Scaling
If config.scaling.face.workers > 1, uses ThreadPoolExecutor to run multiple RPC server instances in parallel within the same process (sharing models in memory).
Face RPC Functions
| Function | Purpose |
|---|---|
| getLocations(imageId, threshold) | Face bounding boxes |
| getLandmarks(imageId) | 98-point face landmarks |
| getEncodings(imageId) | Face embeddings (for comparison) |
| getSunglassesScores(imageId) | Sunglasses detection probability |
| getGenderAgePredictions(imageId, box) | Gender probability + age |
| calcFacePositions(imageId) | Distance from image center |
| calcFaceDistance(enc1, enc2) | Cosine distance between face embeddings |
| detectFace(imageId) | Combined detection + encoding |
| drawFace(imageId) | Debug visualization with landmark markers |
| compareFaces(img1, img2) | Full compare pipeline |
Caching Strategy
Results are memoized per-image via LazyImage.getMeta() which stores computed results in Redis (keyed by metadata:{imageId}:{metaName}). Repeated calls for the same image ID return cached landmarks/encodings/etc.
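This memoization pattern can be sketched with a plain dict standing in for Redis. The `metadata:{imageId}:{metaName}` key layout and pickle serialization are from the source; `MetaCache`, `get_meta`, and the fake landmark function are illustrative names.

```python
# Minimal sketch of per-image memoization; a dict stands in for Redis
# (the real cache uses SETEX with a 60s TTL on the same key layout).
import pickle

class MetaCache:
    def __init__(self):
        self._store = {}

    def get_meta(self, image_id, meta_name, compute):
        key = f"metadata:{image_id}:{meta_name}"
        if key in self._store:
            return pickle.loads(self._store[key])  # cache hit: skip inference
        result = compute()                          # cache miss: run the model
        self._store[key] = pickle.dumps(result)
        return result

calls = []
def fake_landmarks():
    calls.append(1)
    return [[0.1, 0.2]] * 98  # pretend 98-point landmark output

cache = MetaCache()
a = cache.get_meta("img-1", "landmarks", fake_landmarks)
b = cache.get_meta("img-1", "landmarks", fake_landmarks)  # served from cache
assert a == b and len(calls) == 1
```

Repeated RPC calls for the same imageId therefore pay the model cost only once per TTL window.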
ONNX Runtime & Model Loading
GPU/CPU Selection (onnx/utils.py)
DeviceMode enum: cpu, gpu, force_gpu
createInferenceSession() handles GPU fallback logic:
- `INFERENCE_DEVICE_MODE` env var controls the mode
- `gpu` mode: tries CUDA, falls back to CPU on error (default)
- `force_gpu`: no fallback, crashes if GPU unavailable
- `cpu`: CPU only
- GPU detection: calls `nvidia-smi -L` and checks the output
- Supports TensorRT acceleration if `useTensorRt=True`
runSession() has two execution paths:
- CPU: direct `session.run()`
- GPU: uses `io_binding()` for zero-copy GPU memory transfers (`bind_cpu_input` → `copy_outputs_to_cpu`)
Global env mutation
`checkGpuEnabled()` mutates `os.environ` globally when disabling GPU for a specific model, affecting all subsequent model loads in the same process.
Mixins
- `NdarrayNormalizerMixin`: ImageNet-style normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- `AspectPreservingResizePadMixin`: resize with letterboxing to square, returns scale/offset for coordinate back-mapping
Base Class (onnx/ml_model_runner.py)
Abstract base class for all ONNX model runners. Constructor loads model via createInferenceSession() and runs random warmup. Subclasses implement _setup() (returns warmup shape/dtype) and _runSession().
ONNX Models (16 total)
| Model | Purpose | Used By |
|---|---|---|
| face_detection | Face bounding box detection | app_face.py |
| face_landmark_detection | 98-point facial landmarks | app_face.py |
| face_encoder | Face embedding generation | app_face.py |
| gender_age_prediction | Gender probability + age estimation | app_face.py |
| sunglasses_detection | Sunglasses detection | app_face.py |
| localization | Document corner detection | card detector |
| card_classification | Document type classification | card detector |
| card_integrity | Document tampering detection | card detector |
| text_detection | Text region detection (CRAFT-like) | text detector |
| lstm | Character recognition (CRNN) | OCR process |
| barcode_detection | Barcode region detection (Detectron2) | detectron2 |
| kaptcha | CAPTCHA solving | OCR process |
| deepfake_detection | Video deepfake detection | PAD process |
| pad_prediction | Presentation attack detection (paper/screen) | PAD process |
| background_masking | Person segmentation / background removal | onnx process |
| speechDetection | Voice activity detection | onnx process |
Non-ONNX Models (CPU only)
| Model | Function | Runtime |
|---|---|---|
| card_detector_classification_svm.pkl | SVM classification | Scikit-learn |
| easyOcr/cyrillic_g2.pth | Serbian Cyrillic OCR | EasyOCR (PyTorch) |
| easyOcr/latin_g2.pth | Serbian Latin OCR | EasyOCR (PyTorch) |
| detectron2_card_detector.pth | Document recognition | Detectron2 (PyTorch) |
| detectron2_barcode_detector.pth | Barcode detection | Detectron2 (PyTorch) |
| tesseract OCR data files | MRZ reading | Passporteye |
GPU Inference Modes
| Mode | Behavior |
|---|---|
| cpu | Force all models to CPU |
| gpu | Try CUDA, fall back to CPU (default) |
| force_gpu | Require CUDA, fail if unavailable |
Requires NVIDIA Container Toolkit on host.
Redis RPC System
Protocol (server/rpc/pyredisrpc.py)
Based on gowhari/pyredisrpc (MIT license), customized.
```mermaid
sequenceDiagram
    participant Client as HTTP/WS Process
    participant Redis
    participant Server as Model Process (e.g. app_face.py)
    Client->>Redis: LPUSH rpc:{serviceName} {id, method, params, tmchk}
    Client->>Redis: SET rpc:{reqId}:tmchk (TTL)
    Server->>Redis: BLPOP rpc:{serviceName}
    Redis-->>Server: request JSON
    Server->>Server: Process (run ONNX model)
    Server->>Redis: Check rpc:{reqId}:tmchk not expired
    Server->>Redis: LPUSH rpc:{reqId} {id, result, error}
    Client->>Redis: BLPOP rpc:{reqId} (timeout)
    Redis-->>Client: response JSON
```
Magic __getattr__: RedisRpcClient uses __getattr__ to dynamically generate method calls. faceRpcClient.detectFace(imageId) becomes call("detectFace", [[imageId], {}]).
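The proxy trick can be sketched in a few lines, with a deque standing in for the Redis list. Only the packed `[[args], {kwargs}]` shape is from the source; `RpcClientSketch` and `FakeRedisQueue` are hypothetical names.

```python
# Sketch of the __getattr__-based RPC client: any unknown attribute becomes a
# remote-call proxy that packs (method, args, kwargs) onto a queue.
import json
from collections import deque

class FakeRedisQueue:
    def __init__(self):
        self.items = deque()
    def lpush(self, payload):
        self.items.appendleft(payload)

class RpcClientSketch:
    def __init__(self, rpc_queue):
        self.rpc_queue = rpc_queue
    def call(self, method, packed):
        # Real client would LPUSH to rpc:{serviceName} and BLPOP the reply list.
        self.rpc_queue.lpush(json.dumps({"method": method, "params": packed}))
    def __getattr__(self, name):
        # Only triggered for attributes that don't exist on the instance.
        return lambda *args, **kwargs: self.call(name, [list(args), kwargs])

client = RpcClientSketch(FakeRedisQueue())
client.detectFace("img-42")
sent = json.loads(client.rpc_queue.items[0])
assert sent == {"method": "detectFace", "params": [["img-42"], {}]}
```

The upside is that the client needs no stub per RPC function; the downside is that typos in method names fail only at the server.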
NumpyEncoder: Custom JSON encoder that converts np.ndarray to lists for serialization. All numpy arrays are serialized to JSON lists for Redis transport.
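The encoder pattern can be sketched with the stdlib alone; duck-typing on `.tolist()` stands in for the `np.ndarray` check the real NumpyEncoder presumably performs, so the sketch runs without numpy.

```python
# NumpyEncoder pattern: a json.JSONEncoder whose default() converts
# array-like objects (anything with .tolist(), e.g. np.ndarray) to lists.
import json
from array import array

class ArrayEncoder(json.JSONEncoder):
    def default(self, o):
        if hasattr(o, "tolist"):
            return o.tolist()
        return super().default(o)

encoding = array("d", [0.1, 0.25, 0.5])  # stand-in for a face-encoding vector
payload = json.dumps({"encoding": encoding}, cls=ArrayEncoder)
assert json.loads(payload) == {"encoding": [0.1, 0.25, 0.5]}
```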
Performance concern
Every RPC call serializes numpy arrays to JSON, passes through Redis, and deserializes. For face encodings (512-float vectors) and landmarks (98x2 floats), this adds latency on every call.
Image Cache (AppCache)
server/appcache.py — Redis-Based Storage
| Key Pattern | Format | TTL | Purpose |
|---|---|---|---|
| raw-image:{uuid} | Original bytes (JPEG/PNG) | 60s | Uploaded images |
| numpy-image:{uuid} | Binary ndarray + 12-byte header (h,w,c as big-endian uint32) | 60s | Decoded images |
| metadata:{uuid}:{name} | Pickled Python objects | 60s | Face locations, encodings, etc. |
| audio-frame:{uuid} | 1B layout + 15B format + 3B sample_rate + payload | 2s | Audio frames |
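The numpy-image layout in the table can be reproduced with `struct`; only the 12-byte big-endian (h, w, c) header is from the source, `pack_image`/`unpack_image` are illustrative helpers.

```python
# Sketch of the numpy-image value layout: a 12-byte header of (h, w, c) as
# big-endian uint32s, followed by the raw pixel payload.
import struct

def pack_image(h, w, c, pixels: bytes) -> bytes:
    return struct.pack(">III", h, w, c) + pixels

def unpack_image(blob: bytes):
    h, w, c = struct.unpack(">III", blob[:12])
    return (h, w, c), blob[12:]

blob = pack_image(2, 2, 3, bytes(range(12)))   # 2x2 RGB dummy image
shape, payload = unpack_image(blob)
assert shape == (2, 2, 3) and len(payload) == 2 * 2 * 3
```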
Lazy conversion: getNumPyImage() auto-converts raw images to numpy if the numpy version doesn’t exist (using imageio_imread with EXIF rotation). Alpha channels are stripped.
Pickle deserialization
Metadata is deserialized with `pickle.loads()` (server/appcache.py:106). If an attacker can write to Redis, they can achieve arbitrary code execution, and the metadata keys are predictable (`metadata:{imageId}:{metaName}`). See security-audit.
CV Engines
FaceEngine (server/cv/face_engine.py)
Computes facial action units and head pose from landmarks. Does NOT do ML inference — uses geometric calculations on the 98-point landmark output.
Computed features:
| Feature | Method | Description |
|---|---|---|
| eyeRatio(face, side) | height/width | Eye aspect ratio — blink detection |
| smileRatio(face) | mouth width / face width | Smile detection |
| mouthRatio(face) | mouth height / width | Mouth openness |
| headX(face) | ear-to-nose distance ratio | Horizontal turn (-1 left to +1 right) |
| headY(face) | ear-line vs nose position | Vertical tilt |
| headTilt(face) | arctan2 of ear positions | Head rotation |
Action detection thresholds:
| Action | Threshold | Metric |
|---|---|---|
| Blink | < 0.21 | Both eye aspect ratios |
| Smile | > 0.51 (or idle + 0.06) | Smile ratio |
| Look left | < -0.6 | headX |
| Look right | > 0.6 | headX |
| Look up | > 0.25 | headY |
| Look down | < -0.3 | headY (asymmetric!) |
Asymmetric thresholds
Look up/down thresholds are asymmetric (0.25 vs 0.3), presumably because looking down produces more distinct landmark displacement.
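The threshold table translates directly into a predicate. This simplified sketch uses the source's cutoffs (including the asymmetric look up/down values) but omits the idle-relative smile variant; the function name is illustrative.

```python
# Action detection from FaceEngine ratios, using the documented thresholds.
def detect_actions(eye_left, eye_right, smile, head_x, head_y):
    actions = []
    if eye_left < 0.21 and eye_right < 0.21:   # both eyes must close for a blink
        actions.append("blink")
    if smile > 0.51:
        actions.append("smile")
    if head_x < -0.6:
        actions.append("look_left")
    elif head_x > 0.6:
        actions.append("look_right")
    if head_y > 0.25:                           # note: asymmetric vs look_down
        actions.append("look_up")
    elif head_y < -0.3:
        actions.append("look_down")
    return actions

assert detect_actions(0.15, 0.18, 0.3, 0.0, 0.0) == ["blink"]
assert detect_actions(0.3, 0.3, 0.6, 0.7, -0.35) == ["smile", "look_right", "look_down"]
```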
DocumentEngine (server/cv/document_engine.py)
Thin facade over RPC calls to cardDetector process:
- `detect(imageId)` — calls `rpcClient.detectCard()` (localization model)
- `classify(imageId)` — calls `rpcClient.classifyCard()` (classification model)
- `checkCardIntegrity(imageId)` — calls `rpcClient.checkCardIntegrity()`
- `resizeToAspectRatio(warpedCardId, documentName)` — resizes warped card to match document template
DocumentOcrEngine (server/cv/document_ocr_engine.py)
Orchestrates OCR for recognized documents. Most complex engine.
Flow:
1. Instantiate document class with image
2. For each requested OCR key:
   a. Try MRZ first (if document has MRZ and key maps to MRZ field)
   b. If MRZ unavailable/invalid, detect text regions (TextDetectorEngine)
   c. Calculate anchor offsets to compensate for document alignment
   d. Crop text ROIs and run OCR
3. Apply corrections (number/letter confusion fixing, regex validation)
4. Retry failed keys with deskewing
ThreadPoolExecutor: Uses cpu_cores/4 workers (min 1, max 8) for parallel OCR.
Anchor system: Documents define “anchor points” — known text positions used to calculate offset between template and actual image, compensating for warping imperfections.
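The anchor idea can be sketched as: compare where the anchor texts were actually detected with where the template expects them, average the difference, and shift every OCR ROI by that offset. The averaging, coordinates, and tolerance handling here are illustrative, not the source's exact math.

```python
# Sketch of anchor-based alignment: mean shift between expected (template)
# and detected anchor centers, applied to each field's ROI.
def anchor_offset(expected, detected):
    dxs = [d[0] - e[0] for e, d in zip(expected, detected)]
    dys = [d[1] - e[1] for e, d in zip(expected, detected)]
    return (sum(dxs) / len(dxs), sum(dys) / len(dys))

def shift_roi(roi, offset):
    x, y, w, h = roi
    return (x + offset[0], y + offset[1], w, h)

# Template says the anchors sit at these (x, y) centers...
expected = [(120, 40), (700, 600)]
# ...but in the warped image they were found slightly shifted.
detected = [(128, 44), (706, 650)]
off = anchor_offset(expected, detected)
assert off == (7.0, 27.0)
assert shift_roi((15, 135, 260, 335), off) == (22.0, 162.0, 260, 335)
```

A real implementation would also reject anchors whose offset exceeds the per-document tolerance (e.g. the 25px mentioned for HUN-AO-03001).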
OcrEngine (server/cv/ocr_engine.py)
OCR backends:
- LSTM OCR (default): Custom CRNN model via ONNX
- EasyOCR rs_cyrillic: For Serbian Cyrillic documents (config-gated)
- EasyOCR rs_latin: For Serbian Latin documents (config-gated)
EasyOCR downloads are disabled (download_enabled=False), models stored at /workspace/vuer_cv/data/easyOcr.
Character confusion correction:
CONFUSED_LETTERS: 0->O, 1->I, 2->Z, 3->B, 4->A, 5->S, 6->G, 7->T, 8->B, 9->Y
CONFUSED_NUMBERS: D->0, Q->0, O->0, I->1, Z->2, A->4, S->5, G->6, T->7, B->8, Y->9
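The two confusion maps above work naturally as `str.translate` tables: numeric fields replace confusable letters with digits, alphabetic fields replace confusable digits with letters. The `correct` wrapper is an illustrative name.

```python
# Confusion-correction maps as translation tables (mappings from the source).
CONFUSED_LETTERS = str.maketrans("0123456789", "OIZBASGTBY")
CONFUSED_NUMBERS = str.maketrans("DQOIZASGTBY", "00012456789")

def correct(text, field_type):
    table = CONFUSED_NUMBERS if field_type == "numbers" else CONFUSED_LETTERS
    return text.translate(table)

assert correct("O12345", "numbers") == "012345"  # O misread in a numeric field
assert correct("NA6Y", "letters") == "NAGY"      # 6 misread in a name field
```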
CardWarpEngine (server/cv/card_warp_engine.py)
Detects document corners and applies perspective transform: onnx.localization.main.detectCorners() → cv2.getPerspectiveTransform → cv2.warpPerspective.
CardClassificationEngine (server/cv/card_classification_engine.py)
Classifies document type from warped card image. CLASSIFICATION_SCORE_THRESHOLD = 0.3 — returns None if confidence < 30%.
CardIntegrityCheckEngine (server/cv/card_integrity_check_engine.py)
Detects document tampering. Returns max anomaly score.
TextDetectorEngine (server/cv/text_detector_engine.py)
Detects text regions in images for OCR. CANVAS_SIZE = 1024 (images resized to this for detection). PADDING = 1% horizontal and vertical added to detected boxes.
Complex ROI logic (getRoisFromMask): Filters text detections by center point within specified ROI, height > 2x padding, blacklist filtering, target height adjustment, and multi-line splitting support.
BarcodeEngine (server/cv/barcode_engine.py)
Uses Detectron2 model via RPC for barcode localization, then pyzbar library for decoding. Tries 6 rotation angles: 0, 15, -15, 30, -30, 45 degrees.
BackgroundMaskEngine (server/cv/background_mask_engine.py)
Person segmentation and background replacement. Alpha blending: result = image * mask + background * (1 - mask). Default white background.
SharpnessEngine (server/cv/sharpness_engine.py)
Image quality via Laplacian variance: cv2.Laplacian(resized, CV_8U).var(), clamped to [0, 255]. Converts to grayscale, rotates to horizontal if portrait, resizes to fit 320x240.
PADDetectionEngine (server/cv/pad_detection_engine.py)
Thin RPC wrapper for deepfake and presentation attack detection:
- `processClipForDeepFake(clip)` — sends video clip frames for deepfake analysis
- `getPADScores(imageId, locations)` — gets paper/screen attack scores per face
KaptchaEngine (server/cv/kaptcha_detect_engine.py)
CAPTCHA solving via OCR process RPC.
StatusEngine (server/cv/status_engine.py)
Health monitoring and hardware status. Health levels: green/yellow/red.
- Checks all Supervisor processes via `supervisorctl status`
- Yellow if any process not RUNNING or started >10s after main process
- Checks WebSocket process logs for “LISTENING” after 24h uptime
- Reports CPU (model, cores, frequency, utilization), GPU (via nvidia-smi), RAM info
HTTP API (REST Endpoints)
Framework: Falcon (WSGI). All endpoints use JSON schema validation.
Face Endpoints
| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /face/detect | FaceDetectResource | Detect faces, encodings, landmarks, actions |
| POST | /face/compare | FaceCompareResource | Compare two face images (cosine distance) |
| POST | /face/draw | FaceDrawResource | Draw landmarks on image |
| POST | /face/age-gender | FaceGenderAgeResource | Predict age and gender |
| POST | /face/reference-extract | ReferenceFaceExtractResource | Extract reference face with quality checks |
Document Endpoints
| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /document/recognition | DocumentRecognitionResourceV2 | Detect + classify document |
| POST | /document/warp | DocumentWarpResourceV2 | Detect + warp + classify + sharpness |
| POST | /document/ocr | DocumentOcrResource | OCR on recognized document |
| POST | /document/card-warp | CardWarpResource | Just corner detection + warp |
| POST | /document/card-integrity | CardIntegrityCheckResource | Tampering score |
| GET | /document/types | DocumentTypesResource | List supported document types |
Other Endpoints
| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /image/upload | ImageUploadResource | Upload image (max 3250x3250, auto-downsize) |
| POST | /image/download | ImageDownloadResource | Download image as PNG |
| POST | /barcode | BarCodeResource | Read barcodes from image |
| POST | /barcode/detect | BarCodeDetectResource | Detect barcode regions |
| POST | /mrz | MrzResource | Read MRZ from image |
| POST | /ocr | OcrResource | Generic OCR |
| POST | /sharpness | SharpnessResource | Calculate image sharpness |
| POST | /kaptcha | KaptchaDecoderResource | Decode CAPTCHA image |
| POST | /background-mask | BackgroundMaskResource | Background removal/replacement |
| GET | /ping | PingResource | Health check |
| GET | /status | StatusResource | Detailed server status |
Image Upload Security
ImageUploadResource:
- MIME type validation via `python-magic` (reads first 1024 bytes)
- Supported types from config (`config.image.supportedMimeTypes`)
- Max resolution 3250x3250, auto-downsized while preserving EXIF
- Images stored in Redis with 60s TTL
WebSocket System (Real-Time)
Architecture
```mermaid
flowchart TD
    Browser["Browser"]
    WS["WSConnection\n(pyee EventEmitter)"]
    SH["StreamHandler\n(WebRTC signaling via aiortc)"]
    FS["FrameStore\n(video/audio frame extraction)"]
    P["Processor\n(multiprocessing.Process per worker)"]
    CVT["CVTask\n(orchestrates pipelines)"]
    Browser <-->|"WebSocket JSON + binary"| WS
    WS --> SH
    SH --> FS
    FS --> P
    P --> CVT
    CVT -->|"results via WS"| Browser
```
WSConnection (server/websocket/wsconnection.py)
Event-based message routing using pyee.asyncio.AsyncIOEventEmitter. Messages are JSON with {type, payload} structure. Binary messages emit “binary” event.
StreamHandler (server/websocket/stream_handler.py)
Handles WebRTC signaling (offer/answer). Uses aiortc for server-side WebRTC. Supports custom ICE servers passed from client or from config. Retry logic: webrtcConnectWithAllocationMismatchRetry() retries 3 times on TURN 437 errors.
FrameStores
| Class | Purpose | Storage |
|---|---|---|
| NumpyFrameStore | Video frames as numpy arrays | Redis NumPy format, 2s TTL |
| RawFrameStore | Video frames as JPEG | Redis raw format, 2s TTL |
| AudioFrameStore | Audio with resampling + accumulation | Redis audio format, 2s TTL |
Processor (server/websocket/processor/processor.py)
Abstract multiprocessing processor using multiprocessing.Queue for IPC:
- `input()` puts an imageId into the input queue (non-blocking)
- Background process reads from the input queue, calls `doProcess()`, puts the result in the output queue
- Frame dropping: if the input queue has items when the process finishes one, drains the queue (keeps only the latest)
- Cleanup: graceful shutdown with sentinel values → SIGTERM → SIGKILL with escalating timeouts
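The frame-dropping policy can be sketched with the stdlib: before processing, drain the input queue so only the newest frame survives. `queue.Queue` stands in for `multiprocessing.Queue` (same `get_nowait`/`Empty` interface); `latest_frame` is an illustrative name.

```python
# Sketch of frame dropping: always process the most recent frame, discarding
# any backlog that accumulated while the previous frame was being processed.
import queue

def latest_frame(q):
    item = q.get()                 # wait for at least one frame
    while True:
        try:
            item = q.get_nowait()  # keep draining; newer frames win
        except queue.Empty:
            return item

q = queue.Queue()
for frame_id in ("f1", "f2", "f3"):
    q.put(frame_id)
assert latest_frame(q) == "f3"     # f1 and f2 were dropped
assert q.empty()
```

This trades completeness for latency, which is the right call for live video where a stale frame has no value.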
CV Task Hierarchy
```mermaid
classDiagram
    AbstractCVTask <|-- BasicCVTask
    BasicCVTask <|-- CVTask
    CVTask <|-- FaceCVTask
    CVTask <|-- DocumentCVTask
    CVTask <|-- LivenessCVTask
    CVTask <|-- LivenessV2CVTask
    CVTask <|-- BarcodeCVTask
    CVTask <|-- MrzCVTask
    CVTask <|-- SharpnessCVTask
    CVTask <|-- SpeechDetectorTask
    CVTask <|-- ActionCVTask
    CVTask <|-- HoloVideoCVTask
    CVTask <|-- HoloImageCVTask
    CVTask <|-- HoloV2CVTask
    AbstractCVTask <|-- PADDetectorCVTask
    class AbstractCVTask {
        +start()
        +stop()
    }
    class BasicCVTask {
        +FrameStore
        +Processor
    }
    class CVTask {
        +StreamHandler (WebRTC)
        +FPSCounter
    }
    class PADDetectorCVTask {
        standalone, no WebRTC
        binary WS chunks
    }
```
StreamPlayerCVTask is completely separate — plays pre-recorded media files via WebRTC.
Key CV Tasks
FaceCVTask: Accepts recognitionType: "smile" | "blink" and optional faceToFind encoding. Detects face, waits for specified action, compares with faceToFind if provided (threshold default 0.6).
DocumentCVTask: Streaming document detection with sharpness check.
PADDetectorCVTask: Receives video as binary WebSocket chunks (not WebRTC). Uses StreamingBuffer with av for demuxing. Builds clips of 8 frames for deepfake detection. PAD scoring weighted by face crop area.
StreamPlayerCVTask: Plays pre-recorded media files via WebRTC. Directory restriction: config.streamPlayerDir must be set and file must be within it.
Document Definitions
AbstractDocument (server/documents/document.py)
Base class for all document definitions. Defines:
- Static document dimensions (width, height in “document coordinates”)
- Face ROI location (for face extraction from ID photos)
- Hologram ROI definitions (position, density, weight)
- MRZ ROI location
- OCR field definitions (ROIs, regex, corrections, whitelists)
- Anchor points for OCR alignment correction
Supported Documents (17 total)
Hungarian (HUN) — 14 documents:
| Class | Name | Type |
|---|---|---|
| HunAo03001 | HUN-AO-03001 | ID card (address side) |
| HunBo03004Back | HUN-BO-03004_BACK | Driving license back |
| HunBo03004Front | HUN-BO-03004_FRONT | Driving license front |
| HunBo04001Front | HUN-BO-04001_FRONT | Driving license front (newer) |
| HunBo05001Back | HUN-BO-05001_BACK | Driving license back (newer) |
| HunBo05001Front | HUN-BO-05001_FRONT | Driving license front (newer) |
| HunBo06001BackPO | HUN-BO-06001_BACK_PO | Driving license back (latest) |
| HunBo06001Front | HUN-BO-06001_FRONT | Driving license front (latest) |
| HunBo07001Front | HUN-BO-07001_FRONT | Driving license front (latest+) |
| HunFo02001Back | HUN-FO-02001_BACK | Personal ID back |
| HunFo02001Front | HUN-FO-02001_FRONT | Personal ID front |
| HunFo04001Back | HUN-FO-04001_BACK | Personal ID back (newer) |
| HunFo04001Front | HUN-FO-04001_FRONT | Personal ID front (newer) |
| HunHo10001 | HUN-HO-10001 | Residence permit |
Serbian (SRB) — 3 documents:
| Class | Name | Type |
|---|---|---|
| SrbAo01001 | SRB-AO-01001 | Serbian ID |
| SrbBo01001Back | SRB-BO-01001_BACK | Driving license back |
| SrbBo01001Front | SRB-BO-01001_FRONT | Driving license front |
Example: HUN-AO-03001 (Hungarian ID card)
- Dimensions: 940 x 650 (document coordinates)
- Face ROI: [15, 135, 260, 335] (x, y, w, h)
- MRZ ROI: [10, 490, 920, 130]
- Hologram ROIs: 2 regions (main hologram + OVI security feature)
- OCR fields: documentId, type, code, lastName, firstName, birthName, nationality, dateOfBirth, sex, placeOfBirth, dateOfIssue, authority, dateOfExpiry
- Anchor-based alignment: Uses documentId and dateOfExpiry as anchor points with max 25px offset tolerance
- Custom corrections: `correctDate()` (Hungarian month names), `correctNationality()` (fuzzy match “MAGYAR/HUNGARIAN”), `correctAddress()` (Hungarian address format), `correctAuthority()` (issuing authority names)
Document Classification Mapping
config.DOCUMENT_CLASS_ID_MAPPING maps document names to integer class IDs (loaded from data/card_detector_classification_classId_mapping.json). The classification model outputs these integer IDs.
Face Pipeline
HTTP Path (batch)
```mermaid
flowchart LR
    Client["Client POST /face/detect"]
    FDR["FaceDetectResource"]
    RPC["Redis RPC -> app_face.py"]
    FD["face_detection ONNX"]
    FL["face_landmark_detection ONNX"]
    FE["face_encoder ONNX"]
    Cache["Redis metadata cache"]
    Client --> FDR --> RPC
    RPC --> FD -->|"bounding boxes"| FL -->|"98-point landmarks"| FE -->|"512-d embeddings"| Cache
```
```
Client POST /face/detect {imageId}
  -> FaceDetectResource
  -> faceRpcClient.detectFace(imageId) [Redis RPC to app_face.py]
  -> getLocations(imageId) -> detectFaces(image, threshold) [face_detection ONNX]
  -> getLandmarks(imageId) -> detectFaceLandmarks(image, box) [face_landmark_detection ONNX] per face
  -> getEncodings(imageId) -> getFaceEncodings(image, landmark) [face_encoder ONNX] per face
  -> Results cached per imageId in Redis metadata
```
WebSocket Path (streaming)
```
Browser <-> WebRTC <-> StreamHandler <-> NumpyFrameStore
  -> FaceProcessor (multiprocessing)
  -> faceRpcClient.getLandmarks(imageId) [Redis RPC]
  -> faceRpcClient.getLocations(imageId) [Redis RPC]
  -> faceRpcClient.getEncodings(imageId) [Redis RPC]
  -> FaceEngine.getFaceData(landmarks) [geometric computation]
  -> FaceEngine.userActions(faceData) [threshold checks]
  -> FaceCVTask.processOutput() [send results via WebSocket]
```
Face Comparison
Uses sklearn’s `cosine_distances`: `cosine_distances(enc1.reshape(1,-1), enc2.reshape(1,-1))[0][0]`. Distance 0 = identical, 1 = completely different. Default threshold: 0.6.
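The same distance can be computed without sklearn, which makes the semantics explicit: one minus the cosine of the angle between the two embedding vectors. This is a stand-in sketch, not the service's code.

```python
# Cosine distance: 1 - (a.b / (|a| * |b|)). Matches sklearn's
# cosine_distances on 1xN inputs.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

same = [0.6, 0.8]
assert cosine_distance(same, same) < 1e-12                  # identical -> ~0
assert abs(cosine_distance([1, 0], [0, 1]) - 1.0) < 1e-12   # orthogonal -> 1
assert cosine_distance([1, 0], [0.9, 0.1]) < 0.6            # under the 0.6 match threshold
```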
Document Pipeline
```mermaid
flowchart TD
    IMG["Input Image"]
    DET["DocumentEngine.detect()\nlocalization ONNX -> 4 corners"]
    WARP["CardWarpEngine.warp()\nperspective transform"]
    CLS["DocumentEngine.classify()\ncard_classification ONNX -> classId"]
    LOOKUP["getDocumentByClassId()\nlookup document definition"]
    SHARP["SharpnessEngine.calcSharpness()\nLaplacian variance check"]
    RESIZE["DocumentEngine.resizeToAspectRatio()"]
    OCR["DocumentOcrEngine.run()"]
    MRZ["1. Try MRZ first"]
    TEXT["2. Detect text regions"]
    ANCHOR["3. Calculate anchor offsets"]
    CROP["4. Crop ROIs per field"]
    RECOG["5. Run character recognition"]
    CORRECT["6. Apply corrections"]
    RETRY["7. Retry failed with deskewing"]
    IMG --> DET --> WARP --> CLS --> LOOKUP --> SHARP --> RESIZE --> OCR
    OCR --> MRZ --> TEXT --> ANCHOR --> CROP --> RECOG --> CORRECT --> RETRY
```
Liveness Detection
V1 (LivenessCVTask)
Protocol (WebSocket messages):
1. Client sends video stream via WebRTC
2. Server detects face, validates position/size
3. Server sends `{type: "liveness", task: "face"}` — “show your face”
4. Server confirms face, sends random action: `{task: "smile"}`
5. Client performs action
6. Server evaluates: success (+1) or wrong_action/too_many_actions (-1)
7. Repeat until `score >= successScore (3)` or `score <= failScore (-2)`
8. Optional `faceToFind` encoding comparison on each frame
Face position validation:
- Must be within 35% of center (both axes)
- Face size between 3-25% of image area
- Guide messages: “move_head_up/down/left/right/closer/away”
V2 (LivenessV2CVTask)
Two phases:
Phase 1: Reference Face Capture
- Wait 3 seconds for user to position face
- Capture low-quality reference (min 360px)
- Request high-quality still image from client (min 720px)
- Run quality checks: resolution, single person, face size, face position, extras (sunglasses, closed mouth, no smile, eyes open, facing camera)
- Compare low/high quality face encodings
Phase 2: Action Challenges (same as V1 but with progress)
- Reports action progress (0.0 to 1.0) per frame
- Guide state machine prevents UI flicker with grace periods and cooldowns
Guide State Machine (5 states):
```mermaid
stateDiagram-v2
    [*] --> NOMINAL
    NOMINAL --> GRACE_PERIOD: receive negative guide
    GRACE_PERIOD --> ERROR_COOLDOWN: timeout
    GRACE_PERIOD --> NOMINAL: same positive guide
    ERROR_COOLDOWN --> NOMINAL_COOLDOWN: timeout + positive
    ERROR_COOLDOWN --> ERROR: timeout + negative
    NOMINAL_COOLDOWN --> NOMINAL: timeout + positive
    ERROR --> NOMINAL_COOLDOWN: receive positive
```
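The anti-flicker idea behind the state machine can be sketched in a reduced form: a negative guide must persist through a grace period before surfacing as an error. This sketch collapses the two cooldown states and models timeouts as tick counts instead of wall-clock durations, so it is illustrative only.

```python
# Simplified guide state machine: transient negative guides are absorbed by
# a grace period; only persistent ones reach the ERROR state.
class GuideStateMachine:
    def __init__(self, grace_ticks=2):
        self.state = "NOMINAL"
        self.grace_ticks = grace_ticks
        self.ticks = 0

    def feed(self, guide_is_positive):
        if self.state == "NOMINAL":
            if not guide_is_positive:
                self.state, self.ticks = "GRACE_PERIOD", 0
        elif self.state == "GRACE_PERIOD":
            if guide_is_positive:
                self.state = "NOMINAL"      # transient problem: ignore it
            else:
                self.ticks += 1
                if self.ticks >= self.grace_ticks:
                    self.state = "ERROR"    # persisted: surface the guide
        elif self.state == "ERROR":
            if guide_is_positive:
                self.state = "NOMINAL"
        return self.state

fsm = GuideStateMachine()
assert fsm.feed(False) == "GRACE_PERIOD"   # one bad frame: not yet an error
assert fsm.feed(True) == "NOMINAL"         # recovered inside the grace period
fsm.feed(False); fsm.feed(False)
assert fsm.feed(False) == "ERROR"          # persisted past the grace period
```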
Quality Checks for Reference Face
| Check | Threshold | Default |
|---|---|---|
| Sunglasses | > 0.5 | Enabled |
| Closed mouth | > 0.3 (mouth_aspect_ratio) | Enabled |
| Not smiling | > 0.51 (smile_ratio) | Enabled |
| Not blinking | < 0.21 (eye_aspect_ratio) | Enabled |
| Facing camera | > 0.23 (max head angle) | Enabled |
Anti-Spoofing (PAD & Deepfake)
PADDetectorCVTask
Input: Video sent as binary WebSocket chunks (not WebRTC). Uses av library to demux.
Two-pronged detection:
1. PAD (Presentation Attack Detection): per-frame, per-face
   - Gets face location from FaceProcessor
   - Calls `PADDetectionEngine.getPADScores(imageId, locations)` via RPC
   - Returns paper/screen attack probability
   - Weighted by `sqrt(face_crop_area)` — larger faces = more reliable scores
   - Min face crop area: 17,280 pixels (360 * 480 * 0.1)
2. Deepfake Detection: per-clip (8 frames)
   - Builds clips of 8 consecutive frames with valid faces
   - Sends clip (imageIds + landmarks) to `PADDetectionEngine.processClipForDeepFake()`
   - Default max 10 clips per session
Final result (averaged across all frames/clips):
- `deepFake`: mean of all clip scores
- `padPaper`: weighted average of per-frame paper scores
- `padScreen`: weighted average of per-frame screen scores
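The weighted averaging can be sketched directly from the description: each frame's score is weighted by the square root of its face crop area, so frames with larger (more reliable) faces count more. The function name and input shape are illustrative.

```python
# Weighted PAD score: per-frame scores weighted by sqrt(face crop area).
import math

def weighted_pad_score(frames):
    # frames: list of (score, face_crop_area) pairs
    weights = [math.sqrt(area) for _, area in frames]
    return sum(s * w for (s, _), w in zip(frames, weights)) / sum(weights)

frames = [(0.9, 40000), (0.1, 10000)]  # big suspicious face, small clean face
score = weighted_pad_score(frames)
assert abs(score - (0.9 * 200 + 0.1 * 100) / 300) < 1e-12
assert score > 0.5  # the larger face dominates the average
```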
Hologram Detection
V1: Video-Based (HoloDetectorEngine)
```mermaid
flowchart TD
    REF["Reference image (warped card)"]
    VID["Video stream of card under light"]
    SIFT["SIFT feature matching\n+ FLANN + Lowe's ratio (0.75)"]
    HOMO["Homography (RANSAC)\n+ perspective warp"]
    HSV["Store warped frames as HSV\nin HoloStack"]
    CALC["HoloCalculator:\n95th-5th percentile H range\nChi-squared uniformity test\nAdaptive thresholding"]
    SCORE["Score: detections in ROIs = 255\noutside = negative penalty"]
    REF --> SIFT
    VID --> SIFT --> HOMO --> HSV
    HSV -->|"MIN_FRAMES=50"| CALC --> SCORE
```
HoloWarp: SIFT feature detector + FLANN-based matcher. Lowe’s ratio test (0.75). Min keypoint pairs configurable per document (default 80). Rejects: outside corners, too small detections (< 10% of frame area).
HoloStack: Stores warped frames as HSV, resized to document dimensions. Appends along 4th axis (h, w, 3, num_frames).
HoloCalculator:
- Pixel detection: 95th-5th percentile range of H channel across all frames
- High range = pixel changes color significantly = hologram
- Filtering: Minimum saturation and value thresholds
- Chi-squared uniformity test: true hologram pixels should have uniform hue distribution
- Adaptive thresholding: iteratively adjusts threshold to find between min/max hologram pixels
- Score: detections inside hologram ROIs score 255, outside get negative penalty (distance * multiplier)
V2: Image-Based (HoloImageEngine)
- User provides multiple static images of card
- Each image: ORB feature matching → homography → align to reference
- Histogram matching to normalize lighting
- `absdiff()` between reference and aligned image
- HSV filtering: saturation > 80, value > 30 → hologram candidates
- Morphological opening to remove noise
- Accumulate across images; require `MIN_POSITIVE_SAMPLES=2` detections per pixel
Image distance check: Cosine similarity between grayscale flattened images. Rejects images with distance > maxImgDistanceScore (default 0.0185).
Per-Document Hologram Config
Documents can override base thresholds:
- HUN-AO-03001: Lower saturation (20), higher uniformity (500), lower false positive penalty (0.75)
OCR System
Pipeline
```mermaid
flowchart TD
    IMG["Image"]
    TD["TextDetectorEngine.detect()\ntext_detection ONNX -> polygons"]
    TA["TextDetectorEngine.getTextAreas()\nbounding boxes with centers"]
    AO["DocumentOcrEngine.calcOffsetFromAnchors()\nalignment correction"]
    ROI["TextDetectorEngine.getRoisFromMask()\nfilter by document ROI definitions"]
    CROP["TextDetectorEngine.cropRois()\ncrop + optional rotation"]
    OCR["OcrEngine.getText()\nLSTM ONNX or EasyOCR"]
    TC["OcrEngine.textCorrection()\npost-processing"]
    IMG --> TD --> TA --> AO --> ROI --> CROP --> OCR --> TC
```
LSTM OCR
Custom CRNN model with LstmLabelConverter:
- Character set includes Hungarian accented characters
- CTC decoding (collapse repeated characters, remove blanks marked as `¤`)
- Returns text, confidence, per-character confidences
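Greedy CTC decoding as described is a two-step transform: collapse consecutive repeats, then drop the blank symbol. The function name is illustrative; the `¤` blank marker is from the source.

```python
# Greedy CTC decode: collapse runs of identical symbols, then strip blanks.
BLANK = "¤"

def ctc_decode(raw):
    out = []
    prev = None
    for ch in raw:
        if ch != prev:          # collapse consecutive repeats
            out.append(ch)
        prev = ch
    return "".join(c for c in out if c != BLANK)

assert ctc_decode("¤NN¤AA¤G¤YY¤") == "NAGY"
assert ctc_decode("B¤B") == "BB"  # a blank separates a genuine double letter
```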
Text Correction Pipeline
- Label removal: split at `:` (e.g., “Vezetéknév:NAGY” → “NAGY”)
- Type-specific correction:
  - `"numbers"`: replace digit-like letters (O→0, I→1, etc.)
  - `"letters"`: replace letter-like digits (0→O, 1→I, etc.)
  - `[start, end, method]`: positional correction (e.g., first 2 chars = letters, next 7 = numbers)
  - `["option1", "option2"]`: fuzzy match against known values (SequenceMatcher, 0.6 threshold)
  - callable: custom correction function
- Whitelist filtering: Remove characters not in allowed set
- Regex validation: Final result must match pattern
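The option-list correction can be sketched with `difflib.SequenceMatcher`, which is the stdlib matcher the 0.6 threshold suggests; the wrapper function and fallback behavior are illustrative.

```python
# Fuzzy correction against a list of known values: pick the option whose
# similarity to the OCR output clears the 0.6 threshold, else keep the raw text.
from difflib import SequenceMatcher

def fuzzy_correct(text, options, threshold=0.6):
    best, best_ratio = None, 0.0
    for option in options:
        ratio = SequenceMatcher(None, text, option).ratio()
        if ratio > best_ratio:
            best, best_ratio = option, ratio
    return best if best_ratio >= threshold else text

assert fuzzy_correct("MAGVAR", ["MAGYAR", "HUNGARIAN"]) == "MAGYAR"
assert fuzzy_correct("XXXXXX", ["MAGYAR", "HUNGARIAN"]) == "XXXXXX"  # no match: keep raw
```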
Utility Layer
LazyImage (server/utils/lazy_image.py)
Defers image loading until actually needed. Stores imageId, loads numpy array on first .array access. @LazyImage.convert decorator allows RPC functions to accept either imageId (string) or LazyImage.
Processing (server/utils/processing.py)
runAsyncProcess(func, *args): Runs function in a separate multiprocessing.Process with 60s timeout. Uses asyncio integration.
Expensive per-call
Spawns a new OS process for each call. Used for CPU-intensive operations that would block the event loop (hologram detection, face distance calculation).
StreamingBuffer (server/utils/streaming_buffer.py)
Thread-safe buffer for streaming binary data. Uses threading.RLock + Condition for synchronization. Supports async starvation signaling for flow control.
Other Utilities
| File | Purpose |
|---|---|
fps_counter.py | FPS tracking with 3-second sliding window |
async_extra.py | AsyncSlot (single-value async container), waitForUntilEvent (cancellation pattern) |
warmup.py | Model warmup runner (GPU only, 5 runs) |
webrtc.py | WebRTC retry logic for TURN allocation mismatch |
roi.py | ROI geometry utilities (rectangle, polygon, point-in-polygon) |
face.py | Face position calculation, liveness visualization |
liveness.py | Action sequence generation, reference face checks, thresholds |
image.py | writeOnImage() helper |
dates.py | Hungarian month names |
dict.py | getKeyByValue() utility |
exceptions.py | Custom exception classes (12 types) |
lstm_label_converter.py | CTC decoder for LSTM OCR |
Config System
server/cfg.py
Config file loading order (merged with jsonmerge):
1. `config/{PYTHON_ENV}.json` (dev or docker)
2. `config/jsonschemas.json` (request/response schemas)
3. `config/scaling_presets/{SCALING_PRESET}.json` (optional, docker only)
4. `config/local.json` (local overrides)
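The layering works like a recursive dictionary merge where later files win. A hand-rolled sketch of that semantics (the service actually uses the `jsonmerge` library; the config keys and hostname below are illustrative):

```python
# jsonmerge-style deep merge sketch: later layers override earlier ones,
# recursing into nested dicts. Hand-rolled so it runs standalone.
def deep_merge(base: dict, override: dict) -> dict:
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)   # merge nested dicts
        else:
            out[key] = value                          # override wins
    return out

env_cfg = {"scaling": {"http": {"processes": 2}}, "host": "cv.example"}
local_cfg = {"scaling": {"http": {"processes": 4}}}  # config/local.json layer
cfg = deep_merge(env_cfg, local_cfg)
```

So a `local.json` override of one nested key replaces just that key, leaving sibling settings from the environment config intact.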
Environment Variables
| Variable | Purpose | Values |
|---|---|---|
PYTHON_ENV | Config environment | dev or docker |
ENV_VERSION | Must match config.requiredEnvVersion | - |
DEV_DOMAIN | Used to construct config.host in dev mode | - |
SCALING_PRESET | Selects scaling preset in docker mode | e.g. 50rt_25non_rt |
INFERENCE_DEVICE_MODE | GPU mode | cpu / gpu / force_gpu |
WARMUP_NUM_RUNS | Number of warmup iterations | Default: 5 |
Scaling Configuration
{
"scaling": {
"face": { "processes": 1, "workers": 1, "gpuEnabled": true },
"websocket": { "processes": 2 },
"http": { "processes": 2 }
}
}

Production preset available: `50rt_25non_rt` (Nvidia L40S, 50 real-time + 25 non-real-time clients).
Document whitelist: config.documentWhitelist restricts which documents are accepted.
JSON Schema validation: All HTTP request/response schemas validated at startup using Draft7Validator.
Security Analysis
Critical Issues
- Pickle deserialization in AppCache (`server/appcache.py:106`): `pickle.loads(raw)` — arbitrary code execution if an attacker can write to Redis. Keys are predictable.
- No authentication on HTTP/WS endpoints: all endpoints are unauthenticated. Auth is presumably handled by Nginx/vuer_oss, but vuer_cv has no auth checks.
- Redis as single point of failure: all image data, metadata, and RPC go through Redis. No encryption, no auth visible in code.
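One common mitigation for the pickle issue is serializing cache values as JSON: a tampered Redis value can then corrupt a result but can never execute code on load. A hedged sketch (illustrative function names, not AppCache's API):

```python
# Pickle-free cache value round-trip sketch: JSON carries data only, so
# deserializing attacker-controlled bytes raises an error instead of
# running code. dump_meta/load_meta are illustrative names.
import json

def dump_meta(meta: dict) -> bytes:
    return json.dumps(meta).encode()

def load_meta(raw: bytes) -> dict:
    # raises ValueError on malformed input; never executes it
    return json.loads(raw.decode())

raw = dump_meta({"landmarks": [[1.0, 2.0]], "encoding": [0.1, 0.2]})
```

The trade-off is that numpy arrays must be converted to lists first, which the service's `NumpyEncoder` already does for RPC payloads anyway.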
Medium Risk
- File path traversal in StreamPlayerCVTask: validates `file.startswith(config["streamPlayerDir"])` — a prefix check, not proper path containment. However, `os.path.abspath()` is called first, which normalizes the path.
- MIME type validation bypass: `ImageUploadResource` uses `magic.from_buffer(first_1024_bytes)`. Polyglot files could bypass this.
- No rate limiting: HTTP endpoints have no rate limiting.
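A stricter alternative to the prefix check is resolving both paths and testing real containment; a prefix check alone would accept `/srv/media-evil/x` as inside `/srv/media`. Sketch under assumed paths (`/srv/media` is illustrative, not the actual `streamPlayerDir`):

```python
# Path containment sketch: resolve both paths, then require the candidate
# to share the base directory as a common path prefix component-wise.
import os.path

def is_contained(base_dir: str, candidate: str) -> bool:
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base_dir, candidate))
    # commonpath compares whole components, so "/srv/media-evil" does not
    # pass as being under "/srv/media" the way startswith() would allow
    return os.path.commonpath([base, target]) == base
```

`realpath()` also collapses symlinks, closing the gap left by `abspath()`, which only normalizes `..` segments lexically.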
Lower Risk
- Error messages leak internal details: exception messages and stack traces logged with `logger.exception()`.
- Hardcoded `/workspace/vuer_cv/` paths: EasyOCR model path and warmup input path are hardcoded.
- Command execution in StatusEngine: `subprocess.run(['supervisorctl', 'status'])`, `subprocess.run(['nvidia-smi', '-L'])`, `subprocess.run(["lscpu"])` — safe (no user input in args), but worth noting.
See security-audit for tracking.
Performance Patterns
GPU Optimization
- Per-model GPU control via `config.scaling.{model}.gpuEnabled`
- TensorRT acceleration support
- IO binding for GPU sessions (avoids data copy overhead)
- Warmup runs (5x by default) to pre-heat GPU caches
Caching
- Image metadata caching: computed results (landmarks, encodings) cached in Redis per imageId
- LazyImage: defers numpy conversion until needed
- Short TTLs: all Redis keys expire in 2-60 seconds (prevents memory bloat)
Parallelism
- Multi-process: each model runs in separate Supervisor process
- Multi-worker RPC: within a process, ThreadPoolExecutor for multiple RPC server instances
- WebSocket processors: use multiprocessing.Process for frame processing
- Frame dropping: processors drain input queue, keeping only latest frame
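The frame-dropping pattern from the last bullet can be sketched as below (a generic `queue.Queue` illustration, not the processors' actual code):

```python
# Frame-dropping sketch: block for the first available frame, then drain
# the queue non-blockingly so a slow processor always works on the newest
# frame instead of a growing backlog of stale ones.
import queue

def latest_frame(q: "queue.Queue"):
    frame = q.get()                  # wait until at least one frame arrives
    while True:
        try:
            frame = q.get_nowait()   # drop older frames, keep the newest
        except queue.Empty:
            return frame

q = queue.Queue()
for i in range(5):
    q.put(f"frame-{i}")
```

This trades completeness for latency, which is the right trade for real-time video: processing every frame of a backlog only makes the output lag further behind the camera.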
Bottlenecks
Known performance issues
- Redis serialization: All numpy arrays serialized to JSON lists via NumpyEncoder for RPC. Face encodings (512 floats) and landmarks (98x2 floats) serialized/deserialized on every call.
- Per-call process spawning: `runAsyncProcess()` creates a new `multiprocessing.Process` for each invocation (hologram detection, face distance). Significant overhead.
- Sequential face pipeline: for each face, detection → landmarks → encoding runs sequentially through RPC. No batching across faces.
- HoloStack memory: `np.append()` on every frame copies the entire stack — O(n²) memory allocation for n frames.
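The HoloStack issue has a standard fix: collect frames in a Python list and stack once at the end. A sketch with illustrative frame shapes (the real stack is HSV card crops):

```python
# np.append anti-pattern vs. list-then-stack fix. Each np.append along
# the last axis reallocates and copies the whole accumulated array.
import numpy as np

frames = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(10)]

# O(n^2) total copying: every iteration copies the entire stack so far
stack = np.empty((4, 4, 3, 0), dtype=np.uint8)
for f in frames:
    stack = np.append(stack, f[..., np.newaxis], axis=3)

# Linear alternative: accumulate references, allocate and copy once
stacked = np.stack(frames, axis=3)
```

`np.stack` knows the final size up front, so it allocates once and copies each frame exactly once, versus the quadratic total copying of the loop above.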
Code Smells & Technical Debt
Marked in Code
- OCR rewrite planned: multiple comments `### this logic will be rewamped with standardized ocr api developement ###` in:
  - `server/cv/document_ocr_engine.py:281`
  - `server/cv/ocr_engine.py:18, 69, 323`
- TODO items:
  - `server/documents/hun_bo_05001_back.py:75`: `"roi": [35, 1, 400, 1], # TODO?`
  - `server/documents/hun_bo_06001_back_po.py:75`: same TODO
- NOSONAR suppressions (14 instances): complexity warnings suppressed on critical methods like `calcHoloMask`, `getRoisFromMask`, and `processOutput` in liveness tasks.
Structural Issues
- Typo in filename: `server/http/exeption_handler.py` — "exeption" instead of "exception".
- Type confusion in error handling: `FaceCompareResource` returns HTTP 403 for generic exceptions (should be 500). Multiple endpoints use 403 for server errors.
- Mixed concerns in FaceProcessor: `OnDemandFaceProcessor` tracks `idleSmileRatio` as instance state but is shared across frames. Couples calibration state to processor lifecycle.
- Global mutable state: `OcrEngine.CONFUSED_LETTERS` and `CONFUSED_NUMBERS` are class-level mutable lists. Not a bug, but fragile.
- `np.append` in HoloStack: `self.stack = np.append(self.stack, card, 3)` copies the entire array on each frame. Should use a pre-allocated buffer or a list-then-stack pattern.
See tech-debt for tracking.
Magic Numbers & Hardcoded Thresholds
Face Detection
| Constant | Value | Location |
|---|---|---|
| BLINK_THRESHOLD | 0.21 | face_engine.py |
| SMILE_THRESHOLD | 0.51 | face_engine.py |
| SMILE_INCREASE_THRESHOLD | 0.06 | face_engine.py |
| HEAD_X_THRESHOLD | 0.6 | face_engine.py |
| HEAD_UP_THRESHOLD | 0.25 | face_engine.py |
| HEAD_DOWN_THRESHOLD | 0.3 | face_engine.py |
| Face distance threshold | 0.6 | face_cv_task.py, liveness_cv_task.py |
| MAX_FACE_DISTANCE_FROM_CENTER | 0.35 | face_processor.py |
| MIN_FACE_SIZE | 0.03 | face_processor.py |
| MAX_FACE_SIZE | 0.25 | face_processor.py |
Document Processing
| Constant | Value | Location |
|---|---|---|
| CLASSIFICATION_SCORE_THRESHOLD | 0.3 | card_classification_engine.py |
| MIN_SHARPNESS | 80 | document_processor.py |
| MAX_IMAGE_SIZE | 3250x3250 | image_upload.py |
| CORNER_PADDING_PERCENT | 1.5 | document_warp_v2.py |
Hologram Detection
| Constant | Value | Location |
|---|---|---|
| MIN_FRAMES | 50 | holo_detector_engine.py |
| MIN_POSITIVE_SAMPLES | 2 | holo_image_engine.py |
| ORB_MAX_FEATURES | 500 | holo_image_engine.py |
| ORB_KEEP_PERCENT | 0.2 | holo_image_engine.py |
| maxImgDistanceScore | 0.0185 | holo_image_engine.py |
| HOLO_INITIAL_THRESHOLD | 80 | document.py |
| HOLO_THRESHOLD_STEP | 1 | document.py |
| HOLO_MIN_THRESHOLD | 20 | document.py |
| HOLO_MAX_THRESHOLD | 300 | document.py |
| HOLO_UNIFORMITY_THRESHOLD | 200 | document.py |
| HOLO_HSV_MIN_SATURATION | 40 | document.py |
| HOLO_HSV_MIN_VALUE | 40 | document.py |
| HOLO_FALSE_POSITIVE_MULTIPLIER | 10 | document.py |
| HOLO_WARP_MIN_KP_PAIRS | 80 | document.py |
| filterMatches ratio | 0.75 | holo_warp.py |
PAD Detection
| Constant | Value | Location |
|---|---|---|
| MIN_FACE_CROP_AREA | 17280 | pad_detector_cv_task.py |
| clipSize | 8 | pad_detector_cv_task.py |
| maxNumClips | 10 | pad_detector_cv_task.py |
Liveness V2
| Constant | Value | Location |
|---|---|---|
| waitBeforeActionMs | 2000 | liveness_v2_cv_task.py |
| waitForStillImageMs | 5000 | liveness_v2_cv_task.py |
| sunglasses threshold | 0.5 | liveness.py |
| closed_mouth threshold | 0.3 | liveness.py |
| reference_face_head_angle | 0.23 | liveness.py |
| face_difference_head_angle | 0.25 | liveness.py |
| minResolution low | 360 | liveness.py |
| minResolution high | 720 | liveness.py |
Other
| Constant | Value | Location |
|---|---|---|
| TEXT_DETECTOR_CANVAS_SIZE | 1024 | text_detector_engine.py |
| TEXT_DETECTOR_PADDING | (0.01, 0.01) | text_detector_engine.py |
| SPEECH_DETECT_THRESHOLD | 0.9 | speech_detector_cv_task.py |
| FRAME_SIZE_TO_PROCESS_MS | 200 | speech_detector_cv_task.py |
| Barcode rotation angles | 0, 15, -15, 30, -30, 45 | barcode_engine.py |
| Sharpness resize target | 320x240 | sharpness_engine.py |
| Slave process start delay | 10s | status_engine.py |
| WS up check after | 86400s (24h) | status_engine.py |
| Redis image expire | 60s | appcache.py |
| Redis audio expire | 2s | appcache.py |
| WebRTC frame timeout | 2s | frame_store.py |
| runAsyncProcess timeout | 60s | processing.py |
Complete File Index
Entry Points
| File | Purpose |
|---|---|
app_face.py | Face recognition RPC server |
app_card_detector.py | Card detection RPC server |
app_text_detector.py | Text detection RPC server |
app_ocr.py | OCR RPC server |
app_mrz.py | MRZ reading RPC server |
app_detectron2.py | Detectron2 (barcode) RPC server |
app_pad.py | PAD/deepfake RPC server |
app_onnx.py | Background masking / speech RPC server |
app_http.py | HTTP API (Falcon + uWSGI) |
app_websocket.py | WebSocket server |
CV Engines (server/cv/)
| File | Class | Purpose |
|---|---|---|
face_engine.py | FaceEngine | Geometric face analysis (actions, head pose) |
document_engine.py | DocumentEngine | Document detection/classification facade |
document_ocr_engine.py | DocumentOcrEngine | Structured document OCR orchestration |
ocr_engine.py | OcrEngine | Character recognition + text correction |
barcode_engine.py | BarcodeEngine | Barcode detection + reading with rotation |
card_classification_engine.py | CardClassificationEngine | Document type classification |
card_integrity_check_engine.py | CardIntegrityCheckEngine | Tampering detection |
card_warp_engine.py | CardWarpEngine | Document corner detection + perspective warp |
background_mask_engine.py | BackgroundMaskEngine | Background removal/replacement |
holo_detector_engine.py | HoloDetectorEngine | Video-based hologram detection orchestrator |
holo_image_engine.py | HoloImageEngine | Image-based hologram detection |
kaptcha_detect_engine.py | KaptchaEngine | CAPTCHA solving |
pad_detection_engine.py | PADDetectionEngine | Deepfake + PAD RPC facade |
sharpness_engine.py | SharpnessEngine | Image sharpness (Laplacian variance) |
status_engine.py | StatusEngine | System health monitoring |
text_detector_engine.py | TextDetectorEngine | Text region detection |
Hologram (server/cv/holo/)
| File | Class | Purpose |
|---|---|---|
holo_calculator.py | HoloCalculator | HSV analysis, chi-squared filtering, scoring |
holo_stack.py | HoloStack | Frame accumulation in HSV space |
holo_warp.py | HoloWarp | SIFT+FLANN alignment + perspective warp |
HTTP Resources (server/http/resources/)
| File | Class | Endpoint |
|---|---|---|
face_detect.py | FaceDetectResource | Face detection + encodings |
face_compare.py | FaceCompareResource | Face comparison |
face_draw.py | FaceDrawResource | Landmark visualization |
face_age_gender.py | FaceGenderAgeResource | Age/gender prediction |
reference_face_extract.py | ReferenceFaceExtractResource | Quality-checked face extraction |
document_recognition_v2.py | DocumentRecognitionResourceV2 | Document detection + classification |
document_warp_v2.py | DocumentWarpResourceV2 | Full document warp pipeline |
document_ocr.py | DocumentOcrResource | Document OCR |
document_types.py | DocumentTypesResource | List document types |
card_warp.py | CardWarpResource | Corner detection + warp only |
card_integrity_check.py | CardIntegrityCheckResource | Tampering detection |
barcode.py | BarCodeResource | Barcode reading |
barcode_detect.py | BarCodeDetectResource | Barcode region detection |
background_mask.py | BackgroundMaskResource | Background removal |
image_upload.py | ImageUploadResource | Image upload |
image_download.py | ImageDownloadResource | Image download |
kaptcha_decoder.py | KaptchaDecoderResource | CAPTCHA solving |
mrz.py | MrzResource | MRZ reading |
ocr.py | OcrResource | Generic OCR |
sharpness.py | SharpnessResource | Sharpness calculation |
ping.py | PingResource | Health check |
status.py | StatusResource | Server status |
exeption_handler.py | ErrorBase | Global exception handler (typo in filename) |
WebSocket Tasks (server/websocket/task/)
| File | Class | Purpose |
|---|---|---|
cv_task.py | AbstractCVTask, BasicCVTask, CVTask | Base task hierarchy |
face_cv_task.py | FaceCVTask | Streaming face recognition |
document_cv_task.py | DocumentCVTask | Streaming document detection |
liveness_cv_task.py | LivenessCVTask | V1 liveness challenge |
liveness_v2_cv_task.py | LivenessV2CVTask | V2 liveness with phases |
action_cv_task.py | ActionCVTask | Single action verification |
barcode_cv_task.py | BarcodeCVTask | Streaming barcode reading |
mrz_cv_task.py | MrzCVTask | Streaming MRZ reading |
sharpness_cv_task.py | SharpnessCVTask | Streaming sharpness check |
speech_detector_cv_task.py | SpeechDetectorTask | Voice activity detection |
holo_video_cv_task.py | HoloVideoCVTask | Video hologram detection |
holo_image_cv_task.py | HoloImageCVTask | Image hologram detection (via WS) |
holo_v2_cv_task.py | HoloV2CVTask | V2 hologram (ORB-based) |
pad_detector_cv_task.py | PADDetectorCVTask | Anti-spoofing detection |
stream_player_cv_task.py | StreamPlayerCVTask | Media file playback via WebRTC |
WebSocket Processors (server/websocket/processor/)
| File | Class | Purpose |
|---|---|---|
processor.py | AbstractProcessor | Base multiprocessing processor |
face_processor.py | FaceProcessor, OnDemandFaceProcessor | Face processing with config |
document_processor.py | DocumentProcessor | Document detection pipeline |
barcode_processor.py | BarcodeProcessor | Barcode reading |
holo_processor.py | HoloProcessor | SIFT-based card warp for holo |
holo_v2_processor.py | HoloV2Processor | ORB-based diff mask for holo |
mrz_processor.py | MrzProcessor | MRZ reading via RPC |
sharpness_processor.py | SharpnessProcessor | Sharpness calculation |
speech_processor.py | SpeechProcessor | Speech detection via RPC |
WebSocket FrameStores (server/websocket/framestore/)
| File | Class | Purpose |
|---|---|---|
frame_store.py | FrameStore | Base class for frame extraction |
numpy_frame_store.py | NumpyFrameStore | Video → numpy via Redis |
raw_frame_store.py | RawFrameStore | Video → JPEG via Redis |
audio_frame_store.py | AudioFrameStore | Audio with resampling |
WebSocket Core
| File | Class | Purpose |
|---|---|---|
stream_handler.py | StreamHandler | WebRTC signaling |
wsconnection.py | WSConnection | WebSocket message routing |
Documents (server/documents/)
| File | Class | Purpose |
|---|---|---|
document.py | AbstractDocument | Base document definition |
__init__.py | - | Document registry (17 documents) |
hun_ao_03001.py | HunAo03001 | Hungarian ID card |
hun_bo_*.py | HunBo* | Hungarian driving licenses (7 variants) |
hun_fo_*.py | HunFo* | Hungarian personal IDs (4 variants) |
hun_ho_10001.py | HunHo10001 | Hungarian residence permit |
srb_ao_01001.py | SrbAo01001 | Serbian ID |
srb_bo_01001_*.py | SrbBo01001* | Serbian driving license (front/back) |
Utilities (server/utils/)
| File | Purpose |
|---|---|
image.py | writeOnImage() helper |
lazy_image.py | LazyImage with Redis-backed metadata caching |
face.py | Face position calculation, liveness visualization |
liveness.py | Action sequence generation, reference face checks, thresholds |
processing.py | runAsyncProcess() — spawn OS process for blocking work |
roi.py | ROI geometry utilities (rectangle, polygon, point-in-polygon) |
streaming_buffer.py | Thread-safe streaming buffer for PAD detection |
fps_counter.py | FPS measurement with sliding window |
warmup.py | Model warmup runner (GPU only, 5 runs) |
webrtc.py | WebRTC retry logic for TURN allocation mismatch |
dates.py | Hungarian month names |
dict.py | getKeyByValue() utility |
exceptions.py | Custom exception classes (12 types) |
async_extra.py | AsyncSlot, waitForUntilEvent |
lstm_label_converter.py | CTC decoder for LSTM OCR |
http_not_found.py | 404 handler |
Config & Infrastructure
| File | Purpose |
|---|---|
server/cfg.py | Config loading, JSON schema validation |
server/appcache.py | Redis-based image/audio/metadata cache |
server/rpc/pyredisrpc.py | Redis RPC client/server |
Project Structure
app_*.py Entry points (10 processes)
server/ Python application code
cv/ CV engines (16 engines)
holo/ Hologram detection (3 modules)
http/ Falcon HTTP resources (22 endpoints)
resources/
websocket/ WebSocket system
task/ CV tasks (15 task types)
processor/ Multiprocessing processors (9 types)
framestore/ Frame extraction (4 types)
documents/ Document definitions (17 documents)
utils/ Utility modules (15 files)
rpc/ Redis RPC implementation
packages/ Internal Python packages
onnx/ ONNX model weight files (Git LFS)
config/ Configuration files + scaling presets
data/ Runtime data (EasyOCR models, class mappings)
requirements/ Python dependency management
*.in Direct dependencies
*.txt Pinned/compiled dependencies
compile.sh Compile dependencies
install.sh Install dependencies
setup/ Setup scripts
tests/ Test suites
web/ Web assets
bin/ CLI utilities
Development
# Development mode (via vuer_docker)
docker-compose -f vuer-cv-dev.yml up -d
# Install new packages
docker exec -it vuer_cv_dev bash
./requirements/compile.sh
./requirements/install.sh
# Upgrade all dependencies
./requirements/compile.sh --upgrade
# Git LFS (required for model weights)
git lfs pull

Related
- vuer_oss - Backend server that calls CV services via HTTP/WebSocket
- vuer_css - Frontend that initiates WebRTC sessions for real-time CV
- vuer_docker - Docker orchestration (vuer-cv.yml / vuer-cv-dev.yml / vuer-cv-gpu.yml)
- FaceKom - Platform overview
- security-audit - Security issues tracking
- tech-debt - Technical debt tracking