vuer_cv (Computer Vision Service)

Role

FaceKom’s computer vision and ML inference service. Provides face detection, face comparison, liveness detection, document recognition, OCR, MRZ reading, barcode detection, hologram verification, anti-spoofing (PAD/deepfake), background masking, and speech detection via HTTP and WebSocket APIs. Communicates with vuer_oss through HTTP/WebSocket, and internally uses Redis RPC between its own ~10 Supervisor-managed processes.

| Property | Value |
|---|---|
| Runtime | Python (uWSGI + asyncio) |
| HTTP Framework | Falcon (WSGI) |
| Inference | ONNX Runtime (GPU/CPU), PyTorch, Detectron2 |
| Repo | TechTeamer/vuer_cv |
| Path (remote) | /workspace/vuer_cv |
| Path (local mount) | /Users/levander/coding/mnt/Facekom/vuer_cv |
| Models | 16 ONNX + 5 non-ONNX (Git LFS) |
| Processes | ~10 Supervisor-managed |

Table of Contents

  1. Architecture Overview
  2. Entry Point & Process Model
  3. ONNX Runtime & Model Loading
  4. Redis RPC System
  5. Image Cache (AppCache)
  6. CV Engines
  7. HTTP API (REST Endpoints)
  8. WebSocket System (Real-Time)
  9. Document Definitions
  10. Face Pipeline
  11. Document Pipeline
  12. Liveness Detection
  13. Anti-Spoofing (PAD & Deepfake)
  14. Hologram Detection
  15. OCR System
  16. Utility Layer
  17. Config System
  18. Security Analysis
  19. Performance Patterns
  20. Code Smells & Technical Debt
  21. Magic Numbers & Hardcoded Thresholds
  22. Complete File Index

Architecture Overview

vuer_cv is a multi-process Python service. Each process loads different ONNX ML models and communicates via Redis RPC. The HTTP/WebSocket servers are the entry points; satellite processes handle face recognition, document detection, OCR, MRZ reading, text detection, background masking, PAD/deepfake detection, and speech detection.

                        +-------------------+
                        |   vuer_cv (main)  |
                        | HTTP (Falcon)     |
                        | WebSocket (ws)    |
                        +---------+---------+
                                 |
                          Redis RPC (blpop)
                                 |
         +-----------+-----------+-----------+-----------+
         |           |           |           |           |
    app_face.py  app_card*.py  app_ocr.py  app_pad.py  app_onnx.py
    (face RPC)   (card RPC)   (OCR RPC)   (PAD RPC)   (bg/speech)

Supervisor Processes

| RPC Queue | Purpose | Entry Point |
|---|---|---|
| face | Face detection, landmarks, encoding, gender/age, sunglasses | app_face.py |
| cardDetector | Card corner detection, classification, integrity | app_card_detector.py |
| ocr | Document OCR, generic OCR, KAPTCHA | app_ocr.py |
| mrz | MRZ reading from documents | app_mrz.py |
| textDetector | Text region detection in images | app_text_detector.py |
| detectron2 | Barcode detection (via Detectron2) | app_detectron2.py |
| pad | Deepfake detection, PAD (paper/screen) scoring | app_pad.py |
| onnx | Background masking, speech detection | app_onnx.py |
| - | HTTP API (uWSGI) | app_http.py |
| - | WebSocket real-time processing | app_websocket.py |

All processes are managed by Supervisor behind an Nginx reverse proxy (configs generated at runtime from scaling settings via Jinja2 templates).


Entry Point & Process Model

Canonical startup sequence (using app_face.py as example)

  1. Import config from server/cfg.py
  2. Call checkGpuEnabled(config, "face") — checks if GPU should be disabled per scaling config
  3. Import ONNX models (face_detection, face_landmark_detection, face_encoder, gender_age_prediction, sunglasses_detection)
  4. Connect to Redis, create AppCache
  5. Define decorated functions using @LazyImage.convert (makes them callable via RPC with just imageId)
  6. Run model warmup (5 runs by default, configurable via WARMUP_NUM_RUNS env var)
  7. Start Redis RPC server

Scaling

If config.scaling.face.workers > 1, uses ThreadPoolExecutor to run multiple RPC server instances in parallel within the same process (sharing models in memory).

Face RPC Functions

| Function | Purpose |
|---|---|
| getLocations(imageId, threshold) | Face bounding boxes |
| getLandmarks(imageId) | 98-point face landmarks |
| getEncodings(imageId) | Face embeddings (for comparison) |
| getSunglassesScores(imageId) | Sunglasses detection probability |
| getGenderAgePredictions(imageId, box) | Gender probability + age |
| calcFacePositions(imageId) | Distance from image center |
| calcFaceDistance(enc1, enc2) | Cosine distance between face embeddings |
| detectFace(imageId) | Combined detection + encoding |
| drawFace(imageId) | Debug visualization with landmark markers |
| compareFaces(img1, img2) | Full compare pipeline |

Caching Strategy

Results are memoized per-image via LazyImage.getMeta() which stores computed results in Redis (keyed by metadata:{imageId}:{metaName}). Repeated calls for the same image ID return cached landmarks/encodings/etc.
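The memoization described above can be sketched in a few lines. This is a hedged illustration only: a plain dict stands in for the Redis connection, and `get_meta`/`expensive_landmarks` are hypothetical names (the real `LazyImage.getMeta()` also applies a 60s TTL).

```python
import pickle

# Hypothetical stand-in for the Redis connection: a plain dict.
_redis = {}

def get_meta(image_id: str, meta_name: str, compute):
    """Memoize compute() per image, mirroring the
    metadata:{imageId}:{metaName} key scheme described above."""
    key = f"metadata:{image_id}:{meta_name}"
    if key in _redis:
        return pickle.loads(_redis[key])
    value = compute()
    _redis[key] = pickle.dumps(value)
    return value

calls = []
def expensive_landmarks():
    calls.append(1)  # track how often the "model" actually runs
    return [[1.0, 2.0], [3.0, 4.0]]

a = get_meta("img-1", "landmarks", expensive_landmarks)
b = get_meta("img-1", "landmarks", expensive_landmarks)  # served from cache
```

The second call never reaches the model function, which is why repeated RPC calls on the same image ID are cheap.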


ONNX Runtime & Model Loading

GPU/CPU Selection (onnx/utils.py)

DeviceMode enum: cpu, gpu, force_gpu

createInferenceSession() handles GPU fallback logic:

  • INFERENCE_DEVICE_MODE env var controls mode
  • gpu mode: tries CUDA, falls back to CPU on error (default)
  • force_gpu: no fallback, crashes if GPU unavailable
  • cpu: CPU only
  • GPU detection: calls nvidia-smi -L and checks output
  • Supports TensorRT acceleration if useTensorRt=True

runSession() has two execution paths:

  • CPU: Direct session.run()
  • GPU: Uses io_binding() for zero-copy GPU memory transfers (bind_cpu_input → copy_outputs_to_cpu)

Global env mutation

checkGpuEnabled() mutates os.environ globally when disabling GPU for a specific model, affecting all subsequent model loads in the same process.

Mixins

  • NdarrayNormalizerMixin: ImageNet-style normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
  • AspectPreservingResizePadMixin: Resize with letterboxing to square, returns scale/offset for coordinate back-mapping

Base Class (onnx/ml_model_runner.py)

Abstract base class for all ONNX model runners. Constructor loads model via createInferenceSession() and runs random warmup. Subclasses implement _setup() (returns warmup shape/dtype) and _runSession().

ONNX Models (16 total)

| Model | Purpose | Used By |
|---|---|---|
| face_detection | Face bounding box detection | app_face.py |
| face_landmark_detection | 98-point facial landmarks | app_face.py |
| face_encoder | Face embedding generation | app_face.py |
| gender_age_prediction | Gender probability + age estimation | app_face.py |
| sunglasses_detection | Sunglasses detection | app_face.py |
| localization | Document corner detection | card detector |
| card_classification | Document type classification | card detector |
| card_integrity | Document tampering detection | card detector |
| text_detection | Text region detection (CRAFT-like) | text detector |
| lstm | Character recognition (CRNN) | OCR process |
| barcode_detection | Barcode region detection (Detectron2) | detectron2 |
| kaptcha | CAPTCHA solving | OCR process |
| deepfake_detection | Video deepfake detection | PAD process |
| pad_prediction | Presentation attack detection (paper/screen) | PAD process |
| background_masking | Person segmentation / background removal | onnx process |
| speechDetection | Voice activity detection | onnx process |

Non-ONNX Models (CPU only)

| Model | Function | Runtime |
|---|---|---|
| card_detector_classification_svm.pkl | SVM classification | Scikit-learn |
| easyOcr/cyrillic_g2.pth | Serbian Cyrillic OCR | EasyOCR (PyTorch) |
| easyOcr/latin_g2.pth | Serbian Latin OCR | EasyOCR (PyTorch) |
| detectron2_card_detector.pth | Document recognition | Detectron2 (PyTorch) |
| detectron2_barcode_detector.pth | Barcode detection | Detectron2 (PyTorch) |
| tesseract OCR data files | MRZ reading | Passporteye |

GPU Inference Modes

| Mode | Behavior |
|---|---|
| cpu | Force all models to CPU |
| gpu | Try CUDA, fallback to CPU (default) |
| force_gpu | Require CUDA, fail if unavailable |

Requires NVIDIA Container Toolkit on host.


Redis RPC System

Protocol (server/rpc/pyredisrpc.py)

Based on gowhari/pyredisrpc (MIT license), customized.

sequenceDiagram
    participant Client as HTTP/WS Process
    participant Redis
    participant Server as Model Process (e.g. app_face.py)

    Client->>Redis: LPUSH rpc:{serviceName} {id, method, params, tmchk}
    Client->>Redis: SET rpc:{reqId}:tmchk (TTL)
    Server->>Redis: BLPOP rpc:{serviceName}
    Redis-->>Server: request JSON
    Server->>Server: Process (run ONNX model)
    Server->>Redis: Check rpc:{reqId}:tmchk not expired
    Server->>Redis: LPUSH rpc:{reqId} {id, result, error}
    Client->>Redis: BLPOP rpc:{reqId} (timeout)
    Redis-->>Client: response JSON

Magic __getattr__: RedisRpcClient uses __getattr__ to dynamically generate method calls. faceRpcClient.detectFace(imageId) becomes call("detectFace", [[imageId], {}]).

NumpyEncoder: Custom JSON encoder that converts np.ndarray to lists for serialization. All numpy arrays are serialized to JSON lists for Redis transport.
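Both mechanisms are small enough to sketch together. This is a hedged illustration, not the library code: `transport` is a hypothetical stand-in for the LPUSH/BLPOP round trip, and the encoder duck-types on `.tolist()` so the sketch stays dependency-free (the real NumpyEncoder checks for np.ndarray specifically).

```python
import json

class NumpyEncoder(json.JSONEncoder):
    # Any object exposing tolist() (e.g. np.ndarray) becomes a plain list.
    def default(self, obj):
        if hasattr(obj, "tolist"):
            return obj.tolist()
        return super().default(obj)

class RedisRpcClient:
    """__getattr__-based proxy: any attribute access becomes an RPC
    method name, so client.detectFace(x) turns into a queued request."""
    def __init__(self, queue, transport):
        self.queue = queue          # e.g. "rpc:face"
        self.transport = transport  # stand-in for LPUSH + BLPOP

    def __getattr__(self, method):
        def call(*args, **kwargs):
            payload = json.dumps(
                {"method": method, "params": [list(args), kwargs]},
                cls=NumpyEncoder,
            )
            return self.transport(self.queue, payload)
        return call

sent = []
client = RedisRpcClient("rpc:face", lambda q, p: sent.append((q, p)) or "ok")
result = client.detectFace("img-1")
```

Because `__getattr__` only fires for missing attributes, real attributes like `queue` are untouched while every unknown name becomes a remote call.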

Performance concern

Every RPC call serializes numpy arrays to JSON, passes through Redis, and deserializes. For face encodings (512-float vectors) and landmarks (98x2 floats), this adds latency on every call.


Image Cache (AppCache)

server/appcache.py — Redis-Based Storage

| Key Pattern | Format | TTL | Purpose |
|---|---|---|---|
| raw-image:{uuid} | Original bytes (JPEG/PNG) | 60s | Uploaded images |
| numpy-image:{uuid} | Binary ndarray + 12-byte header (h,w,c as big-endian uint32) | 60s | Decoded images |
| metadata:{uuid}:{name} | Pickled Python objects | 60s | Face locations, encodings, etc. |
| audio-frame:{uuid} | 1B layout + 15B format + 3B sample_rate + payload | 2s | Audio frames |

Lazy conversion: getNumPyImage() auto-converts raw images to numpy if the numpy version doesn’t exist (using imageio_imread with EXIF rotation). Alpha channels are stripped.
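The 12-byte numpy-image header from the table above maps directly onto a `struct` format string. A minimal sketch (function names are illustrative; the real AppCache serializes an actual ndarray buffer after the header):

```python
import struct

HEADER = ">III"  # h, w, c as big-endian uint32 -> 12 bytes

def encode_ndarray(h: int, w: int, c: int, payload: bytes) -> bytes:
    """Prefix raw pixel bytes with the 12-byte shape header."""
    return struct.pack(HEADER, h, w, c) + payload

def decode_header(blob: bytes):
    """Split a stored blob back into (shape, pixel bytes)."""
    h, w, c = struct.unpack_from(HEADER, blob, 0)
    return (h, w, c), blob[struct.calcsize(HEADER):]

blob = encode_ndarray(480, 640, 3, b"\x00" * 4)
shape, payload = decode_header(blob)
```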

Pickle deserialization

Metadata uses pickle.loads() for deserialization (server/appcache.py:106). If an attacker can write to Redis, they can achieve arbitrary code execution. Metadata keys are predictable (metadata:{imageId}:{metaName}). See security-audit.


CV Engines

FaceEngine (server/cv/face_engine.py)

Computes facial action units and head pose from landmarks. Does NOT do ML inference — uses geometric calculations on the 98-point landmark output.

Computed features:

| Feature | Method | Description |
|---|---|---|
| eyeRatio(face, side) | height/width | Eye aspect ratio — blink detection |
| smileRatio(face) | mouth width / face width | Smile detection |
| mouthRatio(face) | mouth height / width | Mouth openness |
| headX(face) | ear-to-nose distance ratio | Horizontal turn (-1 left to +1 right) |
| headY(face) | ear-line vs nose position | Vertical tilt |
| headTilt(face) | arctan2 of ear positions | Head rotation |

Action detection thresholds:

| Action | Threshold | Metric |
|---|---|---|
| Blink | < 0.21 | Both eye aspect ratios |
| Smile | > 0.51 (or idle + 0.06) | Smile ratio |
| Look left | < -0.6 | headX |
| Look right | > 0.6 | headX |
| Look up | > 0.25 | headY |
| Look down | < -0.3 | headY (asymmetric!) |

Asymmetric thresholds

Look up/down thresholds are asymmetric (0.25 vs 0.3), presumably because looking down produces more distinct landmark displacement.
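The threshold table above reduces to a handful of comparisons. A hedged sketch (constants copied from the table; the real FaceEngine also tracks idle baselines, e.g. idle + 0.06 for smile, which this omits):

```python
# Threshold constants copied from the table above.
BLINK_MAX = 0.21
SMILE_MIN = 0.51
HEAD_X_TURN = 0.6
HEAD_Y_UP = 0.25
HEAD_Y_DOWN = -0.3  # asymmetric on purpose

def detect_actions(face: dict) -> list:
    """Map geometric features to detected user actions."""
    actions = []
    if face["leftEye"] < BLINK_MAX and face["rightEye"] < BLINK_MAX:
        actions.append("blink")
    if face["smile"] > SMILE_MIN:
        actions.append("smile")
    if face["headX"] < -HEAD_X_TURN:
        actions.append("look_left")
    elif face["headX"] > HEAD_X_TURN:
        actions.append("look_right")
    if face["headY"] > HEAD_Y_UP:
        actions.append("look_up")
    elif face["headY"] < HEAD_Y_DOWN:
        actions.append("look_down")
    return actions

acts = detect_actions(
    {"leftEye": 0.15, "rightEye": 0.18, "smile": 0.2, "headX": 0.0, "headY": 0.3}
)
```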

DocumentEngine (server/cv/document_engine.py)

Thin facade over RPC calls to cardDetector process:

  • detect(imageId) — calls rpcClient.detectCard() (localization model)
  • classify(imageId) — calls rpcClient.classifyCard() (classification model)
  • checkCardIntegrity(imageId) — calls rpcClient.checkCardIntegrity()
  • resizeToAspectRatio(warpedCardId, documentName) — resizes warped card to match document template

DocumentOcrEngine (server/cv/document_ocr_engine.py)

Orchestrates OCR for recognized documents. Most complex engine.

Flow:

  1. Instantiate document class with image
  2. For each requested OCR key:
     a. Try MRZ first (if document has MRZ and key maps to MRZ field)
     b. If MRZ unavailable/invalid, detect text regions (TextDetectorEngine)
     c. Calculate anchor offsets to compensate for document alignment
     d. Crop text ROIs and run OCR
  3. Apply corrections (number/letter confusion fixing, regex validation)
  4. Retry failed keys with deskewing

ThreadPoolExecutor: Uses cpu_cores/4 workers (min 1, max 8) for parallel OCR.

Anchor system: Documents define “anchor points” — known text positions used to calculate offset between template and actual image, compensating for warping imperfections.

OcrEngine (server/cv/ocr_engine.py)

OCR backends:

  1. LSTM OCR (default): Custom CRNN model via ONNX
  2. EasyOCR rs_cyrillic: For Serbian Cyrillic documents (config-gated)
  3. EasyOCR rs_latin: For Serbian Latin documents (config-gated)

EasyOCR downloads are disabled (download_enabled=False), models stored at /workspace/vuer_cv/data/easyOcr.

Character confusion correction:

CONFUSED_LETTERS: 0->O, 1->I, 2->Z, 3->B, 4->A, 5->S, 6->G, 7->T, 8->B, 9->Y
CONFUSED_NUMBERS: D->0, Q->0, O->0, I->1, Z->2, A->4, S->5, G->6, T->7, B->8, Y->9
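The correction maps above apply character-by-character depending on the field's expected type. A minimal sketch (the `correct` helper is illustrative; the real OcrEngine also handles positional and callable corrections):

```python
# Mappings reproduced from the listing above.
CONFUSED_LETTERS = {"0": "O", "1": "I", "2": "Z", "3": "B", "4": "A",
                    "5": "S", "6": "G", "7": "T", "8": "B", "9": "Y"}
CONFUSED_NUMBERS = {"D": "0", "Q": "0", "O": "0", "I": "1", "Z": "2",
                    "A": "4", "S": "5", "G": "6", "T": "7", "B": "8", "Y": "9"}

def correct(text: str, expected: str) -> str:
    """Replace confusable characters depending on whether the field
    should contain only letters or only numbers."""
    table = CONFUSED_LETTERS if expected == "letters" else CONFUSED_NUMBERS
    return "".join(table.get(ch, ch) for ch in text)

name = correct("NA6Y", "letters")        # OCR misread G as 6
doc_id = correct("O12345ZB", "numbers")  # O->0, Z->2, B->8
```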

CardWarpEngine (server/cv/card_warp_engine.py)

Detects document corners and applies perspective transform: onnx.localization.main.detectCorners() → cv2.getPerspectiveTransform() → cv2.warpPerspective().

CardClassificationEngine (server/cv/card_classification_engine.py)

Classifies document type from warped card image. CLASSIFICATION_SCORE_THRESHOLD = 0.3 — returns None if confidence < 30%.

CardIntegrityCheckEngine (server/cv/card_integrity_check_engine.py)

Detects document tampering. Returns max anomaly score.

TextDetectorEngine (server/cv/text_detector_engine.py)

Detects text regions in images for OCR. CANVAS_SIZE = 1024 (images resized to this for detection). PADDING = 1% horizontal and vertical added to detected boxes.

Complex ROI logic (getRoisFromMask): Filters text detections by center point within specified ROI, height > 2x padding, blacklist filtering, target height adjustment, and multi-line splitting support.

BarcodeEngine (server/cv/barcode_engine.py)

Uses Detectron2 model via RPC for barcode localization, then pyzbar library for decoding. Tries 6 rotation angles: 0, 15, -15, 30, -30, 45 degrees.

BackgroundMaskEngine (server/cv/background_mask_engine.py)

Person segmentation and background replacement. Alpha blending: result = image * mask + background * (1 - mask). Default white background.

SharpnessEngine (server/cv/sharpness_engine.py)

Image quality via Laplacian variance: cv2.Laplacian(resized, CV_8U).var(), clamped to [0, 255]. Converts to grayscale, rotates to horizontal if portrait, resizes to fit 320x240.

PADDetectionEngine (server/cv/pad_detection_engine.py)

Thin RPC wrapper for deepfake and presentation attack detection:

  • processClipForDeepFake(clip) — sends video clip frames for deepfake analysis
  • getPADScores(imageId, locations) — gets paper/screen attack scores per face

KaptchaEngine (server/cv/kaptcha_detect_engine.py)

CAPTCHA solving via OCR process RPC.

StatusEngine (server/cv/status_engine.py)

Health monitoring and hardware status. Health levels: green/yellow/red.

  • Checks all Supervisor processes via supervisorctl status
  • Yellow if any process not RUNNING or started >10s after main process
  • Checks WebSocket process logs for “LISTENING” after 24h uptime
  • Reports CPU (model, cores, frequency, utilization), GPU (via nvidia-smi), RAM info

HTTP API (REST Endpoints)

Framework: Falcon (WSGI). All endpoints use JSON schema validation.

Face Endpoints

| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /face/detect | FaceDetectResource | Detect faces, encodings, landmarks, actions |
| POST | /face/compare | FaceCompareResource | Compare two face images (cosine distance) |
| POST | /face/draw | FaceDrawResource | Draw landmarks on image |
| POST | /face/age-gender | FaceGenderAgeResource | Predict age and gender |
| POST | /face/reference-extract | ReferenceFaceExtractResource | Extract reference face with quality checks |

Document Endpoints

| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /document/recognition | DocumentRecognitionResourceV2 | Detect + classify document |
| POST | /document/warp | DocumentWarpResourceV2 | Detect + warp + classify + sharpness |
| POST | /document/ocr | DocumentOcrResource | OCR on recognized document |
| POST | /document/card-warp | CardWarpResource | Just corner detection + warp |
| POST | /document/card-integrity | CardIntegrityCheckResource | Tampering score |
| GET | /document/types | DocumentTypesResource | List supported document types |

Other Endpoints

| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /image/upload | ImageUploadResource | Upload image (max 3250x3250, auto-downsize) |
| POST | /image/download | ImageDownloadResource | Download image as PNG |
| POST | /barcode | BarCodeResource | Read barcodes from image |
| POST | /barcode/detect | BarCodeDetectResource | Detect barcode regions |
| POST | /mrz | MrzResource | Read MRZ from image |
| POST | /ocr | OcrResource | Generic OCR |
| POST | /sharpness | SharpnessResource | Calculate image sharpness |
| POST | /kaptcha | KaptchaDecoderResource | Decode CAPTCHA image |
| POST | /background-mask | BackgroundMaskResource | Background removal/replacement |
| GET | /ping | PingResource | Health check |
| GET | /status | StatusResource | Detailed server status |

Image Upload Security

ImageUploadResource:

  • MIME type validation via python-magic (reads first 1024 bytes)
  • Supported types from config (config.image.supportedMimeTypes)
  • Max resolution 3250x3250, auto-downsized while preserving EXIF
  • Images stored in Redis with 60s TTL

WebSocket System (Real-Time)

Architecture

flowchart TD
    Browser["Browser"]
    WS["WSConnection\n(pyee EventEmitter)"]
    SH["StreamHandler\n(WebRTC signaling via aiortc)"]
    FS["FrameStore\n(video/audio frame extraction)"]
    P["Processor\n(multiprocessing.Process per worker)"]
    CVT["CVTask\n(orchestrates pipelines)"]

    Browser <-->|"WebSocket JSON + binary"| WS
    WS --> SH
    SH --> FS
    FS --> P
    P --> CVT
    CVT -->|"results via WS"| Browser

WSConnection (server/websocket/wsconnection.py)

Event-based message routing using pyee.asyncio.AsyncIOEventEmitter. Messages are JSON with {type, payload} structure. Binary messages emit “binary” event.

StreamHandler (server/websocket/stream_handler.py)

Handles WebRTC signaling (offer/answer). Uses aiortc for server-side WebRTC. Supports custom ICE servers passed from client or from config. Retry logic: webrtcConnectWithAllocationMismatchRetry() retries 3 times on TURN 437 errors.

FrameStores

| Class | Purpose | Storage |
|---|---|---|
| NumpyFrameStore | Video frames as numpy arrays | Redis NumPy format, 2s TTL |
| RawFrameStore | Video frames as JPEG | Redis raw format, 2s TTL |
| AudioFrameStore | Audio with resampling + accumulation | Redis audio format, 2s TTL |

Processor (server/websocket/processor/processor.py)

Abstract multiprocessing processor using multiprocessing.Queue for IPC:

  • input() puts imageId into input queue (non-blocking)
  • Background process reads from input queue, calls doProcess(), puts result in output queue
  • Frame dropping: If input queue has items when process finishes one, drains queue (keeps only latest)
  • Cleanup: Graceful shutdown with sentinel values → SIGTERM → SIGKILL, with escalating timeouts
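The frame-dropping behaviour is the interesting part. A hedged sketch using the stdlib `queue` module (`latest_frame` is an illustrative name; the real Processor does this inside its multiprocessing loop):

```python
import queue

def latest_frame(q: queue.Queue):
    """Drain the input queue and keep only the newest imageId, so a
    slow consumer always processes the freshest frame."""
    item = q.get()  # block until at least one frame arrives
    while True:
        try:
            item = q.get_nowait()  # newer frame available: drop the old one
        except queue.Empty:
            return item

q = queue.Queue()
for image_id in ("frame-1", "frame-2", "frame-3"):
    q.put(image_id)
newest = latest_frame(q)  # frame-1 and frame-2 are dropped
```

Dropping stale frames trades completeness for latency, which is the right trade for real-time video: processing old frames would only report where the face used to be.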

CV Task Hierarchy

classDiagram
    AbstractCVTask <|-- BasicCVTask
    BasicCVTask <|-- CVTask
    CVTask <|-- FaceCVTask
    CVTask <|-- DocumentCVTask
    CVTask <|-- LivenessCVTask
    CVTask <|-- LivenessV2CVTask
    CVTask <|-- BarcodeCVTask
    CVTask <|-- MrzCVTask
    CVTask <|-- SharpnessCVTask
    CVTask <|-- SpeechDetectorTask
    CVTask <|-- ActionCVTask
    CVTask <|-- HoloVideoCVTask
    CVTask <|-- HoloImageCVTask
    CVTask <|-- HoloV2CVTask
    AbstractCVTask <|-- PADDetectorCVTask
    class AbstractCVTask {
        +start()
        +stop()
    }
    class BasicCVTask {
        +FrameStore
        +Processor
    }
    class CVTask {
        +StreamHandler (WebRTC)
        +FPSCounter
    }
    class PADDetectorCVTask {
        standalone, no WebRTC
        binary WS chunks
    }

StreamPlayerCVTask is completely separate — plays pre-recorded media files via WebRTC.

Key CV Tasks

FaceCVTask: Accepts recognitionType: "smile" | "blink" and optional faceToFind encoding. Detects face, waits for specified action, compares with faceToFind if provided (threshold default 0.6).

DocumentCVTask: Streaming document detection with sharpness check.

PADDetectorCVTask: Receives video as binary WebSocket chunks (not WebRTC). Uses StreamingBuffer with av for demuxing. Builds clips of 8 frames for deepfake detection. PAD scoring weighted by face crop area.

StreamPlayerCVTask: Plays pre-recorded media files via WebRTC. Directory restriction: config.streamPlayerDir must be set and file must be within it.


Document Definitions

AbstractDocument (server/documents/document.py)

Base class for all document definitions. Defines:

  • Static document dimensions (width, height in “document coordinates”)
  • Face ROI location (for face extraction from ID photos)
  • Hologram ROI definitions (position, density, weight)
  • MRZ ROI location
  • OCR field definitions (ROIs, regex, corrections, whitelists)
  • Anchor points for OCR alignment correction

Supported Documents (17 total)

Hungarian (HUN) — 14 documents:

| Class | Name | Type |
|---|---|---|
| HunAo03001 | HUN-AO-03001 | ID card (address side) |
| HunBo03004Back | HUN-BO-03004_BACK | Driving license back |
| HunBo03004Front | HUN-BO-03004_FRONT | Driving license front |
| HunBo04001Front | HUN-BO-04001_FRONT | Driving license front (newer) |
| HunBo05001Back | HUN-BO-05001_BACK | Driving license back (newer) |
| HunBo05001Front | HUN-BO-05001_FRONT | Driving license front (newer) |
| HunBo06001BackPO | HUN-BO-06001_BACK_PO | Driving license back (latest) |
| HunBo06001Front | HUN-BO-06001_FRONT | Driving license front (latest) |
| HunBo07001Front | HUN-BO-07001_FRONT | Driving license front (latest+) |
| HunFo02001Back | HUN-FO-02001_BACK | Personal ID back |
| HunFo02001Front | HUN-FO-02001_FRONT | Personal ID front |
| HunFo04001Back | HUN-FO-04001_BACK | Personal ID back (newer) |
| HunFo04001Front | HUN-FO-04001_FRONT | Personal ID front (newer) |
| HunHo10001 | HUN-HO-10001 | Residence permit |

Serbian (SRB) — 3 documents:

| Class | Name | Type |
|---|---|---|
| SrbAo01001 | SRB-AO-01001 | Serbian ID |
| SrbBo01001Back | SRB-BO-01001_BACK | Driving license back |
| SrbBo01001Front | SRB-BO-01001_FRONT | Driving license front |

Example: HUN-AO-03001 (Hungarian ID card)

  • Dimensions: 940 x 650 (document coordinates)
  • Face ROI: [15, 135, 260, 335] (x, y, w, h)
  • MRZ ROI: [10, 490, 920, 130]
  • Hologram ROIs: 2 regions (main hologram + OVI security feature)
  • OCR fields: documentId, type, code, lastName, firstName, birthName, nationality, dateOfBirth, sex, placeOfBirth, dateOfIssue, authority, dateOfExpiry
  • Anchor-based alignment: Uses documentId and dateOfExpiry as anchor points with max 25px offset tolerance
  • Custom corrections: correctDate() (Hungarian month names), correctNationality() (fuzzy match “MAGYAR/HUNGARIAN”), correctAddress() (Hungarian address format), correctAuthority() (issuing authority names)

Document Classification Mapping

config.DOCUMENT_CLASS_ID_MAPPING maps document names to integer class IDs (loaded from data/card_detector_classification_classId_mapping.json). The classification model outputs these integer IDs.


Face Pipeline

HTTP Path (batch)

flowchart LR
    Client["Client POST /face/detect"]
    FDR["FaceDetectResource"]
    RPC["Redis RPC -> app_face.py"]
    FD["face_detection ONNX"]
    FL["face_landmark_detection ONNX"]
    FE["face_encoder ONNX"]
    Cache["Redis metadata cache"]

    Client --> FDR --> RPC
    RPC --> FD -->|"bounding boxes"| FL -->|"98-point landmarks"| FE -->|"512-d embeddings"| Cache

Client POST /face/detect {imageId}
  -> FaceDetectResource
    -> faceRpcClient.detectFace(imageId) [Redis RPC to app_face.py]
      -> getLocations(imageId) -> detectFaces(image, threshold) [face_detection ONNX]
      -> getLandmarks(imageId) -> detectFaceLandmarks(image, box) [face_landmark_detection ONNX] per face
      -> getEncodings(imageId) -> getFaceEncodings(image, landmark) [face_encoder ONNX] per face
    -> Results cached per imageId in Redis metadata

WebSocket Path (streaming)

Browser <-> WebRTC <-> StreamHandler <-> NumpyFrameStore
  -> FaceProcessor (multiprocessing)
    -> faceRpcClient.getLandmarks(imageId) [Redis RPC]
    -> faceRpcClient.getLocations(imageId) [Redis RPC]
    -> faceRpcClient.getEncodings(imageId) [Redis RPC]
    -> FaceEngine.getFaceData(landmarks) [geometric computation]
    -> FaceEngine.userActions(faceData) [threshold checks]
  -> FaceCVTask.processOutput() [send results via WebSocket]

Face Comparison

cosine_distances(enc1.reshape(1,-1), enc2.reshape(1,-1))[0][0]

Using sklearn’s cosine_distances. Distance 0 = identical; larger distances mean less similar (orthogonal embeddings give 1). Default threshold: 0.6.
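For a single pair of vectors, sklearn's `cosine_distances` reduces to 1 minus the cosine similarity. A dependency-free sketch of that computation:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, matching sklearn's cosine_distances
    for a single pair of vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

THRESHOLD = 0.6  # default match threshold from the text

same = cosine_distance([1.0, 0.0], [1.0, 0.0])        # identical -> 0.0
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])  # unrelated -> 1.0
```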


Document Pipeline

flowchart TD
    IMG["Input Image"]
    DET["DocumentEngine.detect()\nlocalization ONNX -> 4 corners"]
    WARP["CardWarpEngine.warp()\nperspective transform"]
    CLS["DocumentEngine.classify()\ncard_classification ONNX -> classId"]
    LOOKUP["getDocumentByClassId()\nlookup document definition"]
    SHARP["SharpnessEngine.calcSharpness()\nLaplacian variance check"]
    RESIZE["DocumentEngine.resizeToAspectRatio()"]
    OCR["DocumentOcrEngine.run()"]
    MRZ["1. Try MRZ first"]
    TEXT["2. Detect text regions"]
    ANCHOR["3. Calculate anchor offsets"]
    CROP["4. Crop ROIs per field"]
    RECOG["5. Run character recognition"]
    CORRECT["6. Apply corrections"]
    RETRY["7. Retry failed with deskewing"]

    IMG --> DET --> WARP --> CLS --> LOOKUP --> SHARP --> RESIZE --> OCR
    OCR --> MRZ --> TEXT --> ANCHOR --> CROP --> RECOG --> CORRECT --> RETRY

Liveness Detection

V1 (LivenessCVTask)

Protocol (WebSocket messages):

  1. Client sends video stream via WebRTC
  2. Server detects face, validates position/size
  3. Server sends {type: "liveness", task: "face"} — “show your face”
  4. Server confirms face, sends random action: {task: "smile"}
  5. Client performs action
  6. Server evaluates: success (+1) or wrong_action/too_many_actions (-1)
  7. Repeat until score >= successScore (3) or score <= failScore (-2)
  8. Optional faceToFind encoding comparison on each frame

Face position validation:

  • Must be within 35% of center (both axes)
  • Face size between 3-25% of image area
  • Guide messages: “move_head_up/down/left/right/closer/away”
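The position checks above can be sketched as a single validation function. This is a hedged illustration: the constants come from the list above, but the direction naming (which offset maps to which guide message) and the function name are assumptions.

```python
CENTER_TOLERANCE = 0.35          # within 35% of center on both axes
MIN_AREA, MAX_AREA = 0.03, 0.25  # face area as a fraction of the image

def position_guide(box, img_w, img_h):
    """Return None when the face is acceptable, else a guide message.
    box = (x, y, w, h) in pixels."""
    x, y, w, h = box
    cx = (x + w / 2) / img_w - 0.5  # signed offset from center, -0.5..0.5
    cy = (y + h / 2) / img_h - 0.5
    if cx < -CENTER_TOLERANCE:
        return "move_head_right"
    if cx > CENTER_TOLERANCE:
        return "move_head_left"
    if cy < -CENTER_TOLERANCE:
        return "move_head_down"
    if cy > CENTER_TOLERANCE:
        return "move_head_up"
    area = (w * h) / (img_w * img_h)
    if area < MIN_AREA:
        return "move_closer"
    if area > MAX_AREA:
        return "move_away"
    return None

ok = position_guide((280, 200, 120, 140), 640, 480)        # centered, good size
too_small = position_guide((300, 220, 30, 30), 640, 480)   # centered, tiny
```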

V2 (LivenessV2CVTask)

Two phases:

Phase 1: Reference Face Capture

  1. Wait 3 seconds for user to position face
  2. Capture low-quality reference (min 360px)
  3. Request high-quality still image from client (min 720px)
  4. Run quality checks: resolution, single person, face size, face position, extras (sunglasses, closed mouth, no smile, eyes open, facing camera)
  5. Compare low/high quality face encodings

Phase 2: Action Challenges (same as V1 but with progress)

  • Reports action progress (0.0 to 1.0) per frame
  • Guide state machine prevents UI flicker with grace periods and cooldowns

Guide State Machine (5 states):

stateDiagram-v2
    [*] --> NOMINAL
    NOMINAL --> GRACE_PERIOD: receive negative guide
    GRACE_PERIOD --> ERROR_COOLDOWN: timeout
    GRACE_PERIOD --> NOMINAL: same positive guide
    ERROR_COOLDOWN --> NOMINAL_COOLDOWN: timeout + positive
    ERROR_COOLDOWN --> ERROR: timeout + negative
    NOMINAL_COOLDOWN --> NOMINAL: timeout + positive
    ERROR --> NOMINAL_COOLDOWN: receive positive

Quality Checks for Reference Face

| Check | Threshold | Default |
|---|---|---|
| Sunglasses | > 0.5 | Enabled |
| Closed mouth | > 0.3 (mouth_aspect_ratio) | Enabled |
| Not smiling | > 0.51 (smile_ratio) | Enabled |
| Not blinking | < 0.21 (eye_aspect_ratio) | Enabled |
| Facing camera | > 0.23 (max head angle) | Enabled |

Anti-Spoofing (PAD & Deepfake)

PADDetectorCVTask

Input: Video sent as binary WebSocket chunks (not WebRTC). Uses av library to demux.

Two-pronged detection:

  1. PAD (Presentation Attack Detection): Per-frame, per-face

    • Gets face location from FaceProcessor
    • Calls PADDetectionEngine.getPADScores(imageId, locations) via RPC
    • Returns paper/screen attack probability
    • Weighted by sqrt(face_crop_area) — larger faces = more reliable scores
    • Min face crop area: 17,280 pixels (360 * 480 * 0.1)
  2. Deepfake Detection: Per-clip (8 frames)

    • Builds clips of 8 consecutive frames with valid faces
    • Sends clip (imageIds + landmarks) to PADDetectionEngine.processClipForDeepFake()
    • Default max 10 clips per session

Final result (averaged across all frames/clips):

  • deepFake: Mean of all clip scores
  • padPaper: Weighted average of per-frame paper scores
  • padScreen: Weighted average of per-frame screen scores
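The sqrt-of-area weighting described above can be sketched directly (`weighted_pad_score` is an illustrative name; the real task accumulates these per frame before averaging):

```python
import math

def weighted_pad_score(frames):
    """Average per-frame PAD scores weighted by sqrt(face crop area):
    larger faces contribute more. `frames` is a list of (score, area)."""
    weights = [math.sqrt(area) for _, area in frames]
    total = sum(weights)
    return sum(score * w for (score, _), w in zip(frames, weights)) / total

# A large, confident face dominates a small one.
score = weighted_pad_score([(0.9, 40000), (0.1, 10000)])
```

With weights 200 and 100, the result lands at (0.9·200 + 0.1·100) / 300, closer to the large face's score than a plain mean would be.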

Hologram Detection

V1: Video-Based (HoloDetectorEngine)

flowchart TD
    REF["Reference image (warped card)"]
    VID["Video stream of card under light"]
    SIFT["SIFT feature matching\n+ FLANN + Lowe's ratio (0.75)"]
    HOMO["Homography (RANSAC)\n+ perspective warp"]
    HSV["Store warped frames as HSV\nin HoloStack"]
    CALC["HoloCalculator:\n95th-5th percentile H range\nChi-squared uniformity test\nAdaptive thresholding"]
    SCORE["Score: detections in ROIs = 255\noutside = negative penalty"]

    REF --> SIFT
    VID --> SIFT --> HOMO --> HSV
    HSV -->|"MIN_FRAMES=50"| CALC --> SCORE

HoloWarp: SIFT feature detector + FLANN-based matcher. Lowe’s ratio test (0.75). Min keypoint pairs configurable per document (default 80). Rejects: outside corners, too small detections (< 10% of frame area).

HoloStack: Stores warped frames as HSV, resized to document dimensions. Appends along 4th axis (h, w, 3, num_frames).

HoloCalculator:

  • Pixel detection: 95th-5th percentile range of H channel across all frames
  • High range = pixel changes color significantly = hologram
  • Filtering: Minimum saturation and value thresholds
  • Chi-squared uniformity test: true hologram pixels should have uniform hue distribution
  • Adaptive thresholding: iteratively adjusts threshold to find between min/max hologram pixels
  • Score: detections inside hologram ROIs score 255, outside get negative penalty (distance * multiplier)

V2: Image-Based (HoloImageEngine)

  1. User provides multiple static images of card
  2. Each image: ORB feature matching → homography → align to reference
  3. Histogram matching to normalize lighting
  4. absdiff() between reference and aligned image
  5. HSV filtering: saturation > 80, value > 30 → hologram candidates
  6. Morphological opening to remove noise
  7. Accumulate across images; require MIN_POSITIVE_SAMPLES=2 detections per pixel

Image distance check: Cosine similarity between grayscale flattened images. Rejects images with distance > maxImgDistanceScore (default 0.0185).

Per-Document Hologram Config

Documents can override base thresholds:

  • HUN-AO-03001: Lower saturation (20), higher uniformity (500), lower false positive penalty (0.75)

OCR System

Pipeline

flowchart TD
    IMG["Image"]
    TD["TextDetectorEngine.detect()\ntext_detection ONNX -> polygons"]
    TA["TextDetectorEngine.getTextAreas()\nbounding boxes with centers"]
    AO["DocumentOcrEngine.calcOffsetFromAnchors()\nalignment correction"]
    ROI["TextDetectorEngine.getRoisFromMask()\nfilter by document ROI definitions"]
    CROP["TextDetectorEngine.cropRois()\ncrop + optional rotation"]
    OCR["OcrEngine.getText()\nLSTM ONNX or EasyOCR"]
    TC["OcrEngine.textCorrection()\npost-processing"]

    IMG --> TD --> TA --> AO --> ROI --> CROP --> OCR --> TC

LSTM OCR

Custom CRNN model with LstmLabelConverter:

  • Character set includes Hungarian accented characters
  • CTC decoding (collapse repeated characters, remove blanks marked as ¤)
  • Returns text, confidence, per-character confidences
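Greedy CTC decoding is short enough to sketch. The blank symbol comes from the text above; the function name is illustrative (the real LstmLabelConverter also tracks per-character confidences):

```python
BLANK = "¤"  # blank symbol used by the model, per the list above

def ctc_decode(raw: str) -> str:
    """Greedy CTC decoding: collapse runs of repeated characters,
    then drop blanks."""
    out = []
    prev = None
    for ch in raw:
        if ch != prev and ch != BLANK:  # collapse repeats, skip blanks
            out.append(ch)
        prev = ch
    return "".join(out)

decoded = ctc_decode("NN¤A¤GG¤Y")  # repeated frames collapse to "NAGY"
```

Note how a blank between two identical characters preserves a genuine double letter: `ctc_decode("AA¤A")` yields `"AA"`, while `"AAA"` collapses to `"A"`.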

Text Correction Pipeline

  1. Label removal: Split at : (e.g., “Vezetéknév:NAGY” → “NAGY”)
  2. Type-specific correction:
    • "numbers": Replace letter-like digits (O0, I1, etc.)
    • "letters": Replace digit-like letters (0O, 1I, etc.)
    • [start, end, method]: Positional correction (e.g., first 2 chars = letters, next 7 = numbers)
    • ["option1", "option2"]: Fuzzy match against known values (SequenceMatcher, 0.6 threshold)
    • callable: Custom correction function
  3. Whitelist filtering: Remove characters not in allowed set
  4. Regex validation: Final result must match pattern
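The fuzzy-match branch of step 2 maps onto the stdlib's `difflib.SequenceMatcher`. A hedged sketch (`fuzzy_correct` is an illustrative name; the 0.6 threshold is quoted from the list above):

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.6  # similarity threshold quoted above

def fuzzy_correct(text: str, options):
    """Snap an OCR result to the closest known value when similarity
    reaches the threshold; otherwise return the raw text."""
    best, best_ratio = None, 0.0
    for option in options:
        ratio = SequenceMatcher(None, text, option).ratio()
        if ratio > best_ratio:
            best, best_ratio = option, ratio
    return best if best_ratio >= FUZZY_THRESHOLD else text

# A near-miss OCR read snaps to the known nationality string.
nationality = fuzzy_correct("MAGYAR HUNGAR1AN", ["MAGYAR/HUNGARIAN"])
```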

Utility Layer

LazyImage (server/utils/lazy_image.py)

Defers image loading until actually needed. Stores imageId, loads numpy array on first .array access. @LazyImage.convert decorator allows RPC functions to accept either imageId (string) or LazyImage.

Processing (server/utils/processing.py)

runAsyncProcess(func, *args): Runs function in a separate multiprocessing.Process with 60s timeout. Uses asyncio integration.

Expensive per-call

Spawns a new OS process for each call. Used for CPU-intensive operations that would block the event loop (hologram detection, face distance calculation).

StreamingBuffer (server/utils/streaming_buffer.py)

Thread-safe buffer for streaming binary data. Uses threading.RLock + Condition for synchronization. Supports async starvation signaling for flow control.

Other Utilities

| File | Purpose |
| --- | --- |
| fps_counter.py | FPS tracking with 3-second sliding window |
| async_extra.py | AsyncSlot (single-value async container), waitForUntilEvent (cancellation pattern) |
| warmup.py | Model warmup runner (GPU only, 5 runs) |
| webrtc.py | WebRTC retry logic for TURN allocation mismatch |
| roi.py | ROI geometry utilities (rectangle, polygon, point-in-polygon) |
| face.py | Face position calculation, liveness visualization |
| liveness.py | Action sequence generation, reference face checks, thresholds |
| image.py | writeOnImage() helper |
| dates.py | Hungarian month names |
| dict.py | getKeyByValue() utility |
| exceptions.py | Custom exception classes (12 types) |
| lstm_label_converter.py | CTC decoder for LSTM OCR |

Config System

server/cfg.py

Config file loading order (merged with jsonmerge):

  1. config/{PYTHON_ENV}.json (dev or docker)
  2. config/jsonschemas.json (request/response schemas)
  3. config/scaling_presets/{SCALING_PRESET}.json (optional, docker only)
  4. config/local.json (local overrides)
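The layering can be sketched as follows; the real loader uses the jsonmerge package, while this self-contained version substitutes a plain recursive merge with the same override order (file names come from the list above, the function names are illustrative):

```python
import json
import os

# Sketch of the layered config load: later files override earlier ones,
# nested objects merge key by key. jsonmerge does this in the real code.
def deep_merge(base, override):
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)   # merge nested objects
        else:
            out[key] = value                         # override wins
    return out

def load_config(env="docker", preset=None, root="config"):
    layers = [f"{env}.json", "jsonschemas.json"]
    if preset and env == "docker":
        layers.append(os.path.join("scaling_presets", f"{preset}.json"))
    layers.append("local.json")                      # local overrides last
    config = {}
    for name in layers:
        path = os.path.join(root, name)
        if os.path.exists(path):                     # some layers are optional
            with open(path) as f:
                config = deep_merge(config, json.load(f))
    return config
```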

Environment Variables

| Variable | Purpose | Values |
| --- | --- | --- |
| PYTHON_ENV | Config environment | dev or docker |
| ENV_VERSION | Must match config.requiredEnvVersion | - |
| DEV_DOMAIN | Used to construct config.host in dev mode | - |
| SCALING_PRESET | Selects scaling preset in docker mode | e.g. 50rt_25non_rt |
| INFERENCE_DEVICE_MODE | GPU mode | cpu / gpu / force_gpu |
| WARMUP_NUM_RUNS | Number of warmup iterations | Default: 5 |

Scaling Configuration

{
  "scaling": {
    "face": { "processes": 1, "workers": 1, "gpuEnabled": true },
    "websocket": { "processes": 2 },
    "http": { "processes": 2 }
  }
}

Production preset available: 50rt_25non_rt (Nvidia L40S, 50 real-time + 25 non-real-time clients).

Document whitelist: config.documentWhitelist restricts which documents are accepted.

JSON Schema validation: All HTTP request/response schemas are validated at startup using Draft7Validator.


Security Analysis

Critical Issues

  1. Pickle deserialization in AppCache (server/appcache.py:106): pickle.loads(raw) — arbitrary code execution if attacker can write to Redis. Keys are predictable.
  2. No authentication on HTTP/WS endpoints: All endpoints are unauthenticated. Auth is presumably handled by Nginx/vuer_oss, but vuer_cv has no auth checks.
  3. Redis as a single point of failure: All image data, metadata, and RPC traffic flows through Redis. No encryption and no authentication are visible in the code.

Medium Risk

  1. File path traversal in StreamPlayerCVTask: Validates file.startswith(config["streamPlayerDir"]), a string-prefix check rather than true path containment. os.path.abspath() is called first, which normalizes ../ sequences, but a sibling path that merely shares the prefix could still pass.
  2. MIME type validation bypass: ImageUploadResource uses magic.from_buffer(first_1024_bytes). Polyglot files could bypass this.
  3. No rate limiting: HTTP endpoints have no rate limiting.
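To illustrate point 1, the difference between a prefix check and a containment check; the directory and file paths here are illustrative, not the service's real config:

```python
import os

# Hypothetical stream-player directory, for illustration only.
STREAM_DIR = "/workspace/vuer_cv/stream_player"

def is_contained_prefix(path):
    """The current style of check: abspath + startswith."""
    return os.path.abspath(path).startswith(STREAM_DIR)

def is_contained_strict(path):
    """Safer: compare path components, not raw string prefixes."""
    resolved = os.path.realpath(path)   # also resolves symlinks
    return os.path.commonpath([resolved, STREAM_DIR]) == STREAM_DIR

# A sibling directory sharing the prefix slips past the startswith check:
print(is_contained_prefix("/workspace/vuer_cv/stream_player_evil/x.mp4"))  # → True
print(is_contained_strict("/workspace/vuer_cv/stream_player_evil/x.mp4"))  # → False
```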

Lower Risk

  1. Error messages leak internal details: Exception messages and stack traces logged with logger.exception().
  2. Hardcoded /workspace/vuer_cv/ paths: EasyOCR model path, warmup input path are hardcoded.
  3. Command execution in StatusEngine: subprocess.run(['supervisorctl', 'status']), subprocess.run(['nvidia-smi', '-L']), subprocess.run(["lscpu"]) — safe (no user input in args), but worth noting.

See security-audit for tracking.


Performance Patterns

GPU Optimization

  • Per-model GPU control via config.scaling.{model}.gpuEnabled
  • TensorRT acceleration support
  • IO binding for GPU sessions (avoids data copy overhead)
  • Warmup runs (5x by default) to pre-heat GPU caches

Caching

  • Image metadata caching: computed results (landmarks, encodings) cached in Redis per imageId
  • LazyImage: defers numpy conversion until needed
  • Short TTLs: all Redis keys expire in 2-60 seconds (prevents memory bloat)

Parallelism

  • Multi-process: each model runs in separate Supervisor process
  • Multi-worker RPC: within a process, ThreadPoolExecutor for multiple RPC server instances
  • WebSocket processors: use multiprocessing.Process for frame processing
  • Frame dropping: processors drain input queue, keeping only latest frame

Bottlenecks

Known performance issues

  1. Redis serialization: All numpy arrays serialized to JSON lists via NumpyEncoder for RPC. Face encodings (512 floats) and landmarks (98x2 floats) serialized/deserialized on every call.
  2. Per-call process spawning: runAsyncProcess() creates a new multiprocessing.Process for each invocation (hologram detection, face distance). Significant overhead.
  3. Sequential face pipeline: For each face, detection → landmarks → encoding runs sequentially through RPC. No batching across faces.
  4. HoloStack memory: np.append() on every frame copies the entire stack — O(n^2) memory allocation for n frames.
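Point 1 can be made concrete with a small measurement; the NumpyEncoder below mimics the pattern named above, and the byte counts are whatever this snippet measures, not figures from the codebase:

```python
import json
import numpy as np

# A 512-float face encoding crossing an RPC boundary as a JSON list
# versus raw bytes: every float32 becomes decimal text in JSON.
class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()        # ndarray → nested Python lists
        return super().default(obj)

encoding = np.random.rand(512).astype(np.float32)
as_json = json.dumps({"encoding": encoding}, cls=NumpyEncoder)
as_bytes = encoding.tobytes()          # what a binary codec would send

print(len(as_json), len(as_bytes))     # JSON is several times larger
```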

Code Smells & Technical Debt

Marked in Code

  1. OCR rewrite planned: Multiple comments ### this logic will be rewamped with standardized ocr api developement ### in:

    • server/cv/document_ocr_engine.py:281
    • server/cv/ocr_engine.py:18, 69, 323
  2. TODO items:

    • server/documents/hun_bo_05001_back.py:75: "roi": [35, 1, 400, 1], # TODO?
    • server/documents/hun_bo_06001_back_po.py:75: Same TODO
  3. NOSONAR suppressions (14 instances): Complexity warnings suppressed on critical methods like calcHoloMask, getRoisFromMask, processOutput in liveness tasks.

Structural Issues

  1. Typo in filename: server/http/exeption_handler.py — “exeption” instead of “exception”.

  2. Type confusion in error handling: FaceCompareResource returns HTTP 403 for generic exceptions (should be 500). Multiple endpoints use 403 for server errors.

  3. Mixed concerns in FaceProcessor: OnDemandFaceProcessor tracks idleSmileRatio as instance state but is shared across frames, coupling calibration state to the processor lifecycle.

  4. Global mutable state: OcrEngine.CONFUSED_LETTERS and CONFUSED_NUMBERS are class-level mutable lists. Not a bug, but fragile.

  5. np.append in HoloStack: self.stack = np.append(self.stack, card, 3) copies the entire array on each frame. Should use a pre-allocated buffer or a list-then-stack pattern.
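The list-then-stack fix for the np.append issue above can be sketched as follows; frame shapes and counts are illustrative, not taken from HoloStack:

```python
import numpy as np

# Accumulate frames in a Python list and stack once at the end, instead
# of np.append() per frame, which copies the whole stack every time.
frames = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(50)]

# O(n^2) pattern: every append reallocates and copies the stack so far
stack = np.empty((64, 64, 3, 0), dtype=np.uint8)
for card in frames:
    stack = np.append(stack, card[..., np.newaxis], axis=3)

# O(n) pattern: one allocation and one copy at the end
stack_fast = np.stack(frames, axis=3)

assert stack.shape == stack_fast.shape == (64, 64, 3, 50)
```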

See tech-debt for tracking.


Magic Numbers & Hardcoded Thresholds

Face Detection

| Constant | Value | Location |
| --- | --- | --- |
| BLINK_THRESHOLD | 0.21 | face_engine.py |
| SMILE_THRESHOLD | 0.51 | face_engine.py |
| SMILE_INCREASE_THRESHOLD | 0.06 | face_engine.py |
| HEAD_X_THRESHOLD | 0.6 | face_engine.py |
| HEAD_UP_THRESHOLD | 0.25 | face_engine.py |
| HEAD_DOWN_THRESHOLD | 0.3 | face_engine.py |
| Face distance threshold | 0.6 | face_cv_task.py, liveness_cv_task.py |
| MAX_FACE_DISTANCE_FROM_CENTER | 0.35 | face_processor.py |
| MIN_FACE_SIZE | 0.03 | face_processor.py |
| MAX_FACE_SIZE | 0.25 | face_processor.py |

Document Processing

| Constant | Value | Location |
| --- | --- | --- |
| CLASSIFICATION_SCORE_THRESHOLD | 0.3 | card_classification_engine.py |
| MIN_SHARPNESS | 80 | document_processor.py |
| MAX_IMAGE_SIZE | 3250x3250 | image_upload.py |
| CORNER_PADDING_PERCENT | 1.5 | document_warp_v2.py |

Hologram Detection

| Constant | Value | Location |
| --- | --- | --- |
| MIN_FRAMES | 50 | holo_detector_engine.py |
| MIN_POSITIVE_SAMPLES | 2 | holo_image_engine.py |
| ORB_MAX_FEATURES | 500 | holo_image_engine.py |
| ORB_KEEP_PERCENT | 0.2 | holo_image_engine.py |
| maxImgDistanceScore | 0.0185 | holo_image_engine.py |
| HOLO_INITIAL_THRESHOLD | 80 | document.py |
| HOLO_THRESHOLD_STEP | 1 | document.py |
| HOLO_MIN_THRESHOLD | 20 | document.py |
| HOLO_MAX_THRESHOLD | 300 | document.py |
| HOLO_UNIFORMITY_THRESHOLD | 200 | document.py |
| HOLO_HSV_MIN_SATURATION | 40 | document.py |
| HOLO_HSV_MIN_VALUE | 40 | document.py |
| HOLO_FALSE_POSITIVE_MULTIPLIER | 10 | document.py |
| HOLO_WARP_MIN_KP_PAIRS | 80 | document.py |
| filterMatches ratio | 0.75 | holo_warp.py |

PAD Detection

| Constant | Value | Location |
| --- | --- | --- |
| MIN_FACE_CROP_AREA | 17280 | pad_detector_cv_task.py |
| clipSize | 8 | pad_detector_cv_task.py |
| maxNumClips | 10 | pad_detector_cv_task.py |

Liveness V2

| Constant | Value | Location |
| --- | --- | --- |
| waitBeforeActionMs | 2000 | liveness_v2_cv_task.py |
| waitForStillImageMs | 5000 | liveness_v2_cv_task.py |
| sunglasses threshold | 0.5 | liveness.py |
| closed_mouth threshold | 0.3 | liveness.py |
| reference_face_head_angle | 0.23 | liveness.py |
| face_difference_head_angle | 0.25 | liveness.py |
| minResolution low | 360 | liveness.py |
| minResolution high | 720 | liveness.py |

Other

| Constant | Value | Location |
| --- | --- | --- |
| TEXT_DETECTOR_CANVAS_SIZE | 1024 | text_detector_engine.py |
| TEXT_DETECTOR_PADDING | (0.01, 0.01) | text_detector_engine.py |
| SPEECH_DETECT_THRESHOLD | 0.9 | speech_detector_cv_task.py |
| FRAME_SIZE_TO_PROCESS_MS | 200 | speech_detector_cv_task.py |
| Barcode rotation angles | 0, 15, -15, 30, -30, 45 | barcode_engine.py |
| Sharpness resize target | 320x240 | sharpness_engine.py |
| Slave process start delay | 10s | status_engine.py |
| WS up check after | 86400s (24h) | status_engine.py |
| Redis image expire | 60s | appcache.py |
| Redis audio expire | 2s | appcache.py |
| WebRTC frame timeout | 2s | frame_store.py |
| runAsyncProcess timeout | 60s | processing.py |

Complete File Index

Entry Points

| File | Purpose |
| --- | --- |
| app_face.py | Face recognition RPC server |
| app_card_detector.py | Card detection RPC server |
| app_text_detector.py | Text detection RPC server |
| app_ocr.py | OCR RPC server |
| app_mrz.py | MRZ reading RPC server |
| app_detectron2.py | Detectron2 (barcode) RPC server |
| app_pad.py | PAD/deepfake RPC server |
| app_onnx.py | Background masking / speech RPC server |
| app_http.py | HTTP API (Falcon + uWSGI) |
| app_websocket.py | WebSocket server |

CV Engines (server/cv/)

| File | Class | Purpose |
| --- | --- | --- |
| face_engine.py | FaceEngine | Geometric face analysis (actions, head pose) |
| document_engine.py | DocumentEngine | Document detection/classification facade |
| document_ocr_engine.py | DocumentOcrEngine | Structured document OCR orchestration |
| ocr_engine.py | OcrEngine | Character recognition + text correction |
| barcode_engine.py | BarcodeEngine | Barcode detection + reading with rotation |
| card_classification_engine.py | CardClassificationEngine | Document type classification |
| card_integrity_check_engine.py | CardIntegrityCheckEngine | Tampering detection |
| card_warp_engine.py | CardWarpEngine | Document corner detection + perspective warp |
| background_mask_engine.py | BackgroundMaskEngine | Background removal/replacement |
| holo_detector_engine.py | HoloDetectorEngine | Video-based hologram detection orchestrator |
| holo_image_engine.py | HoloImageEngine | Image-based hologram detection |
| kaptcha_detect_engine.py | KaptchaEngine | CAPTCHA solving |
| pad_detection_engine.py | PADDetectionEngine | Deepfake + PAD RPC facade |
| sharpness_engine.py | SharpnessEngine | Image sharpness (Laplacian variance) |
| status_engine.py | StatusEngine | System health monitoring |
| text_detector_engine.py | TextDetectorEngine | Text region detection |

Hologram (server/cv/holo/)

| File | Class | Purpose |
| --- | --- | --- |
| holo_calculator.py | HoloCalculator | HSV analysis, chi-squared filtering, scoring |
| holo_stack.py | HoloStack | Frame accumulation in HSV space |
| holo_warp.py | HoloWarp | SIFT+FLANN alignment + perspective warp |

HTTP Resources (server/http/resources/)

| File | Class | Endpoint |
| --- | --- | --- |
| face_detect.py | FaceDetectResource | Face detection + encodings |
| face_compare.py | FaceCompareResource | Face comparison |
| face_draw.py | FaceDrawResource | Landmark visualization |
| face_age_gender.py | FaceGenderAgeResource | Age/gender prediction |
| reference_face_extract.py | ReferenceFaceExtractResource | Quality-checked face extraction |
| document_recognition_v2.py | DocumentRecognitionResourceV2 | Document detection + classification |
| document_warp_v2.py | DocumentWarpResourceV2 | Full document warp pipeline |
| document_ocr.py | DocumentOcrResource | Document OCR |
| document_types.py | DocumentTypesResource | List document types |
| card_warp.py | CardWarpResource | Corner detection + warp only |
| card_integrity_check.py | CardIntegrityCheckResource | Tampering detection |
| barcode.py | BarCodeResource | Barcode reading |
| barcode_detect.py | BarCodeDetectResource | Barcode region detection |
| background_mask.py | BackgroundMaskResource | Background removal |
| image_upload.py | ImageUploadResource | Image upload |
| image_download.py | ImageDownloadResource | Image download |
| kaptcha_decoder.py | KaptchaDecoderResource | CAPTCHA solving |
| mrz.py | MrzResource | MRZ reading |
| ocr.py | OcrResource | Generic OCR |
| sharpness.py | SharpnessResource | Sharpness calculation |
| ping.py | PingResource | Health check |
| status.py | StatusResource | Server status |
| exeption_handler.py | ErrorBase | Global exception handler (typo in filename) |

WebSocket Tasks (server/websocket/task/)

| File | Class | Purpose |
| --- | --- | --- |
| cv_task.py | AbstractCVTask, BasicCVTask, CVTask | Base task hierarchy |
| face_cv_task.py | FaceCVTask | Streaming face recognition |
| document_cv_task.py | DocumentCVTask | Streaming document detection |
| liveness_cv_task.py | LivenessCVTask | V1 liveness challenge |
| liveness_v2_cv_task.py | LivenessV2CVTask | V2 liveness with phases |
| action_cv_task.py | ActionCVTask | Single action verification |
| barcode_cv_task.py | BarcodeCVTask | Streaming barcode reading |
| mrz_cv_task.py | MrzCVTask | Streaming MRZ reading |
| sharpness_cv_task.py | SharpnessCVTask | Streaming sharpness check |
| speech_detector_cv_task.py | SpeechDetectorTask | Voice activity detection |
| holo_video_cv_task.py | HoloVideoCVTask | Video hologram detection |
| holo_image_cv_task.py | HoloImageCVTask | Image hologram detection (via WS) |
| holo_v2_cv_task.py | HoloV2CVTask | V2 hologram (ORB-based) |
| pad_detector_cv_task.py | PADDetectorCVTask | Anti-spoofing detection |
| stream_player_cv_task.py | StreamPlayerCVTask | Media file playback via WebRTC |

WebSocket Processors (server/websocket/processor/)

| File | Class | Purpose |
| --- | --- | --- |
| processor.py | AbstractProcessor | Base multiprocessing processor |
| face_processor.py | FaceProcessor, OnDemandFaceProcessor | Face processing with config |
| document_processor.py | DocumentProcessor | Document detection pipeline |
| barcode_processor.py | BarcodeProcessor | Barcode reading |
| holo_processor.py | HoloProcessor | SIFT-based card warp for holo |
| holo_v2_processor.py | HoloV2Processor | ORB-based diff mask for holo |
| mrz_processor.py | MrzProcessor | MRZ reading via RPC |
| sharpness_processor.py | SharpnessProcessor | Sharpness calculation |
| speech_processor.py | SpeechProcessor | Speech detection via RPC |

WebSocket FrameStores (server/websocket/framestore/)

| File | Class | Purpose |
| --- | --- | --- |
| frame_store.py | FrameStore | Base class for frame extraction |
| numpy_frame_store.py | NumpyFrameStore | Video numpy via Redis |
| raw_frame_store.py | RawFrameStore | Video JPEG via Redis |
| audio_frame_store.py | AudioFrameStore | Audio with resampling |

WebSocket Core

| File | Class | Purpose |
| --- | --- | --- |
| stream_handler.py | StreamHandler | WebRTC signaling |
| wsconnection.py | WSConnection | WebSocket message routing |

Documents (server/documents/)

| File | Class | Purpose |
| --- | --- | --- |
| document.py | AbstractDocument | Base document definition |
| __init__.py | - | Document registry (17 documents) |
| hun_ao_03001.py | HunAo03001 | Hungarian ID card |
| hun_bo_*.py | HunBo* | Hungarian driving licenses (7 variants) |
| hun_fo_*.py | HunFo* | Hungarian personal IDs (4 variants) |
| hun_ho_10001.py | HunHo10001 | Hungarian residence permit |
| srb_ao_01001.py | SrbAo01001 | Serbian ID |
| srb_bo_01001_*.py | SrbBo01001* | Serbian driving license (front/back) |

Utilities (server/utils/)

| File | Purpose |
| --- | --- |
| image.py | writeOnImage() helper |
| lazy_image.py | LazyImage with Redis-backed metadata caching |
| face.py | Face position calculation, liveness visualization |
| liveness.py | Action sequence generation, reference face checks, thresholds |
| processing.py | runAsyncProcess() — spawn OS process for blocking work |
| roi.py | ROI geometry utilities (rectangle, polygon, point-in-polygon) |
| streaming_buffer.py | Thread-safe streaming buffer for PAD detection |
| fps_counter.py | FPS measurement with sliding window |
| warmup.py | Model warmup runner (GPU only, 5 runs) |
| webrtc.py | WebRTC retry logic for TURN allocation mismatch |
| dates.py | Hungarian month names |
| dict.py | getKeyByValue() utility |
| exceptions.py | Custom exception classes (12 types) |
| async_extra.py | AsyncSlot, waitForUntilEvent |
| lstm_label_converter.py | CTC decoder for LSTM OCR |
| http_not_found.py | 404 handler |

Config & Infrastructure

| File | Purpose |
| --- | --- |
| server/cfg.py | Config loading, JSON schema validation |
| server/appcache.py | Redis-based image/audio/metadata cache |
| server/rpc/pyredisrpc.py | Redis RPC client/server |

Project Structure

app_*.py              Entry points (10 processes)
server/               Python application code
  cv/                 CV engines (16 engines)
    holo/             Hologram detection (3 modules)
  http/               Falcon HTTP resources (22 endpoints)
    resources/
  websocket/          WebSocket system
    task/             CV tasks (15 task types)
    processor/        Multiprocessing processors (9 types)
    framestore/       Frame extraction (4 types)
  documents/          Document definitions (17 documents)
  utils/              Utility modules (15 files)
  rpc/                Redis RPC implementation
packages/             Internal Python packages
onnx/                 ONNX model weight files (Git LFS)
config/               Configuration files + scaling presets
data/                 Runtime data (EasyOCR models, class mappings)
requirements/         Python dependency management
  *.in                Direct dependencies
  *.txt               Pinned/compiled dependencies
  compile.sh          Compile dependencies
  install.sh          Install dependencies
setup/                Setup scripts
tests/                Test suites
web/                  Web assets
bin/                  CLI utilities

Development

# Development mode (via vuer_docker)
docker-compose -f vuer-cv-dev.yml up -d
 
# Install new packages
docker exec -it vuer_cv_dev bash
./requirements/compile.sh
./requirements/install.sh
 
# Upgrade all dependencies
./requirements/compile.sh --upgrade
 
# Git LFS (required for model weights)
git lfs pull

  • vuer_oss - Backend server that calls CV services via HTTP/WebSocket
  • vuer_css - Frontend that initiates WebRTC sessions for real-time CV
  • vuer_docker - Docker orchestration (vuer-cv.yml / vuer-cv-dev.yml / vuer-cv-gpu.yml)
  • FaceKom - Platform overview
  • security-audit - Security issues tracking
  • tech-debt - Technical debt tracking