vuer_cv (Computer Vision Service)
Role
FaceKom’s computer vision and ML inference service. Provides face detection, face comparison, liveness detection, document recognition, OCR, MRZ reading, barcode detection, hologram verification, anti-spoofing (PAD/deepfake), background masking, and speech detection via HTTP and WebSocket APIs. Communicates with vuer_oss through HTTP/WebSocket, and internally uses Redis RPC between its own ~10 Supervisor-managed processes.
| Property | Value |
|---|---|
| Runtime | Python (uWSGI + asyncio) |
| HTTP Framework | Falcon (WSGI) |
| Inference | ONNX Runtime (GPU/CPU), PyTorch, Detectron2 |
| Repo | TechTeamer/vuer_cv |
| Path (remote) | /workspace/vuer_cv |
| Path (local mount) | /Users/levander/coding/mnt/Facekom/vuer_cv |
| Models | 16 ONNX + 5 non-ONNX (Git LFS) |
| Processes | ~10 Supervisor-managed |
Table of Contents
- Architecture Overview
- Entry Point & Process Model
- ONNX Runtime & Model Loading
- Redis RPC System
- Image Cache (AppCache)
- CV Engines
- HTTP API (REST Endpoints)
- WebSocket System (Real-Time)
- Document Definitions
- Face Pipeline
- Document Pipeline
- Liveness Detection
- Anti-Spoofing (PAD & Deepfake)
- Hologram Detection
- OCR System
- Utility Layer
- Config System
- Security Analysis
- Performance Patterns
- Code Smells & Technical Debt
- Magic Numbers & Hardcoded Thresholds
- Complete File Index
Architecture Overview
vuer_cv is a multi-process Python service. Each process loads different ONNX ML models and communicates via Redis RPC. The HTTP/WebSocket servers are the entry points; satellite processes handle face recognition, document detection, OCR, MRZ reading, text detection, background masking, PAD/deepfake detection, and speech detection.
```
+------------------+
|  vuer_cv (main)  |
|  HTTP (Falcon)   |
|  WebSocket (ws)  |
+--------+---------+
         |
   Redis RPC (blpop)
         |
+-----------+-----------+-----------+-----------+
|           |           |           |           |
app_face.py app_card*.py app_ocr.py app_pad.py app_onnx.py
(face RPC)  (card RPC)  (OCR RPC)  (PAD RPC)  (bg/speech)
```
Supervisor Processes
| RPC Queue | Purpose | Entry Point |
|---|---|---|
| face | Face detection, landmarks, encoding, gender/age, sunglasses | app_face.py |
| cardDetector | Card corner detection, classification, integrity | app_card_detector.py |
| ocr | Document OCR, generic OCR, KAPTCHA | app_ocr.py |
| mrz | MRZ reading from documents | app_mrz.py |
| textDetector | Text region detection in images | app_text_detector.py |
| detectron2 | Barcode detection (via Detectron2) | app_detectron2.py |
| pad | Deepfake detection, PAD (paper/screen) scoring | app_pad.py |
| onnx | Background masking, speech detection | app_onnx.py |
| - | HTTP API (uWSGI) | app_http.py |
| - | WebSocket real-time processing | app_websocket.py |
All managed by Supervisor with Nginx reverse proxy (config generated at runtime via Jinja2 templates from scaling settings).
Entry Point & Process Model
Canonical startup sequence (using app_face.py as example)
1. Import config from `server/cfg.py`
2. Call `checkGpuEnabled(config, "face")` — checks if GPU should be disabled per scaling config
3. Import ONNX models (face_detection, face_landmark_detection, face_encoder, gender_age_prediction, sunglasses_detection)
4. Connect to Redis, create AppCache
5. Define decorated functions using `@LazyImage.convert` (makes them callable via RPC with just an imageId)
6. Run model warmup (5 runs by default, configurable via the `WARMUP_NUM_RUNS` env var)
7. Start the Redis RPC server
Scaling
If config.scaling.face.workers > 1, uses ThreadPoolExecutor to run multiple RPC server instances in parallel within the same process (sharing models in memory).
Face RPC Functions
| Function | Purpose |
|---|---|
| getLocations(imageId, threshold) | Face bounding boxes |
| getLandmarks(imageId) | 98-point face landmarks |
| getEncodings(imageId) | Face embeddings (for comparison) |
| getSunglassesScores(imageId) | Sunglasses detection probability |
| getGenderAgePredictions(imageId, box) | Gender probability + age |
| calcFacePositions(imageId) | Distance from image center |
| calcFaceDistance(enc1, enc2) | Cosine distance between face embeddings |
| detectFace(imageId) | Combined detection + encoding |
| drawFace(imageId) | Debug visualization with landmark markers |
| compareFaces(img1, img2) | Full compare pipeline |
Caching Strategy
Results are memoized per-image via LazyImage.getMeta() which stores computed results in Redis (keyed by metadata:{imageId}:{metaName}). Repeated calls for the same image ID return cached landmarks/encodings/etc.
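This memoization pattern can be sketched with a plain dict standing in for Redis. The `metadata:{imageId}:{metaName}` key layout and pickle serialization are from the source; `MetaCache`, `get_meta`, and the fake landmark function are illustrative names.

```python
# Minimal sketch of per-image memoization; a dict stands in for Redis
# (the real cache uses SETEX with a 60s TTL on the same key layout).
import pickle

class MetaCache:
    def __init__(self):
        self._store = {}

    def get_meta(self, image_id, meta_name, compute):
        key = f"metadata:{image_id}:{meta_name}"
        if key in self._store:
            return pickle.loads(self._store[key])  # cache hit: skip inference
        result = compute()                          # cache miss: run the model
        self._store[key] = pickle.dumps(result)
        return result

calls = []
def fake_landmarks():
    calls.append(1)
    return [[0.1, 0.2]] * 98  # pretend 98-point landmark output

cache = MetaCache()
a = cache.get_meta("img-1", "landmarks", fake_landmarks)
b = cache.get_meta("img-1", "landmarks", fake_landmarks)  # served from cache
assert a == b and len(calls) == 1
```

Repeated RPC calls for the same imageId therefore pay the model cost only once per TTL window.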
ONNX Runtime & Model Loading
GPU/CPU Selection (onnx/utils.py)
DeviceMode enum: cpu, gpu, force_gpu
createInferenceSession() handles GPU fallback logic:
- `INFERENCE_DEVICE_MODE` env var controls the mode
- `gpu` mode: tries CUDA, falls back to CPU on error (default)
- `force_gpu`: no fallback, crashes if GPU unavailable
- `cpu`: CPU only
- GPU detection: calls `nvidia-smi -L` and checks the output
- Supports TensorRT acceleration if `useTensorRt=True`
runSession() has two execution paths:
- CPU: direct `session.run()`
- GPU: uses `io_binding()` for zero-copy GPU memory transfers (`bind_cpu_input` → `copy_outputs_to_cpu`)
Global env mutation
`checkGpuEnabled()` mutates `os.environ` globally when disabling GPU for a specific model, affecting all subsequent model loads in the same process.
Mixins
- `NdarrayNormalizerMixin`: ImageNet-style normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- `AspectPreservingResizePadMixin`: resize with letterboxing to square, returns scale/offset for coordinate back-mapping
Base Class (onnx/ml_model_runner.py)
Abstract base class for all ONNX model runners. Constructor loads model via createInferenceSession() and runs random warmup. Subclasses implement _setup() (returns warmup shape/dtype) and _runSession().
ONNX Models (16 total)
| Model | Purpose | Used By |
|---|---|---|
| face_detection | Face bounding box detection | app_face.py |
| face_landmark_detection | 98-point facial landmarks | app_face.py |
| face_encoder | Face embedding generation | app_face.py |
| gender_age_prediction | Gender probability + age estimation | app_face.py |
| sunglasses_detection | Sunglasses detection | app_face.py |
| localization | Document corner detection | card detector |
| card_classification | Document type classification | card detector |
| card_integrity | Document tampering detection | card detector |
| text_detection | Text region detection (CRAFT-like) | text detector |
| lstm | Character recognition (CRNN) | OCR process |
| barcode_detection | Barcode region detection (Detectron2) | detectron2 |
| kaptcha | CAPTCHA solving | OCR process |
| deepfake_detection | Video deepfake detection | PAD process |
| pad_prediction | Presentation attack detection (paper/screen) | PAD process |
| background_masking | Person segmentation / background removal | onnx process |
| speechDetection | Voice activity detection | onnx process |
Non-ONNX Models (CPU only)
| Model | Function | Runtime |
|---|---|---|
| card_detector_classification_svm.pkl | SVM classification | Scikit-learn |
| easyOcr/cyrillic_g2.pth | Serbian Cyrillic OCR | EasyOCR (PyTorch) |
| easyOcr/latin_g2.pth | Serbian Latin OCR | EasyOCR (PyTorch) |
| detectron2_card_detector.pth | Document recognition | Detectron2 (PyTorch) |
| detectron2_barcode_detector.pth | Barcode detection | Detectron2 (PyTorch) |
| tesseract OCR data files | MRZ reading | Passporteye |
GPU Inference Modes
| Mode | Behavior |
|---|---|
| cpu | Force all models to CPU |
| gpu | Try CUDA, fall back to CPU (default) |
| force_gpu | Require CUDA, fail if unavailable |
Requires NVIDIA Container Toolkit on host.
Redis RPC System
Protocol (server/rpc/pyredisrpc.py)
Based on gowhari/pyredisrpc (MIT license), customized.
```mermaid
sequenceDiagram
    participant Client as HTTP/WS Process
    participant Redis
    participant Server as Model Process (e.g. app_face.py)
    Client->>Redis: LPUSH rpc:{serviceName} {id, method, params, tmchk}
    Client->>Redis: SET rpc:{reqId}:tmchk (TTL)
    Server->>Redis: BLPOP rpc:{serviceName}
    Redis-->>Server: request JSON
    Server->>Server: Process (run ONNX model)
    Server->>Redis: Check rpc:{reqId}:tmchk not expired
    Server->>Redis: LPUSH rpc:{reqId} {id, result, error}
    Client->>Redis: BLPOP rpc:{reqId} (timeout)
    Redis-->>Client: response JSON
```
Magic __getattr__: RedisRpcClient uses __getattr__ to dynamically generate method calls. faceRpcClient.detectFace(imageId) becomes call("detectFace", [[imageId], {}]).
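The proxy trick can be sketched in a few lines, with a deque standing in for the Redis list. Only the packed `[[args], {kwargs}]` shape is from the source; `RpcClientSketch` and `FakeRedisQueue` are hypothetical names.

```python
# Sketch of the __getattr__-based RPC client: any unknown attribute becomes a
# remote-call proxy that packs (method, args, kwargs) onto a queue.
import json
from collections import deque

class FakeRedisQueue:
    def __init__(self):
        self.items = deque()
    def lpush(self, payload):
        self.items.appendleft(payload)

class RpcClientSketch:
    def __init__(self, rpc_queue):
        self.rpc_queue = rpc_queue
    def call(self, method, packed):
        # Real client would LPUSH to rpc:{serviceName} and BLPOP the reply list.
        self.rpc_queue.lpush(json.dumps({"method": method, "params": packed}))
    def __getattr__(self, name):
        # Only triggered for attributes that don't exist on the instance.
        return lambda *args, **kwargs: self.call(name, [list(args), kwargs])

client = RpcClientSketch(FakeRedisQueue())
client.detectFace("img-42")
sent = json.loads(client.rpc_queue.items[0])
assert sent == {"method": "detectFace", "params": [["img-42"], {}]}
```

The upside is that the client needs no stub per RPC function; the downside is that typos in method names fail only at the server.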
NumpyEncoder: Custom JSON encoder that converts np.ndarray to lists for serialization. All numpy arrays are serialized to JSON lists for Redis transport.
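The encoder pattern can be sketched with the stdlib alone; duck-typing on `.tolist()` stands in for the `np.ndarray` check the real NumpyEncoder presumably performs, so the sketch runs without numpy.

```python
# NumpyEncoder pattern: a json.JSONEncoder whose default() converts
# array-like objects (anything with .tolist(), e.g. np.ndarray) to lists.
import json
from array import array

class ArrayEncoder(json.JSONEncoder):
    def default(self, o):
        if hasattr(o, "tolist"):
            return o.tolist()
        return super().default(o)

encoding = array("d", [0.1, 0.25, 0.5])  # stand-in for a face-encoding vector
payload = json.dumps({"encoding": encoding}, cls=ArrayEncoder)
assert json.loads(payload) == {"encoding": [0.1, 0.25, 0.5]}
```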
Performance concern
Every RPC call serializes numpy arrays to JSON, passes through Redis, and deserializes. For face encodings (512-float vectors) and landmarks (98x2 floats), this adds latency on every call.
Image Cache (AppCache)
server/appcache.py — Redis-Based Storage
| Key Pattern | Format | TTL | Purpose |
|---|---|---|---|
| raw-image:{uuid} | Original bytes (JPEG/PNG) | 60s | Uploaded images |
| numpy-image:{uuid} | Binary ndarray + 12-byte header (h,w,c as big-endian uint32) | 60s | Decoded images |
| metadata:{uuid}:{name} | Pickled Python objects | 60s | Face locations, encodings, etc. |
| audio-frame:{uuid} | 1B layout + 15B format + 3B sample_rate + payload | 2s | Audio frames |
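The numpy-image layout in the table can be reproduced with `struct`; only the 12-byte big-endian (h, w, c) header is from the source, `pack_image`/`unpack_image` are illustrative helpers.

```python
# Sketch of the numpy-image value layout: a 12-byte header of (h, w, c) as
# big-endian uint32s, followed by the raw pixel payload.
import struct

def pack_image(h, w, c, pixels: bytes) -> bytes:
    return struct.pack(">III", h, w, c) + pixels

def unpack_image(blob: bytes):
    h, w, c = struct.unpack(">III", blob[:12])
    return (h, w, c), blob[12:]

blob = pack_image(2, 2, 3, bytes(range(12)))   # 2x2 RGB dummy image
shape, payload = unpack_image(blob)
assert shape == (2, 2, 3) and len(payload) == 2 * 2 * 3
```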
Lazy conversion: getNumPyImage() auto-converts raw images to numpy if the numpy version doesn’t exist (using imageio_imread with EXIF rotation). Alpha channels are stripped.
Pickle deserialization
Metadata is deserialized with `pickle.loads()` (server/appcache.py:106). If an attacker can write to Redis, they can achieve arbitrary code execution, and the metadata keys are predictable (`metadata:{imageId}:{metaName}`). See security-audit.
CV Engines
FaceEngine (server/cv/face_engine.py)
Computes facial action units and head pose from landmarks. Does NOT do ML inference — uses geometric calculations on the 98-point landmark output.
Computed features:
| Feature | Method | Description |
|---|---|---|
| eyeRatio(face, side) | height/width | Eye aspect ratio — blink detection |
| smileRatio(face) | mouth width / face width | Smile detection |
| mouthRatio(face) | mouth height / width | Mouth openness |
| headX(face) | ear-to-nose distance ratio | Horizontal turn (-1 left to +1 right) |
| headY(face) | ear-line vs nose position | Vertical tilt |
| headTilt(face) | arctan2 of ear positions | Head rotation |
Action detection thresholds:
| Action | Threshold | Metric |
|---|---|---|
| Blink | < 0.21 | Both eye aspect ratios |
| Smile | > 0.51 (or idle + 0.06) | Smile ratio |
| Look left | < -0.6 | headX |
| Look right | > 0.6 | headX |
| Look up | > 0.25 | headY |
| Look down | < -0.3 | headY (asymmetric!) |
Asymmetric thresholds
Look up/down thresholds are asymmetric (0.25 vs 0.3), presumably because looking down produces more distinct landmark displacement.
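The threshold table translates directly into a predicate. This simplified sketch uses the source's cutoffs (including the asymmetric look up/down values) but omits the idle-relative smile variant; the function name is illustrative.

```python
# Action detection from FaceEngine ratios, using the documented thresholds.
def detect_actions(eye_left, eye_right, smile, head_x, head_y):
    actions = []
    if eye_left < 0.21 and eye_right < 0.21:   # both eyes must close for a blink
        actions.append("blink")
    if smile > 0.51:
        actions.append("smile")
    if head_x < -0.6:
        actions.append("look_left")
    elif head_x > 0.6:
        actions.append("look_right")
    if head_y > 0.25:                           # note: asymmetric vs look_down
        actions.append("look_up")
    elif head_y < -0.3:
        actions.append("look_down")
    return actions

assert detect_actions(0.15, 0.18, 0.3, 0.0, 0.0) == ["blink"]
assert detect_actions(0.3, 0.3, 0.6, 0.7, -0.35) == ["smile", "look_right", "look_down"]
```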
DocumentEngine (server/cv/document_engine.py)
Thin facade over RPC calls to cardDetector process:
- `detect(imageId)` — calls `rpcClient.detectCard()` (localization model)
- `classify(imageId)` — calls `rpcClient.classifyCard()` (classification model)
- `checkCardIntegrity(imageId)` — calls `rpcClient.checkCardIntegrity()`
- `resizeToAspectRatio(warpedCardId, documentName)` — resizes warped card to match document template
DocumentOcrEngine (server/cv/document_ocr_engine.py)
Orchestrates OCR for recognized documents. Most complex engine.
Flow:
1. Instantiate document class with image
2. For each requested OCR key:
   a. Try MRZ first (if document has MRZ and key maps to MRZ field)
   b. If MRZ unavailable/invalid, detect text regions (TextDetectorEngine)
   c. Calculate anchor offsets to compensate for document alignment
   d. Crop text ROIs and run OCR
3. Apply corrections (number/letter confusion fixing, regex validation)
4. Retry failed keys with deskewing
ThreadPoolExecutor: Uses cpu_cores/4 workers (min 1, max 8) for parallel OCR.
Anchor system: Documents define “anchor points” — known text positions used to calculate offset between template and actual image, compensating for warping imperfections.
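The anchor idea can be sketched as: compare where the anchor texts were actually detected with where the template expects them, average the difference, and shift every OCR ROI by that offset. The averaging, coordinates, and tolerance handling here are illustrative, not the source's exact math.

```python
# Sketch of anchor-based alignment: mean shift between expected (template)
# and detected anchor centers, applied to each field's ROI.
def anchor_offset(expected, detected):
    dxs = [d[0] - e[0] for e, d in zip(expected, detected)]
    dys = [d[1] - e[1] for e, d in zip(expected, detected)]
    return (sum(dxs) / len(dxs), sum(dys) / len(dys))

def shift_roi(roi, offset):
    x, y, w, h = roi
    return (x + offset[0], y + offset[1], w, h)

# Template says the anchors sit at these (x, y) centers...
expected = [(120, 40), (700, 600)]
# ...but in the warped image they were found slightly shifted.
detected = [(128, 44), (706, 650)]
off = anchor_offset(expected, detected)
assert off == (7.0, 27.0)
assert shift_roi((15, 135, 260, 335), off) == (22.0, 162.0, 260, 335)
```

A real implementation would also reject anchors whose offset exceeds the per-document tolerance (e.g. the 25px mentioned for HUN-AO-03001).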
OcrEngine (server/cv/ocr_engine.py)
OCR backends:
- LSTM OCR (default): Custom CRNN model via ONNX
- EasyOCR rs_cyrillic: For Serbian Cyrillic documents (config-gated)
- EasyOCR rs_latin: For Serbian Latin documents (config-gated)
EasyOCR downloads are disabled (download_enabled=False), models stored at /workspace/vuer_cv/data/easyOcr.
Character confusion correction:
CONFUSED_LETTERS: 0->O, 1->I, 2->Z, 3->B, 4->A, 5->S, 6->G, 7->T, 8->B, 9->Y
CONFUSED_NUMBERS: D->0, Q->0, O->0, I->1, Z->2, A->4, S->5, G->6, T->7, B->8, Y->9
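The two confusion maps above work naturally as `str.translate` tables: numeric fields replace confusable letters with digits, alphabetic fields replace confusable digits with letters. The `correct` wrapper is an illustrative name.

```python
# Confusion-correction maps as translation tables (mappings from the source).
CONFUSED_LETTERS = str.maketrans("0123456789", "OIZBASGTBY")
CONFUSED_NUMBERS = str.maketrans("DQOIZASGTBY", "00012456789")

def correct(text, field_type):
    table = CONFUSED_NUMBERS if field_type == "numbers" else CONFUSED_LETTERS
    return text.translate(table)

assert correct("O12345", "numbers") == "012345"  # O misread in a numeric field
assert correct("NA6Y", "letters") == "NAGY"      # 6 misread in a name field
```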
CardWarpEngine (server/cv/card_warp_engine.py)
Detects document corners and applies perspective transform: onnx.localization.main.detectCorners() → cv2.getPerspectiveTransform → cv2.warpPerspective.
CardClassificationEngine (server/cv/card_classification_engine.py)
Classifies document type from warped card image. CLASSIFICATION_SCORE_THRESHOLD = 0.3 — returns None if confidence < 30%.
CardIntegrityCheckEngine (server/cv/card_integrity_check_engine.py)
Detects document tampering. Returns max anomaly score.
TextDetectorEngine (server/cv/text_detector_engine.py)
Detects text regions in images for OCR. CANVAS_SIZE = 1024 (images resized to this for detection). PADDING = 1% horizontal and vertical added to detected boxes.
Complex ROI logic (getRoisFromMask): Filters text detections by center point within specified ROI, height > 2x padding, blacklist filtering, target height adjustment, and multi-line splitting support.
BarcodeEngine (server/cv/barcode_engine.py)
Uses Detectron2 model via RPC for barcode localization, then pyzbar library for decoding. Tries 6 rotation angles: 0, 15, -15, 30, -30, 45 degrees.
BackgroundMaskEngine (server/cv/background_mask_engine.py)
Person segmentation and background replacement. Alpha blending: result = image * mask + background * (1 - mask). Default white background.
SharpnessEngine (server/cv/sharpness_engine.py)
Image quality via Laplacian variance: cv2.Laplacian(resized, CV_8U).var(), clamped to [0, 255]. Converts to grayscale, rotates to horizontal if portrait, resizes to fit 320x240.
PADDetectionEngine (server/cv/pad_detection_engine.py)
Thin RPC wrapper for deepfake and presentation attack detection:
- `processClipForDeepFake(clip)` — sends video clip frames for deepfake analysis
- `getPADScores(imageId, locations)` — gets paper/screen attack scores per face
KaptchaEngine (server/cv/kaptcha_detect_engine.py)
CAPTCHA solving via OCR process RPC.
StatusEngine (server/cv/status_engine.py)
Health monitoring and hardware status. Health levels: green/yellow/red.
- Checks all Supervisor processes via `supervisorctl status`
- Yellow if any process not RUNNING or started >10s after main process
- Checks WebSocket process logs for “LISTENING” after 24h uptime
- Reports CPU (model, cores, frequency, utilization), GPU (via nvidia-smi), RAM info
HTTP API (REST Endpoints)
Framework: Falcon (WSGI). All endpoints use JSON schema validation.
Face Endpoints
| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /face/detect | FaceDetectResource | Detect faces, encodings, landmarks, actions |
| POST | /face/compare | FaceCompareResource | Compare two face images (cosine distance) |
| POST | /face/draw | FaceDrawResource | Draw landmarks on image |
| POST | /face/age-gender | FaceGenderAgeResource | Predict age and gender |
| POST | /face/reference-extract | ReferenceFaceExtractResource | Extract reference face with quality checks |
Document Endpoints
| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /document/recognition | DocumentRecognitionResourceV2 | Detect + classify document |
| POST | /document/warp | DocumentWarpResourceV2 | Detect + warp + classify + sharpness |
| POST | /document/ocr | DocumentOcrResource | OCR on recognized document |
| POST | /document/card-warp | CardWarpResource | Just corner detection + warp |
| POST | /document/card-integrity | CardIntegrityCheckResource | Tampering score |
| GET | /document/types | DocumentTypesResource | List supported document types |
Other Endpoints
| Method | Path | Resource Class | Purpose |
|---|---|---|---|
| POST | /image/upload | ImageUploadResource | Upload image (max 3250x3250, auto-downsize) |
| POST | /image/download | ImageDownloadResource | Download image as PNG |
| POST | /barcode | BarCodeResource | Read barcodes from image |
| POST | /barcode/detect | BarCodeDetectResource | Detect barcode regions |
| POST | /mrz | MrzResource | Read MRZ from image |
| POST | /ocr | OcrResource | Generic OCR |
| POST | /sharpness | SharpnessResource | Calculate image sharpness |
| POST | /kaptcha | KaptchaDecoderResource | Decode CAPTCHA image |
| POST | /background-mask | BackgroundMaskResource | Background removal/replacement |
| GET | /ping | PingResource | Health check |
| GET | /status | StatusResource | Detailed server status |
Image Upload Security
ImageUploadResource:
- MIME type validation via `python-magic` (reads first 1024 bytes)
- Supported types from config (`config.image.supportedMimeTypes`)
- Max resolution 3250x3250, auto-downsized while preserving EXIF
- Images stored in Redis with 60s TTL
WebSocket System (Real-Time)
Architecture
```mermaid
flowchart TD
    Browser["Browser"]
    WS["WSConnection\n(pyee EventEmitter)"]
    SH["StreamHandler\n(WebRTC signaling via aiortc)"]
    FS["FrameStore\n(video/audio frame extraction)"]
    P["Processor\n(multiprocessing.Process per worker)"]
    CVT["CVTask\n(orchestrates pipelines)"]
    Browser <-->|"WebSocket JSON + binary"| WS
    WS --> SH
    SH --> FS
    FS --> P
    P --> CVT
    CVT -->|"results via WS"| Browser
```
WSConnection (server/websocket/wsconnection.py)
Event-based message routing using pyee.asyncio.AsyncIOEventEmitter. Messages are JSON with {type, payload} structure. Binary messages emit “binary” event.
StreamHandler (server/websocket/stream_handler.py)
Handles WebRTC signaling (offer/answer). Uses aiortc for server-side WebRTC. Supports custom ICE servers passed from client or from config. Retry logic: webrtcConnectWithAllocationMismatchRetry() retries 3 times on TURN 437 errors.
FrameStores
| Class | Purpose | Storage |
|---|---|---|
| NumpyFrameStore | Video frames as numpy arrays | Redis NumPy format, 2s TTL |
| RawFrameStore | Video frames as JPEG | Redis raw format, 2s TTL |
| AudioFrameStore | Audio with resampling + accumulation | Redis audio format, 2s TTL |
Processor (server/websocket/processor/processor.py)
Abstract multiprocessing processor using multiprocessing.Queue for IPC:
- `input()` puts an imageId into the input queue (non-blocking)
- Background process reads from the input queue, calls `doProcess()`, puts the result in the output queue
- Frame dropping: if the input queue has items when the process finishes one, drains the queue (keeps only the latest)
- Cleanup: graceful shutdown with sentinel values → SIGTERM → SIGKILL with escalating timeouts
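The frame-dropping policy can be sketched with the stdlib: before processing, drain the input queue so only the newest frame survives. `queue.Queue` stands in for `multiprocessing.Queue` (same `get_nowait`/`Empty` interface); `latest_frame` is an illustrative name.

```python
# Sketch of frame dropping: always process the most recent frame, discarding
# any backlog that accumulated while the previous frame was being processed.
import queue

def latest_frame(q):
    item = q.get()                 # wait for at least one frame
    while True:
        try:
            item = q.get_nowait()  # keep draining; newer frames win
        except queue.Empty:
            return item

q = queue.Queue()
for frame_id in ("f1", "f2", "f3"):
    q.put(frame_id)
assert latest_frame(q) == "f3"     # f1 and f2 were dropped
assert q.empty()
```

This trades completeness for latency, which is the right call for live video where a stale frame has no value.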
CV Task Hierarchy
```mermaid
classDiagram
    AbstractCVTask <|-- BasicCVTask
    BasicCVTask <|-- CVTask
    CVTask <|-- FaceCVTask
    CVTask <|-- DocumentCVTask
    CVTask <|-- LivenessCVTask
    CVTask <|-- LivenessV2CVTask
    CVTask <|-- BarcodeCVTask
    CVTask <|-- MrzCVTask
    CVTask <|-- SharpnessCVTask
    CVTask <|-- SpeechDetectorTask
    CVTask <|-- ActionCVTask
    CVTask <|-- HoloVideoCVTask
    CVTask <|-- HoloImageCVTask
    CVTask <|-- HoloV2CVTask
    AbstractCVTask <|-- PADDetectorCVTask
    class AbstractCVTask {
        +start()
        +stop()
    }
    class BasicCVTask {
        +FrameStore
        +Processor
    }
    class CVTask {
        +StreamHandler (WebRTC)
        +FPSCounter
    }
    class PADDetectorCVTask {
        standalone, no WebRTC
        binary WS chunks
    }
```
StreamPlayerCVTask is completely separate — plays pre-recorded media files via WebRTC.
Key CV Tasks
FaceCVTask: Accepts recognitionType: "smile" | "blink" and optional faceToFind encoding. Detects face, waits for specified action, compares with faceToFind if provided (threshold default 0.6).
DocumentCVTask: Streaming document detection with sharpness check.
PADDetectorCVTask: Receives video as binary WebSocket chunks (not WebRTC). Uses StreamingBuffer with av for demuxing. Builds clips of 8 frames for deepfake detection. PAD scoring weighted by face crop area.
StreamPlayerCVTask: Plays pre-recorded media files via WebRTC. Directory restriction: config.streamPlayerDir must be set and file must be within it.
Document Definitions
AbstractDocument (server/documents/document.py)
Base class for all document definitions. Defines:
- Static document dimensions (width, height in “document coordinates”)
- Face ROI location (for face extraction from ID photos)
- Hologram ROI definitions (position, density, weight)
- MRZ ROI location
- OCR field definitions (ROIs, regex, corrections, whitelists)
- Anchor points for OCR alignment correction
Supported Documents (17 total)
Hungarian (HUN) — 14 documents:
| Class | Name | Type |
|---|---|---|
| HunAo03001 | HUN-AO-03001 | ID card (address side) |
| HunBo03004Back | HUN-BO-03004_BACK | Driving license back |
| HunBo03004Front | HUN-BO-03004_FRONT | Driving license front |
| HunBo04001Front | HUN-BO-04001_FRONT | Driving license front (newer) |
| HunBo05001Back | HUN-BO-05001_BACK | Driving license back (newer) |
| HunBo05001Front | HUN-BO-05001_FRONT | Driving license front (newer) |
| HunBo06001BackPO | HUN-BO-06001_BACK_PO | Driving license back (latest) |
| HunBo06001Front | HUN-BO-06001_FRONT | Driving license front (latest) |
| HunBo07001Front | HUN-BO-07001_FRONT | Driving license front (latest+) |
| HunFo02001Back | HUN-FO-02001_BACK | Personal ID back |
| HunFo02001Front | HUN-FO-02001_FRONT | Personal ID front |
| HunFo04001Back | HUN-FO-04001_BACK | Personal ID back (newer) |
| HunFo04001Front | HUN-FO-04001_FRONT | Personal ID front (newer) |
| HunHo10001 | HUN-HO-10001 | Residence permit |
Serbian (SRB) — 3 documents:
| Class | Name | Type |
|---|---|---|
| SrbAo01001 | SRB-AO-01001 | Serbian ID |
| SrbBo01001Back | SRB-BO-01001_BACK | Driving license back |
| SrbBo01001Front | SRB-BO-01001_FRONT | Driving license front |
Example: HUN-AO-03001 (Hungarian ID card)
- Dimensions: 940 x 650 (document coordinates)
- Face ROI: [15, 135, 260, 335] (x, y, w, h)
- MRZ ROI: [10, 490, 920, 130]
- Hologram ROIs: 2 regions (main hologram + OVI security feature)
- OCR fields: documentId, type, code, lastName, firstName, birthName, nationality, dateOfBirth, sex, placeOfBirth, dateOfIssue, authority, dateOfExpiry
- Anchor-based alignment: Uses documentId and dateOfExpiry as anchor points with max 25px offset tolerance
- Custom corrections: `correctDate()` (Hungarian month names), `correctNationality()` (fuzzy match “MAGYAR/HUNGARIAN”), `correctAddress()` (Hungarian address format), `correctAuthority()` (issuing authority names)
Document Classification Mapping
config.DOCUMENT_CLASS_ID_MAPPING maps document names to integer class IDs (loaded from data/card_detector_classification_classId_mapping.json). The classification model outputs these integer IDs.
Face Pipeline
HTTP Path (batch)
```mermaid
flowchart LR
    Client["Client POST /face/detect"]
    FDR["FaceDetectResource"]
    RPC["Redis RPC -> app_face.py"]
    FD["face_detection ONNX"]
    FL["face_landmark_detection ONNX"]
    FE["face_encoder ONNX"]
    Cache["Redis metadata cache"]
    Client --> FDR --> RPC
    RPC --> FD -->|"bounding boxes"| FL -->|"98-point landmarks"| FE -->|"512-d embeddings"| Cache
```
```
Client POST /face/detect {imageId}
  -> FaceDetectResource
  -> faceRpcClient.detectFace(imageId) [Redis RPC to app_face.py]
  -> getLocations(imageId) -> detectFaces(image, threshold) [face_detection ONNX]
  -> getLandmarks(imageId) -> detectFaceLandmarks(image, box) [face_landmark_detection ONNX] per face
  -> getEncodings(imageId) -> getFaceEncodings(image, landmark) [face_encoder ONNX] per face
  -> Results cached per imageId in Redis metadata
```
WebSocket Path (streaming)
```
Browser <-> WebRTC <-> StreamHandler <-> NumpyFrameStore
  -> FaceProcessor (multiprocessing)
  -> faceRpcClient.getLandmarks(imageId) [Redis RPC]
  -> faceRpcClient.getLocations(imageId) [Redis RPC]
  -> faceRpcClient.getEncodings(imageId) [Redis RPC]
  -> FaceEngine.getFaceData(landmarks) [geometric computation]
  -> FaceEngine.userActions(faceData) [threshold checks]
  -> FaceCVTask.processOutput() [send results via WebSocket]
```
Face Comparison
Uses sklearn’s `cosine_distances`: `cosine_distances(enc1.reshape(1,-1), enc2.reshape(1,-1))[0][0]`. Distance 0 = identical, 1 = completely different. Default threshold: 0.6.
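The same distance can be computed without sklearn, which makes the semantics explicit: one minus the cosine of the angle between the two embedding vectors. This is a stand-in sketch, not the service's code.

```python
# Cosine distance: 1 - (a.b / (|a| * |b|)). Matches sklearn's
# cosine_distances on 1xN inputs.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

same = [0.6, 0.8]
assert cosine_distance(same, same) < 1e-12                  # identical -> ~0
assert abs(cosine_distance([1, 0], [0, 1]) - 1.0) < 1e-12   # orthogonal -> 1
assert cosine_distance([1, 0], [0.9, 0.1]) < 0.6            # under the 0.6 match threshold
```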
Document Pipeline
```mermaid
flowchart TD
    IMG["Input Image"]
    DET["DocumentEngine.detect()\nlocalization ONNX -> 4 corners"]
    WARP["CardWarpEngine.warp()\nperspective transform"]
    CLS["DocumentEngine.classify()\ncard_classification ONNX -> classId"]
    LOOKUP["getDocumentByClassId()\nlookup document definition"]
    SHARP["SharpnessEngine.calcSharpness()\nLaplacian variance check"]
    RESIZE["DocumentEngine.resizeToAspectRatio()"]
    OCR["DocumentOcrEngine.run()"]
    MRZ["1. Try MRZ first"]
    TEXT["2. Detect text regions"]
    ANCHOR["3. Calculate anchor offsets"]
    CROP["4. Crop ROIs per field"]
    RECOG["5. Run character recognition"]
    CORRECT["6. Apply corrections"]
    RETRY["7. Retry failed with deskewing"]
    IMG --> DET --> WARP --> CLS --> LOOKUP --> SHARP --> RESIZE --> OCR
    OCR --> MRZ --> TEXT --> ANCHOR --> CROP --> RECOG --> CORRECT --> RETRY
```
Liveness Detection
V1 (LivenessCVTask)
Protocol (WebSocket messages):
1. Client sends video stream via WebRTC
2. Server detects face, validates position/size
3. Server sends `{type: "liveness", task: "face"}` — “show your face”
4. Server confirms face, sends random action: `{task: "smile"}`
5. Client performs action
6. Server evaluates: success (+1) or wrong_action/too_many_actions (-1)
7. Repeat until `score >= successScore (3)` or `score <= failScore (-2)`
8. Optional `faceToFind` encoding comparison on each frame
Face position validation:
- Must be within 35% of center (both axes)
- Face size between 3-25% of image area
- Guide messages: “move_head_up/down/left/right/closer/away”
V2 (LivenessV2CVTask)
Two phases:
Phase 1: Reference Face Capture
- Wait 3 seconds for user to position face
- Capture low-quality reference (min 360px)
- Request high-quality still image from client (min 720px)
- Run quality checks: resolution, single person, face size, face position, extras (sunglasses, closed mouth, no smile, eyes open, facing camera)
- Compare low/high quality face encodings
Phase 2: Action Challenges (same as V1 but with progress)
- Reports action progress (0.0 to 1.0) per frame
- Guide state machine prevents UI flicker with grace periods and cooldowns
Guide State Machine (5 states):
```mermaid
stateDiagram-v2
    [*] --> NOMINAL
    NOMINAL --> GRACE_PERIOD: receive negative guide
    GRACE_PERIOD --> ERROR_COOLDOWN: timeout
    GRACE_PERIOD --> NOMINAL: same positive guide
    ERROR_COOLDOWN --> NOMINAL_COOLDOWN: timeout + positive
    ERROR_COOLDOWN --> ERROR: timeout + negative
    NOMINAL_COOLDOWN --> NOMINAL: timeout + positive
    ERROR --> NOMINAL_COOLDOWN: receive positive
```
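The anti-flicker idea behind the state machine can be sketched in a reduced form: a negative guide must persist through a grace period before surfacing as an error. This sketch collapses the two cooldown states and models timeouts as tick counts instead of wall-clock durations, so it is illustrative only.

```python
# Simplified guide state machine: transient negative guides are absorbed by
# a grace period; only persistent ones reach the ERROR state.
class GuideStateMachine:
    def __init__(self, grace_ticks=2):
        self.state = "NOMINAL"
        self.grace_ticks = grace_ticks
        self.ticks = 0

    def feed(self, guide_is_positive):
        if self.state == "NOMINAL":
            if not guide_is_positive:
                self.state, self.ticks = "GRACE_PERIOD", 0
        elif self.state == "GRACE_PERIOD":
            if guide_is_positive:
                self.state = "NOMINAL"      # transient problem: ignore it
            else:
                self.ticks += 1
                if self.ticks >= self.grace_ticks:
                    self.state = "ERROR"    # persisted: surface the guide
        elif self.state == "ERROR":
            if guide_is_positive:
                self.state = "NOMINAL"
        return self.state

fsm = GuideStateMachine()
assert fsm.feed(False) == "GRACE_PERIOD"   # one bad frame: not yet an error
assert fsm.feed(True) == "NOMINAL"         # recovered inside the grace period
fsm.feed(False); fsm.feed(False)
assert fsm.feed(False) == "ERROR"          # persisted past the grace period
```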
Quality Checks for Reference Face
| Check | Threshold | Default |
|---|---|---|
| Sunglasses | > 0.5 | Enabled |
| Closed mouth | > 0.3 (mouth_aspect_ratio) | Enabled |
| Not smiling | > 0.51 (smile_ratio) | Enabled |
| Not blinking | < 0.21 (eye_aspect_ratio) | Enabled |
| Facing camera | > 0.23 (max head angle) | Enabled |
Anti-Spoofing (PAD & Deepfake)
PADDetectorCVTask
Input: Video sent as binary WebSocket chunks (not WebRTC). Uses av library to demux.
Two-pronged detection:
1. PAD (Presentation Attack Detection): per-frame, per-face
   - Gets face location from FaceProcessor
   - Calls `PADDetectionEngine.getPADScores(imageId, locations)` via RPC
   - Returns paper/screen attack probability
   - Weighted by `sqrt(face_crop_area)` — larger faces = more reliable scores
   - Min face crop area: 17,280 pixels (360 * 480 * 0.1)
2. Deepfake Detection: per-clip (8 frames)
   - Builds clips of 8 consecutive frames with valid faces
   - Sends clip (imageIds + landmarks) to `PADDetectionEngine.processClipForDeepFake()`
   - Default max 10 clips per session
Final result (averaged across all frames/clips):
- `deepFake`: mean of all clip scores
- `padPaper`: weighted average of per-frame paper scores
- `padScreen`: weighted average of per-frame screen scores
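The weighted averaging can be sketched directly from the description: each frame's score is weighted by the square root of its face crop area, so frames with larger (more reliable) faces count more. The function name and input shape are illustrative.

```python
# Weighted PAD score: per-frame scores weighted by sqrt(face crop area).
import math

def weighted_pad_score(frames):
    # frames: list of (score, face_crop_area) pairs
    weights = [math.sqrt(area) for _, area in frames]
    return sum(s * w for (s, _), w in zip(frames, weights)) / sum(weights)

frames = [(0.9, 40000), (0.1, 10000)]  # big suspicious face, small clean face
score = weighted_pad_score(frames)
assert abs(score - (0.9 * 200 + 0.1 * 100) / 300) < 1e-12
assert score > 0.5  # the larger face dominates the average
```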
Hologram Detection
V1: Video-Based (HoloDetectorEngine)
```mermaid
flowchart TD
    REF["Reference image (warped card)"]
    VID["Video stream of card under light"]
    SIFT["SIFT feature matching\n+ FLANN + Lowe's ratio (0.75)"]
    HOMO["Homography (RANSAC)\n+ perspective warp"]
    HSV["Store warped frames as HSV\nin HoloStack"]
    CALC["HoloCalculator:\n95th-5th percentile H range\nChi-squared uniformity test\nAdaptive thresholding"]
    SCORE["Score: detections in ROIs = 255\noutside = negative penalty"]
    REF --> SIFT
    VID --> SIFT --> HOMO --> HSV
    HSV -->|"MIN_FRAMES=50"| CALC --> SCORE
```
HoloWarp: SIFT feature detector + FLANN-based matcher. Lowe’s ratio test (0.75). Min keypoint pairs configurable per document (default 80). Rejects: outside corners, too small detections (< 10% of frame area).
HoloStack: Stores warped frames as HSV, resized to document dimensions. Appends along 4th axis (h, w, 3, num_frames).
HoloCalculator:
- Pixel detection: 95th-5th percentile range of H channel across all frames
- High range = pixel changes color significantly = hologram
- Filtering: Minimum saturation and value thresholds
- Chi-squared uniformity test: true hologram pixels should have uniform hue distribution
- Adaptive thresholding: iteratively adjusts threshold to find between min/max hologram pixels
- Score: detections inside hologram ROIs score 255, outside get negative penalty (distance * multiplier)
V2: Image-Based (HoloImageEngine)
- User provides multiple static images of card
- Each image: ORB feature matching → homography → align to reference
- Histogram matching to normalize lighting
- `absdiff()` between reference and aligned image
- HSV filtering: saturation > 80, value > 30 → hologram candidates
- Morphological opening to remove noise
- Accumulate across images; require `MIN_POSITIVE_SAMPLES=2` detections per pixel
Image distance check: Cosine similarity between grayscale flattened images. Rejects images with distance > maxImgDistanceScore (default 0.0185).
Per-Document Hologram Config
Documents can override base thresholds:
- HUN-AO-03001: Lower saturation (20), higher uniformity (500), lower false positive penalty (0.75)
OCR System
Pipeline
```mermaid
flowchart TD
    IMG["Image"]
    TD["TextDetectorEngine.detect()\ntext_detection ONNX -> polygons"]
    TA["TextDetectorEngine.getTextAreas()\nbounding boxes with centers"]
    AO["DocumentOcrEngine.calcOffsetFromAnchors()\nalignment correction"]
    ROI["TextDetectorEngine.getRoisFromMask()\nfilter by document ROI definitions"]
    CROP["TextDetectorEngine.cropRois()\ncrop + optional rotation"]
    OCR["OcrEngine.getText()\nLSTM ONNX or EasyOCR"]
    TC["OcrEngine.textCorrection()\npost-processing"]
    IMG --> TD --> TA --> AO --> ROI --> CROP --> OCR --> TC
```
LSTM OCR
Custom CRNN model with LstmLabelConverter:
- Character set includes Hungarian accented characters
- CTC decoding (collapse repeated characters, remove blanks marked as `¤`)
- Returns text, confidence, per-character confidences
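Greedy CTC decoding as described is a two-step transform: collapse consecutive repeats, then drop the blank symbol. The function name is illustrative; the `¤` blank marker is from the source.

```python
# Greedy CTC decode: collapse runs of identical symbols, then strip blanks.
BLANK = "¤"

def ctc_decode(raw):
    out = []
    prev = None
    for ch in raw:
        if ch != prev:          # collapse consecutive repeats
            out.append(ch)
        prev = ch
    return "".join(c for c in out if c != BLANK)

assert ctc_decode("¤NN¤AA¤G¤YY¤") == "NAGY"
assert ctc_decode("B¤B") == "BB"  # a blank separates a genuine double letter
```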
Text Correction Pipeline
- Label removal: split at `:` (e.g., “Vezetéknév:NAGY” → “NAGY”)
- Type-specific correction:
  - `"numbers"`: replace digit-like letters (O→0, I→1, etc.)
  - `"letters"`: replace letter-like digits (0→O, 1→I, etc.)
  - `[start, end, method]`: positional correction (e.g., first 2 chars = letters, next 7 = numbers)
  - `["option1", "option2"]`: fuzzy match against known values (SequenceMatcher, 0.6 threshold)
  - callable: custom correction function
- Whitelist filtering: Remove characters not in allowed set
- Regex validation: Final result must match pattern
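The option-list correction can be sketched with `difflib.SequenceMatcher`, which is the stdlib matcher the 0.6 threshold suggests; the wrapper function and fallback behavior are illustrative.

```python
# Fuzzy correction against a list of known values: pick the option whose
# similarity to the OCR output clears the 0.6 threshold, else keep the raw text.
from difflib import SequenceMatcher

def fuzzy_correct(text, options, threshold=0.6):
    best, best_ratio = None, 0.0
    for option in options:
        ratio = SequenceMatcher(None, text, option).ratio()
        if ratio > best_ratio:
            best, best_ratio = option, ratio
    return best if best_ratio >= threshold else text

assert fuzzy_correct("MAGVAR", ["MAGYAR", "HUNGARIAN"]) == "MAGYAR"
assert fuzzy_correct("XXXXXX", ["MAGYAR", "HUNGARIAN"]) == "XXXXXX"  # no match: keep raw
```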
Utility Layer
LazyImage (server/utils/lazy_image.py)
Defers image loading until actually needed. Stores imageId, loads numpy array on first .array access. @LazyImage.convert decorator allows RPC functions to accept either imageId (string) or LazyImage.
Processing (server/utils/processing.py)
runAsyncProcess(func, *args): Runs function in a separate multiprocessing.Process with 60s timeout. Uses asyncio integration.
Expensive per-call
Spawns a new OS process for each call. Used for CPU-intensive operations that would block the event loop (hologram detection, face distance calculation).
StreamingBuffer (server/utils/streaming_buffer.py)
Thread-safe buffer for streaming binary data. Uses threading.RLock + Condition for synchronization. Supports async starvation signaling for flow control.
Other Utilities
| File | Purpose |
|---|---|
fps_counter.py | FPS tracking with 3-second sliding window |
async_extra.py | AsyncSlot (single-value async container), waitForUntilEvent (cancellation pattern) |
warmup.py | Model warmup runner (GPU only, 5 runs) |
webrtc.py | WebRTC retry logic for TURN allocation mismatch |
roi.py | ROI geometry utilities (rectangle, polygon, point-in-polygon) |
face.py | Face position calculation, liveness visualization |
liveness.py | Action sequence generation, reference face checks, thresholds |
image.py | writeOnImage() helper |
dates.py | Hungarian month names |
dict.py | getKeyByValue() utility |
exceptions.py | Custom exception classes (12 types) |
lstm_label_converter.py | CTC decoder for LSTM OCR |
Config System
server/cfg.py
Config file loading order (merged with jsonmerge):
1. `config/{PYTHON_ENV}.json` (dev or docker)
2. `config/jsonschemas.json` (request/response schemas)
3. `config/scaling_presets/{SCALING_PRESET}.json` (optional, docker only)
4. `config/local.json` (local overrides)
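The layering works like a recursive dictionary merge where later files win. A hand-rolled sketch of that semantics (the service actually uses the `jsonmerge` library; the config keys and hostname below are illustrative):

```python
# jsonmerge-style deep merge sketch: later layers override earlier ones,
# recursing into nested dicts. Hand-rolled so it runs standalone.
def deep_merge(base: dict, override: dict) -> dict:
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)   # merge nested dicts
        else:
            out[key] = value                          # override wins
    return out

env_cfg = {"scaling": {"http": {"processes": 2}}, "host": "cv.example"}
local_cfg = {"scaling": {"http": {"processes": 4}}}  # config/local.json layer
cfg = deep_merge(env_cfg, local_cfg)
```

So a `local.json` override of one nested key replaces just that key, leaving sibling settings from the environment config intact.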
Environment Variables
| Variable | Purpose | Values |
|---|---|---|
PYTHON_ENV | Config environment | dev or docker |
ENV_VERSION | Must match config.requiredEnvVersion | - |
DEV_DOMAIN | Used to construct config.host in dev mode | - |
SCALING_PRESET | Selects scaling preset in docker mode | e.g. 50rt_25non_rt |
INFERENCE_DEVICE_MODE | GPU mode | cpu / gpu / force_gpu |
WARMUP_NUM_RUNS | Number of warmup iterations | Default: 5 |
Scaling Configuration
{
"scaling": {
"face": { "processes": 1, "workers": 1, "gpuEnabled": true },
"websocket": { "processes": 2 },
"http": { "processes": 2 }
}
}

Production preset available: `50rt_25non_rt` (Nvidia L40S, 50 real-time + 25 non-real-time clients).
Document whitelist: config.documentWhitelist restricts which documents are accepted.
JSON Schema validation: All HTTP request/response schemas validated at startup using Draft7Validator.
Security Analysis
Critical Issues
- Pickle deserialization in AppCache (`server/appcache.py:106`): `pickle.loads(raw)` — arbitrary code execution if an attacker can write to Redis. Keys are predictable.
- No authentication on HTTP/WS endpoints: all endpoints are unauthenticated. Auth is presumably handled by Nginx/vuer_oss, but vuer_cv has no auth checks.
- Redis as single point of failure: all image data, metadata, and RPC go through Redis. No encryption, no auth visible in code.
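One common mitigation for the pickle issue is serializing cache values as JSON: a tampered Redis value can then corrupt a result but can never execute code on load. A hedged sketch (illustrative function names, not AppCache's API):

```python
# Pickle-free cache value round-trip sketch: JSON carries data only, so
# deserializing attacker-controlled bytes raises an error instead of
# running code. dump_meta/load_meta are illustrative names.
import json

def dump_meta(meta: dict) -> bytes:
    return json.dumps(meta).encode()

def load_meta(raw: bytes) -> dict:
    # raises ValueError on malformed input; never executes it
    return json.loads(raw.decode())

raw = dump_meta({"landmarks": [[1.0, 2.0]], "encoding": [0.1, 0.2]})
```

The trade-off is that numpy arrays must be converted to lists first, which the service's `NumpyEncoder` already does for RPC payloads anyway.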
Medium Risk
- File path traversal in StreamPlayerCVTask: validates `file.startswith(config["streamPlayerDir"])` — a prefix check, not proper path containment. However, `os.path.abspath()` is called first, which normalizes the path.
- MIME type validation bypass: `ImageUploadResource` uses `magic.from_buffer(first_1024_bytes)`. Polyglot files could bypass this.
- No rate limiting: HTTP endpoints have no rate limiting.
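A stricter alternative to the prefix check is resolving both paths and testing real containment; a prefix check alone would accept `/srv/media-evil/x` as inside `/srv/media`. Sketch under assumed paths (`/srv/media` is illustrative, not the actual `streamPlayerDir`):

```python
# Path containment sketch: resolve both paths, then require the candidate
# to share the base directory as a common path prefix component-wise.
import os.path

def is_contained(base_dir: str, candidate: str) -> bool:
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base_dir, candidate))
    # commonpath compares whole components, so "/srv/media-evil" does not
    # pass as being under "/srv/media" the way startswith() would allow
    return os.path.commonpath([base, target]) == base
```

`realpath()` also collapses symlinks, closing the gap left by `abspath()`, which only normalizes `..` segments lexically.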
Lower Risk
- Error messages leak internal details: exception messages and stack traces logged with `logger.exception()`.
- Hardcoded `/workspace/vuer_cv/` paths: EasyOCR model path and warmup input path are hardcoded.
- Command execution in StatusEngine: `subprocess.run(['supervisorctl', 'status'])`, `subprocess.run(['nvidia-smi', '-L'])`, `subprocess.run(["lscpu"])` — safe (no user input in args), but worth noting.
See security-audit for tracking.
Performance Patterns
GPU Optimization
- Per-model GPU control via `config.scaling.{model}.gpuEnabled`
- TensorRT acceleration support
- IO binding for GPU sessions (avoids data copy overhead)
- Warmup runs (5x by default) to pre-heat GPU caches
Caching
- Image metadata caching: computed results (landmarks, encodings) cached in Redis per imageId
- LazyImage: defers numpy conversion until needed
- Short TTLs: all Redis keys expire in 2-60 seconds (prevents memory bloat)
Parallelism
- Multi-process: each model runs in separate Supervisor process
- Multi-worker RPC: within a process, ThreadPoolExecutor for multiple RPC server instances
- WebSocket processors: use multiprocessing.Process for frame processing
- Frame dropping: processors drain input queue, keeping only latest frame
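The frame-dropping pattern from the last bullet can be sketched as below (a generic `queue.Queue` illustration, not the processors' actual code):

```python
# Frame-dropping sketch: block for the first available frame, then drain
# the queue non-blockingly so a slow processor always works on the newest
# frame instead of a growing backlog of stale ones.
import queue

def latest_frame(q: "queue.Queue"):
    frame = q.get()                  # wait until at least one frame arrives
    while True:
        try:
            frame = q.get_nowait()   # drop older frames, keep the newest
        except queue.Empty:
            return frame

q = queue.Queue()
for i in range(5):
    q.put(f"frame-{i}")
```

This trades completeness for latency, which is the right trade for real-time video: processing every frame of a backlog only makes the output lag further behind the camera.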
Bottlenecks
Known performance issues
- Redis serialization: All numpy arrays serialized to JSON lists via NumpyEncoder for RPC. Face encodings (512 floats) and landmarks (98x2 floats) serialized/deserialized on every call.
- Per-call process spawning: `runAsyncProcess()` creates a new `multiprocessing.Process` for each invocation (hologram detection, face distance). Significant overhead.
- Sequential face pipeline: for each face, detection → landmarks → encoding runs sequentially through RPC. No batching across faces.
- HoloStack memory: `np.append()` on every frame copies the entire stack — O(n²) memory allocation for n frames.
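The HoloStack issue has a standard fix: collect frames in a Python list and stack once at the end. A sketch with illustrative frame shapes (the real stack is HSV card crops):

```python
# np.append anti-pattern vs. list-then-stack fix. Each np.append along
# the last axis reallocates and copies the whole accumulated array.
import numpy as np

frames = [np.zeros((4, 4, 3), dtype=np.uint8) for _ in range(10)]

# O(n^2) total copying: every iteration copies the entire stack so far
stack = np.empty((4, 4, 3, 0), dtype=np.uint8)
for f in frames:
    stack = np.append(stack, f[..., np.newaxis], axis=3)

# Linear alternative: accumulate references, allocate and copy once
stacked = np.stack(frames, axis=3)
```

`np.stack` knows the final size up front, so it allocates once and copies each frame exactly once, versus the quadratic total copying of the loop above.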
Code Smells & Technical Debt
Marked in Code
- OCR rewrite planned: multiple comments `### this logic will be rewamped with standardized ocr api developement ###` in:
  - `server/cv/document_ocr_engine.py:281`
  - `server/cv/ocr_engine.py:18, 69, 323`
- TODO items:
  - `server/documents/hun_bo_05001_back.py:75`: `"roi": [35, 1, 400, 1], # TODO?`
  - `server/documents/hun_bo_06001_back_po.py:75`: same TODO
- NOSONAR suppressions (14 instances): complexity warnings suppressed on critical methods like `calcHoloMask`, `getRoisFromMask`, and `processOutput` in liveness tasks.
Structural Issues
- Typo in filename: `server/http/exeption_handler.py` — "exeption" instead of "exception".
- Type confusion in error handling: `FaceCompareResource` returns HTTP 403 for generic exceptions (should be 500). Multiple endpoints use 403 for server errors.
- Mixed concerns in FaceProcessor: `OnDemandFaceProcessor` tracks `idleSmileRatio` as instance state but is shared across frames. Couples calibration state to processor lifecycle.
- Global mutable state: `OcrEngine.CONFUSED_LETTERS` and `CONFUSED_NUMBERS` are class-level mutable lists. Not a bug, but fragile.
- `np.append` in HoloStack: `self.stack = np.append(self.stack, card, 3)` copies the entire array on each frame. Should use a pre-allocated buffer or a list-then-stack pattern.
See tech-debt for tracking.
Magic Numbers & Hardcoded Thresholds
Face Detection
| Constant | Value | Location |
|---|---|---|
| BLINK_THRESHOLD | 0.21 | face_engine.py |
| SMILE_THRESHOLD | 0.51 | face_engine.py |
| SMILE_INCREASE_THRESHOLD | 0.06 | face_engine.py |
| HEAD_X_THRESHOLD | 0.6 | face_engine.py |
| HEAD_UP_THRESHOLD | 0.25 | face_engine.py |
| HEAD_DOWN_THRESHOLD | 0.3 | face_engine.py |
| Face distance threshold | 0.6 | face_cv_task.py, liveness_cv_task.py |
| MAX_FACE_DISTANCE_FROM_CENTER | 0.35 | face_processor.py |
| MIN_FACE_SIZE | 0.03 | face_processor.py |
| MAX_FACE_SIZE | 0.25 | face_processor.py |
Document Processing
| Constant | Value | Location |
|---|---|---|
| CLASSIFICATION_SCORE_THRESHOLD | 0.3 | card_classification_engine.py |
| MIN_SHARPNESS | 80 | document_processor.py |
| MAX_IMAGE_SIZE | 3250x3250 | image_upload.py |
| CORNER_PADDING_PERCENT | 1.5 | document_warp_v2.py |
Hologram Detection
| Constant | Value | Location |
|---|---|---|
| MIN_FRAMES | 50 | holo_detector_engine.py |
| MIN_POSITIVE_SAMPLES | 2 | holo_image_engine.py |
| ORB_MAX_FEATURES | 500 | holo_image_engine.py |
| ORB_KEEP_PERCENT | 0.2 | holo_image_engine.py |
| maxImgDistanceScore | 0.0185 | holo_image_engine.py |
| HOLO_INITIAL_THRESHOLD | 80 | document.py |
| HOLO_THRESHOLD_STEP | 1 | document.py |
| HOLO_MIN_THRESHOLD | 20 | document.py |
| HOLO_MAX_THRESHOLD | 300 | document.py |
| HOLO_UNIFORMITY_THRESHOLD | 200 | document.py |
| HOLO_HSV_MIN_SATURATION | 40 | document.py |
| HOLO_HSV_MIN_VALUE | 40 | document.py |
| HOLO_FALSE_POSITIVE_MULTIPLIER | 10 | document.py |
| HOLO_WARP_MIN_KP_PAIRS | 80 | document.py |
| filterMatches ratio | 0.75 | holo_warp.py |
PAD Detection
| Constant | Value | Location |
|---|---|---|
| MIN_FACE_CROP_AREA | 17280 | pad_detector_cv_task.py |
| clipSize | 8 | pad_detector_cv_task.py |
| maxNumClips | 10 | pad_detector_cv_task.py |
Liveness V2
| Constant | Value | Location |
|---|---|---|
| waitBeforeActionMs | 2000 | liveness_v2_cv_task.py |
| waitForStillImageMs | 5000 | liveness_v2_cv_task.py |
| sunglasses threshold | 0.5 | liveness.py |
| closed_mouth threshold | 0.3 | liveness.py |
| reference_face_head_angle | 0.23 | liveness.py |
| face_difference_head_angle | 0.25 | liveness.py |
| minResolution low | 360 | liveness.py |
| minResolution high | 720 | liveness.py |
Other
| Constant | Value | Location |
|---|---|---|
| TEXT_DETECTOR_CANVAS_SIZE | 1024 | text_detector_engine.py |
| TEXT_DETECTOR_PADDING | (0.01, 0.01) | text_detector_engine.py |
| SPEECH_DETECT_THRESHOLD | 0.9 | speech_detector_cv_task.py |
| FRAME_SIZE_TO_PROCESS_MS | 200 | speech_detector_cv_task.py |
| Barcode rotation angles | 0, 15, -15, 30, -30, 45 | barcode_engine.py |
| Sharpness resize target | 320x240 | sharpness_engine.py |
| Slave process start delay | 10s | status_engine.py |
| WS up check after | 86400s (24h) | status_engine.py |
| Redis image expire | 60s | appcache.py |
| Redis audio expire | 2s | appcache.py |
| WebRTC frame timeout | 2s | frame_store.py |
| runAsyncProcess timeout | 60s | processing.py |
Complete File Index
Entry Points
| File | Purpose |
|---|---|
app_face.py | Face recognition RPC server |
app_card_detector.py | Card detection RPC server |
app_text_detector.py | Text detection RPC server |
app_ocr.py | OCR RPC server |
app_mrz.py | MRZ reading RPC server |
app_detectron2.py | Detectron2 (barcode) RPC server |
app_pad.py | PAD/deepfake RPC server |
app_onnx.py | Background masking / speech RPC server |
app_http.py | HTTP API (Falcon + uWSGI) |
app_websocket.py | WebSocket server |
CV Engines (server/cv/)
| File | Class | Purpose |
|---|---|---|
face_engine.py | FaceEngine | Geometric face analysis (actions, head pose) |
document_engine.py | DocumentEngine | Document detection/classification facade |
document_ocr_engine.py | DocumentOcrEngine | Structured document OCR orchestration |
ocr_engine.py | OcrEngine | Character recognition + text correction |
barcode_engine.py | BarcodeEngine | Barcode detection + reading with rotation |
card_classification_engine.py | CardClassificationEngine | Document type classification |
card_integrity_check_engine.py | CardIntegrityCheckEngine | Tampering detection |
card_warp_engine.py | CardWarpEngine | Document corner detection + perspective warp |
background_mask_engine.py | BackgroundMaskEngine | Background removal/replacement |
holo_detector_engine.py | HoloDetectorEngine | Video-based hologram detection orchestrator |
holo_image_engine.py | HoloImageEngine | Image-based hologram detection |
kaptcha_detect_engine.py | KaptchaEngine | CAPTCHA solving |
pad_detection_engine.py | PADDetectionEngine | Deepfake + PAD RPC facade |
sharpness_engine.py | SharpnessEngine | Image sharpness (Laplacian variance) |
status_engine.py | StatusEngine | System health monitoring |
text_detector_engine.py | TextDetectorEngine | Text region detection |
Hologram (server/cv/holo/)
| File | Class | Purpose |
|---|---|---|
holo_calculator.py | HoloCalculator | HSV analysis, chi-squared filtering, scoring |
holo_stack.py | HoloStack | Frame accumulation in HSV space |
holo_warp.py | HoloWarp | SIFT+FLANN alignment + perspective warp |
HTTP Resources (server/http/resources/)
| File | Class | Endpoint |
|---|---|---|
face_detect.py | FaceDetectResource | Face detection + encodings |
face_compare.py | FaceCompareResource | Face comparison |
face_draw.py | FaceDrawResource | Landmark visualization |
face_age_gender.py | FaceGenderAgeResource | Age/gender prediction |
reference_face_extract.py | ReferenceFaceExtractResource | Quality-checked face extraction |
document_recognition_v2.py | DocumentRecognitionResourceV2 | Document detection + classification |
document_warp_v2.py | DocumentWarpResourceV2 | Full document warp pipeline |
document_ocr.py | DocumentOcrResource | Document OCR |
document_types.py | DocumentTypesResource | List document types |
card_warp.py | CardWarpResource | Corner detection + warp only |
card_integrity_check.py | CardIntegrityCheckResource | Tampering detection |
barcode.py | BarCodeResource | Barcode reading |
barcode_detect.py | BarCodeDetectResource | Barcode region detection |
background_mask.py | BackgroundMaskResource | Background removal |
image_upload.py | ImageUploadResource | Image upload |
image_download.py | ImageDownloadResource | Image download |
kaptcha_decoder.py | KaptchaDecoderResource | CAPTCHA solving |
mrz.py | MrzResource | MRZ reading |
ocr.py | OcrResource | Generic OCR |
sharpness.py | SharpnessResource | Sharpness calculation |
ping.py | PingResource | Health check |
status.py | StatusResource | Server status |
exeption_handler.py | ErrorBase | Global exception handler (typo in filename) |
WebSocket Tasks (server/websocket/task/)
| File | Class | Purpose |
|---|---|---|
cv_task.py | AbstractCVTask, BasicCVTask, CVTask | Base task hierarchy |
face_cv_task.py | FaceCVTask | Streaming face recognition |
document_cv_task.py | DocumentCVTask | Streaming document detection |
liveness_cv_task.py | LivenessCVTask | V1 liveness challenge |
liveness_v2_cv_task.py | LivenessV2CVTask | V2 liveness with phases |
action_cv_task.py | ActionCVTask | Single action verification |
barcode_cv_task.py | BarcodeCVTask | Streaming barcode reading |
mrz_cv_task.py | MrzCVTask | Streaming MRZ reading |
sharpness_cv_task.py | SharpnessCVTask | Streaming sharpness check |
speech_detector_cv_task.py | SpeechDetectorTask | Voice activity detection |
holo_video_cv_task.py | HoloVideoCVTask | Video hologram detection |
holo_image_cv_task.py | HoloImageCVTask | Image hologram detection (via WS) |
holo_v2_cv_task.py | HoloV2CVTask | V2 hologram (ORB-based) |
pad_detector_cv_task.py | PADDetectorCVTask | Anti-spoofing detection |
stream_player_cv_task.py | StreamPlayerCVTask | Media file playback via WebRTC |
WebSocket Processors (server/websocket/processor/)
| File | Class | Purpose |
|---|---|---|
processor.py | AbstractProcessor | Base multiprocessing processor |
face_processor.py | FaceProcessor, OnDemandFaceProcessor | Face processing with config |
document_processor.py | DocumentProcessor | Document detection pipeline |
barcode_processor.py | BarcodeProcessor | Barcode reading |
holo_processor.py | HoloProcessor | SIFT-based card warp for holo |
holo_v2_processor.py | HoloV2Processor | ORB-based diff mask for holo |
mrz_processor.py | MrzProcessor | MRZ reading via RPC |
sharpness_processor.py | SharpnessProcessor | Sharpness calculation |
speech_processor.py | SpeechProcessor | Speech detection via RPC |
WebSocket FrameStores (server/websocket/framestore/)
| File | Class | Purpose |
|---|---|---|
frame_store.py | FrameStore | Base class for frame extraction |
numpy_frame_store.py | NumpyFrameStore | Video → numpy via Redis |
raw_frame_store.py | RawFrameStore | Video → JPEG via Redis |
audio_frame_store.py | AudioFrameStore | Audio with resampling |
WebSocket Core
| File | Class | Purpose |
|---|---|---|
stream_handler.py | StreamHandler | WebRTC signaling |
wsconnection.py | WSConnection | WebSocket message routing |
Documents (server/documents/)
| File | Class | Purpose |
|---|---|---|
document.py | AbstractDocument | Base document definition |
__init__.py | - | Document registry (17 documents) |
hun_ao_03001.py | HunAo03001 | Hungarian ID card |
hun_bo_*.py | HunBo* | Hungarian driving licenses (7 variants) |
hun_fo_*.py | HunFo* | Hungarian personal IDs (4 variants) |
hun_ho_10001.py | HunHo10001 | Hungarian residence permit |
srb_ao_01001.py | SrbAo01001 | Serbian ID |
srb_bo_01001_*.py | SrbBo01001* | Serbian driving license (front/back) |
Utilities (server/utils/)
| File | Purpose |
|---|---|
image.py | writeOnImage() helper |
lazy_image.py | LazyImage with Redis-backed metadata caching |
face.py | Face position calculation, liveness visualization |
liveness.py | Action sequence generation, reference face checks, thresholds |
processing.py | runAsyncProcess() — spawn OS process for blocking work |
roi.py | ROI geometry utilities (rectangle, polygon, point-in-polygon) |
streaming_buffer.py | Thread-safe streaming buffer for PAD detection |
fps_counter.py | FPS measurement with sliding window |
warmup.py | Model warmup runner (GPU only, 5 runs) |
webrtc.py | WebRTC retry logic for TURN allocation mismatch |
dates.py | Hungarian month names |
dict.py | getKeyByValue() utility |
exceptions.py | Custom exception classes (12 types) |
async_extra.py | AsyncSlot, waitForUntilEvent |
lstm_label_converter.py | CTC decoder for LSTM OCR |
http_not_found.py | 404 handler |
Config & Infrastructure
| File | Purpose |
|---|---|
server/cfg.py | Config loading, JSON schema validation |
server/appcache.py | Redis-based image/audio/metadata cache |
server/rpc/pyredisrpc.py | Redis RPC client/server |
Project Structure
app_*.py Entry points (10 processes)
server/ Python application code
cv/ CV engines (16 engines)
holo/ Hologram detection (3 modules)
http/ Falcon HTTP resources (22 endpoints)
resources/
websocket/ WebSocket system
task/ CV tasks (15 task types)
processor/ Multiprocessing processors (9 types)
framestore/ Frame extraction (4 types)
documents/ Document definitions (17 documents)
utils/ Utility modules (15 files)
rpc/ Redis RPC implementation
packages/ Internal Python packages
onnx/ ONNX model weight files (Git LFS)
config/ Configuration files + scaling presets
data/ Runtime data (EasyOCR models, class mappings)
requirements/ Python dependency management
*.in Direct dependencies
*.txt Pinned/compiled dependencies
compile.sh Compile dependencies
install.sh Install dependencies
setup/ Setup scripts
tests/ Test suites
web/ Web assets
bin/ CLI utilities
Development
# Development mode (via vuer_docker)
docker-compose -f vuer-cv-dev.yml up -d
# Install new packages
docker exec -it vuer_cv_dev bash
./requirements/compile.sh
./requirements/install.sh
# Upgrade all dependencies
./requirements/compile.sh --upgrade
# Git LFS (required for model weights)
git lfs pull

Related
- vuer_oss - Backend server that calls CV services via HTTP/WebSocket
- vuer_css - Frontend that initiates WebRTC sessions for real-time CV
- vuer_docker - Docker orchestration (vuer-cv.yml / vuer-cv-dev.yml / vuer-cv-gpu.yml)
- FaceKom - Platform overview
- security-audit - Security issues tracking
- tech-debt - Technical debt tracking