EU & HU Tech Tender Pipeline — Implementation Guide

This document validates every data source, API, URL, schema, and library choice from the original spec. All claims were verified against live systems on 2026-04-26. Where the spec was wrong or incomplete, corrections are noted.


1. Data Source Validation

1.1 TED Search API v3

Status: Live, no auth required for published notices.

Endpoint: POST https://api.ted.europa.eu/v3/notices/search

Request format: JSON body:

{
  "query": "classification-cpv IN (72000000) AND buyer-country IN (HUN) AND publication-date >= 20260101",
  "fields": ["publication-number", "buyer-name", "notice-type", "estimated-value"],
  "limit": 250,
  "page": 1,
  "paginationMode": "ITERATION"
}

Response format: JSON with totalNoticeCount, iterationNextToken, notices[]. Each notice includes links.xml, links.pdf, links.html keyed by 3-letter language codes (HUN, ENG, DEU).

Spec correction: country codes

The TED API requires ISO 3166-1 alpha-3 codes. Use HUN, not HU. Using HU returns an explicit error: "Value 'HU' is not supported for search field 'buyer-country'".

Pagination:

  • PAGE_NUMBER mode: capped at 15,000 total results, 250 per page.
  • ITERATION mode: no total limit. Returns opaque iterationNextToken for cursor-based pagination. Use this mode for bulk ingestion.

Rate limits: No documented rate limit, and no rate-limit headers were observed in responses. Start with a conservative request rate and increase it gradually.

Swagger UI: https://api.ted.europa.eu/swagger-ui/index.html (the /swagger path redirects here). The OpenAPI spec is NOT downloadable as a static JSON file — it loads dynamically in the Swagger UI.

Available sub-APIs (all under same base URL):

  1. Search API
  2. Publication API
  3. Validation API
  4. Visualisation API (renders notices as HTML/PDF)
  5. Conversion API (legacy TED XML → eForms)
  6. Developer Operations API

Key search fields:

| Field | Description | Notes |
|---|---|---|
| classification-cpv | CPV codes | Supports wildcards: 72* |
| buyer-country | Buyer country | 3-letter ISO only (HUN, DEU, FRA) |
| publication-date | Publication date | Format YYYYMMDD |
| notice-type | Notice type code | cn-standard, can-standard, can-modif |
| form-type | Form type | competition, result, dir-awa-pre, cont-modif |
| procedure-type | Procedure type code | BT-105 |
| estimated-value-* | Estimated value | With currency variants |
| tender-value | Awarded tender value | |
| buyer-name | Buyer organization name | |
| deadline-receipt-tender-date-lot | Submission deadline | |

Concrete query examples:

# All HU IT tenders (all time)
classification-cpv IN (72000000) AND buyer-country IN (HUN)

# HU IT tenders since 2026
classification-cpv IN (72000000) AND buyer-country IN (HUN) AND publication-date >= 20260101

# Broad tech CPV wildcard
classification-cpv IN (72*) AND buyer-country IN (HUN)

# Exclusion pattern
classification-cpv IN (72*) AND classification-cpv NOT IN (72800* 72900*)

# Contract award notices only
form-type IN (result) AND classification-cpv IN (72000000) AND buyer-country IN (HUN)
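The query strings above can also be built programmatically. This keeps the alpha-3 country rule enforced in one place. The sketch below is minimal and assumes the AND-joined, space-separated IN-list syntax shown in the examples. `buildTedQuery` and `toTedDate` are hypothetical helper names and are not part of any TED library.

```typescript
// Hypothetical helpers for composing TED Expert Search queries.
// Syntax assumed from the examples: space-separated IN lists, clauses joined by AND.

/** Format a Date as YYYYMMDD (UTC), the format used by publication-date. */
function toTedDate(d: Date): string {
  const y = d.getUTCFullYear();
  const m = String(d.getUTCMonth() + 1).padStart(2, "0");
  const day = String(d.getUTCDate()).padStart(2, "0");
  return `${y}${m}${day}`;
}

interface TedQueryOptions {
  cpv: string[];          // exact codes or prefix wildcards, e.g. "72*"
  excludeCpv?: string[];  // codes or wildcards to exclude
  countries?: string[];   // ISO 3166-1 alpha-3, e.g. "HUN"
  publishedSince?: Date;
  formTypes?: string[];   // e.g. "competition", "result"
}

function buildTedQuery(o: TedQueryOptions): string {
  const clauses: string[] = [];
  if (o.formTypes?.length) clauses.push(`form-type IN (${o.formTypes.join(" ")})`);
  clauses.push(`classification-cpv IN (${o.cpv.join(" ")})`);
  if (o.excludeCpv?.length) clauses.push(`classification-cpv NOT IN (${o.excludeCpv.join(" ")})`);
  if (o.countries?.length) {
    for (const c of o.countries) {
      // Reject alpha-2 codes up front; the API answers "HU" with an error.
      if (!/^[A-Z]{3}$/.test(c)) throw new Error(`Expected alpha-3 country code, got "${c}"`);
    }
    clauses.push(`buyer-country IN (${o.countries.join(" ")})`);
  }
  if (o.publishedSince) clauses.push(`publication-date >= ${toTedDate(o.publishedSince)}`);
  return clauses.join(" AND ");
}
```

For example, `buildTedQuery({ cpv: ["72000000"], countries: ["HUN"], publishedSince: new Date(Date.UTC(2026, 0, 1)) })` reproduces the "HU IT tenders since 2026" query above.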

1.2 TED Bulk XML Packages

Status: Live, no authentication required. Served via CloudFront CDN.

Daily packages:

  • URL: https://ted.europa.eu/packages/daily/{YYYYNNNNN}
  • Example: https://ted.europa.eu/packages/daily/202400001 (downloads as 20240102_2024001.tar.gz, ~9 MB)
  • The ID is the four-digit year followed by the five-digit OJ S issue number for that day (202400001 = 2024, issue 1)

Monthly packages:

  • URL: https://ted.europa.eu/packages/monthly/{YYYY-M}
  • Example: https://ted.europa.eu/packages/monthly/2024-1 (downloads as 2024-01.tar.gz, ~278 MB)

Format: .tar.gz containing individual XML files.
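Package URLs follow directly from the patterns above, so backfill jobs can generate them instead of scraping the download page. The sketch uses hypothetical helper names. It assumes, as in the examples, that the daily ID is always the year plus a zero-padded five-digit issue number, and that monthly IDs are not zero-padded.

```typescript
// Hypothetical URL helpers for TED bulk packages, following the URL patterns above.
const TED_PACKAGES = "https://ted.europa.eu/packages";

/** Daily package: year followed by the zero-padded 5-digit OJ S issue number. */
function dailyPackageUrl(year: number, ojsIssue: number): string {
  if (!Number.isInteger(ojsIssue) || ojsIssue < 1 || ojsIssue > 99999) {
    throw new RangeError(`OJ S issue number out of range: ${ojsIssue}`);
  }
  return `${TED_PACKAGES}/daily/${year}${String(ojsIssue).padStart(5, "0")}`;
}

/** Monthly package: year and month without zero padding (e.g. 2024-1). */
function monthlyPackageUrl(year: number, month: number): string {
  if (!Number.isInteger(month) || month < 1 || month > 12) {
    throw new RangeError(`Month out of range: ${month}`);
  }
  return `${TED_PACKAGES}/monthly/${year}-${month}`;
}

/** All monthly package URLs for an inclusive month range, e.g. for a historical backfill. */
function monthlyRange(fromYear: number, fromMonth: number, toYear: number, toMonth: number): string[] {
  const urls: string[] = [];
  let y = fromYear;
  let m = fromMonth;
  while (y < toYear || (y === toYear && m <= toMonth)) {
    urls.push(monthlyPackageUrl(y, m));
    m += 1;
    if (m > 12) {
      m = 1;
      y += 1;
    }
  }
  return urls;
}
```

The resulting URLs can then be streamed with undici.request(), as described in Step 2.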

XML filename convention inside archives:

  • eForms notices: up to 8-digit number + year (e.g., 00654321_2022.xml)
  • Legacy notices: up to 6-digit number + year (e.g., 123456_2022.xml)

Spec correction: publication numbers

The 8-digit vs 6-digit distinction applies to XML filenames in bulk archives, NOT to publication numbers in the API. The API normalizes all publication numbers to {number}-{year} format (e.g., 587662-2024), regardless of era. Leading zeros are stripped.
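Bulk filenames and API publication numbers differ only in padding and separator. A small normalizer therefore gives both ingestion paths a shared join key. The sketch assumes filenames always follow the `{number}_{year}.xml` convention above. The helper name is hypothetical.

```typescript
/**
 * Convert a bulk-archive XML filename (e.g. "00654321_2022.xml") into the
 * API's publication-number form ("654321-2022"): leading zeros stripped,
 * underscore replaced by a hyphen. Returns null for names that do not match.
 */
function filenameToPublicationNumber(filename: string): string | null {
  // Accept either a bare name or a path inside the archive.
  const base = filename.split("/").pop() ?? filename;
  const m = /^(\d{1,8})_(\d{4})\.xml$/.exec(base);
  if (!m) return null;
  const number = String(parseInt(m[1], 10)); // parseInt drops the leading zeros
  return `${number}-${m[2]}`;
}
```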

Download page: https://ted.europa.eu/en/simap/xml-bulk-download (the spec’s https://ted.europa.eu/en/help/data-reuse returns HTTP 202 — it’s a JS-rendered SPA that may not load reliably).

1.3 TED CSV Dataset

Spec correction: CSV dataset is frozen

The CSV dataset at https://data.europa.eu/data/datasets/ted-csv?locale=en (redirected from the old EUODP URL) covers 2006-01-01 to 2023-12-31 only and was last updated on 2024-01-25. It does NOT cover eForms-era notices, so do not use it for current data. Its only use is flat-format historical backfill of notices published through the end of 2023.

The old URL https://data.europa.eu/euodp/en/data/dataset/ted-csv permanently redirects to the new path.

1.4 EU Procurement Thresholds (2026–2027)

Spec correction: thresholds updated

The 2026–2027 thresholds (effective Jan 1, 2026) are lower than the spec’s figures, per Delegated Regulations (EU) 2025/2150, 2025/2151, 2025/2152.

| Buyer type | Goods/Services (spec) | Goods/Services (actual 2026) | Works |
|---|---|---|---|
| Central authorities | ~€143k | €140,000 | €5,404,000 |
| Sub-central authorities | ~€221k | €216,000 | €5,404,000 |
| Utilities | ~€443k | €432,000 | €5,404,000 |
| Concessions | | €5,404,000 | €5,404,000 |
| Social services (Annex XIV) | | €750,000 (unchanged) | |

All major thresholds dropped 2–3% due to EUR/SDR exchange rate shifts under WTO GPA.
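The 2026 goods/services and works thresholds from the table can be encoded directly. This is useful for flagging which HU tenders should also appear in TED. The sketch is simplified: it ignores concessions, Annex XIV social services and other special regimes. It also assumes the estimated value is already net of VAT and converted to EUR. The names are hypothetical.

```typescript
// 2026–2027 EU thresholds from the table above, in EUR, net of VAT.
type BuyerType = "central" | "subCentral" | "utility";
type ContractKind = "goodsServices" | "works";

const THRESHOLDS_2026: Record<BuyerType, Record<ContractKind, number>> = {
  central: { goodsServices: 140_000, works: 5_404_000 },
  subCentral: { goodsServices: 216_000, works: 5_404_000 },
  utility: { goodsServices: 432_000, works: 5_404_000 },
};

/** True if the estimated net value is equal to or greater than the applicable EU threshold. */
function isAboveEuThreshold(buyer: BuyerType, kind: ContractKind, estimatedValueEurNet: number): boolean {
  return estimatedValueEurNet >= THRESHOLDS_2026[buyer][kind];
}
```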


1.5 EKR — Elektronikus Közbeszerzési Rendszer

Homepage: https://ekr.gov.hu/ (redirects to https://ekr.gov.hu/portal/kezdolap)

Architecture: Angular SPA (server HTML is just a loading spinner). Backend v3.17.7, Frontend v3.17.8.

Major finding: EKR has a full public REST API

The spec planned for HTML scraping, but EKR exposes a rich, undocumented JSON REST API at /eljarastar/api/public/. No authentication required. This is far more efficient and reliable than scraping.

1.5.1 EKR Eljárástár (Procedure Registry) API

Base: https://ekr.gov.hu/eljarastar/api/public/

Search endpoint: GET /eljarastar/api/public/kereso

| Parameter | Description |
|---|---|
| kifejezes | Keyword search |
| aktualis | Include current procedures (boolean) |
| jovobeli | Include future procedures (boolean) |
| lezart | Include closed procedures (boolean) |
| osszes | Include all procedures (boolean) |
| offset | Pagination offset |
| limit | Page size |
| order | Sort order (e.g., MEGINDITAS_DATUMA_DESC) |

Procedure detail: GET /eljarastar/api/public/eljaras/{EKR_ID}

Returns full JSON: eljarasTargy, eljarasAzonosito, eljarasrend, eljarasTipus, foCpvKod, foTargy, eljarasSzakasz, meginditasDatum, ajanlatteteliHatarido, ajanlatkeroList (with full org details), hirdetmenyList, dokumentumList, alkalmassagiKovetelmenyReszletek.
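A minimal client for the two Eljárástár endpoints above might look like the sketch below. The parameter names come from the table; everything else is an assumption. Booleans are assumed to be sent as `true`/`false` strings. The detail response is assumed to be a JSON object, and its fields are typed as `unknown` because the API is undocumented. The sketch uses the global `fetch` (Node 18+); `undici.fetch()` accepts the same arguments. Helper names are hypothetical.

```typescript
// Hypothetical client helpers for the undocumented EKR Eljárástár API.
const EKR_BASE = "https://ekr.gov.hu/eljarastar/api/public";

interface KeresoParams {
  kifejezes?: string;
  aktualis?: boolean;
  jovobeli?: boolean;
  lezart?: boolean;
  osszes?: boolean;
  offset?: number;
  limit?: number;
  order?: string; // e.g. "MEGINDITAS_DATUMA_DESC"
}

/** Build a kereso search URL, skipping parameters that were not set. */
function keresoUrl(params: KeresoParams): string {
  const url = new URL(`${EKR_BASE}/kereso`);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// A few of the detail fields observed above; types are unknown because nothing is documented.
interface EljarasDetail {
  eljarasAzonosito?: unknown;
  eljarasTargy?: unknown;
  foCpvKod?: unknown;
  ajanlatteteliHatarido?: unknown;
  [field: string]: unknown;
}

/** Fetch one procedure by EKR ID from /eljaras/{EKR_ID}. */
async function fetchEljaras(ekrId: string): Promise<EljarasDetail> {
  const res = await fetch(`${EKR_BASE}/eljaras/${encodeURIComponent(ekrId)}`, {
    headers: { accept: "application/json" },
  });
  if (!res.ok) throw new Error(`EKR detail ${ekrId}: HTTP ${res.status}`);
  return (await res.json()) as EljarasDetail;
}
```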

Dictionary/lookup endpoints: GET /eljarastar/api/public/szotar/{DICT_NAME}

Available dictionaries:

  • CPV_KODOK — 9,454 CPV codes
  • ELJARAS_TIPUS — 65 procedure types across Uniós/Nemzeti/EPK/Beszerzési categories
  • ELJARAS_SZAKASZ — procedure phases
  • NUTS_KODOK, PENZNEM, SZERZODES_TIPUS
  • eforms-buyer-legal-type, authority-activity, selection-criterion, eu-programme
  • strategic-procurement, gpp-criteria, environmental-impact, social-objective

Database size: 114,313 total procedures at the time of verification; roughly 4,300 are active or upcoming at any given time.

robots.txt: https://ekr.gov.hu/robots.txt returns 404 — no restrictions.

1.5.2 EKR Szerződéstár (Contract Registry) API

Major finding: the pricing "gold mine" has an API

The spec identified Publikus CoRe as the primary source for price benchmarking. EKR’s own contract registry (Szerződéstár) also has a public API with awarded values, and it links contracts directly to their EKR procedure IDs.

Base: https://ekr.gov.hu/ekr-szerzodestar/api/szerzodesapi/1.0/

List contracts: GET /ekr-szerzodestar/api/szerzodesapi/1.0/szerzodesek?offset=0&limit=10&apikey=PSZT

Contract detail: GET /ekr-szerzodestar/api/szerzodesapi/1.0/szerzodes/{ID}?apikey=PSZT

The API key is PSZT (likely short for "Publikus Szerződéstár"). It is hard-coded in the public frontend, which indicates that the API is intended for public use.

Contract detail fields:

  • ekrAzonosito — EKR ID (links to procedure)
  • szerzodesTargya — contract subject
  • bruttoOsszeg — gross value
  • nettoOsszeg — net value
  • bruttoOsszegDevizaneme — currency
  • hatalyossagKezdete / Vege — validity period
  • teljesitesHatarideje — performance deadline
  • tipusaNev — contract type
  • uniosForrasbolFinanszirozott — EU-funded flag
  • gazdasagiSzereploList — economic operators (name, tax number, SME status, full address)
  • szerzodesModositasList — contract modifications
  • teljesitesList — performance records
  • alvallalkozoList — subcontractors
  • eljaras — linked procedure details

Database size: 135,299 contracts.

Web UI: https://ekr.gov.hu/ekr-szerzodestar/hu/szerzodesLista

1.5.3 EKR Hirdetményfigyelő (Notice Watcher)

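The Notice Watcher is available at https://ekr.gov.hu/portal/hirdetmenyfigyelo but requires login. According to the Eljárástár search page, it can also be opened from within Eljárástár. It is a user-facing alert feature, not a data endpoint. For programmatic monitoring, poll the Eljárástár API instead. None of the documented kereso parameters is a date filter, so sort with order=MEGINDITAS_DATUMA_DESC and stop paging once already-ingested procedures appear.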

1.6 Közbeszerzési Hatóság (kozbeszerzes.hu)

Homepage: https://www.kozbeszerzes.hu/ — live, server-rendered HTML (Django-based).

1.6.1 Közbeszerzési Értesítő (Notice Gazette)

Search URL: https://www.kozbeszerzes.hu/adatbazis/keres/hirdetmeny/

robots.txt blocks scraping

https://www.kozbeszerzes.hu/robots.txt explicitly disallows:

Disallow: /adatbazis/keres/hirdetmeny/
Disallow: /adatbazis/megtekint/hirdetmeny/
Disallow: /ertesito/

Both the search page and notice detail pages are blocked. Do not scrape these paths. Use EKR APIs and TED instead.

Available search fields (for reference, not for scraping): CPV kód, Ajánlatkérő neve, Nyertes ajánlattevő, Teljesítés helye, Beszerzés tárgya, Eljárás fajtája, Ajánlattételi/részvételi határidő, Közzététel dátuma, Iktatószám, Ajánlatkérő típusa, Ajánlatkérő tevékenységi köre, Hirdetmény típusa, Lapszám.

1.6.2 Publikus CoRe (Contract Registry)

URL: https://kereso-core.kozbeszerzes.hu (subdomain, NOT a path under www)

CoRe is on a different subdomain

The www.kozbeszerzes.hu robots.txt does NOT apply to kereso-core.kozbeszerzes.hu. CoRe has no robots.txt of its own. Server-rendered HTML, no SPA.

Database size: 152,460 contracts.

Search modes:

  • Simple: https://kereso-core.kozbeszerzes.hu/kereses/kozbeszerzes/egyszeru/
  • Advanced: https://kereso-core.kozbeszerzes.hu/kereses/kozbeszerzes/osszetett/
  • Full database: https://kereso-core.kozbeszerzes.hu/lista/kozbeszerzes/

Pagination: ?oldal=N, 20 items per page, 7,623 pages.

Advanced search fields: Szövegben szereplő szavak, Szerződés típusa, Szerződés jellege, Szerződés tárgya, Ajánlatkérő neve, Eljárás tárgya, Szerződés státusza, Megkötéskori érték (range), Aláírás dátuma (range).

Table columns: Eljárás azonosítója (Procedure ID), Ajánlatkérő, Szerződés tárgya, Típusa, Megkötéskori értéke, Aláírás dátuma.

No export or API is available, so HTML scraping with cheerio is the only option. The list view already shows the value at signing (Megkötéskori értéke). Any further value details may appear only on detail pages.

CoRe vs EKR Szerződéstár overlap

CoRe has 152,460 contracts (includes pre-EKR era). EKR Szerződéstár has 135,299 (post-2018 only). For historical depth, use CoRe. For structured API access with linked procedure IDs, use EKR Szerződéstár. Consider using both: EKR API as primary, CoRe HTML scraping for pre-2018 backfill.

1.6.3 Publikus KBA (Legacy Archive)

URL: https://kba.kozbeszerzes.hu/apex/f?p=103:35

Oracle APEX application. Still live. Covers pre-2018 procedures. Public user access. Low priority for the pipeline.

1.7 Legal Reference

| Reference | URL | Status |
|---|---|---|
| 424/2017. Korm. rendelet | https://net.jogtar.hu/jogszabaly?docid=a1700424.kor | Live, full text available |
| Also at | https://uj.jogtar.hu/#doc/db/1/id/a1700424.kor | Paid Jogtár version |

2. eForms SDK & Schema Mapping

2.1 eForms SDK

Repo: https://github.com/OP-TED/eForms-SDK — active, 84 stars, 40 forks.

Latest stable: v1.14.2 (2026-03-02). Maintenance release v1.13.3 on 2026-04-15. Pre-release v2.0.0-alpha.2 on 2026-03-26 (major EFX rewrite, not production-ready).

License: CC-BY-4.0.

Contents:

| Directory | Content |
|---|---|
| schemas/ | UBL 2.3 XSDs: ContractNotice, ContractAwardNotice, PriorInformationNotice, BusinessRegistrationInformationNotice |
| codelists/ | Genericode (.gc) files: CPV, NUTS, countries, currencies, procedure types, notice types |
| fields/ | fields.json: metadata for all 286 unique business terms (1,256 total field entries) |
| notice-types/ | 51 notice subtype definitions (subtypes 1–40, E1–E6, T01–T02, X01–X02, CEI) |
| examples/ | Sample eForms notices with SVRL validation reports |
| schematrons/ | Schematron validation rules |
| efx-grammar/ | ANTLR4 grammar for EFX expression language |
| view-templates/ | Visualisation templates |
| translations/ | Multilingual label translations |

Related repos:

  • https://github.com/OP-TED/ted-xml-data-converter — XSLT converter from legacy TED R2.0.9 to eForms
  • Non-convertible elements: https://github.com/OP-TED/ted-xml-data-converter/blob/main/ted-elements-not-convertible.md

2.2 Key Business Terms for the Pipeline

| Data point | BT ID | eForms XPath (abbreviated) |
|---|---|---|
| Main CPV code | BT-262-Lot | .../MainCommodityClassification/ItemClassificationCode |
| Additional CPV | BT-263-Lot | .../AdditionalCommodityClassification/ItemClassificationCode |
| Estimated value | BT-27-Lot | .../RequestedTenderTotal/EstimatedOverallContractAmount |
| Awarded tender value | BT-720-Tender | .../LotTender/LegalMonetaryTotal/... |
| Notice total value | BT-161-NoticeResult | .../NoticeResult/TotalAmount |
| Procedure type | BT-105-Procedure | .../TenderingProcess/ProcedureCode |
| Submission deadline (date) | BT-131(d)-Lot | .../TenderSubmissionDeadlinePeriod/EndDate |
| Submission deadline (time) | BT-131(t)-Lot | .../TenderSubmissionDeadlinePeriod/EndTime |
| Place of performance (NUTS) | BT-5071-Lot | .../RealizedLocation/Address/CountrySubentityCode |
| Place of performance (country) | BT-5141-Lot | .../RealizedLocation/Address/Country/IdentificationCode |
| Notice type | BT-02-notice | /*/cbc:NoticeTypeCode |
| Buyer org name | BT-500-Organization-Company | .../PartyName/Name |
| Buyer org ID | BT-501-Organization-Company | .../PartyIdentification/ID |
| Buyer activity | BT-10-Procedure-Buyer | .../ContractingActivity/ActivityTypeCode |
| Buyer legal type | BT-11-Procedure-Buyer | .../ContractingPartyType/PartyTypeCode |

Design notes

  • CPV codes appear at three levels: Procedure, Part, and Lot. Check all three, prioritize Lot.
  • Values appear at multiple levels (Lot, Procedure, NoticeResult). Currency is a separate companion field.
  • Deadline is split into date + time fields.
  • Organizations are in a flat list under efac:Organizations; buyers are linked via OPT-300-Procedure-Buyer referencing org technical IDs.
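Two of these design notes translate into small normalization helpers. The sketch assumes the values have already been extracted from the XML (e.g., with fast-xml-parser). It also assumes eForms dates and times carry UTC offsets, as in `2024-03-15+01:00`. The `ORG-0001` ID style and both function names are illustrative.

```typescript
// Trailing UTC offset on an xs:date / xs:time value: "Z" or "+hh:mm" / "-hh:mm".
const TZ_SUFFIX = /(Z|[+-]\d{2}:\d{2})$/;

/**
 * Combine BT-131(d) and BT-131(t) into one ISO 8601 timestamp.
 * If the time has no offset, the date's offset is reused; without a time,
 * only the calendar date is returned.
 */
function combineDeadline(date: string, time?: string): string {
  const day = date.slice(0, 10);
  if (!time) return day;
  if (TZ_SUFFIX.test(time)) return `${day}T${time}`;
  const offset = TZ_SUFFIX.exec(date)?.[1] ?? "";
  return `${day}T${time}${offset}`;
}

interface Organization {
  id: string;   // technical ID referenced by OPT-300-Procedure-Buyer, e.g. "ORG-0001"
  name: string; // BT-500-Organization-Company
}

/** Resolve buyer references against the flat efac:Organizations list. */
function resolveBuyerNames(orgs: Organization[], buyerRefs: string[]): string[] {
  const byId = new Map(orgs.map((o) => [o.id, o.name] as const));
  return buyerRefs.map((ref) => byId.get(ref) ?? `unresolved:${ref}`);
}
```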

2.3 Legacy TED vs eForms — Mapping for the Unified Tender Type

| Aspect | Legacy TED XML | eForms (UBL 2.3) |
|---|---|---|
| Schema basis | Custom TED XSD, form-based (F01–F25) | UBL 2.3 Pre-Award documents |
| Lot structure | Lots embedded in OBJECT_CONTRACT | First-class ProcurementProjectLot |
| CPV codes | CPV_MAIN/CPV_CODE @CODE | MainCommodityClassification/ItemClassificationCode via BT-262 |
| Values | VAL_TOTAL, VAL_ESTIMATED_TOTAL, VAL_RANGE_TOTAL | BT-27 (estimated), BT-720 (tender), BT-161 (notice total) |
| Organizations | Inline in CONTRACTING_BODY, CONTRACTOR | Centralized efac:Organizations list with cross-references |
| Notice types | Form number: F01=PIN, F02=CN, F03=CAN | NoticeTypeCode + 51 subtypes |
| Procedure type | PROCEDURE/PT_* elements (PT_OPEN, PT_RESTRICTED) | Coded value in ProcedureCode |
| Deadline | DT_DATE_FOR_SUBMISSION (single field) | Split: BT-131(d) + BT-131(t) |
| Place of performance | NUTS/@CODE + MAIN_SITE | NUTS (BT-5071) + country (BT-5141) + city (BT-5131) |

Fields in legacy but NOT in eForms: VAL_RANGE_TOTAL/HIGH+LOW, DATE_AWARD_SCHEDULED, REFERENCE_NUMBER (object-level), URL_NATIONAL_PROCEDURE, CA_TYPE_OTHER/CA_ACTIVITY_OTHER (free text alternatives).

Fields in eForms but NOT in legacy: Framework re-estimated values, buyer review/complaint tracking, foreign subsidy regulation (FSR), green/social/innovative procurement indicators, Clean Vehicles Directive fields, detailed subcontracting fields.


3. CPV Code Reference

3.1 Validation of Spec CPV Codes

All five CPV roots from the spec are confirmed present in the eForms SDK cpv.gc codelist (9,454 entries, CPV 2008 edition):

| Code | Official label | Hierarchy level |
|---|---|---|
| 72000000 | IT services: consulting, software development, Internet and support | Division root |
| 48000000 | Software package and information systems | Division root |
| 30200000 | Computer equipment and supplies | Group (parent: 30000000) |
| 32000000 | Radio, television, communication, telecommunication and related equipment | Division root |
| 51610000 | Installation services of computers and information-processing equipment | Class (parent: 51600000) |

Hierarchy note

30200000 and 51610000 are NOT division roots. Searches must use exact-match or prefix-match at the appropriate level — they will not automatically capture sibling codes under 30000000 or 51600000.
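Prefix matching at the right level can be done by reducing each code to its significant digits. The sketch relies on two CPV conventions: each hierarchy level adds one non-zero digit, and trailing zeros are padding. It also drops the check digit (`-5` in `72000000-5`). The function names and the `TARGET_ROOTS` constant are hypothetical.

```typescript
/**
 * Significant prefix of a CPV code: drop the check digit and trailing zeros,
 * but keep at least the two division digits.
 * "30200000" -> "302", "51610000" -> "5161", "10000000" -> "10".
 */
function cpvPrefix(code: string): string {
  const digits = (code.split("-")[0] ?? "").trim();
  if (!/^\d{8}$/.test(digits)) throw new Error(`Not an 8-digit CPV code: ${code}`);
  let end = digits.length;
  while (end > 2 && digits[end - 1] === "0") end--;
  return digits.slice(0, end);
}

/** True if `code` equals `root` or sits anywhere below it in the CPV hierarchy. */
function matchesCpvRoot(code: string, root: string): boolean {
  return cpvPrefix(code).startsWith(cpvPrefix(root));
}

// The five roots validated in section 3.1.
const TARGET_ROOTS = ["72000000", "48000000", "30200000", "32000000", "51610000"];

function isTargetCpv(code: string): boolean {
  return TARGET_ROOTS.some((root) => matchesCpvRoot(code, root));
}
```

With this, `30213100` matches the 30200000 root while its sibling branch `30100000` (also under 30000000) does not. The same filter can be applied post-fetch to EKR results.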

3.2 Suggested Additional CPV Codes

| Code | Label | Rationale |
|---|---|---|
| 50300000 | Repair/maintenance of PCs, office equipment, telecom equipment | IT support/maintenance contracts |
| 64200000 | Telecommunications services | Service contracts (distinct from equipment in 32xxx) |

Optional broader scope: 73000000 (R&D services) if research-oriented tenders are desired.

3.3 CPV Subdivision Trees

72000000 — IT services:

| Code | Label |
|---|---|
| 72100000 | Hardware consultancy services |
| 72200000 | Software programming and consultancy services |
| 72300000 | Data services |
| 72400000 | Internet services |
| 72500000 | Computer-related services |
| 72600000 | Computer support and consultancy services |
| 72700000 | Computer network services |
| 72800000 | Computer audit and testing services |
| 72900000 | Computer back-up and catalogue conversion services |

48000000 — Software packages:

| Code | Label |
|---|---|
| 48100000 | Industry specific software package |
| 48200000 | Networking, Internet and intranet software package |
| 48300000 | Document creation, drawing, imaging, scheduling and productivity software |
| 48400000 | Business transaction and personal business software |
| 48500000 | Communication and multimedia software package |
| 48600000 | Database and operating software package |
| 48700000 | Software package utilities |
| 48800000 | Information systems and servers |
| 48900000 | Miscellaneous software package and computer systems |

30200000 — Computer equipment:

| Code | Label |
|---|---|
| 30210000 | Data-processing machines (hardware) |
| 30220000 | Digital cartography equipment |
| 30230000 | Computer-related equipment |

32000000 — Radio/TV/telecom equipment:

| Code | Label |
|---|---|
| 32200000 | Transmission apparatus for radiotelephony, radio broadcasting, TV |
| 32300000 | TV/radio receivers, sound/video recording apparatus |
| 32400000 | Networks |
| 32500000 | Telecommunications equipment and supplies |

4. Revised Pipeline Architecture

graph TD
    subgraph Sources
        TED["TED Search API v3<br/>(JSON, ITERATION mode)"]
        BULK["TED Bulk XML<br/>(monthly tar.gz backfill)"]
        EKR_P["EKR Eljárástár API<br/>(JSON, /api/public/)"]
        EKR_C["EKR Szerződéstár API<br/>(JSON, apikey=PSZT)"]
        CORE["CoRe HTML scrape<br/>(kereso-core.kozbeszerzes.hu)"]
    end

    subgraph "Parse & Normalize"
        PARSE_TED["Parse TED JSON/XML<br/>fast-xml-parser"]
        PARSE_EKR["Parse EKR JSON"]
        PARSE_CORE["Parse CoRe HTML<br/>cheerio"]
        NORM["Normalize → Tender type<br/>zod .transform()"]
    end

    subgraph "Enrich & Store"
        DEDUP["Dedupe by tender ID<br/>(EKR ID ↔ TED publication-number)"]
        PRICE["Enrich: pricing model<br/>(from EKR Szerződéstár + CoRe)"]
        SCORE["Score & filter<br/>(CPV match, margin viability)"]
        DB["SQLite + FTS5<br/>(better-sqlite3 + drizzle)"]
    end

    TED --> PARSE_TED
    BULK --> PARSE_TED
    EKR_P --> PARSE_EKR
    EKR_C --> PARSE_EKR
    CORE --> PARSE_CORE

    PARSE_TED --> NORM
    PARSE_EKR --> NORM
    PARSE_CORE --> NORM

    NORM --> DEDUP
    DEDUP --> PRICE
    PRICE --> SCORE
    SCORE --> DB

Key architecture change from original spec

The original spec planned for 3 sources (TED API, EKR scrape, CoRe scrape). The validated architecture uses 5 sources: TED Search API, TED Bulk XML, EKR Eljárástár API, EKR Szerződéstár API, and CoRe HTML scrape. EKR is now API-driven (no HTML scraping). CoRe remains HTML-only but is supplementary — most pricing data is available via the EKR Szerződéstár API.


5. Implementation Steps

Step 1: Project Setup

tender-pipeline/
├── src/
│   ├── sources/
│   │   ├── ted-api.ts           # TED Search API client
│   │   ├── ted-bulk.ts          # TED bulk XML ingestion
│   │   ├── ekr-procedures.ts    # EKR Eljárástár API client
│   │   ├── ekr-contracts.ts     # EKR Szerződéstár API client
│   │   └── core-scraper.ts      # CoRe HTML scraper
│   ├── schema/
│   │   ├── tender.ts            # Unified Tender zod schema
│   │   ├── contract.ts          # Unified Contract zod schema
│   │   └── adapters/
│   │       ├── ted-eforms.ts    # eForms XML → Tender
│   │       ├── ted-legacy.ts    # Legacy TED XML → Tender
│   │       ├── ekr.ts           # EKR JSON → Tender
│   │       └── core.ts          # CoRe HTML → Contract
│   ├── pipeline/
│   │   ├── stage.ts             # Stage<TIn, TOut> interface
│   │   ├── runner.ts            # Pipeline composition and execution
│   │   ├── normalize.ts         # Normalization stage
│   │   ├── dedupe.ts            # Deduplication stage
│   │   ├── enrich-pricing.ts    # Price enrichment stage
│   │   └── score.ts             # Scoring/filtering stage
│   ├── db/
│   │   ├── schema.ts            # Drizzle schema definition
│   │   ├── migrations/          # Drizzle migrations
│   │   └── queries.ts           # Common query helpers
│   └── ingest.ts                # Entry point (daily run)
├── drizzle.config.ts
├── package.json
└── tsconfig.json
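The sketch below shows one way to compose the `Stage<TIn, TOut>` interface from stage.ts with type checking in runner.ts. It is an illustration, not the project's actual code. Stages are synchronous here, which suits better-sqlite3's synchronous API. This assumes the network-bound source clients run before the stage chain. `Pipeline` and its methods are hypothetical names.

```typescript
// Contract each pipeline stage implements (stage.ts).
interface Stage<TIn, TOut> {
  readonly name: string;
  run(input: TIn): TOut;
}

// Typed composition of stages (runner.ts).
class Pipeline<TIn, TOut> {
  private constructor(private readonly stages: ReadonlyArray<Stage<unknown, unknown>>) {}

  static of<A, B>(stage: Stage<A, B>): Pipeline<A, B> {
    return new Pipeline<A, B>([stage as Stage<unknown, unknown>]);
  }

  /** Append a stage whose input type must match this pipeline's output type. */
  pipe<TNext>(stage: Stage<TOut, TNext>): Pipeline<TIn, TNext> {
    return new Pipeline<TIn, TNext>([...this.stages, stage as Stage<unknown, unknown>]);
  }

  /** Run all stages in order, naming the failing stage in any error. */
  run(input: TIn): TOut {
    let value: unknown = input;
    for (const stage of this.stages) {
      try {
        value = stage.run(value);
      } catch (err) {
        throw new Error(`Stage "${stage.name}" failed: ${String(err)}`);
      }
    }
    return value as TOut;
  }
}
```

The `pipe()` signature makes the compiler reject a stage whose input type does not match the previous stage's output. The method is deliberately not named `then`; that name would make pipelines look like promises to `await`.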

Dependencies:

| Package | Version | Purpose |
|---|---|---|
| undici | 8.1.0 | HTTP client (streaming, cookies, retry) |
| fast-xml-parser | 5.7.2 | XML parsing (TED notices) |
| cheerio | 1.2.0 | HTML scraping (CoRe) |
| p-throttle | 8.1.0 | Per-domain rate limiting |
| p-queue | 9.1.2 | Concurrency control |
| tar | 7.5.13 | Bulk XML extraction from tar.gz |
| zod | 4.3.6 | Schema validation + type inference |
| better-sqlite3 | 12.9.0 | SQLite database |
| drizzle-orm | 0.45.2 | Type-safe ORM |
| drizzle-kit | 0.31.10 | Schema migrations |
| node-cron | 4.2.1 | Scheduling (if not using OS cron) |

Step 2: Source Clients

TED Search API client:

  • Use undici.fetch() for JSON POST requests
  • ITERATION mode pagination (no 15k cap)
  • Separate throttled fetcher: pThrottle({limit: 5, interval: 1000})(tedFetch)
  • Query builder for Expert Search syntax
  • Fields to request: publication-number, notice-type, form-type, buyer-name, all value fields, CPV, deadline, place of performance
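A sketch of ITERATION-mode paging follows. It makes two assumptions that should be verified against the Swagger UI. First, the follow-up request sends the cursor back in the body as `iterationNextToken`. Second, a missing or null token, or an empty page, means there are no more results. The HTTP function is injectable, so the throttled fetcher above can be passed in. Names other than the API's own fields are hypothetical.

```typescript
const TED_SEARCH = "https://api.ted.europa.eu/v3/notices/search";

interface TedSearchResponse {
  totalNoticeCount?: number;
  iterationNextToken?: string | null;
  notices: Record<string, unknown>[];
}

/** Request body for one page; the cursor is omitted on the first request. */
function buildSearchBody(query: string, fields: string[], token?: string): Record<string, unknown> {
  const body: Record<string, unknown> = { query, fields, limit: 250, paginationMode: "ITERATION" };
  if (token) body.iterationNextToken = token;
  return body;
}

/** Yield every notice, following iterationNextToken until it runs out. */
async function* iterateTedNotices(
  query: string,
  fields: string[],
  fetchJson: (body: Record<string, unknown>) => Promise<TedSearchResponse> = postTedSearch,
): AsyncGenerator<Record<string, unknown>> {
  let token: string | undefined;
  do {
    const page = await fetchJson(buildSearchBody(query, fields, token));
    yield* page.notices;
    // Stop on an empty page even if a token is returned, to avoid looping forever.
    token = page.notices.length > 0 ? (page.iterationNextToken ?? undefined) : undefined;
  } while (token);
}

/** Default transport: global fetch (Node 18+); undici.fetch() takes the same arguments. */
async function postTedSearch(body: Record<string, unknown>): Promise<TedSearchResponse> {
  const res = await fetch(TED_SEARCH, {
    method: "POST",
    headers: { "content-type": "application/json", accept: "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`TED search: HTTP ${res.status} ${await res.text()}`);
  return (await res.json()) as TedSearchResponse;
}
```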

TED Bulk XML ingestion:

  • Use undici.request() for streaming response
  • Pipe through tar.extract() with filter for .xml files
  • Parse each XML file with fast-xml-parser using removeNSPrefix: true, ignoreAttributes: false
  • Two adapters: one for eForms (UBL 2.3), one for legacy TED schema
  • Detect format by checking root element namespace or publication number format
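Of the two options, the root element is the safer signal. The filename conventions describe both lengths as "up to" a maximum digit count, so length alone may be ambiguous. The heuristic below assumes legacy notices use the `TED_EXPORT` root element and eForms notices use one of the UBL root elements from the SDK's schemas/ directory. The function name is hypothetical.

```typescript
type NoticeFormat = "eforms" | "legacy" | "unknown";

// Root elements of the eForms UBL document types listed in the SDK schemas/ directory.
const EFORMS_ROOTS = [
  "ContractNotice",
  "ContractAwardNotice",
  "PriorInformationNotice",
  "BusinessRegistrationInformationNotice",
];

function detectNoticeFormat(xml: string): NoticeFormat {
  // Look only at the head of the file; strip the XML declaration, processing
  // instructions and comments, then read the first element name (ignoring any prefix).
  const head = xml
    .slice(0, 4096)
    .replace(/<\?[\s\S]*?\?>/g, "")
    .replace(/<!--[\s\S]*?-->/g, "");
  const m = /<(?:[A-Za-z_][\w.-]*:)?([A-Za-z_][\w.-]*)[\s/>]/.exec(head);
  if (!m) return "unknown";
  const root = m[1];
  if (root === "TED_EXPORT") return "legacy";
  if (EFORMS_ROOTS.includes(root)) return "eforms";
  return "unknown";
}
```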

EKR Eljárástár API client:

  • Use undici.fetch() for JSON GET requests
  • Paginate via offset + limit parameters
  • Fetch procedure detail by EKR ID for full data
  • Use dictionary endpoints to populate CPV and procedure type lookup tables on startup
  • Throttle: pThrottle({limit: 1, interval: 1000})(ekrFetch)
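The EKR response envelope is undocumented. The pager below therefore leaves extraction of the item array to the caller and stops at the first short page. The fixed delay between sequential requests is a simple stand-in for the pThrottle wrapper above. The default page size of 100 is an assumption, not a known EKR limit. Names are hypothetical.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Generic offset/limit pager. `fetchPage` must return the items of one page
 * (e.g. fetch keresoUrl({ ..., offset, limit }) and pick the result array).
 * Paging stops when a page returns fewer than `limit` items.
 */
async function collectAllPages<T>(
  fetchPage: (offset: number, limit: number) => Promise<T[]>,
  limit = 100,
  delayMs = 1000,
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += limit) {
    if (offset > 0 && delayMs > 0) await sleep(delayMs); // space sequential requests
    const items = await fetchPage(offset, limit);
    all.push(...items);
    if (items.length < limit) return all;
  }
}
```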

EKR Szerződéstár API client:

  • Use undici.fetch() with apikey=PSZT query parameter
  • Paginate via offset + limit
  • Extract bruttoOsszeg, nettoOsszeg, bruttoOsszegDevizaneme, gazdasagiSzereploList
  • Link to procedures via ekrAzonosito (the EKR procedure ID in the contract detail)
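The sketch below turns a contract detail into a pricing data point. The field names are those listed in section 1.5.2, but their JSON types are assumptions, so numeric strings are tolerated. Contracts without a procedure link, a net value or a currency are skipped; whether such gaps occur is still an open question. Names other than the API fields are hypothetical.

```typescript
// Subset of Szerződéstár contract detail fields; types are assumed, not documented.
interface SzerzodesDetail {
  ekrAzonosito?: string | null;
  szerzodesTargya?: string | null;
  nettoOsszeg?: number | string | null;
  bruttoOsszeg?: number | string | null;
  bruttoOsszegDevizaneme?: string | null;
}

interface PricePoint {
  ekrId: string;
  netAmount: number;
  currency: string;
}

/** Accept numbers or numeric strings; anything else becomes null. */
function toAmount(v: number | string | null | undefined): number | null {
  if (v === null || v === undefined) return null;
  if (typeof v === "string" && v.trim() === "") return null;
  const n = typeof v === "number" ? v : Number(v);
  return Number.isFinite(n) ? n : null;
}

/** Returns null when the contract cannot be linked or lacks a usable net value or currency. */
function toPricePoint(c: SzerzodesDetail): PricePoint | null {
  const net = toAmount(c.nettoOsszeg);
  // Assumption: the currency given for the gross value also applies to the net value.
  const currency = c.bruttoOsszegDevizaneme;
  if (!c.ekrAzonosito || net === null || !currency) return null;
  return { ekrId: c.ekrAzonosito, netAmount: net, currency };
}
```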

CoRe HTML scraper:

  • Use undici.fetch() + cheerio.load() for server-rendered HTML
  • Paginate via ?oldal=N parameter (20 items/page, ~7,600 pages for full backfill)
  • Extract from list view: procedure ID, buyer, subject, type, value, signing date
  • Follow detail links for additional fields if needed
  • Throttle: pThrottle({limit: 1, interval: 1000})(coreFetch)
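Two helpers support the CoRe scraper: one builds page URLs from the `?oldal=N` pattern above, and one parses the value column. The parser assumes a number format: spaces, non-breaking spaces or dots as thousands separators, a decimal comma, and a trailing currency label such as "Ft". This is a guess about how Megkötéskori értéke is rendered and should be checked against live pages. The table itself would be read with cheerio. Names are hypothetical.

```typescript
const CORE_LIST = "https://kereso-core.kozbeszerzes.hu/lista/kozbeszerzes/";

/** URL of one list page (20 rows per page). */
function coreListPageUrl(page: number): string {
  const url = new URL(CORE_LIST);
  url.searchParams.set("oldal", String(page));
  return url.toString();
}

/** "1 234 567,50 Ft" -> 1234567.5; returns null if no number is present. */
function parseHungarianAmount(text: string): number | null {
  const cleaned = text
    .replace(/[\s.]/g, "")    // thousands separators (JS \s also matches NBSP and narrow NBSP)
    .replace(/[^\d,-]/g, "")  // drop currency labels such as "Ft" or "HUF"
    .replace(",", ".");       // decimal comma -> decimal point
  if (!/\d/.test(cleaned)) return null;
  const n = Number(cleaned);
  return Number.isFinite(n) ? n : null;
}
```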

Step 3: Unified Schema

Define a Tender zod schema covering the union of fields across all sources. Use .transform() in each adapter to normalize source-specific shapes into the common type.

Key fields for the unified Tender type:

  • id (composite: source + source-specific ID)
  • source (ted | ekr | core)
  • tedPublicationNumber (nullable)
  • ekrId (nullable)
  • noticeType (cn | can | pin | other)
  • title / subject
  • cpvCodes (array)
  • buyerName, buyerCountry, buyerType
  • estimatedValue, awardedValue, currency
  • procedureType
  • submissionDeadline
  • placeOfPerformance (NUTS code + text)
  • publicationDate
  • rawSource (original JSON/XML stored for forensic reference)
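For orientation, here is the field list above as a dependency-free TypeScript type. The pipeline itself would declare this as a zod schema and infer the type. The concrete types are assumptions: ISO strings for dates, nullable numbers for values and alpha-3 country codes. The coarse notice-type mapping is also an assumption.

```typescript
type TenderSource = "ted" | "ekr" | "core";
type NoticeType = "cn" | "can" | "pin" | "other";

interface Tender {
  id: string;                         // composite: `${source}:${sourceId}`
  source: TenderSource;
  tedPublicationNumber: string | null;
  ekrId: string | null;
  noticeType: NoticeType;
  title: string;
  cpvCodes: string[];
  buyerName: string;
  buyerCountry: string;               // ISO 3166-1 alpha-3, matching TED
  buyerType: string | null;
  estimatedValue: number | null;
  awardedValue: number | null;
  currency: string | null;
  procedureType: string | null;
  submissionDeadline: string | null;  // ISO 8601 timestamp
  placeOfPerformance: { nuts: string | null; text: string | null };
  publicationDate: string;            // ISO 8601 date
  rawSource: string;                  // original JSON/XML, kept for forensic reference
}

function tenderId(source: TenderSource, sourceId: string): string {
  return `${source}:${sourceId}`;
}

/** Map a TED notice-type code onto the coarse unified categories (assumed mapping). */
function coarseNoticeType(code: string): NoticeType {
  if (code === "can-modif") return "other"; // contract modification, not an award
  if (code.startsWith("cn-")) return "cn";
  if (code.startsWith("can-")) return "can";
  if (code.startsWith("pin-")) return "pin";
  return "other";
}
```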

Step 4: Deduplication

Above-threshold HU tenders appear in both TED and EKR. Deduplicate by:

  1. Match on EKR ID (if present in both sources)
  2. Match on TED publication number (if EKR notice references it)
  3. Fuzzy match on title + buyer + CPV + date (fallback)

Prefer EKR as the authoritative source for HU-origin tenders (more fields, Hungarian-language detail).
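For the fuzzy fallback, one simple option combines three checks: token-set (Jaccard) similarity on accent-folded text, CPV overlap on the 8-digit code, and a date window. This technique is swapped in here and is not prescribed above. The thresholds (0.6 similarity, 7 days) are placeholders to tune against known duplicate pairs. The approach works best when the TED side uses the notice's Hungarian-language text. Names are hypothetical.

```typescript
interface DedupeCandidate {
  title: string;
  buyerName: string;
  cpvCodes: string[];
  publicationDate: string; // ISO date
}

/** Lowercase, strip accents (so "ü" and "u" compare equal), split into tokens longer than 2 chars. */
function tokens(s: string): Set<string> {
  return new Set(
    s
      .toLowerCase()
      .normalize("NFD")
      .replace(/[\u0300-\u036f]/g, "")
      .split(/[^a-z0-9]+/)
      .filter((t) => t.length > 2),
  );
}

function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  return shared / (a.size + b.size - shared);
}

function daysApart(isoA: string, isoB: string): number {
  return Math.abs(Date.parse(isoA) - Date.parse(isoB)) / 86_400_000;
}

function probablySameTender(a: DedupeCandidate, b: DedupeCandidate): boolean {
  const cpvA = new Set(a.cpvCodes.map((c) => c.slice(0, 8))); // ignore check digits
  const sharedCpv = b.cpvCodes.some((c) => cpvA.has(c.slice(0, 8)));
  const sameBuyer = jaccard(tokens(a.buyerName), tokens(b.buyerName)) >= 0.6;
  const similarTitle = jaccard(tokens(a.title), tokens(b.title)) >= 0.6;
  return sharedCpv && sameBuyer && similarTitle && daysApart(a.publicationDate, b.publicationDate) <= 7;
}
```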

Step 5: Pricing Model

Build from EKR Szerződéstár + CoRe data:

  • Index awarded contracts by CPV code (primary), buyer (secondary), region (tertiary)
  • Compute per-CPV median/p25/p75 awarded values
  • Normalize to EUR using ECB daily rates (HUF tenders need conversion)
  • The awarded value, not the estimated value, is the market signal
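The sketch below computes the per-CPV statistics. Percentiles use linear interpolation between order statistics, which is one of several common definitions. Rates are assumed to be quoted as units of foreign currency per 1 EUR, which is how ECB reference rates are published (e.g., series EXR/D.HUF.EUR.SP00.A on the ECB Data Portal). For brevity the sketch takes a single rate table; a real run would use the rate for each contract's signing date. Names are hypothetical.

```typescript
/** Percentile of an ascending-sorted list, p in [0, 1], with linear interpolation. */
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("percentile of empty list");
  const pos = (sorted.length - 1) * p;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

/** Convert to EUR using a rate quoted as units of `currency` per 1 EUR. */
function toEur(amount: number, currency: string, ratePerEur: Record<string, number>): number {
  if (currency === "EUR") return amount;
  const rate = ratePerEur[currency];
  if (!rate) throw new Error(`No EUR rate for ${currency}`);
  return amount / rate;
}

interface AwardPoint { cpv: string; amount: number; currency: string }
interface CpvStats { n: number; p25: number; median: number; p75: number }

/** Group awarded values by CPV code (in EUR) and summarise each group. */
function statsByCpv(points: AwardPoint[], ratePerEur: Record<string, number>): Map<string, CpvStats> {
  const byCpv = new Map<string, number[]>();
  for (const pt of points) {
    const list = byCpv.get(pt.cpv) ?? [];
    list.push(toEur(pt.amount, pt.currency, ratePerEur));
    byCpv.set(pt.cpv, list);
  }
  const out = new Map<string, CpvStats>();
  for (const [cpv, values] of byCpv) {
    values.sort((x, y) => x - y);
    out.set(cpv, {
      n: values.length,
      p25: percentile(values, 0.25),
      median: percentile(values, 0.5),
      p75: percentile(values, 0.75),
    });
  }
  return out;
}
```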

Step 6: Scoring & Output

For each active tender, compute:

  • CPV relevance score
  • Price competitiveness (estimated value vs historical award median for same CPV)
  • MEAT weight analysis (if evaluation criteria are available from the notice)
  • Days until deadline
  • Buyer history (repeat buyer? predictable evaluation patterns?)
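The helpers below illustrate scoring for four of the inputs above: CPV relevance, price competitiveness, days until deadline and buyer history. The weights and the capping rule are hypothetical placeholders, not calibrated values. MEAT analysis is omitted because it depends on per-notice criteria.

```typescript
/** Whole days from `now` until the deadline (negative once it has passed). */
function daysUntil(deadlineIso: string, now: Date = new Date()): number {
  return Math.floor((Date.parse(deadlineIso) - now.getTime()) / 86_400_000);
}

/** Estimated value relative to the historical award median for the same CPV (1.0 = at median). */
function priceRatio(estimatedEur: number | null, medianAwardEur: number | null): number | null {
  if (estimatedEur == null || medianAwardEur == null || medianAwardEur <= 0) return null;
  return estimatedEur / medianAwardEur;
}

interface ScoreInputs {
  cpvRelevance: number;      // 0 to 1, e.g. 1 for a direct CPV-root match
  priceRatio: number | null; // from priceRatio()
  daysLeft: number;          // from daysUntil()
  repeatBuyer: boolean;
}

const WEIGHTS = { cpv: 0.5, price: 0.3, buyer: 0.2 }; // hypothetical weights

function score(i: ScoreInputs): number {
  if (i.daysLeft < 0) return 0; // deadline passed: not actionable
  // Budgets at or above the historical median get full price credit; unknown gets 0.5.
  const price = i.priceRatio == null ? 0.5 : Math.min(i.priceRatio, 1);
  return WEIGHTS.cpv * i.cpvRelevance + WEIGHTS.price * price + WEIGHTS.buyer * (i.repeatBuyer ? 1 : 0);
}
```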

Store in SQLite with FTS5 index on title, subject, buyer name for ad-hoc search.

Step 7: Scheduling

Incremental (daily):

  • TED Search API: query with publication-date >= {yesterday} for new notices
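  • EKR Eljárástár API: sort with order=MEGINDITAS_DATUMA_DESC and page until already-ingested procedures appear, since no date-filter parameter is documented for kereso. Re-fetch details of still-open procedures to pick up updates.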
  • EKR Szerződéstár API: paginate recent contracts for pricing model updates

Backfill (one-time):

  • TED Bulk XML: download monthly packages for historical data
  • EKR: paginate full procedure list (osszes=true, 114k+ records)
  • CoRe: paginate full database (~7,600 pages)

Schedule via OS cron (runs the compiled JavaScript output): 0 6 * * * cd /path/to/pipeline && node dist/ingest.js


6. Open Questions — Updated

  • Verify TED Search API v3 anonymous rate limit — No documented limit. Test and increase.
  • EKR ToS review: legal status of scraping — EKR has no robots.txt, exposes /api/public/ endpoints intentionally. Strong signal this is meant to be public.
  • CoRe export format — HTML only. No structured download or API.
  • eForms vs legacy TED schema: build the two adapters and test against sample notices from the eForms SDK examples/ directory.
  • Currency normalization: integrate ECB daily reference rates for HUF→EUR conversion. API at https://data-api.ecb.europa.eu/.
  • Storage decision — SQLite + FTS5 via better-sqlite3 + drizzle. Keep rawSource field for forensic reference of original data.
  • Pricing model partition key: start with per-CPV (broadest coverage), add per-buyer as a refinement once data volume is sufficient.
  • EKR Eljárástár API: test whether CPV filtering works via query parameter or requires post-fetch filtering.
  • Determine if EKR Szerződéstár contract detail always includes value data, or if some contracts have null values.

7. Verified URLs

| Resource | URL | Status |
|---|---|---|
| TED Search API | https://api.ted.europa.eu/v3/notices/search | Live |
| TED Swagger UI | https://api.ted.europa.eu/swagger-ui/index.html | Live |
| TED Bulk (daily) | https://ted.europa.eu/packages/daily/{YYYYNNNNN} | Live |
| TED Bulk (monthly) | https://ted.europa.eu/packages/monthly/{YYYY-M} | Live |
| TED Docs | https://docs.ted.europa.eu/api/latest/ | Live |
| TED Search Docs | https://docs.ted.europa.eu/api/latest/search.html | Live |
| TED Search Fields | https://docs.ted.europa.eu/ODS/latest/reuse/field-list.html | Live |
| TED CSV (frozen) | https://data.europa.eu/data/datasets/ted-csv?locale=en | Live (data ends 2023) |
| eForms SDK | https://github.com/OP-TED/eForms-SDK | Active (v1.14.2) |
| TED XML Converter | https://github.com/OP-TED/ted-xml-data-converter | Active |
| EKR Homepage | https://ekr.gov.hu/portal/kezdolap | Live |
| EKR Eljárástár | https://ekr.gov.hu/eljarastar/ | Live |
| EKR Procedure API | https://ekr.gov.hu/eljarastar/api/public/kereso | Live |
| EKR Dictionary API | https://ekr.gov.hu/eljarastar/api/public/szotar/{name} | Live |
| EKR Szerződéstár | https://ekr.gov.hu/ekr-szerzodestar/hu/szerzodesLista | Live |
| EKR Contract API | https://ekr.gov.hu/ekr-szerzodestar/api/szerzodesapi/1.0/szerzodesek?apikey=PSZT | Live |
| KH Homepage | https://www.kozbeszerzes.hu/ | Live |
| KH Értesítő Search | https://www.kozbeszerzes.hu/adatbazis/keres/hirdetmeny/ | Live (robots.txt blocked) |
| Publikus CoRe | https://kereso-core.kozbeszerzes.hu | Live |
| Publikus KBA | https://kba.kozbeszerzes.hu/apex/f?p=103:35 | Live (legacy) |
| CPV Reference | https://simap.ted.europa.eu/web/simap/cpv | Live |
| 424/2017 Korm. rendelet | https://net.jogtar.hu/jogszabaly?docid=a1700424.kor | Live |