EU & HU Tech Tender Pipeline — Implementation Guide
Companion to source-strategy-spec
This document validates every data source, API, URL, schema, and library choice from the original spec. All claims were verified against live systems on 2026-04-26. Where the spec was wrong or incomplete, corrections are noted.
1. Data Source Validation
1.1 TED Search API v3
Status: Live, no auth required for published notices.
Endpoint: POST https://api.ted.europa.eu/v3/notices/search
Request format: JSON body:
{
"query": "classification-cpv IN (72000000) AND buyer-country IN (HUN) AND publication-date >= 20260101",
"fields": ["publication-number", "buyer-name", "notice-type", "estimated-value"],
"limit": 250,
"page": 1,
"paginationMode": "ITERATION"
}
Response format: JSON with totalNoticeCount, iterationNextToken, notices[]. Each notice includes links.xml, links.pdf, links.html keyed by 3-letter language codes (HUN, ENG, DEU).
Spec correction: country codes
The TED API requires ISO 3166-1 alpha-3 codes. Use HUN, not HU. Using HU returns an explicit error: "Value 'HU' is not supported for search field 'buyer-country'".
Pagination:
- PAGE_NUMBER mode: capped at 15,000 total results, 250 per page.
- ITERATION mode: no total limit. Returns an opaque iterationNextToken for cursor-based pagination. Use this mode for bulk ingestion.
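A minimal sketch of cursor-based ingestion in ITERATION mode follows. It rests on two assumptions that are not confirmed above: that the cursor is sent back in the request body under the same iterationNextToken key the response uses, and that a missing token or an empty page marks the end. The postJson transport is a hypothetical injected function (for example a thin wrapper around undici.fetch), which keeps the loop testable without network access.

```typescript
// Response fields named in this guide; everything else is ignored.
interface TedSearchResponse {
  totalNoticeCount: number;
  iterationNextToken?: string | null;
  notices: Array<Record<string, unknown>>;
}

// Hypothetical transport: POSTs a JSON body and resolves with the parsed JSON.
type PostJson = (url: string, body: unknown) => Promise<TedSearchResponse>;

const TED_SEARCH_URL = "https://api.ted.europa.eu/v3/notices/search";

async function* iterateNotices(
  postJson: PostJson,
  query: string,
  fields: string[],
  limit = 250,
): AsyncGenerator<Record<string, unknown>> {
  let token: string | undefined;
  while (true) {
    const body: Record<string, unknown> = {
      query,
      fields,
      limit,
      paginationMode: "ITERATION",
    };
    // Assumption: the cursor goes back under the key the response uses.
    if (token) body.iterationNextToken = token;

    const page = await postJson(TED_SEARCH_URL, body);
    yield* page.notices;

    // Assumption: no token (or an empty page) means the result set is exhausted.
    if (!page.iterationNextToken || page.notices.length === 0) return;
    token = page.iterationNextToken;
  }
}
```

Callers can consume the generator with for await and stop early without fetching the remaining pages.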
Rate limits: No documented rate limit. No rate-limit headers observed in responses. Test conservatively and increase.
Swagger UI: https://api.ted.europa.eu/swagger-ui/index.html (the /swagger path redirects here). The OpenAPI spec is NOT downloadable as a static JSON file — it loads dynamically in the Swagger UI.
Available sub-APIs (all under same base URL):
- Search API
- Publication API
- Validation API
- Visualisation API (renders notices as HTML/PDF)
- Conversion API (legacy TED XML → eForms)
- Developer Operations API
Key search fields:
| Field | Description | Notes |
|---|---|---|
| classification-cpv | CPV codes | Supports wildcards: 72* |
| buyer-country | Buyer country | 3-letter ISO only (HUN, DEU, FRA) |
| publication-date | Publication date | Format YYYYMMDD |
| notice-type | Notice type code | cn-standard, can-standard, can-modif |
| form-type | Form type | competition, result, dir-awa-pre, cont-modif |
| procedure-type | Procedure type code | BT-105 |
| estimated-value-* | Estimated value | With currency variants |
| tender-value | Awarded tender value | |
| buyer-name | Buyer organization name | |
| deadline-receipt-tender-date-lot | Submission deadline | |
Concrete query examples:
# All HU IT tenders (all time)
classification-cpv IN (72000000) AND buyer-country IN (HUN)
# HU IT tenders since 2026
classification-cpv IN (72000000) AND buyer-country IN (HUN) AND publication-date >= 20260101
# Broad tech CPV wildcard
classification-cpv IN (72*) AND buyer-country IN (HUN)
# Exclusion pattern
classification-cpv IN (72000000) AND classification-cpv NOT IN (72800* 72900*)
# Contract award notices only
form-type IN (result) AND classification-cpv IN (72000000) AND buyer-country IN (HUN)
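The query strings above follow a regular pattern, so they can be assembled from structured options. The sketch below is one possible builder, not an official client: the TedQueryOptions shape and the function name are hypothetical, and it only covers the clauses shown in the examples (space-separated value lists joined with AND).

```typescript
// Hypothetical options object covering only the clauses used in this guide.
interface TedQueryOptions {
  cpv: string[];           // e.g. ["72000000"] or ["72*"]
  excludeCpv?: string[];   // e.g. ["72800*", "72900*"]
  countries?: string[];    // ISO 3166-1 alpha-3, e.g. ["HUN"]
  formTypes?: string[];    // e.g. ["result"]
  publishedSince?: string; // YYYYMMDD
}

function buildTedQuery(o: TedQueryOptions): string {
  const clauses: string[] = [];

  if (o.formTypes?.length) {
    clauses.push(`form-type IN (${o.formTypes.join(" ")})`);
  }
  clauses.push(`classification-cpv IN (${o.cpv.join(" ")})`);
  if (o.excludeCpv?.length) {
    clauses.push(`classification-cpv NOT IN (${o.excludeCpv.join(" ")})`);
  }
  if (o.countries?.length) {
    // Fail early on alpha-2 codes, which the API rejects.
    for (const c of o.countries) {
      if (!/^[A-Z]{3}$/.test(c)) {
        throw new Error(`buyer-country needs a 3-letter code, got '${c}'`);
      }
    }
    clauses.push(`buyer-country IN (${o.countries.join(" ")})`);
  }
  if (o.publishedSince) {
    if (!/^\d{8}$/.test(o.publishedSince)) {
      throw new Error(`publication-date must be YYYYMMDD, got '${o.publishedSince}'`);
    }
    clauses.push(`publication-date >= ${o.publishedSince}`);
  }
  return clauses.join(" AND ");
}
```

For instance, buildTedQuery({ cpv: ["72000000"], countries: ["HUN"], publishedSince: "20260101" }) reproduces the "HU IT tenders since 2026" query.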
1.2 TED Bulk XML Packages
Status: Live, no authentication required. Served via CloudFront CDN.
Daily packages:
- URL: https://ted.europa.eu/packages/daily/{YYYYNNNNN}
- Example: https://ted.europa.eu/packages/daily/202400001 → 20240102_2024001.tar.gz (~9 MB)
- The ID is the year followed by the five-digit, zero-padded OJ S issue number for that day
Monthly packages:
- URL: https://ted.europa.eu/packages/monthly/{YYYY-M}
- Example: https://ted.europa.eu/packages/monthly/2024-1 → 2024-01.tar.gz (~278 MB)
Format: .tar.gz containing individual XML files.
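The two URL patterns can be generated rather than hand-written. This sketch assumes the daily ID is the year followed by a five-digit, zero-padded issue number (as in the 202400001 example) and that the monthly ID uses an unpadded month (as in 2024-1). The function names are illustrative.

```typescript
const TED_PACKAGES = "https://ted.europa.eu/packages";

// Daily package: 4-digit year + 5-digit zero-padded OJ S issue number.
function dailyPackageUrl(year: number, ojsIssue: number): string {
  return `${TED_PACKAGES}/daily/${year}${String(ojsIssue).padStart(5, "0")}`;
}

// Monthly package: year and month without zero padding, e.g. 2024-1.
function monthlyPackageUrl(year: number, month: number): string {
  if (!Number.isInteger(month) || month < 1 || month > 12) {
    throw new RangeError(`month must be 1..12, got ${month}`);
  }
  return `${TED_PACKAGES}/monthly/${year}-${month}`;
}

// All monthly package URLs in an inclusive [year, month] range, for backfill.
function monthlyPackageUrls(from: [number, number], to: [number, number]): string[] {
  const urls: string[] = [];
  let [y, m] = from;
  while (y < to[0] || (y === to[0] && m <= to[1])) {
    urls.push(monthlyPackageUrl(y, m));
    m += 1;
    if (m > 12) {
      m = 1;
      y += 1;
    }
  }
  return urls;
}
```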
XML filename convention inside archives:
- eForms notices: up to 8-digit number + year (e.g., 00654321_2022.xml)
- Legacy notices: up to 6-digit number + year (e.g., 123456_2022.xml)
Spec correction: publication numbers
The 8-digit vs 6-digit distinction applies to XML filenames in bulk archives, NOT to publication numbers in the API. The API normalizes all publication numbers to {number}-{year} format (e.g., 587662-2024), regardless of era. Leading zeros are stripped.
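To join bulk files with API results, the archive filename can be mapped to the API's publication-number form. The helper below is a sketch that assumes the number in the filename is the same notice number the API reports, just zero-padded. The function name is hypothetical.

```typescript
// Convert a bulk-archive filename such as "00654321_2022.xml" into the
// API-style publication number "654321-2022". Returns null for other files.
function publicationNumberFromFilename(filename: string): string | null {
  const m = /^(\d{1,8})_(\d{4})\.xml$/.exec(filename);
  if (!m) return null;
  // Number() drops leading zeros, matching the API's normalization.
  return `${Number(m[1])}-${m[2]}`;
}
```

Using the normalized value as the join key lets eForms-era and legacy files be treated uniformly.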
Download page: https://ted.europa.eu/en/simap/xml-bulk-download (the spec’s https://ted.europa.eu/en/help/data-reuse returns HTTP 202 — it’s a JS-rendered SPA that may not load reliably).
1.3 TED CSV Dataset
Spec correction: CSV dataset is frozen
The CSV dataset at https://data.europa.eu/data/datasets/ted-csv?locale=en (redirected from the old EUODP URL) covers 2006-01-01 to 2023-12-31 only. Last updated 2024-01-25. It does NOT cover eForms-era notices. Do not use for current data. Only useful for historical backfill of pre-2024 notices in a flat format.
The old URL https://data.europa.eu/euodp/en/data/dataset/ted-csv permanently redirects to the new path.
1.4 EU Procurement Thresholds (2026–2027)
Spec correction: thresholds updated
The 2026–2027 thresholds (effective Jan 1, 2026) are lower than the spec’s figures, per Delegated Regulations (EU) 2025/2150, 2025/2151, 2025/2152.
| Buyer type | Goods/Services (spec) | Goods/Services (actual 2026) | Works |
|---|---|---|---|
| Central authorities | ~€143k | €140,000 | €5,404,000 |
| Sub-central authorities | ~€221k | €216,000 | €5,404,000 |
| Utilities | ~€443k | €432,000 | €5,404,000 |
| Concessions | — | — | €5,404,000 |
| Social services (Annex XIV) | — | €750,000 (unchanged) | — |
All major thresholds dropped 2–3% due to EUR/SDR exchange rate shifts under WTO GPA.
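Because the thresholds decide whether a Hungarian tender should also appear on TED, it can help to encode the table. The sketch below hard-codes the 2026 goods/services and works figures from the table above. The type names are illustrative, and the comparison assumes the estimated value is already in EUR and net of VAT.

```typescript
type BuyerType = "central" | "subCentral" | "utilities";
type ContractKind = "goodsServices" | "works";

// 2026-2027 thresholds in EUR, copied from the table above.
const THRESHOLDS_2026_EUR: Record<BuyerType, Record<ContractKind, number>> = {
  central: { goodsServices: 140000, works: 5404000 },
  subCentral: { goodsServices: 216000, works: 5404000 },
  utilities: { goodsServices: 432000, works: 5404000 },
};

// A procurement is above threshold when its estimated value is equal to
// or greater than the applicable figure.
function isAboveEuThreshold(
  buyer: BuyerType,
  kind: ContractKind,
  estimatedValueEur: number,
): boolean {
  return estimatedValueEur >= THRESHOLDS_2026_EUR[buyer][kind];
}

// Relative drop versus the spec's approximate figure, in percent.
function dropPercent(specValue: number, actualValue: number): number {
  return ((specValue - actualValue) / specValue) * 100;
}
```

As a check on the 2–3% claim, dropPercent(143000, 140000) is roughly 2.1 and dropPercent(443000, 432000) roughly 2.5.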
1.5 EKR — Elektronikus Közbeszerzési Rendszer
Homepage: https://ekr.gov.hu/ (redirects to https://ekr.gov.hu/portal/kezdolap)
Architecture: Angular SPA (server HTML is just a loading spinner). Backend v3.17.7, Frontend v3.17.8.
Major finding: EKR has a full public REST API
The spec planned for HTML scraping, but EKR exposes a rich, undocumented JSON REST API at /eljarastar/api/public/. No authentication required. This is far more efficient and reliable than scraping.
1.5.1 EKR Eljárástár (Procedure Registry) API
Base: https://ekr.gov.hu/eljarastar/api/public/
Search endpoint: GET /eljarastar/api/public/kereso
| Parameter | Description |
|---|---|
| kifejezes | Keyword search |
| aktualis | Include current procedures (boolean) |
| jovobeli | Include future procedures (boolean) |
| lezart | Include closed procedures (boolean) |
| osszes | Include all procedures (boolean) |
| offset | Pagination offset |
| limit | Page size |
| order | Sort order (e.g., MEGINDITAS_DATUMA_DESC) |
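The parameters above map directly onto a query string. The following sketch builds search and detail URLs. Since the API is undocumented, the serialization of booleans as true/false is an assumption, and the function names are hypothetical.

```typescript
const EKR_PUBLIC_BASE = "https://ekr.gov.hu/eljarastar/api/public";

// Parameters from the table above; all optional.
interface EkrSearchParams {
  kifejezes?: string;
  aktualis?: boolean;
  jovobeli?: boolean;
  lezart?: boolean;
  osszes?: boolean;
  offset?: number;
  limit?: number;
  order?: string;
}

function ekrSearchUrl(params: EkrSearchParams): string {
  const qs = Object.entries(params)
    .filter(([, v]) => v !== undefined)
    // Assumption: booleans are sent as the literal strings "true"/"false".
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(String(v))}`)
    .join("&");
  return qs ? `${EKR_PUBLIC_BASE}/kereso?${qs}` : `${EKR_PUBLIC_BASE}/kereso`;
}

function ekrProcedureUrl(ekrId: string): string {
  return `${EKR_PUBLIC_BASE}/eljaras/${encodeURIComponent(ekrId)}`;
}
```

A paginated crawl would call ekrSearchUrl repeatedly with an increasing offset and a fixed limit.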
Procedure detail: GET /eljarastar/api/public/eljaras/{EKR_ID}
Returns full JSON: eljarasTargy, eljarasAzonosito, eljarasrend, eljarasTipus, foCpvKod, foTargy, eljarasSzakasz, meginditasDatum, ajanlatteteliHatarido, ajanlatkeroList (with full org details), hirdetmenyList, dokumentumList, alkalmassagiKovetelmenyReszletek.
Dictionary/lookup endpoints: GET /eljarastar/api/public/szotar/{DICT_NAME}
Available dictionaries:
- CPV_KODOK: 9,454 CPV codes
- ELJARAS_TIPUS: 65 procedure types across Uniós/Nemzeti/EPK/Beszerzési categories
- ELJARAS_SZAKASZ: procedure phases
- NUTS_KODOK, PENZNEM, SZERZODES_TIPUS
- eforms-buyer-legal-type, authority-activity, selection-criterion, eu-programme
- strategic-procurement, gpp-criteria, environmental-impact, social-objective
Database size: ~114,313 total procedures; ~4,300 active/future at any time.
robots.txt: https://ekr.gov.hu/robots.txt returns 404 — no restrictions.
1.5.2 EKR Szerződéstár (Contract Registry) API
Major finding: the pricing "gold mine" has an API
The spec identified Publikus CoRe as the primary source for price benchmarking. EKR’s own contract registry (Szerződéstár) also has a public API with awarded values, and it links contracts directly to their EKR procedure IDs.
Base: https://ekr.gov.hu/ekr-szerzodestar/api/szerzodesapi/1.0/
List contracts: GET /ekr-szerzodestar/api/szerzodesapi/1.0/szerzodesek?offset=0&limit=10&apikey=PSZT
Contract detail: GET /ekr-szerzodestar/api/szerzodesapi/1.0/szerzodes/{ID}?apikey=PSZT
The API key is PSZT (likely "Publikus Szerződéstár"). It is hard-coded in the frontend, so this is an intentionally public API.
Contract detail fields:
- ekrAzonosito: EKR ID (links to procedure)
- szerzodesTargya: contract subject
- bruttoOsszeg: gross value
- nettoOsszeg: net value
- bruttoOsszegDevizaneme: currency
- hatalyossagKezdete/Vege: validity period
- teljesitesHatarideje: performance deadline
- tipusaNev: contract type
- uniosForrasbolFinanszirozott: EU-funded flag
- gazdasagiSzereploList: economic operators (name, tax number, SME status, full address)
- szerzodesModositasList: contract modifications
- teljesitesList: performance records
- alvallalkozoList: subcontractors
- eljaras: linked procedure details
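A sketch of how the contract endpoints and fields might be wrapped for the pricing model. The value types in EkrContractDetail are assumptions (the guide lists field names only), as is the use of bruttoOsszegDevizaneme as the currency of the net amount. PricePoint and the function names are hypothetical.

```typescript
const SZT_BASE = "https://ekr.gov.hu/ekr-szerzodestar/api/szerzodesapi/1.0";
const SZT_API_KEY = "PSZT"; // public key hard-coded in the EKR frontend

function contractListUrl(offset: number, limit: number): string {
  return `${SZT_BASE}/szerzodesek?offset=${offset}&limit=${limit}&apikey=${SZT_API_KEY}`;
}

function contractDetailUrl(id: string | number): string {
  return `${SZT_BASE}/szerzodes/${encodeURIComponent(String(id))}?apikey=${SZT_API_KEY}`;
}

// Subset of the detail fields listed above; value types are assumptions.
interface EkrContractDetail {
  ekrAzonosito: string;
  szerzodesTargya?: string | null;
  nettoOsszeg?: number | null;
  bruttoOsszeg?: number | null;
  bruttoOsszegDevizaneme?: string | null;
}

// Hypothetical record consumed by the pricing model.
interface PricePoint {
  ekrId: string;
  netValue: number;
  currency: string;
}

// Returns null when the contract carries no usable net value or currency,
// so callers can count how often value data is missing.
function toPricePoint(c: EkrContractDetail): PricePoint | null {
  if (typeof c.nettoOsszeg !== "number" || !c.bruttoOsszegDevizaneme) return null;
  return {
    ekrId: c.ekrAzonosito,
    netValue: c.nettoOsszeg,
    currency: c.bruttoOsszegDevizaneme,
  };
}
```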
Database size: 135,299 contracts.
Web UI: https://ekr.gov.hu/ekr-szerzodestar/hu/szerzodesLista
1.5.3 EKR Hirdetményfigyelő (Notice Watcher)
Available at https://ekr.gov.hu/portal/hirdetmenyfigyelo but requires login. The Eljárástár search page mentions the feature is accessible from within the Eljárástár. This is a user-facing alert feature, not a data endpoint. For programmatic monitoring, use the Eljárástár API with date filters.
1.6 Közbeszerzési Hatóság (kozbeszerzes.hu)
Homepage: https://www.kozbeszerzes.hu/ — live, server-rendered HTML (Django-based).
1.6.1 Közbeszerzési Értesítő (Notice Gazette)
Search URL: https://www.kozbeszerzes.hu/adatbazis/keres/hirdetmeny/
robots.txt blocks scraping
https://www.kozbeszerzes.hu/robots.txt explicitly disallows:
Disallow: /adatbazis/keres/hirdetmeny/
Disallow: /adatbazis/megtekint/hirdetmeny/
Disallow: /ertesito/
Both the search page and notice detail pages are blocked. Do not scrape these paths. Use EKR APIs and TED instead.
Available search fields (for reference, not for scraping): CPV kód, Ajánlatkérő neve, Nyertes ajánlattevő, Teljesítés helye, Beszerzés tárgya, Eljárás fajtája, Ajánlattételi/részvételi határidő, Közzététel dátuma, Iktatószám, Ajánlatkérő típusa, Ajánlatkérő tevékenységi köre, Hirdetmény típusa, Lapszám.
1.6.2 Publikus CoRe (Contract Registry)
URL: https://kereso-core.kozbeszerzes.hu (subdomain, NOT a path under www)
CoRe is on a different subdomain
The www.kozbeszerzes.hu robots.txt does NOT apply to kereso-core.kozbeszerzes.hu. CoRe has no robots.txt of its own. Server-rendered HTML, no SPA.
Database size: 152,460 contracts.
Search modes:
- Simple: https://kereso-core.kozbeszerzes.hu/kereses/kozbeszerzes/egyszeru/
- Advanced: https://kereso-core.kozbeszerzes.hu/kereses/kozbeszerzes/osszetett/
- Full database: https://kereso-core.kozbeszerzes.hu/lista/kozbeszerzes/
Pagination: ?oldal=N, 20 items per page, 7,623 pages.
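The page count follows from the database size and page size, and the page URLs can be generated up front. This sketch assumes page numbering starts at 1, which the guide does not state. The names are illustrative.

```typescript
const CORE_LIST_URL = "https://kereso-core.kozbeszerzes.hu/lista/kozbeszerzes/";
const CORE_PAGE_SIZE = 20;

// Number of list pages needed to cover a given number of contracts.
function corePageCount(totalContracts: number): number {
  return Math.ceil(totalContracts / CORE_PAGE_SIZE);
}

// URL of one list page. Assumption: ?oldal= is 1-based.
function corePageUrl(page: number): string {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  return `${CORE_LIST_URL}?oldal=${page}`;
}
```

With 152,460 contracts, corePageCount gives 7,623 pages, matching the figure above.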
Advanced search fields: Szövegben szereplő szavak, Szerződés típusa, Szerződés jellege, Szerződés tárgya, Ajánlatkérő neve, Eljárás tárgya, Szerződés státusza, Megkötéskori érték (range), Aláírás dátuma (range).
Table columns: Eljárás azonosítója (Procedure ID), Ajánlatkérő, Szerződés tárgya, Típusa, Megkötéskori értéke, Aláírás dátuma.
No export/API. HTML scraping with cheerio is the only option. Value data may only appear in detail pages, not in list view.
CoRe vs EKR Szerződéstár overlap
CoRe has 152,460 contracts (includes pre-EKR era). EKR Szerződéstár has 135,299 (post-2018 only). For historical depth, use CoRe. For structured API access with linked procedure IDs, use EKR Szerződéstár. Consider using both: EKR API as primary, CoRe HTML scraping for pre-2018 backfill.
1.6.3 Publikus KBA (Legacy Archive)
URL: https://kba.kozbeszerzes.hu/apex/f?p=103:35
Oracle APEX application. Still live. Covers pre-2018 procedures. Public user access. Low priority for the pipeline.
1.7 Legal References
| Reference | URL | Status |
|---|---|---|
| 424/2017. Korm. rendelet | https://net.jogtar.hu/jogszabaly?docid=a1700424.kor | Live, full text available |
| Also at | https://uj.jogtar.hu/#doc/db/1/id/a1700424.kor | Paid Jogtar version |
2. eForms SDK & Schema Mapping
2.1 eForms SDK
Repo: https://github.com/OP-TED/eForms-SDK — active, 84 stars, 40 forks.
Latest stable: v1.14.2 (2026-03-02). Maintenance release v1.13.3 on 2026-04-15. Pre-release v2.0.0-alpha.2 on 2026-03-26 (major EFX rewrite, not production-ready).
License: CC-BY-4.0.
Contents:
| Directory | Content |
|---|---|
| schemas/ | UBL 2.3 XSDs: ContractNotice, ContractAwardNotice, PriorInformationNotice, BusinessRegistrationInformationNotice |
| codelists/ | Genericode (.gc) files: CPV, NUTS, countries, currencies, procedure types, notice types |
| fields/ | fields.json: metadata for all 286 unique business terms (1,256 total field entries) |
| notice-types/ | 51 notice subtype definitions (subtypes 1–40, E1–E6, T01–T02, X01–X02, CEI) |
| examples/ | Sample eForms notices with SVRL validation reports |
| schematrons/ | Schematron validation rules |
| efx-grammar/ | ANTLR4 grammar for EFX expression language |
| view-templates/ | Visualisation templates |
| translations/ | Multilingual label translations |
Related repos:
- https://github.com/OP-TED/ted-xml-data-converter (XSLT converter from legacy TED R2.0.9 to eForms)
- Non-convertible elements: https://github.com/OP-TED/ted-xml-data-converter/blob/main/ted-elements-not-convertible.md
2.2 Key Business Terms for the Pipeline
| Data point | BT ID | eForms XPath (abbreviated) |
|---|---|---|
| Main CPV code | BT-262-Lot | .../MainCommodityClassification/ItemClassificationCode |
| Additional CPV | BT-263-Lot | .../AdditionalCommodityClassification/ItemClassificationCode |
| Estimated value | BT-27-Lot | .../RequestedTenderTotal/EstimatedOverallContractAmount |
| Awarded tender value | BT-720-Tender | .../LotTender/LegalMonetaryTotal/... |
| Notice total value | BT-161-NoticeResult | .../NoticeResult/TotalAmount |
| Procedure type | BT-105-Procedure | .../TenderingProcess/ProcedureCode |
| Submission deadline (date) | BT-131(d)-Lot | .../TenderSubmissionDeadlinePeriod/EndDate |
| Submission deadline (time) | BT-131(t)-Lot | .../TenderSubmissionDeadlinePeriod/EndTime |
| Place of performance (NUTS) | BT-5071-Lot | .../RealizedLocation/Address/CountrySubentityCode |
| Place of performance (country) | BT-5141-Lot | .../RealizedLocation/Address/Country/IdentificationCode |
| Notice type | BT-02-notice | /*/cbc:NoticeTypeCode |
| Buyer org name | BT-500-Organization-Company | .../PartyName/Name |
| Buyer org ID | BT-501-Organization-Company | .../PartyIdentification/ID |
| Buyer activity | BT-10-Procedure-Buyer | .../ContractingActivity/ActivityTypeCode |
| Buyer legal type | BT-11-Procedure-Buyer | .../ContractingPartyType/PartyTypeCode |
Design notes
- CPV codes appear at three levels: Procedure, Part, and Lot. Check all three, prioritize Lot.
- Values appear at multiple levels (Lot, Procedure, NoticeResult). Currency is a separate companion field.
- Deadline is split into date + time fields.
- Organizations are in a flat list under efac:Organizations; buyers are linked via OPT-300-Procedure-Buyer referencing org technical IDs.
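Since the deadline arrives as separate date and time values, the adapter has to merge them into one timestamp. The sketch below assumes the eForms lexical forms are an xsd:date and xsd:time with optional UTC offsets (for example 2024-03-15+01:00 and 12:00:00+01:00). The function name is hypothetical.

```typescript
// Merge BT-131(d) and BT-131(t) into one ISO 8601 timestamp string.
// Assumption: inputs look like "YYYY-MM-DD[offset]" and "hh:mm:ss[offset]".
function combineDeadline(endDate: string, endTime: string): string {
  const d = /^(\d{4}-\d{2}-\d{2})(Z|[+-]\d{2}:\d{2})?$/.exec(endDate.trim());
  if (!d) throw new Error(`Unrecognized deadline date: ${endDate}`);
  const t = /^(\d{2}:\d{2}:\d{2})(Z|[+-]\d{2}:\d{2})?$/.exec(endTime.trim());
  if (!t) throw new Error(`Unrecognized deadline time: ${endTime}`);

  // Prefer the offset attached to the time; fall back to the date's offset.
  const offset = t[2] ?? d[2] ?? "";
  return `${d[1]}T${t[1]}${offset}`;
}
```

The result can be passed to new Date(...) for comparisons. Keeping the original offset in the stored string preserves the buyer's local time.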
2.3 Legacy TED vs eForms — Mapping for the Unified Tender Type
| Aspect | Legacy TED XML | eForms (UBL 2.3) |
|---|---|---|
| Schema basis | Custom TED XSD, form-based (F01–F25) | UBL 2.3 Pre-Award documents |
| Lot structure | Lots embedded in OBJECT_CONTRACT | First-class ProcurementProjectLot |
| CPV codes | CPV_MAIN/CPV_CODE @CODE | MainCommodityClassification/ItemClassificationCode via BT-262 |
| Values | VAL_TOTAL, VAL_ESTIMATED_TOTAL, VAL_RANGE_TOTAL | BT-27 (estimated), BT-720 (tender), BT-161 (notice total) |
| Organizations | Inline in CONTRACTING_BODY, CONTRACTOR | Centralized efac:Organizations list with cross-references |
| Notice types | Form number: F01=PIN, F02=CN, F03=CAN | NoticeTypeCode + 51 subtypes |
| Procedure type | PROCEDURE/PT_* elements (PT_OPEN, PT_RESTRICTED) | Coded value in ProcedureCode |
| Deadline | DT_DATE_FOR_SUBMISSION (single field) | Split: BT-131(d) + BT-131(t) |
| Place of performance | NUTS/@CODE + MAIN_SITE | NUTS (BT-5071) + country (BT-5141) + city (BT-5131) |
Fields in legacy but NOT in eForms:
VAL_RANGE_TOTAL/HIGH+LOW, DATE_AWARD_SCHEDULED, REFERENCE_NUMBER (object-level), URL_NATIONAL_PROCEDURE, CA_TYPE_OTHER/CA_ACTIVITY_OTHER (free text alternatives).
Fields in eForms but NOT in legacy: Framework re-estimated values, buyer review/complaint tracking, foreign subsidy regulation (FSR), green/social/innovative procurement indicators, Clean Vehicles Directive fields, detailed subcontracting fields.
3. CPV Code Reference
3.1 Validation of Spec CPV Codes
All five CPV roots from the spec are confirmed present in the eForms SDK cpv.gc codelist (9,454 entries, CPV 2008 edition):
| Code | Official label | Hierarchy level |
|---|---|---|
| 72000000 | IT services: consulting, software development, Internet and support | Division root |
| 48000000 | Software package and information systems | Division root |
| 30200000 | Computer equipment and supplies | Group (parent: 30000000) |
| 32000000 | Radio, television, communication, telecommunication and related equipment | Division root |
| 51610000 | Installation services of computers and information-processing equipment | Class (parent: 51600000) |
Hierarchy note
30200000 and 51610000 are NOT division roots. Searches must use exact-match or prefix-match at the appropriate level: they will not automatically capture sibling codes under 30000000 or 51600000.
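Prefix matching at the right level can be derived from the code itself: trailing zeros mark unspecified levels, and a division is never shorter than two digits. The sketch below applies that rule. The function names are illustrative, and it is a local post-filter, not a description of how the TED API evaluates wildcards.

```typescript
// Significant prefix of a CPV code: "72000000" -> "72", "30200000" -> "302",
// "51610000" -> "5161". A division always keeps at least two digits.
function cpvPrefix(code: string): string {
  const digits = code.slice(0, 8); // ignore an optional "-N" check digit
  if (!/^\d{8}$/.test(digits)) throw new Error(`Not a CPV code: ${code}`);
  const trimmed = digits.replace(/0+$/, "");
  return trimmed.length >= 2 ? trimmed : digits.slice(0, 2);
}

// True when `code` is the root itself or one of its descendants.
function cpvMatches(code: string, root: string): boolean {
  return code.startsWith(cpvPrefix(root));
}

// The five roots validated above.
const SPEC_ROOTS = ["72000000", "48000000", "30200000", "32000000", "51610000"];

function isRelevantCpv(code: string): boolean {
  return SPEC_ROOTS.some((root) => cpvMatches(code, root));
}
```

With these roots, 30213000 matches (it sits under 30200000) while the sibling group 30100000 does not.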
3.2 Recommended Additional CPV Codes
| Code | Label | Rationale |
|---|---|---|
| 50300000 | Repair/maintenance of PCs, office equipment, telecom equipment | IT support/maintenance contracts |
| 64200000 | Telecommunications services | Service contracts (distinct from equipment in 32xxx) |
Optional broader scope: 73000000 (R&D services) if research-oriented tenders are desired.
3.3 CPV Subdivision Trees
72000000 — IT services:
| Code | Label |
|---|---|
| 72100000 | Hardware consultancy services |
| 72200000 | Software programming and consultancy services |
| 72300000 | Data services |
| 72400000 | Internet services |
| 72500000 | Computer-related services |
| 72600000 | Computer support and consultancy services |
| 72700000 | Computer network services |
| 72800000 | Computer audit and testing services |
| 72900000 | Computer back-up and catalogue conversion services |
48000000 — Software packages:
| Code | Label |
|---|---|
| 48100000 | Industry specific software package |
| 48200000 | Networking, Internet and intranet software package |
| 48300000 | Document creation, drawing, imaging, scheduling and productivity software |
| 48400000 | Business transaction and personal business software |
| 48500000 | Communication and multimedia software package |
| 48600000 | Database and operating software package |
| 48700000 | Software package utilities |
| 48800000 | Information systems and servers |
| 48900000 | Miscellaneous software package and computer systems |
30200000 — Computer equipment:
| Code | Label |
|---|---|
| 30210000 | Data-processing machines (hardware) |
| 30220000 | Digital cartography equipment |
| 30230000 | Computer-related equipment |
32000000 — Radio/TV/telecom equipment:
| Code | Label |
|---|---|
| 32200000 | Transmission apparatus for radiotelephony, radio broadcasting, TV |
| 32300000 | TV/radio receivers, sound/video recording apparatus |
| 32400000 | Networks |
| 32500000 | Telecommunications equipment and supplies |
4. Revised Pipeline Architecture
graph TD
    subgraph Sources
        TED["TED Search API v3<br/>(JSON, ITERATION mode)"]
        BULK["TED Bulk XML<br/>(monthly tar.gz backfill)"]
        EKR_P["EKR Eljárástár API<br/>(JSON, /api/public/)"]
        EKR_C["EKR Szerződéstár API<br/>(JSON, apikey=PSZT)"]
        CORE["CoRe HTML scrape<br/>(kereso-core.kozbeszerzes.hu)"]
    end
    subgraph "Parse & Normalize"
        PARSE_TED["Parse TED JSON/XML<br/>fast-xml-parser"]
        PARSE_EKR["Parse EKR JSON"]
        PARSE_CORE["Parse CoRe HTML<br/>cheerio"]
        NORM["Normalize → Tender type<br/>zod .transform()"]
    end
    subgraph "Enrich & Store"
        DEDUP["Dedupe by tender ID<br/>(EKR ID ↔ TED publication-number)"]
        PRICE["Enrich: pricing model<br/>(from EKR Szerződéstár + CoRe)"]
        SCORE["Score & filter<br/>(CPV match, margin viability)"]
        DB["SQLite + FTS5<br/>(better-sqlite3 + drizzle)"]
    end
    TED --> PARSE_TED
    BULK --> PARSE_TED
    EKR_P --> PARSE_EKR
    EKR_C --> PARSE_EKR
    CORE --> PARSE_CORE
    PARSE_TED --> NORM
    PARSE_EKR --> NORM
    PARSE_CORE --> NORM
    NORM --> DEDUP
    DEDUP --> PRICE
    PRICE --> SCORE
    SCORE --> DB
Key architecture change from original spec
The original spec planned for 3 sources (TED API, EKR scrape, CoRe scrape). The validated architecture uses 5 sources: TED Search API, TED Bulk XML, EKR Eljárástár API, EKR Szerződéstár API, and CoRe HTML scrape. EKR is now API-driven (no HTML scraping). CoRe remains HTML-only but is supplementary — most pricing data is available via the EKR Szerződéstár API.
5. Implementation Steps
Step 1: Project Setup
tender-pipeline/
├── src/
│   ├── sources/
│   │   ├── ted-api.ts            # TED Search API client
│   │   ├── ted-bulk.ts           # TED bulk XML ingestion
│   │   ├── ekr-procedures.ts     # EKR Eljárástár API client
│   │   ├── ekr-contracts.ts      # EKR Szerződéstár API client
│   │   └── core-scraper.ts       # CoRe HTML scraper
│   ├── schema/
│   │   ├── tender.ts             # Unified Tender zod schema
│   │   ├── contract.ts           # Unified Contract zod schema
│   │   └── adapters/
│   │       ├── ted-eforms.ts     # eForms XML → Tender
│   │       ├── ted-legacy.ts     # Legacy TED XML → Tender
│   │       ├── ekr.ts            # EKR JSON → Tender
│   │       └── core.ts           # CoRe HTML → Contract
│   ├── pipeline/
│   │   ├── stage.ts              # Stage<TIn, TOut> interface
│   │   ├── runner.ts             # Pipeline composition and execution
│   │   ├── normalize.ts          # Normalization stage
│   │   ├── dedupe.ts             # Deduplication stage
│   │   ├── enrich-pricing.ts     # Price enrichment stage
│   │   └── score.ts              # Scoring/filtering stage
│   ├── db/
│   │   ├── schema.ts             # Drizzle schema definition
│   │   ├── migrations/           # Drizzle migrations
│   │   └── queries.ts            # Common query helpers
│   └── ingest.ts                 # Entry point (daily run)
├── drizzle.config.ts
├── package.json
└── tsconfig.json
Dependencies:
| Package | Version | Purpose |
|---|---|---|
| undici | 8.1.0 | HTTP client (streaming, cookies, retry) |
| fast-xml-parser | 5.7.2 | XML parsing (TED notices) |
| cheerio | 1.2.0 | HTML scraping (CoRe) |
| p-throttle | 8.1.0 | Per-domain rate limiting |
| p-queue | 9.1.2 | Concurrency control |
| tar | 7.5.13 | Bulk XML extraction from tar.gz |
| zod | 4.3.6 | Schema validation + type inference |
| better-sqlite3 | 12.9.0 | SQLite database |
| drizzle-orm | 0.45.2 | Type-safe ORM |
| drizzle-kit | 0.31.10 | Schema migrations |
| node-cron | 4.2.1 | Scheduling (if not using OS cron) |
Step 2: Source Clients
TED Search API client:
- Use undici.fetch() for JSON POST requests
- ITERATION mode pagination (no 15k cap)
- Separate throttled fetcher: pThrottle({limit: 5, interval: 1000})(tedFetch)
- Query builder for Expert Search syntax
- Fields to request: publication-number, notice-type, form-type, buyer-name, all value fields, CPV, deadline, place of performance
TED Bulk XML ingestion:
- Use undici.request() for streaming response
- Pipe through tar.extract() with filter for .xml files
- Parse each XML file with fast-xml-parser using removeNSPrefix: true, ignoreAttributes: false
- Two adapters: one for eForms (UBL 2.3), one for legacy TED schema
- Detect format by checking root element namespace or publication number format
EKR Eljárástár API client:
- Use undici.fetch() for JSON GET requests
- Paginate via offset + limit parameters
- Fetch procedure detail by EKR ID for full data
- Use dictionary endpoints to populate CPV and procedure type lookup tables on startup
- Throttle: pThrottle({limit: 1, interval: 1000})(ekrFetch)
EKR Szerződéstár API client:
- Use undici.fetch() with apikey=PSZT query parameter
- Paginate via offset + limit
- Extract bruttoOsszeg, nettoOsszeg, bruttoOsszegDevizaneme, gazdasagiSzereploList
- Link to procedures via eljarasEkrAzonosito
CoRe HTML scraper:
- Use undici.fetch() + cheerio.load() for server-rendered HTML
- Paginate via ?oldal=N parameter (20 items/page, ~7,600 pages for full backfill)
- Extract from list view: procedure ID, buyer, subject, type, value, signing date
- Follow detail links for additional fields if needed
- Throttle: pThrottle({limit: 1, interval: 1000})(coreFetch)
Step 3: Unified Schema
Define a Tender zod schema covering the union of fields across all sources. Use .transform() in each adapter to normalize source-specific shapes into the common type.
Key fields for the unified Tender type:
- id (composite: source + source-specific ID)
- source (ted | ekr | core)
- tedPublicationNumber (nullable)
- ekrId (nullable)
- noticeType (cn | can | pin | other)
- title / subject
- cpvCodes (array)
- buyerName, buyerCountry, buyerType
- estimatedValue, awardedValue, currency
- procedureType
- submissionDeadline
- placeOfPerformance (NUTS code + text)
- publicationDate
- rawSource (original JSON/XML stored for forensic reference)
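The field list above can be written down as a type, with one adapter per source. The sketch below uses a plain TypeScript interface as a dependency-free stand-in for the zod schema and shows a hypothetical EKR adapter. Several mappings are assumptions: eljarasAzonosito as the EKR ID, meginditasDatum as the publication date, string-typed values throughout, HUN as buyer country, and placeOfPerformance flattened to a single string. Fields whose EKR shape is not described in this guide are left null.

```typescript
type TenderSource = "ted" | "ekr" | "core";
type NoticeKind = "cn" | "can" | "pin" | "other";

interface Tender {
  id: string;
  source: TenderSource;
  tedPublicationNumber: string | null;
  ekrId: string | null;
  noticeType: NoticeKind;
  title: string;
  cpvCodes: string[];
  buyerName: string | null;
  buyerCountry: string | null;
  buyerType: string | null;
  estimatedValue: number | null;
  awardedValue: number | null;
  currency: string | null;
  procedureType: string | null;
  submissionDeadline: string | null;
  placeOfPerformance: string | null;
  publicationDate: string | null;
  rawSource: unknown;
}

// Read a string property defensively; anything else becomes null.
function str(raw: Record<string, unknown>, key: string): string | null {
  const v = raw[key];
  return typeof v === "string" && v.length > 0 ? v : null;
}

// Hypothetical adapter from an EKR procedure-detail object to Tender.
function tenderFromEkr(raw: Record<string, unknown>): Tender {
  const ekrId = str(raw, "eljarasAzonosito"); // assumption: this is the EKR ID
  if (ekrId === null) throw new Error("EKR record has no eljarasAzonosito");
  const cpv = str(raw, "foCpvKod");
  return {
    id: `ekr:${ekrId}`,
    source: "ekr",
    tedPublicationNumber: null,
    ekrId,
    noticeType: "other", // not derivable from the fields used here
    title: str(raw, "eljarasTargy") ?? "",
    cpvCodes: cpv ? [cpv] : [],
    buyerName: null, // ajanlatkeroList item shape is not described in this guide
    buyerCountry: "HUN", // assumption: EKR procedures have Hungarian buyers
    buyerType: null,
    estimatedValue: null,
    awardedValue: null,
    currency: null,
    procedureType: str(raw, "eljarasTipus"),
    submissionDeadline: str(raw, "ajanlatteteliHatarido"),
    placeOfPerformance: null,
    publicationDate: str(raw, "meginditasDatum"), // assumption: launch date
    rawSource: raw,
  };
}
```

Keeping rawSource on every record means the null fields can be backfilled later without re-fetching.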
Step 4: Deduplication
Above-threshold HU tenders appear in both TED and EKR. Deduplicate by:
- Match on EKR ID (if present in both sources)
- Match on TED publication number (if EKR notice references it)
- Fuzzy match on title + buyer + CPV + date (fallback)
Prefer EKR as the authoritative source for HU-origin tenders (more fields, Hungarian-language detail).
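A sketch of the exact-match part of this logic: records sharing an EKR ID or a TED publication number collapse into one, and an EKR-sourced record wins. The fuzzy fallback is deliberately left out, and the DedupeCandidate shape is a hypothetical subset of the unified Tender type. One known limitation is noted in the comments.

```typescript
interface DedupeCandidate {
  id: string;
  source: "ted" | "ekr" | "core";
  ekrId: string | null;
  tedPublicationNumber: string | null;
}

function dedupeTenders<T extends DedupeCandidate>(tenders: T[]): T[] {
  const groupOfKey = new Map<string, number>(); // match key -> index in winners
  const winners: T[] = [];

  for (const t of tenders) {
    const keys: string[] = [];
    if (t.ekrId) keys.push(`ekr:${t.ekrId}`);
    if (t.tedPublicationNumber) keys.push(`ted:${t.tedPublicationNumber}`);

    // First existing group that shares any key with this record.
    const hit = keys
      .map((k) => groupOfKey.get(k))
      .find((g) => g !== undefined);

    if (hit === undefined) {
      winners.push(t);
      for (const k of keys) groupOfKey.set(k, winners.length - 1);
    } else {
      // Prefer the EKR record as the authoritative version.
      if (winners[hit].source !== "ekr" && t.source === "ekr") winners[hit] = t;
      // Limitation: if the keys pointed at two different groups, those
      // groups are not merged here; a union-find would handle that.
      for (const k of keys) groupOfKey.set(k, hit);
    }
  }
  return winners;
}
```

Records with neither key are always kept, so they remain available for the fuzzy pass.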
Step 5: Pricing Model
Build from EKR Szerződéstár + CoRe data:
- Index awarded contracts by CPV code (primary), buyer (secondary), region (tertiary)
- Compute per-CPV median/p25/p75 awarded values
- Normalize to EUR using ECB daily rates (HUF tenders need conversion)
- The awarded value, not the estimated value, is the market signal
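The per-CPV statistics can be computed with a small helper once awarded values are normalized to EUR. The sketch below uses linear interpolation between closest ranks for percentiles, which is one common convention and an assumption here. Award and PriceBand are hypothetical shapes, and currency conversion is assumed to have happened upstream.

```typescript
interface Award {
  cpv: string;
  valueEur: number; // awarded value, already converted to EUR
}

interface PriceBand {
  n: number;
  p25: number;
  median: number;
  p75: number;
}

// Percentile of an ascending-sorted array, with linear interpolation.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("percentile of empty array");
  const rank = (sorted.length - 1) * p;
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Group awards by CPV code and compute p25 / median / p75 per group.
function priceBands(awards: Award[]): Map<string, PriceBand> {
  const byCpv = new Map<string, number[]>();
  for (const a of awards) {
    const list = byCpv.get(a.cpv) ?? [];
    list.push(a.valueEur);
    byCpv.set(a.cpv, list);
  }
  const bands = new Map<string, PriceBand>();
  byCpv.forEach((values, cpv) => {
    const sorted = [...values].sort((x, y) => x - y);
    bands.set(cpv, {
      n: sorted.length,
      p25: percentile(sorted, 0.25),
      median: percentile(sorted, 0.5),
      p75: percentile(sorted, 0.75),
    });
  });
  return bands;
}
```

Storing n alongside the band lets the scoring stage ignore CPV codes with too few awards to be meaningful.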
Step 6: Scoring & Output
For each active tender, compute:
- CPV relevance score
- Price competitiveness (estimated value vs historical award median for same CPV)
- MEAT weight analysis (if evaluation criteria are available from the notice)
- Days until deadline
- Buyer history (repeat buyer? predictable evaluation patterns?)
Store in SQLite with FTS5 index on title, subject, buyer name for ad-hoc search.
Step 7: Scheduling
Incremental (daily):
- TED Search API: query with publication-date >= {yesterday} for new notices
- EKR Eljárástár API: query with date filter for new/updated procedures
- EKR Szerződéstár API: paginate recent contracts for pricing model updates
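For the daily TED run, the {yesterday} placeholder has to be rendered in the API's YYYYMMDD format. The sketch below does this in UTC, which is an assumption: the guide does not say which time zone TED uses for publication dates. The function names are illustrative.

```typescript
// Format a Date as YYYYMMDD using its UTC calendar date.
function tedDate(d: Date): string {
  const y = d.getUTCFullYear();
  const m = String(d.getUTCMonth() + 1).padStart(2, "0");
  const day = String(d.getUTCDate()).padStart(2, "0");
  return `${y}${m}${day}`;
}

// Append a "published since yesterday" clause to an existing query.
function incrementalQuery(baseQuery: string, now: Date): string {
  const yesterday = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return `${baseQuery} AND publication-date >= ${tedDate(yesterday)}`;
}
```

Passing now explicitly, rather than calling new Date() inside, keeps the function deterministic and makes reruns for a past day straightforward.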
Backfill (one-time):
- TED Bulk XML: download monthly packages for historical data
- EKR: paginate full procedure list (osszes=true, 114k+ records)
- CoRe: paginate full database (~7,600 pages)
Schedule via OS cron: 0 6 * * * cd /path/to/pipeline && node dist/ingest.js
6. Open Questions — Updated
- Verify TED Search API v3 anonymous rate limit: no documented limit. Test and increase.
- EKR ToS review (legal status of scraping): EKR has no robots.txt and exposes /api/public/ endpoints intentionally. Strong signal this is meant to be public.
- CoRe export format: HTML only. No structured download or API.
- eForms vs legacy TED schema: build the two adapters and test against sample notices from the eForms SDK examples/ directory.
- Currency normalization: integrate ECB daily reference rates for HUF→EUR conversion. API at https://data-api.ecb.europa.eu/.
- Storage decision: SQLite + FTS5 via better-sqlite3 + drizzle. Keep rawSource field for forensic reference of original data.
- Pricing model partition key: start with per-CPV (broadest coverage), add per-buyer as a refinement once data volume is sufficient.
- EKR Eljárástár API: test whether CPV filtering works via query parameter or requires post-fetch filtering.
- Determine if EKR Szerződéstár contract detail always includes value data, or if some contracts have null values.
7. Verified URLs
| Resource | URL | Status |
|---|---|---|
| TED Search API | https://api.ted.europa.eu/v3/notices/search | Live |
| TED Swagger UI | https://api.ted.europa.eu/swagger-ui/index.html | Live |
| TED Bulk (daily) | https://ted.europa.eu/packages/daily/{YYYYNNNNN} | Live |
| TED Bulk (monthly) | https://ted.europa.eu/packages/monthly/{YYYY-M} | Live |
| TED Docs | https://docs.ted.europa.eu/api/latest/ | Live |
| TED Search Docs | https://docs.ted.europa.eu/api/latest/search.html | Live |
| TED Search Fields | https://docs.ted.europa.eu/ODS/latest/reuse/field-list.html | Live |
| TED CSV (frozen) | https://data.europa.eu/data/datasets/ted-csv?locale=en | Live (data ends 2023) |
| eForms SDK | https://github.com/OP-TED/eForms-SDK | Active (v1.14.2) |
| TED XML Converter | https://github.com/OP-TED/ted-xml-data-converter | Active |
| EKR Homepage | https://ekr.gov.hu/portal/kezdolap | Live |
| EKR Eljárástár | https://ekr.gov.hu/eljarastar/ | Live |
| EKR Procedure API | https://ekr.gov.hu/eljarastar/api/public/kereso | Live |
| EKR Dictionary API | https://ekr.gov.hu/eljarastar/api/public/szotar/{name} | Live |
| EKR Szerződéstár | https://ekr.gov.hu/ekr-szerzodestar/hu/szerzodesLista | Live |
| EKR Contract API | https://ekr.gov.hu/ekr-szerzodestar/api/szerzodesapi/1.0/szerzodesek?apikey=PSZT | Live |
| KH Homepage | https://www.kozbeszerzes.hu/ | Live |
| KH Értesítő Search | https://www.kozbeszerzes.hu/adatbazis/keres/hirdetmeny/ | Live (robots.txt blocked) |
| Publikus CoRe | https://kereso-core.kozbeszerzes.hu | Live |
| Publikus KBA | https://kba.kozbeszerzes.hu/apex/f?p=103:35 | Live (legacy) |
| CPV Reference | https://simap.ted.europa.eu/web/simap/cpv | Live |
| 424/2017 Korm. rendelet | https://net.jogtar.hu/jogszabaly?docid=a1700424.kor | Live |