Performance CSV ingest fix

The K8s CronJob ingest of MonthlyPerformance.csv (and the same-shape Quarterly variant) was failing on every run with a Prisma createMany validation error. Three independent root causes were found and fixed in one session.

Scope

Affected datasources: monthlyPerformanceDs and quarterlyPerformanceDs (see GSR Transformer). The fix touches the Prisma schema, the targets config in src/targets.ts, and the coercion path in src/datasource/tracked-ds.ts.

Symptoms

  • Prisma createMany rejects a batch of rows.
  • Error message: “Provided Date object is invalid. Expected Date.” with the offending row inlined.
  • Affects both monthlyPerformanceDs and quarterlyPerformanceDs (same-shape CSV).
  • Reproduces every CronJob run for any file containing - placeholders or trailing blank rows.

Root Causes

Three independent bugs combining into the same symptom. Each is fixable in isolation.

1. oneMonth field mistyped as DateTime

prisma/schema.prisma had oneMonth DateTime? on both monthlyPerformanceDs and quarterlyPerformanceDs. The CSV’s “1 Month” column is a percentage (e.g. “1.23%” or ”-”), never a date. The header mapping in src/targets.ts matched: "oneMonth": ["1 Month", "DateTime?"].

The smoking gun was the response schema at src/targets.ts:8:

const performancePeriodSchema = {
  type: 'object',
  properties: {
    oneMonth: { type: 'string', nullable: true },   // disagreed with parser/DB
    threeMonth: { type: 'number', nullable: true },
    ...
  }
}

The response schema correctly said string, but the parser config and DB schema said DateTime?. Likely a copy-paste artifact from the adjacent date column.

Fix: Change oneMonth to String? in both the Prisma schema and the header mapping; ship as the first migration 20260504165350_fix_one_month_string.

2. Phantom empty row from PapaParse

Papa.parse was being called with no options:

const parsed = Papa.parse<string[]>(file)

The production CSV ends with a trailing line of all-empty fields:

,,,,,,,,,,,

That is 12 empty fields but non-zero character count (11 commas), so PapaParse processes it as a row of empty strings. With non-nullable required fields like fundName and fundTicker, this row fed empty strings, NaN, and Invalid Date directly to Prisma createMany.

PapaParse skipEmptyLines gotcha

skipEmptyLines: true only skips zero-character lines (e.g. a literal blank line). To skip rows that contain only commas (“all-empty fields”), you must pass skipEmptyLines: ‘greedy’. The string literal ‘greedy’ is required — a boolean is not enough.

Fix:

const parsed = Papa.parse<string[]>(file, { skipEmptyLines: 'greedy' })

3. Dash placeholder produced NaN / Invalid Date

The CSV uses - as a missing-data sentinel. The old inline parser at src/datasource/tracked-ds.ts:117 did:

  • Float/Int columns: Number("-".replace("%", "")) returns NaN
  • DateTime columns: new Date("-") returns Invalid Date (a Date object whose .getTime() is NaN)

Both are silently accepted by JavaScript and only surface when Prisma validates the input.

Fix: Extract a private coerceCell() method that normalises both "" and ”-” to null when the field is nullable, and throws a descriptive error for required fields:

private coerceCell(rawValue: string | undefined, mapping: string, headerName: string, headerType: HeaderType, rowIndex: number) {
    const isNullable = headerType.endsWith("?")
    const type = headerType.replace("?", "")
    const trimmed = rawValue?.trim() ?? ""
    if (trimmed === "" || trimmed === "-") {
        if (isNullable) return null
        throw new Error(`[${this.tablenameAndId}] empty value for required field "${mapping}" (CSV column "${headerName}") at row ${rowIndex}`)
    }
    if (type == "Int" || type == "Float") return Number(trimmed.replace("%", ""))
    if (type == "DateTime") return new Date(trimmed)
    if (type == "Boolean") return ["Y", "y", "1"].includes(trimmed)
    return trimmed
}

Files Touched

  • /Users/levander/coding/scharge/bankofus/prisma/schema.prisma — oneMonth: DateTime? to String? on both monthlyPerformanceDs and quarterlyPerformanceDs
  • /Users/levander/coding/scharge/bankofus/prisma/migrations/20260504165350_fix_one_month_string/migration.sql — new (first migration in the project)
  • /Users/levander/coding/scharge/bankofus/prisma/migrations/migration_lock.toml — new
  • /Users/levander/coding/scharge/bankofus/src/datasource/tracked-ds.ts — added coerceCell(); Papa.parse now passes { skipEmptyLines: ‘greedy’ }
  • /Users/levander/coding/scharge/bankofus/src/targets.ts — oneMonth header mapping changed from DateTime? to String?
  • /Users/levander/coding/scharge/bankofus/scripts/verify-perf-fix.ts — new, ad-hoc verification script

Key Learnings

PapaParse skipEmptyLines values

  • false (default) — keep all lines, even blank ones
  • true — skip lines with zero characters only
  • ‘greedy’ — also skip lines that contain only whitespace and field separators (e.g. ,,,,,,,,,,,)

Bit a real production CSV. Prefer ‘greedy’ for any external data feed where trailing blank rows are plausible.

Don't trust schema field types — validate them against actual data shape

A field marked DateTime? that actually holds a percentage will silently work in tests until the first non-numeric placeholder arrives. The disagreement between the response schema (type: ‘string’) and the DB / parser config (DateTime?) was the smoking gun in this bug.

JavaScript silent NaN failure modes

Three different ways the same coercion can produce silent bad data:

  • Number("") returns 0 (a valid finite number that gets stored)
  • Number(”-”) returns NaN
  • Number(“abc”) returns NaN

Worth a Number.isFinite() guard at every numeric coercion boundary.

new Date with garbage strings

new Date("") and new Date(”-”) both return an Invalid Date — a real Date object whose .getTime() is NaN. instanceof Date is true. The only safe check is !Number.isNaN(d.getTime()).

Prisma 7’s validation error is helpful: “Provided Date object is invalid. Expected Date.” with the offending row inlined.

First Prisma migration after db push

The repo’s prisma/migrations/ directory was empty before this fix — schema had been applied via db push. Adding a real migration requires:

  • creating migration_lock.toml (Prisma generates this)
  • considering baselining strategy: either an initial “baseline” migration capturing the existing schema, or prisma migrate resolve --applied <name> against an existing DB. Otherwise migrate deploy will try to apply changes that already exist.

Latent Issues (flagged, not fixed)

Flagged for a future session. None block the current ingest, but each is a sharp edge.

  • coerceCell still passes silent NaN / Invalid Date for non-dash garbage values. If a numeric column contains “abc” or a date column contains “not a date”, coerceCell returns NaN / Invalid Date and Prisma will reject the batch with a less helpful error.
  • findIndex returning -1 for a missing CSV column produces a misleading “empty value for required field” error from coerceCell instead of “CSV column not found”. Worth checking the index before reading.
  • parsed.errors from PapaParse is never inspected. Field-count mismatches (a row with too many or too few commas) are silently dropped or padded.
  • deletePrevious: true is non-atomic. A mid-batch throw inside createMany leaves stale data without inserting new records — left in an inconsistent state for the current run. Wrap createMany + deleteMany in a transaction.
  • Duplicate await loadEnv() calls — once at top level in src/targets.ts:75, once inside track() in src/datasource/tracked-ds.ts:110. The second is redundant; the first runs at module init.

Architecture Notes (worth preserving)

  • TrackedTabularDataSource (src/datasource/tracked-ds.ts) is the central CSV to Prisma writer. It does header lookup by name (positional via findIndex), so any upstream column rename silently breaks parsing for that column.
  • The header mapping shape is Record<DBField, [CSVHeader, HeaderType]> where HeaderType is one of {Int, Float, DateTime, String, Boolean, Int?, …}. The trailing ? is parsed at runtime via endsWith(”?”).
  • dynDispatchPrismaCall is the reflection escape hatch that lets TrackedTabularDataSource work generically across Prisma models (since the model name is dynamic at runtime).
  • File-hash dedup at tracked-ds.ts:120-128: if any row in the target table already has the file’s SHA-256 hash, ingest is skipped. Same-content re-ingest is intentionally a no-op.