P0 — Service Worker NavigationRoute Locks All Users Out of CF Access
Production outage on wrcm.levandor.io: every user (admin web + iOS PWA) saw a full-screen Authentication Error: No CF Access token available, persisting across browser refresh. Root cause was the custom service worker introduced for push-notifications work intercepting every navigation and serving cached /index.html instead of letting Cloudflare Access run its cookie-refresh redirect dance. Fixed in cd8394f by removing the Workbox NavigationRoute + adding immediate SW takeover + adding an in-app “Reset session and reload” recovery button.
Severity P0
All users locked out of the admin CRM and iOS PWA. Refresh did not recover (service workers persist across refresh). Recovery required DevTools (unregister SW + clear caches) until the fix shipped.
For Agents
If you are ever asked to add a custom service worker (Workbox, vite-plugin-pwa injectManifest, etc.) to this codebase, read this note first. ZTNA + navigation interception is a permanent trap; the prior research at iOS PWA Gotchas correctly flagged “SW must NOT cache user-scoped API responses” but missed NavigationRoute entirely. The push-notifications spec must explicitly forbid NavigationRoute (or whitelist CF Access endpoints) before being implemented.
Symptoms
- Full-screen error: Authentication Error: No CF Access token available
- Every user affected: admin web AND iOS PWA standalone
- Refresh did NOT clear the error
- New tab to wrcm.levandor.io reproduced immediately
- Onset was gradual over ~24h post-deploy, not instantaneous (see Detection timeline)
Timeline
| When | Event |
|---|---|
| ~24h before report | Commit ac5f429 build(pwa): switch to injectManifest with custom sw.ts (precache parity) deployed |
| Throughout the day | Users with fresh CF_Authorization cookies kept working — only those whose cookies naturally expired hit the lockout |
| Report time | “the website is broken now too it shows the same error there too even after refresh” / “our previous impl fucked something up” |
| Investigation | Initial misdirection toward recent centralized notifications work (red herring) |
| Diagnosis | web/src/sw.ts NavigationRoute(createHandlerBoundToURL('/index.html')) identified |
| Fix shipped | Commit cd8394f on master |
Root Cause
The custom service worker introduced in commit ac5f429 (build(pwa): switch to injectManifest with custom sw.ts (precache parity)) registered a Workbox NavigationRoute bound to /index.html:
// web/src/sw.ts (offending code)
import { createHandlerBoundToURL } from 'workbox-precaching'
import { NavigationRoute, registerRoute } from 'workbox-routing'
registerRoute(new NavigationRoute(createHandlerBoundToURL('/index.html')))

This intercepts every page navigation (initial loads, refreshes, new-tab opens) and serves cached /index.html from the Workbox precache instead of going to the network.
Why this is catastrophic behind Cloudflare Access ZTNA
Cloudflare Access expires its CF_Authorization cookie periodically. When the cookie is stale, CF Access responds to a navigation request with a 302 to the IdP login page so the browser can re-authenticate and receive a fresh cookie. This is the entire mechanism by which long-lived sessions stay alive on a ZTNA-protected origin.
With the SW intercepting navigations:
1. Browser issues navigation request
2. SW returns cached /index.html immediately; the request never leaves the device
3. CF Access never sees the request → cannot issue the 302 → cookie is never refreshed
4. SPA boots, cf-access.ts:79 calls getCfAccessToken(), finds no valid CF_Authorization in document.cookie
5. SPA throws Authentication Error: No CF Access token available
6. User refreshes → step 2 again, forever
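The loop can be made concrete with a toy state model. This is only an illustrative sketch: the type and function names below are hypothetical and do not exist in the codebase, and the model reduces CF Access to a single "cookie valid" flag.

```typescript
// Toy model of the lockout loop. All names are illustrative assumptions.
type Outcome = "app-loads" | "redirect-to-login" | "auth-error";

interface BrowserState {
  cookieValid: boolean;            // is CF_Authorization still fresh?
  swInterceptsNavigation: boolean; // is a NavigationRoute-style handler active?
}

// Models one navigation: initial load, refresh, or new tab.
function navigate(state: BrowserState): { outcome: Outcome; next: BrowserState } {
  if (!state.swInterceptsNavigation) {
    // The request reaches the edge. A stale cookie triggers the 302 to the
    // IdP, and the user returns with a fresh cookie.
    if (!state.cookieValid) {
      return { outcome: "redirect-to-login", next: { ...state, cookieValid: true } };
    }
    return { outcome: "app-loads", next: state };
  }
  // The SW answers from the precache. The edge never sees the request,
  // so the cookie state cannot change.
  return { outcome: state.cookieValid ? "app-loads" : "auth-error", next: state };
}

// Repeats the navigation n times, as a user mashing refresh would.
function refreshTimes(state: BrowserState, n: number): Outcome[] {
  const outcomes: Outcome[] = [];
  let current = state;
  for (let i = 0; i < n; i++) {
    const step = navigate(current);
    outcomes.push(step.outcome);
    current = step.next;
  }
  return outcomes;
}
```

With an intercepting SW and a stale cookie, every refresh yields the same auth error because no transition in the model can flip the cookie back to valid. Without interception, the first navigation redirects to login and the second loads the app.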
Why refresh did not recover
Service workers persist across browser refreshes by design (that’s the entire point of offline-capable PWAs). Once the SW is registered, only unregistering the SW + clearing caches (or shipping a new SW that takes over) breaks the loop.
Why it took ~24h to manifest
Users whose CF_Authorization cookie was fresh at deploy time kept working — the SPA’s call to getCfAccessToken() succeeded because the cookie was still valid. The lockout only kicked in for each user the first time their cookie naturally expired after the deploy. As cookies expired throughout the day, more users hit the wall. This staged onset misleads incident triage — symptoms look like “something is gradually getting worse” rather than “we shipped a bug.”
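A small worked example shows why the curve looks gradual. The expiry times used in the usage note are invented for illustration; this note does not record the actual CF Access session duration or when individual users last authenticated.

```typescript
// Hypothetical helper: counts users whose CF_Authorization cookie has expired
// by a given time after the deploy. Each such user is locked out the next
// time they navigate, so this is an upper bound on visible lockouts.
function lockedOutUsers(
  cookieExpiriesHoursAfterDeploy: number[],
  hoursSinceDeploy: number,
): number {
  return cookieExpiriesHoursAfterDeploy.filter(
    (expiry) => expiry <= hoursSinceDeploy,
  ).length;
}
```

For six users whose cookies expire 2, 5, 9, 14, 20 and 23 hours after the deploy, the helper returns 0 at deploy time, 3 after 10 hours and 6 after 24 hours: the same bug, surfacing one user at a time.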
Misleading framing during triage
The user (correctly) suspected the recent centralized notifications work and migrations. Those were a red herring — the notifications-system migrations did not touch auth. The real culprit was the earlier, seemingly-innocuous push-notifications SW commit (ac5f429). Lesson: when an auth outage follows a deploy window with multiple changes, audit every change in the window — especially anything touching service workers, edge config, or middleware — not just the most recent or most suspicious-looking one.
Fix
Commit cd8394f on master. Three changes:
1. Remove NavigationRoute from web/src/sw.ts
Stop intercepting navigations. Let every navigation hit the network so CF Access can run its 302/cookie-refresh dance unimpeded. Workbox precaching for static assets is fine and was kept — only NavigationRoute is the trap.
2. Immediate SW takeover
// web/src/sw.ts
self.addEventListener('install', () => self.skipWaiting())
self.addEventListener('activate', (event) => event.waitUntil(self.clients.claim()))

Without these, users who already have the broken SW installed would have to close all tabs to pick up the new SW. With skipWaiting + clients.claim, the new SW activates on the very next page load and takes over existing clients.
3. In-app recovery: “Reset session and reload”
Added a button to the AuthError screen in web/src/lib/supabase.tsx that:
- Calls navigator.serviceWorker.getRegistrations() and unregisters all SWs
- Iterates caches.keys() and deletes every cache
- Reloads the page
This unblocks users already trapped behind the broken SW without requiring DevTools. This is now the canonical recovery pattern for any future ZTNA-vs-SW issue.
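A minimal sketch of what such a reset handler could look like follows. It is not the code in web/src/lib/supabase.tsx: the function name, the ResetDeps interface and the dependency-injection shape are assumptions made so the logic can be exercised outside a browser.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface RegistrationLike {
  unregister(): Promise<boolean>;
}

// Hypothetical seam around the browser APIs the button relies on.
interface ResetDeps {
  getRegistrations(): Promise<readonly RegistrationLike[]>;
  cacheKeys(): Promise<string[]>;
  deleteCache(name: string): Promise<boolean>;
  reload(): void;
}

async function resetSessionAndReload(deps: ResetDeps): Promise<void> {
  // 1. Unregister every service worker registered for this origin.
  const registrations = await deps.getRegistrations();
  await Promise.all(registrations.map((registration) => registration.unregister()));

  // 2. Delete every Cache Storage entry, including the Workbox precache.
  const names = await deps.cacheKeys();
  await Promise.all(names.map((name) => deps.deleteCache(name)));

  // 3. Reload so the next navigation goes to the network.
  deps.reload();
}

// In a browser the dependencies would be wired to the real APIs, for example:
//   resetSessionAndReload({
//     getRegistrations: () => navigator.serviceWorker.getRegistrations(),
//     cacheKeys: () => caches.keys(),
//     deleteCache: (name) => caches.delete(name),
//     reload: () => window.location.reload(),
//   })
```

Reloading only after both cleanup steps have resolved matters: reloading earlier would let the still-registered SW answer the navigation from its cache again.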
Lessons & Gotchas
ZTNA + custom service worker is a permanent trap
Any navigation interception breaks CF Access’s cookie-refresh redirect dance. If NavigationRoute is ever reintroduced, it MUST whitelist requests to CF Access endpoints AND pass through 30x responses unmodified. Easier rule: don’t use NavigationRoute on this origin, period. Workbox precaching of static assets is fine; that’s not the trap.
Service workers persist across refresh — design for recovery
Refresh is the user’s first instinct. It does not help with SW-induced bugs. Every codebase shipping a custom SW should expose an in-app “reset session” button that unregisters SWs + clears caches. We now have one in web/src/lib/supabase.tsx. Keep it. Do not remove it as part of any future cleanup.
Always include skipWaiting + clients.claim for fix-ship SWs
When shipping a fix to a broken SW, include self.skipWaiting() on install and self.clients.claim() on activate so the new SW activates immediately rather than waiting for all tabs to close. Without this, recovery requires manual DevTools intervention from each affected user.
Stale doc: web/CLAUDE.md says Supabase has no JWT auth
web/CLAUDE.md still claims “Supabase client factory (anon key, no JWT auth)” but the JWT exchange via the cf-access-auth Edge Function was reintroduced sometime after commit 769ba5c. The current security note also reflects the older anon-only model. Both docs need a separate update pass; that is out of scope for this incident note but flagged for follow-up.
Workbox precaching is fine; only NavigationRoute is the trap
The general Workbox approach (precache manifest, runtime caching for static assets) is compatible with CF Access ZTNA. The specific issue is NavigationRoute + createHandlerBoundToURL, which by design intercepts navigations. If push notifications need a custom SW, build one without NavigationRoute (use precacheAndRoute for assets only).
Implications for Push Notifications Work
The push-notifications research at push-notifications is the work that needed injectManifest + a custom web/src/sw.ts. The research correctly flagged the CF Access ZTNA + iOS standalone cookie spike as a planning blocker but did not anticipate NavigationRoute would also break navigation cookie refresh on every platform.
Before resuming push-notifications implementation:
- Hard rule: the push SW must NOT register NavigationRoute. Add this as an explicit “don’t” in the plan.
- Test plan addition: every PR touching web/src/sw.ts must include manual verification that an expired CF_Authorization cookie still triggers the CF Access login redirect. This cannot be unit-tested; it requires a real ZTNA round trip.
- Keep the recovery button: the “Reset session and reload” button stays. Future SW changes should consider it part of the contract.
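The hard rule could additionally be backed by a cheap static check in CI. The sketch below is a hypothetical guard, not something that exists in the repo: the function names and the list of forbidden identifiers are assumptions, and the comment stripping is deliberately crude. It complements the manual ZTNA verification and does not replace it.

```typescript
// Identifiers that should never appear in the service worker source.
const FORBIDDEN_SW_APIS: readonly string[] = [
  "NavigationRoute",
  "createHandlerBoundToURL",
];

// Crude comment removal so a warning comment that names the API does not
// trip the guard. It may also strip "//" inside string literals, which is
// harmless here because it only ever removes text.
function stripComments(source: string): string {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/\/\/.*$/gm, "");
}

// Returns the forbidden identifiers found in the given source text.
function findForbiddenSwApis(source: string): string[] {
  const code = stripComments(source);
  return FORBIDDEN_SW_APIS.filter((name) =>
    new RegExp(`\\b${name}\\b`).test(code),
  );
}
```

A CI step would read web/src/sw.ts, pass its contents to findForbiddenSwApis, and fail the build if the returned array is non-empty.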
How to Verify Going Forward
When changing anything in web/src/sw.ts or web/vite.config.ts PWA settings:
- Deploy to a CF Access-protected preview environment.
- Manually expire the CF_Authorization cookie in DevTools (Application → Cookies → delete CF_Authorization).
- Refresh. The expected behaviour is: browser redirects to CF Access IdP login, user re-authenticates, fresh cookie issued, SPA loads.
- Failure mode: if the SPA loads with Authentication Error: No CF Access token available, the SW is intercepting navigations again. STOP and audit the SW source.
Related
- security: Auth architecture (CF Access ZTNA + Supabase). May need an update re: JWT exchange via cf-access-auth Edge Function.
- push-notifications: Original research that motivated the injectManifest switch. Update its “Gotchas” section to add NavigationRoute as a hard “don’t.”
- mobile-native-feel: Mobile PWA architecture; consumers of the same SW.
- debugging-log-crm: Project debugging log. This incident also belongs in the index there.
- levandor-crm: Project overview.
- agent-context-crm: Agent quick reference.