Test philosophy

How we decide what kind of test to write.

The 4-class taxonomy

We use four distinct test classes. Each catches a different bug class. Picking the wrong class is the #1 source of "the test passed but it broke in prod."

1. File-shape contract (Vitest, node env)

What it is: read source code as text, assert a regex/string pattern is present.

```typescript
// tests/seller-bid-lifecycle.test.ts
it("flips stale pending/countered to expired", () => {
  const route = read("src/app/api/seller/offers/route.ts");
  expect(route).toContain("prisma.priceOffer.updateMany");
  expect(route).toMatch(/expiresAt:\s*\{\s*lt:\s*now\s*\}/);
});
```

What it catches: "Did someone delete this critical call?" "Did the i18n key get renamed?" "Is this constant still 70 percent?"

What it MISSES: anything about runtime behavior. Whether the code path is actually reached. Whether the UI renders correctly. Whether the API returns the right data.

When to use: as a cheap regression guard against a SPECIFIC named bug. After fixing a bug, write a file-shape test that asserts the fix's pattern exists. If a future refactor moves the code, the test fails loudly and you remember the historical reason.

Speed: ~3 seconds for our 2600+ tests, because no application code ever executes; the tests only read files as text and match patterns.
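The `read` helper used in the example above is not shown; a minimal sketch, assuming it is a thin wrapper over `fs.readFileSync` that resolves paths from the repo root (the file location and exact signature here are illustrative):

```typescript
// tests/helpers/read.ts (hypothetical location)
import { readFileSync } from "node:fs";
import { resolve } from "node:path";

// Read a source file as plain text so file-shape tests can assert on its
// contents. Paths are resolved relative to the repo root (process.cwd()
// when Vitest runs), not relative to the test file.
export function read(relativePath: string): string {
  return readFileSync(resolve(process.cwd(), relativePath), "utf8");
}
```

Because this never imports the file under test, it works on any source file, including ones that would throw at import time.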

2. Component integration (Vitest, jsdom env + Testing Library)

What it is: mount a React component in a simulated DOM, simulate clicks/typing, assert callbacks fire and the right elements appear.

```tsx
// tests/components/VoiceRecorder.test.tsx
// @vitest-environment jsdom
it("calls onRecordingComplete with a Blob when user taps Stop", async () => {
  const onComplete = vi.fn();
  render(<VoiceRecorder onRecordingComplete={onComplete} ... />);
  fireEvent.click(screen.getByText(/Tap to record/i));
  await waitFor(() => screen.getByText(/^Recording$/i));
  fireEvent.click(screen.getByText(/Stop/i));
  await waitFor(() => expect(onComplete).toHaveBeenCalledTimes(1));
});
```

What it catches: state machine bugs, conditional rendering bugs, callback wiring bugs, navigation target bugs (the seller-panel 404 trap from 2026-04-18).

What it MISSES: real-browser layout (no actual CSS engine), native-only APIs (camera, geolocation), real network.

When to use: any new UI component that handles user input. Especially: notification panels, login forms, modal dialogs, dropdowns, multi-step wizards.

Speed: ~1 second per test (DOM setup is the slow part).
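jsdom ships neither `MediaRecorder` nor `navigator.mediaDevices`, so a recorder test like the one above needs the media APIs stubbed in a setup file. A minimal sketch, assuming the component uses the standard `MediaRecorder` events; the file name, the event shapes, and the component's exact usage are all assumptions:

```typescript
// tests/setup/media-mocks.ts (hypothetical setup file)
// Minimal MediaRecorder stand-in: start() flips state to "recording",
// stop() hands a fake audio Blob to ondataavailable, then fires onstop.
class FakeMediaRecorder {
  state: "inactive" | "recording" = "inactive";
  ondataavailable: ((e: { data: Blob }) => void) | null = null;
  onstop: (() => void) | null = null;
  constructor(public stream: unknown) {}
  start() {
    this.state = "recording";
  }
  stop() {
    this.state = "inactive";
    this.ondataavailable?.({
      data: new Blob(["fake-audio"], { type: "audio/webm" }),
    });
    this.onstop?.();
  }
}

(globalThis as any).MediaRecorder = FakeMediaRecorder;
(globalThis as any).navigator ??= {};
(globalThis as any).navigator.mediaDevices = {
  // Resolve with an empty object standing in for a MediaStream.
  getUserMedia: async () => ({}) as MediaStream,
};
```

With this in place the component's record/stop flow runs end to end in jsdom without touching a real microphone.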

3. Mobile component (Jest + React Native Testing Library)

What it is: mount a React Native component in a jsdom-equivalent simulated environment and simulate touches.

Three separate Jest suites — one per app: mobile/, mobile-seller/, mobile-doctor/. Each has its own package.json, jest.config.js, and mocks.

Limitations:

  • jsdom doesn't render native modules. Camera, audio recording, push notifications all need to be mocked.
  • LanguageContext / ToastContext / AuthContext need to be wrapped or mocked.
  • Real device-only behavior (keyboard avoidance, system permission dialogs) cannot be tested here.

When to use: "Does this RN screen mount without throwing?" "When this fake button is pressed, does the right API method get called?"
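For the context wrapping mentioned above, a sketch of the kind of stand-in values a suite can provide. These shapes are hypothetical; the real `LanguageContext` / `ToastContext` / `AuthContext` live in each app's source and will differ:

```typescript
// Hypothetical context shapes — the real ones live inside mobile*/src
// and each Jest suite mocks its own.
type MockAuth = { user: { id: number; type: "seller" } | null; logout: () => void };
type MockToast = { show: (msg: string) => void };
type MockLanguage = { locale: string; t: (key: string) => string };

export function makeMockContexts() {
  const shown: string[] = []; // captured toasts, so tests can assert on them
  const auth: MockAuth = { user: { id: 13, type: "seller" }, logout: () => {} };
  const toast: MockToast = { show: (msg) => { shown.push(msg); } };
  // t() falls back to the key itself, so assertions can match on i18n keys
  // instead of translated strings.
  const language: MockLanguage = { locale: "en", t: (key) => key };
  return { auth, toast, language, shown };
}
```

Returning the `shown` array alongside the mocks lets a mount test assert "pressing Save fired exactly one toast" without rendering any toast UI.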

4. Live verification (curl + JWT + psql against prod)

What it is: sign a JWT for a real user with the prod JWT_SECRET, hit the live API, inspect the response and the DB.

```bash
SELLER_TOKEN=$(node -e "
  const jwt = require('/.../node_modules/jsonwebtoken');
  console.log(jwt.sign({ sellerId: 13, type: 'seller' }, '$JWT_SECRET', { expiresIn: '1h' }));
")
curl -s -H "Cookie: seller_token=$SELLER_TOKEN" https://ka26.shop/api/seller/offers | jq
```

What it catches: the truth. Does the deployed code on Cloud Run actually behave correctly with real data?

What it MISSES: UI rendering, mobile behavior, device-specific quirks.

When to use: after every backend deploy. Especially when fixing a bug that involved server-side state (lazy-expire, cron jobs, scheduled events).

See Live verification for the full pattern + example commands.
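For intuition about what the `jwt.sign` call above produces: an HS256 JWT is just an HMAC-SHA256 over two base64url-encoded JSON segments. A dependency-free sketch using only `node:crypto`; the claim names mirror the example, everything else is illustrative, and real flows should keep using `jsonwebtoken`:

```typescript
import { createHmac } from "node:crypto";

// Build an HS256 JWT: base64url(header).base64url(payload).signature,
// where the signature is HMAC-SHA256 over the first two segments.
export function signJwt(
  payload: Record<string, unknown>,
  secret: string,
  expiresInSec = 3600,
): string {
  const b64 = (o: unknown) =>
    Buffer.from(JSON.stringify(o)).toString("base64url");
  const now = Math.floor(Date.now() / 1000);
  const unsigned =
    b64({ alg: "HS256", typ: "JWT" }) +
    "." +
    b64({ ...payload, iat: now, exp: now + expiresInSec });
  const sig = createHmac("sha256", secret).update(unsigned).digest("base64url");
  return `${unsigned}.${sig}`;
}
```

The same mechanism in reverse is handy during live verification: base64url-decoding the middle segment of a captured token shows exactly which sellerId a request will act as.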

The mental model: catching bugs before they ship

Each layer catches a different bug class. We've shipped real production outages because we tested with the WRONG class:

| Real bug we shipped | Layer that would have caught it | Layer we ACTUALLY had |
| --- | --- | --- |
| 6-day [object Object] registration outage | Layer 4 (live e2e on signup) | Layer 1 only — file-shape proved the function was imported but didn't catch that it returned an Object |
| Seller-panel 404 trap (notification routed to non-existent page) | Layer 2 (assert the navigation target is a real page) | Layer 1 only — file-shape proved the click handler existed |
| Bid analytics events silently no-op (BUG-006e) | Layer 4 (run a real bid + check UserEvent table) | Layer 1 — assertions confirmed the strings existed; nothing actually fired track() |
| "Locked · expired" stale accepted bids in active count (BUG-007) | Layer 4 (force a stale bid + GET + verify) | Layer 1 had partial coverage but missed the accepted-status case |

Lesson: for every critical user flow, have AT LEAST one layer-2 component test AND one layer-4 live test. File-shape tests alone are necessary but never sufficient.

When to add which layer

| Type of change | Required layers |
| --- | --- |
| Refactor (no behavior change) | Layer 1 — pin the new shape |
| New API endpoint | Layer 1 (route file shape) + Layer 4 (curl against prod after deploy) |
| New UI component with input | Layer 1 (imports/exports) + Layer 2 (jsdom interaction) |
| Bug fix | Layer 1 regression guard naming the bug + Layer 4 verification of the fix live |
| Cron / scheduled job | Layer 1 + Layer 4 — manually trigger conditions and verify the cron's effect on the DB |
| New mobile screen | Layer 1 (file-shape on the .tsx) + Layer 3 (mount test in mobile*/) — and a real-device test from the user before shipping the APK |

What we explicitly don't have

  • Visual regression (screenshot diffs). Tried Playwright snapshots; the maintenance cost is too high for our screen count + the value is low for an MVP. Re-evaluate at 100+ active users.
  • Synthetic load tests. Cloud Run autoscales; we don't have the RPS to need this yet.
  • Mutation testing. Considered but the marginal coverage isn't worth the runtime cost on 2600+ tests.

These are deliberate omissions, not oversights.