
KA26 Production Monitoring — Setup Guide

Three-layer monitoring for go-live, designed to catch issues before users report them.

| Layer | Tool | What it catches | Cost | Setup time |
| --- | --- | --- | --- | --- |
| 1. Synthetic uptime | UptimeRobot | Site down / DNS dead / cert expired | Free | 15 min |
| 2. Error tracking | Sentry (web + mobile) | JS errors, API failures, mobile crashes | Free (5k events/mo) | 30 min |
| 3. Hourly smoke | GitHub Actions cron | Critical user flows breaking | Free | Already deployed ✓ |

Layer 1 — UptimeRobot (5 minute setup, your task)

Why this layer

Sentry and the hourly cron can both miss, or be slow to report, a "site is fully down" event: Sentry needs the app to be running to send anything, and the cron only runs once an hour. UptimeRobot pings from outside our infra every 5 minutes and sends SMS/email alerts when something doesn't respond (WhatsApp alerts need the paid tier).
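To make the outside-in idea concrete, here is a minimal sketch of the kind of check an external monitor performs: request a URL, apply a timeout, and treat anything other than a timely 2xx response as down. It is an illustration only, not UptimeRobot's implementation. The names `classifyProbe` and `probe` are hypothetical, the 2xx-only rule is a simplification, and the 30 s default mirrors the monitor timeout used in this guide.

```typescript
// Hypothetical sketch of an outside-in uptime check (not UptimeRobot's code).
type ProbeResult = { up: boolean; reason: string };

// Pure decision step. A null status means no HTTP response arrived at all
// (DNS failure, refused connection, TLS error, or an aborted request).
function classifyProbe(
  httpStatus: number | null,
  elapsedMs: number,
  timeoutMs: number = 30_000,
): ProbeResult {
  if (httpStatus === null) return { up: false, reason: "no response" };
  if (elapsedMs > timeoutMs) return { up: false, reason: "timeout" };
  if (httpStatus < 200 || httpStatus >= 300) {
    return { up: false, reason: `HTTP ${httpStatus}` };
  }
  return { up: true, reason: "ok" };
}

// Network step. Assumes a runtime with a global fetch (for example Node 18+).
async function probe(url: string, timeoutMs: number = 30_000): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetch(url, { signal: controller.signal });
    return classifyProbe(res.status, Date.now() - started, timeoutMs);
  } catch {
    // Any thrown error (including the abort) is treated as "no response".
    return classifyProbe(null, Date.now() - started, timeoutMs);
  } finally {
    clearTimeout(timer);
  }
}
```

Calling `probe("https://ka26.shop/api/health")` from a machine outside the hosting stack would exercise DNS, TLS, and the app in one request, which is why this layer still works when everything else is down.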

Setup

  1. Go to https://uptimerobot.com/signUp — create a free account with siddugkattimani@gmail.com
  2. Confirm your email
  3. Click Add New Monitor, repeat for each:
| Friendly Name | Type | URL | Check Interval | Timeout |
| --- | --- | --- | --- | --- |
| KA26 Production | HTTP(s) | https://ka26.shop/api/health | 5 minutes | 30 s |
| KA26 Landing | HTTP(s) | https://ka-26.com | 5 minutes | 30 s |
| KA26 SSL Cert (shop) | SSL/TLS | ka26.shop:443 | 1 day | |
| KA26 SSL Cert (landing) | SSL/TLS | ka-26.com:443 | 1 day | |
  4. Under My Settings → Alert Contacts, add:

    • Your phone number (free SMS, limited per month): most important
    • Your email
    • Your WhatsApp (paid tier only; skip unless you upgrade)
  5. Optional: public status page at status.ka-26.com. Under Status Pages → Add New, create one named "KA26 Status" with the 4 monitors above. The free tier gives you a public URL; add a CNAME record named status in Hostinger's DNS pointing to it.

What you'll see

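  • Email / SMS shortly after an outage is detected (checks run every 5 minutes, so detection can lag the start of an outage by up to about 5 minutes)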
  • Daily uptime percentages (target: 99.9%+)
  • Response time graphs (catches slow degradation before full outage)
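As a worked example of what the 99.9% target means in practice, the sketch below converts an uptime percentage into a downtime budget and into the number of consecutive failed 5-minute checks that fit inside it. The function names are hypothetical and the 30-day month is an assumption chosen for round numbers.

```typescript
// Downtime allowed by an uptime target over a period, in minutes.
function downtimeBudgetMinutes(targetPercent: number, periodDays: number): number {
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - targetPercent / 100);
}

// How many whole check intervals fit inside that budget.
function failedChecksInBudget(budgetMinutes: number, intervalMinutes: number): number {
  return Math.floor(budgetMinutes / intervalMinutes);
}

// 30 days = 43,200 minutes; 0.1% of that is about 43.2 minutes.
const monthlyBudget = downtimeBudgetMinutes(99.9, 30);
// With 5-minute checks, 8 failed checks in a month stay within the target.
const allowedFailedChecks = failedChecksInBudget(monthlyBudget, 5);
```

In other words, a single outage of roughly 45 minutes is enough to miss the 99.9% target for the month.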

Layer 2 — Sentry (web + mobile error tracking, requires DSN)

Status (2026-04-18) — ✅ Fully active in production

Both web and mobile Sentry SDKs are installed, configured, and receiving events.

| Surface | Project on Sentry | DSN location | Verified |
| --- | --- | --- | --- |
| Web (Next.js + API routes) | ka26-marketplace (org: ka26) | Cloud Run env vars SENTRY_DSN + NEXT_PUBLIC_SENTRY_DSN + SENTRY_ENV=production | |
| Mobile (React Native) | ka26-mobile (org: ka26) | mobile/app.json → expo.extra.sentryDsn (DSNs are public IDs, safe to ship) | |

If a JS error fires on either surface, it appears in the Sentry issues feed within ~30s.

Original setup steps (kept for reference / re-setup)

Web setup (15 min)

  1. Sign up at https://sentry.io with siddugkattimani@gmail.com (free tier: 5k events/month)
  2. Create org → name it ka26
  3. Create project:
    • Platform: Next.js
    • Project name: ka26-marketplace
  4. Copy the DSN from the setup screen (looks like https://xxx@oXXX.ingest.sentry.io/YYY)
  5. Set on Cloud Run:
gcloud run services update ka26-marketplace --region us-central1 \
--project=school-mgmt-saas \
--update-env-vars NEXT_PUBLIC_SENTRY_DSN=YOUR_DSN_HERE,SENTRY_DSN=YOUR_DSN_HERE,SENTRY_ENV=production

(Both SENTRY_DSN and NEXT_PUBLIC_SENTRY_DSN must be set — the public one ships to the browser, the server one stays on the backend. They can be the same value.)

  6. Optional but recommended, for readable stack traces: add SENTRY_AUTH_TOKEN.
    • Sentry → Settings → Account → API → Auth Tokens → Create Token (scope: project:releases)
    • Add to GitHub Actions secrets named SENTRY_AUTH_TOKEN
    • Source maps upload automatically on every deploy (already wired in next.config.ts)
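The split between SENTRY_DSN and NEXT_PUBLIC_SENTRY_DSN in step 5 can be sketched as a small helper that picks the right value for each runtime. This is a hypothetical illustration, not the project's actual Sentry config: `resolveSentrySettings` and `looksLikeDsn` are invented names, the "development" fallback is an assumption, and how the app really reads SENTRY_ENV is not shown in this guide.

```typescript
// Hypothetical helper: decide which DSN a given runtime should use.
type Env = Record<string, string | undefined>;

interface SentrySettings {
  dsn: string | undefined;
  environment: string;
  enabled: boolean;
}

function resolveSentrySettings(env: Env, runtime: "server" | "browser"): SentrySettings {
  // The browser bundle only sees NEXT_PUBLIC_* variables; the server can read both.
  const dsn =
    runtime === "browser"
      ? env.NEXT_PUBLIC_SENTRY_DSN
      : env.SENTRY_DSN ?? env.NEXT_PUBLIC_SENTRY_DSN;
  return {
    dsn,
    environment: env.SENTRY_ENV ?? "development", // assumed fallback value
    enabled: Boolean(dsn), // no DSN means nothing can be reported
  };
}

// Loose shape check for the DSN format quoted in step 4:
// https://<public key>@<host>/<project id>
function looksLikeDsn(value: string): boolean {
  try {
    const url = new URL(value);
    return url.protocol === "https:" && url.username !== "" && url.pathname.length > 1;
  } catch {
    return false; // not a parseable absolute URL
  }
}
```

The resulting `dsn`, `environment`, and `enabled` values map onto the options of the same names accepted by `Sentry.init`. A check like `looksLikeDsn` would catch the placeholder YOUR_DSN_HERE being deployed by mistake.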

Mobile setup (10 min)

  1. Same Sentry org, create another project:
    • Platform: React Native
    • Project name: ka26-mobile
  2. Copy the DSN
  3. Add to mobile/app.json under expo.extra:
{
  "expo": {
    "extra": {
      "EXPO_PUBLIC_SENTRY_DSN": "YOUR_MOBILE_DSN_HERE"
    }
  }
}

     Or use an EAS Secret if you build via EAS.
  4. Rebuild the APK. Sentry boots on next launch.

What you'll see

  • Issues feed — every JS error grouped by type
  • Performance — slow transactions, N+1 query patterns
  • Release health — crash-free user % (target 99.5%+)
  • Email alert within 5 min of new error type appearing

Layer 3 — GitHub Actions hourly health cron (already deployed ✓)

What it does

Runs at :17 past every hour:

  1. Calls GET https://ka26.shop/api/health?key=ka26-health-2026 — fails if status is error
  2. Runs the full tests/e2e-smoke.test.ts suite against production — verifies critical pages + APIs
  3. On failure: emails the team via Gmail SMTP
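Step 1 above can be sketched as a pure decision function that turns the health response into a pass/fail exit code. This is a hypothetical illustration rather than the workflow's real script: `healthExitCode` is an invented name, and the JSON body carrying a `status` field is inferred from the endpoint description in this guide.

```typescript
// Hypothetical sketch of the cron's health gate (not the real workflow script).
interface HealthBody {
  status?: string;
}

// Returns the process exit code the job would use: 0 = pass, 1 = fail.
function healthExitCode(httpStatus: number, body: HealthBody | null): 0 | 1 {
  if (body === null) return 1; // response was not parseable JSON
  if (body.status === "error") return 1; // endpoint reported a failed check
  if (httpStatus >= 500) return 1; // server error without a usable status
  return 0; // anything else, including "degraded", passes this gate
}
```

Treating "degraded" as a pass follows the rule stated in step 1 (the job fails only when status is error); a stricter gate would be a one-line change.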

What you need to configure

Email alerts use the existing SMTP credentials. Add these GitHub secrets at github.com/sidgk/ka26-marketplace/settings/secrets/actions:

| Secret | Value |
| --- | --- |
| SMTP_USER | noreply@ka-26.com |
| SMTP_PASS | (the App Password we created; same one in GCP Secret Manager) |
| ALERT_TO | siddugkattimani@gmail.com (and any team emails) |

Without these the workflow still runs and fails on health issues — but the email alert is skipped. Set them so you get notified.
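The "skip the email when secrets are missing" behaviour can be sketched as follows. The helper names are hypothetical, and the comma-separated format for ALERT_TO is an assumption; the guide only says it can hold more than one address.

```typescript
// The three secrets this guide asks you to configure.
const REQUIRED_SECRETS = ["SMTP_USER", "SMTP_PASS", "ALERT_TO"] as const;

// Returns the names of secrets that are unset or blank.
function missingSecrets(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SECRETS.filter((name) => !env[name]?.trim());
}

// Splits ALERT_TO into individual addresses (comma-separated is an assumption).
function alertRecipients(alertTo: string): string[] {
  return alertTo
    .split(",")
    .map((address) => address.trim())
    .filter((address) => address.length > 0);
}

// The alert step would only send when nothing is missing.
function canSendAlert(env: Record<string, string | undefined>): boolean {
  return missingSecrets(env).length === 0;
}
```

Logging the output of `missingSecrets` in the workflow run would make a silently skipped alert easy to spot.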

Manual trigger

Anytime you want to verify production: GitHub → Actions → "Production Health Check (hourly)" → Run workflow.


Bonus — what the existing /api/health endpoint already monitors

These 7 checks run on every request to /api/health:

  1. Database — SELECT 1 round-trip
  2. Critical pages — fetches /, /shop, /reels, /requests, /profile
  3. Auth integrity — verifies admin user exists with correct ID
  4. Reel data integrity — 5 most recent reels have valid data
  5. Route integrity — product detail routes resolve correctly
  6. Order system — at least one active store + restaurant exists
  7. WhatsApp links — admin user's WhatsApp number is non-empty

If any check fails, the endpoint returns status: "error" with a 5xx-class HTTP response, which UptimeRobot treats as down. If any check only warns, it returns status: "degraded" (still HTTP 200).
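The fail/warn rule above amounts to a "worst outcome wins" aggregation. A minimal sketch, assuming each check reports pass, warn, or fail; the `aggregateChecks` name, the "ok" label for a fully healthy result, and the exact 503 code are assumptions, since this guide only specifies "error" and "degraded".

```typescript
type CheckOutcome = "pass" | "warn" | "fail";
type OverallStatus = "ok" | "degraded" | "error";

interface HealthSummary {
  status: OverallStatus;
  httpStatus: number;
}

// Worst outcome wins: any fail beats any warn, any warn beats all passes.
function aggregateChecks(outcomes: CheckOutcome[]): HealthSummary {
  if (outcomes.some((outcome) => outcome === "fail")) {
    return { status: "error", httpStatus: 503 }; // assumed 5xx code
  }
  if (outcomes.some((outcome) => outcome === "warn")) {
    return { status: "degraded", httpStatus: 200 };
  }
  return { status: "ok", httpStatus: 200 }; // "ok" is an assumed label
}
```

Keeping "degraded" at HTTP 200 means UptimeRobot stays quiet for warnings, while the hourly cron can still read the body and report them.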


What to do when an alert fires

  1. UptimeRobot says down → check Cloud Run console, look at recent revision deploys
  2. Sentry says new error type → click into the issue, see the stack trace + breadcrumbs
  3. Health cron fails → open the GitHub Actions run, see which check failed
  4. All 3 fire at once → roll back to the previous Cloud Run revision:
# List revisions
gcloud run revisions list --service ka26-marketplace --region us-central1 --project school-mgmt-saas

# Roll back traffic to a known-good one
gcloud run services update-traffic ka26-marketplace \
--region us-central1 --project school-mgmt-saas \
--to-revisions ka26-marketplace-00XXX-yyy=100
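The four rules above form a small decision table, sketched here as a function so the escalation order is explicit. `firstSteps` and `FiredAlerts` are hypothetical names; the returned strings simply restate the steps in this section.

```typescript
interface FiredAlerts {
  uptimeDown: boolean;
  sentryNewError: boolean;
  cronFailed: boolean;
}

// Maps the set of alerts that fired to the first actions to take.
function firstSteps(fired: FiredAlerts): string[] {
  // All three layers firing together is treated as a bad deploy: roll back first.
  if (fired.uptimeDown && fired.sentryNewError && fired.cronFailed) {
    return ["Roll back traffic to the previous Cloud Run revision"];
  }
  const steps: string[] = [];
  if (fired.uptimeDown) {
    steps.push("Check the Cloud Run console and recent revision deploys");
  }
  if (fired.sentryNewError) {
    steps.push("Open the Sentry issue and read the stack trace and breadcrumbs");
  }
  if (fired.cronFailed) {
    steps.push("Open the GitHub Actions run and find the failing check");
  }
  return steps;
}
```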

Monitoring philosophy

The 3 layers are defense in depth:

  • UptimeRobot catches what Sentry can't (full outage → no JS to error-report)
  • Sentry catches what UptimeRobot can't (200 OK page that's actually broken inside)
  • Health cron catches what both miss (specific user flow regressions)

If any single layer were perfect, we wouldn't need the others. Together they catch ~95% of issues before users see them.