KA26 Production Monitoring — Setup Guide

Three-layer monitoring for go-live, designed to catch issues before users report them.

Layer	Tool	What it catches	Cost	Setup time
1. Synthetic uptime	UptimeRobot	Site down / DNS dead / cert expired	Free	15 min
2. Error tracking	Sentry (web + mobile)	JS errors, API failures, mobile crashes	Free (5k events/mo)	30 min
3. Hourly smoke	GitHub Actions cron	Critical user flows breaking	Free	Already deployed ✓

Layer 1 — UptimeRobot (5 minute setup, your task)

Why this layer

The hourly cron + Sentry can both miss a "site is fully down" event because they require some part of the stack to work. UptimeRobot pings from outside our infra every 5 minutes and sends SMS/email alerts when something doesn't respond (WhatsApp alerts need the paid tier).

Setup

Go to https://uptimerobot.com/signUp — create a free account with siddugkattimani@gmail.com
Confirm your email
Click Add New Monitor, repeat for each:

Friendly Name	Type	URL	Check Interval	Timeout
KA26 Production	HTTP(s)	`https://ka26.shop/api/health`	5 minutes	30 s
KA26 Landing	HTTP(s)	`https://ka-26.com`	5 minutes	30 s
KA26 SSL Cert (shop)	SSL/TLS	`ka26.shop:443`	1 day	—
KA26 SSL Cert (landing)	SSL/TLS	`ka-26.com:443`	1 day	—

Under My Settings → Alert Contacts, add:
- Your phone number (free SMS — limited per month) — most important
- Your email
- Your WhatsApp (paid tier only — skip unless you upgrade)
Optional: public status page at status.ka-26.com. Under Status Pages → Add New, create one named "KA26 Status" with the 4 monitors above. The free tier gives you a public URL; add a CNAME record named status in Hostinger pointing to it.

What you'll see

Email / SMS within a couple of minutes of a failed check (checks run every 5 minutes, so an outage can go unnoticed for up to one interval)
Daily uptime percentages (target: 99.9%+)
Response time graphs (catches slow degradation before full outage)
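
To put the 99.9% target in concrete terms, here is a small worked calculation. It is only a sketch: the function names are made up for illustration, and the 30-day window is an assumption rather than how UptimeRobot computes its reports.

```typescript
// Minutes of downtime allowed in a window for a given uptime target.
function downtimeBudgetMinutes(targetPercent: number, windowDays: number): number {
  const windowMinutes = windowDays * 24 * 60;
  return windowMinutes * (1 - targetPercent / 100);
}

// Whole check intervals that fit inside the downtime budget.
function failedChecksInBudget(budgetMinutes: number, checkIntervalMinutes: number): number {
  return Math.floor(budgetMinutes / checkIntervalMinutes);
}

// 30 days is an assumed reporting window; 5 minutes is the check interval used above.
const monthlyBudget = downtimeBudgetMinutes(99.9, 30);
const checksInBudget = failedChecksInBudget(monthlyBudget, 5);
console.log(`${monthlyBudget.toFixed(1)} minutes, ${checksInBudget} failed checks`);
```

At a 5-minute interval, roughly eight failed checks in a month are enough to use up a 99.9% budget, so a single bad deploy can consume most of it.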

Layer 2 — Sentry (web + mobile error tracking, requires DSN)

Status (2026-04-18) — ✅ Fully active in production

Both web and mobile Sentry SDKs are installed AND configured AND receiving events.

Surface	Project on Sentry	DSN location	Verified
Web (Next.js + API routes)	`ka26-marketplace` (org: `ka26`)	Cloud Run env vars `SENTRY_DSN` + `NEXT_PUBLIC_SENTRY_DSN` + `SENTRY_ENV=production`	✅
Mobile (React Native)	`ka26-mobile` (org: `ka26`)	`mobile/app.json` → `expo.extra.sentryDsn` (DSNs are public IDs, safe to ship)	✅

If a JS error fires on either surface, it appears in the Sentry issues feed within ~30s.

Original setup steps (kept for reference / re-setup)

Web setup (15 min)

Sign up at https://sentry.io with siddugkattimani@gmail.com (free tier: 5k events/month)
Create org → name it ka26
Create project:
- Platform: Next.js
- Project name: ka26-marketplace
Copy the DSN from the setup screen (looks like https://xxx@oXXX.ingest.sentry.io/YYY)
Set on Cloud Run:

gcloud run services update ka26-marketplace --region us-central1 \
  --project=school-mgmt-saas \
  --update-env-vars NEXT_PUBLIC_SENTRY_DSN=YOUR_DSN_HERE,SENTRY_DSN=YOUR_DSN_HERE,SENTRY_ENV=production

(Both SENTRY_DSN and NEXT_PUBLIC_SENTRY_DSN must be set — the public one ships to the browser, the server one stays on the backend. They can be the same value.)
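
As a rough illustration of why both variables are needed, the sketch below picks the DSN by runtime. The helper name `resolveSentryOptions`, its types, and the `"development"` fallback are assumptions for this example, not code from the repo. The returned object uses the `dsn` and `environment` option names that `Sentry.init` in `@sentry/nextjs` accepts.

```typescript
type Runtime = "browser" | "server";

interface SentryEnv {
  SENTRY_DSN?: string;
  NEXT_PUBLIC_SENTRY_DSN?: string;
  SENTRY_ENV?: string;
}

interface SentryOptions {
  dsn: string;
  environment: string;
}

// Pick the DSN for the current runtime. Returns null when the variable
// that runtime needs is unset, so the caller can skip initialisation.
function resolveSentryOptions(env: SentryEnv, runtime: Runtime): SentryOptions | null {
  const dsn = runtime === "browser" ? env.NEXT_PUBLIC_SENTRY_DSN : env.SENTRY_DSN;
  if (!dsn) return null;
  // Falling back to "development" is an assumption made for this sketch.
  return { dsn, environment: env.SENTRY_ENV ?? "development" };
}

// Placeholder DSN in the documented shape; not a real project.
const placeholderDsn = "https://xxx@oXXX.ingest.sentry.io/YYY";
console.log(resolveSentryOptions({ SENTRY_DSN: placeholderDsn, SENTRY_ENV: "production" }, "server"));
// With only the server variable set, the browser side gets nothing to initialise with.
console.log(resolveSentryOptions({ SENTRY_DSN: placeholderDsn }, "browser"));
```

If only `SENTRY_DSN` is set, browser errors would go unreported even though server-side reporting works, which is the failure mode the note above guards against.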

Optional but recommended — for readable stack traces, add SENTRY_AUTH_TOKEN:
- Sentry → Settings → Account → API → Auth Tokens → Create Token (scope: project:releases)
- Add to GitHub Actions secrets named SENTRY_AUTH_TOKEN
- Source maps upload automatically on every deploy (already wired in next.config.ts)

Mobile setup (10 min)

Same Sentry org, create another project:
- Platform: React Native
- Project name: ka26-mobile
Copy the DSN
Add to mobile/app.json under expo.extra:

{
  "expo": {
    "extra": {
      "EXPO_PUBLIC_SENTRY_DSN": "YOUR_MOBILE_DSN_HERE"
    }
  }
}

OR use EAS Secret if you build via EAS.
Rebuild the APK; Sentry boots on next launch.
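
The status table earlier lists the mobile DSN under `expo.extra.sentryDsn`, while the JSON above uses the key `EXPO_PUBLIC_SENTRY_DSN`. Which key the app actually reads is not shown in this guide, so the sketch below accepts either one. `readMobileSentryDsn` is a hypothetical helper. It assumes the caller passes in the `extra` object, which in an Expo app is typically available as `Constants.expoConfig?.extra` from `expo-constants`.

```typescript
type ExpoExtra = Record<string, unknown> | undefined;

// Placeholder value from the setup snippet; treated as "not configured".
const DSN_PLACEHOLDER = "YOUR_MOBILE_DSN_HERE";

// Return the first usable DSN found under either key name, or null.
function readMobileSentryDsn(extra: ExpoExtra): string | null {
  const candidates = [extra?.["EXPO_PUBLIC_SENTRY_DSN"], extra?.["sentryDsn"]];
  for (const value of candidates) {
    if (typeof value !== "string") continue;
    const trimmed = value.trim();
    if (trimmed !== "" && trimmed !== DSN_PLACEHOLDER) return trimmed;
  }
  return null;
}

console.log(readMobileSentryDsn({ EXPO_PUBLIC_SENTRY_DSN: DSN_PLACEHOLDER }));
console.log(readMobileSentryDsn({ sentryDsn: "https://xxx@oXXX.ingest.sentry.io/YYY" }));
```

A null result is a cue to skip Sentry initialisation rather than start it with an empty or placeholder DSN.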

What you'll see

Issues feed — every JS error grouped by type
Performance — slow transactions, N+1 query patterns
Release health — crash-free user % (target 99.5%+)
Email alert within 5 min of new error type appearing

Layer 3 — GitHub Actions hourly health cron (already deployed ✓)

What it does

Runs at :17 past every hour:

Calls GET https://ka26.shop/api/health?key=ka26-health-2026 — fails if status is error
Runs the full tests/e2e-smoke.test.ts suite against production — verifies critical pages + APIs
On failure: emails the team via Gmail SMTP
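
The first step can be pictured as a small decision function. This is a sketch, not the workflow's actual code. The guide only says the job fails when status is error, so treating a non-2xx response or a missing status field as a failure is an extra assumption. The names `evaluateHealthResponse`, `HealthBody`, and `HealthVerdict` are invented for the example.

```typescript
interface HealthBody {
  status?: string;
}

interface HealthVerdict {
  pass: boolean;
  reason: string;
}

// Decide whether the health step should pass, given the HTTP status code
// and the parsed JSON body (null if the body could not be parsed).
function evaluateHealthResponse(httpStatus: number, body: HealthBody | null): HealthVerdict {
  // Assumption: any non-2xx response counts as a failure.
  if (httpStatus < 200 || httpStatus >= 300) {
    return { pass: false, reason: `HTTP ${httpStatus}` };
  }
  // Assumption: a body without a string status field counts as a failure.
  if (body === null || typeof body.status !== "string") {
    return { pass: false, reason: "missing status field" };
  }
  // From the guide: the job fails if status is error.
  if (body.status === "error") {
    return { pass: false, reason: "status is error" };
  }
  return { pass: true, reason: `status is ${body.status}` };
}

console.log(evaluateHealthResponse(200, { status: "degraded" }));
console.log(evaluateHealthResponse(200, { status: "error" }));
```

Under these assumptions a degraded response still passes, so warnings show up in the response body without triggering an email.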

What you need to configure

Email alerts use the existing SMTP credentials. Add these GitHub secrets at github.com/sidgk/ka26-marketplace/settings/secrets/actions:

Secret	Value
`SMTP_USER`	`noreply@ka-26.com`
`SMTP_PASS`	(the App Password we created — same one in GCP Secret Manager)
`ALERT_TO`	`siddugkattimani@gmail.com` (and any team emails)

Without these the workflow still runs and fails on health issues — but the email alert is skipped. Set them so you get notified.
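
A quick way to see which secrets are still missing is a check like the one below. It is a sketch: `missingAlertSecrets` is a hypothetical helper and the workflow itself may decide this differently. In a Node script it could be called with `process.env`.

```typescript
// The three secrets listed in the table above.
const REQUIRED_ALERT_SECRETS = ["SMTP_USER", "SMTP_PASS", "ALERT_TO"] as const;

// Return the names of required secrets that are unset or blank.
function missingAlertSecrets(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ALERT_SECRETS.filter((name) => !env[name]?.trim());
}

// Example: only SMTP_USER is configured, so the other two are reported.
console.log(missingAlertSecrets({ SMTP_USER: "noreply@ka-26.com" }));
```

An empty result means all three values are present and the email step has what it needs.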

Manual trigger

Anytime you want to verify production: GitHub → Actions → "Production Health Check (hourly)" → Run workflow.

Bonus — what the existing `/api/health` endpoint already monitors

These 7 checks run on every request to /api/health:

Database — SELECT 1 round-trip
Critical pages — fetches /, /shop, /reels, /requests, /profile
Auth integrity — verifies admin user exists with correct ID
Reel data integrity — 5 most recent reels have valid data
Route integrity — product detail routes resolve correctly
Order system — at least one active store + restaurant exists
WhatsApp links — admin user's WhatsApp number is non-empty

If any check FAILS, the endpoint returns status: "error" with a 5xx-class HTTP response, which UptimeRobot treats as down. If any check WARNS, it returns status: "degraded" (still HTTP 200).
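
The fail/warn rule can be sketched as a small aggregation. This illustrates the rule as described here, not the endpoint's real implementation. The type names, the "ok" status for the all-pass case, and the exact 500 code are assumptions.

```typescript
type CheckOutcome = "pass" | "warn" | "fail";
type OverallStatus = "ok" | "degraded" | "error";

interface CheckResult {
  name: string;
  outcome: CheckOutcome;
}

interface HealthSummary {
  status: OverallStatus;
  httpStatus: number;
}

// Any failure wins over any warning; warnings only downgrade the status.
function summarizeChecks(checks: CheckResult[]): HealthSummary {
  if (checks.some((c) => c.outcome === "fail")) {
    return { status: "error", httpStatus: 500 }; // 500 is an assumed stand-in for "500-ish"
  }
  if (checks.some((c) => c.outcome === "warn")) {
    return { status: "degraded", httpStatus: 200 };
  }
  return { status: "ok", httpStatus: 200 }; // "ok" is an assumed name for the all-pass case
}

console.log(
  summarizeChecks([
    { name: "database", outcome: "pass" },
    { name: "reel data integrity", outcome: "warn" },
  ]),
);
```

Because a degraded result keeps HTTP 200, an HTTP monitor that only looks at the status code would not alert on warnings. Those would need to be spotted in the response body.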

What to do when an alert fires

UptimeRobot says down → check Cloud Run console, look at recent revision deploys
Sentry says new error type → click into the issue, see the stack trace + breadcrumbs
Health cron fails → open the GitHub Actions run, see which check failed
All 3 fire at once → roll back to the previous Cloud Run revision:

# List revisions
gcloud run revisions list --service ka26-marketplace --region us-central1 --project school-mgmt-saas

# Roll back traffic to a known-good one
gcloud run services update-traffic ka26-marketplace \
  --region us-central1 --project school-mgmt-saas \
  --to-revisions ka26-marketplace-00XXX-yyy=100

Monitoring philosophy

The 3 layers are defense in depth:

UptimeRobot catches what Sentry can't (full outage → no JS to error-report)
Sentry catches what UptimeRobot can't (200 OK page that's actually broken inside)
Health cron catches what both miss (specific user flow regressions)

If any single layer were perfect, we wouldn't need the others. Together they catch ~95% of issues before users see them.
