KA26 Production Monitoring — Setup Guide
Three-layer monitoring for go-live, designed to catch issues before users report them.
| Layer | Tool | What it catches | Cost | Setup time |
|---|---|---|---|---|
| 1. Synthetic uptime | UptimeRobot | Site down / DNS dead / cert expired | Free | 15 min |
| 2. Error tracking | Sentry (web + mobile) | JS errors, API failures, mobile crashes | Free (5k events/mo) | 30 min |
| 3. Hourly smoke | GitHub Actions cron | Critical user flows breaking | Free | Already deployed ✓ |
Layer 1 — UptimeRobot (5 minute setup, your task)
Why this layer
The hourly cron + Sentry can both miss a "site is fully down" event because they require some part of the stack to work. UptimeRobot pings from outside our infra every 5 minutes and sends SMS or email alerts when something doesn't respond (WhatsApp alerts need the paid tier).
Setup
- Go to https://uptimerobot.com/signUp and create a free account with siddugkattimani@gmail.com
- Confirm your email
- Click Add New Monitor, repeat for each:
| Friendly Name | Type | URL | Check Interval | Timeout |
|---|---|---|---|---|
| KA26 Production | HTTP(s) | https://ka26.shop/api/health | 5 minutes | 30 s |
| KA26 Landing | HTTP(s) | https://ka-26.com | 5 minutes | 30 s |
| KA26 SSL Cert (shop) | SSL/TLS | ka26.shop:443 | 1 day | — |
| KA26 SSL Cert (landing) | SSL/TLS | ka-26.com:443 | 1 day | — |
- Under My Settings → Alert Contacts, add:
  - Your phone number (free SMS, limited per month): most important
  - Your email
  - Your WhatsApp (paid tier only, skip unless you upgrade)
- Optional: Public status page at status.ka-26.com. Under Status Pages → Add New, create one named "KA26 Status" with the 4 monitors above. Free tier gives you a public URL; add a CNAME record for status in Hostinger pointing to it.
What you'll see
- Email / SMS within about 2 minutes of a failed check (checks run every 5 minutes, so an outage can go undetected for up to 5 minutes)
- Daily uptime percentages (target: 99.9%+)
- Response time graphs (catches slow degradation before full outage)
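To make the 99.9% target concrete, the arithmetic below converts an uptime target into a downtime budget. It is a minimal sketch: the function names are hypothetical, and counting each failed check as one full 5-minute interval of downtime is a simplifying assumption, not how UptimeRobot necessarily computes its percentages.

```typescript
// Minutes of downtime that an uptime target permits over a window of days.
// `target` is a fraction, e.g. 0.999 for 99.9%.
function allowedDowntimeMinutes(target: number, days: number): number {
  const totalMinutes = days * 24 * 60;
  const rawBudget = (1 - target) * totalMinutes;
  // Round to 2 decimals to hide floating-point noise.
  return Math.round(rawBudget * 100) / 100;
}

// Number of failed checks that fit inside the budget, assuming each failed
// check counts as one full check interval of downtime (an assumption).
function failedChecksWithinBudget(
  target: number,
  days: number,
  intervalMinutes: number
): number {
  return Math.floor(allowedDowntimeMinutes(target, days) / intervalMinutes);
}

const monthlyBudget = allowedDowntimeMinutes(0.999, 30);     // 43.2 minutes
const missedChecks = failedChecksWithinBudget(0.999, 30, 5); // 8 checks
```

Under these assumptions, 99.9% over a 30-day month leaves roughly 43 minutes of downtime, or about eight failed 5-minute checks.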
Layer 2 — Sentry (web + mobile error tracking, requires DSN)
Status (2026-04-18) — ✅ Fully active in production
Both web and mobile Sentry SDKs are installed AND configured AND receiving events.
| Surface | Project on Sentry | DSN location | Verified |
|---|---|---|---|
| Web (Next.js + API routes) | ka26-marketplace (org: ka26) | Cloud Run env vars SENTRY_DSN + NEXT_PUBLIC_SENTRY_DSN + SENTRY_ENV=production | ✅ |
| Mobile (React Native) | ka26-mobile (org: ka26) | mobile/app.json → expo.extra.sentryDsn (DSNs are public IDs, safe to ship) | ✅ |
If a JS error fires on either surface, it appears in the Sentry issues feed within ~30s.
Original setup steps (kept for reference / re-setup)
Web setup (15 min)
- Sign up at https://sentry.io with siddugkattimani@gmail.com (free tier: 5k events/month)
- Create org → name it ka26
- Create project:
  - Platform: Next.js
  - Project name: ka26-marketplace
- Copy the DSN from the setup screen (looks like https://xxx@oXXX.ingest.sentry.io/YYY)
- Set on Cloud Run:
gcloud run services update ka26-marketplace --region us-central1 \
--project=school-mgmt-saas \
--update-env-vars NEXT_PUBLIC_SENTRY_DSN=YOUR_DSN_HERE,SENTRY_DSN=YOUR_DSN_HERE,SENTRY_ENV=production
(Both SENTRY_DSN and NEXT_PUBLIC_SENTRY_DSN must be set — the public one ships to the browser, the server one stays on the backend. They can be the same value.)
- Optional but recommended: for readable stack traces, add SENTRY_AUTH_TOKEN:
  - Sentry → Settings → Account → API → Auth Tokens → Create Token (scope: project:releases)
  - Add it to GitHub Actions secrets as SENTRY_AUTH_TOKEN
  - Source maps upload automatically on every deploy (already wired in next.config.ts)
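A common slip in the web setup is leaving YOUR_DSN_HERE in place or setting only one of the two DSN variables. The sketch below shows one way to sanity-check the values before deploying. The regular expression is an assumption based only on the DSN shape shown in the steps above (it is not Sentry's own parser), and parseDsn and checkSentryEnv are hypothetical helper names.

```typescript
interface ParsedDsn {
  publicKey: string;
  host: string;
  projectId: string;
}

// Assumed shape, taken from the example above: https://<publicKey>@<host>/<projectId>
const DSN_PATTERN = /^https:\/\/([^@\/\s]+)@([^\/\s]+)\/(\d+)$/;

// Returns the DSN parts, or null when the string does not match the shape.
function parseDsn(dsn: string): ParsedDsn | null {
  const match = DSN_PATTERN.exec(dsn.trim());
  if (match === null) return null;
  return { publicKey: match[1], host: match[2], projectId: match[3] };
}

// Lists problems with the three env vars this guide sets on Cloud Run.
function checkSentryEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const key of ["SENTRY_DSN", "NEXT_PUBLIC_SENTRY_DSN"]) {
    const value = env[key];
    if (value === undefined || parseDsn(value) === null) {
      problems.push(key + " is missing or does not look like a DSN");
    }
  }
  if (!env["SENTRY_ENV"]) {
    problems.push("SENTRY_ENV is not set");
  }
  return problems;
}
```

An empty array from checkSentryEnv means all three variables are present and both DSNs have the expected shape; it does not prove the DSN belongs to the right project.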
Mobile setup (10 min)
- Same Sentry org, create another project:
  - Platform: React Native
  - Project name: ka26-mobile
- Copy the DSN
- Add to mobile/app.json under expo.extra:
{
"expo": {
"extra": {
"EXPO_PUBLIC_SENTRY_DSN": "YOUR_MOBILE_DSN_HERE"
}
}
}
Or use an EAS secret if you build via EAS.
- Rebuild the APK. Sentry boots on next launch.
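This guide names the mobile DSN key two ways: expo.extra.sentryDsn in the status table and EXPO_PUBLIC_SENTRY_DSN in the snippet above. Which key the app actually reads should be confirmed in the mobile code. Until then, a lookup that tolerates either key could look like the sketch below; resolveMobileDsn is a hypothetical helper, and the extra argument stands in for whatever object the app receives from its Expo config.

```typescript
// The expo.extra object, typed loosely because its contents are app-defined.
type ExpoExtra = Record<string, unknown>;

// Both key names appear in this guide; the order of preference is an assumption.
const DSN_KEYS = ["sentryDsn", "EXPO_PUBLIC_SENTRY_DSN"];

// Returns the first value that looks like a DSN, or null if none is usable.
function resolveMobileDsn(extra: ExpoExtra | undefined): string | null {
  if (extra === undefined) return null;
  for (const key of DSN_KEYS) {
    const value = extra[key];
    // Skips non-strings, empty strings and the YOUR_MOBILE_DSN_HERE placeholder.
    if (typeof value === "string" && value.indexOf("https://") === 0) {
      return value;
    }
  }
  return null;
}
```

A null result is a signal to skip Sentry initialization rather than start it with a bad DSN.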
What you'll see
- Issues feed — every JS error grouped by type
- Performance — slow transactions, N+1 query patterns
- Release health — crash-free user % (target 99.5%+)
- Email alert within 5 min of new error type appearing
Layer 3 — GitHub Actions hourly health cron (already deployed ✓)
What it does
Runs at :17 past every hour:
- Calls GET https://ka26.shop/api/health?key=ka26-health-2026 and fails if status is error
- Runs the full tests/e2e-smoke.test.ts suite against production to verify critical pages + APIs
- On failure: emails the team via Gmail SMTP
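The first step above can be sketched as a pure function that decides whether a health response should fail the run. This is not the deployed workflow's code. The only rule taken from this guide is "fail if status is error"; failing on non-JSON bodies, a missing status field, or a non-2xx HTTP code are extra-strict assumptions, and evaluateHealthResponse is a hypothetical name.

```typescript
interface GateResult {
  pass: boolean;
  reason: string;
}

// Decides pass/fail from the HTTP status code and raw response body.
function evaluateHealthResponse(httpStatus: number, bodyText: string): GateResult {
  let body: unknown;
  try {
    body = JSON.parse(bodyText);
  } catch {
    // Assumption: an HTML error page or empty body counts as a failure.
    return { pass: false, reason: "response body is not valid JSON" };
  }

  const reported =
    typeof body === "object" && body !== null
      ? (body as { status?: unknown }).status
      : undefined;

  if (typeof reported !== "string") {
    return { pass: false, reason: "response has no status field" };
  }
  if (reported === "error") {
    // The rule stated in this guide.
    return { pass: false, reason: "health endpoint reported status error" };
  }
  if (httpStatus < 200 || httpStatus >= 300) {
    // Assumption: any non-2xx code fails the run even without status error.
    return { pass: false, reason: "unexpected HTTP status " + httpStatus };
  }
  return { pass: true, reason: "status is " + reported };
}
```

With these rules a degraded response served with HTTP 200 passes the gate, which matches the behaviour described for warnings in this guide.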
What you need to configure
Email alerts use the existing SMTP credentials. Add these GitHub secrets at github.com/sidgk/ka26-marketplace/settings/secrets/actions:
| Secret | Value |
|---|---|
| SMTP_USER | noreply@ka-26.com |
| SMTP_PASS | (the App Password we created, same one in GCP Secret Manager) |
| ALERT_TO | siddugkattimani@gmail.com (and any team emails) |
Without these the workflow still runs and fails on health issues — but the email alert is skipped. Set them so you get notified.
Manual trigger
Anytime you want to verify production: GitHub → Actions → "Production Health Check (hourly)" → Run workflow.
Bonus — what the existing /api/health endpoint already monitors
These 7 checks run on every request to /api/health:
- Database: SELECT 1 round-trip
- Critical pages: fetches /, /shop, /reels, /requests, /profile
- Auth integrity: verifies admin user exists with correct ID
- Reel data integrity: 5 most recent reels have valid data
- Route integrity: product detail routes resolve correctly
- Order system: at least one active store + restaurant exists
- WhatsApp links: admin user's WhatsApp number is non-empty
If any FAILS → endpoint returns status: "error" with a 5xx-class HTTP status, so UptimeRobot marks the monitor as down.
If any WARNS → returns status: "degraded" (still 200).
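The fail/warn rule above can be sketched as a small aggregation function. This is an illustration, not the endpoint's actual code: the type names, the "ok" value for the all-clear case, and the exact 500 code are assumptions (this guide only names "error" and "degraded" and describes the failure code loosely).

```typescript
type CheckOutcome = "pass" | "warn" | "fail";

interface CheckResult {
  name: string;
  outcome: CheckOutcome;
}

interface HealthSummary {
  status: "ok" | "degraded" | "error"; // "ok" is an assumed name
  httpStatus: number;
  flagged: string[]; // names of the checks that caused the status
}

// Any failure wins over any warning; warnings still return HTTP 200.
function summarizeChecks(results: CheckResult[]): HealthSummary {
  const failing = results.filter((r) => r.outcome === "fail").map((r) => r.name);
  const warning = results.filter((r) => r.outcome === "warn").map((r) => r.name);

  if (failing.length > 0) {
    // 500 is an assumption standing in for the guide's failure status.
    return { status: "error", httpStatus: 500, flagged: failing };
  }
  if (warning.length > 0) {
    return { status: "degraded", httpStatus: 200, flagged: warning };
  }
  return { status: "ok", httpStatus: 200, flagged: [] };
}
```

Returning the flagged check names alongside the status makes it quicker to see which of the 7 checks tripped without opening logs.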
What to do when an alert fires
- UptimeRobot says down → check Cloud Run console, look at recent revision deploys
- Sentry says new error type → click into the issue, see the stack trace + breadcrumbs
- Health cron fails → open the GitHub Actions run, see which check failed
- All 3 fire at once → roll back to the previous Cloud Run revision:
# List revisions
gcloud run revisions list --service ka26-marketplace --region us-central1 --project school-mgmt-saas
# Roll back traffic to a known-good one
gcloud run services update-traffic ka26-marketplace \
--region us-central1 --project school-mgmt-saas \
--to-revisions ka26-marketplace-00XXX-yyy=100
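The runbook above can also be read as a small decision table. The sketch below encodes it. The handling of two simultaneous alerts (listing both individual steps) is an assumption, since this guide only covers single alerts and all three at once, and nextSteps and AlertState are hypothetical names.

```typescript
interface AlertState {
  uptimeRobotDown: boolean;
  sentryNewError: boolean;
  healthCronFailed: boolean;
}

// Maps the currently firing alerts to the first response steps from this guide.
function nextSteps(alerts: AlertState): string[] {
  if (alerts.uptimeRobotDown && alerts.sentryNewError && alerts.healthCronFailed) {
    // All three layers agree something is badly wrong: roll back first.
    return ["roll back to the previous Cloud Run revision"];
  }
  const steps: string[] = [];
  if (alerts.uptimeRobotDown) {
    steps.push("check the Cloud Run console and recent revision deploys");
  }
  if (alerts.sentryNewError) {
    steps.push("open the Sentry issue and read the stack trace and breadcrumbs");
  }
  if (alerts.healthCronFailed) {
    steps.push("open the GitHub Actions run and see which check failed");
  }
  return steps;
}
```

An empty array means nothing is firing and no action is needed.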
Monitoring philosophy
The 3 layers are defense in depth:
- UptimeRobot catches what Sentry can't (full outage → no JS to error-report)
- Sentry catches what UptimeRobot can't (200 OK page that's actually broken inside)
- Health cron catches what both miss (specific user flow regressions)
If any single layer were perfect, we wouldn't need the others. Together they catch ~95% of issues before users see them.