# 🎯 Funday Platform Roadmap & Action Plan
Last updated: 2026-05-20
Status: Living document — updated as phases complete
Audience: Humans and AI agents working on the Funday platform
This is the prioritized, actionable roadmap for the Funday Gaming Platform. Items are ordered by criticality — infrastructure stability first, then developer experience, then growth.
Legend:
- 🔴 P0 — Critical (broken, broken-adjacent, or data-loss risk)
- 🟠 P1 — High (reliability, security, or major DX gap)
- 🟡 P2 — Medium (quality, performance, or completeness)
- 🟢 P3 — Low (nice-to-have, polish, future-proofing)
## 🔴 P0 — Critical
### 0.1 Nakama Health Endpoint Returning HTTP Error
Symptom: `curl https://funday.gg/v2/healthcheck` exits with code 22, the code curl reports for an HTTP response of 400 or above when run with `-f`/`--fail`.
Impact: External health probes, monitoring, and any client-side health checks fail.
Diagnosis path:

    # Check what Nakama returns directly
    sudo k3s kubectl exec -n funday-platform deploy/nakama -- curl -s http://localhost:7350/v2/healthcheck
    # Check Traefik routing
    sudo k3s kubectl get ingress -n funday-platform
    sudo k3s kubectl describe ingress nakama -n funday-platform
    # Check nginx → Traefik path
    curl -v https://funday.gg/v2/healthcheck 2>&1 | grep -E "< HTTP|{|}"

Fix vectors:
- Verify Nakama pod is serving `/v2/healthcheck` on port 7350
- Verify Traefik ingress routes `/v2` to Nakama service on correct port
- Verify nginx `/v2` location proxies to Traefik:32443 with correct headers
- Check if Nakama is returning non-200 (e.g., 503 if DB connection is down)
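
The fix vectors above come down to reading the HTTP status and deciding which hop to inspect first. A minimal TypeScript sketch of that triage follows. The function name `classifyHealthStatus` and the status-to-layer mapping are illustrative assumptions based on common proxy behavior, not confirmed behavior of this deployment.

```typescript
// Hypothetical helper: maps an HTTP status from /v2/healthcheck to the layer worth checking first.
// The mapping is an assumption drawn from typical proxy behavior, not from Funday's config.
type HealthHint =
  | "healthy"
  | "check-routing"
  | "check-upstream"
  | "check-nakama-deps"
  | "unknown";

function classifyHealthStatus(status: number): HealthHint {
  if (status >= 200 && status < 300) return "healthy";
  // 404 usually means a proxy or ingress forwarded a path the backend does not serve.
  if (status === 404) return "check-routing";
  // 502/504 usually mean nginx or Traefik could not reach the upstream service.
  if (status === 502 || status === 504) return "check-upstream";
  // 503 may come from Nakama itself, e.g. when its DB connection is down.
  if (status === 503) return "check-nakama-deps";
  return "unknown";
}
```

The hint only narrows where to look; the `kubectl` and `curl -v` commands above remain the source of truth.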
### 0.2 Hermes Discord Gateway Offline
Symptom: hermes-gateway.service is inactive.
Impact: No Discord bot, no agent interaction via Discord.
Diagnosis path:

    sudo systemctl status hermes-gateway
    sudo journalctl -u hermes-gateway --no-pager -n 50

Fix vectors:
- Check if the service crashed — look for OOM, config errors, or token expiry
- Verify Discord bot token in Hermes config
- Restart: `sudo systemctl restart hermes-gateway`
- If config changed, run `sudo /home/usr/.local/bin/hermes gateway restart`
### 0.3 Database Connection Verification
Risk: Nakama could be connected to the stale funday-platform/funday DB (~6k users) instead of the real postgresql/nakama DB (19k+ users).
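Verify:

    sudo k3s kubectl exec -n postgresql deploy/postgres -- psql -U nakama -d nakama -c "SELECT COUNT(*) FROM users;"

Expected: 19,000+. This query runs against the `postgresql/nakama` database directly, so it confirms that database holds the full user set; also confirm that the database address in Nakama's configuration points at it. If Nakama reports ~6k users instead, it is pointed at the wrong DB and an immediate fix is required.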
## 🟠 P1 — High
### 1.1 Nakama Module Build & Deploy Pipeline
Current state: Nakama TS modules are built via nakama-modules/ esbuild bundle, then deployed via K8s rollout. No automated CI.
Gap: Manual build → manual deploy. Easy to forget to rebuild after TS changes.
Plan:
- Add a pre-deploy check: `cd nakama-modules && npm run build` must succeed before any Nakama rollout
- Add a verification step: after rollout, confirm match handlers are registered via `/v2/rpc/healthcheck`
- Document the full deploy sequence in `docs/runbooks/nakama-deploy.md`
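
The build, rollout, and verify sequence above can be sketched as a gate that stops at the first failing step. This is a minimal illustration under stated assumptions: `DeployStep`, `DeployResult`, and `runDeploy` are hypothetical names, and the steps are injected as plain functions rather than real `npm` or `kubectl` calls.

```typescript
// Hypothetical step runner: each step is injected so the sketch stays independent of real tooling.
interface DeployStep {
  name: string;
  run: () => boolean; // returns true on success
}

interface DeployResult {
  ok: boolean;
  completed: string[];
  failedAt?: string;
}

// Runs steps in order and stops at the first failure, so a failed build never reaches rollout.
function runDeploy(steps: DeployStep[]): DeployResult {
  const completed: string[] = [];
  for (const step of steps) {
    if (!step.run()) {
      return { ok: false, completed, failedAt: step.name };
    }
    completed.push(step.name);
  }
  return { ok: true, completed };
}

// Example wiring with a simulated rollout failure; real steps would shell out to npm and kubectl.
const result = runDeploy([
  { name: "build", run: () => true },
  { name: "rollout", run: () => false },
  { name: "verify", run: () => true },
]);
console.log(result.failedAt); // prints "rollout"
```

Keeping the steps injectable means the same gate could back a `make` target or a future CI job without changing its logic.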
### 1.2 Game Completion Matrix — Partial Games Triage
Current state: 11 partial games, 27 frontend-only, 8 scaffolds.
Gap: 11 partial games are production-registered but incomplete. Players may hit broken experiences.
Plan:
- For each partial game, decide: (a) complete it, (b) set `status: "development"` to hide from catalog, or (c) archive to `games/_archive/`
- Priority order: games with `funday-plugin.json` but no entry HTML are the riskiest (registered but unplayable)
- Update `docs/game-completion-matrix.md` after each decision
Partial games to triage:
| Game | Has Plugin | Has Entry | Has Server | Action needed |
|---|---|---|---|---|
| bombergang | ✅ | ❌ | ✅ | Add entry HTML or hide |
| crazycattle | ✅ | ❌ | ❌ | Hide or archive |
| flappaz | ✅ | ❌ | ✅ | Add entry HTML or hide |
| freeciv-web | ✅ | ❌ | ❌ | Hide or archive |
| hwtycoon | ✅ | ❌ | ❌ | Hide or archive |
| mahjong | ✅ | ❌ | ✅ | Add entry HTML or hide |
| pipes | ✅ | ❌ | ✅ | Add entry HTML or hide |
| quest | ✅ | ❌ | ✅ | Add entry HTML or hide |
| settlers | ✅ | ❌ | ✅ | Add entry HTML or hide |
| spaceball | ✅ | ❌ | ❌ | Hide or archive |
| splix | ✅ | ❌ | ❌ | Hide or archive |
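
The "Action needed" column follows a simple rule: a registered game without entry HTML gets "Add entry HTML or hide" when server code exists, and "Hide or archive" otherwise. A minimal sketch of that rule is below. The `PartialGame` type and `triage` function are illustrative names, and the flags would need to be derived from the repository layout.

```typescript
// Flags mirror the table columns; the type and function names are illustrative.
interface PartialGame {
  id: string;
  hasPlugin: boolean;
  hasEntry: boolean;
  hasServer: boolean;
}

type TriageAction = "add-entry-or-hide" | "hide-or-archive" | "no-action";

function triage(game: PartialGame): TriageAction {
  // Only registered games without entry HTML are in scope for this triage.
  if (!game.hasPlugin || game.hasEntry) return "no-action";
  // Server code exists, so finishing the game is plausible: add entry HTML or hide it meanwhile.
  return game.hasServer ? "add-entry-or-hide" : "hide-or-archive";
}
```

For example, `triage({ id: "bombergang", hasPlugin: true, hasEntry: false, hasServer: true })` yields `"add-entry-or-hide"`, matching the table row.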
### 1.3 Multiplayer E2E Coverage
Current state: Only Frogger has a solo regression spec. No multiplayer E2E for any game.
Gap: Multiplayer bugs (match join, state sync, disconnect) are only caught manually.
Plan:
- Phase 1: Write Playwright spec for Connect4 (simplest 2P game) — two browser contexts, create match, play a round
- Phase 2: Write PvP smoke for Battle Cards (already has `card_battle` handler)
- Phase 3: Write 2P smoke for Frogger (clear the “at-risk” flag in the playability audit)
- Phase 4: Add Scribblaz multiplayer spec
Spec template:

    // frontend/tests/connect4.multiplayer.spec.ts
    import { test, expect } from '@playwright/test';

    test('Connect4: two players can play a full game', async ({ browser }) => {
      // Create two independent browser contexts (two players)
      const ctx1 = await browser.newContext();
      const ctx2 = await browser.newContext();
      const p1 = await ctx1.newPage();
      const p2 = await ctx2.newPage();

      // Player 1 hosts, Player 2 joins
      // ... match flow assertions

      // Close both contexts so sessions do not leak between tests
      await ctx1.close();
      await ctx2.close();
    });

### 1.4 Nakama Match Handler Audit
Current state: 27 match handlers registered. Some may be orphaned (code exists but not wired) or have silent bugs.
Plan:
- For each handler in `nakama-modules/index.ts`, verify the corresponding `games/{id}/server/match_handler.ts` exists and exports correctly
- Test each handler’s `matchInit` with empty state to catch runtime crashes
- Document handler → game mapping in `docs/nakama-handlers.md`
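
The `matchInit` smoke test from the plan could look roughly like the sketch below. It uses simplified stand-ins for the Nakama runtime types (the real handlers receive `ctx`, `logger`, `nk`, and `params`), and `HandlerEntry` and `findCrashingHandlers` are hypothetical names. Handlers that call into `nk` during init would need a richer stub than the empty object used here.

```typescript
// Simplified stand-in for a Nakama matchInit function; not the real nkruntime types.
type MatchInit = (
  ctx: unknown,
  logger: unknown,
  nk: unknown,
  params: Record<string, string>,
) => { state: unknown; tickRate: number; label: string };

interface HandlerEntry {
  name: string;
  matchInit: MatchInit;
}

// No-op logger so handlers that log during init do not fail on the stub itself.
const stubLogger = { debug() {}, info() {}, warn() {}, error() {} };

// Calls each matchInit with empty params and collects the handlers that throw.
function findCrashingHandlers(handlers: HandlerEntry[]): { name: string; error: string }[] {
  const failures: { name: string; error: string }[] = [];
  for (const h of handlers) {
    try {
      h.matchInit({}, stubLogger, {}, {});
    } catch (err) {
      failures.push({ name: h.name, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return failures;
}
```

An empty result means every handler survived initialization with empty input; it does not prove the match loop itself is correct.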
### 1.5 Monitoring & Alerting Gaps
Current state: Grafana + Prometheus running. Dashboards exist.
Gap: No alerts for critical failures (Nakama crash, frontend 5xx, DB connection loss).
Plan:
- Add Prometheus alert rule: Nakama pod restarts > 3 in 10 minutes
- Add Prometheus alert rule: Frontend 5xx rate > 1% over 5 minutes
- Add Prometheus alert rule: PostgreSQL connection failures
- Verify Alertmanager webhook is configured and firing
- Add Grafana dashboard panel: active match count per game
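
To make the first two thresholds concrete, the sketch below expresses them as plain TypeScript predicates. In production these would be Prometheus alert rules; the function names and parameter shapes here are assumptions chosen only to pin down the boundary conditions (strictly more than 3 restarts, strictly above 1%).

```typescript
// True when more than `maxRestarts` restarts fall inside the trailing window.
function restartAlert(
  restartTimesMs: number[],
  nowMs: number,
  windowMs: number = 10 * 60_000,
  maxRestarts: number = 3,
): boolean {
  const recent = restartTimesMs.filter((t) => t > nowMs - windowMs && t <= nowMs);
  return recent.length > maxRestarts;
}

// True when the share of 5xx responses in the sampled window exceeds `maxRate`.
function fiveXxAlert(totalRequests: number, errorResponses: number, maxRate: number = 0.01): boolean {
  return totalRequests > 0 && errorResponses / totalRequests > maxRate;
}
```

With these definitions, exactly 3 restarts in 10 minutes or exactly 1% errors does not fire; the rule text should state whichever boundary is intended.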
## 🟡 P2 — Medium
### 2.1 Frontend — Residual Type Safety
Current state: svelte-check = 0 errors, 0 warnings. Excellent.
Remaining debt:
- Eliminate remaining `any` types in game bridge code (Frogger `platformSocket: any`, `hostUpdate: (partial: any)`)
- Add strict `tsconfig.json` flags: `strict: true`, `noUncheckedIndexedAccess: true`
- Add return-type annotations to all BFF route handlers
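
One way the `any` types above could be replaced is sketched below. The `FroggerState` and `PlatformSocket` shapes are hypothetical, since the real bridge types are not shown in this document, and opcode `1` is a placeholder.

```typescript
// Hypothetical shapes: the real Frogger state and socket have more fields.
interface FroggerState {
  score: number;
  lives: number;
  lanes: number[];
}

// Narrow interface covering only what the bridge uses, instead of `platformSocket: any`.
interface PlatformSocket {
  send(opCode: number, payload: string): void;
}

// Partial<FroggerState> replaces `(partial: any)` and rejects unknown keys at compile time.
function hostUpdate(state: FroggerState, partial: Partial<FroggerState>): FroggerState {
  return { ...state, ...partial };
}

// With noUncheckedIndexedAccess, lanes[i] is typed number | undefined, so it must be guarded.
function laneSpeed(state: FroggerState, i: number): number {
  return state.lanes[i] ?? 0;
}

// Opcode 1 is a placeholder, not a real Funday opcode.
function sendScore(socket: PlatformSocket, state: FroggerState): void {
  socket.send(1, JSON.stringify({ score: state.score }));
}
```

Typing the socket by the methods actually used keeps the bridge testable with a small fake, without depending on the full client class.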
### 2.2 Frogger Refactor (FG1 — Monolithic Component)
Current state: `FroggerGame.svelte` is over 1,700 lines and mixes render, input, socket, solo loop, and UI.
Plan (incremental):
- Extract `canvasEngine.ts` — pure canvas drawing functions
- Extract `inputHandler.ts` — keyboard/repeat key logic
- Extract `soloTick.ts` — solo game loop as pure functions
- Extract `nakamaHandlers.ts` — socket message handlers
- Thin `FroggerGame.svelte` becomes orchestrator only (~300 lines)
Each extraction is a separate PR with E2E gate.
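
As an illustration of what "solo game loop as pure functions" could mean for `soloTick.ts`, here is a minimal sketch. The state fields, the `Move` type, and the grid parameters are assumptions, not the component's real names.

```typescript
// Hypothetical solo state; field names are assumptions, not the component's real ones.
interface SoloState {
  frogX: number;
  frogY: number;
  elapsedMs: number;
}

type Move = "up" | "down" | "left" | "right" | "none";

// Pure tick: returns a new state and never touches the DOM, canvas, or socket.
function soloTick(state: SoloState, move: Move, dtMs: number, cols: number, rows: number): SoloState {
  const dx = move === "left" ? -1 : move === "right" ? 1 : 0;
  const dy = move === "up" ? -1 : move === "down" ? 1 : 0;
  return {
    // Clamp to the grid so the frog cannot leave the playfield.
    frogX: Math.min(cols - 1, Math.max(0, state.frogX + dx)),
    frogY: Math.min(rows - 1, Math.max(0, state.frogY + dy)),
    elapsedMs: state.elapsedMs + dtMs,
  };
}
```

Because the function takes and returns plain data, it can be unit tested without a browser, which suits the per-PR E2E gate as a faster first check.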
### 2.3 Battle Cards — Matchmaking & Reconnect
Current state: PvP works via manual match ID only. No matchmaker query. No reconnect.
Plan:
- Wire `listMatches` / matchmaker for `game: battle-cards` labels (BC-N1)
- Add reconnect logic: on socket disconnect, attempt to rejoin match with stored match ID (BC-N2)
- Add spectator mode (BC-N2)
- Clean up legacy leaderboard IDs (`card-battle-scores` vs `battle_cards_*`) (BC-N4)
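
The reconnect item (BC-N2) could be approached with a capped exponential backoff around a rejoin call. The sketch below is one possible shape, not the planned implementation: `reconnectDelayMs` and `rejoinWithRetry` are hypothetical names, the delay values are placeholders, and `joinMatch` is injected so that the sketch does not assume any particular client API.

```typescript
// Capped exponential backoff; base and cap values are placeholders.
function reconnectDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// `joinMatch` is injected; in the client it would wrap the socket's join call.
async function rejoinWithRetry(
  matchId: string,
  joinMatch: (id: string) => Promise<void>,
  maxAttempts: number = 5,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((r) => setTimeout(r, ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await joinMatch(matchId);
      return true; // rejoined with the stored match ID
    } catch {
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) {
        await sleep(reconnectDelayMs(attempt));
      }
    }
  }
  return false; // caller should fall back to the lobby
}
```

Injecting `sleep` as well keeps the retry loop fast to test, since a test can pass a function that resolves immediately.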
### 2.4 Game Starter Template
Current state: A starter template exists (Svelte 5 + Threlte) but may have drift.
Plan:
- Verify starter template builds clean with current SvelteKit + Vite versions
- Add CI check: `cd games/_starter && npm run build` on every frontend dependency bump
- Document the “create a new game” workflow in `docs/games/creating-a-game.md`
### 2.5 Documentation — Runbooks
Current state: Architecture docs exist. Operational runbooks are sparse.
Plan: Create the following runbooks in docs/runbooks/:
| Runbook | Content |
|---|---|
| `nakama-deploy.md` | Build → bundle → rollout → verify sequence |
| `frontend-deploy.md` | Atomic build → restart → health check |
| `db-recovery.md` | PostgreSQL backup restore procedure |
| `game-add.md` | End-to-end: new game from scaffold to production |
| `game-archive.md` | Moving a game to `_archive/` safely |
| `incident-response.md` | Who to page, what to check, how to rollback |
### 2.6 Accessibility — Residual Warnings
Current state: Core shell is clean. Dev tooling (theme generator, kanboard) has minor a11y warnings.
Plan:
- Audit `themegenerator/Preview.svelte` — add missing `aria-label`s
- Audit dev kanboard routes — ensure icon buttons have accessible names
- Run `npm run check` after fixes → target 0 warnings
## 🟢 P3 — Low
### 3.1 Hugo Docs — Content Expansion
Current state: Basic structure exists (architecture, infrastructure, agents, games, runbooks).
Plan:
- Add `docs/architecture/data-flow.md` — detailed request/response diagrams
- Add `docs/architecture/auth-flow.md` — guest-first auth sequence diagram
- Add `docs/games/bridge-protocol.md` — complete bridge message reference
- Add `docs/games/integration-types.md` — svelte-component vs iframe vs dedicated-server
- Add `docs/infrastructure/network-map.md` — full K8s service topology
- Add `docs/infrastructure/tls.md` — certificate management, renewal procedure
### 3.2 Frontend — Performance Pass
Current state: No major performance issues reported.
Plan:
- Add Lighthouse CI to frontend build (run locally, not in GitHub)
- Audit game card images — ensure lazy loading + correct `aspect-ratio`
- Audit bundle size and identify large deps; `vite build --report` is not a documented Vite flag, so generate the report with a bundle analyzer plugin such as `rollup-plugin-visualizer`
- Add `prefers-reduced-motion` guards to any remaining animations
### 3.3 Developer Experience
Plan:
- Add `make` targets for common operations:

      make build          # atomic frontend build
      make deploy         # build + restart frontend
      make nakama-build   # build Nakama TS modules
      make nakama-deploy  # build + rollout Nakama
      make check          # svelte-check + typecheck
      make test           # Playwright E2E

- Add `.env.example` validation — script that checks all required env vars are set
- Add pre-commit hook: `svelte-check` on staged files
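
The `.env.example` validation above could be built around two small functions like the ones sketched here. The names `requiredVars` and `missingVars` are hypothetical, and the parser assumes simple `NAME=value` lines; `export` prefixes and quoted multi-line values are not handled.

```typescript
// Extracts variable names from .env.example text, skipping comments and blank lines.
function requiredVars(exampleText: string): string[] {
  return exampleText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => (line.split("=")[0] ?? "").trim())
    .filter((name) => name !== "");
}

// Returns the names that are unset or empty in the given environment.
function missingVars(exampleText: string, env: Record<string, string | undefined>): string[] {
  return requiredVars(exampleText).filter((name) => !env[name]);
}
```

A script could read `.env.example` from disk, pass `process.env` as the second argument, and exit non-zero when the returned list is not empty.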
### 3.4 Game Polish Backlog
Per-game items from forensic audits:
| Game | Item | Effort |
|---|---|---|
| Frogger | Lane speed feel vs BASE_LANES constants | Low |
| Frogger | Delete games/frogger/GURU/obsolete/*.js | Low |
| Battle Cards | Guest session continuity (new device ID per client) | Medium |
| Scribblaz | Dedicated Playwright E2E spec | Medium |
| Pebble | Thumbnail asset path audit (.png vs .svg) | Low |
### 3.5 Infrastructure — Certificate Management
Current state: Single Let’s Encrypt secret funday-tls-cert.
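Plan:
- Verify auto-renewal is working: `sudo certbot certificates` lists current expiry dates, and `sudo certbot renew --dry-run` exercises the renewal path without replacing the certificate
- Add monitoring alert: cert expiry < 30 days
- Document renewal procedure in `docs/runbooks/tls-renewal.md`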
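
The 30-day expiry threshold can be pinned down with a small date calculation. The sketch below assumes the certificate's `notAfter` date has already been obtained by some other means; `daysUntilExpiry` and `certNeedsAlert` are illustrative names.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days until the certificate's notAfter date; negative if already expired.
function daysUntilExpiry(notAfter: Date, now: Date): number {
  return Math.floor((notAfter.getTime() - now.getTime()) / MS_PER_DAY);
}

// Threshold from the plan: alert when fewer than 30 days remain.
function certNeedsAlert(notAfter: Date, now: Date, thresholdDays: number = 30): boolean {
  return daysUntilExpiry(notAfter, now) < thresholdDays;
}
```

Let's Encrypt certificates are valid for 90 days, so a 30-day threshold should only fire if renewal has been failing for some time.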
### 3.6 Infrastructure — Backup Verification
Current state: PostgreSQL backup job exists (Completed pod visible).
Plan:
- Verify backup integrity: restore to a temp DB and run `SELECT COUNT(*)`
- Document RTO/RPO expectations
- Add alert: backup job fails or hasn’t run in 25 hours
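
The backup alert condition above combines two checks: the latest run failed, or the latest run is older than 25 hours. A minimal sketch of that predicate follows. The `BackupRun` shape is an assumption, since how job history is exposed is not described here.

```typescript
// Hypothetical record of one backup job run.
interface BackupRun {
  finishedAt: Date;
  succeeded: boolean;
}

const MAX_AGE_HOURS = 25;
const MS_PER_HOUR = 60 * 60 * 1000;

// Alerts if there is no run at all, the latest run failed, or the latest run is older than 25 hours.
function backupNeedsAlert(runs: BackupRun[], now: Date): boolean {
  if (runs.length === 0) return true;
  const latest = runs.reduce((a, b) => (b.finishedAt.getTime() > a.finishedAt.getTime() ? b : a));
  if (!latest.succeeded) return true;
  const ageHours = (now.getTime() - latest.finishedAt.getTime()) / MS_PER_HOUR;
  return ageHours > MAX_AGE_HOURS;
}
```

The 25-hour window leaves an hour of slack over a daily schedule, so a slightly late run does not page anyone.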
## 📊 Current Platform Health Snapshot
| Component | Status | Notes |
|---|---|---|
| Frontend (SvelteKit) | ✅ Running | funday-frontend.service active, 0 errors |
| Nakama | ⚠️ Degraded | Pod running but /v2/healthcheck returns HTTP error |
| PostgreSQL | ✅ Running | postgresql namespace, 19k+ users |
| Redis | ✅ Running | funday-platform namespace |
| Traefik | ✅ Running | kube-system namespace |
| nginx | ✅ Running | systemd, TLS termination |
| Grafana/Prometheus | ✅ Running | monitoring namespace |
| Hermes Discord | ❌ Offline | hermes-gateway.service inactive |
| Hugo Docs | ✅ Running | hugo-docs.service active |
| Game Pods | ✅ Running | agar, bombergang, micro-racing, splix, zcripple all healthy |
## 📋 Execution Order
Immediate (this session):

1. Diagnose and fix Nakama healthcheck (P0.1)
2. Diagnose and fix Hermes gateway (P0.2)
3. Verify DB connection (P0.3)

This week:

4. Partial games triage (P1.2)
5. Nakama deploy pipeline documentation (P1.1)
6. Monitoring alerts (P1.5)

This month:

7. Multiplayer E2E specs (P1.3)
8. Nakama handler audit (P1.4)
9. Frogger refactor phase 1 (P2.2)
10. Runbook creation (P2.5)

Backlog:

11. Hugo docs expansion (P3.1)
12. Performance pass (P3.2)
13. DX improvements (P3.3)
14. Game polish items (P3.4)
15. Certificate monitoring (P3.5)
16. Backup verification (P3.6)
## 🔄 How to Update This Plan
When completing an item:
- Move the completed item to the “Completed” section at the bottom
- Add a one-line changelog entry with date
- Re-prioritize remaining items if needed
- Rebuild Hugo docs: `cd /home/usr/funday/dev/docs && hugo`
## ✅ Completed
- 2026-05-20 — P0.1: Fixed Nakama healthcheck (root cause: Traefik path prefix strip)
- 2026-05-20 — P0.2: Restarted Hermes gateway (expired Discord token)