🎯 Funday Platform Roadmap & Action Plan

Last updated: 2026-05-20
Status: Living document — updated as phases complete
Audience: Humans and AI agents working on the Funday platform

This is the prioritized, actionable roadmap for the Funday Gaming Platform. Items are ordered by criticality — infrastructure stability first, then developer experience, then growth.

Legend:

  • 🔴 P0 — Critical (broken, broken-adjacent, or data-loss risk)
  • 🟠 P1 — High (reliability, security, or major DX gap)
  • 🟡 P2 — Medium (quality, performance, or completeness)
  • 🟢 P3 — Low (nice-to-have, polish, future-proofing)

🔴 P0 — Critical

0.1 Nakama Health Endpoint Returning HTTP Error

Symptom: curl -f https://funday.gg/v2/healthcheck exits with code 22 (HTTP status 400 or above; curl only returns 22 when -f/--fail is set).
Impact: External health probes, monitoring, and any client-side health checks fail.
Diagnosis path:

# Check what Nakama returns directly
sudo k3s kubectl exec -n funday-platform deploy/nakama -- curl -s http://localhost:7350/v2/healthcheck

# Check Traefik routing
sudo k3s kubectl get ingress -n funday-platform
sudo k3s kubectl describe ingress nakama -n funday-platform

# Check nginx → Traefik path
curl -v https://funday.gg/v2/healthcheck 2>&1 | grep -E '< HTTP|[{}]'

Fix vectors:

  1. Verify Nakama pod is serving /v2/healthcheck on port 7350
  2. Verify Traefik ingress routes /v2 to Nakama service on correct port
  3. Verify nginx /v2 location proxies to Traefik:32443 with correct headers
  4. Check if Nakama is returning non-200 (e.g., 503 if DB connection is down)
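
The fix vectors above walk the request path from the inside out. A minimal sketch of that localization logic follows, assuming each hop has already been probed and reduced to an HTTP status (or null when unreachable). The hop names and the firstFailingHop helper are illustrative and not part of the platform code.

```typescript
// Hypothetical helper: hop names and the result shape are assumptions for illustration.
type Hop = 'nakama-pod' | 'traefik-ingress' | 'nginx-edge';

interface ProbeResult {
  hop: Hop;
  status: number | null; // HTTP status, or null if the hop could not be reached
}

// Hops are checked innermost first, matching the order of the fix vectors.
const HOP_ORDER: readonly Hop[] = ['nakama-pod', 'traefik-ingress', 'nginx-edge'];

// Returns the innermost hop that is unreachable or answers outside 2xx,
// or null when every probed hop looks healthy.
function firstFailingHop(results: readonly ProbeResult[]): ProbeResult | null {
  for (const hop of HOP_ORDER) {
    const result = results.find((r) => r.hop === hop);
    if (!result) continue; // hop was not probed, so nothing can be concluded
    const ok = result.status !== null && result.status >= 200 && result.status < 300;
    if (!ok) return result;
  }
  return null;
}
```

If the pod itself answers 503, the routing layers are probably not the cause. A healthy pod behind a failing ingress points at the Traefik route instead.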

0.2 Hermes Discord Gateway Offline

Symptom: hermes-gateway.service is inactive.
Impact: No Discord bot, no agent interaction via Discord.
Diagnosis path:

sudo systemctl status hermes-gateway
sudo journalctl -u hermes-gateway --no-pager -n 50

Fix vectors:

  1. Check if the service crashed — look for OOM, config errors, or token expiry
  2. Verify Discord bot token in Hermes config
  3. Restart: sudo systemctl restart hermes-gateway
  4. If config changed, run sudo /home/usr/.local/bin/hermes gateway restart

0.3 Database Connection Verification

Risk: Nakama could be connected to the stale funday-platform/funday DB (~6k users) instead of the real postgresql/nakama DB (19k+ users).
Verify:

sudo k3s kubectl exec -n postgresql deploy/postgres -- psql -U nakama -d nakama -c "SELECT COUNT(*) FROM users;"

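Expected: 19,000+. This query runs against the postgresql/nakama DB directly, so it confirms that DB's size but not which DB Nakama is using. Also check the database address in the Nakama deployment (the --database.address setting). If Nakama resolves to the ~6k-user DB, it is pointed at the wrong DB and needs an immediate fix.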
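
Because the count query above runs against postgresql/nakama directly, it can be paired with a check of the address Nakama is actually configured with. The sketch below assumes the usual "user:password@host:port/dbname" form of Nakama's database address and a Kubernetes service host of the form service.namespace; the service names used with it are hypothetical.

```typescript
// Assumed address shape: "user:password@host:port/dbname".
// Any concrete host names passed in are hypothetical examples.
interface DbTarget {
  host: string;
  database: string;
}

function parseDatabaseAddress(address: string): DbTarget | null {
  const at = address.lastIndexOf('@');
  const hostAndDb = at >= 0 ? address.slice(at + 1) : address;
  const slash = hostAndDb.indexOf('/');
  if (slash < 0) return null; // no database name present
  const host = hostAndDb.slice(0, slash).split(':')[0] ?? '';
  const database = hostAndDb.slice(slash + 1).split('?')[0] ?? '';
  if (host === '' || database === '') return null;
  return { host, database };
}

// True when the address names the "nakama" database on a service in the
// "postgresql" namespace (host of the form <service>.postgresql[.svc...]).
function pointsAtRealDb(address: string): boolean {
  const target = parseDatabaseAddress(address);
  if (target === null) return false;
  const namespace = target.host.split('.')[1];
  return target.database === 'nakama' && namespace === 'postgresql';
}
```

A bare service name with no namespace label resolves inside Nakama's own namespace, so the check treats it as not pointing at the real DB.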


🟠 P1 — High

1.1 Nakama Module Build & Deploy Pipeline

Current state: Nakama TS modules are built via nakama-modules/ esbuild bundle, then deployed via K8s rollout. No automated CI.
Gap: Manual build → manual deploy. Easy to forget to rebuild after TS changes.
Plan:

  1. Add a pre-deploy check: cd nakama-modules && npm run build must succeed before any Nakama rollout
  2. Add a verification step: after rollout, confirm match handlers are registered via /v2/rpc/healthcheck
  3. Document the full deploy sequence in docs/runbooks/nakama-deploy.md
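
The plan above amounts to a gated sequence: build, then rollout, then verify, stopping at the first failure. A minimal sketch of such a gate follows. The Step shape and the runGated name are assumptions, and the real steps would shell out to npm run build, the K8s rollout, and the RPC check.

```typescript
// Each step is a named async action returning true on success.
// The step names used with this helper are illustrative only.
interface Step {
  name: string;
  run: () => Promise<boolean>;
}

interface PipelineResult {
  ok: boolean;
  completed: string[]; // steps that succeeded, in order
  failedAt: string | null;
}

// Runs steps in order and stops at the first failure, so a failed build
// can never be followed by a rollout.
async function runGated(steps: readonly Step[]): Promise<PipelineResult> {
  const completed: string[] = [];
  for (const step of steps) {
    let ok = false;
    try {
      ok = await step.run();
    } catch {
      ok = false; // a thrown error counts as a failed step
    }
    if (!ok) return { ok: false, completed, failedAt: step.name };
    completed.push(step.name);
  }
  return { ok: true, completed, failedAt: null };
}
```

Returning the list of completed steps makes it easy to log exactly how far a deploy got before it stopped.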

1.2 Game Completion Matrix — Partial Games Triage

Current state: 11 partial games, 27 frontend-only, 8 scaffolds.
Gap: 11 partial games are production-registered but incomplete. Players may hit broken experiences.
Plan:

  1. For each partial game, decide: (a) complete it, (b) set status: "development" to hide from catalog, or (c) archive to games/_archive/
  2. Priority order: games with funday-plugin.json but no entry HTML are the riskiest (registered but unplayable)
  3. Update docs/game-completion-matrix.md after each decision

Partial games to triage:

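| Game | Has Plugin | Has Entry | Has Server | Action needed |
| --- | --- | --- | --- | --- |
| bombergang | | | | Add entry HTML or hide |
| crazycattle | | | | Hide or archive |
| flappaz | | | | Add entry HTML or hide |
| freeciv-web | | | | Hide or archive |
| hwtycoon | | | | Hide or archive |
| mahjong | | | | Add entry HTML or hide |
| pipes | | | | Add entry HTML or hide |
| quest | | | | Add entry HTML or hide |
| settlers | | | | Add entry HTML or hide |
| spaceball | | | | Hide or archive |
| splix | | | | Hide or archive |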
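
The priority rule in the plan (a plugin manifest without entry HTML is the riskiest case) can be expressed as a small ordering function. This is a sketch under assumptions: the field names mirror the table columns, and treating a game without a plugin manifest as not player-visible is an inference from the plan, not a confirmed platform rule.

```typescript
// Fields mirror the table columns. Values passed in are inputs from the
// completion matrix; this sketch does not know the real per-game flags.
interface PartialGame {
  id: string;
  hasPlugin: boolean; // funday-plugin.json present
  hasEntry: boolean;  // entry HTML present
  hasServer: boolean; // server-side match handler present
}

// Lower number = triage sooner. Registered-but-unplayable games come first.
function triageRank(game: PartialGame): number {
  if (game.hasPlugin && !game.hasEntry) return 0; // registered but unplayable
  if (game.hasPlugin) return 1;                   // registered and loadable, but incomplete
  return 2;                                       // assumed not registered, so lower urgency
}

// Returns a new array ordered by rank, then by id for a stable, readable order.
function triageOrder(games: readonly PartialGame[]): PartialGame[] {
  return [...games].sort(
    (a, b) => triageRank(a) - triageRank(b) || a.id.localeCompare(b.id),
  );
}
```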

1.3 Multiplayer E2E Coverage

Current state: Only Frogger has a solo regression spec. No multiplayer E2E for any game.
Gap: Multiplayer bugs (match join, state sync, disconnect) are only caught manually.
Plan:

  1. Phase 1: Write Playwright spec for Connect4 (simplest 2P game) — two browser contexts, create match, play a round
  2. Phase 2: Write PvP smoke for Battle Cards (already has card_battle handler)
  3. Phase 3: Write 2P smoke for Frogger (clear the “at-risk” flag in the playability audit)
  4. Phase 4: Add Scribblaz multiplayer spec

Spec template:

// frontend/tests/connect4.multiplayer.spec.ts
import { test, expect } from '@playwright/test';

test('Connect4: two players can play a full game', async ({ browser }) => {
  // Create two independent browser contexts (two players)
  const ctx1 = await browser.newContext();
  const ctx2 = await browser.newContext();
  const p1 = await ctx1.newPage();
  const p2 = await ctx2.newPage();

  // Player 1 hosts, Player 2 joins
  // ... match flow assertions
});

1.4 Nakama Match Handler Audit

Current state: 27 match handlers registered. Some may be orphaned (code exists but not wired) or have silent bugs.
Plan:

  1. For each handler in nakama-modules/index.ts, verify the corresponding games/{id}/server/match_handler.ts exists and exports correctly
  2. Test each handler’s matchInit with empty state to catch runtime crashes
  3. Document handler → game mapping in docs/nakama-handlers.md
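
Step 2 can be automated with a small smoke harness. The sketch below uses local stand-in types rather than the real Nakama runtime typings, and the smokeTestMatchInit name is hypothetical. It assumes matchInit takes (ctx, logger, nk, params) and returns state, tickRate, and label, as in Nakama's TypeScript runtime.

```typescript
// Local stand-ins for the Nakama runtime types; real handlers would use the
// official runtime typings instead.
type MatchInit = (
  ctx: unknown,
  logger: unknown,
  nk: unknown,
  params: Record<string, string>,
) => { state: unknown; tickRate: number; label: string };

interface SmokeResult {
  handler: string;
  ok: boolean;
  error?: string;
}

// Calls each matchInit with stub dependencies and empty params, and records
// a thrown error or an out-of-range tick rate instead of aborting the audit.
function smokeTestMatchInit(handlers: Record<string, MatchInit>): SmokeResult[] {
  const noopLogger = { debug() {}, info() {}, warn() {}, error() {} };
  return Object.entries(handlers).map(([handler, init]): SmokeResult => {
    try {
      const out = init({}, noopLogger, {}, {});
      // Nakama expects a tick rate between 1 and 60 ticks per second.
      if (!Number.isInteger(out.tickRate) || out.tickRate < 1 || out.tickRate > 60) {
        return { handler, ok: false, error: `invalid tickRate: ${out.tickRate}` };
      }
      return { handler, ok: true };
    } catch (err) {
      return { handler, ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  });
}
```

Handlers that read from nk during init will fail against the empty stub. That is still a useful signal, but those handlers need a richer fake before a failure can be called a real bug.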

1.5 Monitoring & Alerting Gaps

Current state: Grafana + Prometheus running. Dashboards exist.
Gap: No alerts for critical failures (Nakama crash, frontend 5xx, DB connection loss).
Plan:

  1. Add Prometheus alert rule: Nakama pod restarts > 3 in 10 minutes
  2. Add Prometheus alert rule: Frontend 5xx rate > 1% over 5 minutes
  3. Add Prometheus alert rule: PostgreSQL connection failures
  4. Verify Alertmanager webhook is configured and firing
  5. Add Grafana dashboard panel: active match count per game
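
The actual rules belong in Prometheus, but the condition behind the first one is easy to misread ("more than 3" within a sliding 10-minute window). The sketch below restates that condition as plain logic for reference. The function name and the event-list input are assumptions; Prometheus would derive this from a restart counter rather than a list of timestamps.

```typescript
// Timestamps are epoch milliseconds of observed pod restarts (illustrative input).
function tooManyRestarts(
  restartTimesMs: readonly number[],
  nowMs: number,
  windowMs: number = 10 * 60 * 1000,
  maxRestarts: number = 3,
): boolean {
  const windowStart = nowMs - windowMs;
  // Count only restarts inside the window (windowStart, nowMs].
  const recent = restartTimesMs.filter((t) => t > windowStart && t <= nowMs);
  return recent.length > maxRestarts; // strictly more than 3, per the plan
}
```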

🟡 P2 — Medium

2.1 Frontend — Residual Type Safety

Current state: svelte-check = 0 errors, 0 warnings. Excellent.
Remaining debt:

  1. Eliminate remaining any types in game bridge code (Frogger platformSocket: any, hostUpdate: (partial: any))
  2. Add strict tsconfig.json flags: strict: true, noUncheckedIndexedAccess: true
  3. Add return-type annotations to all BFF route handlers
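
As an illustration of items 1 and 2, the sketch below replaces an any-typed partial update with Partial of a concrete state type and shows what noUncheckedIndexedAccess asks of indexed reads. The FroggerHostState shape is hypothetical; the real bridge state will differ.

```typescript
// Hypothetical state shape, for illustration only.
interface FroggerHostState {
  lanes: number[];
  score: number;
  phase: 'lobby' | 'playing' | 'over';
}

// Before: hostUpdate: (partial: any) => void
// After: only known keys with correctly typed values are accepted.
function applyHostUpdate(
  state: FroggerHostState,
  partial: Partial<FroggerHostState>,
): FroggerHostState {
  return { ...state, ...partial };
}

// With noUncheckedIndexedAccess, state.lanes[i] is number | undefined,
// so the missing-lane case has to be handled explicitly.
function laneSpeed(state: FroggerHostState, i: number): number {
  return state.lanes[i] ?? 0;
}
```

With this signature, a call such as applyHostUpdate(state, { scroe: 1 }) fails to compile, which is the kind of typo any silently lets through.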

2.2 Frogger Refactor (FG1 — Monolithic Component)

Current state: FroggerGame.svelte is 1,700+ lines mixing render, input, socket, solo loop, and UI.
Plan (incremental):

  1. Extract canvasEngine.ts — pure canvas drawing functions
  2. Extract inputHandler.ts — keyboard/repeat key logic
  3. Extract soloTick.ts — solo game loop as pure functions
  4. Extract nakamaHandlers.ts — socket message handlers
  5. Thin FroggerGame.svelte down to an orchestrator only (~300 lines)

Each extraction is a separate PR with E2E gate.
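
To show what "solo game loop as pure functions" could look like for soloTick.ts, here is a minimal sketch. The state fields, board dimensions, and input type are assumptions for illustration and are not taken from FroggerGame.svelte.

```typescript
// Hypothetical solo state; field names are illustrative.
interface SoloState {
  frogX: number;
  frogLane: number;
  elapsedMs: number;
}

type SoloInput = 'left' | 'right' | 'up' | 'down' | null;

const LANE_COUNT = 5;   // assumed number of lanes
const BOARD_WIDTH = 10; // assumed number of columns

const clamp = (v: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, v));

// Pure: no canvas, socket, or timer access. The same inputs always produce
// the same output, so the loop can be unit-tested without a browser.
function soloTick(state: SoloState, input: SoloInput, dtMs: number): SoloState {
  const dx = input === 'left' ? -1 : input === 'right' ? 1 : 0;
  const dLane = input === 'up' ? 1 : input === 'down' ? -1 : 0;
  return {
    frogX: clamp(state.frogX + dx, 0, BOARD_WIDTH - 1),
    frogLane: clamp(state.frogLane + dLane, 0, LANE_COUNT - 1),
    elapsedMs: state.elapsedMs + dtMs,
  };
}
```

The component then only calls soloTick once per frame and hands the result to the canvas code, which keeps each extraction small enough for its own PR.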


2.3 Battle Cards — Matchmaking & Reconnect

Current state: PvP works via manual match ID only. No matchmaker query. No reconnect.
Plan:

  1. Wire listMatches / matchmaker for game: battle-cards labels (BC-N1)
  2. Add reconnect logic: on socket disconnect, attempt to rejoin match with stored match ID (BC-N2)
  3. Add spectator mode (BC-N3)
  4. Clean up legacy leaderboard IDs (card-battle-scores vs battle_cards_*) (BC-N4)
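
The reconnect step (item 2) might look roughly like the sketch below. It assumes the socket connection itself has already been re-established and only covers rejoining the stored match. MatchSocket is a minimal stand-in interface (nakama-js sockets expose a joinMatch method), and the retry counts and delays are placeholder values.

```typescript
// Minimal stand-in for the part of a realtime socket this sketch needs.
interface MatchSocket {
  joinMatch(matchId: string): Promise<unknown>;
}

// Tries to rejoin the stored match with exponential backoff.
// `sleep` is injected so tests can skip real waiting.
async function rejoinWithBackoff(
  socket: MatchSocket,
  matchId: string,
  sleep: (ms: number) => Promise<void>,
  maxAttempts: number = 5,
  baseDelayMs: number = 500,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await socket.joinMatch(matchId);
      return true; // rejoined
    } catch {
      if (attempt === maxAttempts - 1) break; // out of attempts, skip the final wait
      await sleep(baseDelayMs * 2 ** attempt); // 500, 1000, 2000, ...
    }
  }
  return false; // caller should fall back to the lobby
}
```

In the client, sleep would typically wrap setTimeout, and a false result is the point at which to clear the stored match ID.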

2.4 Game Starter Template

Current state: A starter template exists (Svelte 5 + Threlte) but may have drift.
Plan:

  1. Verify starter template builds clean with current SvelteKit + Vite versions
  2. Add CI check: cd games/_starter && npm run build on every frontend dependency bump
  3. Document the “create a new game” workflow in docs/games/creating-a-game.md

2.5 Documentation — Runbooks

Current state: Architecture docs exist. Operational runbooks are sparse.
Plan: Create the following runbooks in docs/runbooks/:

| Runbook | Content |
| --- | --- |
| nakama-deploy.md | Build → bundle → rollout → verify sequence |
| frontend-deploy.md | Atomic build → restart → health check |
| db-recovery.md | PostgreSQL backup restore procedure |
| game-add.md | End-to-end: new game from scaffold to production |
| game-archive.md | Moving a game to _archive/ safely |
| incident-response.md | Who to page, what to check, how to rollback |

2.6 Accessibility — Residual Warnings

Current state: Core shell is clean. Dev tooling (theme generator, kanboard) has minor a11y warnings.
Plan:

  1. Audit themegenerator/Preview.svelte — add missing aria-labels
  2. Audit dev kanboard routes — ensure icon buttons have accessible names
  3. Run npm run check after fixes → target 0 warnings

🟢 P3 — Low

3.1 Hugo Docs — Content Expansion

Current state: Basic structure exists (architecture, infrastructure, agents, games, runbooks).
Plan:

  1. Add docs/architecture/data-flow.md — detailed request/response diagrams
  2. Add docs/architecture/auth-flow.md — guest-first auth sequence diagram
  3. Add docs/games/bridge-protocol.md — complete bridge message reference
  4. Add docs/games/integration-types.md — svelte-component vs iframe vs dedicated-server
  5. Add docs/infrastructure/network-map.md — full K8s service topology
  6. Add docs/infrastructure/tls.md — certificate management, renewal procedure

3.2 Frontend — Performance Pass

Current state: No major performance issues reported.
Plan:

  1. Add Lighthouse CI to frontend build (run locally, not in GitHub)
  2. Audit game card images — ensure lazy loading + correct aspect-ratio
  3. Audit bundle size: vite build has no built-in --report flag, so use a bundle analyzer such as rollup-plugin-visualizer to identify large deps
  4. Add prefers-reduced-motion guards to any remaining animations

3.3 Developer Experience

Plan:

  1. Add make targets for common operations:
    make build          # atomic frontend build
    make deploy         # build + restart frontend
    make nakama-build   # build Nakama TS modules
    make nakama-deploy  # build + rollout Nakama
    make check          # svelte-check + typecheck
    make test           # Playwright E2E
    
  2. Add .env.example validation — script that checks all required env vars are set
  3. Add pre-commit hook: svelte-check on staged files
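
The .env.example validation in item 2 could be built around two pure functions like these. This is a sketch: it assumes one KEY=value pair per line with # comments, and the function names are hypothetical.

```typescript
// Extracts variable names from .env.example content: one KEY=value per line,
// ignoring blank lines and # comments. An optional "export " prefix is tolerated.
function requiredKeys(exampleContent: string): string[] {
  const keys: string[] = [];
  for (const rawLine of exampleContent.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=/.exec(line);
    const key = match?.[1];
    if (key !== undefined) keys.push(key);
  }
  return keys;
}

// Returns the keys that are unset or empty in the given environment.
function missingEnvVars(
  exampleContent: string,
  env: Record<string, string | undefined>,
): string[] {
  return requiredKeys(exampleContent).filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === '';
  });
}
```

A Node script would pass the contents of .env.example and process.env, then exit non-zero when the returned list is not empty.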

3.4 Game Polish Backlog

Per-game items from forensic audits:

| Game | Item | Effort |
| --- | --- | --- |
| Frogger | Lane speed feel vs BASE_LANES constants | Low |
| Frogger | Delete games/frogger/GURU/obsolete/*.js | Low |
| Battle Cards | Guest session continuity (new device ID per client) | Medium |
| Scribblaz | Dedicated Playwright E2E spec | Medium |
| Pebble | Thumbnail asset path audit (.png vs .svg) | Low |

3.5 Infrastructure — Certificate Management

Current state: Single Let’s Encrypt secret funday-tls-cert.
Plan:

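  1. Verify auto-renewal is working: sudo certbot certificates (lists expiry dates) and sudo certbot renew --dry-run (exercises the renewal path without replacing the live certificate)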
  2. Add monitoring alert: cert expiry < 30 days
  3. Document renewal procedure in docs/runbooks/tls-renewal.md
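
The expiry alert in item 2 reduces to a date comparison. The sketch below shows that arithmetic with hypothetical helper names; the notAfter date would come from the certificate itself, for example from the output of certbot certificates.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days remaining until the certificate's notAfter date (negative if expired).
function daysUntilExpiry(notAfter: Date, now: Date): number {
  return Math.floor((notAfter.getTime() - now.getTime()) / MS_PER_DAY);
}

// Mirrors the planned alert: fire when fewer than 30 days remain.
function certExpiringSoon(notAfter: Date, now: Date, thresholdDays: number = 30): boolean {
  return daysUntilExpiry(notAfter, now) < thresholdDays;
}
```

One design point to check: certbot has traditionally attempted renewal once fewer than 30 days remain, so a threshold of exactly 30 may fire briefly during normal operation. A somewhat lower threshold would alert only when renewal has actually been failing.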

3.6 Infrastructure — Backup Verification

Current state: PostgreSQL backup job exists (Completed pod visible).
Plan:

  1. Verify backup integrity: restore to a temp DB and run SELECT COUNT(*)
  2. Document RTO/RPO expectations
  3. Add alert: backup job fails or hasn’t run in 25 hours

📊 Current Platform Health Snapshot

| Component | Status | Notes |
| --- | --- | --- |
| Frontend (SvelteKit) | ✅ Running | funday-frontend.service active, 0 errors |
| Nakama | ⚠️ Degraded | Pod running but /v2/healthcheck returns HTTP error |
| PostgreSQL | ✅ Running | postgresql namespace, 19k+ users |
| Redis | ✅ Running | funday-platform namespace |
| Traefik | ✅ Running | kube-system namespace |
| nginx | ✅ Running | systemd, TLS termination |
| Grafana/Prometheus | ✅ Running | monitoring namespace |
| Hermes Discord | ❌ Offline | hermes-gateway.service inactive |
| Hugo Docs | ✅ Running | hugo-docs.service active |
| Game Pods | ✅ Running | agar, bombergang, micro-racing, splix, zcripple all healthy |

📋 Execution Order

Immediate (this session):

  1. Diagnose and fix Nakama healthcheck (P0.1)
  2. Diagnose and fix Hermes gateway (P0.2)
  3. Verify DB connection (P0.3)

This week:

  4. Partial games triage (P1.2)
  5. Nakama deploy pipeline documentation (P1.1)
  6. Monitoring alerts (P1.5)

This month:

  7. Multiplayer E2E specs (P1.3)
  8. Nakama handler audit (P1.4)
  9. Frogger refactor phase 1 (P2.2)
  10. Runbook creation (P2.5)

Backlog:

  11. Hugo docs expansion (P3.1)
  12. Performance pass (P3.2)
  13. DX improvements (P3.3)
  14. Game polish items (P3.4)
  15. Certificate monitoring (P3.5)
  16. Backup verification (P3.6)


🔄 How to Update This Plan

When completing an item:

  1. Move the completed item to the “Completed” section at the bottom
  2. Add a one-line changelog entry with date
  3. Re-prioritize remaining items if needed
  4. Rebuild Hugo docs: cd /home/usr/funday/dev/docs && hugo

✅ Completed

- 2026-05-20 — P0.1: Fixed Nakama healthcheck (root cause: Traefik path prefix strip)
- 2026-05-20 — P0.2: Restarted Hermes gateway (expired Discord token)