# Introduction
Cognitive Companion is a privacy-first, on-premise AI system designed for senior care in multigenerational households. It processes camera feeds and sensor data through composable rule-based pipelines, using vision and language models running entirely on local hardware, to deliver context-aware reminders and alerts.
## The Problem
Seniors experiencing cognitive decline face a difficult tradeoff: full-time monitoring that strips away independence, or no monitoring at all. Existing solutions tend toward one extreme or the other:
- Basic motion sensors trigger too many false alarms and lack context awareness
- Cloud-based AI cameras send private footage off-premises and require internet connectivity
- Full automation systems remove the daily routines that maintain cognitive function
## The Approach
Cognitive Companion takes a different path:
**Understand context, not just motion.** Vision LLMs analyze what is happening in camera frames, not just whether something moved. A person standing in the kitchen at noon means something different from the same scene at 3 AM.
**Composable rules, not rigid triggers.** Each rule defines its own pipeline of steps (person identification, vision analysis, logic reasoning, conditional branching, wait/resume) assembled in any order. No two rules need to follow the same pattern.
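For illustration, a rule could be sketched as an ordered list of steps like the one below. The step types mirror those listed under Key Capabilities, but the schema itself (field names, parameter names) is hypothetical, not the project's actual rule format:

```python
# Hypothetical rule definition: an ordered pipeline of steps.
# Step types follow the list under Key Capabilities; the field
# names and overall schema are illustrative, not the actual format.
lunch_reminder_rule = {
    "name": "lunch-reminder",
    "steps": [
        {"type": "person_id"},                  # who is in the frame?
        {"type": "scene_analysis",              # what are they doing?
         "prompt": "Is this person preparing or eating food?"},
        {"type": "condition",                   # branch on the analysis result
         "if_true": "stop", "if_false": "continue"},
        {"type": "wait", "minutes": 30},        # pause, then resume later
        {"type": "notification",
         "channels": ["eink", "tts"],
         "message": "Lunchtime reminder: there are leftovers in the fridge."},
    ],
}
```

A different rule might start with vision analysis and skip person identification entirely; the order and selection of steps is up to the rule author.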
**Gentle reminders, not automation.** The system suggests and reminds rather than acting autonomously. A lunch reminder is a reminder, not a robot bringing food. The goal is to preserve agency.
**Privacy by architecture.** All AI inference runs on-premise via vLLM and Ollama. Camera frames are processed locally and stored in your own MinIO instance. Nothing leaves your network unless you explicitly configure an external notification channel.
## How It Works
```
Edge Devices                     AI Pipeline             Outputs
────────────                     ───────────             ───────

reCamera ───┐                ┌─► Person ID Service ─┐
reTerminal ─┼─► Event        │   (InsightFace/      │
HA Sensors ─┘   Aggregator ──┤    ArcFace)          ├─► Rules Engine
                    │        └─► Vision LLM ────────┘   (context/deps/rate-limit)
                    ▼            (Cosmos Reason2)            │
                  MinIO                                      ▼
                  (media)                                Logic LLM
                                                         (Gemma3)
                                                             │
                                                             ▼
                                                        Translation
                                                     (TranslateGemma)
                                                             │
     ┌────────────┬──────────────┬─────────────┬─────────────┤
     ▼            ▼              ▼             ▼             ▼
 WebSocket    Telegram     eInk Display       TTS      Home Assistant
(frontend)   (caregiver)   (reTerminal)    (speaker) (actions + announce)
```

Event flow:
- Edge devices (cameras, sensors) send data to the backend
- The Event Aggregator batches frames by sensor with configurable windowing and cooldown (see the first sketch after this list)
- The Rules Engine matches events against rules using context filters, dependencies, and rate limits (see the second sketch after this list)
- Each matching rule's composable pipeline executes independently via the PipelineExecutor
- Pipeline steps can identify people, analyze scenes, reason about context, branch conditionally, wait and resume, and dispatch notifications
- Outputs flow to any combination of channels: WebSocket push, Telegram, e-ink displays, TTS speakers, realtime voice prompts, Home Assistant services, and outbound webhooks
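A minimal sketch of the Event Aggregator's batching idea, assuming per-sensor time windows and a post-emit cooldown; the class and attribute names here are illustrative, not the project's actual API:

```python
import time
from collections import defaultdict

class EventAggregator:
    """Illustrative sketch only: batch frames per sensor inside a time
    window, and suppress new batches during a cooldown after each emit."""

    def __init__(self, window_s: float = 5.0, cooldown_s: float = 60.0):
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.batches: dict[str, list] = defaultdict(list)
        self.window_start: dict[str, float] = {}
        self.last_emit: dict[str, float] = {}

    def add_frame(self, sensor_id: str, frame) -> list | None:
        """Buffer a frame; return the finished batch when the window closes."""
        now = time.monotonic()
        # Ignore frames while the sensor is cooling down after an emit.
        if now - self.last_emit.get(sensor_id, -self.cooldown_s) < self.cooldown_s:
            return None
        self.batches[sensor_id].append(frame)
        self.window_start.setdefault(sensor_id, now)
        # Close the window once enough time has passed since the first frame.
        if now - self.window_start[sensor_id] >= self.window_s:
            batch = self.batches.pop(sensor_id)
            del self.window_start[sensor_id]
            self.last_emit[sensor_id] = now
            return batch
        return None
```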
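And a sketch of the matching step, assuming each rule carries optional context filters and a minimum firing interval; dependency checks are omitted and all names are illustrative:

```python
import time

def rule_matches(rule: dict, event: dict, last_fired: dict[str, float]) -> bool:
    """Illustrative only: match an event against one rule's context
    filters and rate limit. Dependency checks are omitted for brevity."""
    # Context filters: every filter key must equal the event's value.
    for key, wanted in rule.get("context", {}).items():
        if event.get(key) != wanted:
            return False
    # Rate limit: refuse to fire again within min_interval_s.
    now = time.monotonic()
    if now - last_fired.get(rule["name"], float("-inf")) < rule.get("min_interval_s", 0):
        return False
    return True

fired: dict[str, float] = {}
rule = {"name": "lunch-reminder", "context": {"room": "kitchen"}, "min_interval_s": 3600}
print(rule_matches(rule, {"room": "kitchen", "person": "grandma"}, fired))  # True
print(rule_matches(rule, {"room": "bedroom"}, fired))                       # False
```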
## Key Capabilities
| Capability | Description |
|---|---|
| 14 pipeline step types | Person ID, scene analysis, object trend analysis, activity detection, activity session start/end, daily report, unified LLM call, wait, condition, verification, interactive prompt, notification, HA action |
| Person tracking | ArcFace face recognition + Home Assistant sensor fusion for whole-house location |
| Activity tracking | Detect and record activities (eating, sleeping, medication) for use in downstream rule context filters |
| Activity sessions | Duration-aware open/close sessions with automatic stale-session cleanup |
| Daily reports | End-of-day wellness scoring with LLM-enriched summaries |
| Interactive prompts | Ask users questions via popup or voice, wait for response, and branch based on their answer or timeout |
| Motion direction | Classify movement direction at doorways (left/right, towards/away) |
| Voice companion | Real-time conversations via Google Gemini Live with WebSocket audio |
| E-ink displays | Per-device notification images with template editor and automatic expiry |
| Multi-channel alerts | PWA popup, Telegram, e-ink, HA Speaker TTS, PWA TTS announcements, PWA Realtime AI, and outbound webhook delivery with escalation policies |
| MCP tool server | 23 tools for AI agent integration via Model Context Protocol: read-only queries plus rule triggering and interactive response recording |
| RBAC authentication | API keys, device keys, and fnmatch permission patterns (see the sketch after this table) |
| Tamil language support | Translation and voice interaction in Tamil |
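As one concrete illustration of the RBAC row above, fnmatch permission patterns can be checked with the standard library; the key names and permission strings below are hypothetical, not the project's actual format:

```python
from fnmatch import fnmatch

# Hypothetical keys and permission strings; the real format may differ.
API_KEY_PERMISSIONS = {
    "caregiver-key": ["events:read", "rules:*", "reports:read"],
    "recamera-key": ["events:write:camera.*"],
}

def is_allowed(api_key: str, required: str) -> bool:
    """True if any pattern granted to this key matches the required
    permission, using shell-style wildcards via fnmatch."""
    patterns = API_KEY_PERMISSIONS.get(api_key, [])
    return any(fnmatch(required, pattern) for pattern in patterns)

assert is_allowed("caregiver-key", "rules:trigger")               # rules:* matches
assert is_allowed("recamera-key", "events:write:camera.kitchen")  # wildcard suffix
assert not is_allowed("recamera-key", "rules:trigger")            # no grant
```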
## Technology Stack
| Layer | Technology |
|---|---|
| Backend | Python 3.12, FastAPI, SQLAlchemy 2.0, Pydantic 2.0, APScheduler |
| Frontend | Vue 3, Vuetify 3, Vite, Pinia |
| Database | SQLite (WAL mode) |
| Vision LLM | Cosmos-Reason2-8B via vLLM |
| Logic LLM | Gemma3 4B via Ollama |
| Translation LLM | TranslateGemma-12B via vLLM |
| Voice | Google Gemini 2.5 Flash (Live API) |
| Face Recognition | InsightFace buffalo_l with ArcFace embeddings |
| Object Storage | MinIO (S3-compatible) |
| Logging | Python stdlib logging with key=value context |
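The key=value logging convention in the table can be achieved with a small helper over stdlib logging; this sketch is illustrative rather than the project's actual code:

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("cognitive_companion")

def kv(event: str, **ctx) -> str:
    """Render an event name plus context as grep-friendly key=value pairs."""
    return " ".join([f"event={event}"] + [f"{k}={v}" for k, v in ctx.items()])

log.info(kv("rule_matched", rule="lunch-reminder",
            sensor="kitchen_cam", person="grandma"))
# e.g. "... INFO event=rule_matched rule=lunch-reminder sensor=kitchen_cam person=grandma"
```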
## Next Steps
- Quick Start: Install and run the system
- Architecture: Deep dive into the system design
- Composable Pipelines: Understand the pipeline step system
- Development Setup: Set up a development environment