# Introduction
Cognitive Companion is a privacy-first, on-premise AI system designed for senior care in multigenerational households. It processes camera feeds and sensor data through composable rule-based pipelines, using vision and language models running entirely on local hardware, to deliver context-aware reminders and alerts.
## The Problem
Seniors experiencing cognitive decline face a difficult tradeoff: full-time monitoring that strips away independence, or no monitoring at all. Existing solutions tend toward one extreme or the other:
- Basic motion sensors trigger too many false alarms and lack context awareness
- Cloud-based AI cameras send private footage off-premises and require internet connectivity
- Full automation systems remove the daily routines that maintain cognitive function
## The Approach
Cognitive Companion takes a different path:
**Understand context, not just motion.** Vision LLMs analyze what is happening in camera frames, not just whether something moved. A person standing in the kitchen at noon means something different from the same scene at 3 AM.
**Composable rules, not rigid triggers.** Each rule defines its own pipeline of steps (person identification, vision analysis, logic reasoning, conditional branching, wait/resume) assembled in any order. No two rules need to follow the same pattern.
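For illustration, a rule could be sketched as an ordered list of steps like the one below. The step types mirror those listed under Key Capabilities, but the schema itself (field names, parameter names) is hypothetical, not the project's actual rule format:

```python
# Hypothetical rule definition: an ordered pipeline of steps.
# Step types follow the list under Key Capabilities; the field
# names and overall schema are illustrative, not the actual format.
lunch_reminder_rule = {
    "name": "lunch-reminder",
    "steps": [
        {"type": "person_id"},                  # who is in the frame?
        {"type": "scene_analysis",              # what are they doing?
         "prompt": "Is this person preparing or eating food?"},
        {"type": "condition",                   # branch on the analysis result
         "if_true": "stop", "if_false": "continue"},
        {"type": "wait", "minutes": 30},        # pause, then resume later
        {"type": "notification",
         "channels": ["eink", "tts"],
         "message": "Lunchtime reminder: there are leftovers in the fridge."},
    ],
}
```

A different rule might start with vision analysis and skip person identification entirely; the order and selection of steps is up to the rule author.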
**Gentle reminders, not automation.** The system suggests and reminds rather than acting autonomously. A lunch reminder is a reminder, not a robot bringing food. The goal is to preserve agency.
**Privacy by architecture.** All AI inference runs on-premise via vLLM and Ollama. Camera frames are processed locally and stored in your own MinIO instance. Nothing leaves your network unless you explicitly configure an external notification channel.
## How It Works
```
Edge Devices                     AI Pipeline             Outputs
────────────                     ───────────             ───────

reCamera ───┐                ┌─► Person ID Service ─┐
reTerminal ─┼─► Event        │   (InsightFace/      │
HA Sensors ─┘   Aggregator ──┤    ArcFace)          ├─► Rules Engine
                    │        └─► Vision LLM ────────┘   (context/deps/rate-limit)
                    ▼            (Cosmos Reason2)            │
                  MinIO                                      ▼
                  (media)                                Logic LLM
                                                         (Gemma3)
                                                             │
                                                             ▼
                                                        Translation
                                                     (TranslateGemma)
                                                             │
     ┌────────────┬──────────────┬─────────────┬─────────────┤
     ▼            ▼              ▼             ▼             ▼
 WebSocket    Telegram     eInk Display       TTS      Home Assistant
(frontend)   (caregiver)   (reTerminal)    (speaker) (actions + announce)
```

Event flow:
- Edge devices (cameras, sensors) send data to the backend
- The Event Aggregator batches frames by sensor with configurable windowing and cooldown (see the first sketch after this list)
- The Rules Engine matches events against rules using context filters, dependencies, and rate limits (see the second sketch after this list)
- Each matching rule's composable pipeline executes independently via the PipelineExecutor
- Pipeline steps can identify people, analyze scenes, reason about context, branch conditionally, wait and resume, and dispatch notifications
- Outputs flow to any combination of channels: WebSocket push, Telegram, e-ink displays, TTS speakers, realtime voice prompts, Home Assistant services, and outbound webhooks
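A minimal sketch of the Event Aggregator's batching idea, assuming per-sensor time windows and a post-emit cooldown; the class and attribute names here are illustrative, not the project's actual API:

```python
import time
from collections import defaultdict

class EventAggregator:
    """Illustrative sketch only: batch frames per sensor inside a time
    window, and suppress new batches during a cooldown after each emit."""

    def __init__(self, window_s: float = 5.0, cooldown_s: float = 60.0):
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.batches: dict[str, list] = defaultdict(list)
        self.window_start: dict[str, float] = {}
        self.last_emit: dict[str, float] = {}

    def add_frame(self, sensor_id: str, frame) -> list | None:
        """Buffer a frame; return the finished batch when the window closes."""
        now = time.monotonic()
        # Ignore frames while the sensor is cooling down after an emit.
        if now - self.last_emit.get(sensor_id, -self.cooldown_s) < self.cooldown_s:
            return None
        self.batches[sensor_id].append(frame)
        self.window_start.setdefault(sensor_id, now)
        # Close the window once enough time has passed since the first frame.
        if now - self.window_start[sensor_id] >= self.window_s:
            batch = self.batches.pop(sensor_id)
            del self.window_start[sensor_id]
            self.last_emit[sensor_id] = now
            return batch
        return None
```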
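And a sketch of the matching step, assuming each rule carries optional context filters and a minimum firing interval; dependency checks are omitted and all names are illustrative:

```python
import time

def rule_matches(rule: dict, event: dict, last_fired: dict[str, float]) -> bool:
    """Illustrative only: match an event against one rule's context
    filters and rate limit. Dependency checks are omitted for brevity."""
    # Context filters: every filter key must equal the event's value.
    for key, wanted in rule.get("context", {}).items():
        if event.get(key) != wanted:
            return False
    # Rate limit: refuse to fire again within min_interval_s.
    now = time.monotonic()
    if now - last_fired.get(rule["name"], float("-inf")) < rule.get("min_interval_s", 0):
        return False
    return True

fired: dict[str, float] = {}
rule = {"name": "lunch-reminder", "context": {"room": "kitchen"}, "min_interval_s": 3600}
print(rule_matches(rule, {"room": "kitchen", "person": "grandma"}, fired))  # True
print(rule_matches(rule, {"room": "bedroom"}, fired))                       # False
```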
## Key Capabilities
| Capability | Description |
|---|---|
| 14 pipeline step types | Person ID, scene analysis, object trend analysis, activity detection, activity session start/end, daily report, unified LLM call, wait, condition, verification, interactive prompt, notification, HA action |
| Person tracking | ArcFace face recognition + Home Assistant sensor fusion for whole-house location |
| Activity tracking | Detect and record activities (eating, sleeping, medication) for use in downstream rule context filters |
| Activity sessions | Duration-aware open/close sessions with automatic stale-session cleanup |
| Daily reports | End-of-day wellness scoring with LLM-enriched summaries |
| Interactive prompts | Ask users questions via popup or voice, wait for response, and branch based on their answer or timeout |
| Motion direction | Classify movement direction at doorways (left/right, towards/away) |
| Voice companion | Real-time conversations via Google Gemini Live with WebSocket audio |
| E-ink displays | Per-device notification images with template editor and automatic expiry |
| Multi-channel alerts | PWA popup, Telegram, e-ink, HA Speaker TTS, PWA TTS announcements, PWA Realtime AI, and outbound webhook delivery with escalation policies |
| MCP tool server | 23 tools for AI agent integration via Model Context Protocol: read-only queries plus rule triggering and interactive response recording |
| RBAC authentication | API keys, device keys, and fnmatch permission patterns (see the sketch after this table) |
| Tamil language support | Translation and voice interaction in Tamil |
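As one concrete illustration of the RBAC row above, fnmatch permission patterns can be checked with the standard library; the key names and permission strings below are hypothetical, not the project's actual format:

```python
from fnmatch import fnmatch

# Hypothetical keys and permission strings; the real format may differ.
API_KEY_PERMISSIONS = {
    "caregiver-key": ["events:read", "rules:*", "reports:read"],
    "recamera-key": ["events:write:camera.*"],
}

def is_allowed(api_key: str, required: str) -> bool:
    """True if any pattern granted to this key matches the required
    permission, using shell-style wildcards via fnmatch."""
    patterns = API_KEY_PERMISSIONS.get(api_key, [])
    return any(fnmatch(required, pattern) for pattern in patterns)

assert is_allowed("caregiver-key", "rules:trigger")               # rules:* matches
assert is_allowed("recamera-key", "events:write:camera.kitchen")  # wildcard suffix
assert not is_allowed("recamera-key", "rules:trigger")            # no grant
```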
## Technology Stack
| Layer | Technology |
|---|---|
| Backend | Python 3.12, FastAPI, SQLAlchemy 2.0, Pydantic 2.0, APScheduler |
| Frontend | Vue 3, Vuetify 3, Vite, Pinia |
| Database | SQLite (WAL mode) |
| Vision LLM | Cosmos-Reason2-8B via vLLM |
| Logic LLM | Gemma3 4B via Ollama |
| Translation LLM | TranslateGemma-12B via vLLM |
| Voice | Google Gemini 2.5 Flash (Live API) |
| Face Recognition | InsightFace buffalo_l with ArcFace embeddings |
| Object Storage | MinIO (S3-compatible) |
| Logging | Python stdlib logging with key=value context |
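The key=value logging convention in the table can be achieved with a small helper over stdlib logging; this sketch is illustrative rather than the project's actual code:

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("cognitive_companion")

def kv(event: str, **ctx) -> str:
    """Render an event name plus context as grep-friendly key=value pairs."""
    return " ".join([f"event={event}"] + [f"{k}={v}" for k, v in ctx.items()])

log.info(kv("rule_matched", rule="lunch-reminder",
            sensor="kitchen_cam", person="grandma"))
# e.g. "... INFO event=rule_matched rule=lunch-reminder sensor=kitchen_cam person=grandma"
```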
## Next Steps
- Quick Start: Install and run the system
- Architecture: Deep dive into the system design
- Composable Pipelines: Understand the pipeline step system
- Development Setup: Set up a development environment