Introduction

Cognitive Companion is a privacy-first, on-premise AI system designed for senior care in multigenerational households. It processes camera feeds and sensor data through composable rule-based pipelines, using vision and language models running entirely on local hardware, to deliver context-aware reminders and alerts.

The Problem

Seniors experiencing cognitive decline face a difficult tradeoff: full-time monitoring that strips away independence, or no monitoring at all. Existing solutions tend toward one extreme or the other:

  • Basic motion sensors trigger too many false alarms and lack context awareness
  • Cloud-based AI cameras send private footage off-premises and require internet connectivity
  • Full automation systems remove the daily routines that maintain cognitive function

The Approach

Cognitive Companion takes a different path:

  1. Understand context, not just motion. Vision LLMs analyze what is happening in camera frames, not just whether something moved. A person standing in the kitchen at noon means something different than at 3 AM.

  2. Composable rules, not rigid triggers. Each rule defines its own pipeline of steps (person identification, vision analysis, logic reasoning, conditional branching, wait/resume) assembled in any order. No two rules need to follow the same pattern; a sketch of one such rule appears after this list.

  3. Gentle reminders, not automation. The system suggests and reminds rather than acting autonomously. A lunch reminder is a reminder, not a robot bringing food. The goal is to preserve agency.

  4. Privacy by architecture. All AI inference runs on-premise via vLLM and Ollama. Camera frames are processed locally and stored in your own MinIO instance. Nothing leaves your network unless you explicitly configure an external notification channel.
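
As an illustration of point 2, here is a minimal sketch of what a composable rule might look like. The `Step`/`Rule` shapes, the step names, and the lunch-reminder scenario are hypothetical assumptions for illustration, not the project's actual rule schema.

```python
# Minimal sketch of a composable rule. Step names, fields, and the
# lunch-reminder scenario are hypothetical, not the project's real schema.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Step:
    type: str                               # e.g. "person_id", "vision", "condition"
    params: dict[str, Any] = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    steps: list[Step]                       # executed in order; any order is valid


# Hypothetical rule: identify who is in frame, ask the vision LLM what they
# are doing, then branch before sending a gentle reminder.
lunch_reminder = Rule(
    name="lunch-reminder",
    steps=[
        Step("person_id"),
        Step("vision", {"prompt": "Is the person preparing or eating food?"}),
        Step("condition", {"if": "no_meal_detected", "else": "stop"}),
        Step("notify", {"channels": ["eink", "tts"], "message": "Time for lunch"}),
    ],
)
```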

How It Works

```text
 Edge Devices                         AI Pipeline                              Outputs
 ───────────                         ───────────                              ───────

 reCamera ──┐                    ┌─► Person ID Service ────┐
            │    ┌────────────┐  │   (InsightFace/ArcFace) │
 reTerminal─┼──► │   Event    │──┤                         ├─► Rules Engine
            │    │ Aggregator │  │   ┌──────────────────┐  │   (context/deps/rate-limit)
 HA Sensors─┘    └────────────┘  ├─► │ Vision LLM       │  │        │
                   MinIO ◄───────┘   │ (Cosmos Reason2) │──┘        ▼
                  (media)            └──────────────────┘    ┌─────────────┐
                                           │                 │  Logic LLM  │
                                           ▼                 │  (Gemma3)   │
                                  ┌────────────────┐         └──────┬──────┘
                                  │ Translation    │                │
                                  │(TranslateGemma)│◄───────────────┘
                                  └────────┬───────┘

                ┌──────────────┬───────────┼───────────┬──────────────┐
                ▼              ▼           ▼           ▼              ▼
           WebSocket      Telegram     eInk Display   TTS      Home Assistant
           (frontend)     (caregiver)  (reTerminal)  (speaker) (actions + announce)
```

Event flow:

  1. Edge devices (cameras, sensors) send data to the backend
  2. The Event Aggregator batches frames by sensor with configurable windowing and cooldown (see the sketch after this list)
  3. The Rules Engine matches events against rules using context filters, dependencies, and rate limits
  4. Each matching rule's composable pipeline executes independently via the PipelineExecutor
  5. Pipeline steps can identify people, analyze scenes, reason about context, branch conditionally, wait and resume, and dispatch notifications
  6. Outputs flow to any combination of channels: WebSocket push, Telegram, e-ink displays, TTS speakers, realtime voice prompts, Home Assistant services, and outbound webhooks
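
Step 2's windowing and cooldown can be pictured with a minimal sketch. The names (`ingest`, `WINDOW_S`, `COOLDOWN_S`) and the drop-frames-during-cooldown policy are assumptions for illustration; the real Event Aggregator configures these per sensor.

```python
# Minimal sketch of per-sensor windowing and cooldown. All names and the
# exact policy are hypothetical; the real aggregator is configurable.
import time
from collections import defaultdict

WINDOW_S = 5.0     # frames arriving within this window form one batch
COOLDOWN_S = 30.0  # after a batch is emitted, suppress new frames this long

_batches: dict[str, list[dict]] = defaultdict(list)
_window_start: dict[str, float] = {}
_last_emit: dict[str, float] = defaultdict(float)


def ingest(sensor_id: str, frame: dict) -> list[dict] | None:
    """Buffer a frame; return a completed batch once the window closes."""
    now = time.monotonic()
    if now - _last_emit[sensor_id] < COOLDOWN_S:
        return None                     # cooling down: ignore the frame
    batch = _batches[sensor_id]
    if not batch:
        _window_start[sensor_id] = now  # first frame opens the window
    batch.append(frame)
    if now - _window_start[sensor_id] >= WINDOW_S:
        _batches[sensor_id] = []        # reset for the next window
        _last_emit[sensor_id] = now
        return batch                    # hand the batch to the Rules Engine
    return None
```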

Key Capabilities

| Capability | Description |
|---|---|
| 14 pipeline step types | Person ID, scene analysis, object trend analysis, activity detection, activity session start/end, daily report, unified LLM call, wait, condition, verification, interactive prompt, notification, HA action |
| Person tracking | ArcFace face recognition + Home Assistant sensor fusion for whole-house location |
| Activity tracking | Detect and record activities (eating, sleeping, medication) for use in downstream rule context filters |
| Activity sessions | Duration-aware open/close sessions with automatic stale cleanup |
| Daily reports | End-of-day wellness scoring with LLM-enriched summaries |
| Interactive prompts | Ask users questions via popup or voice, wait for a response, and branch on the answer or timeout |
| Motion direction | Classify movement direction at doorways (left/right, towards/away) |
| Voice companion | Real-time conversations via Google Gemini Live with WebSocket audio |
| E-ink displays | Per-device notification images with template editor and automatic expiry |
| Multi-channel alerts | PWA popup, Telegram, e-ink, HA Speaker TTS, PWA TTS announcements, PWA Realtime AI, and outbound webhook delivery with escalation policies |
| MCP tool server | 23 tools (read-only queries plus rule triggering and interactive response recording) for AI agent integration via Model Context Protocol |
| RBAC authentication | API keys, device keys, and fnmatch permission patterns (see the sketch below) |
| Tamil language support | Translation and voice interaction in Tamil |
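
The RBAC row relies on fnmatch-style patterns from the Python standard library; a minimal sketch of such a permission check follows. The permission strings and the caregiver key are hypothetical examples, not the project's actual scheme.

```python
# Minimal sketch of fnmatch-based permission checks. The permission strings
# and the caregiver key below are hypothetical examples.
from fnmatch import fnmatch


def is_allowed(granted: list[str], required: str) -> bool:
    """True if any granted pattern covers the required permission."""
    return any(fnmatch(required, pattern) for pattern in granted)


caregiver_key = ["events:read", "rules:*"]
assert is_allowed(caregiver_key, "rules:trigger")      # covered by "rules:*"
assert not is_allowed(caregiver_key, "devices:write")  # no matching pattern
```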

Technology Stack

| Layer | Technology |
|---|---|
| Backend | Python 3.12, FastAPI, SQLAlchemy 2.0, Pydantic 2.0, APScheduler |
| Frontend | Vue 3, Vuetify 3, Vite, Pinia |
| Database | SQLite (WAL mode; see the sketch below) |
| Vision LLM | Cosmos-Reason2-8B via vLLM |
| Logic LLM | Gemma3 4B via Ollama |
| Translation LLM | TranslateGemma-12B via vLLM |
| Voice | Google Gemini 2.5 Flash (Live API) |
| Face Recognition | InsightFace buffalo_l with ArcFace embeddings |
| Object Storage | MinIO (S3-compatible) |
| Logging | Python stdlib logging with key=value context |
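
For the Database row, here is a minimal sketch of enabling WAL mode on every SQLite connection through a SQLAlchemy event hook; the database URL and the `synchronous=NORMAL` pairing are assumptions, not the project's confirmed settings.

```python
# Minimal sketch: turn on WAL for each new SQLite connection. The database
# URL is hypothetical; PRAGMA synchronous=NORMAL is a common WAL pairing.
from sqlalchemy import create_engine, event

engine = create_engine("sqlite:///companion.db")


@event.listens_for(engine, "connect")
def _enable_wal(dbapi_conn, _connection_record):
    cur = dbapi_conn.cursor()
    cur.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    cur.execute("PRAGMA synchronous=NORMAL")
    cur.close()
```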

Released under the AGPL-3.0 License.