HomeSense: Fall Detection with Computer Vision

The problem

Falls are the leading cause of injury-related death in adults over 65. Yet most home-monitoring solutions either require the person to actively press a button, or rely on wearable sensors that are forgotten, uncharged, or simply not worn. I wanted a system that works passively — no wristband, no button, just a camera watching a room.

This post covers the fall-detection node I built for HomeSense, a collaborative home-automation platform my team developed in SYSC 3010 (Systems Project). My teammates owned the sensor grid, the controller Pi, and the web dashboard. I owned detection.

System overview

HomeSense is a distributed system: a network of Raspberry Pi nodes communicate over sockets through a central controller. My node’s only job is to consume a video stream, decide whether someone has fallen, and emit a fall_detected event when it’s sure.

Fall detection pipeline — YOLO pose runs on the OAK-D, everything else on the host Raspberry Pi.

The key design choice was putting YOLO pose inference directly on the camera (an OAK-D Lite). The OAK-D has an onboard Myriad X VPU that can run small neural nets at 30 fps without touching the Pi’s CPU. That freed the host to focus on the stateful logic — EMA smoothing, angle computation, and the state machine.

Skeleton → posture label

YOLO outputs 17 COCO keypoints per detected person. I only care about four of them:

Keypoint	Index	Body part
Left shoulder	5	top of torso
Right shoulder	6	top of torso
Left hip	11	bottom of torso
Right hip	12	bottom of torso

The torso vector runs from the midpoint of the hips to the midpoint of the shoulders. Its angle from vertical tells me orientation:

import math

def compute_torso_and_label(
    kps: list[list[float]],
    upright_thresh: float = 40.0,
    horiz_thresh: float = 55.0,
) -> tuple[float, str]:
    """
    Return (torso_angle_deg, label) where label is one of
    UPRIGHT / TRANSITION / HORIZONTAL.
    """
    ls, rs = kps[5], kps[6]   # left/right shoulder
    lh, rh = kps[11], kps[12] # left/right hip

    mid_shoulder = ((ls[0] + rs[0]) / 2, (ls[1] + rs[1]) / 2)
    mid_hip      = ((lh[0] + rh[0]) / 2, (lh[1] + rh[1]) / 2)

    dx = mid_shoulder[0] - mid_hip[0]
    dy = mid_shoulder[1] - mid_hip[1]          # positive → down in image
    angle = abs(math.degrees(math.atan2(dx, -dy)))  # 0° = upright

    if angle < upright_thresh:
        label = "UPRIGHT"
    elif angle > horiz_thresh:
        label = "HORIZONTAL"
    else:
        label = "TRANSITION"

    return angle, label
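
As a quick sanity check of the angle convention, here is a minimal, self-contained sketch that repeats the same midpoint-and-atan2 computation on hand-made keypoints. The names torso_angle_deg and make_kps, and all pixel coordinates, are illustrative assumptions rather than part of the HomeSense code.

```python
import math

# COCO keypoint indices used by the torso computation.
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 5, 6, 11, 12


def torso_angle_deg(kps: list[list[float]]) -> float:
    """Angle between the hip->shoulder vector and image-vertical (0 = upright)."""
    sx = (kps[L_SHOULDER][0] + kps[R_SHOULDER][0]) / 2
    sy = (kps[L_SHOULDER][1] + kps[R_SHOULDER][1]) / 2
    hx = (kps[L_HIP][0] + kps[R_HIP][0]) / 2
    hy = (kps[L_HIP][1] + kps[R_HIP][1]) / 2
    # Image y grows downward, so negate dy to make "up" positive.
    return abs(math.degrees(math.atan2(sx - hx, -(sy - hy))))


def make_kps(shoulder_mid: tuple[float, float],
             hip_mid: tuple[float, float]) -> list[list[float]]:
    """Hypothetical helper: 17 keypoints with only the four torso points filled in."""
    kps = [[0.0, 0.0] for _ in range(17)]
    (sx, sy), (hx, hy) = shoulder_mid, hip_mid
    kps[L_SHOULDER], kps[R_SHOULDER] = [sx - 20, sy], [sx + 20, sy]
    kps[L_HIP], kps[R_HIP] = [hx - 15, hy], [hx + 15, hy]
    return kps


standing = make_kps(shoulder_mid=(320, 200), hip_mid=(320, 350))  # shoulders above hips
lying = make_kps(shoulder_mid=(470, 400), hip_mid=(320, 400))     # shoulders beside hips
leaning = make_kps(shoulder_mid=(420, 250), hip_mid=(320, 350))   # diagonal torso

standing_angle = torso_angle_deg(standing)  # 0 degrees, below the 40 degree UPRIGHT threshold
lying_angle = torso_angle_deg(lying)        # 90 degrees, above the 55 degree HORIZONTAL threshold
leaning_angle = torso_angle_deg(leaning)    # 45 degrees, inside the TRANSITION band
```

The 40° to 55° band between the two thresholds acts as hysteresis: a torso hovering near one threshold does not flip the label back and forth every frame.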

EMA smoothing

Raw keypoint coordinates from YOLO are jittery frame-to-frame. Before computing the torso angle I pass each keypoint through an Exponential Moving Average filter:

ALPHA = 0.4   # higher = more responsive, lower = smoother

def ema_smooth(
    new_kps: list[list[float]],
    prev_kps: list[list[float]] | None,
) -> list[list[float]]:
    if prev_kps is None:
        return new_kps
    return [
        [ALPHA * n[i] + (1 - ALPHA) * p[i] for i in range(len(n))]
        for n, p in zip(new_kps, prev_kps)
    ]

α = 0.4 was chosen empirically — low enough to suppress single-frame noise, high enough that a genuine fall (which happens in ~0.5 s) still registers within two or three frames.
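
To make the responsiveness tradeoff concrete, the sketch below measures how much of a sudden jump the filter has covered after a given number of frames. It idealizes a fall as an instantaneous step in the raw input, which is an assumption (a real fall is closer to a ramp), and the function name ema_step_fraction is hypothetical.

```python
def ema_step_fraction(alpha: float, frames: int) -> float:
    """Fraction of a sudden 0 -> 1 step that the EMA output has covered after `frames` updates."""
    value = 0.0    # smoothed value before the step
    target = 1.0   # raw input jumps to 1 and stays there
    for _ in range(frames):
        value = alpha * target + (1 - alpha) * value
    return value


# Compare the three alpha values discussed in this post over the first three frames.
for alpha in (0.2, 0.4, 0.8):
    covered = [round(ema_step_fraction(alpha, k), 3) for k in (1, 2, 3)]
    print(f"alpha={alpha}: {covered}")
```

The loop is equivalent to the closed form 1 − (1 − α)^k. With α = 0.4 the smoothed value has covered about 78% of the step after three frames, versus about 49% for α = 0.2 and about 99% for α = 0.8.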

Detecting the fall event

A torso angle alone isn’t enough: someone lying down to read a book has a horizontal torso too. I need to detect the transition itself, which the code below approximates as a rapid increase in torso angle followed by the person staying down.

from collections import deque

def compute_rapid_drop_and_down_persist(
    history: deque[tuple[float, str]],   # (angle, label) ring buffer
    drop_window: int = 3,
    drop_threshold: float = 40.0,
    persist_frames: int = 25,
) -> tuple[bool, bool]:
    """
    rapid_drop   → torso angle rose by > drop_threshold degrees across the last drop_window frames
    down_persist → person has been HORIZONTAL for >= persist_frames consecutive frames
    """
    angles = [h[0] for h in history]
    labels = [h[1] for h in history]

    # rapid_drop: large angle increase over a short window
    if len(angles) >= drop_window:
        rapid_drop = (angles[-1] - angles[-drop_window]) > drop_threshold
    else:
        rapid_drop = False

    # down_persist: trailing frames are all HORIZONTAL
    trailing = labels[-persist_frames:] if len(labels) >= persist_frames else labels
    down_persist = len(trailing) == persist_frames and all(l == "HORIZONTAL" for l in trailing)

    return rapid_drop, down_persist
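
The ring buffer itself is not shown above, so here is a minimal sketch of how it might be fed frame by frame. The synthetic angle trace is made up, and flags is a compact restatement of the two checks so the example runs on its own. One detail worth noting is that the deque's maxlen has to be at least persist_frames, otherwise down_persist can never become true.

```python
from collections import deque

PERSIST_FRAMES = 25    # same defaults as the function above
DROP_WINDOW = 3
DROP_THRESHOLD = 40.0

# maxlen must be >= PERSIST_FRAMES, or down_persist can never become True.
history: deque[tuple[float, str]] = deque(maxlen=PERSIST_FRAMES)


def flags(hist: deque[tuple[float, str]]) -> tuple[bool, bool]:
    """Compact restatement of the rapid_drop / down_persist checks."""
    angles = [angle for angle, _ in hist]
    labels = [label for _, label in hist]
    rapid_drop = (
        len(angles) >= DROP_WINDOW
        and angles[-1] - angles[-DROP_WINDOW] > DROP_THRESHOLD
    )
    down_persist = (
        len(labels) >= PERSIST_FRAMES
        and all(label == "HORIZONTAL" for label in labels[-PERSIST_FRAMES:])
    )
    return rapid_drop, down_persist


# Synthetic trace (made-up angles): 10 upright frames, a fast fall, then lying still.
trace = (
    [(5.0, "UPRIGHT")] * 10
    + [(30.0, "UPRIGHT"), (60.0, "HORIZONTAL")]
    + [(85.0, "HORIZONTAL")] * 30
)

first_drop = first_persist = None
for frame_idx, sample in enumerate(trace):
    history.append(sample)
    rapid_drop, down_persist = flags(history)
    if rapid_drop and first_drop is None:
        first_drop = frame_idx       # first frame where the angle jump is seen
    if down_persist and first_persist is None:
        first_persist = frame_idx    # first frame with 25 HORIZONTAL frames in a row
```

On this trace rapid_drop first fires at frame 11 and stays true for only two frames, while down_persist first fires at frame 35. The two flags are therefore never true at the same time here, which is why whatever consumes them needs to remember that a drop happened.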

The state machine

Three pieces of evidence combine in a simple state machine:

rapid_drop: torso angle rose sharply within a few frames → enter CANDIDATE
down_persist: stayed HORIZONTAL for 25+ consecutive frames → eligible for CONFIRMED
Depth (depth_ok): estimated floor distance < 0.7 m → required, together with down_persist, to enter CONFIRMED

A 30-second cooldown after each alert prevents a single fall from spamming the dashboard.

The transitions map directly to code:

if state == "NORMAL" and rapid_drop:
    state = "CANDIDATE"

elif state == "CANDIDATE":
    if down_persist and depth_ok:
        state = "CONFIRMED"
        emit_fall_alert()          # Socket.IO → controller Pi
        cooldown_until = time.time() + 30
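    elif label == "UPRIGHT":       # person caught themselves (label = current posture label)
        # Exiting on `not rapid_drop` would reset too early: that flag clears a few
        # frames after the drop, long before the 25-frame down_persist window fills.
        state = "NORMAL"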

elif state == "CONFIRMED":
    if time.time() > cooldown_until:
        state = "NORMAL"
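
For readers who want to run the transitions in isolation, below is a minimal sketch that wraps them in a small class. The class name FallStateMachine, the injectable clock, and the alerts_fired counter (a stand-in for emit_fall_alert) are assumptions for illustration. It also assumes one particular recovery rule: leave CANDIDATE when the posture label returns to UPRIGHT, since rapid_drop on its own is only true for a few frames.

```python
import time
from typing import Callable


class FallStateMachine:
    """Sketch of the NORMAL -> CANDIDATE -> CONFIRMED flow with an alert cooldown."""

    def __init__(self, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.state = "NORMAL"
        self.cooldown_s = cooldown_s
        self.cooldown_until = 0.0
        self.clock = clock          # injectable so tests need not wait 30 s
        self.alerts_fired = 0       # stand-in for emit_fall_alert()

    def step(self, label: str, rapid_drop: bool,
             down_persist: bool, depth_ok: bool) -> str:
        """Advance one frame and return the new state."""
        if self.state == "NORMAL":
            if rapid_drop:
                self.state = "CANDIDATE"
        elif self.state == "CANDIDATE":
            if down_persist and depth_ok:
                self.state = "CONFIRMED"
                self.alerts_fired += 1
                self.cooldown_until = self.clock() + self.cooldown_s
            elif label == "UPRIGHT":    # assumed recovery rule
                self.state = "NORMAL"
        elif self.state == "CONFIRMED":
            if self.clock() > self.cooldown_until:
                self.state = "NORMAL"
        return self.state


# Drive it with a fake clock and hand-written per-frame inputs.
fake_now = [0.0]
fsm = FallStateMachine(clock=lambda: fake_now[0])

s1 = fsm.step("HORIZONTAL", rapid_drop=True, down_persist=False, depth_ok=True)
s2 = fsm.step("HORIZONTAL", rapid_drop=False, down_persist=False, depth_ok=True)
s3 = fsm.step("HORIZONTAL", rapid_drop=False, down_persist=True, depth_ok=True)
fake_now[0] = 10.0   # still inside the 30 s cooldown
s4 = fsm.step("HORIZONTAL", rapid_drop=True, down_persist=True, depth_ok=True)
fake_now[0] = 31.0   # cooldown elapsed
s5 = fsm.step("UPRIGHT", rapid_drop=False, down_persist=False, depth_ok=True)
```

The sketch uses time.monotonic rather than time.time for the cooldown so that a wall-clock adjustment on the Pi cannot shorten or extend it. That substitution is a design suggestion, not something taken from the original node.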

DepthAI pipeline setup

The OAK-D pipeline is configured once at startup. The camera feeds frames directly into the neural-net node, with no round-trip to the Pi host, and the detections come back over an XLink output queue.

from pathlib import Path

import depthai as dai

def build_pipeline() -> dai.Pipeline:
    pipeline = dai.Pipeline()

    # RGB camera
    cam = pipeline.create(dai.node.Camera)
    cam.setFps(30)
    cam_out = cam.requestOutput((640, 640), dai.ImgFrame.Type.BGR888p)

    # YOLO pose model (compiled for Myriad X)
    nn = pipeline.create(dai.node.NeuralNetwork)
    nn.setBlobPath(Path("models/yolo_pose.blob"))
    cam_out.link(nn.input)

    # XLink output — detections stream back to the host
    xout = pipeline.create(dai.node.XLinkOut)
    xout.setStreamName("detections")
    nn.out.link(xout.input)

    # Stereo depth
    left  = pipeline.create(dai.node.Camera)
    right = pipeline.create(dai.node.Camera)
    left.setBoardSocket(dai.CameraBoardSocket.CAM_B)
    right.setBoardSocket(dai.CameraBoardSocket.CAM_C)

    stereo = pipeline.create(dai.node.StereoDepth)
    stereo.setDefaultProfilePreset(dai.node.StereoDepth.PresetMode.HIGH_DENSITY)
    left.requestOutput((1280, 720)).link(stereo.left)
    right.requestOutput((1280, 720)).link(stereo.right)

    depth_out = pipeline.create(dai.node.XLinkOut)
    depth_out.setStreamName("depth")
    stereo.depth.link(depth_out.input)

    return pipeline

Results

Testing in the lab with a crash mat, the detector achieved:

True positive rate: 94 % across 50 staged falls (various directions, speeds)
False positive rate: < 2 false alerts per hour of normal activity (sitting, bending, reaching)
Latency: median 1.1 s from fall to Socket.IO event (dominated by the 25-frame down_persist window at 30 fps ≈ 0.83 s + network)
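
The latency breakdown can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers quoted above; the variable names are illustrative, and "everything else" lumps together inference, smoothing, and network time because they were not measured separately.

```python
FPS = 30
PERSIST_FRAMES = 25
MEDIAN_LATENCY_S = 1.1   # reported median, fall to Socket.IO event

# Time needed to collect 25 consecutive HORIZONTAL frames at 30 fps.
persist_window_s = PERSIST_FRAMES / FPS

# Whatever is left of the median covers everything else in the path.
remaining_s = MEDIAN_LATENCY_S - persist_window_s

print(f"persist window: {persist_window_s:.2f} s, everything else: {remaining_s:.2f} s")
```

Roughly three quarters of the median latency is the persistence window itself, so lowering persist_frames is the main lever for faster alerts, at the cost of more false positives from brief horizontal poses.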

The biggest failure mode was slow, controlled descents — someone carefully lowering themselves to the floor doesn’t trigger rapid_drop. That’s intentional: a slow, deliberate motion is not a fall. The tradeoff is that a person who faints slowly while holding a wall could be missed.

What I learned

DepthAI’s on-device inference is genuinely powerful. Running YOLO at 30 fps without touching the Pi CPU meant the host Pi stayed cool and responsive even when my roommate was running the dashboard on the same device.

State machines beat thresholds. My first prototype used a single angle threshold and generated constant false positives. The two-stage rapid_drop → down_persist design dropped the false positive rate by an order of magnitude.

EMA α matters more than you’d think. With α = 0.8, a stumble could look like a fall. With α = 0.2, an actual fall took too many frames to register. Spending time on this parameter paid off.