Presence detection, without compromising privacy

·Clayton Craft

I use Home Assistant (HA) to control my WiFi-enabled thermostat, which in turn is walled off from the Internet. So no matter how "smart" my thermostat wants to be, it's forced to live in isolation and follow orders from HA. For the past year, I've been trying out crude methods for detecting when humans are home, or not, so that the HVAC can be set accordingly. The last attempt required using the "ping" integration in HA. You basically provide a list of IPs, add users that are associated with the IP / device tracker for it, then you can conditionally run automation based on the status of those things.

This has worked pretty well, however there are some pain points that have been really digging at me:

  1. Setting this up requires using static DHCP for devices on the network that I want to associate with someone being home, like a cell phone. This is tricky since some phones use random MACs, and it's cumbersome to add new people to the "who could be home?" list.

  2. If people come over, I don't want the HVAC to shut off if the regular inhabitants leave. E.g. if my partner and I have to leave the house, and my mother-in-law stays at home, I'll get in trouble if the heat shuts off because HA thinks everyone is gone. Don't ask me how I know.

  3. Setting this all up requires basically configuring HA to track inhabitants and guests in my home. Even though I've gone through great lengths to try to prevent HA from sending any data externally, I still don't like that this data is being stored anywhere. I actually don't care if my partner or some particular guest is home and I'm not, it's easy enough to ask if I suddenly do. But seeing that someone is home is fine. I've heard that some folks use the "cloud" to manage HA and its data. Ouch.

So I present my latest attempt to alleviate those 3 things as much as possible:

  • Monitor IP ranges and detect if any devices in those ranges are "online" (responding to ping)
  • Use DHCP to "put" devices that humans carry or use when they are home into the appropriate monitored IP range
  • HA can use this to determine if any humans are connected (or not), and thus at home (or not)
  • Guests who are on WiFi are counted as "home" by HA without me having to do anything more than just give them the guest WiFi password (which I would have done previously anyways).
  • HA doesn't have any information about specific users and their home/not home status. And it doesn't care if devices use a consistent MAC or not.

All three pain points above are addressed... although maybe not 100% solved in every situation. But it still seems like an improvement.

At the heart of this is a script that runs ping in parallel on an arbitrary number of IP addresses given to it. I'm sure it could be improved... Like, I doubt this would turn out well if the list gets too large. And the run time of the script can be kinda long if there are no devices and it ends up retrying the max number of times. The retry is there because I've seen that some devices might not respond to ping immediately (they were sleeping, or had a WiFi reconnect event at the wrong time, etc...) So if you have any ideas, let me know!

#!/bin/bash
#
# Pings the given addresses and prints "up" if any of
# them respond to the ping, else it prints "down". A
# non-zero return value is an error.
# A simple backoff delay is used between attempts to
# ping the group when none of them responded to the
# last attempt.

function pings(){
    local PIDS=()
    local STATUS=()
    local IP=( "$@" )

    for ip in "${IP[@]}"; do
        ping  -c1 -W1 "$ip" &>/dev/null &
        PIDS+=($!)
    done

    i=0
    for p in "${PIDS[@]}"; do
        wait "${p}"
        STATUS[i]=$?
        ((i+=1))
    done

    for s in "${STATUS[@]}"; do
        if [[ $s -eq 0 ]]; then
            return 0
        fi
    done

    return 1
}

if [ -z "$1" ]; then
    echo "Parameter required!!"
    exit 1
fi

ADDRS=( "$@" )
try_interval=1  # starting interval, before backoff
tries=0
max_tries=4

while ! pings "${ADDRS[@]}"; do
    if [[ $tries -ge $max_tries ]]; then
        echo down
        exit 0
    fi

    sleep "$try_interval"
    ((tries+=1))
    ((try_interval*=2))
done

echo up
exit 0

This script should be saved in some directory that HA can access. Next, we need to actually run the script in a way that HA can consume the output directly and use it. A binary sensor seemed like the best choice:

binary_sensor:
  - platform: command_line
    name: 'network_users'
    device_class: presence
    scan_interval: 30 # seconds
    command_timeout: 30 # seconds
    command: 'bash -c "tools/mass_ping.sh 192.168.20.{100..150} 192.168.40.37"'
    payload_on: 'up'
    payload_off: 'down'

The parameters to that script can be ranges that the shell can expand, so you don't have to list out every IP. Triggering every 30 seconds may be too often, I may back that off later to be a minute or more... so I'll run this for a bit to see if I start to form any opinions about changing it.

Next we need to create a "device tracker" so that this can be associated with some "user" in HA for presence detection. I used device_tracker.see in a scheduled automation to add and update a device tracker thing in HA specifically for network users:

- alias: Update presence tracking sensors
  description: This is done periodically so that device trackers for each sensor have
    updated values after HA startup / config reload
  trigger:
  - platform: time_pattern
    minutes: /1
  condition: []
  action:
  - parallel:
    - service: device_tracker.see
      data:
        dev_id: network_users
        location_name: '{{ "home" if is_state("binary_sensor.network_users", "on")
          else "not_home" }}'
      alias: Update Network Users Tracker Status
    alias: Update device trackers
    
    # .... other binary sensors used for determining presence can be listed here too!
  mode: single

I set this automation to fire every 1 minute, so that the current status is "known" to HA soon after it starts up, or config is reloaded. Maybe 1 minute is "too often", but it's just reading a value from the HA database, so, meh. Basically, I've learned that triggering automations based on state changes is flaky, and it's often more reliable to trigger some automations on a schedule and then check entity states in the condition or action.

Lastly, there needs to be a "person" in HA to associate this device tracker with. I created a generic "person" and associated the tracker with it. This can be done through Settings->People->Add Person. I called the fake person I made for this "Pamela", but the name doesn't matter.

The presence status of this person can now be used in automation and/or dashboard cards:

Home Assistant Card showing that someone is connected to the network, and HVAC is enabled by automation
Home Assistant Card showing that someone is connected to the network, and HVAC is enabled by automation

Now, this obviously isn't perfect... One of the issues is that this can lead to the HVAC being left on if someone forgets a phone at home, for example. But that could have happened over the last year with the old implementation, and I've found that 1) it's rare, and 2) the savings from having a "smarter" HVAC outweighs the handful of situations where it's being wasteful. Also, this could be circumvented if some smart guest assigns a static IP outside any expected range I pass to the script above. But if they do that, then the joke is on them since they'll just slowly freeze as the HVAC is shut off because HA thinks no one is home. I do have some other presence detection things that help out too (like IR motion detectors), that can help a little though.

With all of that, HA can tell if someone is home if they're connected to a network, and have an IP from DHCP (or otherwise) that's being monitored. And HA is completely oblivious to any specific people. Again, that's mostly just a personal concern and probably not very existential for my setup, but maybe this will be helpful for folks who do use some external thing for managing HA data, and are concerned about how location status for household members is handled.