Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a system named “RF-Diary” that can monitor people through walls and in total darkness. Using radio signals and artificial intelligence, it generates textual descriptions of people’s activities and their interactions with objects in their homes.
With traditional video captioning, most people would have privacy concerns about deploying cameras throughout their homes.
RF-Diary instead captions daily life by analyzing privacy-preserving radio signals in the home together with the home’s floormap.
Captioning is an important task in computer vision and natural language processing; it typically generates language descriptions of visual inputs such as images or videos. The RF-Diary research focuses on in-home daily-life captioning, that is, creating a system that observes people at home and automatically generates a transcript of their everyday life. Such a system would help older people age in place. Older people may have memory problems, and some suffer from Alzheimer’s. They may forget whether they took their medications, brushed their teeth, slept enough, woke up at night, ate their meals, etc.
Daily-life captioning enables a family caregiver, e.g., a daughter or son, to receive text updates about a parent’s daily life. This allows them to care for mom or dad even if they live far away, and gives them peace of mind about the wellness and safety of their elderly parents. More generally, daily-life captioning can help people track and analyze their habits and routines at home, which can empower them to change bad habits and improve their lifestyle.
But how do we caption people’s daily life? One option would be to deploy cameras at home and run existing video-captioning models on the recorded videos. However, most people would have privacy concerns about deploying cameras at home, particularly in the bedroom and bathroom. Also, a single camera usually has a limited field of view; users would need to deploy multiple cameras covering different rooms, which would introduce significant overhead. Moreover, cameras do not work well in the dark or when the view is occluded, both of which are common at home.
To address these limitations, the MIT researchers propose using radio frequency (RF) signals for daily-life captioning. RF signals are more privacy-preserving than cameras because they are difficult for humans to interpret. Signals from a single RF device can traverse walls and occlusions and cover most of the home. RF signals also work in both bright and dark settings without performance degradation. Furthermore, the literature has shown that one can analyze the radio signals that bounce off people’s bodies to capture their movements and track their 3D skeletons.

However, using RF signals also introduces new challenges. One challenge is that RF signals do not carry enough information to differentiate objects, since many objects are partially or fully transparent to radio signals. In addition, their wavelength is on the order of a few centimeters, whereas the wavelength of visible light is hundreds of nanometers, so it is hard to capture the exact shape of objects using RF signals.
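To put the wavelength gap in numbers, the short sketch below applies the standard relation wavelength = speed of light / frequency. The two frequencies are assumptions chosen for illustration (a 6 GHz radio carrier and green light near the middle of the visible spectrum); the article does not state which frequencies RF-Diary uses.

```python
# Speed of light in vacuum, in metres per second (exact SI value).
SPEED_OF_LIGHT = 299_792_458.0


def wavelength_m(frequency_hz: float) -> float:
    """Return the free-space wavelength in metres for a given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT / frequency_hz


# Assumed values for illustration only; the article does not give them.
ASSUMED_RADIO_HZ = 6e9      # hypothetical 6 GHz radio carrier
ASSUMED_LIGHT_HZ = 545e12   # green light, roughly mid visible spectrum

radio_wavelength = wavelength_m(ASSUMED_RADIO_HZ)
light_wavelength = wavelength_m(ASSUMED_LIGHT_HZ)

print(f"radio: {radio_wavelength * 100:.1f} cm")
print(f"light: {light_wavelength * 1e9:.0f} nm")
print(f"ratio: {radio_wavelength / light_wavelength:.0f}x")
```

Under these assumptions the radio wavelength comes out at a few centimeters and the optical one at a few hundred nanometers, a gap of several orders of magnitude, which is why fine object shapes are much harder to resolve with RF than with a camera.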
Another challenge is that there is currently no training dataset that contains RF signals from people’s homes with the corresponding captions. Training a captioning system typically requires tens of thousands of labeled examples, and collecting a new large captioning dataset with RF in people’s homes would be a daunting task.

The MIT researchers developed “RF-Diary”, an RF-based in-home daily-life captioning model that addresses both challenges, i.e., missing object information and limited training data.
To capture object information, RF-Diary takes as input, besides RF signals, the home floormap marked with the size and location of static objects such as the bed, sofa, TV, and fridge. Floormaps provide information about the surrounding environment, thus enabling the model to infer human interactions with objects. Moreover, a floormap is easy to measure with a cheap laser meter in less than 10 minutes. Once measured, the floormap can remain unchanged for years and can be used for all future daily-life captioning from that home. RF-Diary proposes an effective representation to integrate the information in the floormap with that in the RF signals.
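As a rough illustration of why a floormap helps, the sketch below stores each static object as a rectangle and looks up which object a tracked person is closest to. This is a simple geometric stand-in, not RF-Diary’s learned representation: the `FloorObject` class, the `nearest_object` helper, the 0.5 m threshold, and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FloorObject:
    """A static object on the floormap, modelled as an axis-aligned rectangle."""
    name: str
    x: float       # left edge, metres
    y: float       # bottom edge, metres
    width: float   # extent along x, metres
    depth: float   # extent along y, metres

    def distance_to(self, px: float, py: float) -> float:
        """Distance from point (px, py) to the rectangle; 0.0 if inside it."""
        dx = max(self.x - px, 0.0, px - (self.x + self.width))
        dy = max(self.y - py, 0.0, py - (self.y + self.depth))
        return (dx * dx + dy * dy) ** 0.5


def nearest_object(floormap: List[FloorObject], px: float, py: float,
                   max_distance: float = 0.5) -> Optional[FloorObject]:
    """Return the closest object within max_distance metres, or None."""
    if not floormap:
        return None
    best = min(floormap, key=lambda obj: obj.distance_to(px, py))
    return best if best.distance_to(px, py) <= max_distance else None


# Hypothetical floormap; sizes and positions are made up for the example.
floormap = [
    FloorObject("bed", 0.0, 0.0, 2.0, 1.6),
    FloorObject("fridge", 5.0, 3.0, 0.7, 0.7),
    FloorObject("sofa", 2.5, 4.0, 2.0, 0.9),
]

person_near = nearest_object(floormap, 4.8, 3.3)
print(person_near.name if person_near else "no object nearby")
```

A position standing right next to the fridge resolves to the fridge, while a position in the middle of the room resolves to no object, which is the kind of context that RF signals alone cannot supply.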

To deal with the limited availability of training data, the researchers propose a multi-modal feature alignment training scheme to leverage existing video-captioning datasets for training RF-Diary.
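The article does not spell out how the alignment is trained. One common way to align features across modalities, shown here purely as an assumption, is to penalize the distance between the features that the RF encoder and the video encoder produce for the same synchronized clip, so that a caption generator trained on video features can also consume RF features. The function name `alignment_loss` and the toy vectors are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def alignment_loss(rf_features: Sequence[Vector],
                   video_features: Sequence[Vector]) -> float:
    """Mean squared L2 distance between paired RF and video feature vectors.

    Entry i of each list is assumed to describe the same synchronized clip.
    """
    if not rf_features or len(rf_features) != len(video_features):
        raise ValueError("need the same non-zero number of RF and video features")
    total = 0.0
    for rf, video in zip(rf_features, video_features):
        if len(rf) != len(video):
            raise ValueError("paired feature vectors must have the same length")
        total += sum((a - b) ** 2 for a, b in zip(rf, video))
    return total / len(rf_features)


# Toy paired features: the first pair matches, the second differs by one unit.
rf_batch = [[1.0, 0.0], [0.0, 1.0]]
video_batch = [[1.0, 0.0], [0.0, 0.0]]
print(alignment_loss(rf_batch, video_batch))
```

Driving a loss of this kind toward zero pulls the two feature spaces together; in a real system it would be minimized with a deep-learning framework alongside the captioning objective rather than computed by hand.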
Compared with images, RF signals are privacy-preserving because they are difficult for humans to interpret. However, one may also argue that since RF signals can track people through walls, they could create privacy concerns of their own. This issue can be addressed through a challenge-response authentication protocol that prevents people from maliciously using RF signals to see areas that they are not authorized to access.
News Source: MIT CSAIL