Researchers from MIT CSAIL Introduce ‘Privid’: an AI Tool, Built on Differential Privacy, to Guarantee Privacy in Video Footage from Surveillance Cameras

This research summary article is based on the paper 'Privid: Practical, Privacy-Preserving Video Analytics Queries' and the MIT article 'Security tool guarantees privacy in surveillance footage'.

Surveillance cameras have an identity crisis exacerbated by a conflict between function and privacy. Machine learning techniques have automated video content analysis on a vast scale as these sophisticated small sensors have shown up seemingly everywhere. Still, with increased mass monitoring, there are currently no legally enforceable standards to curb privacy invasions.

Security cameras have evolved into smarter and more capable tools than the grainy images of the past, which were frequently used as the “hero tool” in crime dramas. Video surveillance can now help health regulators determine the percentage of people wearing masks, help transportation departments monitor the density and flow of cars, cyclists and pedestrians, and help businesses better understand buying habits. But why has privacy remained a second-class citizen?


Currently, footage is retrofitted with blurred faces or black boxes. This prevents analysts from asking some legitimate questions (for example, are people wearing masks?). Dissatisfied with the status quo, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), working with other institutions, developed a system to better guarantee privacy in surveillance video footage. The system, dubbed “Privid,” lets analysts submit video data queries and then adds a small amount of noise (additional data) to the result to ensure that no individual can be identified. The method is based on a formal notion of privacy known as “differential privacy,” which permits access to aggregate statistics about private data without disclosing individually identifying information.

Analysts usually have full access to the video and may do whatever they want with it, but Privid ensures that the video isn’t a free buffet. Honest analysts can get the data they need, but that access is limited enough that malicious analysts won’t be able to do much with it. Privid achieves this by breaking the video into small chunks and running the analyst’s processing code over each chunk rather than over the entire video in one shot. Instead of returning the result from each chunk individually, Privid aggregates the per-chunk results and adds noise to the aggregate. (Privid also reports an error bound on the result, perhaps a 2% error margin, given the added noise.)

The code might, for example, output the number of people seen in each video chunk, with the aggregation being a “sum” to count the overall number of people wearing face coverings or an “average” to estimate crowd density. Privid lets analysts use their own deep neural networks, which are now widely used in video analytics. This allows analysts to ask questions that Privid’s designers hadn’t considered.
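The chunk-and-aggregate flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Privid's actual interface: the function names, the toy representation of a video as a list of per-frame people counts, the clamping of each chunk's output to a declared range, and the use of Laplace noise with scale sensitivity/epsilon (the standard Laplace mechanism from differential privacy) are all choices made for this example.

```python
import random
from typing import Callable, List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def split_into_chunks(frames: Sequence[int], chunk_size: int) -> List[Sequence[int]]:
    """Cut the video (here: a list of frames) into consecutive chunks."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def clamped_chunk_sum(
    frames: Sequence[int],
    chunk_size: int,
    process_chunk: Callable[[Sequence[int]], float],
    max_chunk_output: float,
) -> float:
    """Run the analyst's code on each chunk and sum the clamped outputs."""
    total = 0.0
    for chunk in split_into_chunks(frames, chunk_size):
        value = float(process_chunk(chunk))
        # Clamp so that no single chunk can contribute more than the declared range.
        total += min(max(value, 0.0), max_chunk_output)
    return total


def noisy_sum_query(
    frames: Sequence[int],
    chunk_size: int,
    process_chunk: Callable[[Sequence[int]], float],
    max_chunk_output: float,
    sensitivity: float,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Return only the noisy aggregate, never the per-chunk results."""
    exact = clamped_chunk_sum(frames, chunk_size, process_chunk, max_chunk_output)
    return exact + laplace_noise(sensitivity / epsilon, rng)


# Toy usage: each "frame" is just the number of people visible in it (hypothetical data),
# and the analyst's per-chunk code reports the largest count seen in the chunk.
toy_frames = [3, 5, 2, 8, 1, 0, 4, 4, 9, 2]
result = noisy_sum_query(
    toy_frames, chunk_size=4, process_chunk=max,
    max_chunk_output=6.0, sensitivity=6.0, epsilon=1.0, rng=random.Random(0),
)
print(f"noisy sum: {result:.2f}")
```

The key design point is that the analyst's code is treated as a black box: the system only needs its declared output range, so it can bound how much any chunk contributes without inspecting what the model does.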

Privid Query Structure



“We’ve reached a point where cameras are everywhere. You can imagine that entity building an exact timeline of when and where a person has gone if there’s a camera on every street corner, every place you go, and if someone could process all of those videos in aggregate,” says MIT CSAIL Ph.D. student Frank Cangialosi, the lead author on a paper about Privid. “With GPS, people are already concerned about their location privacy – aggregate video data might capture not only one’s location history, but also moods, activities, and more at each site.”

Privid introduces a new concept called “duration-based privacy,” which separates the definition of privacy from its enforcement. With obfuscation, if the privacy goal is to protect everyone, the enforcement mechanism first has to locate the people it is supposed to protect, which it may or may not be able to do flawlessly.

Assume there is a video of a street. Two analysts, Alice and Bob, both claim that they want to count the number of people passing by each hour, so each submits a video processing module and requests a sum aggregation. The first analyst, Alice, works for the city planning department, which wants to use this data to better understand footfall patterns and plan sidewalks for the city. Her model counts the number of people in each video chunk and outputs that number.

Now assume that the other analyst, Bob, is shady and wants to recognize “Charlie” every time he goes by the camera. Bob’s model looks for Charlie’s face, and if Charlie is present (i.e., the “signal” Bob is seeking to extract), it outputs a huge number; otherwise, it outputs zero. Bob hopes that the total will be non-zero whenever Charlie was present.

From Privid’s point of view, these two queries appear to be the same. It is difficult to tell what the models are doing internally, or what the analyst intends to do with the data. Privid runs both queries and adds the same amount of noise to each. This is where the noise makes the difference. In the first scenario, because Alice counted everyone, the noise has only a minor impact on the result and will most likely not affect its usefulness.

In the second scenario, because Bob was seeking a specific signal (Charlie was only visible for a few chunks), the noise is enough to prevent him from knowing whether or not Charlie was present. If Bob sees a non-zero result, it is possible that Charlie was present, or that the model output zero and the noise made the result non-zero. Privid didn’t need to know when or where Charlie appeared; all it needed was a rough upper bound on how long Charlie could be visible, which is easier to specify than the exact locations that previous techniques rely on.
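The Alice and Bob scenario can be simulated with a small Python sketch. Everything here is an assumption made for illustration rather than Privid's real configuration: the chunk length, the two-minute bound on how long one person stays visible, the per-chunk output range, the privacy parameter epsilon, the made-up crowd counts, and the simple worst-case rule that an appearance lasting at most ρ seconds can touch at most 1 + ⌈ρ/c⌉ chunks of c seconds each.

```python
import math
import random
from typing import List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sensitivity_from_duration(
    max_visible_seconds: float, chunk_seconds: float, max_chunk_output: float
) -> float:
    """Worst-case change one person can cause in the sum, given only a duration bound."""
    max_chunks_touched = 1 + math.ceil(max_visible_seconds / chunk_seconds)
    return max_chunks_touched * max_chunk_output


def noisy_sum(
    chunk_outputs: List[float],
    max_chunk_output: float,
    sensitivity: float,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Clamp each chunk's output, sum, and add Laplace noise to the total."""
    clamped = [min(max(v, 0.0), max_chunk_output) for v in chunk_outputs]
    return sum(clamped) + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical parameters chosen only for this illustration.
CHUNK_SECONDS = 60
MAX_VISIBLE_SECONDS = 120   # assumed bound on how long one person stays in view
MAX_CHUNK_OUTPUT = 50.0     # declared output range per chunk
EPSILON = 1.0
NUM_CHUNKS = 720            # 12 hours of one-minute chunks

rng = random.Random(0)
sens = sensitivity_from_duration(MAX_VISIBLE_SECONDS, CHUNK_SECONDS, MAX_CHUNK_OUTPUT)

# Alice: counts everyone in every chunk (made-up counts between 10 and 30).
alice_outputs = [float(rng.randint(10, 30)) for _ in range(NUM_CHUNKS)]

# Bob: outputs a huge number only in the two chunks where Charlie appears.
bob_present = [0.0] * NUM_CHUNKS
bob_present[100] = bob_present[101] = 1e6   # clamped to MAX_CHUNK_OUTPUT
bob_absent = [0.0] * NUM_CHUNKS

print("Alice true sum:   ", sum(alice_outputs))
print("Alice noisy sum:  ", noisy_sum(alice_outputs, MAX_CHUNK_OUTPUT, sens, EPSILON, rng))
print("Bob, Charlie seen:", noisy_sum(bob_present, MAX_CHUNK_OUTPUT, sens, EPSILON, rng))
print("Bob, Charlie not: ", noisy_sum(bob_absent, MAX_CHUNK_OUTPUT, sens, EPSILON, rng))
```

Under these assumed numbers the noise scale is 150, which is small next to Alice's total in the thousands but larger than the at most 100 that Charlie can contribute after clamping, so Bob's two cases overlap heavily across repeated runs.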


