Tianhong Li, Vibhaalakshmi Sivaraman, Pantea Karimi, Lijie Fan, Mohammad Alizadeh, and Dina Katabi. (2024). Reparo: Loss-Resilient Generative Codec for Video Conferencing.
@unpublished{Reparo,
author = {Li, Tianhong and Sivaraman, Vibhaalakshmi and Karimi, Pantea and Fan, Lijie and Alizadeh, Mohammad and Katabi, Dina},
title = {Reparo: Loss-Resilient Generative Codec for Video Conferencing},
year = {2024},
file = {Reparo.pdf}
}
Packet loss during video conferencing often leads to poor quality and video freezes. Retransmitting lost packets is often impractical because of the need for real-time playback, and recovering them with Forward Error Correction (FEC) is challenging because it is difficult to determine the appropriate redundancy level. To address these issues, we introduce Reparo, a loss-resilient video conferencing framework based on generative deep learning models.
Our approach generates the missing information when a frame or part of a frame is lost. This generation is conditioned on the data received thus far, drawing on the model's understanding of how people and objects appear and interact within the visual realm. Experimental results on publicly available video conferencing datasets demonstrate that Reparo outperforms state-of-the-art FEC-based video conferencing solutions in terms of both video quality (measured by PSNR, SSIM, and LPIPS) and the frequency of video freezes.
Pantea Karimi, Sadjad Fouladi, Vibhaalakshmi Sivaraman, and Mohammad Alizadeh. (2023). Vidaptive: Efficient and Responsive Rate Control for Real-Time Video on Variable Networks. In arXiv:2309.16869.
@unpublished{Vidaptive,
author = {Karimi, Pantea and Fouladi, Sadjad and Sivaraman, Vibhaalakshmi and Alizadeh, Mohammad},
title = {Vidaptive: Efficient and Responsive Rate Control for Real-Time Video on Variable Networks},
journal = {arXiv:2309.16869},
doi = {10.48550/arXiv.2309.16869},
year = {2023},
file = {Vidaptive.pdf}
}
Real-time video streaming relies on rate control mechanisms to adapt the video bitrate to network capacity while maintaining high utilization and low delay. However, current video rate controllers, such as Google Congestion Control (GCC) in WebRTC, are slow to respond to network changes, leading to link under-utilization and latency spikes. While recent delay-based congestion control algorithms promise high efficiency and rapid adaptation to variable conditions, low-latency video applications have been unable to adopt them because of the intertwined relationship between video encoders and rate control in current systems.
This paper introduces Vidaptive, a new rate control mechanism designed for low-latency video applications. Vidaptive decouples packet transmission decisions from encoder output, injecting dummy padding traffic as needed to treat video streams akin to backlogged flows controlled by a delay-based congestion controller. Vidaptive then adapts the frame rate, resolution, and target bitrate of the encoder to align the video bitrate with the congestion controller’s sending rate. Our evaluations atop WebRTC show that, across a set of cellular traces, Vidaptive achieves 2x higher video bitrate and 1.6 dB higher PSNR, and it reduces 95th-percentile frame latency by 2.7s with a slight increase in median frame latency.
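The core control loop the abstract describes — decoupling pacing from encoder output with padding, then steering the encoder toward the congestion controller's rate — can be sketched roughly as follows. All names, the smoothing factor, and the padding policy here are illustrative assumptions, not details from the paper:

```python
# Rough sketch of a Vidaptive-style rate-matching step (illustrative only;
# the paper's actual design differs in detail).

def pace_and_adapt(cc_rate_bps, encoded_frame_bits, frame_interval_s):
    """Return (padding_bits, target_bitrate_bps) for one frame slot."""
    # Bit budget the congestion controller allows during this frame interval.
    budget_bits = cc_rate_bps * frame_interval_s
    # If the encoder under-shoots the budget, fill the gap with dummy padding
    # so the flow looks backlogged to the delay-based congestion controller.
    padding_bits = max(0, budget_bits - encoded_frame_bits)
    # Steer the encoder's target bitrate toward the controller's sending rate,
    # leaving headroom (0.9 is an assumed factor) so video plus padding fits.
    target_bitrate_bps = 0.9 * cc_rate_bps
    return padding_bits, target_bitrate_bps
```

For example, with a 1 Mbps controller rate at 30 fps, a 20 kb encoded frame leaves roughly 13 kb of padding to keep the link looking backlogged, while the encoder's target is nudged toward the controller's rate.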
Pantea Karimi. (2019). Location Verification using Latencies and Claimed Coordinates on the Blockchain. In EPFL Internship Report.
@unpublished{EPFL_Report,
author = {Karimi, Pantea},
title = {Location Verification using Latencies and Claimed Coordinates on the Blockchain},
journal = {EPFL Internship Report},
year = {2019},
file = {EPFL_Report.pdf}
}
In many applications, such as the processing of transactions, speed is an important factor. In self-organizing communities that do not rely on a central party, transactions can be processed by a set of validators. To speed up transaction validation, a Trust-but-Verify approach can be taken. For example, when paying for a daily purchase at the local supermarket with cryptocurrency, a user can rely on the local consensus and enjoy fast transaction processing, yet still verify the global state afterwards to make sure that the transaction was also confirmed by the global consensus and was not a double-spend. The user also has the choice not to trust the local consensus and to wait longer for the global consensus. In the Trust-but-Verify approach, nearby validators can start processing transactions and provide a fast, weak, and temporary proof while the global verification is being computed by all the validators in the system. To find the nearby validators for local consensus, each validator must know its latencies to all the other validators in the system. The primary goal of this project is to devise a scalable, fault-tolerant algorithm to estimate the pairwise latencies among all the nodes of the system. The challenge is that some nodes are malicious and behave arbitrarily, including attempting to mislead others about their latencies.
Publications
Pantea Karimi*, Solal Pirelli*, Siva Kesava Reddy Kakarla, Ryan Beckett, Santiago Segarra, Beibin Li, Pooria Namyar, and Behnaz Arzani. (2024). Towards Safer Heuristics With XPlain. In Proceedings of the 23rd ACM Workshop on Hot Topics in Networks (pp. 68–76).
@published{Xplain,
title = {Towards Safer Heuristics With XPlain},
author = {Karimi*, Pantea and Pirelli*, Solal and Kakarla, Siva Kesava Reddy and Beckett, Ryan and Segarra, Santiago and Li, Beibin and Namyar, Pooria and Arzani, Behnaz},
journal = {Proceedings of the 23rd ACM Workshop on Hot Topics in Networks},
pages = {68--76},
year = {2024},
doi = {10.1145/3696348.3696884},
file = {Xplain.pdf}
}
Many problems that cloud operators solve are computationally expensive, so operators often use heuristic algorithms, which are faster and scale better than optimal algorithms, to solve them. Heuristic analyzers enable operators to find when and by how much their heuristics underperform. However, these tools do not provide enough detail for operators to mitigate a heuristic's impact in practice: they discover only a single input instance that causes the heuristic to underperform (not the full set), and they do not explain why.
We propose XPlain, a tool that extends these analyzers and helps operators understand when and why their heuristics underperform. We present promising initial results that show such an extension is viable.
Pantea Karimi. (2023). Bridging the Gap Between Real-time Video and Backlogged Traffic Congestion Control. Master's thesis, Massachusetts Institute of Technology.
@published{Dumbo,
author = {Karimi, Pantea},
title = {Bridging the Gap Between Real-time Video and Backlogged Traffic Congestion Control},
journal = {Massachusetts Institute of Technology},
doi = {https://hdl.handle.net/1721.1/151675},
year = {2023},
file = {Dumbo.pdf}
}
Real-time video applications, such as video conferencing, have become essential to our daily lives, and ensuring reliable, high-quality video delivery in the face of network fluctuations and resource constraints is critical. However, video congestion control algorithms have been criticized for their sub-optimal performance in managing network congestion and maintaining satisfactory video quality and latency. At the same time, state-of-the-art congestion control algorithms have demonstrated remarkable performance improvements, effectively addressing network congestion and enhancing the overall quality of data transmission. In this work, we first demonstrate why there is such a gap between the performance of congestion control schemes on backlogged flows and on real-time video streams. Second, we present Dumbo, a design that reshapes video traffic to look like backlogged traffic, thus enabling state-of-the-art delay-sensitive congestion control algorithms for real-time video. We implemented Dumbo atop WebRTC and evaluated it under emulated network conditions using real-world cellular network traces. Our results show that, compared with GCC, Dumbo achieves a 1.5 dB improvement in PSNR, a 1.6 dB improvement in SSIM, 100 ms lower frame latency, 35x faster convergence, a 16% increase in video bitrate, a 32% increase in network utilization, and a 4x reduction in network queueing delay.
Vibhaalakshmi Sivaraman, Pantea Karimi, Vedantha Venkatapathy, Mehrdad Khani, Sadjad Fouladi, Mohammad Alizadeh, Frédo Durand, and Vivienne Sze. (2024). Gemino: Practical and Robust Neural Compression for Video Conferencing. In USENIX Symposium on Networked Systems Design and Implementation 2024.
@published{Gemino,
author = {Sivaraman, Vibhaalakshmi and Karimi, Pantea and Venkatapathy, Vedantha and Khani, Mehrdad and Fouladi, Sadjad and Alizadeh, Mohammad and Durand, Frédo and Sze, Vivienne},
title = {Gemino: Practical and Robust Neural Compression for Video Conferencing},
journal = {USENIX Symposium on Networked Systems Design and Implementation 2024},
doi = {https://www.usenix.org/conference/nsdi24},
git = {https://github.com/geminovc},
year = {2024},
file = {Gemino.pdf}
}
Video conferencing systems suffer from poor user experience when network conditions deteriorate because current video codecs simply cannot operate at extremely low bitrates. Recently, several neural alternatives have been proposed that reconstruct talking-head videos at very low bitrates using sparse representations of each frame, such as facial landmark information. However, these approaches produce poor reconstructions in scenarios with major movement or occlusions over the course of a call, and do not scale to higher resolutions. We design Gemino, a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline. Gemino upsamples a very low-resolution version of each target frame while enhancing high-frequency details (e.g., skin texture, hair, etc.) based on information extracted from a single high-resolution reference image. We use a multi-scale architecture that runs different components of the model at different resolutions, allowing it to scale to resolutions comparable to 720p, and we personalize the model to learn specific details of each person, achieving much better fidelity at low bitrates. We implement Gemino atop aiortc, an open-source Python implementation of WebRTC, and show that it operates on 1024x1024 videos in real time on a Titan X GPU, and achieves 2.2-5x lower bitrate than traditional video codecs for the same perceptual quality.
Workshops
Towards Safer Heuristics With XPlain. (2024). In Microsoft Research, Redmond.
@workshop{Xplain_Poster,
presenter = {Karimi, Pantea},
title = {Towards Safer Heuristics With XPlain},
journal = {Microsoft Research, Redmond},
year = {2024}
}
Vidaptive: Efficient and Responsive Rate Control for Real-Time Video on Variable Networks. (2023). In GW6 Research Summit, MIT.
@workshop{Vidaptive_Poster,
presenter = {Karimi, Pantea},
title = {Vidaptive: Efficient and Responsive Rate Control for Real-Time Video on Variable Networks},
journal = {GW6 Research Summit, MIT},
year = {2023},
file = {Vidaptive_Poster.pdf}
}
Gemino: Practical and Robust Neural Compression for Video Conferencing. (2023). In MIT AI Hardware Program Symposium.
@workshop{Gemino_Poster,
presenter = {Karimi, Pantea},
title = {Gemino: Practical and Robust Neural Compression for Video Conferencing},
journal = {MIT AI Hardware Program Symposium},
year = {2023},
file = {Gemino_Poster.pdf}
}
Hands-on Blockchain Workshop. (2018). In Sharif University of Technology.
@workshop{Blochchain_Poster,
organizer = {Karimi, Pantea},
title = {Hands-on Blockchain Workshop},
journal = {Sharif University of Technology},
year = {2018}
}