Due to recent world events, video calls have become the new norm for both
personal and professional remote communication. However, a participant in a
video call who is not careful can reveal private information to others in
the call. In this paper, we design and evaluate an attack framework
to infer one type of such private information from the video stream of a call:
keystrokes, i.e., text typed during the call. We evaluate our video-based
keystroke inference framework using different experimental settings and
parameters, including different webcams, video resolutions, keyboards,
clothing, and backgrounds. The relatively high keystroke inference accuracy
we observe under common, realistic settings highlights the need for
awareness and countermeasures against such attacks. Consequently, we also
propose and evaluate effective mitigation techniques that can automatically
protect users when they type during a video call.

