DEEP LEARNING-BASED VIDEO CAPTIONING

K. Annapoorneshwari Shetty, Mithisha K R, Thrisha P R

doi:10.25215/8194288797.48

Abstract

Generating detailed text descriptions from videos is a difficult task that combines computer vision with natural language understanding. This study presents an end-to-end system that converts video content into readable natural language descriptions using a combination of deep learning techniques. The system uses a pre-trained ResNet-50 model to extract spatial features from selected video frames and a Transformer-based decoder to generate fluent captions. To demonstrate practical use, we built a web-based interface with Streamlit that supports real-time video processing and caption generation. The design works with both real video data and computer-generated content, which helps mitigate the data scarcity that arises when training deep learning models. Our evaluation indicates that this approach balances description accuracy against computational cost, producing relevant captions while running efficiently on standard hardware.
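
The abstract does not state how frames are chosen before feature extraction. As an illustration only, the sketch below assumes uniform sampling: the video is split into equal segments and the middle frame of each is taken. The function name `sample_frame_indices` and its parameters are hypothetical and are not taken from the described system.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Return frame indices spread evenly across a video.

    Assumption: uniform sampling. The source only says that frames
    are "chosen", not which strategy is used.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Fewer frames than requested: use every frame once.
        return list(range(total_frames))
    # Split the video into num_samples equal segments and take
    # the midpoint frame of each segment.
    step = total_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


if __name__ == "__main__":
    # Indices that would be decoded and passed to the feature extractor.
    print(sample_frame_indices(total_frames=100, num_samples=4))
```

Under this assumption, each selected frame would then be resized and passed through the pre-trained ResNet-50 to obtain one feature vector per frame, and the resulting sequence would serve as input to the Transformer-based decoder.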
