
I'm building a RAG system for a platform where the primary content consists of videos and slides. My approach extracts keyframes from videos with OpenCV, using the mean absolute difference between consecutive frames for scene change detection:

import cv2  # prev_image and curr_image are consecutive BGR frames

diff = cv2.absdiff(prev_image, curr_image)
gray_diff = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
mean_diff = cv2.mean(gray_diff)[0]  # flag a scene change when this exceeds a threshold

For each keyframe, I generate a caption and apply OCR to extract any embedded text.
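
In case it's relevant, the per-keyframe step is roughly the sketch below; pytesseract for OCR and a BLIP captioning model loaded through transformers are just the tools assumed for illustration, not a fixed part of the pipeline.

import cv2
import pytesseract
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def describe_keyframe(frame_bgr):
    # Convert the OpenCV BGR frame to an RGB PIL image for both tools
    image = Image.fromarray(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    # OCR any embedded slide text
    text = pytesseract.image_to_string(image)
    # Generate a short natural-language caption for the frame
    inputs = processor(images=image, return_tensors="pt")
    caption = processor.decode(model.generate(**inputs)[0], skip_special_tokens=True)
    return caption, text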

However, a major issue I encounter is that many extracted frames are just shots of the speaker, which are not useful for my application. Since scene change detection works by comparing consecutive frames, it often captures moments where the speaker moves slightly, rather than focusing on slides, graphs, or other informative visuals.

I considered filtering out frames containing faces, but the problem is that many useful frames (e.g., slides, graphs) also contain the speaker’s face. Simply discarding frames with faces would result in losing valuable content.
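
The face filtering I considered would be something along these lines (a minimal sketch using OpenCV's bundled Haar cascade; the detector parameters are just common defaults, not tuned values):

import cv2

# Frontal-face Haar cascade shipped with opencv-python
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def contains_face(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0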

How can I refine the keyframe extraction process to prioritize frames containing meaningful content, such as slides, graphs, or images, while filtering out those that primarily show the speaker?
