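Extract hard-coded (burned-in) subtitles from video using a deep learning framework. A Docker container removes the need to install deep learning libraries, and a local API keeps the GUI separate from the recognition backend.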

Overview

extract-video-subtittle

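Extracts hard-coded subtitles from video using a deep learning framework.

Recognition runs locally; no network connection is required.

Recognition speed on CPU is reasonable.

The container exposes an API.

Runtime environment

The environment is easy to set up: I built a Docker image so that you do not have to install any deep learning packages yourself.

A Windows GUI is provided.

The container image is CPU-only.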

Video demo

https://www.bilibili.com/video/BV18Q4y1f774/

How to use

1. Start the backend container first

docker run -d -p 6666:6666 m986883511/extract_subtitles

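2. Start the program

A quick tour of the window (the numbers refer to the callouts in the screenshot):

1: Click the button on the left to connect to the container started in step 1.

2: Overall progress of subtitle extraction for the video.

3: Position of the currently displayed frame, i.e. the video seek bar.

4: Recognized text is shown here.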

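3. Select a video and confirm the subtitle position

Click the Select Video button. You can then drag the seek bar to a frame that contains subtitles, click Select Subtitle Region, and draw a rectangle on the video.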
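
A minimal sketch of how a dragged rectangle might be turned into a crop region, assuming the GUI reports the two drag corners in pixel coordinates. The name `normalize_rect` and the `(x, y, w, h)` convention are illustrative assumptions, not taken from the project.

```python
def normalize_rect(p1, p2, frame_w, frame_h):
    """Return (x, y, w, h) for a drag from p1 to p2, clamped to the frame.

    Hypothetical helper: the drag may go in any direction, so the corners
    are sorted, and points outside the frame are clamped to its edges.
    """
    def clamp(value, upper):
        return max(0, min(value, upper))

    x1, x2 = sorted((clamp(p1[0], frame_w), clamp(p2[0], frame_w)))
    y1, y2 = sorted((clamp(p1[1], frame_h), clamp(p2[1], frame_h)))
    return x1, y1, x2 - x1, y2 - y1
```

Sorting the corners means the user can draw the rectangle from bottom-right to top-left and still get a positive width and height.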

4. Click Test API Connection

If the backend is working, the status shows Connected. At this point every step is ready.
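
If the GUI does not report a connection, a quick check from Python can confirm whether anything is listening on the published port. This sketch only tests TCP reachability of port 6666 (the port from the `docker run` command); it does not call the project's API, whose endpoints are not documented here. The function name `backend_reachable` is illustrative.

```python
import socket


def backend_reachable(host="127.0.0.1", port=6666, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This checks only that something accepts connections on the port;
    it does not verify that the backend API itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("backend reachable:", backend_reachable())
```

A `False` result usually means the container is not running or the port mapping is missing; `docker ps` shows both.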

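5. Start recognition

Click the button labeled 请先完成前几步 ("Please complete the previous steps first"). Internally, recognition runs in these stages:

  1. Locally, ffmpeg extracts the video's audio track and saves it to the temp directory (0%-10%).
  2. The audio file is sent to the container over the API. Inside the container, the spleeter library extracts the vocals from the audio and saves the result in the container's temp directory. This stage is slow and heavy on CPU and memory (10%-30%).
  3. Over the API, the vocals are split into segments at pauses and the segment list is returned. This stage is relatively quick (30%-40%).
  4. Subtitles are recognized according to the speech segment times (40%-100%).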
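
The percentage ranges in the stages above can be read as a mapping from per-stage progress to the overall progress bar. The sketch below shows one way to compute that; the stage names and the function `overall_progress` are hypothetical, and only the percentage ranges come from the list.

```python
# Stage ranges taken from the stage list: (start %, end %).
# The stage names are illustrative, not the project's identifiers.
STAGES = {
    "extract_audio": (0, 10),
    "separate_vocals": (10, 30),
    "split_speech": (30, 40),
    "recognize": (40, 100),
}


def overall_progress(stage, fraction):
    """Map a stage-local fraction in [0, 1] to the overall percentage."""
    start, end = STAGES[stage]
    fraction = max(0.0, min(1.0, fraction))  # guard against out-of-range input
    return start + (end - start) * fraction
```

For example, being halfway through vocal separation corresponds to `overall_progress("separate_vocals", 0.5)`, which is 20% overall.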

When progress reaches 100%, the temp directory contains an .srt subtitle file with the same name as the video.
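
For readers unfamiliar with the .srt format, here is a small sketch that builds SRT text from timed segments. It assumes segments are available as `(start_seconds, end_seconds, text)` tuples; that shape and the function names are illustrative and not taken from the project's code.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the timestamp form used by SRT."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from (start_sec, end_sec, text) tuples.

    Each cue is a 1-based index, a time range, and the text;
    cues are separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, 1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(blocks)
```

Writing the result with `open(path, "w", encoding="utf-8")` keeps Chinese subtitle text intact.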

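Running the backend

The backend API container image is hosted on Docker Hub.

Pulling the image may take a while. You need Docker installed beforehand and a Docker registry mirror (accelerator) configured; you may also need to run docker login first.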

docker run -d -p 6666:6666 m986883511/extract_subtitles

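Files missing from this repository

Because of network speed and firewall restrictions, large files could not be pushed to the repository. See .gitignore for the files that are excluded.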

Other

Video extraction

# Extract a clip from a video
ffmpeg -ss 00:15:45 -t 00:02:15 -i test/three_body_3_7.mp4 -vcodec copy -acodec copy test/3body.mp4
# Package the GUI program
C:/Python/Python38-32/Scripts/pyinstaller.exe main.spec
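
The clip-extraction command can also be driven from Python, which is convenient when cutting several test clips. This is a sketch that assumes `ffmpeg` is on the PATH; the function names are illustrative, and the flags are the ones used in the ffmpeg command above.

```python
import subprocess


def clip_command(src, dst, start, duration):
    """Build the ffmpeg argument list for a stream-copy clip.

    -ss before -i seeks in the input, -t limits the duration, and
    copying both codecs avoids re-encoding.
    """
    return ["ffmpeg", "-ss", start, "-t", duration, "-i", src,
            "-vcodec", "copy", "-acodec", "copy", dst]


def extract_clip(src, dst, start, duration):
    """Run ffmpeg; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(clip_command(src, dst, start, duration), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.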

References

The deep learning source code in this project is under /docker/backend.

Original author: https://github.com/YaoFANGUK/video-subtitle-extractor
